Taking stock: What we have learned and where we are heading?
Publication: Liber Amicorum in honor of Peter S.H. Leeflang.
Rijksuniversiteit Groningen 2011
A marketeer does not need market research to find out whether ex-factory volumes are increasing or decreasing. He needs market research to find out what his position is on the overall market, who is benefiting/suffering on the basis of current developments and what is causing these developments. All this in order to prepare to take adequate action.
Taking this into account, it is acceptable to think constructively about the extent to which “non-probability river sampling” can be incorporated in the already impressive set of data gathering methodologies available. Far more experiments and scientific research will be necessary before river recruitment will be able to claim its place. But it is inevitable that this will be the way forward.
Cost-efficiency and the flexibility of river sampling in particular will boost its popularity when it comes to marketing decision-making in a transparent online world. This makes non-probability river sampling a future winner and as social scientists, we should not fight battles that we cannot win, but we should encourage users of river sampling based research data not to neglect the aspects of data quality. It is after all entirely in their own interest.
Generalisations about a defined universe are only valid when based on probability samples. This is one of the first lessons we learned concerning the statistical foundations of market research as an applied science.
A probability sampling method can be defined as being any method of sampling that utilizes some form of random selection. In order to have a random selection method, it is necessary to set up some process or procedure, which ensures that all respondents within your population have known probabilities of being chosen (Trochim, 2000).
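The definition above can be illustrated with a minimal sketch of the simplest case, a simple random sample, in which every unit in the sampling frame has the same known inclusion probability n/N. The frame of 1,000 respondent labels and the function name are hypothetical and serve only as an illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement.

    Every unit has the same known inclusion probability n / N,
    which is what makes this a probability sample.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    inclusion_probability = n / len(population)
    return sample, inclusion_probability

# Hypothetical sampling frame of 1,000 respondents (illustration only).
population = [f"respondent_{i}" for i in range(1000)]
sample, p = simple_random_sample(population, 50, seed=1)
print(len(sample), p)
```

The essential point is not the random number generator but the fact that the inclusion probability is known in advance for every member of the frame.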
This may be true from a scientific perspective, but when it comes to commercially applied market research, we find that non-probability sampling has been widely used as an acceptable alternative to probability sampling.
The difference between non-probability and probability sampling is that non-probability sampling does not involve random selection and probability sampling does. This does not imply that non-probability samples are not representative of the population; it does mean however that non-probability samples cannot depend upon the rationale of the probability theory.
We expect the usage of non-probability sampling to increase dramatically, as a result of the substantial impact of internet technology on data collection options for market research.
In this article, we shall draw up an inventory of the potential suitability of river sampling based data collection.
River sampling is an online sampling method that recruits respondents by means of a survey invitation while they are engaged in some other online activity.
2 Coverage bias
Thirty years ago, Leeflang and Olivier (1980) showed empirical evidence that, despite tremendous efforts, both retail-audit research agencies and consumer-audit research agencies were not able to prevent or eliminate serious levels of coverage bias.
This empirical evidence compared panel audit figures with actual ex-factory figures and proved the existence of both undercoverage and overcoverage. Non-response was the main underlying factor causing this bias.
In consumer panels, the cheaper-priced products were over-represented and the high priced products were under-represented. In retail panels, an opposite relation was visible.
Further analyses revealed that it was not price that was causing the coverage bias.
The coverage bias was the consequence of existing non-response effects that occurred during the probability-sampling-based panel recruitment activities.
It transpired that the level of coverage bias could be estimated by means of marketing modelling. Leeflang and Olivier (1982) and further studies proved that the coverage bias issue could be reduced by Lisrel-based modelling (Plat, 1988). However, as a result of the complexity of the algorithms used, this method of coverage correction failed to become popular among users of audit data.
It is a fact that even 30 years ago, major corporations often preferred the use of quota sampling when conducting surveys. In quota sampling, the sampling procedure selects people on a non-random basis according to a fixed quota (Trochim 2000).
The popularity of this non-probability sampling technique is based on the ability to select marketing based sub-groups at reasonable costs. Moreover, due to the homogeneity of the target population, it turned out that the results of quota sampling and probability sampling very often did not differ substantially when comparing research outcomes.
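As a minimal sketch of the quota principle described above, respondents can be accepted in order of arrival until each quota cell is full. The age groups, quota sizes and function name are assumptions chosen for illustration; they are not taken from any of the studies cited.

```python
def fill_quotas(arrivals, quotas, key):
    """Accept respondents in arrival order until every quota cell is full.

    Selection is non-random: whoever arrives first in a cell is accepted.
    """
    remaining = dict(quotas)  # copy, so the caller's quotas stay untouched
    accepted = []
    for person in arrivals:
        cell = key(person)
        if remaining.get(cell, 0) > 0:
            accepted.append(person)
            remaining[cell] -= 1
    return accepted, remaining

# Hypothetical arrivals and quotas (illustration only).
arrivals = [
    {"id": 1, "age_group": "18-34"},
    {"id": 2, "age_group": "18-34"},
    {"id": 3, "age_group": "35+"},
    {"id": 4, "age_group": "18-34"},
    {"id": 5, "age_group": "35+"},
]
quotas = {"18-34": 2, "35+": 2}
accepted, remaining = fill_quotas(arrivals, quotas, key=lambda p: p["age_group"])
print([p["id"] for p in accepted], remaining)
```

Respondent 4 is turned away because the quota for that age group is already filled, which shows why the resulting sample matches the quota structure without being a random selection.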
Each sampling method has its own set of shortcomings. Without doubt, the probability-based sample from the “PTT afgiftepuntenbestand” (Dutch mail delivery address file) with revisits after absence-based non-response is the best available sample selection method.
However, the selection costs of this procedure are high compared to other sampling alternatives and the increase in reliability is limited. Therefore, most often, it is simply not worthwhile pursuing this quality level.
Increased non-response at respondent level and an increased interest of marketing in narrowly defined target groups have rendered this method increasingly outdated.
As stated, each of the other available sampling methods also has its own individual shortcomings.
A ‘street selection’ will easily lead to an under-representation of employed respondents and ‘random digit dialling’ by phone will lead to an under-representation of outgoing respondents, many of them being potential respondents in the younger age brackets. An “online approach” or a “mobile phone” approach can have the opposite effect.
The traditional solution is found in working with socio-demographic stratified sampling techniques, and to some extent this is an acceptable remedy. However, empirical data from the consumer audit study (Leeflang and Olivier, 1980) demonstrated that the representativeness of a participating group decreases exponentially when non-response levels increase.
Another interesting solution can be found in the usage of mixed mode approaches, both offline as well as online, in order to neutralize the shortcomings in each of the individual methods (De Leeuw and Hox, 2010).
3 Internet Surveys
From the outset, the market research industry has been keen to make use of internet technology as a data-gathering tool. Some probability-sampling adherents argued, however, that the research outcomes were only valid if the aim was to measure the characteristics or opinions within an online population.
Of course, this argument was valid as long as online penetration levels remained low, but it was clear penetration levels would boom in subsequent years.
A whole range of studies were published which demonstrated that many research outcomes did not differ significantly when telephone research was compared with similar online panel research (Walker et al., 2005).
Even the most positive adherents of “online internet surveys” realised there was little knowledge available about the validity of this new methodology, so these early adopters preferred working on the basis of online panels in order to maximize the statistical comfort zone. Using an online panel makes solid probability sampling possible after a careful selection of panel members.
Moreover, a panel approach offers the option to aim for improved representativeness by using refined stratified samples. Last but not least, it became possible to offer clients response rates which could not be generated by telephone surveys (see also Stoop (2005)).
Whether or not this is of genuine benefit is highly questionable, as pointed out by De Leeuw (2009). The non-response when online panels are recruited is extremely high, and there is hardly any knowledge available with respect to the consequences of this initial non-response on panel characteristics.
The AAPOR (2010) established an “Opt-In Online Panel Task Force” and charged it with reviewing current empirical findings related to opt-in online panels utilized for data collection. Based on their findings, the task force developed an interesting set of recommendations.
One of the conclusions of the task force was that there are substantial and unexpected differences between research outcomes based on probability sampling and research outcomes based on non-probability sampling. As a consequence, non-probability samples are not suitable for census research or any other research that aims to measure facts at a high accuracy level. However, AAPOR also states that in many other situations, non-probability sampling will be accurate enough. The AAPOR (2010) states the following:
“Not all survey research is intended to produce precise estimates of population values. For example, a good deal of research is focused on improving our understanding of how personal characteristics interact with other survey variables such as attitudes, behaviors and intentions. Non-probability online panels also have proven to be a valuable resource for methodological research of all kinds. Market researchers have found these sample sources to be very useful in testing the receptivity of different types of consumers to product concepts and features. Under these and similar circumstances, especially when budget is limited and/or time is short, a non-probability online panel can be an appropriate choice.”
We are put in mind of strategic-oriented survey monitors, which are conducted to find or follow market trends or research programs that are carried out in order to compare the outcome of marketing decisions with previously measured benchmarks, ergo: horses for courses.
The AAPOR task force also concluded that there are significant differences in the composition and practices of individual panels, which can affect survey results. Panels using probability-based selection methods in particular are likely to be more accurate than those using non-probability-based methods. Other panel management practices such as recruitment sources, incentive programmes, and maintenance practices can also have major impacts on survey results. This conclusion is fully in line with the research outcomes published by Leeflang and Olivier (1980, 1982).
4 Representativeness and socio-demographic variables
One could raise the question why measuring the representativeness of a sample is so often based on census-based comparisons of socio-demographic characteristics such as gender, age, and state.
The panels on which Leeflang and Olivier (1980) based their conclusions were examples of perfectly representative and probability-based samples. Nevertheless it turned out that more/other variables were needed to improve the reliability of the produced data.
With this knowledge, the recent suggestion of Huizing and Van Ossenbruggen (2007) was to use Propensity Sampling and Propensity Weighting as a powerful tool to improve representativeness.
Propensity sampling basically means that lots are drawn for every respondent to determine whether or not he or she will be included in a random sample. The chance of being chosen for the random sample varies per respondent. The probability is determined on the basis of the background characteristics of the respondent, where norms and values become more important than the traditional socio-demographic characteristics. The achieved samples are usually adjusted through weighting procedures, which is called “propensity weighting”. This seems a logical suggestion. However, it is good to realise that the earlier mentioned Lisrel-based weighting turned out to be too complex to be implemented by marketing experts, and there is a serious risk that propensity weighting will also be too complex and confusing for the daily users of panel data.
We estimate that only corrections made by the research agency at the level of panel respondents, and not at the level of research outcomes, stand a chance of being successfully implemented in the market place.
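The drawing of lots described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Huizing and Van Ossenbruggen (2007): each respondent is assumed to carry a pre-computed selection probability in a hypothetical field called "propensity", and the weight shown is the inverse of that probability, which is one common form of weighting and may differ from what is used in practice.

```python
import random

def propensity_sample(respondents, seed=None):
    """Draw a lot for every respondent using his or her own probability.

    Assumption: each respondent dict has a 'propensity' between 0 and 1.
    Selected respondents receive an inverse-probability weight, so that
    groups with a low chance of selection count more heavily afterwards.
    """
    rng = random.Random(seed)
    selected = []
    for r in respondents:
        # rng.random() lies in [0, 1), so propensity 1.0 always selects
        # and propensity 0.0 never selects.
        if rng.random() < r["propensity"]:
            selected.append({**r, "weight": 1.0 / r["propensity"]})
    return selected

# Hypothetical respondents (illustration only).
respondents = [
    {"id": 1, "propensity": 1.0},
    {"id": 2, "propensity": 0.0},
    {"id": 3, "propensity": 1.0},
    {"id": 4, "propensity": 0.25},
]
chosen = propensity_sample(respondents, seed=0)
print([(r["id"], r["weight"]) for r in chosen])
```

How the propensities themselves are estimated from background characteristics such as norms and values is the difficult part and is deliberately left out of this sketch.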
5 River Sampling as an alternative to online panels
River sampling is one of the many methods that utilises non-probability sampling.
At this moment there is no generally accepted classification for non-probability sampling methods, so various authors come to different overviews. Moreover, the number of variations in non-probability sampling methods seems limitless. For an interesting overview of non-probability sampling techniques, we refer to Trochim (2000). For an interesting overview of online sampling techniques we refer to Bradley (1999).
In river sampling, there are two non-probability sampling methods that are relevant:
Stratified River Sampling
River samples are samples that are created in real-time from online promotions using methods such as banners, pop-ups, and hyperlinks. The agency carefully selects the websites on the basis of available statistics about the visits to and visitors of each website.
Convenience River Sampling
Links are placed on various websites without a prior analysis of the background of their visits or visitors. The main purpose is to collect maximum data at minimal cost and this is taken into account when selecting the feeder websites.
5.1 Stratified River Sampling
After self-recruitment, respondents are redirected towards a portal, where they are screened for participation in a specific research project. Knowledge about each site’s viewers and the response patterns of their visitors is a key piece of information needed for effective recruiting. Companies that want to conduct a river sampling based survey seldom have access to the full range of sites they need or the detailed demographic information regarding the visitors to those sites, and they are therefore forced to work through carefully selected intermediaries.
Additionally, when evaluating river sampling, the breadth, stability and relevance of the promotions used should be assessed, in addition to checks on security and quota controls during the creation of the sample. Multiple participation must be prevented by using software capable of digital fingerprinting.
We do not expect it to be easy to reach acceptable levels of representativeness, but there is no reason to assume this is not feasible. The critics of today may well become reluctant future users. This scenario is highly likely, especially if we accept that each research goal has its own minimum standards when it comes to sample representativeness, and realise that it is the client’s responsibility to decide.
Hitherto, there have been only a limited number of scientific studies available concerning the reliability of river-sample recruitment. A couple of commercial agencies (such as DMS Research) have published the results of validation studies. Although DMS claims to master the complex route of deriving valid samples from website selections, they also have the opinion that this is a proprietary technique, which is not open for publication.
To some extent, river recruitment can be compared to street recruitment: respondents on the digital highway are given the opportunity to participate in the research project.
The representativeness of street recruitment is reduced by non-response bias; the representativeness of river recruitment is reduced by self-selection bias, which can be seen as a specific form of non-response.
More important than the comparison between street selection and river recruitment is the question as to what extent online panel research findings are similar to the findings that stem from river recruitment.
Brien et al. (2008) have published a validation study, which, in their opinion, demonstrates that river recruitment and panel recruitment render similar research outcomes. Does this contradict the conclusion from the AAPOR study? Not in the slightest. Most studies that investigated the differences between methodologies demonstrated that, although a remarkable number of outcomes are comparable, some research findings are different (Walker et al., 2005; Van Westerhoven, 1978).
We adhere to the viewpoint that it is of course extremely encouraging if 80% of survey results prove comparable. However, if there is no insight into when and why discrepancies occur and to what extent they occur, what remains can be considered little more than slumber-inducing pseudo-validity.
It is down to a researcher’s expertise to know where deviations between data can be expected, and to ascertain the cause of these dissimilarities.
The most interesting conclusions from the study by Brien et al. (2008) are those more technical differences, which indicate the accuracy with which the questionnaires are completed.
The most relevant outcomes are presented below in Table 1.
The main conclusion is that river-recruited respondents are less experienced in completing surveys, which means it takes them longer to fill out the survey. Many other variables, such as straight lining (rating all parts of a question with the same value) and inaccuracies were quite similar. The low level of thoughtful response in the CATI study illustrates the lower level of interest of the respondent to participate in a telephone survey.
5.2 Convenience River Sampling
The second method of online non-probability sampling is “convenience river sampling”.
As stated earlier, links are placed on various websites without a prior analysis of the background of their visitors. The main purpose is to collect maximum data at minimal cost. This method is often used for promotional purposes by non-researchers as a ‘do-it-yourself’ research project, without the involvement of professional market research agencies or experts.
In order to boost participation rates, users are inclined to place the banners on topic-related sites. A well-known example was the research conducted by a Dutch body for combating alcohol abuse that wanted to gauge the excessive use of Bacardi Breezers among young graduates.
These results were convenience river sampling based, as links were placed on alcohol abuse-related sites. It illustrates how erroneous research findings can be, when carried out by non-professional organisations.
The self-selection bias was the reason behind this over-representation of alcohol/Breezer consumption of young graduates. The national newspapers ignored this bias, which was reported as technical background information, and widely published the research results.
Convenience river sampling will only lead to acceptable results if strict rules are taken into account. We refer to section 6.2 for a personal opinion with respect to some actual recent Dutch research projects.
6 The accuracy of research findings
It must be stated that the accuracy of research findings is jeopardized to a significantly greater degree by self-selection bias than by non-response bias. Making use of convenience river sampling therefore limits the scope of research goals that can be achieved, and the method can only be used under strict conditions.
This makes it essential to formulate a number of recommended rules, which must be taken into account when using a river sampling based technique.
6.1 Representativeness and universe
The first rule is to downscale the pretensions to more realistic levels. Convenience river sampling can never be representative of the total population of a country or of the average consumer. It is very tempting to ignore this rule, especially for journalists who are keen to publish scoops and have no idea about the looming research pitfalls.
A full overview of the websites from where respondents arrive when they log in to participate in the survey is an absolute must in order to get a grip on the potential biases, which are caused by this selection method. The information can be obtained either via a direct request, or by an automated hyperlink origin-tracing software application.
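A minimal sketch of such an overview is given below. It assumes that the survey portal stores, for each respondent, the URL of the page from which he or she arrived; the example URLs and the function name are hypothetical. The sketch merely counts arrivals per feeder site and is no substitute for a dedicated origin-tracing application.

```python
from collections import Counter
from urllib.parse import urlparse

def tally_feeder_sites(referrer_urls):
    """Count respondents per feeder website, based on the referring URL.

    Assumption: an empty or missing referrer means the origin is unknown.
    """
    counts = Counter()
    for url in referrer_urls:
        host = urlparse(url).netloc.lower() if url else ""
        counts[host or "(unknown)"] += 1
    return counts

# Hypothetical referring URLs (illustration only).
referrers = [
    "https://news.example/article/1",
    "https://cars.example/forum",
    "https://news.example/sport",
    "",
]
feeder_counts = tally_feeder_sites(referrers)
for site, n in feeder_counts.most_common():
    print(site, n)
```

A feeder site that supplies a disproportionate share of respondents, or a large share of unknown origins, is a first warning sign of selection bias.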
6.2 Self-selection bias by hyperlink origin
It is also very tempting to boost response by using topic-related websites in order to generate traffic. A good example of this is a recent study conducted by the ANWB, the Dutch equivalent of the AAA (the American Automobile Association). Hyperlinks that led to the website of the research agency were placed on the ANWB website and generated a substantial number of visitors. The combination of sample size and ANWB authority suggested that this was a representative study among Dutch inhabitants about the introduction of toll fees for using the motorway. Both the newspapers and the Minister of Transport were under this impression, and the ANWB had difficulty lowering the expectations surrounding its survey to the more realistic level it would have liked.
It was clear that vehicle owners selected themselves as respondents in order to maximize political pressure.
Traffic-generating links can only be used if there is no bias-causing coherence between the sending and receiving website. It will be interesting, for commercial decision-support research in particular, to see where the balance lies between what is scientifically appropriate and what is commercially viable.
Even if this influence is expected to be none or minimal, and acceptable for its purpose, using traffic generating links on websites can only be done if it is possible to check the validity of this assumption.
In the table below we will give some examples of research setups which were actually used recently. The question whether a method is acceptable or not is often open for discussion and the outcome of the discussion depends on the estimate of the bias which results from the selection method and the goal of the research project.
Table 2: Acceptability of recruitment methods for river sampling
6.3 Preventing multiple participation
Under no circumstances should it be possible for a respondent to take part twice in a survey. It has been shown that, particularly when surveys are short, external parties attempt to influence the outcome by way of multiple participation.
Using digital fingerprinting software offers a good method for preventing individual respondents from disproportionately influencing research outcomes.
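The principle can be sketched in a few lines. The sketch assumes that a handful of device attributes is recorded with each submission and hashes them into a single identifier; the attribute names are hypothetical, and commercial fingerprinting software uses far more signals and is considerably more robust against evasion.

```python
import hashlib

def fingerprint(attributes):
    """Hash a dict of device attributes into one stable identifier."""
    # Sort the keys so that the same attributes always give the same hash.
    canonical = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def drop_repeat_participants(submissions):
    """Keep only the first submission per device fingerprint."""
    seen = set()
    unique = []
    for s in submissions:
        fp = fingerprint(s["device"])
        if fp not in seen:
            seen.add(fp)
            unique.append(s)
    return unique

# Hypothetical submissions (illustration only).
submissions = [
    {"answer": "yes", "device": {"browser": "A", "screen": "1920x1080", "tz": "CET"}},
    {"answer": "yes", "device": {"tz": "CET", "screen": "1920x1080", "browser": "A"}},
    {"answer": "no", "device": {"browser": "B", "screen": "1280x720", "tz": "CET"}},
]
unique_submissions = drop_repeat_participants(submissions)
print(len(unique_submissions))
```

The second submission is discarded because it carries the same attributes as the first, even though they were recorded in a different order.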
6.4 Target Group definition
When presenting a survey, it is relevant to know which segment of the total universe was selected as survey target group.
The consequence is that during the survey, part of the data collection must deal with the question of whether the respondent actually fits within the target group.
This becomes extremely relevant when part of the respondents receive an invitation to participate. If a survey is conducted among dentists, then participation should be solely by members of the dentistry profession.
6.5 Popularity polling
Popularity polls are not suitable for an online convenience sampling approach.
In the Netherlands we have a case where the Bible won the 2005 Dutch Railways readership contest as the most popular book to read while travelling by train. The outcome was the result of a successful online snowballing campaign by the Dutch Bible Association, which aimed to draw attention to a newly published translation of the Bible.
A well-known exception is the popularity contest, where snowballing and multiple participation are promoted as part of the contest. This is a contest and has nothing to do with research.
6.6 Dealing with self-selection
It is clear that online non-probability sampling leads to a serious degree of self-selection.
So, as a part of each survey, it must be clearly stated what policies were in place in order to reduce or prevent self-selection, and to what extent the validity of the research findings is endangered by the remaining self-selection bias.
We already mentioned the overview of feeder sites and the comparison between socio-demographic characteristics and census data, or other available reliable statistics.
We suggest that the validity of the research findings should be checked by means of a representative control sample. The sample size can be smaller and the questions asked are only used for validation or reweighting purposes.
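One way in which a small control sample could be used for reweighting is sketched below. It is an illustration under assumptions, not a prescribed procedure: each respondent is assigned to a cell on a single validation question (here a hypothetical car-ownership question), and each river respondent receives a weight equal to the cell's share in the control sample divided by its share in the river sample.

```python
from collections import Counter

def poststratification_weights(river_cells, control_cells):
    """Weight per cell = share in control sample / share in river sample.

    Assumption: the control sample is representative of the target universe.
    Cells absent from the control sample receive weight 0.
    """
    river = Counter(river_cells)
    control = Counter(control_cells)
    n_river, n_control = len(river_cells), len(control_cells)
    weights = {}
    for cell, count in river.items():
        target_share = control.get(cell, 0) / n_control
        observed_share = count / n_river
        weights[cell] = target_share / observed_share
    return weights

# Hypothetical data: car owners are over-represented in the river sample.
river_cells = ["car_owner"] * 80 + ["no_car"] * 20
control_cells = ["car_owner"] * 50 + ["no_car"] * 50
weights = poststratification_weights(river_cells, control_cells)
print(weights)
```

In this example car owners are weighted down and respondents without a car are weighted up. Large weights are themselves a signal that the river sample deviates strongly from the control sample on the validation question.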
This brings us back to the conclusion of Leeflang and Olivier (1980), that the published audit data can be seriously biased by the effects of both undercoverage and overcoverage.
For many users this was no reason to refrain from using audit data. If a user knows about the existence of this bias and knows about the where, when and why, there is no reason why the usage of these data should be discontinued.
We have reached a tipping point in marketing where internal process data have become more important than market research data. A marketeer does not need market research data to find out whether ex-factory volumes are increasing or decreasing. He needs market research to find out what his position is on the overall market, who is benefiting/suffering on the basis of current developments and what is causing these developments. This in order to prepare to take adequate action.
A marketeer will be satisfied as long as he is capable of doing this, by using data that is adequate. Adequate means that data must be current, affordable, solid, and familiar (Naert and Leeflang, 1977). This does not jeopardise the relevance of market research data at all, but will change the character of market research projects drastically. Additional speed is preferred over an accuracy overshoot.
Taking this into account, it is acceptable to think constructively about the extent to which non-probability river sampling can be incorporated in the already impressive set of data gathering methodologies available. Far more experiments and scientific research will be necessary before river recruitment will be able to claim its place. But it is inevitable that this must be the way forward. Cost-efficiency and the flexibility of river sampling in particular will boost its popularity when it comes to marketing decision-making in a transparent online world where decisions can be implemented instantly. This makes non-probability river sampling a secure future winner and as social scientists we should not fight battles that we cannot win, but we should encourage people not to neglect the aspects of data quality. It is after all entirely in their own interest.