B.I.S.S Research White Papers

Big Data, Small Data, or Both?

Kevin Gray

I draw upon advanced methods from diverse fields such as Econometrics, Biostatistics, Psychometrics and Machine Learning extensively - for details see http://cannongray.com/methods.
Historically, marketing research has been mostly confined to consumer surveys, focus groups and in-depth interviews. At least this is the view of some in the business, and many marketing researchers are struggling to get their heads around what’s now vaguely called data science. The terms big data, AI and machine learning are also used in hazy ways, which can add to the confusion.

By Kevin Gray and Koen Pauwels

Data Sources

Over the years clients, consultants, specialist agencies and academics have actually drawn upon many sources, as have “traditional” marketing researchers. For example, in some industries such as retailing, financial services, travel and hospitality, extensive customer level data have been available and used in decision making, as well as in the design of marketing research studies. The downside to a bounty of individual-level transaction data on current customers, though, is often a lower focus on understanding and reaching potential customers. Such tradeoffs continue to play a role in today’s connected world. 

Primary Vs. Secondary

Since big and small are relative to computer technology at a given point in time, perhaps a more useful distinction is primary versus secondary data. Primary data are collected for specific purposes related to particular research questions or decisions. Secondary data are collected for reasons not directly related to these questions or decisions. 

Data a client collects for operational purposes, for example, may be helpful for marketing, but there are nearly always important gaps in them. An obvious one is that similar data for competitors are not available. Social media data, in general, cannot be linked to detailed information about specific individuals and, therefore, can only provide a high-altitude perspective. These are just two examples, but secondary data can easily raise as many questions as they answer.

Primary research can come to the rescue by plugging many of these gaps. There is synergy between primary and secondary data, however, and the latter, including published data from government sources, can help us design more rigorous and useful primary studies. So, in this sense, data from multiple sources should be a part of any marketing research project. 

Data Blending

What about data blending? This has also been called data integration and data fusion and, put simply, implies that data from more than one source are combined in a single data file for analytic purposes. These analytics may be descriptive analyses or more sophisticated analytics such as segmentation or marketing mix modeling. 

The simplest example of blended segmentation would be when a sample of customers is surveyed and their answers are merged with their transactional data. Marketing mix modeling requires data from many sources such as sales, competitor marketing activity, and sentiment scores from social media. Continuous tracking data from consumer surveys can also be included in these models.
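To make the simplest case concrete, here is a minimal sketch of merging survey answers with transactional data. It assumes both sources share a customer identifier. The field names (customer_id, satisfaction, amount), the blend function and all of the records are hypothetical and purely illustrative; a real project would more likely use a database or a statistical package.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey responses, keyed by an assumed shared customer ID.
survey = [
    {"customer_id": "C1", "satisfaction": 4},
    {"customer_id": "C2", "satisfaction": 2},
    {"customer_id": "C4", "satisfaction": 5},  # surveyed, but no transactions on file
]

# Hypothetical transaction records for current customers.
transactions = [
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "C1", "amount": 80.0},
    {"customer_id": "C2", "amount": 35.0},
    {"customer_id": "C3", "amount": 60.0},  # a customer who was not surveyed
]


def blend(survey_rows, transaction_rows):
    """Attach per-customer transaction summaries to each survey response."""
    # Group transaction amounts by customer.
    amounts = defaultdict(list)
    for row in transaction_rows:
        amounts[row["customer_id"]].append(row["amount"])

    # Keep every surveyed customer, even those without transactions.
    blended = []
    for row in survey_rows:
        spend = amounts.get(row["customer_id"], [])
        blended.append({
            **row,
            "n_purchases": len(spend),
            "total_spend": sum(spend),
            "avg_spend": mean(spend) if spend else None,
        })
    return blended


blended = blend(survey, transactions)
for record in blended:
    print(record)
```

Note that the blended file keeps only surveyed customers, so customer C3 drops out, and that C4 has survey answers but no purchase history. Gaps like these are typical when sources are combined and should be examined before any segmentation is run.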

There are numerous other examples but, to reiterate, none of this is entirely new. What has changed is that the amount and variety of data, along with our analytic capabilities, have greatly expanded in the past decade. There are countless textbooks, publications and other sources one can consult for more details about both the IT and analytics sides of data science.

Marketing is also more complicated now so, for any marketing research project, we need to be very clear about what questions we’re trying to answer and, just as importantly, make certain these are the questions we should be asking. Focus is necessary, and smorgasbord research can squander precious time and budget and reinforce perceptions that marketing research just causes indigestion. Conversely, a very narrow focus can mislead by preventing us from seeing the bigger picture and encouraging short-term thinking. The best approach is to actively look for diverse data sources to “triangulate the evidence” or, if you are into Popper, to “falsify the hypothesis.”

Don’t Commit to Just One Source

Overreliance on any single data source is also hazardous, for several reasons. One reason, as noted earlier, is that there is no master data file that “tells it all.” Different data, including qualitative, may also tell us very different stories. A bias we should be especially wary of is confirmation bias, the tendency to search for, interpret, favor, and recall information in a way that affirms our prior beliefs or hypotheses. This is a natural human proclivity and not necessarily irrational – it has survived millions of years of evolution, after all.

However, increasing data overload and mounting pressure to make decisions more quickly can worsen this inclination. Making a bad decision faster does not make it a good decision, unfortunately. Instead, we recommend using all data which are potentially relevant to guide hypothesis building and testing. Multiple analytic approaches can also provide unique perspectives of the same data and, in fact, it is not unusual for statisticians to spend considerable time exploring data with a variety of methods, including multivariate analysis. 

It would be advantageous if firms incorporated and adapted some components of the lean start-up methodology. This would encourage them to conduct experiments that test hypotheses generated by any data or research, including qualitative. Hypotheses can also be assessed against data from new sources which may become available. Both discourage confirmation bias. Existing marketing programs should also be evaluated against data from various sources and, when appropriate, with blended data. 

Note that testing hypotheses and evaluating current policy do not imply being data-driven in the sense of letting software make our decisions for us. There seems to be some confusion about this in the marketing research and larger business communities. While certain tasks can be delegated to machines, marketing management still requires considerable human thinking and effort. The challenge is to make our decisions more scientific and less based on gut feel, while recognizing that human interpretation is part of science. After all, our clients are human and won’t be comfortable putting their company and/or their career on the line if they don’t understand how we arrived at our recommendations.

Distraction by KPI

One common example of bad business practice which can stem from confirmation bias is closely monitoring KPIs that actually have no measurable connection to the bottom line. Logically and intuitively, they may seem important and perhaps have been plugged by business gurus but have never been tested against empirical evidence. Developing KPIs that really matter to the business requires a systematic process of trial-and-error. 

The illusion of control – the tendency for people to overestimate their ability to control events – is another important bias. On the one hand, managers should regularly update data and analyses so that they capture changes and trends early and can spot threats and opportunities. On the other hand, real-time metric updating is more likely to generate confusion than clarity, making it harder to see the forest for the trees. All too often, credit is taken (or blame assigned) for small movements in KPIs which are mere flukes.

Keep up to Date with Metrics

Managers should ask themselves how often they truly need to see metric updates. For instance, many brand health metrics change only slowly – with the exception of a sudden drop in a crisis. This is similar to our cars’ engines – they typically run fine and their metrics don’t show up on the driver’s dashboard – with the exception of a red warning light when something is going wrong. Likewise, many marketing metrics should not take up valuable space on the opening page of the marketing dashboard, but can be flagged when they are going down.
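One way to picture such a warning light is a rule that stays silent for ordinary wobble and flags a metric only when it falls well below its own history. The sketch below is an illustration under stated assumptions, not a recommended standard: the should_flag function, the threshold of two standard deviations and the weekly awareness scores are all hypothetical, and the rule ignores trend and seasonality, which a real dashboard would need to handle.

```python
from statistics import mean, stdev


def should_flag(history, latest, threshold=2.0):
    """Return True if `latest` is unusually low relative to `history`.

    `threshold` is measured in standard deviations of the historical
    readings; the default of 2.0 is an illustrative assumption.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical readings")
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # History is perfectly flat, so any decline is a departure from it.
        return latest < baseline
    z = (latest - baseline) / spread
    return z < -threshold


# Hypothetical weekly brand-awareness scores (percent).
awareness = [41.0, 40.5, 41.2, 40.8, 41.1, 40.9]

small_dip = should_flag(awareness, 40.6)  # within normal week-to-week noise
crisis = should_flag(awareness, 35.0)     # far below anything seen before
print(small_dip, crisis)  # prints: False True
```

A rule of this kind keeps slow-moving metrics off the opening page while still surfacing a sudden drop, and it discourages taking credit or assigning blame for movements that are no larger than normal variation.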

More sophisticated ways of analyzing data of any size and from any source can work for or against better decision-making. Managers and marketing researchers will need a greater degree of statistical literacy to better understand and interact with multiple data sources and interpret the results of advanced analytics. Communication, which is not the same as cool-looking graphics and a slick command of buzz phrases, is more crucial than ever.

This has just been a snapshot of a big and complicated topic, but we hope you’ve found it interesting and helpful.


Kevin Gray is President of Cannon Gray, a marketing science and analytics consultancy. 

Koen Pauwels is Distinguished Professor of Marketing at Northeastern University.

This article was first published in GreenBook on October 29, 2019.
