B.I.S.S Research White Papers

Demystifying Data Science

Kevin Gray

I draw upon advanced methods from diverse fields such as Econometrics, Biostatistics, Psychometrics and Machine Learning extensively - for details see http://cannongray.com/methods.
Marketing scientist Kevin Gray asks Dr. Randy Bartlett of Blue Sigma Analytics what Data Science really is and how it can help decision-makers.

How would you define Statistics, IT, Data Science, & AI in simple, layperson’s terms? 

Statistics is the scientific collection, organization, analysis, and interpretation of data. IT involves the development, maintenance, and use of computer systems, software, and networks. Based upon the colloquial usage, I would define Data Science as a catch-all term for everything to do with data. DS = IT + Stat. AI is software to support and make decisions.  

We need to define the fields of application by their problems, rather than by their types of tools. There are two broad and distinct sets of data problems: managing the data (data management; IT Data Science) and extracting information from it (data analysis; Statistical Data Science). Supplying one name for two completely different fields has led to the misunderstanding that the skills, thinking, and software are compatible, even transferable. Instead, and contrary to claims by talking heads, these fields are contradictory, yet complementary.

What’s the difference between a Statistical Data Scientist and an IT Data Scientist?  

An Applied Statistician/Statistical Data Scientist collects, organizes, analyzes, and interprets data in the field. IT Data Scientists develop, maintain, and leverage computer systems, software, and networks. A Data Scientist does one or the other, and not both. 

How would you compare today’s mischaracterizations of statistics to those of the past? 

Today’s mischaracterizations of statistics are much more hateful than those of the Six Sigma days and before. They are still born of the same lazy ignorance as the mischaracterizations of old.

What sorts of aptitudes and skills do you need to work in Data Science? What are the best ways to become a Data Scientist?  

For a Statistical Data Scientist/Applied Statistician you need to think stochastically, instead of deterministically, and you need an understanding of the underlying theories — the obverse of theory is a set of assumptions.   

The best way to become a Statistical Data Scientist is to earn a quantitative degree and then work with applied statisticians in the field; read the applied books; learn the field of application; master the software; work toward a PSTAT; and build the required professional skills. Right now, we are seeing a flood of statistical malfeasance in the field because the software has become user friendly without corresponding protections. One tell-tale sign that there is a massive problem is the increase in statistical denial.  

The route to becoming an IT Data Scientist typically starts with a computer science degree or similar, followed by working with other IT professionals in the field; reading the IT manuals; mastering the software; working toward a CAP; and building the required professional skills.

Setting aside the hype about Data Science and Big Data, how much are data and analytics really used by business people to make decisions? Does this depend on the country, industry sector or other factors?

This is a great question! As discussed in chapters three and four of my book, corporations are strong at leveraging the less complicated data analyses, and their capabilities get spotty as the complexity increases.

The ability to recognize and solve advanced statistics problems varies by country and by industry. Based on biased observations from my workshops, it is difficult to confirm the expectation that countries with more technically trained staff have an advantage. It is clear that the third world is watching this technology closely and is capable of learning quickly. Industries tend to master a few advanced statistical problems and not others. Manufacturing is strong at Quality Control, Process Control, and Design of Experiments. They do not use predictive modeling very often. Banking is great at predictive modeling and does not use Quality Control, Process Control, or Design of Experiments very often. For the most part though, management is too political to make fact-based decision making the overriding priority.   

It seems corporations are held back by two things: culture and integration. My book addresses how to change your culture, and I devote a chapter (Chapter 6) to explaining how to plan for and integrate data science/business analytics/applied statistics into the business. When I discuss this in workshops, I see that organizations have far to go.

As for Big Data, shortly after Y2K, the cost of collecting and storing data became more affordable, and many organizations collected large amounts of mostly ‘convenience data’ that they were denied in the past. This gradually took on the name ‘Big Data.’ Though there was never a plan for how to use this data, it appears that there is promise for better leveraging it.

Lastly, what impact do you think Artificial Intelligence, automation, and IoT will have on Data Science and the future in the next 10-15 years?  

I see AI as decision-making software. Chess computers are examples of early AI. Automation is the replacement of human work with machines or software. Automation has been evolving within factories for a very long time. The IoT is the networking of machines working cooperatively. This is the next phase in automation. The internet is, perhaps, the first example; now many different types of machines will be added to ‘the collective.’

All three have tremendous promise, both utopian and dystopian. On the utopian side, they can replace human toil, end hunger, etc. On the dystopian side, they might undermine equality, freedom, etc. Of the three, AI should have much greater ramifications. Combining AI with automation and IoT gives it legs. All three should help grow both the IT and Statistical sides of decision science initially. In the distant future, AI will replace much of IT Data Science and then Statistical Data Science. However, by that time AI will be set to replace just about everything else too.

I believe the current trajectory is for AI to be used to advance private interests ahead of the greater public ones, and to oppress and manipulate large populations. One hopeful approach toward greater equality is to insist that AI be developed and transparently managed in the public sector. That said, technology can beget serendipity. Triremes and muskets unexpectedly enabled greater equality. By educating the world population, AI could discourage selfishness and conflict.        

Automation is replacing jobs faster than new ones can be created. The pace of this is accelerating. Our current consumer-driven crony capitalism depends on people buying things, often things that they do not need and cannot afford. Past consumption was facilitated by loaning money to consumers. Possible solutions include Guaranteed Minimum Income and a massive jobs program, as in FDR’s New Deal. Alternatively, there will be massive unemployment; a great deal of social unrest with a possible violent revolution; and a deepening of the current economic depression.   

IoT is set to remove human intervention from many activities. E.g., a robot combine in a field in Iowa gets a flat tire and calls the local tractor supply to have a robot truck diagnose and repair the problem. The combine pays with its credit card. Initially there will be grave dangers. We will employ IoT as soon as it is profitable and before we can anticipate many downstream ramifications—just like every technology before it.

‘The major cause of problems are solutions’

Eric Sevareid