I’ve already posted two arguments on Data Science and whether it’s worth going into (see here and here). Here, Ryan Swanstrom adds his two cents on the difference between what a statistician does and what a data scientist does.
Data Science is more than just Statistics
I occasionally get comments and emails similar to the following question:
Should I attend a graduate program in data science or statistics?
I believe there is some concern about the buzzword data science. People are unsure about getting a degree in a buzzword. I understand that. However, whether the term data science lasts or not, the techniques in data science are not going away.
Anyhow, this post is not intended to argue the merits of the term data science. This post is about the comparison of statistics to data science. They are not the same thing. The approach to problems is different from the very beginning.
This is a common approach to a statistics problem. A problem is identified. Then a hypothesis is generated. In order to test that hypothesis, data needs to be collected via a very structured and well-defined experiment. The experiment is run and the hypothesis is validated or invalidated.
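To make that workflow concrete, here is a minimal sketch of the last step, testing the hypothesis once the experiment has been run. The measurements are made up, the `permutation_test` helper is a hypothetical name, and a permutation test is only one of many tests a statistician might choose.

```python
import random
from statistics import mean


def permutation_test(control, treatment, n_resamples=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference (treatment minus control) and an
    approximate p-value under the null hypothesis that group labels
    do not matter.
    """
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)
    pooled = list(control) + list(treatment)
    n_control = len(control)
    extreme = 0
    for _ in range(n_resamples):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Made-up measurements from a hypothetical designed experiment.
control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2]
treatment = [12.9, 13.1, 12.7, 13.0, 12.8, 13.2]

observed, p_value = permutation_test(control, treatment)
print(f"observed difference: {observed:.3f}, p-value: {p_value:.4f}")
```

The point of the sketch is the order of events: the question and the experiment come first, and the data exists only because the experiment was designed to answer that question.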
On the other hand, the data science approach is slightly different. All of this data has already been collected or is currently being collected. What can be predicted from that data? How can existing data be used to help sell products, increase engagement, reach more people, etc.?
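As a rough illustration of starting from data that already exists, the sketch below fits a straight line to some logged history and checks how well it predicts records it has not seen. The numbers are invented, the names `fit_line` and `history` are hypothetical, and a least-squares line simply stands in for whatever predictive model would actually be used.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Invented log of (ad spend, units sold) that was collected anyway,
# not as part of a designed experiment.
history = [
    (1, 2.9), (2, 5.2), (3, 6.8), (4, 9.1),
    (5, 11.2), (6, 12.8), (7, 15.1), (8, 16.9),
]

# Fit on the older records, then predict the most recent ones.
train, holdout = history[:6], history[6:]
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])

predictions = [slope * x + intercept for x, _ in holdout]
mae = mean(abs(p - y) for p, (_, y) in zip(predictions, holdout))
print(f"slope: {slope:.3f}, intercept: {intercept:.3f}, holdout MAE: {mae:.3f}")
```

Nothing here asks why spend and sales move together. The model is judged only on how close its predictions land on the held-out records.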
Overall, statistics is more concerned with how the data is collected and why the outcomes happen. Data science is less concerned with collecting data (because it usually already exists) and more concerned with what the outcome is. Data science wants to predict that outcome.
Thus, if you just want to do statistics, join a statistics graduate program. If you want to do data science, join a data science program.
What are your thoughts? Agree/Disagree?
Reposted with permission from Ryan Swanstrom. The original post can be found on his blog, Data Science 101.