Tag Archives: guest post

Guest Post: Ryan Swanstrom on Stats and Data

I’ve already posted two arguments on Data Science and whether it’s worth going into (see here and here). Here, Ryan Swanstrom adds his two cents on the difference between what a statistician does and what a data scientist does.

Data Science is more than just Statistics

I occasionally get comments and emails similar to the following question:

Should I attend a graduate program in data science or statistics?

I believe there is some concern about the buzzword data science. People are unsure about getting a degree in a buzzword. I understand that. However, whether the term data science lasts or not, the techniques in data science are not going away.

Anyhow, this post is not intended to argue the merits of the term data science. This post is about the comparison of statistics to data science. They are not the same thing. The approach to problems is different from the very beginning.

Statistics

This is a common approach to a statistics problem. A problem is identified. Then a hypothesis is generated. In order to test that hypothesis, data needs to be collected via a very structured and well-defined experiment. The experiment is run, and the results either support the hypothesis or lead to its rejection.

Data Science

On the other hand, the data science approach is slightly different. All of this data has already been collected or is currently being collected, so the question becomes: what can be predicted from that data? How can existing data be used to help sell products, increase engagement, reach more people, and so on?

Conclusion

Overall, statistics is more concerned with how the data is collected and why the outcomes happen. Data science is less concerned with collecting data (because it usually already exists) and more concerned with what the outcome is. Data science wants to predict that outcome.

Thus, if you just want to do statistics, join a statistics graduate program. If you want to do data science, join a data science program.

Thoughts/Questions

What are your thoughts? Agree/Disagree?

____________________

Posted with permission from Ryan Swanstrom; the original post can be found on his blog, Data Science 101.

Guest Post: John Foreman giving hope for Data Scientists

John Foreman is a chief data scientist at MailChimp and has done a lot of analytics work for large companies. He argues that a skilled data scientist’s work will cost more than $30 per hour.

The $30/hr Data Scientist

Yesterday a journalist asked me to comment on Vincent Granville’s post about the $30/hr data scientist for hire on Elance. What started as a quick reply in an email spiraled a bit, so I figured I’d post the entire reply here to get your thoughts in the comments.

When we ask the question, “Can someone do what a data scientist does for $30/hr?” we first need to answer the question, “What does a data scientist do?” And there are a multitude of answers to that question.

If by data scientist, we mean “a person who can perform a data summary, aggregation or modeling task that has been well-defined for them in advance” then it is by no means a surprise that there are folks who can do this at a $30/hr price point. Indeed, there’ll probably come a day when that task can be completed for free by software without the freelancer. This is similar to the evolution of web development freelancing.

The key phrase, though, is “task that has been well-defined.”

The types of data scientists who command large salaries seem to fit one of two definitions, both very different from what a freelancer at $30/hr can meet:

1) There’s the highly-technical engineer. Someone who is knowledgeable and skilled enough to select the correct tools and infrastructure in the polluted big-data landscape to solve a specific, highly-technical data problem. Often these folks are working on problems that haven’t been solved before or, if they have, there are only a few poorly documented examples. Because these tasks might not even be solvable, they’re certainly not “well-defined.” A business wouldn’t trust important bits of infrastructure to a $30/hr freelancer.

2) There’s the data scientist as communicator/translator. This person is someone who knows data science techniques intimately but whose strength is actually in the nontechnical: this person thrives on taking an ambiguous business situation and distilling it into a data science solution. Often managers and executives don’t know what’s possible. They know what problems they have, but they don’t know how or even if data science can solve those problems. These folks can’t hire someone halfway across the globe at $30/hr to figure that out for them. No, they need someone who’s deeply technical but also deeply personable in the office to talk things through with them and guide them.

All of the hype around data science is generating a lot of these articles about automating or replacing the role. But I think it’s important to realize that just like “doctor,” “lawyer,” “consultant,” “developer,” etc., “data scientist” is more of a spectrum or category than a single role.

A data scientist is not someone putting doors on an automobile in a factory. Some of them might be doing just that, i.e. rote modeling tasks. But not all of them. I believe that MOOCs will excel at training up an army of these lower-paid data scientists. And that’s great. They’ll fill a need. Kinda like the need in the 90s for people with basic CompTIA certifications and the most basic of Cisco certs.

However, there will always be a place for those who excel at solving ambiguous technological & business problems. And they’ll cost more than $30/hr.

_______________________________________________

Posted with permission from John Foreman; the original post can be found here on his blog.