
Guest Post: Miko Matsumura on why Data Science is DEAD

This guest post offers the opposite opinion to John Foreman’s (see previous post). Miko Matsumura is a VP at Hazelcast who does not have a high opinion of data scientists in general. He writes: “You [data scientist] will be replaced by a placid and friendly automaton.”

So what do you think? Are Data Scientists turning into poor substitutes for future software? Or will they constantly remain ahead of it in their ability to cohesively combine data, science, business and a host of other fields?

Data Science is Dead


Fun fact: nothing on this blackboard makes any sense.


Science creates knowledge via controlled experiments, so a data query isn’t an experiment. An experiment suggests controlled conditions; data scientists stare at data that someone else collected, which includes any and all sample biases.

Now, before you drag out the pitchforks: I’m not a query hater. You won’t see me standing outside the Oracle Open World conference with a sign that says “NO SQL” on it. Queries are fine. Smart people don’t always have the right answer, but they need to ask the right questions. Yes, building a query is like “forming a hypothesis,” but at that point we enter the realm of observational or “soft” science. Yes, by this standard, Astronomy and Social Sciences are also not sciences. I have no idea what Computer Science is, but no, it’s not a science either.

Oh, what’s that? Your kind of “Data Science” includes things such as A/B testing, and your “experiments” actually involve executing designs that affect the world? Allow me to retort: that’s not Data Science, that’s actually doing a job. You might have a job title like Product Management or Marketing. But if your job title is “Data Scientist,” you are effectively removing yourself from the actual creation of data.

I do sympathize. I appreciate that it’s no longer sexy to be a Database Administrator, and I guess the term “Business Analyst” is a bit too 1980’s. Slapping “Data Warehousing” on a resume is probably not going to land you a job, and it’s way down there with “Systems Analyst” on the cool-factor scale. If you’re going to make up a cool-sounding job title for yourself, “Data Scientist” seems to fit the bill. You can go buy a lab coat from a medical-supply surplus store and maybe some thick glasses from a costume shop. And it works! When you put “Data Scientist” on your LinkedIn profile, recruiters perk up, don’t they? Go to the Strata conference and look on the jobs board—every company wants to hire Data Scientists.

OK, so we want to be “Data Scientists” when we grow up, right? Wrong. Not only is Data Science not a science, it’s not even a good job prospect. In the immortal words of Admiral Ackbar: “It’s a trap.”

These companies expect data scientists to (from a real job posting): “develop and investigate hypotheses, structure experiments, and build mathematical models to identify… optimization points.” Those scientists will help build “a unique technology platform dedicated to… operation and real-time optimization.”

Well, that sounds like a reasonable—albeit buzzword-filled—job description, no? There is going to be a ton of data in the future, certainly. And interpreting that data will determine the fate of many a business empire. And those empires will need people who can formulate key questions, in order to help surface the insights needed to manage the daily chaos. Unfortunately, the winners who will be doing this kind of work will have job titles like CEO or CMO or Founder, not “Data Scientist.” Mark my words, after the “Big Data” buzz cools a bit it will be clear to everyone that “Data Science” is dead and the job function of “Data Scientist” will have jumped the shark.

Yes, more and more companies are hoarding every single piece of data that flows through their infrastructure. As Google Chairman Eric Schmidt has pointed out, every two days we now create as much information as was created from the dawn of civilization up until 2003.

Unfortunately, unless this is structured data, you will be subjected to the data equivalent of dumpster diving. Surfacing insight from a rotting pile of enterprise data is a ghastly process—at best. Sure, you might find the data equivalent of a flat-screen television, but you’ll need to clean off the rotting banana peels. If you’re lucky you can take it home, and oh man, it works! Despite that unappetizing prospect, companies continue to burn millions of dollars to collect and gamely pick through the data under their respective roofs. What’s the time-to-value of the average “Big Data” project? How about “Never”?

If the data does happen to be structured data, you will probably be given a job title like Database Administrator, or Data Warehouse Analyst.

When it comes to sorting data, true salvation may lie in automation and other next-generation processes, such as machine learning and evolutionary algorithms. Converging transactional and analytic systems also looks promising, because those methods deliver real-time analytic insight while it’s still actionable (the longer data sits in your store, the less interesting it becomes). These systems will require a lot of new architecture, but they will eventually produce actionable results—you can’t say the same of “data dumpster diving.” That doesn’t give “Data Scientists” a lot of job security: like workers in many industries, you will be replaced by a placid and friendly automaton.

So go ahead: put “Data Scientist” on your resume. It may get you additional calls from recruiters, and maybe even a spiffy new job, where you’ll be the King or Queen of a rotting whale-carcass of data. And when you talk to Master Data Management and Data Integration vendors about ways to, er, dispose of that corpse, you’ll realize that the “Big Data” vendors have filled your executives’ heads with sky-high expectations (and filled their inboxes with invoices worth significant amounts of money). Don’t be the data scientist tasked with the crime-scene cleanup of most companies’ “Big Data”—be the developer, programmer, or entrepreneur who can think, code, and create the future.


Reposted with permission from Miko Matsumura. The original post can be found on Dice.

Guest Post: John Foreman giving hope for Data Scientists

John Foreman is chief data scientist at MailChimp and has done a lot of analytic work for large companies. He argues that a skilled data scientist’s work will cost more than $30 per hour.


The $30/hr Data Scientist

Yesterday a journalist asked me to comment on Vincent Granville’s post about the $30/hr data scientist for hire on Elance. What started as a quick reply in an email spiraled a bit, so I figured I’d post the entire reply here to get your thoughts in the comments.

When we ask the question, “Can someone do what a data scientist does for $30/hr?” we first need to answer the question, “What does a data scientist do?” And there are a multitude of answers to that question.


If by data scientist we mean “a person who can perform a data summary, aggregation or modeling task that has been well-defined for them in advance,” then it is by no means a surprise that there are folks who can do this at a $30/hr price point. Indeed, there’ll probably come a day when that task can be completed for free by software without the freelancer. This is similar to the evolution of web development freelancing.

The key phrase though is “task that has been well-defined.”

The types of data scientists who command large salaries seem to fit two definitions that are very different from the one a $30/hr freelancer can meet:

1) There’s the highly-technical engineer. Someone who is knowledgeable and skilled enough to select the correct tools and infrastructure in the polluted big-data landscape to solve a specific, highly-technical data problem. Often these folks are working on problems that haven’t been solved before or, if they have, there are only a few poorly documented examples. Because these tasks might not even be solvable, they’re certainly not “well-defined.” A business wouldn’t trust important bits of infrastructure to a $30/hr freelancer.

2) There’s the data scientist as communicator/translator. This person is someone who knows data science techniques intimately but whose strength is actually in the nontechnical — this person thrives on taking an ambiguous business situation and distilling it into a data science solution. Often managers and executives don’t know what’s possible. They know what problems they have, but they don’t know how or even if data science can solve those problems. These folks can’t hire someone halfway across the globe at $30/hr to figure that out for them. No, they need someone who’s deeply technical but also deeply personable in the office to talk things through with them and guide them.

All of the hype around data science is generating a lot of these articles about automating or replacing the role. But I think it’s important to realize that just like “doctor,” “lawyer,” “consultant,” “developer,” etc., the “data scientist” is more of a spectrum or category than a single role.

A data scientist is not someone putting doors on an automobile in a factory. Some of them might be doing just that, i.e. rote modeling tasks. But not all of them. I believe that MOOCs will excel at training up an army of these lower-paid data scientists. And that’s great. They’ll fill a need. Kinda like the need in the 90s for people with basic CompTIA certifications and the most basic of Cisco certs.

However, there will always be a place for those who excel at solving ambiguous technological & business problems. And they’ll cost more than $30/hr.


Reposted with permission from John Foreman. The original post can be found on his blog.