Interview with BBC’s senior data architect: Jeremy Tarling

Last week I attended an interesting conference at UCL: ‘Taming the News Beast’, where experts discussed different ways technology and the digital age can help journalists deal with the influx of data and text. You can find the liveblog of the event here.

Jeremy Tarling, senior data architect at the BBC, was there to discuss Storyline. I interviewed him afterwards about this project, coding and trainee journalists.

Listen to the highlights here, or read the full interview below.

Q: So first, can you give me a short description of what Storyline is and where it is in its production or creation?

A: The Storyline Ontology was actually a collaborative piece of work that we, the BBC, took part in along with a newspaper, the Guardian; a wire service, the Press Association; and a search engine, Google, with ourselves as a broadcaster. The idea was that all four of those organisations represent the different types of organisations that would be interested in developing a data model for digital storytelling. The problem that the BBC was particularly interested in solving was this: as we move from a world of long-form articles online to something that’s more suitable for the mobile and tablet audience, and for the social-media-using audience who refer bits of stories to each other, we wanted to find a way to make sure the bits of stories that people were sharing were linked up through metadata, or put into context, perhaps I should say. The context would be the wider narrative arc: what came before, what came after, that kind of thing. And also to encourage journalists to make better use of short-form content rather than always producing a long-form article and then repeating most of the same things the day after.

Q: So it’s started already?

A: Yes, the project itself has been running now for about six months. During that time we’ve made some modifications to our content production system to allow semantic annotations with storylines.

Q: What does that mean?

A: So for example, we have a desktop content production system called CPS, which is kind of like Dreamweaver: a tool journalists at the BBC use to create an article for the website. We’ve added a new module to that tool that lets them search for current storylines or older ones. If they find any, they can attach them, or tag their content with that storyline. Likewise for topics, people, places and organisations. We’re doing the same with our video production tools as well. We have a video production system called Jupiter; it’s based loosely on Final Cut Pro. And again, it has the ability to search for current storylines and add them as metadata annotations to pieces of video.

Q: Correct me if I’m wrong, but you’ve written online that you’re not the best computer coder out there, that there might be mistakes in what you’re doing but it still works. How much coding do you know and how do you know it?

A: My background is actually not in pure computer science or even information science. My degree was in communications theory and I’ve always been a self-taught programmer. My hacking language of choice is currently Ruby but it’s varied over the years. And I’ve been fortunate, in that the jobs I’ve been employed in have allowed me to develop my skills as part of my employment. But for a long time I’ve been attracted to semantic technology, ever since the early days of RDF and metadata as a way of publishing data online and linking it up. I was fortunate to be working for one of the first sites in the UK to make use of that publishing technology. It was called National Curriculum Online, and it was funded by the department of education. It was an attempt to allow teachers to annotate teaching resources with sections of the national curriculum. Really, all I’ve done since is kind of repeat the same idea in different contexts.

Q: So it was basically learning on the job.

A: Yes, learning on the job, definitely.

Q: How important is coding for journalists, or data journalists?

A: I think it’s an emerging thing. There’s lots of talk in the industry about the importance of data journalism. And actually, if you look carefully, there isn’t a lot of good quality public data out there of the sort that a journalist may get their teeth into and create their story from. So I think it’s not just about skill with coding or skill with statistics. There’s also the process side of data journalism, which involves things like Freedom of Information requests, knowing how to get information out of public bodies, and making information available in queryable form. If you can do those things then you can make your nice graphs and APIs and let people discover stories. I must admit, I’m a bit of a data journalism sceptic at the moment, in the sense that, for all of the excitement about it as an emerging branch of journalism, the number of really good data journalism stories is still quite small. You know, you can look at the work the Guardian or Telegraph have done with WikiLeaks or Snowden… is that actually data journalism or is that just a really great story with a data element to it?

Q: So how would you describe data journalism if it’s not a really great story with a data element?

A: I think data journalism is the subject of some debate, and different people use the term in different ways. I’ve heard data journalism used to describe the process of going after data, something more akin to investigative journalism. That’s where the whole FOI request side of things comes into it. And then I think there is another school of thought that says data journalism is largely about statistics. It’s about a journalist being able to look at a large set of data and do some number crunching, maybe some kind of statistical regression, those sorts of things, to work out patterns in the data that can then be used as the basis for a story.

Q: What’s your favourite current tool that you’ve used or created with regards to data to manipulate it, to play around with it? Do you have a program or an app that you would recommend?

A: No, I’m a bit of a hacker really. I don’t mean that in a bad sense, but in the old-school sense of the word hacker. My tools are things like the UNIX command line tools and scripting. For a long time I worked with Perl, and now, more recently, with Ruby. But no, I don’t have any favourite nice, user-friendly apps.

Q: Any advice for young journalists entering the field?

A: I may be in a minority on this but my advice would be: don’t get too hung up on learning to be a programmer or developer. I mean, if you want to do that, great, that’s a career for sure, and it probably pays better than journalism, to be honest. But if you want to be a data journalist, the thing that drives it is good stories and an enquiring mind. And you can probably do quite well by pairing up with a statistician or a mathematician or someone who can do a little bit of coding for you, maybe graphing and those kinds of things, without necessarily investing a lot of time and money in training yourself up in coding courses, only to find that actually it’s good stories that sell. And good stories may be based on exciting data, but those exciting data sets are few and far between. And the big guns are probably going to be going for them as well. I think it’s a challenging thing. I would be cautious of advising trainee journalists to focus exclusively on technical skills in the interest of data journalism.





3 minutes with Jonathan Stray

Last week’s Polis journalism conference on transparency brought together leading men and women in the field. I managed to catch Jonathan Stray for a few minutes after his great talk with Lyra McKee and Paul Bradshaw on reducing the costs of investigations. Jonathan Stray, both journalist and computer scientist, is the founder of the Overview Project, which helps journalists find stories.

Hear more on the Overview Project, tips for starting journalists and what data can do here!

Taming the News Beast Liveblog!

The International Society for Knowledge Organisation (ISKO) is hosting Taming the News Beast at UCL.

Click here to see the liveblog of the event from 14:00 on the 1st April.

This event will discuss how to manage the information overload the digital world offers. Leading practitioners will discuss the intersection of news and technology, and new ways journalists can handle large quantities of text and data.

Speakers include Data Architect Jeremy Tarling from the BBC and Pete Sowerbutts from the Press Association.


13:00 ISKO UK AGM – for members – observers welcome
13:30 Registration for Taming the News Beast
14:00 Welcome and introduction
Stella Dextre Clarke, ISKO-UK Chair
Helen Lippell, Meeting Chair
14:10 BBC News Labs – the Newsroom of Things
Matt Shearer, BBC
14:35 Storyline ontology
Jeremy Tarling, BBC
15:00 Taming the ABC News Beast – a case study
Rob Corrao, LAC Group
15:30 Refreshments
16:10 Text analytics for news and social media
Ian Roberts, University of Sheffield
16:45 Embedding semantics into content
Pete Sowerbutts, Press Association
17:15 Fishbowl: question/answer session
Audience-led discussion of issues arising from the presentations
17:45 Close of meeting
17:45 Networking, wine and nibbles in the Ramsay Lecture Theatre

Happy 25th Birthday Internet

Today, the World Wide Web, the part of the internet most of us use every day, begins its 26th year since its conception by Tim Berners-Lee. I thought it was important to mention it since this is a data blog and there would be no such thing as a data scientist without the internet (or at least it would not mean what it means today).

How to… create a datawrapper graph

Datawrapper is a tool that helps you create interactive charts and maps while making you appear more data-savvy than the average Excel user. This example goes through the creation of a graph using the Datawrapper site and the data from my previous post on alcohol-related deaths.

  • First, obtain the data you wish to work with. For this example, I downloaded the Excel file of alcohol-related deaths from the Office for National Statistics.
  • Then, ‘clean’ the data you wish to use by deleting unnecessary columns or rows. For my third chart, I decided to create a new table with the total number of alcohol-related deaths for males, females and both for every year.
  • Copy the table of data you wish to visualise and paste it into the website after clicking the ‘try it now’ button. There is no need to sign up, although the option exists. Don’t edit the data even if it looks messy when pasted.
  • Once you upload and visualise, you can decide on the type of chart and the title.
  • Finally, you need to publish and embed. A URL and code will be generated so you can create a hyperlink from your blog or embed the chart in your website. WordPress, unfortunately, doesn’t allow embedding.
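
If you would rather do the cleaning step in code than by hand in Excel, here is a minimal Python sketch of the idea. It assumes the spreadsheet has already been reduced to simple (year, sex, deaths) rows; the row layout, the function names and the numbers are all made up for illustration and are not ONS figures. It builds the male/female/both table described above and prints it as CSV text you could copy and paste.

```python
import csv
import io

# Hypothetical rows in the shape a cleaned spreadsheet might have:
# (year, sex, deaths). These figures are invented placeholders, not ONS data.
RAW_ROWS = [
    (2010, "male", 100),
    (2010, "female", 50),
    (2011, "male", 90),
    (2011, "female", 45),
]


def build_table(rows):
    """Return one row per year: [year, male, female, both]."""
    totals = {}
    for year, sex, deaths in rows:
        per_year = totals.setdefault(year, {"male": 0, "female": 0})
        per_year[sex] += deaths
    return [
        [year, t["male"], t["female"], t["male"] + t["female"]]
        for year, t in sorted(totals.items())
    ]


def to_csv(table):
    """Serialise the table as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Year", "Male", "Female", "Both"])
    writer.writerows(table)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the cleaned table so it can be copied and pasted.
    print(to_csv(build_table(RAW_ROWS)))
```

Keeping the first row as column headers and the first column as the year is the same layout as the hand-made table, so the rest of the steps stay unchanged.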

Alcohol-related deaths finally decreasing?

Alcohol. A life-saving disinfectant, a solvent in perfumes, a possible fuel and, most of all, a very popular drink. However, half of teenagers in this country put themselves at risk through alcohol intoxication at least once a month, according to Professor David Nutt, a British psychiatrist and neuropsychopharmacologist specialising in the research of drugs. He says: “It is the commonest reason for death in men over the age of fifty.”

From 1992, alcohol-related deaths in the UK were increasing among both males and females. By 2004, the number of deaths had doubled compared to 1992. However, data from 2010 onwards suggest that, for the first time in two decades, alcohol-related deaths are decreasing in both males and females.

This data was taken from the Office for National Statistics.

Male deaths: click here to see the graph!

Female deaths: click here to see the graph!

Click here to see the graph

“Alcohol numerically is responsible for considerably more harm in the population, certainly health related harm in the population, than any other drug,” says Paul Wallace, professor of primary health care at University College London. If this downward trend continues, it could save many people from liver cirrhosis and death. It could also help Britain shed its reputation as one of the EU countries with higher than average rates of alcohol consumption.

However, the trend shown by the ONS data does not necessarily mean that government regulations are succeeding. “We don’t have any rational policy on drugs in this country,” says Prof Nutt, who believes major action needs to be taken to improve health and life expectancy in the UK.





Happy International Petroleum Week – celebrate it while you can!

The week celebrating a substance that makes many aspects of our lives easier, while we wait for 100% renewable energy, has ended. Celebrating it while we can sounds reasonable considering it’s a dying resource. Here’s how the ten countries with the largest proven crude oil reserves have been doing over the last ten years. The data comes from the US Energy Information Administration.

Click here to see the full interactive chart!

Or here to see it in map form!

In the last ten years, some countries, such as Saudi Arabia, the United Arab Emirates and Canada, retained roughly the same amount of proven oil reserves. Others, such as Mexico, saw their reserves decrease, while some, like Venezuela, jumped up the table. Venezuela’s proven oil reserves increased by over 280%.
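
For anyone checking a figure like that against the raw numbers, percentage change is simple arithmetic. Below is a small Python sketch; the function name and the example values are placeholders I made up, not the EIA figures. Note that an increase of 280% means the final value is 3.8 times the starting value, not 2.8 times.

```python
def percent_change(old, new):
    """Percentage change from old to new, e.g. 100 -> 200 is +100%."""
    if old == 0:
        raise ValueError("percentage change is undefined when the starting value is zero")
    return (new - old) * 100 / old


if __name__ == "__main__":
    # Placeholder values, not real reserve figures.
    start, end = 100, 380
    print(f"{percent_change(start, end):.0f}% change, {end / start:.1f}x the starting value")
```

The same function covers a fall: going from a larger value to a smaller one returns a negative percentage.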

But a report on oil depletion highlights that there is not enough fossil energy on this planet to ensure the economic prosperity of 9.4 billion people. If you are still alive in 2050, the effects of oil scarcity will occur in your lifetime.