The tools we use to run our data journalism business

Alastair Otter
Published in Media Hack
Jul 14, 2022


Running a data journalism operation requires a lot more than a charting tool and Google Drive.

We’ve been doing data journalism for around eight years, and for the first six of those we were two freelance journalists. In March 2020 we formally set up shop as Media Hack Collective. Today there are six of us on the team: we publish The Outlier and do data and visualisation work for clients.

Over the past two years, we’ve mostly settled on a set of tools for our daily work. It’s far from perfect and there are still many gaps. Some decisions on tools have been very deliberate, while others were driven by circumstances outside our control.

As two freelancers it was easy to switch between tools. We often did that when we wanted to learn something new or heard about a possibly better way of doing something. And it was hugely valuable as a way to learn what worked best. But as we grew as a team it became more important to settle on a specific set of tools. Even if some weren’t perfect they at least had the advantage of being common across the team, which made them easier to manage.

Much more than data in data journalism

A lot of data journalism is about collaboration. Some people are better at data collection, some are good at visualisation, and others at analysis or writing. Data journalism can also be complex; there is often more than one way to do something and the tendency is to jump onto the latest shiny tool. We’re learning the value of selecting, and sticking to, a short list of tools.

Our toolkit

Our preferred tools cover four core areas of our data journalism work:

  • Collaborating
  • Collecting, cleaning and analysing data
  • Visualising data
  • Publishing data stories

This is an opinionated list of tools and not a comprehensive one.

Collaboration

Notion

https://www.notion.so

We switched to Notion at the start of 2022. We previously used a combination of a project management tool and Google Docs to manage our day-to-day projects. We work across so many different projects and stories that it became difficult to keep track of everything, to the point that we barely used the project management tool. And Google Drive’s file management (or lack of it) meant we had hundreds of documents but no obvious way to find anything.

Notion’s primary advantage is that it bridges the divide between a project management tool and a document management tool. It’s easy to create documents directly in tasks, so we no longer have the things we need to work on listed in one place and the work itself in another.

Notion is not a drop-in solution, and it takes work to set up a functioning system, but the advantage is that it is highly customisable, so we will eventually have something that really works for us.

Slack

https://slack.com/

We would probably stop using Slack if we could, but so many of the organisations we work with use it that it’s hard to avoid. And it’s pretty good for inter-organisation collaboration.

Discord

https://discord.com/

Discord would probably be our replacement if we decided to drop Slack. We currently have a Discord bot that sends out reminders and fun quotes daily, as well as letting us know when someone subscribes to our newsletters. Bots are pretty easy to make for Discord and are something I’d like to explore more.
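How our bot is built isn’t covered here, but the simplest way to post a daily message to a Discord channel is an incoming webhook: a POST request whose JSON body has a `content` field. The Python below is a minimal sketch under that assumption, using only the standard library. The webhook URL, the sample quotes and the function names are hypothetical; a bot that responds to commands would use Discord’s bot API instead.

```python
import json
import random
import urllib.request

# Hypothetical webhook URL (created under a channel's Integrations settings).
WEBHOOK_URL = "https://discord.com/api/webhooks/<id>/<token>"

# Hypothetical sample quotes; the real bot's content isn't described in this post.
QUOTES = [
    "Check the source before you chart it.",
    "Every chart needs a source line.",
]


def build_payload(reminder: str, quote: str) -> bytes:
    """Encode the JSON body a Discord webhook accepts: {"content": "..."}."""
    message = f"Reminder: {reminder}\nQuote of the day: {quote}"
    return json.dumps({"content": message}).encode("utf-8")


def daily_payload(reminder: str) -> bytes:
    """Pair today's reminder with a randomly chosen quote."""
    return build_payload(reminder, random.choice(QUOTES))


def post_to_discord(url: str, payload: bytes) -> int:
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            # A descriptive User-Agent; some setups reject urllib's default one.
            "User-Agent": "outlier-reminders/0.1",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status  # 204 No Content when Discord accepts the message
```

Scheduled once a day (with cron, for example), a call such as `post_to_discord(WEBHOOK_URL, daily_payload("Newsletter goes out at 10:00"))` would post the reminder. Keeping payload building separate from the network call makes the message format easy to test without hitting Discord.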

Google Drive & Google Workspace

https://drive.google.com/

We used to use Google Drive for almost everything, but now we use it mostly for Google Sheets. We do all other writing and document sharing in Notion. Like many people, we use Google Workspace for email and email groups.

Data collection, cleaning, storage & analysis

Google Sheets

https://sheets.google.com

Google Sheets is our most important, core tool. Just about everything we do is based on a Google Sheet at some point. Sheets is perfect for collaboration and sharing, and has more than enough features for the work we do. Occasionally we dip into Microsoft Excel for something, but anything we’re working on regularly will eventually end up in Google Sheets.

From a technical point of view, the ability to publish individual sheets from Google Sheets to the web as CSV or JSON files is hugely important to us. Most of the data for our projects is collected in Google Sheets and published as a CSV, which we then pull automatically into a SQL database for use in a range of other projects. Our Covid Tracker, for example, is automated and based on a set of Google Sheets. Our Coronavirus Dashboard ran for more than two years and was based on data maintained in Google Sheets for that entire time.
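As a rough sketch of that pipeline, the Python below downloads a sheet published as CSV and loads it into a database table. It is an illustration, not the code behind our Covid Tracker: it assumes SQLite (from the standard library) as a stand-in for MySQL so it runs anywhere, and the published URL, `<SHEET_ID>`, and the `fetch_csv`/`load_csv` names are hypothetical placeholders.

```python
import csv
import io
import sqlite3
import urllib.request

# Placeholder "Publish to the web" link: <SHEET_ID> and gid=0 are hypothetical.
CSV_URL = (
    "https://docs.google.com/spreadsheets/d/e/<SHEET_ID>/pub"
    "?gid=0&single=true&output=csv"
)


def fetch_csv(url: str) -> str:
    """Download a published sheet and return its CSV text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def load_csv(conn: sqlite3.Connection, table: str, text: str) -> int:
    """Replace `table` with the CSV rows in `text` and return the row count.

    The first CSV row supplies the column names; this assumes a trusted
    header with no double quotes in the names.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        raise ValueError("CSV is empty")
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    # Drop and recreate so the table always mirrors the sheet exactly.
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    conn.commit()
    return len(data)
```

Against a real sheet, `load_csv(sqlite3.connect("data.db"), "covid", fetch_csv(CSV_URL))` would refresh the table on each scheduled run. Rebuilding the table suits data that is maintained in the sheet rather than in the database. With MySQL, the same approach works through a driver such as mysql-connector-python, which uses `%s` placeholders instead of `?`.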

R

https://www.r-project.org/

While most of our data work is done in Google Sheets, there inevitably comes a time when we need to do more, and then our tool of choice is R. It’s particularly useful for work we expect to reproduce, or work that goes beyond the scope of a pivot table in Google Sheets. It’s also really useful for formatting larger datasets.

SQL & PHP

https://www.mysql.com/ & https://www.php.net/

SQL databases, particularly MySQL, are central to our work. We import a lot of the data we have in Google Sheets into MySQL, either to publish to the web or for further analysis. When we’re building a data visualisation, we mostly use PHP to query the databases for just the data we need, which means we can keep the final datasets very small even if the original dataset is very large.
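To illustrate how a query keeps the final dataset small, here is a Python sketch of the kind of request a PHP script might make: it aggregates a large raw table down to one row per week and returns JSON for a chart. SQLite stands in for MySQL, and the `cases` table with its `province`, `week` and `cases` columns is hypothetical.

```python
import json
import sqlite3


def weekly_totals_json(conn: sqlite3.Connection, province: str) -> str:
    """Aggregate raw rows into one total per week for a single province.

    However many raw rows match, the output has at most one entry per week,
    which keeps the file the chart has to load small.
    """
    rows = conn.execute(
        """
        SELECT week, SUM(cases) AS total
        FROM cases
        WHERE province = ?          -- parameterised, never string-formatted
        GROUP BY week
        ORDER BY week
        """,
        (province,),
    ).fetchall()
    return json.dumps([{"week": week, "total": total} for week, total in rows])
```

In PHP, the equivalent would typically be a PDO prepared statement with the same `GROUP BY`, with the result echoed as JSON for the chart to fetch.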

OpenRefine

https://openrefine.org/

In our experience of teaching data journalism, most people don’t get OpenRefine when we first talk about it. And honestly, most people may never understand it. But once they do, OpenRefine is one of the best tools for cleaning large and particularly messy datasets. It takes time to feel comfortable with OpenRefine, but it’s worth the effort. See: 5 reasons to switch to Open Refine (and never look back).
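Part of what makes OpenRefine worth the effort is its clustering feature, which groups different spellings of the same value. Its default key-collision method, “fingerprint”, normalises each value to a key and groups values that share a key. The Python below is a simplified approximation of that idea for illustration, not OpenRefine’s own code; `fingerprint` and `cluster` are hypothetical names.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalised key in the spirit of OpenRefine's fingerprint keyer."""
    s = value.strip().lower()
    # Decompose accented characters and drop the combining marks ("é" -> "e").
    s = unicodedata.normalize("NFKD", s)
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    # Remove punctuation and control characters, keeping word characters and whitespace.
    s = re.sub(r"[^\w\s]", "", s)
    # Sort and de-duplicate tokens so word order doesn't matter.
    return " ".join(sorted(set(s.split())))


def cluster(values: list[str]) -> list[list[str]]:
    """Group values whose fingerprints collide; keep only groups with variants."""
    groups: dict[str, list[str]] = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]
```

For example, “City of Johannesburg” and “Johannesburg, City of” share the key `city johannesburg of`, so they land in the same cluster, while “Cape Town” stays on its own. In OpenRefine you would then pick one spelling and merge the cluster.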

Data visualisation

Flourish

https://flourish.studio/

Most data visualisation lists will include Flourish and Datawrapper. Both are excellent tools, and if we were just looking for a tool to create visualisations we would probably pick Datawrapper. But we use Flourish. Its charts have many more options than most Datawrapper charts, and it is perfect for creating draft versions of the charts we want to make. We mostly use Flourish to develop the initial chart and then export it as an SVG, which we refine in Figma for final publication.

Figma

https://figma.com/

We finish almost all of our charts in Figma. Figma is easy to learn and makes it easy to collaborate. Illustrator would be the obvious alternative to Figma and there are times we would consider switching to Illustrator, but in most cases, Figma is more than adequate for our purposes. New team members can get up to speed pretty quickly in Figma.

JavaScript, Svelte & D3

https://svelte.dev/ & https://d3js.org/

This is not strictly for visualisation alone, but when we want to develop more complex, interactive visualisations we use JavaScript, especially Svelte and D3. We’ve built many visualisations in the past using D3.js, but they were always a lot of work and difficult to manage. Svelte has made everything a lot simpler: we now mostly use Svelte to handle the layout and interaction for charts, while D3 handles the complex parts like calculating scales and paths.

Publishing

WordPress

https://wordpress.org/

WordPress is still the easiest and best way to publish content. There are many new content management systems available but WordPress is pretty much the standard if you just want to publish stories. We self-host WordPress for most of the sites we run. The biggest limitation for most data journalists is that it isn’t ideal if you want a full-screen, “scrollytelling” style story, or anything that can’t easily be embedded in a WordPress blog. In those instances we use Vercel.

GitHub & Vercel

https://github.com & https://vercel.com/

We increasingly publish our data visualisations using GitHub and Vercel, particularly when we’re building complex visualisations or ones that several people work on. We usually build these using JavaScript, D3 and Svelte. Multiple developers can work in the same GitHub repository, and new versions of the code are deployed straight to a Vercel site.

TouchBase Pro

https://www.touchbasepro.com/

We publish a number of email newsletters, including our main Outlier newsletter. We currently use an email service called TouchBase Pro. TouchBase is developed in South Africa, which means its pricing is local and we have easy access to the team supporting it. These advantages are very specific to our circumstances, so they may not be all that useful to other publishers. We previously used Mailchimp for newsletters, but it became costly as our lists grew beyond the entry-level packages.

Financial

PayFast

https://www.payfast.co.za/

PayFast is a South African payment processing service. We’ve used it for collecting donations for a crowdfunding project and for collecting credit card payments for training courses. It’s not an all-in-one service, so it does need a little development work to integrate into your site(s), but there are some good WordPress plugins that do most of the work. For our crowdfunding site, we used the Charitable plugin for WordPress, which handled most of the set-up.

If you enjoyed this article, you can find other tips on how to visualise data here.

Media Hack Collective’s The Outlier publishes a data journalism newsletter every two weeks. Read the latest issue and subscribe here.


Media Hack Collective co-founder, editor at The Outlier, and data visualisation specialist. Data journalism newsletter: newsletter.theoutlier.co.za