Journalism & Dataviz: the Whos, Whats, Whys and Hows.

Published in Batjo, Oct 15, 2018

A brief report on the use of dataviz in 2017–2018 award-winning data journalism

Looking at award-winning data journalism: what charts are journalists using? What data patterns are the most visualized?

With the project Batjo: Bits, Atoms and Journalism we want to assist newsrooms in producing physical data installations and provide them with a few templates and workflows that can be adapted to produce custom data installations. Before designing these data installation prototypes, we needed to understand what graphical forms and metaphors journalists use in their work, and what their most common communication goals are. This short post is an overview of what we found by looking at 300+ data visualizations (see the bottom of the post for sample and methodology).

We decided to focus on two awards: the 2017 Data Journalism Awards (DJA) by the Global Editors Network and the 2018 Malofiej Infographics Awards by the Society for News Design. For the DJA, we included both winning and shortlisted projects. The goal was to understand the current state of award-winning data journalism, especially with regard to the following questions:

What topics/beats are most covered by award-winning data journalism? This will help us better understand the target field and find appropriate visual metaphors when designing possible news installations.
What subjects are portrayed in the individual data visualizations? That is, what are the most common data units represented: people, cities, money? If it's people, how are they characterized: as politicians, refugees, victims? Like the previous point, this information will guide us in choosing appropriate visual metaphors and data embodiment.
What chart types are being used, and for what functions? This will shed light on which data relationships and patterns are most frequently highlighted and communicated through journalistic data visualizations.
What level of interactivity is offered? This will help us understand what types of interaction journalists embed in their data visualizations: clicks, hovers, filters, zooms? Such information will guide us in devising physical gestures that are best suited to offer this level of interaction in the physical world.

Topics and subtopics

**TABLE 1**: Topics of analyzed award winning data journalism at DJA 2017 and Malofiej 2018 (includes only the 84 entries fit for analysis)

Possibly also because of Trump and Brexit, politics is the dominant topic of data journalism projects at DJA 2017 and Malofiej 2018 (21 entries out of 84). The largest share of the projects covering this theme, 9 of the 21 (43%), focuses on politicians: the emotions of presidential address speeches, fact-checking of their statements, their assets, and so on. If we distinguish between the two awards, though, we notice that this dominance is mostly due to DJA 2017. At these awards, 39% of the entries cover politics (13 out of 33), while the share for Malofiej is only 16% (8 out of 51).

Among the winners of the Malofiej 2018 Infographics Award, on the other hand, the most represented topic is science & nature (14 out of 51 entries, or 27%), with the most extensive coverage within this category dedicated to natural disasters (hurricanes, fires and floods).
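As a quick check of the shares quoted above, here is a minimal sketch that recomputes them from the counts given in the text. The dictionary layout and the `share` helper are assumptions made for illustration and are not part of the original analysis.

```python
# Counts as reported in the text: (entries on the topic, entries analyzed per award).
# The data layout and helper name are illustrative assumptions.
politics = {"DJA 2017": (13, 33), "Malofiej 2018": (8, 51)}
science_nature_malofiej = (14, 51)


def share(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


for award, (n_politics, n_entries) in politics.items():
    print(f"{award}: {share(n_politics, n_entries)}% of entries cover politics")

# Totals across both awards should match the 21 out of 84 quoted in the text.
total_politics = sum(p for p, _ in politics.values())
total_entries = sum(n for _, n in politics.values())
print(f"Both awards: {total_politics} of {total_entries} entries cover politics")

print(f"Malofiej 2018: {share(*science_nature_malofiej)}% cover science & nature")
```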

**FIGURE 1**: Topics and subtopics of analyzed award winning data journalism at DJA 2017 and Malofiej 2018 (includes only the 84 entries fit for analysis)

Dataviz subjects

More than a third of all the data visualizations surveyed were about people: 114 out of 302 (38%), to be exact. Somewhat mirroring the patterns found in the most common story topics, people were most often characterized as victims (51), refugees/migrants (15), or politicians (13).

Journalistic chart types

**TABLE 4**: Chart families of visualizations in analyzed award winning data journalism at DJA 2017 and Malofiej 2018

Journalists mostly used bar/column charts and maps for their data visualizations. Of the 302 data visualizations, 88 could be classified as belonging to the bar/column chart family and 69 to the map family. After these, the most common visualizations were line charts and clustered graphs (here defined as charts where each single data point is shown as a data unit in the chart, as in packed circles, grid plots, etc.).

Interestingly, if we look only at the main chart of each project, maps become the most common family, with bar/column charts dropping to second position and clustered graphs jumping to third.

When looking at specific chart types, the most common one is the dot (density) map: a map where dots are used to communicate the spatial distribution of data — this includes both maps with dots on precise locations and maps with dots distributed randomly in a specific area to show density of the phenomenon within those specific boundaries (a.k.a. dot density maps).

**FIGURE 3**: Chart types and chart families in the 302 analyzed data visualizations in award winning data journalism at DJA 2017 and Malofiej 2018

Functions of journalistic charts

In order to design effective physical data representations, the main task is to understand which types of data patterns journalists most often want to communicate. The most frequent ones in our sample are: spatial (74), part-to-whole/categorical comparison (70) and change over time (64). Together, these three account for about two-thirds of all the visualizations analyzed. The same ranking and proportions also hold when looking only at the main chart of every project.
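The "about two-thirds" figure can be reproduced from the three counts above and the total of 302 visualizations. The snippet below is only a sketch of that arithmetic; the variable names are assumptions.

```python
TOTAL_VISUALIZATIONS = 302  # size of the analyzed sample, as stated in the text

# Counts of the three most frequent communication functions, as reported above.
function_counts = {
    "spatial": 74,
    "part-to-whole/categorical comparison": 70,
    "change over time": 64,
}

# Print each function's individual share of the full sample.
for name, count in function_counts.items():
    print(f"{name}: {count} ({count / TOTAL_VISUALIZATIONS:.1%})")

# Combined share of the three functions.
top_three = sum(function_counts.values())
top_three_share = top_three / TOTAL_VISUALIZATIONS
print(f"Top three combined: {top_three} of {TOTAL_VISUALIZATIONS} ({top_three_share:.1%})")
```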

**FIGURE 4**: Chart types and functions in the 302 analyzed data visualizations in award winning data journalism at DJA 2017 and Malofiej 2018

Interactivity

Surprisingly, half of the charts in the sample had no interactivity. The proportion gets a bit lower when only looking at the main charts of each entry, but even in this subset the ratio of static charts is still high (44%).

The most common types of interaction were clicks and hovers on data points to show more details about the selected point, scrolling animations that changed the chart and/or the data shown, and filters.

What could journalistic data installations look like? Stay in touch and join the conversation.

Methodological notes

About the sample

For the analysis, we decided to focus on award-winning journalism. This means excluding the majority of journalistic work, but there is a reason for this selection. The first physical news installations will inevitably be more time consuming and resource-intensive than the regular daily journalistic news item. Given the investment needed to learn, adapt and master a new technology, it is likely that newsrooms will produce these news installations for extraordinary reporting, not ordinary daily stories. On this ground, analyzing award-winning journalism means that the sample is already filtered to include such extraordinary work.

Additionally, the selective sample contains the avant-garde of the field, the most innovative reporting that sets the example for others to follow. This is exactly the kind of work produced by the newsrooms that will probably be most open to embracing innovation and experimenting with new technology.

We decided to focus on two awards: the 2017 Data Journalism Awards (DJA) by the Global Editors Network and the 2018 Malofiej Infographics Awards by the Society for News Design. For the DJA, we included both winning and shortlisted projects. The choice of these two awards is based on the fact that they complement each other. The Data Journalism Awards feature journalism projects that are necessarily based on data, but might not contain a data visualization. On the other hand, all Malofiej winners have a prominent, sophisticated information visualization component, although some might not be strictly data visualizations. Since the analysis focuses specifically on data visualizations, some entries from both awards are not included in the analyzed sample. Here is a list of reasons why a winning or shortlisted project was not considered for analysis:

The entry is not a single article but an entire news organization or one of its departments; a whole website or a particular section of the website; or a portfolio.
The entry doesn’t contain any data visualization. It could be that there is no visual display of data at all, or that the visual component is more of an information visualization. A clear line between the two is very difficult to draw and might even be subjective, but here is an overview of the main categories of things that have been excluded from analysis: maps showing anecdotal information (Mr. X was at point Y when this happened); raw data dumps with little or no processing (a table of tweets; a list of fact-checked statements); illustrations explaining a phenomenon. These instances have been excluded not because they are deemed less powerful than other examples of data visualization, but because we felt that they are outside the scope of what we want to get insights on.
The entry is not online. As we are focusing on digital journalism, we excluded entries published only in print. (Note: one project was online but not accessible.)
A single article/project might include dozens of data visualizations. Analyzing each visualization in every article would not only be very time consuming, it would also skew the results of the analysis in favor of pieces with many data visualizations. We therefore decided to analyze up to five data visualizations per piece. We looked at the first five, as the visualizations at the top are also the ones that the journalist likely deemed most relevant.
Within each article, we also had to make choices about what is part of the same visualization and what is not. Sometimes this was straightforward, other times it was open to interpretation. In the latter case, here are the criteria that guided our choice: If a visualization is animated through scrolling/clicking, all its variations are counted as part of the original visualization. But if the form of the visualization changes (for example from a bar chart to a map), then each mutation of the chart type is counted as a new visualization. If a chart is repeated using the technique of small multiples, all the small charts are counted as a single chart. If a chart is merely a zoomed-in version of a previous chart, it is ignored and counted as part of the previous chart.
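The counting criteria above were applied by hand, but they can be sketched as a small procedure. The following is one possible reading of those rules, not the tooling used for the analysis: the `Frame` record, its fields and the `count_visualizations` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

MAX_PER_PIECE = 5  # cap stated in the text: up to five visualizations per piece


@dataclass(frozen=True)
class Frame:
    """One visual state met while reading a piece top to bottom (hypothetical record)."""
    group: str             # animation steps or small multiples of one chart share a group
    form: str              # chart form shown in this state, e.g. "bar" or "map"
    is_zoom: bool = False  # True if this is merely a zoomed-in view of the previous chart


def count_visualizations(frames):
    """Collapse frames into counted visualizations and keep only the first five."""
    counted = []
    for frame in frames:
        if frame.is_zoom:
            continue  # a zoomed-in view is counted as part of the previous chart
        key = (frame.group, frame.form)
        if counted and counted[-1] == key:
            continue  # another animation step or small multiple of the same chart
        counted.append(key)  # a new chart, or a change of form within an animation
    return counted[:MAX_PER_PIECE]


# Placeholder piece, invented for illustration only.
frames = [
    Frame("opening", "bar"), Frame("opening", "bar"),  # scroll steps of one bar chart
    Frame("opening", "map"),                           # form changes: counted as new
    Frame("multiples", "line"), Frame("multiples", "line"), Frame("multiples", "line"),
    Frame("multiples", "line", is_zoom=True),          # zoom: ignored
    Frame("fourth", "scatter"), Frame("fifth", "bar"), Frame("sixth", "map"),
]
result = count_visualizations(frames)
print(len(result), result)
```

In this placeholder piece, six distinct visualizations are found, and the cap keeps the first five.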

Given these considerations, here is a table with an overview of the size and composition of our data sample.

**TABLE 1**: dataset size and composition

Coding the visualizations

In preparing the dataset, we first gathered the main information about each shortlisted entry (for DJA) or winning entry (for Malofiej). We included both data scraped from the official websites of the awards (such as publisher and country) and data that was filled in manually (such as the classification of the story’s main topic and subtopic).

For each entry, we then coded variables specific to each of the data visualizations contained in the story (up to 5). This data was aimed at gathering insights about the type of charts used, the perceived main communication function, and the subject/data unit of each data visualization. The following tables outline all the variables for which we collected data. More detailed information about each variable and the values used for coding the data is in Appendix A.
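To make the two-level structure concrete (one record per entry, up to five coded visualizations per entry), here is a minimal sketch. The field names are assumptions based on the variables mentioned in this post, not the actual codebook, and the example values are placeholders rather than rows from the dataset.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Visualization:
    """Variables coded per visualization (hypothetical field names)."""
    chart_family: str
    chart_type: str
    function: str
    subject: str
    interactivity: str
    is_main: bool = False  # marks the main chart of the entry


@dataclass
class Entry:
    """Variables gathered per award entry (hypothetical field names)."""
    award: str
    publisher: str
    country: str
    topic: str
    subtopic: str
    visualizations: list = field(default_factory=list)

    def add(self, viz):
        # The text states that at most five visualizations were coded per piece.
        if len(self.visualizations) >= 5:
            raise ValueError("at most five visualizations are coded per entry")
        self.visualizations.append(viz)


def tally(entries, attribute, main_only=False):
    """Count the values of one coded variable, optionally for main charts only."""
    return Counter(
        getattr(viz, attribute)
        for entry in entries
        for viz in entry.visualizations
        if viz.is_main or not main_only
    )


# Placeholder record, invented for illustration only.
entry = Entry("DJA 2017", "Example Publisher", "Example Country", "politics", "politicians")
entry.add(Visualization("map", "dot map", "spatial", "people", "hover", is_main=True))
entry.add(Visualization("bar/column", "bar chart", "change over time", "money", "none"))

print(tally([entry], "chart_family"))
print(tally([entry], "chart_family", main_only=True))
```

Keeping the main-chart flag on each visualization makes it easy to repeat any tally for the main charts only, as done at several points in this post.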

Each project and each visualization was analyzed through content analysis carried out by a single coder, who is also the person who came up with the codebook and the categorization. Additionally, for some variables, the decision of what value to assign to each observation is subjective, and the boundaries between one category and another can be blurry. For these reasons, we are well aware that this analysis doesn’t fully abide by high methodological standards for research. We did of course strive for accuracy and repeatedly checked the categorization.

The fact that the content analysis of each visualization was done manually by a single coder inevitably also constrained how much data could be analyzed in a reasonable time frame. We prioritized the latest editions of the two awards mentioned, leaving out of the data sample both prior editions of these awards and other journalism awards that could have included interesting data. As a consequence, our sample might be skewed towards political journalism because of the US presidential election and the Brexit referendum, both held in 2016. Additionally, we realize that both awards have been dominated by US national news.

The results of this analysis will not be used in a deterministic way to shape our designs, but rather as a source of inspiration and guidance. For this reason, despite the major limitations of the data sample, the analysis is deemed good enough for its aim: to infer some of the characteristics of the current state of data journalism.

**TABLE 2**: Structure of the dataset used for analysis and variables compiled for each entry

The full compiled dataset will be publicly available together with the official launch of the project Batjo: Bits, Atoms and Journalism, in January 2019.