Mind the Gap - An Economic Chart Remake
Guest author Jon Schwabish is an economist with the U.S. federal government and creator of policy-relevant data visualizations. His website, policyviz.com, contains details about his one-day workshops on visualizing and presenting data for people in public policy. You can reach him at firstname.lastname@example.org or by following him on Twitter @jschwabish.
Download the Excel file with the data tables and charts covered in this post.
This chart in a recent post by Catherine Rampell on the New York Times’ Economix blog struck me in need of a redo. Rampell’s post uses the chart to help illustration how there is a greater share of women in high-achieving jobs in the U.S. than in many other countries.
Rampell references an interactive version of the graph from the Organisation for Economic Co-operation and Development (OECD). The interactive visualization was made in Tableau and is essentially identical to the static version, but uses different colors (orange bars instead of red). The interactivity enables the user to choose any country and any year between 1990 and 2011. (Technically, the data are published by the International Labor Organization, but OECD has visualized the data).
I find the static graph difficult to decipher for five main reasons (and the interactive version suffers from many of the same shortcomings).
It’s difficult to compare the values for men and women because the data for men (blue diamonds) are so far from data for women (red columns)—there’s nothing that helps the eye connect the two.
The red columns for women take up a much larger proportion of the graph than do the blue diamonds for men, which results in overemphasizing the data for women. This may have been purposeful, but an active title—such as “Women’s Employment as Senior Managers Averaged 6% in 2008”—would have made that clear. I’m also not a fan of the color gradient in the columns—it emphasizes the base of the column instead of the top of the column, which is where the data are encoded.
The legend is completely disconnected from the graph and off to the right side. We typically start reading a graph from the top-left and then work our way down and into the chart. A better position for the legend would be just below the title or above the y-axis.
There are a lot of gridlines, they are a bit too heavy, and the percentage signs on all of the y-axis labels are probably redundant. (There are additional gridlines in the interactive version).
Why, oh, why do we continue to make column charts with vertically-oriented x-axis labels? (I bet a lot of people who just read that sentence looked up to the chart and rotated their heads 90 degrees to the left).
Okay, so let’s take a tour through some revisions of this chart by trying to address the issues just mentioned.
Just for comparison sake, here is my repeat of the same chart. I’m going to make a series of small changes, but keep the overall chart design the same.
Here are the changes I’ve made: different font (Cabin); different color scheme (orange and blue); fewer and lighter gridlines; top-left aligned legend; title with the “(Percent)” subheader.
So now let’s make what may be the easiest and most important change: rotating the chart 90 degrees. When the graph is rotated in this way, the metrics stay the same and how we perceive the data is unchanged, but the x-axis labels are now horizontally oriented, which are much easier to read.
Still, the orange bars overwhelm the blue diamonds and the two series are disconnected from one another. One way to address the visual imbalance of the two series is to create what I call a “stacked broken bar chart.”
Excel tip: This type of chart is seriously easy to make: it includes three stacked series, the middle series equals some number—in this case 20—minus the value of the base (far-left) series; that middle series is then given an empty fill. Unfortunately, when you create this sort of graph, the x-axis labels are 0, 10, 20, and 30, but that can be fixed by adding four scatterplot points and labeling each point with the series name (e.g., “0”, “10”, “0”, and “10”). I walk through the Excel process of making this type of chart in more detail in my “Making Excel Graphs Better” tutorial available on Slideshare.
I think this chart is a little better — it gives a more equal visual treatment of the two series, it’s easier to compare the values between men and women because the bars are aligned horizontally, and it’s easier to read the country names because they are also aligned horizontally.
But it’s still a little difficult to compare the values between men and women because the two series are treated separately. In the next figure, I link the two groups together using a dot plot graph and connect each pair with a thin, grey line.
Excel tip: In this tutorial, Jon Peltier details a step-by-step approach to creating a dot plot.
Notice again the integrated title, unit, and legend. I’ve not included data values directly on the chart because it clutters the graph, not to mention that the original chart did not include data values. (In another iteration of this chart I connected all of the points, but I didn't think that was really necessary).
Take note of one other important characteristic of this graph: the country labels are completely separate from the data values, which again makes the graph hard to read. I could draw another line from each orange circle to the y-axis. That would help, but that means I’m adding more non-data elements to the chart. Instead, I’m going to move the labels directly onto the chart:
Excel tip: You can move labels the easy way or the hard way. The hard way is to manually type all of these data labels one by one. The easy way is to write or steal a macro, or grab an add-in to do it for you — I like this one from The Spreadsheet Page. Note, however, that some of the labels will wrap onto two lines (in the chart dimensions I’m using, ‘United Kingdom’ and ‘Slovak Republic’). You can try resizing the graph, but you can’t directly manipulate data labels in Excel 2010—I hear that this is available in Excel 2013. To fix those two countries in my graph, I manually inserted two text boxes.
I’m satisfied with this last chart, but I’ll take it one step further. If you read the Rampell post, you’ll notice at the very end she discusses the ratio of women’s employment rates to men’s employment rates:
“In the United States, the ratio of the share of women who are in senior management positions to the share of men who are in senior management positions is about 0.85 (13.9/16.3), whereas for all O.E.C.D. countries it is about 0.59 (6.1/10.3). The country with the next highest ratio, behind the United States, is New Zealand, with a ratio of 0.76 (11.7/15.4).”
I think the ratios are important and really help tell the story—the relative share of women who are senior managers across countries is revealing in its own way, but comparing those shares to another group (e.g., men) within each country helps take certain country-specific employment characteristics into account.
There are two ways I try to visualize these ratios. First, I add a bar chart to the right side of the dot plot graph—in this way, the focus of the visual is still on the levels of men and women, but the ratio information is also included. (Alternatively, if you think the bars are distracting, you could just include the numbers of the ratio and not the bar chart).
Of course, another option is to take a much simpler and direct approach: visualize the ratios.
This final graph is clearly a departure from the initial graph, but that’s because this graph has a different focus. Here, countries take on different ranks: for example, Italy ranks 12th in the share of women employed as senior managers (9.1%), but the ratio between women and men (75%) puts Italy behind only the United States (85%) and New Zealand (76%).
To sum up, I’ve redesigned a static graphic (in Excel) from a column-line chart combination that was very easy to make to a dot plot (a bar-scatterplot combination), which was only slightly harder to make. The goal was to improve the static graphic such that the series for women and the series for men were given more even visual weights and to make it easier for the reader to compare those two groups. The last couple of graphics incorporated the ratio between women and men because Rampell discusses that metric in the last part of her post. Whether you think that information should be included or not largely depends on whether you think it helps to visually support Rampell’s argument.
Download the Excel file with data and charts.