I reported and wrote an infographic on the epidemiology of liver cancer. It was a good growing experience to work with a designer and an editor on a piece of visual, data-oriented journalism. I learned a lot and ended up wanting to experiment more with conveying data visually, with words and stories as a complement, rather than the other way around. I’m looking forward to the next opportunity.
To best see the visual elements of the infographic, I recommend checking out the PDF below rather than the HTML web version.
Chris Evelo asked me for more on what I learned about infographics. These are probably obvious to anyone who works in visual design, but here are some of my thoughts:
I learned that I need to space things out on the page. When I write text, I don’t think too much about what you could call information density. Some paragraphs probably carry more new information than others, but the main thing with text is to ensure semantic clarity and continuity throughout the story. Readers can take their own pace through the story, which is linear.
But there is probably some kind of limit to how many numbers or colors or symbols we can process in our visual field at any one time. So we had to be careful about how much we tried to fit onto the page. The map, for example, required care. We had to draw a few things out, since nobody could (or would want to) process facts about all 180-odd countries at the same time.
I also learned how complicated it is to compare data from different sources. I really wanted to compare, for example, historic age-weighted incidence of a disease to predictions for the next couple of decades. But one of the authors of the Globocan dataset pointed out that the available predictions were based only on population structure changes and not on intrinsic changes in incidence. So much for that idea.
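A toy sketch of why that comparison falls apart, assuming made-up numbers throughout (none of these figures come from Globocan, and the age groups, rates and function names are purely illustrative). If a projection holds the age-specific rates fixed and changes only the population structure, the case count can rise sharply while the age-weighted rate does not move at all:

```python
# All numbers below are invented for illustration; they are NOT Globocan data.

# Hypothetical age-specific incidence (cases per 100,000 people per year).
RATES = {"0-39": 1.0, "40-64": 10.0, "65+": 40.0}

# Hypothetical population sizes, in units of 100,000 people.
POP_NOW = {"0-39": 60.0, "40-64": 30.0, "65+": 10.0}
POP_FUTURE = {"0-39": 50.0, "40-64": 30.0, "65+": 20.0}  # same total, older

# Fixed reference weights used for age-weighting (they sum to 1).
STANDARD = {"0-39": 0.6, "40-64": 0.3, "65+": 0.1}


def expected_cases(rates, population):
    """Total cases per year: each group's rate times that group's size."""
    return sum(rates[group] * population[group] for group in rates)


def age_weighted_rate(rates, weights):
    """Rate per 100,000 computed against a fixed reference age structure."""
    return sum(rates[group] * weights[group] for group in rates)


cases_now = expected_cases(RATES, POP_NOW)
cases_future = expected_cases(RATES, POP_FUTURE)

# The projection reuses RATES unchanged, so the age-weighted rate is
# identical by construction, whatever happens to the population.
rate_now = age_weighted_rate(RATES, STANDARD)
rate_future = age_weighted_rate(RATES, STANDARD)

print(f"cases per year: {cases_now:.0f} now, {cases_future:.0f} projected")
print(f"age-weighted rate: {rate_now:.1f} now, {rate_future:.1f} projected")
```

In this toy case the projected case count grows only because the population ages. Plotting such a projection as a continuation of a historic age-weighted trend would show a flat line that says nothing about where incidence itself is heading.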
I also would have liked fuzzier ways of representing some of the data. The ring-chart (or pie-chart) of the types of liver cancer looks far more precise than the data I found. I might have liked to shade the liver illustration to its left with the rough proportions of types of liver cancer, with the colors bleeding into each other to indicate the uncertainties. But even the uncertainties are poorly documented at a global level. Something I’ll think about for the next one: how to transmit an honest, clear, and compelling visual sensation of the fuzziness and uncertainty in so much of our data.
Thanks for provoking this small reflection, Chris. I’d certainly welcome your comments specifically on this graphic or more generally on visual data representation! For simplicity, I’ll keep comments open on this item for a month.
Thanks Lucas!
I do like your infographic a lot and it is interesting to see how you struggled to show what you wanted to show and how sometimes the real data just didn’t fit in with that because it was about slightly different things.
I often feel that this kind of presentation should ideally be dynamic. Sometimes the viewer will have a question about what is presented in the graphic that could in principle be answered by the available data. In such a case it would be so nice to have tools that can show that next layer of information, even if the top layer cannot show it for the sake of clarity.
In your infographic, for instance, the countries on the world map could in principle be clickable, and more detailed data (the actual numbers, historic developments, links to info about the country itself and its overall health developments in other resources) could show up. We are struggling with technology to make that real, both on the level of the actual presentation and when it comes to linking the data.
In pathway analysis of biomolecular data for instance we allow the user to click on individual genes or metabolites to get connections to databases that know more about these specific biological entities (PathVisio is the tool we use for that).
Another example here is that the incidence bar plots also indicate that liver cancer almost always kills you while other cancers do not. The leftmost ring plot also says that (just about 16% survivors after one year). But it is actually not easy to see that this is special for liver cancer, even though the data is there. You made it as easy as can be, by using the same X-axis scales for both plots. Combining the two in one plot is what scientists often do. But that doesn’t always make it clearer. Here a dynamic solution would be a lot harder, I think. So for me it is still a question how you could make that clearer.
We are often struggling with presenting complex information even while we are still trying to understand it during data analysis in scientific research itself. That typically leads to questions about what can be left out, like you mentioned for the map. But also to “can I see what was left out?”, or simpler, “what country is that on the map?”. Again, an extra layer that is accessible on request can help, I think. People are more and more used to that kind of web-based approach and will often already try to click or mouse over.
Your mention of uncertainty is spot on. We often don’t know things precisely and sometimes we don’t even know how certain we are about something. I think what you suggested, adding an area of uncertainty to the graph itself, for instance using the inside and outside of the rings in an alternating way, might indeed help. In that way you could even show different estimates (and again you might want to hide that in a basic view for clarity).
One aspect I stumbled upon is that what you think would be clearest from a visualisation perspective may not really work in a domain where people have conventions for looking at things. We tried, for instance, to use color shades in a box or a box rim to indicate statistical significance. But it turns out that scientists are so used to seeing “stars” indicating that, that using anything else makes it less clear for them.