Visualizing Data: Tips & Resources


The first graph of statistical information (continuous distribution function) ever published (Huygens 1669) taken from

During the ICPSR summer stats camp, I had the honour and pleasure of taking two courses with William Jacoby, a political scientist well known for his contributions to the fields of measurement theory and data visualization. One of the courses I took with him was a short course on statistical graphics for visualizing data. In this post, I will briefly share some of the resources and takeaways I garnered from him.

This course focused largely on analytical graphs (i.e., the graphs we use to gain insight into our data by making sense of patterns or relationships). Why might a researcher go to the trouble of coding a graph that they have no intention of including in a publication? Because graphics prevent mistakes: plotting the data can reveal outliers, nonlinearity, or coding errors that summary statistics hide. Using graphics to analyze data was extended and popularized by statistician John Tukey (developer of the boxplot, among other innovations).
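Anscombe's quartet (from the Anscombe paper listed in the resources below) is the classic demonstration of why graphics prevent mistakes. The following is a minimal Python sketch, not code from the course (the course materials use R): it computes the mean of y and the Pearson correlation for each of the four datasets. The helper name `pearson_r` and the variable names are my own.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squares and cross-products."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Anscombe's four datasets; I, II and III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Store (mean of y, correlation) for each dataset and print them side by side.
summary = {name: (mean(ys), pearson_r(xs, ys)) for name, (xs, ys) in quartet.items()}
for name, (y_mean, r) in summary.items():
    print(f"Dataset {name}: mean(y) = {y_mean:.2f}, r = {r:.3f}")
```

The four datasets have nearly identical means and correlations, yet their scatterplots look nothing alike, so the summary statistics alone would not reveal the curvature in dataset II or the single influential point in dataset IV.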

Presentational graphs, in contrast to analytical graphs, communicate the researcher’s main point to the intended audience. Thus, the purposes and uses of these two kinds of graphical displays of quantitative information are very different. Researchers often treat them as the same thing, however, which can be a problem: a graph that helps the analyst explore the data will not necessarily communicate clearly to an audience. According to Cleveland, the components of interpreting or decoding presentational graphics are: detection (can you see the data?), assembly (can you put things together into a structure?), and estimation (to what extent does the graphic facilitate accurate estimation?).

In all fields of social research, and particularly in social work research, presentation of results is highly important. Often we research phenomena for and with communities that may not be familiar with scientific or statistical methods. We can and should make the salient points of our analysis easier to understand through graphical representation. If we fail to do so, our research is likely not to have the kind of individual, community, and social impacts that we would like it to.

Resources for Information on Data Visualization (not at all exhaustive):

New Directions for Evaluation has special issues on data visualization. See the Autumn (139) and Winter (2013) issues.

Anscombe, F. J. Graphs in Statistical Analysis [An amazing paper]

Whatever you do, do not do this: Example of a very bad graph

Cleveland, W. S. Statistics Research Homepage [An excellent resource for using and understanding the trellis package from S-PLUS (lattice in R)]

Glenn, R. W. Data Graphics [A basic overview of data visualization history and theory]

Jacoby, W. Statistical Graphics for Visualizing Data [Slides, code, and lecture notes from the ICPSR course]

Tufte, E. R. The Graphical Display of Quantitative Information [A bible of sorts]

Data visualization in social work research part II

In part I of this post I demonstrated the merits of the dot plot. In part II I will show an improved way to visualize a hierarchical regression model. Below is a fairly standard table of multiple regression output for three models that build sequentially on each other: Table 3 from Shiovitz-Ezra, S., & Leitsch, S. A. (2010). The role of social relationships in predicting loneliness: The national social life, health, and aging project. Social Work Research, 34(3), 157–167.

[Table image: ShiovitzEzra.2010.Tab3]

Using R again, I built a graph that overlays the three models and displays the coefficients graphically. This is based largely on the R code from David Sparks.

Choropleth tutorial

Coefficient plot

Coefficient plot walkthrough

I reordered the models and variables so that they appear in the desired order. Below is the replication of Shiovitz-Ezra and Leitsch (2010).
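The reordering step can be sketched as follows. This is a Python illustration rather than the R code used for the graph, and everything in it is hypothetical: the function name `to_plot_rows`, the predictor names, and the coefficient and standard error values are made up and are not taken from the Shiovitz-Ezra and Leitsch table. The idea is to turn three nested models into one long list of rows, in an explicit variable and model order, ready to hand to a plotting library.

```python
def to_plot_rows(models, model_order, variable_order):
    """Flatten nested-model estimates into ordered rows with approximate 95% intervals."""
    rows = []
    for variable in variable_order:
        for model in model_order:
            estimate = models[model].get(variable)
            if estimate is None:
                # This predictor was not entered in this model, so skip it.
                continue
            coef, se = estimate
            rows.append({
                "variable": variable,
                "model": model,
                "coef": coef,
                "lower": coef - 1.96 * se,
                "upper": coef + 1.96 * se,
            })
    return rows


# Hypothetical (coefficient, standard error) pairs for three sequential models.
models = {
    "Model 1": {"predictor_a": (0.10, 0.04)},
    "Model 2": {"predictor_a": (0.08, 0.04), "predictor_b": (-0.30, 0.10)},
    "Model 3": {"predictor_a": (0.07, 0.04), "predictor_b": (-0.25, 0.10),
                "predictor_c": (-0.05, 0.03)},
}

rows = to_plot_rows(
    models,
    model_order=["Model 1", "Model 2", "Model 3"],
    variable_order=["predictor_a", "predictor_b", "predictor_c"],
)
for row in rows:
    print(f"{row['variable']:<12} {row['model']:<8} "
          f"{row['coef']:+.2f} [{row['lower']:+.2f}, {row['upper']:+.2f}]")
```

Keeping the ordering in two explicit lists means the plot layout is set by the analyst rather than by alphabetical sorting, which is usually what a plotting library falls back on.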


Data visualization in social work research part I

Okay, here we go, my long overdue post on data visualization in social work research.

You can find study details and data here on the Dataverse network.

First, let’s examine a logistic regression table published in Trocmé, N., Knoke, D., & Blackstock, C. (2004). Pathways to the overrepresentation of Aboriginal children in Canada’s child welfare system. Social Service Review, 78(4), 577–600.


I’ve replicated Trocmé et al.’s results to produce a dot plot in R, based on the work of William Jacoby.


This visual representation of the data appears to have several advantages.

1. The results are shown in visual form rather than forcing the reader to interpret numbers.

2. It is easier to distinguish statistically significant variables from those that are not, based on solid black vs. empty dots.

3. The strength of the relationship between each IV and the DV is more intuitive, based on the distance from the baseline of OR = 1.

In a recent brownbag discussion, Toni suggested making the size of each dot correspond to the size of the Wald statistic (the squared ratio of a coefficient to its standard error, so a measure of evidence rather than an effect size in the strict sense). One disadvantage is that the graph omits information that helps the viewer appraise the amount of uncertainty, e.g., standard errors. I’ll show an example of how to report the standard errors in the next post on this subject. For now, I am curious whether you can think of additional advantages or disadvantages of the dot plot in comparison to the standard regression table.
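To make the quantities behind such a dot plot concrete, here is a minimal Python sketch (the plot itself was made in R). It converts logistic regression coefficients into odds ratios with approximate 95% confidence intervals, flags whether each interval excludes OR = 1 (solid vs. empty dot), and computes the Wald statistic. The function name `odds_ratio_summary`, the predictor names, and the coefficient values are all hypothetical and do not come from Trocmé et al.

```python
import math


def odds_ratio_summary(coef, se, z=1.96):
    """Summarize one logit coefficient as an odds ratio with a Wald-type interval."""
    lower = math.exp(coef - z * se)
    upper = math.exp(coef + z * se)
    return {
        "or": math.exp(coef),
        "lower": lower,
        "upper": upper,
        # Significant at roughly the 5% level if the interval excludes OR = 1.
        "significant": not (lower <= 1.0 <= upper),
        # Wald chi-square statistic: squared coefficient-to-SE ratio.
        "wald": (coef / se) ** 2,
    }


# Hypothetical (coefficient, standard error) pairs on the log-odds scale.
coefficients = {
    "predictor_a": (0.69, 0.20),
    "predictor_b": (0.10, 0.25),
}

results = {name: odds_ratio_summary(b, se) for name, (b, se) in coefficients.items()}
for name, res in results.items():
    marker = "●" if res["significant"] else "○"  # solid dot = significant
    print(f"{marker} {name}: OR = {res['or']:.2f} "
          f"[{res['lower']:.2f}, {res['upper']:.2f}], Wald = {res['wald']:.1f}")
```

Drawing the interval as a horizontal line through each dot would be one way to show uncertainty directly on the graph, and plotting on a log scale keeps intervals symmetric around the point estimate.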

z-scores and scatterplots over time

Bruno Martorano (UNICEF/Innocenti) presented a paper at the ICSI Conference titled Child Well-being in Economically Rich Countries.
Many of the figures employ a simple scatterplot with Report Card 7 on the x-axis and Report Card 11 on the y-axis. The scores are standardized and represent distance from the mean in standard deviation (SD) units. I had never seen this strategy for showing change over time. I am curious what caused Canada to be almost 2 SDs below the mean for Report Card 11 (p. 15).
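The standardization behind this kind of figure can be sketched in a few lines of Python. The country labels and scores below are hypothetical placeholders, not values from the paper, and I am assuming the population standard deviation is used; the paper may standardize differently.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Convert raw scores to distances from the mean in SD units."""
    m = mean(values)
    s = pstdev(values)  # assumption: population SD across the countries shown
    return [(v - m) / s for v in values]


# Hypothetical raw scores for the same countries at two time points.
countries = ["Country A", "Country B", "Country C", "Country D"]
earlier = [12.0, 15.0, 9.0, 18.0]
later = [11.0, 16.0, 13.0, 8.0]

z_earlier = z_scores(earlier)
z_later = z_scores(later)

# Each country becomes one point: x = earlier z-score, y = later z-score.
points = list(zip(countries, z_earlier, z_later))
for name, zx, zy in points:
    print(f"{name}: x = {zx:+.2f}, y = {zy:+.2f}")
```

Points above the 45-degree line improved relative to the other countries, and points below it declined. Because each wave is standardized separately, the plot shows change in relative position, not change in absolute level.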
