
Angus Deaton on the importance of numbers

At long last, I started reading Angus Deaton's "The Great Escape". The book is the esteemed professor's take on how economic development and advances in health also explain the rise in inequality. It is a substantive read for me, and I wasn't expecting to encounter any brownbag methods-related topics. However, in the introductory chapter, among other things, Deaton explains his approach to the book, and not surprisingly he roots his arguments in empirical findings. In a long sentence on page 16, he provides one of the most concise and compelling arguments for using evidence in the analysis of social problems:

 Unless we understand how the numbers are put together, and what they mean, we run the risk of seeing problems where there are none, of missing urgent and addressable needs, of being outraged by fantasies while overlooking real horrors, and of recommending policies that are fundamentally misconceived.

Making Statistics more Meaningful

Henry May's opening argument for making the interpretation of results from different statistical methods and models understandable to all stakeholders is quite convincing. No matter how fancy or complex the models used in your analysis, they bear little value if they fail to communicate your explanation of reality not only to other researchers in your field but also to policy makers and other stakeholders in your community. If you explain something that exists in reality, it deserves to be understood by almost everyone, whether it is Einstein's theory of relativity, Hawking's account of the big bang, Darwin's theory of evolution, or Marx's theory of dialectical materialism. We should all bear in mind that statistics is simply a tool for analysing the information in our hands to explain the nature of a research problem: statistics is dedicated to research, not research to statistics. When we start the journey of research by formulating research questions, statistics does not join us at the beginning; rather, once we decide what type of data we will use to answer our questions, we invite statistics as an aide to help us organise, summarise, and explain the dynamics of relations among variables, if we choose to collect quantitative data for our study. Likewise, when we obtain the results of statistical methods or models, we need to explain or interpret them in plain English (or any other language), just as we formulated our research problems in plain language.

May, in his article, suggested guidelines for the meaningful presentation of statistical outputs. He identified three major features of meaningful presentation: understandability, interpretability, and comparability. By understandability, May meant that "The results should be reported in a form that is easily understood by most people by making minimal assumptions about the statistical knowledge of the audience and avoiding statistical jargon" (May, 2004, p. 527). He simply suggested reducing or eliminating statistical jargon as much as possible and explaining it in plain language. By interpretability, May meant the familiarity of the metrics or units of measure on which a statistic is based: the more familiar the unit of expression is to the audience, the greater the interpretability. The third feature May emphasised was comparability, which refers to the extent to which a statistic can be compared across different factors of a single study or to effects from other studies.

To make his argument more understandable, May came up with some examples from everyday practice. He categorised statistical analysis into two major groups, descriptive statistics and relational statistics, in contrast to the traditional categorization of statistics as descriptive or inferential. As we know, descriptive statistics usually describe general aspects of a distribution, such as percentages, proportions, averages, ratios, and sometimes variance and skewness, in order to gauge the severity of problems. Most audiences are familiar with percentages, proportions, and averages, and may be a little familiar with standard deviation, variance, and skewness. May suggested phrasing variance and skewness in the context of distributional density. I would simply say that the standard deviation or variance tells us whether the data are distributed close around the average or far from it, but not in which direction. Skewness tells that part of the story: whether most data are scattered below the average or above it, along with their density or compactness.
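This plain-language reading of spread and direction can be made concrete in a few lines of code. The sketch below uses invented test scores (purely for illustration) to compute the mean, the standard deviation, and a simple sample skewness estimate; a positive skewness signals a long right tail, meaning most values sit below the average.

```python
import statistics

# Hypothetical test scores (invented data, purely for illustration).
scores = [55, 60, 62, 65, 68, 70, 72, 75, 90, 98]

mean = statistics.mean(scores)   # the average
sd = statistics.stdev(scores)    # spread around the average (no direction)

# Simple sample skewness: positive = long right tail (most values below
# the average), negative = long left tail (most values above it).
n = len(scores)
skew = (n / ((n - 1) * (n - 2))) * sum(((x - mean) / sd) ** 3 for x in scores)

print(f"mean={mean:.1f}, sd={sd:.2f}, skewness={skew:.2f}")
```

Here the two high scores (90 and 98) pull the skewness positive, which is exactly the "direction" story that the standard deviation alone cannot tell.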

According to May, relational statistics are used to describe and gauge the strength of relationships between two or more variables, or to estimate the effect of one variable on another. The most commonly used relational statistics fall under linear modeling, which includes analysis of variance (ANOVA), correlation, regression analysis, path models, and hierarchical linear models (HLM). To make correlations more meaningful, May suggested explaining the range of possible values for correlations (i.e., -1 to 1) and their other characteristics. Correlational results can also be reported visually, for example with scatter plots or diagrams, which help the audience better understand the relations. To make regression coefficients more meaningful, he suggested explaining the expected change in the outcome per unit change in a predictor, and how the coefficient of determination R² indicates the proportion of variation in the outcome variable explained by the predictor.
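As a small illustration of these suggestions, the sketch below computes a correlation, a regression slope (the "expected change in the outcome per unit change in the predictor"), and R² from hypothetical tutoring-hours and score-gain data. The data are invented; only the standard textbook formulas are used.

```python
# Hypothetical data: hours of tutoring (x) and reading-score gains (y).
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

r = sxy / (sxx * syy) ** 0.5   # correlation: always between -1 and 1
slope = sxy / sxx              # expected change in y per one-unit change in x
intercept = my - slope * mx
r_squared = r ** 2             # share of the variation in y explained by x

print(f"r = {r:.3f}, slope = {slope:.2f}, R^2 = {r_squared:.3f}")
```

A plain-language report of this output, in May's spirit, would be: each extra hour of tutoring is associated with about two more points of gain, and tutoring hours account for nearly all of the variation in gains in this (toy) dataset.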

I personally believe that, considering the nature of the problem, we should use statistical tools that are as simple as possible. Why should I bother with a computer if I can solve a problem with a calculator? If I am bound to use a computer, I need to explain the outputs in a general language understandable to all, since this clever machine has a language of its own, which is not spoken by everyone. Thus, user-friendly interpretations of statistical results will help communicate the nature and dynamics of problems to researchers, policy makers, and all possible stakeholders, and will benefit the community at large.


May, H. (2004). Making statistics more meaningful for policy research and program evaluation. American Journal of Evaluation, 25, 525-540.

Summary of Rose & Stone (2011): Instrumental Variable Estimation (IVE) in Social Work Research: A technique for estimating causal effects in nonrandomized settings

Although sophisticated statistical techniques for estimating causal effects in nonrandomized (e.g., observational) studies already exist, such as fixed effects models and propensity score matching, Rose and Stone attempt to make a case for the use of the Instrumental Variable Estimation (IVE) technique in social work research. The authors claim that IVE, traditionally used by econometricians, can lead to less biased causal estimates in social work by properly disentangling the causal pathways involved in studying the effects of nonrandomly assigned treatments or interventions. They discuss the issues of endogeneity (also see Stone & Rose, 2011) and nonignorability in nonrandomized research: because assignment to the treatment (X) cannot be completely manipulated or controlled by the researcher, other unobserved or unmeasured factors may play a role in treatment assignment and thus also be related to the outcome (Y). These confounding variables (e.g., history, maturation) dilute the true effect of X on Y, threaten internal validity, and, if left unmeasured or unaccounted for, produce biased causal estimates. The goal, then, is to determine which part of X is truly (and directly) causally related to Y, despite the presence of possible confounding variables.

Since observational studies are not viewed as the gold standard under Rubin's causal model (as read in Shadish, 2010) due to their nonrandomized (or non-experimental) nature, Rose and Stone claim that IVE can artificially create the same conditions as random assignment, isolate the effect of X on Y, and therefore produce results that can be held to the same gold standard as Rubin's causal model. Another (and more commonly used) method to inform causality, correlation analysis, is discouraged because it is time consuming and requires a well-established theory of the domain being researched (which is often lacking in the social work discipline). In addition, correlation analysis requires considering ALL possible confounding relationships in order to include them in the statistical model, which can be a daunting exercise. According to the authors, IVE allows the researcher to bypass these demanding requirements through the use of a statistical control technique.

However, there are still some prerequisites for properly using IVE: the researcher's task is to identify the instrumental variable (IV) that will act as a control variable. This variable is usually unrelated to the model under study, is non-endogenous, and is a causal antecedent to X (the endogenous variable). Rose and Stone outline two rules for the selection of an appropriate instrumental variable, also referred to as the exclusion requirement:

  1. The instrumental variable (Z) must be highly correlated with the endogenous variable (X), i.e., Z must be a causal antecedent to X (pretty straightforward).
  2. Z must be uncorrelated with the error term, i.e., Z cannot be associated with the outcome variable Y through the error (not as straightforward).

These two rules must be met in order to draw definitive conclusions from the analysis. However, the authors point out that rule #2 is much more difficult to verify, as it cannot be demonstrated through statistical tests or the data itself. Thus, justification of rule #2 has to be derived from information external to the sample under study, which requires careful thought and logical reasoning. The goal is to determine whether the instrumental variable effectively randomizes persons to the condition of X (the endogenous variable, i.e., the cause). Since it is highly unlikely that a perfect instrumental variable will be found (most variables have at least one possible causal antecedent), most instruments will tend to be imperfect in nature.

Rose and Stone then present the statistical model for IVE in two ways: (1) IVE is contrasted with the Ordinary Least Squares (OLS) causal estimator approach, and (2) the instrumental variable is estimated in multiple stages via a two-stage least squares approach. This part of the article was a bit difficult to follow and grasp, since my familiarity with statistical formulas is hazy at best.
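For readers who, like me, find the formulas hazy, a tiny simulation can make the two-stage least squares idea concrete. The sketch below is my own illustration with simulated data, not the authors' model: it generates an unobserved confounder u, shows that plain OLS overstates the effect of X on Y, and then recovers the true effect by first regressing X on the instrument Z and then regressing Y on the fitted values.

```python
import random

random.seed(0)

# Simulated data: u is an unobserved confounder of x and y; z is an
# instrument correlated with x but not with u (hypothetical setup).
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
true_effect = 2.0
y = [true_effect * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

def slope(a, b):
    """OLS slope of b regressed on a."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return (sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
            / sum((ai - ma) ** 2 for ai in a))

ols = slope(x, y)  # biased upward, because u pushes x and y together

# Stage 1: regress x on z.  Stage 2: regress y on the fitted values.
stage1 = slope(z, x)
x_hat = [stage1 * zi for zi in z]
ive = slope(x_hat, y)  # close to the true effect

print(f"OLS estimate: {ols:.2f} (biased)")
print(f"IVE estimate: {ive:.2f} (true effect is {true_effect})")
```

The instrument works here precisely because z satisfies both rules: it drives x (rule #1) and has no path to y except through x (rule #2), which in real data is the part that cannot be tested and must be argued.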

I did find the examples useful in illustrating the 'practical' use of IVE in social work research, and they allowed me to make possible linkages to my own research interests. For instance, if I wanted to examine the best age for youth to transition from foster care (and other types of residential care) to independent living (say, age 21 compared to 18) in order to achieve the best outcomes (say, well-being) as young adults, I could use date of birth as an instrumental variable to determine the true effect of X (transitioning out of foster care) on Y (well-being of former youth in care). However, there are some pitfalls in using the IVE approach, even in my example: a supposedly exogenous variable such as date of birth can become endogenous if, as also discussed in the article, some parents had a role in determining the exact date of birth of their child, making it a non-random occurrence. In addition, the role of human choice or behaviour (what Rose and Stone call unobserved human agency) could also influence the outcome in my study, since some youth might choose of their own accord to transition to independent living at an earlier or later age, for various personal reasons.

Although I do agree that knowledge of the IVE technique is definitely useful to the social work discipline, in terms of being able to properly read and assess studies that employ it, it is important to weigh the pros and cons associated with its use within the particular research context being considered. The authors caution that in order to properly use IVE, the sample size should be sufficiently large; however, they do not explain how one can determine the appropriate sample size for a particular study (perhaps Maxwell, Kelley and Rausch's (2008) sample size planning techniques could be of use). I am not convinced that IVE should be held to a gold standard similar to Rubin's causal model, as justifying rule #2 seems to be far more challenging (and perhaps at times impossible) than Rose and Stone would like to admit, since the burden of proof requires an extensive inductive analysis to rule out any possible unwanted associations. Even when rule #2 seems satisfied from the researcher's perspective, explaining the rationale to others might not be as obvious and can thus propel the "black box" phenomenon (as discussed in Green et al., 2010).


Green, D. P., Ha, S. E., & Bullock J. G. (2010). Enough Already about “Black Box” Experiments: Studying Mediation Is More Difficult than Most Scholars Suppose. The Annals of the American Academy of Political and Social Science, 628, 200-208.

Maxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample size planning for statistical power and accuracy in parameter estimation. Annual Review of Psychology, 59, 537-563.

Rose, R. A., & Stone, S. I. (2011). Instrumental Variable Estimation in Social Work Research: A technique for estimating causal effects in nonrandomized settings. Journal of the Society for Social Work and Research, 2(2), 76-88.

Shadish, W. R. (2010). Campbell and Rubin: A Primer and Comparison of Their Approaches to Causal Inference in Field Settings. Psychological Methods, 15(1), 3-17.

Stone, S. I., & Rose, R. A. (2011). Social Work Research and Endogeneity Bias. Journal of the Society for Social Work and Research, 2(2), 54-75.

Summary of Cook, Scriven, et al. (2010): Thinking about causation in evaluation: A dialogue with Tom Cook and Michael Scriven

As I mentioned in the last class, I am more of a critic of the RCT (Randomized Controlled Trial) than an enthusiast for it. My concerns with the RCT mostly focus on its feasibility and on ethical issues in the social work field. However, as an advocate of evidence-based studies, I always feel confused about the many issues related to the RCT and its alternatives.

The RCT is certainly the ideal way to conduct research on a causal relationship, but from my perspective it seems nearly impossible to implement in the real research world. So what choice is left for us? Or is it possible to make it work?

This question seems difficult to answer for most researchers, even for distinguished ones like Tom Cook and Michael Scriven (phew).

While reading the debate between Tom Cook and Michael Scriven, I was usually on Scriven's side, but Tom Cook's ideas were quite persuasive; in particular, his case example of attempts to control random assignment problems was quite impressive.

In this debate, Michael Scriven criticized the tough requirements imposed by current RCT designs. That is, he argued that to preserve a valid RCT design, researchers must keep watch over two endemic problems with RCTs, attrition and cross-contamination, but the real problem is that not many researchers can uphold the conditions imposed by RCTs.

According to Scriven, the hard-line RCT position is supported by three main pillars: (a) only the RCT design excludes all alternative causes, (b) it is the only design that supports the true meaning of causation, and (c) no other design (quasi-experimental) does it as well. Scriven criticized these three pillars by raising problems with the RCT design such as contamination and attrition, the counterfactual mistake, and the refusal to admit quasi-experimental designs. In particular, he emphasised the importance of quasi-experimental designs, not just as alternatives to RCTs but as better options in certain situations. In addition, he raised ethical issues related to random assignment.

Tom Cook is one of the early leaders of the RCT camp, but also an open-minded, critical user of it. He is not obsessed with the RCT method and accepts alternatives that have the ability to recreate experimental results, such as regression discontinuity, a geographically local intact comparison group matched on pretest scores, and research conducted when the process of assignment to treatment is perfectly known. However, he countered Scriven's criticism of the RCT's problems with the reasons below.

First, Tom Cook conceded that random assignment cannot by itself guarantee a secure causal inference, but argued that with accumulated studies researchers can find ways to minimize problems such as attrition or contamination. Second, he said that the counterfactual problem of overdetermination is solved by statistical matching of groups rather than individual matching. Regarding the third pillar mentioned by Scriven, Tom Cook argued that even the most fervent advocates of random assignment would not deny that you can make causal inferences without random assignment, and admitted that random assignment is limited to situations where, given a specific cause, we want to know its effect. But, he said, in socially ameliorative fields like education, the interest is more often in going from causes to effects than in observing an effect and identifying its cause. Regarding the ethical issues, Tom Cook put it this way: if we are concerned about the ethical issues arising from random assignment, what about carrying on for years with practice based on poor evidence? Is that ethical? He argued that such malpractice could be more harmful than randomized assignment (what a pragmatist approach!).

Lastly, he said that the RCT "crusade" in education that Scriven referred to does not make sense. He argued that, unlike the crusades of the Middle Ages, the advocates of randomized assignment have long attempted to institutionalize randomized experiments as the method of choice in education research, but never intended to dominate all sources of education research funds. In his opinion, since quantitative studies like randomized experiments had been systematically downgraded over the last 30 years, advocates simply hoped to set the historical record straight by emphasizing them.

Scriven challenged the idea that "a little dictatorship is needed to fix up the bad times in the past," because the RCT has almost won the research fight. He maintained that while there are certainly places not yet conquered by the RCT, its advocates should stop now before the situation gets worse. He underlined that because the RCT is not the only way, and many different ways exist to achieve research goals, researchers should be eligible for funding regardless of the methods they choose.

In his final remark, Tom Cook expressed an opinion similar to Scriven's by citing the phrase "questions come first, and method choice second." He pointed out that they both agree that randomized assignment is not the only way, but the best single method for causal inference under certain conditions. He said their dispute would probably be about how much better RCTs are compared to the alternatives for causal inference.

In his final remark, Michael Scriven also acknowledged that his main target is not Tom Cook, a critical user of the RCT, but the RCT super-enthusiast. He mentioned that even though he still cannot agree with the idea of controlled and controllable random assignment problems, it is true that their shared goal is a middle ground: don't fund bad studies of any kind, and do fund good and efficient uses of research resources, whatever design they employ.

After reading their debate, I still cannot find the answer to my question, but at least I feel it is not such a big deal. Questions come first, method choice second; there are many ways to get there, including experiments and quasi-experiments, and we should scrutinize all of them with a critical perspective before choosing one. That is the main lesson I took from this debate.

Summary of Maxwell, Kelley & Rausch (2008): Sample Size Planning for Statistical Power and Accuracy in Parameter Estimation

As academics and researchers, common sense would dictate that the larger the sample, the better the results of a study. In reality, however, most researchers do not have access to an unlimited pool of participants and thus must know how to determine the best sample size for their research goals. Maxwell, Kelley and Rausch's (2008) review of various sample size planning techniques in psychological research illustrates that determining the appropriate sample size for a study is much more intricate than one would think, and that planning can serve various ends, such as improving statistical significance (power), the precision of effect estimates (narrow confidence intervals), and the accuracy of results. In an ideal research design, all three of these areas should be considered when planning sample size.

It is important to note that statistical power and accuracy are only relevant when hypothesis testing is involved, since the very definition of power is the probability of rejecting the null hypothesis in favour of an alternative that is true. Power and accuracy are also not one and the same, but both should be considered (along with precision) in sample size planning, as it is important to convey the degree of uncertainty of an effect size in order to produce trustworthy results, whether large or small. The authors emphasize that even if a sample size is determined to be sufficiently large to guarantee adequate statistical power, it may still not be large enough to guarantee accurate parameter estimates of the effect. They discuss a method and formula for accuracy-oriented sample size planning called Accuracy in Parameter Estimation (AIPE), which emphasizes reporting confidence intervals along with statistical significance in order to show whether the effect has been estimated accurately.

The authors then outline a review of various targeted statistical power and AIPE techniques in the literature based on psychological research design (please see article for references for specific techniques):

  • Comparing the means of two independent groups (via 2 group t-test): For statistical significance, the smaller the value of the population standardized mean difference, the larger the sample size should be for a specified level of power. For accuracy, the larger the value of the population standardized mean difference, the larger the sample size should be for a desired/specified confidence interval width.
  • Adjustment for multiple mean differences comparison procedures (via ANOVA): This adjustment has to be taken into account in sample size planning otherwise the sample size will be too small for both statistical significance and accuracy.
  • Multiple regression (via 2×2 framework): Sample size planning techniques for multiple regression depend largely on whether the effect size of interest is omnibus or targeted, as each requires a different sample size.
  • General linear multivariate model: Although there are effective power and AIPE techniques for this, most assume that the study employs fixed predictors even though most psychological predictor variables are continuous and random. The authors outline techniques for both fixed and random predictor studies.
  • Exploratory factor analysis: The authors caution that the rules of thumb in the literature for this type of study should not be trusted and that sample size planning techniques should take into account communalities as part of the procedure.
  • Confirmatory factor analysis and structural equation modelling: Although AIPE techniques have not yet been developed for this domain, the authors outline sample size planning for power using chi-square ratio tests (to evaluate exact fit of the model) or fit indices (to quantify how well the model fits the data).
  • Longitudinal data analysis (via Latent growth curve (LGC) models for analysing change): The authors outline techniques for both continuous and discrete outcomes. The former involves techniques to detect group differences, patterns of attrition and the effect of missing data on power, while the latter involves using sample size tables to detect treatment effects between two groups based on number of repeated observations, group response rates and intraclass correlation.
  • Generalized linear models (modelling categorical variables, contingency tables and variables not normally distributed): The authors outline a version of the Wald test to test multiple parameters simultaneously for statistical power. They also caution that when a continuous variable is categorized (recoded as ordinal), the sample size must be increased in order to reduce the loss of power effect.
  • Cluster randomized trials (compares clusters of individuals such as classrooms, schools or neighbourhoods): For both power and accuracy a certain number of clusters is required even if the number of participants within each cluster is high. The authors mention a multilevel modelling method to determine statistical power, and a technique to determine the necessary number of both clusters and participants within each cluster in order to be able to detect a meaningful treatment effect.
  • Survival analysis (outcome of interest is the duration of time until a particular event occurs): The authors mention the Weibull model and Cox regression model to determine appropriate sample size for both power and accuracy.
  • Mixture modelling (decomposing an observed (composite) distribution into multiple unobserved (latent class) distributions): Sample size planning in this case is dictated by the statistical method utilized, but generally results are better when the sample is larger. However, the authors caution that a large sample can sometimes overestimate the number of latent classes present; thus theory needs to play a role in determining that number.
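The first bullet can be made concrete with the widely used normal-approximation formula for a two-group comparison, which shows how sharply the required sample grows as the standardized mean difference d shrinks. This is a generic textbook approximation of my own choosing, not a formula quoted from the article.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-group comparison detecting a
    standardized mean difference d (normal approximation to the t-test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # power quantile
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

# Smaller standardized effects demand much larger samples for the same power:
for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: n per group is about {n_per_group(d)}")
```

Halving the effect size roughly quadruples the required sample, which is why the small effects the authors discuss are so expensive to detect.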

The authors also mention that for any research goal, design or model, a simulation approach (Monte Carlo simulation technique) can be used prior to conducting the analyses in order to determine appropriate sample size for power and accuracy. This technique involves generating random data from the population of interest, conducting selected statistical techniques and repeating them a large number of times with different sample sizes until the appropriate minimal sample size is found.
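A minimal sketch of that simulation idea, assuming a simple two-group design with a z-test approximation (my own toy version, not the authors' procedure): simulate many studies at a given n, estimate power as the rejection rate, and grow n until the target power is reached.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)  # fixed seed so the sketch is reproducible

def estimated_power(n, effect=0.5, alpha=0.05, reps=500):
    """Monte Carlo power estimate: simulate `reps` two-group studies of
    size n per group and count how often the null is rejected
    (z-test approximation; a toy illustration, not a production routine)."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        a = [random.gauss(0, 1) for _ in range(n)]       # control group
        b = [random.gauss(effect, 1) for _ in range(n)]  # treatment group
        se = (stdev(a) ** 2 / n + stdev(b) ** 2 / n) ** 0.5
        if abs(mean(b) - mean(a)) / se > crit:
            hits += 1
    return hits / reps

# Grow n until the estimated power reaches the 80% target.
n = 10
while estimated_power(n) < 0.80:
    n += 10
print(f"approximate n per group for 80% power: {n}")
```

The appeal of the simulation approach is that the same loop works for designs with no closed-form power formula: one only has to swap in the data-generating step and the analysis step.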

Although the authors emphasize that sample size is not the be-all and end-all for increasing the statistical power and accuracy of a study's results, it is still a very important aspect of research design, especially for detecting smaller effects. However, the true sample size required to obtain adequate power and accuracy can often exceed the resources available to the researcher. In these cases, the authors recommend reducing the required confidence interval narrowness, utilizing within-subject designs, or introducing covariates into the analysis. They also suggest using meta-analysis (with a priori power analysis), since multiple smaller studies can often provide much more power than a single larger study with the same total sample size. However, they caution that meta-analyses often suffer from publication bias (referred to as the "file drawer effect") due to unpublished studies and selective reporting of results in the literature. They recommend multi-site studies to counteract this issue, as these are less prone to the file drawer effect.

According to the authors, there is not enough emphasis in the literature on sample size planning techniques to improve statistical power and accuracy, since most published studies test multiple hypotheses with often weak significance. Those that report large effects often do so without confidence intervals for precision or accuracy, which can distort the predicted power of such studies. The authors call this phenomenon underpowered research, since most studies are not equipped to detect medium to small effects that might still be scientifically or practically relevant. This creates a false sense of certainty in the literature, and can cause readers to misinterpret discrepancies between similar studies as contradictory when the discrepancies may actually be due to simple sampling variance. The authors also caution that there is an unrealistic expectation in psychological research that a single study can provide definitive answers on a particular issue or treatment approach. According to them, knowledge should be treated as cumulative, not definitive, and this can be accomplished through published studies that, over time, take effective sample size planning techniques and approaches into consideration. Knowledge is only power when those techniques are applied.

Reference: Maxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample size planning for statistical power and accuracy in parameter estimation. Annual Review of Psychology, 59, 537-563.

Reading summary of “Relationships between Poverty and Psychopathology”

Costello et al. (2003), in their study "Relationships between Poverty and Psychopathology", examined the association between poverty and mental illness using a quasi-experimental longitudinal design to address a long-standing debate among clinicians and researchers: whether the high prevalence of mental illness among poor people reflects social causation or social selection. Before going deeper into the study, let's unpack these two catchy terms. Suppose that McGill students are the smartest among Canadian universities. One could explain that McGill's quality faculty and state-of-the-art facilities help its students become smarter. In contrast, one could claim that students with higher aptitude and intelligence, who are already smarter, come to study at McGill, and that this is how McGill ends up with the smartest students among Canadian universities. The first explanation of the students' smartness refers to social causation, and the latter to social selection.

Costello et al. (2003) examined whether the high prevalence of mental illness among poor people reflects social causation or social selection. They selected a representative sample of 1,420 rural children aged 9 to 13 years and assessed them over 8 years (1993-2000). Among the participants, one quarter were American Indian, and the rest were predominantly white. They used two main measures: psychiatric symptoms, comprising emotional and behavioral symptoms, and level of poverty, measured against an income threshold. In the fourth year, a natural intervention occurred: a casino opened on the Indian reservation, providing an income supplement to American Indian families. As an effect of the intervention, 14% of families came out of poverty, while 53% remained poor and 32% were never poor. The researchers then categorized the families into three groups: the persistently poor, who remained poor both before and after the intervention; the ex-poor, who were poor before the intervention and came out of poverty after it; and the never poor. They compared psychiatric symptom scores within and across the groups.

Results showed that before the casino opened, the mean psychiatric symptom scores of children from persistently poor and ex-poor families were similar (M = 4.38 and M = 4.28, respectively) and greater than that of never-poor families (M = 2.75). The odds ratios of persistently poor families against ex-poor and never-poor families were OR = 1.02 and OR = 1.59, respectively, and the odds ratio of ex-poor families against never-poor families was OR = 1.55. Before the casino opened, then, the prevalence of mental illness was higher among the children of both persistently poor and ex-poor families, and these children were more likely to have mental disorders.

After the casino opened, the mean psychiatric symptom scores of children from persistently poor families increased slightly (M = 4.71), the scores of children from ex-poor families decreased substantially (M = 2.90), and the scores of children from never-poor families remained almost the same (M = 2.78). The before-versus-after odds ratios for the children of persistently poor, ex-poor, and never-poor families were OR = 0.91, OR = 1.5, and OR = 1.00, respectively, indicating that the likelihood of having a mental disorder decreased significantly for the children of ex-poor families. After the intervention, the odds ratio of children from persistently poor families against children from ex-poor families became OR = 1.69, indicating that children of persistently poor families were 1.69 times more likely to have mental disorders than children of ex-poor families; their odds ratio against never-poor families became OR = 1.76, while the odds ratio of ex-poor against never-poor families became OR = 1.04. After the intervention, then, the likelihood of having a mental disorder was almost the same for children of ex-poor and never-poor families.
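For readers following the numbers, an odds ratio is simply the odds of the outcome in one group divided by the odds in another. A minimal Python sketch, using hypothetical counts rather than the study's actual data:

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds ratio from a 2x2 table: odds of the outcome in the
    exposed group divided by odds in the unexposed group."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed

# Hypothetical counts (not from the study): 30 of 100 persistently
# poor children with a disorder vs. 20 of 100 never-poor children.
print(round(odds_ratio(30, 70, 20, 80), 2))  # -> 1.71
```

An OR above 1 means the outcome is more likely in the first group, and an OR of exactly 1 means the odds are the same in both groups, which is how to read comparisons like the OR = 1.04 between ex-poor and never-poor families above.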

The findings of this study support the social causation theory of poverty and mental illness over the social selection paradigm: mean psychiatric symptom scores decreased significantly among the children of ex-poor families once they came out of poverty. Had that change not occurred, the social selection theory would have been sustained. This study should help change the attitude of the many policy makers who feel comfortable blaming poor people for their poverty and the many other problems they face every day. It is also conceivable that, just as poverty affects mental illness, mental illness can push poor people further into poverty, in a pattern resembling the vicious cycle of poverty.


Costello, E. J., Compton, S. N., Keeler, G., & Angold, A. (2003). Relationships between poverty and psychopathology: A natural experiment. Journal of the American Medical Association, 290(15).

Singleton and Straits, 2005, Experimentation (Chapter 6)

Singleton and Straits’ (2005) chapter, “Experimentation,” in the fourth edition of their book, Approaches to Social Research, is an introduction to the use of experiments as a method of data collection in the social sciences. The chapter helps readers to understand: 1) the basic logic behind experimentation, 2) the stages of implementation, and 3) potential sources of error. The authors illustrate many of the ideas with reference to real-life experiments and examples.

One of the main reasons that social science researchers engage with experimental design is that it has long been viewed as “the optimal way to test causal hypotheses” (Singleton & Straits, 2005, p. 155). Recalling that causal inference requires the three basic criteria of association, direction of influence (X→Y), and the elimination of rival explanations (Singleton & Straits, 2005, p. 156), experiments allow researchers to manipulate variables, control for rival explanations, and observe potential associations more effectively than any other research design. Applying what Singleton and Straits (2005) refer to as the “basic requirements of a true experiment” (p. 159)*, researchers can produce results with generally high levels of internal validity and the promise of potentially high levels of external validity (best verified through experimental replication).

Singleton and Straits outline the main steps of carrying out an experiment** and discuss the importance of pretesting and of creating a sense of realism, despite the fact that experiments are generally performed in controlled environments. They then move on to potential sources of bias and error that can emanate from both research subjects and the researchers themselves. The focus is on sources of error particular to experimental design, such as subjects’ willingness to “perform” and/or “look good” and therefore not react to a stimulus in a “natural” way. In two brief sections, the authors then discuss the use of experimentation in less-controlled, non-laboratory settings, including field experiments (pp. 178-181) and experimental survey research (pp. 181-183).

While this chapter provides a useful and comprehensive introduction to experimental design in laboratory settings, the laboratory is not where most social work-related research is likely to take place. One major shortcoming is that, by emphasizing the laboratory experiments one is more likely to find in psychology and the behavioural sciences, the chapter offers significantly less guidance to researchers in disciplines where this type of research is less appropriate or practical. Since experimental design in social work is more likely to appear in tests of the impact of a psycho-social intervention, or in the search for the cause of a particular social problem, it would have been useful for the authors to draw on examples of experiments that more closely align with these types of research endeavors. Examples of studies with dependent variables relating more to human welfare than to human behaviour would have been very useful, as would a discussion of longitudinal experimental research designs. While one can draw some parallels between the lessons of this chapter and experimental research in social work, discussion of research design within a natural experimental setting, such as the one reported by Costello et al. (2003), is nowhere to be found. These content gaps make it difficult to imagine that this chapter would be especially informative for social work experimental research design. It is, however, quite useful as an introduction to key concepts in laboratory experimental design, which helps social work researchers better understand the results of these types of studies while possibly informing our own designs.

*According to the authors, these criteria include: random assignment, manipulation of the independent variable, measurement of the dependent variable, at least one comparison or control group and constancy of conditions across groups (p. 159).

** The major steps include: “subject recruitment and acquisition of informed consent…introduction to the experiment…the experimental manipulation…manipulation checks…measurement of the dependent variable…debriefing” (pp. 166-170).

Environmental Predictors of Deforestation: A summary

How can the fateful difference between Easter Island and Mangareva, two Pacific island societies that differed in the extent of their own contribution to deforestation, be explained? Rolett and Diamond (2004) examine 9 environmental variables at 81 sites on 69 islands to determine, as the title indicates, the environmental predictors of pre-European deforestation on Pacific islands.

Beginning with the story of the Pacific islands, the authors provide a historical account of the period between 1200 BC and 1200 AD, when island settlers cut down trees for agriculture, firewood, and survival. What puzzled later European visitors was how much the extent of deforestation, as well as the composition of new tree growth (replacement), varied between islands. The authors therefore conducted a study to determine the extent to which 9 predictor variables accounted for the variation in 2 outcome variables, deforestation and replacement, across these 81 sites.

Four statistical analyses were used:

  1. Spearman bivariate correlations, followed by bivariate regression coefficients, between each predictor and the two outcomes: deforestation and replacement.
  2. Multiple regressions to determine the combined effects of the 9 predictors.
  3. Multivariate tree models to examine the conditionally shared impacts of multiple predictors on deforestation and replacement.
  4. Examination of residuals, looking for large discrepancies between the model and the data, and then determining whether other, unknown variables (variables not included in the analyses) could have accounted for deforestation and replacement.
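The first of these analyses, the Spearman correlation, is simply a Pearson correlation computed on ranks, which makes it robust to the skewed scales environmental variables often have. Here is a minimal pure-Python sketch with made-up island numbers, not the authors' dataset:

```python
def rank(values):
    """Average ranks (1-based), with ties sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the tied positions, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical island data: rainfall vs. forest replacement scores.
rainfall = [120, 85, 200, 60, 150]
replacement = [3.1, 2.0, 4.5, 1.2, 3.9]
print(round(spearman(rainfall, replacement), 3))  # -> 1.0 (perfectly monotonic)
```

Because only ranks matter, any monotonic relationship (like the hypothetical one above) yields rho = 1, whereas the Pearson correlation would only equal 1 if the relationship were exactly linear.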

As you’d expect, the photosynthesis model learned in grade school partially predicts the authors’ findings. Read on for an entertaining editorial by yours truly.

1) High Rain Fall->Replacement; and, Low Rainfall->Deforestation.

Why? Plants need water; also, with little water, plants make excellent dry firewood.

2) High Latitude (Cold Temperatures)->Deforestation (little plant growth); and High Latitude->Low Replacement:

Why? Gotta keep them tropical plants warm and toasty; chestnuts cannot thrive in cold weather. Only so many chestnuts can roast on an open fire high up in the mountains; probably why PA sells them at a gold-brick-per-pound price.

3a) Makatea (raised sharp coral)->Retained forests

(b)Non-Makatea terrain->Deforestation

(c) Makatea->Low Replacement

(d) Makatea->Low Deforestation.

Why? No one wants to sleep or walk on sharp objects, so little reason to cut down or plant trees in those areas.

4) Old Island Age ->Deforestation

Old island (inconsistent) Replacement.

Why? Soil composition is a complicated mess. E.g.: islands west of aerial tephras -> lower deforestation.

5) Tephras->lower replacement.

Why? Nutrient-dense soil stands the test of time; wind-blown dust helps to enrich the soil.

6) Dust fallout->low deforestation.

Dust fallout->low replacement.

Why? See #4. Dust adds to soil nutrients, which keeps trees thriving; it’s difficult to cut down strong, tall trees.

7) High Elevation->little Deforestation or Replacement.

Why? Good rain supply, high elevation rain catches nutrient-filled dust; high elevations are difficult to cross, so little motivation to cut down trees located here.

8) Large surface area->low deforestation and low replacement.

Why? Larger diversity, tricky terrain, and low chance of coastal water nearby supporting and inviting harmful humans. There’s a reason why FernGully was named The Last Rainforest.

Why the characters in “Lost” ever left the beachfront, I will never know!

9) Distance from other islands->Both High Deforestation and Replacement.

Why? Ever live in a small town? There is little to do there. Might as well cut down a lot of trees and start replanting, especially now that “Lost” is off the air.


So, why did Easter Island experience deforestation? 5) Lowest tephra; 6) lowest dust fallout; 9) second-most isolated island; 3) no makatea; 7) low elevation; 8) small surface area; and 1) little rainfall.

Essentially, unfavourable environmental conditions, rather than a surprisingly experienced group of tree cutters, led to Easter Island’s deforestation and low replacement. However, the authors also cite social pressures on Easter Island, such as clearing the way to transport its stone statues, as unexplained variables that may have contributed to the island’s downfall.

To keep your trees looking their best, water them daily and keep them close to the equator, elevated, in direct sunlight, planted among deadly spikes atop nutrient-dense, dust-fed soil. Keep them away from large bodies of water, which attract machete-wielding robbers, but not too distant from civilization. If need be, supplement with activities more interesting than tree cutting, like watching re-runs of “Lost” or “The Gilmore Girls”. Better yet, throw in a television and some rabbit ears, and tune in to Hockey Night in Canada instead of cutting down those tall, habitat-supporting, air-circulating trees.

The Moderator-Mediator Variable Distinction: A Summary

In their moderator-mediator variable article, Baron and Kenny (1986) distinguish between these two terms, moderator and mediator, in a way that helps researchers understand the ways people behave. First, at the conceptual stage, the authors define (a) the moderator as “a third variable, which partitions a focal independent variable into subgroups that establish its domains of maximal effectiveness in regard to a given dependent variable,” and (b) the mediator as the variable “which represents the generative mechanism through which the focal independent variable is able to influence the dependent variable of interest” (p. 1173). If you are confused, fear not; we will unpack these below. Second, at the strategic stage, the authors give examples where previous research has mistakenly used these terms interchangeably, noting social psychological studies on social loafing and on locus of control in academic achievement. Third, at the statistical level, the authors focus on the differences between the two terms in relation to their use in experimental research designs, using correlations as well as analyses of variance (ANOVAs). So, let’s go!

1) Let’s begin. A moderator is “a qualitative (sex, race, class) or quantitative (level of reward) variable that affects the direction and/or strength of the relation” between an IV (predictor) and a DV (criterion, the variable being predicted). In a correlation, a moderator is a third variable that affects the correlation between two variables; it can either strengthen or weaken that correlation. In an ANOVA, the moderator appears as a cross-over interaction between the IV and the factors that dictate its functioning, whether increasing or reducing its effect. The authors provide a good example, citing “Glass and Singer’s (1972) finding of an interaction of the factors stressor intensity (noise level) and controllability (periodic-aperiodic noise), … an adverse impact on task performance occurred only when the onset of the noise was aperiodic or unsignaled” (p. 1174). It is also desirable that the moderator variable be uncorrelated with both the IV (predictor) and the DV (criterion). UNLIKE the mediator, moderator variables are exogenous to criterion effects (moderators always function as IVs), whereas mediators can shift their colours like a chameleon and play the parts of both cause and effect.

2) So, we know that “the causal relation between two variables changes as a function of the moderator variable” (p. 1174). We must therefore determine the effect of the IV on the DV as a function of a third variable (say it with me: the moderating variable). Because correlational analyses are affected by changes in variances, regression models are preferred.

3) Since moderator variables are exogenous to criterion effects, bias occurs when the variability of the error in the IV differs across levels of the moderator. The paper gives three examples of how the moderator can affect the relation between the IV and the criterion (DV): 1: the effect of the IV on the DV changes linearly as the moderator changes; 2: it changes quadratically (still scratching my head on this one); and 3: it changes as a step function (more head scratching). The moderated regression takes the form Y = b0 + b1X + b2Z + b3XZ, where X is the IV, Z is the moderator, and the interaction term XZ carries the moderation effect. As the authors note, have a read through Cohen and Cohen (1983) for more clarification.

Let’s continue with the mediator. (1) A variable is a mediator when it accounts for the relation between the predictor (IV) and the criterion (DV); mediators tell us when certain effects will hold and provide clues about “how” or “why” effects occur.

(2) A mediator must meet the following conditions: (a) variations in levels of the IV significantly account for variations in the presumed mediator; (b) variations in the mediator significantly account for variations in the DV; and (c) when paths (a) and (b) are controlled, a previously significant relation between the IV and DV drops to zero. If it does not drop to zero, then multiple mediators may exist. In social psychological research, reducing the previously significant path to near zero is considered sufficient.

(3) The authors note that an ANOVA is not ideal for testing mediation; rather, it is best to first read Fiske, Kenny, and Taylor (1982) and then estimate three regression equations: (1) regressing the mediator on the IV; (2) regressing the DV on the IV; and (3) regressing the DV on both the IV and the mediator. In case (1) the IV must affect the mediator; in case (2) the IV must affect the DV; and in case (3) the mediator must affect the DV. Perfect mediation holds when the IV has no effect once the mediator is controlled for.
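The three regressions can be sketched in plain Python. The data below are synthetic and deliberately constructed so that the mediator fully carries the effect (perfect mediation); the variables are illustrative, not from the article:

```python
def slope(x, y):
    """Bivariate OLS slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def two_predictor_slopes(x, m, y):
    """OLS slopes of y regressed jointly on x and m (closed form, centered)."""
    n = len(x)
    cx = [a - sum(x) / n for a in x]
    cm = [a - sum(m) / n for a in m]
    cy = [a - sum(y) / n for a in y]
    sxx = sum(a * a for a in cx)
    smm = sum(a * a for a in cm)
    sxm = sum(a * b for a, b in zip(cx, cm))
    sxy = sum(a * b for a, b in zip(cx, cy))
    smy = sum(a * b for a, b in zip(cm, cy))
    det = sxx * smm - sxm ** 2
    bx = (sxy * smm - smy * sxm) / det  # direct effect of x, holding m constant
    bm = (smy * sxx - sxy * sxm) / det  # effect of the mediator m
    return bx, bm

# Synthetic perfect mediation: X -> M -> Y with no direct path.
x = [float(i) for i in range(10)]
m = [2 * v + (i % 3) for i, v in enumerate(x)]  # step 1: IV drives the mediator
y = [3 * v for v in m]                          # mediator fully determines the DV
print("step 1 (M on X):", round(slope(x, m), 3))
print("step 2 (Y on X):", round(slope(x, y), 3))
bx, bm = two_predictor_slopes(x, m, y)
print("step 3 (Y on X and M):", round(bx, 3), round(bm, 3))
```

In step 3 the direct effect of X falls to zero once M is in the model, which is exactly Baron and Kenny's signature of perfect mediation; with real data you would expect it to shrink toward zero rather than vanish.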

Conceptual Distinctions between Moderators and Mediators

Moderators are introduced when there is a weak correlation between a predictor and a criterion variable, for instance when one wishes to replicate findings in a new setting or with a new sample of participants. Mediators are introduced, on the other hand, when there is a strong relation between the predictor and the criterion and one wants to determine the “why” or “how” of the problem in question. What surprises me, however, is the authors’ note that “when mediation is at issue, we need to increase both the quality and quantity of the data” (p. 1179). One would presume this is also needed when a weak correlation exists and multiple moderators are required to explain the low correlation in new settings or with new participants. It is also possible to combine mediation with moderation, though the confused student would be wise to consult page 1180 and James and Brett (1984) for a more thorough overview of this process.

Pacanowsky, 1978, Please Pass the Salt

Pacanowsky’s (1978) article “Please Pass the Salt,” published in The Washington Post, is a witty and even somewhat sarcastic piece that, for reasons of personal affinity with this style of writing, I quite enjoyed (causal inference between style of writing and degree of enjoyment?! – you betcha!). The article is a satirical look at the evolution of Pacanowsky-invented research by Pacanowsky-invented researchers on the factors influencing the passage of salt (aka causality and the passage of salt). He begins with a presentation of salt passage in the literary classics, implying that the general assumption that a request for salt causes the salt to be passed goes far back in human history. The article then traces the evolution of empirical research into salt passage and the variety of research questions that have been applied to developing human knowledge about the nature and causality of salt passage. His review includes over a dozen different (fake) research endeavors, illustrating both the variety of questions that can be applied to what appears to be an exceedingly straightforward relationship and the creativity of experimental design. Among my favourites were “Festinger,” who found that if you paid people more money to pass the salt they were more likely to want to participate in your research, and “Zimbardo” (my hands-down favourite), who found that if you stated the word “assault” instead of the word “salt,” you would be more likely to be the subject of requests for clarification.

Despite, or perhaps because of (causality again!), Pacanowsky’s humour, his article effectively communicates a few underlying messages about empirical research and causal inference. He manages to provide some excellent illustrations of concepts that include intervening variables (other people at the table to pass the salt), environmental factors (e.g. the container that salt is in), sample bias (social science students relating their passage of salt to cognitive dissonance!) and the dangers of confusing correlation with causality (relating salt passage to ownership of audio-visual equipment). In fact, each of the research studies he describes offers an illustration of some important concept or lesson for conducting quality empirical research.

Perhaps most salient in Pacanowsky’s satire is that he brings to life Hume’s idea that we can only ever arrive at a causal inference, because there is no logical or empirical way of fully proving causality (as cited by Rothwell, 2014). Pacanowsky’s article allows us to see that our understanding of causal relationships will never be fully realized. He helps the reader think about the infinite multitude of ways in which a question can be asked and a research design established – each differing from the others and each contributing something new or previously unconsidered. This relates to the problem of infinite regress: we can never fully know, because we can infinitely seek the cause of the cause of the cause, and look at the same question in infinite numbers of ways, never arriving at a definitive conclusion.

We may never fully know what influences the passage of salt, why it is passed, what conditions speed up or slow down its passage, or how to best state one’s request that salt be passed. Before reading this article, I don’t suppose I ever would have considered the factors at play when I seek to have the salt passed to me. Perhaps this is the brilliance of this article, not that it makes you laugh or that it relates to Hume, but that it gets you thinking critically about causality in relation to the most mundane yet assumption-laden causal relationships.


Pacanowsky, M. (1978, April 9). ‘Please Pass the Salt’: Examining the Motivational Variables, Idiosyncratic Dynamics and Historic Precedents Associated with the Utterance. The Washington Post, p. C1.

Rothwell, D. (2014, May 22). Course lecture. Social Work 724. Lecture conducted from McGill University, Montreal, QC.
