Multilevel statistical model: What does it mean?

Is an individual independent of context, of the place or the time they live in? Social and behavioral scientists would say: obviously not. Individual behavior tends to be shaped by personal traits as well as by features of the environment the person lives in. Most of the data we collect in social and behavioral science research are hierarchical, or clustered (Goldstein, 1999), within geographical areas or periods of time. Individual characteristics vary within contexts as well as across contexts, and there is a notion that individual behavior has to be examined in the context in which it takes place (Jones and Moon, 1987). Hence, social and behavioral science researchers face a dilemma. If they analyze only individuals, as we do most of the time, they miss the context. If they analyze only aggregates, they miss individual-level variation and risk the ‘ecological fallacy’, the invalid transfer of aggregate results to individuals (Subramanian and Jones, 2012). We therefore need a tool that considers individual variation, contextual variation, and the variation in individual-contextual interaction at the same time. Multilevel modeling (MLM) is such an approach: it conducts the analysis at the individual and the contextual level simultaneously.

Recently, I participated in a training session on multilevel modeling organized by the Quebec Inter-university Center for Social Statistics (QICSS). The session was conducted by Dr. Subramanian, a Harvard professor of Public Health and Geography, who has been teaching and working on MLM for more than twenty years. He is amazing at explaining MLM lucidly and making it accessible to beginners like me. I have developed a basic understanding of the concept and application of MLM, and I would like to share some of those basics here.

‘Multilevel’ refers to the levels of analysis at which data are nested within higher-level structures. This nesting cannot be ignored. Suppose we want to study the performance of 8th grade students in the city of Montreal, and we want to find the association between students’ school performance and their parents’ level of education. We would generally follow a multi-stage stratified sampling technique to select the desired sample. Let’s say there are 20 school districts in Montreal. We select five schools from each school district and 20 students from each school by simple random sampling. We now have a three-level data structure in which students are level-1 units, schools are level-2 units, and school districts are level-3 units.

What does this dataset tell us? We might find that students’ performance varies with parents’ level of education. But isn’t it logical to think that students’ performance also varies with the quality of their schools, and that the quality of schools in turn varies across school districts? In other words, the students of a particular school might tend to be alike in performance and differ from students of other schools, and the schools in a particular district might be alike and differ in quality from the schools in other districts. In an OLS regression model, we would regress students’ performance on parents’ level of education, and we might find a pattern along with a large residual variance. OLS also assumes that the residuals are independent, which is unlikely when students from the same school resemble one another. We would dismiss the residuals as noise, as garbage we have nothing to do with. But isn’t this so-called garbage produced by our lack of a mechanism for considering how students’ performance varies with the quality of schools and across districts? Multilevel modeling gives us a tool to account for all these variations by partitioning the residual variation across levels.
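To make the three-level structure concrete, here is a minimal sketch of the sampling design described above, using only the Python standard library. The counts of 20 districts, five schools per district, and 20 students per school come from the example; the sizes of the sampling frames (`FRAME_SCHOOLS_PER_DISTRICT`, `FRAME_STUDENTS_PER_SCHOOL`) are hypothetical values I made up for illustration.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

N_DISTRICTS = 20          # level-3 units (from the example above)
SCHOOLS_PER_DISTRICT = 5  # level-2 units sampled per district
STUDENTS_PER_SCHOOL = 20  # level-1 units sampled per school

# Hypothetical sampling-frame sizes: assumptions for illustration only.
FRAME_SCHOOLS_PER_DISTRICT = 12
FRAME_STUDENTS_PER_SCHOOL = 150

sample = []  # one (district, school, student) record per sampled student
for district in range(1, N_DISTRICTS + 1):
    # Simple random sample of schools within the district.
    schools = random.sample(
        range(1, FRAME_SCHOOLS_PER_DISTRICT + 1), SCHOOLS_PER_DISTRICT
    )
    for school in schools:
        # Simple random sample of students within the school.
        students = random.sample(
            range(1, FRAME_STUDENTS_PER_SCHOOL + 1), STUDENTS_PER_SCHOOL
        )
        for student in students:
            sample.append((district, school, student))

n_districts = len({d for d, _, _ in sample})
n_schools = len({(d, s) for d, s, _ in sample})
n_students = len(sample)
print(n_districts, n_schools, n_students)  # 20 100 2000
```

Each record carries its school and district identifiers, and it is exactly this nesting information that a single-level analysis throws away.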

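Let us assume that the performance of 8th grade students is measured by a single test administered across schools, with scores ranging from 0 to 100, and that parents’ level of education is measured in years of schooling. In a single-level analysis, that is, in an OLS regression, we can use the following model to predict students’ performance:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_{0i}$$

where $y_i$ is the score of student $i$ and $x_i$ is the parents’ years of schooling.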


In this model, $\beta_0$, the intercept, gives the average score of a student whose parents have zero years of schooling, and $\beta_1$, the slope, gives the average change in score for a one-unit change in parents’ years of schooling. The intercept and the slope represent the fixed part of the regression model and estimate the average relationship between score and years of schooling. The residuals, $\varepsilon_{0i}$, represent the random part of the model: each student’s departure from the fixed regression line. They are assumed to have mean 0 and variance $\sigma^2_{\varepsilon 0}$. If we extend this analysis to a two-level model, where students are level-1 units and schools are level-2 units, we use the following model:

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + (u_{0j} + u_{1j} x_{ij} + \varepsilon_{0ij})$$


This is the combined model, derived by substituting the macro model into the micro model; both the intercept and the slope are allowed to vary across schools.

The micro model is:

$$y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \varepsilon_{0ij}$$


and the macro model is:

$$\beta_{0j} = \beta_0 + u_{0j}, \qquad \beta_{1j} = \beta_1 + u_{1j}$$


The fixed part of this model is $\beta_0 + \beta_1 x_{ij}$, and the random part is $(u_{0j} + u_{1j} x_{ij} + \varepsilon_{0ij})$. At level 2 we now get two additional residuals in the random part of the model, $u_{0j}$ and $u_{1j}$, which capture the variability across schools of the intercept, $\beta_0$, and the slope, $\beta_1$, respectively. This ability to analyze variance at different levels distinguishes MLM from standard linear regression models. The means of these new residuals are again 0, their variances are $\sigma^2_{u0}$ and $\sigma^2_{u1}$, and their covariance is $\sigma_{u0u1}$. We also have the level-1 residuals, $\varepsilon_{0ij}$, with mean 0 and variance $\sigma^2_{\varepsilon 0}$.
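One consequence of letting the slope vary is worth writing out. Taking the random part of the combined model to be $u_{0j} + u_{1j} x_{ij} + \varepsilon_{0ij}$, and assuming (as is standard, though this is my addition) that the level-1 residuals are independent of the level-2 residuals, the between-school variance is no longer a single number but a quadratic function of parents’ years of schooling:

$$\operatorname{Var}(u_{0j} + u_{1j} x_{ij}) = \sigma^2_{u0} + 2\,\sigma_{u0u1}\, x_{ij} + \sigma^2_{u1}\, x_{ij}^2$$

and the total variance of a score around the fixed part is

$$\operatorname{Var}(u_{0j} + u_{1j} x_{ij} + \varepsilon_{0ij}) = \sigma^2_{u0} + 2\,\sigma_{u0u1}\, x_{ij} + \sigma^2_{u1}\, x_{ij}^2 + \sigma^2_{\varepsilon 0}$$

So schools may differ more from one another at some levels of parental education than at others, and the covariance $\sigma_{u0u1}$ governs whether those differences grow or shrink as $x_{ij}$ increases.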

In this model, $i$ indexes the level-1 units, the 8th grade students, and $j$ indexes the level-2 units, the schools. Now, how do all these new parameters work in this model? Here the intercept, $\beta_0$, gives the average score, across schools, of a student whose parents have zero years of schooling, and $u_{0j}$ gives the differential of school $j$ from $\beta_0$. The slope, $\beta_1$, gives the average change in score for a one-unit change in parents’ years of schooling, and $u_{1j}$ gives the differential of school $j$ from $\beta_1$. The level-1 residuals, $\varepsilon_{0ij}$, give the individual differences of students from their own school’s regression line.
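The roles of these parameters are perhaps easiest to see by generating data from the model. The sketch below draws a school-specific intercept and slope for each school and then builds each score twice: once from the micro model and once from the combined model split into its fixed and random parts. Every numeric value (`BETA0`, `BETA1`, the standard deviations, the range of years of schooling) is an assumption chosen for illustration, not an estimate from real data. The two school-level residuals are drawn independently, so their covariance is zero here, and scores are not clipped to the 0-100 range.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Illustrative assumptions, not estimates from any dataset.
BETA0, BETA1 = 40.0, 2.0             # fixed part: average intercept and slope
SD_U0, SD_U1, SD_E0 = 5.0, 0.5, 8.0  # SDs of u0j, u1j and the level-1 residual
N_SCHOOLS, N_STUDENTS = 100, 20      # level-2 units, level-1 units per school

rows = []      # (school j, student i, x_ij, y_ij)
max_gap = 0.0  # largest difference between the two ways of writing y_ij

for j in range(N_SCHOOLS):
    # Level-2 residuals: school j's departure from the average line.
    u0j = random.gauss(0.0, SD_U0)
    u1j = random.gauss(0.0, SD_U1)  # independent of u0j in this sketch

    # Macro model: school-specific intercept and slope.
    beta0j = BETA0 + u0j
    beta1j = BETA1 + u1j

    for i in range(N_STUDENTS):
        x = random.randint(6, 20)      # parents' years of schooling (assumed range)
        e = random.gauss(0.0, SD_E0)   # level-1 residual for student i

        # Micro model: the student's score on the school's own line.
        y_micro = beta0j + beta1j * x + e

        # Combined model: fixed part plus random part.
        fixed_part = BETA0 + BETA1 * x
        random_part = u0j + u1j * x + e
        y_combined = fixed_part + random_part

        max_gap = max(max_gap, abs(y_micro - y_combined))
        rows.append((j, i, x, y_micro))

print(len(rows))       # 2000
print(max_gap < 1e-9)  # True: both forms give the same score up to rounding
```

Substituting the macro model into the micro model only rearranges terms, which is why both versions of the score agree up to floating-point rounding.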

MLM lets us partition variation across the different levels and provides a new statistic known as the intra-class correlation, intra-unit correlation, or variance partitioning coefficient, denoted by the Greek letter rho, $\rho$, which takes values between 0 and 1. If $\rho$ approaches 1, it implies an ecological model: students within a school are highly similar to each other in their scores, and nearly all of the variation lies between schools. If $\rho$ approaches 0, it implies independence between students within a school: the variation in scores lies mainly at the student level. This gives us an impression of whether we should use a multilevel model or stay with a standard linear regression model.
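For the simplest case, a two-level model with a random intercept only (no random slope), the coefficient is usually written as the share of the total variance that sits at the school level:

$$\rho = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{\varepsilon 0}}$$

A minimal sketch of that calculation follows. The function name and the variance values passed to it are hypothetical; in practice the two variances would come from a fitted multilevel model.

```python
def intraclass_correlation(var_between, var_within):
    """Share of total variance at the school level (random-intercept model)."""
    if var_between < 0 or var_within < 0:
        raise ValueError("variances must be non-negative")
    total = var_between + var_within
    if total == 0:
        raise ValueError("total variance is zero, so rho is undefined")
    return var_between / total


# Hypothetical variance estimates: 25 between schools, 75 between students.
rho = intraclass_correlation(25.0, 75.0)
print(rho)  # 0.25
```

With these made-up numbers, a quarter of the variation in scores lies between schools, which would be hard to justify ignoring. When the slope also varies across schools, the school-level variance depends on parents’ years of schooling, so the coefficient does too and has to be evaluated at chosen values of the predictor.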

I am looking forward to using MLM to analyze the Programme for International Student Assessment (PISA) 2012 dataset on the financial literacy of 15-year-old students in 18 countries and economies. This secondary dataset is available for comparative analysis of financial literacy across countries. It is now time to make our observational data more meaningful.


Goldstein, H. (1999). Multilevel statistical models. London: Institute of Education, Multilevel Models Project.

Jones, K., and Moon, G. (1987). Health, disease and society. London: Routledge.

Subramanian, S. V., and Jones, K. (2012). Multilevel statistical models: Concept and applications. Massachusetts: Harvard University.

