Summary of Remler & Van Ryzin (2011) Ch. 6: Secondary Data

Secondary data are the most commonly used form of quantitative data for statistical analysis in social and policy research, and the majority of published quantitative studies analyze this type of data. This makes sense: the data already exist, using them is much less time-consuming and costly than collecting original primary data, and they are widely available through published data tables from various national (and sometimes international) surveys and databases. It is important to note that published datasets are often aggregated and thus not as detailed as the original datasets. However, according to Remler and Van Ryzin (2011), the aggregated (and abbreviated) data are much more manageable and are still useful for addressing certain research questions, such as trends over time (assuming that the dataset is longitudinal in nature). An example would be examining trends in disciplinary climate at the classroom or school level over the course of several school years. The authors caution that researchers should familiarize themselves with a dataset before analyzing it by reading any notes and codebooks carefully, and should be aware of any updates made to the dataset after obtaining it.

Secondary data can also include administrative data, which are often collected by government agencies, non-governmental organizations and private firms for the purposes of planning, managing and monitoring programs and service delivery performance. However, these data are often not readily available for public use because of substantial privacy issues, and they have to be de-identified by the agency before being released to the researcher. The data also have to be cleaned, coded and reformatted by the researcher once obtained, since the databases in which they are stored (Management Information Systems) are formatted for record keeping and case management rather than for statistical analysis. Some administrative data are still recorded on paper even today (e.g., court proceedings); this poses an even greater challenge, since the data must first be converted into a quantitative format and are very difficult to adapt for statistical analysis. Despite these potential barriers, Remler and Van Ryzin emphasize that the analysis of administrative data is crucial both for researchers, to inform policy decisions and reform, and for practitioners, to monitor client trends and outcomes related to the interventions they deliver.
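To make the cleaning and coding step more concrete, here is a minimal sketch in Python of what it might look like for a small extract from a case management system. Everything in it is an assumption made for illustration and does not come from the chapter: the field names, the two date formats, and the 0/1 coding of case status are all hypothetical.

```python
from datetime import datetime

# Hypothetical rows as they might be exported from a case management system.
raw_records = [
    {"case_id": "A-001", "opened": "2010-03-15", "status": "Closed ", "visits": "4"},
    {"case_id": "A-002", "opened": "15/04/2010", "status": "open", "visits": ""},
    {"case_id": "A-003", "opened": "2010-05-02", "status": "CLOSED", "visits": "7"},
]

STATUS_CODES = {"open": 0, "closed": 1}   # assumed coding scheme
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")   # formats assumed to occur in the export


def parse_date(text):
    """Try each assumed date format; return None if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def clean_record(row):
    """Return an analysis-ready copy of one raw record."""
    visits = row["visits"].strip()
    return {
        "case_id": row["case_id"],
        "opened": parse_date(row["opened"].strip()),
        # Normalize spacing and capitalization before applying the codes.
        "status": STATUS_CODES.get(row["status"].strip().lower()),
        # A blank count is kept as missing (None) rather than treated as zero.
        "visits": int(visits) if visits else None,
    }


cleaned = [clean_record(row) for row in raw_records]
```

Keeping blank values as missing rather than zero is a deliberate choice in this sketch; a real project would follow whatever the dataset's codebook specifies.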

The authors distinguish between the design of surveys and other data collection tools, on the one hand, and research design, on the other: the latter concerns what you actually do (analytically) with the data. Confusing the two is a common mistake when researchers outline their research design. They also outline the various forms of data that can be obtained through secondary datasets, which vary in their level of aggregation (individual/micro or group level/macro) and time dimension (snapshot or longitudinal). Aggregate (macro-level) data can contain information about groups such as households, schools or classrooms, or about geographical areas such as neighbourhoods, regions, provinces and even countries. In terms of time dimension, data can be a snapshot of one point in time (also known as cross-sectional data) or longitudinal, gathered over defined periods of time. Longitudinal data can be collected for various purposes, including pre- and post-treatment comparisons (paired-sample data), repeating measures on the same people over time (panel data for a repeated measures study), examining outcomes that occur only after a certain amount of time has passed (panel data for a cohort study), or repeating the same measures on new cohorts (pooled cross-sectional data). Table 6.1 outlines the various types of quantitative data in an easily understandable format.
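The difference between micro and macro data, and between snapshot and longitudinal data, can be illustrated with a short Python sketch. The records, school names and "climate" scores below are invented for illustration only; they are not taken from the chapter or from any real survey.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical micro-level records: one row per student per school year.
students = [
    {"student": "s1", "school": "North", "year": 2008, "climate": 3.0},
    {"student": "s2", "school": "North", "year": 2008, "climate": 4.0},
    {"student": "s1", "school": "North", "year": 2009, "climate": 4.0},
    {"student": "s2", "school": "North", "year": 2009, "climate": 5.0},
    {"student": "s3", "school": "South", "year": 2008, "climate": 2.0},
    {"student": "s3", "school": "South", "year": 2009, "climate": 3.0},
]


def aggregate(rows, keys, value):
    """Average `value` within each group defined by the fields in `keys`."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[value])
    return {group: mean(values) for group, values in groups.items()}


# Macro-level (aggregate) data: one mean score per school per year.
school_year_means = aggregate(students, ("school", "year"), "climate")

# Cross-sectional (snapshot) data: every student at a single point in time.
cross_section_2008 = [row for row in students if row["year"] == 2008]

# Panel data: the same student followed across years.
panel_s1 = sorted(
    (row["year"], row["climate"]) for row in students if row["student"] == "s1"
)
```

Note that once the data are aggregated to the school level, individual students can no longer be followed over time, which is the loss of detail that comes with published aggregate tables.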

The authors also mention that linking various types of secondary data can provide a broader and more distinctive picture of the issue or phenomenon being observed. For instance, linking Geographic Information System (GIS) data with socio-economic status (SES) data can allow researchers to create SES maps by province, region and even neighbourhood. This can be helpful when analysing region-specific social issues. Linking quantitative and qualitative data can also be useful; for instance, collecting additional focus group information on the reasons underlying high school dropout rates could help explain the phenomenon more clearly.
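Linking two datasets usually comes down to matching records on a shared identifier. The sketch below shows one simple way to do this in Python, assuming that both sources carry the same region code. The region codes, coordinates and income figures are hypothetical, and the `link` helper is written for this example rather than taken from any GIS package.

```python
# Hypothetical geographic records, keyed by a shared region code.
regions = [
    {"region_id": "R1", "name": "Region A", "centroid": (45.5, -73.6)},
    {"region_id": "R2", "name": "Region B", "centroid": (46.8, -71.2)},
    {"region_id": "R3", "name": "Region C", "centroid": (45.4, -75.7)},
]

# Hypothetical SES records for some, but not all, of those regions.
ses = [
    {"region_id": "R1", "median_income": 41000},
    {"region_id": "R3", "median_income": 52000},
]


def link(left, right, key):
    """Match each row in `left` to the row in `right` with the same key.

    Returns the combined rows plus the keys that found no match, so that
    unlinked regions are reported instead of silently dropped.
    """
    index = {row[key]: row for row in right}
    linked, unmatched = [], []
    for row in left:
        match = index.get(row[key])
        if match is None:
            unmatched.append(row[key])
        else:
            linked.append({**row, **match})
    return linked, unmatched


linked, unmatched = link(regions, ses, "region_id")
```

Here `linked` holds the regions that can be placed on an SES map, while `unmatched` lists the regions for which no SES data were available, a common situation when public data cannot be narrowed down to small geographic units.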

However, there are limitations to using secondary data for statistical analysis, and these should be kept in mind when deciding on a research design. For instance, the availability of secondary data can distort the social work research field, since researchers often have to settle for what they can get or have access to, as opposed to going directly into the field and obtaining the data they need. Public secondary data can also be outdated if not collected longitudinally, can lack certain vital information and variables, and often cannot be narrowed down to smaller units of analysis such as cities, neighbourhoods or individuals. Privacy issues often prevent the release of pertinent data such as postal codes, which could assist with creating SES indicators.

It is important to note that Remler and Van Ryzin’s (2011) book and the chapters contained therein are based on American examples; we should therefore familiarize ourselves with the public secondary databases and datasets available in Canada, such as census data and the National Longitudinal Survey on Children and Youth (NLSCY) via Statistics Canada, or those made available to university students via Research Data Centres (RDC) and various university research centres, such as the Canadian Incidence Study of Reported Child Maltreatment (CIS) through the Centre for Research on Children and Families (CRCF) at McGill University.

Here are some examples of useful links to available public secondary databases and administrative data:






