Summer Camp for Social Scientists! Software Edition

The subject of statistical software fluency is an oft-discussed topic around the brown bag table. The dominance of SPSS in social work has been discussed many times, sometimes controversially. Here at ICPSR I’ve heard a few things about software, my favourite being that number eight of the fabled ten commandments of the summer program was something to the effect of ‘don’t be a software snob.’ Also amusing is that this commandment has been wildly violated by many here this summer.

It’s true that all stats programs have their strengths and weaknesses. However, it must be said that some programs have shorter ranges and shallower depths than others. For example, SPSS and Stata often fall short in terms of graphical analysis and presentation. Still, each program has its strengths. In my scaling class this week, it was mentioned that the only reliable routine for unidimensional unfolding analysis is in SPSS. This was the first time SPSS was cited as having capabilities superior to those of other programs. From what I understand, only one of the courses here uses SAS, and another uses the HLM program, while the rest rely primarily on Stata and R.

In my work here, I have mainly been focusing on becoming fluent in R, although I have learned a bit of JMP and expanded my Stata skills. The R statistical computing environment is pretty incredible. Its breadth and depth are currently unparalleled. Many statistical techniques are simply not available in any other program. Even though it has a reputation for being difficult to use, I find the logic of the R language to be extremely straightforward. It does take longer for the non-programmer to learn than other stats syntax. But the way R forces you to work with the moving parts of model estimation encourages a greater understanding of the math and logic underlying the statistics, a benefit that far outweighs any labour costs. R has its more direct downsides of course, like any program. Tonight I ran into a parametric distribution that R did not support and I had to switch back to Stata. Also, I imagine that coding one’s work only in R poses challenges in terms of collaboration, particularly in fields that are dominated by other statistical programs.

This afternoon, I caught one of my professors on the way out of class, and he stressed that one should ‘become bilingual.’ Indeed, learning programming languages is just like learning regular languages: more can very rarely be a bad thing.

Summer Camp for Social Scientists! LaTeX Edition

I’m spending the next eight weeks in Ann Arbor, which, aside from its quaint and shady tree-filled streets, is well known as the home of the Michigan Wolverines. What’s perhaps less known (outside academic circles) is that the University of Michigan is home to one of the most advanced social science research institutes in the world, the Institute for Social Research (ISR). The ISR consists of five separate and independent research centres, one of which is the Inter-university Consortium for Political and Social Research (ICPSR), of which McGill University is a member. The ICPSR manages and curates secondary datasets and conducts its own research on how best to curate and manage data. The archive at ICPSR is not only impressively large, but unique and easily accessible to both novice researchers and advanced scholars. Codebooks and data files are all indexed and linked to each other. Previous articles written on each dataset are also indexed and linked. The whole layout of the site makes working with secondary data less head-numbingly perplexing. So, while I was browsing their site back in the fall, I happened upon the ICPSR Summer Program in Methods of Quantitative Research. The courses at the summer program are diverse and aimed at scholars of all levels. For example, they offer courses ranging from basic computing to advanced Bayesian methods. Stumbling upon this social science training program made me feel as if I had found a secret garden. Today at orientation my feelings were confirmed when the director of the program, Professor Bill Jacoby, welcomed us to not only the world’s foremost quantitative methods training program but also a ‘summer camp for social scientists’. In short, I feel absolutely privileged to be here*, and intend to share some of the resources and experiences I gather here on the blog.

Tonight, Professor Dave Armstrong gave us a quick tour of LaTeX. I’ve used LaTeX before, though admittedly “used” is a liberal term here. However, the lecture was accessible and packed with coding tips and tricks to make LaTeX more attractive to the less than savvy. LaTeX is a markup language, similar to HTML. It is not a WYSIWYG (what you see is what you get) word processor like Word, OpenOffice, or Pages. So if you’re afraid of programming, then LaTeX is not for you. But if you don’t mind a bit of front-end effort, LaTeX pays off (see this great summary and guide here). First, it looks good. I mean really good. It seems as though every econometrics paper you’ve read since 2000 has been typeset with it. Which brings me to my second point: using it will get you lots and lots of nerd-cred. But seriously, LaTeX does offer some compelling and concrete benefits over traditional WYSIWYG programs, such as:

  • It’s free & open-source and there’s tons of documentation online regarding troubleshooting
  • You can add on programs such as Sweave that allow you to conduct your stats analysis in R within a LaTeX text editor
  • It has easy separation of the content from the format of the document (comes in handy when writing a long document with sections and sub-sections)
  • It has wonderful presentation of mathematical functions and equations (and is miles easier to incorporate into your document than doing so in Word)
  • It beautifully typesets documents (it automatically adjusts the words in ways the WYSIWYG programs do not)
  • It has easy integration of figures and graphs (again, very handy for long documents)
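
As a rough illustration of a few of the points above (content kept separate from formatting, equations, and figures), a minimal document might look something like the sketch below. The title, author, labels, and regression equation are placeholders chosen for the example, and `example-image` is a stand-in graphic that assumes the `mwe` package is installed, as it is in most full TeX distributions.

```latex
% Minimal sketch; the title, author, labels, and image name are placeholders.
\documentclass{article}
\usepackage{amsmath}   % extra math environments
\usepackage{graphicx}  % provides \includegraphics

\title{A Placeholder Title}
\author{A. Student}
\date{}  % leave the date blank

\begin{document}
\maketitle

\section{Model}  % numbering and heading style are handled by the class
A simple linear regression can be written as
\begin{equation}
  y_i = \beta_0 + \beta_1 x_i + \varepsilon_i .
  \label{eq:ols}
\end{equation}
Equation~\ref{eq:ols} is numbered and cross-referenced automatically.

\subsection{Results}
\begin{figure}[ht]
  \centering
  % example-image is a stand-in graphic from the mwe package; swap in your own file.
  \includegraphics[width=0.6\textwidth]{example-image}
  \caption{A placeholder figure.}
  \label{fig:placeholder}
\end{figure}
Figure~\ref{fig:placeholder} floats to a sensible position on the page.

\end{document}
```

Compiling twice (for instance with pdflatex) lets the cross-references resolve. Changing the document class or adding a style package restyles every heading at once, without touching the text itself.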

For me, as someone who enjoys light programming, the main barriers to using LaTeX have been the lack of easy integration with Zotero and my inability to figure out how to make it format documents according to the APA 6th Edition Publication Manual. However, these barriers were at least half-way solved this evening. Zotero exports to BibTeX, which is the bibliographic tool compatible with LaTeX. Although exporting and creating a BibTeX file adds another step to the bibliographic formatting process for Zotero users, it’s not an insurmountable one. I also found, although have not tried, this APA 6th edition template and guide for LaTeX. After the lecture this evening, many of the things that I struggled with when I “used” LaTeX before were clarified. As is the case with learning any new program or programming language, there are many steps and stumbles along the way. I’m sure there’s a lot more to learn about LaTeX than can be explained in one lecture. But tonight I look forward to starting my journey by installing and playing around with TeXStudio.
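
For what it’s worth, the Zotero-to-BibTeX step might be sketched roughly as follows. Everything named here is an assumption made for illustration: `references.bib` is a hypothetical file name for the Zotero export, `placeholder2013` is a made-up citation key, and the `apacite` package is just one option for APA-style citations, not necessarily the template mentioned above.

```latex
% Sketch only. Assumes references.bib (exported from Zotero) sits in the same
% folder and contains an entry along these lines (placeholder values throughout):
%
%   @article{placeholder2013,
%     author  = {Author, A.},
%     title   = {A Placeholder Title},
%     journal = {Journal of Placeholders},
%     year    = {2013}
%   }
\documentclass{article}
\usepackage{apacite}  % assumption: one package offering APA-style citations

\begin{document}
A claim that needs a source \cite{placeholder2013}.

\bibliographystyle{apacite}
\bibliography{references}  % reads references.bib, the hypothetical Zotero export
\end{document}
```

The usual compile sequence is LaTeX, then BibTeX, then LaTeX twice more, so that the citation and the reference list both resolve.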


*I want to take a second to gratefully acknowledge the generous support of my supervisor David Rothwell and the CRCF Travel Fund, without whose support I would not be able to take full advantage of this opportunity.

Oops – another data entry mistake

Yes, here is another data entry mistake, this time involving a team of researchers led by McGill’s Anne Crocker who were examining not criminally responsible (NCR) cases and histories of prior NCR findings. The story has important implications for the Conservative government’s Bill C-54.

A good case example for teaching the relationship between policy and research.

The original report said 38.1 per cent of sex offenders found not criminally responsible and accused of a sex offence had at least one prior NCR finding; that number was changed in the March report to 9.5 per cent.
