Fieldwork Lab, 4/15 — Will Johnston

This week, Will Johnston will present a talk titled “Verb serialization as event-building: Evidence from Hmong”. (This is a 20-minute practice talk for MOTH; abstract follows.) Fieldwork Lab meets on Thursdays; because of the unusual class schedule, it will exceptionally begin at 4:15 this week.

Abstract: I examine two common and highly productive types of serial verb construction in Hmong (Hmong-Mien). These are the so-called ‘Attainment’ SVCs, which express telicity, and ‘Cause-Effect’ SVCs, which express direct causation. I argue that both are reflexes of the same underlying system: both are formed by merging multiple verbal roots within the event-building portion of the verbal projection. I then discuss the extent to which this treatment might apply to other types of SVCs in Hmong.

Fieldwork Lab, 4/8 — Hermann Keupdjio

This Thursday, during Fieldwork Lab, Hermann Keupdjio will talk to us about doing a virtual fieldtrip. Contact Carol-Rose Little if you would like to join.

Doing a virtual “fieldtrip”:

Collecting data from understudied languages is a vital enterprise that enriches our knowledge of the nature of human language. Accomplishing this with in-person visits is invaluable. However, in addition to the current pandemic situation, there is an urgent need for more data and only a limited number of linguists with the training and resources to conduct fieldwork. In this situation, online experiments provide a powerful supplementary tool for linguists and fieldworkers studying underdocumented languages. Specifically, rather than supplanting fieldwork, online experiments can allow for an expansion of fieldwork with pre-visit pilots and follow-up experiments. More importantly, they are a helpful tool in creating and enhancing global collaborations and capacity building between field linguists, members of understudied language communities, and linguists without field training.

MCQLL Meeting, 4/8 — Michaela Socolof

This week’s MCQLL meeting, on Thursday, April 8 at 1:30-2:30pm, will feature a talk from Michaela Socolof, a third-year PhD student in the Linguistics department at McGill.

Abstract: I will be presenting an overview of issues relating to the syntax of relative clause constructions across languages. The purpose of this talk is to explore possibilities for computational projects in this area.

If you would like to attend the talk but have not yet signed up for the MCQLL meetings this semester, please send an email to mcqllmeetings@gmail.com.

MCQLL Meeting, 4/1 — Maya Watt

This week’s MCQLL meeting, on Thursday, April 1, 1:30-2:30pm, will feature a talk from Maya Watt. Bio and talk abstract are below.

If you would like to attend the talk but have not yet signed up for the MCQLL meetings this semester, please send an email to mcqllmeetings@gmail.com.

Bio: Maya Watt is a U3 undergraduate student in Honours Linguistics with a minor in Computer Science.

Abstract: Theories of inflectional morphology differ in terms of how they treat semi-productive inflection types, that is, inflections that apply to multiple words but are not completely productive (e.g. grow-grew, know-knew, but not clow-clowed). How such semi-regular classes generalize may help distinguish theories, but little work has explored this question due to the difficulty of finding overgeneralized uses of these inflectional classes in naturalistic corpora. We address this issue by conducting a prompted lexical decision study on English past tenses. Participants were shown a regular or irregular verb in the infinitive form (to snow, to grow) and then presented with either a correct inflection (snowed, grew) or an overgeneralization (snew, growed) and asked to indicate whether it is the correct past tense form. We compare how various overgeneralized types (snow-snew, sneeze-snoze) differ in terms of reaction times and accuracy rates, finding differences between classes which may inform future theoretical comparisons.

Fieldwork Lab Meeting, 4/1 — Eszter Ótott-Kovács

This week in Fieldwork Lab, Eszter Ótott-Kovács, PhD candidate at Cornell University, will be presenting her work “Genitive-Nominative Case Alternation in the Nominal Domain in Kazakh”. Fieldwork Lab meets Thursday at 4pm. Contact Carol-Rose Little if you would like to attend.


It is well known that Turkic languages have Differential Object Marking (DOM), where the specific (presuppositional) direct object is marked with the accusative, while the non-specific object is unmarked for case/nominative (Enç 1991, Diesing 1992, Kelepir 2001). Relying on (mostly) Turkish data, it has been assumed that specificity drives the genitive-nominative case “alternation” in a similar manner to DOM (Kornfilt 2009, a.o.).

The talk explores the genitive-nominative “alternation” in Kazakh (Turkic), found on (1) the possessor in possessive constructions and on the subjects of (2) nominalized argument clauses and (3) relative clauses, based on novel data elicited by the author. I show that, in contrast to DOM, genitive-nominative alternation is not solely driven by specificity in this language. The genitive-nominative alternation on the possessor and the relative clause subject follows the pattern described for Turkish in terms of specificity. However, the genitive-nominative alternation on the argument clause subject is determined by the anaphoricity of the subject DP: genitive is marked on anaphoric DP subjects, and nominative is used otherwise (in the case of unique definite or indefinite subjects).

MCQLL Meeting, 3/25 — Emily Goodwin

This week’s MCQLL meeting, on Thursday, March 25, 1:30-2:30pm, will feature a talk from Emily Goodwin. Talk abstract is below.

If you would like to attend the talk but have not yet signed up for the MCQLL meetings this semester, please send an email to mcqllmeetings@gmail.com.

Abstract: Recent attention in neural natural language understanding models has focused on generalization that is compositional (the meanings of larger expressions are a function of the meanings of smaller expressions) and systematic (individual words mean the same thing when put in novel combinations). Datasets for compositional and systematic generalization often focus on testing classes of syntactic constructions (testing only on strings of a certain length or longer, or novel combinations of particular predicates). In contrast, the Compositional Freebase Questions (CFQ) training and test sets are automatically sampled. To quantify the compositional challenge of a test set relative to its training set, the CFQ authors measure the divergence between the distribution of syntactic compounds in test and train. Training and test splits with maximum compound divergence (MCD) are highly challenging for semantic parsers, but (unlike other datasets designed to test compositional generalization) the splits do not specifically hold out human-recognizable classes of syntactic constructions from the training set. In this talk I will present preliminary results of a syntactic analysis of the MCD splits released in the CFQ dataset, and explore whether model failures on MCD splits can be explained in terms of phenomena familiar to syntactic theory.

MCQLL Lab Meeting, 3/18 — Ben LeBrun

This week’s MCQLL meeting, on Thursday, March 18, 1:30-2:30pm, will feature a talk from Ben LeBrun. Talk abstract is below.

If you would like to attend the talk but have not yet signed up for the MCQLL meetings this semester, please send an email to mcqllmeetings@gmail.com.

Abstract: The use of pre-trained Transformer language models (TLMs) has led to significant advances in the field of natural language processing. This success has typically been measured by quantifying model performance on downstream tasks, or through their ability to predict words in large samples of text. However, these benchmarks are biased in favour of frequent natural language constructions, measuring performance on common, recurring patterns in the data. The behaviour of TLMs on the large set of complex and infrequent linguistic constructions is, in comparison, understudied. In this talk, I will present preliminary results exploring GPT-2’s ability to reproduce this long tail of syntactic constructions, and how this ability is modulated by fine-tuning.

MCQLL Lab Meeting, 3/11 — Eva Portelance

This week’s MCQLL meeting (Thursday, March 11th, 1:30-2:30pm) will feature a talk from Eva Portelance. Abstract and bio are below.

If you would like to join the meeting but have not yet registered for this semester's MCQLL meetings, please send an email to mcqllmeetings@gmail.com.

Bio: Eva is currently a Ph.D. candidate at Stanford University in Linguistics, working with Mike Frank and Dan Jurafsky. She completed a B.A. Honours in Linguistics and Computer Science at McGill University in 2017. She is interested in linguistic structure and language learning both in humans and machines. This work was started during an internship at Microsoft Research Montreal.

Abstract: Learning Strategies for the Emergence of Language in Iterated Learning

In emergent communication studies, agents play communication games in order to develop a set of linguistic conventions referred to as the emergent language. Here, we compare the effects of a variety of learning functions and play phases on the efficiency and effectiveness of emergent language learning. We do so both within a single generation of agents and across generations in an iterated learning setting. We find that allowing agents to engage in forms of self-play ultimately leads to more effective communication. In the iterated learning setting we compare different approaches to intergenerational learning. We find that self-play used jointly with imitation can also lead to effective communication in this setting. Additionally, we find that encouraging agents to successfully communicate with previous generations rather than to successfully imitate them can lead to both effective language and efficient learning. Finally, we introduce a new dataset and a new agent architecture with split visual perception and representation modules in order to conduct our experiments.

Fieldwork Lab Meeting, 3/11 — Jaime Pérez González

This week during Fieldwork Lab, Jaime Pérez González, a PhD candidate in the Department of Linguistics at the University of Texas at Austin, will present “Grammatical Aspect in Mocho’ (Mayan)”. We meet at 4pm on Thursday. Contact Carol-Rose Little if you would like to join.


This talk addresses in detail the aspectual system in Mocho’, a highly endangered Mayan language. Its complexity has led to different analyses by Kaufman (1967) and Palosaari (2011). The outcome of this research is an alternative analysis to those proposed in previous studies. I show that this language has a split aspectual system based on transitivity and partially on person. Mocho’ exhibits two sub-paradigms of aspect based on the type of verb that heads the clause. On the one hand, when the head of the predicate corresponds to an active transitive verb, or when the head of the predicate is an intransitive underived verb that indicates its subject with the pronominal markers from Set A, the language will display three aspectual distinctions that contrast with one another in their temporal interpretations. On the other hand, inverse verbs and any intransitivized verbs with a suffix -(v)vn that take Set C to indicate their subject will have a binary opposition. On top of this, the morphological ergative split alignment in Mocho’ leads to an aspectual marker distinction between Speech Act Participants (SAPs) and third person. Drawing on corpus data and elicitation sessions, I untangle this complex aspectual system here. Previous proposals have not been tested with corpus data, which can serve as a test-bed for the linguistic analysis proposed as well as for the intuitions on which the proposal is based. Thus, I will show that grammatical aspect (viewpoint aspect) in Mocho’ cannot be understood solely by eliciting data; rather, a look at corpus data can tell us more about the nature of the language.

Fieldwork Lab Meeting, 2/25 — Victoria Chen

This week during our fieldwork lab meeting, Victoria Chen (Assistant Professor in Syntax at Victoria University of Wellington, New Zealand) will present “When Austronesian-type voice meets Indo-European-type voice: Insights from Puyuma”. See attached abstract! Contact Carol-Rose if you would like to join the fieldwork lab. We meet from 4-5pm on Thursdays.

MCQLL Meeting, 2/25 — Richard Futrell

This week’s MCQLL meeting, taking place Thursday, Feb 25th, 1:30-2:30pm, will feature a talk entitled “Information-theoretic models of natural language” by Professor Richard Futrell. Abstract and bio are below. If you would like to join the meeting and have not yet registered for this semester’s MCQLL meetings, please send an email to mcqllmeetings@gmail.com requesting the link.

Abstract: I claim that human languages can be modeled as information-theoretic codes, that is, systems that maximize information transfer under certain constraints. I argue that the relevant constraints for human language are those involving the cognitive resources used during language production and comprehension. Viewing human language in this way, it is possible to derive and test new quantitative predictions about the statistical, syntactic, and morphemic structure of human languages.

I start by reviewing some of the many ways that natural languages differ from optimal codes as studied in information theory. I argue that one distinguishing characteristic of human languages, as opposed to other natural and artificial codes, is a property I call “information locality”: information about particular aspects of meaning is localized in time within a linguistic utterance. I give evidence for information locality at multiple levels of linguistic structure, including the structure of words and the order of words in sentences.

Next, I state a theorem showing that information locality is a property of any communication system where the encoder and/or decoder are operating incrementally under memory constraints. The theorem yields a new, fully formal, and quantifiable definition of information locality, which leads to new predictions about word order and the structure of words across languages. I test these predictions in broad corpus studies of word order in over 50 languages, and in case studies of the order of morphemes within words in two languages.

Bio: Richard Futrell is an Assistant Professor of Language Science at the University of California, Irvine. His research applies information theory to better understand human language and how humans and machines can learn and process it.

Fieldwork Lab Meeting, 2/4 — Jorge Emilio Rosés Labrada and Erin Hashimoto

At this week’s Fieldwork Lab meeting, Jorge Emilio Rosés Labrada and Erin Hashimoto will give a presentation entitled “Using Legacy Text Collections for Student Training and Linguistic Research”. Details follow. Fieldwork Lab meets on Thursdays at 4:00pm. Contact Carol-Rose Little if you would like to attend.


Using Legacy Text Collections for Student Training and Linguistic Research


Jorge Emilio Rosés Labrada
Assistant Professor, Indigenous Languages Sustainability
University of Alberta

Erin Hashimoto
MA Student
University of Victoria


In language documentation, the “Boasian trilogy”, which has come to be seen as the gold standard, refers to a grammar, a dictionary and a text collection. While grammars and dictionaries have received substantial attention in the literature over the last 30 years, text collections remain understudied. Yet legacy texts (broadly understood here to include narratives, procedural texts, songs, etc. collected in the past) constitute invaluable sources of language and culture for many Indigenous communities. In this talk, we focus on the potential of legacy text collections in student training and linguistic research through a case study on the mobilization of such a collection for Makah (Wakashan, Washington State, USA). To conclude, we also briefly explore the potential benefits of such work for communities.

MCQLL Meeting, 1/28 — Dzmitry Bahdanau

This week’s MCQLL meeting, on Thursday, Jan 28th at 1:30-2:30pm, will feature a talk from Dzmitry (Dima) Bahdanau, a research scientist at Element AI, a research group at ServiceNow.

Speaker Bio: I am a research scientist at Element AI, which has just been acquired by ServiceNow. I am also a Core Industry Member of Mila and Adjunct Professor at McGill University. The current goal of my research is to further the adoption of language user interfaces. To this end I am interested in semantic parsing and task-oriented dialogue methods, in particular their systematic (compositional) generalization and sample efficiency. My prior research interests include grounding language in vision and action, question answering, speech recognition, machine translation and structured prediction in general. I have recently completed my PhD at Mila working under the supervision of Yoshua Bengio.

Abstract: I will talk about the task of translating natural language queries into Structured Query Language (SQL). I will first discuss the broad relevance and importance of this task. I will make connections between SQL and meaning representations that are more conventional in linguistics, namely lambda-calculus. I will talk about type-based heuristics for query completion and how they sometimes allow models to infer correct queries without much syntactic understanding. I will describe how state-of-the-art models work, focusing on the recent DuoRAT model produced by our group. Lastly, I will talk about the ongoing few-shot cross-domain text2sql project that we are working on at Element AI.

If you would like to attend this talk but have not yet registered for this semester's MCQLL meetings, please send an email to mcqllmeetings@gmail.com so that we can get you the link.

Fieldwork Lab Meeting, 1/28 — James Crippen

The next Fieldwork Lab meeting will be January 28th, 2021, at 4:00pm. James Crippen will be giving a talk about morphosyntactic and semantic elicitation. Contact Carol-Rose Little if you would like to attend and do not have the Zoom link.


I will talk informally about the art and craft of morphosyntactic and semantic elicitation based on my academic traditions and personal experience. I will address some background ideologies of elicitation and basic practices as well as common errors and pitfalls. I will then talk a bit about elicitation of highly context-dependent phenomena including aspect and information structure. I emphasise the need for advance preparation, for clear and explicit communication about tasks with consultants, and for flexibility and improvisation.

MCQLL Meeting, 1/21 — Koustuv Sinha

This week’s MCQLL meeting, on Thursday, Jan 21, 1:30-2:30pm, will feature a talk from Koustuv Sinha. Koustuv is a third-year PhD candidate at McGill University / Mila / Facebook AI Research, and is supervised by Joelle Pineau and Will Hamilton. His primary research interest lies in understanding systematic reasoning and generalization in discrete modalities, encompassing language understanding and graph-based reasoning.

If you are not already on the MCQLL mailing list and would like to attend this meeting and/or join the mailing list, please send an email to mcqllmeetings@gmail.com ASAP so we can make sure to get you the link to the meeting in time.

Fieldwork Lab Meeting, 1/14

The Fieldwork Lab will resume meetings this semester, with its first meeting at 4pm on Thursday, January 14th. Please contact Carol-Rose Little if you would like to be added to the Fieldwork Lab Slack channel for more information on meetings.

Fieldwork Lab Meeting, 12/3 — Aaron Broadwell

This Thursday (Dec 3) at 4pm during the Fieldwork Lab meeting, Aaron Broadwell (University of Florida) will be presenting “Making historical texts in indigenous languages accessible to communities: A Zapotec case study”. Please email Carol-Rose Little if you would like to attend and do not have access.

Abstract: Caseidyneën Saën is a set of open educational resources on Colonial Zapotec funded by an ACLS grant and created by a team including activists, educators, academics, and students. Here, we present this resource as a case study that contributes to larger conversations related to (1) communities working with historical corpora in their languages (e.g. Leonard 2011, Hinton 2011) and (2) the role digital scholarship can play in such projects (e.g. Czaykowska-Higgins et al. 2014).

Zapotec languages (Otomanguean) are indigenous to Oaxaca and are also spoken in diaspora communities, including in the greater Los Angeles area. Historical forms of Zapotec are attested in an expansive corpus written during the Mexican Colonial period. The online, digital resource Ticha (https://ticha.haverford.edu) makes these manuscripts accessible to the public by providing open access to high-resolution images, transcriptions, translations, linguistic analysis, and historical context. The continued development of Ticha is embedded in pedagogical practices and committed to co-creation with Zapotec individuals and pueblos. In Caseidyneën Saën, a collection of public-facing teaching materials, we use the resources available on Ticha to teach about Zapotec language, culture, and intellectual history.

The e-book Caseidyneën Saën was created by a team comprised of both Zapotec and non-Native collaborators, and the 18 co-authors of this multilingual (English, Spanish, and Zapotec), multimedia presentation represent the diversity of the team. In this talk I discuss how we went from traditional linguistic fieldwork to a deeper immersion into collaborative work with Zapotec communities to provide documents that are relevant to their linguistic history.

George Aaron Broadwell is Elling Eide Professor of Anthropology at the University of Florida. See more details at https://people.clas.ufl.edu/broadwell/.

Fieldwork Lab Meeting, 11/26 — Dan Brodkin

This week in the Fieldwork Lab meeting, Dan Brodkin (UC Santa Cruz) will present work titled “Agent Focus in South Sulawesi”. Fieldwork Lab will meet at 4pm on Thursday, November 26th. Contact Carol-Rose Little (carol.little@mcgill.ca) if you would like to join the meeting.

MCQLL Meeting, 11/11 — Bing’er Jiang

At this week’s MCQLL meeting (1:30-2:30pm Wednesday, November 11), Bing’er Jiang, a sixth-year PhD student at the McGill Linguistics Department, will present her work on the perceptual tonal space in Mandarin Chinese continuous speech. Talk abstract is below.

If you would like to join the meeting and have not already registered for the MCQLL mailing list, please do so ASAP using this form.

Abstract: This study examines the perceptual tonal space in Mandarin Chinese continuous speech and how various acoustic properties signalling the tonal contrast are represented in this space. Previous studies on Mandarin tones mainly focus on words produced in isolation, but there is little understanding of the perception of tones in continuous speech, which are realized with more variability. We first evaluate the importance of three acoustic correlates (pitch, intensity, and duration) for the tonal contrast by using a set of tone classification models trained on broadcast news. Instead of model ablation, we use a novel method of data ablation, inspired by conventional perceptual experiments, to restrict the acoustic information the model can access. We further force the model to learn a low-dimensional representation, which can be seen as the model’s perceptual representation for tones. We find that the information for tonal distinction can be compressed in a two-dimensional space, and the structure of the space corresponds to the findings on human perception of isolated tones in the literature.

MCQLL Meeting, 11/4 — Emi Baylor

At this week’s MCQLL meeting (Wednesday, November 4th, 1:30-2:30pm), Emi Baylor, master’s student at McGill School of Computer Science and Mila, will be presenting on her work with morphological productivity. Bio and talk abstract are below.

If you would like to attend the talk but are not already on the MCQLL listserv, please sign up at this link as soon as possible, as there is still a registration step that needs to be completed after that.

Bio: Emi Baylor is a master’s student at McGill Computer Science and Mila. She is interested in computational morphology, multilingual NLP, and low resource languages, as well as the combination of all three.

Abstract: This work investigates and empirically tests theories of linguistic productivity. Language users are able to make infinite use of finite means, meaning that a finite number of words and morphemes can be used to create an infinite number of utterances. This is largely due to linguistic productivity, which allows language users to create and understand novel expressions through stored, reusable units. One example of a productive process across languages is plural morphology, which generalizes the use of plural morphemes in a language to novel words. Specifically, this work tests theories of how this generalization of forms is learned and carried out, using data from the complex German plural noun system.

