
MCQLL Lightning Talks, 9/14

MCQLL will be meeting this Tuesday, September 14 at 3:00 PM on Zoom.

This week’s meeting will be a series of lightning talks by MCQLL lab members, giving brief introductions to their research. All are welcome to come learn more about current work being done in the lab.

If you haven’t already, please register here to get the meeting link.

MCQLL Meeting, 4/8 — Michaela Socolof

This week’s MCQLL meeting, on Thursday, April 8 at 1:30-2:30pm, will feature a talk from Michaela Socolof, a third-year PhD student in the Linguistics department at McGill.

Abstract: I will be presenting an overview of issues relating to the syntax of relative clause constructions across languages. The purpose of this talk is to explore possibilities for computational projects in this area.

If you would like to attend the talk but have not yet signed up for the MCQLL meetings this semester, please send an email to mcqllmeetings@gmail.com.

MCQLL Meeting, 4/1 — Maya Watt

This week’s MCQLL meeting Thursday, April 1, 1:30-2:30pm, will feature a talk from Maya Watt. Bio and talk abstract are below.

If you would like to attend the talk but have not yet signed up for the MCQLL meetings this semester, please send an email to mcqllmeetings@gmail.com.

Bio: Maya Watt is a U3 undergraduate student in Honours Linguistics with a minor in Computer Science.

Abstract: Theories of inflectional morphology differ in terms of how they treat semi-productive inflection types, that is, inflections that apply to multiple words but are not completely productive (e.g. grow-grew, know-knew, but not clow-clowed). How such semi-regular classes generalize may help distinguish theories, but little work has explored this question due to the difficulty of finding overgeneralized uses of these inflectional classes in naturalistic corpora. We address this issue by conducting a prompted lexical decision study on English past tenses. Participants were shown a regular or irregular verb in the infinitive form (to snow, to grow) and then presented with either a correct inflection (snowed, grew) or an overgeneralization (snew, growed) and asked to indicate whether it is the correct past tense form. We compare how various overgeneralized types (snow-snew, sneeze-snoze) differ in terms of reaction times and accuracy rates, finding differences between classes which may inform future theoretical comparisons.

MCQLL Meeting, 3/25 — Emily Goodwin

This week’s MCQLL meeting Thursday, March 25, 1:30-2:30pm, will feature a talk from Emily Goodwin. Talk abstract is below.

If you would like to attend the talk but have not yet signed up for the MCQLL meetings this semester, please send an email to mcqllmeetings@gmail.com.

Abstract: Recent attention in neural natural language understanding models has focused on generalization that is compositional (the meanings of larger expressions are a function of the meanings of smaller expressions) and systematic (individual words mean the same thing when put in novel combinations). Datasets for compositional and systematic generalization often focus on testing classes of syntactic constructions (testing only on strings of a certain length or longer, or novel combinations of particular predicates). In contrast, the Compositional Freebase Questions (CFQ) training and test sets are automatically sampled. To measure the compositional challenge of a test set relative to its training set, the CFQ authors measure the divergence between the distribution of syntactic compounds in test and train. Training and test splits with maximum compound divergence (MCD) are highly challenging for semantic parsers, but (unlike other datasets designed to test compositional generalization) the splits do not specifically hold out human-recognizable classes of syntactic constructions from the training set. In this talk I will present preliminary results of a syntactic analysis of the MCD splits released in the CFQ dataset, and explore whether model failures on MCD splits can be explained in terms of phenomena familiar to syntactic theory.

MCQLL Lab Meeting, 3/18 — Ben LeBrun

This week’s MCQLL meeting Thursday, March 18, 1:30-2:30pm, will feature a talk from Ben LeBrun. Talk abstract is below.

If you would like to attend the talk but have not yet signed up for the MCQLL meetings this semester, please send an email to mcqllmeetings@gmail.com.

Abstract: The use of pre-trained Transformer language models (TLMs) has led to significant advances in the field of natural language processing. This success has typically been measured by quantifying model performance on downstream tasks, or through their ability to predict words in large samples of text. However, these benchmarks are biased in favour of frequent natural language constructions, measuring performance on common, recurring patterns in the data. The behaviour of TLMs on the large set of complex and infrequent linguistic constructions is in comparison understudied. In this talk, I will present preliminary results exploring GPT-2’s ability to reproduce this long tail of syntactic constructions, and how this ability is modulated by fine-tuning.

MCQLL Lab Meeting, 3/11 — Eva Portelance

This week’s MCQLL meeting (Thursday, March 11th, 1:30-2:30pm) will feature a talk from Eva Portelance. Abstract and bio are below.

If you would like to join the meeting but have not yet registered for this semester’s MCQLL meetings, please send an email to mcqllmeetings@gmail.com.

Bio: Eva is currently a Ph.D. candidate at Stanford University in Linguistics, working with Mike Frank and Dan Jurafsky. She completed a B.A. Honours in Linguistics and Computer Science at McGill University in 2017. She is interested in linguistic structure and language learning both in humans and machines. This work was started during an internship at Microsoft Research Montreal.

Abstract: Learning Strategies for the Emergence of Language in Iterated Learning

In emergent communication studies, agents play communication games in order to develop a set of linguistic conventions referred to as the emergent language. Here, we compare the effects of a variety of learning functions and play phases on the efficiency and effectiveness of emergent language learning. We do so both within a single generation of agents and across generations in an iterated learning setting. We find that allowing agents to engage in forms of self-play ultimately leads to more effective communication. In the iterated learning setting we compare different approaches to intergenerational learning. We find that self-play used jointly with imitation can also lead to effective communication in this setting. Additionally, we find that encouraging agents to successfully communicate with previous generations rather than to successfully imitate them can lead to both effective language and efficient learning. Finally, we introduce a new dataset and a new agent architecture with split visual perception and representation modules in order to conduct our experiments.

MCQLL Meeting, 2/25 — Richard Futrell

This week’s MCQLL meeting, taking place Thursday, Feb 25th, 1:30-2:30pm will feature a talk entitled “Information-theoretic models of natural language” by Professor Richard Futrell. Abstract and bio are below. If you would like to join the meeting and have not yet registered for this semester’s MCQLL meetings, please send an email to mcqllmeetings@gmail.com requesting the link.

Abstract: I claim that human languages can be modeled as information-theoretic codes, that is, systems that maximize information transfer under certain constraints. I argue that the relevant constraints for human language are those involving the cognitive resources used during language production and comprehension. Viewing human language in this way, it is possible to derive and test new quantitative predictions about the statistical, syntactic, and morphemic structure of human languages.

I start by reviewing some of the many ways that natural languages differ from optimal codes as studied in information theory. I argue that one distinguishing characteristic of human languages, as opposed to other natural and artificial codes, is a property I call “information locality”: information about particular aspects of meaning is localized in time within a linguistic utterance. I give evidence for information locality at multiple levels of linguistic structure, including the structure of words and the order of words in sentences.

Next, I state a theorem showing that information locality is a property of any communication system where the encoder and/or decoder are operating incrementally under memory constraints. The theorem yields a new, fully formal, and quantifiable definition of information locality, which leads to new predictions about word order and the structure of words across languages. I test these predictions in broad corpus studies of word order in over 50 languages, and in case studies of the order of morphemes within words in two languages.

Bio: Richard Futrell is an Assistant Professor of Language Science at the University of California, Irvine. His research applies information theory to better understand human language and how humans and machines can learn and process it.

MCQLL Meeting, 1/28 — Dzmitry Bahdanau

This week’s MCQLL meeting, on Thursday, Jan 28th at 1:30-2:30pm, will feature a talk from Dzmitry (Dima) Bahdanau, a research scientist at Element AI, a research group at ServiceNow.

Speaker Bio: I am a research scientist at Element AI, which has just been acquired by ServiceNow. I am also a Core Industry Member of Mila and Adjunct Professor at McGill University. The current goal of my research is to further the adoption of language user interfaces. To this end I am interested in semantic parsing and task-oriented dialogue methods, in particular their systematic (compositional) generalization and sample efficiency. My prior research interests include grounding language in vision and action, question answering, speech recognition, machine translation and structured prediction in general. I have recently completed my PhD at Mila, working under the supervision of Yoshua Bengio.

Abstract: I will talk about the task of translating natural language queries into Structured Query Language (SQL). I will first discuss the broad relevance and importance of this task. I will make connections between SQL and meaning representations that are more conventional in linguistics, namely lambda calculus. I will talk about type-based heuristics for query completion and how they sometimes allow models to infer correct queries without much syntactic understanding. I will describe how state-of-the-art models work, focusing on the recent DuoRAT model produced by our group. Lastly, I will talk about the ongoing few-shot cross-domain text2sql project that we are currently working on at Element AI.

If you would like to attend this talk but have not yet registered for this semester’s MCQLL meetings, please send an email to mcqllmeetings@gmail.com so that we can get you the link.

MCQLL Meeting, 1/21 — Koustuv Sinha

This week’s MCQLL meeting, on Thursday, Jan 21, 1:30-2:30pm, will feature a talk from Koustuv Sinha. Koustuv is a third-year PhD candidate at McGill University / Mila / Facebook AI Research, and is supervised by Joelle Pineau and Will Hamilton. His primary research interest lies in understanding systematic reasoning and generalization in discrete modalities, encompassing language understanding and graph-based reasoning.

If you are not already on the MCQLL mailing list and would like to attend this meeting and/or join the mailing list, please send an email to mcqllmeetings@gmail.com ASAP so we can make sure to get you the link to the meeting in time.

MCQLL Meeting, 11/11 — Bing’er Jiang

At this week’s MCQLL meeting (1:30-2:30pm Wednesday, November 11), Bing’er Jiang, a sixth-year PhD student at the McGill Linguistics Department, will present her work on the perceptual tonal space in Mandarin Chinese continuous speech. Talk abstract is below.

If you would like to join the meeting and have not already registered for the MCQLL mailing list, please do so ASAP using this form.

Abstract: This study examines the perceptual tonal space in Mandarin Chinese continuous speech and how various acoustic properties signalling the tonal contrast are represented in this space. Previous studies on Mandarin tones mainly focus on words produced in isolation, but there is little understanding of the perception of tones in continuous speech, which are realized with more variability. We first evaluate the importance of three acoustic correlates (pitch, intensity, and duration) for the tonal contrast by using a set of tone classification models trained on broadcast news. Instead of model ablation, we use a novel method of data ablation, inspired by conventional perceptual experiments, to restrict the acoustic information the model can access. We further force the model to learn a low-dimensional representation, which can be seen as the model’s perceptual representation for tones. We find that the information for tonal distinction can be compressed in a two-dimensional space, and the structure of the space corresponds to the findings on human perception of isolated tones in the literature.

MCQLL Meeting, 11/4 — Emi Baylor

At this week’s MCQLL meeting (Wednesday, November 4th, 1:30-2:30pm), Emi Baylor, a master’s student at McGill School of Computer Science and Mila, will be presenting on her work with morphological productivity. Bio and talk abstract are below.

If you would like to attend the talk but are not already on the MCQLL listserv, please sign up at this link as soon as possible, as there is still a registration step that needs to be completed after that.

Bio: Emi Baylor is a master’s student at McGill Computer Science and Mila. She is interested in computational morphology, multilingual NLP, and low-resource languages, as well as the combination of all three.

Abstract: This work investigates and empirically tests theories of linguistic productivity. Language users are able to make infinite use of finite means, meaning that a finite number of words and morphemes can be used to create an infinite number of utterances. This is largely due to linguistic productivity, which allows language users to create and understand novel expressions through stored, reusable units. One example of a productive process across languages is plural morphology, which generalizes the use of plural morphemes in a language to novel words. This work investigates and empirically tests theories of how this generalization of forms is learned and carried out, through data from the complex German plural noun system.

MCQLL Meeting, 10/28 — Michaela Socolof

At this week’s MCQLL meeting (Wednesday, October 28th, 1:30-2:30pm), Michaela Socolof, PhD student in the McGill Linguistics department, will be presenting on her work with idioms and compositionality. Bio and talk abstract are below.

If you would like to attend the talk but are not already on the MCQLL listserv, please sign up at this link as soon as possible, as there is still a registration step that needs to be completed after that.

Bio: Michaela Socolof is a PhD student at McGill Linguistics. She is interested in syntax and semantics, with a focus on using computational tools to explore questions in these domains.

Talk: This work addresses the question of how idioms should be characterized. Unlike most phrases in language, whose meanings are largely predictable based on the meanings of their individual words, idioms have idiosyncratic meanings that do not come from straightforwardly combining their parts. This observation has led to the commonly repeated notion that idioms are an exception to compositionality that requires special machinery in the linguistic system. We show that it is possible to characterize idioms based on the interaction of two simple properties of language: the extent to which the word meanings are dependent on context and the extent to which the phrase is stored as a unit. We present computational approximations of these two properties, and we show that our measures successfully distinguish between idiomatic and non-idiomatic phrases.

MCQLL Meeting, 10/21 — Jacob Hoover

At this week’s MCQLL meeting (Wednesday, October 21st, 1:30-2:30pm), Jacob Louis Hoover, a PhD student at McGill and Mila, will present on the connection between grammatical structure and the statistics of word occurrences in language use. Abstract and bio are below.

If you would like to attend and have not already signed up for the MCQLL mailing list, please fill out this google form ASAP to do so.

Bio: Jacob is a PhD student at McGill Linguistics / Mila. He is broadly interested in logic, mathematical linguistics, and the generative / expressive capacity of formal systems, as well as information theory, and examining what both human and machine learning might be able to tell us about the underlying structure of language.

Talk: There is an intuitive connection between grammatical structure and the statistics of word occurrences observed in language use. This intuitive connection is reflected in cognitive models and also in NLP, in the assumption that the patterns of predictability correlate with linguistic structure. We call this the “dependency-dependence” hypothesis. This hypothesis is implicit in the use of language modelling objectives for training modern neural models, and has been made explicit in some approaches to unsupervised dependency parsing. The strongest version of this hypothesis is to say that compositional structure is in fact entirely reducible to cooccurrence statistics (a hypothesis made explicit in Futrell et al. 2019). Investigating the mutual information of pairs of words using pretrained contextualized embedding models, we show that the optimal structure for prediction is in fact not very closely correlated with the compositional structure. We propose that contextualized mutual information scores of this kind may be useful as a way to understand the structure of predictability, as a system distinct from compositional structure, but also integral to language use.

MCQLL, 10/7 — Mika Braginsky

At this week’s MCQLL meeting (Wednesday, October 7th, 1:30-2:30pm), Mika Braginsky, a graduate student in Brain and Cognitive Sciences at MIT, will discuss their work investigating linguistic productivity and child language acquisition. Talk abstract is below.

If you would like to attend and have not already signed up for the MCQLL mailing list, please fill out this google form to do so.

Talk: In learning morphology, do children generalize from their vocabularies on an item-by-item basis, or do they form global rules on a developmental timetable? We use large-scale parent-report data to address this question by investigating relations among morphological development, vocabulary growth, and age. For three languages, we examine irregular verbs (e.g. go) and predict children’s correct inflection (went) and overregularization (goed/wented). Morphology knowledge relates strongly to vocabulary, more so than to age. Further, this relation is modulated by age: for two children with the same vocabulary size, the older is more likely to correctly inflect and overregularize, and the effect of vocabulary on morphology decreases with age. Lastly, correct inflection and overregularization rates rise in tandem over age, and vocabulary effects on them are correlated across items. Our findings support that morphology learning is strongly coupled to lexical learning and that correct inflection and overregularization are related, verb-specific, processes.

MCQLL, 9/30 — Maya Watt

At this week’s MCQLL meeting (Wednesday, September 30th, 1:30-2:30) Maya Watt will be presenting her research on the rates of over-irregularization of English past-tense verbs. See below for the talk abstract and Maya’s bio.

If you would like to attend and have not already signed up for the MCQLL mailing list, please fill out this google form.

TALK: In her talk, Maya will discuss the rates of over-irregularization of English past-tense verbs (i.e. believing the past tense of snow is snew instead of snowed). Such mistakes rarely happen in natural speech, so very little is known about the nuances of over-irregularization: do people tend to over-irregularize verbs of a particular inflectional class, or do the rates stay fairly similar? Because capturing an instance of over-irregularization in natural speech is difficult, we decided to collect our data by implementing a lexical decision task (LDT) and launching it on Mechanical Turk. The assumption is that highly natural over-irregularized non-words (e.g. brang) will take longer to be judged as non-words than other, less natural non-words (e.g. screamt). The goal of this project is to provide some data and insight into language learning and productive morphology.

BIO: Maya is an undergraduate student in Linguistics and Computer Science. She’s interested in syntax, logic, and formal linguistics. Her research interests lie in the intersection of natural language and mathematics.

MCQLL, 9/23 — Emily Goodwin

This week at MCQLL (Wednesday 1:20-2:30), Emily Goodwin will present her ongoing work on systematic syntactic parsing. Abstract and bio are below. If you would like to join the mailing list and/or attend the meeting, please fill out this google form (as soon as possible).

ABSTRACT:
Recent work in semantic parsing, including novel datasets like SCAN (Lake and Baroni, 2018) and CFQ (Keysers et al., 2020), demonstrates that semantic parsers generalize well when tested on items highly similar to those in the training set, but struggle with syntactic structures that combine components of training items in novel ways. This indicates a lack of systematicity, the principle that individual words will make similar contributions to the expressions they appear in, independently of surrounding context. Applying this principle to syntactic parsing, we show similar problems plague state-of-the-art syntactic parsers, despite their achieving human or near-human performance on randomly sampled test data. Moreover, generalization is especially poor on syntactic relations which are crucial for the compositional semantics.

BIO:
Emily is an M.A. student in the McGill Linguistics department, supervised by Profs. Timothy J. O’Donnell and Siva Reddy, and by Dzmitry Bahdanau of Element AI. She is interested in compositionality and systematic generalization in meaning representation.

MCQLL, 9/16 — Lightning Talks

As last week’s meeting was cancelled to make it easier for people to participate in the scholar strike, this week’s MCQLL meeting (1:30pm on Wednesday, September 16th) will be lightning talks by returning MCQLL lab members (that would have normally taken place last week). This will serve as an introduction to the type of work done at MCQLL, as well as provide a space to ask questions about our research and the lab in general.

Please make sure to register here beforehand so that you can get the meeting link. If you already registered last week, then there is no need to register again, just join with the link you got in your registration confirmation email.

MCQLL, 9/9 — Lightning Talks

At this week’s MCQLL meeting (1:30pm on Wednesday, September 9th), there will be a series of lightning talks by returning MCQLL lab members. This will serve as an introduction to the type of work done at MCQLL, as well as provide a space to ask questions about our research and the lab in general.

Please make sure to register here beforehand so that you can get the meeting link.

The (tentative) meeting agenda is as follows:

  1. Announcements
  2. Lightning Talks
    • Clarifying Questions here are fine, but please hold all discussion questions until the Q & A session.
  3. Q & A Session, including:
    • Discussion questions relating to the talks
    • Questions relating to the lab in general

Please don’t hesitate to reach out if you have any questions, comments, or concerns.

MCQLL meeting, 6/3 – Timothy J O’Donnell 

This week at the Montreal Computational and Quantitative Linguistics Lab meeting, Timothy O’Donnell will be presenting his Meditations on Compositional Structure, to make up for last week’s postponement. This presentation attempts to synthesize several threads of work in a broader framework. We meet at 2:30 via Zoom (if you are not on the MCQLL emailing list, please contact Emily Goodwin at emily.goodwin@mail.mcgill.ca for the meeting link).

 

MCQLL meeting, 5/13 — Bing’er Jiang 

The next meeting of the Montreal Computational and Quantitative Linguistics Lab will take place on Wednesday May 13th, at 2:30, via Zoom. Bing’er will present on Modelling Perceptual Effects of Phonology with Automatic Speech Recognition Systems. If you would like to participate but are not on the MCQLL or computational linguistics emailing list, contact emily.goodwin@mail.mcgill.ca for the Zoom link.