
Fieldwork Lab Meeting, 12/3 — Aaron Broadwell

This Thursday (Dec 3) at 4pm during the Fieldwork Lab meeting, Aaron Broadwell (University of Florida) will be presenting “Making historical texts in indigenous languages accessible to communities: A Zapotec case study”. Please email Carol-Rose Little if you would like to attend and do not have access.

Abstract: Caseidyneën Saën is a set of open educational resources on Colonial Zapotec funded by an ACLS grant and created by a team including activists, educators, academics, and students. Here, we present this resource as a case study that contributes to larger conversations related to (1) communities working with historical corpora in their languages (e.g. Leonard 2011, Hinton 2011) and (2) the role digital scholarship can play in such projects (e.g. Czaykowska-Higgins et al. 2014).

Zapotec languages (Otomanguean) are indigenous to Oaxaca and are also spoken in diaspora communities, including in the greater Los Angeles area. Historical forms of Zapotec are attested in an expansive corpus written during the Mexican Colonial period. The online, digital resource Ticha (https://ticha.haverford.edu) makes these manuscripts accessible to the public by providing open access to high-resolution images, transcriptions, translations, linguistic analysis, and historical context. The continued development of Ticha is embedded in pedagogical practices and committed to co-creation with Zapotec individuals and pueblos. In Caseidyneën Saën, a collection of public-facing teaching materials, we use the resources available on Ticha to teach about Zapotec language, culture, and intellectual history.

The e-book Caseidyneën Saën was created by a team composed of both Zapotec and non-Native collaborators, and the 18 co-authors of this multilingual (English, Spanish, and Zapotec), multimedia presentation represent the diversity of the team. In this talk I discuss how we went from traditional linguistic fieldwork to a deeper immersion in collaborative work with Zapotec communities to provide documents that are relevant to their linguistic history.

George Aaron Broadwell is Elling Eide Professor of Anthropology at the University of Florida. See more details at https://people.clas.ufl.edu/broadwell/.

Fieldwork Lab Meeting, 11/26 — Dan Brodkin

This week in the Fieldwork Lab meeting, Dan Brodkin (UC Santa Cruz) will present work titled “Agent Focus in South Sulawesi”. Fieldwork Lab will meet at 4pm on Thursday, November 26th. Contact Carol-Rose Little (carol.little@mcgill.ca) if you would like to join the meeting.

MCQLL Meeting, 11/11 — Bing’er Jiang

At this week’s MCQLL meeting (1:30-2:30pm Wednesday, November 11), Bing’er Jiang, a sixth year PhD student at the McGill Linguistics Department, will present her work on the perceptual tonal space in Mandarin Chinese continuous speech. Talk abstract is below.

If you would like to join the meeting and have not already registered for the MCQLL mailing list, please do so ASAP using this form.

Abstract: This study examines the perceptual tonal space in Mandarin Chinese continuous speech and how various acoustic properties signalling the tonal contrast are represented in this space. Previous studies on Mandarin tones mainly focus on words produced in isolation, but there is little understanding of the perception of tones in continuous speech, where they are realized with more variability. We first evaluate the importance of three acoustic correlates (pitch, intensity, and duration) for the tonal contrast by using a set of tone classification models trained on broadcast news. Instead of model ablation, we use a novel method of data ablation inspired by conventional perceptual experiments to restrict the acoustic information the model can access. We further force the model to learn a low-dimensional representation, which can be seen as the model’s perceptual representation for tones. We find that the information for tonal distinction can be compressed into a two-dimensional space, and that the structure of this space corresponds to findings on human perception of isolated tones in the literature.
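For readers curious how data ablation differs from model ablation, here is a minimal sketch (the feature values are invented for illustration; the actual models and features are those described in the abstract): a cue is neutralized in the input itself, so the same trained model can be re-evaluated without access to that cue.

```python
import numpy as np

# Toy feature matrix: one row per syllable, columns = [pitch (Hz),
# intensity (dB), duration (s)]. Values are invented for illustration.
X = np.array([[220.0, 65.0, 0.18],
              [180.0, 60.0, 0.22],
              [240.0, 70.0, 0.15]])

def ablate(features, cue_index):
    """Data ablation: replace one acoustic cue with its mean, so the
    column is constant and carries no information about tonal contrast."""
    out = features.copy()
    out[:, cue_index] = out[:, cue_index].mean()
    return out

X_no_pitch = ablate(X, cue_index=0)  # evaluate the classifier on this input
```

Scoring a classifier on `X_no_pitch` versus `X` then quantifies how much the pitch cue contributed, without retraining the model; that is the contrast with model ablation, which changes the model rather than the data.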

MCQLL Meeting, 11/4 — Emi Baylor

At this week’s MCQLL meeting (Wednesday, November 4th, 1:30-2:30pm), Emi Baylor, a master’s student at the McGill School of Computer Science and Mila, will be presenting on her work on morphological productivity. Bio and talk abstract are below.

If you would like to attend the talk but are not already on the MCQLL listserv, please sign up at this link as soon as possible; note that there is a further registration step to complete after signing up.

Bio: Emi Baylor is a master’s student at McGill Computer Science and Mila. She is interested in computational morphology, multilingual NLP, and low-resource languages, as well as the combination of all three.

Abstract: This work investigates and empirically tests theories of linguistic productivity. Language users are able to make infinite use of finite means, meaning that a finite number of words and morphemes can be used to create an infinite number of utterances. This is largely due to linguistic productivity, which allows language users to create and understand novel expressions through stored, reusable units. One example of a productive process across languages is plural morphology, which generalizes the use of plural morphemes in a language to novel words. This work empirically tests theories of how this generalization of forms is learned and carried out, using data from the complex German plural noun system.
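As a toy illustration of the kind of baseline such theories are tested against (this is not the model from the talk, and the miniature lexicon below is made up), one can compute the relative type frequency of each plural suffix, a naive predictor of how a novel noun will be pluralized:

```python
from collections import Counter

# Made-up miniature lexicon mapping German nouns to their plural suffix.
# (Real German plurals also involve umlaut and zero marking; omitted here.)
plurals = {"Auto": "s", "Hund": "e", "Katze": "n", "Kind": "er", "Ball": "e"}

counts = Counter(plurals.values())
total = sum(counts.values())

# Relative type frequency of each suffix: a naive generalization score.
share = {suffix: n / total for suffix, n in counts.items()}
```

On this sample, -e has the highest share; yet experimental work on German plurals has long suggested that type frequency alone does not straightforwardly predict productivity, which is exactly the kind of tension that empirical tests of productivity theories address.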

Fieldwork Lab, 11/5 — Dorothea Hoffmann

The next Fieldwork Lab Meeting will be on November 5th, 2020, this week exceptionally at 2:30pm. (Contact Carol-Rose Little if you would like to join.)

Dorothea Hoffmann will present a talk entitled “Event- and team-based fieldwork with a non-profit in comparison to the “lone-wolf” approach: A personal account”.

Abstract:
This paper compares “traditional” academic fieldwork in Australia as a “lone-wolf” linguist with the event- and team-based approach developed by the non-profit organization The Language Conservancy (TLC) in the US and Canada. After briefly describing my fieldwork methods and experiences working in the Northern Territory of Australia with the Malak Malak, I will shift focus to my work on North American languages such as Acoma Keres, Ute Mountain Ute, Ho-Chunk, and Stoney Nakoda. I will place particular emphasis on describing a modification of the Rapid Word Collection method (RWC), which was originally developed by SIL International (2010) in order to create practical dictionaries in a relatively short period of time. TLC adapted the semantic domain associations of the RWC method to the North American endangered language situation where both literacy levels and number of speakers are generally low. As a result, TLC developed a specialized software tool to collect both written and audio recordings for each entry in the semantic domain database in a two-week workshop setting.

After a workshop is completed, all collected data is consolidated into a digital spreadsheet and checked by a team of experienced linguists to ensure standardized spelling, accurate transcription, and grammatical consistency. The data is then flagged and organized so that it can be reviewed and re-recorded by fluent speakers in subsequent weeklong workshops. These workshops become true community events, bringing Elders and speakers together in an effort to document an endangered language for the purposes of language revitalization. Additionally, the speed and efficiency of the process ensures that high-quality language materials can be delivered into the hands of the community in a relatively short period of time.

References:
SIL International. (2010). rapidwords.net. Retrieved 2020, from http://www.rapidwords.net/.
Warfel, Kevin. (2016). Dictionary Production: Rapid Word Collection Method. [Brochure]. SIL International. Retrieved 2020, from http://www.rapidwords.net/resources/files/rapid-word-collection-flyer

Dorothea Hoffmann holds a BA/MA in German and English linguistics and literary studies from the University of Konstanz, Germany, and a PhD in linguistics from the University of Manchester, UK, with a dissertation entitled “Descriptions of Motion and Travel in Jaminjung and Kriol”. She spent five years as a postdoctoral fellow at the University of Chicago working on the Australian languages Malak Malak and Matngele. She joined the non-profit organization The Language Conservancy in 2017 and is now its Linguistic Project Manager. She has researched various Australian and North American Indigenous languages and is enthusiastic about language documentation and revitalization.

MCQLL Meeting, 10/28 — Michaela Socolof

At this week’s MCQLL meeting (Wednesday, October 28th, 1:30-2:30pm), Michaela Socolof, PhD student in the McGill Linguistics department, will be presenting on her work with idioms and compositionality. Bio and talk abstract are below.

If you would like to attend the talk but are not already on the MCQLL listserv, please sign up at this link as soon as possible; note that there is a further registration step to complete after signing up.

Bio: Michaela Socolof is a PhD student at McGill Linguistics. She is interested in syntax and semantics, with a focus on using computational tools to explore questions in these domains.

Talk: This work addresses the question of how idioms should be characterized. Unlike most phrases in language, whose meanings are largely predictable based on the meanings of their individual words, idioms have idiosyncratic meanings that do not come from straightforwardly combining their parts. This observation has led to the commonly repeated notion that idioms are an exception to compositionality, requiring special machinery in the linguistic system. We show that it is possible to characterize idioms based on the interaction of two simple properties of language: the extent to which the word meanings are dependent on context and the extent to which the phrase is stored as a unit. We present computational approximations of these two properties, and we show that our measures successfully distinguish between idiomatic and non-idiomatic phrases.

MCQLL Meeting, 10/21 — Jacob Hoover

At this week’s MCQLL meeting (Wednesday, October 21st, 1:30-2:30pm), Jacob Louis Hoover, a PhD student at McGill and Mila, will present on the connection between grammatical structure and the statistics of word occurrences in language use. Abstract and bio are below.

If you would like to attend and have not already signed up for the MCQLL mailing list, please fill out this google form ASAP to do so.

Bio: Jacob is a PhD student at McGill Linguistics / Mila. He is broadly interested in logic, mathematical linguistics, and the generative / expressive capacity of formal systems, as well as information theory, and examining what both human and machine learning might be able to tell us about the underlying structure of language.

Talk: There is an intuitive connection between grammatical structure and the statistics of word occurrences observed in language use. This intuitive connection is reflected in cognitive models and also in NLP, in the assumption that patterns of predictability correlate with linguistic structure. We call this the “dependency-dependence” hypothesis. This hypothesis is implicit in the use of language modelling objectives for training modern neural models, and has been made explicit in some approaches to unsupervised dependency parsing. The strongest version of this hypothesis is to say that compositional structure is in fact entirely reducible to cooccurrence statistics (a hypothesis made explicit in Futrell et al. 2019). Investigating the mutual information of pairs of words using pretrained contextualized embedding models, we show that the optimal structure for prediction is in fact not very closely correlated with the compositional structure. We propose that contextualized mutual information scores of this kind may be useful as a way to understand the structure of predictability, as a system distinct from compositional structure, but also integral to language use.
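As a rough sketch of the underlying quantity (a count-based estimator over a three-sentence toy corpus, not the contextualized-embedding estimator used in the talk), the pointwise mutual information of a word pair compares its co-occurrence probability with what independent unigram frequencies would predict:

```python
import math
from collections import Counter

# Toy corpus; each sentence is a list of tokens.
sentences = [["the", "dog", "barks"],
             ["the", "dog", "sleeps"],
             ["a", "cat", "sleeps"]]

word = Counter(w for s in sentences for w in s)
pair = Counter((s[i], s[j])               # ordered within-sentence pairs
               for s in sentences
               for i in range(len(s))
               for j in range(i + 1, len(s)))
n_words = sum(word.values())
n_pairs = sum(pair.values())

def pmi(x, y):
    """log p(x, y) / (p(x) p(y)); positive when x and y co-occur more
    often than independent unigram frequencies predict."""
    p_xy = pair[(x, y)] / n_pairs
    return math.log(p_xy / ((word[x] / n_words) * (word[y] / n_words)))
```

On this corpus, `pmi("the", "dog")` comes out higher than `pmi("dog", "sleeps")`; the question the talk raises is whether a tree built over such scores actually matches the compositional parse.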

Fieldwork Lab Meeting, 10/22 — Zoë Belk

At this week’s Fieldwork Lab Meeting (Thursday, October 22 at 4:00pm), Zoë Belk will present work titled “Loss of case and gender in two generations: Contemporary Hasidic Yiddish worldwide”. Zoë is a postdoctoral research associate in the Department of Linguistics at University College London.

All are welcome! If you would like to attend and are not currently on the mailing list, please contact Carol-Rose Little.

Abstract:
Standard and pre-Second World War varieties of Yiddish exhibit a robust system of morphological case and gender marking on full DPs. However, as a result of the Holocaust, Yiddish underwent a catastrophic loss of speakers and disruption to the geographical communities that spoke it. Today, it is spoken by approximately 750,000 Hasidic (strictly Orthodox) Jews worldwide (Biale et al. 2018). In this talk, I will present the findings of our ongoing fieldwork into contemporary Hasidic Yiddish, which so far covers approximately 40 speakers in four countries (the US, Israel, Canada and the United Kingdom). I will demonstrate that, within two generations of the Holocaust, Hasidic Yiddish underwent a complete loss of morphological case and gender. I will discuss a number of factors that contributed to this significant development in the language and provide some comparison to minority German dialects of North America to argue that contemporary Hasidic Yiddish presents a very rare opportunity to study such rapid and pervasive language change.

MCQLL, 10/7 — Mika Braginsky

At this week’s MCQLL meeting (Wednesday, October 7th, 1:30-2:30pm), Mika Braginsky, a graduate student in Brain and Cognitive Sciences at MIT, will discuss their work investigating linguistic productivity and child language acquisition. Talk abstract is below.

If you would like to attend and have not already signed up for the MCQLL mailing list, please fill out this google form to do so.

Talk: In learning morphology, do children generalize from their vocabularies on an item-by-item basis, or do they form global rules on a developmental timetable? We use large-scale parent-report data to address this question by investigating relations among morphological development, vocabulary growth, and age. For three languages, we examine irregular verbs (e.g. go) and predict children’s correct inflection (went) and overregularization (goed/wented). Morphology knowledge relates strongly to vocabulary, more so than to age. Further, this relation is modulated by age: for two children with the same vocabulary size, the older is more likely to correctly inflect and overregularize, and the effect of vocabulary on morphology decreases with age. Lastly, correct inflection and overregularization rates rise in tandem over age, and vocabulary effects on them are correlated across items. Our findings support the view that morphology learning is strongly coupled to lexical learning and that correct inflection and overregularization are related, verb-specific processes.

MCQLL, 9/30 — Maya Watt

At this week’s MCQLL meeting (Wednesday, September 30th, 1:30-2:30) Maya Watt will be presenting her research on the rates of over-irregularization of English past-tense verbs. See below for the talk abstract and Maya’s bio.

If you would like to attend and have not already signed up for the MCQLL mailing list, please fill out this google form.

TALK: In her talk, Maya will discuss the rates of over-irregularization of English past-tense verbs (i.e. believing the past tense of snow is snew instead of snowed). Such mistakes rarely happen in natural speech, so very little is known about the nuances of over-irregularization — do people tend to over-irregularize verbs of a particular inflectional class, or do the rates stay fairly similar? Because capturing an instance of over-irregularization in natural speech is difficult, we decided to collect our data by implementing a lexical decision task (LDT) and running it on Mechanical Turk. The assumption is that highly natural over-irregularized non-words (e.g. brang) will take longer to be judged as non-words than other, less natural non-words (e.g. screamt). The goal of this project is to provide some data and insight into language learning and productive morphology.
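The analysis logic is simple to sketch (the reaction times below are invented for illustration, not data from the study): if naturally over-irregularized non-words behave more like real words, their mean rejection time should exceed that of less natural non-words.

```python
# Hypothetical lexical-decision reaction times, in milliseconds.
rts_natural = [712, 698, 755, 731]   # e.g. "brang"-type non-words
rts_other = [601, 633, 590, 612]     # e.g. "screamt"-type non-words

def mean(xs):
    return sum(xs) / len(xs)

# Positive slowdown = natural over-irregularizations are harder to reject.
slowdown = mean(rts_natural) - mean(rts_other)
```

In a real analysis one would fit a mixed-effects model over items and participants rather than compare raw means, but the quantity of interest is this difference in rejection times.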

BIO: Maya is an undergraduate student in Linguistics and Computer Science. She’s interested in syntax, logic, and formal linguistics. Her research interests lie in the intersection of natural language and mathematics.

MCQLL, 9/23 — Emily Goodwin

This week at MCQLL (Wednesday, 1:30-2:30), Emily Goodwin will present her ongoing work on systematic syntactic parsing. Abstract and bio are below. If you would like to join the mailing list and/or attend the meeting, please fill out this google form as soon as possible.

ABSTRACT:
Recent work in semantic parsing, including novel datasets like SCAN (Lake and Baroni, 2018) and CFQ (Keysers et al., 2020), demonstrates that semantic parsers generalize well when tested on items highly similar to those in the training set, but struggle with syntactic structures that combine components of training items in novel ways. This indicates a lack of systematicity, the principle that individual words will make similar contributions to the expressions they appear in, independently of surrounding context. Applying this principle to syntactic parsing, we show that similar problems plague state-of-the-art syntactic parsers, despite their achieving human or near-human performance on randomly sampled test data. Moreover, generalization is especially poor on syntactic relations which are crucial for compositional semantics.

BIO:
Emily is an M.A. student in the McGill linguistics department, supervised by Profs. Timothy J. O’Donnell and Siva Reddy, and by Dzmitry Bahdanau of ElementAI. She is interested in compositionality and systematic generalization in meaning representation.

MCQLL, 9/16 — Lightning Talks

As last week’s meeting was cancelled to make it easier for people to participate in the scholar strike, this week’s MCQLL meeting (1:30pm on Wednesday, September 16th) will be lightning talks by returning MCQLL lab members (that would have normally taken place last week). This will serve as an introduction to the type of work done at MCQLL, as well as provide a space to ask questions about our research and the lab in general.

Please make sure to register here beforehand so that you can get the meeting link. If you already registered last week, then there is no need to register again, just join with the link you got in your registration confirmation email.

MCQLL, 9/9 — Lightning Talks

At this week’s MCQLL meeting (1:30pm on Wednesday, September 9th), there will be a series of lightning talks by returning MCQLL lab members. This will serve as an introduction to the type of work done at MCQLL, as well as provide a space to ask questions about our research and the lab in general.

Please make sure to register here beforehand so that you can get the meeting link.

The (tentative) meeting agenda is as follows:

  1. Announcements
  2. Lightning Talks
    • Clarifying Questions here are fine, but please hold all discussion questions until the Q & A session.
  3. Q & A Session, including:
    • Discussion questions relating to the talks
    • Questions relating to the lab in general

Please don’t hesitate to reach out if you have any questions, comments, or concerns.

MCQLL meeting, 6/3 — Timothy J. O’Donnell

This week at the Montreal Computational and Quantitative Linguistics Lab meeting, Timothy O’Donnell will be presenting his Meditations on Compositional Structure, to make up for last week’s postponement. This presentation attempts to synthesize several threads of work in a broader framework. We meet at 2:30 via Zoom (if you are not on the MCQLL emailing list, please contact Emily Goodwin at emily.goodwin@mail.mcgill.ca for the meeting link).


MCQLL meeting, 5/13 — Bing’er Jiang 

The next meeting of the Montreal Computational and Quantitative Linguistics Lab will take place on Wednesday May 13th, at 2:30, via Zoom. Bing’er will present on Modelling Perceptual Effects of Phonology with Automatic Speech Recognition Systems. If you would like to participate but are not on the MCQLL or computational linguistics emailing list, contact emily.goodwin@mail.mcgill.ca for the Zoom link.

MCQLL meeting, 5/6 — Jacob Hoover

The next meeting of the Montreal Computational and Quantitative Linguistics Lab will take place on Wednesday May 6th at 2:30, via Zoom. Jacob Hoover will present an ongoing project on compositionality and predictability.

For abstract and more information see the MCQLL lab page. If you would like to participate but are not on the MCQLL or computational linguistics emailing list, contact emily.goodwin@mail.mcgill.ca for the Zoom link.

MCQLL meeting, 4/29 — Koustuv Sinha

The next meeting of the Montreal Computational and Quantitative Linguistics Lab will take place on Wednesday April 29th, at 2:30, via Zoom. Koustuv Sinha will present “Learning an Unreferenced Metric for Online Dialogue Evaluation” (ACL, 2020). For abstract and more information see the MCQLL lab page. If you would like to participate but are not on the MCQLL or computational linguistics emailing list, contact emily.goodwin@mail.mcgill.ca for the Zoom link.

MCQLL meeting, 4/22 — Spandana Gella

Spandana Gella, research scientist at Amazon AI, will present on “Robust Natural Language Processing with Multi-task Learning” at this week’s Montreal Computational and Quantitative Linguistics Lab meeting. We are meeting Wednesday, April 22nd, at 2:00 via Zoom (to be added to the MCQLL listserv, please contact Jacob Hoover at jacob.hoover@mail.mcgill.ca).
Abstract:

In recent years, we have seen major improvements on various natural language processing tasks. Despite their human-level performance on benchmark datasets, these models have been shown to be vulnerable to adversarial examples. They rely on spurious correlations that hold for the majority of examples, suffer under distribution shift, and fail on atypical or challenging test sets. Recent work has shown that large pre-trained models improve robustness to spurious associations in the training data. We observe that the superior performance of large pre-trained language models comes from their better generalization from the minority of training examples that resemble the challenging sets. Our study shows that multi-task learning with the right auxiliary tasks improves accuracy on adversarial examples without hurting in-distribution performance. We show that this holds for the multi-modal task of referring expression recognition and the text-only tasks of natural language inference and paraphrase identification.

MCQLL meeting, 4/1 — Guillaume Rabusseau

The next meeting of the Montreal Computational and Quantitative Linguistics Lab will take place on Wednesday April 1st, at 1:00, via Zoom (meeting ID: 912 324 021). This week, Guillaume Rabusseau will present on “Spectral Learning of Weighted Automata and Connections with Recurrent Neural Networks and Tensor Networks”.
Abstract:
Structured objects such as strings, trees, and graphs are ubiquitous in data science, but learning functions defined over such objects can be a tedious task. Weighted finite automata (WFAs) and recurrent neural networks (RNNs) are two powerful and flexible classes of models which can efficiently represent such functions. In this talk, Guillaume will introduce WFAs and the spectral learning algorithm before presenting surprising connections between WFAs, tensor networks, and recurrent neural networks.

Guillaume Rabusseau has been an assistant professor at Université de Montréal and at the Mila research institute since Fall 2018, and a Canada CIFAR AI (CCAI) chair holder since March 2019. His research interests lie at the intersection of theoretical computer science and machine learning, and his work revolves around exploring interconnections between tensors and machine learning and developing efficient learning methods for structured data relying on linear and multilinear algebra.
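For a flavour of the Hankel-matrix idea at the heart of spectral learning (a toy sketch, not the algorithm from the talk; the weight function below is invented), recall that for a function f over strings, the matrix H[p, s] = f(ps), indexed by prefixes p and suffixes s, has rank equal to the number of states of a minimal WFA computing f:

```python
import numpy as np

def f(x):
    """Toy weight function over strings: weight halves with each symbol."""
    return 0.5 ** len(x)

def hankel(f, prefixes, suffixes):
    """Finite sub-block of the Hankel matrix H[p, s] = f(p + s)."""
    return np.array([[f(p + s) for s in suffixes] for p in prefixes])

prefixes = ["", "a", "b", "ab"]
suffixes = ["", "a", "b"]
H = hankel(f, prefixes, suffixes)

# f factors as 0.5**len(p) * 0.5**len(s), so every row is a multiple of
# every other: H has rank 1, and a one-state WFA suffices. Spectral
# learning recovers WFA parameters from an SVD of such a sub-block.
rank = np.linalg.matrix_rank(H)
```

For a richer f the rank grows with the number of states needed, which is what makes the SVD of H a useful learning signal.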

MCQLL Meeting, 2/19 — Vanna Willerton

This week at MCQLL, Vanna Willerton will be discussing the (over)application of irregular inflection, and exploring how it can influence our understanding of morphological productivity. She will review existing studies, as well as a recent large-scale corpus study of child speech with Graham Adachi-Kriege, Shijie Wu, Ryan Cotterell, and Tim O’Donnell, and current work analyzing recent experimental results. As usual, we meet at 1:00 in room 117 of the McGill Linguistics building, and all are welcome!
