
Linguistics/CS Seminar, 3/28 — Fatemeh Asr

McGILL UNIVERSITY DEPARTMENT OF LINGUISTICS AND SCHOOL OF COMPUTER SCIENCE 

Speaker: Fatemeh Asr
Date & Time: Thursday, March 28, 2019 9:30am
Place: RPHYS 114
Title: Relations between words in a distributional space: A cognitive and computational perspective.

Abstract:

Word embeddings obtained from neural networks trained on large text corpora have become popular representations of word meaning in computational linguistics. In this talk, we first take a look at the different types of semantic relations between two words in a language and ask whether these relations can be identified with the help of popular embedding models such as Word2Vec and GloVe. I propose different measures to obtain the degree of paradigmatic similarity vs. syntagmatic relatedness between two words. To evaluate these measures, we use two datasets obtained from experiments on human subjects: SimLex-999 (Hill et al., 2015), which contains explicitly instructed ratings of word similarity, and the explicitly instructed production norms of Jouravlev & McRae (2016) for word relatedness.
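As a toy illustration of the vector-space measures at issue, here is a minimal sketch of cosine similarity over word vectors. The 4-dimensional vectors and the words are invented for illustration; trained Word2Vec or GloVe embeddings would have hundreds of dimensions, and the talk's actual similarity-vs-relatedness measures are more elaborate than plain cosine.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 4-dimensional vectors standing in for trained embeddings.
vec = {
    "cat":   np.array([0.9, 0.8, 0.1, 0.0]),
    "dog":   np.array([0.8, 0.9, 0.2, 0.1]),
    "leash": np.array([0.2, 0.3, 0.9, 0.8]),
}

# Paradigmatic neighbours (substitutable words, like "cat"/"dog") tend to
# score higher under plain cosine than syntagmatic associates (co-occurring
# words, like "dog"/"leash") -- the distinction the talk's measures target.
print(cosine(vec["cat"], vec["dog"]))    # high
print(cosine(vec["dog"], vec["leash"]))  # lower
```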

In the second part of the talk, we look into the question of modeling the meaning of discourse connectives. Similarities between a pair of such particles, e.g., “but” and “although”, cannot be computed directly from surrounding words. I explain, however, that discourse connectives can also be viewed from a distributional-semantics perspective if a suitable abstraction of context is employed. For example, even subtle differences in the meaning of “but” and “although” can be revealed by studying their distribution in a corpus annotated with discourse relations. Finally, I sketch some future directions for research based on our findings and current developments in computational linguistics and natural language processing.

Linguistics/CS Seminar, 3/25 – Siva Reddy

McGILL UNIVERSITY DEPARTMENT OF LINGUISTICS AND SCHOOL OF COMPUTER SCIENCE 

Speaker: Siva Reddy
Date & Time: Monday, March 25, 2019 9:30am
Place: ARTS W-20
Title: Interacting with machines in natural language: A case for the interplay between linguistics and machine learning

Abstract:

Computing devices such as smartphones are ubiquitous, and smart home appliances, self-driving cars, and robots will be in the near future. Enabling these machines with natural language understanding abilities opens up opportunities for broader society, e.g., in accessing the world’s knowledge or in controlling complex machines with little effort.

In this talk, we will focus on the task of accessing knowledge stored in knowledge bases and text documents in a colloquial manner. First, we will see how brittle current models are when faced with compositional and conversational language. Then we will explore how linguistic knowledge and inductive biases on neural architectures can circumvent these problems.

The scientific questions we will address are 1) Are linguistically-informed models better than uninformed models? 2) How can inductive biases help machine learning? and 3) What are the challenges in enabling conversational interactions? For building linguistically-informed models, I will propose a novel syntax-semantics interface based on typed lambda calculus for converting dependency syntax into formal semantic representations.
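The syntax-semantics interface mentioned above rests on typed function application. A minimal sketch of that composition step follows; the lexicon, types, and dependency edge are invented for illustration and are not the talk's actual formal system.

```python
from typing import Callable

# Semantic types (standard typed-lambda-calculus assumptions):
# e = entities, modeled as str; t = truth values, modeled as bool;
# <e,t> = Callable[[str], bool], i.e., one-place predicates.

def apply(fn: Callable[[str], bool], arg: str) -> bool:
    """Function application: the composition rule by which a
    syntax-semantics interface builds sentence meanings from word meanings."""
    return fn(arg)

# Toy lexicon: 'sleeps' denotes the set of sleepers, as a characteristic function.
sleeps: Callable[[str], bool] = lambda x: x in {"Pat"}

# A dependency edge nsubj(sleeps, Pat) is interpreted as application:
print(apply(sleeps, "Pat"))  # True
print(apply(sleeps, "Sam"))  # False
```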

Bio:

Siva Reddy is a postdoc in the Computer Science Department at Stanford University, working with Chris Manning. His research goal is to understand universal semantic structures in languages and to build linguistically-informed machine learning models that enable natural language interaction between humans and machines. His research is supported by grants from Amazon and Facebook. Before his postdoc, he was a Google PhD Fellow at the University of Edinburgh, working with Mirella Lapata and Mark Steedman. His work experience includes an internship at Google and a research position at Sketch Engine.

Linguistics/CS Seminar, 3/20 – Frank Mollica

McGILL UNIVERSITY DEPARTMENT OF LINGUISTICS AND SCHOOL OF COMPUTER SCIENCE 

Speaker: Frank Mollica
Date & Time: Wednesday, March 20, 2019 9:30am  
Place: WILSON 105
Title: The Human Learning Machine: Computational Models of Lexical Acquisition

Abstract:

Language allows us to face novel concepts and situations by building structured mental representations of the world. The primary goal of my research program is to use computational models and behavioral experiments to understand how we construct and update these rich mental models both from experience (i.e., language acquisition) and from language (i.e., language processing). In this talk, I draw on methods in computational linguistics and computational cognitive science to propose a model of lexical acquisition formalized as logical program induction. First, I’ll illustrate how the model explains the systematic patterns of behavior observed in children as they acquire kinship words. Then, I will present a large-scale cross-cultural data analysis that infers how children use data from the timing of their lexical acquisition. Lastly, I will use children’s acquisition of exact number words as a case study to demonstrate how both of these models can be combined to learn about the universal and culturally-specific processes of the human learning machine. Taken together, this body of work provides the first computational model of how children learn relational word meanings, the first large-scale cross-linguistic model of children’s data usage during early word learning, and an innovative computational toolbox for leveraging large datasets and discipline knowledge to draw theoretical insights in child development.

Linguistics/CS Seminar 3/11 — Rachel Rudinger

McGILL UNIVERSITY DEPARTMENT OF LINGUISTICS AND SCHOOL OF COMPUTER SCIENCE

Speaker: Rachel Rudinger, Center for Language and Speech Processing, Johns Hopkins University
Date & Time: Monday, March 11, 2019 9:30am  
Place: ARTS W-20
Title: Natural Language Understanding for Events and Participants in Text

Abstract:

Consider the difference between the two sentences “Pat didn’t remember to water the plants” and “Pat didn’t remember that she had watered the plants.” Fluent English speakers recognize that the former sentence implies that Pat did not water the plants, while the latter sentence implies she did. This distinction is crucial to understanding the meaning of these sentences, yet it is one that automated natural language processing (NLP) systems struggle to make. In this talk, I will discuss my work on developing state-of-the-art NLP models that make essential inferences about events (e.g., a “watering” event) and participants (e.g., “Pat” and “the plants”) in natural language sentences. In particular, I will focus on two supervised NLP tasks that serve as core tests of language understanding: Event Factuality Prediction and Semantic Proto-Role Labeling. I will also discuss my work on unsupervised acquisition of common-sense knowledge from large natural language text corpora, and the concomitant challenge of detecting problematic social biases in NLP models trained on such data.
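The "remember to" vs. "remember that" contrast can be rendered as a toy rule, purely to make the inference explicit. This is not Rudinger's model (which is a learned statistical system); the function name and encoding are invented for illustration.

```python
# Hypothetical rule-based factuality for complements of 'remember':
# 'remember to VP' is an implicative -- negating it flips the inference;
# 'remember that S' is factive -- the complement holds regardless of negation.
def complement_factuality(negated: bool, comp_type: str) -> bool:
    """Did the complement event happen, given the matrix clause?"""
    if comp_type == "that":   # factive: presupposition survives negation
        return True
    if comp_type == "to":     # implicative: polarity tracks the matrix clause
        return not negated
    raise ValueError(f"unknown complement type: {comp_type}")

# "Pat didn't remember to water the plants" -> the watering did not happen.
print(complement_factuality(negated=True, comp_type="to"))    # False
# "Pat didn't remember that she had watered the plants" -> it did happen.
print(complement_factuality(negated=True, comp_type="that"))  # True
```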

Linguistics/CS Seminar 3/13 — Kyle Mahowald

McGILL UNIVERSITY DEPARTMENT OF LINGUISTICS AND SCHOOL OF COMPUTER SCIENCE

Speaker: Kyle Mahowald
Date & Time: Wednesday, March 13, 2019 9:30am  
Place: WILSON 105
Title: Cognitive and communicative pressures in natural language

Abstract:

There is enormous linguistic diversity within and across language families. But all languages must be efficient for their speakers’ needs and cognitively tractable for processing. Using ideas and techniques from computer science, we can generate hypotheses about what efficient languages should look like. Using large amounts of multilingual linguistic data, computational modeling, and online behavioral experiments, we can test these hypotheses and thereby explain phenomena observed across and within languages. In particular, I will focus on the lexicon and explore why languages have the words they do instead of some other set of words. First, consistent with predictions from Shannon’s information theory, languages are optimized such that the words that convey less information are a) shorter and b) easier to pronounce. For instance, word shortenings like chimpanzee -> chimp are more likely to occur when the context is predictive. Second, across 97 languages, phonotactically probable words are more likely to also have high token frequency. Third, applying these ideas about efficiency to syntax, I show that, across 37 languages, the syntactic distances between dependent words are minimized. I will conclude with a discussion of my work in experimental methods and my directions for future research.
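The information-theoretic logic behind the shortening claim can be made concrete: under Shannon's definition, a word's information content in context is its surprisal, −log₂ p(word | context). A minimal sketch, with in-context probabilities invented for illustration:

```python
import math

def surprisal(p: float) -> float:
    """Information content, in bits, of a word with in-context probability p."""
    return -math.log2(p)

# Hypothetical in-context probabilities for "chimpanzee": in a highly
# predictive context the word carries little information, so the clipped
# form "chimp" costs the listener little -- the predicted locus of shortening.
print(surprisal(0.5))   # predictive context: 1.0 bit
print(surprisal(0.01))  # neutral context: ~6.6 bits
```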

MCQLL Meeting Wednesday, 11/21

At this week’s MCQLL meeting, Bing’er Jiang will present Feldman et al.’s (2013) A Role for the Developing Lexicon in Phonetic Category Acquisition. Please find the abstract below:

Infants segment words from fluent speech during the same period when they are learning phonetic categories, yet accounts of phonetic category acquisition typically ignore information about the words in which sounds appear. We use a Bayesian model to illustrate how feedback from segmented words might constrain phonetic category learning by providing information about which sounds occur together in words. Simulations demonstrate that word-level information can successfully disambiguate overlapping English vowel categories. Learning patterns in the model are shown to parallel human behavior from artificial language learning tasks. These findings point to a central role for the developing lexicon in phonetic category acquisition and provide a framework for incorporating top-down constraints into models of category learning.
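A toy version of the disambiguation idea in the abstract, reduced to one acoustic dimension and two hypothetical vowel categories. This is not Feldman et al.'s actual model (which jointly learns the lexicon and the categories); the means, priors, and the word frame are invented to show how word-level feedback resolves acoustically ambiguous tokens.

```python
import math

def gauss(x, mu, sigma):
    """Density of a 1-D Gaussian phonetic category."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Two overlapping vowel categories on one acoustic dimension (invented values).
categories = {"i": (2.0, 1.0), "I": (3.0, 1.0)}

token = 2.5  # acoustically halfway between the two category means

# Acoustics alone cannot decide: both likelihoods are equal here.
likelihood = {c: gauss(token, mu, sd) for c, (mu, sd) in categories.items()}

# Word-level feedback: suppose the segmented word frame containing this
# token occurs with /I/ far more often in the developing lexicon.
prior = {"i": 0.1, "I": 0.9}

# Bayes' rule: the lexicon resolves what the acoustics leave ambiguous.
posterior = {c: prior[c] * likelihood[c] for c in categories}
total = sum(posterior.values())
posterior = {c: p / total for c, p in posterior.items()}
print(posterior)  # /I/ now strongly favored
```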

 

We will be meeting Wednesday November 21 at 5:00pm in room 117. Food will be provided. See you then!

Kyle Gorman Visit

Kyle Gorman from Google AI and CUNY will be visiting the Department the week of November 12th. He will be giving a talk from 15:30 to 17:00 on Monday in Room 117, 1085 Dr. Penfield (title and abstract will be sent out soon), and a tutorial on Pynini, a Python library he developed for weighted finite-state grammar compilation, on Wednesday from 12:00 to 15:00 in Ferrier room 230.

(Talk, Monday)
Grammar engineering in text-to-speech synthesis
Many speech and language applications, including speech recognition and speech synthesis, require mappings between “written” and “spoken” representations of language. Despite substantial progress in applied machine learning, it is still the case that real-world industrial text-to-speech (TTS) synthesis systems largely depend on language-specific hand-written rules for these conversions. These may require a great deal of development effort and linguistic sophistication, and as such represent substantial barriers for quality control and internationalization. 
I first consider the case of number names, where the goal is to map written forms like 328 to three hundred twenty eight. I propose two computational models for learning this mapping. The first uses end-to-end recurrent neural networks. The second, inspired by prior literature on cross-linguistic variation in number naming, uses an induction strategy based on finite-state transducers. While both models achieve near-perfect performance, the latter model is trained using several orders of magnitude less data, making it particularly useful for low-resource languages. The latter model is being used at Google to produce number grammars for dozens of languages and locales.
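To fix intuitions about what such a number grammar encodes, here is a hand-written English sketch for 0–999. It is the sort of language-specific rule system the talk's models learn from data, not either of the proposed models, and the function name is my own.

```python
ONES = ("zero one two three four five six seven eight nine ten eleven "
        "twelve thirteen fourteen fifteen sixteen seventeen eighteen "
        "nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()

def number_name(n: int) -> str:
    """Spell out 0-999 in English with hand-written rules."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens - 2] + (" " + ONES[rest] if rest else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" " + number_name(rest) if rest else "")

print(number_name(328))  # three hundred twenty eight
```

Cross-linguistic variation (e.g., French quatre-vingt-dix for 90) is exactly why such rules are costly to hand-write for every locale, motivating the induction approach.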
I then consider the case of grapheme-to-phoneme conversion, where the task is to map written words onto their phonemic transcriptions. I describe a model in which the grammar engineering is performed by providing input and output vocabularies; in Spanish for instance, the input vocabulary includes digraphs like ll and rr, which denote single phonemes, and for Japanese kana, the output vocabulary includes entire syllables. This grammatical information, incorporated into a finite-state generative model, results in a significant improvement over a baseline system which lacks direct access to such information.
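The role of the input vocabulary can be illustrated with a greedy longest-match tokenizer over a tiny Spanish-like grapheme inventory. The rule table and phone symbols are invented and heavily simplified (real Spanish grapheme-to-phoneme mapping is context-dependent); the point is only that listing digraphs like ll and rr lets them map to single phonemes.

```python
# Digraphs are listed first so greedy matching prefers them over single letters.
RULES = [("ch", "tʃ"), ("ll", "ʝ"), ("rr", "r"), ("qu", "k"),
         ("a", "a"), ("e", "e"), ("i", "i"), ("o", "o"), ("u", "u"),
         ("c", "k"), ("l", "l"), ("p", "p"), ("r", "ɾ"), ("t", "t")]

def g2p(word: str) -> str:
    """Map a written word to a phone string by greedy longest-match rules."""
    phones, i = [], 0
    while i < len(word):
        for graph, phone in RULES:
            if word.startswith(graph, i):
                phones.append(phone)
                i += len(graph)
                break
        else:
            raise ValueError(f"no rule for {word[i]!r}")
    return "".join(phones)

print(g2p("calle"))  # kaʝe
print(g2p("perro"))  # pero
```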
 
(Tutorial, Wednesday)
Pynini: Finite-state grammar development in Python
Finite-state transducers are abstract computational models of relations between sets of strings, widely used in speech and language technologies and studied as computational models of morphophonology. In this tutorial, I will introduce the finite-state transducer formalism and Pynini (Gorman 2016; http://pynini.opengrm.org), a Python library for compiling and processing finite-state grammars. In the first part of the tutorial, we will cover the finite-state formalism in detail. In the second part, we will install the Pynini library and survey its basic functionality. In the third, we will tackle case studies including Finnish vowel harmony rules and decoding ambiguous text messages. Participants are assumed to be familiar with the Python programming language, but I do not assume any experience with finite-state methods or natural language processing.

Note to participants: You are encouraged to bring a working laptop. We will reserve some time to install the necessary libraries so that you can follow along and participate in a few select exercises. This software has been tested on Linux, Mac OS X (with an up-to-date version of XCode), and Windows 10 (with the Ubuntu flavor of Windows Subsystem for Linux). In case you wish to get a head start, installation instructions are available here: http://wellformedness.com/courses/PyniniTutorial/installation-instructions.html
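As a preview of the vowel-harmony case study, here is a pure-Python sketch of the rewrite a transducer would perform (in Pynini one would compile such a rule as a context-dependent rewrite; the sketch below uses no Pynini). The suffix, vowel sets, and simplification of Finnish harmony to the last harmonic stem vowel are my own illustrative assumptions.

```python
FRONT, BACK = set("äöy"), set("aou")

def harmonize(stem: str, suffix: str = "ssA") -> str:
    """Finnish-style inessive: the archiphoneme A surfaces as 'a' after
    back-vowel stems and as 'ä' otherwise (a simplification of the real rule;
    neutral vowels e/i are skipped when scanning the stem)."""
    last = next((ch for ch in reversed(stem) if ch in FRONT | BACK), None)
    vowel = "a" if last in BACK else "ä"
    return stem + suffix.replace("A", vowel)

print(harmonize("talo"))  # talossa ('in the house')
print(harmonize("kylä"))  # kylässä ('in the village')
```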

Special talk, 10/23 – David Barner

Speaker: Dr. David Barner, UCSD
Place: Room 461, 2001 McGill College
Title: Linguistic origins of uniquely human abstract concepts
Abstract: Humans have a unique ability to organize experience via formal systems for measuring time, space, and number. Many such concepts – like minute, meter, or liter – rely on arbitrary divisions of phenomena using a system of exact numerical quantification, which first emerges in development in the form of number words (e.g., one, two, three, etc.). Critically, large exact numerical representations like “57” are neither universal among humans nor easy to acquire in childhood, raising significant questions as to their cognitive origins, both developmentally and in human cultural history. In this talk, I explore one significant source of such representations: natural language. In Part 1, I draw on evidence from six language groups, including French/English and Spanish/English bilinguals, to argue that children learn small number words using the same linguistic representations that support learning singular, dual, and plural representations in many of the world’s languages. For example, I will argue that children’s initial meaning for the word “one” is not unlike their meaning for “a”. In Part 2, I investigate the idea that the logic of counting – and the intuition that numbers are infinite – also arises from a foundational property of language: recursion. In particular, I will present a series of new studies from Cantonese, Hindi, Gujarati, English, and Slovenian. Some of these languages – like Cantonese and Slovenian – exhibit relatively transparent morphological rules in their counting systems, which may allow children to readily infer that number words – and therefore numbers – can be freely generated from rules, and therefore are infinite. Other languages, like Hindi and Gujarati, have highly opaque counting systems, and may make it harder for children to infer such rules. I conclude that the fundamental logical properties that support learning mathematics can also be found in natural language.
I end by speculating about why number words are so difficult for children to acquire, and also about why not all human cultures historically constructed count systems.
Bio: Dr. Barner’s research program engages three fundamental problems that confront the cognitive sciences. The first problem is how we can explain the acquisition of concepts that do not transparently reflect properties of the physical world, whether these express time, number, or logical content found in language. What are the first assumptions that children make about such words when they hear them in language, and what kinds of evidence do they use to decode their meanings? Second, he is interested in how linguistic structure affects learning, and whether grammatical differences between languages cause differences in conceptual development. Are there concepts that are easier to learn in some languages than in others? Or do cross-linguistic differences have little effect on the rate at which concepts emerge in language development? Dr. Barner pursues these questions with a cross-linguistic and cross-cultural developmental approach informed by methods in both psychology and linguistics, studying children learning Cantonese, Mandarin, Japanese, Hindi, Gujarati, Arabic, Slovenian, Spanish, French, and English, among others.

McGill at NELS 49

The 49th meeting of the Northeast Linguistics Society (NELS 49) took place 5-7 October at Cornell. The following papers and posters were presented by current McGillians.

Number inflection, Spanish Bare Interrogatives, and Higher-Order Quantification
Luis Alonso-Ovalle and Vincent Rouillard

Feet are parametric – even in languages with stress
Guilherme D. Garcia and Heather Goad

Control-Forming Domains are Not Only Phases: Evidence for Probe Horizons
Jurij Božič (poster)

Domain restriction and noun classifiers in Chuj (Mayan)
Justin Royer (poster)

McGill affiliates of present and past gathered for a photo:

Carol-Rose Little (BA hon 2012), Luis Alonso-Ovalle, Vincent Rouillard (BA hon 2017), Mark Baker (McGill prof 1986-1998), Nico Baier, Justin Royer, Heather Goad

MCQLL Meeting October, 10/3

This week, MCQLL will be meeting Wednesday from 5:30pm to 7:30pm in room 117. Greg Theos will present his work on analyzing data from lexical decision tasks.

Symposium in Honour of Lydia White

The department hosted a reunion of some of Lydia White’s students at Thomson House, August 31st–September 1st. Lydia officially retired August 31st but will continue doing research. Congratulations, Lydia!

Symposium on Second Language Acquisition in Honour of Lydia White

We are pleased to announce that the Department of Linguistics will be hosting the Symposium on Second Language Acquisition in Honour of Lydia White, August 31–September 1, 2018. Everyone is invited to attend. You can find the program here.

We gratefully acknowledge the support of our McGill sponsors: Provost’s Research Fund, Dean of Arts’ Development Fund, as well as the Department of Linguistics.

Alonso-Ovalle, Shimoyama, and Schwarz Awarded Insight Grant

Congratulations to Luis Alonso-Ovalle, Junko Shimoyama, and Bernhard Schwarz who have been awarded an SSHRC Insight grant for their application Modality across Categories: Modal Indefinites and the Projection of Possibilities!

Semantics Reading Group, Friday April 6th

Bernhard Schwarz and Mathieu Paillé will be giving a practice talk for WCCFL, on the subject of wh-complements with ‘know’. We will be meeting on Friday, April 6th at 3pm in room 117.

Colloquium: Susana Béjar, 23/02

Susana Béjar from the University of Toronto will be giving a talk entitled “Person, Agree, and Derived Predicates” as part of the McGill Linguistics Colloquium Series on Friday, February 23rd at 3:30pm in room 433 of the Education Building. All are welcome to attend! For the abstract and any other colloquium information, please visit the Colloquium Series web page: https://www.mcgill.ca/linguistics/events/colloquium-series.

Goodhue published in Semantics & Pragmatics, and Goodhue & Wagner published in Glossa

Daniel Goodhue’s paper “Must p is felicitous only if p is not known” has recently been published in Semantics & Pragmatics.
Daniel Goodhue and Michael Wagner’s paper “Intonation, yes and no” has recently been published in Glossa. Both papers are open access.
Congratulations to both!

Jessica Coon Receives National Geographic Explorers Grant

Jessica Coon received a National Geographic Explorers Grant to fund research and documentation on Ch’ol (Mayan) during her time in Mexico this year. The title of her project is “Documenting word order variation in Mayan languages: A collection of Ch’ol narratives.” The project will involve training workshops on language documentation in several Ch’ol communities in collaboration with Ch’ol-speaking linguists; recording, transcription and publication of Ch’ol narratives; and analysis of word order variation.

Linguists at Arts Undergraduate Research Event

Linguistics undergraduates presented the results of their summer work at the Arts Annual Undergraduate Research Event, January 18th. The five students who won summer internships to conduct research with linguistics faculty members in 2017 were:

“Documentation and Revitalization of the Chuj Language”
Paulina Elias, Linguistics
Prof. Jessica Coon, Linguistics
Paulina Elias [.pdf]

“Perceptual Discrimination of /s/ in Hearing Impaired Children”
Fiona Higgins, Linguistics
Prof. Heather Goad, Linguistics
Fiona Higgins [.pdf]

“Understanding high adverbs in Malagasy and the nature of clefts”
Clea Stuart, Linguistics
Prof. Lisa Travis, Linguistics
Clea Stuart [.pdf]

“How does structured variability help talker adaption?”
Claire Suh, Linguistics
Prof. Meghan Clayards, Linguistics
Claire Suh [.pdf]

“Syntactic Representation and Processing in L2 Acquisition”
Yunxiao (Vera) Xia, Linguistics
Prof. Lydia White, Linguistics
Yunxiao (Vera) Xia [.pdf]

Colloquium: Sharon Goldwater, 01/12

Sharon Goldwater from the University of Edinburgh will be giving a talk entitled Bootstrapping Language Acquisition as part of the McGill Linguistics Colloquium Series on Friday, January 12th at 3:30pm in room 433 of the Education Building. All are welcome to attend! For the abstract and any other colloquium information, please click here to visit the Colloquium Series web page.

McGill at the 21st Amsterdam Colloquium

The biennial Amsterdam Colloquium (http://events.illc.uva.nl/AC/AC2017/) brings together linguists, philosophers, logicians, cognitive scientists and computer scientists who share an interest in the formal study of the semantics and pragmatics of natural and formal languages. The 2017 edition, which took place December 20-22, featured four presentations by current and former Department members:
  1. Brian Buccola (PhD 2015, http://brianbuccola.com/) (with Andreas Haida): “Expressing agent indifference in German”
  2. Mitcho Erlewine (Postdoc 2014-15; https://mitcho.com/) (with Hadas Kotek, Postdoc 2014-15, http://hkotek.com/): “Intervention tracks scope-rigidity in Japanese”
  3. Bernhard Schwarz: “On question exhaustivity and NPI licensing”
  4. Alexandra (Sasha) Simonenko (PhD 2014, http://people.linguistics.mcgill.ca/~alexandra.simonenko/): “Towards a semantic typology of specificity markers”
Group picture, from left to right: Bernhard, Mitcho, Sasha, Brian