Has Deep Learning Uncovered Chomsky’s Deep Structure?
November 27, 2016
After Trump’s shocking victory, many of our professors began class with an opportunity for us to voice any fears or feelings we were harboring. One of my professors spoke about how studying linguistics is a way to study what unites us as humans: this strange ability called “language.” Despite all of our languages looking and sounding different, all humans have this amazing ability to learn complex rules and thousands of words in our first few years of existence. Moreover, we do this without being prodded to learn and without much explicit instruction. Language is something that should, at its core, unite us, not divide us.
Earlier this week, Google Research announced a breakthrough in “zero-shot” machine translation. What this means is that Google Translate can now perform translations on unseen pairs of languages. Typically, a machine translation algorithm needs to be trained on each language pair, e.g. English <–> French and French <–> Spanish. But Google’s latest system can perform translations from, e.g., English <–> Korean by only being trained on pairs of other languages (see Google’s visual representation below). In essence, they are only training the machine on the “gist” of language or language relationships, rather than a specific pairing.
The Google team calls this “interlingua.” For linguists, this underlying abstract form has been the basis of their field since Chomsky’s earliest writings. “Deep Structure,” or D-structure, is distinct from “Surface Structure,” or S-structure: where Deep Structure is something like the Platonic form, the S-structure is the concrete realization in the phonetic sounds of a sentence. For example, the sentences “I love New York” and “New York is loved by me” both have essentially the same meaning. According to Chomsky, the D-structure of both of these sentences is the same, and the deep structure is transformed in different ways en route to the different respective surface realizations.
The field of generative syntax has been primarily concerned with elucidating the rules and constraints that each and all languages undergo during this transformational process. If we can unwind these transformations, peeling back layer upon layer of surface structure, then we can uncover the deep structure underlying all of language.
And now, it’s my turn to be speculative: For the last 20 years, computational linguists have been trying to apply the rules and constraints of generative syntax to the computational tasks of natural language understanding and translation. However, rules-based accounts have been less successful than the more flexible probability-based algorithms. The result has been that many “language engineers” have become dismissive of the rules-based Chomskian community.
But if we (speculatively) assume that Google’s algorithms have uncovered an underlying interlingua, then perhaps this means that Chomsky’s notion of D-structure has been right all along, and we’ve just been going about the process of uncovering it in the wrong way. Whereas generative syntacticians base most of their findings on patterns in a single language or single collection of languages, maybe the real findings lie in the space between languages, the glue that binds it all together.
Of course, the findings of many deep learning-based systems are notoriously difficult to tease apart, so we don’t really know what the features of this possible interlingua look like. While this is frustrating, I suppose it also means there is still plenty of work left for a budding computational linguist. And if we can start to elucidate the ties that linguistically bind us, maybe we can elucidate the ties that bind humanity, as well.
Sound Discrimination vs. Meaningful Sound Discrimination, and Phonological Neighborhoods
December 6, 2015
This is a continuation of a series of posts exploring the process of relearning language and sound processing with my new hearing implants, Auditory Brainstem Implants. The first two posts can be found here and here. Although it’s difficult to distill my experiences down to a single theme, I am slowly realizing that a vast amount of understanding speech comes down to discriminating usefully between phonemes.
What is a phoneme, you might ask?
Great question! A phoneme is one of the most basic units of sound within phonology. The word red, for instance, consists of three distinct phonemes: /ɹ/, /ɛ/ and /d/. However, there is not always a one-to-one correspondence of letter to phoneme. For example, the word through also consists of only three phonemes, corresponding to th, r, and oo. (/θ/, /ɹ/ and /u/ in IPA).
At the most basic level, we can discriminate between two different words if there is at least one different phoneme. When only one phoneme differentiates the pronunciation of two different words, these words are known as a minimal pair. The words knit (/nɪt/) and gnat (/næt/) are a minimal pair, because they only differ by one phoneme, the middle vowel.
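To make the definition concrete, here is a minimal sketch in Python. It assumes each word has already been transcribed as a list of phoneme symbols; the function name `is_minimal_pair` and the transcriptions are just for illustration, not part of any standard library.

```python
def is_minimal_pair(a, b):
    """Return True if two phoneme sequences have the same length
    and differ in exactly one position."""
    if len(a) != len(b):
        return False
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences == 1


# Broad transcriptions, written out by hand (an assumption of this sketch).
knit = ["n", "ɪ", "t"]
gnat = ["n", "æ", "t"]
red = ["ɹ", "ɛ", "d"]

print(is_minimal_pair(knit, gnat))  # the two words differ only in the vowel
print(is_minimal_pair(knit, red))   # these differ in every position
```

Note that the comparison works on phonemes, not letters: knit and gnat share no spelling pattern, but their transcriptions line up position by position.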
But each phoneme can actually have different variations, called allophones. For example (stolen straight from Wikipedia), the /p/ phoneme is actually pronounced differently in pin ([pʰɪn]) versus spin ([spɪn]). Most native speakers are unaware of these variations in pronunciation, and if a different allophone is used for the same phoneme, the word will probably still be understandable, but just sound “weird.” Two different words will always differ by at least one phoneme, not by just one allophone. For the sake of this post, I’ll call discriminating between allophones “non-useful sound discrimination.”
Useful Sound Discrimination
If some sound discrimination really isn’t all that useful, then what is useful? The ability to discriminate between the phonemes that distinguish words with a high neighborhood density. And what is neighborhood density? From a recent paper by Susanne Gahl and colleagues: “two words are considered neighbors if they differ by deletion, insertion, or substitution of one segment” (Gahl, et al. 2012). A word’s neighborhood density is simply how many such neighbors it has. For instance, the word tad has a bunch of phonological neighbors, such as rad, fad, dad, toad and add. The word osteoporosis, on the other hand, has no phonological neighbors.
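The neighbor definition quoted above can be sketched in a few lines of Python. This is only an illustration: the tiny lexicon, its hand-written broad transcriptions (with the vowel in toad treated as a single segment), and the function names are all assumptions of the sketch, not anything taken from the Gahl paper.

```python
def are_neighbors(a, b):
    """True if phoneme sequences a and b differ by exactly one
    substitution, insertion, or deletion of a segment."""
    if a == b:
        return False
    if len(a) == len(b):
        # Substitution: same length, exactly one position differs.
        return sum(1 for x, y in zip(a, b) if x != y) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    # Insertion/deletion: removing one segment from the longer
    # sequence must give the shorter one.
    shorter, longer = sorted((a, b), key=len)
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


def neighbors(word, lexicon):
    """List every other word in the lexicon that is a neighbor of `word`."""
    target = lexicon[word]
    return [w for w, phones in lexicon.items()
            if w != word and are_neighbors(target, phones)]


# Toy lexicon with hand-written transcriptions (hypothetical data).
LEXICON = {
    "tad": ["t", "æ", "d"],
    "rad": ["ɹ", "æ", "d"],
    "fad": ["f", "æ", "d"],
    "dad": ["d", "æ", "d"],
    "toad": ["t", "oʊ", "d"],
    "add": ["æ", "d"],
    "zoo": ["z", "u"],
    "zoom": ["z", "u", "m"],
}

print(sorted(neighbors("tad", LEXICON)))
print(len(neighbors("tad", LEXICON)))  # neighborhood density within this toy lexicon
```

Density here is counted only against the toy lexicon; a real estimate would need a full pronunciation dictionary.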
For me, the important thing is to relearn how to discriminate between phonemes that often live in the same phonological neighborhood. This is something that normal hearing individuals do effortlessly, and our very sophisticated auditory system is an expert at differentiating between these different frequencies in a sound signal.
For my limited auditory system, consisting of an ABI that replaces tens of thousands of hair cells with a few dozen electrodes, this discrimination is a nontrivial task. This hit home for me during a therapy session in which I could not, for the life of me, differentiate between the sounds /oo/ and /mm/. For my ABI, both of these sounds activated the exact same electrode pattern.
When I am practicing phoneme discrimination, my therapist covers his mouth, so I cannot also use lipreading. When I can use lipreading, discriminating between /oo/ and /mm/ becomes easy. Moreover, /oo/ and /mm/ rarely are phonological neighbors. That is, there are very few words where /oo/ could be replaced with /mm/, and this would result in a different, intelligible word. The only exception might be an addition/deletion, such as zoo and zoom. Nonetheless, zoo and zoom are not contextual neighbors, i.e. I cannot think of a sentence where zoo and zoom could be used to fill the same slot (The rhinoceros at the zoo stole my lollipop vs *The rhinoceros at the zoom stole my lollipop).
So, am I screwed?
Probably not. What’s remarkable about human communication and information transmission is that we find ways to adapt and filter out the most critical information. For instance, Esteban Buz and Florian Jaeger (Buz & Jaeger, 2012) found that context also plays a significant role in how much or how little we articulate or hyper-articulate certain words. And as long as you’re not a “low talker,” I should be fine.
Sound Processing to Sentence Processing
November 22, 2015
This is Part 2 of my series of articles chronicling the process of auditory rehabilitation therapy from the perspective of a linguist. Part 1, Learning Linguistics; Relearning to Hear, can be found here.
The past few weeks of therapy have provided a number of interesting experiences. I’m not sure I can tie them all into a cohesive narrative, so I’ll just focus on one area: sentence processing as distinct from sound processing.
Filling in the Blanks
I am fascinated by how the mind processes language. Further, I’m fascinated by how we build up a sentence, and go from a series of sound waves to building meaningful words.
A common exercise in rehabilitation therapy is for the therapist to say a sentence with their mouth covered (so I can’t get anything from lipreading) and have me (try to) repeat the sentence. But an interesting thing happened the other week. Let’s say the sentence was The teacher talked to the students. What I heard was:
___ ___ ___ ___ ___ students.
But once I heard/understood students I immediately could piece together the rest of what I had heard. If my therapist had stopped before students, I would have said that I hadn’t understood anything. But since I could piece together the sentence after I understood the final word, it means I clearly got something from the words I thought I completely missed. Clearly my mind was storing those sound patterns as “something.” Were they candidate words, each with an assigned probability weight? And once I understood students, did the probability weights cross a critical threshold and form a meaningful sentence?
This experience is distinct from a very similar experience I’ve had, which reflects the well-known concept of “priming.” In that case, a certain stimulus restricts the domain of possibilities, and influences subsequent responses. For example, on the first repetition of a sentence, I got:
___ ___ ___ a pie
So I knew the domain of the sentence was food, baking, etc. This made it much easier to get the other words when the sentence was repeated. Similarly, I’ve experienced a form of syntactic priming, where I understood the logical or functional structure of the sentence initially, mostly from prosodic cues. In other words, I got:
[SOMEBODY] [DID AN ACTION] [TO ANOTHER SOMEBODY]
Upon repetition of the sentence, I could restrict the domain of each word to a noun, a verb and another noun, respectively. Not that it isn’t super cool that our minds can do this, but it’s not quite as mystifying as the first example.
In the first example, all of the processing occurred “on the fly.” I didn’t need a second repetition to understand the sentence, but rather backfilled it using some sort of semantic representation of word-forms that I initially thought I had completely missed. But clearly some form of information was transmitted through those sounds.
All of this seems to be bundled under the rubric of sentence processing. I am fascinated by this process and the amazing things our minds naturally do, which my mind is currently relearning to do.
Stay tuned for Part 3, on relearning allophones.
Learning Linguistics; Relearning to Hear
October 15, 2015
This is the first in a series of posts centered around learning theoretical linguistics, specifically phonology, and learning how to use my new hearing implants through post-implantation rehabilitation therapy.
I would consider the nexus of these two events to be unique, in that I am experiencing linguistics from both sides: on the one hand learning about theoretical sound systems in the classroom, while at the same time learning a concrete application of those theories in therapy. And as much as I’m enjoying the experience, I also wanted to document it. Mostly for me, but I’m sure that other people will find it interesting, as well.
Not many people have the opportunity (or burden) to have to learn how to hear again, from the ground up, making sense of stimuli that were passively received and processed for the first 30 years of their life.
To provide some clarification, sound is now an incredibly non-precise stimulus for me, where I can tell that a sound is being produced but cannot discriminate between that sound and a similar sound. A good analogy is to imagine the PA announcements in the NYC subways in the 1980s/1990s. If you don’t know what I’m talking about, this video might help to clarify.
Essentially, all different sounds seem as if they’re coming through the most lo-fi speaker system available, which is what happens when you have to rely on organs other than our incredibly sophisticated ears to do the hearing.
I’ve already seen a few parallels between the two sides. A couple that stand out:
- The International Phonetic Alphabet (IPA) is used to represent and categorize the sounds of an oral language. In each of my therapy sessions, we tackle a new category of sounds, such as fricatives or plosives, and learn to discriminate between the primary individual sounds. I love that an abstract rubric from my textbook plays such an important role in re-learning how to hear.
- My therapist mentioned “minimal pairs” the day after we learned about them in class. I found minimal pairs so interesting because they are such an intuitive, but non-obvious part of categorizing sounds. It’s cool that these are so essential for hearing therapy, as well.
This post is just the kick off. I have very little idea what direction to take things from here. What questions do you have? What kinds of things should I document as I go through both processes? Feel free to leave questions in the comments section below.