Has Deep Learning Uncovered Chomsky’s Deep Structure?

After Trump’s shocking victory, many of our professors began class with an opportunity for us to voice any fears or feelings we were harboring. One of my professors spoke about how studying linguistics is a way to study what unites us as humans: this strange ability called “language.” Despite all of our languages looking and sounding different, all humans have this amazing ability to learn complex rules and thousands of words in our first few years of existence. Moreover, we do this without being prodded to learn and without much explicit instruction. Language is something that should, at its core, unite us, not divide us.

Earlier this week, Google Research announced a breakthrough in “one-shot” machine translation. What this means is that Google Translate can now perform translations on unseen pairs of languages. Typically, a machine translation algorithm needs to be trained on each language pair, e.g. English <–> French and French <–> Spanish. But Google’s latest results can perform translations from, e.g., English <–> Korean by only being trained on pairs of other languages (see Google’s visual representation below). In essence, they are only training the machine on the “gist” of language or language relationships, rather than a specific pairing.

The Google team calls this “interlingua.” For linguists, this underlying abstract form has been the basis of their field since Chomsky’s earliest writings. “Deep Structure” or D-structure, is distinct from “Surface Structure,” or S-structure; where Deep Structure is something like the Platonic form, the S-structure is the concrete realization in the phonetic sounds of a sentence. For example, the sentences I love New York and New York is loved by me both have essentially the same meaning. According to Chomsky, the D-structure of both of these sentences is the same, and the deep structure is transformed in different ways en route to the different respective surface realizations.

The field of generative syntax has been primarily concerned with elucidating the rules and constraints that each and all languages undergo during this transformational process. If we can unwind these transformations, peeling back layer upon layer of surface structure, then we can uncover the deep structure underlying all of language.

And now, it’s my turn to be speculative: For the last 20 years, computational linguists have been trying to apply the rules and constraints of generative syntax to the computational tasks of natural language understanding and translation. However, rules-based accounts have been less successful than the more flexible probability-based algorithms. The result has been that many “language engineers” have become dismissive of the rules-based Chomskian community.

But if we (speculatively) assume that Google’s algorithms have uncovered an underlying interlingua, then perhaps this means that Chomsky’s notion of D-structure has been right all along, we’ve just been going about the process of uncovering it in the wrong way. Whereas generative syntacticians base most of their findings on patterns in a single language or single collection of languages, maybe the real findings lie in the space between languages, the glue that binds it all together.

Of course, the findings of many deep learning-based systems are notoriously difficult to suss apart, so we don’t really know what the features of this possible interlingua look like. While this is frustrating, I suppose it also means there is still plenty of work left for a budding computational linguist. And if we can start to elucidate the ties that linguistically bind us, maybe we can elucidate the ties that bind humanity, as well.