Has Deep Learning Uncovered Chomsky’s Deep Structure?
November 27, 2016
After Trump’s shocking victory, many of our professors began class with an opportunity for us to voice any fears or feelings we were harboring. One of my professors spoke about how studying linguistics is a way to study what unites us as humans: this strange ability called “language.” Despite all of our languages looking and sounding different, all humans have this amazing ability to learn complex rules and thousands of words in our first few years of existence. Moreover, we do this without being prodded to learn and without much explicit instruction. Language is something that should, at its core, unite us, not divide us.
Earlier this week, Google Research announced a breakthrough in “zero-shot” machine translation. What this means is that Google Translate can now translate between pairs of languages it was never explicitly trained on. Typically, a machine translation system needs to be trained on each language pair, e.g. English <–> French and French <–> Spanish. But Google’s latest system can translate, e.g., Korean <–> Japanese after being trained only on Korean <–> English and Japanese <–> English (see Google’s visual representation below). In essence, because a single model is shared across all of the language pairs, it seems to learn the “gist” of language and of the relationships between languages, rather than a specific pairing.
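To make the idea of an “unseen pair” concrete, here is a minimal Python sketch. It assumes the setup described in Google’s accompanying paper (Johnson et al., 2016), where one shared model is trained on many language pairs and an artificial token such as `<2ko>` at the start of the source sentence tells the model which language to produce. The training directions (mirroring Google’s Japanese/Korean/English example) and the helper names are hypothetical, for illustration only.

```python
from itertools import permutations

# Hypothetical training directions: Japanese<->English and Korean<->English.
TRAINED = {("ja", "en"), ("en", "ja"), ("ko", "en"), ("en", "ko")}

def add_target_token(sentence: str, target_lang: str) -> str:
    """Prepend an artificial token telling a shared model which language to emit."""
    return f"<2{target_lang}> {sentence}"

def zero_shot_directions(trained):
    """Return directions between known languages that never appear in training."""
    langs = {lang for pair in trained for lang in pair}
    return sorted(p for p in permutations(sorted(langs), 2) if p not in trained)

print(add_target_token("Hello", "ko"))  # <2ko> Hello
print(zero_shot_directions(TRAINED))    # [('ja', 'ko'), ('ko', 'ja')]
```

Because every direction goes through the same parameters, the model is pushed to map Japanese and Korean input into a common representation; that shared space is, loosely, what the Google team describes as a possible interlingua.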
The Google team calls this an “interlingua.” For linguists, this kind of underlying abstract form has been the basis of the field since Chomsky’s earliest writings. “Deep Structure,” or D-structure, is distinct from “Surface Structure,” or S-structure: whereas Deep Structure is something like a Platonic form, S-structure is its concrete realization in the phonetic sounds of a sentence. For example, the sentences “I love New York” and “New York is loved by me” have essentially the same meaning. According to Chomsky, both sentences share the same D-structure, which is transformed in different ways en route to their respective surface realizations.
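As a deliberately toy illustration (not Chomsky’s actual formalism, which works over phrase-structure trees with far richer transformations), the sketch below stores one shared underlying form for the New York example as a Python tuple and derives both surface strings from it with two hypothetical “transformation” functions. The tiny lexicon tables are assumptions that cover little more than this one sentence.

```python
from typing import NamedTuple

class DeepStructure(NamedTuple):
    """Toy 'underlying form': who does what to whom."""
    agent: str
    verb: str
    patient: str

# Hypothetical mini-lexicon: just enough morphology for this example.
PRESENT_3SG = {"love": "loves"}
PAST_PARTICIPLE = {"love": "loved"}
OBJECT_CASE = {"I": "me", "we": "us", "they": "them"}  # pronoun form after "by"
BASE_FORM_SUBJECTS = {"I", "you", "we", "they"}      # toy agreement rule

def active(ds: DeepStructure) -> str:
    """Agent surfaces as subject; the verb agrees with it."""
    verb = ds.verb if ds.agent in BASE_FORM_SUBJECTS else PRESENT_3SG[ds.verb]
    return f"{ds.agent} {verb} {ds.patient}"

def passive(ds: DeepStructure) -> str:
    """Patient is promoted to subject (assumed singular); agent becomes a 'by' phrase."""
    agent = OBJECT_CASE.get(ds.agent, ds.agent)
    return f"{ds.patient} is {PAST_PARTICIPLE[ds.verb]} by {agent}"

ds = DeepStructure(agent="I", verb="love", patient="New York")
print(active(ds))   # I love New York
print(passive(ds))  # New York is loved by me
```

Both outputs come from the same `ds` value; only the function applied to it differs, which is the intuition behind one D-structure feeding two S-structures.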
The field of generative syntax has been primarily concerned with elucidating the rules and constraints that govern this transformational process, both in individual languages and across all languages. If we can unwind these transformations, peeling back layer upon layer of surface structure, then we can uncover the deep structure underlying all of language.
And now, it’s my turn to be speculative: For the last 20 years, computational linguists have been trying to apply the rules and constraints of generative syntax to the computational tasks of natural language understanding and translation. However, rules-based accounts have been less successful than the more flexible probability-based algorithms. The result has been that many “language engineers” have become dismissive of the rules-based Chomskian community.
But if we (speculatively) assume that Google’s algorithms have uncovered an underlying interlingua, then perhaps Chomsky’s notion of D-structure has been right all along; we’ve just been going about uncovering it in the wrong way. Whereas generative syntacticians base most of their findings on patterns within a single language or a small set of languages, maybe the real findings lie in the space between languages: the glue that binds them all together.
Of course, the internal representations learned by deep learning systems are notoriously difficult to tease apart, so we don’t really know what the features of this possible interlingua look like. While this is frustrating, I suppose it also means there is still plenty of work left for budding computational linguists. And if we can start to elucidate the ties that linguistically bind us, maybe we can elucidate the ties that bind humanity as well.
NAACL ’15 Roundup
June 7, 2015
I just returned from NAACL 2015 in beautiful Denver, CO. This was my first “big” conference, so I didn’t know quite what to expect. Needless to say, I was blown away (for better or for worse).
First, a side note: I’d like to thank NAACL, and specifically the conference chair Rada Mihalcea, for providing captions during the entire conference. Although there were some technical hiccups, we all got through them. Moreover, Hal Daumé and the rest of the NAACL board were extremely receptive to expanding accessibility going forward. I look forward to working with all of them.
Since this was my first “big” conference, this is also my first “big” conference writeup. Let’s see how it goes.
Keynote #1: Lillian Lee – Big Data Pragmatics etc….
- This was a really fun and insightful talk to open the conference. There were a few themes within Lillian’s talk, but my two favorites were why movie quotes become popular and why we use hedging. Regarding the first topic, my favorite quote was: “When Arnold says, ‘I’ll be back’, everyone talked about it. When I say ‘I’ll be back’, you guys are like ‘Well, don’t rush!’”
- The other theme I really enjoyed was “hedging” and why we do it. I find this topic fascinating, since hedging is all around us. For instance, in saying “I’d claim it’s 200 yards away,” we add no new information with “I’d claim.” So why do we say it? I think hedging is also a hallmark of hipster-speak, e.g. “This is maybe the best bacon I’ve ever had.”
Ehsan Mohammady Ardehaly & Aron Culotta – Inferring latent attributes of Twitter users with label regularization
- This paper uses a lightly supervised method to infer attributes like age and political orientation, which avoids the need for costly annotation. One way that they infer attributes is by determining which Twitter accounts are central to a certain class. Really interesting, and I need to read the paper in depth to fully understand it.
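For readers unfamiliar with the term, here is a generic sketch of label regularization in the style of Mann and McCallum’s expectation regularization: instead of hand-labeling individual users, you supply expected label proportions for a group of users, and the model is penalized (via KL divergence) when its average prediction over that group drifts from those proportions. This is an assumption-laden illustration, not the authors’ code; their exact objective and features may differ, and the group data and weights below are made up.

```python
import math

def predict_proba(weights, features):
    """Binary logistic model: probability of the positive class for one user."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def label_regularization_loss(weights, group, target_pos_rate):
    """KL(expected proportions || model's average prediction) for one group.

    group: feature vectors for unlabeled users in the group (hypothetical data)
    target_pos_rate: expected fraction of the group in the positive class
    """
    avg = sum(predict_proba(weights, f) for f in group) / len(group)
    p = [target_pos_rate, 1.0 - target_pos_rate]  # expected label proportions
    q = [avg, 1.0 - avg]                          # model's average prediction
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

group = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(label_regularization_loss([0.0, 0.0], group, 0.5))  # 0.0
print(label_regularization_loss([0.0, 0.0], group, 0.8) > 0)  # True
```

Training would minimize this loss, summed over many groups, with respect to the weights (e.g. by gradient descent); no individual user ever needs a hand label.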
One Minute Madness
- This was fun. Everyone who presented a poster had one minute to preview/explain their poster. Some “presentations” were funny and some really pushed the 60-second mark. Joel Tetreault did a nice job enforcing the time limit. Here’s a picture of the “lineup” of speakers.
Nathan Schneider & Noah Smith – A Corpus and Model Integrating Multiword Expressions and Supersenses
- Nathan Schneider has been doing some really interesting semantic work, whether on FrameNet or on MWEs (multiword expressions). Here, the CMU folks did a ton of manual annotation of the “supersenses” of words and MWEs. Not only do they achieve some really impressive results on tagging MWEs, but they also provide a really valuable resource to the MWE community in the form of their STREUSLE 2.0 corpus of annotated MWEs and supersenses.
Keynote #2: Fei-Fei Li – A Quest for Visual Intelligence in Computers
- This was a fascinating talk. The idea here is to combine image recognition with semantics/NLP. For a computer to really “identify” something, it has to understand its meaning; pixel values are not “meaning.” I wish I had taken better notes, but Fei-Fei’s lab was able to achieve some incredibly impressive results. Of course, even the best image recognition makes some (adorable) mistakes.
Manaal Faruqui et al. – Retrofitting Word Vectors to Semantic Lexicons
- This was one of the papers that won a Best Paper Award, and for good reason. It addresses a fundamental tension in computational linguistics, specifically within computational semantics: distributional meaning representations vs. lexical semantics. The authors combine distributional word vectors with information from lexicons such as WordNet and FrameNet, and achieve significantly higher accuracy on semantic evaluation tasks in multiple languages. Moreover, their methods are highly modular, and they have made their tools available online. This is something I look forward to tinkering with.
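The core retrofitting update is simple enough to sketch. As I understand the paper, each word vector q_i is repeatedly replaced by a weighted average of its original distributional vector q̂_i and the current vectors of its lexicon neighbours: q_i = (α q̂_i + Σ_j β_ij q_j) / (α + Σ_j β_ij), with α = 1 and β_ij = 1/degree(i). Below is a minimal pure-Python version under those assumptions; the toy vectors and two-word lexicon are made up, and this is not the authors’ released tool.

```python
def retrofit(vectors, lexicon, iterations=10):
    """Pull word vectors toward their lexicon neighbours.

    vectors: dict word -> list[float], the original distributional vectors (q-hat)
    lexicon: dict word -> list of related words (e.g. synonyms from WordNet)
    Uses alpha = 1 and beta = 1/degree, updating vectors in place each sweep.
    """
    new = {w: list(v) for w, v in vectors.items()}
    for _ in range(iterations):
        for word, neighbours in lexicon.items():
            nbrs = [n for n in neighbours if n in new]
            if word not in vectors or not nbrs:
                continue  # words outside the lexicon keep their original vectors
            alpha, beta = 1.0, 1.0 / len(nbrs)
            dim = len(vectors[word])
            # Start from the anchored original vector, then add neighbour terms.
            total = [alpha * vectors[word][k] for k in range(dim)]
            for n in nbrs:
                for k in range(dim):
                    total[k] += beta * new[n][k]
            denom = alpha + beta * len(nbrs)
            new[word] = [x / denom for x in total]
    return new

vectors = {"happy": [1.0, 0.0], "glad": [0.0, 1.0]}
lexicon = {"happy": ["glad"], "glad": ["happy"]}
print(retrofit(vectors, lexicon, iterations=1))
# {'happy': [0.5, 0.5], 'glad': [0.25, 0.75]}
```

Keeping α = 1 anchors every word to its original vector, so synonyms are drawn together without throwing away what the corpus statistics learned; this is also why the method is modular enough to bolt onto any pre-trained embeddings.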
Some posters that I really enjoyed
- Oracle and Human Baselines for Native Language Identification – Shervin Malmasi, Joel Tetreault and Mark Dras
- Lexicon-Free Conversational Speech Recognition with Neural Networks – Andrew L. Maas, Ziang Xie, Dan Jurafsky, Andrew Y. Ng
- Using Zero-Resource Spoken Term Discovery for Ranked Retrieval – Jerome White et al.
- Recognizing Textual Entailment using Dependency Analysis and Machine Learning – Nidhi Sharma, Richa Sharma and Kanad K. Biswas
Overall Trends
- Deep learning and neural nets are still breaking new ground in NLP. If you’re in the NLP domain, it would behoove you to gain a solid understanding of them, because they can achieve some incredibly impressive results.
- Word embeddings: The running joke throughout the conference was that if you wanted your paper to be accepted, it had to include “word embeddings” in the title. Embeddings were everywhere (I think I saw somewhere that ~30% of the posters included the phrase in their title). Even Chris Manning felt the need to comment on this in his talk/on Twitter:
RT @aidotech: RT aidotech: Chris actually showing a tweet on his slides! #deeplearning #naacl2015 pic.twitter.com/GWI7rDiQVC
— StanfordCSLI (@StanfordCSLI) June 5, 2015
Takeaways for Future Conferences
- I should’ve read more of the papers beforehand. Then I would have been better prepared to ask good questions and get more out of the presentations.
- As Andrew warned me beforehand, “You will burn out.” And he was right. There’s no way to fully absorb every paper at every talk you attend. At some point, it becomes beneficial to just take a breather and do nothing. I did this Wednesday morning, and I’m really glad I did.
- Get to breakfast early. If you come downstairs 10 minutes before the first session, you’ll be scraping the (literal) bottom of the barrel on the buffet line.
Shameless self-citation: Here is the paper Andrew and I wrote for the conference.