My research is primarily in the vein of computational (psycho)linguistics. I use mountains of data to tease out trends in how we process language, and thereby elucidate cognitive processes. My current research investigates how the timing of keystrokes during typing can uniquely identify an individual, and how these patterns reflect underlying cognition.
- Multi-word expressions (MWEs) are phrases such as “kick the bucket” or “a lot” which are made up of multiple words but are single lexical units. I study how we store and retrieve these types of lexical units in working memory. According to current hypotheses, MWEs are stored as a single unit, rather than word by word. My current research investigates how this is realized during language output, specifically typed text.
- Andrew Rosenberg and I had a paper accepted to the 11th Workshop on Multiword Expressions at NAACL-HLT 2015. Muddying The Multiword Expression Waters: How Cognitive Demand Affects Multiword Expression Production (paper) (slides)
- I presented some initial findings in a poster at the CUNY Research Day (3/28/14). Here is a PDF of my poster: Detecting Multi-word Expressions Through Typing Patterns.
- I gave a more thorough presentation at the 15th meeting of the Texas Linguistic Society.
Typing patterns are unique to individuals. Just as everyone has a unique voice, with its own pitch, rate of speech, and vocabulary, everyone has a unique way of typing. We frame typing patterns by each keystroke’s linguistic context, such as the word within which the keystroke was produced, or that word’s lexical category. For example, a typist might type the letter pair TH with different timing in a noun than in a verb.
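To make the idea of linguistic context concrete, here is a minimal sketch of how the TH example might be measured. It is only an illustration and not the method from our papers: the `Keystroke` record, the lexical category tags, and all timing values are invented assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import NamedTuple


class Keystroke(NamedTuple):
    key: str        # character typed
    press_ms: int   # key-press timestamp in milliseconds
    word_id: int    # index of the word this keystroke belongs to
    category: str   # lexical category of that word (hypothetical tag set)


# Invented log for illustration only: "thud" (noun), "thaw" (verb), "thug" (noun).
LOG = [
    Keystroke("t", 0, 0, "NOUN"), Keystroke("h", 110, 0, "NOUN"),
    Keystroke("u", 200, 0, "NOUN"), Keystroke("d", 290, 0, "NOUN"),
    Keystroke("t", 900, 1, "VERB"), Keystroke("h", 1050, 1, "VERB"),
    Keystroke("a", 1140, 1, "VERB"), Keystroke("w", 1240, 1, "VERB"),
    Keystroke("t", 2000, 2, "NOUN"), Keystroke("h", 2130, 2, "NOUN"),
    Keystroke("u", 2220, 2, "NOUN"), Keystroke("g", 2310, 2, "NOUN"),
]


def mean_digraph_latency(log, digraph):
    """Average press-to-press latency of a two-letter sequence, per lexical category."""
    first, second = digraph
    grouped = defaultdict(list)
    for prev, curr in zip(log, log[1:]):
        # Only count the pair when both keys fall inside the same word.
        if prev.word_id == curr.word_id and prev.key == first and curr.key == second:
            grouped[curr.category].append(curr.press_ms - prev.press_ms)
    return {category: mean(latencies) for category, latencies in grouped.items()}


th_by_category = mean_digraph_latency(LOG, "th")
```

With this toy log, `th_by_category` holds one average latency per category, which is the kind of context-dependent feature described above. A real system would need many more samples per typist and a principled way to tag lexical categories.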
We utilize keystroke dynamics for two tasks: individual identification and cohort identification. The first task uses keystroke dynamics to identify, or authenticate, an individual typist. The second task uses keystroke dynamics to predict a typist’s demographics, e.g. native language or gender, from their typing patterns.
- Andrew Rosenberg, David-Guy Brizan and I were invited to submit a manuscript to a special issue of the journal Image and Vision Computing: Best of Biometrics. The final draft, Utilizing Overt and Latent Linguistic Structure to Improve Keystroke-Based Authentication, has been accepted for publication.
- Andrew Rosenberg and I presented a paper at BTAS 2015 in Washington, D.C., Improvements to Keystroke-Based Authentication By Adding Linguistic Context.
- David-Guy Brizan, Andrew Rosenberg and I recently had a paper accepted to the International Journal of Human-Computer Studies: Utilizing linguistically-enhanced keystroke dynamics to predict typist cognition and demographics.
- Some questions that still vex me:
- How do temporal patterns surrounding typing reflect cognition and language familiarity?
- Why do we pause in the midst of typing?
- Do certain linguistic phenomena induce pauses in typing?
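A first step toward the pause questions above could be as simple as flagging unusually long gaps between keystrokes and recording what was typed on either side. The sketch below assumes a hypothetical event format and an arbitrary 500 ms cutoff. Both are placeholders for illustration, not values taken from my research.

```python
PAUSE_THRESHOLD_MS = 500  # hypothetical cutoff; a real study would tune this per typist

# Invented (key, press time in ms) events for typing "a lot".
EVENTS = [("a", 0), (" ", 120), ("l", 900), ("o", 1010), ("t", 1130)]


def find_pauses(events, threshold_ms=PAUSE_THRESHOLD_MS):
    """Return (key before, key after, gap in ms) for every gap longer than the threshold."""
    pauses = []
    for (prev_key, prev_ms), (key, press_ms) in zip(events, events[1:]):
        gap = press_ms - prev_ms
        if gap > threshold_ms:
            pauses.append((prev_key, key, gap))
    return pauses


pauses = find_pauses(EVENTS)
```

Here the only flagged pause falls between the space and the "l". Pairing each pause with its linguistic context, such as the upcoming word or its lexical category, is what would let us ask whether particular phenomena induce pauses.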
Computational Linguistics vs Natural Language Processing
In order to frame my research goals, I’d like to throw my hat in the “Natural Language Processing vs. Computational Linguistics” ring. A great deal of ink has been spilled on the subject, so a bit more spillage can’t hurt.
Computational Linguistics
In English, a modifying adjective precedes the noun. Thus, computational linguistics is a form of linguistics. For me, computational linguistics is using computational power as a means to better understand linguistic phenomena. Rather than examining a few dozen sentences, a computer allows a linguist to study thousands or millions of sentences.
Importantly, a computer cannot do anything a human cannot do. A computer just performs human actions billions of times faster than a human. The goal of computational linguistics, in my mind, is to further our understanding of the very human phenomenon of language, as practiced by humans. CL just uses the awesome speed of a computer to accelerate this understanding.
As an analogy, computational biology does not aim to reinvent biology. Rather, it harnesses the power of computers to better simulate biological events. Petri dishes are slow and laborious, whereas computers are fast and light. Rather than reinvent the building blocks of life, computational biology aims to better understand them.
Natural Language Processing
The driving goal of natural language processing (NLP) is to generate and process natural (as opposed to computational) language. This goal really doesn’t take into account how the processing is accomplished. If it’s accomplished using mechanisms similar to human language processing, great! If not, so be it.
As an example, let’s say we invented a microchip that was trillions of times more powerful than the human brain, and this microchip could perfectly parse every natural sentence it encountered. From an NLP perspective, this would be a complete victory, even though the microchip is performing calculations that are known to be impossible for human beings. This isn’t to say that an NLP researcher wouldn’t strive for greater efficiency. Rather, replicating human processes is not the end-goal.
I still haven’t figured out whether CL and NLP are inextricably interdependent or discrete domains. But I think the answer is…both.
I do not necessarily think that the best NLP solutions will come from mimicking human language processing, nor do I think the best insights into language processing will come from more accurate and faster NLP.
We, as humans, are clumsy, emotional and slow, and probably don’t process language optimally. But perhaps an understanding of how and why we are dumb, emotional and slow will aid NLP research. As Chris Manning reminds us in a fun, short article in Computational Linguistics, early work on neural networks was strongly influenced by David Rumelhart’s work on cognition. My own research into typos and why we make mistakes was originally influenced by some of Rumelhart’s early work, as well.
My primary research goal is to use computational language processing to better elucidate underlying cognition. But a closely-tied, secondary research goal is to improve NLP and machine learning solely to make computers better at processing human language. There is a lot of overlap in these goals, and maybe my third goal should be to figure out where that overlap starts and ends…
My Erdős number is at most four (A Goodkind to K McKeown to Z Galil to N Alon to P Erdős).
My Bacon-Erdős number is, unfortunately, still undefined.