Research

My research is primarily in the vein of computational (psycho)linguistics. I use mountains of data to tease out trends in how we process language, and thereby elucidate cognitive processes. My current research investigates how the timing of keystrokes during typing can uniquely identify an individual, and how these timing patterns reflect underlying cognition.

Multi-word Expressions

Keystroke Dynamics

Typing patterns are unique to individuals. Just as everyone has a unique voice, defined by features such as pitch, rate of speech, and vocabulary, everyone has a unique way of typing. We frame typing patterns by the keystroke’s linguistic context, such as the word within which the keystroke was produced, or that word’s lexical category. For example, a typist might produce the key sequence TH differently in a noun than in a verb.
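To make the TH example concrete, here is a minimal sketch of grouping digraph latencies by the lexical category of the containing word. Everything in it is an assumption for illustration: the `digraph_latencies` function, the data layout, and the timing values are hypothetical, and this is not a description of the actual feature pipeline used in the research.

```python
from collections import defaultdict
from statistics import mean


def digraph_latencies(words, digraph):
    """Collect press-to-press latencies (ms) for one digraph, grouped by lexical category.

    `words` is a list of (category, keystrokes) pairs, where `keystrokes` is a
    list of (key, press_time_ms) tuples in typing order. This layout is a
    hypothetical one chosen for the example.
    """
    first, second = digraph
    by_category = defaultdict(list)
    for category, keys in words:
        # Walk over adjacent keystroke pairs within a single word.
        for (k1, t1), (k2, t2) in zip(keys, keys[1:]):
            if k1 == first and k2 == second:
                by_category[category].append(t2 - t1)
    return dict(by_category)


# Invented timings for "thing" (noun), "think" (verb), and "theft" (noun).
words = [
    ("NOUN", [("t", 0), ("h", 120), ("i", 210), ("n", 300), ("g", 390)]),
    ("VERB", [("t", 1000), ("h", 1090), ("i", 1170), ("n", 1260), ("k", 1350)]),
    ("NOUN", [("t", 2000), ("h", 2140), ("e", 2230), ("f", 2330), ("t", 2420)]),
]

latencies = digraph_latencies(words, "th")
means = {category: mean(values) for category, values in latencies.items()}
print(means)
```

With real data, a consistent gap between the per-category means for a given typist would be the kind of context-dependent pattern described above.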

We use keystroke dynamics for two tasks: individual identification and cohort identification. The first task uses a typist’s keystroke dynamics to identify that individual more accurately. The second task uses keystroke dynamics to predict a typist’s demographics, e.g., native language or gender, from their typing patterns.
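As a rough illustration of the individual identification task, the sketch below matches a new typing sample to the closest enrolled profile. The profile format, the names `distance` and `identify`, and all latency values are hypothetical, and the nearest-profile rule is a deliberately simple stand-in rather than the method used in the research.

```python
import math

# Hypothetical enrolled profiles: mean latency (ms) per digraph for each typist.
profiles = {
    "typist_a": {"th": 95.0, "he": 110.0, "in": 120.0},
    "typist_b": {"th": 140.0, "he": 150.0, "in": 100.0},
}


def distance(sample, profile):
    """Root-mean-square difference over the digraphs both sides have in common."""
    shared = sample.keys() & profile.keys()
    if not shared:
        return math.inf  # Nothing to compare, so treat as maximally distant.
    squared = sum((sample[d] - profile[d]) ** 2 for d in shared)
    return math.sqrt(squared / len(shared))


def identify(sample, profiles):
    """Return the name of the enrolled profile closest to the sample."""
    return min(profiles, key=lambda name: distance(sample, profiles[name]))


# Invented sample from an unknown typist.
sample = {"th": 100.0, "he": 105.0, "in": 125.0}
best_match = identify(sample, profiles)
print(best_match)
```

Cohort identification could be framed the same way, with one profile per demographic group instead of one per typist, although a real system would likely use a trained classifier over many more features.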


Computational Linguistics vs Natural Language Processing

In order to frame my research goals, I’d like to throw my hat in the “Natural Language Processing vs. Computational Linguistics” ring. A great deal of ink has been spilled on the subject, so a bit more spillage can’t hurt.

Computational Linguistics

In English, a modifying adjective precedes the noun. Thus, computational linguistics is a form of linguistics. For me, computational linguistics is using computational power as a means to better understand linguistic phenomena. Rather than examining a few dozen sentences, a computer allows a linguist to study thousands or millions of sentences.

Importantly, a computer cannot do anything a human cannot do[citation needed]. A computer just performs human actions billions of times faster than a human. The goal of computational linguistics, in my mind, is to further our understanding of the very human phenomenon of language, as practiced by humans. CL just uses the awesome speed of a computer to accelerate this understanding.

As an analogy, computational biology does not aim to reinvent biology. Rather, it harnesses the power of computers to better simulate biological events. Petri dishes are slow and laborious, whereas computers are fast and light. Rather than reinvent the building blocks of life, computational biology aims to better understand them.

Natural Language Processing

The driving goal of natural language processing (NLP) is to generate and process natural (as opposed to computational) language. This goal really doesn’t take into account how the processing is accomplished. If it’s accomplished using mechanisms similar to human language processing, great! If not, so be it.

As an example, let’s say we invented a microchip that was trillions of times more powerful than the human brain, and this microchip could perfectly parse every natural sentence it encountered. From an NLP perspective, this would be a complete victory, even though the microchip is performing calculations that are known to be impossible for human beings. This isn’t to say that an NLP researcher wouldn’t strive for greater efficiency. Rather, replicating human processes is not the end goal.

My Research

I still haven’t figured out whether CL and NLP are inextricably interdependent or discrete domains. But I think the answer is…both.

I do not necessarily think that the best NLP solutions will come from mimicking human language processing, and I do not think the best insights into language processing will come from more accurate and faster NLP.

We, as humans, are clumsy, emotional and slow, and probably don’t process language optimally. But perhaps an understanding of how and why we are dumb, emotional and slow will aid NLP research. As Chris Manning reminds us in a fun, short article in Computational Linguistics, early work on neural networks was strongly influenced by David Rumelhart’s work on cognition. My own research into typos and why we make mistakes was originally influenced by some of Rumelhart’s early work, as well.

My primary research goal is to use computational language processing to better elucidate underlying cognition. But a closely tied, secondary research goal is to improve NLP and machine learning solely to make computers better at processing human language. There is a lot of overlap in these goals, and maybe my third goal should be to figure out where that overlap starts and ends…


Links


My Erdős number is at most four (A Goodkind to K McKeown to Z Galil to N Alon to P Erdős).
My Bacon-Erdős number is, unfortunately, still undefined.