Category: Academia

Captioning Around the Country: CART vs C-Print

In the past 6 weeks, I have interviewed at or attended Open Houses at 8 different schools around the country. Don’t get me wrong, I am flattered and humbled by the positive responses I received to my PhD applications.

But: It. Was. Exhausting.

Nonetheless, it provided an opportunity to try out different captioning systems and see what captioning is like in places that are not New York City.

First off, at every school I visited, I was able to secure captioning accommodations. It’s a good lesson that as long as you’re proactive and explain exactly what you need, most schools are able to comply. Thank you to all of the administrators and coordinators who helped set this up.

That being said, not all captioning is created equal. The experience made me realize that I’ve been pretty spoiled in New York City, with a relative abundance of well-qualified captionists at my disposal. The following bullet points largely serve as a comparison of CART captioning and C-Print, because after extensive googling I found zero qualitative comparisons of the two.

  • The first observation is not a comparison. Rather, it is a direct experience with the phenomenon of “discount captioners,” as described by Mirabai Knight, one of the most proficient and most strongly activist captionists I’ve worked with. So-called “CART firms” troll court reporting schools for mid-level students and use them to charge cash-strapped schools extremely low rates. The result is a terrible experience for students, and a blemish on the reputation of CART captioning.
    • At one school, I actually pulled a professor aside as we were changing rooms and said, “I’m going to have to rely 100% on reading your lips, because I have literally no idea what the captioner is writing.” As Mirabai’s article explains, this is unfortunately all too common, as many schools do not realize that only highly-proficient, highly-trained captioners can provide a sufficient experience for deaf and hard-of-hearing students.
  • CART vs C-Print
    • Mirabai provides a bunch of great reasons why C-Print can fall short of CART captioning. I only used C-Print twice, whereas I’ve been using CART multiple times a week for the better part of 3 years. I’d strongly encourage anyone interested to check out Mirabai’s article.
      • Overall, C-Print was…fine. But when it comes to hearing, “fine” ≠ “adequate.”
      • C-Print does not advertise itself as a literal, word-for-word transcription service. Rather, it only “ensures” that the most important things are transcribed. But “importance” is completely at the discretion of the captioner. There were a few occasions where I know the C-Print captioner did not transcribe words that I would consider important, such as the name of an institution where a researcher was located.
      • A C-Print captionist uses a QWERTY keyboard, and depends on a program where they type many abbreviations that the program expands to full words. This usually works well enough, but C-Print is definitely at least 1-2 seconds slower than CART. While 1-2 seconds may not sound like a long time, I would defy you to try having a conversation with someone where things lag 1-2 seconds behind. You’ll quickly see just how significant 1-2 seconds can be.
      • C-Print can be advantageous in noisy situations where an in-person CART captioner is not available. I used C-Print at a lunch, in an environment where remote captioning definitely would not have worked. In that case, a slower, more summarizing transcription is better than a word-for-word remote transcription that cannot eliminate a high level of background noise.

tl;dr: C-Print captioning is an okay substitute when in-person CART captioning is not available. But in no way should an institution feel that providing C-Print captioning is equivalent to providing the word-for-word transcription of CART captioning.

Sound Discrimination vs. Meaningful Sound Discrimination, and Phonological Neighborhoods

This is a continuation of a series of posts exploring the process of relearning language and sound processing with my new hearing implants, Auditory Brainstem Implants. The first two posts can be found here and here. Although it’s difficult to distill my experiences down to a single theme, I am slowly realizing that a vast amount of understanding speech comes down to making useful discriminations between phonemes.

What is a phoneme, you might ask?

Great question! A phoneme is one of the most basic units of sound within phonology. The word red, for instance, consists of three distinct phonemes: /ɹ/, /ɛ/ and /d/. However, there is not always a one-to-one correspondence between letters and phonemes. For example, the word through also consists of only three phonemes, corresponding to th, r, and oo (/θ/, /ɹ/ and /u/ in IPA).

At the most basic level, we can discriminate between two different words if they differ by at least one phoneme. When only one phoneme differentiates the pronunciation of two different words, those words are known as a minimal pair. The words knit (/nɪt/) and gnat (/næt/) are a minimal pair, because they differ by only one phoneme, the middle vowel.
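
To make the idea concrete, here is a small Python sketch, using rough phoneme transcriptions of my own purely for illustration, that checks whether two phoneme sequences form a minimal pair:

```python
def is_minimal_pair(phonemes_a, phonemes_b):
    """Two words form a minimal pair if their phoneme sequences
    have the same length and differ in exactly one position."""
    if len(phonemes_a) != len(phonemes_b):
        return False
    differences = sum(1 for a, b in zip(phonemes_a, phonemes_b) if a != b)
    return differences == 1

# knit /nɪt/ vs. gnat /næt/: same length, only the vowel differs
print(is_minimal_pair(["n", "ɪ", "t"], ["n", "æ", "t"]))  # True

# red /ɹɛd/ vs. through /θɹu/: more than one phoneme differs
print(is_minimal_pair(["ɹ", "ɛ", "d"], ["θ", "ɹ", "u"]))  # False
```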

But each phoneme can actually have different variations, called allophones. For example (stolen straight from Wikipedia), the /p/ phoneme is pronounced differently in pin (/pʰɪn/) versus spin (/spɪn/). Most native speakers are unaware of these variations in pronunciation, and if a different allophone is used for the same phoneme, the word will probably still be understandable, but will just sound “weird.” Two different words will always differ by at least one phoneme, never by just one allophone. For the sake of this post, I’ll call discriminating between allophones “non-useful sound discrimination.”

Useful Sound Discrimination

If some sound discrimination really isn’t all that useful, then what is useful? The ability to discriminate between phonemes that have a high neighborhood density. And what is neighborhood density? From a recent paper by Susanne Gahl and colleagues: “two words are considered neighbors if they differ by deletion, insertion, or substitution of one segment” (Gahl et al., 2012). For instance, the word tad has a bunch of phonological neighbors, such as rad, fad, dad, toad, and add. The word osteoporosis, on the other hand, has no phonological neighbors.
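
That definition amounts to an edit distance of exactly one over phoneme sequences. Here is a small Python sketch, with a toy lexicon and rough transcriptions of my own (illustrative only, not from any pronouncing dictionary), that finds a word’s phonological neighbors:

```python
def edit_distance(a, b):
    """Standard Levenshtein distance over phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def neighborhood(word, lexicon):
    """All other lexicon entries whose transcription differs from `word`
    by deletion, insertion, or substitution of exactly one segment."""
    return [other for other, phones in lexicon.items()
            if other != word and edit_distance(lexicon[word], phones) == 1]

# Toy lexicon; the transcriptions are rough and purely illustrative.
lexicon = {
    "tad":  ["t", "æ", "d"],
    "rad":  ["ɹ", "æ", "d"],
    "fad":  ["f", "æ", "d"],
    "dad":  ["d", "æ", "d"],
    "toad": ["t", "oʊ", "d"],
    "add":  ["æ", "d"],
    "zoo":  ["z", "u"],
    "zoom": ["z", "u", "m"],
}
print(neighborhood("tad", lexicon))  # ['rad', 'fad', 'dad', 'toad', 'add']
print(neighborhood("zoo", lexicon))  # ['zoom']
```

A word like osteoporosis would come back with an empty list: nothing in the lexicon is a single segment away.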

For me, the important thing is to relearn how to discriminate between phonemes that often live in the same phonological neighborhood. This is something that normal-hearing individuals do effortlessly; our very sophisticated auditory system is an expert at differentiating between the frequencies in a sound signal.

For my limited auditory system, consisting of an ABI that replaces tens of thousands of hair cells with a few dozen electrodes, this discrimination is a nontrivial task. This hit home for me during a therapy session in which I could not, for the life of me, differentiate between the sounds /oo/ and /mm/. For my ABI, both of these sounds activated the exact same electrode pattern.

When I am practicing phoneme discrimination, my therapist covers his mouth, so I cannot also use lipreading. When I can use lipreading, discriminating between /oo/ and /mm/ becomes easy. Moreover, /oo/ and /mm/ are rarely phonological neighbors. That is, there are very few words where /oo/ could be replaced with /mm/ to yield a different, intelligible word. The only exception might be an addition/deletion, such as zoo and zoom. Nonetheless, zoo and zoom are not contextual neighbors, i.e. I cannot think of a sentence where zoo and zoom could fill the same slot (The rhinoceros at the zoo stole my lollipop vs. *The rhinoceros at the zoom stole my lollipop).

So, am I screwed?

Probably not. What’s remarkable about human communication and information transmission is that we find ways to adapt and pick out the most critical information. For instance, Esteban Buz and Florian Jaeger (Buz & Jaeger, 2012) found that context also plays a significant role in how much or how little we articulate or hyper-articulate certain words. And as long as you’re not a “low talker,” I should be fine.

Learning Linguistics; Relearning to Hear


Speaking linguistics

This is the first in a series of posts centered around learning theoretical linguistics, specifically phonology, and learning how to use my new hearing implants through post-implantation rehabilitation therapy.

I would consider the nexus of these two events to be unique, in that I am experiencing linguistics from both sides: learning about theoretical sound systems in the classroom while, at the same time, learning a concrete application of those theories in therapy. And as much as I’m enjoying the experience, I also wanted to document it. Mostly for me, but I’m sure that other people will find it interesting as well.

Not many people have the opportunity (or burden) of learning how to hear again, from the ground up, making sense of stimuli that were passively received and processed for the first 30 years of their life.

To provide some clarification: sound is now an incredibly imprecise stimulus for me, where I can tell that a sound is being produced but cannot discriminate between that sound and a similar one. A good analogy is the PA announcements in the NYC subways in the 1980s/1990s. If you don’t know what I’m talking about, this video might help to clarify.

Essentially, all sounds seem as if they’re coming through the most lo-fi speaker system available, which is what happens when you have to rely on organs other than our incredibly sophisticated ears to do the hearing.

I’ve already seen a few parallels between the two sides. A couple that stand out:

  • The International Phonetic Alphabet (IPA) is used to represent and categorize the sounds of an oral language. In each of my therapy sessions, we tackle a new category of sounds, such as fricatives or plosives, and learn to discriminate between the primary individual sounds. I love that an abstract rubric from my textbook plays such an important role in re-learning how to hear.
  • My therapist mentioned “minimal pairs” the day after we learned about them in class. I found minimal pairs so interesting because they are such an intuitive, but non-obvious part of categorizing sounds. It’s cool that these are so essential for hearing therapy, as well.

This post is just the kickoff. I have very little idea what direction to take things from here. What questions do you have? What kinds of things should I document as I go through both processes? Feel free to leave questions in the comments section below.

NAACL ’15 Roundup

I just returned from NAACL 2015 in beautiful Denver, CO. This was my first “big” conference, so I didn’t know quite what to expect. Needless to say, I was blown away (for better or for worse).

First, a side note: I’d like to thank the NAACL and specifically the conference chair Rada Mihalcea for providing captions for the entirety of the conference. Although there were some technical hiccups, we all got through them. Moreover, Hal Daume and the rest of the NAACL board were extremely receptive to expanding accessibility going forward. I look forward to working with all of them.

Since this was my first “big” conference, this is also my first “big” conference writeup. Let’s see how it goes.

Packed ballroom for keynote

Keynote #1: Lillian Lee Big Data Pragmatics etc….

  • This was a really fun and insightful talk to open the conference. There were a few themes within Lillian’s talk, but my two favorites were why movie quotes become popular and why we use hedging. Regarding the first topic, my favorite quote was: “When Arnold says, ‘I’ll be back’, everyone talked about it. When I say ‘I’ll be back’, you guys are like ‘Well, don’t rush!'”
  • The other theme I really enjoyed was “hedging” and why we do it. I find this topic fascinating, since it’s all around us. For instance, in saying “I’d claim it’s 200 yards away,” we add no new information with “I’d claim.” So why do we say it? I think this is also a hallmark of hipster-speak, e.g. “This is maybe the best bacon I’ve ever had.”

Ehsan Mohammady Ardehaly & Aron Culotta Inferring latent attributes of Twitter users with label regularization

  • This paper uses a lightly-supervised method to infer attributes like age and political orientation. It therefore avoids the need for costly annotation. One way that they infer attributes is by determining which Twitter accounts are central to a certain class. Really interesting, and I need to read the paper in-depth to fully understand it.

One Minute Madness

  • This was fun. Everyone who presented a poster had one minute to preview/explain their poster. Some “presentations” were funny and some really pushed the 60-second mark. Joel Tetreault did a nice job enforcing the time limit. Here’s a picture of the “lineup” of speakers.

Long line of speakers

Nathan Schneider & Noah Smith A Corpus and Model Integrating Multiword Expressions and Supersenses

  • Nathan Schneider has been doing some really interesting semantic work, whether on FrameNet or MWEs. Here, the CMU folks did a ton of manual annotation of the “supersense” of words and MWEs. Not only do they manage to achieve some really impressive results on tagging of MWEs, but they have also provided a really valuable resource to the MWE community in the form of their STREUSLE 2.0 corpus of annotated MWEs/supersenses.

Keynote #2: Fei-Fei Li A Quest for Visual Intelligence in Computers

  • This was a fascinating talk. The idea here is to combine image recognition with semantics/NLP. For a computer to really “identify” something, it has to understand its meaning; pixel values are not “meaning.” I wish I had taken better notes, but Fei-Fei’s lab was able to achieve some incredibly impressive results. Of course, even the best image recognition makes some (adorable) mistakes.
Image: a baby holding a toothbrush, mislabelled by the model; middle caption: “a young boy is holding a baseball bat”

Manaal Faruqui et al. Retrofitting Word Vectors to Semantic Lexicons

  • This was one of the papers that won a Best Paper Award, and for good reason. It addresses a fundamental conflict in computational linguistics, specifically within computational semantics: distributional meaning representation vs. lexical semantics. The authors combine distributional vector representations with information from lexicons such as WordNet and FrameNet, and achieve significantly higher accuracy on semantic evaluation tasks in multiple languages. Moreover, their methods are highly modular, and they have made their tools available online. This is something I look forward to tinkering around with (see the sketch below).
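
To get a feel for how retrofitting works, here is a rough Python sketch of my own simplified reading of the idea, with toy vectors and a toy lexicon invented purely for illustration; the authors’ released tool is the real reference:

```python
import numpy as np

def retrofit(vectors, lexicon_edges, alpha=1.0, beta=1.0, iterations=10):
    """Nudge each word vector toward its lexicon neighbors while keeping it
    close to its original (distributional) vector. This is a simplified
    sketch of the retrofitting idea, not the authors' exact implementation."""
    new_vecs = {w: v.copy() for w, v in vectors.items()}
    for _ in range(iterations):
        for word, neighbors in lexicon_edges.items():
            neighbors = [n for n in neighbors if n in new_vecs]
            if word not in vectors or not neighbors:
                continue
            # Weighted average of the original vector and the neighbors' current vectors
            total = alpha * vectors[word] + beta * sum(new_vecs[n] for n in neighbors)
            new_vecs[word] = total / (alpha + beta * len(neighbors))
    return new_vecs

# Toy example: "happy" and "glad" are linked in the lexicon, so their vectors move closer
vectors = {
    "happy": np.array([1.0, 0.0]),
    "glad":  np.array([0.0, 1.0]),
    "sad":   np.array([-1.0, 0.0]),
}
edges = {"happy": ["glad"], "glad": ["happy"]}
retrofitted = retrofit(vectors, edges)
print(retrofitted["happy"], retrofitted["glad"])  # closer together than before; "sad" is unchanged
```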

Some posters that I really enjoyed

Overall impressions

  • Deep learning and neural nets are still breaking new ground in NLP. If you’re in the NLP domain, it would behoove you to gain a solid understanding of them, because they can achieve some incredibly impressive results.
  • Word embeddings: The running joke throughout the conference was that if you wanted your paper to be accepted, it had to include “word embeddings” in the title. Embeddings were everywhere (I think I saw somewhere that ~30% of the posters included this in their title). Even Chris Manning felt the need to comment on this in his talk and on Twitter.

Takeaways for Future Conferences

  • I should’ve read more of the papers beforehand. Then I would have been better prepared to ask good questions and get more out of the presentations.
  • As Andrew warned me beforehand, “You will burn out.” And he was right. There’s no way to fully absorb every paper at every talk you attend. At some point, it becomes beneficial to just take a breather and do nothing. I did this Wednesday morning, and I’m really glad I did it.
  • Get to breakfast early. If you come downstairs 10 minutes before the first session, you’ll be scraping the (literal) bottom of the barrel on the buffet line.

Shameless self-citation: Here is the paper Andrew and I wrote for the conference.

Would An Idiot Do That?

The Office bestowed many bits of wisdom upon us. My favorite gem is from Dwight Schrute, when he recounts the best advice he was ever given: Don’t be an idiot. He then expands upon this nugget:

Before I do anything, I ask myself, “Would an idiot do that?” And if the answer is yes, I do not do that thing.

Yes, this is Dwight being Dwight. But there is wisdom to be gained from this notion, which I am [proud/embarrassed] to admit I think about frequently.

When working — be it programming, linguistics, writing, etc. — it’s easy enough to burn out. If a program isn’t working as expected, I might try changing variables, at random, in a desperate attempt to get it to work. Or, worse, I’ll sit there staring at my screen.

This is how an idiot works

Good ideas rarely happen when you’re doing the same thing over and over again, e.g. flipping variables or staring blankly. If I catch myself in this sort of loop, I will, a la Dwight, stop doing That Thing. The stopping of That Thing can mean doing anything that is not That Thing, from stretching/walking around, to getting a cup of coffee, to going grocery shopping. The main idea is, The Thing I am doing, or my current approach to The Thing, is not working. Only an idiot would keep trying the same approach to The Thing, and expect a different, or more successful, outcome.

And the funny thing is, this works! If I’m stuck on a programming problem that I’m sure is unsolvable, it’s uncanny how often the solution presents itself 2 minutes after I return from my break. I guess it pays to not be an idiot.

The Knowledge Business and Peter Higgs

A few months ago, The Guardian published an interview with Peter Higgs, namesake of the Higgs boson, or “God Particle.” In the interview, Higgs talks about how he wouldn’t be considered productive or prolific enough to survive in today’s academic system, and wouldn’t have been granted tenure. The upshot of his statement is that the academic system today is obsessed with publishing, and would not reward a genius like Higgs.

However, I disagree with this assessment of the academic system. Being an academic is not about sitting in an overstuffed leather armchair and conjuring brilliant ideas from the ether. That’s the domain of the armchair philosopher. To me, being an academic means you have an obligation to (a) create ideas that are provable and testable, and (b) share those ideas with the world. There’s a saying that academics work “for the greater good” (as opposed to producing material gains). But to benefit the greater good, one needs to share his or her ideas. The “greater good” does not benefit from brilliant ideas that remain just that, brilliant ideas. Rather, the greater good benefits from the dissemination of knowledge.

To be sure, there are issues with the academic publishing business, and these issues certainly merit further discussion, and real changes. Nonetheless, peer-reviewed publishing still constitutes a very effective metric of academic productivity. And a university needs some sort of metric by which to judge output. After all, a professor is an employee of a university, and employees of all stripes and sizes bear the burden of demonstrating their value.

The ironic thing about Peter Higgs’ assertion is that he was employed by Edinburgh University for 36 years, largely because the university did not want to let go of someone who was a candidate for a Nobel Prize. In other words, every system will always find ways to accommodate unique individuals. Edinburgh University clearly recognized Higgs’ singular genius, and paid him a nice salary for 36 whole years, just to prove it!

In sum, there are certainly issues with academia, and with peer-reviewed publishing. Nonetheless, academics have a responsibility to those institutions which support them: To not only create brilliant ideas, but share those ideas, as well.

Now I’ll stop procrastinating, and get back to my writing….