Category: Deaf

Comparing Captioned Phones: ClearCaptions Vs. CapTel

The interwebs are severely lacking in objective comparisons of the two major captioned (landline) telephones on the market: the ClearCaptions Ensemble and the CapTel 840i.

I’ve been using the CapTel for a few years, but the ClearCaptions Ensemble has a 90-day trial period. So I figured I had nothing to lose by trying it out.

Before I get to the captioning quality, which is admittedly the most important aspect, here are a few notes on other aspects of the phones:

Appearance/User Interface
CapTel 840i

The CapTel phone is decidedly unsexy. It is pretty old, large and clumsy. I’ve been using the 840i, which does not have a touch screen, for a few years. However, when I went to the CapTel site, I saw that they now offer a touchscreen version, the 2400i. I may have to try that in the future.

The ClearCaptions Ensemble looks much nicer. It is also all touchscreen, except for the power button. However, the touchscreen interface is horrendous. As I said to a friend, “It has a touch interface, but you wish that it didn’t.” When dialing a number, there is no delete key if you make a mistake. In addition, the dialpad changes to an awkward double row of numbers once you’ve already entered a few digits of the number you’re trying to call. In short, the Ensemble’s touchscreen offers zero usability advantage, and it is usually less usable than the clunky CapTel 840i.

ClearCaptions Ensemble

Captioning Quality

Obviously this is the most critical aspect of a captioned phone. Below I’ve posted a video with a side-by-side comparison. I used a YouTube video of a person speaking, to ensure that the audio was identical for each trial. Go ahead and check out the video first. I apologize in advance for some of the shaky camera work. My hands were starting to get very tired (see below for an explanation).

As you can see, the speed and accuracy of the CapTel phone are superior to ClearCaptions. Not seen here are the dozen or so trials I did with the ClearCaptions phone, using a different, lower quality video that better portrayed a one-sided phone call. Most of the time, the ClearCaptions phone did not caption anything, and I had to start the call again. The CapTel phone never had any issues with the other video. (This is why my hands, and I, were getting so tired and shaky.)

Additionally, one of the aspects of the ClearCaptions phone that I was excited about is that it supposedly integrates human captioning with automatic/computer-generated captioning, which supposedly makes it faster. As a computational linguist/NLPer, this sounded great! However, as can be seen above, there is no speed or accuracy advantage. When making real calls with the ClearCaptions phone, there are many times when the automatic captions are completely incomprehensible.


While I love sleekness and gadgetry in my smartphone, the most important aspect of a captioned landline phone is reliability: It just has to work. The CapTel phone works faster and more consistently. That’s really all I need to know.

Captioning Around the Country: CART vs C-Print

In the past 6 weeks, I have interviewed or attended Open Houses at 8 different schools around the country. Don’t get me wrong, I am flattered and humbled by the positive responses I received from my PhD applications.

But: It. Was. Exhausting.

Nonetheless, it provided an opportunity to try out different captioning systems and see what captioning is like in places that are not New York City.

First off, at every school I visited, I was able to secure captioning accommodations. It’s a good lesson that as long as you’re proactive and explain exactly what you need, most schools are able to comply. Thank you to all of the administrators and coordinators who helped set this up.

That being said, not all captioning is created equal. The experience made me realize that I’ve been pretty spoiled in New York City, with a relative abundance of well-qualified captionists at my disposal. The following bullet points largely serve as a comparison of CART captioning and C-Print, because after extensive googling I found zero qualitative comparisons.

  • The first observation is not a comparison. Rather, it is a direct experience with the phenomenon of “discount captioners,” as described by Mirabai Knight, one of the most proficient and strongly activist captionists I’ve used. So-called “CART firms” will troll court reporting schools for mid-level students and use them to charge cash-strapped schools extremely low rates. The result is a terrible experience for students, and a blemish on the reputation of CART captioning.
    • At one school, I actually pulled a professor aside as we were changing rooms and said, “I’m going to have to rely 100% on reading your lips, because I have literally no idea what the captioner is writing.” As Mirabai’s article explains, this is unfortunately all too common, as many schools do not realize that only highly-proficient, highly-trained captioners can provide a sufficient experience for deaf and hard-of-hearing students.
  • CART vs C-Print
    • Mirabai provides a bunch of great reasons why C-Print can fall short of CART captioning. I only used C-Print twice, whereas I’ve been using CART multiple times a week for the better part of 3 years. I’d strongly encourage anyone interested to check out Mirabai’s article.
      • Overall, C-Print was…fine. But when it comes to hearing, “fine” ≠ “adequate.”
      • C-Print does not advertise itself as a literal, word-for-word transcription. Rather, it only “ensures” that the most important things are transcribed. But “importance” is completely at the discretion of the captioner. There were a few occasions where I know the C-Print captioner did not transcribe words that I would consider important, such as the name of the institution where a researcher was located.
      • A C-Print captionist uses a QWERTY keyboard, and depends on a program where they type many abbreviations that the program expands to full words. This usually works well enough, but C-Print is definitely at least 1-2 seconds slower than CART. While 1-2 seconds may not sound like a long time, I would defy you to try having a conversation with someone where things lag 1-2 seconds behind. You’ll quickly see just how significant 1-2 seconds can be.
      • C-Print can be advantageous in noisy situations. I used C-Print at a lunch, in an environment where remote captioning definitely could not have worked. In a case like this, a slower, more summarizing transcription from a captionist in the room is better than a word-for-word remote transcription that cannot filter out a high level of background noise.
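To make the C-Print mechanism concrete, the abbreviation-expansion step can be sketched in a few lines. This is my own hypothetical illustration: the abbreviation table below is invented and is not C-Print’s actual dictionary, but it shows the basic idea of a QWERTY typist’s shorthand being expanded to full words in real time.

```python
# Hypothetical sketch of abbreviation expansion, in the spirit of what a
# C-Print-style typing program does. The abbreviation table is invented.
ABBREVIATIONS = {
    "u": "you",
    "abt": "about",
    "bc": "because",
    "pls": "please",
}

def expand(typed):
    """Expand each typed token if it matches a known abbreviation."""
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in typed.split())

print(expand("tell me abt the class pls"))  # → tell me about the class please
```

The lookup itself is instantaneous; the 1-2 second lag I experienced comes from the human side of the pipeline, since the captionist must hear, summarize, and type before any expansion happens.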

tl;dr: C-Print captioning is an okay substitute when in-person captioning is not available. But in no way should an institution consider C-Print equivalent to the word-for-word transcription provided by CART captioning.

Sound Discrimination vs. Meaningful Sound Discrimination, and Phonological Neighborhoods

This is a continuation of a series of posts exploring the process of relearning language and sound processing with my new hearing implants, Auditory Brainstem Implants. The first two posts can be found here and here. Although it’s difficult to distill my experiences down to a single theme, I am slowly realizing that a vast amount of understanding speech comes down to making useful discriminations between phonemes.

What is a phoneme, you might ask?

Great question! A phoneme is one of the most basic units of sound within phonology. The word red, for instance, consists of three distinct phonemes: /ɹ/, /ɛ/ and /d/. However, there is not always a one-to-one correspondence between letters and phonemes. For example, the word through also consists of only three phonemes, corresponding to th, r, and oo (/θ/, /ɹ/ and /u/ in IPA).

At the most basic level, we can discriminate between two different words if they differ by at least one phoneme. When only one phoneme differentiates the pronunciation of two words, those words are known as a minimal pair. The words knit (/nɪt/) and gnat (/næt/) are a minimal pair, because they differ by only one phoneme, the middle vowel.
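As a quick illustration (my own toy code, not anything from the literature), a minimal-pair check is just counting mismatched positions between two equal-length phoneme sequences:

```python
# Toy illustration: two words form a minimal pair when their phoneme
# sequences have the same length and differ in exactly one position.
def is_minimal_pair(a, b):
    """a, b: lists of IPA phoneme symbols, e.g. ['n', 'ɪ', 't']."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1

print(is_minimal_pair(['n', 'ɪ', 't'], ['n', 'æ', 't']))  # knit vs. gnat → True
print(is_minimal_pair(['ɹ', 'ɛ', 'd'], ['θ', 'ɹ', 'u']))  # red vs. through → False
```

Note that the comparison is over phonemes, not letters: knit and gnat are spelled very differently but are only one sound apart.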

But each phoneme can actually have different variations, called allophones. For example (stolen straight from Wikipedia), the /p/ phoneme is pronounced differently in pin (/pʰɪn/) versus spin (/spɪn/). Most native speakers are unaware of these variations in pronunciation, and if a different allophone is used for the same phoneme, the word will probably still be understandable, but just sound “weird.” Two different words will always differ by at least one phoneme, never by just one allophone. For the sake of this post, I’ll call discriminating between allophones “non-useful sound discrimination.”

Useful Sound Discrimination

If some sound discrimination really isn’t all that useful, then what is useful? The ability to discriminate between phonemes that have a high neighborhood density. And what is neighborhood density? From a recent paper by Susanne Gahl and colleagues: “two words are considered neighbors if they differ by deletion, insertion, or substitution of one segment” (Gahl, et al. 2012). For instance, the word tad has a bunch of phonological neighbors, such as rad, fad, dad, toad and add. The word osteoporosis, on the other hand, has no phonological neighbors.
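Gahl and colleagues’ definition translates directly into code. Here is a toy sketch (the miniature phonemic lexicon is my own invention, with words stored as tuples of IPA symbols) that finds the neighbors of tad by checking for a single substitution, deletion, or insertion:

```python
# Toy sketch of the Gahl et al. neighbor definition: two words are
# neighbors if they differ by deletion, insertion, or substitution of
# exactly one phoneme. The lexicon below is invented for illustration.
def is_neighbor(a, b):
    """a, b: tuples of IPA phoneme symbols."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Substitution: exactly one mismatched position.
        return sum(x != y for x, y in zip(a, b)) == 1
    # Deletion/insertion: removing one phoneme from the longer word
    # must yield the shorter word.
    short, long_ = sorted((a, b), key=len)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

lexicon = {
    "rad": ("ɹ", "æ", "d"),
    "fad": ("f", "æ", "d"),
    "dad": ("d", "æ", "d"),
    "toad": ("t", "oʊ", "d"),
    "add": ("æ", "d"),
}
tad = ("t", "æ", "d")
neighbors = [word for word, phones in lexicon.items() if is_neighbor(tad, phones)]
print(neighbors)  # → ['rad', 'fad', 'dad', 'toad', 'add']
```

Working over phoneme tuples rather than spellings matters here: toad is a neighbor of tad because /oʊ/ is a single phoneme, even though the written words differ by two letters.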

For me, the important thing is to relearn how to discriminate between phonemes that often live in the same phonological neighborhood. This is something that normal hearing individuals do effortlessly, and our very sophisticated auditory system is an expert at differentiating between these different frequencies in a sound signal.

For my limited auditory system, consisting of an ABI that replaces tens of thousands of hair cells with a few dozen electrodes, this discrimination is a nontrivial task. This hit home for me during a therapy session in which I could not, for the life of me, differentiate between the sounds /oo/ and /mm/. For my ABI, both of these sounds activated the exact same electrode pattern.

When I am practicing phoneme discrimination, my therapist covers his mouth, so I cannot also use lipreading. When I can use lipreading, discriminating between /oo/ and /mm/ becomes easy. Moreover, /oo/ and /mm/ rarely are phonological neighbors. That is, there are very few words where /oo/ could be replaced with /mm/, and this would result in a different, intelligible word. The only exception might be an addition/deletion, such as zoo and zoom. Nonetheless, zoo and zoom are not contextual neighbors, i.e. I cannot think of a sentence where zoo and zoom could be used to fill the same slot (The rhinoceros at the zoo stole my lollipop vs *The rhinoceros at the zoom stole my lollipop).

So, am I screwed?

Probably not. What’s remarkable about human communication and information transmission is that we find ways to adapt and filter out the most critical information. For instance, Esteban Buz and Florian Jaeger (Buz & Jaeger, 2012) found that context also plays a significant role in how much or how little we articulate or hyper-articulate certain words. And as long as I’m not talking to a “low talker,” I should be fine.

Sound Processing to Sentence Processing

Language processing in the mind

This is Part 2 of my series of articles chronicling the process of auditory rehabilitation therapy from the perspective of a linguist. Part 1, Learning Linguistics; Relearning to Hear, can be found here.

The past few weeks of therapy have provided a number of interesting experiences. I’m not sure I can tie them all into a cohesive narrative, so I’ll just focus on one area: sentence processing as distinct from sound processing.

Filling in the Blanks

I am fascinated by how the mind processes language. Further, I’m fascinated by how we build up a sentence, going from a series of sound waves to meaningful words.

A common exercise in rehabilitation therapy is for the therapist to say a sentence with their mouth covered (so I can’t get anything from lipreading) and have me (try to) repeat the sentence. But an interesting thing happened the other week. Let’s say the sentence was The teacher talked to the students. What I heard was:

___ ___ ___ ___ ___ students.

But once I heard/understood students I immediately could piece together the rest of what I had heard. If my therapist had stopped before students, I would have said that I hadn’t understood anything. But since I could piece together the sentence after I understood the final word, I clearly got something from the words I thought I had completely missed. Clearly my mind was storing those sound patterns as “something.” Were they candidate words, each with an assigned probability weight? And once I understood students, did the probability weights cross a critical threshold and form a meaningful sentence?
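Purely as a thought experiment, here is a toy model of that intuition. This is entirely my own sketch (the candidate words, weights, and boosts are all invented numbers, not a claim about actual psycholinguistic processing): each unheard slot holds weighted candidate words, and recognizing the final word rescores the candidates that fit its context.

```python
# Toy model: each unheard slot holds candidate words with weights; once
# the final word ("students") is recognized, candidates compatible with
# it get boosted and the best path is read off. All numbers are invented.
slots = [
    {"the": 0.5, "a": 0.5},
    {"teacher": 0.4, "preacher": 0.3, "feature": 0.3},
    {"talked": 0.5, "walked": 0.5},
    {"to": 0.6, "two": 0.4},
    {"the": 0.5, "a": 0.5},
]

# Hand-written contextual boosts given the recognized final word "students".
boosts = {"teacher": 2.0, "talked": 1.5, "to": 1.5}

def best_path(slots, boosts):
    """Pick the highest-scoring candidate in each slot after boosting."""
    return [max(cands, key=lambda w: cands[w] * boosts.get(w, 1.0))
            for cands in slots]

sentence = best_path(slots, boosts) + ["students"]
print(" ".join(sentence))  # → the teacher talked to the students
```

A real account would involve far richer representations than independent per-slot weights, but even this cartoon version captures the backfilling effect: before the boost, no single slot is confident, yet one recognized word is enough to snap the whole sentence into place.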

This experience is distinct from a very similar experience I’ve had, which reflects the well-known concept of “priming.” In that case, a certain stimulus restricts the domain of possibilities, and influences subsequent responses. For example, on the first repetition of a sentence, I got:

___ ___ ___ a pie

So I knew the domain of the sentence was food, baking, etc. This made it much easier to get the other words when the sentence was repeated. Similarly, I’ve experienced a form of syntactic priming, where I initially understood the logical or functional structure of the sentence, mostly from prosodic cues.

Upon repetition of the sentence, I could then restrict the domain of each word to a noun, a verb and another noun, respectively. Not that it isn’t super cool that our minds can do this, but it’s not quite as mystifying as the first example.

In the first example, all of the processing occurred “on the fly.” I didn’t need a second repetition to understand the sentence, but rather backfilled it using some sort of semantic representations of word-forms that I initially thought I had completely missed. But clearly some form of information was transmitted through those sounds.

All of this seems to be bundled under the rubric of sentence processing. I am fascinated by this process and the amazing things our minds naturally do, and that my mind is currently relearning to do.

Stay tuned for Part 3, on relearning allophones.

Learning Linguistics; Relearning to Hear


Speaking linguistics

This is the first in a series of posts centered around learning theoretical linguistics, specifically phonology, and learning how to use my new hearing implants through post-implantation rehabilitation therapy.

I would consider the nexus of these two events to be unique, in that I am experiencing linguistics from both sides: on the one hand learning about theoretical sound systems in the classroom, while at the same time learning a concrete application of those theories in therapy. And as much as I’m enjoying the experience, I also wanted to document it. Mostly for me, but I’m sure that other people will find it interesting, as well.

Not many people have the opportunity (or burden) of having to learn how to hear again, from the ground up, making sense of stimuli that were passively received and processed for the first 30 years of their life.

To provide some clarification: sound is now an incredibly imprecise stimulus for me, where I can tell that a sound is being produced but cannot discriminate between that sound and a similar one. A good analogy is to imagine the PA announcements in the NYC subways in the 1980s/1990s. If you don’t know what I’m talking about, this video might help to clarify.

Essentially, all different sounds seem as if they’re coming through the most lo-fi speaker system available, which is what happens when you have to rely on organs other than our incredibly sophisticated ears to do the hearing.

I’ve already seen a few parallels between the two sides. A couple that stand out:

  • The International Phonetic Alphabet (IPA) is used to represent and categorize the sounds of an oral language. In each of my therapy sessions, we tackle a new category of sounds, such as fricatives or plosives, and learn to discriminate between the primary individual sounds. I love that an abstract rubric from my textbook plays such an important role in re-learning how to hear.
  • My therapist mentioned “minimal pairs” the day after we learned about them in class. I found minimal pairs so interesting because they are such an intuitive, but non-obvious part of categorizing sounds. It’s cool that these are so essential for hearing therapy, as well.

This post is just the kick-off. I have very little idea what direction to take things from here. What questions do you have? What kinds of things should I document as I go through both processes? Feel free to leave questions in the comments section below.