Byword: A blog by and for the TW community

The Magic Behind the TypeWell Dictionary (Part 2)

If you use social media, you might have seen TypeWell send out the occasional “thank you” to a transcriber who submits a new word that came up in one of their classes. This is a continuation of last week’s interview with Steve Colwell, who explained what makes the TypeWell abbreviation system so unique, and how he went about expanding the dictionary to over half a million words.

When someone sends TypeWell a word suggestion, how do you go about determining whether to include it in the dictionary?

I personally look up each word in a variety of dictionaries to try to verify that it's a real word. Over the years I've developed quite an interest in words, and I'm a fan of several word-of-the-day lists too, so I can usually tell pretty quickly what's up with a word.

Sometimes the "word" is really a misspelling or lesser usage of the proper word, and I'll discuss those with the submitter to decide what to do. Even if a word is "improper", we'll often include it in the dictionary because if a teacher says an improper word, well, you need to capture that. So improper words are a feature, too.

What’s an example of an “improper” or grey-area word?

Let's see, an example of an improper word. Well, this is the sort of thing: copaceticness. That's not a word, but someone might use it as a form of copacetic, which means something is “OK”. We tend not to add words like that to the dictionary unless people really do use them at least occasionally.

There's a large grey area of semi-words, with fuzzier boundaries than one might think. We have 500,000 words, and probably at least that many additional grey-area words that we've pruned to keep the dictionary high-quality. So often, a newly requested word is already in our grey-area list and I just have to move it — along with its plural, variations with -ity and -ness and other endings — to the main dictionary.

Dictionary updates are all rolled together into the next TypeWell revision, so everyone gets them right away.

If you type a real word, TypeWell won't mess with it.

How do you “teach” the TypeWell dictionary a new word in all its variations? 

We include each separate variation of a word as a separate "word" in the dictionary. In the old days we auto-generated words with different endings, but now every variant is individually required to have some frequency of occurrence in the database. That has made the dictionary a lot more accurate. 
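As a rough illustration of the idea Steve describes (keeping a variant only if it actually occurs in usage data), here is a minimal Python sketch. It is not TypeWell's code: the function name, the suffix list, the counts, and the `min_count` threshold are all hypothetical, chosen only to show the difference between auto-generating endings and requiring each variant to be attested.

```python
from collections import Counter

def attested_variants(base, suffixes, corpus_counts, min_count=1):
    """Return the base word plus only those suffixed forms that were
    observed at least `min_count` times in the (hypothetical) usage data."""
    candidates = [base] + [base + suffix for suffix in suffixes]
    # Counter returns 0 for words it has never seen, so unattested
    # variants are filtered out instead of being auto-generated.
    return [word for word in candidates if corpus_counts[word] >= min_count]

# Made-up frequency data for illustration only.
corpus_counts = Counter({"copacetic": 40, "copacetically": 3})

words = attested_variants("copacetic", ["ally", "ness", "ity"], corpus_counts)
print(words)
```

With these invented counts, "copaceticness" and "copaceticity" are left out because nothing in the data shows them in use, which mirrors the grey-area pruning described earlier in the interview.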

What are some examples of a dictionary “bug” and how do you find and fix them?

That's a big topic since there are many different kinds of issues, but one example is a word that is in the dictionary and shouldn't be. For instance, the dictionary might have both PowerPoint and powerpoint, with different capitalizations. Which is right? If both are right, should we have both in the dictionary and let people comma-cycle from one to the other? It can be tricky because we need a solution that's very flexible so it works for many different ways of using TypeWell.

Another example is an uncommon short word like diss that "gets in the way" because people would like to use that short term as an abbreviation for a more common longer word, like disease. If it's a really rare short word, the right answer is to show the more common longer word. But usually we try to hew to the rule that "if you type a real word, TypeWell won't mess with it," so then the short word wins. It takes a lot of this kind of tuning to make TypeWell work as smoothly as it does. Otherwise it would be getting in your way all the time.  
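The trade-off in that answer can be sketched as a small decision rule. The Python below is only an illustration under stated assumptions: the `resolve` function, the frequency numbers, the abbreviation table, and the `rare_threshold` cutoff are hypothetical and are not taken from TypeWell itself.

```python
def resolve(typed, word_freq, abbreviations, rare_threshold=5):
    """Decide what to show for `typed`.

    word_freq:      hypothetical map of real words to usage counts
    abbreviations:  hypothetical map of short forms to longer words
    rare_threshold: assumed cutoff below which a real word counts as rare
    """
    expansion = abbreviations.get(typed)
    freq = word_freq.get(typed, 0)

    if freq == 0:
        # Not a real word: expand it if an abbreviation exists.
        return expansion if expansion is not None else typed
    if expansion is not None and freq < rare_threshold:
        # A really rare short word gives way to the common longer word.
        return expansion
    # "If you type a real word, TypeWell won't mess with it."
    return typed

# Made-up data for illustration only.
abbreviations = {"diss": "disease"}

print(resolve("diss", {"diss": 120}, abbreviations))  # short word is common enough
print(resolve("diss", {"diss": 2}, abbreviations))    # short word is very rare
```

In the first call the short word wins and is left alone; in the second it is rare enough that the longer word is shown instead. The tuning Steve mentions amounts to deciding, word by word, which side of that line each entry falls on.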

Will you share a couple examples of word-of-the-day lists that you particularly like?  

Judy and I like two in particular: wordsmith.org and m-w.com (Merriam-Webster). They feature fairly advanced words, so they avoid the problem most lists have of rehashing the same couple thousand SAT-level words. Sometimes they're just regular words, like pratfall (a humiliating failure, from the original meaning of a fall onto one's bottom) and divulge. Other times they're more challenging, like cullet (scrap glass for remelting) and nimiety (excess). OK, I admit, those are not words that have much use to most of us. They're only for true word-nerds.

Merriam-Webster's "Word of the Day" page

Speaking of word-nerds, how do you learn new words to build your own vocabulary?

To remember words, I've been forced to use computer help — I use a program called Anki that is similar to flashcards. It makes it possible to learn a word a day and actually remember it, which I never used to be able to do. I highly recommend Anki for learning new languages or facts from classes, too.

All of Steve’s friends will tell you that he’s pretty unbeatable at Scrabble...
Any challengers?

Kate Ervin

Kate became a TypeWell transcriber in 2004 and began training new transcribers in 2009. She has served as TypeWell's Executive Director since 2011.
