
Byword: A blog by and for the TW community

Releasing the Hounds: How Song Identification Apps Can Increase the Accuracy of Your Transcripts

If you’re old enough, like me, you can remember listening to a song on the radio and wondering, "What song is that? Who’s that artist?" What followed was usually a lot of waiting around for the DJ to say more, or else you might spend the rest of the week harassing your friends, humming a few off-key bars from memory and asking if someone, anyone, could name that tune.

And if you’re old enough, like me, free apps like SoundHound and Shazam, which can listen to a song and identify it by its digital fingerprint almost instantly, seem like sorcerer’s magic.


For audiophiles, this is a great convenience: it allows you to quickly keep track of the new artists you like and build out your playlists. But Shazam and SoundHound also have practical applications for the work we do as Transcribers.

Imagine that I’m providing services for a Dance History class. To illustrate various well-known dances, the instructor uses a lot of course media. And because these videos are primarily musical numbers taken from classical ballet, show tunes, and other popular dance programs, there is little in the way of narration or context. Closed captioning may be sparse or nonexistent, so a Transcriber Summary will be a necessity.

To make matters even more challenging, as a remote service provider, I do not have visual access to what is happening in class. And since I’m not a film or dance buff myself, I have no way to identify, by sound alone, the performances being shown. But I do have my phone. And with it, I have access to technology that can help me answer these questions in real time.


Scenario #1: 

I hear a man talking to what appears to be a little girl. She seems lonesome because she can’t dance. So the man begins singing to her.

I have no idea what film this comes from. I can’t even begin to guess. Something with Shirley Temple, maybe?

By activating one of the apps I’ve mentioned above and holding my phone up to my headset, I can, within seconds, identify the song being sung (“The Worry Song”) and the person doing the singing (Gene Kelly). From there, a quick Google search tells me that the film is Anchors Aweigh and the little girl’s voice I am hearing is not, in fact, Shirley Temple, but Sara Berner (who is providing the voice of the animated character Jerry Mouse from Tom & Jerry). 

My Transcriber Summary goes from this:

[Transcriber's Summary: A man talks to a little girl (or possibly a very young boy with a squeaky voice). The girl (or boy) seems lonely. She (or he) can't sing or dance. The man suggests that the girl (or boy) try being happy and begins singing about remaining positive. Presumably, they dance.] 

to this:

[Transcriber's Summary: From Anchors Aweigh, Gene Kelly talks to Jerry Mouse (from Tom & Jerry) about seeming lonesome. Jerry Mouse laments that he can't sing or dance. Kelly suggests that he try being happy and begins singing "The Worry Song" about remaining positive. They dance.] 


Scenario #2: 

I hear absolutely no dialogue and have no idea what is happening other than big band music being played. Since it’s a dance class, I’m guessing that someone is dancing. 

Fred Astaire and Ginger Rogers dancing to Cole Porter’s "Night and Day."

Thanks to technology, I can quickly fill in the gaps. My Transcriber Summary goes from this:

[Transcriber’s Summary: Music playing. Audience applauds.]

to this:

[Transcriber’s Summary: Cole Porter’s “Night and Day” from The Gay Divorcee plays. Fred Astaire and Ginger Rogers dance. Audience applauds.]

I can do this because the app I’ve chosen to use identifies the song and its artist. It also tells me that the album is named Ginger & Fred: 40 Songs from Musicals. A quick Google search for “Astaire and Rogers Night and Day” gives me the name of the film and confirms for me that this is an important dance number in cinematic history.


Scenario #3: 

The instructor shows a clip from Dancing with the Stars. I know just enough about that show to recognize that it is being shown, but I have no idea who is speaking or which performer they are judging. 

Without technology, my Transcriber Summary would look something like this:

[Transcriber’s Summary: Dancing with the Stars clip plays. Two performers dance to a pop song sung by a man who sounds a lot like Phil Collins. Applause. The judges all praise the male performer’s footwork and carriage. The performer says that dancing is much different from being an Olympic athlete and competing in whichever sport he competes in. It’s more fun and relaxing. The judges reveal their scores: 8, 9, and 9. Applause. The performer is happy because he has overcome some technical struggles with his footwork. The host mentions that it is important to share amateur success stories, as well. 

Next, a woman from New Zealand discusses her experience tap dancing and studying ballet. She says her real love is hip hop. She taught herself by copying what she saw on television. She wants to work hard and create new goals. Applause. A pop song plays, sung by a female artist. Applause.]



If I can use SoundHound or Shazam to identify even one of the songs, then I can Google that song title and “Dancing with the Stars” and, in only a few seconds, find all of the information I need to perform an accurate summary, like this:

[Transcriber’s Summary: Dancing with the Stars clip plays. Apolo Anton Ohno and Julianne Hough dance the quickstep to Phil Collins’s “Two Hearts.” Applause. The judges all praise Ohno’s footwork and carriage. Ohno says that dancing is much different from being an Olympic speed skater. It’s more fun and relaxing. The judges reveal their scores: 8, 9, and 9. Applause. Ohno is happy because he has overcome some technical struggles with his footwork. Host Tom Bergeron mentions that it is important to share amateur success stories, as well. A clip plays.

Next, dancer Parri$ Goebel from New Zealand discusses her experience tap dancing and studying ballet. She says her real love is hip hop. She taught herself by copying what she saw on television. She wants to work hard and create new goals. Applause. Parri$ Goebel now dances live in the studio to Alicia Keys’s “Girl on Fire.” Applause.]

The technology is so powerful that before Alicia Keys even sings her first lyric, I have already identified the song and discovered that Parri$ Goebel (whose name I would not have otherwise spelled correctly) is the person who danced to it on DWTS.

While these tools are undoubtedly useful when editing a transcript after class or when providing post-production captioning, a transcriber should also be able to use them to capture detailed information like this in real time, instead of sitting idle while the music plays. By enlisting these song-identification apps to listen alongside us as a supplement to our transcribing, we become better service providers, able to offer the client clearer communication access and a more accurate picture of what the media signifies.


Jason Kapcala

Jason Kapcala is Coordinator of Auxiliary Aids for West Virginia University’s Office of Accessibility Services. He is also the author of "North to Lakeville," a short story collection forthcoming from Urban Farmhouse Press.
