Using Speech-to-Text Software for Transcription

Have you ever wished you could simply dictate an entire novel rather than typing it out at a desk? Are you a podcaster or YouTuber who needs to produce a transcript of your audio quickly and effortlessly?

I’m sure you are already familiar with dictation. At some point, all of us have tried dictating a text message to a friend on our phone (and laughed about it later if the software made a mistake).

But maybe you have never considered other applications of speech-to-text (STT) software, also known as automatic speech recognition (ASR).

The applications, as you can imagine, are endless. Most of us associate this technology with our cellphones; being able to send texts or record notes using only our voice is a small-scale personal benefit. The auto industry, the health industry, and even the military are developing or already using speech software. But you don’t have to be a large company to make use of this technology; if you have a small business, you can use STT too.

Let’s return to the publishing domain. In my experience as a proofreader, I have come across STT in the blogging sphere. Content creators record audio or video, usually in the form of an interview or even a monologue, and use STT software to convert the audio into copy for their blog. Their content is therefore more easily repurposed so they can generate revenue from multiple platforms. They don’t have to spend time typing out the entire interview, or hire a copywriter to create a summary of the topics discussed. STT saves them time and money.

Unfortunately, as with all computer-assisted technology, speech-to-text software is not without its flaws. Several factors can cause errors in the text it produces: changes in the pace of your voice, whether you are reading aloud or speaking casually, background noise, your accent, and of course your enunciation are just some of the things that affect the quality of your results.

Here is a little experiment if you are considering relying on STT to help you produce written content. Search the web for free STT software. If you do not already have audio that you want to convert, pick up a book or magazine and read out a few minutes’ worth of text.

After trying a couple of absolutely dismal STT sites, I signed up for Amazon’s STT service, Amazon Transcribe. Here is what the software churned out as I dictated:

Dorothy was an innocent, harmless little girl who had been carried by a cyclone many miles from home.
And she had never killed anything in all her life.
With the little woman evidently expected her to answer.
So, Dorothy said the hesitation
You are very kind, but there must be some mistake. I have not killed anything.
Your house did anyway, replied that old woman with a laugh, and that is the same thing. See?
She continued, pointing to the corner of the house. There are two toes still sticking out from under a block of wood.
Dorothy looked and gave a little cry of fright.
There. Indeed, just under the corner of the great being. The house rested on two feet were sticking out shot and silver shoes with pointed toes.
Oh, dear. Oh, dear. Cry Dorothy, Clasping her hands together in dismay. The house must have fallen on her. Whatever shall we do?
There is nothing to be done, said the little woman calmly.
But how was she as Dorothy?
She was the lichen, which from the East as I said, as in the little woman
She has held all the munchkins in bondage for many years, making them slaves for her night and day.
Now they were all set free and are grateful to you for the favor.

Transcribed by AWS from The Wonderful Wizard of Oz by Lyman Frank Baum

As you can see, the software inserted a hard return whenever I paused, even though I spoke fluidly, and while it got some of the punctuation right, there are no quotation marks to be found. Worse, there are many discrepancies between the words I spoke and the text it produced.

Was it faster than typing? Sure. Was it accurate? Not exactly!
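
If you want to put a number on “not exactly,” one common measure is the word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the reference text into the transcript, divided by the number of words in the reference. Below is a minimal Python sketch of that idea, not a definitive implementation. The function names are hypothetical, punctuation and capitalization are ignored, and the reference sentence is assumed to be the matching line from the book with its quotation marks removed.

```python
import re


def words(text):
    """Lowercase the text and return its words, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference text has no words")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Reference line (assumed wording from the book) versus the STT output above.
reference = "But who was she? asked Dorothy."
transcript = "But how was she as Dorothy?"
print(f"WER: {word_error_rate(reference, transcript):.0%}")
```

For this one line, two of the six reference words were misheard (“who” became “how” and “asked” became “as”), so the sketch reports a WER of 33%. Running the same comparison over a whole passage gives a rough sense of how much proofreading a transcript will need.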

STT is a useful tool, but it will never replace the most powerful computers: our own brains. If you use speech-to-text software for content that you intend to publish, pass the transcription on to an experienced proofreader who will catch those silly mistakes and distracting errors. I promise, your readers will never guess that you took a shortcut with STT!
