AI versus Human Transcription

There’s no denying that AI has improved vastly since it was first introduced. It is now a common tool on nearly all smartphones, Bluetooth speakers and other electronic devices, and research carried out by Goldman Sachs (https://www.bbc.co.uk/news/technology-65102150) reports that up to 300 million full-time jobs could be replaced by artificial intelligence in the future.

How AI can help with transcription and audio typing

Many platforms now provide automated transcripts of online meetings, or captions for video. If you’re a bit of a grammar and spelling geek, though, you will have noticed they’re not always accurate. “Witch” instead of “which”, anyone?

But when transcribing or audio typing clear, single-speaker dictation with no jargon, slang or extensive technical terms, automated transcripts and captions (generated, for example, by Otter.ai, YouTube, Zoom and Teams) can be useful. Captioned video can make content more accessible to your audience, even if the captions are not completely accurate, and simpler transcripts may not need too much editing, so there could be a benefit over using a professional transcription service. Just be sure to check that transcript carefully!

AI and interview or focus group transcription

However, if it’s an interview that needs transcribing, with a second or third speaker (or maybe 10 if it’s a focus group) and some local or project-specific terminology, then this is where automated transcripts can go very wrong. Speech is often transcribed as one continuous block rather than separated by speaker, and different voices are not recognised for individual identification, which can be required for focus groups. As for participants talking over each other, well, this can completely confuse even the best of AI!

Combine this with phonetic guesses at specialist terminology or place names, which are not highlighted for double checking, and lots of mistakes can be missed in multiple-speaker recordings. This can lead to a final transcript that does not reflect what was actually said during the interview or focus group.

The human touch…

Often, with research interview and oral history transcription in particular, it is important to capture how things are said, as well as what is said. But AI transcription software is not always able to recognise a soft chuckle, or when a speaker is starting to get upset or angry. It is the human transcriber who can pick up and transcribe these vital indicators. They make an oral history interview truly come to life for future generations, and they help researchers gather accurate information by recognising the emotion behind the words being said by the interviewee.

And will AI ever be able to decipher what Joe Bloggs is saying whilst sitting in a busy coffee shop interviewing Jane Doe, who is sitting quite far from the microphone? Or understand strong accents? At the moment it certainly struggles.

At Business Friend, we have found it quicker – and therefore more cost-effective for the client – to transcribe an interview or focus group from scratch rather than edit an AI-generated transcript. So, for now at least, in the question of AI versus human transcription, humans are definitely leading the way!

Please do get in touch if you would like to discuss your transcription or audio typing requirements with a real, live person – we love helping our fellow humans!
