Anyone requiring voice-over for their productions will have no doubt encountered, and perhaps even considered using, an AI voice instead of a human voice actor.
Tech start-ups pitching cheap, off-the-shelf, synthetic voice-overs have proliferated, and their marketing has accelerated in recent years. Many of these companies claim that their ‘instant access voices’ are virtually indistinguishable from human voices.
Here at Voquent, we work with thousands of professional performers. For many, voice-over work is an important part of their income. AI technology has already impacted other industries, so, understandably, many performers are getting increasingly concerned about the longevity of their profession.
Is there a future tipping point where AI voice-overs will have a serious impact on the voice-over industry? Let’s build some perspective on what is happening.
Where it all started
One could argue that AI voices ultimately descend from the work of Homer Dudley, a Bell Labs engineer who began developing the first vocoder in 1928. Vocoders would form the basis for many “robot voices”, which strongly influenced popular music, television, and filmmaking.
After this, Douglas Rain’s rendition of the HAL 9000 computer in Stanley Kubrick’s legendary film 2001: A Space Odyssey was the first time a mainstream audience could hear a notional concept of an AI voice that could replicate a degree of human intonation and inflexion.
Speech synthesis has been an important part of mainstream life for around four decades. Texas Instruments’ Speak & Spell, introduced in 1978 (and famously featured in Steven Spielberg’s E.T.), was the first consumer product ever made with a self-contained LPC speech synthesiser IC. In 1977, toy giants Mattel began developing a game console called Intellivision to compete with the Atari 2600, and a voice synthesis module for it was released in 1982. A few years later, the Atari ST and Commodore Amiga wowed audiences by pioneering their own primitive speech emulators.
In 1988, Stephen Hawking’s adoption of Intel’s Speech Plus system enabled him to regain an audible voice after developing a slowly progressing form of Motor Neurone Disease. This voice ended up becoming one of the best-known things about Professor Hawking.
Both Apple and Microsoft have researched, pioneered and developed synthetic AI voice systems for decades. Microsoft’s SAPI4 and SAPI5 speech synthesis engines were available as add-ons for Windows 95 and 98. And Apple’s 2005 release of Mac OS X Tiger (10.4) featured an inbuilt voice-over for the first time.
Fast forward to the Apple launch of “Siri” in 2011, and later, Amazon’s Alexa in 2014, along with Google’s own synthetic voice and Microsoft’s many iterations of Cortana. Billions of dollars have been invested by these four companies alone in creating human voice emulators that sound as human as humanly possible. Oh, the humanity!
What is happening now?
AI (Artificial Intelligence) represents the fastest growing and the most important strand of technological development of recent generations. Possibly of any generation. Naturally, this includes an ever-expanding variety of uses for synthetic AI voices.
Now, many small, independent developers are jumping onto the Text-to-Speech (TTS) bandwagon, harvesting large databases and libraries of spoken audio in an attempt to create a more human-sounding voice emulator and, with it, a passive income stream.
Tech companies like Voicery are using AI to develop bespoke, synthetic voices for brands. But are they any good?
A quick Google search will yield many companies offering fully synthetic TTS voice-over services that they claim are indistinguishable from actual, human voices.
Are these claims to be believed?
While it’s true that some companies have achieved relatively impressive results with a limited selection of synthetic voices, it is the inability of the AI voice to understand the purpose or meaning of a script that cripples widespread adoption.
They cannot get excited or show empathy. They do not have real emotion.
Think about it. In just a few seconds of natural human speech, you listen to a voice influenced by the lifelong journeys and travels of that person’s life. Isn’t that incredible?
So, using the banal tones of a machine-generated voice is a sure-fire way to foster disengagement from your audience in the same way that audiences can subconsciously switch off to background noise (such as the radio).
AI voices do not carry the influence of distinct regional accents, nor can they carry the weight of real-world experiences that bridge the narrator’s emotions with the audience.
How are AI voices creating opportunities when they are designed to replace human voices?
What many expert voice actors we speak to do not realise is that AI voices are currently generating an insatiable appetite for real, human voices that address both global and regional audiences. A new generation of content creators is reluctantly using AI voices to get their first amateur productions started.
The need for businesses of all types to wield language, experience, and accents to connect to their consumers, customers, and viewers increases exponentially alongside the global adoption of mobile technology and video sharing platforms.
Professional voice actors should relish advances in AI speech technology, not fear them.
Cheap-sounding AI voice-overs create an inevitable need for an authentic human voice replacement as soon as possible! And now there’s no reason not to make the switch; hiring a human voice is more affordable than ever.
You can easily hire less experienced voice actors relatively cheaply via a freelance marketplace or P2P (Pay-To-Play) site. However, it is often worth spending a bit more to obtain experienced voice-over talents. The higher the quality of the recording equipment and performance, the better it will be received by the audience.
Whatever the budget, though, any human voice will beat a synthetic one.
Will AI voice-overs ever challenge a human voice talent?
The human brain still contains vastly complex processing capabilities that exceed even the most powerful supercomputers on the planet when it comes to creative performances. The human species is probably more likely to render itself extinct than to accurately simulate the human brain.
As we’ve explored in previous articles about character voices and villains, there’s a lot of thought and planning that goes into the casting and performance of a professional voice-over. Leading performers and creators always want to ensure that their voices are unique and memorable.
An AI voice will, by definition, never be unique. Sure, Siri is a real voice and is unique to iPhones, but would you use Siri to provide the voice-over for your commercial?
AI voices provide a cheap way to get words read out loud, but this misses the point. The stilted cadence of a synthetic voice will probably never replace the natural rhythm of a human voice in our lifetime because the spoken word is much more than just a broadcast mechanism.
Voice-over is still, without question, art and will forever offer a deep and powerful connection to our hearts and minds.
It is doubtful an AI voice-over will ever emotionally connect with a listener as a human narrator can, except perhaps for R2-D2.
Human voices are complex and amazing.
No two recordings of the same human voice will ever be identical. This is especially obvious when listening to someone sing. And that is the point. There are always subtle inconsistencies in the vocal delivery of a script or song, whether they are intended or not. This is what listeners embrace.
Realistically mimicking the charm of human imprecision remains an agonising and seemingly impossible challenge for any AI voice-over developer. It is difficult enough to get right with musical instruments, such as drums and strings, which have far less variety than a human voice.
We love technology here at Voquent, and we are curious to watch how the practical uses and benefits of AI voice-overs expand.
Regardless of how realistic (or not) a synthetic voice sounds, the next progressive step is always to replace it with a human voice-over. This will, quite literally, breathe life into the product or message.
Right now, the voice talent options available to content creators on a global scale are incredibly diverse and exciting.