Anyone requiring voice-over for their productions will have no doubt encountered, and perhaps even considered using, an AI voice instead of a human voice actor.
Tech start-ups pitching cheap, off-the-shelf, synthetic voice-overs have proliferated, and their marketing has accelerated in recent years. Many of these companies claim that their ‘instant access voices’ are virtually indistinguishable from human voices.
Here at Voquent, we work with thousands of professional performers. For many, voice-over work is an important part of their income. AI technology has already impacted other industries, so it is understandable that many performers are getting increasingly concerned about the longevity of their profession.
Is there a future tipping point where AI voice-overs will have a serious impact on the voice-over industry? Let’s build some perspective on what is happening.
Where it all started
One could argue that AI voices are ultimately born from Homer Dudley, a Bell Labs engineer who began developing the first vocoder in 1928. It would form the basis for many “robot voices”, which strongly influenced popular music, television, and filmmaking.
After this, Douglas Rain’s rendition of the HAL 9000 computer in Stanley Kubrick’s legendary film 2001: A Space Odyssey was the first time a mainstream audience was able to hear a notional concept of an AI voice that could replicate a degree of human intonation and inflexion.
Speech synthesis has been an important part of mainstream life for around four decades. In 1977, toy giants Mattel began developing a game console called Intellivision which included a voice synthesis module. This was released in 1982 to compete with the Atari 2600. The 1978 Texas Instruments Speak & Spell (famously featured in Steven Spielberg’s ET) was the first consumer product ever made with a self-contained LPC speech synthesiser IC. A few years later, the Atari ST and Commodore Amiga wowed audiences by pioneering their own primitive speech emulators.
In 1988, Stephen Hawking’s adoption of Intel’s Speech Plus system enabled him to regain an audible voice after developing a very slowly progressing form of Motor Neurone Disease. This voice ended up becoming one of the best-known things about Professor Hawking.
Both Apple and Microsoft have researched, pioneered and developed synthetic AI voice systems for decades. Microsoft’s SAPI4 and SAPI5 speech synthesis were add-ons for Windows 95 and 98, and Apple’s 2005 release of Mac OS X Tiger (10.4) featured its built-in VoiceOver for the first time.
Fast forward to Apple’s launch of “Siri” in 2011, and later, Amazon’s Alexa in 2014, along with Google’s own synthetic voice and Microsoft’s many iterations of Cortana. Billions of dollars have been invested by these four companies alone in creating human voice emulators that sound as human as humanly possible. Oh, the humanity!
What is happening now?
AI (Artificial Intelligence) represents the fastest growing and the most important strand of technological development of recent generations. Possibly of any generation. Naturally, this includes an ever-expanding variety of uses for synthetic AI voices.
Now, many small, independent developers are jumping onto the Text-to-Speech (TTS) bandwagon, harvesting large databases and libraries of spoken audio in an attempt to create a more human-sounding voice emulator with the objective of building a passive income stream.
A quick Google search will yield many companies offering fully synthetic TTS voice-over services that they claim are indistinguishable from actual human voices.
Are these claims to be believed?
While it’s true that some companies have achieved relatively impressive results with a limited selection of synthetic voices, it is the inability of the AI voice to understand the purpose or meaning of a script which cripples widespread adoption.
They cannot get excited or show empathy. They do not have real emotion.
Think about it. In just a few seconds of natural human speech, you are listening to a voice influenced by that person’s lifelong journeys and travels. Isn’t that incredible?
So, using the banal tones of a machine-generated voice is a sure-fire way to foster disengagement in your audience, in the same way that audiences can subconsciously switch off to background noise (such as the radio).
AI voices do not carry the influence of distinct regional accents, nor are they able to carry the weight of real-world experiences that bridge the emotions of the narrator with the audience.
But AI voices are a positive thing – yes, really!
What many professional voice actors we speak to do not realise is that AI voices are, in fact, generating an insatiable appetite for real, human voices that address both global and regional audiences.
The need for businesses of all types to wield language, experience, and accents to connect to their consumers, customers, and viewers continues to increase exponentially alongside the global adoption of mobile technology and video sharing platforms.
Cheap-sounding AI voice-overs create an inevitable need for an authentic human voice replacement as soon as possible! And now there’s no reason not to make the switch; it is more affordable than ever.
You can easily hire less experienced voice actors relatively cheaply via a freelance marketplace or P2P (Pay-To-Play) site. However, it is often worth spending a bit more to obtain an experienced voice-over artist. The higher the quality of the recording equipment and performance, the better it will be received by the audience.
Whatever the budget though, any human voice will beat a synthetic one.
Will AI voice-overs ever challenge human voice talent?
When it comes to creative performances, the human brain still contains vastly complex processing capabilities that exceed even the most powerful supercomputers on the planet. The human species is probably (and increasingly) likely to render itself extinct long before it can accurately simulate the human brain.
As we’ve explored in previous articles about character voices and villains, there’s a lot of thought and planning that goes into the casting and performance of a professional voice-over. Leading performers and creators always want to ensure that their voices are unique and memorable.
An AI voice will, by definition, never be unique. Sure, Siri is a real voice, and is unique to iPhones, but would you use Siri to provide the voice-over for your commercial?
AI voices provide a cheap way to get words read out loud, but this misses the point. The stilted cadence of a synthetic voice will probably never replace the natural rhythm of a human voice in our lifetime because the spoken word is much more than just a broadcast mechanism.
Voice-over is still, without question, an art and will forever offer a deep and powerful connection to our hearts and minds.
Human voices are complex and amazing
No two recordings of the same human voice will ever be identical. This is especially obvious when listening to someone sing. And that is the point. There are always subtle inconsistencies in the vocal delivery of a script or song, whether they are intended or not. This is what listeners embrace.
Realistically mimicking the charm of human imprecision remains an agonising and impossible challenge for any AI voice-over developer. It’s difficult enough to get right with musical instruments, such as drums and strings, that have far less variety than a human voice.
We love technology here at Voquent and we are excited to see the practical uses and benefits of AI voice-overs expand. They have their place, particularly in informational settings.
Regardless of how realistic (or not) a synthetic voice sounds, the next progressive step is always to replace it with a human voice-over. This will, quite literally, breathe life into the product or message.
Right now, the voice talent options available to content creators on a global scale are incredibly diverse and exciting.
Written by Miles Chicoine
Edited by Al Black & Alex Harris-MacDuff