Your voice is unmistakable? Not for long!
There's something magical about voices. We grow up with the voices of our parents, recognize our friends by theirs and enjoy listening to memorable lines delivered by talented actors and speakers. So far, computer voices have sounded anything but natural, whether in the form of assistants like Cortana or navigation systems. They always sounded alien, robotic and artificial. The Lyrebird software aims to change all that. What was once unmistakable is said to soon become perfectly imitable.
The technology, developed by a Canadian startup, gives us a glimpse into the future of synthetic voices. In the wild, lyrebirds are clever animals that can imitate sounds and voices, and the software aims to achieve similar results with self-optimizing algorithms. Thousands of voices have already been analyzed to crack the "vocal DNA". What is it that makes voices so unmistakable and how can these features be created artificially? So far, the program imitates not only timbre but also various emotions like stress, joy or anger. And the project has already made headlines by successfully recreating the voices of Clinton, Obama and Trump.
Time for another self-experiment! First, I needed to register through email, Facebook or Google+. I chose Google+ since Lyrebird didn't accept my email address. After registration, I had to set up my voice profile by reading 30 sentences to the software. Currently, only English is supported (more languages will follow) but what the heck. I slogged my way through 30 nonsensical sentences that were supposed to make for one minute of audio material (it probably took me two). The system then processed my recordings and a couple of minutes later my profile was ready for use. Anything I entered, the software turned into speech using a fake copy of my voice. As a test, I picked a few lines from the US Constitution. The result was rather sobering. Everything sounded monotonous and artificial and was accompanied by a metallic drone - it sounded nothing like me, or so I thought. But when I played the recordings back to my colleagues, I got a surprise: they instantly recognized my voice! The software somewhat matched my tone and speech pattern, albeit with a couple of minor glitches. I was stunned.
How easily are we fooled?
If you consider that Lyrebird is a fledgling company that received just a couple of one- or two-minute recordings from me, through software that is still in its infancy and likely to improve with every new release, things get really exciting. With emotions and a little more variety in intonation and tempo, the software could quickly achieve a much greater degree of realism with multiple possible applications. The annoying and squeaky voices found in waiting loops, digital assistants and navigation systems could soon be replaced with much more pleasant and fully customizable voices. Picture navigation systems with the voice of a loved one, text instructions narrated by your favorite actor or Cortana sounding like your own children. All that would suddenly be possible. Even seriously ill people like Stephen Hawking would profit from this development, which could eventually give those who have either lost their ability to speak or suffer from speech impediments their original voices back. A lot of progress has already been made in this area but natural-sounding text-to-speech is still few and far between.
As always, all this progress is accompanied by worries and concerns. The technology could potentially open the floodgates to all kinds of fraud and manipulation. How will voice-authorized systems cope with this development? Telephone fraud will be harder to detect and you'll never know who's calling. What about telesales that usually require your explicit spoken consent? Will we soon be hearing manipulated recordings of politicians whose careers will come to a sudden end as a result? Will voice recordings still be admissible as evidence in court? Will dead celebrities suddenly rise from the grave and advertise packet soup? The possibilities for abuse are endless.
Naturally, Lyrebird spokespeople are playing down the risks. They talk about watermarks that will render artificial voices easily detectable but what happens if the company or its technology gets sold to someone else? Google, Adobe and others already have similar systems in place and apps like "FakeApp" already make it incredibly easy to swap faces between bodies or manipulate mouth movements in real time. Natural-sounding computer-generated voices are the only thing missing to create the perfect illusion! In a time when discussions about fake news dominate the media, any technological innovation along these lines is doubly dangerous. Let's hope that 100% realistic fakes are still a few years away!
What I would like to know: how do you feel about this progress? Useful or dangerous?