There's something magical about voices. We grow up with the voices of our parents, recognize our friends by theirs and enjoy listening to memorable lines delivered by talented actors and speakers. So far, computer voices have sounded anything but natural, whether in assistants like Cortana or in navigation systems. They always sounded alien, robotic and artificial. The Lyrebird software aims to change all that. What was once unmistakable is said to soon become perfectly imitable.
The technology, developed by a Canadian startup, allows us to glimpse into the future of synthetic voices. In the wild, lyrebirds are clever animals that can imitate sounds and voices and the software aims to achieve similar results with self-optimizing algorithms. Thousands of voices have already been analyzed to crack the "vocal DNA". What is it that makes voices so unmistakable and how can these features be created artificially? So far, the program not only imitates timbre but also various emotions like stress, joy or anger. And the project already made headlines when they successfully recreated the voices of Clinton, Obama and Trump.
Time for another self-experiment! First, I needed to register through email, Facebook or Google+. I chose Google+ since Lyrebird didn't accept my email address. After registration, I had to set up my voice profile by reading 30 sentences to the software. Currently, only English is supported (more languages will follow) but what the heck. I slogged my way through 30 nonsensical sentences that were supposed to make for one minute of audio material (it probably took me two). The system then processed my recordings and a couple of minutes later my profile was ready for use. Anything I entered, the software turned into speech using a fake copy of my voice. As a test, I picked a few lines from the US constitution. The result was rather sobering. Everything sounded monotonous and artificial and was accompanied by a metallic drone - it sounded nothing like me or so I thought. But when I played the recordings back to my colleagues, I got a surprise: they instantly recognized my voice! The software somewhat matched my tone and speech pattern albeit with a couple of minor glitches. I was stunned.
How easily are we fooled?
If you consider that Lyrebird is a fledgling company that received just a couple of one or two-minute recordings from me, through software that is still in its infancy and likely to improve with every new release, things get really exciting. With emotions and a little more variety regarding intonation and tempo, the software could quickly achieve a much greater degree of realism with multiple possible applications. The annoying and squeaky voices found in waiting loops, digital assistants and navigation systems could soon be replaced with much more pleasant and fully customizable voices. Picture navigation systems with the voice of a loved one, text instructions narrated by your favorite actor or Cortana sounding like your own children. All that would suddenly be possible. Even seriously ill people like Stephen Hawking would profit from this development, which could eventually give back the original voices of those who have either lost their ability to speak or suffer from speech impediments. A lot of progress has already been made in this area but natural-sounding text-to-speech is still few and far between.
As always, all this progress is accompanied by worries and concerns. The technology could potentially open the floodgates to all kinds of fraud and manipulation. How will voice-authorized systems cope with this development? Telephone fraud will be harder to detect and you'll never know who's calling. What about telesales that usually require your explicit spoken consent? Will we soon be hearing manipulated recordings of politicians whose careers will come to a sudden end as a result? Will voice recordings still be admissible as evidence in court? Will dead celebrities suddenly rise from the dead and advertise packet soup? The possibilities for abuse are endless.
Naturally, Lyrebird spokespeople are playing down the risks. They talk about watermarks that will render artificial voices easily detectable but what happens if the company or its technology gets sold to someone else? Google, Adobe and others already have similar systems in place and apps like "FakeApp" already make it incredibly easy to swap faces between bodies or manipulate mouth movements in real time. Natural-sounding computer-generated voices are the only thing missing to create the perfect illusion! In a time when discussions about fake news dominate the media, any technological innovation along these lines is doubly dangerous. Let's hope that 100% realistic fakes are still a few years away!
What I would like to know: how do you feel about this progress? Useful or dangerous?
Those who are overly concerned about this technology should remember: Ancient Aliens have already perfected it, and they're returning in 3 billion years.
"The result was rather sobering. Everything sounded monotonous and artificial and was accompanied by a metallic drone - it sounded nothing like me or so I thought."
Probably well before your time but the first time I heard my own voice on a tape-recorder I had exactly the same response... and, indeed, recently hearing my phone message "...are out, please leave your message after the tone" still shook me.
I believe it's a well-known fact that we hear our own voice in an entirely different way to any *external* sound.
Off subject a little - I've also recently learnt that babies (before they can speak) make noises with the lilt and intonation of the language they heard in the womb - thus, for example, experts can differentiate between French babies and German ones...
Definitely another very interesting blog... do keep them coming, please!
Keith
I know the sound of my voice quite well since I have already read a few articles for a radio station. And yes, I tried singing but gave up for total lack of talent. :)
Hello Sven,
I live in Australia and was born in England to German and English ancestors. I like the German language because in Australia there is too much slang and too many misspelled words. Worse still is the use of American words that are not spelled correctly: buck for money, tire for tyre, plow for plough and airplane for aeroplane, plus the rest of the lousy spelling. A technician who works on aircraft is an aero-mechanical engineer, so American English is not proper English.
.......Thank you for reading my boring speech, and now to the most important point, which I would like to suggest to you for a blog.
I recently looked at an online store for electrical products, mainly to see the range of computers for sale.
After almost endless scrolling, every single product was covered with a "Sold out" sign.
There seems to be an abundance of online shops, an abundance of technical help websites and an abundance of many other outdated, closed, discontinued, non-operational sites that ought to be removed.
We don't need the frustration, nor the old, outdated "tech" advice published on Google in 2011.
All this adds to the frustration of physically walking past shops in the local mall with a big "SALE" sign in the window, where the products in the stores are supposedly for sale but are not.
Many thanks for your interesting blogs and, with
kind regards,
Johan in Australia.
That’s indeed a nuisance and we have these shops in Germany, too. They like to use bait and switch offers with suspiciously few items in stock or seemingly cheap products with horrendously high shipping costs. Unfortunately, online shopping isn’t always the carefree experience it ought to be.
Well, how about another aspect - copyright? Some people's voices are valued and paid for dearly, to record ebooks, read narrations in movies, etc. This software will allow the same as happened with automatic translators - preying on someone else's work or copyrighted characteristic, such as use of voice.
You’re right, copyright issues are possible. Another question is whether you’re allowed to use the voices of the deceased who can no longer decide that for themselves. John Wayne canvassing? It’s only a matter of time, I’m certain.
Unfortunately nothing can be uninvented. Once the idea is out there in the wild somebody will invent it for nefarious purposes. Better for it to be in open view. We need to find ways of dealing with the problems and utilising the advantages. It is often forgotten that many of these problems were inherent in the invention of the mail system but solutions were found to mitigate them. We have to be as clever as the Victorians.
Another useless 'toy'. Self-voice recorder software has been available in other products for many years, so it doesn't make any sense to spend time and money creating this Lyrebird method.
Dangerous! The liabilities far outweigh the assets!
"The real horror starts once you’re getting calls from yourself. :)"
I've had three today from people spoofing my exchange. Given that I don't know anyone on that cell exchange, they are all blocked using wild cards on the blocker.
Don't Panic
Wall-E will clear up after we have run away from this mess.
It'll make script read-throughs and self-recorded audition videos awesome one day. I can also see audio books being much more popular when customers can specify "who" they want to read the books to them!
Although the voice recognition part could be useful overall, this technology is very dangerous if in the hands of hacks or disreputable individuals. Look at the possibilities: recordings could no longer be used in court as evidence, etc.
As stated in the article, quote, "The possibilities for abuse are endless." They are limitless, and saying that watermarks will prevent it is a lie. Once this software and technology gets out, the scammers and thieves of this world are going to party! All voice security will be rendered useless. Also, as pointed out in the article, the potential for the voice confirmation systems being abused is VERY SCARY. Just this alone could cost a person EVERYTHING!
Very dangerous potential! It may come to the point where we have to abandon electronic communication. It is already too vulnerable to interference by malignant governments and their agencies.
Considering the sheer number of spam/scam calls I've had in the past, this is a little scary.
The real horror starts once you’re getting calls from yourself. :)
Useful.