Arguably the most important part of a virtual assistant is its voice, especially if you want it to catch on in the mainstream. The most intelligent AI in the world won’t sit well with the average user if it sounds like a chainsaw robot when it answers questions, and that’s why we’ve seen big companies focus so much on making their sidekicks sound natural.
Google has been at the forefront of that thanks to its neural network research, and it’s paid off with a new system from the search giant called Tacotron 2. That may be pronounced “tack-o-tron” and linguistically mean something specifically related to speech, but I’m going to assume Google’s naming stuff after tacos here.
Tacotron 2 is a text-to-speech system that relies on two neural networks: the first translates text into a spectrogram, a representation of audio frequencies over time, and the second turns that spectrogram into an audio waveform that can be played back. It’s a complex two-part system that combines tech from both Google and its Alphabet sibling DeepMind, whose WaveNet underpins the second stage.
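To make the two-stage idea concrete, here is a toy sketch in Python. It is not Google's code and involves no neural networks at all: the function names, the constants, and the "one frame per character" rule are all made up for illustration. The only thing it shares with the real system is the shape of the pipeline, where text becomes a spectrogram and the spectrogram becomes audio samples.

```python
import math
from typing import List

# Hypothetical constants chosen only for this illustration.
N_BINS = 8               # frequency bins per spectrogram frame
SAMPLES_PER_FRAME = 100  # audio samples generated per frame
SAMPLE_RATE = 8000       # samples per second

Spectrogram = List[List[float]]  # frames x frequency bins


def text_to_spectrogram(text: str) -> Spectrogram:
    """Stand-in for the first network: emits one frame per character,
    with a single active frequency bin picked from the character code."""
    frames = []
    for ch in text:
        active = ord(ch) % N_BINS
        frames.append([1.0 if b == active else 0.0 for b in range(N_BINS)])
    return frames


def spectrogram_to_audio(spec: Spectrogram) -> List[float]:
    """Stand-in for the second network: renders each frame as a sum of
    sine waves, one per frequency bin, weighted by that bin's energy."""
    audio = []
    for i, frame in enumerate(spec):
        for n in range(SAMPLES_PER_FRAME):
            t = (i * SAMPLES_PER_FRAME + n) / SAMPLE_RATE
            sample = sum(
                energy * math.sin(2 * math.pi * 200.0 * (b + 1) * t)
                for b, energy in enumerate(frame)
            )
            audio.append(sample)
    return audio


def synthesize(text: str) -> List[float]:
    """Chains the two stages: text -> spectrogram -> audio samples."""
    return spectrogram_to_audio(text_to_spectrogram(text))


if __name__ == "__main__":
    samples = synthesize("taco")
    print(len(samples), "samples generated")
```

The point of splitting things this way is that each stage only has to solve half the problem: the first decides what the speech should sound like, and the second worries about producing the actual waveform.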
And, believe it or not, the resulting voice is virtually indistinguishable from a human one. There are a few audio samples that compare the two, along with some phrases where the inflection and emphasis change based on capitalization and punctuation, and it’s just about perfect.
Of course, technology like this will keep getting better over time, but that doesn’t take away from how great Google is making things right now.