Google Assistant will start to sound a lot more natural thanks to WaveNet – a new text-to-speech (or speech synthesis) system created by Google’s DeepMind branch.
WaveNet differs from traditional concatenative TTS, which stitches together clips from a large base of pre-recorded speech by a single voice actor, and from parametric TTS, which uses a computer-generated voice. Instead, it builds raw audio waveforms from scratch, one sample at a time, at 16,000 samples per second.
WaveNet has been trained on a large dataset of speech samples for over 12 months to learn which tones follow each other and which waveforms are realistic.
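To get a feel for what 16,000 samples per second means, here is a rough Python sketch. It is not DeepMind's code: `samples_needed`, `generate` and `toy_predictor` are made-up names for illustration, and the toy predictor is a trivial stand-in for the trained network. It only shows the arithmetic and the general idea of producing a waveform one sample at a time, with each new sample depending on the ones before it.

```python
SAMPLE_RATE_HZ = 16_000  # samples per second, the figure quoted in the article


def samples_needed(duration_seconds, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return how many audio samples a clip of the given length contains."""
    return round(duration_seconds * sample_rate_hz)


def generate(num_samples, predict_next):
    """Build a waveform one sample at a time.

    predict_next receives every sample produced so far and returns the next one.
    """
    waveform = []
    for _ in range(num_samples):
        waveform.append(predict_next(waveform))
    return waveform


def toy_predictor(history):
    """Hypothetical stand-in for the trained model (an assumption, not WaveNet)."""
    if not history:
        return 0.0
    return 0.5 * history[-1] + 0.1


# A two-second phrase needs 2 * 16,000 = 32,000 individually predicted samples.
two_second_clip = samples_needed(2.0)

# Generate a 10-millisecond snippet (160 samples) with the toy predictor.
snippet = generate(samples_needed(0.01), toy_predictor)
```

Even a short phrase needs tens of thousands of predictions, which gives a sense of why making this kind of system fast enough for everyday use is hard.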
There’s a lot of technical mumbo jumbo behind the scenes you can read about in the Source link below. But how about a practical example of the effect of WaveNet on Google Assistant?
[Audio samples: Google Assistant before and after WaveNet]
WaveNet will be used with US English and Japanese for the time being, but Google will likely make other languages sound more natural in time.
It took DeepMind’s team 12 months to create WaveNet and optimize it to work fast enough for broad use. Just imagine what it can do with another 12 months.
Source | Via