Microsoft Pushes Further into Voice Synthesis, Challenging Google

Microsoft has rolled out updates to its various A.I. platforms for the enterprise, including Azure Cognitive Services. Some of those advances could position Microsoft to compete directly with Google on next-generation features such as speech synthesis.

Microsoft announced the releases at this year’s Ignite conference, which is aimed primarily at tech pros and developers.

Azure Cognitive Services now offers a more advanced neural text-to-speech service, one designed to sound remarkably human. “Microsoft has reached a milestone in text-to-speech synthesis with a production system that uses deep neural networks to make the voices of computers nearly indistinguishable from recordings of people,” Xuedong Huang, a Microsoft technical fellow in Cloud and AI, wrote in a Sept. 24 posting on the Azure corporate blog. “With the human-like natural prosody and clear articulation of words, Neural TTS has significantly reduced listening fatigue when you interact with AI systems.”

In theory, this advance will make interactions with chatbots and virtual assistants “more natural and engaging,” Huang added. Microsoft’s deep neural networks perform prosody prediction (i.e., modeling the rhythm, stress, and intonation of spoken language) and voice synthesis simultaneously, rather than as separate steps, which should make for a more seamless “conversation.”

If that sounds familiar, it’s because Google is pursuing something similar with Duplex, an A.I.-driven platform that can speak to human beings in a very natural voice, often with hums and pauses. Although Google has positioned Duplex as a platform with narrow functionality (primarily booking restaurant reservations and appointments over the phone), it’s easy to imagine the technology eventually expanding to help desks and other customer-service interactions.

In a similar vein, Microsoft has also been updating its Bot Framework SDK, making it easier for developers to build increasingly complex bots. Chatbots and similar programs are something of a Holy Grail for many businesses, dangling the possibility of automating people-intensive functions such as customer service; however, the record of bots “in the wild” (so to speak) is mixed.

Despite the advances in A.I. and machine learning, it might still be some time before companies feel comfortable replacing entire divisions of human workers with bots, even if those bots sound pretty human on the phone. In the meantime, Microsoft continues to pour resources into enterprise-focused A.I., hoping to stay ahead of strong competitors in that arena.

For developers, this competition means a rapidly growing collection of interesting A.I.-related tools. When companies spend billions building naturalistic platforms that can interact (however creepily) with human beings, third-party developers are among the chief beneficiaries.