miliparking.blogg.se - Google duo online


Posted by Pablo Barrera, Software Engineer, Google Research and Florian Stimberg, Research Engineer, DeepMind

Online calls have become an everyday part of life for millions of people by helping to streamline their work and connect them to loved ones. To transmit a call across the internet, the data from calls are split into short chunks, called packets. These packets make their way over the network from the sender to the receiver, where they are reassembled to make continuous streams of video and audio. However, packets often arrive at the other end in the wrong order or at the wrong time, an issue generally referred to as jitter, and sometimes individual packets can be lost entirely. Issues such as these lead to lower call quality, since the receiver has to try to fill in the gaps, and are a pervasive problem for both audio and video transmission.

For example, 99% of Google Duo calls need to deal with packet losses, excessive jitter or network delays. Of those calls, 20% lose more than 3% of the total audio duration due to network issues, and 10% of calls lose more than 8%.

Simplified diagram of network problems leading to packet loss, which needs to be counteracted by the receiver to allow reliable real-time communication.

In order to ensure reliable real-time communication, it is necessary to deal with packets that are missing when the receiver needs them. Specifically, if new audio is not provided continuously, glitches and gaps will be audible, but repeating the same audio over and over is not an ideal solution, as it produces artifacts and reduces the overall quality of the call. The process of dealing with the missing packets is called packet loss concealment (PLC). The receiver’s PLC module is responsible for creating audio (or video) to fill in the gaps created by packet losses, excessive jitter or temporary network glitches, all three of which result in an absence of data.

To address these audio issues, we present WaveNetEQ, a new PLC system now being used in Duo. WaveNetEQ is a generative model, based on DeepMind’s WaveRNN technology, that is trained using a large corpus of speech data to realistically continue short speech segments, enabling it to fully synthesize the raw waveform of missing speech. Because Duo calls are end-to-end encrypted, all processing needs to be done on-device. The WaveNetEQ model is fast enough to run on a phone, while still providing state-of-the-art audio quality and more natural-sounding PLC than other systems currently in use.

Like many other web-based communication systems, Duo is based on the WebRTC open source project. To conceal the effects of packet loss, WebRTC’s NetEQ component uses signal processing methods, which analyze the speech and produce a smooth continuation that works very well for small losses (20ms or less), but does not sound good when the number of missing packets leads to gaps of 60ms or more. In those latter cases the speech becomes robotic and repetitive, a characteristic sound that is unfortunately familiar to many internet voice callers.

To better manage packet loss, we replace the NetEQ PLC component with a modified version of WaveRNN, a recurrent neural network model for speech synthesis consisting of two parts, an autoregressive network and a conditioning network. The autoregressive network is responsible for the continuity of the signal and provides the short-term and mid-term structure for the speech by having each generated sample depend on the network’s previous outputs. The conditioning network influences the autoregressive network to produce audio that is consistent with the more slowly-moving input features.
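To make the ideas in this post concrete, here is a minimal Python sketch of a receiver that reorders packets by sequence number, detects a missing one, and hands the gap to a pluggable concealment function. It is an illustration under stated assumptions, not the WaveNetEQ or NetEQ implementation: the names `play_out`, `repeat_last`, `autoregressive_fill` and `demo`, the tiny `FRAME` size, and the sine-wave test signal are all hypothetical. In particular, `autoregressive_fill` is a classical second-order linear predictor standing in for the neural autoregressive network, and the sketch has no conditioning network at all. What it does share with the description above is the core property that each generated sample depends on the generator's previous outputs.

```python
import math
from typing import Callable, List, Tuple

FRAME = 8  # samples per packet; an assumption kept tiny for illustration

Concealer = Callable[[List[float], int], List[float]]


def repeat_last(history: List[float], n: int) -> List[float]:
    """Baseline concealment: replay the most recent frame unchanged."""
    if len(history) < n:
        return [0.0] * n
    return history[-n:]


def autoregressive_fill(history: List[float], n: int) -> List[float]:
    """Toy autoregressive concealment (a linear predictor, not WaveRNN).

    Fits x[t] = a * x[t-1] - x[t-2] to the audio received so far, then
    generates n samples, each one computed from the previous outputs.
    """
    if len(history) < 3:
        return [0.0] * n
    idx = range(2, len(history))
    num = sum(history[t - 1] * (history[t] + history[t - 2]) for t in idx)
    den = sum(history[t - 1] ** 2 for t in idx)
    if den == 0.0:  # pure silence so far: continue with silence
        return [0.0] * n
    a = num / den
    prev2, prev1 = history[-2], history[-1]
    generated = []
    for _ in range(n):
        nxt = a * prev1 - prev2  # depends on samples generated just before
        generated.append(nxt)
        prev2, prev1 = prev1, nxt
    return generated


def play_out(arrived: List[Tuple[int, List[float]]], total: int,
             conceal: Concealer) -> Tuple[List[float], List[int]]:
    """Reorder packets by sequence number and conceal the missing ones."""
    by_seq = {seq: frame for seq, frame in arrived}
    samples: List[float] = []
    concealed: List[int] = []
    for seq in range(total):
        frame = by_seq.get(seq)
        if frame is None:  # lost, or too late to be played
            frame = conceal(samples, FRAME)
            concealed.append(seq)
        samples.extend(frame)
    return samples, concealed


def demo() -> dict:
    """Drop one packet of a sine wave and measure each concealer's error."""
    w = 2 * math.pi / 20
    clean = [math.sin(w * t) for t in range(5 * FRAME)]
    frames = [clean[i * FRAME:(i + 1) * FRAME] for i in range(5)]
    # Packets arrive out of order, and packet 2 never arrives.
    arrived = [(1, frames[1]), (0, frames[0]), (4, frames[4]), (3, frames[3])]
    errors = {}
    for name, fn in (("repeat", repeat_last),
                     ("autoregressive", autoregressive_fill)):
        out, _ = play_out(arrived, 5, fn)
        errors[name] = max(abs(a - b) for a, b in zip(out, clean))
    return errors
```

Calling `demo()` returns the worst-case sample error for each strategy on this synthetic tone. Replaying the last frame breaks the phase of the signal, while the autoregressive fill continues it smoothly. Real speech is far less predictable than a sine wave, which is the gap a trained generative model such as WaveNetEQ is meant to close.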