Tencent's "Noise Hunters" Make Voices Much Clearer

2021.07.14

“The pork shop has started to chop meat, let’s go!” Yannan Wang carefully positioned the recorder on the table, ready to capture the sound of chopping. For their project with Tencent’s Ethereal Audio Lab, Yannan and his team were gathering sounds like shouting, chopping meat, and footsteps.

For Yannan and his co-workers out on the streets, noise is their “prey”: something to be stalked, captured, and destroyed. These engineers are “noise hunters,” and their superb “hunting” skills serve a practical purpose. They help people around the world hear more clearly.

A researcher at Tencent’s Ethereal Audio Lab collects outdoor noises.

Many years of research in the audio field have made Yannan extremely sensitive to sound. He believes that noise-reduction technology can bring positive change to many people’s lives.

Communications equipment is constantly being updated and improved, and people now talk everywhere: on the sidewalk, in dense crowds, or anywhere else. That is why noise reduction is essential to helping people hear each other clearly.

“In the food market, the butcher’s voice can be heard clearly because our ears selectively block out the noise of chopping meat,” Yannan explained. “What our team has to do is develop technology that will behave like humans. If we want to eliminate the noise, we need to identify the noise and then actively intervene.”

This approach sounds simple, but it has troubled engineers for many years because the hard part is telling noise apart from the human voice. According to engineers at the Tencent Ethereal Audio Lab, sound is difficult to process because audio data is one-dimensional. Images are two-dimensional and video has three dimensions, which makes it easier to separate the layers in those formats.

To identify noise in audio, engineers first need to collect a large amount of noise data. They then segment and clean the recordings, extract common features, and feed them into an algorithm model. Before the pandemic, Tencent’s engineers wandered around the office with a recorder every day to capture the sound of colleagues typing on keyboards, doors closing, and all the other noises found in a business setting.
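The article does not describe the lab’s actual features or model. As a rough, hypothetical illustration of what “segment, clean, and extract features” can mean for audio, the Python sketch below does three things. It splits a recording into short overlapping frames and removes each frame’s DC offset. It then computes two classic hand-crafted features, log energy and zero-crossing rate. The function names, the 16 kHz sample rate, and the 25 ms frame with a 10 ms hop are assumptions made for illustration. This is not Tencent’s pipeline.

```python
import math
from typing import List

# Hypothetical defaults: 25 ms frames with a 10 ms hop, assuming 16 kHz audio.
FRAME_LEN = 400
HOP = 160


def frame_signal(samples: List[float], frame_len: int = FRAME_LEN,
                 hop: int = HOP) -> List[List[float]]:
    """'Segment' step: cut a recording into overlapping fixed-length frames."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def remove_dc(frame: List[float]) -> List[float]:
    """A very simple 'clean' step: subtract the frame's mean (DC offset)."""
    mean = sum(frame) / len(frame)
    return [x - mean for x in frame]


def frame_features(frame: List[float]) -> List[float]:
    """'Feature' step: log energy and zero-crossing rate for one frame."""
    energy = sum(x * x for x in frame) / len(frame)
    log_energy = math.log10(energy + 1e-12)  # small floor avoids log(0) on silence
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    return [log_energy, zcr]


def extract_features(samples: List[float], frame_len: int = FRAME_LEN,
                     hop: int = HOP) -> List[List[float]]:
    """Run segment -> clean -> feature extraction over a whole recording."""
    return [frame_features(remove_dc(f))
            for f in frame_signal(samples, frame_len, hop)]
```

For one second of 16 kHz audio, this produces 98 frames. A steady 200 Hz hum has a zero-crossing rate near 0.025, while hiss-like white noise sits near 0.5. A model can learn from differences like these. Production systems generally rely on richer spectral features and learned models rather than two hand-picked numbers.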

Equipment at Tencent’s Ethereal Audio Lab.

Since the pandemic began, demand for remote work has drawn attention to a product with hundreds of millions of users: Tencent Meeting. AI noise reduction developed by the Ethereal Audio Lab is the core technology behind the popular video conferencing platform.

Prior to the pandemic, conference calls were standard and worked well. People usually used a fixed telephone in a specific location and made the call through a public network. The process was clear, controllable, and didn’t require much technology.

The pandemic changed everything. Over the past 12 to 18 months, people have held phone and video conferences from a wide range of locations, using different devices and networks. This created a complex technical challenge for the Tencent Meeting team. Latency, voice packet loss, and strained bandwidth were all relatively new problems for workers.

The biggest challenge is identifying the environment a person is calling from, such as an airport, public square, subway car, or other noisy place. All of these sounds get jumbled together at different frequencies, which makes the human voice extremely difficult to pick out. One solution is a unified audio processing pipeline that uses a complex model to distinguish and filter out the noise in each scene.

Shidong Shang, Senior Director of Tencent Media Lab, hard at work in the lab.

Tencent Meeting uses artificial intelligence to identify and enhance the human voice while suppressing other, unimportant sounds. This has already improved voice call quality by up to 50 percent. The noise hunters collected recordings of noisy bus stops, human voices, rain, and other sounds. By analyzing and processing these recordings, the Tencent Meeting team learned to better identify such sounds on calls and eliminate them, leaving the human voice clearer.

The solution was made possible by the team’s strong engineering and research capabilities. These helped it take first place in a prestigious industry competition with an accuracy rate of 96 percent. In other words, Tencent Meeting identifies and removes unwanted noise 96 percent of the time.

“In the past, our work mainly focused on creating breakthroughs with new technologies and building products, but now we are focused on continuing to make this algorithm better and finding new ways to deploy AI noise reduction technology to help more people,” said Shidong Shang, Senior Director of Tencent Media Lab and Head of Tencent Ethereal Audio Lab. “We may even be able to improve the quality of life for the elderly.”