WDASnet

WDASnet - Weighted-Directional-Aware Speech Separation Network


Authors: Yi Yang, Hangting Chen, Pengyuan Zhang
Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China

We propose a Weighted-Direction-Aware speech Separation network (WDASnet) to achieve DOA-assisted speech separation on sparsely overlapped mixtures in multi-speaker meeting environments. First, building on a Convolutional Recurrent Neural Network (CRNN) DOA-estimation model, we introduce a variant with a weighted-pooling block that reduces the influence of silent frames and frames dominated by interfering speakers. Second, we achieve end-to-end utterance-wise DOA estimation: no prior VAD, pre-processing, or post-processing is needed. Third, we examine the system in depth in the multi-speaker meeting setting. Fourth, we analyze the advantages and limitations of this model.
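The idea behind the weighted-pooling block can be illustrated with a minimal sketch. Assuming the CRNN emits one embedding per frame, an attention-style pooling assigns each frame a scalar score, softmax-normalizes the scores, and averages the frame embeddings with those weights, so that silent or interference-dominated frames contribute little to the utterance-level DOA feature. The function name and shapes below are illustrative, not the paper's actual implementation:

```python
import numpy as np

def weighted_pool(frames, scores):
    """Softmax-weighted average over time.

    frames: (T, D) array of per-frame embeddings from the CRNN.
    scores: (T,) array of frame-wise relevance logits (e.g. from a
            small learned scorer; here they are just given as input).
    Returns a single (D,) utterance-level embedding.
    """
    w = np.exp(scores - scores.max())   # stable softmax numerator
    w = w / w.sum()                     # normalize weights over frames
    return (w[:, None] * frames).sum(axis=0)
```

A frame with a much higher score dominates the pooled vector, while uniform scores reduce the block to plain average pooling, which is the baseline this variant improves on.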

Demos

mixture (overlap=0.5)

ground-truth

BLSTM separation with average-pooling DOA estimation (SDR = 5.72 dB)

BLSTM separation with oracle AF (SDR = 13.92 dB)

BLSTM separation with the proposed weighted-pooling DOA estimation (WDASnet) (SDR = 14.62 dB)
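The SDR values above are source-to-distortion ratios in decibels. As a rough sketch only (not the BSS-Eval-style implementation behind the reported numbers), a scale-sensitive SDR can be computed as the energy ratio between the reference signal and the estimation residual:

```python
import numpy as np

def sdr_db(reference, estimate):
    """Naive SDR: reference energy over residual energy, in dB.

    reference, estimate: 1-D arrays of the same length.
    The small epsilon avoids division by zero for a perfect estimate.
    """
    residual = estimate - reference
    return 10.0 * np.log10(np.sum(reference ** 2)
                           / (np.sum(residual ** 2) + 1e-12))
```

BSS-Eval-style SDR additionally projects the estimate onto the reference before measuring distortion; this sketch only conveys what the dB scale means (higher is a cleaner separation).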

Example 1

This illustration is the same as Fig. 2 in the paper: a visualization of the estimated weights. The three pairs of blocks, from left to right, correspond to target-speaker speech, silent frames, and overlapped speech, respectively.

Example 2

Example 2 shows how the estimated weights vary under different SIRs (-5, 0, and 5 dB).

Contact

If you have any advice or questions, please feel free to contact me!