Deep learning-based method for multiple sound source localization with high resolution and accuracy
Soo Young Lee (Sound source localization)
Korea | Mechanical Systems and Signal Processing 161

■ View full text 

Mechanical Systems and Signal Processing 161, 107959, 2021


■ Researchers

Soo Young Lee

Department of Mechanical Engineering, Pohang University of Science and Technology (POSTECH)

Lee S., Chang J., Lee S.



■ Abstract

Deep learning-based methods are attracting interest in sound source localization, showing promising results compared to conventional model-based approaches. These methods have mainly developed along two lines, grid-based and grid-free, but they share inherent limitations: either the sound sources must be assumed to lie on the grid points, or the number of sound sources must be fixed in advance when constructing the deep neural network's architecture. To overcome these limitations, we propose a deep learning approach that localizes multiple sound sources with high resolution and accuracy, whether or not the sources are located on the grid points. We first introduce a target function that yields spatial source distribution maps, which represent the positions and strengths of multiple sources even when the sources lie off the grid points. Because the proposed source map turns multiple sound source localization into an image-to-image, pixel-level prediction task, we then propose a fully convolutional neural network (FCN) with an encoder-decoder structure to estimate the positions and strengths of the multiple sources precisely. Using a dataset acquired with one to three monopole sources on a square plane of 2.68 × 2.68 m and a spiral array of 60 microphones at 1, 2, and 10 kHz, we assess the proposed model both quantitatively and qualitatively and demonstrate that it achieves highly precise localization regardless of frequency and the number of sound sources. We also validate that the proposed model produces high-resolution source distribution maps, from which the positions and strengths of the sound sources are accurately predicted. Lastly, we compare the proposed model with several deconvolution methods, and the results show that the proposed deep learning model significantly outperforms these model-based methods.
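To make the idea of a source distribution map for off-grid sources more concrete, the following is a minimal sketch and not the paper's target function, which the abstract does not specify. It assumes a Gaussian kernel centred on each source, scaled by the source strength; the names `source_map` and `refine_peak`, the 64-pixel grid, and `sigma` are all hypothetical choices made for illustration. Only the 2.68 m plane size comes from the abstract. The second function shows one way a sub-pixel position could be read back from such a map.

```python
import numpy as np


def source_map(sources, side=2.68, n_pix=64, sigma=0.05):
    """Render a source distribution map on a side x side plane (metres).

    sources: iterable of (x, y, strength), with x and y measured from the
    plane centre. The Gaussian kernel, n_pix, and sigma are illustrative
    assumptions, not values taken from the paper.
    """
    axis = np.linspace(-side / 2, side / 2, n_pix)
    gx, gy = np.meshgrid(axis, axis, indexing="xy")  # gx[iy, ix] == axis[ix]
    target = np.zeros((n_pix, n_pix))
    for x, y, strength in sources:
        d2 = (gx - x) ** 2 + (gy - y) ** 2
        # Each pixel keeps the largest contribution, so a peak's height
        # reflects the strength of the nearest source.
        target = np.maximum(target, strength * np.exp(-d2 / (2 * sigma ** 2)))
    return axis, target


def refine_peak(axis, target):
    """Estimate the sub-pixel position of the strongest peak.

    Fits a parabola to the log of three neighbouring samples along each
    axis. Assumes the peak pixel is not on the map border.
    """
    iy, ix = np.unravel_index(np.argmax(target), target.shape)
    step = axis[1] - axis[0]

    def offset(lo, mid, hi):
        # Vertex of the parabola through (-1, lo), (0, mid), (1, hi) in log space.
        lo, mid, hi = np.log([lo, mid, hi])
        return 0.5 * (lo - hi) / (lo - 2 * mid + hi)

    dx = offset(target[iy, ix - 1], target[iy, ix], target[iy, ix + 1])
    dy = offset(target[iy - 1, ix], target[iy, ix], target[iy + 1, ix])
    return axis[ix] + dx * step, axis[iy] + dy * step


# One off-grid source: the map's peak pixel is near it, and the
# refinement recovers a position between grid points.
axis, m = source_map([(0.31, -0.47, 1.0)])
x_est, y_est = refine_peak(axis, m)
```

In a setup like the one the abstract describes, a map of this kind would serve as the training target for the encoder-decoder network, which predicts it pixel by pixel from the microphone-array input. Because the map is continuous in the source coordinates, the source does not need to coincide with a pixel centre, and the number of sources does not change the output shape.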


