Learning Environmental Sounds with Multi-scale Convolutional Neural Network
Abstract
A novel end-to-end network, WaveMsNet, using multi-scale convolution and two-phase feature fusion, achieves high classification accuracy on environmental sound datasets.
Deep learning has dramatically improved the performance of sound recognition. However, learning acoustic models directly from the raw waveform remains challenging. Current waveform-based models generally use time-domain convolutional layers to extract features, but features extracted by single-size filters are insufficient for building discriminative audio representations. In this paper, we propose a multi-scale convolution operation, which obtains a better audio representation by improving frequency resolution and learning filters across the entire frequency range. To leverage both waveform-based and spectrogram-based features in a single model, we introduce a two-phase method to fuse the different features. Finally, we propose a novel end-to-end network called WaveMsNet based on the multi-scale convolution operation and the two-phase method. On the environmental sound classification datasets ESC-10 and ESC-50, the classification accuracies of our WaveMsNet reach 93.75% and 79.10% respectively, a significant improvement over previous methods.
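To make the multi-scale convolution idea concrete, below is a minimal, hypothetical sketch of a multi-scale 1-D convolution front-end over the raw waveform: parallel time-domain convolution branches with different filter sizes (short filters for fine temporal detail, long filters for coarser frequency structure), whose outputs are concatenated along the channel axis. The class name `MultiScaleFrontEnd`, the kernel sizes, strides, and channel counts are all illustrative assumptions, not the paper's actual WaveMsNet configuration.

```python
import torch
import torch.nn as nn

class MultiScaleFrontEnd(nn.Module):
    """Hypothetical multi-scale time-domain front-end (not the paper's
    exact architecture): one Conv1d branch per filter size, outputs
    pooled to a common length and concatenated channel-wise."""
    def __init__(self, out_channels_per_scale=32,
                 kernel_sizes=(11, 51, 101)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # single-channel raw waveform in; one branch per filter size
                nn.Conv1d(1, out_channels_per_scale,
                          kernel_size=k, stride=4, padding=k // 2),
                nn.BatchNorm1d(out_channels_per_scale),
                nn.ReLU(),
                # pool every branch to the same temporal resolution
                nn.AdaptiveMaxPool1d(128),
            )
            for k in kernel_sizes
        ])

    def forward(self, waveform):
        # waveform: (batch, 1, samples)
        # concatenate per-scale feature maps along the channel axis
        return torch.cat([b(waveform) for b in self.branches], dim=1)

x = torch.randn(8, 1, 22050)          # one second of 22.05 kHz audio
features = MultiScaleFrontEnd()(x)    # -> shape (8, 96, 128)
print(features.shape)
```

The resulting feature map could then be passed to a conventional 2-D or 1-D convolutional backbone; under the two-phase scheme described in the abstract, such waveform-derived features would later be fused with spectrogram-based features, though the fusion details are beyond this sketch.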