Mirco Ravanelli is currently a postdoctoral researcher at Mila (Université de Montréal). His main research interests are deep learning, speech recognition, far-field speech recognition, robust acoustic scene analysis, cooperative learning, speaker recognition, and unsupervised learning. He is the author or co-author of more than 40 papers on these research topics.
PROCESSING RAW AUDIO WITH SINCNET
Modern deep neural networks can learn complex and abstract representations that are progressively obtained by combining simpler ones. A recent trend in speech processing consists of discovering these representations directly from raw audio samples. Unlike standard hand-crafted features such as MFCCs or FBANK, the raw waveform can potentially help neural networks discover better and more customized representations. The high-dimensional raw inputs, however, can make training significantly more challenging. In this talk, I will discuss SincNet, a novel Convolutional Neural Network (CNN) that can efficiently process speech from audio waveforms using simple sinc filters. In contrast to standard CNNs, which learn all the elements of each filter, SincNet learns only the low and high cutoff frequencies of band-pass filters directly from data, making it a very compact model that converges faster and performs better than a standard CNN. The talk will also cover some recent improvements of SincNet and will discuss how we recently used this model for unsupervised learning of speech representations.
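To make the idea of a filter defined only by its two cutoff frequencies concrete, here is a minimal NumPy sketch. It is not the SincNet implementation: it assumes the band-pass filter is built as the difference of two low-pass sinc filters smoothed with a Hamming window, and it passes the cutoffs as plain arguments rather than learning them. The function names, the 16 kHz sample rate, and the default kernel size are illustrative assumptions.

```python
import numpy as np


def sinc_bandpass(f_low_hz, f_high_hz, kernel_size=251, sample_rate=16000):
    """Build a windowed band-pass FIR filter from two cutoff frequencies.

    Assumption: the filter is the difference of two ideal low-pass sinc
    filters, multiplied by a Hamming window. In a trainable layer the two
    cutoffs would be the only learned values per filter; here they are
    ordinary arguments.
    """
    # Time axis centred on zero so the filter is symmetric (linear phase).
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # Normalise the cutoffs to cycles per sample.
    f1 = f_low_hz / sample_rate
    f2 = f_high_hz / sample_rate
    # np.sinc(x) is sin(pi*x)/(pi*x), so 2*f*np.sinc(2*f*n) is an ideal
    # low-pass filter with cutoff f.
    band_pass = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return band_pass * np.hamming(kernel_size)


def count_parameters(num_filters, kernel_size):
    """Compare first-layer parameter counts (hypothetical helper)."""
    return {
        "standard_conv": num_filters * kernel_size,  # every tap is learned
        "sinc_conv": 2 * num_filters,  # only low and high cutoffs
    }
```

Under these assumptions, a first layer with 80 filters of 251 taps would need 20,080 learned values as a standard convolution, but only 160 when each filter is described by its two cutoffs, which illustrates why the abstract calls the model compact.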