Deep Learning Methods for Audio-EEG Analysis
Abstract
The perception of speech and audio is one of the defining features of humans. Many of the brain's underlying processes as we listen to acoustic signals remain unknown, and significant research efforts are needed to unravel them. Non-invasive recording techniques that capture brain activations, such as the electroencephalogram (EEG) and the magnetoencephalogram (MEG), are commonly deployed to capture the brain's responses to auditory stimuli. However, these non-invasive techniques also capture artifacts and signals unrelated to the stimuli, which distort the stimulus-response analysis. The effect of these artifacts becomes more pronounced for naturalistic stimuli. To reduce the inter-subject variability and amplify the components related to the stimuli, the EEG responses from multiple subjects listening to a common naturalistic stimulus need to be normalized. The normalization and pre-processing methods currently in use are canonical correlation analysis (CCA) models and temporal response function (TRF) based forward/backward models. However, these methods assume a simplistic linear relationship between the audio features and the EEG responses and may therefore fail to alleviate the recording artifacts and interfering signals in the EEG. This thesis proposes novel methods that use advances in machine learning to improve audio-EEG analysis.
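As a point of reference, the linear forward model underlying the TRF approach can be sketched as follows; the notation is illustrative rather than drawn from the thesis:
\[
\hat{r}(t) = \sum_{\tau=0}^{T} w(\tau)\, s(t-\tau),
\]
where \(s(t)\) is an audio feature such as the envelope, \(\hat{r}(t)\) is the predicted EEG response at one channel, and the filter \(w\) (the temporal response function) is typically estimated by regularized least squares. The backward model analogously reconstructs \(s(t)\) from the multi-channel EEG.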
We propose a deep learning framework for audio-EEG analysis in intra-subject and inter-subject settings. The deep learning based intra-subject analysis methods are trained with a Pearson correlation-based cost function between the stimuli and the EEG responses. This model learns transformations of the audio and EEG features that are maximally correlated. The correlation-based cost function is optimized over the learnable parameters of the model using standard gradient descent-based methods. This model is referred to as the deep CCA (DCCA) model. Several experiments are performed on EEG data recorded while the subjects listened to naturalistic speech and music stimuli. We show that the deep methods obtain better representations than the linear methods and result in statistically significant improvements in correlation values.
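A minimal sketch of the correlation-based objective described above, assuming audio and EEG encoder networks \(f_{\theta_a}\) and \(g_{\theta_e}\) (symbols introduced here for illustration):
\[
\mathcal{L}(\theta_a, \theta_e) = -\rho\big(f_{\theta_a}(a),\, g_{\theta_e}(e)\big), \qquad \rho(u,v) = \frac{\operatorname{cov}(u,v)}{\sqrt{\operatorname{var}(u)\,\operatorname{var}(v)}}.
\]
Since \(\rho\) is differentiable in the encoder outputs, minimizing \(\mathcal{L}\) by gradient descent directly maximizes the Pearson correlation between the transformed audio and EEG representations.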
Further, we propose a neural network model with shared encoders that aligns the EEG responses from multiple subjects listening to the same audio stimuli. This inter-subject model boosts the signals common across the subjects and suppresses the subject-specific artifacts. The impact of improving stimulus-response correlations is highlighted on multi-subject EEG data from speech and music tasks. This model is referred to as the deep multi-way canonical correlation analysis (DMCCA) model. The combination of inter-subject analysis using DMCCA and intra-subject analysis using DCCA is shown to provide the best stimulus-response correlations in audio-EEG experiments.
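One common way to write a multi-way correlation objective of this kind uses subject-wise encoders \(f_1, \dots, f_N\) (possibly weight-shared, as in the shared-encoder design above) mapping each subject's response \(y_i\) into a common space; the notation is again assumed for illustration, and the thesis's exact DMCCA objective may include additional terms such as reconstruction losses:
\[
\max_{f_1, \dots, f_N} \; \sum_{i < j} \rho\big(f_i(y_i),\, f_j(y_j)\big).
\]
Maximizing the summed pairwise correlations emphasizes the stimulus-driven signal shared across subjects while down-weighting subject-specific components.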
We highlight how much of the audio signal can be recovered purely from non-invasive EEG recordings with modern machine learning methods, and conclude with a discussion of future challenges in audio-EEG analysis.