Ilker Demirkol (Universitat Politècnica de Catalunya, Spain)
Project overview: Emotion is the complex psychophysiological experience of an individual's state of mind as it interacts with biochemical and environmental influences. Most existing emotion detection methodologies are based on subjective, self-reported data. Prosodic variations in speech have been found to be closely related to people's emotions, which makes automatic, passive emotion detection possible. In collaboration with researchers in the Clinical and Social Sciences in Psychology Department at the University of Rochester, the Bridge Project explores ways of detecting emotions from speech without interpreting the speech content or using facial expressions or body gestures. This sort of emotion detection is likely to have broader appeal, as it is less intrusive than interpreting speech content or capturing images. Health care providers and researchers can deploy emotion detectors and other behavior sensing technologies on mobile devices for patient monitoring or behavior studies. Emotion recognition technology can also serve as an entry point for more elaborate context-aware systems in future consumer electronic devices and services.
- Sound examples: female speaker performing the "pride" emotion
- Sound examples: male speaker performing the "sadness" emotion
The MATLAB GUI for Speech-based Emotion Classification
Step 1: File loading
The main panel of the GUI is shown in Figure 4. The user first chooses a speech file from the local directory. The gender of the speaker and the true emotion of the speech file are automatically shown on the GUI. In this demo, the speaker is male, and his true emotion, as labeled in the LDC dataset, is anger. The user then enters the desired relative confidence threshold, a value greater than or equal to 0; here, we enter 0.2. A larger value requires a more stringent, higher-confidence emotion detection result from the GUI.
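The exact rule the GUI applies with this threshold is internal to the classifier, but one plausible reading is to report a prediction only when the top class outscores the runner-up by the requested relative margin. The MATLAB sketch below illustrates that idea; the variable names, example scores, and margin formula are assumptions for illustration, not the GUI's actual code.

```matlab
% Illustrative sketch of a relative confidence threshold
% (assumed logic, not the GUI's actual implementation).
threshold = 0.2;                      % user-entered value, >= 0

emotions = {'anger', 'sadness', 'pride', 'neutral'};   % example classes
scores   = [0.55, 0.20, 0.15, 0.10];  % hypothetical per-class confidences

[sorted, idx] = sort(scores, 'descend');
relConf = (sorted(1) - sorted(2)) / sorted(2);   % relative margin over runner-up

if relConf >= threshold
    fprintf('Predicted emotion: %s\n', emotions{idx(1)});
else
    fprintf('Margin %.2f below threshold %.2f; prediction withheld.\n', ...
        relConf, threshold);
end
```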
Step 2: Emotion classification
The emotion classification consists of two steps: feature extraction, followed by classification using a hybrid kernel SVM with thresholding fusion. As shown in Figure 5, the GUI plots selected speech features for each 60 ms frame of the speech utterance, including pitch, energy, and the frequencies of the first four formants.
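As a rough illustration of the frame-based feature extraction, the MATLAB sketch below computes short-time energy and LPC-based formant candidates over 60 ms frames. The file name is a placeholder, lpc() requires the Signal Processing Toolbox, and the GUI's own extractors (including its BaNa-based pitch detector) differ in detail.

```matlab
% Sketch of per-frame feature extraction over 60 ms frames.
% 'speech.wav' is a placeholder file name.
[x, fs] = audioread('speech.wav');
x = x(:, 1);                                  % use the first channel

frameLen  = round(0.060 * fs);                % 60 ms frames, as in the GUI
numFrames = floor(length(x) / frameLen);
energy    = zeros(numFrames, 1);

for k = 1:numFrames
    frame = x((k-1)*frameLen + 1 : k*frameLen);
    energy(k) = sum(frame.^2);                % short-time energy

    % LPC-based formant candidates: roots of the prediction polynomial.
    a = lpc(frame .* hamming(frameLen), 12);  % 12th-order LPC
    r = roots(a);
    r = r(imag(r) > 0.01);                    % one root per conjugate pair
    formantsHz = sort(angle(r) * fs / (2*pi));
    F = formantsHz(1:min(4, end));            % first four formant candidates
end

plot((0:numFrames-1) * 0.060, energy);
xlabel('Time (s)'); ylabel('Short-time energy');
```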
Step 3: Output emotion classification results
The GUI outputs the gender-independent emotion classification result on a valence-arousal coordinate plane. As Figure 6 shows, the predicted angry emotion falls into the active, negative quadrant.
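For reference, a minimal MATLAB sketch of such a valence-arousal plot is shown below; the coordinates for anger are illustrative, not the values the GUI computes.

```matlab
% Minimal valence-arousal plot; the point for 'anger' is illustrative.
figure; hold on; grid on;
plot([-1 1], [0 0], 'k');                 % valence axis
plot([0 0], [-1 1], 'k');                 % arousal axis
xlabel('Valence (negative to positive)');
ylabel('Arousal (passive to active)');

plot(-0.7, 0.8, 'ro', 'MarkerFaceColor', 'r');   % anger: active, negative
text(-0.65, 0.8, 'anger');
axis([-1 1 -1 1]);
```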
MATLAB GUI Demonstration Video
The Noise-resilient BaNa Pitch Detection Algorithm
To test the noise resilience of our pitch detection algorithm, we add different types of noise to the test speech data at different signal-to-noise ratio (SNR) values. For example, the following speech samples were generated from a female speaker performing the pride emotion, mixed with 8 types of surrounding noise: babble noise, destroyer engine room noise, destroyer operations room noise, factory noise, high-frequency channel noise, white noise, pink noise, and noise recorded inside a Volvo vehicle. The SNR is 3 dB. A sketch of this mixing step follows the list below.
- clean speech
- speech with 3 dB babble noise
- speech with 3 dB destroyer engine room noise
- speech with 3 dB destroyer operations room noise
- speech with 3 dB factory noise
- speech with 3 dB high-frequency channel noise
- speech with 3 dB white noise
- speech with 3 dB pink noise
- speech with 3 dB vehicle noise
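The noisy samples above (and the varying-SNR samples below) can be reproduced with a standard SNR mixing step: scale the noise so that the speech-to-noise power ratio hits the target, then add. The MATLAB sketch below shows the general idea; the file names are placeholders, and the released generation scripts may differ.

```matlab
% Mix a noise recording into clean speech at a target SNR.
% File names are placeholders for the actual recordings.
[s, fs] = audioread('clean_speech.wav');
[n, ~]  = audioread('babble_noise.wav');
s = s(:, 1);
n = n(1:length(s), 1);                       % trim noise to the speech length

targetSNRdB = 3;                             % e.g., 3 dB as in the samples above
Ps = mean(s.^2);                             % speech power
Pn = mean(n.^2);                             % noise power
g  = sqrt(Ps / (Pn * 10^(targetSNRdB/10)));  % 10*log10(Ps/(g^2*Pn)) = target

noisy = s + g * n;
audiowrite('speech_3dB_babble.wav', noisy, fs);
```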
You can also listen to the following audio files with babble noise at different SNR values, generated from a male speaker performing the sadness emotion. The clean speech data is listed as well.
- clean speech
- speech with 20 dB babble noise
- speech with 10 dB babble noise
- speech with 3 dB babble noise
- speech with 0 dB babble noise
The source code for the BaNa pitch detection algorithm, as well as the synthetic noisy speech files, is available for download in the Code section.
- N. Yang, W. Cai, H. Ba, I. Demirkol and W. Heinzelman, "BaNa: A Ready-to-use Noise Resilient Pitch Detection Algorithm for Speech and Music," in submission. [Data and Code]
- N. Yang, R. Muraleedharan, J. Kohl, I. Demirkol, W. Heinzelman and M. Sturge-Apple, "Speech-based Emotion Classification Using Multiclass SVM with Hybrid Kernel and Thresholding Fusion," Proceedings of the 4th IEEE Workshop on Spoken Language Technology (SLT), Miami, Florida, December 2012. [Paper] [Code]
NOTE: A bug was found in the database used to generate the results in this paper. We are redoing the experiments and will report the true accuracy of our emotion classification approach using multiclass SVM with hybrid kernel and thresholding fusion.
- H. Ba, N. Yang, I. Demirkol and W. Heinzelman, "BaNa: A Hybrid Approach for Noise Resilient Pitch Detection," Proceedings of the 2012 IEEE Statistical Signal Processing Workshop (SSP 2012), Michigan, USA. [Paper] [Data and code]
- A press release by University of Rochester: Smartphones Might Soon Develop Emotional Intelligence
- Professor Wendi Heinzelman on NBC Channel 10 News
- Report by TechNewsDaily: Emotion-Detecting Software Listens In
More TV interviews and news reports about the Bridge Project will be added soon, including the Jay Thomas radio show on Sirius XM Satellite Radio in New York, ABC News Radio, and IEEE Spectrum.