Datasets

URMP Dataset

We created a dataset to facilitate audio-visual analysis of musical performances. The dataset comprises a number of simple multi-instrument musical pieces assembled from coordinated but separately recorded performances of individual tracks. We anticipate that the dataset will be useful as “ground truth” for evaluating audio-visual techniques for music source separation, transcription, and performance analysis. A more detailed description and sample data are available here.

Bach10

The Bach10 dataset is a polyphonic music dataset that can be used for a variety of research problems, such as Multi-pitch Estimation and Tracking, Audio-score Alignment, and Source Separation. The dataset consists of audio recordings of each part and of the full ensemble for ten four-part J.S. Bach chorales, as well as their MIDI scores, the ground-truth alignment between the audio and the score, the ground-truth pitch values of each part, and the ground-truth notes of each piece. The four parts of each piece (Soprano, Alto, Tenor and Bass) are performed by violin, clarinet, saxophone and bassoon, respectively. A more detailed description is here. Dataset Download

Ground-truth pitches for the PTDB-TUG speech dataset:

The Pitch-Tracking Database from Graz University of Technology (PTDB-TUG) is a speech database for pitch tracking. It contains microphone and laryngograph signals of 20 native English speakers reading the TIMIT corpus. The database also provides reference pitch trajectories, which were calculated from the laryngograph signals using the RAPT pitch tracking algorithm [1]. Here, we provide another version of the reference pitch trajectories, calculated using the Praat pitch tracking algorithm [2] on the microphone signals. We found that about 85% of the Praat-generated ground-truth pitches agree with the RAPT-generated ground-truth pitches. Praat-generated Reference Pitch Trajectories Download

[1] D. Talkin, “A robust algorithm for pitch tracking (RAPT),” in Speech Coding and Synthesis (W.B. Kleijn and K.K. Paliwal, eds.), pp. 495–518, Elsevier Science B.V., 1995.
[2] P. Boersma, “Praat, a system for doing phonetics by computer,” Glot International, vol. 5, no. 9/10, pp. 341–345, 2001.
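
An agreement figure like the one quoted above can be computed by comparing two pitch trajectories frame by frame. The following is a minimal sketch only: the function name `pitch_agreement`, the convention that 0 Hz marks an unvoiced frame, and the half-semitone tolerance are all assumptions made for illustration, not the criterion used to obtain the 85% figure.

```python
import math


def pitch_agreement(ref_hz, est_hz, tol_semitones=0.5):
    """Return the fraction of frames on which two pitch tracks agree.

    Assumptions (hypothetical, for illustration only):
    - both tracks share the same frame grid and length;
    - a value <= 0 marks an unvoiced frame;
    - two voiced frames agree if they differ by at most `tol_semitones`.
    """
    if len(ref_hz) != len(est_hz):
        raise ValueError("pitch tracks must have the same number of frames")
    if not ref_hz:
        return 0.0

    agree = 0
    for r, e in zip(ref_hz, est_hz):
        if r <= 0 and e <= 0:
            # Both trackers call the frame unvoiced.
            agree += 1
        elif r > 0 and e > 0:
            # Both voiced: compare on a logarithmic (semitone) scale.
            if abs(12.0 * math.log2(e / r)) <= tol_semitones:
                agree += 1
        # Otherwise one tracker is voiced and the other is not: disagreement.
    return agree / len(ref_hz)


# Toy example with made-up values (not taken from PTDB-TUG).
rapt_like = [0.0, 100.0, 200.0, 0.0]
praat_like = [0.0, 101.0, 250.0, 120.0]
ratio = pitch_agreement(rapt_like, praat_like)
```

A semitone-based tolerance is used here so that the threshold behaves the same for low and high voices; a fixed tolerance in Hz or a relative percentage would be equally reasonable choices.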

Non-stationary Noise:

For research on speech enhancement, we collected recordings of ten kinds of non-stationary noise: birds, casino, cicadas, computer keyboard, eating chips, frogs, jungle, machine guns, motorcycles, and ocean. The recording of each noise is between one and three minutes long. Dataset Download.


Code

Multi-pitch Estimation & Streaming:

This code performs Multi-pitch Estimation (MPE) and Multi-pitch Streaming (MPS) on polyphonic music or multi-talker speech. For a piece of polyphonic audio composed of monophonic harmonic sound sources, this program first estimates pitches in each time frame, then it streams these pitch estimates across time into pitch trajectories (streams), each of which corresponds to a sound source. mpe_mps.zip
The MPE and MPS code is also available separately. mpe.zip, mps.zip

Multi-pitch Estimation & Streaming Evaluation:

This toolbox is for evaluating multi-pitch analysis results. It compares the estimated pitch content with the ground-truth pitch content and outputs several error measures. See the help text of each file for details of its measures. mpa_eval.zip
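
To illustrate what comparing estimated and ground-truth pitch content can look like, here is a minimal sketch of frame-level precision, recall and accuracy. It is not the toolbox's code: the function names, the half-semitone matching tolerance, and the greedy one-to-one matching are assumptions chosen for illustration, and the measures in mpa_eval.zip may be defined differently.

```python
import math


def match_frame(ref, est, tol_semitones=0.5):
    """Count (true positives, false positives, false negatives) in one frame.

    Each estimated pitch is greedily matched to the closest unused
    ground-truth pitch within `tol_semitones` (an assumed tolerance).
    """
    unmatched = list(ref)
    tp = 0
    for e in est:
        best_idx, best_dist = None, None
        for i, r in enumerate(unmatched):
            dist = abs(12.0 * math.log2(e / r))
            if dist <= tol_semitones and (best_dist is None or dist < best_dist):
                best_idx, best_dist = i, dist
        if best_idx is not None:
            unmatched.pop(best_idx)  # each reference pitch is used at most once
            tp += 1
    return tp, len(est) - tp, len(ref) - tp


def evaluate(ref_frames, est_frames, tol_semitones=0.5):
    """Aggregate counts over all frames and return summary measures."""
    if len(ref_frames) != len(est_frames):
        raise ValueError("both inputs must have the same number of frames")
    tp = fp = fn = 0
    for ref, est in zip(ref_frames, est_frames):
        t, p, n = match_frame(ref, est, tol_semitones)
        tp, fp, fn = tp + t, fp + p, fn + n
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
    }


# Toy example: three frames of pitches in Hz (made-up values).
ref_frames = [[440.0, 660.0], [220.0], []]
est_frames = [[442.0], [220.0, 330.0], [500.0]]
scores = evaluate(ref_frames, est_frames)
```

Greedy matching keeps the sketch short; an optimal assignment could give slightly different counts when several estimates fall near the same reference pitch.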

Soundprism:

This code implements the Soundprism online score-informed source separation system. soundprism.zip