Proceedings of the International Computer Music Conference (ICMC 2009), Montreal, Canada
August 16-21, 2009
POST-PROCESSING FIDDLE: A REAL-TIME MULTI-PITCH TRACKING
TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE
WITHIN LIVE PERFORMANCE SYSTEMS
Andrew N. Robertson, Mark D. Plumbley
Centre for Digital Music
School of Electronic Engineering and Computer Science
Queen Mary University of London
London, England
andrew.robertson@elec.qmul.ac.uk
ABSTRACT
We present a method for real-time pitch tracking which generates an estimate of the amplitudes of the partials
relative to the fundamental for each detected note. We then
employ a subtraction method, whereby lower fundamentals
in the spectrum are accounted for when examining higher
fundamental candidates. By tracking the notes that are playing, we
look for note-off events and continually update the expected
partial weightings for each note. The resulting algorithm
makes use of these relative partial weightings within its decision process. We have evaluated the system on a data
set and compared it with specialised offline pitch trackers.
1. INTRODUCTION
Polyphonic, or multiple, pitch tracking is a difficult problem in signal processing. Most existing work in multi-pitch
tracking is designed for Music Information Retrieval, where
analysis takes place offline on large data sets. A method for multiple frequency estimation by summing partial amplitudes within the frequency domain was presented by Klapuri
[5], who makes use of an iterative procedure to subsequently
subtract partials within a pitch detection algorithm. Pertusa
and Iñesta [7] rank potential fundamental frequency candidates by the sum of their harmonic amplitudes.
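As a rough illustration of this candidate-ranking idea (a sketch only, not Pertusa and Iñesta's implementation; the sample rate, FFT size, number of partials and candidate set are assumptions introduced for the example), a candidate fundamental can be scored by summing the spectral magnitude found at its harmonic positions:

public class CandidateRanking {

    // Assumed analysis settings for this illustration only.
    static final double SAMPLE_RATE = 44100.0;
    static final int FFT_SIZE = 2048;

    // Sum the spectral magnitude at the first nPartials harmonics of f0.
    static double harmonicSum(double[] magnitude, double f0, int nPartials) {
        double binWidth = SAMPLE_RATE / FFT_SIZE;
        double sum = 0.0;
        for (int h = 1; h <= nPartials; h++) {
            int bin = (int) Math.round(h * f0 / binWidth);
            if (bin < magnitude.length) {
                sum += magnitude[bin];
            }
        }
        return sum;
    }

    // Return the candidate fundamental with the largest harmonic sum.
    static double bestCandidate(double[] magnitude, double[] candidates) {
        double best = candidates[0];
        double bestScore = Double.NEGATIVE_INFINITY;
        for (double f0 : candidates) {
            double score = harmonicSum(magnitude, f0, 10);
            if (score > bestScore) {
                bestScore = score;
                best = f0;
            }
        }
        return best;
    }
}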
Existing real-time algorithms for pitch detection include
fiddle~, a Max/MSP object by Puckette et al. [8] based on
a Fourier transform followed by peak picking. Jehan [4]
adapted the algorithm to analyse timbral qualities of a signal. In the time domain, de Cheveigné and Kawahara's YIN
[2] is a widely-known algorithm which uses autocorrelation
of the time-domain signal to find the most prominent
frequency. However, these algorithms are better suited to
monophonic signals and are not reliable enough to generate a MIDI transcription of audio from a polyphonic instrument.
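The time-domain approach can be sketched, very roughly, as follows; this is not YIN itself (which uses a cumulative mean normalised difference function rather than plain autocorrelation), and the pitch-range bounds are assumptions made for the example:

public class AutocorrPitch {

    // Estimate the fundamental of a mono frame by locating the lag with the
    // highest autocorrelation within an assumed 50 Hz - 1 kHz pitch range.
    static double estimatePitch(float[] frame, float sampleRate) {
        int minLag = (int) (sampleRate / 1000.0f); // upper pitch bound (assumed)
        int maxLag = (int) (sampleRate / 50.0f);   // lower pitch bound (assumed)
        int bestLag = minLag;
        double bestCorr = Double.NEGATIVE_INFINITY;
        for (int lag = minLag; lag <= maxLag && lag < frame.length; lag++) {
            double corr = 0.0;
            for (int i = 0; i + lag < frame.length; i++) {
                corr += frame[i] * frame[i + lag];
            }
            if (corr > bestCorr) {
                bestCorr = corr;
                bestLag = lag;
            }
        }
        return sampleRate / bestLag;
    }
}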
We proceed from the observation that any given pitch
will also create peaks at frequencies corresponding to its
partials. In our approach, we iteratively subtract partials
within the frequency domain in order to aid a real-time pitch
detector. A learning method is employed to optimise the
expected amplitudes of the partials of each detected note
by continually updating the weights whenever that note is detected. In addition, we model the variation in the amplitude and in the summed partial amplitudes of detected notes.
The weightings for each partial derived from these observations
are used within the decision-making process.
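The following Java sketch outlines this partial-subtraction and weight-update step under illustrative assumptions: the peak map, the frequency tolerance and the smoothing rate are not taken from the paper, and the exact decision logic is omitted.

import java.util.Map;

public class PartialSubtraction {

    // Subtract the expected partial amplitudes of a detected note from the
    // observed peaks. peakAmps maps a peak frequency (Hz) to its amplitude;
    // weights[k] is the learned amplitude of partial (k + 2) relative to the
    // fundamental, so only partials above the fundamental are subtracted.
    static void subtractPartials(Map<Double, Double> peakAmps,
                                 double f0, double f0Amp, double[] weights,
                                 double toleranceHz) {
        for (int k = 0; k < weights.length; k++) {
            double partialFreq = (k + 2) * f0;
            for (Map.Entry<Double, Double> peak : peakAmps.entrySet()) {
                if (Math.abs(peak.getKey() - partialFreq) < toleranceHz) {
                    double expected = f0Amp * weights[k];
                    peak.setValue(Math.max(0.0, peak.getValue() - expected));
                }
            }
        }
    }

    // Move the learned relative weights towards the ratios observed for a
    // newly detected note (simple exponential smoothing; the rate is assumed).
    static void updateWeights(double[] weights, double[] observedRatios, double rate) {
        for (int k = 0; k < weights.length; k++) {
            weights[k] = (1.0 - rate) * weights[k] + rate * observedRatios[k];
        }
    }
}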
Our motivation for this method is its use within live performance, to generate information about new notes played
by an instrument. This information can then be used to provide accompaniment, either directly or by aligning it with
an expected part. Previous research into pitch tracking for
interactive music has highlighted the importance of minimal
latency and of accuracy in noisy conditions [3]. Since our
algorithm is employed for real-time audio-to-MIDI conversion within a performance system, we require fast detection of notes and fast computation.
2. METHOD
2.1. Implementation and Pre-Processing
Our algorithm has been implemented in Java within a
Max/MSP patch, making use of the fiddle~
object [8] in the pre-processing stage. Ordinarily, fiddle~
provides its own fundamental frequency estimation, but it
also gives the 'uncooked' data from the peak-picking process: the top N frequencies
above a suitable threshold and their respective amplitudes. Since fiddle~ has been optimised for fast processing within a real-time environment, it
is well suited to providing the efficient FFT and noise reduction used to supply the data for our partial-removal
system. We use a frame of 2048 samples with a hop size
of 1024, so that our detection of notes is as fast as possible.
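A minimal sketch of this pre-processing stage is given below. It assumes, purely for illustration, that the raw peak data arrives as interleaved frequency/amplitude values and that the sample rate is 44.1 kHz; it also works out the frame and hop durations implied by the chosen window sizes.

public class PeakListPreprocess {

    static final double SAMPLE_RATE = 44100.0; // assumed for this example
    static final int FRAME_SIZE = 2048;
    static final int HOP_SIZE = 1024;

    // Unpack an interleaved [f1, a1, f2, a2, ...] list into a 2 x N array
    // of peak frequencies (row 0) and amplitudes (row 1).
    static double[][] unpackPeaks(double[] interleaved) {
        int n = interleaved.length / 2;
        double[][] peaks = new double[2][n];
        for (int i = 0; i < n; i++) {
            peaks[0][i] = interleaved[2 * i];     // frequency in Hz
            peaks[1][i] = interleaved[2 * i + 1]; // amplitude
        }
        return peaks;
    }

    public static void main(String[] args) {
        // A new analysis frame arrives every HOP_SIZE samples.
        double hopMs = 1000.0 * HOP_SIZE / SAMPLE_RATE;     // ~23.2 ms
        double frameMs = 1000.0 * FRAME_SIZE / SAMPLE_RATE; // ~46.4 ms
        System.out.printf("hop %.1f ms, frame %.1f ms%n", hopMs, frameMs);
    }
}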