Proceedings of the International Computer Music Conference (ICMC 2009), Montreal, Canada, August 16-21, 2009

POST-PROCESSING FIDDLE~: A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS

Andrew N. Robertson, Mark D. Plumbley
Centre for Digital Music
School of Electronic Engineering and Computer Science
Queen Mary University of London
London, England
andrew.robertson@elec.qmul.ac.uk

ABSTRACT

We present a method for real-time pitch-tracking which estimates the amplitudes of the partials relative to the fundamental for each detected note. We then employ a subtraction method, whereby lower fundamentals in the spectrum are accounted for when looking at higher fundamental notes. By tracking notes which are playing, we look for note-off events and continually update our expected partial weightings for each note. The resulting algorithm makes use of these relative partial weightings within its decision process. We have evaluated the system against a data set and compared it with specialised offline pitch-trackers.

1. INTRODUCTION

Polyphonic or multiple pitch-tracking is a difficult problem in signal processing. Most existing work in multi-pitch tracking is designed for Music Information Retrieval, which takes place offline on large data sets. A method for multiple frequency estimation by summing partial amplitudes within the frequency domain was presented by Klapuri [5], who makes use of an iterative procedure to subsequently subtract partials within a pitch detection algorithm. Pertusa and Iñesta [7] list potential fundamental frequency candidates in order of the sum of their harmonic amplitudes.

Existing real-time algorithms for pitch detection include fiddle~, a Max/MSP object by Puckette et al. [8] based on a Fourier transform which employs peak picking. Jehan [4] adapted the algorithm to analyse timbral qualities of a signal. In the time domain, de Cheveigné and Kawahara's Yin [2] is a widely-known algorithm which uses auto-correlation on the time-domain signal to calculate the most prominent frequency. However, these algorithms are more suited to monophonic signals, and they are not reliable enough to generate a MIDI transcription of audio from a polyphonic instrument.

We proceed from the observation that any given pitch will also create peaks at frequencies corresponding to its partials. In our approach, we iteratively subtract partials within the frequency domain in order to aid a real-time pitch detector. A learning method is employed to optimise the expected amplitudes of the partials of each detected note by continually updating the weights whenever a note is detected. In addition, we model the variations within the amplitude and summed partial amplitudes of detected notes. The weightings for each partial derived from observations are used within the decision-making process.

Our motivation for this method is its use within live performance, to generate information about new notes played by an instrument. This can then be used to provide accompaniment, either directly or by aligning the information with an expected part. Previous research into pitch tracking for interactive music has highlighted the importance of minimal latency and accuracy within noisy conditions [3]. Since our algorithm is employed for real-time audio-to-MIDI conversion within a performance system, we require fast detection of notes and fast computation time.
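To make the subtraction and weight-learning steps concrete, the following is a minimal Java sketch of how they might be implemented. The class and method names, the fixed number of partials, the peak-matching tolerance and the exponential-moving-average update rule are our own assumptions for illustration, not details taken from the system described in this paper.

    // Sketch of harmonic partial subtraction and per-note weight learning.
    // All constants and names below are illustrative assumptions.
    public class PartialSubtractionSketch {

        static final int NUM_PARTIALS = 8;       // assumed number of partials tracked per note
        static final double TOLERANCE_HZ = 15.0; // assumed peak-matching tolerance
        static final double SMOOTHING = 0.1;     // assumed learning rate for weight updates

        /**
         * Subtracts the expected partial amplitudes of a detected (lower)
         * fundamental from the peak list, so that higher fundamental
         * candidates are evaluated against the residual spectrum.
         */
        static void subtractPartials(double[] peakFreqs, double[] peakAmps,
                                     double f0, double f0Amp, double[] weights) {
            for (int k = 2; k <= NUM_PARTIALS; k++) {
                double target = k * f0;  // expected frequency of the k-th partial
                for (int i = 0; i < peakFreqs.length; i++) {
                    if (Math.abs(peakFreqs[i] - target) < TOLERANCE_HZ) {
                        double expected = f0Amp * weights[k - 2];
                        peakAmps[i] = Math.max(0.0, peakAmps[i] - expected);
                    }
                }
            }
        }

        /**
         * Updates the stored relative partial weights for a note whenever it
         * is detected; the exponential moving average used here is an
         * assumed update rule.
         */
        static void updateWeights(double[] weights, double[] observedRelativeAmps) {
            for (int k = 0; k < weights.length && k < observedRelativeAmps.length; k++) {
                weights[k] = (1.0 - SMOOTHING) * weights[k]
                           + SMOOTHING * observedRelativeAmps[k];
            }
        }
    }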
2. METHOD

2.1. Implementation and Pre-Processing

Our algorithm has been implemented in Java within a Max/MSP patch, and in doing so we made use of the fiddle~ object [8] in the pre-processing stage. Ordinarily, fiddle~ provides its own fundamental frequency estimation, but it also gives the 'uncooked' data of the top N frequencies from the peak-picking process and their respective amplitudes above a suitable threshold. Since fiddle~ has been optimised for fast processing within a real-time environment, it is well suited to providing the efficient FFT and noise reduction stage that supplies the data for our partial-removal system. We use a frame of 2048 samples with a hop size of 1024, so that our detection of notes is as fast as possible.
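As an illustration of this pre-processing stage, the sketch below shows one way the raw peak data could be collected per analysis frame. The class names and the assumption that the peak list arrives as interleaved frequency/amplitude pairs are ours, and the mxj glue code that receives the list from the Max/MSP patch is omitted.

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of collecting the raw ("uncooked") peak list supplied each
    // analysis frame. The message layout and names are assumptions.
    public class PeakFrameCollector {

        public static final int FRAME_SIZE = 2048; // analysis window (samples)
        public static final int HOP_SIZE = 1024;   // hop between frames (samples)

        /** One spectral peak reported by the pre-processing stage. */
        public static class Peak {
            public final double frequencyHz;
            public final double amplitude;
            public Peak(double frequencyHz, double amplitude) {
                this.frequencyHz = frequencyHz;
                this.amplitude = amplitude;
            }
        }

        /** Parses one frame of interleaved (frequency, amplitude) values. */
        public List<Peak> parseFrame(double[] interleaved) {
            List<Peak> peaks = new ArrayList<>();
            for (int i = 0; i + 1 < interleaved.length; i += 2) {
                peaks.add(new Peak(interleaved[i], interleaved[i + 1]));
            }
            return peaks;
        }
    }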