Audio Analysis for Rhythmic Structure

Crawford Tait
University of Glasgow
tel: +44 (0)141 339 8855 x8333, email: crawf@dcs.gla.ac.uk

Abstract: Though the utility of wavelet analysis in an audio context has been shown (Kronland-Martinet et al 87), little interpretation of the resulting coefficients has been attempted. This work utilises the semitone-based wavelet transform of (Newland 94) and investigates analysis of the coefficients in an attempt to reveal aspects of temporal structure, which could ease editing and synchronisation with other media. Since it has been suggested that perception of rhythm is related to repetition, the autocorrelation function is considered. An example is shown in which the bar length in a jazz piece is automatically highlighted.

Introduction

The predominant way of viewing digital audio is still as a raw waveform plot of amplitude against time, leaving the burden of analysis to the user. Meanwhile, there are many areas of computer music which could benefit from automated rhythmic analysis in the time-frequency domain: for example, although none are discussed here, automatic transcription, inter-media synchronisation in performance, and automatic accompaniment could all be significantly aided by some knowledge of musically relevant points. In addition, there is a need for improved time domain representations in digital editing, and the highlighting of rhythmic structure would help, since edit points are often musically significant.

Wavelet Analysis

A time-frequency domain representation is clearly desirable in this context, the two most common being the Fourier transform and, more recently, the wavelet transform. Wavelet analysis has been shown to provide improved properties over Fourier analysis, since wavelets are localised in both the time and frequency domains, and the frequency scale is divided into octaves.
Furthermore, a wavelet transform has recently been developed which allows division of the frequency scale into semitones, providing perhaps the best time-frequency analysis for musical input. Frequency resolution is, however, obtained at the expense of time resolution: the highest octave band contains L/4 coefficients, where L is the number of input samples, and each successively lower band contains half the number of coefficients of the band above. If these bands are subdivided into semitones, each semitone band contains a twelfth of the number of coefficients in its octave.

Rhythm and Autocorrelation

There is some degree of predictability in rhythmic audio: having heard a piece up to some point, the listener will often know what is going to happen next. In other words, music is correlated to some degree, in that later elements are dependent on what has come before (Moore 82). In statistics, the autocorrelation function highlights such dependencies by quantifying self-similarity under different time shifts, and the next section discusses the use of a similar method in automating the interpretation of the wavelet coefficients.

Method

The modulus of each pair of complex wavelet coefficients of the input audio segment was computed, and a 2D array of time slices constructed. This array was then compared against copies of itself under different time shifts (up to half the total length). Time slices were treated as points in N-dimensional space (N being the number of scale levels) and their average similarity calculated using the distance between them (squaring the differences between corresponding coefficients before summing and taking the square root).

Example

Figure 1 shows the envelope (obtained by plotting the maximum and minimum samples in a sliding window (Foster et al 82)) of the first 50 seconds of the jazz track "Danish Drive" by Ed Thigpen.

[Figure 1: envelope of the first 50 seconds of "Danish Drive".]

ICMC Proceedings 1995
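The envelope computation used for Figure 1 can be sketched as follows. The window length and the use of non-overlapping windows are assumptions here, since the text does not specify them; the idea, after (Foster et al 82), is simply to summarise each window by its extreme samples.

```python
import math

def envelope(samples, window=256):
    """Summarise audio by the maximum and minimum sample in each
    non-overlapping window (after Foster et al 82).

    Returns a list of (maximum, minimum) pairs, one per window;
    plotting both traces against time gives the envelope.
    """
    pairs = []
    for start in range(0, len(samples), window):
        w = samples[start:start + window]
        pairs.append((max(w), min(w)))
    return pairs

# A toy decaying tone: the envelope narrows as the amplitude falls.
signal = [math.sin(0.3 * n) * (1.0 - n / 1024.0) for n in range(1024)]
env = envelope(signal, window=256)
```

Plotting the maxima and minima against window position gives a compact amplitude-against-time view of the kind shown in Figure 1.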

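The comparison of time slices described in the Method section can be sketched as below. The wavelet modulus array is taken as given, and returning the raw average Euclidean distance per shift (low distance meaning high self-similarity) is my reading of the text; how the author converts distance into the similarity value plotted is not specified, so no such conversion is attempted here.

```python
import numpy as np

def self_similarity(slices, max_shift=None):
    """Compare an array of time slices (rows = time, columns = scale
    levels) against time-shifted copies of itself.

    For each shift, every slice is treated as a point in N-dimensional
    space and its Euclidean distance to the slice `shift` steps later
    is averaged over the overlap; a low average distance indicates
    self-similarity at that delay.
    """
    slices = np.asarray(slices, dtype=float)
    if max_shift is None:
        max_shift = len(slices) // 2   # up to half the total length
    dists = []
    for shift in range(max_shift):
        a = slices[: len(slices) - shift]
        b = slices[shift:]
        dists.append(np.sqrt(((a - b) ** 2).sum(axis=1)).mean())
    return dists

# A strictly periodic toy input: distance dips to zero at whole
# multiples of the period, mirroring the peaks in Figure 3.
period = 8
t = np.arange(64)[:, None]
toy = np.sin(2 * np.pi * ((t + np.arange(4)) % period) / period)
d = self_similarity(toy)
```

For real input the dips are of course not exact zeros, since (as noted below) only some parts of the texture repeat strictly.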
The modulus of each pair of complex wavelet coefficients of this audio segment was computed, and mapped to a grey value. The non-zero part of the resulting plot is shown in Figure 2. Points to note are as follows:

- many of the drum hits appear as vertical stripes (their energy is spread over a wide range of frequencies);
- the repetitive bass line is discernible in the lower part of the scale range;
- harmonics from the brass instruments are visible in the upper levels of scale (starting around one third of the way through), and the melody can be followed;
- a break when the drums and bass drop out is clearly identified towards the end, and nearer the end a number of tuned percussion hits can be seen slightly above the middle of the scale range.

[Figure 2: grey-scale plot of wavelet coefficient moduli, scale against time.]
[Figure 3: calculated similarity against time delay.]

The modulus values before the point indicated by the arrowheads (just over the first eight bars) were analysed using the method described, resulting in the graph shown in Figure 3. As would be expected, the maximum value of the calculated function occurs at zero delay. However, there are four other distinct delays under which the coefficients are found to be self-similar (using the measure described). Given that the tempo is an estimated 218 beats per minute, and there are seven beats in each bar, the peaks occur at delays corresponding to whole numbers of bars (the maximum delay shown corresponds to just under 8 seconds). It is worth noting that, in the bars analysed, only the bass line is strictly repetitive: there are occasional cymbal hits and alternative drum patterns, and a repetitive keyboard line with slight variations starts in the third bar. In addition, the piece is a live performance, and was sampled at a rate of only 8012 Hz.
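The correspondence between the peaks and whole bars can be checked with a little arithmetic, using only the values quoted above (218 beats per minute, seven beats per bar):

```python
tempo_bpm = 218        # estimated tempo
beats_per_bar = 7      # seven beats in each bar

# Duration of one bar in seconds: about 1.93 s.
bar_seconds = beats_per_bar * 60.0 / tempo_bpm

# Delays at which self-similarity peaks are expected: whole numbers
# of bars.  Four bars come to about 7.7 s, consistent with the four
# distinct peaks and a maximum delay of just under 8 seconds.
peak_delays = [n * bar_seconds for n in range(1, 5)]
```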
Conclusions and Further Work

Previous attempts at analysis of digital audio have concentrated on pitch recognition (related to frequency) and transcription (related to both time and frequency); however, it is felt that there is much to be gained by concentrating on the rhythmic structure existing in the time domain. It is hoped that the results presented here show the utility of a semitone-based wavelet transform for highlighting relevant points in time. Further work will concentrate on using higher quality input to improve resolution, as well as on the problem of automating both the discovery of significant points and the rhythmic interpretation of the wavelet coefficients (Tait & Findlay 95).

References and Acknowledgements

(Foster et al 82) - "Toward an Intelligent Editor of Digital Audio: Signal Processing Methods"; Scott Foster, W. Andrew Schloss & A. Joseph Rockmore; Computer Music Journal 6(1).
(Goto and Muraoka 94) - "A Beat Tracking System for Acoustic Signals of Music"; Masataka Goto and Yoichi Muraoka; ACM Multimedia, 10/94.
(Kronland-Martinet et al 87) - "Analysis of Sound Patterns Through Wavelet Transforms"; R. Kronland-Martinet, J. Morlet & A. Grossman; Intl. Jnl. Pattern Recognition & Artificial Intelligence, 1(2).
(Moore 82) - "An Introduction to the Psychology of Hearing"; Brian C. J. Moore; Academic Press, 1982.
(Newland 94) - "Harmonic and Musical Wavelets"; D. Newland; Proc. Royal Soc. London Series A - Math. & Phys. Sciences 444(1922), pp. 605-620.
(Tait & Findlay 95) - "Audio Analysis for Rhythmic Structure"; Crawford Tait and William Findlay; University of Glasgow Department of Computing Science Technical Report no. TR-1995-11.

I would like to thank my supervisor Bill Findlay for much inspiration and fruitful discussion.