Proceedings of the International Computer Music Conference (ICMC 2009), Montreal, Canada, August 16-21, 2009

and bin $b_i$ was drawn from a source $k$ at a time offset $l$:

$$P(k_i = k, l_i = l \mid w_i, b_i, k_{-i}, l_{-i}, \phi, \eta) \propto P(c_i = w_i - l, b_i \mid k_i = k, l_i = l, \phi) \times P(k_i = k, l_i = l \mid k_{-i}, l_{-i}, \eta) \tag{8}$$

The joint likelihood of the base time $c_i = w_i - l$ and the frequency bin $b_i$ is given by the component distribution $\phi_k$:

$$P(c_i = w_i - l, b_i \mid k_i = k, l_i = l, \phi_k) = \phi_{k c_i b_i} \tag{9}$$

The likelihood of the pair $(k, l)$, conditioned on $\eta$ and on the other source indicators $k_{-i}$ and time offsets $l_{-i}$, is

$$P(k_i = k, l_i = l \mid k_{-i}, l_{-i}, \eta) = \int P(\omega \mid k_{-i}, l_{-i}, \eta)\, \omega_{kl}\, d\omega = \frac{n_{kl} + \eta}{N - 1 + KL\eta} \tag{10}$$

where $n_{kl}$ is the number of other quanta coming from source $k$ with time offset $l$. Because the counts $n_{kl}$ sum to $N - 1$ over all $KL$ source/offset pairs, these probabilities sum to one. We can compute the integral in equation 10 analytically because the Dirichlet distribution is conjugate to the multinomial distribution. Using equations 9 and 10, equation 8 becomes:

$$P(k_i = k, l_i = l \mid w_i, b_i, k_{-i}, l_{-i}, \phi, \eta) \propto \phi_{k c_i b_i}\, \frac{n_{kl} + \eta}{N - 1 + KL\eta} \tag{11}$$

We repeatedly resample the source indicator $k_i$ and time offset $l_i$ for each observed quantum $i$, conditioned on the other indicators $k_{-i}$ and $l_{-i}$. We stop once 20 iterations have gone by without the posterior likelihood $P(k, l \mid w, b, \eta, \phi)$ reaching a new maximum. At this point we assume that the Gibbs sampler has converged and that we have found a set of values for $k$ and $l$ that is likely given the data.

Once we have drawn values for $k$ and $l$ from the posterior, we compute the MAP estimate $\hat{\omega}$ of the joint distribution $\omega$ over sources and times, conditioned on $k$, $l$, and the hyperparameter $\eta$. Since the prior on $\omega$ is a Dirichlet distribution, the MAP estimate $\hat{\omega}$ of $\omega \mid k, l, \eta$ is given by:

$$\hat{\omega}_{kl} \propto \max(0, n_{kl} + \eta - 1) \tag{12}$$

Here $n_{kl}$ is the total number of observed quanta that came from source $k$ at time offset $l$.

Figure 2. Top: Spectrogram of 2.3 seconds of Young MC's "Bust a Move."
Bottom: Spectrogram of 2.3 seconds of the same song, reconstructed from spoken words from the TIMIT corpus using our SIMM model.

3.2. Sonifying the MAP Estimate

By sonifying $\hat{\omega}$, the MAP estimate of $\omega$, we can produce an approximate version of our input audio using only the short sources corresponding to the component distributions $\phi$. $\hat{\omega}_{kl}$ gives the amplitude of source $k$ at time offset $l$. This offset corresponds to sample $S(l - 1)$, where $S$ is the number of samples per window and samples are indexed from 0. If we convolve each short input source $k$ with a signal $g_k$ such that

$$g_k(t) = \begin{cases} 0 & \text{if } \operatorname{mod}(t, S) \neq 0 \\ \hat{\omega}_{k,\, t/S + 1} & \text{if } \operatorname{mod}(t, S) = 0 \end{cases} \tag{13}$$

and add the results for all sources, we obtain a signal whose spectrogram approximates the spectrogram of the target. Figure 2 shows an example of the final result of this process.

3.3. Resampling η

The hyperparameter $\eta$ controls the sparseness of our joint distribution $\omega$ over times and sources. Rather than specify $\eta$ a priori, we place a gamma prior on $\eta$ and adapt the hyperparameter sampling technique in [1] to resample $\eta$ at each iteration.

4. EVALUATION

Ultimately, the effectiveness of our approach should be evaluated qualitatively. Sound examples generated by the method described in this paper are available at http://www.cs.princeton.edu/~mdhoffma/icmc2009. We also performed a quantitative evaluation of our approach. We tested SIMM's ability to find an arrangement of the given components $\phi$ that matches the target spectrogram. To do so, we computed and sonified a MAP estimate $\hat{\omega}$ of the joint distribution over times and components as described in section 3, then compared the sum of the magnitudes of the