Proceedings of the International Computer Music Conference (ICMC 2009), Montreal, Canada August 16-21, 2009

quantum per window/bin; higher values of $\nu$ yield a closer approximation to the continuous spectrogram at greater computational expense. The order of these quanta is arbitrary, so we can model them as independent draws from our model.

2.2. Generative Process

We assume we are given a set of $K$ normalized magnitude spectrogram matrices $\theta_k$ of size $C \times B$, such that $\theta_{kcb}$ is the magnitude in frequency bin $b$ at window $c$ in sound source $k$, and $\sum_{c=1}^{C} \sum_{b=1}^{B} \theta_{kcb} = 1$ for each $k \in \{1, \ldots, K\}$. These spectrograms come from the sound sources we will use to reconstruct the target sound. The normalized spectrograms can also be interpreted as joint multinomial distributions over base times $c$ and bins $b$. That is, $\theta_{kcb}$ gives the probability of drawing a quantum $i$ with base time $c$ and frequency bin $b$, given that the quantum comes from the $k$th source sound.

The generative process for SIMM is:

1. Draw a $K \times L$ matrix $\phi$ defining a joint multinomial distribution over sources $k$ and time offsets $l$ from a symmetric Dirichlet distribution with parameter $\eta$:
$$\phi \sim \mathrm{Dir}(\eta, \ldots, \eta) \tag{4}$$
$\phi_{kl}$ is the joint probability of drawing a quantum from source $k$ with time offset $l$.
2. For each quantum $i \in \{1, \ldots, N\}$:
   (a) Draw a source ID $k_i$ and a time offset $l_i$ jointly from $\mathrm{Mult}(\phi)$:
   $$\{k_i, l_i\} \sim \mathrm{Mult}(\phi) \tag{5}$$
   (b) Draw a base time $c_i$ and a frequency bin $b_i$ jointly from the spectrogram/joint distribution $\theta_{k_i}$:
   $$\{c_i, b_i\} \sim \mathrm{Mult}(\theta_{k_i}) \tag{6}$$
   (c) Set the observed time $w_i$ for quantum $i$ from the base time $c_i$ and the time offset $l_i$:
   $$w_i = c_i + l_i \tag{7}$$
3. For each time $w$ and frequency bin $b$, count the quanta appearing at $w$ and $b$ to yield $Y_{wb}$, the magnitude of the quantized spectrogram at $w$ and $b$.

Each observed quantum $i$ appears at time $w_i$ and frequency bin $b_i$, which are selected according to the process above. We assume that quanta always add constructively. This assumption ignores the possibility of phase cancellation between sources, but it is what makes our simple mixture-modeling approach possible. We leave building a more complicated phase-aware model as future work.

Figure 1 shows SIMM as a graphical model, which summarizes the dependencies between the variables. Given this generative process and an observed spectrogram $Y$, we will infer values for the process's hidden parameters $k$, $l$, and $\phi$.

Figure 1. The graphical model for SIMM. Shaded nodes represent observed data; unshaded nodes represent hidden variables. Directed edges denote that the variable pointed to depends on the variable from which the edge originates. Nodes with two variable names denote tuples drawn jointly; for example, $c_i$ and $b_i$ are drawn jointly from a multinomial distribution with parameter $\theta_{k_i}$, and so depend on both $k_i$ and $\theta$. Only $b_i$ is directly observed, so only that half of the node is shaded. Plates denote replication of each variable within the plate by the number at the lower right.

3. INFERENCE AND SYNTHESIS

Our primary objective is to find a good value for the matrix $\phi$, which defines the joint distribution over time offsets $l$ and sources $k$. Once we have inferred $\phi$ from the data, it tells us by how much to time-shift and scale each short component to recreate the target sound.

3.1. Gibbs Sampler

We use Gibbs sampling, a Markov chain Monte Carlo (MCMC) technique, to approximate samples from the posterior distribution $P(k, l \mid w, b, \theta, \eta)$, since this distribution is difficult to compute analytically. In Gibbs sampling, we repeatedly sample a new value for each variable conditioned on the current values of all other variables. After an initial "burn-in" period, the distribution of the sampled $k$ and $l$ converges to their true posterior distribution [4].
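The generative process of Section 2.2 and the Gibbs idea can be made concrete with a small simulation. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names (`sample_simm`, `gibbs_sweep`, `map_phi`), the toy sizes, and the random source spectrograms are hypothetical. The sampler integrates $\phi$ out, as the following paragraph describes, and resamples each pair $(k_i, l_i)$ from a conditional we derived ourselves from Eqs. (4)-(7): $P(k_i = k, l_i = l \mid \text{rest}) \propto (n^{-i}_{kl} + \eta)\,\theta_{k, w_i - l, b_i}$, where $n^{-i}_{kl}$ counts the other quanta assigned to source $k$ and offset $l$, and the $\theta$ term is zero unless $w_i - l$ is a valid base time. The MAP step assumes $\eta \ge 1$, where the Dirichlet posterior mode is $(n_{kl} + \eta - 1)/(N + KL(\eta - 1))$.

```python
import numpy as np


def sample_simm(theta, L, N, eta, rng):
    """Draw N quanta from the SIMM generative process, Eqs. (4)-(7).

    theta: (K, C, B) array; theta[k] is source k's normalized spectrogram.
    Returns observed times w, bins b, and the hidden labels k, l.
    """
    K, C, B = theta.shape
    phi = rng.dirichlet(np.full(K * L, eta)).reshape(K, L)          # Eq. (4)
    k, l = np.divmod(rng.choice(K * L, size=N, p=phi.ravel()), L)  # Eq. (5)
    c = np.empty(N, dtype=int)
    b = np.empty(N, dtype=int)
    for i in range(N):                                               # Eq. (6)
        c[i], b[i] = divmod(rng.choice(C * B, p=theta[k[i]].ravel()), B)
    return c + l, b, k, l                                            # Eq. (7)


def gibbs_sweep(w, b, k, l, theta, L, eta, rng):
    """One collapsed Gibbs sweep over (k_i, l_i) with phi integrated out.

    Assumed conditional (our derivation, not quoted from the paper):
      P(k_i=k, l_i=l | rest) ~ (n_kl^{-i} + eta) * theta[k, w_i - l, b_i],
    with zero likelihood unless 0 <= w_i - l < C. Updates k, l in place.
    """
    K, C, B = theta.shape
    n = np.zeros((K, L))
    np.add.at(n, (k, l), 1)              # counts of (source, offset) pairs
    offsets = np.arange(L)
    for i in range(len(w)):
        n[k[i], l[i]] -= 1               # remove quantum i from the counts
        c = w[i] - offsets               # base time implied by each offset
        valid = (c >= 0) & (c < C)
        like = np.zeros((K, L))
        like[:, valid] = theta[:, c[valid], b[i]]
        p = ((n + eta) * like).ravel()
        k[i], l[i] = divmod(rng.choice(K * L, p=p / p.sum()), L)
        n[k[i], l[i]] += 1               # add it back with its new labels
    return k, l


def map_phi(k, l, K, L, eta):
    """MAP estimate of phi given k, l under the Dir(eta) prior (assumes eta >= 1)."""
    n = np.zeros((K, L))
    np.add.at(n, (k, l), 1)
    return (n + eta - 1) / (len(k) + K * L * (eta - 1))


rng = np.random.default_rng(0)
K, C, B, L, N, eta = 2, 4, 3, 5, 500, 1.0
theta = rng.random((K, C, B))
theta /= theta.sum(axis=(1, 2), keepdims=True)  # each source sums to 1

w, b, k_true, l_true = sample_simm(theta, L, N, eta, rng)
Y = np.zeros((C + L - 1, B), dtype=int)
np.add.at(Y, (w, b), 1)                          # step 3: count quanta

# Start each quantum at an offset that keeps its base time in range.
k = np.zeros(N, dtype=int)
l = np.maximum(0, w - (C - 1))
for _ in range(20):
    k, l = gibbs_sweep(w, b, k, l, theta, L, eta, rng)
phi_hat = map_phi(k, l, K, L, eta)
```

Initializing each offset at $\max(0, w_i - (C - 1))$ guarantees a valid base time, so every conditional has nonzero mass from the first sweep. With $\eta = 1$ the MAP estimate reduces to the normalized counts $n_{kl}/N$; with real audio, $\theta$ would come from the normalized source spectrograms rather than random numbers.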
We can avoid sampling $\phi$: because we have placed a conjugate Dirichlet prior on $\phi$, we can compute the posterior predictive likelihood of $\{k_i, l_i\}$ given the other $k$'s and $l$'s (denoted $k_{-i}$ and $l_{-i}$) and the hyperparameter $\eta$. We therefore resample only the source indicators $k$ and the time offsets $l$. This leads to faster convergence, since it lets us work in a lower-dimensional space. Once we have estimates for $k$ and $l$, we can compute the maximum a posteriori (MAP) value of $\phi$ given $k$, $l$, and $\eta$.

To resample each pair $k_i, l_i$, we need to compute the joint posterior likelihood that the quantum $i$ appearing at time $w_i$