Proceedings of the International Computer Music Conference (ICMC 2009), Montreal, Canada
August 16-21, 2009
(b) Draw a base time $c_i$ and a frequency bin $b_i$ jointly
from the spectrogram/joint distribution $\phi_{k_i}$:
$\{c_i, b_i\} \sim \mathrm{Mult}(\phi_{k_i})$ (6)
(c) Set the observed time $w_i$ for quantum $i$ based on
the base time $c_i$ and the time offset $l_i$:
$w_i = c_i + l_i$ (7)
Figure 1. The graphical model for SIMM. Shaded nodes
represent observed data, unshaded nodes represent hidden
variables. A directed edge denotes that the variable pointed
to depends on the variable from which the edge originates.
Nodes with two variable names denote tuples drawn
jointly; for example, $c_i$ and $b_i$ are drawn jointly from a
multinomial distribution with parameter $\phi_{k_i}$, and depend on
both $k_i$ and $\phi_{k_i}$. Only $b_i$ is directly observed, so only that
half of the node is shaded. Plates denote replication of each
variable within the plate by the number at the lower right.
quantum per window/bin; higher values of v yield a closer
approximation to the continuous spectrogram at greater computational
expense. The order of these quanta is arbitrary, so we can
model them as being drawn independently from our model.
2.2. Generative Process
We assume we are given a set of $K$ normalized magnitude
spectrogram matrices $\phi_k$ of size $C \times B$, such that $\phi_{kcb}$ is the
magnitude in frequency bin $b$ at window $c$ in sound source $k$,
and $\sum_{c=1}^{C} \sum_{b=1}^{B} \phi_{kcb} = 1$ for each $k \in \{1, \ldots, K\}$.
These spectrograms come from the sound sources we will use to
reconstruct the target sound. The normalized spectrograms can
also be interpreted as joint multinomial distributions over
base times $c$ and bins $b$. That is, $\phi_{kcb}$ gives the probability
of drawing a quantum $i$ with base time $c$ and frequency bin $b$
given that the quantum is coming from the $k$th source sound.
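As a minimal sketch of this interpretation, the following Python/NumPy fragment normalizes a magnitude spectrogram so that it sums to 1 and then draws one (base time, bin) pair from it. The function names, the toy array `mag`, and the use of NumPy are assumptions made for illustration; they do not come from the paper.

```python
import numpy as np

def normalize_spectrogram(mag):
    """Scale a nonnegative C x B magnitude spectrogram so its entries sum to 1."""
    mag = np.asarray(mag, dtype=float)
    total = mag.sum()
    if total <= 0:
        raise ValueError("spectrogram must contain some positive magnitude")
    return mag / total

def draw_base_time_and_bin(phi_k, rng):
    """Draw one (c, b) pair from phi_k, treated as a joint multinomial."""
    C, B = phi_k.shape
    flat_index = rng.choice(C * B, p=phi_k.ravel())
    c, b = np.unravel_index(flat_index, (C, B))
    return int(c), int(b)

rng = np.random.default_rng(0)
mag = np.array([[1.0, 3.0], [0.0, 4.0]])  # toy spectrogram: 2 windows, 2 bins
phi_k = normalize_spectrogram(mag)
c, b = draw_base_time_and_bin(phi_k, rng)
```

Cells with zero magnitude, such as window 1, bin 0 in the toy array, have zero probability and are never drawn.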
The generative process for SIMM is:
1. Draw a $K \times L$ matrix $\omega$ defining a joint multinomial
distribution over sources $k$ and time offsets $l$ from a
symmetric Dirichlet distribution with parameter $\eta$:
$\omega \sim \mathrm{Dir}(\eta, \ldots, \eta)$ (4)
$\omega_{kl}$ is the joint probability of drawing a quantum from
source $k$ with time offset $l$.
2. For each quantum $i \in \{1, \ldots, N\}$:
(a) Draw a source ID $k_i$ and a time offset $l_i$ jointly
from $\mathrm{Mult}(\omega)$:
$\{k_i, l_i\} \sim \mathrm{Mult}(\omega)$ (5)
3. For each time $w$ and frequency bin $b$, count the quanta
appearing at $w$ and $b$ to yield $\hat{y}_{wb}$, the magnitude in
the quantized spectrogram at $w$ and $b$.
Each observed quantum $i$ appears at time $w_i$ and frequency
bin $b_i$, which are selected according to the process above.
We assume that quanta always add constructively. This assumption ignores the possibility of phase cancellation between sources, but it makes our simple mixture modeling
approach possible. We leave building a more complicated
phase-aware model as future work.
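The generative process can be sketched in Python/NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `sample_simm` is hypothetical, time offsets are assumed to run over 0, ..., L-1, and the quantized spectrogram is assumed to have C + L - 1 time windows so that every shifted quantum fits.

```python
import numpy as np

def sample_simm(phi, L, N, eta, rng):
    """Sample a quantized spectrogram from the SIMM generative process.

    phi : array of shape (K, C, B); each phi[k] sums to 1
    L   : number of time offsets, assumed here to be 0 .. L-1
    N   : number of quanta
    eta : symmetric Dirichlet parameter
    """
    K, C, B = phi.shape
    # Step 1: omega ~ Dir(eta, ..., eta) over all K*L (source, offset) pairs.
    omega = rng.dirichlet(np.full(K * L, eta)).reshape(K, L)
    # Step 2(a): draw {k_i, l_i} jointly from Mult(omega).
    kl = rng.choice(K * L, size=N, p=omega.ravel())
    k, l = np.unravel_index(kl, (K, L))
    # Step 2(b): draw {c_i, b_i} jointly from Mult(phi[k_i]).
    c = np.empty(N, dtype=int)
    b = np.empty(N, dtype=int)
    for i in range(N):
        cb = rng.choice(C * B, p=phi[k[i]].ravel())
        c[i], b[i] = np.unravel_index(cb, (C, B))
    # Step 2(c): the observed time is the base time plus the offset.
    w = c + l
    # Step 3: count the quanta landing at each (time, bin) cell.
    y = np.zeros((C + L - 1, B), dtype=int)
    np.add.at(y, (w, b), 1)
    return omega, k, l, c, b, w, y

rng = np.random.default_rng(0)
phi = rng.random((2, 4, 3))                 # two toy sources, 4 windows, 3 bins
phi /= phi.sum(axis=(1, 2), keepdims=True)  # normalize each source to sum to 1
omega, k, l, c, b, w, y = sample_simm(phi, L=5, N=1000, eta=0.5, rng=rng)
```

Because quanta are simply counted, contributions from different sources always add, which mirrors the constructive-addition assumption stated above.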
Figure 1 shows SIMM as a graphical model, which summarizes the dependencies between the variables. Given this
generative process and an observed spectrogram $\hat{y}$, we will
infer values for the process's hidden parameters $k$, $l$, and $\omega$.
3. INFERENCE AND SYNTHESIS
Our primary objective is to find a good value for the matrix
$\omega$, which defines the joint distribution over time offsets $l$
and sources $k$. Once we have inferred $\omega$ from the data, it
will tell us by how much to time-shift and scale each short
component to recreate the target sound.
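To illustrate how such a matrix encodes shifts and scales, the sketch below builds the normalized spectrogram implied by a given weight matrix: each source spectrogram is shifted by every offset, scaled by the corresponding weight, and the results are summed. This works in the magnitude-spectrogram domain only; the function name, the 0, ..., L-1 offset range, and the output length of C + L - 1 windows are assumptions for illustration, and the authors' actual synthesis procedure may differ.

```python
import numpy as np

def expected_spectrogram(omega, phi):
    """Mix time-shifted, scaled copies of each source spectrogram.

    omega : (K, L) weights summing to 1
    phi   : (K, C, B) normalized source spectrograms
    Returns a (C + L - 1, B) array whose entries sum to 1.
    """
    K, L = omega.shape
    _, C, B = phi.shape
    out = np.zeros((C + L - 1, B))
    for k in range(K):
        for l in range(L):
            # Shift source k by l windows and scale it by omega[k, l].
            out[l:l + C, :] += omega[k, l] * phi[k]
    return out

# Toy case: one single-window, single-bin source placed at offsets 0 and 1.
omega = np.array([[0.5, 0.5]])
phi = np.array([[[1.0]]])
out = expected_spectrogram(omega, phi)
```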
3.1. Gibbs Sampler
We use Gibbs sampling, a Markov Chain Monte Carlo
(MCMC) technique that allows us to approximate a sample
from the posterior distribution $P(k, l \mid w, b, \phi, \eta)$, since this
distribution is difficult to compute analytically. In Gibbs
sampling, we repeatedly sample new values for each variable conditioned on the values of all other variables. After
an initial "burn-in" period, the distribution of the sampled $k$
and $l$ converges to their true posterior distribution [4].
We can avoid sampling $\omega$, since we have placed a conjugate Dirichlet prior on $\omega$ and can therefore compute the
posterior predictive likelihood of $\{k_i, l_i\}$ given the other $k$'s
and $l$'s (denoted $k_{-i}$ and $l_{-i}$) and the hyperparameter $\eta$. We
therefore resample only the values for the source indicators
$k$ and the time offsets $l$. This leads to faster convergence,
since it lets us work in a lower-dimensional space. Once we
have estimates for $k$ and $l$, we can compute the Maximum
A Posteriori (MAP) value for $\omega \mid k, l, \eta$.
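A collapsed sampler of this kind might look like the following Python/NumPy sketch. It is not taken from the paper: the conditional used here, proportional to (n_kl + eta) times phi[k, w_i - l, b_i] with n_kl counting the other quanta assigned to source k and offset l, is what standard Dirichlet-multinomial conjugacy gives for the generative process described earlier, and the authors' exact update may differ in detail. The function names, the 0, ..., L-1 offset range, and the clipping of negative values in `map_omega` (which only matters when eta < 1) are assumptions.

```python
import numpy as np

def gibbs_sweep(k, l, w, b, phi, counts, eta, rng):
    """Resample every (k_i, l_i) once, in place, with omega integrated out."""
    K, C, B = phi.shape
    L = counts.shape[1]
    offsets = np.arange(L)
    for i in range(len(w)):
        counts[k[i], l[i]] -= 1            # remove quantum i from the counts
        base = w[i] - offsets              # base time implied by each offset
        valid = (base >= 0) & (base < C)   # offsets that keep the base time in range
        lik = np.zeros((K, L))
        lik[:, valid] = phi[:, base[valid], b[i]]
        weights = (counts + eta) * lik     # predictive term times likelihood
        weights /= weights.sum()
        kl = rng.choice(K * L, p=weights.ravel())
        k[i], l[i] = np.unravel_index(kl, (K, L))
        counts[k[i], l[i]] += 1            # add quantum i back under its new labels

def map_omega(counts, eta):
    """Mode of the Dirichlet posterior over omega given the assignment counts.

    Negative entries of counts + eta - 1 are clipped to zero; this is a
    convention assumed here for eta < 1, not something stated in the paper.
    """
    a = np.maximum(counts + eta - 1.0, 0.0)
    return a / a.sum()

rng = np.random.default_rng(0)
K, C, B, L, eta = 2, 3, 2, 2, 0.5
phi = rng.random((K, C, B))
phi /= phi.sum(axis=(1, 2), keepdims=True)
w = rng.integers(0, C + L - 1, size=50)    # toy observed times
b = rng.integers(0, B, size=50)            # toy observed bins
k = np.zeros(50, dtype=int)                # initial source labels
l = np.maximum(w - (C - 1), 0)             # initial offsets with w - l inside [0, C)
counts = np.zeros((K, L), dtype=int)
np.add.at(counts, (k, l), 1)
for _ in range(20):
    gibbs_sweep(k, l, w, b, phi, counts, eta, rng)
omega_map = map_omega(counts, eta)
```

Only the K x L count table is updated between draws, which is what allows the sampler to skip $\omega$ entirely until the final MAP step.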
To resample each pair $k_i$, $l_i$, we need to compute the joint
posterior likelihood that the quantum $i$ appearing at time $w_i$