Proceedings ICMC|SMC|2014, 14-20 September 2014, Athens, Greece
The Ghost in the MP3
Ryan Maguire
Virginia Center for Computer Music
[email protected]
ABSTRACT
The MPEG-1 or MPEG-2 Layer III standard, more commonly referred to as MP3, has become a nearly ubiquitous digital audio file format. First published in 1993 [1],
this codec implements a lossy compression algorithm
based on a perceptual model of human hearing. Listening
tests, designed primarily by and for Western European
men and using the music they liked, were used to refine
the encoder. These tests determined which sounds were
perceptually important and which could be erased or
altered, ostensibly without being noticed. What are these
lost sounds? Are they sounds that human ears cannot
hear in their original contexts due to our perceptual limitations, or are they simply encoding detritus? It is commonly accepted that MP3s create audible artifacts such
as pre-echo [2], but what does the music which this codec
deletes sound like? In the work presented here, techniques are considered and developed to recover these lost
sounds, the ghosts in the MP3, and reformulate these
sounds as art.
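One simple way to approximate what an encode/decode round trip discards is to time-align the decoded audio with the original and subtract it. This is equivalent to summing the original with a polarity-inverted copy of the decoded signal. It is assumed here for illustration and is not necessarily the technique developed in this paper. The sketch works on plain lists of float samples that are assumed to be already decoded to the same sample rate and channel layout. The function names `best_lag` and `residual` and the search range `max_lag` are hypothetical.

```python
def best_lag(reference, decoded, max_lag):
    """Return the delay (in samples) of `decoded` relative to `reference`.

    Tries every lag from 0 to max_lag and keeps the one with the largest
    raw cross-correlation. Assumes the decoded copy lags the original,
    which is the usual situation after an encode/decode round trip.
    """
    best, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(decoded) - lag)
        score = sum(reference[i] * decoded[i + lag] for i in range(n))
        if score > best_score:
            best, best_score = lag, score
    return best


def residual(reference, decoded, max_lag=2000):
    """Original minus the time-aligned decoded signal.

    Subtracting is the same as adding a polarity-inverted copy; whatever
    survives is what the codec removed or altered (plus any gain or
    alignment error, which this sketch does not correct).
    """
    lag = best_lag(reference, decoded, max_lag)
    n = min(len(reference), len(decoded) - lag)
    return [reference[i] - decoded[i + lag] for i in range(n)]
```

The brute-force lag search costs O(N · max_lag) operations. A practical tool would more likely use FFT-based correlation and would also compensate for level differences before subtracting.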
1. TECHNICAL BACKGROUND
The MP3 standard, designed in the early 1990s by the
Moving Picture Experts Group, has become an interesting object of critique in contemporary technology studies
[3]. That a standard which subtly reduces the audible
quality of sound files has remained in place despite
massively increased bandwidth and storage capacity is
impressive, and highlights the foresight (and fortune) of
the format's creators. Due to a complex combination of
market and social factors, the majority of music listeners
today continue to prefer a standard which optimizes the
download times and storage capacity of their audio
devices [4]. These are often portable machines such as
the iPod, on which much listening occurs in noisy environments (gyms, subways, city streets) through (often
cheap) earbud headphones and inexpensive preamplifiers. The loss of fidelity from these external factors, the
cleverness with which MP3s are coded, listeners' socialization to the sound of MP3 files, and other factors have
together obviated the need for an upgrade to higher-fidelity
formats for most end users [5].
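The storage and download argument can be made concrete with a little arithmetic. The sketch assumes CD-quality PCM (44.1 kHz, 16-bit, stereo) as the uncompressed reference and ignores container, header, and tag overhead. The 128 kbps and 8 kbps rates and the four-minute track length are illustrative choices, and the helper names are hypothetical.

```python
def pcm_bitrate_kbps(sample_rate_hz=44100, bits_per_sample=16, channels=2):
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def size_megabytes(bitrate_kbps, seconds):
    """Audio payload size in megabytes (10**6 bytes), ignoring headers and tags."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000


track_seconds = 4 * 60
pcm = pcm_bitrate_kbps()  # 1411.2 kbps for CD-quality stereo
for mp3_kbps in (128, 8):
    print(f"{mp3_kbps:>4} kbps MP3: {size_megabytes(mp3_kbps, track_seconds):6.2f} MB "
          f"vs {size_megabytes(pcm, track_seconds):6.2f} MB PCM "
          f"(about {pcm / mp3_kbps:.0f}:1)")
```

At 128 kbps the payload is roughly an eleventh of the PCM size. This ratio goes a long way toward explaining the format's appeal for portable players with limited storage.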
Regardless, the MP3 is not always the most appropriate
format for a given task, and a critical evaluation of the
technology and its limitations is warranted. Many listeners today listen exclusively to MP3 files, even in settings where the gains from a higher-fidelity format would be clearly perceptible. This lossy compression codec has thus come to dominate unanticipated listening spaces.

Copyright: © 2014 Ryan Maguire. This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Despite its heralded performance in listening tests, the MP3 compression codec does generate audible artifacts and remove perceptible sonic information. MP3 encoding relies primarily on masking curves, which are used to calculate frequency and temporal masking [10]. By adjusting the masking thresholds, the encoder can remove more or less information from the uncompressed audio, depending on the desired target file size. At low bit rates, sample-rate reduction and low-pass filtering further attenuate frequencies at the extreme edges of the human hearing range. For example, white, pink, and brown noise, when compressed to the lowest possible MP3 bit rate [6], sound very different from the original random signals.

[Figure 1. White, Pink, and Brown Noise - Uncompressed WAV.]

[Figure 2. White, Pink, and Brown Noise - 8 kbps MP3.]
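The idea of a masking threshold can be illustrated with a toy model, assumed here purely for illustration. Spectral components are (frequency, level) pairs, and each one projects a triangular masking curve on the Bark scale. Anything that falls below the combined curve is discarded. The slopes, the `offset_db` parameter, and all function names are hypothetical. The psychoacoustic models in the MP3 standard are considerably more elaborate, and a real encoder uses the threshold to decide how coarsely each band may be quantized rather than simply deleting components. Only frequency (simultaneous) masking is modeled; temporal masking is ignored.

```python
import math


def hz_to_bark(f_hz):
    """Zwicker & Terhardt approximation of the Bark (critical-band) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def masking_threshold_db(components, f_hz, offset_db=10.0,
                         lower_slope=27.0, upper_slope=12.0):
    """Masking level at f_hz produced by the other spectral components.

    components: list of (frequency_hz, level_db) pairs.
    Each masker spreads a triangular threshold over the Bark scale: it sits
    offset_db below the masker's level and falls off by upper_slope dB/Bark
    toward higher frequencies and lower_slope dB/Bark toward lower ones.
    All constants are illustrative, not those of the MP3 standard.
    """
    z = hz_to_bark(f_hz)
    threshold = float("-inf")
    for fm, lm in components:
        if fm == f_hz:
            continue  # a component does not mask itself
        dz = z - hz_to_bark(fm)  # positive when f_hz lies above the masker
        slope = upper_slope if dz > 0 else lower_slope
        threshold = max(threshold, lm - offset_db - slope * abs(dz))
    return threshold


def prune_masked(components, offset_db=10.0):
    """Keep only components that rise above the combined masking threshold.

    A smaller offset_db raises every threshold, so more components are judged
    inaudible and discarded: the file-size trade-off in miniature.
    """
    return [(f, level) for f, level in components
            if level > masking_threshold_db(components, f, offset_db)]
```

For example, with tones at (1000 Hz, 80 dB), (1100 Hz, 50 dB), and (4000 Hz, 40 dB), the default settings drop the 1100 Hz tone because it lies just above a much louder neighbor. Setting `offset_db=40` lowers the threshold, so all three tones are kept.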
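Test signals of the kind compared in Figures 1 and 2 can be synthesized in a few lines. The sketch assumes three common constructions, which are not necessarily how the signals in this paper were made. White noise is drawn from a uniform distribution. Pink (1/f) noise uses the Voss-McCartney algorithm. Brown (1/f²) noise is produced by leaky integration of white noise. The function names, the row count, and the leak coefficient are illustrative.

```python
import random


def white_noise(n, seed=0):
    """n samples of uniform white noise in [-1, 1] (flat power spectrum)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def pink_noise(n, rows=16, seed=0):
    """Approximate pink (1/f) noise with the Voss-McCartney algorithm.

    `rows` random sources are summed; on sample i the source indexed by the
    number of trailing zero bits of i is redrawn, so source k changes every
    2**(k + 1) samples. The update rates are spaced an octave apart, and
    their sum approximates a 1/f spectrum.
    """
    rng = random.Random(seed)
    values = [rng.uniform(-1.0, 1.0) for _ in range(rows)]
    out = []
    for i in range(1, n + 1):
        k = (i & -i).bit_length() - 1  # trailing zero bits of i
        values[min(k, rows - 1)] = rng.uniform(-1.0, 1.0)
        out.append(sum(values) / rows)
    return out


def brown_noise(n, leak=0.999, seed=0):
    """Approximate brown (1/f**2) noise by leaky integration of white noise.

    The leak keeps the random walk from drifting without bound; the result
    is scaled so its largest absolute sample is 1.
    """
    rng = random.Random(seed)
    y, out = 0.0, []
    for _ in range(n):
        y = leak * y + rng.uniform(-1.0, 1.0)
        out.append(y)
    peak = max((abs(v) for v in out), default=0.0) or 1.0
    return [v / peak for v in out]
```

To set up a comparison like the one in the figures, each list could be scaled to 16-bit integers, written to a WAV file (for instance with Python's standard `wave` module), and encoded at a very low bit rate.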