LZW Compression of Musical Files
Skip other details (including permanent urls, DOI, citation information)This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. Please contact : [email protected] to use this work in a way not covered by the license.
For more information, read Michigan Publishing's access and usage policy.
Page 285 ï~~LZW COMPRESSION OF MUSICAL FILES Uri Shimony, Noam Elroy and Ehud Hamami Laboratory of Computer Music Engineering Department of Electrical.Engineering Technion - Israel Institute of Technology Haifa 32000, Israel INTRODUCTION This article describes the application of the Lempel-Ziv-Welch (LZW) compression algorithm [1], to musical data. The algorithm is known as a lossless algorithm, since no information is lost in a cycle of compression and decompression, and the resultant data is identical to the original one. There are several good reasons for compressing musical data. The first reason is to find the compression ratios (CR) of the musical files. The CR is defined as the ratio between the sizes of the source and the compressed files. Roughly speaking, the CR of a file gives an idea of how much of the information in the file is redundant. Thus, the greater the amount of redundant information in a file, the higher is its CR. Another reason for using compression of musical data is to reduce the size of memory required for the storage of information, or to reduce the time required for transmission of the information. For example, compressed musical sounds stored in musical instruments can be used as a basis for sound processing and generation. THE LZW COMPRESSION OF FILES One characteristic of the LZW compression algorithm [1] is that it uses a dictionary to replace groups of characters in the source file, called "words", by code numbers of fixed size in the compressed file. The fixed size code number can consist of 8 bits (in which case the dictionary is limited to 256 entries) or more. Compression can start with an empty dictionary. Words are then added one by one to the dictionary during the process of compression. This is called a compression with an open dictionary. Another form of compression uses a full or closed dictionary in which no new entries are allowed into the dictionary during the compression process. Compression is achieved when long character strings are replaced by shorter code numbers. One important advantage of the LZW algorithm is that it can adapt itself to any type of file without previous knowledge of its statistics. FORMATS OF MUSICAL FILES USED FOR COMPRESSION ALPHA-NUMERIC (AN) files. In this type of file music was expressed in alphanumeric ASCII. Each note was expressed by two bytes: The first byte contained the octave number between 1 and 7, and the note symbol C,D,E,F,G,A or B, including 285
Page 286 ï~~sharps, flats and rests. The second byte contained the duration in units of 1/64-ths, e.g., a quarter is written as 8, etc. Data to files of this type was written manually from sheet music. Scores with several instruments were entered instrument after instrument. DIGITAL ACOUSTIC (DA)files. Digital acoustic files were digitally-sampled sound signals. The source for these files was music recorded on regular mono audio cassettes. These analog signals were convened to one byte digital format at a sampling rate of 62.5 kHz, using Motorola's 8-bit microprocessor MC68HC 11, programmed by the Exormacs computer and executed via the Mice Emulator under IBM-XT control. Because of the high memory capacity required, these files consisted of only a few seconds of music. MIDI (MD)files. MIDI files were recorded from a KORGPOL4150, by connecting it to an IBM-XT computer. Special software copied the files to the hard disk while;they were being formed. FREQUENCY-DURATION (FD)files. Each note in these files was represented by a two-byte word. The first byte recorded the rounded frequency of zero crossings expressed in units of 256 Hz. Thus, the frequency is divided into categories, where each category was 256 Hz wide. The second byte recorded the duration of the notes in units of 4 msecl These files were created in real time by sampling the music from the same audio cassettes mentioned before. Zero crossings of the signal were used to calculate;the frequency in real time. After the frequency of a note changed by more than 256 Hz, its frequency and time duration were recorded. Each file consisted of about 20 to 30 seconds of music. The format of this type of file was imposed by the real time processing requirement. It is noted that the zero-crossing frequency is not the frequency of the fundamental harmonic of a tone, but is rather obtained from all the harmonics in a tone which cause zero crossings. RESULTS AND DISCUSSION OPEN DICTIONARY COMPRESSION. More than 30 musical files were compressed, from all four types and from various kinds of musical styles: Beethoven's 5th symphony; popular classical; classical and popular piano music; spanish guitar; popular Israeli, USA, and South-American singers; folk Russian instrumental and choir; rhythmic dances, etc. The curve in Fig. 1 shows how the size of a compressed file (vertical axis) increases as the compression 'of the source file advances (horizontal axis). A straight line with a constant slope would have yielded a constant CR, where its slope equals 1/(CR). However, Fig. 1 shows a slightly curved line with a gradually decreasing slope (i.e. increasing CR), 'which for a source file size of 25000 words and an open dictionary gives a CR of 5.985'2 and a dictionary of 5482 entries. This particular file was of the FD type, formed from a few tens of seconds from Beethoven's 5th symphony. One surprising result of this work was that all types of musical files of similar sizes, when compressed with an open dictionary, gave about the same CR. Source files of 25000 words gave CRs between 5.62 to 6.50, and the corresponding dictionary sizes were between 5841 and 5060 entries. It is noted that this is true for a very broad selection of musical styles and different types of encoding. Moreover, the open dictionaries increased all at similar rates and saturated practically at similar sizes. Preliminary observation shows that MD files in our selection tended to yield a narrower 286
Page 287 ï~~range of CRs, between 5.62 and 5.80, than other types. The high CRs obtained by compression of musical files is in distinct contrast with the CRs obtained from various written texts. Welch [1] reported about CRs of 1.0 for floating point arrays, 1.5 for object computer code, 1.8 for regular english text, 2.3 for program source code, 2.6 for system log data. Only COBOL source files gave higher CRs of between 2 to 6. (There was a great variation in the formats of the COBOL files; the high CR was obtained from files consisting of large blank spaces). DICTIONARIES CLOSED AT FIXED SIZES. When a dictionary is closed at a fixed size and the compression continues after closure, the CR obtained is lower;than that of an open dictionary. Moreover, the smaller the closed dictionary, the lower is the CR. As an example, compressing an MD file (mnr.src) of 25000 words, of Israeli music to lyrics of Ehud Manor, with an open dictionary, gave CR = 5.7354; closure at 1024 dictionary gave a CR of 4.5455; closure at 512 gave.a CR of 3.8462, and closure at 256 gave a CR of 3.2468. COMPRESSION WITH A FOREIGN DICTIONARY. This compression was carried out using a closed dictionary, usually of 1024 entries, which had been previously formed by compression of another file. The rationale behind this was to find correlations between CRs of different kinds of music. When we used a closed 1024 dictionary, obtained from an FD file (ans.src) of an Israeli singer, Arik Einstein, to compress all other files, including a different file of the same singer, no relationships between files could be found. The CRs varied between 4.22 to 5.37. It is thought that the reason was that the sizes of the musical files were too short to allow any "personal signature" to appear. REPEATED COMPRESSION WITH OPEN DICTIONARY. As an example we present results obtained from repeated compression of the same FD 25000 word file, (kld.src) formed from Richard Kleiderman's piano recording, with an open dictionary. Compression number 1 2 3 4 5 Compression ratio 6.3799 8.1635 8.8434 9.2722 9.6157 Dictionary size 5144 9149 12846 16372 16383 Further compression did not increase the CR and did not increase the dictionary. It seems that the essential nonredundant core of information in the file should be found from the highest CR. FREQUENCY DISTRIBUTION OF FD files. The information stored in the FD files was used to find the distribution of the frequencies of zero crossings in various pieces of music. Fig. 2 shows three sample spectra of FD files. The horizontal axis is the zero crossing frequency divided into categories (of bandwidth = 256 Hz). The vertical axis is the accumulated duration for each category, measured in milliseconds. The file named btv.src (Beethoven's 5th) has a wide spread spectrum, corresponding to the side spectrum of a full Symphonic orchestra. In comparison, the spectrum of the file mik.src (one of Michael Jackson's recordings) peaks around 1600 Hz, with less high frequencies and more low frequencies. The third spectrum, of file rshl1.src (Russian men's choir), peaks around 1200 Hz with much more emphasis on low frequencies than the previous spectra. The connection between the spectrum of a file and its CR is not clear enough; although there seems to be a trend where a wide-band spectrum has a lower CR than that of a narrow-band spectrum. 287
Page 288 ï~~REFERENCES: Welch, T.A. 1984. "A Technique for High Performance Data Compression", IEEE Computer. Acknowledgement: Authors of this article express their sincere gratitude to Professor Jacob Ziv, who is one of the inventors of the LZW algorithm, for his interest in the project and for many helpful discussions. EVALUATION OF "BTV.SRC" 75000 s s15? 2; I 519. -... -...1r.kC I Wi. Â~.i,. 4.:.!- +-t.i..... - F.+,% - o ----+ i:+ -,-.": "....... Â~:T:. t i+..L..! '," '',": SPECTRUM OF "btv. src" 9 19r00 3200 400 649 8000 COMPTOTAL DURATIO TIME PER FREQUENCY5 9852 DICTIONARY LENGk.sTN 5492 500 11 sas 199 sue assese 9 16 3200 4000 6400 0000 TOUTPUT INTEGERS PERION IE NPUT CHARACTERSCY Figure 1. SPECTRUM OF "Lv. src" 1599t12418J...mxrt.... j Tr"t1-T!. 14.-"'w, l e e s I i_ T... __.. _:_ _ ".......... _......... _ T..'" -,'- _......:.:. 9 1699 320 4999 6499 899 TOTAL DURATION TIME PER FREQUENCY SPECTRUM OF "ir.o" - ''OGO j4598:I7TI'I L 9 1609 3299 4999 6499 9999 _________________TOTAL DURATION TIME PER FREQUENCY __________ SPECTRUM OF "rsli no 10T T DUAINTM ERFEUNY__________ Fgr 21 288