Page 554

MacSonogram: a Programme to Produce Large Scale Sonograms for Musical Purposes

Peter Lundén and Tamas Ungvary
Dept. of Speech Communication and Musical Acoustics
Royal Institute of Technology (KTH)
Box 700 14, S-100 44 Stockholm, Sweden
email:

Abstract. The lack of a common notation system is a severe problem in electroacoustic music and in other types of music that cannot be notated. One possible solution is the use of large scale sonograms. The usefulness of this tool for music has been shown by Robert Cogan [Cogan 84][Cogan 86] and by Simon Waters and Tamas Ungvary [Waters, Ungvary 90]. This paper presents the MacSonogram software and discusses the problems of creating large scale sonograms for musical purposes. The programme runs on any Macintosh computer with a 256 gray-level or colour screen.

Introduction

Sonograms are well known in speech research and related fields but are not widely used for musical purposes. The traditional techniques for producing sonograms are not directly transferable to music, and several problems have to be solved to make the results useful for musical purposes.

The first problem is the difference in time span. A typical sonogram in speech research is a few seconds long with a resolution of about 10 ms, whereas in the musical case a typical sonogram is 10 or perhaps 20 minutes long with a resolution of the same order or slightly lower.

The second problem is the dynamic range of music, which exceeds that of speech by several orders of magnitude. Some solutions to this problem are discussed later.

The third problem is what the sonogram should show. Speech researchers want to see the sound from a physical viewpoint, while musicians are more interested in a perceptual view. This leads to the standpoint that a large scale sonogram for musical purposes should include a psychoacoustic model of hearing.
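To put the first problem in numbers: the number of sonogram columns is the duration divided by the time resolution. Taking, purely for illustration, a 3-second speech sonogram and a 20-minute piece, both at 10 ms resolution:

$$\frac{3\ \text{s}}{0.01\ \text{s}} = 300\ \text{columns}, \qquad \frac{20 \times 60\ \text{s}}{0.01\ \text{s}} = 120\,000\ \text{columns}$$

so the musical sonogram is 400 times wider than the speech sonogram at the same resolution.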
Overview of the Programme

The programme has two main parts. The first part analyses sound-files and produces sonogram-files. The second part reads sonogram-files and displays them on the screen or prints them on paper. MacSonogram can also store sonograms in several graphical file formats, e.g. TIFF and PICT, so that sonograms can be retouched with other graphics software.

The format of the sonogram-files is closely related to the format of pixel-maps in the Macintosh operating system. The data are stored as slices of a huge pixel-map, which can be spliced directly into a background pixel-map that is then copied to the screen. This makes real-time visualization of sonograms on the screen possible, although it is not implemented in the current version.

ICMC 554
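The splicing idea can be sketched as follows. This is a hypothetical illustration, not MacSonogram's code: a gray-scale pixel-map is modelled as a list of byte rows (one byte per pixel), and one stored slice is copied into a background map at a given column offset. The names `make_pixmap` and `splice_slice` are invented for this sketch, and a real Macintosh PixMap also carries rowBytes padding and colour-table information that is ignored here.

```python
# Hypothetical sketch: one byte per gray pixel, stored row by row.
# A real Macintosh PixMap pads rows to rowBytes and has a colour table; both are ignored.

def make_pixmap(width, height, fill=0):
    """Return a background pixel-map as a list of rows (bytearrays)."""
    return [bytearray([fill]) * width for _ in range(height)]

def splice_slice(background, slice_rows, x_offset):
    """Copy a slice (list of equally long rows) into the background at column x_offset.

    Slice and background share the same row layout, so each row is copied
    with one slice assignment instead of pixel by pixel.
    """
    if len(slice_rows) != len(background):
        raise ValueError("slice height must match background height")
    width = len(slice_rows[0])
    if x_offset < 0 or x_offset + width > len(background[0]):
        raise ValueError("slice does not fit inside the background")
    for bg_row, sl_row in zip(background, slice_rows):
        bg_row[x_offset:x_offset + width] = sl_row
    return background
```

For example, splicing a two-column slice into an 8-by-2 background at offset 3 fills columns 3 and 4 of every row and leaves the rest untouched. Keeping the stored format identical to the display format is what removes any per-pixel conversion at drawing time.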

The Graphical Layout

The sonogram has two views, the frequency-amplitude view and the amplitude view. The amplitude view, which is optional, can be placed above or below the frequency-amplitude graph. The two graphs share a common horizontal axis, which represents time. In the frequency-amplitude view the vertical axis is frequency, while amplitude is represented as levels of gray. In the amplitude view the vertical axis represents the overall amplitude. There are two vertical scales, one for each graph, and one common horizontal time scale. All scales are optional and can be selected and formatted by the user.

The Analysis

The sound-file is processed in frames. A window function is applied to each frame, and the spectral density function of the frame is computed by means of a short-time Fourier transform (STFT) [Allen 77][Serra, Smith 90]. The STFT is computed as:

$$X_l(m) = \sum_{n=0}^{N-1} w(n)\, x(n + lH)\, e^{-j\omega_m n}, \qquad l = 0, 1, \ldots \tag{1}$$

where $x(n)$ is the input signal, $w(n)$ is a real window, $H$ is the hop size, i.e. the step in time between successive segments, $l$ is the frame number, and $\omega_m = 2\pi m/N$ is the frequency of DFT bin $m$. The result of the STFT is a series of spectra, one for each frame $l$. From each spectrum the spectral density is computed by [Oppenheim, Schafer 75]:

$$S_l(m) = \frac{|X_l(m)|^2}{N} = \frac{\operatorname{Re}\{X_l(m)\}^2 + \operatorname{Im}\{X_l(m)\}^2}{N} \tag{2}$$

where $N$ is the size of the DFT.

The window function is selected by the user. The user can also define the window length M, which must be odd and satisfy 0 < M < N; the part of the frame beyond M is zero-padded. The choice of window function affects the resolution in both the frequency and the time domain, and the optimal values of N and M depend on it; for a Hamming window, for example, M should be approximately N/4 [Allen 77]. For a more detailed discussion of the effects of different window functions see [Harris 78].

The next step in the analysis is the mapping of the spectral density function to the frequency channels.
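Equations (1) and (2) can be sketched directly in Python. This is an illustrative sketch under stated assumptions, not MacSonogram's implementation: it uses textbook symmetric forms of the five windows offered by the programme (exact conventions vary between texts), zero-pads the window from M to N points as described above, treats samples past the end of the sound-file as silence, and evaluates the DFT directly rather than with an FFT, for readability. The function names `window` and `spectral_density` are hypothetical.

```python
import cmath
import math

def window(name, M):
    """Return an M-point symmetric window (textbook forms; an assumption, not MacSonogram's code)."""
    if M == 1:
        return [1.0]
    d = M - 1
    if name == "rectangular":
        return [1.0] * M
    if name == "hanning":
        return [0.5 - 0.5 * math.cos(2 * math.pi * n / d) for n in range(M)]
    if name == "hamming":
        return [0.54 - 0.46 * math.cos(2 * math.pi * n / d) for n in range(M)]
    if name == "bartlett":
        return [1.0 - abs(2.0 * n / d - 1.0) for n in range(M)]
    if name == "blackman":
        return [0.42 - 0.5 * math.cos(2 * math.pi * n / d)
                + 0.08 * math.cos(4 * math.pi * n / d) for n in range(M)]
    raise ValueError(f"unknown window: {name}")

def spectral_density(x, l, H, N, w):
    """Spectral density S_l(m) of frame l, following eqs. (1) and (2).

    x: input samples, H: hop size, N: DFT size, w: window of odd length M, 0 < M < N.
    Window positions n >= M are zero (zero padding); samples past the end of x count as 0.
    Only bins m = 0 .. N/2 are returned, since x is real.
    """
    M = len(w)
    if not (0 < M < N and M % 2 == 1):
        raise ValueError("M must be odd and satisfy 0 < M < N")
    start = l * H
    frame = [w[n] * x[start + n] if n < M and start + n < len(x) else 0.0
             for n in range(N)]
    S = []
    for m in range(N // 2 + 1):
        # Direct evaluation of eq. (1) for bin m, with omega_m = 2*pi*m/N.
        X = sum(frame[n] * cmath.exp(-2j * math.pi * m * n / N) for n in range(N))
        S.append((X.real ** 2 + X.imag ** 2) / N)  # eq. (2)
    return S
```

As a check, a cosine with exactly four periods per 16 samples, analysed with a 15-point rectangular window and N = 16, peaks at bin 4. A production version would replace the inner sum with an FFT, which changes the cost from O(N²) to O(N log N) per frame but not the result.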
Each row in the sonogram corresponds to a frequency channel. The amplitudes of the frequency channels are computed by:

$$Q_l(k) = \sum_{m=q_k}^{q_{k+1}-1} S_l(m), \qquad k = 0, 1, \ldots, K-1 \tag{3}$$

where $K$ is the number of frequency channels, and $q_k$ and $q_{k+1}-1$ are the first and the last DFT bin included in frequency channel $k$. The indices $q_k$ are determined by fmin, fmax, freq-scale, K and N, and they define the mapping from the spectral density function to the frequency channels.
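Equation (3) leaves open exactly how the indices q_k are derived. One plausible scheme, sketched below purely as an assumption, spaces K+1 channel edges evenly on the chosen scale (linear in Hz, logarithmic, or Bark via Traunmüller's 1990 approximation, which is only one of several Bark formulas), rounds each edge to the nearest DFT bin, and forces every channel to contain at least one bin. The sampling rate `fs` is an extra input that the parameter list does not mention, and all function names are hypothetical.

```python
import math

def bark(f):
    """Hz -> Bark, Traunmueller (1990) approximation (one of several formulas in use)."""
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    """Inverse of bark()."""
    return 1960.0 * (z + 0.53) / (26.28 - z)

SCALES = {
    "linear": (lambda f: f, lambda u: u),
    "log": (math.log, math.exp),  # requires fmin > 0
    "bark": (bark, bark_to_hz),
}

def channel_edges(fmin, fmax, freq_scale, K, N, fs):
    """Return q_0 .. q_K; channel k sums DFT bins q_k .. q_{k+1}-1 (hypothetical scheme)."""
    to_scale, from_scale = SCALES[freq_scale]
    lo, hi = to_scale(fmin), to_scale(fmax)
    last_bin = N // 2 + 1  # one past the highest bin of a real signal's DFT
    q = []
    for k in range(K + 1):
        f = from_scale(lo + (hi - lo) * k / K)  # edge frequency in Hz, evenly spaced on the scale
        b = min(max(int(round(f * N / fs)), 0), last_bin)
        if q and b <= q[-1]:
            b = q[-1] + 1  # keep at least one bin per channel
        q.append(b)
    if q[-1] > last_bin:
        raise ValueError("K too large for this DFT size and frequency range")
    return q

def channel_amplitudes(S, q):
    """Eq. (3): Q_l(k) = sum of S_l(m) for m = q_k .. q_{k+1}-1."""
    return [sum(S[q[k]:q[k + 1]]) for k in range(len(q) - 1)]
```

On a logarithmic or Bark scale the low channels are narrower than one DFT bin unless N is large, which is why the sketch widens colliding channels; this is one concrete way in which K, N and the frequency range interact, as noted in the parameter list.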

The Analysis Parameters

The user can control the analysis by the following parameters:

• The sound-file and the segment of it to be analysed, defined by its starting and ending points.
• The amplitude range and the type of amplitude scale, logarithmic or linear. This defines the mapping from amplitude to levels of gray.
• The analysis window function. In the current version the user can choose between rectangular, Hanning, Hamming, Bartlett and Blackman windows [Karl 89][Oppenheim, Schafer 75].
• The hop size, or time advance, of the analysis window (H). This defines the distance in time between successive columns in the final output.
• The frequency range, given by its lower (fmin) and upper (fmax) limits, and the type of frequency scale (freq-scale): logarithmic, linear or Bark.
• The size of the DFT (N) and the size of the analysis window (M). The resolution in the frequency and time domains depends on these parameters.
• The number of frequency channels (K). This parameter determines the number of rows in the graphical output and is related to the range and scale of the frequency domain.

Example

The following example is a sonogram made by MacSonogram. It shows the first twenty seconds of Rolf Enstrom's electroacoustic piece "Directions", which was realised at EMS in Stockholm. The material consists mostly of synthetic sound produced by the PDP-15 hybrid system at EMS. Among the interesting details in the sonogram are the arch-like structures, which originate from phase shifting. The frequency axis is logarithmic and spans 50 to 15000 Hz; the time resolution is about 40 ms.

Example 1. The first twenty seconds of Rolf Enstrom's "Directions" as a sonogram (only the frequency-amplitude view is shown).
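The amplitude range and scale parameter can be illustrated with a minimal, hypothetical mapping from a channel value Q_l(k) of eq. (3) to one of 256 gray levels, matching the 256 gray-level screens mentioned in the abstract. Treating Q as a power (so the logarithmic scale uses 10·log10), clipping values outside the chosen range, and letting 0 mean white are all assumptions of this sketch; `gray_level` is an invented name.

```python
import math

def gray_level(Q, scale, a_min, a_max, levels=256):
    """Map a channel value Q (a power, eq. (3)) to a gray level 0 .. levels-1.

    scale "linear": a_min, a_max are in the same units as Q.
    scale "log":    a_min, a_max are in dB, computed as 10*log10(Q).
    Values outside [a_min, a_max] are clipped; 0 is taken to mean white.
    """
    if a_max <= a_min:
        raise ValueError("a_max must exceed a_min")
    if scale == "log":
        v = 10.0 * math.log10(Q) if Q > 0 else -math.inf
    elif scale == "linear":
        v = Q
    else:
        raise ValueError(f"unknown amplitude scale: {scale}")
    t = min(max((v - a_min) / (a_max - a_min), 0.0), 1.0)
    return round(t * (levels - 1))
```

With a 60 dB logarithmic range, anything more than 60 dB below the reference is drawn white; narrowing the range hides soft components together with inaudible ones, which is the limitation discussed in the conclusion. On the time axis, the hop size H fixes the column spacing: at an assumed sampling rate of 44.1 kHz, H = 1764 samples gives the 40 ms resolution of the example above.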

Conclusion and Future Plans

Although MacSonogram can produce surprisingly clear sonograms of complex musical structures, future versions need refinements to improve the usability of the programme. When the sound-file to be analysed has a large dynamic range, the sonogram very often contains many inaudible sounds, which make it difficult to read. The user cannot simply solve this problem by decreasing the amplitude range, since that would leave out the soft nuances. Some kind of dynamic behaviour is needed. We believe that the optimal solution is to include a model of psychoacoustic masking effects, both spectral and temporal, and a model of loudness. This would increase the quality of the sonograms and make them easier to read. To make this possible, the analysis part of the programme has to be reimplemented; a filter bank replacing the STFT analysis would be one solution.

The current version of MacSonogram is experimental. This means that there are many parameters to be defined by the user, although most of them have default values. As we gain more experience with the programme, we will be able to devise algorithms for setting several parameters automatically, which will make MacSonogram much easier to use.

References

[Allen 77]. Allen, J. B. "Short Term Spectral Analysis, Synthesis, and Modification by Discrete Fourier Transform." IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. ASSP-25, no. 3, June 1977.
[Cogan 84]. Cogan, R. "New Images of Musical Sound." Harvard University Press, 1984.
[Cogan 86]. Cogan, R. "Imaging Sonic Structure." Proceedings of the 1986 International Computer Music Conference, San Francisco: Computer Music Association.
[Harris 78]. Harris, F. J. "On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform." Proceedings of the IEEE 66(1), 1978.
[Karl 89]. Karl, J. H. "An Introduction to Digital Signal Processing." San Diego: Academic Press, Inc., 1989.
[Lundén, Ungvary 89]. Lundén, P. and Ungvary, T. "Sonogram 7.0, Technical Description." KACOR report 14/89, Dept. of Speech Communication and Musical Acoustics, Royal Institute of Technology.
[Oppenheim, Schafer 75]. Oppenheim, A. V. and Schafer, R. W. "Digital Signal Processing." New Jersey: Prentice-Hall, Inc., 1975.
[Serra, Smith 90]. Serra, X. and Smith, J., III. "Spectral Modeling Synthesis: A Sound Analysis/Synthesis System Based on a Deterministic plus Stochastic Decomposition." Computer Music Journal 14:4, MIT Press, 1990.
[Waters, Ungvary 90]. Waters, S. and Ungvary, T. "The Sonogram: A Tool for Visual Documentation of Musical Structure." Proceedings of the 1990 International Computer Music Conference, San Francisco: Computer Music Association.