Applying Psychoacoustic Principles to Soundfield Reconstruction

Steven Trautmann
Stanford University
wishbon@ccrma.stanford.edu
steve@trdc.ti.com

Abstract

A soundfield reconstruction approach is taken in order to improve perceived amplitude and localization through loudspeakers. For a given frequency, an initial approximation is found by minimizing the difference between a target soundfield and what can be achieved with the speaker setup in a particular listening environment. The target soundfield is then replaced with one which is perceptually the same but easier to achieve in that listening environment. This process can be iterated and, in some formulations, shown to converge. A key issue in this approach is what constitutes a perceptually identical target soundfield. Three possible changes to a soundfield are discussed, with the goal of creating a new target soundfield that is more obtainable using loudspeakers in a given setup while maintaining many perceptual attributes of the original soundfield. These changes are based in part on a priori knowledge of the likely position and orientation of audience members, and in part on an understanding of what the perceptual goals are. An example is given in which a simple model represents an audience seated in rows. An initial least squares solution is found, minimizing the difference between the target soundfield and that achieved by four speakers. A new target soundfield is generated that maintains the relative phase along the rows while allowing the relative phase between rows to change, since the latter is not important to anyone's perception. This in itself improves the amplitude at the ear positions without making much difference to the total absolute relative phase error between the ears. To improve the perceived angle of the source, a simple model of angle perception is used.

1 Introduction

The history of simulating environments through loudspeakers can be thought of as following two main approaches.
The first is that of trying to get as close as possible to the exact signal that would reach the listener's ear in the target environment. There are many variations on this approach, including [Schroeder and Atal, 1963], [Berkhout, 1988], [Romano, 1987] and [Cooper and Bauck, 1992]. The second consists of attempts to simulate the perceived qualities of the environment [Chowning, 1977], [Jot and Warusfel, 1995]. These two approaches are not distinct, but they can be hard to reconcile. One possible way to apply both ideas is to recreate a target soundfield which differs from the original target only in ways that minimally affect perception. This replacement should be made such that the sources (loudspeakers) are better able to match the new target than the original one.

2 Some Iterative Approaches

Consider an abstract environment consisting of sources and sample points. If there is a fixed linear relationship between each speaker's output at a given frequency and the phase and amplitude it produces at any sample point, a least squares technique can be used to find the minimum 'distance' between a set of target phases and amplitudes at the sample points and what can be achieved by the sources at those points. This minimum 'distance' reflects the effects of both phase and amplitude. Perceptually, it is sometimes better to get accurate amplitudes at the expense of phase accuracy. A simple algorithm exists to create a new soundfield which guarantees that the amplitude will improve, or at least not get worse, for whatever (good) approximation method is used; it works with approaches other than least squares as well. The following four points describe this algorithm.

Trautmann 368 ICMC Proceedings 1996

1. Apply the chosen approximation method to get a reconstruction (using the sources) of the original signal at the sample points.
2. Next, compute a new soundfield using the phase information of the reconstructed field but the amplitudes of the original field. This produces a new target field which has the same amplitude at each sample location as the original target, but whose phases are those of the reconstructed field.
3. Make this hybrid the target and apply the same method to get as close a reconstruction as possible. Any improvement in total error must then be with respect to amplitude, since the phases were made identical at all the sample locations.
4. Iterate the process as many times as desired, each time improving (or at least not worsening) the amplitude response. However, the total error with respect to both amplitude and phase, measured against the original target, tends to increase.

Mathematically, this can be described as follows. Let the desired amplitudes (indexed over $k$) be $a_{1k}$, so that the initial target is $a_{1k}e^{-j\theta_{1k}}$. Let the achieved minimum solution set for some norm be $a_{2k}e^{-j\theta_{2k}}$. The new target is then $a_{1k}e^{-j\theta_{2k}}$, replacing the phase with that of the achieved solution. The difference in amplitude can be written as

$$\sum_k \|a_{1k} - a_{2k}\| = \sum_k \|a_{1k}e^{-j\theta_{2k}} - a_{2k}e^{-j\theta_{2k}}\|.$$

The next solution set, $a_{3k}e^{-j\theta_{3k}}$, minimizes the distance to the new target, so it has the property

$$\sum_k \|a_{1k}e^{-j\theta_{2k}} - a_{2k}e^{-j\theta_{2k}}\| \ge \sum_k \|a_{1k}e^{-j\theta_{2k}} - a_{3k}e^{-j\theta_{3k}}\|$$

(because $a_{2k}e^{-j\theta_{2k}}$ is itself achievable by the sources, the minimum of $\sum_k \|a_{1k}e^{-j\theta_{2k}} - a_{3k}e^{-j\theta_{3k}}\|$ cannot be worse than what we already have). It follows that

$$\sum_k \|a_{1k} - a_{2k}\| \ge \sum_k \|a_{1k}e^{-j\theta_{2k}} - a_{3k}e^{-j\theta_{3k}}\|.$$

Now, since

$$\sum_k \|a_{1k}e^{-j\theta_{2k}} - a_{3k}e^{-j\theta_{3k}}\| \ge \sum_k \|a_{1k} - a_{3k}\|$$

(with equality only when $\theta_{2k} = \theta_{3k}$ for every $k$), we obtain

$$\sum_k \|a_{1k} - a_{2k}\| \ge \sum_k \|a_{1k} - a_{3k}\|.$$

Note that if the optimization technique used with this algorithm finds a global minimum, the algorithm uses that as a starting point to find a point with less error of a different kind. Thus only a local minimum of the amplitude error alone, under some criterion, is obtained.
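The four steps above can be sketched in Python with NumPy. The sketch assumes least squares (`numpy.linalg.lstsq`) as the approximation method and a known complex transfer matrix `A` (rows: sample points, columns: speakers) at a single frequency. The function names are hypothetical, and the random matrix in the usage lines only stands in for a real acoustic setup. The argument above also holds when the summed norms are replaced by summed squared magnitudes, which is what least squares minimizes, so the recorded amplitude error should never increase from one iteration to the next.

```python
import numpy as np

def least_squares(A, target):
    """Complex speaker drives x minimizing ||A x - target||_2."""
    x, *_ = np.linalg.lstsq(A, target, rcond=None)
    return x

def amplitude_error(A, x, y):
    """Summed squared amplitude error: sum_k (|y_k| - |(A x)_k|)^2."""
    return float(np.sum((np.abs(y) - np.abs(A @ x)) ** 2))

def algorithm1(A, y, iterations=5):
    """Step 1: solve for the original target y.
    Steps 2-4: keep the original amplitudes |y|, adopt the phases of the
    achieved field as the new target, re-solve, and repeat."""
    x = least_squares(A, y)
    history = [amplitude_error(A, x, y)]
    for _ in range(iterations):
        achieved = A @ x
        hybrid = np.abs(y) * np.exp(1j * np.angle(achieved))  # hybrid target
        x = least_squares(A, hybrid)
        history.append(amplitude_error(A, x, y))
    return x, history

# Hypothetical setup: 9 sample points, 4 speakers, random complex responses.
rng = np.random.default_rng(0)
A = rng.standard_normal((9, 4)) + 1j * rng.standard_normal((9, 4))
y = rng.standard_normal(9) + 1j * rng.standard_normal(9)
x, history = algorithm1(A, y)
print(history)
```

Least squares is used only because it is the method of the example in Section 4. Any approximation method that returns the best achievable field for a given target could replace `least_squares`.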
An important variation on this technique is to maintain the relative phases of the target field within sets of sample points, while allowing the relative phases between sets to change with the new target. This also converges, since the target is again closer to what the speakers can achieve and the next iteration can do no worse.

3 Psychoacoustic Factors

The only freedom used in generating the new target field is the possibility of changing the relative phase between distinct sets; the relative phase within each separate set, as well as the original amplitudes, is kept from the original target. The more distinct sets that can be made, therefore, the greater the freedom in generating the new target. There are at least three ways to separate the sample points into sets so as to maximize the number of distinct sets while trying to maintain the relative phases and amplitudes at individual listeners' ears.

1. Where the audience is seated in lines but the exact location of their ears is not known, the relative phase between these lines does not need to be kept.
2. Since most pairs of ears are about the same distance apart, the relative phase needs to be preserved only between points separated by that distance along the line where ears lie, rather than everywhere along the line. Each line can be broken into such sets provided that the sampling interval along the line is less than half the average distance between a pair of ears. The number of sets per line is simply the number of samples per average ear separation. While this increases the number of sets, the benefit from these sets may not be very large, since there is very little room to maneuver between samples. Moreover, as a practical matter, this may interfere with HRTFs for localization, so the benefits must outweigh the costs.
3. Another useful way to increase the number of distinct sets is to allow breaks along the line.
Since the relative phase across such a break is 'discarded' to benefit relative phase elsewhere, the relative phase across any given break is likely to worsen. Thus a person whose ears straddle a break will get worse relative phase than otherwise. However, if the breaks are placed where the error was very bad anyway, the worsening relative phase will not be a problem, since there was no way that person could have had good enough relative phase to matter. (Also, the points between which relative phase is worst tend to be the ones holding up the relative phase error at the other points. In each iteration the target field for a set is replaced with one having the best-fitting phase, and if the worst relative phase points are removed from a set, the maximum benefit is achieved on the rest of the set.)

4 A Simple Model

A simple example of this procedure was done for a setup where the speakers are at (in arbitrary units) (25,25), (25,75), (75,25) and (75,75), with 9 sample points arranged in three rows in a square from (45,45) to (55,55), so that each sample point is 5 units from its neighbors. The original source is at (100,50). The wavelength used in this example is 12 units. Several methods were applied to 'improve' the speaker reconstruction of the original sampled soundfield at the sample points, and only one iteration of the iterative methods above was used. The straight least squares solution is compared, under several error measures, with the results of the other procedures. The procedures in Table 1 are: the straight least squares solution; one application of the first algorithm, in which the phase of the target at the samples is replaced by the phase achieved in the least squares reconstruction while the original amplitudes are kept, followed by a new least squares solution for this new target; and a third method, labeled expanded Algorithm 1, which maintains the relative phase along the rows of samples but otherwise minimizes the distance between the achieved amplitude and the original amplitude as before. Other methods were also tried; the interested reader is referred to [Trautmann, 1995] for those methods, which are outside the scope of this paper.

Table 1: Comparison of several least squares based approaches.

Method                  |Ax - y|^2    (|Ax| - |y|)^2   |φ_old - φ_new|   |θ_old - θ_new|
straight l.s.           0.00333691    0.00221786       0.769307          0.993029
Algorithm 1             0.00114433    0.00109584       0.793131          1.13395
expanded Algorithm 1    0.00135572    0.00100008       0.76473           1.30951
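The simple model can be approximated as follows. This is a sketch under stated assumptions, not a reproduction of Table 1. It assumes a free-field monopole response exp(-jkr)/r from every source and assumes that the three rows are lines of constant y; neither choice is specified in the text, so the printed numbers need not match the table. Expanded Algorithm 1 is implemented here by rotating each row of the original target by the single common phase that best fits the achieved field on that row. This keeps both the amplitudes and the relative phase along the row. Only the first two error measures of Table 1 are computed, and, as in the text, each method is compared with its own most recent target.

```python
import numpy as np

WAVELENGTH = 12.0
K = 2 * np.pi / WAVELENGTH
SPEAKERS = np.array([[25, 25], [25, 75], [75, 25], [75, 75]], dtype=float)
SOURCE = np.array([100.0, 50.0])
# Assumption: rows are lines of constant y; points are stored row by row.
SAMPLES = np.array([[x, y] for y in (45, 50, 55) for x in (45, 50, 55)], dtype=float)
ROWS = [slice(0, 3), slice(3, 6), slice(6, 9)]

def monopole(src, pts):
    """Assumed free-field monopole response exp(-jkr)/r from src to each point."""
    r = np.linalg.norm(pts - src, axis=1)
    return np.exp(-1j * K * r) / r

A = np.column_stack([monopole(s, SAMPLES) for s in SPEAKERS])  # 9 x 4 transfer matrix
y = monopole(SOURCE, SAMPLES)                                   # original target field

def solve(target):
    """Field achieved at the sample points by the least squares speaker drives."""
    x, *_ = np.linalg.lstsq(A, target, rcond=None)
    return A @ x

def errors(achieved, target):
    """Columns 1 and 2 of Table 1: sum |Ax - y|^2 and sum (|Ax| - |y|)^2."""
    total = float(np.sum(np.abs(achieved - target) ** 2))
    amp = float(np.sum((np.abs(achieved) - np.abs(target)) ** 2))
    return total, amp

r_ls = solve(y)

# Algorithm 1: keep |y| but take the phase achieved by the least squares solution.
t_alg1 = np.abs(y) * np.exp(1j * np.angle(r_ls))
r_alg1 = solve(t_alg1)

# Expanded Algorithm 1: keep y within each row, rotated by the common phase
# phi that minimizes ||y_row * e^{j phi} - r_row||^2.
t_exp = y.copy()
for row in ROWS:
    phi = np.angle(np.vdot(y[row], r_ls[row]))  # vdot computes sum(conj(y) * r)
    t_exp[row] = y[row] * np.exp(1j * phi)
r_exp = solve(t_exp)

results = {
    "straight l.s.": errors(r_ls, y),
    "Algorithm 1": errors(r_alg1, t_alg1),
    "expanded Algorithm 1": errors(r_exp, t_exp),
}
for name, (total, amp) in results.items():
    print(f"{name:22s} {total:.6g} {amp:.6g}")
```

For a row with original values $y_k$ and achieved values $r_k$, the phase minimizing $\sum_k |y_k e^{j\phi} - r_k|^2$ is $\phi = \arg \sum_k \overline{y_k}\, r_k$, which is what the `np.vdot` line computes. Because $\phi = 0$ recovers the original row, the new target is never farther from the current reconstruction than the old one. This is the "can do no worse" property claimed for the set-wise variation.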
For the second and third methods, the error was measured between the new target and the least squares approximation to that target, since a comparison with the original target can be misleading for some error measures. The columns of Table 1 give the following error measures. The first column is the squared absolute difference between the most recent target and the least squares approximation to that target. The second is the summed squared amplitude difference (note that none of the methods changes the target amplitudes). The third column is a measure of the average absolute relative phase error between neighboring sample points in rows only, i.e. over 6 pairs of sample points (relative phase error between the rows is ignored). The fourth column gives the average absolute error in the perceived angle of the source, based on a simple model that uses only the relative phases between the same neighboring points within a row.

A simple model of perception based on the relative phase of the paired sample locations was devised. Given two sample points a distance $d$ apart, the phase difference between them caused by a monopole at some distant location is due to the different path lengths to the source. If the source is far away, the difference in path length is approximately $d\cos\theta$, where $\theta$ is the angle to the source from one of the sample points, measured from the line joining the two points. With wavelength $wl$, the phase difference between the two samples is then $d\cos\theta / wl$ (phase here being expressed in cycles; in radians the right-hand side carries an extra factor of $2\pi$). Thus, to find $\theta$ from the phases $\phi_1$ and $\phi_2$ at the two sample points: if

$$\phi_1 - \phi_2 = \frac{d\cos\theta}{wl},$$

then

$$\theta = \arccos\left(\frac{wl\,(\phi_1 - \phi_2)}{d}\right).$$

It is interesting to note that not all relative phases produce 'real' perceived angles. Over all possible angles the relative phase has a certain limit which, if exceeded, causes $\left|\frac{wl\,(\phi_1 - \phi_2)}{d}\right| > 1$, giving a complex number when the inverse cosine is taken.
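The angle model can be checked numerically with Python's standard `cmath` module, whose `acos` accepts arguments outside [-1, 1] and returns a complex result. To match $\theta = \arccos(wl(\phi_1 - \phi_2)/d)$, this sketch assumes that phases are phase lags measured in cycles, so that a point's phase is its path length divided by the wavelength. The function name `perceived_angle` and the 50-unit distance to the source are hypothetical. The spacing d = 5 and wavelength 12 are taken from the simple model.

```python
import cmath
import math

def perceived_angle(phi1, phi2, d, wl):
    """Angle (radians) to a distant source implied by the phase lags phi1 and
    phi2 (in cycles) at two points a distance d apart, using
    theta = arccos(wl * (phi1 - phi2) / d).
    The result is complex when |wl * (phi1 - phi2) / d| > 1."""
    return cmath.acos(wl * (phi1 - phi2) / d)

d, wl = 5.0, 12.0          # sample spacing and wavelength of the simple model
theta_true = math.radians(60.0)
r1 = 50.0                  # hypothetical distance from point 1 to the source

# Far field: point 2 is closer to the source by about d*cos(theta), and a
# phase lag in cycles is the path length divided by the wavelength.
phi1 = r1 / wl
phi2 = (r1 - d * math.cos(theta_true)) / wl
theta = perceived_angle(phi1, phi2, d, wl)
print(math.degrees(theta.real), theta.imag)

# A relative phase of 0.45 cycles exceeds the limit d / wl (about 0.417 cycles),
# so no real angle reproduces it and the result has a nonzero imaginary part.
out_of_range = perceived_angle(0.45, 0.0, d, wl)
print(out_of_range)
```

The first call recovers the assumed 60 degree angle. The second call shows the 'non-real' case, which arises whenever the measured relative phase exceeds $d/wl$ cycles in magnitude.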
Such out-of-range cases were dealt with by simply taking the absolute value of the error (as was done for all the errors). The result is a number greater than π, which still gives some indication of how far off the result is in terms of this model of perception, since for perceivable results the error cannot exceed π in this model.

These results are quite interesting. The iterative algorithms improved the total error of column 1 (when compared with the new target) and the amplitude error of column 2. Regarding relative phase between the selected samples, Algorithm 1, which simply threw out the phase information, got worse, which is no surprise. The iterative algorithm which maintained the relative phase between these points in its new target achieved an improved result, but only slightly. This indicates that maintaining relative phase in the target is not enough to improve relative phase, but tends to hold it at around the same total error while improving the amplitude error significantly. In this particular case, the technique produced a very slightly improved average relative phase error, but the phase error got worse between four of the six pairs. For other setups it was found that the total relative phase error tends not to change much using just this 'expanded algorithm'. Finally, the perceived angle actually got worse by the greatest amount using the expanded algorithm! The other approaches here did not affect the perceived angle as much. This highlights the large distinction between improving a measure of average absolute relative phase error and improving average absolute perceived angle error. The situation becomes even more involved when relative amplitude is included in the model of perceived angle of arrival.

5 Conclusion

The reason for maintaining relative phase among a set of points is to maintain or improve localization with each iteration. However, if just a straight least squares approach is taken, the amplitude is the biggest beneficiary in most cases. The achieved relative phase among the sets may actually worsen, but only when the amplitude improves. Although what actually happens depends on the situation, a few examples seem to indicate that maintaining the relative phase among a set tends to keep the overall phase error at about the same level while allowing improvement in amplitude error. By comparison, when each individual sample point had its target phase replaced with its own achieved phase, the improvement in amplitude correctness was much greater, but the degradation of relative phase was much worse.
Fortunately, better results for relative phase and perceived angle (again based on the simple model described here) were achieved by using the iterative approach together with a weighted least squares approach, which minimized the change in differences between the selected pairs along the rows as well as the error at individual points [Trautmann, 1995]. It is hoped that these simple experiments can eventually improve the perceptual characteristics of a reproduced soundfield by more closely approximating amplitudes and critical relative phases, while letting absolute phase, and relative phase between unrelated parts of the soundfield, give extra freedom to the speakers.

References

A. J. Berkhout, "A Holographic Approach to Acoustic Control," Journal of the Audio Engineering Society, vol. 36, no. 12, pp. 977-995, Dec. 1988.
John Chowning, "The Simulation of Moving Sound Sources," Computer Music Journal, vol. 1, no. 3, pp. 48-52, 1977.
Duane H. Cooper and Jerald L. Bauck, "Generalized Transaural Stereo," presented at the 93rd Convention of the Audio Engineering Society, San Francisco, Preprint 3401 (J2), Oct. 1992.
Jean-Marc Jot and Olivier Warusfel, "Spat: A Spatial Processor for Musicians and Sound Engineers," presented at CIARM'95, Ferrara, Italy, May 1995.
Anthony Romano, "Three-Dimensional Image Reconstruction in Audio," Journal of the Audio Engineering Society, vol. 35, no. 10, pp. 749-759, Oct. 1987.
M. R. Schroeder and B. S. Atal, "Computer Simulation of Sound Transmission in Rooms," IEEE International Convention Record, pt. 7, pp. 150-155, 1963.
Steven D. Trautmann, "Some Aspects of Applying Psychoacoustic Principles to Soundfield Reconstruction," Ph.D. dissertation, Stanford University, Dec. 1995.