Arbitrary Phase Vocoders by means of Warping

Gianpaolo Evangelista, Monika Dörfler, Ewa Matusiak

Abstract


The Phase Vocoder plays a central role in sound analysis and synthesis, allowing us to represent a sound signal in both time and frequency, similar to a music score (but possibly at much finer time and frequency scales), describing the evolution of sound events. According to the uncertainty principle, time and frequency are not independent variables, so that any time-frequency representation is the result of a compromise between time and frequency resolutions, the product of which cannot be smaller than a given constant. Therefore, finer frequency resolution can only be achieved with coarser time resolution and, similarly, finer time resolution results in coarser frequency resolution.

While most conventional methods for time-frequency representation are based on uniform time and uniform frequency resolutions, the perceptual and physical characteristics of sound signals suggest the need for nonuniform analysis and synthesis. As the results of psychoacoustic research show, human hearing is naturally organized in nonuniform frequency bands. On the physical side, the sounds of percussive instruments, as well as of the piano in the low register, show partials whose frequencies are not uniformly spaced, as opposed to the uniformly spaced partial frequencies found in harmonic sounds. Moreover, the different characteristics of sound signals at onset transients with respect to stationary segments suggest the need for nonuniform time resolution. In the effort to exploit the time-frequency resolution compromise at its best, a tight time-frequency suit should be tailored to snugly fit the sound body.

In this paper we give an overview of flexible design methods for phase vocoders with nonuniform resolutions. The methods are based on remapping the time or the frequency axis, or both, by employing suitable functions acting as warping maps, which locally change the characteristics of the time-frequency plane.
As a result, the sliding windows may have time-dependent duration and/or frequency-dependent bandwidth. As an example, in a constant-Q frequency band allocation, the ratio of center band frequency to bandwidth remains constant, so that the frequency bands become wider and wider as the center frequency increases, similarly to the frequency distance between notes of the 12-tone scale or between octaves.

While time-frequency allocation can be performed in an arbitrary way, the ability to reconstruct the original signal from Vocoder analysis data is essential in sound processing and transformation applications. Moreover, even the analysis or the production of spectrograms benefits from the perfect reconstruction property, which guarantees that the analysis data completely describe the signal and that no important information is hidden.
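The constant-Q allocation mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `constant_q_bands` and `log_warp`, the starting frequency of 55 Hz, and the choice of taking each bandwidth as the distance to the next center frequency are all assumptions made for illustration. The sketch also shows how a logarithmic map, used here as one possible frequency warping map, turns the geometrically spaced centers into uniformly spaced ones.

```python
import math


def constant_q_bands(f_min, bands_per_octave, n_bands):
    """Return (center, bandwidth) pairs in Hz for a constant-Q allocation.

    Assumption: the bandwidth of band k is the distance from its center
    to the next center, so Q = 1 / (2**(1/bands_per_octave) - 1).
    """
    ratio = 2.0 ** (1.0 / bands_per_octave)  # step between adjacent centers
    q = 1.0 / (ratio - 1.0)                  # same Q for every band
    bands = []
    for k in range(n_bands):
        center = f_min * ratio ** k          # geometric spacing of centers
        bands.append((center, center / q))   # bandwidth grows with center
    return bands


def log_warp(f, f_min, bands_per_octave):
    """Hypothetical logarithmic warping map: frequency -> band index."""
    return bands_per_octave * math.log2(f / f_min)


# 12 bands per octave mirrors the 12-tone scale; 25 bands span two octaves.
bands = constant_q_bands(f_min=55.0, bands_per_octave=12, n_bands=25)
warped = [log_warp(center, 55.0, 12) for center, _ in bands]

for k in (0, 12, 24):
    center, bandwidth = bands[k]
    print(f"band {k:2d}: center {center:8.2f} Hz, bandwidth {bandwidth:6.2f} Hz")
```

In this sketch the bandwidth doubles with every octave while the ratio of center frequency to bandwidth stays fixed, and the warped centers fall on consecutive integers, which is the sense in which a warping map converts a nonuniform band allocation into a uniform one.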



DOI: http://dx.doi.org/10.13128/Music_Tec-13210




This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

 
Firenze University Press