Abstract
Concatenative sound synthesis is a promising method of musical sound synthesis with a steady stream of work and publications for over five years now. This article offers a comparative survey and taxonomy of the many different approaches to concatenative synthesis throughout the history of electronic music, starting in the 1950s, even if they were not known as such at the time, up to the recent surge of contemporary methods. Concatenative sound synthesis methods use a large database of source sounds, segmented into units, and a unit selection algorithm that finds the units that best match the sound or musical phrase to be synthesized, called the target. The selection is performed according to the descriptors of the units. These are characteristics extracted from the source sounds, e.g. pitch, or attributed to them, e.g. instrument class. The selected units are then transformed to fully match the target specification, and concatenated. However, if the database is sufficiently large, the probability is high that a matching unit will be found, so the need to apply transformations is reduced. The most urgent and interesting problems for further work on concatenative synthesis are listed; they concern segmentation, descriptors, efficiency, legality, data mining and real-time interaction. Finally, the conclusion tries to provide some insight into the current and future state of concatenative synthesis research.
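The selection step described above can be illustrated with a minimal sketch. It is not the method of any particular system surveyed here: it assumes a tiny hand-made database, two numeric descriptors (pitch and loudness), a weighted squared distance, and an independent nearest-match choice per target, and it leaves out unit transformation and any cost for how well consecutive units join. All names (`Unit`, `distance`, `select_units`) and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str        # hypothetical identifier of the source segment
    pitch: float     # extracted descriptor, here a MIDI note number
    loudness: float  # extracted descriptor, here in dB
    instrument: str  # attributed descriptor (instrument class), unused below


def distance(unit, target, weights):
    """Weighted squared distance over the numeric descriptors named in weights."""
    return sum(w * (getattr(unit, d) - target[d]) ** 2 for d, w in weights.items())


def select_units(database, targets, weights):
    """For each target, pick the database unit with the smallest distance."""
    return [min(database, key=lambda u: distance(u, t, weights)) for t in targets]


# Assumed toy database and target specification, for illustration only.
database = [
    Unit("flute_a", 69.0, -20.0, "flute"),
    Unit("flute_c", 72.0, -18.0, "flute"),
    Unit("violin_a", 69.0, -12.0, "violin"),
]
targets = [
    {"pitch": 72.0, "loudness": -19.0},
    {"pitch": 69.0, "loudness": -13.0},
]
weights = {"pitch": 1.0, "loudness": 0.1}

# The selected units, in target order, form the sequence to be concatenated.
sequence = select_units(database, targets, weights)
names = [u.name for u in sequence]
```

With a larger database the best match lies closer to each target, which is why the need for transformation shrinks as the database grows.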
Acknowledgements
Thanks go to Matt Wright, Jean-Philippe Lambert, and Arshia Cont for pointing out interesting sites that (ab)use CSS, to Bob Sturm for the discussions and the beautiful music, to Mikhail Malt for sharing his profound knowledge of the history of electronic music, to all the authors of the research mentioned here for their interesting work in the emerging field of concatenative synthesis, and to Adam Lindsay for bringing people of this field together.
Notes
2. This excludes the many existing web collections of sounds accessed by a search term found in the title, e.g. http://sound-effects-library.com
3. Concatenative speech synthesis techniques are directly used for singing voice synthesis in Burcas (http://www.ling.lu.se/persons/Marcusu/music/burcas), Flinger (http://www.cslu.ogi.edu/tts/flinger), and abused in http://www.silexcreations.com/melissa
6. For instance, Nemesys, the makers of Gigasampler (note 7), pride themselves on having sampled every note of a grand piano in every possible dynamic, resulting in a 1 GB sound set.
24. Examples can be heard on http://www.ircam.fr/anasyn/concat
28. The license type of each unit should be part of the descriptor set, so that a composer could, e.g., select only units with a license permitting commercial use if she wants to sell the composition.
29. For instance, one particular difficulty is that in real-time synthesis, the duration of a target unit is not known in advance, so the system must be capable of generating a pleasing stream of database units as long as there is no user input.