Revista eletrônica de musicologia

Volume XII - Março de 2009




Information Computing Technology (ICT) for Music Composition and Seamless Performance Interfaces


Reinhold Behringer, Adam Stansbie, Nikos Stavropoulos, Michael Ward*




The rapid advances of computing technology allow novel ways of generating music, applied in the processes of composition, production, and performance. These novel technologies enable a more seamless interaction between the musician and the computing system, allowing a more direct translation of artistic intentions into musical output. Examples of such technologies are computer vision, computer graphics, and augmented reality. Two areas of development are key to the implementation of such seamless interaction: interfaces which closely follow the conventional workflow and interaction paradigms employed by musicians, and improved machine intelligence which can anticipate the user's actions and raise the status of the machine to that of a partner. This article highlights a few examples of such novel technologies and their application in music, with an outlook on future possibilities.

1. Introduction

Information computing technology (ICT) is taking a more prominent role in the music community. The power of affordable computing hardware and the advancement of algorithms for machine intelligence have allowed the development of more sophisticated human-computer interaction paradigms, shifting the workload from the musician to the computing system. The goal of these developments is to allow musicians to interact with the computing system in a way that is similar to the traditional workflow, while incorporating the new possibilities of ICT. Ultimately this would allow musicians to realize their musical ideas or interpretations in the most intuitive and natural way.

A large amount of research and development has been done within the past decade, pushing the boundaries of the creative possibilities. Since 2001, the annual International Conference on New Interfaces for Musical Expression (NIME) has become a forum for discussing and presenting the latest advancements in new interfaces for music [1]. The creation of novel compositions has been made possible, and machine intelligence has moved ICT systems closer to becoming partners in the musical process rather than simple utilitarian devices. These technologies also have a deep impact on the teaching of music, as they allow novel interaction with music and can thereby deepen the understanding of musical fundamentals.

1.1. Technologies

ICT systems can either emulate existing paradigms of interaction with musical instruments by replicating the way in which conventional instruments are played, or can use completely new ways of interaction which were not known previously. The replication of existing instruments can, for example, result in a flute-like device which can be played like a real flute, including the emulation of the effect of partially covered holes [2].

On the other hand, novel technologies can expand the possibilities of the musicians by providing new ways of getting their artistic intentions into a composition or performance. Computer graphics, for example, can provide guidance and feedback for the musical interpretation. Tracking technologies (magnetic, RF, computer vision) can be used to obtain users' input without any haptic interaction but solely through gestures, which is a very intuitive way of shaping musical characteristics. Audio processing can be used to follow a live performance and enable a musical instrument to be played automatically in an ensemble with human players. For the end user (the listener of recorded music), audio processing algorithms can automatically classify music by style and similarity, so that the listener can find music matching a given preference without explicit knowledge of the titles of the works.

Computing technology offers a vast number of opportunities for the musicologist who wishes to analyse musical works. Automatic music classification, feature analysis, beat tracking and score following may facilitate musical analysis across a broad spectrum of musical styles. One example is the detection of melody in polyphonic audio [27]. However, such techniques are typically used to handle information pertaining to more “traditional” forms of music - works that contain a fixed meter, tonal content, harmonic structure, and so on. Paradoxically, the many changes in the application and availability of sound technologies have provided composers with a palette of sonic materials and possibilities that go beyond the familiar pitch-based models. By interfacing with computers, the modern composer now grapples with an infinite spectrum of sonic possibilities, and thus the traditional methods of composing and performing music have been radically extended [22].

1.2. Applications

The interaction of musicians with ICT systems can be targeted to the control of a single musical instrument or a whole ensemble of instruments. It can be for the purpose of creating a composition, for performing an existing composition, or for live improvisation.

Applications for music professionals often employ gesture tracking to shape the music creation, as for example in the conductor's jacket project [3]. For non-professional musicians, ICT offers new interactive ways of learning about music and instrument playing. Augmented Reality, for example, has been applied in systems for learning to play guitar [4] [5]. Computer graphics, motion capture, and high-speed networking have been merged to build a multimedia learning environment for remote collaboration and performance [29]. An installation where users can conduct an orchestra using a magnetically tracked baton has been set up at the Vienna House of Music [6]. The creation of music can also serve therapeutic/rehabilitation purposes, as shown in the system “Music-Maker” developed by Boston University [7].

This paper will describe some of the ICT key concepts employed for composition and performance and will highlight a few examples of applications.

2. Sensing the Musicians' Intention

A large variety of sensors is available to capture the immediate input of a musician's actions into an ICT system: haptic sensors for sensing finger pressure or breath, visual sensors (cameras) for obtaining visual input, and magnetic or RF sensors for tracking devices in a specially set-up infrastructure. A key issue is how these data can be mapped onto parameters shaping the resulting music. There are a number of elementary parameters such as pitch and volume (loudness), but musical instruments have a large variety of other parameters, which in “real” acoustic instruments are often linked with each other: the higher harmonics of the sound (varying with volume and pitch, but also controllable by human input), and the attack/sustain/decay curve of the sound.

2.1. Parameter Mapping

In an electronic music instrument, user inputs are translated into sounds. Since the input can be parametrised in various ways in order to influence the sound creation, a mapping is required to determine how the sounds are generated as a consequence of the user's input parameters [8], which often come from sensor data acquisition. A general concept of the music parameter space has been shown by Malinowski [9]: the three main areas of the musical domain are frequency, time, and dynamics.

For more complex instruments or instrument clusters, the mapping becomes even more important and influential in the sound generation process. Simple control hierarchies, however, become less useful, as the large space of options resulting from many input parameters is very difficult to control. Chadabe has argued for combining coordinated controls of multitudes of variables [10] to generate simple interaction. This is supported by the results of the development of a system in which the inputs of the user were mapped to a group of parameters: it was found that such a mapping provided a much more pleasing and engaging experience for the users than if each control had to be manipulated separately [11]. There is an ongoing discussion about how useful a fully deterministic mapping is [10] for advanced innovative approaches to music generation.
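As a minimal illustration of such a grouped, one-to-many mapping (all names and curves here are hypothetical, not taken from the cited systems), a single "energy" control could drive several coupled synthesis parameters at once:

```python
# Sketch of a one-to-many parameter mapping: one normalized control
# drives a coordinated group of sound parameters, in the spirit of the
# grouped mappings discussed above. Names and curves are illustrative.

def map_energy(energy):
    """Map one normalized control (0..1) to a group of sound parameters."""
    if not 0.0 <= energy <= 1.0:
        raise ValueError("energy must be in [0, 1]")
    return {
        "volume": energy,                       # louder with more energy
        "brightness": 0.2 + 0.8 * energy ** 2,  # harmonics grow non-linearly
        "attack_ms": 120 - 100 * energy,        # faster attack when energetic
    }

params = map_energy(0.5)
```

Manipulating `energy` alone moves all three parameters in a musically coherent way, which is the kind of coordinated control Chadabe and Hunt et al. advocate over independent one-to-one sliders.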

A very specific interface for musical expression has been developed in the Expression Synthesis Project (ESP) at USC [14]: the metaphor of driving a car is applied in a system in which a simple vehicle driving simulator is used to control music performances. An intuitive operation is possible, linking throttle/brake to tempo and dynamics to the resulting acceleration, with an independent controller for articulation. On a monitor, the music itself is represented as a road ahead, with road bends indicating passages of expected slower tempi, and driving along the road constitutes the actual performance.

2.2. Applications for Non-Musicians

In order for non-professional musicians to be able to play music with an electronic music system, the system needs to provide the right balance between automatic playing (for a satisfactory and rewarding user experience) and user control. In the simplest cases, this user control allows the user to determine the tempo of a pre-recorded performance.

One such system, which allows non-musicians to conduct an orchestra, is the “Personal Orchestra” installation in the Vienna “House of Music”. With a magnetic baton, the users indicate the tempo and the expression of the “performance”, which is an actual orchestral recording played back with modifications through acoustic time stretching [17].
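How a conducted tempo might be translated into a playback stretch factor can be sketched as follows (a deliberately simplified assumption, not the actual Personal Orchestra algorithm): estimate the user's tempo from successive baton beats and relate it to the recording's tempo.

```python
# Sketch: deriving a time-stretch ratio from conducting gestures.
# Function name and the averaging scheme are illustrative assumptions.

def stretch_ratio(beat_times, recorded_bpm):
    """Estimate the user's tempo from successive beat timestamps (seconds)
    and return the playback stretch factor relative to the recording."""
    if len(beat_times) < 2:
        return 1.0  # not enough gestures yet: play at original speed
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    user_bpm = 60.0 / (sum(intervals) / len(intervals))
    # ratio > 1 means the recording must be slowed down (user conducts slower)
    return recorded_bpm / user_bpm
```

The resulting ratio would then feed a pitch-preserving time-stretching stage, so that slowing the playback does not lower the orchestra's pitch.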

A system for therapeutic purposes for patients with limited motor dexterity must be designed for very limited user input. The system “Music-Maker” by Boston University [7] has been developed with this limitation in mind. Computer vision (EyesWeb [13]) is used to convert the motion of the users into control of musical instruments.

3. Music Generation

3.1. Composition

The power of ICT systems can be harnessed in creating compositions that use material and methods not possible without computers.

One example of a composition involving ICT at all stages of the compositional and performance process is “Heat” for piano and live electronics (2002) by Nikos Stavropoulos.

The starting point for this work was a single field recording of a damaged heating system. With pipes burst by frost, air leaked into the circuit when the system was turned on. As this air made its way through the water in the pipes, an interesting sound phenomenon was set in motion, exhibiting an intricate rhythmic profile and melodic contour. The composer's aim was to translate the erratic behaviour of this corrupted system into a composition that would display similar organisational principles. The piano was chosen as it could accommodate and complement the percussive and resonant character of the original timbre.

The original recording does not appear in the final piece, but rather was processed with Max/MSP in order to extract rhythmic and melodic information. This process was based on two Max objects developed by Miller Puckette at the University of California: Fiddle, an object for pitch following and sinusoidal decomposition, and Bonk, a percussion follower.

These were used to track the frequency curve and the attacks, respectively, of the original material, and to transform this data into MIDI notes. The resulting MIDI file provided the score for the piece. An upper and a lower threshold can be set in the Bonk object to define an input amplitude range. An attack is triggered only when the signal amplitude rises above the upper threshold and then drops below the lower one. A wider threshold range is less sensitive to amplitude variations in the signal, generating a smaller number of attacks, whilst a narrower range is more responsive to subtle amplitude variations in the input signal. The upper and lower thresholds were initially set to 71 and 41, respectively. The high threshold is reduced by 1 every six seconds, while the low threshold is reduced by 10 every two minutes. When the value of the high threshold reaches 10, the process is reversed. The result is a linear increase in the density of triggered attacks, peaking halfway through the piece before decreasing linearly to the end.
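The threshold schedule described above can be sketched as a small simulation (this reimplements only the schedule as stated in the text, not the Bonk object itself):

```python
# Simulation of the descending/ascending threshold schedule from "Heat":
# high starts at 71 and drops by 1 every 6 s; low starts at 41 and drops
# by 10 every 120 s; when high reaches 10, the process mirrors back.

def thresholds_at(t):
    """Return (high, low) Bonk threshold values t seconds into the piece."""
    half = (71 - 10) * 6                        # 366 s for high to fall 71 -> 10
    phase = t if t <= half else 2 * half - t    # mirror the schedule after midpoint
    high = max(10, 71 - int(phase // 6))
    low = max(1, 41 - 10 * int(phase // 120))
    return high, low
```

Plotting `thresholds_at` over the piece's duration would show the narrowing and re-widening of the amplitude range that produces the attack-density arch described above.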

The resulting MIDI file was imported into the Sibelius scoring software and was edited in order to achieve greater musical sensibility whilst maintaining the character of the original recording. During performance, the sound of the piano is fed through microphones to the computer where it undergoes transformation through an array of GRM plug-ins running in Pro-Tools. A number of parameters are automated and change gradually over time, causing variations in the timbral output of the process. The processed material is mixed with the amplified piano signal and presented over loudspeakers. The process, whose intention is to exhibit an explicit relationship between the piano performance and the electronics, does not use buffers or extended delays. The abstract sound world is intended as an extension of the piano sound and is invigorated by it.

Structural thinking and generation of material are brought together in the application of the process used to extract the piano part from the original recording. The overall structure of the work is exclusively dependent on the process and the input file and is not dictated by intrinsic sonic characteristics and issues of formal and dramatic structure. Although the material that derived from the process described above was somewhat modified, the nature of the composition was maintained to reflect the structure of the specific sonic scene depicted in the input file.

The variations in timbre generated by feeding the signal through the automated set-up of the GRM plug-ins do not inform the structural development of the work but simply provide a reflection of every event in the structure in a different realm. In this case, the dynamic spectromorphology of the material is not a carrier of structure but provides a timbral extension of the performer's gestures.

The starting point for this particular work was the idea of a live performer and a live electronics environment in which the relationship between these two elements is transparent and accessible to the listener at all times. Although this is achieved throughout the length of the work, the musical structure, carried by the piano part, does not reveal the method of organisation, as this is not directly related to the sound world of the work but is integral to the processes involved in the generation of the score. Here, auditory perception is accommodated in moment form and in large-scale consistency of character and structure rather than in a dialectic, linear musical argument. Live signal processing in the context of a musical work presents a challenge when it comes to forming relationships between structure and material, as a third dimension is added to the equation that directly affects the first two: that of real-time electronic sound manipulation. In the case of Heat, real-time signal processing accommodates the material; the listener can always relate the acousmatic world to the performer's actions, but it is independent of structural functions.

3.2. Performance

Music and sound can be created by emulations of analogue synthesizers or by more complex algorithmic sound generation. The performance of such electro-acoustic music can be realized as live generation from synthesizers or samplers, which in principle allows additional input and interaction from performers or the audience. The creation of ‘tape' music, or acousmatic music, enables a composer to fix structural elements that are too detailed or precise to be realized in real time. It has been argued that in such cases the performance takes place in the recording studio at the moment that the composer interacts with the various technologies [24]. Once complete, such works are typically presented over a series of loudspeakers (sound diffusion), thus enabling the composer to reinforce the performance gestures as they were created in the studio. As Smalley states [23]:

“…it is a question of adapting gesture and texture so that multi-level focus is possible for as many listeners as possible… In a medium which relies on the observation and discrimination of qualitative differences, where spectral criteria are so much the product of sound quality, the final act becomes the most crucial of all” [23]

In addition to synthesis techniques, computer music can be generated through the joint processes of recording and re-contextualizing sound materials; this contrasts with more traditional compositional methods of notating musical elements. The composer of computer music is able to concentrate upon those sonic details that are beyond the control of the instrumental composer; this would not be possible without the development of computational processes that provide access to the internal architecture of sounds [25]. More recently, rather than merely focusing (or trying not to focus) upon the referential nature of recorded sound, composers have been working to construct new sounds by selecting and combining significant features from a range of diverse sonic materials. The resulting musical materials are often heterogeneous in character, since they may contain elements that are not reconcilable without such technological intervention. Ten Hoopen [22] argues that this ability to disguise and re-contextualise sound materials is the most salient feature of electroacoustic music, and one which is often further emphasized through an absence of visual clues or physical manifestations.

Sonic features can be extracted and then combined, or replaced, to synthesize new sounds that have indeterminate origins. One may, for example, wish to retain the gestural, morphological character of one particular sound and impose it upon another sound that has a distinctive spectral makeup (gestural imposition). Alternatively, one may wish to extract the spectral components of one particular sound and merge these elements with the spectral components of a different sound thus augmenting its timbral character (spectral merge). The computer offers various new methods of interpolating diverse materials; dynamic interpolations enable a particular sound to gradually transform into a different sound over a specified period of time. Alternatively, static interpolations enable a similar transformation where the characteristics of a sound gradually alter over a series of controlled iterations [26] .
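A toy sketch of these two ideas, operating on pre-computed magnitude spectra and amplitude envelopes (real implementations work on STFT frames; all names here are illustrative, not from the works discussed):

```python
# Sketch of "spectral merge" and "gestural imposition" on pre-analysed data.
# Inputs are plain lists standing in for one analysis frame of each sound.

def spectral_merge(mags_a, mags_b, blend=0.5):
    """Blend two magnitude spectra of equal length; blend=0 gives A, 1 gives B."""
    if len(mags_a) != len(mags_b):
        raise ValueError("spectra must have the same number of bins")
    return [(1 - blend) * a + blend * b for a, b in zip(mags_a, mags_b)]

def gestural_imposition(envelope, samples):
    """Impose the amplitude envelope (the 'gesture') of one sound onto another."""
    return [e * s for e, s in zip(envelope, samples)]
```

In practice both operations would be applied frame by frame across the duration of the sounds, followed by resynthesis, but the bin-by-bin and sample-by-sample combinations above capture the core of each transformation.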

Even today, many sound-processing techniques are based upon manipulations of sound materials in the ‘time domain'; these are manipulations that affect how the sound is represented on a temporal lattice, and include splicing, reversing, re-ordering and stretching sounds. By conducting a Fourier transform, a computer can represent any waveform as a sum of elementary sinusoidal waveforms and thus facilitate sound manipulations that are based not upon time-domain manipulations but upon spectral-domain manipulations. Processing sound in the spectral domain gives the composer control over an increasingly broad set of parameters; one can now analyze the spectral data and initiate convolutions and interpolations based upon these parameters.
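The contrast can be sketched with a minimal discrete Fourier transform pair: a time-domain manipulation merely reorders samples, while a spectral-domain manipulation edits frequency bins before resynthesis (this is a textbook DFT for clarity; real systems use FFTs):

```python
import cmath

# Minimal DFT/inverse-DFT pair to illustrate spectral-domain processing:
# analyze, zero out the mid/high-frequency bins (a crude low-pass), resynthesize.

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

signal = [1.0, 0.0, -1.0, 0.0] * 4            # simple periodic test signal (16 samples)
reversed_signal = signal[::-1]                # time-domain manipulation: reordering only
spectrum = dft(signal)
# spectral-domain manipulation: keep only the lowest bins (and their conjugates)
lowpassed = idft([X if k < len(spectrum) // 4 or k > 3 * len(spectrum) // 4 else 0
                  for k, X in enumerate(spectrum)])
```

Because the test signal's energy sits entirely in the zeroed bins, the crude low-pass silences it, something no amount of splicing or reversing in the time domain could do.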

In his 2005 work, “Isthmus”, Adam Stansbie demonstrates a clear example of both feature extraction and re-synthesis of sound materials. The entire work, consisting of three movements, is derived from solo and ensemble performances of various stringed instruments. During the first movement, the sonic materials maintain a close relationship to the timbre and performance techniques of these instruments, extending both natural morphologies and textures, yet often regressing to reveal the source recordings. On these occasions, the physical performance space, and the performer's position within it, can be identified, along with clearly delineated references to contemporary string music.

During the second movement, departures from recognizable points of reference occur as sound transformations become more complex. During this period, the instruments cease to occupy a determinate region in space and gradually become less recognizable. The gestural features that were evident in the first movement are gradually extracted and replaced with alternative gestures to create percussive and sometimes metered morphologies that are only just reminiscent of the original source sounds.

The third movement removes most of the remaining references to the source materials. In addition to the removal of gestural characteristics and spectral information, new sounds have been forged by combining certain features that were extracted from the original source materials. The title, “Isthmus”, may be viewed as a metaphor referring to Stansbie's attempts to connect instrumental and computer music through the process of sound transformation.

4. Score Following

A computing system that is supposed to allow interaction in the context of live polyphonic music performances needs to be aware of either the current harmonic/musical context (live improvisation) or the current location in the music score (replay of a pre-scored piece).

Many musicians nowadays use an interesting combination of computer-generated sounds, samples and loops combined with more ‘traditional' acoustic and electro-acoustic instruments. While these computerised techniques create musically and technologically engaging compositions in the studio or home studio, it can be difficult to recreate these ideas in a live environment. The main issue relates to the tempo of the piece and the ability to alter the arrangement or ‘feel' of the piece live. In traditional band setups, the musicians typically react to a drummer who is usually responsible for maintaining tempo. If drummers were to play with sampled data, they would have to ensure that they were playing at the tempo at which the sample was originally recorded in order to maintain the arrangement of the song. This is especially true of more layered compositions using many samples, which may be controlled by MIDI. In order to play at the correct tempo, the drummer would either have to play to a tempo click through headphones or have some visual indicator of the tempo. This can be restrictive, as the drummer would then concentrate on the tempo indicator rather than the performance.

This problem can be solved by developing acoustic pitch and beat tracking, which captures the audio of a human ensemble through a microphone and is able to determine which part of a score the ensemble is currently playing.
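A first building block for such tracking might be a running tempo estimate from detected onset times (onset detection itself is assumed to happen elsewhere; the function name and windowing are illustrative):

```python
# Sketch: estimating the ensemble's current tempo from detected onset times.
# A median over recent inter-onset intervals gives some robustness against
# missed or spurious onsets.

def estimate_bpm(onsets, window=8):
    """Return the tempo in BPM from the median inter-onset interval of the
    last `window` onset timestamps (in seconds), or None if too few onsets."""
    recent = onsets[-window:]
    if len(recent) < 2:
        return None  # not enough onsets yet to estimate a tempo
    intervals = sorted(b - a for a, b in zip(recent, recent[1:]))
    median = intervals[len(intervals) // 2]
    return 60.0 / median
```

Such an estimate could then drive the playback tempo of the sampled layers, so the samples follow the drummer rather than the other way round; syncopated patterns, of course, defeat this naive inter-onset approach, which is exactly the difficulty noted below.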

Much work has been done on monophonic or instrument specific pitch tracking yet pitch tracking of an ensemble remains an area of future research.

There are a number of software tools for calculating the tempo of a music piece during composition and production (e.g. from MOTU), but these operate retrospectively (they scan stored, pre-recorded audio to determine the tempo). Real-time manipulation of speed and pitch can be achieved using current applications, so that frequency-based characteristics of the samples are preserved while time-based parameters are changed, keeping the compositional structure and key intact.

One of the challenges in this acoustic score following is that musical styles often have a large degree of variation in beat patterns - complex patterns and syncopated beats cause problems for simplistic approaches. Real-time beat tracking is an area of current research, with some success achieved using style-specific algorithms [30].

Research activities have also addressed the issue of developing score following for non-instrumental music, e.g. in a song [28] .

5. Augmented Reality (AR)

Augmented Reality (AR) is a concept that merges computer-generated output with the user's perception of the real environment in such a way that the computer-generated rendition appears to be merged with the actual spatial environment. This technology has many applications in the industrial field, but applications in the music field can also be envisioned.

Head-worn displays require tracking and might be considered intrusive by the musician. Projective AR systems can seamlessly place information onto the actual music instrument, but also require tracking of the instrument, unless it is statically fixed in the environment.

A very straightforward application of AR is teaching how to play an instrument: an AR overlay can project information into the player's field of view to give guidance on playing a piece by highlighting finger positions on the instrument. Examples of this application are systems for teaching guitar playing [4]. Visual markers placed on the instrument serve as indicators for tracking the instrument. The tracking precision can be improved by using natural features of the guitar, as proposed by Motokawa and Saito [5].

A system which supports the composition process with AR methodology has been developed by Berry et al. [12] . On a tabletop, musical patterns represented by cards can be moved and arranged by the composer to create an array of those elementary patterns. Computer vision methods are applied to track these cards, and the music is created and played back based on the motion and position of these cards. This interaction allows an experience of musical structure as a tangible space, in which physical and visual cues complement the experience of the produced music.
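The card-to-music step can be sketched with a hypothetical data model (not taken from Berry et al.) in which tracked card positions are turned into a pattern sequence, played left to right:

```python
# Sketch: mapping tracked tabletop cards to a pattern sequence, in the
# spirit of the music table. Each card is (x, y, pattern_name) with
# normalized coordinates; ordering by x gives the playback order.

def cards_to_sequence(cards):
    """Return pattern names in play order (left to right across the table)."""
    return [name for _x, _y, name in sorted(cards, key=lambda c: c[0])]

sequence = cards_to_sequence([(0.8, 0.2, "hats"), (0.1, 0.5, "kick"), (0.4, 0.3, "snare")])
```

In the actual system the card positions come from a computer-vision tracker and the mapping is richer, but the core idea is the same: physical arrangement on the table is the musical structure.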

6. Summary and Outlook

The recent ICT developments and advances in computing power and performance, software systems, and interaction approaches have opened new possibilities for both music composition and performance. The potential of these technologies for the creative process is significant, as they will enable more complex musical tasks to be performed by computers.

These technologies will contribute to the computer becoming a partner in human ensembles, able to follow the musical score as the human musicians play their parts. The computer as a musician will be able to play its part in synchrony with human players and adapt its playing to subtle changes in tempo and expression. This will remove the need for a human player to monitor and trigger synthesizers and will generate a completely new musical performance experience, in which “artificial musicality” can complement human musicianship. On the composition side, more complex paradigms can be put into the musical structure, allowing the composer to create musical textures of high complexity.

7. References

[1] International Conference on New Interfaces for Musical Expression (NIME). <accessed 10.Sept.07>

[2] Menzies, Dylan; Howard, David. “Cyberwhistle – An Instrument for Live Performance”. Colloquium on Musical Informatics XII, Sept 1998.

[3] Marrin-Nakra, T. Inside the Conductor's Jacket: Analysis, Interpretation and Musical Synthesis of Expressive Gesture. Ph.D. Thesis, Media Laboratory. Cambridge, MA: Massachusetts Institute of Technology.

[4] Cakmakci, Ocan; Bérard, François; Coutaz, Joëlle. “An Augmented Reality Based Learning Assistant for Electric Bass Guitar.” In Proc. of the 10th International Conference on Human-Computer Interaction, 2003.

[5] Motokawa, Yoichi; Saito, Hideo. “Support System for Guitar Playing using Augmented Reality Display”. 5th IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR06), pp. 243-244, Oct. 2006.

[6] Virtual Conductor (Personal Orchestra). <Sept. 2007>

[7] Gorman, M.; Betke, M.; Saltzman, E.; Lahav, A. Music Maker – A Camera-based Music Making Tool for Physical Rehabilitation. Boston University, 2005. <Sept. 2007>

[8] Wanderley, M. 2002. “Mapping Strategies in Interactive Computer Music.” Organised Sound, 7(2):83-84.

[9] Malinowski, S. “The Conductor Program – computer-mediated musical performance.” <Sept. 2007>

[10] Chadabe, Joel. “The Limitations of Mapping as a Structural Descriptive in Electronic Instruments.” In Proc. of the 2002 Conference on New Instruments for Musical Expression (NIME-02), Dublin, Ireland, May 24-26, 2002.

[11] Hunt, Andy; Wanderley, Marcelo; Paradis, Matthew. “The Importance of Parameter Mapping in Electronic Instrument Design.” In Proc. of the 2002 Conference on New Instruments for Musical Expression (NIME-02), Dublin, Ireland, May 24-26, 2002.

[12] Berry, R.; Makino, M.; Hikawa, N.; Suzuki, M. “The augmented composer project: the music table.” In Proc. of the 2nd IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR), 7-10 Oct. 2003, pp. 338-339.

[13] Camurri, A.; Mazzarino, B.; Volpe, G. “Analysis of Expressive Gesture: The EyesWeb Expressive Gesture Processing Library,” in A. Camurri, G. Volpe (eds.), Gesture-based Communication in Human-Computer Interaction, LNAI 2915, Springer Verlag, 2004. <Sept. 2007>

[14] Chew, E.; Liu, J.; François, A.R.J. “ESP: Roadmaps as Constructed Interpretations and Guides to Expressive Performance”. 1st ACM Audio and Music Computing Multimedia Workshop, 2006, Santa Barbara, pp. 137.

[15] Borchers, J.; Marrin, T.M. “You're the Conductor”. 2003.

[16] Lee, Eric; Marrin-Nakra, Teresa; Borchers, Jan. “You're the Conductor: A Realistic Interactive Conducting System for Children.” In NIME 2004 International Conference on New Interfaces for Musical Expression, Hamamatsu, Japan, June 2004.

[17] Borchers, Jan O.; Samminger, Wolfgang; Mühlhäuser, Max. “Conducting a Realistic Electronic Orchestra.” UIST 2001 14th Annual Symposium on User Interface Software and Technology, Orlando, FL, November 2001.

[18] Wanderley, M. M.; Depalle, P. “Gestural control of sound synthesis.” Proceedings of the IEEE, vol. 92, pp. 632-644, 2004.

[19] Schertenleib, S.; Gutiérrez, M.; Vexo, F.; Thalmann, D. “Conducting a Virtual Orchestra”. IEEE Multimedia, July/September 2004 (Vol. 11, No. 3), pp. 40-49.

[20] Puckette, M. “Real-time audio analysis tools for Pd and MSP”. Proceedings of the International Computer Music Conference, University of Michigan, 1998. <Sept 2007>

[21] Stavropoulos, N. PhD thesis, The University of Sheffield, 2005.

[22] Ten Hoopen, C. Perceptions of Sound: Source, Cause and Human Presence in Electroacoustic Music. PhD dissertation. Amsterdam: University of Amsterdam. 197.

[23] Smalley, D. 1986. “Spectro-morphology and structuring processes.” In S. Emmerson (ed.). The Language of Electroacoustic Music. London: Macmillan.

[24] Harrison, J. 1999. “Diffusion: theories and practices, with particular reference to the BEAST system.” In Chuprun (ed.) eContact 2.3 [online].

[25] Wishart, T. 1996. On Sonic Art. Amsterdam: Harwood, p. 8.

[26] Wishart, T. 1994. Audible Design. York: Orpheus the Pantomime, p. 132.

[27] Paiva, R.P. Melody Detection in Polyphonic Audio. PhD thesis, U. of Coimbra, 2007.

[28] Puckette, Miller. “Score Following Using the Sung Voice.” ICMC Proceedings, 1995.

[29] Khan, Ali; Ong, Bee; Ng, Kia; Bellini, Pierfrancesco; Nesi, Paolo; Mitolo, Nicola. “Using 3D Visualisations of Motion Data for Collaborative Multimedia Music Learning and Playing,” in Proceedings of COST287-ConGAS 2nd International Symposium on Gesture Interfaces for Multimedia Systems (GIMS), 9-10 May 2006, Leeds, UK.

[30] Collins, N. 2006. “Towards a Style-Specific Basis for Computational Beat Tracking.” In Proceedings of the 9th International Conference on Music Perception & Cognition, ICMPC and ESCOM, Bologna, Italy, pp. 461-467. ISBN 8873951554.

*Leeds Metropolitan University {r.behringer, a.stansbie, n.stavropoulos, m.ward}