
09:10-10:30 Session SMAC Papers: Voice 1


Location: 1E207
Technology-Based Real-Time Visual Feedback in the Education of Singers

ABSTRACT. Learning to sing requires the acquisition of perceptual-motor skills. The development of such skills is notably facilitated when meaningful visual feedback is provided. The current state of voice science, combined with recent technological advances, has paved the way for the visualization of relevant physiological and acoustical events. Nowadays, non-invasive real-time visual displays of breathing behaviors, subglottal pressure, vibratory patterns and acoustical properties of the voice are available to both teachers and voice students. In this presentation, examples of such displays and the associated technological tools will be demonstrated. For example, the RespTrack system for real-time display of abdominal and ribcage movements will be presented (Johan Stark, Columbi Computers, Sweden). The relationship between breathing behavior, lung volume and subglottal pressure will be discussed, as well as its relevance to the education of singers. Visualization of vocal fold vibratory patterns by electroglottography (EG2-PCX2, Glottal Enterprises, USA) and its application to the training of phonation types or register transitions will be presented. Also, the recently developed FonaDyn freeware will be explored for documenting singers’ development (Sten Ternström, Sweden). The usefulness of various spectrographic displays will be discussed. Finally, the possible implementation of all these means in current educational settings will be discussed.

The impact of room acoustics on choristers' performance: from rehearsal space to concert hall

ABSTRACT. While there has been extensive research on the acoustic quality of various performance spaces and concert halls, studied from the audience perspective, less work has been published on the musicians' on-stage acoustic impression and its impact on musicality and performance quality. On-stage acoustic conditions vary among performance spaces and, more often than not, between performance and rehearsal spaces. As a result, studies have investigated the adaptation mechanisms performers develop to match specific acoustic conditions during a performance. This paper discusses the potential impact of acoustic mismatches between rehearsal spaces and concert halls from the perspective of singers and choirs. Building on past research that explored virtual acoustic environments as a means of investigating this deviation and the way it affects one's performance, a tool is being designed to virtually place users in various spots within a virtual choir on a virtual stage, by augmenting audio recordings with auditory spatialization and room-acoustic cues. Preliminary feedback on the need for this tool, along with results from its alpha-testing phase, is discussed.

Singing voice range profiling toolbox with real-time interaction and its application to make recording data reusable

ABSTRACT. The Singing Voice Range Profiling Toolbox is a software suite that provides real-time interaction for the profiling of singing voices. It utilizes sound-field measurement, microphone calibration, and the acoustic characteristics of the recording system to analyze and visualize various vocal parameters. Measurement of the recording sound field and background noise provides the information needed to decide the acceptable distance for optimal performance, depending on the directivity pattern and frequency response of the microphones. The voice profiling includes essential parameters such as fundamental frequency (fo), sound pressure level (SPL), cepstral peak prominence (CPP), and EGG-Oq (if EGG is available). In addition, the toolbox provides real-time feedback on the analyzed characteristics, including visualizations of F1 and F2, which are essential and valuable parameters in studying singing voices.

Additionally, the toolbox has facilities for assisting in training and self-learning. The facilities allow users to gain a deeper understanding of the voice profiling process and improve their skills over time. The Singing Voice Profiling Toolbox provides a valuable tool for voice scientists, recording engineers, and singing voice educators, enabling them to make recording data reusable and further advancing the field of voice research.
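Of the profiling parameters listed above, SPL is the simplest to reproduce once the microphone chain is calibrated. A minimal Python sketch (assuming calibrated pressure samples in pascals; the toolbox's actual calibration procedure is not described here):

```python
import numpy as np

def spl_db(pressure, ref=20e-6):
    """Sound pressure level in dB re 20 uPa, from calibrated samples in Pa."""
    rms = np.sqrt(np.mean(np.square(pressure)))
    return 20.0 * np.log10(rms / ref)

# A sine with 1 Pa peak amplitude has an RMS of 1/sqrt(2) Pa, i.e. ~91 dB SPL.
t = np.arange(48000) / 48000.0
level = spl_db(np.sin(2 * np.pi * 100.0 * t))
```

Doubling the pressure amplitude raises the result by about 6 dB, the kind of relationship a calibrated profiling chain must preserve end to end.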

11:00-12:00 Session SMAC Papers: Voice 2
Location: 1E207
Measurement of the vocal tract impedance at the mouth - from 1995 to today

ABSTRACT. The acoustic output signal of orators and singers contains information from the voice source as well as from the articulatory organs. The extraction of formants to achieve information about voice timbre and vocal-tract configuration is among the essential methods in speech and language processing as well as in singing-voice research. However, the validity of formant analysis is limited by voice features and analysis parameters. The measurement of the acoustic impedance of resonating structures allows one to obtain all relevant information from their resonances. The design of a method for impedance measurement at the human lips by Joe Wolfe and John Smith dates back to 1995, and since then variations of the concept have been applied in fundamental singing-voice research (high soprano and tenor voices, overtone singing), as well as for investigations in other domains such as voice rehabilitation. This review article collects methods and applications over the last 25 years, giving reference to the many colleagues and students who have contributed with theses and supporting work. The presentation is accompanied by a demonstration of the method.

Preliminary acoustic analysis of articulation differences in spoken and sung French language by Greek classical singers

ABSTRACT. Singing in a foreign language can pose a significant challenge to a singer. Vowel sounds that do not exist in one’s native tongue are usually the most frequent reason for sounding foreign, while the erroneous production of such phonemes might lead to crucial intelligibility issues. This preliminary study examines the degree to which substantial knowledge of a foreign language (French) can assist a Greek-speaking classical singer in performing authentically in it. To this end, 16 male native Greek-speaking classical singers recorded excerpts from the standard French-language repertoire of their voice type. Participants provided both spoken and sung audio samples of their voice. The formant analysis focused on 4 special cases of difficulty in the French language: the vowels /ã/, /ə/, /e/, and /y/ - all foreign to the Greek language. The study revealed the special cases in which pronunciation differences exist between speakers and non-speakers of the foreign language, as well as articulation differences between speaking and singing voices.

A pilot study of vocal vibrato incorporating nonlinear time series analysis

ABSTRACT. Research on vocal vibrato suggests that regularity and periodicity can affect its auditory perception and pleasantness. To date, vibrato regularity has mostly been measured by analyses such as jitter and shimmer. However, nonlinear time series analyses associated with complexity and determinism could provide further insights into its regularity and dynamics. This preliminary study investigates the application of phase spaces and recurrence plots to illustrate patterns of vibrato behaviour in solo singing, and how these dynamics can be studied using nonlinear metrics such as sample entropy and recurrence quantification analysis.

Sixty-eight vibrato notes from three operatic pieces sung by Luciano Pavarotti were analysed. Rate, extent, jitter, shimmer, sample entropy, and determinism were calculated for all notes, and phase spaces and recurrence plots created. Results revealed trends and transitions of vibrato behaviour and time-varying characteristics not observable using previous metrics. Classic nonlinear time series analyses methods seem to be promising tools to better understand characteristics of vibrato complexity, which could be valuable to pedagogy and understanding stylistic traits of different genres.
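The nonlinear metrics named above are standard and straightforward to prototype. A minimal sample-entropy sketch in Python (template length m and tolerance r are common defaults, not the study's reported settings):

```python
import numpy as np

def sample_entropy(x, m=2, r_frac=0.2):
    """Sample entropy of a 1-D series: -log(A/B), where B counts pairs of
    matching length-m templates (Chebyshev distance <= r) and A the same
    for length m+1. Lower values indicate a more regular signal."""
    x = np.asarray(x, dtype=float)
    r = r_frac * np.std(x)
    def matches(length):
        templ = np.array([x[i:i + length] for i in range(len(x) - length)])
        d = np.max(np.abs(templ[:, None, :] - templ[None, :, :]), axis=2)
        return np.sum(d <= r) - len(templ)   # exclude self-matches
    B, A = matches(m), matches(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf
```

A perfectly periodic vibrato trace yields a much lower sample entropy than an irregular one, which is exactly the regularity contrast the study exploits.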

Phonation and Collision Threshold Pressures associated with Menopause

ABSTRACT. Introduction: Sex steroid hormonal variations at menopause can cause oedema of the vocal folds, and this may lead to a reduction of their mobility. This study compares phonation and collision threshold pressures (PTP and CTP, respectively) between premenopausal (n = 26; mean age = 44 ± 3 years) and postmenopausal (n = 27; mean age = 54 ± 3 years) female professional voice users (FPVUs), namely teachers and singers, allocated into these two groups according to both clinical and hormonal criteria. Methodology: Audio, electroglottographic (EGG) and intraoral pressure signals were recorded. Participants repeated the syllable [pa] while performing a diminuendo at various pitches, thus allowing determination of the lowest pressures producing vocal fold vibration and vocal fold contact, i.e., PTP and CTP respectively. Intraoral pressure during the occlusion for the consonant [p] was accepted as a measure of subglottal pressure (Psub). Only those peaks providing good estimates of Psub were considered for analysis. Results: As the data were not normally distributed, comparisons between pre- and postmenopausal FPVUs were made with a Mann-Whitney test for PTP and CTP values at each individual pitch. No significant differences were found between groups at any pitch. Discussion and Conclusions: The results suggest that the significant depletion in oestrogen concentrations during menopause, previously associated with vocal fold oedema, does not appear to impair vocal fold mobility.

12:00-13:00 Lunch Break (Oktav)
13:00-15:00 Session SMC Papers 1

SMC Welcome address and first plenary paper session. 

Location: Kungasalen

ABSTRACT. This article describes an interactive notation paradigm, which aids in structuring flexible performances for an arbitrary number of participants and any combination of acoustic or electronic sources. A simple system allows a ‘maestro’ to organise an ensemble and to communicate information to the members by means of an interactive window projected on a surface visible to all (performers and audience). The following text describes the motivation and design of the notation strategy, its implementation in the SuperCollider environment and discusses some compositional, performative and pedagogical issues with reference to a recent work; in this context the ‘system’ is considered to be the ‘piece’ itself.

Score-Informed MIDI Velocity Estimation for Piano Performance by FiLM Conditioning

ABSTRACT. The piano is one of the most popular instruments among people who learn to play music. When playing the piano, the level of loudness is crucial for expressing emotion as well as shaping tempo. These elements convey the expressiveness of a music performance. Detecting the loudness of each note could provide more valuable feedback for music students, helping them improve their performance dynamics. This can be achieved by visualizing the loudness levels, not only for self-learning purposes but also for effective communication between teachers and students. Also, given the polyphonic nature of piano music, which often involves parallel melodic streams, determining the loudness of each note is more informative than analyzing the cumulative loudness of a specific time frame.

This research proposes a method using a Deep Neural Network (DNN) with score information to estimate the note-level MIDI velocity of piano performances from audio input. When score information is available, we condition the DNN on it using a Feature-wise Linear Modulation (FiLM) layer. To the best of our knowledge, this is the first attempt to estimate MIDI velocity using a neural network in an end-to-end fashion. The proposed model achieved improved accuracy in both MIDI velocity estimation and estimation-error deviation, as well as higher recall for note classification, when compared to a DNN model that did not use score information.
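FiLM itself is a lightweight mechanism: the conditioning input is projected to a per-channel scale (gamma) and shift (beta) that modulate the network's feature maps. A NumPy sketch of the layer alone (dimensions and initialization are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

class FiLM:
    """Feature-wise Linear Modulation: a conditioning vector (e.g. score
    features) is projected to a per-channel scale (gamma) and shift (beta)
    that modulate a feature map: y = gamma * x + beta."""
    def __init__(self, cond_dim, n_channels, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.1 * rng.standard_normal((cond_dim, 2 * n_channels))
        # Initialize to the identity modulation: gamma = 1, beta = 0.
        self.b = np.concatenate([np.ones(n_channels), np.zeros(n_channels)])
        self.n = n_channels
    def __call__(self, x, cond):
        # x: (time, channels) features; cond: (cond_dim,) conditioning vector.
        gb = cond @ self.W + self.b
        return gb[:self.n] * x + gb[self.n:]
```

With a zero conditioning vector the layer is the identity, so the conditioned network degrades gracefully when no score features are supplied.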

Accessible Sonification of Movement: A case in Swedish folk dance

ABSTRACT. This study presents a sonification tool – SonifyFOLK – designed for intuitive access by musicians and dancers in their sonic explorations of movements in dance performances. It is implemented as a web-based application to facilitate accessible audio parameter mapping of movement data for non-experts, and applied and evaluated with Swedish folk musicians and dancers in their exploration of sonifying dance. SonifyFOLK is based on the WebAudioXML Sonification Toolkit and is designed within a group of artists and engineers using artistic goals as drivers for the sound design. The design addresses challenges of providing an accessible interface for mapping movement data to audio parameters, managing multi-dimensional data and creating audio mapping templates for a contextually grounded sound design. The evaluation documents a diversity of sonification outcomes, reflections by participants that imply curiosity for further work on sonification, as well as the importance of the immediacy of both the visual and acoustic feedback of parameter choices.

The "Collective Rhythms Toolbox": an audio-visual interface for coupled-oscillator rhythmic generation

ABSTRACT. This paper presents a software package called the "Collective Rhythms Toolbox" (CRT), a flexible and responsive audio-visual interface that enables users to investigate the self-synchronizing behaviors of coupled systems. As a class of multi-agent systems, CRT works with networks of coupled-oscillators and a physical model of coupled-metronomes, allowing users to explore different sonification routines through real-time parameter modulation. Adjustable coefficient matrices allow for complex coupling topologies that can induce a diverse range of dynamic rhythmic states and audio-visual feedback facilitates user engagement and interactive flow. Similarly, several real-time analysis techniques provide the user with visual information pertaining to the state of the system in terms of group synchrony. Ultimately, this paper showcases how parameterizing coupled systems in specific ways allows different computer music and compositional techniques to be carried out through the lens of dynamical systems-based approaches.
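The abstract does not give the toolbox's equations; the textbook formulation for such networks is the Kuramoto model with a coupling matrix, sketched here together with the standard order parameter for quantifying group synchrony (all parameter values are illustrative):

```python
import numpy as np

def kuramoto_step(theta, omega, K, dt=0.01):
    """One Euler step of the Kuramoto model with coupling matrix K:
    dtheta_i/dt = omega_i + sum_j K[i, j] * sin(theta_j - theta_i)."""
    diff = theta[None, :] - theta[:, None]
    return theta + dt * (omega + np.sum(K * np.sin(diff), axis=1))

def order_parameter(theta):
    """Group synchrony r in [0, 1]; r = 1 means fully phase-locked."""
    return np.abs(np.mean(np.exp(1j * theta)))

rng = np.random.default_rng(1)
n = 8
theta = rng.uniform(0.0, 2 * np.pi, n)   # random initial phases
omega = rng.normal(1.0, 0.05, n)         # similar natural frequencies
K = np.full((n, n), 0.5 / n)             # all-to-all coupling, above threshold
for _ in range(5000):
    theta = kuramoto_step(theta, omega, K)
r = order_parameter(theta)               # approaches 1 as the group locks
```

Making entries of K asymmetric or sparse is what produces the "complex coupling topologies" and partial-synchrony states the abstract refers to.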

A Programmable Linux-Based FPGA Platform for Audio DSP

ABSTRACT. Recent projects have proposed the use of FPGAs (Field-Programmable Gate Arrays) as hardware accelerators for high-computing-power real-time audio Digital Signal Processing (DSP). Most of them involve application-specific developments which cannot be reused between different applications. In this paper, we present an accessible FPGA-based platform optimized for audio applications, programmable with the FAUST language and offering advanced control capabilities. Our system allows fast and simple deployment of DSP hardware accelerators for any Linux audio application on Xilinx FPGA platforms. It combines the Syfala compiler – which can be used to generate FPGA bitstreams directly from a FAUST program – with a ready-made embedded Linux distribution running on the Xilinx Zynq SoC. It enables the compilation of complete audio applications involving various control protocols and approaches such as OSC (Open Sound Control) over Ethernet or Wi-Fi, MIDI, web interfaces served by an HTTPD server, etc. This work opens the door to the integration of hardware accelerators in high-level computer music programming environments such as Pure Data, SuperCollider, etc.

A Comparative Computational Approach to Piano Modeling Analysis

ABSTRACT. Piano modeling is a topic of great interest in musical acoustics and sound synthesis. Besides challenges in modeling its mechanism, it is also difficult to understand how far the models are from the actual acoustic instrument and why. Identifying the most prominent aspects of the piano’s sound and evaluating the sound-generation fidelity of associated models are usually addressed with studies based on listening tests. This paper shows how computational methods can provide novel insights into piano analysis and modeling, which can be used to complement perceptual analyses. In particular, our approach identifies audio descriptors that present discriminative differences between types of pianos when these are excited with specific stimuli. The proposed method is used to analyze a collection of recordings from upright acoustic and synthetic pianos, excited with single-played notes, triads, and repeated notes. Results show that the sound generated by the considered types of piano presents major differences in terms of spectral descriptors and constant-Q transform coefficients.
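As one concrete example of a spectral descriptor of the kind used in such analyses, the spectral centroid summarizes where a note's spectral energy is concentrated (this is a generic definition, not the paper's specific feature set):

```python
import numpy as np

def spectral_centroid(signal, sr):
    """Spectral centroid in Hz: the magnitude-weighted mean frequency of a
    windowed spectrum, a common timbral 'brightness' descriptor."""
    mag = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sr)
    return np.sum(freqs * mag) / np.sum(mag)
```

Comparing such descriptors across recordings of an acoustic piano and a synthetic model, for the same stimulus, is the kind of discriminative test the paper proposes.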

A Real-Time Cochlear Implant Simulator - Design and Evaluation

ABSTRACT. This article describes the implementation of a flexible real-time Cochlear Implant (CI) simulator and its preliminary evaluation, designed to investigate whether a specific set of parameters can simulate the musical experience through CIs using Normal Hearing (NH) subjects. A Melodic Contour Identification (MCI) test was performed with 19 NH subjects to identify melodic contours processed by the simulator. The results showed that the participants' precision in identifying melodic contours decreased as the intervals between notes decreased, showing that the reduced spectral resolution increases the difficulty of identifying smaller changes in pitch. These results fall in line with other studies that perform MCI tests on subjects with CIs, suggesting that the real-time simulator can successfully mimic the reduced spectral resolution of a CI. This study validates that the implemented simulator, using a pulse-spreading harmonic complex as a carrier for a vocoder, can partially resemble the musical experience of people with hearing loss using CI hearing technology. This suggests that the simulator might be used to further examine the characteristics that could enhance the music listening experience for people using CIs.

Song Popularity Prediction using Ordinal Classification

ABSTRACT. Predicting a song's success based on audio descriptors before its release is an important task in the music industry, which has been tackled in many ways. Most approaches utilize audio descriptors to predict the success of a song, typically captured by either chart positions or listening counts. The popularity prediction task is then either modeled as a regression task, where the popularity metric is precisely predicted, or as a classification task by, e.g., transforming the popularity task to distinct classes such as hits and non-hits. However, this way of modeling the task neglects that most popularity measures form an ordinal scale. While classification ignores the order, regression assumes that the data is in interval (or ratio) scale. Therefore, we propose to model the task of popularity prediction as an ordinal classification task. Further, we propose an approach that utilizes the relative order of classes in an ordinal classification setup to predict the popularity (class) of songs. Our presented approach requires a machine learning model able to predict the relative order of two pieces of music, and hence can flexibly be applied using many types of predictors. Furthermore, we investigate how different ways of mapping the underlying popularity metrics to ordinal classes influence our model. We compare the proposed approach with regression as well as classification models and show its robustness w.r.t. different numbers of ordinal classes and the distribution of the number of songs assigned to them. Additionally, we show that, for some prediction settings, our approach results in a better predictive performance than classical regression and classification approaches, while it achieves similar predictive performance on other settings.
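The core reduction, predicting an ordinal class from a pairwise more-popular-than model, can be sketched by counting how many class-boundary anchor examples a song outranks. The comparator and anchors below are hypothetical stand-ins, not the paper's trained model:

```python
def ordinal_class(song, anchors, more_popular):
    """Assign an ordinal popularity class by counting how many class-boundary
    anchor songs the candidate outranks. `anchors` are boundary examples
    ordered from least to most popular; `more_popular(a, b)` is the pairwise
    predictor returning True when a is predicted more popular than b."""
    return sum(more_popular(song, a) for a in anchors)
```

Because only relative order matters, any pairwise predictor can be plugged in, which is the flexibility the abstract claims for the approach.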

Sonifying energy consumption using SpecSinGAN

ABSTRACT. In this paper we present a system for the sonification of the electricity drawn by different household appliances. The system uses SpecSinGAN as the basis for the sound design: an unconditional generative architecture that takes a single one-shot sound effect (e.g., a fire crackle) and produces novel variations of it. SpecSinGAN is based on single-image generative adversarial networks that learn from the internal distribution of a single training example (in this case the spectrogram of the sound file) to generate novel variations of it, removing the need for a large dataset. In our system, we use a Python script on a Raspberry Pi to receive the data on the electricity drawn by an appliance via a Smart Plug. The data is then sent to a Pure Data patch via Open Sound Control. The electricity drawn is mapped to the sound of fire, which is generated in real time in Pure Data by mixing different variations of four fire sounds - a fire crackle, a low-end fire rumble, a mid-level rumble, and hiss - which were synthesised offline by SpecSinGAN. The result is a dynamic fire sound that is never the same, and that grows in intensity depending on the electricity consumption. The density of the crackles and the level of the rumbles increase with the electricity consumption. We pilot-tested the system in two households, and with different appliances. Results confirm that, from a technical standpoint, the sonification system responds as intended, and that it provides an intuitive auditory display of the energy consumed by different appliances. In particular, this sonification is useful in drawing attention to "invisible" energy consumption. Finally, we discuss these results and future work.
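The consumption-to-sound mapping described above can be sketched as a clamped, log-scaled transfer function from power draw to crackle density; every constant below is an illustrative assumption, not the authors' calibration:

```python
import math

def crackle_rate(watts, w_min=5.0, w_max=2500.0, r_min=0.5, r_max=30.0):
    """Map instantaneous power draw (W) to fire-crackle density (crackles/s),
    clamped and log-scaled so small appliances remain audible."""
    w = min(max(watts, w_min), w_max)
    frac = math.log(w / w_min) / math.log(w_max / w_min)
    return r_min + frac * (r_max - r_min)
```

A log scale compresses the huge dynamic range between, say, a phone charger and a kettle, so both remain perceptibly distinct in the auditory display.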

15:00-16:00 Session SMC Keynote 1: Miller Puckette

KEYNOTE: Miller Puckette, University of California San Diego, USA

Location: Kungasalen
16:00-17:00 Session SMC Papers 2
Location: Kungasalen
Web Applications for Automatic Audio-to-Score Synchronization with Iterative Refinement

ABSTRACT. The task of aligning a score to corresponding audio is a well-studied problem of particular relevance for a number of applications. Having this information allows users to explore the materials in unique ways and build rich interactive experiences. This contribution presents web applications that deal with the problem by implementing a two-step synchronization process. The first step implements a score-informed alignment while the second one can be seen as a further refinement, particularly useful for a previous manual or semi-automatic synchronization. These web implementations are specifically conceived to work with the IEEE 1599 standard, which allows for multiple instances of scores and audio renderings to be mutually synchronized together. By adopting web technologies, users are not tied to any specific platform. Evaluations of the performances and current limitations of these processes will be presented.
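Alignment tasks of this kind are commonly built on dynamic time warping of feature sequences. A minimal 1-D DTW sketch (the applications' actual two-step pipeline and IEEE 1599 handling are not reproduced here):

```python
import numpy as np

def dtw_path(a, b):
    """Dynamic time warping of two 1-D feature sequences: returns the total
    alignment cost and the optimal warping path as (i, j) index pairs."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(a[i - 1] - b[j - 1]) + min(D[i - 1, j - 1],
                                                     D[i - 1, j], D[i, j - 1])
    # Backtrack from the end to recover the alignment.
    path, i, j = [(n - 1, m - 1)], n, m
    while (i, j) != (1, 1):
        steps = []
        if i > 1 and j > 1:
            steps.append((D[i - 1, j - 1], i - 1, j - 1))
        if i > 1:
            steps.append((D[i - 1, j], i - 1, j))
        if j > 1:
            steps.append((D[i, j - 1], i, j - 1))
        _, i, j = min(steps)
        path.append((i - 1, j - 1))
    return float(D[n, m]), path[::-1]
```

The recovered path is exactly the score-to-audio correspondence an iterative refinement step can then tighten locally.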

Developing and evaluating a Musical Attention Control Training game application

ABSTRACT. Musical attention control training (MACT) is a Neurologic Music Therapy (NMT) technique to strengthen attention skills for people who may have attention deficits, for instance related to ADHD or Parkinson's Disease (PD), activating different parts of the brain and stimulating neural connectivity. While multiple interventions per week would enhance the effect of MACT, attending several sessions a week with a therapist can be challenging. Applied game interventions implementing MACT, which can be played at home, could offer complementary training to the limited number of therapy sessions. While applied games have been shown to facilitate successful interventions for cognitive impairments, to date no game exists based on MACT. We propose a novel approach to research the plausibility of applied games to support NMT, conclude game requirements for the specific needs of People with PD (PwPD), and introduce a game that emulates a MACT session. We carried out a pilot experiment to gauge how users interact with the game and its efficacy in attention control training with non-PD participants, letting them play 10 game intervention sessions within two weeks. Although no significant short-term attention effects were observed in this timeframe, user evaluations and metrics of game performance suggest that gamified MACT could be a promising supplement to conventional MACT for improving attention skills to optimize the quality of life of PwPD.


ABSTRACT. One long-term goal of physics-based sound synthesis and audio effect modeling has been to open the door to models without a counterpart in the real world. Less explored has been the fine-grained adjustment of the constituent physical laws that underpin such models. In this paper, the introduction of a nonlinear damping law into a plate reverberation model is explored, through the use of four different functions transferred from the setting of virtual-analog electronics. First, a case study of an oscillator with nonlinear damping is investigated. Results are compared against linear dissipation, illustrating differing spectral characteristics. To solve the systems, a recently proposed numerical solver is employed that entirely avoids the use of iterative routines such as Newton-Raphson for solving nonlinearities, thus allowing very efficient numerical solution. This scheme is then used to simulate a plate reverberation unit, and tests are run to investigate spectral variations induced by nonlinear damping. Finally, a musical case is presented that includes frequency-dependent damping coefficients.
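The qualitative effect of a nonlinear damping law can be illustrated with a toy oscillator. The sketch below uses plain semi-implicit Euler integration and a cubic damping term, one plausible nonlinear choice; the paper's non-iterative solver is a different, more efficient scheme:

```python
import numpy as np

def oscillator(damping, x0=1.0, w0=4 * np.pi, dt=1e-4, steps=50000):
    """Simulate x'' + damping(x') + w0^2 * x = 0 with semi-implicit Euler;
    `damping` maps velocity to a damping force per unit mass."""
    x, v = x0, 0.0
    out = np.empty(steps)
    for k in range(steps):
        v += dt * (-w0 ** 2 * x - damping(v))
        x += dt * v
        out[k] = x
    return out

linear = oscillator(lambda v: 2.0 * v)       # exponential amplitude decay
cubic = oscillator(lambda v: 0.05 * v ** 3)  # nonlinear: weak at low velocity
```

With these constants the linearly damped oscillation decays exponentially, while the cubic damping, which weakens as velocity drops, leaves a much larger residual amplitude after the same simulated time; it is this amplitude-dependent decay that alters the spectral evolution of a reverberant tail.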

Ding-dong: Meaningful Musical Interactions with Minimal Input

ABSTRACT. Digital Musical Instruments have given us the power to create unique musical systems of performance, often for people with no musical experience. The prevalence of gestural interfaces with a high number of parameters and limitless mapping possibilities has blossomed in this context. Yet this same flexibility at times leads to creative paralysis, contrary to the presentation of these interfaces as transparent vessels for untapped musical imaginations.

This paper outlines a new work, 'The Doorbell', created to investigate how minimal input might produce meaningful musical results. Building on work around constrained interfaces and one-button controllers, the work affords a fun, performative musical experience using only the input of a single button, encouraging anyone to discover and perform a surprising depth of musical possibilities through a household object.

By stripping back input variables and taking advantage of natural musical affordances of the doorbell, 'The Doorbell' questions what elements of the interface offer new areas of exploration for DMIs more generally, and how musical narrative and precomposition are contributing factors to a meaningful musical interaction.

17:30-18:30 Session SMC Concert 1


Note: For the exact times of the pieces, please refer to the concert program.

Location: Lilla Salen
Matters 5

ABSTRACT. Matters 5 is a fixed media composition for 8 to 24 speakers. The piece formally consists of two parts, algorithmically generated with similar granular synthesis based on two short double bass samples. The scattered groups of grains – statistically distributed following combinatorial rules – have been processed by 24 reverb units with independent and partially extreme parameter changes. The intended room impression is one of surreality and blurring, corresponding to the formal development.

Stoppages Vol. 1

ABSTRACT. ‘Stoppages Vol. 1’ is a collection of unedited recordings synthesized by an algorithm that produces streams of numbers that can approach or reach infinity via divide-by-zero errors. Due to the physical limitations of modern silicon microprocessors, computers cannot produce values of infinite size. On a 32-bit processor, the largest value an unsigned integer can hold is 2^32 − 1. When this maximum value is approached, met, or surpassed, the computer (being the 100% deterministic machine that it is) doesn’t know what to output. It has reached a state of not-knowing and abruptly reaches a stoppage. What you’re hearing is indeterminacy emanating from the CPU’s physical inability to reproduce this audio data, not computer-generated pseudo-randomness. This is similar to a record player’s inability to accurately play back a groove in a vinyl record that is impeded by static or dust; to a tape deck’s inability to accurately play back a cassette with mangled magnetic particles; to a CD player’s inability to accurately play back a scratched disc. The failures of vinyl, tape, and CD expose the physical materials that contain the sonic objects we are intended to hear by replacing them with auditory defects and malfunctions. In the case of this music, fractures and discontinuities in the digital medium are amplified and brought to the ear’s forefront. The algorithms executed in this music expose the material collapse of a silicon microprocessor pushed to its limit.
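The two failure modes the text invokes, integer wraparound at the 32-bit limit and division by zero, are easy to demonstrate (a NumPy illustration, not the piece's unpublished algorithm):

```python
import numpy as np

top = np.int32(2**31 - 1)                 # largest signed 32-bit integer
with np.errstate(over='ignore', divide='ignore'):
    wrapped = top + np.int32(1)           # wraps to the most negative int32
    unbounded = np.float32(1.0) / np.float32(0.0)   # IEEE 754 gives +inf
```

Both outcomes are fully deterministic, which is the point the abstract makes: the "stoppage" arises from the arithmetic's hard limits, not from any pseudo-random process.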

Windmills of Lapua

ABSTRACT. “Direct impacts of wind farms can include collision and barotrauma (damage to tissues from air pressure changes around turbines); indirect impacts can include habitat loss (roosts, commuting routes and foraging areas) and fragmentation.” –

This piece comments on the impact of wind farms on the bat population. It uses field recordings of windmills taken in Lapua, Finland using microphones that pick up normally inaudible frequencies (such as contact and electromagnetic mics), combined with various field recordings of bats.

Originally, the piece explores the use of field recordings with the 3D IKO speaker and investigates how a sense of place can be created using a speaker that projects sound from the inside outwards using the sound reflections of the performance space. This work was made possible thanks to the Develop your Creative Practice grant from Arts Council England.

Status I

ABSTRACT. The work is a study on spatialisation techniques and consists of nine stereophonic tracks and fifteen statuses. The horizontal structure is given by the succession of statuses representing different perspectives of a sound sculpture (never presented in its entirety). Like a succession of images that, through different perspectives, show a set of details of a marble sculpture. Similarly, the statuses consist of a vertical layering of sound elements (one per track) frozen in time and determining the sound sculpture from certain listening points. The performer's function is to experience each sound status through diffusion and spatialisation in space (spatialisation as a form of augmented listening). The performer must choose the spatial interpretation of each status, the spatialisation strategy to be applied and the sound system in which to perform the piece.

note: the uploaded track is a demonstration example of the interpretation/spatialization of the work through a binaural system (Ambisonic ToolKit ATK for Reaper with HRTF Cipic 0021 - KEMAR Large Pinnae Dummy).