
09:00-11:00 Session SMC Papers 6
Location: Kungasalen
Sound Design Strategies For Latent Audio Space Explorations Using Deep Learning Architectures
PRESENTER: Kıvanç Tatar

ABSTRACT. Research on Deep Learning applications in sound and music computing has gathered interest in recent years; however, there is still a missing link between these new technologies and how they can be incorporated into real-world artistic practices. In this work, we explore a well-known Deep Learning architecture, the Variational Autoencoder (VAE). These architectures have been used in many areas to generate latent spaces in which similar data points are located close to each other. Previously, VAEs have been used to generate latent timbre spaces or latent spaces of symbolic music excerpts. Applying a VAE to audio features of timbre requires a vocoder to transform the timbre generated by the network into an audio signal, which is computationally expensive. In this work, we apply VAEs to raw audio data directly, bypassing audio feature extraction. This approach allows practitioners to use any audio recording, giving flexibility and control over the aesthetics through dataset curation. The lower computation time in audio signal generation allows the raw-audio approach to be incorporated into real-time applications. We propose three strategies for exploring latent spaces of audio and timbre for sound design applications. By doing so, our aim is to initiate a conversation on artistic approaches and strategies for utilizing latent audio spaces in sound and music practices.
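
A minimal sketch of the core VAE mechanism (encode to a latent point, sample via the reparameterization trick, decode back to raw audio) is given below. The weights, layer shapes and frame length are illustrative placeholders with untrained random values, not the authors' trained architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 512-sample raw-audio frames, 16-D latent space.
# All weights are random placeholders; a real model would be trained.
frame_len, latent_dim = 512, 16
W_enc = rng.standard_normal((frame_len, 2 * latent_dim)) * 0.01
W_dec = rng.standard_normal((latent_dim, frame_len)) * 0.01

def encode(x):
    """Map a raw-audio frame to the mean and log-variance of q(z|x)."""
    h = x @ W_enc
    return h[:latent_dim], h[latent_dim:]

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps (the reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def decode(z):
    """Map a latent point straight back to a raw-audio frame (no vocoder)."""
    return np.tanh(z @ W_dec)

x = rng.standard_normal(frame_len)            # stand-in for one audio frame
mu, log_var = encode(x)
z = reparameterize(mu, log_var)
x_hat = decode(z)

# Exploring the latent space = decoding interpolated points between frames.
z2 = reparameterize(*encode(rng.standard_normal(frame_len)))
morph = decode(0.5 * (z + z2))
```

Decoding straight to samples, as in the last two lines, is what removes the vocoder step the abstract mentions.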

A real-time cent-sensitive strobe-like tuning software based on spectral estimates of the Snail-Analyser

ABSTRACT. This paper presents real-time software for tuning musical instruments. The visual rendering emulates a tuner commonly used by the music industry, namely the strobe tuner. In the classic case, a tuned note is characterised by the immobility of a dial: through a stroboscopic effect, the rotation speed of the dial is made proportional to the deviation in Hertz between the target pitch and the played note. Here, we propose to use spectral estimates of the Snail-Analyser to obtain a similar rendering with respect to the deviation in cents, with an adjustable sensitivity. These estimates are derived from the demodulated phase calculated by Fourier analysis. The software allows the targeted frequencies to be chosen for each note, according to musical considerations such as temperaments, musical modes, octave stretching, etc. It was prototyped in Max and developed with the JUCE framework to target both desktop and mobile environments.
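
The cent deviation that drives such a strobe-like rendering is straightforward arithmetic. The sketch below illustrates only that arithmetic; `dial_speed` and its `sensitivity` gain are hypothetical names for illustration, not the Snail-Analyser's API:

```python
import math

def cents_deviation(f_played, f_target):
    """Deviation of the played frequency from the target, in cents."""
    return 1200.0 * math.log2(f_played / f_target)

def dial_speed(f_played, f_target, sensitivity=1.0):
    """Strobe-like rendering: dial rotation speed proportional to the cent
    deviation (classic strobe tuners use the deviation in Hz instead).
    `sensitivity` stands in for the adjustable sensitivity in the paper."""
    return sensitivity * cents_deviation(f_played, f_target)

# A perfectly tuned A4 keeps the dial still; a slightly sharp one rotates.
print(dial_speed(440.0, 440.0))                  # 0.0
print(round(cents_deviation(441.0, 440.0), 2))   # 3.93 (cents sharp)
```

The octave-stretching and temperament options mentioned in the abstract amount to choosing a different `f_target` per note.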

DJeye: Towards an Accessible Gaze-Based Musical Interface for Quadriplegic DJs

ABSTRACT. Despite recent advancements in the development of accessible musical interfaces for individuals with limited motor capabilities such as quadriplegia, DJ-ing remains a relatively inaccessible musical activity. To address this issue, we propose the design and implementation of DJeye, an eye-tracking-based software interface that supports typical basic mixing operations such as crossfading, filtering, looping, track seeking, and more. The interface is founded upon established design principles for gaze-based musical interfaces, and introduces specific eye-interaction methods involving winking. Although it is currently at the prototype stage, we conducted case studies to evaluate the proposed interaction methods and explore which functions may be of interest to end-users. The study was conducted with amateur DJs, who took part in interviews, think-aloud sessions, and questionnaires. The results of the case studies are analyzed in the paper to provide insights into future directions and developments.

Principal Component Analysis of binaural HRTF pairs

ABSTRACT. Principal Component Analysis (PCA) has often been used for HRTF compression and individualization. Most commonly, when creating the PCA input matrix, each ear is handled as an independent observation, which essentially doubles the available observations but also the principal component weights that must be calculated to reconstruct the complete dataset. It is therefore interesting to investigate the extent to which the ears can be handled jointly when creating the HRTF input matrix. Here, we explore one way to do so by comparing the standard method of handling ears in the PCA input matrix to a variation in which ears are handled jointly. We performed simulations using three different HRTF databases, involving linear and logarithmic HRTF magnitude spectra, and calculated the number of components required to explain 90% of the variance as well as a spectral distortion measure. Results show that the proposed approach is not as efficient in terms of the number of components required; however, spectral distortion is not affected as much by the alternative representation. Furthermore, the resulting components provide insight into how the HRTFs of the two ears are related.
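
The two ways of building the PCA input matrix can be illustrated with toy data: stacking the ears as separate rows (independent observations) versus concatenating their spectra per subject (joint observations). The synthetic "HRTFs" below are random placeholders standing in for a real database:

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj, n_freq = 50, 64
# Toy log-magnitude "HRTFs": a shared spectral shape plus per-ear variation.
base = rng.standard_normal((n_subj, n_freq))
left = base + 0.1 * rng.standard_normal((n_subj, n_freq))
right = base + 0.1 * rng.standard_normal((n_subj, n_freq))

def n_components_90(X):
    """Number of principal components explaining 90% of the variance."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s**2 / np.sum(s**2)
    return int(np.searchsorted(np.cumsum(var), 0.90) + 1)

independent = np.vstack([left, right])   # each ear is one observation
joint = np.hstack([left, right])         # both ears form one observation

print("independent:", n_components_90(independent))
print("joint:", n_components_90(joint))
```

The joint matrix has half the rows but twice the columns, which is exactly the trade-off between observation count and weight count that the abstract describes.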

Multi-Source Contrastive Learning for Musical Audio

ABSTRACT. Contrastive learning constitutes an emerging branch of self-supervised learning that leverages large amounts of unlabeled data by learning a latent space in which pairs of different views of the same sample are associated. In this paper, we propose musical source association as a pair generation strategy in the context of contrastive music representation learning. To this end, we modify COLA, a widely used contrastive audio learning framework, to learn to associate a song excerpt with a stochastically selected and automatically extracted vocal or instrumental source. We further introduce a novel modification to the contrastive loss to incorporate information about the existence or absence of specific sources. Our experimental evaluation on three different downstream tasks (music auto-tagging, instrument classification and music genre classification), using the publicly available Magna-Tag-A-Tune (MTAT) as a pre-training dataset, yields results competitive with existing methods in the literature, as well as faster network convergence. The results also show that this pre-training method can be steered towards specific features, according to the selected musical source, while also being dependent on the quality of the separated sources.
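
A COLA-style contrastive objective (bilinear similarity plus multi-class cross-entropy over in-batch negatives) can be sketched with random stand-in embeddings. This illustrates only the shape of the loss, not the paper's modified source-aware loss:

```python
import numpy as np

rng = np.random.default_rng(2)
batch, dim = 8, 32

# Stand-ins for encoder outputs: g(mix excerpt) and g(separated source).
# In COLA a positive pair shares a song; here anchors and positives are
# correlated random vectors, for illustration only.
anchors = rng.standard_normal((batch, dim))
positives = anchors + 0.1 * rng.standard_normal((batch, dim))
W = rng.standard_normal((dim, dim)) * 0.1   # bilinear similarity weights

def cola_loss(a, p, W):
    """Multi-class cross-entropy: each anchor must pick out its own
    positive from the batch (all other positives act as negatives)."""
    sim = a @ W @ p.T                        # (batch, batch) similarities
    sim -= sim.max(axis=1, keepdims=True)    # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

print(round(cola_loss(anchors, positives, W), 4))
```

In the paper's setup the positive view would be a separated vocal or instrumental stem of the anchor excerpt rather than a correlated random vector.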

Music Boundary Detection Using Local Contextual Information Based on Implication-Realization Model

ABSTRACT. In this study, we propose a novel melodic boundary detection method that uses the analysis results of the Implication-Realization (I-R) model as machine learning features. Melodic boundary detection is the task of identifying perceptual boundaries within a note sequence, such as phrase endings. An input feature that can express melodic expectation is important for detecting perceptual boundaries. We therefore propose a melodic boundary detection method that incorporates features based on the I-R model, a model of local melodic expectation. To investigate their usefulness, we studied the impact of the I-R features on boundary detection performance. The results showed that adding the I-R features improves the boundary detection F-measure by three points, exceeding the previous state of the art.

Daisy Dub: a modular and updateable real-time audio effect for music production and performance

ABSTRACT. This paper presents the development of a versatile and modular real-time audio effect unit, the Daisy Dub, aimed at music producers, performers and DJs. The device utilises the state-of-the-art Daisy Seed development board by Electrosmith and comes with a custom-made PCB and hardware case. It features a range of real-time audio effects, with an emphasis on creative delays, and includes a range of modulation effects and filters in its feedback path. In addition, the unit is compact and portable, with an interactive graphical interface, four knobs and two arcade buttons for performance control, an encoder for menu diving, and an OLED screen for spectrum analysis. It offers quadraphonic audio processing in real time and is powered by USB Type-C. The DSP was developed with Gen by Cycling '74 and took inspiration from state-of-the-art hardware audio delay units and modular real-time audio effect processors. A series of DSP and hardware evaluation tests was performed along with two usability tests, from an early cardboard prototype to the final manufactured device, to evaluate its effectiveness and usability. Our research demonstrates the feasibility and potential of creating a versatile and modular real-time audio effect for music production and live performance. The Daisy Dub offers a modern take on contemporary real-time audio effects, emphasising a delay effect, and we believe it contributes to the field of audio effects and music-making in general.

Quantifying the Extended Acceptance of Pioneering Art Music Through the Creation of Electroacoustic Music

ABSTRACT. Since 2009, the laboratory to which the first author of this paper belongs has been running a workshop for creating electroacoustic music in collaboration with inexperienced people from various generations. The authors developed this workshop as a participatory art project activity in which instructors and participants collaborate to create a work of music following a workshop and perform it at a concert, and it has been operating as such since 2015. Although electroacoustic music is considered obscure and currently has few listeners or creators, the workshop saw many inexperienced participants enjoying making electroacoustic music. This paper details our study’s verification of the hypothesis that participants’ level of acceptance of “difficult music” increased after experiencing the workshops. The reason for attempting this verification is not to affirm contemporary music itself and impose existing values surrounding it on a diverse population, but to clarify the cognitive effects of electroacoustic music and create an educational method based on these effects. In this verification, a questionnaire using the semantic differential (SD) method was administered to the creators before and after creating music. The results showed that the creators’ impressions of the listening materials improved after creating music.

Exploring polyphonic accompaniment generation using generative adversarial networks

ABSTRACT. Recently, various neural network architectures have shown themselves capable of achieving compelling results in the field of automatic music generation. Motivated by this, in this work we design a generative framework that is structurally flexible and adaptable to different musical configurations and practices. First, we examine the task of multi-track music generation without any human input by modifying and proposing improvements to the MuseGAN architecture, an established GAN-based system, which we use as our baseline. We then extend our framework to a cooperative human-AI setup for the generation of polyphonic accompaniments to user-defined tracks. We experiment with multiple structural variants of our model and two different conditional instruments, namely piano and guitar. For both the unconditional and conditional cases, we evaluate the produced samples objectively, using a set of widely used musical metrics, as well as subjectively, by conducting a listening test with 40 subjects. The experimental results, using the Lakh Pianoroll Dataset, reveal that our proposed modifications lead to improvements over the baseline from an auditory perspective in the unconditional case, and also provide useful insights about the properties of the produced music in the conditional setup, depending on the configuration used.

Vibrotactile feedback enhances perceived arousal and listening experience in music

ABSTRACT. In this study, we measured the effects of vibrotactile feedback on the perception of music performance. Vibration signals were produced by transducers under a tabletop or under a chair and played simultaneously with an audio recording of solo cello music. Vibration types were either the signal recorded from the front plate of the cello simultaneously with the audio recording or white noise following the recorded amplitude. Perceived arousal was measured continuously from N=30 participants. In comparison to non-vibrating control conditions, especially sound-matching vibrations enhanced perceived arousal significantly. Increased amplitude of vibrotactile feedback had a positive but small effect on perceived arousal. In a post-experiment interview, participants described a higher sense of presence and embodiment with sound-matching vibrations.

11:00-12:00 Session SMC Keynote 3: Lise-Lotte Norelius

  KEYNOTE   Lise-Lotte Norelius, eam composer and performer of live-electronics and percussion, Stockholm, Sweden

Location: Kungasalen
12:00-12:45 Session SMC Concert 5


Lise-Lotte Norelius' keynote talk continues with the piece You are the Flower. 

Note: For the exact times of the pieces, please refer to the concert schedule.

A Dialogue, In Linear A

ABSTRACT. Linear A is a writing system that was used by the Minoans (Cretans) from 1800 to 1450 BCE to write the hypothesized Minoan language. Linear A was the primary script used in palace and religious writings of the Minoan civilization. It was succeeded by Linear B, which was used by the Mycenaeans to write an early form of Greek. No texts in Linear A have been deciphered, and thus, it is unknown what this dead language should sound like.

A Dialogue, In Linear A utilizes text from libation artifacts and (presumably) religious objects, imagining what these fragments might have sounded like when spoken by various text-to-speech dialects. A heightened ritual is created, trading any semblance of semantic content for timbral transformation of the voice. A Dialogue, In Linear A seeks to blend features of musique concrète, aural theatre, vocal synthesis, field recordings, and an unknowable ceremonial rite among the sands of time.

The aim of the video was to connect mouth positions with syllables by matching the sound generated by the text-to-speech converter with the audio of unrelated speech video clips. We decided to divide the audio into 40-millisecond fragments, as this corresponds to a single video frame, so as to have the same index for each audio and video segment. The idea was to perform similarity matching using the sound from the converter as the target and the video sound as the corpus. Davor Vincze designed a Max patch using the camu tools within the MuBu library, such that it automatically recorded the index of the selected corpus segment while I fed the CSS system with the recording of the target soundfile. After several tries, we concluded that the matching was most successful with centroid and loudness as search descriptors.

With the resulting lists of segment indices, Andrew Watts was able to cut and reshuffle the six videos in exactly the same order. He then montaged the alternative soundfile (from the text-to-speech converter) onto the newly montaged video. The result was fascinating, as there was an obvious congruency between the mouth movements in the video and the character of the sound. However, the fact that the video frames were no longer in their nominal order created uncanny mouth movements, as would be expected of something alien or otherworldly.

At SMC2023 the ideal performance location would be Nathan Mielstein - chamber music hall, since this work is for fixed media (video and stereo audio).

Impression of the Pagoda

ABSTRACT. Pagoda as a piece of architecture reflects history, aesthetics, religion, philosophy among many other cultural elements. Pagoda as a concept reminds me of each unique yet contemplative journey visiting different temples. The rituals of recitation and chanting practiced in the temples offer the observers interfaces connecting themselves and the surroundings in different ways. In this piece, the impressions of the three pagodas are depicted in different audiovisual approaches through the real-time interactive performance. Performance video link:

Post-Music #33:2.2

ABSTRACT. Year: 2021 Duration: 15'15" Format: fixed media, stereo audio, single-channel video Link:

Program note: "In a post-apocalyptic world, a man scavenges for sound-making objects to soothe his child."


This work is a shortened version of 'Post-Music #33', originally commissioned and premiered by BRD Scene9 Residency, Bucharest (RO) in March 2020. It is however a reworked, stand-alone piece, and can be played back as such without reference to the longer version.

Please note that all the audiovisual material in this piece is intended as fixed media. No part of it should/could be played live.

For playback, the piece requires:
- a computer that can play back 4K video smoothly;
- two speakers in stereo arrangement and a subwoofer;
- a projector with at least 1920 x 1080 resolution (16:9 aspect ratio);
- a large projection screen.

This submission is for the Nathan Mielstein - Chamber Music Hall.

12:45-14:00 Lunch Break
Location: Oktav
14:00-15:30 Session SMC Papers 7
Location: Kungasalen
A digital toolbox for musical analysis of computer music: exploring music and technology through sonic experience

ABSTRACT. Following earlier research in which we used software to create interactive aural analyses of key works from the computer music repertoire, this paper introduces a set of digital tools, TIAALS (Tools for Interactive Aural Analysis), designed to enable other musicologists, who may not themselves be programmers, to produce their own interactive aural analyses of computer music, or indeed music more generally.

Using software has significant advantages over purely written or paper-based analyses: it enables people to learn about both the music and the technology behind it through interactive sonic experience. The software tools can be used to present musical structures aurally using interactive charts, elements of which can be heard by clicking on them to play relevant audio segments. Charts can also be dynamic, changing to reflect the temporal progression of the work or controlled manually to highlight different features. Sonograms can also be ‘live’ and interactive, allowing readers to manipulate the sound. The tools facilitate the incorporation of (annotated) videos, such as interviews/demonstrations by the composers and/or others involved in the musical or technical production of the work. Our previous analyses have also emulated the techniques composers used so ‘readers’ can play with them, learning about the creative processes not just as abstract theory but through practical engagement with sound.

This paper introduces the main elements of TIAALS and shows how it can be used to incorporate many of the features described above into integrated analytical packages. We explain how the tools work and consider the potential, as well as limitations, of the toolbox and the interactive aural approach.

Dynamical Complexity Measurement with Random Projection: A Metric Optimised for Realtime Signal Processing

ABSTRACT. There are many metrics available for observing the degree of complexity of a signal, with multiple applications in computer music. Previous work demonstrated that the Effort To Compress metric could be used to modulate the behaviour of a feedback instrument; however, the algorithm is challenging to run in real time. This research surveys the available metrics and evaluates a selection of them for their suitability for real-time signal processing in musical instruments. A new metric is proposed and evaluated: Random Projection Complexity.
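
As a point of reference for this family of compression-based complexity measures, the sketch below computes an LZ78-style phrase count, a classic relative of Effort To Compress. It is only an illustrative baseline; the paper's proposed Random Projection Complexity metric is defined in the paper itself:

```python
import random

def lz_phrases(s):
    """LZ78-style distinct-phrase count of a symbol string: a standard
    compression-based complexity measure. Ordered signals parse into few
    long phrases; random signals into many short ones."""
    phrases, cur = set(), ""
    for ch in s:
        cur += ch
        if cur not in phrases:   # end of a new phrase
            phrases.add(cur)
            cur = ""
    return len(phrases) + (1 if cur else 0)

random.seed(0)
periodic = "01" * 512                                      # highly ordered
noisy = "".join(random.choice("01") for _ in range(1024))  # random

print(lz_phrases(periodic), lz_phrases(noisy))   # ordered < random
```

A realtime audio version would binarize short signal windows (e.g. by thresholding at the median) before counting phrases, which is one reason such metrics are costly at audio rates.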

VocalHUM: real-time whisper-to-speech enhancement for patients with vocal frailty

ABSTRACT. VocalHUM is a smart system aiming to enhance the intelligibility of patients’ whispered speech in real time, based on audio data only. Its purpose is to facilitate patient-caregiver communication, and it is primarily designed for patients in a temporary or prolonged state of physical and vocal frailty (e.g., respiratory infections, geriatric weakness, or partial/total paralysis). The resulting whisper-to-speech algorithm is language-independent and combines real whispered speech, synthesized vowels, and consonant enhancement techniques.

TickTacking -- Drawing trajectories with two buttons and rhythm

ABSTRACT. The navigation of two-dimensional spaces by rhythmic patterns on two buttons is investigated. It is shown how direction and speed of an object can be controlled with two fingers by duplets or triplets of taps, and how the generated rhythms can be used to monitor the object movement. The trajectories of the object moving on the plane resemble those of sailing boats, and the proposed rhythmic navigation system is tested with a target-following task, using a boat-racing trace as the target. The interface is minimal and symmetric, and can be adapted to different sensing devices, exploiting the symmetry of the human body and the capability to follow two concurrent rhythmic streams.

Polyspring: a Python toolbox to manipulate 2-D sound database representations

ABSTRACT. Corpus-based concatenative sound synthesis is typically used with a projection or reduction of the sound parameter space to a 2-dimensional map, where sound segments form point clouds that can be visualized and explored with a mouse or a touch interface. While this works well with visual feedback, where possibly sparse and heterogeneous sound spaces can be easily controlled, it remains challenging or impractical without visual feedback and with whole-body movements.

We present polyspring, a Python toolbox dedicated to manipulating the distribution of a set of points in a 2-Dimensional plane. This package implements an algorithm based on a spring network simulation that can redistribute points according to a density target within a given bounded region while preserving the initial order between points. We made several modifications and additions to the previously published unispring algorithm to allow for concurrently interacting with the dataset and manipulating the distribution in real time. The toolbox is open-source and can be used with Max/MSP. We also present different applications of this toolbox in movement-based sound interaction.
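
The redistribution idea can be sketched as a repulsive spring relaxation: points closer than a rest length push each other apart while staying inside a bounded region. This is a much-simplified stand-in for the published unispring/polyspring algorithm, with made-up parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
pts = rng.random((100, 2)) ** 2        # clustered initial 2-D point cloud

def spring_step(pts, rest=0.08, dt=0.1):
    """One relaxation step of a simple repulsive spring network: points
    closer than the rest length push apart, spreading the cloud toward a
    more uniform density inside the unit square."""
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)       # no self-interaction
    safe = np.maximum(dist, 1e-9)        # guard against zero distances
    # Hooke-like repulsion, active only when closer than the rest length.
    f = np.maximum(rest - dist, 0.0) / safe
    force = (f[:, :, None] * diff).sum(axis=1)
    return np.clip(pts + dt * force, 0.0, 1.0)  # stay in the bounded region

for _ in range(50):
    pts = spring_step(pts)
```

The real algorithm additionally targets an arbitrary density function and preserves the initial ordering between points, which this toy version does not attempt.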

XR etudes for augmented piano

ABSTRACT. Studi sulla Realtà Nuova (in English, Etudes on the New Real) is a cycle of etudes created to explore the possibilities of XR in the context of live multimedia performance. Although many etudes have been created, only three have been performed in public; two of them are for augmented piano. The performer wears an Augmented Reality (AR) headset and hand trackers. This setup allows them to interact with Virtual Objects (VOs) and control visuals and audio. The environment used for the XR performance allows one to control up to 7 different video outputs and an n-speaker 3D audio configuration. VOs, as well as the audiovisual outputs, react to the performer's gestures. The different aspects of the performance develop following a pseudo-narrative outline. One of the key points of interest is the relationship between performer and audience, specifically in terms of transferring the immersive experience: while the performer is immersed in an AR space, the audience can only see 2D screens. The solution we experimented with consists of providing multiple projections and spatial audio. The paper describes the etudes realized for augmented piano from a technical and artistic point of view. Space is also given to discussing issues of instrumental performance in an XR setting.

Automatic legato transcription based on onset detection

ABSTRACT. This paper focuses on the transcription of performance expression and, in particular, legato slurs in solo violin performance. This can be used to improve automatic music transcription and enrich the resulting notations with expression markings. We review past work in expression detection and find that while legato detection has been explored, its transcription has not. We propose a method for demarcating the beginning and ending of slurs in a performance by combining pitch and onset information produced by ScoreCloud (music notation software with transcription capabilities) with articulated onsets detected by a convolutional neural network. To train this system, we build a dataset of solo bowed violin performance featuring three different musicians playing several exercises and tunes. We test the resulting method on a small collection of recordings of the same excerpt of music performed by five different musicians. We find that this signal-based method works well in cases where the acoustic conditions do not interfere greatly with the onset strengths. Further work will explore data augmentation to make the articulation detection more robust, as well as an end-to-end solution.

Real-Time Implementation of the Kirchhoff Plate Equation using Finite-Difference Time-Domain Methods on CPU

ABSTRACT. In this paper, we develop real-time applications of the Kirchhoff plate equation with loss and tension, including virtual instruments and a plate reverb, by means of numerical simulation using finite-difference time-domain (FDTD) methods. They are implemented on central processing units (CPUs) and optimized with loop unrolling or Advanced Vector Extensions (AVX), enabling the applications to execute in real time at fast speeds. The applications are developed as Pure Data (Pd) externals, which can serve as objects in Pure Data, a real-time graphical programming environment for audio and graphics. Multiple inputs (excitation or audio signals) and outputs, whose positions are free to change in real time, are supported, and physical parameters can be manipulated dynamically in real time, which makes it possible for users to create both realistic sounds and new sounds that cannot be generated in the real world. Additionally, these Pd externals can be used as modules to build Pd patches, which provides further possibilities for experimental artists.
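
The core of such a simulation is an explicit FDTD update of the plate equation. The sketch below covers only the lossless, tension-free case with periodic boundaries and toy parameters, not the paper's optimised Pd externals:

```python
import numpy as np

# Minimal explicit FDTD step for the lossless Kirchhoff plate,
# u_tt = -kappa^2 * laplacian(laplacian(u)). Grid size, kappa and the
# periodic boundaries (via np.roll) are illustrative simplifications.
N = 32                             # grid points per side
kappa, k, h = 1.0, 1.0 / 44100, 0.05
mu2 = (kappa * k / h ** 2) ** 2    # scheme parameter; small => stable

def laplacian(u):
    """Standard 5-point discrete Laplacian with periodic boundaries."""
    return (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
            np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)

u_prev = np.zeros((N, N))
u = np.zeros((N, N))
u[N // 2, N // 2] = 1.0            # impulse excitation at the centre

out = []                           # output read at a movable pickup point
for _ in range(200):
    u_next = 2.0 * u - u_prev - mu2 * laplacian(laplacian(u))
    u_prev, u = u, u_next
    out.append(u[N // 4, N // 4])
```

The freely movable excitation and pickup indices correspond to the movable inputs and outputs described in the abstract; a production version would add frequency-dependent loss, tension, and proper boundary conditions.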

15:30-16:00 Session SMC Concert 6


Note: For the exact times of the pieces, please refer to the concert schedule.

Location: Lilla Salen
Self-Built Instrument

ABSTRACT. This work is focused on sound performance with an experimental instrument composed of strings and a metallic sound box, producing overtones, harmonics and feedback. It can play with different sound colours: resonances from copper, bowing on strings, overtones and feedback, with each factor triggering the others' sounds. The point is not to play a specific tone or to make musical harmony, because the instrument cannot be perfectly controlled. Playing it challenges one's capacities of gesture and of sensing sonic phenomena in space. The artist composed a piece and partly uses a few repertoire excerpts; mostly, however, the interest lies in discovering what kinds of sound come to nest in the mesh. The artist tried to move beyond the typical aesthetics of classical music, such as precise pitches, melodies and read scores. Instead, her approach is to discover unusual sound elements that would traditionally be considered mistakes, and to play with them: strings without tuning, struck objects, unorganized pitches, and the so-called clicks that arise from unskilled playing.

Musically, it is composed of a circulation of swerving sound and embraces internal and external sound in space. The coupling of acoustic and electronic resonances in a performable instrument with an almost sculpture-like quality is intriguing. The sounds range from complex and exquisite to banal and cliché, and therefore keep the interest going.

The Eighth Island

ABSTRACT. “The Eighth Island” (“Ósma wyspa”) - 8 channels, 9:05

Inspired by the music of Southeast Asia, The Eighth Island is an impression of islands in the Pacific Ocean and of their cultures being lost in our times – through our desistance, lack of interest, disrespect, the hypocrisy of political correctness, and global warming.

Status I

ABSTRACT. The work is a study on spatialisation techniques and consists of nine stereophonic tracks and fifteen statuses. The horizontal structure is given by the succession of statuses representing different perspectives of a sound sculpture (never presented in its entirety). Like a succession of images that, through different perspectives, show a set of details of a marble sculpture. Similarly, the statuses consist of a vertical layering of sound elements (one per track) frozen in time and determining the sound sculpture from certain listening points. The performer's function is to experience each sound status through diffusion and spatialisation in space (spatialisation as a form of augmented listening). The performer must choose the spatial interpretation of each status, the spatialisation strategy to be applied and the sound system in which to perform the piece.

note: the uploaded track is a demonstration example of the interpretation/spatialization of the work through a binaural system (Ambisonic ToolKit ATK for Reaper with HRTF Cipic 0021 - KEMAR Large Pinnae Dummy).

16:00-16:15 Session Concert

Music as Script by Johan Fröst, Henrik Frisk and Claudio Panariello.

MAS (Music As Script) is a project that aims to develop a method to augment and deepen spectators' musical experience and increase the accessibility of the musical message. This is achieved by leveraging the inter-modality between hearing and seeing to illuminate musical narrative, events, architecture, and form.

Location: Kungasalen
16:15-17:15 Session SMC Papers 8: Online presentations

The papers are presented remotely, and a moderated panel session with all authors and the on-site participants follows directly in the lecture hall.

Location: Kungasalen
Modeling Piano Fingering Decisions with Conditional Random Fields

ABSTRACT. Deciding what fingerings to use is a core skill for accomplished pianists. We model piano fingering decisions with conditional random fields, demonstrating the power and flexibility of this approach to produce results to compete with the state of the art. We present new corpora of fingering data, compiled from professional pianists and editorial scores. Finally, we analyze recently suggested metrics for evaluating fingering systems and discuss drawbacks in their application to models that do not assume segregation of hand assignments.

Using Deep Learning and Low-Frequency Fourier Analysis to Predict Parameters of Coupled Non-Linear Oscillators for the Generation of Complex Rhythms

ABSTRACT. This paper describes a method for generating rhythmic structures and low frequency amplitude envelopes using non-linear coupled oscillators and a machine learning model. The method is based on the Kuramoto model, a mathematical model used to describe the collective behavior of a system of oscillators, and a multi-layer perceptron neural network. The goal of this approach is not to exactly reproduce input rhythms, but rather to develop a novel form of interaction with chaotic processes for experimental musical practice. The system consists of three components: the generative component producing rhythms, a method for the analysis of rhythmic structures, and a machine learning model that learns the relationships between the parameters of the rhythm generation and the analysis. The Kuramoto model was chosen due to its potential to mediate between periodicity and chaos, creating aesthetically rich and fruitful material for the author's musical practice. The other components serve to explore this model in new ways and to couple it to external rhythmic musical signals.
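
The Kuramoto model itself is compact enough to sketch directly. The trigger-extraction step below (one "beat" each time an oscillator's phase wraps) is an assumed reading of how rhythms might be derived from the oscillators, not the paper's exact analysis component:

```python
import numpy as np

rng = np.random.default_rng(4)
N, K, dt = 8, 1.5, 0.01
omega = rng.normal(1.0, 0.2, N)           # natural frequencies (rad/s)
theta = rng.uniform(0.0, 2 * np.pi, N)    # initial phases

def kuramoto_step(theta, omega, K, dt):
    """One Euler step of the Kuramoto model:
    d(theta_i)/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)."""
    coupling = np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
    return theta + dt * (omega + (K / len(theta)) * coupling)

# Read a rhythm off the oscillators: emit a trigger whenever a phase
# crosses a multiple of 2*pi (one "beat" per cycle).
triggers = []
for step in range(5000):
    new = kuramoto_step(theta, omega, K, dt)
    wrapped = np.floor(new / (2 * np.pi)) > np.floor(theta / (2 * np.pi))
    triggers.extend((step, int(i)) for i in np.flatnonzero(wrapped))
    theta = new

order = abs(np.exp(1j * theta).mean())    # phase coherence, 0..1
```

Sweeping the coupling strength `K` moves the system between near-independent (chaotic-feeling) and locked (periodic) trigger patterns, which is the mediation between periodicity and chaos the abstract refers to.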

F0 analysis of Ghanaian pop singing reveals progressive alignment with equal temperament over the past three decades: a case study

ABSTRACT. Contemporary Ghanaian popular singing combines European and traditional Ghanaian influences. We hypothesize that access to technology embedded with equal temperament catalyzed a progressive alignment of Ghanaian singing with equal-tempered scales over time. To test this, we study the Ghanaian singer Daddy Lumba, whose work spans from the earliest Ghanaian electronic style in the late 1980s to the present. Studying a singular musician as a case study allows us to refine our analysis without over-interpreting the findings. We curated a collection of his songs, distributed between 1989 and 2016, to extract F0 values from isolated vocals. We used Gaussian mixture modeling (GMM) to approximate each song’s scale and found that the pitch variance has been decreasing over time. We also determined whether the GMM components follow the arithmetic relationships observed in equal-tempered scales, and observed that Daddy Lumba's singing better aligns with equal temperament in recent years. Together, results reveal the impact of exposure to equal-tempered scales, resulting in lessened microtonal content in Daddy Lumba's singing. Our study highlights a potential vulnerability of Ghanaian musical scales and implies a need for research that maps and archives singing styles.

Crepe notes: a new method for segmenting pitch contours into discrete notes

ABSTRACT. Tracking the fundamental frequency (f0) of a monophonic instrumental performance is effectively a solved problem with several solutions achieving 99% accuracy. However, the related task of automatic music transcription requires a further processing step to segment an f0 contour into discrete notes. This sub-task of note segmentation is necessary to enable a range of applications including musicological analysis and symbolic music generation. Building on CREPE, a state-of-the-art monophonic pitch tracking solution based on a simple neural network, we propose a simple and effective method for post-processing CREPE's output to achieve monophonic note segmentation. The proposed method demonstrates state-of-the-art results on two challenging datasets of monophonic instrumental music. Our approach also gives a 97% reduction in the total number of parameters used when compared with other deep learning based methods.
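
One simple way to post-process an f0 contour into discrete notes is to cut wherever the pitch jumps by more than a tolerance in cents and to discard very short segments. The sketch below is a simplified stand-in for the paper's method, which also exploits CREPE's confidence and onset information:

```python
import numpy as np

def segment_notes(f0, hop=0.01, cents_tol=50.0, min_dur=0.05):
    """Split a monophonic f0 contour (Hz, one value per hop of `hop`
    seconds) into discrete notes: a new note starts wherever the pitch
    jumps by more than `cents_tol` cents; segments shorter than
    `min_dur` seconds are dropped. Returns (onset_s, offset_s, midi)."""
    midi = 69 + 12 * np.log2(np.asarray(f0) / 440.0)
    jumps = np.flatnonzero(np.abs(np.diff(midi)) * 100 > cents_tol) + 1
    bounds = np.concatenate(([0], jumps, [len(midi)]))
    notes = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        if (b - a) * hop >= min_dur:
            notes.append((a * hop, b * hop, float(np.median(midi[a:b]))))
    return notes

# Two steady pitches: A4 for 0.2 s, then B4 for 0.2 s.
f0 = [440.0] * 20 + [493.88] * 20
print(segment_notes(f0))
```

Taking the median pitch per segment makes the note estimate robust to vibrato and small tracking errors within the segment.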

The effect of actuating the bass trombone second valve on the quality of note transition in legato

ABSTRACT. The independent activation of the two valves of the modern trombone, implemented in the 1950s, made it possible to play the same note by multiple positions. The innovation facilitated not only the execution of chromatic passages but also note transitions that involve slide displacement between positions far apart. The full range of possibilities allowed by this improvement is not yet widespread among trombonists. This paper discusses the effect of using the second valve of the bass trombone on the execution of legato note articulation. Note transi-tions produced in positions far apart were extracted from performances of orchestral excerpts. Expert musicians with professional experience in symphony orchestras were asked to use different valve configurations in the perfor-mances of two orchestral excerpts. Two descriptors were used to infer the quality of the transitions: (i) the time interval spent in carrying out the transitions (Transition Duration); and (ii) the energy stability during the transi-tion (Energy Index). We verified an increase in efficiency in the articulation between these notes, played in legato when the second valve was used alone, by the statistical-ly significant decrease of Transition Duration in 39.2% and increase of the Energy Index in 10.6% (both df = 107 and p < 0.001).