Philip Nelson holds the post of Professor of Acoustics in the Institute of Sound and Vibration Research at the University of Southampton. His research interests span acoustics, vibrations, signal processing, control systems, and fluid dynamics. From 2005 to 2013 he served as Pro Vice-Chancellor of the University of Southampton, with particular responsibility for Research and Enterprise, and from 2014 to 2018 as Chief Executive of the UK Engineering and Physical Sciences Research Council. He is the recipient of both the Tyndall and Rayleigh Medals of the Institute of Acoustics and served as President of the International Commission for Acoustics from 2004 to 2007. He is a Fellow of the Royal Academy of Engineering and was made a Commander of the Order of the British Empire in the 2018 New Year Honours for his services to UK Engineering and Science.
What if we could not only see but also hear and compose within the lost soundscapes of history? Advances in immersive audio and visual technologies are redefining how we reconstruct, analyze, and even compose within historical spaces. This session explores how spatialized sound, virtual acoustics, and interactive performance environments breathe new life into cultural heritage sites, shaping new creative and research paradigms in musicology, sound studies, anthropology, and beyond.
From reconstructing ancient ritual performances to composing new works within historically resonant spaces, this panel will investigate how immersive environments enable novel interactions between architecture, acoustics, and musical creativity. How do composers engage with reconstructed soundscapes? How can digital tools allow us to reimagine lost musical traditions? What role does spatial audio play in contemporary composition inspired by historical sites?
We invite innovative, practice-based projects—especially those incorporating composition, interactive demos, and performance-based research—to explore how immersive technologies can be a bridge between the past and future of music and sonic expression.
Jonathan Berger (CCRMA - Stanford University, United States)
09:45 | #102 - Aural Dissipation: Investigating Spatial Acoustic Behavior Through Electroacoustic Harp Performance and Multichannel Spatialization in a Cooling Tower

PRESENTER: Valeria Zane

ABSTRACT. "Aural Dissipation" is a site-specific sound installation that explores the possibility of perceiving architectural space as an acoustic instrument. The work takes place inside the cooling tower of a decommissioned power plant, where the structure's volume and reverberant surfaces become sonic material to be investigated through a live performance for harp and electroacoustic processing. The harp, performed directly within the space, activates and modulates the environment: the sound is captured, processed in real time, and diffused through a discrete multichannel spatialization system. The installation is preceded by an acoustic survey phase, including impulse response measurements and reverberation analysis, aimed at understanding the spatial behavior of sound and informing the design of the diffusion setup. The project is part of a broader research framework focused on the role of sound in design processes, and it proposes a critical interplay between musical gesture, architecture, and immersive audio technologies. Expected outcomes include acoustic data, critical reflections, and documentation materials that contribute to formulating new transdisciplinary strategies rooted in sonic design thinking, emphasizing the potential of sound as a tool for interpreting, activating, and reconfiguring space.
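The acoustic survey phase described in this abstract, impulse response measurement followed by reverberation analysis, is classically carried out with Schroeder backward integration of the measured impulse response. The sketch below illustrates that general technique only: it is not the authors' code, and the file name, channel handling, and T30 fitting range are illustrative assumptions.

```python
# Minimal sketch: estimating reverberation time (T30) from a measured
# impulse response via Schroeder backward integration. File name and
# fitting range are assumptions for illustration, not from the paper.
import numpy as np
import soundfile as sf

ir, fs = sf.read("cooling_tower_ir.wav")  # hypothetical measurement file
if ir.ndim > 1:
    ir = ir[:, 0]                         # use the first channel if multichannel

# Schroeder integration: energy decay curve (EDC) in dB.
edc = np.cumsum(ir[::-1] ** 2)[::-1]
edc_db = 10 * np.log10(edc / edc.max())

# Fit a line to the -5 dB..-35 dB decay and extrapolate to -60 dB (T30).
t = np.arange(len(ir)) / fs
mask = (edc_db <= -5) & (edc_db >= -35)
slope, intercept = np.polyfit(t[mask], edc_db[mask], 1)
print(f"Estimated T30: {-60.0 / slope:.2f} s")
```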
10:00 | #27 - What Did They Hear? An Immersive Presentation of the Chauvet-Pont-d'Arc Cave

PRESENTER: Luna Valentin

ABSTRACT. We present an immersive sensory experience that simulates the soundscape of early humans in the Chauvet-Pont-d’Arc Cave, a UNESCO World Heritage site in southern France. Renowned for its Paleolithic art, the cave has been the focus of acoustical measurements that facilitate the creation of augmented 3D environments combining acoustical models with visual data. Using advanced auralization techniques, we enable the reconstruction of ancient soundscapes. As the cave is closed to the public, our VR-based auralizations provide an immersive means of experiencing its unique acoustics and artistic heritage for broader audiences as well as humanities researchers. We present this immersive environment as a case study of the possibilities offered by the Ambisonic Virtual Acoustics Playback Toolkit within the larger framework of the Paleoacoustics project.
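At its core, auralization of this kind amounts to convolving dry source material with room impulse responses measured (or modelled) in the cave. The single-channel sketch below shows that general principle only: it is not the Ambisonic Virtual Acoustics Playback Toolkit, and the file names are placeholders that assume mono signals.

```python
# Minimal convolution-auralization sketch (single mono channel): a dry
# recording is convolved with a measured room impulse response so that it
# appears to sound within that space. File names are hypothetical.
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

dry, fs = sf.read("dry_flute.wav")    # hypothetical anechoic source (mono)
rir, fs_ir = sf.read("cave_rir.wav")  # hypothetical measured impulse response
assert fs == fs_ir, "source and impulse response must share a sample rate"

wet = fftconvolve(dry, rir)           # apply the room's acoustic response
wet /= np.max(np.abs(wet))            # normalize to avoid clipping
sf.write("auralized_flute.wav", wet, fs)
```

A full Ambisonic pipeline would convolve against a multichannel (e.g., B-format) impulse response and decode to the playback system, but the underlying operation is the same.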
10:15 | #72 - Virtual Reconstruction of Historical Heritage: A 6DoF Immersive Audio-Visual Reproduction of Magoksa Temple

ABSTRACT. This paper presents a six-degrees-of-freedom (6DoF) audiovisual framework that advances spatial computing capabilities for virtual reconstructions of cultural heritage sites. Despite recent advances in virtual reality (VR) and spatial computing, achieving perceptually plausible spatial audio remains a significant hurdle, especially within the field of digital heritage. To address this, we propose a hybrid framework that integrates a 3D scanned model with room acoustical data captured by a spherical microphone array. This approach is demonstrated through a case study of Daegwangbojeon Hall at Magoksa Temple, a UNESCO World Heritage site in South Korea. This integration facilitates a holistic representation in which visual geometry and room acoustical properties are computationally cross-referenced to enhance spatial coherence and plausibility. Evaluation through a comparison of physical acoustic metrics shows that our method replicates the original acoustic characteristics more accurately than conventional impulse response convolution approaches using First-Order Ambisonics (FOA) or monophonic signals, especially in terms of sound clarity and auditory spatial impression. These findings offer a perceptually grounded approach to immersive documentation that is valuable for VR-based heritage preservation.
10:30 | #20 - A Tripartite Framework for Immersive Music Production: Concepts and Methodologies (Online Presentation)

ABSTRACT. Music production has long been characterized by well-defined concepts and techniques. However, a notable gap exists in applying these established principles to music production within immersive media. This paper addresses this gap by examining post-production processes applied to three case studies, i.e., three songs with unique instrumental features and narratives. The primary objective is to facilitate an in-depth analysis of the technical and artistic challenges in music production for immersive media. From a detailed analysis of the technical and artistic post-production decisions in the three case studies, and a critical examination of theories and techniques from sound design, audio journalism, and archival audio preservation, we propose a tripartite categorization of mixing within immersive media: Traditional Production, Expanded Traditional Production, and Nontraditional Production. These concepts expand production methodologies in the context of immersive media, offering a framework for understanding the complexities of spatial audio. By exploring these interdisciplinary connections, we aim to enrich the discourse surrounding music production, rethinking its conceptual plane toward more integrative media practices outside the core music production paradigm, thus contributing to the development of innovative production methodologies.
10:45 | #39 - "When We Went In: The D-Day Experience in Light and Sound" - A Site-specific Immersive Audio and Video Remembrance (Online Presentation)

ABSTRACT. Partnered with the National D-Day Memorial Foundation to commemorate the 80th Anniversary of D-Day, the Virginia Tech Institute for Creativity, Arts, and Technology presented "When We Went In: The D-Day Experience in Light and Sound" at the National D-Day Memorial in Bedford, Virginia, USA on June 7th and 8th, 2024. This site-specific installation at the center of the 6-acre memorial featured a custom 13.3 immersive audio array to present spatialized audio for thousands of audience members. Original music and spatial-first sound design were created in conjunction with artfully animated moving images, which were displayed through a complex 120,000 lumen projection mapping solution across the 200-foot-wide, 80-foot-tall main section of the memorial on 14 discrete surfaces, immersing viewers in the sights and sounds of the Normandy invasion. With permission, the latest AI text-to-speech and voice-changing technology was used to integrate the voices of those no longer living into the piece, telling the story of D-Day through the words of those who were there. This report details the creative and technical processes for this special immersive presentation.
11:00 | Sound as a Gateway to the Past: Enhancing Cultural Heritage with Audio Augmented Reality through Bone Conduction in the Memorie Sonore Project

ABSTRACT. Sound plays a central role in shaping cultural heritage experiences, fostering a dynamic connection between past and present and enhancing visitor immersion through auditory engagement. The Memorie Sonore project explores the use of bone conduction headphones to create immersive experiences through audio augmented reality in museum settings. Unlike traditional headphones, bone conduction devices rest on the cheekbones, transmitting sound directly to the inner ear while keeping the ear canal open. This unique feature enables the simultaneous perception of both actual and virtual sound layers, allowing historical audio reconstructions to merge with the surrounding acoustic environment. Developed within the Musei di Tutti network, the project reconstructs historical soundscapes across four key sites: Palazzo Vecchio and the Museo degli Innocenti in Florence, as well as the Archaeological Area and the Fondazione Primo Conti in Fiesole. Visitors engage in interactive, spatialized audio experiences that integrate historically informed reconstructions, sound design, and original musical compositions. The onsite experience provides visitors with bone conduction headphones and an audio guide app, offering location-based soundscapes and narrative storytelling. To ensure historical accuracy, historians, archaeologists, anthropologists, and sound designers have collaborated closely. Techniques such as impulse response measurements, auralization, and binaural recording were employed to create spatially coherent and immersive soundscapes. The online component features a 360-degree audiovisual virtual tour, enabling global audiences to explore museum spaces through an immersive audio-visual experience. Unlike the onsite version, the online tour incorporates contemporary environmental sounds, bridging past and present soundscapes. Memorie Sonore fosters inclusive, multisensory access to cultural heritage, offering innovative ways to experience history through sound. Additionally, the project enhances accessibility for visually impaired visitors by using sound as a primary medium for spatial and narrative engagement.
For this session, we invite papers and posters on AI in acoustics and 3D immersive audio applications. Topics of interest include:
- Audio Scene Analysis: Using AI to automatically detect, isolate, and analyze complex audio environments.
- AI-Driven Sound Design: Applying machine learning to generate realistic 3D soundscapes or predict the acoustic behavior of spaces.
- Virtual Acoustics Optimization: AI for simulating room acoustics faster and more accurately, including real-time predictions of reverberation and reflections.

Real-Time Spatial Audio and Networking

- Low-Latency Audio Rendering: Advancements in delivering real-time spatial audio in networked environments like telepresence and remote collaboration.
- 3D Audio for Virtual Meetings: Research on binaural and spatial audio solutions to make remote communication more natural and reduce fatigue.
Marta Rossi (Abertay University, UK)
11:45 | #62 - Deep Neural Network for Personalization of Parametric Head-Related Transfer Functions in a Median Plane (Online Presentation)

ABSTRACT. Head-related transfer functions (HRTFs) characterize how the human head and body modify the frequencies of sound waves as they travel toward the ear, thus aiding people in determining the direction and location of sound sources. HRTFs have different shapes that depend on the individual listener. Some studies have therefore used deep neural network (DNN) models to synthesize personalized HRTFs from measurements of the listener's ear and head dimensions, but they have struggled to estimate large numbers of outputs (e.g., 200 samples in an impulse response or 512 samples in an HRTF). In this work, we therefore introduce parametric HRTF synthesis to reduce the number of outputs; the DNN model synthesizes the HRTF in the median plane, where individual differences are most likely to occur. The measured HRTF was approximated via the series synthesis of six peaking digital filters, each characterized by a center frequency, gain, and bandwidth, so that the output dimensions could be reduced to 18 parameters. This data compression process improved the log spectral distance from the measured HRTF by 1 to 2%, and psychoacoustic experiments showed that accurate localization at 0° (front) and 180° (back) remained difficult even with the estimated parametric HRTF.
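To make the 18-parameter representation concrete: each of the six peaking filters is fully described by a centre frequency, a gain, and a bandwidth (expressed below as Q), and cascading them yields the approximated median-plane HRTF magnitude. The sketch below uses the standard RBJ peaking-EQ formulation with invented parameter values; the paper's actual values are estimated by the DNN and are not reproduced here.

```python
# Sketch of a parametric HRTF as a cascade of six peaking filters
# (3 x 6 = 18 parameters). Parameter values are placeholders, not data.
import numpy as np
from scipy.signal import sosfreqz

def peaking_sos(fc, gain_db, q, fs):
    """Second-order peaking-EQ section (RBJ cookbook formulas)."""
    a = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * fc / fs
    alpha = np.sin(w0) / (2 * q)
    b = [1 + alpha * a, -2 * np.cos(w0), 1 - alpha * a]
    den = [1 + alpha / a, -2 * np.cos(w0), 1 - alpha / a]
    return np.array(b + den) / den[0]   # normalized [b0,b1,b2,a0,a1,a2]

fs = 48000
# Six (fc, gain_dB, Q) triplets -- invented values for illustration.
params = [(800, 3, 2), (2500, -6, 4), (4200, 8, 5),
          (7000, -10, 6), (9500, 5, 4), (12500, -8, 3)]
sos = np.vstack([peaking_sos(fc, g, q, fs) for fc, g, q in params])

w, h = sosfreqz(sos, worN=512, fs=fs)
hrtf_db = 20 * np.log10(np.abs(h))      # approximated HRTF magnitude (dB)
```

A DNN that predicts these 18 numbers therefore replaces one that must regress hundreds of spectral samples directly.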
12:00 | #21 - User-Centered Evaluation of Smart Musical Instruments with Embedded Real-Time Pattern Detection

ABSTRACT. The emerging class of Smart Musical Instruments (SMIs), which embed onboard intelligence and wireless connectivity, has opened new avenues of creative possibility for musicians. This paper addresses the challenge of real-time polyphonic audio pattern detection embedded in SMIs, used to trigger control messages to external devices upon pattern identification during a live performance. We describe a real-time algorithm based on deep learning that recognizes a set of predefined polyphonic patterns from an incoming audio stream. We embed this algorithm in smart electric guitar and smart keyboard prototypes connected to external peripherals, and assess the detection accuracy, latency, and user satisfaction through a user study involving 22 musicians. Qualitative results showed that the musicians enjoyed using the system and expressed interest in integrating deep learning-based tools into their performances. The user study yielded a precision of 0.72, a recall of 0.67, and an F1 score of 0.66. We also describe and discuss a musical performance that incorporated two smart musical instruments with real-time pattern detection capabilities, used to trigger external peripherals including stage lights, smoke machines, mixed reality headsets, and haptic wearable devices.
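The trigger pipeline such a system implies can be sketched generically: audio blocks feed a sliding window, a pre-trained classifier scores each window, and a score above threshold fires a control message to an external peripheral. The code below is a schematic under those assumptions, not the authors' implementation; the OSC address, network target, detection threshold, and placeholder classifier are all invented for illustration.

```python
# Schematic pattern-detection trigger: sliding audio window -> classifier
# score -> OSC control message on detection. All parameters are assumed.
import numpy as np
import sounddevice as sd
from pythonosc.udp_client import SimpleUDPClient

FS, WINDOW, HOP = 48000, 48000, 4800             # 1 s window, 100 ms hop
client = SimpleUDPClient("192.168.0.20", 9000)   # hypothetical peripheral
buffer = np.zeros(WINDOW, dtype=np.float32)

def classify(window):
    """Placeholder for an embedded deep-learning pattern detector."""
    return 0.0                        # stands in for a trained model's score

def callback(indata, frames, time, status):
    global buffer
    buffer = np.roll(buffer, -frames)            # slide the window
    buffer[-frames:] = indata[:, 0]
    if classify(buffer) > 0.9:                   # detection threshold (assumed)
        client.send_message("/pattern/detected", 1)

with sd.InputStream(samplerate=FS, channels=1, blocksize=HOP,
                    callback=callback):
    sd.sleep(60_000)                             # run for one minute
```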
12:15 | #48 - Application of Ambisonic Microphones and AI Agents for Automatic Localization and Classification of Ambient Sound Sources

ABSTRACT. This article presents a novel integration of ambisonic microphones, multimodal sensing, and artificial intelligence (AI) to create an advanced system termed the "semantic acoustic microscope." The methodology combines third-order ambisonic microphone arrays, 360° video capture, LiDAR depth sensing, and AI-driven analysis for the automatic localization and classification of ambient sound sources. Traditional noise assessment methods relying on averaged decibel metrics are inadequate for capturing the complex, dynamic, and semantic nature of urban acoustic environments. By leveraging modern AI models such as Transformer networks and large language models (LLMs), the proposed system aims to provide not only quantitative but also qualitative, semantic interpretations of acoustic scenes, thereby offering meaningful insights into sound environments. Such insights have significant implications for urban planning, public health, environmental policy, and fundamental auditory research.
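One classical building block of such a system is direction-of-arrival estimation from the first-order components of the ambisonic recording via the pseudo-intensity vector. The sketch below shows that textbook technique on B-format signals; the file name and WXYZ channel ordering are assumptions, and the paper's third-order, AI-driven pipeline goes well beyond this.

```python
# Pseudo-intensity-vector DOA estimation from first-order B-format.
# Channel order (W, X, Y, Z) and the input file are assumed.
import numpy as np
import soundfile as sf

bformat, fs = sf.read("scene_bformat.wav")   # hypothetical 4-channel recording
w, x, y, z = bformat.T                       # assumes WXYZ channel ordering

# The time-averaged active intensity is proportional to E[pressure * velocity].
ix, iy, iz = np.mean(w * x), np.mean(w * y), np.mean(w * z)
azimuth = np.degrees(np.arctan2(iy, ix))
elevation = np.degrees(np.arctan2(iz, np.hypot(ix, iy)))
print(f"Dominant source: azimuth {azimuth:.1f} deg, elevation {elevation:.1f} deg")
```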
12:30 | #44 - Interactive IoMusT-Based Concerts: Real-Time Pattern Recognition and Audience Experience

ABSTRACT. This paper explores the possibilities offered by the Internet of Musical Things paradigm to create new interactive performance experiences. We introduce a novel performance ecosystem in which performers use smart musical instruments equipped with embedded Real-time Pattern Detection (RTPD) algorithms to trigger wirelessly connected devices. To validate our approach, the system was ecologically validated in two concerts involving two groups of performers and two groups of audience members. We compared the use of RTPD-triggered effects against randomized triggering, and we assessed the experience of a composer, performers, and audience members in interacting with the ecosystem. Audience members preferred the condition in which the RTPD system directly triggered effects over random triggering. Overall, composers, performers, and audiences valued this new art format. The results indicate that combining smart musical instruments with pattern detection in an Internet of Musical Things ecosystem can open new avenues for artistic performance and audience engagement.
12:45 | #28 - MoTT: A Speech Dataset for Modular Composition of Turn-Taking Conversations (Online Presentation)

PRESENTER: Davide Fantini

ABSTRACT. Among the numerous speech datasets in the literature, only a minority concern conversational data, and even fewer isolate the elements occurring in turn-taking conversations. To address this gap, this paper presents MoTT, an English speech dataset composed of questions, answers, reciprocal questions, and backchannel responses recorded by eight participants. The questions and answers pertain to ten topics and were recorded in two takes. The voice directivity pattern was simultaneously captured at frontal and lateral positions by two microphones. The MoTT dataset was designed to provide interchangeable conversational elements and enable their modular composition into fictional but plausible and convincing conversations. As a result, multiple virtual speakers engage in a turn-taking conversation that emulates real-world interactions, with spatial audio techniques employed to enhance realism by arranging the speakers in the auditory scene. This dataset offers a valuable resource for studies in immersive spatial audio, human-computer interaction, and auditory scene analysis. It is therefore well suited for experiments that require the simulation of ecologically valid conversations, such as the one described in the use case reported in this paper.
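The modular-composition idea can be illustrated schematically: interchangeable question and answer clips are concatenated with a turn-taking gap, and each virtual speaker is placed at a distinct direction. The sketch below uses simple constant-power stereo panning as a stand-in for proper spatial rendering; the clip file names and the 400 ms gap are invented for illustration, and mono clips are assumed.

```python
# Illustrative modular composition of one turn-taking exchange from
# interchangeable clips. File names and gap length are hypothetical.
import numpy as np
import soundfile as sf

def pan(mono, azimuth_deg):
    """Constant-power stereo pan as a stand-in for true spatial rendering."""
    theta = np.radians((azimuth_deg + 90) / 2)   # map [-90, 90] -> [0, 90] deg
    return np.column_stack([mono * np.cos(theta), mono * np.sin(theta)])

q, fs = sf.read("speaker1_question_topic3.wav")  # hypothetical mono clip
a, _ = sf.read("speaker2_answer_topic3.wav")     # hypothetical mono clip
gap = np.zeros((int(0.4 * fs), 2))               # 400 ms turn-taking gap

# Place the two virtual speakers at distinct directions and concatenate.
conversation = np.vstack([pan(q, -30), gap, pan(a, 30)])
sf.write("composed_turn.wav", conversation, fs)
```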
This session aims to investigate how acoustic modelling and virtual reality (VR), combined with advanced 3D audio technologies such as binaural audio and ambisonics, are transforming our understanding of the relationship between space, performance, and audience.
This session welcomes contributions that:
- Present case studies, experiments, and innovative projects demonstrating the effectiveness of acoustic and virtual technologies for research and the enhancement of tangible and intangible heritage.
- Explore the use of VR and 3D audio technologies for the analysis, reconstruction, and experience of historical performance spaces.
- Investigate the development of new forms of immersive performance and audience engagement, and new creative inputs for musicians, dancers, singers, composers, and conductors.
Antonella Bevilacqua (University of Parma, Italy)
#17 Multirate Modal Reverberator in Tetrad - Michele Ducceschi
#73 Réaltaht - An Irish Traditional VR Concert Experience - Joseph Clarke
#25 Technical Demonstration: Distance Extension of HRIRs in Hybrid Acoustic Environments - Pasquale Mainolfi
# SIPARIO Portable Modular Wave Field Synthesis soundbars - Adriano Farina
