ISH 2025: INTERNATIONAL SYMPOSIUM ON HEARING 2025
PROGRAM FOR MONDAY, JUNE 2ND

09:00-10:40 Session 2: Cochlea and Auditory Nerve
09:00
Inner-hair-cell dysfunction as another form of hidden hearing loss that degrades speech-in-noise perception

ABSTRACT. People with sensorineural hearing loss (SNHL) have difficulty understanding speech in noise, even with state-of-the-art hearing aids. While audibility is often addressed, individual differences in aided speech recognition may arise from suprathreshold deficits hidden from current audiological assessments. The majority of SNHL studies have focused on overt effects from outer-hair-cell (OHC) dysfunction (e.g., reduced sensitivity, compression, frequency selectivity), with recent emphasis on cochlear synaptopathy (CS) as a form of hidden hearing loss. Animal studies suggest that inner-hair-cell (IHC) dysfunction may also be significant, affecting modulation coding through a shallower transduction function following stereocilial damage. IHC dysfunction also reduces synchrony capture in vowel responses (Bruce et al., 2003), with significant implications in recent modulation-coding theories (Carney, 2018). Here, we characterize IHC dysfunction as another form of hidden hearing loss through chinchilla auditory-nerve spike-train-response estimates of IHC transduction functions and non-invasive assays.

Methods: Physiological data were recorded from anesthetized chinchillas with normal hearing or various types of SNHL: isolated IHC dysfunction following carboplatin exposure; CS following moderate noise exposure with temporary threshold shift (TTS); and mixed OHC, IHC, and CS effects following louder noise exposure with permanent threshold shift (PTS). Single-unit spike trains were recorded from auditory-nerve fibers as rate-level functions and CF-pure-tone period histograms. IHC transduction functions were estimated from period histograms using analyses developed by Horst et al. (2018). Envelope following responses (EFRs) were recorded using sub-dermal electrodes in response to a variety of modulated stimuli.
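
As a rough illustration of the period-histogram idea (a minimal sketch, not the Horst et al. analysis itself; it assumes the stimulus-response phase lag has already been compensated), the instantaneous firing rate can be plotted against the instantaneous value of the CF tone to trace out the transduction nonlinearity:

import numpy as np

def transduction_from_period_histogram(spike_phases, n_bins=64):
    # spike_phases: spike times folded on the CF-tone period, in cycles [0, 1),
    # with the response phase lag already compensated (assumption of this sketch).
    counts, edges = np.histogram(spike_phases, bins=n_bins, range=(0.0, 1.0))
    centers = 0.5 * (edges[:-1] + edges[1:])
    rate = counts / max(counts.sum(), 1) * n_bins   # normalized instantaneous rate
    stim = np.sin(2 * np.pi * centers)              # instantaneous tone value (unit amplitude)
    order = np.argsort(stim)
    return stim[order], rate[order]                 # rate vs. input traces the nonlinearity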

Results: Estimated IHC transduction functions from ANF spike-train responses were significantly shallower in carboplatin-treated animals than in normal-hearing animals, but not in the TTS and PTS groups. This result is consistent with scanning electron microscopy from our carboplatin-exposed chinchillas showing IHC-stereocilia disarray, which reduces efficient IHC transduction. EFRs from carboplatin-exposed animals showed a reduction in higher harmonics for all modulated stimuli, whereas TTS animals showed a reduction only for the RAM25 stimuli previously suggested as a useful CS assay.

Our ANF data provide the first quantitative estimate of the effects of various forms of SNHL on IHC transduction functions, helping improve existing models to better capture IHC dysfunction (which is currently underfit compared to OHC dysfunction). Our EFR data provide unique insight into developing a non-invasive assay for IHC dysfunction, which is currently estimated primarily as the portion of audiometric loss unexplained by OHC damage (e.g., Moore loudness models, Bruce/Carney ANF models). Our results have important implications for precision-audiology diagnostics, given the potential consequences of IHC transduction damage for complex-sound encoding.

09:20
Nonlinearity and level-dependent phase shifts in vibrations of the top of the mouse organ of Corti at low frequencies

ABSTRACT. Techniques like optical coherence tomography have recently advanced our understanding of cochlear mechanics beyond the level of the basilar membrane (BM). While the current measurement resolution does not permit in vivo examination of hair cell stereocilia motion, it is possible to record vibrations of the nearby structures, including the reticular lamina (RL) and tectorial membrane (TM). These structures typically exhibit stronger mechanical nonlinearity and gain than the BM, not only at the characteristic frequency (CF) of the measurement location, but also at lower frequencies, where BM responses are essentially passive and linear. Though nonlinearity in low-frequency RL or TM motion may be relevant to phenomena observed in auditory nerve responses, these motions have not yet been thoroughly characterized.

To address this, low-frequency responses of the top of the live mouse organ of Corti were examined over a wide range of sound pressure levels (SPLs) at an apical location tuned to a CF of ~9 kHz.

At frequencies far below the CF (< 2 kHz), responses of the RL and TM grew nonlinearly at stimulus levels above ~80-85 dB SPL and exhibited ~0.5 cycle phase shifts at higher levels. Such phase shifts were typically associated with a transition from response saturation (or even rollover) to mildly expansive growth. Comparison with BM motion revealed that these behaviors are likely due to interactions between active outer hair cell responses and the passive motion of the organ of Corti. At low frequencies, BM motion toward scala vestibuli is theoretically accompanied by outer hair cell contraction, such that the RL and TM are actively pulled toward scala tympani at the same time. At levels where the active response saturates, the passive motion component starts to dominate, leading to a phase shift in the overall motion.
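
The proposed superposition can be illustrated with a toy calculation (a minimal sketch with assumed parameters, not a model of the measured preparation): a linear passive component summed with an antiphase, saturating active component produces both the amplitude rollover and the ~0.5-cycle phase shift at high levels.

import numpy as np

levels_db = np.arange(40.0, 111.0, 1.0)
p = 10 ** (levels_db / 20.0)                 # stimulus pressure, re: 0 dB = 1 (assumed units)

passive = 1e-4 * p                           # passive motion: linear growth, phase 0
active = 3.0 * np.tanh(p / 3e3)              # active motion: saturates near ~70 dB (assumed)
total = passive - active                     # antiphase superposition

amplitude = np.abs(total)
phase_cycles = np.where(total < 0, 0.5, 0.0) # 0.5-cycle flip while the active part dominates

for L, a, ph in zip(levels_db[::10], amplitude[::10], phase_cycles[::10]):
    print(f"{L:5.0f} dB SPL  amplitude {a:8.3f}  phase {ph:.1f} cyc")

Near the level where the two components cancel, the summed amplitude dips (rollover) and the phase jumps by half a cycle, qualitatively matching the behavior described above.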

The nonmonotonic growth and phase shifts observed here were similar to those seen in auditory nerve fiber responses in other mammals, suggesting that such neural phenomena may have a mechanical origin. Though inner hair cell stereocilia may be stimulated by multiple modes of motion that are yet to be fully understood, the present data demonstrate that just the superposition of active and passive motion components can produce complex behaviors that likely shape the cochlea’s afferent output.

09:40
Information processing from acoustics to spike trains: A modeling study of the inner ear

ABSTRACT. Our understanding of hearing has constantly improved since the discovery of cochlear tonotopy by von Békésy. The discovery of otoacoustic emissions and hair cell motility further boosted the understanding of the physiological mechanisms underlying the psychoacoustically measured performance. These findings paved the way for numerous technological solutions to treat hearing loss, including hearing aids and cochlear implants. Progress in providing a perceptual benefit for hearing-impaired people has stagnated, however, during the last decade. This stagnation might justify reconsidering some of the very fundamental assumptions underlying current models of hearing. This study focuses on the discrepancies between experimental observations in inner ear mechanics and mechano-electrical transduction, and the implicit assumption of harmonic oscillations in the description of hearing. The goal of the study is to identify not-yet-considered physical mechanisms underlying the highly efficient and fragile information encoding in the inner ear.

We utilize numerical modeling to account for key aspects of otoacoustic emissions (OAEs), inner ear oscillations, and mechano-electrical transduction. Two systems of coupled oscillators are used to investigate the interplay between external stimulation, inner ear oscillation, and mechano-electrical transduction. The models are compared at different levels of abstraction to identify ubiquitous phenomena agnostic of a specific parameter choice. The simulations are compared to existing experimental data on OAEs, fine structure phenomena, and auditory nerve activity. The presence and transfer of information in the model is analyzed using Shannon information theory in the acoustic, mechanical, and neural domains.
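
As one generic instance of such a system (a minimal sketch with assumed parameters and tonotopy, not the authors' model; the information-theoretic analysis is omitted), a chain of nearest-neighbor-coupled Hopf oscillators can be driven at its base and integrated with an exponential-Euler step:

import numpy as np

n, fs, dur = 20, 40_000, 0.25
dt = 1.0 / fs
f = np.geomspace(4_000, 500, n)                  # assumed intrinsic tonotopy (Hz)
mu, k = 0.1, 0.05                                # bifurcation parameter, coupling (assumed)
rng = np.random.default_rng(0)
z = 1e-3 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
lin = np.exp((mu + 2j * np.pi * f) * dt)         # exact linear growth/rotation per step

t = np.arange(int(dur * fs)) * dt
stim = 0.01 * np.sin(2 * np.pi * 1_000 * t)      # 1-kHz tone injected at the base
out = np.zeros((t.size, n))
for i in range(t.size):
    nb = np.empty(n, complex)                    # nearest-neighbor coupling (open chain)
    nb[1:-1] = z[:-2] + z[2:]
    nb[0], nb[-1] = z[1], z[-2]
    drive = np.zeros(n, complex)
    drive[0] = stim[i]
    z = lin * z + dt * (-np.abs(z) ** 2 * z + k * (nb - 2 * z) + drive)
    out[i] = z.real                              # oscillator output at each tonotopic place

With mu > 0 every element oscillates spontaneously and saturates via the cubic term, so the chain exhibits spontaneous activity, entrainment near the driven place, and nonlinear suppression of the kind discussed below.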

The results show that a system of coupled, nonlinear oscillators can account for a large variety of aspects of experimental data, including spontaneous activity, high sensitivity, fine structure phenomena, and nonlinear effects like distortion and suppression. The simulations reveal systematic dynamic statistical properties of individual elements in the system which are not captured by long-time-averaged metrics. These properties have a strong impact on the locus of information in the absence and presence of external stimulation. External stimulation reveals a highly dynamic behaviour of the intrinsic tonotopy of the system, which is consistent with experimental paradigms to measure tonotopy but questions its validity for complex stimuli.

The findings of this study extend the range of phenomena that can be explained by established filter-based models. The species-agnostic approach can help to identify the fundamental physical mechanisms underlying acoustic information processing and might exploit nonlinear dynamic phenomena to help compensate for hearing deficits.

10:00
Nonlinear OHC Model Exhibiting Narrowband Chaos: Frequency Selectivity

ABSTRACT. Active cochlear models have become increasingly popular due to their improved performance in simulating the basilar membrane response compared to passive models. One key component of these active models is the outer hair cell, which can oscillate at specific frequencies based on its position on the basilar membrane. This oscillation enhances the movement of the basilar membrane and effectively provides a gain, referred to as cochlear gain, at those particular frequencies. Additionally, outer hair cells respond very quickly, necessitating that any model of these cells exhibit high frequency selectivity and excellent temporal resolution. The Generalized Hopf Oscillator, proposed by Bozovic and Faber in 2024, satisfies these criteria, as shown in the context of temporal modulation by Handel et al. in 2025. This oscillator is a nonlinear dynamical system that exhibits noise-induced chaos under specific parametrizations.
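
As a stand-in illustration (a minimal sketch using the standard Hopf normal form with additive noise and assumed parameters; the Generalized Hopf Oscillator of Bozovic and Faber contains additional terms), noise-induced chaos can be probed by integrating twin trajectories under a common noise realization and estimating the largest Lyapunov exponent by the usual renormalization method:

import numpy as np

fs = 100_000
dt = 1.0 / fs
f0, mu, noise = 1_000.0, -0.01, 0.05        # CF, bifurcation parameter, noise level (assumed)
lin = np.exp((mu + 2j * np.pi * f0) * dt)   # exact linear part per step

def step(w, xi):
    return lin * w + dt * (-np.abs(w) ** 2 * w) + np.sqrt(dt) * xi

rng = np.random.default_rng(0)
z, d0 = 0.0 + 0.0j, 1e-9
z2 = z + d0                                  # perturbed twin trajectory
log_growth, n_steps = 0.0, 10_000
for _ in range(n_steps):
    xi = noise * (rng.standard_normal() + 1j * rng.standard_normal())
    z, z2 = step(z, xi), step(z2, xi)        # identical noise drives both trajectories
    d = abs(z2 - z)
    log_growth += np.log(d / d0)
    z2 = z + (z2 - z) * (d0 / d)             # renormalize the separation
lyap = log_growth / (n_steps * dt)           # > 0 would indicate (noise-induced) chaos;
print(f"largest Lyapunov exponent ~ {lyap:.1f} 1/s")  # these parameters are non-chaotic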

The present paper explores and analyses the behavior of the Generalized Hopf Oscillator extended to the human auditory frequency range. This extension demonstrates that the oscillator can effectively serve in models requiring operational capabilities over a broad spectrum. In addition, we delve deeper into the system's frequency selectivity by conducting a comparative analysis with two psychoacoustic experiments, highlighting how the oscillator's performance aligns with human auditory perception dependent on the system's parametrization.

We demonstrate that the system surprisingly exhibits narrowband chaos, independent of the characteristic frequency set by the system's parameters. This observation suggests that chaotic behavior is restricted to a limited range of frequencies that cluster around the characteristic frequency. Consequently, the resulting signal in the time domain presents a chaotic envelope superimposed on a deterministic carrier wave. However, the approximation of the Lyapunov exponent proposed by Bozovic and Faber remains valid, indicating that it is independent of the characteristic frequency. The proposed system can account for frequency selectivity greater than that of the human auditory system.

We conclude that narrowband chaos may be essential for understanding cochlear dynamics. A chaotic outer hair cell would primarily affect only the neighboring cells, which have a characteristic frequency inside the narrowband chaos, while still being able to respond quickly to stimuli. This model configuration enables the suppression of specific stimuli, providing a plausible explanation for psychoacoustic phenomena such as sound source separation. Incorporating the concept of narrowband chaos could enhance pitch perception and frequency selectivity in active cochlear models.

10:20
Analysis of the effects of spontaneous rate class on synchrony-level and rate-level functions of auditory nerve fibers

ABSTRACT. It is widely known that the dynamic range of rate-level functions of low-spont auditory nerve fibers (ANFs) is much greater than that of high-spont ANFs, and this difference is often invoked in theories of sound coding at higher sound pressure levels. However, the dynamic range of synchrony-level curves has previously only been investigated by Johnson (PhD thesis, 1974; JASA, 1980), with pooling of ANFs from all spontaneous rate classes. Preliminary simulations with an ANF model (Bruce et al., Hear Res 2018) and some example synchrony-level curves in the literature indicate that the spontaneous rate may dramatically affect the growth of synchrony with level, motivating a quantitative analysis.

In this study, the cat ANF data of Peterson and Heil (J Neurosci 2019) were analyzed using the approach of Johnson. The normalized discharge rate was plotted as a function of normalized synchrony for rate-level and synchrony-level curves that satisfied statistical criteria consistent with both curves having reached their respective maxima by the highest sound level. Separate plots for low-, medium-, and high-spont ANFs were generated in addition to the plot for the pooled data. Predictions were also generated using the model of Bruce and colleagues.
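
For reference (a minimal sketch; the study's exact statistical criteria are not specified here), synchrony and discharge rate at one sound level can be computed from spike times as follows, with the Rayleigh statistic as one common significance criterion for phase locking:

import numpy as np

def rate_and_synchrony(spike_times, f0, dur):
    # spike_times: spike times (s) in response to a tone of frequency f0 (Hz),
    # recorded over dur seconds.
    n = len(spike_times)
    rate = n / dur                                    # discharge rate (spikes/s)
    phases = (2 * np.pi * f0 * np.asarray(spike_times)) % (2 * np.pi)
    vs = np.abs(np.mean(np.exp(1j * phases)))         # vector strength (synchrony)
    rayleigh = 2 * n * vs ** 2                        # > ~13.8 -> significant at p < 0.001
    return rate, vs, rayleigh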

The physiological data and simulation results both show a pattern in the growth of the synchrony and discharge rate for high-spont ANFs consistent with the pooled results of Johnson: first there is growth of synchrony alone with level, then joint growth of synchrony and discharge rate until the synchrony saturates, followed by growth in the rate until it saturates. However, very different patterns in the relative growth of synchrony and rate are observed for low- and medium-spont ANFs: synchrony saturates extremely rapidly with level in these ANFs, after which the discharge rate grows slowly (consistent with the extended rate-level dynamic ranges of these classes of ANFs).

The results of this study indicate that: i) low- and medium-spont ANFs likely did not pass the statistical criteria for inclusion in Johnson’s pooling of his data, skewing his results; ii) the relative synchrony-level and rate-level growth of the Bruce et al. model is consistent with the data of Peterson and Heil for all spontaneous rate classes; and iii) the very small dynamic range of synchrony-level curves for low-spont ANFs (which contrasts with their large dynamic range for rate-level curves) should be considered in theories of sound coding.

[Supported by NSERC Discovery Grants RGPIN-2018-05778 & RGPIN-2024-05888.]

11:10-12:30 Session 3: Masking
11:10
Forward-masking characterization of MOC-activity dependence on modulation frequency: Potential non-invasive physiological assay of IC modulation tuning

ABSTRACT. Recent animal studies (Romero and Trussell, 2022) highlight the inferior colliculus (IC) as a key input to the medial olivocochlear (MOC) system, although this pathway is much less explored than the cochlear-nucleus (CN) input. IC cells, sensitive to a range of low-frequency modulation frequencies inherited from auditory-nerve (AN) responses, convey spectral information distinct from the level-driven input of the CN. MOC neurons exhibit band-pass modulation transfer functions (MTFs), potentially originating from IC Band Enhanced (BE) cells. We hypothesize that varying modulation frequency changes the IC input to the MOC, independent of CN effects, with reduced MOC activity observed for frequencies outside the IC's MTF range. Testing this hypothesis not only provides insight into the IC input to the MOC but may also offer a noninvasive estimate of IC MTFs.

Using simultaneous measures of envelope following responses (EFRs) and transient evoked otoacoustic emissions (TEOAEs) in a forward-masking paradigm in awake chinchillas, we investigate how modulation frequency impacts the modulation-enhancement effect—an increase in EFR response with forward masking compared to the unmasked condition. While this effect has been observed in chinchillas (Farhadi et al., 2023) and humans (Bharadwaj et al., 2015), this study focuses on the dependence on modulation frequency. TEOAEs are measured immediately before and after stimuli to evaluate cochlear-gain changes, and the phase-locking value (PLV) in EFR responses is analyzed to quantify neural changes from MOC activation.
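
As an illustration of the PLV metric (a minimal sketch assuming fixed-length EFR epochs; the study's exact analysis windows may differ), the phase-locking value at the modulation frequency can be computed across trials from FFT phases:

import numpy as np

def plv(epochs, fs, f_mod):
    # epochs: (n_trials, n_samples) array of EFR sweeps; fs: sampling rate (Hz);
    # f_mod: modulation frequency (Hz). Returns PLV in [0, 1] at f_mod.
    n = epochs.shape[1]
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    k = np.argmin(np.abs(freqs - f_mod))            # bin nearest the modulation rate
    spec = np.fft.rfft(epochs, axis=1)[:, k]
    return np.abs(np.mean(spec / np.abs(spec)))     # 1 = perfect phase locking across trials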

Preliminary results show a broad reduction in TEOAE responses with the masker, while responses in silence remain unchanged. For SAM tones, a greater decrease at the carrier frequency is observed at a modulation frequency of 103 Hz (within the IC MTF range) compared to 303 Hz (outside the IC MTF range). Additionally, EFR PLVs demonstrate an increase in the first PLV peak under forward-masking conditions, with stronger effects at 103 Hz than at 303 Hz. These findings suggest a modulation-frequency-dependent enhancement of EFR responses, consistent with our hypothesis based on IC MTFs and our computational modeling (Farhadi et al., 2023).

Understanding the MTF of IC cells is important, as its degradation with aging and hearing loss would impair speech comprehension in noise. While invasive single-unit recordings provide direct IC MTF measurements in animals, these measurements are not feasible in humans. We propose a noninvasive assay of IC MTFs in chinchillas using the dependence of MOC activity on modulation frequency in our forward-masking paradigm. This method enables within-subject normalization by comparing masked and unmasked responses, offering a promising framework to overcome the higher variability expected in future human studies.

11:30
Effect of Noise Correlation on Binaural Signal Detection: The different role of decorrelating noise versus delaying noise

ABSTRACT. Langford and Jeffress (J. Acoust. Soc. Am., 1964, Vol. 36, pp. 1455-1458) reported the results of binaural detection experiments employing 500-Hz Sπ (interaurally phase-inverted) tones masked by narrow-band noises. Maskers varied in interaural correlation (at lag zero). Values of interaural correlation were realized by applying interaural delays (ITDs) of 0-8 ms in steps of 1 ms (NτSπ conditions). Langford and Jeffress compared their results to those of Robinson and Jeffress (J. Acoust. Soc. Am., 1963, Vol. 35, pp. 1947-1952), who achieved different masker interaural correlations via “noise mixing” (NρSπ), showing that masking-level differences (MLDs) for a given value of masker correlation were essentially identical regardless of the method used to achieve the masker correlation. Thus, it appeared that no mechanism of internally compensating the applied ITDs was operative.

In a second experiment, Langford and Jeffress measured MLDs over a range of masker ITDs, each yielding a masker interaural correlation of zero. Those delays corresponded to 1/4, 3/4, etc., of the period of 500 Hz. Large MLDs were found at the shorter delays, consistent with delay compensation. That outcome is at odds with the notion that only the interaural correlation of the masker, per se, determines the MLD.
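
The zero-correlation delays follow from the fact that, for narrow-band noise centered on 500 Hz, the interaural correlation at lag zero varies approximately as cos(2π · 500 Hz · τ), which vanishes at odd quarter-periods (0.5 ms, 1.5 ms, ...). A minimal numerical check (the 100-Hz bandwidth is an assumption; Langford and Jeffress's exact bandwidth may differ):

import numpy as np

fs = 48_000
rng = np.random.default_rng(1)
x = rng.standard_normal(fs * 10)
X = np.fft.rfft(x)
f = np.fft.rfftfreq(x.size, 1 / fs)
X[(f < 450) | (f > 550)] = 0                 # 100-Hz band around 500 Hz (assumed)
x = np.fft.irfft(X)

for tau_ms in (0.0, 0.5, 1.0, 1.5, 2.0):
    d = int(round(tau_ms * 1e-3 * fs))
    l, r = x[d:], x[:x.size - d]             # one ear delayed by tau
    rho = np.corrcoef(l, r)[0, 1]
    print(f"ITD {tau_ms:3.1f} ms  rho = {rho:+.2f}")   # ~0 at 0.5 and 1.5 ms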

We reasoned that the apparent discrepancy might stem from the relatively large ITDs (0-8 ms) and step size (1 ms) Langford and Jeffress employed in their comparison. For such values, delay compensation would be expected to be relatively ineffective. Accordingly, we measured MLDs using Langford and Jeffress’s approach, but included smaller and intermediate values of ITD. For small ITDs, the MLDs obtained were larger than those obtained with noise-mixed maskers (NρSπ), a finding consistent with a detection advantage conferred by internal delay compensation. Beyond about 3 to 4 ms, both types of MLD (NτSπ and NρSπ) became more similar, suggesting, as expected, that internal delay compensation is reduced in effectiveness as ITD increases.

11:50
The effects of broadband and tonal precursors on psychoacoustic tuning curves measured in simultaneous and forward masking

ABSTRACT. Physiological measures have shown that the medial olivocochlear reflex (MOCR) decreases the gain of the cochlear active process in response to sound. We have used psychoacoustic techniques to show behavioral evidence of gain reduction, which could be consistent with the MOCR. Paradigms thought to measure frequency selectivity and the input/output function at the level of the cochlea have used stimuli that should be too short for the MOCR to affect the signal, given the onset delay of the MOCR. A precursor sound is then presented before these stimuli to evoke the MOCR. Recent studies have used forward masking to avoid the complicating effects of suppression. The current study examines the effects of suppression by comparing the effects of a precursor on frequency selectivity measured using forward masking and simultaneous masking.

Psychoacoustic tuning curves (PTCs) were measured using simultaneous and forward masking, with and without a precursor. This measured the change in the PTC with a precursor, with and without the presence of suppression. Broadband precursors and tonal precursors at masker frequencies were tested because previous studies have shown broadening with broadband precursors, and sharpening when tonal maskers or notched-noise maskers are the precursors.

PTCs measured with forward masking were on average sharper than those measured in simultaneous masking, consistent with the contribution of suppression in simultaneous masking. A pink broadband noise precursor broadened PTCs in both forward and simultaneous masking. Tonal precursors at masker frequencies well below and well above the signal frequency decreased threshold masker levels for forward masking at and above the threshold masker level. In simultaneous masking, such precursors did not have an effect on the low-frequency side until they approached threshold masker levels for forward masking. On the high-frequency side, these precursors sometimes increased the threshold masker level. Tonal precursors at the signal frequency always decreased masker level (i.e., broadened tuning).

These results are consistent with previous studies showing that in this paradigm, the tuning for a precursor is very similar to the tuning for forward maskers. This supports the hypothesis that simultaneous masking is partly due to suppression, and these lower masker levels may not activate the MOCR. Although more complicated, simultaneous masking is important to understand because it is the more typical real-world condition.
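
As a reference for how PTC sharpness comparisons of this kind are commonly quantified (a minimal sketch; the study's own metric is not specified here), Q10 can be computed as the tip frequency divided by the curve's bandwidth 10 dB above the tip:

import numpy as np

def q10(masker_freqs, masker_levels):
    # masker_freqs (Hz) and masker_levels (dB SPL) describe one measured PTC;
    # this sketch assumes the two flanks are monotonic around a single tip.
    masker_freqs = np.asarray(masker_freqs, float)
    masker_levels = np.asarray(masker_levels, float)
    i_tip = np.argmin(masker_levels)              # PTC tip = lowest masker level
    f_tip, l_cut = masker_freqs[i_tip], masker_levels[i_tip] + 10.0
    logf = np.log2(masker_freqs)                  # interpolate on a log-frequency axis
    lo = np.interp(l_cut, masker_levels[:i_tip + 1][::-1], logf[:i_tip + 1][::-1])
    hi = np.interp(l_cut, masker_levels[i_tip:], logf[i_tip:])
    return f_tip / (2 ** hi - 2 ** lo)            # larger Q10 = sharper tuning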

12:10
A data-limit account of spatial and spectral release from masking

ABSTRACT. Speech-on-speech listening involves selectively attending to a target talker while ignoring a simultaneous competing talker. Spatially separating the talkers improves performance, a phenomenon known as spatial release from masking (spatial RM). However, it is also possible to improve performance by spectrally separating the talkers, i.e., filtering them into non-overlapping frequency bands (spectral RM). In both cases, RM benefits derive from enhanced availability of the target signal.

The relative benefit of spatial vs. spectral RM is currently unknown. Furthermore, it is unclear how listeners’ ability to exploit spatial vs. spectral cues is related to individual differences in cognitive abilities. It has been suggested that cognitive resources are of greater utility when less of the target signal is available, implying that the cognition/performance relationship should be strongest when spatial and/or spectral separation (i.e., RM) is limited or absent. However, the data-limit account (Norman & Bobrow, 1975, Cognitive Psychol 7(1):44) suggests that cognitive resources cease to be of use when the target is severely degraded, implying that the cognition/performance relationship should in fact be weakest when there is no RM.

In this study, participants (N=240) completed a selective listening task in which they transcribed the speech of one of two simultaneously presented talkers. We filtered the speech into frequency bands such that the talkers were either spectrally overlapping or interleaved (spectral RM vs. no spectral RM). We also manipulated the perceived spatial distance between talkers, presenting them either at ±90° azimuth (dichotic) or collocated (diotic) (spatial RM vs. no spatial RM). We additionally administered a battery of cognitive tasks to assess three key components of working memory/attention: phonological loop, executive function, and selective/divided attentional control. Factor analysis was used to derive a single cognitive score for each participant.
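
One common way to implement this reduction (a minimal sketch; the study's exact factor-analysis procedure may differ) is a single-factor model over standardized task scores:

import numpy as np
from sklearn.decomposition import FactorAnalysis

def cognitive_scores(task_scores):
    # task_scores: (n_participants, n_tasks) array of raw cognitive task scores.
    z = (task_scores - task_scores.mean(0)) / task_scores.std(0)  # standardize each task
    fa = FactorAnalysis(n_components=1, random_state=0)           # one common factor
    return fa.fit_transform(z).ravel()                            # one score per participant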

Spectral RM was at least as effective as spatial RM in improving transcription performance, with the best performance observed when both types of RM were present. Cognitive scores were significantly positively correlated with both spatial and spectral RM benefits. Additionally, cognitive scores best predicted performance in the three RM conditions, with the weakest correlation observed in the condition with neither spatial nor spectral RM.

These results suggest that listeners can gain as much benefit from spectral as spatial cues during speech-on-speech listening. These RM benefits appear to be supported by cognitive processes, with larger RM benefits associated with better cognition. Finally, cognitive abilities were least predictive of performance when no RM was present, supporting a data-limit account.

16:00-17:40 Session 4: Spatial Cues and More
16:00
Behavior and electroencephalography demonstrate that interaural level differences enhance attentional focus

ABSTRACT. Few studies have explored how spatial cue realism affects performance on spatial auditory attention tasks. Indeed, many headphone-based studies “spatialize” sounds using only interaural time differences (ITDs) or only interaural level differences (ILDs) – in extreme cases playing different monaural signals to each ear. One study showed that, compared to ILDs or ITDs, individualized head-related transfer functions (HRTFs) led to better performance and stronger electroencephalography (EEG) signatures of attention on a top-down spatial selective attention task (Deng et al., Neuroimage, 2019). Motivated by this, the current study explores whether spatialization also influences the ability to ignore bottom-up, salient interrupters.

Eighteen subjects with normal hearing thresholds performed a selective attention task. A target stream (syllables “bah,” “dah,” and “gah,” in random order) was spatialized to either the left or right (30° from midline), while a competing stream of otherwise identical syllables (in a different random order) occurred symmetrically from the opposite side. A cue (“bah”) at the start of each trial indicated the target direction. On a random 25% of trials, a salient “MEOW” occurred after the first and before the second target syllable (90°, contralateral to the target). In different blocks, spatialization used either individualized HRTFs, ILDs (minimum-phase representation of the HRTF), or ITDs (all-pass representation of the HRTF). Simultaneously recorded EEG allowed us to analyze the N1 component elicited by the to-be-ignored distractor (from centro-medial channels) and alpha power over parietal EEG channels.
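
For concreteness (a minimal sketch under assumed conventions: 8-12 Hz alpha band, one parietal channel per hemisphere, epochs of at least one second), a lateralization index of parietal alpha power can be computed as:

import numpy as np
from scipy.signal import welch

def alpha_lateralization(eeg_left_ch, eeg_right_ch, fs, attend_left):
    # eeg_left_ch / eeg_right_ch: single-trial time series from left/right
    # parietal channels; returns (ipsi - contra) / (ipsi + contra) alpha power
    # relative to the attended side (a common, assumed definition).
    def alpha_power(x):
        f, p = welch(x, fs=fs, nperseg=int(fs))
        return p[(f >= 8) & (f <= 12)].mean()
    ipsi, contra = ((alpha_power(eeg_left_ch), alpha_power(eeg_right_ch))
                    if attend_left else
                    (alpha_power(eeg_right_ch), alpha_power(eeg_left_ch)))
    return (ipsi - contra) / (ipsi + contra)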

Compared to full HRTFs or ILDs, ITDs produced the largest performance decrement due to the interrupter and the worst target-syllable recall performance. Neurally, the to-be-ignored interrupter elicited a statistically larger onset N1 for ITDs than for ILDs; the N1 was similar in magnitude for HRTFs and ILDs. Alpha power lateralization in parietal channels, a signature of top-down spatial attention, was strongest for HRTFs, intermediate for ILDs, and weakest for ITDs (not even reaching statistical significance).

Results show that stimuli spatialized with ILDs or HRTFs allow listeners to more effectively deploy spatial attention and ignore distracting interrupters compared to stimuli containing only ITDs. This enhancement of attentional focus manifests both in better behavioral performance and stronger neural measures of attention. For most measures, ILDs and individualized HRTFs produced similar outcomes, suggesting that a key factor is whether or not stimuli contain ILDs: our findings hint that individualization of HRTFs may not confer large advantages over simple ILDs. Future research should address the mechanisms by which ILDs enhance attentional focus.

16:20
Evaluation of an ITD-to-ILD transformation as a method to restore the spatial benefit in speech intelligibility in hearing impaired listeners

ABSTRACT. To improve speech intelligibility in complex everyday situations, the human auditory system uses Interaural Time Differences (ITDs) at low frequencies and Interaural Level Differences (ILDs) at high frequencies to enable binaural unmasking, better-ear listening, and auditory stream segregation. For normal-hearing (NH) listeners and spatially separated sources, these mechanisms typically improve speech intelligibility substantially. Hearing-impaired (HI) listeners, however, are often much less sensitive to ITDs, leading to limited binaural benefits and decreased speech intelligibility. Note that at low frequencies no ILDs occur in real-life situations, although the auditory system is in fact sensitive to low-frequency ILDs. Therefore, one could transform low-frequency ITDs, which are of limited use to HI listeners, into ILDs to potentially reintroduce a binaural benefit for HI listeners at low frequencies.

To investigate whether such a transformation could be beneficial, an experiment was run with NH listeners for whom the loss of ITD processing was simulated by manipulating HRTFs. Speech reception thresholds (SRTs) were measured in the presence of spatially separated interfering talkers, showing that this transformation can improve speech intelligibility in anechoic environments. In the experiment, listeners' SRTs were measured for "unprocessed" conditions (containing both natural ITDs and ILDs) serving as the NH baseline, simulated hearing impairment (original high-frequency ILDs but removed ITDs), and the ITD-to-ILD transformation (no ITDs, original high-frequency and augmented low-frequency ILDs, keeping the overall level constant).
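
One possible form of such a mapping (a minimal sketch, not the paper's actual scheme; frame size, the 10 dB/ms mapping, and the gain limit are illustrative assumptions, and the band splitting that would restrict it to low frequencies is omitted) estimates the ITD per frame and converts it into opposite gains at the two ears while keeping the summed level roughly constant:

import numpy as np

def itd_to_ild(left, right, fs, frame=1024, db_per_ms=10.0):
    # left/right: low-pass-band signals (the high-frequency band would pass through).
    out_l, out_r = left.copy(), right.copy()
    for s in range(0, len(left) - frame, frame):
        l, r = left[s:s + frame], right[s:s + frame]
        lag = np.argmax(np.correlate(l, r, "full")) - (frame - 1)  # ITD in samples
        itd_ms = 1e3 * lag / fs                                    # sign convention assumed
        ild_db = np.clip(db_per_ms * itd_ms, -20.0, 20.0)          # ITD -> target ILD
        g = 10 ** (ild_db / 40.0)                                  # +/- half the ILD per ear
        norm = np.sqrt(2.0 / (g ** 2 + g ** -2))                   # keep overall level constant
        out_l[s:s + frame] = l * g * norm
        out_r[s:s + frame] = r / g * norm
    return out_l, out_r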

For a centrally positioned target speaker with two laterally positioned interfering speakers, the removal of ITDs worsened the SRTs by roughly 3 dB compared to the unprocessed condition. The transformation restored the SRTs to nearly the unprocessed value. For a laterally positioned target speaker, the transformation even improved the SRTs by 2 dB compared to the unprocessed condition (the NH baseline).

These findings, using NH listeners with simulated hearing impairment, suggest that such a transformation method would be suited to restoring binaural benefits in HI listeners. To further verify the effectiveness of this transformation, the same experiment will be repeated with HI listeners. From this experiment, we aim to learn how accurately spatial deficits due to hearing impairment are simulated in NH listeners by removing the ITDs from the signals, as well as how much of the binaural benefit can be restored for HI listeners by adding low-frequency ILDs. Through beamforming techniques, the transformation method could also be implemented in hearing aids and cochlear implants to directly benefit hearing-impaired listeners.

16:40
Robust spatial hearing beyond interaural time and level differences in humans over deep neural networks

ABSTRACT. Spatial hearing is a critical function enabling the localization and separation of sound sources in complex environments, with interaural time (ITD) and level (ILD) differences being well-established cues for localization performance. However, their precise contribution to human and deep neural network (DNN) performance, and the role of additional auditory features, remain pivotal questions. Here, we directly compared the localization performance of humans and DNNs trained on spatial hearing tasks using a novel dataset of binaural recordings in diverse acoustic environments. To disentangle the roles of ITD and ILD, we tested both groups with natural, synthesized, and manipulated stimuli that selectively included or excluded these interaural cues.

We found that both humans and DNNs relied heavily on ITD and ILD for sound localization, with the combination of these cues providing significantly better localization performance than either cue alone for both biological and artificial systems. Surprisingly, humans exhibited remarkable robustness, maintaining localization consistency even when ITD and ILD cues were absent, suggesting reliance on additional auditory features. In contrast, DNNs demonstrated high accuracy only for stimuli that resembled their training regime but failed to generalize in conditions where ITD and ILD were absent, highlighting their limited adaptability to cue manipulation and the need for further refinements to better approximate human performance.

These results underscore the robustness of human spatial hearing beyond primary interaural cues compared to current artificial systems, and point to promising directions for advancing artificial systems and informing clinical applications, such as cochlear implants and auditory prosthetics.

17:00
A Unified Model of Binaural Interaction in the Mammalian Brainstem

ABSTRACT. Introduction: The ability to localize sound sources in the horizontal plane primarily depends on the extraction of interaural time and level differences (ITD and ILD). In mammals, the primary encoder neurons for binaural differences are found in the medial and lateral superior olive (MSO and LSO) within the brainstem. Following Rayleigh's 'Duplex Theory' of sound localization, the LSO has traditionally been associated with ILD encoding, while the MSO has been considered the primary processor of ITDs. However, this separation was primarily based on experiments using tones, and newer studies have revealed that the LSO also processes time differences conveyed by both the envelope and the fine structure of a signal. Furthermore, evolutionary evidence suggests a common origin for the LSO and MSO. Current computational models of binaural interaction typically focus on representing either one nucleus or the other and often rely on computationally expensive spiking neuron implementations. This study proposes an efficient modeling framework that extends to both MSO and LSO, enabling its use as a front-end for more complex behavioral models or machine learning approaches.

Methods: We use a linear filter model where each synaptic input is a filter with a cut-off frequency determined by synaptic time constants. The neuronal membrane acts as a transfer function, creating a cascade of filters. Auditory nerve fiber firing probabilities from existing periphery models serve as input. Model output is compared to biophysical data.
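
A minimal sketch of this filter-cascade idea (time constants, weights, and the rectifying output are illustrative assumptions, not the authors' parameters): each synapse is a first-order low-pass filter with a cut-off set by its time constant, the membrane is another low-pass stage, and switching the contralateral weight between +1 (excitatory-excitatory, MSO-like) and -1 (excitatory-inhibitory, LSO-like) morphs the model between the two nuclei, as described in the Results.

import numpy as np
from scipy.signal import lfilter

def lowpass(x, tau, fs):
    a = np.exp(-1.0 / (tau * fs))                # first-order IIR low-pass filter
    return lfilter([1 - a], [1, -a], x)

def binaural_neuron(an_ipsi, an_contra, fs, w_contra=+1.0,
                    tau_syn=0.2e-3, tau_mem=0.5e-3):
    # an_ipsi / an_contra: auditory-nerve firing probabilities per sample,
    # e.g., from an existing periphery model.
    syn = lowpass(an_ipsi, tau_syn, fs) + w_contra * lowpass(an_contra, tau_syn, fs)
    v = lowpass(syn, tau_mem, fs)                # membrane as a transfer function
    return np.maximum(v, 0.0)                    # rectified output ~ firing probability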

Results: Our model replicates key response characteristics of MSO and LSO neurons, including sensitivity to ITDs and ILDs. Analysis reveals strong similarities in their processing mechanisms, with the primary difference being the time scales of binaural comparisons. Adjusting synaptic input strengths can morph neuron responses between MSO-like and LSO-like.

Discussion: This linear filter model offers a computationally efficient and analytically tractable framework for understanding binaural interaction. It provides a new tool for investigating sound localization and can be extended to model more complex auditory tasks, bridging physiology and psychophysics.

17:20
Binaural interaction component in loudness perception

ABSTRACT. Loudness, as the perceptual correlate of stimulus intensity, plays an important role in auditory perception. Loudness is related to the sound pressure level, but many more physical parameters, like bandwidth, duration, spectral content, and also monaural versus binaural stimulus presentation, affect the perceived loudness. Physiological investigations have assembled a comprehensive characterization of the sensory coding of intensity in the periphery of the auditory system, including the cochlea, auditory nerve, and auditory brainstem. The transformation of sensation into perception at the cortical level, however, is less well understood. It has been reported that some hearing-impaired listeners in particular show increased binaural loudness summation for broadband stimuli (Oetting et al., Hear. Res. 335, 179-192, 2016) in comparison to normal-hearing listeners. Auditory functional MRI was used in this study to identify neural correlates of loudness perception, and specifically binaural loudness summation, continuing previous work investigating the physiological correlate of loudness functions for monaural stimuli (Behler and Uppenkamp, NeuroImage 139, 176-188, 2016). The general loudness judgement is uniform in the sense that it combines acoustic input from both ears. The question is at what stages in the auditory pathway the neural activation from the involved anatomical structures is integrated.

All listeners participated in a psychoacoustic experiment to gather individual loudness functions for monaural and binaural stimulation. Functional MRI was used to collect responses from auditory regions for individually adjusted levels leading to the same loudness in all three conditions: monaural left, monaural right, and binaural. Data were analyzed with respect to two questions: to what extent activation maps for binaural stimulation are directly predictable from the results for monaural stimulation, and which regions in cortex might exhibit a specific binaural interaction component.
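
A minimal sketch of one way to operationalize this (an assumption consistent with the text, not the study's stated analysis): define a voxel-wise binaural interaction component as the binaural response minus the linear superposition of the two monaural responses, at loudness-equated levels.

import numpy as np

def binaural_interaction(beta_binaural, beta_left, beta_right):
    # Inputs: voxel-wise BOLD response estimates (e.g., GLM betas) for the
    # binaural and the two monaural conditions; nonzero output indicates a
    # departure from linear superposition of the monaural responses.
    return np.asarray(beta_binaural) - (np.asarray(beta_left) + np.asarray(beta_right))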

As in previous studies, neural activation in cortical structures - represented here indirectly by the BOLD response - is highly correlated with individual loudness judgements. An additional binaural interaction component could be detected, indicating that the brain activation in response to binaural stimuli is not just a linear superposition of monaural responses.

In summary, the presented data demonstrate a large potential of auditory fMRI to investigate neural correlates of individual loudness perception. The neural activation in the central auditory system integrated across hemispheres forms the basis for the overall loudness judgement. This opens a way to investigate the physiological processes underlying the observation that individuals with similar audiometric thresholds may show large differences with respect to their binaural loudness summation.

18:10-19:30 Session 5: Selective Attention and Dynamic Perception
18:10
Interaural differences and spatial selective attention jointly influence activity in prefrontal cortex

ABSTRACT. Interaural time differences (ITDs) and interaural level differences (ILDs) cooperatively shape the neural representations of sounds to convey source location. Spatial selective attention (SSA) dynamically modulates these responses, enhancing the representation of a target source compared to competing maskers. Prefrontal cortex (PFC) activity is linked to both listening effort and the top-down deployment of SSA via specialized attention networks. Here, we investigated how ITDs and ILDs each contribute to SSA and to activity in PFC.

We conducted two word-detection tasks in normal-hearing listeners. Listeners were asked to attend to a target stream spatialized to the left or right using either ITDs or ILDs while ignoring a masker stream in the opposite quarterfield. In Task 1, subjects clicked a button when they heard a color word amongst object words in the target stream; in Task 2, subjects identified the word “bash” from the set {“bash”, “dash”, “gash”} in the target stream. To force listeners to focus attention based on spatial cues, we temporally staggered word onsets to eliminate rhythm cues, and we used the same talker for the target and masker streams to eliminate voice-pitch cues. Competing streams were spatially separated using small ITDs, large ITDs, small ILDs, or large ILDs. In each condition, we calculated target hit and false alarm rates. We measured hemodynamic activity in PFC using functional near-infrared spectroscopy (fNIRS) in both tasks and event-related potentials using electroencephalography (EEG) in Task 2 only.

In both tasks, performance was better with large ITDs than small ITDs, driven by differences in hit rates. Performance was also better with large ILDs than small ILDs, but this difference was instead driven primarily by false alarm rates. Hemodynamic activity in PFC varied with spatial conditions in both tasks: small ITDs elicited a larger hemodynamic response in PFC than any other spatial condition, although PFC activity was not statistically different for large vs. small ILD conditions. Preliminary EEG results suggest that attentional modulation of early sensory responses may also vary as a function of spatial separation.

These behavioral data suggest that large ITDs and ILDs enhance release from masking in SSA tasks, but the two cues support sound source segregation through separate mechanisms. Greater PFC activity for small vs. large ITDs may reflect differences in either listening effort or top-down deployment of SSA, a topic to be explored further in future studies and by comparing with EEG results.

18:30
Auditory selective attention is impacted by the temporal coherence of a task-irrelevant visual stimulus in ferrets

ABSTRACT. Multisensory integration is a fundamental property of mammalian sensory systems, allowing us to combine information about objects in the world across senses. Audio-visual integration is especially relevant for understanding auditory processing, as visual information impacts auditory scene analysis, multi-modal object formation, and speech processing.

We trained ferrets on an auditory selective attention go/no-go task, requiring them to respond to the presence of a brief timbre deviant embedded in a target stream while ignoring those in a simultaneous non-target stream. Both streams were independently amplitude-modulated artificial vowels. Half of the trials had no timbre deviants in either stream, requiring animals to hold for the full duration of the stimulus. Streams were 5-6.2 s in duration, and target and non-target streams differed in pitch and vowel identity (counterbalanced across trials), with the target stream cued by beginning 1 s earlier than the non-target. The non-target stream was attenuated relative to the target stream by 6-8 dB, with the relative level determined by the ferrets’ performance, to avoid both floor and ceiling effects.

Each trial had a luminance-modulated visual stimulus in which the luminance modulation was temporally coherent with the amplitude modulation of either the target stream, the non-target stream, or neither stream. We selected visual stimulus properties as those showing the greatest impact on neural activity in auditory cortex in neural data recorded from ferrets presented passively with similar audio-visual stimuli. Importantly, the auditory feature coherent with the visual stimulus (amplitude) was orthogonal to the feature used for the detection task (timbre), and the visual stimulus provided no information that would assist the task.

Ferrets performed the task well, and response selectivity (d’) was higher when the visual stimulus was coherent with the target stream compared to the non-target stream (1.12-1.46 vs. 0.99-1.16, across 3 animals). The effect of visual coherence differed across animals, with d’ changes in two animals driven principally by an increase in false alarms to non-target deviants when the visual stimulus was coherent with the non-target stream, and one animal showing a higher hit rate in the target-coherent condition. These results are in line with previous findings in a human version of this task, suggesting that temporal coherence results in the integration of visual and auditory information to create a cross-modal object.
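
For reference, the d’ values above follow the standard signal-detection definition, z(hit rate) - z(false-alarm rate); a minimal sketch (the 1/(2N) correction for rates of exactly 0 or 1 is a common convention and an assumption here):

from statistics import NormalDist

def d_prime(hits, n_targets, false_alarms, n_nontargets):
    def rate(k, n):
        # clamp away from 0 and 1 so the inverse normal CDF stays finite
        return min(max(k / n, 1 / (2 * n)), 1 - 1 / (2 * n))
    z = NormalDist().inv_cdf
    return z(rate(hits, n_targets)) - z(rate(false_alarms, n_nontargets))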

18:50
Pupillometry as a window on the cognitive processes that underlie dynamic auditory perception

ABSTRACT. Salient sounds evoke momentary pupil dilations that are superimposed on, and interact with, slower pupil fluctuations that can reflect various arousal-related brain states, such as cognitive load, alertness, and uncertainty. Accumulating evidence suggests that the transient pupil responses are related to prediction error (surprise) signals that signify a need to update one’s beliefs to causally explain the sensory input. Pupil size recordings during sound sequence presentation can thus be used to study statistical learning and predictive processing in auditory perception.

Here, we employed pupillometry during a sound localization task in which 32 participants listened to spatially volatile sequences of 50 ms noise bursts that were presented binaurally at a 2 Hz regular rate. Lateral angles in the frontal horizontal plane were sampled from normal distributions with a fixed 10° SD, but their mean could be resampled from a uniform distribution [-60°, 60°] at random changepoints (hazard rate = 1/6). Participants indicated the location of the last sound together with a spatial confidence interval whenever the sequence stopped at unpredictable times.

Behavioral data analysis demonstrated that participants’ performance was qualitatively in line with predictions from a reduced Bayesian observer model. Participants’ accuracy and confidence were lowest immediately after a large prediction error that signalled a probable changepoint, and they progressively increased as the prior belief became more informative through averaging over preceding sound locations. Accordingly, sound localization responses were biased towards the model-predicted prior locations, weighted by the priors’ relevance and reliability. Interestingly, pupillometry analysis revealed that sound-evoked pupil dilations during ongoing sequence presentation correlated negatively with model-predicted prior relevance, while slower pupil size modulations correlated negatively with prior reliability. Furthermore, trial-to-trial and inter-individual patterns of behavioral responses could be partially predicted by the pupil size measurements.
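
A minimal sketch of one standard reduced Bayesian observer for this generative model (in the spirit of Nassar et al., 2010; the study's exact model may differ): "relevance" maps onto the changepoint probability and "reliability" onto the effective run length the prior averages over.

import numpy as np

def observer(x_seq, hazard=1/6, sd=10.0, lo=-60.0, hi=60.0):
    # x_seq: observed sound azimuths (deg); generative model as described above.
    mu, n = 0.0, 1.0                               # prior mean, effective run length
    for x in x_seq:
        tot = np.hypot(sd, sd / np.sqrt(n))        # predictive SD (noise + mean uncertainty)
        like_stay = np.exp(-0.5 * ((x - mu) / tot) ** 2) / (tot * np.sqrt(2 * np.pi))
        like_change = 1.0 / (hi - lo)              # uniform over possible new means
        cpp = hazard * like_change / (hazard * like_change + (1 - hazard) * like_stay)
        alpha = cpp + (1 - cpp) / (n + 1)          # learning rate from relevance/reliability
        mu += alpha * (x - mu)                     # delta-rule belief update
        n = (1 - cpp) * (n + 1) + cpp * 1.0        # expected run length since changepoint
        yield mu, cpp, n                           # prior mean, relevance, reliability proxy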

These results confirm the utility of pupillometry for studying auditory perception in volatile environments. In contrast to previous studies, we evaluated the proposed relationship between pupil size and Bayesian belief updating during the presentation of uninterrupted, purely auditory stimulus sequences. By verifying the relationship for non-luminous stimuli and without confounds due to explicit decision making or response action preparation, we corroborate the claim that pupil size modulations pertain to fundamental neural processes in dynamic perception.

19:10
What happens to spatial auditory attention after conversation in a noisy environment?

ABSTRACT. Auditory spatial attention plays a key role in dealing with cocktail-party situations, yet its temporal dynamics are not fully understood. Here, we tested the hypothesis that spatial auditory attention depletes after effortful communication in noise. In contrast to the frequently used dual-task paradigms, our experiment measured spatial auditory attention directly in a behavioral task.

We used a syllable streaming paradigm (SSP) before and after a 30-minute conversation between real participants in a highly realistic noisy space (72 dBA). In the SSP, participants heard two simultaneous streams made of the syllables ‘ba’, ‘da’, and ‘ga’, spatialized to the left and the right. The task was to repeat the syllables of a target stream denoted by an auditory spatial cue shortly before the onset of the two streams. The conversation, between two SSP rounds, took place in a laboratory, where two participants stood and talked freely. Both were wearing microphones, motion-tracking units, and headphones – which delivered a virtual auditory space (VAS) but preserved the actual direct sound. The VAS used real-time signal processing to account for head movements, creating a spatially stable acoustic scene that included the subjects’ reverberation, four spatialized interfering talkers, and spatially diffuse babble.

Young normal-hearing participants took part in three experimental sessions, preceded by a training session, in which they were tested for a change in attention from pre-test to post-test under three intervention conditions: conversation in noise, conversation in quiet, and listening to noise. Each session started with a 160-trial SSP pre-test, followed by an intervention, and concluded with a post-test identical to the pre-test, without any break between the last two parts. The results were analyzed as a change in performance – percent of correctly identified syllables – between the pre-test and the post-test.

The amount of increased effort during the conversation in noise, as compared to the other two types of intervention, will be discussed, with the long-term goal of establishing objective yet ecologically valid ways of understanding the role of attention in the context of models of effortful listening. [LH: Recipient of a Seal of Excellence Postdoctoral Fellowship of the Austrian Academy of Sciences.]