ISH 2025: INTERNATIONAL SYMPOSIUM ON HEARING 2025
PROGRAM FOR WEDNESDAY, JUNE 4TH

09:00-10:40 Session 10: Speech
09:00
Speech perception as inference: indexing top-down effects of prior-knowledge on EEG responses to natural speech

ABSTRACT. It has been posited that perception involves inferring the causes of our sensory input by combining that input with predictions derived from our learned/evolved model of the world. It has also been suggested that such predictions play a role in language processing. More specifically, it has been proposed that predictions – based on experience and context – interact with speech signals at each of several hierarchical stages that transform acoustic patterns into categorical representations that convey meaning. However, neurophysiological evidence of such interactions remains elusive, particularly in the context of natural speech.

Here, we sought to characterize electrophysiological (EEG) indices of speech predictions and their effects on speech encoding along the processing hierarchy, across three experiments. In Experiment 1, 23 participants listened to ~30 minutes of an audiobook. We then modeled how the EEG reflected the acoustic and phonetic features of each word as a function of how semantically predictable that word was in its narrative context. We found higher modeling accuracy for more surprising words, a result consistent with EEG indexing prediction error during natural speech comprehension. In Experiment 2, 23 participants listened to acoustically degraded sentences that were preceded by a text-based prime that either matched or mismatched the sentence. We found a striking behavioral effect of matched priors on subjective intelligibility. We again modeled the EEG based on acoustic and linguistic speech features and found higher model accuracy for both acoustic and phonetic features on the mismatched trials. Again, this is consistent with EEG indexing prediction error. In Experiment 3 (N = 4; study in progress), we presented participants with an audiobook that had been modified to alter the surprisal of individual words while preserving the level of constraint provided by their preceding context. This was designed to let us determine whether the effects seen in Experiment 1 were better explained by prediction error or by attention. Preliminary results from this experiment suggest that dynamic word-to-word fluctuations in attention influence the sensory encoding of natural speech. While these findings do not rule out a role for prediction in natural language comprehension, they do suggest a need to be careful in distinguishing between the influence of top-down attention and prediction in the prelexical perception of speech.
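
For illustration, the sketch below shows one common form of the word-by-word EEG modeling described here: a lagged linear encoding ("TRF"-style) model fitted by ridge regression, with prediction accuracy measured as the correlation between predicted and recorded EEG. The sampling rate, lag range, regularization value, and toy data are assumptions for illustration only, not parameters or data from the study.

    # Minimal sketch of a lagged linear encoding ("TRF"-style) model relating
    # speech features to EEG; all sizes, lags and the regularization value are
    # illustrative assumptions, not parameters from the study.
    import numpy as np

    fs = 64                                # assumed sampling rate of features/EEG (Hz)
    lags = np.arange(0, int(0.4 * fs))     # 0-400 ms of time lags

    def lagged_design(x, lags):
        """Stack time-shifted copies of the feature vector x into a design matrix."""
        X = np.zeros((len(x), len(lags)))
        for j, L in enumerate(lags):
            X[L:, j] = x[:len(x) - L] if L > 0 else x
        return X

    def fit_trf(x, eeg, lam=1.0):
        """Ridge-regress lagged features onto one EEG channel; return TRF weights."""
        X = lagged_design(x, lags)
        return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ eeg)

    def prediction_accuracy(x, eeg, w):
        """Correlation between predicted and recorded EEG (the 'model accuracy')."""
        pred = lagged_design(x, lags) @ w
        return np.corrcoef(pred, eeg)[0, 1]

    # Toy data standing in for a word-level speech feature and one EEG channel.
    rng = np.random.default_rng(0)
    feat = rng.standard_normal(fs * 60)
    eeg = np.convolve(feat, np.hanning(10))[:len(feat)] + rng.standard_normal(len(feat))
    w = fit_trf(feat[:fs * 40], eeg[:fs * 40])
    print("held-out prediction accuracy:", prediction_accuracy(feat[fs * 40:], eeg[fs * 40:], w))

In the study, such accuracies would be compared between groups of words (or trials) differing in predictability.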

Finally, we discuss the potential utility of these paradigms for research on the putatively aberrant influence of prediction on perception in certain clinical conditions such as autism and schizophrenia.

09:20
Of mice, men, and models: a temporal processing model for predicting auditory brain activity in anesthetized mice outperforms standard methods for speech tracking in human EEG

ABSTRACT. Temporal mechanisms of human speech perception can be explored via "speech tracking": using linear modelling to reconstruct continuous speech from human EEG responses or to predict EEG responses from speech. The aim is to identify temporal features in speech that drive cortical activity. Speech-tracking analyses typically rely on representations of speech derived solely from the audio signal, such as the acoustic envelope, and do not take into account non-linearities in auditory temporal processing that may influence cortical drive. We postulated that human speech tracking might be improved with a simple non-linear model of auditory temporal processing originally developed to predict brain activity in the auditory thalamus of anesthetized mice [1]. An essential part of this model is normalization by recent sound level history, which transforms the acoustic envelope into an “adaptive gain” signal representing stimulus-dependent central auditory responsiveness.
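
For illustration, the sketch below shows one plausible way to turn an acoustic envelope into an "adaptive gain" signal by normalizing it with its own recent level history. The Hilbert envelope, the divisive form of the normalization, and the one-second history time constant are assumptions made for this sketch; the exact formulation of the published model [1] may differ.

    # Sketch: envelope extraction followed by divisive normalization with a
    # running estimate of recent sound level ("adaptive gain"). Cutoffs and the
    # history time constant are illustrative assumptions.
    import numpy as np
    from scipy.signal import hilbert, lfilter

    def envelope(x, fs, cutoff=30.0):
        """Broadband Hilbert envelope, smoothed by a simple one-pole low-pass filter."""
        env = np.abs(hilbert(x))
        a = np.exp(-2 * np.pi * cutoff / fs)
        return lfilter([1 - a], [1, -a], env)

    def adaptive_gain(env, fs, tau=1.0, floor=1e-6):
        """Divide the envelope by an exponential running average of its recent history."""
        a = np.exp(-1.0 / (tau * fs))
        history = lfilter([1 - a], [1, -a], env)
        return env / (history + floor)

    fs = 16000
    t = np.arange(fs * 2) / fs
    x = np.sin(2 * np.pi * 200 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))  # toy "speech"
    ag = adaptive_gain(envelope(x, fs), fs)
    print(ag[:5])

The resulting signal, rather than the raw envelope, would then be used as the stimulus representation in the speech-tracking analysis.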

Here, we asked if cortical tracking of continuous speech in human EEG recordings, usually measured using the speech envelope, could be improved by using the adaptive gain representation instead. We analyzed two datasets from different laboratories [2,3] consisting of continuous EEG recordings from either British (n=18) or Danish (n=22) participants listening to audiobooks.

Cortical speech-tracking performance, quantified by correlating the EEG-decoded speech representation with the actual representation extracted from the speech, was significantly improved when linear modelling was performed using the adaptive gain representation instead of the speech envelope (Wilcoxon test p<0.001, Cohen's d=0.87). This improvement in speech tracking was observed in both datasets (p<0.001, d>0.8 for both) and in all listeners individually (p<0.05 for 87.5% of listeners, with a similar but non-significant trend for the others). Similar results were obtained for an encoding model (better EEG prediction for the adaptive gain versus the envelope representation; p<0.01, d>0.58 for both datasets together and for each dataset individually; p<0.05 for 30% of individual listeners, n.s. otherwise).

Results indicate that adaptive gain provides a more accurate representation of acoustic events driving cortical speech tracking than the speech envelope. We conclude that fundamental insights into auditory temporal processing gained from anesthetized mice are relevant to understanding the mechanisms of human speech perception.

[1] Anderson LA & Linden JF (2016). J Neurosci 36(6), 1977-1995. [2] Etard O & Reichenbach T (2022). doi:10.5281/zenodo.7086208. [3] Simon A, Bech S, Loquet G & Østergaard J (2022). doi:10.5281/zenodo.7500806.

09:40
Acoustic and neural cues for the detection and identification of auditory and speech sounds

ABSTRACT. According to the power spectrum model of masking, a signal is detected in noise if its presence produces a detectable increase in energy at the output of the auditory filter centered at the signal frequency. Similarly, vowels can be identified based on spectrally local increases in energy at the formant frequencies. Although increases in energy are generally thought to be encoded via increases in firing rate of auditory nerve fibers (ANFs), issues such as limited dynamic range and rate saturation of ANFs have often been raised as potential objections to this so-called rate-place coding. An alternative hypothesis is that tone detection and vowel identification are achieved based on cues related to amplitude modulation (AM) or the fluctuation strength in ANF responses. Specifically, adding a tone to noise leads to a spectrally local decrease in AM or fluctuation strength; similarly, fluctuations are greater between formants than at the formant peaks of vowels. Although humans are known to be sensitive to both energy (rate-place) and AM (fluctuation-strength) cues, it is unclear whether one cue dominates in typical situations where both cues are potentially available, or whether different populations (e.g., typical-hearing, hearing-impaired, and cochlear-implant listeners) rely on the cues to different extents. The answers to these questions are important because they may influence which signal processing approaches are most likely to succeed in enhancing speech in noise.

A series of experiments is underway, designed to determine the cues listeners use in tasks including tone-in-noise detection in narrow- and wide-band noise at various sound levels, identification of synthetic vowels based on harmonic complex tones in which formants are defined by combinations of level increments and phase manipulations of components, and identification of vocoded vowels, where modulation and intensity cues can be separated more easily than in natural conditions.

Preliminary tone-in-noise data suggest that energy cues may dominate in narrowband-noise conditions but that both cues are weighted approximately equally in broadband-noise conditions. Preliminary results with synthetic vowels have revealed conditions in which vowels can be identified in equal-amplitude tone complexes in the absence of intensity cues, based solely on changes in the phase relations between neighboring harmonics. These experiments provide new directions for developing empirical tests of the use of different acoustic and neural cues for critical tasks and stimuli in auditory and speech perception, and will provide useful data to compare with predictions from computational models of the auditory periphery.
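
For illustration, the sketch below contrasts the two cues discussed above for tone-in-noise detection: the energy at the output of a filter centered on the tone frequency, and the relative envelope fluctuation of that filter output. A Butterworth band-pass filter stands in for an auditory filter, and the bandwidth, tone level, and duration are assumptions chosen only to demonstrate that adding a tone raises the output energy while reducing its fluctuation.

    # Energy (rate-place) versus envelope-fluctuation cues for a tone in noise,
    # measured at the output of a band-pass filter centered on the tone.
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    fs, f0, dur = 16000, 1000.0, 0.5
    t = np.arange(int(fs * dur)) / fs
    rng = np.random.default_rng(1)

    sos = butter(4, [f0 - 65, f0 + 65], btype="bandpass", fs=fs, output="sos")

    def cues(x):
        y = sosfiltfilt(sos, x)
        env = np.abs(hilbert(y))
        energy_db = 10 * np.log10(np.mean(y ** 2))
        fluctuation = np.std(env) / np.mean(env)   # relative envelope fluctuation
        return energy_db, fluctuation

    noise = rng.standard_normal(len(t))
    tone = 0.2 * np.sin(2 * np.pi * f0 * t)
    print("noise alone :", cues(noise))
    print("noise + tone:", cues(noise + tone))     # energy up, fluctuation down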

10:00
A binaural front-end for speech intelligibility models: application to the hearing-aid speech perception index (HASPI)

ABSTRACT. Binaural hearing can improve the intelligibility of a speech source that is spatially separated from competing sound sources when compared to co-located conditions. Monaural speech intelligibility models cannot predict this spatial release from masking (SRM). A binaural front-end is proposed that can be combined with monaural models to account for SRM. From the noisy speech signals at the two ears, it produces a binaurally enhanced monaural signal that can then be evaluated by monaural models. The Hearing Aid Speech Perception Index (HASPI) is a monaural model that compares the degraded noisy speech signal to a clean speech reference and allows intelligibility predictions for speech degraded by additive noise, reverberation, spectral changes, and nonlinear distortion. HASPI was used here to test the applicability of the binaural front-end.

The model predictions were compared to full psychometric functions: intelligibility scores for normal-hearing listeners measured as a function of signal-to-noise ratio (SNR), from three Danish datasets simulating sources in anechoic conditions over headphones. The speech source was always frontal. A single stationary speech-shaped noise (SSN) was tested at ten azimuths in dataset 1. In dataset 2, an SSN or a non-stationary noise was tested at three azimuths, with or without ideal binary mask processing that simulates noise reduction in hearing aids. In dataset 3, the competing sound was obtained by mixing, in seven different proportions, the signals from three sources: an SSN simulated in front (co-located with the target speech), a diffuse noise coming from all directions, and an SSN simulated at 115-degree azimuth.

HASPI was originally developed to predict percent correct for English sentences at positive SNRs, so the model back-end was re-fitted here to predict percent correct for Danish words at negative SNRs. This new fitting followed the original HASPI procedure and used only the co-located conditions of the datasets, with no SRM or binaural effects involved. The front-end generally improved predictions when compared to a model version that considers only the monaural HASPI at the better ear. This was always true in the conditions with a point source emitting a stationary noise (from any direction): there, the front-end predictions are very accurate, while the monaural better-ear predictions systematically underestimate performance. Predictions are less accurate for non-stationary noise, and they underestimate performance in the conditions with diffuse noise.
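
For illustration, the sketch below shows the better-ear reference approach mentioned above: a monaural metric is evaluated at each ear and the higher score is kept. The correlation-based metric and the toy signals are placeholders standing in for HASPI and for real binaural recordings; the sketch only illustrates the selection logic, not the proposed binaural front-end itself.

    # Better-ear baseline: score each ear with a monaural metric, keep the best.
    import numpy as np

    def monaural_metric(clean, noisy):
        """Placeholder monaural intelligibility metric (correlation with clean speech)."""
        return float(np.corrcoef(clean, noisy)[0, 1])

    def better_ear_score(clean, noisy_left, noisy_right):
        """Evaluate the monaural metric at each ear and return the better one."""
        return max(monaural_metric(clean, noisy_left),
                   monaural_metric(clean, noisy_right))

    rng = np.random.default_rng(2)
    clean = rng.standard_normal(16000)
    left = clean + 1.5 * rng.standard_normal(16000)    # poorer SNR at the left ear
    right = clean + 0.5 * rng.standard_normal(16000)   # better SNR at the right ear
    print("better-ear score:", better_ear_score(clean, left, right))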

10:20
Listening effort for soft speech in quiet and its implication for benefits of amplification

ABSTRACT. It is well understood that sounds require a certain sound pressure level to be audible. For speech, we can further differentiate between thresholds for detection, classification, and intelligibility. In particular, the speech reception threshold plays an important role in evaluating a person's hearing ability and the benefits of amplification. However, recent studies at our lab suggest that an important threshold, the threshold for effortless listening, has been overlooked so far and should be given more consideration.

Several experiments were carried out where speech intelligibility and experienced listening effort were evaluated in young normal-hearing adults. To this end, the Oldenburg sentence test (OLSA) and the Adaptive CAtegorical Listening Effort Scaling (ACALES) were performed for speech in quiet. Both tests were also carried out using a threshold-simulating noise at conversational level. Furthermore, in some experiments, linear amplification was provided using a research hearing aid based on the open Master Hearing Aid platform and the Portable Hearing Lab hardware.

In our results, 50 % and roughly 100 % speech intelligibility were reached at 15 dB and 25 dB respectively, but listening was only rated effortless at 40 dB. Compared to reference measurements in spectrally matched noise at conversational level, the slopes of both the speech intelligibility curve and the listening effort ratings were flatter for soft speech in quiet. Also, the relations between speech intelligibility and rated listening effort were altered compared to the well-described case in noise. We observed a similar but weaker effect when experiments were conducted using the threshold-simulating noise. Amplification of soft speech in quiet significantly reduced self-rated listening effort while it had a minor effect on speech intelligibility.
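
For illustration, the sketch below fits a logistic psychometric function to intelligibility scores as a function of presentation level, the kind of fit from which a 50% point and slope are read off. The data points are made up to roughly match the levels quoted above (50% near 15 dB, ceiling near 25 dB) and are not the measured data.

    # Fit a logistic psychometric function to toy intelligibility-versus-level data.
    import numpy as np
    from scipy.optimize import curve_fit

    def psychometric(level, srt, slope):
        """Logistic from 0 to 1 with midpoint srt and slope (per dB) at the midpoint."""
        return 1.0 / (1.0 + np.exp(-4.0 * slope * (level - srt)))

    levels = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0])    # presentation level (dB)
    scores = np.array([0.02, 0.15, 0.50, 0.90, 0.99, 1.00])   # proportion correct (toy)
    (srt, slope), _ = curve_fit(psychometric, levels, scores, p0=[15.0, 0.1])
    print(f"estimated SRT = {srt:.1f} dB, slope at SRT = {slope * 100:.1f} %/dB")

An analogous fit to categorical listening-effort ratings versus level would locate the much higher level at which listening becomes effortless.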

The results indicate that speech in quiet needs to be presented well above the speech reception threshold to achieve listening without noticeable effort. We believe that this observation cannot be fully explained by assuming that soft speech is masked by an internal noise acting like an external masking noise, but that it partly also reflects cognitive effects. Furthermore, to assess the potential benefits of amplification for both normal-hearing and hearing-impaired people, the listening effort for soft sounds should be given more attention in addition to speech intelligibility. Future studies should assess whether the same effect can also be observed in hearing-impaired listeners, and whether hearing-aid gain targets for soft sounds sufficiently consider listening effort.

11:10-12:30 Session 11: Interaural Time Difference
11:10
Improving ITD Sensitivity of Cochlear Implanted Rats: Microsecond Precise Versus Binaurally Jittered ITD Information

ABSTRACT. Sound localization is one of the major problems for patients with bilateral cochlear implants (biCIs). One reason for this is poor or absent interaural time difference (ITD) sensitivity, especially in patients with prelingual deafness. This led to the hypothesis that a lack of binaural hearing experience during an early critical period might limit ITD perception in these patients. However, recent work by our research groups has shown that neonatally deafened (ND) rats, bilaterally cochlear implanted in young adulthood, can be trained to use ITDs to lateralize sounds with remarkably low ITD thresholds (~50 μs) if the biCIs deliver informative pulse-timing (pt) ITDs from the outset. Limitations in the temporal accuracy of current clinical biCIs, which try to encode temporal features of sound in the pulse timing, can lead to small deviations (jitter) in the pt pattern. Here we investigate how much jitter the inexperienced auditory system can tolerate while still developing good ITD sensitivity.

Twelve ND biCI rats were trained to lateralize pulse trains with jittered ITDs. Baseline ITDs were drawn independently from a set spanning ±120 μs in 20 μs steps. Binaural jitter values changed from pulse to pulse and were drawn randomly, in one cohort from a distribution spanning ±60 μs in 20 μs steps (n = 7) and in the other cohort from a distribution spanning ±120 μs in 40 μs steps (n = 5). After five weeks of training, the biCI rats were tested on their ITD sensitivity with and without jitter on each pulse.
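
For illustration, the sketch below shows how a jittered pulse-timing ITD stimulus of the kind described could be constructed: one baseline ITD per trial plus an independent jitter value on every pulse. The ITD and jitter grids follow the values given above; the pulse rate and train duration are assumptions.

    # Construct left/right pulse times with a per-trial baseline ITD and per-pulse jitter.
    import numpy as np

    rng = np.random.default_rng(3)
    rate, dur = 900.0, 0.2                               # pulses per second, seconds (assumed)
    pulses = np.arange(0.0, dur, 1.0 / rate)

    baseline_grid = np.arange(-120, 121, 20) * 1e-6      # ±120 µs in 20 µs steps
    jitter_grid = np.arange(-60, 61, 20) * 1e-6          # ±60 µs in 20 µs steps (cohort 1)

    itd = rng.choice(baseline_grid)                      # one baseline ITD per trial
    jitter = rng.choice(jitter_grid, size=len(pulses))   # fresh jitter on every pulse

    left = pulses                                        # left-ear pulse times (s)
    right = pulses + itd + jitter                        # right-ear pulse times (s)
    print(f"baseline ITD = {itd * 1e6:+.0f} µs; first right-left offsets (µs):",
          np.round((right - left)[:5] * 1e6))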

Under both jitter conditions, all biCI rats showed very good ITD sensitivity. Training with ±60 µs jitter resulted in a significantly better ITD sensitivity threshold (just-noticeable difference = 20.0 µs) than training without jitter (i.e., microsecond-precise ptITDs; just-noticeable difference = 39.5 µs) or with ±120 µs jitter (just-noticeable difference = 41.0 µs).

The results show that the early-deafened auditory system can develop good ITD sensitivity under electric stimulation even in the presence of jitter in ptITDs. Interestingly, small amounts of jitter may even enhance ITD sensitivity. These findings highlight the importance of fine structure stimulation strategies for biCI patients to improve their spatial hearing.

11:30
What – if anything – is coincidence detection?

ABSTRACT. The prominence of the time dimension in the workings of the auditory system is one of its alluring features. While there is no shortage of experimental findings elaborating on temporal response properties, there is only limited physiological insight into higher-order response features that hinge on such temporal properties (i.e., insight into how these properties are "used"). A core concept in discussions of temporal processing is that of coincidence detection and its neural implementation, the coincidence detector. Although longstanding physiological evidence for coincidence detectors exists at a phenomenological level, mechanistic examinations have only recently been performed.

I will first review a simple way to display temporal features triggered by sound in auditory neurons, based on the generic concepts of temporal delays and coincidence detection. These displays are then contrasted with empirical data: intracellular recordings from two cell types that are traditionally regarded and modeled as prototypes of coincidence detectors. Binaural neurons in the medial superior olive (MSO) are sensitive to "coincidences" in action potentials from the ipsi- and contralateral ear, making the neurons sensitive to interaural time differences. Octopus cells in the cochlear nucleus are sensitive to "coincidences" of action potentials across inputs tuned to a wide range of frequencies.

The data show that in both cases, the spike output of the neurons does not simply reflect coincidences of input spikes. When converging inputs vary in strength and temporal pattern, their sequence of activation is an important determinant of response strength. Maximal output is not simply determined by the maximal number of coincidences, but reflects temporal input structure combined with intrinsic (membrane) properties. In both MSO and octopus neurons, sequence detection imparts new properties that are not predicted by coincidence detection. These observations suggest a simpler mechanism of temporal sensitivity than the temporal delays and coincidence detection traditionally posited as the key neural operations in temporal processing.
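
For illustration, the sketch below implements the textbook coincidence detector discussed above: the unit fires whenever at least a threshold number of input spikes fall within one coincidence window, regardless of which input they came from or in what order. The window, threshold, and toy spike times are illustrative; the point of the abstract is precisely that real MSO and octopus cells deviate from this idealization.

    # Bare-bones coincidence detector over pooled input spike trains (times in seconds).
    import numpy as np

    def coincidence_detector(spike_times, window=0.0005, threshold=2):
        """Return output spike times where >= threshold pooled spikes fall in one window."""
        t = np.sort(np.concatenate(spike_times))
        out = []
        for ti in t:
            if np.sum((t >= ti) & (t < ti + window)) >= threshold:
                out.append(ti)
        return np.array(out)

    ipsi = np.array([0.0100, 0.0200, 0.0302])     # toy input spike trains (s)
    contra = np.array([0.0101, 0.0250, 0.0300])
    print(coincidence_detector([ipsi, contra]))   # fires only for near-coincident pairs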

11:50
Inhibitory inputs to ITD circuits

ABSTRACT. In birds, detection of sound source azimuth begins in the nucleus laminaris (NL), which computes interaural time differences (ITDs). Inhibition has been proposed to protect NL neurons from losing ITD sensitivity with increasing sound level, but the inhibitory mechanisms are still unknown. Anatomical studies in birds reveal GABAergic synapses throughout the first-order nuclei, while in vitro studies in chicken NL showed that depolarizing GABAergic IPSPs shorten membrane time constants, perhaps to allow the membrane potential to follow rapid synaptic currents accurately over a range of sound levels. In the first-order nucleus magnocellularis (NM), long-lasting inhibitory responses emerge from stimulation of the superior olivary nucleus (SON). Given the importance of inhibition in regulating auditory brainstem activity, and especially in understanding the effect of hearing loss on the ITD circuit, we have examined the nature of the inhibitory input to NL from the SON in vivo. The SON is the major source of descending GABAergic inhibition to the ipsilateral NM, NL, and nucleus angularis (NA), and receives excitatory input from NA and NL.

We recorded isolated inhibitory synaptic currents from extracellular field potentials in the NL of barn owls through the iontophoretic application of blockers of GABA or glycine. We also characterized responses in the SON and used viral tracers to reveal projections from SON to NL.

Blockers of GABA or glycine increased the strength of the field potential evoked by ITD stimuli. Furthermore, the effects of strychnine were more pronounced than those of gabazine, with sub-linear summation of the two. Responses to auditory stimulation showed that the temporal dynamics of the evoked synaptic contributions were generally consistent with short-term facilitation of the GABAergic inputs. Furthermore, profiles of synaptic activation revealed more prominent inhibition following stimulus offset, suggesting that inputs to NL originate from both the primary-like and offset response types of the SON. These heterogeneous response types may represent separate SON neuronal populations. Recordings in the SON showed that most units responded preferentially to binaural noise, with broad frequency tuning and low firing rates. Response types included off-responses, sustained, and primary-like responses. Viral tracing experiments showed SON projections back to the brainstem excitatory nuclei, to the inferior colliculus, and to the contralateral SON.

12:10
Benefits of Enhanced Phase-locking for Binaural Temporal Coding in the Auditory Brainstem

ABSTRACT. Binaural sound localization relies on the neural processing of interaural time and level differences (ITDs/ILDs). Neurons of the lateral superior olive (LSO) in the auditory brainstem vary their spike rates according to the ILD and envelope ITD of bilaterally presented sound stimuli. Bushy cells in the anteroventral cochlear nucleus (AVCN) transfer the relevant monaural acoustic information from auditory nerve (AN) fibers to the LSO. More specifically, spherical bushy cells (SBCs) send excitatory inputs to the ipsilateral LSO, while globular bushy cells (GBCs) project to the contralateral medial nucleus of the trapezoid body (MNTB), which provides inhibitory inputs to the LSO. While previous physiological studies reported an enhancement of phase-locking in bushy cells compared to AN fibers and suggested its contribution to binaural coding, most (if not all) modeling studies have assumed that bushy cells act as a simple relay from the AN to the binaural stage.

In the present study, we developed a computational model that includes the AN, SBC/GBC, and LSO stages to investigate the effects of temporal enhancement in bushy cells on binaural coding in the LSO. The Bruce-Carney-Zilany model of the auditory periphery was used to simulate the sound-driven spiking activity of AN fibers. A previously developed adaptive coincidence-counting model was adopted for the bushy cells. Each GBC was assumed to be innervated by 20 AN fibers; each SBC was assumed to receive two large, near-threshold (endbulb) inputs and 18 small (bouton) inputs. The parameters of the model were set to replicate known physiological responses, including the primary-like (SBC) and primary-like-with-notch (GBC) peristimulus time histograms. We simulated binaural tuning in the LSO with and without the bushy cell stage. The number of synaptic inputs to the LSO model was adjusted so that its ILD tuning remained unchanged between the input configurations.

The binaural phase response that simulated envelope ITD coding in the LSO became more sharply tuned when the bushy cell stage was added. Furthermore, the LSO model could then encode binaural envelope phase differences at modulation frequencies up to around 600 Hz, matching previously observed physiological limits. The sharpening of the binaural response in the LSO was underpinned by a deeper tuning trough and a higher tuning peak, contributed by the GBC and SBC stages, respectively. Overall, our simulation results demonstrate the functional benefit of having bushy cells in the binaural sound localization circuit. Coding of transient sounds such as clicks will also be discussed in the presentation.
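
For illustration, the sketch below caricatures the excitatory-inhibitory interaction at the LSO stage described above: ipsilateral (excitatory) and contralateral (inhibitory, via MNTB) drives are represented as envelope-locked rate functions, and the LSO output is the rectified difference accumulated over the stimulus, which yields envelope ITD tuning. The subtraction rule, modulation rate, and absence of any bushy cell or AN front-end are assumptions of this toy sketch, not the model used in the study.

    # Toy LSO stage: rectified difference between envelope-locked excitation and
    # ITD-shifted inhibition, evaluated at several envelope ITDs.
    import numpy as np

    fs, fm, dur = 10000, 300.0, 0.5             # sample rate, envelope modulation rate (Hz), s
    t = np.arange(int(fs * dur)) / fs

    def envelope_rate(itd):
        """Half-wave-rectified sinusoidal envelope shifted by an envelope ITD (s)."""
        return np.maximum(0.0, np.sin(2 * np.pi * fm * (t - itd)))

    def lso_rate(env_itd):
        excitation = envelope_rate(0.0)          # ipsilateral (SBC-like) drive
        inhibition = envelope_rate(env_itd)      # contralateral (GBC -> MNTB) drive
        return np.mean(np.maximum(0.0, excitation - inhibition))

    for itd_us in (0, 500, 1000, 1667):          # half a modulation cycle at 300 Hz is about 1667 µs
        print(f"envelope ITD {itd_us:5d} µs -> relative LSO rate {lso_rate(itd_us * 1e-6):.3f}")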

14:00-22:00 Social Event

City Excursion & External Dinner & Music Performance