Modeling normal and impaired hearing with deep neural networks optimized for ecological tasks
ABSTRACT. Computational models that perform real-world hearing tasks using cochlear input could help link the peripheral effects of hearing loss to real-world perceptual consequences. Deep artificial neural networks, optimized separately for sound localization and recognition tasks, have been shown to account for many aspects of normal hearing behavior. Here, we extend this approach to model the behavioral consequences of hearing loss using a network jointly optimized for multiple tasks.
We trained a deep artificial neural network to localize and recognize speech, voices, and environmental sounds using simulated auditory nerve representations of naturalistic auditory scenes. Once the model was trained, we compared its spatial hearing and speech recognition performance to that of humans. We also measured the model’s psychoacoustic thresholds by training linear classifiers to make binary judgments using the model’s learned features. These classifiers are intended to represent the decision rules that humans use to perform simple hearing tests, relying on relatively fixed internal representations that were plausibly optimized for ecological tasks over longer timescales. To investigate the perceptual consequences of hearing loss, we altered the model’s peripheral input (simulating damage to hair cells and auditory nerve fibers) and measured the impact on behavior.
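As a concrete illustration of the linear-readout idea described above, the sketch below fits a logistic-regression classifier on stand-in "network features" for a binary signal-detection judgment and reads out a psychometric function. The feature arrays, levels, and dimensions are hypothetical placeholders, not the model or stimuli used in the study.

```python
# Minimal sketch of the linear-readout idea: fit a linear classifier on
# (hypothetical) learned network features to make a binary psychoacoustic
# judgment, then read a psychometric function off the test trials.
# Feature arrays here are random stand-ins for real model activations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials, n_features = 2000, 512

# Stand-in "network features" for signal-present vs. signal-absent trials;
# the signal adds a small shift along a random direction, scaled by level.
levels_db = rng.choice([-10, -5, 0, 5, 10], size=n_trials)
signal_present = rng.integers(0, 2, size=n_trials)
direction = rng.standard_normal(n_features)
direction /= np.linalg.norm(direction)
X = rng.standard_normal((n_trials, n_features))
X += (signal_present * 10 ** (levels_db / 20.0))[:, None] * direction

# Train the linear decision rule on half the trials, test on the rest.
clf = LogisticRegression(max_iter=1000).fit(X[:1000], signal_present[:1000])

# Psychometric function: proportion correct as a function of probe level.
for level in sorted(set(levels_db.tolist())):
    mask = (levels_db[1000:] == level)
    acc = (clf.predict(X[1000:][mask]) == signal_present[1000:][mask]).mean()
    print(f"{level:+3d} dB: {acc:.2f} correct")
```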
When equipped with healthy cochleae, the model accounted for several aspects of binaural speech perception in humans with normal hearing, reproducing the effects of noise, reverberation, and spatial separation between speech and noise. When healthy cochleae were replaced with damaged cochleae, the model’s performance resembled that of humans with hearing loss: speech recognition deteriorated (especially at low SNRs) and spatial release from masking was reduced. Psychoacoustic thresholds measured from the model also reproduced patterns of normal and impaired human hearing. Despite never being fit to human data, the model exhibited human-like processing of amplitude modulation and binaural unmasking. Simulations of plausible and idealized hearing loss phenotypes suggest that both outer hair cell and auditory nerve fiber loss contribute to real-world hearing difficulties, with each also producing distinct behavioral outcomes in psychoacoustic listening tests.
The results provide a normative account for fundamental aspects of human hearing, suggesting phenomena like spatial release from masking and modulation frequency selectivity can be understood as consequences of optimization for ecological tasks. Machine-learning-based models that generate behavior from simulated auditory nerve input can predict aspects of hearing-impaired behavior and may help disentangle the perceptual consequences of different types of hearing loss.
Physiological Modeling of Detection of Spectro-temporal Modulation in Listeners with and without Hearing Loss
ABSTRACT. Psychophysical studies have shown that thresholds for detection of spectro-temporal modulation (STM) predict speech intelligibility (e.g., Bernstein et al., JAAA, 2013, TiH 2016; Mehraei et al., JASA, 2014). Recently, STM tasks have been used to develop a clinical test for suprathreshold auditory-contrast perception (Zaar et al., Hear Res, 2023, 2024; Zaar/Simonsen et al., Hear. Res., 2024). These studies provide threshold datasets for listeners with normal hearing or mild-to-moderate sensorineural hearing loss for several STM conditions. Here, we used subcortical physiological models to explore the cues and neural mechanisms that underlie performance in STM tasks. Previous physiological models have focused on temporal fine structure coding in normal hearing at the level of the auditory nerve (AN) (Magits et al., Hear Res, 2019).
The Zilany et al. (2014) computational model for AN responses and the same-frequency inhibition-excitation (SFIE) models for amplitude-modulation sensitivity in inferior colliculus (IC) responses (Nelson & Carney, JASA, 2004; Carney et al., eNeuro, 2015) were used here. A model for pitch coding in the IC based on neural fluctuations (Carney, ISH, 2022) was used to test the hypothesis that changes in pitch associated with upward or downward frequency sweeps in STM stimuli can explain performance across the set of STM conditions that was used in Zaar et al. (2023). Individual responses for test subjects with hearing loss were simulated by incorporating audiometric thresholds into the peripheral model. We hypothesized that audiogram information alone would not account for the suprathreshold deficits of these listeners, as revealed by their performance in STM tasks. Rather, we hypothesized that the impact of sensorineural hearing loss on peripheral responses would interact with central processing of acoustic cues to explain thresholds.
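As a rough illustration of the SFIE-style band-pass modulation sensitivity invoked above, the sketch below computes an IC-like response as an excitatory input minus a delayed, slower, scaled inhibitory copy of the same input. The time constants, delay, gain, and the crude stand-in for the AN input are placeholders rather than the published model parameters.

```python
# Sketch of an SFIE-style band-pass modulation cell: the response is the
# excitatory input minus a delayed, slower, scaled inhibitory copy of the
# same input. Time constants, delay and the input stage are placeholders,
# not the published parameter values.
import numpy as np

fs = 10000.0                                  # Hz
t = np.arange(0, 0.5, 1 / fs)

def alpha_kernel(tau, fs):
    """Normalized alpha function t*exp(-t/tau)."""
    tk = np.arange(0, 10 * tau, 1 / fs)
    k = tk * np.exp(-tk / tau)
    return k / k.sum()

def sfie_response(an_rate, tau_exc=0.5e-3, tau_inh=2e-3, inh_delay=1e-3, inh_gain=1.5):
    exc = np.convolve(an_rate, alpha_kernel(tau_exc, fs))[: len(an_rate)]
    inh = np.convolve(an_rate, alpha_kernel(tau_inh, fs))[: len(an_rate)]
    d = int(round(inh_delay * fs))
    inh = np.concatenate([np.zeros(d), inh[: len(an_rate) - d]])
    return np.maximum(exc - inh_gain * inh, 0.0)   # half-wave rectified rate

# Crude stand-in for an AN rate waveform driven by a SAM-tone envelope:
for fm in (8, 32, 128, 512):                       # modulation frequencies (Hz)
    an_rate = 100.0 * (1 + np.sin(2 * np.pi * fm * t))   # spikes/s, placeholder
    r = sfie_response(an_rate)
    print(f"fm = {fm:4d} Hz -> mean model rate {r.mean():6.2f}")
```

Run as is, the mean output rate is largest at intermediate modulation frequencies and falls off at the extremes, which is the band-pass behavior the SFIE circuit is meant to capture.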
The test stimuli were all temporally modulated at 4 Hz but varied in spectral modulation frequency and in the carrier signals (either noises with different bandwidths or complex tones). Model thresholds based on dynamic pitch cues were consistent with general trends in mean STM thresholds for a group of listeners with a range of hearing losses.
Performance on STM tasks is correlated with speech intelligibility. Therefore, a better understanding of the mechanisms underlying performance in STM tasks could improve predictions of speech intelligibility. The long-term goal of these physiological modeling studies of STM stimuli is to improve predictions of speech intelligibility for individual listeners, including those with hearing loss (Zaar & Carney, Hear Res, 2022).
Supported by NIH-R01-010813, University of Rochester, and Eriksholm Research Centre.
Binaural sensitivity in listeners with a history of concussion
ABSTRACT. Over the past ten years, remote testing in hearing science and hearing healthcare has gone from a dream to a reality (Peng et al., 2022). During that time, we have developed, validated, and deployed a free platform for psychophysics called Portable Automated Rapid Testing (PART; Gallun et al., 2018; Larrea-Mancera et al., 2020; 2021). PART includes a test of binaural sensitivity based on the detection of diotic and dichotic (antiphase at the two ears) frequency modulation (FM) that is ready for implementation in a clinical setting. Here, we used the FM test to examine binaural sensitivity in people with a history of concussion, a patient population that often reports auditory difficulties that are not well represented by standard audiometric tests.
Testing is ongoing, but already 98 participants have been tested, half with and half without a history of concussion (CH and NoCH), with a goal of 150 total. Participants range in age from 18 to 75 years, with hearing thresholds ranging from normal to mild impairment. Tests were conducted on an iPad running PART with Sennheiser 280 Pro headphones. An “adaptive scan” algorithm that we developed (Larrea-Mancera et al., 2023) was paired with a four-interval, two-cue, two-alternative forced-choice task to measure the smallest amount of diotic and dichotic FM detectable in a 400-ms tone burst with a center frequency that roved between 460 and 540 Hz.
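For reference, the sketch below generates the basic stimulus contrast described above: a 400-ms tone burst with slow sinusoidal FM that is either identical at the two ears (diotic) or in antiphase across ears (dichotic), producing a dynamic interaural phase cue. The modulation rate, the fixed carrier, and the omission of ramps, roving, and the adaptive procedure are simplifications, not the PART implementation.

```python
# Sketch of the diotic vs. dichotic FM stimuli described above: a 400-ms
# tone burst with slow sinusoidal FM, either identical at the two ears
# (diotic) or with the FM in antiphase across ears (dichotic). Ramps,
# frequency roving and the adaptive procedure are omitted.
import numpy as np

fs = 44100
dur = 0.4                       # s
fc = 500.0                      # carrier (roved 460-540 Hz in the real test)
fm = 2.0                        # modulation rate (Hz), placeholder value
depth_hz = 5.0                  # FM excursion under test
t = np.arange(int(dur * fs)) / fs

def fm_tone(fc, fm, depth_hz, phase):
    # Phase-modulated carrier: instantaneous frequency fc + depth*cos(2*pi*fm*t + phase)
    inst_phase = 2 * np.pi * fc * t + (depth_hz / fm) * np.sin(2 * np.pi * fm * t + phase)
    return np.sin(inst_phase)

left_diotic = fm_tone(fc, fm, depth_hz, 0.0)
right_diotic = left_diotic.copy()                 # same FM in both ears

left_dich = fm_tone(fc, fm, depth_hz, 0.0)
right_dich = fm_tone(fc, fm, depth_hz, np.pi)     # antiphase FM -> dynamic IPD cue

stereo_diotic = np.stack([left_diotic, right_diotic], axis=1)
stereo_dichotic = np.stack([left_dich, right_dich], axis=1)
print(stereo_diotic.shape, stereo_dichotic.shape)
```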
Diotic FM thresholds were greater than dichotic for nearly all listeners, indicating sensitivity to binaural phase differences. For the NoCH group, there was a significant correlation with age, with younger listeners experiencing greater binaural sensitivity. For the CH group, this correlation was absent, indicating reduced sensitivity for younger CH participants.
The hearing complaints of people with a history of concussion are an underappreciated aspect of brain injury. These data support our previous findings that there are individuals in both groups for whom binaural sensitivity is poor, and yet little is known about how these thresholds relate to hearing in daily life. The availability of remote testing solutions like PART can lead to improvements in characterization of individual hearing abilities and thus improved hearing healthcare for those individuals who may now be dismissed as having “normal hearing”, despite their complaints of difficulties in complex listening environments.
Deciphering Auditory Hyperexcitability in Otogl Mutant Mice Unravels an Auditory Neuropathy Mechanism
ABSTRACT. Auditory neuropathies form a large spectrum of congenital and acquired disorders, mostly affecting the spiral ganglion neurons of the auditory nerve and/or the synapses they form with the auditory sensory hair cells, thus distorting the sound information transmitted from the cochlea to the brain. Affected patients have hearing dysfunctions characterised by major difficulties in understanding speech not accounted for by auditory thresholds. Deciphering the underlying pathophysiological mechanisms remains challenging owing to the diversity of spiral ganglion neuron subtypes and associated central auditory circuits. Here, we report an auditory neuropathy mechanism unraveled by investigating the origin of auditory hyperexcitability in a mouse model for hereditary congenital deafness. We screened mutant mouse lines for susceptibility to audiogenic seizures — reflex seizures induced by loud sounds — indicative of an abnormal control of auditory excitability. We found that mice carrying mutations of the gene encoding otogelin-like, a large protein related to secreted epithelial mucins that contributes to hair-bundle top-connectors and to the crown attaching outer hair cells to the tectorial membrane, are highly susceptible to audiogenic seizures. Both homozygous Otogl-/- mutant mice with moderate-to-severe hearing loss and heterozygous Otogl+/- mutant mice with normal hearing thresholds are susceptible to audiogenic seizures. Otogl+/- mutant mice are, thus, an attractive model for investigating auditory excitability mechanisms. We show that Otogl is transiently expressed in a subpopulation of spiral ganglion neurons during cochlear development. Despite their apparently normal hearing, Otogl+/- mice display poor activation of the spiral ganglion neurons processing loud sounds, the so-called low-spontaneous-rate neurons, and an elevation of the activation threshold of the middle ear muscle reflex that attenuates loud sounds. Our findings reveal how a neuropathy affecting spiral ganglion neurons specialised in loud sound processing and associated with the middle ear muscle reflex can manifest itself as auditory hyperexcitability.
Cochlear frequency selectivity and extended high frequency hearing in individuals with normal audiograms
ABSTRACT. The role of frequencies above 8 kHz (extended high frequencies, EHFs) in human hearing remains poorly understood. While EHF hearing loss is prevalent among young adults, its perceptual and physiological implications are unclear. Some evidence suggests that poorer EHF hearing may indicate subclinical cochlear damage even in listeners with clinically normal audiograms. Frequency selectivity is a fundamental aspect of cochlear function. A previous study found that reduced EHF hearing is linked to broader auditory filters, but it is unclear if this effect is limited to 2 kHz or extends to other frequency regions. Moreover, the simultaneous masking method used in that study may have overestimated filter bandwidths compared to forward masking paradigms. Stimulus frequency otoacoustic emission (SFOAE) delays provide a non-invasive method to assess cochlear tuning. SFOAE estimates are consistent with psychophysical measures obtained using forward masking. The present study aimed to investigate the relationship between EHF hearing acuity and cochlear frequency selectivity at lower frequencies. We hypothesized that EHF hearing damage, even with normal audiograms, is associated with broader cochlear tuning. Broadened cochlear tuning can have significant implications for the neural processing of speech.
SFOAEs are non-stationary, time-varying signals, and a purely spectral approach to characterizing their delay may not accurately capture the characteristic place and could contribute to high individual variability in delay estimation. To address this issue, we applied time-frequency analysis using the Stockwell transform. We validated this approach using a gammatone model and simulated SFOAEs. Swept-tone-evoked SFOAEs at a 40 dB probe level were measured from 0.5 to 4 kHz using a suppressor-based paradigm in 40 young, healthy adults (19–30 years old) with clinically normal audiograms (≤20 dB HL; 0.25-8 kHz). The sharpness of cochlear tuning (QERB) was computed using an empirical model. We also measured wideband absorbance to account for potential middle-ear contributions. Linear mixed-effects models were applied to analyze the relationship between hearing thresholds at EHFs and QERB while adjusting for the effects of other confounding variables, such as hearing thresholds in the 0.25 to 8 kHz range.
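To make the time-frequency delay analysis concrete, the sketch below applies a direct discrete Stockwell transform to a toy signal, takes the per-frequency latency of the magnitude peak as a delay estimate, expresses that delay in stimulus periods, and converts it to a QERB-like value via an assumed tuning ratio. The toy signal, window normalization, and the tuning-ratio value are placeholders; the study's measurement paradigm and empirical model are not reproduced.

```python
# Simplified sketch of a Stockwell-transform delay analysis: compute
# S-transform magnitudes, take the per-frequency latency of the magnitude
# peak as a delay estimate, express it in periods, and scale by an assumed
# tuning ratio. The toy "emission" and the ratio value are placeholders.
import numpy as np

def stockwell(x, fs, freqs):
    """Direct (time-domain) S-transform of x at the requested frequencies."""
    n = len(x)
    t = np.arange(n) / fs
    S = np.zeros((len(freqs), n), dtype=complex)
    for i, f in enumerate(freqs):
        demod = x * np.exp(-2j * np.pi * f * t)
        # Gaussian window whose width scales inversely with frequency (sigma = 1/f)
        g = np.exp(-0.5 * ((np.arange(n) - n // 2) / fs) ** 2 * f ** 2)
        g /= g.sum()
        S[i] = np.convolve(demod, g, mode="same")
    return S

fs = 8000.0
t = np.arange(0, 0.05, 1 / fs)
freqs = np.array([1000.0, 2000.0])
# Toy stand-in "emission": Gaussian-windowed tone bursts delayed by ~10 periods
x = sum(np.exp(-0.5 * ((t - 10 / f) / 0.002) ** 2) * np.sin(2 * np.pi * f * t) for f in freqs)

S = np.abs(stockwell(x, fs, freqs))
tuning_ratio = 1.0            # assumed delay-to-QERB conversion factor (placeholder)
for i, f in enumerate(freqs):
    delay = np.argmax(S[i]) / fs              # latency of the magnitude peak (s)
    n_periods = delay * f                     # delay expressed in stimulus periods
    print(f"{f:.0f} Hz: delay {delay * 1e3:.1f} ms, N = {n_periods:.1f}, "
          f"QERB ~ {tuning_ratio * n_periods:.1f}")
```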
Preliminary analysis reveals an association between EHF hearing thresholds and QERB estimates. Furthermore, SFOAE estimates of QERB were consistent with forward-masking tuning curves, as reported in the literature. The results are discussed in the context of frequency effects and EHF hearing thresholds.
Findings have implications for understanding subclinical auditory damage and potentially explaining listening difficulties in listeners with normal audiograms.
Binaural temporal fine structure sensitivity development for children with developmental dyslexia
ABSTRACT. Speech-in-noise perception is known to mature over the first 10 – 12 years of life. In this age range, children with language and/or reading difficulties have been reported to experience poor speech-in-noise perception compared with controls. However, the underlying aetiology for this finding is debated. Binaural Temporal Fine Structure sensitivity (bTFSs) is known to be beneficial for attending to sound sources in challenging environments. For young normal-hearing adults (YNHA), the upper frequency limit of bTFSs is known to be around 1400 Hz. Research has found the upper frequency limit of bTFSs to be significantly lower (worse) for typically developing children than for YNHA, with age being a significant predictor of the upper limit. If poor bTFSs contributes to impaired speech-in-noise perception in dyslexia (DYS), poorer bTFSs would be expected in DYS. In contrast, the Temporal Sampling (TS) theory of developmental dyslexia predicts that the perception of bTFS of speech may be preserved in children with dyslexia. According to TS theory, reduced sensitivity to low-frequency envelope modulations is the core auditory impairment in DYS.
Binaural TFS sensitivity was measured using the Temporal Fine Structure-Adaptive Frequency (TFS-AF) test with 88 children aged 7-9.5 years (30 age-matched controls [CA], 20 male, and 58 with DYS, 31 male). Using a 2-up 1-down paradigm, the highest frequency at which interaural phase differences (IPD) of 30 degrees and 180 degrees could be distinguished from an IPD of 0 degrees was assessed.
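The adaptive logic can be sketched generically as below: the test frequency rises (becomes harder) after two consecutive correct responses and falls after each error, and the threshold is taken as the mean of the final reversal frequencies. The step size, stopping rule, and simulated listener are placeholders, not the TFS-AF implementation.

```python
# Generic sketch of a 2-up/1-down adaptive-frequency staircase of the kind
# used in the TFS-AF test: frequency rises (gets harder) after two
# consecutive correct responses and falls after each error; threshold is
# the mean of the final reversals. Step size and the simulated listener
# are placeholders, not the published procedure.
import numpy as np

rng = np.random.default_rng(1)
true_limit = 1200.0          # simulated listener's upper frequency limit (Hz)

def simulated_response(freq):
    # P(correct) falls from ~1 toward chance (0.5) around the listener's limit.
    p = 0.5 + 0.5 / (1 + np.exp((freq - true_limit) / 100.0))
    return rng.random() < p

freq, step = 200.0, 1.3      # start frequency and multiplicative step
n_correct, direction = 0, +1
reversals = []
while len(reversals) < 8:
    if simulated_response(freq):
        n_correct += 1
        if n_correct == 2:                    # two correct -> harder (higher freq)
            n_correct = 0
            if direction < 0:
                reversals.append(freq)
            direction = +1
            freq *= step
    else:
        n_correct = 0
        if direction > 0:                     # error -> easier (lower freq)
            reversals.append(freq)
        direction = -1
        freq /= step

print(f"estimated upper frequency limit ~ {np.mean(reversals[-6:]):.0f} Hz")
```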
An LME model revealed no effect of group (F[1,44] = 0.18, p = .68), a significant effect of phase, with 30 degrees lower than 180 degrees (F[1,44] = 214.83, p < .001), and no phase by group interaction (F[1,44] = 0.04, p = .84).
These results suggest that development of bTFSs is similar for DYS and CA children. The previously reported developmental pattern of bTFSs was also supported, with the upper frequency limit of bTFSs in children being significantly lower than in YNHA (p < .001) for both levels of phase difference tested (30 degrees and 180 degrees). A smaller frequency range of bTFSs may limit the benefit gained from spatial release from masking, contributing to the known speech-in-noise deficit found in children when compared with adults. However, bTFSs was not found to be additionally impaired in DYS, supporting TS theory.
Temporal fine structure sensitivity for diotic and dichotic pulse-spreading harmonic complexes
ABSTRACT. In a previous study, pulse-spreading harmonic complexes (PSHCs) were used to measure the lowest envelope rate at which normal hearing participants could discriminate between two versions of this stimulus that differed in their temporal fine structure (TFS). The two versions produced, when discriminable, rising- and falling-pitch sensations, respectively (Macherey, 2024). This lower envelope rate limit was found to increase with increases in frequency region and it was proposed that discrimination of TFS was either limited by reduced excitation pattern cues or by the frequency-dependent reduced ability of the system to process long time intervals between spikes. Here, this envelope rate limit was measured for PSHCs filtered between 500 and 675 Hz in a group of 10 NH participants in several diotic and dichotic conditions and in the presence of background noise. In the ODD and EVEN conditions, respectively, the odd and even harmonics of a 2-Hz PSHC were presented diotically. In the DICHOTIC condition, the ODD harmonics were presented to one ear and the EVEN to the other ear, thereby yielding an aggregate envelope rate twice that in each ear. The lower envelope rate limit for TFS sensitivity was significantly lower for DICHOTIC than for ODD and EVEN. Given the similarity of the excitation patterns produced by ODD and EVEN, these results suggest that in this frequency region, TFS sensitivity can be mediated by temporal cues and that the underlying neural mechanism is driven by binaural inputs. In addition, imposing a long (125-ms) delay between the ODD and EVEN signals in the DICHOTIC-DELAYED condition had a dramatic effect on performance which became worse than for ODD or EVEN alone. This shows that the advantage observed for DICHOTIC is not simply due to the independent combination of information coming from the two ears. Finally, the dichotic advantage was only observed for PSHCs filtered below 1000 Hz, i.e. in frequency regions where ITD sensitivity for pure tones, which relies on phase locking information, is possible. It is argued that, in these low frequency regions, TFS sensitivity is limited by the inability of the auditory system to process long time intervals; by presenting EVEN and ODD in each ear, the aggregate envelope rate is doubled and performance improves. Model simulations and implications for theories of pitch perception will also be discussed.
Macherey O. (2024). Temporal fine structure sensitivity measured with pulse-spreading harmonic complexes. J. Acoust. Soc. Am., 156, 1769–1781.
Pitch discrimination of masked pure tones: a glance at the temporal pitch code?
ABSTRACT. INTRODUCTION
Pitch is a ubiquitous phenomenon, mediating various auditory tasks. Surprisingly, one basic question remains unresolved: to what extent do temporal and place cues contribute to pitch discrimination of a pure tone, the building block of complex stimuli? This has practical relevance for pitch coding with cochlear implants, where the place code is crude and the temporal code, by means of the per-electrode pulse rate, is subject to a rate limitation.
METHODS
To address this question for the low-frequency region, seven normal-hearing listeners pitch-discriminated a 300-Hz, 400-ms pure-tone target (T) using an adaptive 2I-2AFC paradigm with and without a masker (M) stimulus. M was a 1000-ms harmonic complex, temporally and spectrally centered around T. Both T and M were gated with 100-ms ramps. To study the potential involvement of cues resulting from combining T and M, M’s F0 was either randomly roved between intervals or fixed. Besides the harmonic condition (Harm), an inharmonic (Inharm) condition involving slight component frequency variation was employed. The main parameter was Q, determining M’s spectral density and, therefore, the availability of place-pitch cues. The component level of M was kept constant at the level of T, i.e., 60 dB SPL. A low-frequency pink noise was added to avoid audibility of aural distortion products.
RESULTS
When adding a Harm M, difference limens for frequency (DLFs) were elevated for the lowest Q, further increased for mid Qs, and strongly decreased towards high Qs, approaching DLFs in quiet. F0 roving had minimal effect on converging DLFs, yet it increased the likelihood of non-converging adaptive tracks in some listeners. Condition Inharm showed a pattern similar to condition Harm for low Qs, but DLFs were unmeasurable for high Qs.
DISCUSSION
The results with/without M F0 roving suggest that listeners did not rely on cues resulting from T/M combination, but were sometimes distracted by the random M pitch. At low Qs, where both place and temporal cues are potentially available based on a cochlear model, the elevation of DLFs suggests that listeners evaluated/pooled temporal cues across a broader cochlear region. For high Qs, listeners apparently fully exploited the temporal cues available in the dips of the Harm M, while this was not possible with the Inharm M. Overall, DLFs of a 300-Hz pure tone appear to be based on a broadly tuned temporal code, encouraging attempts to convey temporal pitch cues with cochlear implants at least at low rates.
Decoding Low-Frequency Auditory Processing in Tinnitus: A Focus on Temporal Fine Structure and Envelope Cues
ABSTRACT. In a recent study, individuals with tinnitus demonstrated better speech-in-noise (SPIN) intelligibility when only low-frequency (LF) information was available, compared to a control group without tinnitus (Devolder et al., 2024). However, the mechanisms underlying this effect were not explored, as it was an unexpected epiphenomenon in a study primarily focused on high-frequency (HF) temporal envelope (ENV) processing in tinnitus. Given the robustness of the finding, the present study delves deeper into LF encoding in tinnitus, with a focus on temporal fine structure (TFS) and ENV processing, to better understand auditory processing differences in tinnitus, particularly in challenging listening environments. We hypothesize that enhanced LF SPIN performance in individuals with tinnitus is linked to compensatory changes in TFS processing.
To achieve a comprehensive view, we conduct a combination of standard clinical audiometric measures, tinnitus analyses, and a robust battery of behavioral and objective auditory processing assessments:
Behavioral assessments:
1. Sentence recognition in quiet and noise (matrix test) under low-pass, broadband, and high-pass filtered conditions.
2. Psychoacoustic discrimination testing of LF syllable contrasts (e.g., /du/ vs. /bu/ and /o/ vs. /u/) in both quiet and noisy conditions.
Objective assessments:
1. EEG recordings using LF syllables presented in quiet and noise.
2. Brainstem TFS coding assessment using frequency-following responses to a LF harmonic stimulus. EEG responses are analyzed for their ENV and TFS cue tracking through polarity-based sum and difference analyses (sketched after this list).
3. HF ENV encoding assessment using envelope-following responses to rectangularly amplitude-modulated stimuli, to evaluate cochlear synaptopathy deficits.
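The polarity-based analysis referenced in item 2 can be sketched as follows: averaging responses to opposite-polarity stimuli emphasizes envelope (ENV) locking, while subtracting them emphasizes temporal fine structure (TFS) locking. The synthetic responses, F0, and noise level below are placeholders used only to show the arithmetic.

```python
# Sketch of polarity-based sum/difference analysis of frequency-following
# responses: averaging responses to opposite-polarity stimuli emphasizes
# envelope (ENV) coding, while subtracting them emphasizes temporal fine
# structure (TFS) coding. The "responses" here are synthetic stand-ins.
import numpy as np

fs = 16000.0
t = np.arange(0, 0.5, 1 / fs)
f0 = 110.0                                   # low-frequency harmonic stimulus F0
rng = np.random.default_rng(5)

# Synthetic responses to the two stimulus polarities: an ENV component that is
# invariant to polarity plus a TFS component that flips with polarity.
env_component = 0.5 * (1 + np.sin(2 * np.pi * f0 * t))          # envelope locking
tfs_component = np.sin(2 * np.pi * 3 * f0 * t)                   # fine-structure locking
resp_pos = env_component + tfs_component + 0.3 * rng.standard_normal(len(t))
resp_neg = env_component - tfs_component + 0.3 * rng.standard_normal(len(t))

efr_env = (resp_pos + resp_neg) / 2          # polarity-invariant part -> ENV
efr_tfs = (resp_pos - resp_neg) / 2          # polarity-inverting part -> TFS

spec = lambda x: np.abs(np.fft.rfft(x)) / len(x)
freqs = np.fft.rfftfreq(len(t), 1 / fs)
print("ENV response at F0:  ", spec(efr_env)[np.argmin(np.abs(freqs - f0))].round(3))
print("TFS response at 3*F0:", spec(efr_tfs)[np.argmin(np.abs(freqs - 3 * f0))].round(3))
```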
Preliminary data from 28 participants with tinnitus confirm a link between LF SPIN recognition and tinnitus. EEG findings suggest a potential reduction in TFS processing in quiet conditions, while ENV encoding remains intact. In noisy conditions, ENV encoding appears primarily reduced in the tinnitus group.
We will present results from a total of 60 participants with tinnitus, compared to an age-matched control group, and conduct subgroup analyses to investigate the effects of age and hearing status. These findings will elucidate the roles of TFS and ENV processing in tinnitus-related auditory function and provide critical insights into compensatory auditory mechanisms, informing targeted treatments for this population.
Work supported by FWO G068621N Audimod
Devolder, P., Keppler, H., Keshishzadeh, S., Taghon, B., Dhooge, I., & Verhulst, S. (2024). The role of hidden hearing loss in tinnitus: insights from early markers of peripheral hearing damage. Hearing Research, 109050.
Neuronal correlates of auditory working memory maintenance during distraction
ABSTRACT. The ability to maintain relevant auditory information in working memory (WM) amidst multiple auditory distractors is critical for goal-directed auditory cognition. Auditory WM has traditionally been attributed to subvocal rehearsal within the phonological loop, mediated by a left-lateralized brain network involving inferior frontal, parietal, and temporal regions. However, the specific mechanisms by which this loop supports auditory verbal processing alongside WM maintenance remain poorly understood.
We employed a dichotic-listening and articulatory-suppression task to interfere with the WM maintenance of parametrically varied ripple sounds in 21 healthy adults, while recording magnetoencephalography (MEG) data. During the maintenance phase of a delayed match-to-sample task, dichotic syllables (/ba/ and /da/) were presented. Participants were asked to either repeat both syllables (Speech Shadowing, SS) or selectively repeat only the /ba/ syllables (Selective Speech Shadowing, SSS), ignoring syllables presented to the opposite ear. Vocal responses were captured using an MEG-compatible microphone. Behavioral performance was analyzed to identify the cognitive strategies supporting WM retention during auditory distraction. MEG data were preprocessed using MNE-Python software, with head position corrected to the initial position using continuous head position indicator correction. Evoked responses, functional connectivity patterns and source estimates were calculated. WM content was decoded from MEG source estimates and functional connectivity patterns involving auditory-cognitive pathways using a support vector machine classifier.
Increased attentional demands during the SSS task were reflected in slower reaction times to WM probes, compared to the SS condition, and were accompanied by enhanced MEG functional connectivity between the inferior frontal, supramarginal, and auditory cortices. Furthermore, we successfully classified the content of auditory WM above chance using both MEG functional connectivity and spatiotemporal activation patterns in left frontoparietal-superior temporal cortex networks.
Our results suggest that frontoparietal-superior temporal cortex pathways can maintain information in auditory WM, even when active rehearsal is distracted by competing speech and listening tasks.
Entraining alpha oscillations to facilitate auditory working memory: A TMS-EEG study
ABSTRACT. Neural alpha oscillations may support working memory during speech perception. Specifically, alpha oscillations in parietal cortex may promote inhibition of distracting sounds, whereas alpha oscillations in temporal cortex may enhance attention to target sounds. Importantly, alpha activity during speech perception may be different for younger and older adults. We hypothesised that entraining alpha activity in parietal and temporal cortex may facilitate speech perception. We investigated whether TMS delivered at an individualised alpha frequency (alpha-TMS) benefits auditory working memory, and how this may be affected by age.
Sixty-four participants (32 younger adults, M age = 20.78 years; 32 older adults, M age = 68.38 years) completed trials of an auditory working memory task. Participants attended to and recalled 9-digit sequences, whilst ignoring irrelevant sentences. Before the to-be-ignored sentences, participants received alpha-TMS. We investigated the effects of 1) distractibility of irrelevant sentences (less vs. more distracting); 2) site of alpha-TMS (vertex vs. parietal vs. temporal); 3) age group (younger vs. older), on digit recall and alpha power in auditory and parietal regions.
For digit recall, we observed that across all alpha-TMS conditions and age groups, digit recall was significantly reduced in trials with more distracting to-be-ignored sentences. For alpha power, recorded from both the temporal and parietal regions, there was a significant main effect of TMS location. This effect showed that alpha power recorded from auditory regions was significantly increased when alpha-TMS was delivered to temporal cortex, compared to alpha-TMS delivered to the vertex (control) and parietal cortex. Similarly, alpha power recorded from parietal regions was significantly increased when alpha-TMS was delivered to the parietal site, compared to alpha-TMS delivered to the vertex and temporal sites. Further, older adults displayed significantly higher alpha power, recorded from both auditory and parietal cortex, across all experimental conditions. Interestingly, the strength of parietal alpha power after parietal alpha-TMS was significantly higher in trials with more distracting to-be-ignored sentences compared to less distracting sentences in older adults.
These data indicate that alpha-TMS can modulate parietal and auditory alpha power, which may influence inhibitory and attentional processes during auditory perception; these effects may be further modulated by age. However, effects were not reflected in behavioural performance. It is possible that age-related declines in auditory working memory were compensated for by TMS-modulated alpha power.
Functional organization of multisensory information in the primate auditory cortex
ABSTRACT. Auditory perception can be modulated by other sensory stimuli. However, we do not fully understand the neural mechanisms that support multisensory behavior and how this information is functionally organized. Here, we recorded spiking activity from the primary auditory cortex (A1) in non-human primates, while they detected a target vocalization that was embedded in a background chorus of vocalizations. We found that a congruent video of a monkey producing a vocalization improved behavior, relative to performance when we presented a static image. As a probe of the functional organization of multisensory information, we compared the contribution of neurons with significant spectrotemporal response fields (STRFs) with those that had non-significant STRFs (nSTRFs). Indeed, based on spike waveform shape and functional connectivity, STRF and nSTRF neurons appeared to belong to different neural classes. Consistent with this, we found, at both the single-neuron and population level (via a targeted dimensionality reduction of neural trajectories), that nSTRF neurons were modulated more by the visual trial conditions (congruent video versus static image) than STRF neurons. Together, our results are the first to demonstrate how functional information relating to multisensory behavior is organized in the primate A1, to identify a differential contribution of nSTRF neurons to behavior, and to demonstrate that task-related information in the primate A1 is encoded as a structured dynamic process in the neural population.
Reinterpreting the frequency-amplitude dependence of late cortical auditory-evoked potentials in the light of tonotopic and microstructural MRI mapping data
ABSTRACT. Cortical auditory-evoked potentials (CAEPs) may hold crucial information about auditory deficits, such as hidden hearing loss or tinnitus, that are inaccessible to current diagnostic tests. However, unlike their subcortical counterpart, the auditory brainstem response, CAEPs have so far failed to acquire any significant clinical role. This may be because CAEPs represent aggregate responses from multiple closely spaced sources, which may reinforce or cancel depending on local cortical surface morphology, thus creating inter-individual variation in CAEP amplitude unrelated to underlying neuronal activation strength. This study presents a first step towards understanding inter-individual variation in CAEP amplitude by relating one of its main factors – stimulus frequency – to the functional topography of the human unimodal auditory cortical region, derived from state-of-the-art tonotopic and microstructural MRI mapping data.
We first conducted a meta-analysis of existing EEG and MEG data to compare the frequency-amplitude dependence of the earliest CAEP components, the so-called middle-latency response (MLR), reflecting direct thalamocortical input to the primary auditory region, with that of subsequent, later components, reflecting transmission between cortical regions. We then used a forward model of CAEPs, based on realistic cortical surface morphology and a probabilistic atlas of human auditory cortical organization, to propose a mechanistic interpretation of the meta-analytic results.
The shape of the frequency-amplitude dependence of CAEPs differed markedly between earlier and later CAEP components, with the later components showing a substantial decrease in amplitude with increasing frequency above 1 kHz, but the earlier components showing no such decrease. The forward model suggested that the decrease results from concurrent activation of two mirror-symmetric tonotopic maps located on opposite banks of Heschl’s gyrus (HG), creating partially opposing dipolar fields and thus smaller aggregate responses at the sensor level.
Our results indicate that CAEP amplitudes are crucially shaped by local cortical surface morphology, suggesting that individual interpretation of such amplitudes, for instance, in a diagnostic context, will require consideration of idiosyncratic morphological characteristics. Our results further suggest that only the anterior of the two mirror-symmetric tonotopic maps on HG receives direct thalamic input and is thus homologous to A1, consistent with microstructural MRI data showing higher intracortical myelination in this map.
Across-frequency interactions in the central auditory system: Insights from psychophysics of spectro-temporal modulation discrimination
ABSTRACT. Most current models of the auditory system treat the processing of spectral and temporal dimensions of sound as independent. Among these, a common family of models utilizes cochlear filters to decompose auditory signals into frequency bands, followed by modulation filtering to process the temporal information within each band. However, animal and human vocalizations exhibit spectro-temporally oriented patterns of energy, such as formant transitions. Electrophysiological studies have further revealed central auditory neurons that are sensitive to specific spectro-temporal directions, indicating a form of non-separable processing. These findings suggest that the auditory system may have specialized mechanisms to integrate temporal information across frequency bands, which is essential for forming auditory objects and enabling robust speech perception.
Yet, very little is known about the conditions under which the auditory system relies on a separable vs. non-separable machinery for processing auditory information. It was found that, for detecting upward (downward) spectro-temporal modulation (STM) in “modulation noise” (https://doi.org/10.1177/2331216520978029), listeners rely on non-directional filters, i.e., monitoring both upward and downward STMs similarly. This finding points towards a default perceptual machinery that implements a separable processing strategy. However, humans are able to discriminate upward from downward STMs, suggesting some degree of flexibility in integrating the spectral and temporal dimensions of sounds.
Three psychophysical experiments were conducted with a group of normal-hearing listeners to better characterize the mechanisms underlying STM discrimination (upward vs. downward) and its variability across individuals in different contexts. Experiment #1 characterized the build-up in sensitivity for upward vs. downward STM discrimination as a function of STM duration (at full modulation depth) with “modulation noise” at different SNRs. Experiment #2 measured STM discrimination thresholds by varying the scale (density of modulations on the spectral dimension) using an adaptive procedure, allowing direct derivation of the cut-off value of modulation phase sensitivity (https://doi.org/10.1121/1.5098770). Experiment #3 aimed to simulate more ecologically valid listening scenarios to characterize the ‘sluggishness’ of the auditory system in discriminating upward-downward STM transitions over time (i) when the carrier also varies in center frequency and (ii) for modulation noises with different degrees of asymmetry between the quadrants for upward vs. downward STMs in the modulation power spectrum.
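For orientation, the sketch below generates a generic upward- vs. downward-moving spectro-temporal ripple by imposing a drifting sinusoidal envelope on log-spaced tonal carriers. The carrier spacing, ripple parameters, and random carrier phases are illustrative choices and do not reproduce the experiments' stimuli or the "modulation noise" backgrounds.

```python
# Minimal sketch of upward vs. downward spectro-temporal modulation (STM):
# a moving ripple imposed on log-spaced tonal carriers. The sign of the
# spectral-density term sets the drift direction. Carrier statistics,
# "modulation noise" and the experimental parameters are not reproduced.
import numpy as np

fs = 16000
dur = 1.0
t = np.arange(int(dur * fs)) / fs
rate_hz = 4.0           # temporal modulation rate (cycles/s)
scale_cpo = 1.0         # spectral modulation density (cycles/octave)
depth = 0.9
rng = np.random.default_rng(0)

f_lo, n_car = 500.0, 40
octaves = np.linspace(0, 2, n_car)              # carrier positions in octaves re f_lo
carriers = f_lo * 2 ** octaves

def stm(direction):
    """direction = +1 for an upward-moving ripple, -1 for downward."""
    x = np.zeros_like(t)
    for f, o in zip(carriers, octaves):
        env = 1 + depth * np.sin(2 * np.pi * (rate_hz * t - direction * scale_cpo * o))
        x += env * np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
    return x / n_car

upward, downward = stm(+1), stm(-1)
print(upward.shape, downward.shape)
```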
The results of these experiments provide new insights into the mechanisms underlying spectro-temporal integration in the central auditory system. We discuss how these findings could refine current computational models of auditory processing and inform our understanding of the individual variability in speech perception, particularly in noisy environments.
Task-evoked pupil dilations predict latent state transition probability across spatial and temporal auditory domains
ABSTRACT. Evoked pupil dilations (EPD) have been found to correlate with processes of perceptual belief updating, likely indicating an involvement of the locus coeruleus noradrenaline arousal system. How exactly arousal modulates belief updating remains a subject of debate, and results in the existing literature vary over different domains, modalities, and tasks. In the present study, we used an auditory latent-state discrimination task to examine whether EPDs reflect a mechanism that generalizes across the temporal and spatial domains rather than being specific to a certain domain. We further aimed to disentangle several computational variables found in the literature and their relationship to EPDs.
Two groups of participants, 49 in total, were presented with auditory sequences providing evidence for two distinct latent states: acceleration and deceleration in the temporal group, and clockwise and counterclockwise movement in the spatial group. Sporadic change points within the sequences marked changes in the underlying latent state. Participants were tasked with inferring and tracking latent states and reporting the final state of the sequence in a two-alternative forced-choice design. Sequences were randomized in length to ensure constant attention. Evidence levels for temporal or spatial change were sampled per stimulus to induce continuous change and presented at a rapid pace, with stimulus onset asynchronies between 200 and 1200 ms. To extract per-stimulus estimates of EPDs despite the experiment’s rapid time scale and the sluggish pupillary response, we fitted a deconvolution-based general linear model to the continuous pupil traces with a free gain parameter reflecting pupil dilation. We fitted a Bayesian observer model to participants’ answers to estimate several computational variables per stimulus in every trial: surprise, prior uncertainty, information gain, and latent state transition probability. We regressed each computational variable against EPDs with a mixed-effects model, accounting for possible differences between the spatial and temporal groups.
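A compact sketch of the deconvolution-based GLM idea is given below: each stimulus event contributes a canonical pupil response function scaled by a free gain, and the per-event gains are recovered by least squares on the continuous trace. The event times, the assumed Hoeks-and-Levelt-style response function, and the noise level are placeholders; the Bayesian observer model and the mixed-effects regression are not reproduced.

```python
# Sketch of a deconvolution-style GLM for continuous pupil traces: stimulus
# events are convolved with a canonical pupil response function (PRF), and
# per-event gains are estimated by least squares. Event times, PRF
# parameters and the noise level are placeholders.
import numpy as np

fs = 50.0                                   # pupil sampling rate (Hz)
dur = 60.0
n = int(dur * fs)

def prf(fs, t_max=0.93, n_shape=10.1, length=4.0):
    """Canonical pupil response function of the form t**n * exp(-n*t/t_max)."""
    tk = np.arange(0, length, 1 / fs)
    h = tk ** n_shape * np.exp(-n_shape * tk / t_max)
    return h / h.max()

rng = np.random.default_rng(2)
event_times = np.sort(rng.uniform(1, dur - 5, size=40))     # stimulus onsets (s)
true_gains = rng.uniform(0.5, 2.0, size=len(event_times))   # per-event dilation gains

# Simulate a pupil trace: sum of gain-scaled PRFs plus noise.
h = prf(fs)
trace = np.zeros(n)
for et, g in zip(event_times, true_gains):
    i = int(et * fs)
    seg = min(len(h), n - i)
    trace[i:i + seg] += g * h[:seg]
trace += 0.2 * rng.standard_normal(n)

# Design matrix: one column per event, each an impulse convolved with the PRF.
X = np.zeros((n, len(event_times)))
for k, et in enumerate(event_times):
    stick = np.zeros(n)
    stick[int(et * fs)] = 1.0
    X[:, k] = np.convolve(stick, h)[:n]

est_gains, *_ = np.linalg.lstsq(X, trace, rcond=None)
print("correlation(true, estimated gains) =",
      np.corrcoef(true_gains, est_gains)[0, 1].round(2))
```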
All four computational variables significantly predicted EPDs in both domains. Bayesian model comparisons significantly favoured latent state transition probability as the best-supported computational variable, and favoured simpler models without domain differences over more complex models accounting for possible effects of domain.
The present results suggest that domain has little effect on EPDs, pointing towards a generalized arousal-mediated mechanism of perceptual inference within the auditory modality. The results further indicate that EPDs are more sensitive to the task-relevant change in latent state than to prediction errors, information gain, or uncertainties in beliefs, although these variables are highly correlated. Further research with task designs that clearly separate these functional variables is needed.
Using Envelope Following Responses to Characterize Human Neural Tuning to Auditory Temporal Modulations
ABSTRACT. The human brain's capacity to encode temporal envelope information in auditory signals is crucial for processing natural sounds such as speech. Temporal envelope modulation is processed by distinct neural substrates, ranging from the auditory periphery up to the cortex. Notably, tuning for temporal modulations at the brainstem level becomes progressively sharper during infancy and then declines with aging, making it essential to develop tools for efficient characterization of this tuning. However, psychophysical measurements, often influenced by attentional and cognitive factors, do not provide a direct, transparent characterization of sensory neural coding. Interestingly, envelope following responses (EFRs) offer an objective approach to characterizing neural phase-locking to stimulus envelopes. Recent work suggests the use of EFRs to sinusoidally amplitude-modulated (SAM) tones to effectively assess neural tuning across a large range of modulation frequencies, from ~10 to 1000 Hz (https://doi.org/10.1038/s42003-024-07187-1).
In the present study, we examined the potential of EFRs elicited by rectangularly amplitude-modulated (RAM) tones, which have been shown to produce strong neural responses, to uncover human neural tuning for temporal modulations at the brainstem level specifically.
We initiated our study with computational simulations of EFRs to RAM tones, considering a model of the human cochlea and auditory nerve fibers connected to a population of bandpass filters mimicking cells from the inferior colliculus (IC), with best modulation frequencies distributed normally over a 50-200 Hz range. The summed magnitude of the first five harmonics of the response was taken as a measure of response strength. Simulated EFRs to RAM tones across various modulation frequencies revealed a bandpass behavior, directly related to the tuning of IC cells in the model. We then deployed this approach empirically by measuring EFRs to RAM tones covering a 70-160 Hz modulation frequency range in a group of N=15 young normal-hearing individuals (yNH).
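In the spirit of the simulation logic described above, the stripped-down sketch below replaces the cochlear/AN model with a crude rectify-and-smooth envelope stage, passes a RAM tone through a small bank of band-pass "IC" modulation filters, and summarizes EFR strength as the summed magnitude of the first five harmonics of the modulation frequency. The filter shapes, best-frequency spacing, carrier, and envelope stage are placeholders, not the published model.

```python
# Stripped-down sketch of the EFR simulation logic: a RAM tone is passed
# through a crude envelope stage (standing in for the cochlear/AN model),
# then through a bank of band-pass "IC" modulation filters; EFR strength is
# the summed magnitude of the first five harmonics of the modulation
# frequency. Filter parameters and the envelope stage are placeholders.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 16000.0
dur = 1.0
t = np.arange(int(dur * fs)) / fs
fc = 4000.0                      # carrier frequency (Hz), placeholder

def ram_tone(fm, duty=0.25):
    """Rectangularly amplitude-modulated tone (25% duty cycle)."""
    env = ((t * fm) % 1.0) < duty
    return env.astype(float) * np.sin(2 * np.pi * fc * t)

def crude_an_envelope(x):
    """Half-wave rectification + smoothing as a stand-in for the AN stage."""
    sos = butter(2, 1500.0, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, np.maximum(x, 0.0))

# Bank of band-pass "IC" modulation filters with best frequencies 50-200 Hz.
bmfs = np.linspace(50, 200, 16)
def ic_bank(env):
    out = np.zeros_like(env)
    for bmf in bmfs:
        sos = butter(2, [0.7 * bmf, 1.4 * bmf], btype="band", fs=fs, output="sos")
        out += sosfiltfilt(sos, env)
    return out

def efr_strength(fm):
    resp = ic_bank(crude_an_envelope(ram_tone(fm)))
    spec = np.abs(np.fft.rfft(resp)) / len(resp)
    freqs = np.fft.rfftfreq(len(resp), 1 / fs)
    harmonics = [fm * k for k in range(1, 6)]
    return sum(spec[np.argmin(np.abs(freqs - h))] for h in harmonics)

for fm in (70, 90, 110, 130, 160):
    print(f"fm = {fm:3d} Hz -> EFR strength {efr_strength(fm):.3f}")
```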
Results show that RAM-EFRs indeed led to reliable measurements of neural tuning to temporal modulations. Surprisingly, our data highlighted a large amount of variability across yNH individuals. While some individuals showed bandpass tuning, EFR strength as a function of modulation frequency was rather flat at the group level. These findings will be discussed in the context of prior studies on SAM- and RAM-EFRs in humans, as well as recent RAM-EFR research in chinchilla (Farhadi et al., 2025, ISH meeting), an animal model whose frequency and intensity sensitivity closely resembles that of human hearing.
Auditory adaptation to source spectrum improves localization along sagittal planes
ABSTRACT. The acoustic filtering of the sound field by the listener generates direction-dependent spectral-shape cues that enable sound source localization along sagittal planes, including elevation perception and front-back discrimination. However, sound sources themselves produce spectral cues that can interfere with these direction-dependent cues. Although the auditory system may struggle to dissociate these two types of spectral cues, it is widely assumed that listeners cannot mitigate this interference through adaptation.
We tested this assumption by having listeners localize ripple-spectrum sounds in an acoustic free-field setting. The ripple shape was either fixed or randomized within blocks. Our results show that listeners performed better under fixed conditions than randomized ones—not immediately, but later within blocks, after they had the opportunity to establish a reliable estimate of the source spectrum. These findings contradict the common assumption, suggesting that listeners adapt to the source spectrum and utilize this non-spatial information to enhance their localization performance along sagittal planes. However, this adaptation is incomplete, as performance did not reach the levels observed with flat-spectrum sounds. The limitations of this adaptation remain to be investigated.
ABSTRACT. We aim to understand the dynamics underlying auditory working memory (AWM) for maintaining 'simple' tones. We recorded magnetoencephalography (MEG) in 17 subjects while they maintained one of the two presented tones (or ignored both in the control condition). After 12 s, subjects compared the pitch of a test tone with the maintained tone.
Analysis of evoked responses showed persistent activity throughout maintenance compared to the pre-stimulus silent baseline but only at the start of maintenance when compared to the control condition. The evoked response during maintenance was source localised against baseline to bilateral auditory cortex. Analysis of induced responses showed suppressed alpha in the left auditory cortex, enhanced theta in medial prefrontal cortex, and enhanced beta in cerebellum.
In a second experiment, 19 new subjects were presented with a tone and a Gabor patch and a retro-cue indicating whether to maintain auditory or visual information for 12 s. Analysis of the induced responses in the auditory condition yielded results similar to those observed in the first experiment.
Connectivity analysis showed that the theta activity in medial prefrontal cortex was phase-locked to activity in the left hippocampus and left auditory cortex. The beta activity in the cerebellum was phase-locked to left Inferior Frontal Gyrus (IFG) activity and correlated with subjects’ task accuracy.
Using MVPA, an LDA classifier was trained to decode the contents of AWM (discriminating between low- and high-pitched tones) using beta-band phase-locking values (PLV) with the right cerebellum as features. A channel searchlight analysis showed that decoder performance at the right Anterior Cingulate Gyrus (56.17% accuracy) was above chance. Further, decoder performance at the right STG & MTG was correlated (rho = 0.725, p < 0.01) with subjects’ task accuracy, showing a correspondence between encoding distance and behavioural performance.
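The phase-locking-value feature used for decoding can be sketched as below: PLV is the magnitude of the mean phase difference between two band-limited signals, computed per trial from Hilbert-transform phases, and the resulting values are fed to an LDA classifier. The synthetic signals, beta-band limits, and coupling strengths are placeholders; source estimation, channel selection, and the searchlight procedure are not shown.

```python
# Sketch of beta-band phase-locking value (PLV) features for decoding:
# PLV = |mean(exp(i * phase difference))| between two band-limited signals,
# computed per trial from Hilbert-transform phases, then used as features
# for an LDA classifier. Signals and labels here are synthetic.
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

fs, dur = 500.0, 2.0
t = np.arange(int(dur * fs)) / fs
rng = np.random.default_rng(3)
sos = butter(4, [15, 25], btype="band", fs=fs, output="sos")   # beta band

def plv(x, y):
    px = np.angle(hilbert(sosfiltfilt(sos, x)))
    py = np.angle(hilbert(sosfiltfilt(sos, y)))
    return np.abs(np.mean(np.exp(1j * (px - py))))

# Synthetic trials: "high-pitch" trials carry stronger 20-Hz phase coupling
# between two regions than "low-pitch" trials.
n_trials = 100
labels = rng.integers(0, 2, n_trials)
features = np.zeros((n_trials, 1))
for i, lab in enumerate(labels):
    coupling = 0.8 if lab else 0.2
    common = np.sin(2 * np.pi * 20 * t + rng.uniform(0, 2 * np.pi))
    x = coupling * common + rng.standard_normal(len(t))
    y = coupling * common + rng.standard_normal(len(t))
    features[i, 0] = plv(x, y)

scores = cross_val_score(LinearDiscriminantAnalysis(), features, labels, cv=5)
print("decoding accuracy:", scores.mean().round(2))
```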
Our data reveal a network of brain areas, involving prefrontal cortex, hippocampus, IFG, and cerebellum, that supports the maintenance of sounds in the auditory cortex, consistent with previous fMRI and ECoG experiments [1, 2].
[1] Kumar, et al., J Neurosci, 2016
[2] Kumar, et al., Neuropsychologia, 2021
The role of age-related changes in alpha activity during dual-task speech perception and balance
ABSTRACT. Research suggests that older adults find it more difficult than younger adults to flexibly allocate attentional resources between two co-occurring multisensory tasks, such as perceiving speech-in-noise whilst maintaining balance. This attentional control may be reflected in oscillatory alpha activity, with increases in activity reflecting inhibition of different brain regions and decreases reflecting neural activation. However, there is limited research examining how alpha activity during dual-task conditions may change as a function of healthy ageing. This study aimed to investigate how younger and older adults reallocate attentional resources when perceiving speech-in-noise whilst maintaining balance, and how these age-related changes are reflected in alpha activity.
Nineteen younger adults (18-35 years old) and sixteen older adults (60-80 years old) were asked to identify words in audiovisual sentences extracted from the Grid corpus. Participants completed this speech perception task with or without background noise, whilst standing in an easy balance position (feet side-by-side) or a difficult balance position (feet in tandem). Throughout the task, fronto-central and parieto-occipital alpha activity was recorded using electroencephalography (EEG), to measure activation in brain regions associated with balance maintenance and audiovisual speech perception, respectively.
Mixed ANOVAs revealed that all participants showed poorer speech perception in noisy listening conditions. However, speech perception in the noisy listening condition was most accurate when participants stood in a challenging balance position, in contrast to our hypotheses. Whilst these behavioural effects were not reflected by fluctuations in parieto-occipital alpha power, decreases in fronto-central alpha power were greater in clear listening conditions compared to noisy listening conditions.
Taken together, the results suggest that increasing cognitive load with a secondary multisensory task may not always be detrimental to balance maintenance in physically and cognitively fit older adults. The role of heterogenous ageing trajectories and 'cognitive reserve' in the flexible allocation of attentional resources are discussed as potential reasons behind the strong performance of the older adults.
Spatial release from masking and listening effort: differences between binaural and free-field presentation
ABSTRACT. Speech intelligibility degrades in complex and noisy environments, especially for individuals with hearing impairment. Spatial separation of speech from competing sources is well known to aid speech understanding (spatial release from masking, SRM).
Virtual Reality (VR) paradigms and technologies offer powerful tools to test and train individuals with or without hearing aids. Virtual audio rendering, delivered via loudspeakers or headphones, allows replicating real-life audio experiences, with headphone setups requiring Head-Related Transfer Functions (HRTFs) for accurate spatialisation. HRTFs are individual to each listener due to unique morphological features that affect the approaching sound waves. Binaural individualisation notably improves speech-in-noise (SIN) performance in the horizontal plane, but this benefit may be diminished under informational masking conditions. In the median plane, generic HRTFs struggle to replicate spectral monaural cues, impacting SRM, especially in non-native speakers. Improvements in SIN emerge with perceptual training for energetic masking, but this line of work still suffers from weak transferability of skills from binaural rendering to real-life sound presentation and from poorly controlled studies.
Furthermore, questions about how increased listening difficulties (i.e., listening effort (LE)) can impact SRM and speech recognition from binaural to free-field presentations remain open.
To assess SRM performance and LE for various HRTF conditions, we measure intelligibility for different target-masker spatial locations following the paradigm proposed by Gonzalez-Toledo et al. (2024), which employed a speech-on-speech test using the Coordinate Response Measure (CRM) corpus. We compare loudspeaker rendering (with sources coming from either loudspeaker locations or phantom sources) and binaural presentations (individual vs. non-individual) using performance (identification accuracy) and LE (pupillometry) measures for different target-masker locations in both the median and horizontal planes. We hypothesise similar outcomes for free-field and individual HRTF conditions for all target-masker configurations. Non-individual HRTFs are likely to result in increased effort and reduced accuracy in the median plane due to difficulty replicating monaural spatial cues, while horizontal plane performance (based mainly on interaural differences) and effort are likely to be similar for free-field and all binaural presentations.
Highlighting differences in task load for SRM between individual HRTFs and free-field presentations (i.e. via loudspeakers or phantom sources) could further clarify mechanisms of auditory perception in complex environments, filling a gap in the auditory and cognitive science literature. SRM and LE in virtual rendering may become crucial, as they can be employed in the future to assess the effectiveness of perceptual training in virtual environments, whether related to sound localisation or to speech intelligibility skills.
Investigating the effect of contact vs. non-contact sports on the brain's response to sound
ABSTRACT. Data indicate that sports-related concussions may damage the auditory system or lead to difficulties perceiving speech in background noise. Understanding of the effects of repetitive sub-concussive head impacts in contact sports remains limited: no sensitive objective measure has been identified, and it is unclear how sub-concussive impacts may affect auditory function. This study aims to investigate the impact of contact sports on the brain’s auditory neural responses in young adult athletes. The study will include 48 participants, divided into two groups: athletes engaged in contact sports and those in non-contact sports. Each participant will undergo two electroencephalography (EEG) recordings, 1) subcortical and 2) cortical, under two conditions, in quiet and in noise, using a 170-ms speech syllable /da/ for both conditions, with a six-talker background babble for the speech-in-noise condition only. Preliminary results from a mixed two-way repeated-measures ANOVA on a sample of 26 participants (13 per group) revealed a significant main effect of group (F(1,23) = 18.41, p < .001) on cortical N100 responses, with contact athletes showing reduced N100 amplitudes (M = -2.06 µV, SD = 1.08) compared to non-contact athletes (M = -3.19 µV, SD = 1.15). There was also a significant main effect of condition (F(1,23) = 19.43, p < .001), with both groups showing reduced N100 amplitudes in noise (M = -2.07 µV, SD = 0.69) compared to quiet (M = -3.23 µV, SD = 1.42), and a significant interaction between group and condition (F(1,23) = 6.62, p = .017). Pairwise comparisons revealed that the group difference in N100 amplitudes between contact and non-contact athletes was significant only in the quiet condition, not in the noise condition. Subcortical responses (F0 amplitudes) showed no significant group or condition differences at this stage. These initial findings suggest that cortical auditory processing may be more vulnerable to repetitive sub-concussive head impacts than subcortical responses, especially in the presence of speech in noise. This research aims to offer insights into auditory function following head impacts and to work towards an objective marker of auditory processing changes associated with contact sports, with implications for player safety and the effects of sub-concussive impacts on auditory function.
Evidence of neural ensemble averaging in categorical-loudness-scaling measurements
ABSTRACT. In a loudness-matching paradigm, a reduction in the loudness of sounds with bandwidths less than one-half octave compared to a tone of equal sound pressure level has been observed previously for five-tone complexes at 60 dB SPL centered at 1 kHz. Here, this loudness-reduction phenomenon is explored using band-limited noise across wide ranges of frequency and level. These measurements test a model of loudness judgement that includes an intermediate stage of neural ensemble averaging (NEA).
Adult participants were recruited (N=98) with pure-tone average (PTA) thresholds that ranged from normal to moderate hearing loss. An equal-loudness contour (ELC) was estimated for each participant from 0.25-6 kHz using a categorical-loudness-scaling (CLS) paradigm. To improve test efficiency, the presentation level and center frequency of the band-limited test stimuli were determined by a Bayesian adaptive algorithm, which enabled ELC estimation within about five minutes of testing. Three separate test conditions differed by stimulus type: (1) pure tones, (2) quarter-octave noise, and (3) octave noise. Loudness reduction was defined as a decrease in the equivalent loudness of a quarter-octave noise relative to a tone. Loudness summation was defined as an increase in equivalent loudness of an octave noise relative to a quarter-octave noise. For comparison, all three stimulus types were also processed by a loudness model comprising a nonlinear, active, time-domain cochlear model with an appended stage of neural spike generation. The amount of NEA in the model was determined by a parameter that specified the ensemble width as a distance along the basilar membrane. The NEA stage was preceded by a signal-compression stage and followed by a perceptual-expansion stage.
Loudness reduction was observed using quarter-octave band-limited noise. The amount of loudness reduction depended on both the level and frequency of the stimulus and decreased with increasing PTA. Loudness summation was also observed. Approximate agreement between measured and simulated loudness reduction was achieved with an ensemble width of 2 mm.
Loudness reduction is greatest at moderate stimulus levels, at frequencies near 1 kHz, and for normal-hearing listeners. This phenomenon is thought to be due to temporal fluctuations in the stimulus envelope that are absent for tones. The dependence of this phenomenon on ensemble width in the model suggests that its observation in CLS measurements is evidence of the presence of NEA in the human auditory system. Besides NEA, the simulation of this phenomenon also requires fast-acting compression in cochlear mechanics.
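As a toy illustration of that reasoning (not the study's cochlear or loudness model), the sketch below compresses the Hilbert envelope of a stimulus, averages the compressed values (here over time, standing in for averaging across a neural ensemble), and then expands. Because averaging happens between compression and expansion, a fluctuating quarter-octave noise envelope recovers a lower value than a steady tone of equal power.

```python
# Toy illustration of the loudness-reduction principle: compress the
# instantaneous envelope, average the compressed values, then expand.
# A fluctuating narrowband-noise envelope yields a lower recovered value
# than a steady tone of equal power. Filter, exponent and the use of
# temporal averaging are illustrative placeholders, not the study's model.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 16000.0
t = np.arange(0, 1.0, 1 / fs)
rng = np.random.default_rng(4)

# Quarter-octave noise centered at 1 kHz and a 1-kHz tone, both at unit RMS.
lo, hi = 1000 * 2 ** (-0.125), 1000 * 2 ** (0.125)
sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
noise = sosfiltfilt(sos, rng.standard_normal(len(t)))
noise /= np.sqrt(np.mean(noise ** 2))
tone = np.sqrt(2) * np.sin(2 * np.pi * 1000 * t)

def compress_average_expand(x, c=0.3):
    env = np.abs(hilbert(x))              # instantaneous envelope
    return np.mean(env ** c) ** (1 / c)   # compress -> average -> expand

for name, x in [("1-kHz tone", tone), ("1/4-octave noise", noise)]:
    print(f"{name:16s}: loudness proxy {compress_average_expand(x):.2f}")
```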
Influences of non-auditory nuisance factors on digit-perception tasks
ABSTRACT. Many auditory research studies assess perception of digits in the presence of masking or distracting sounds, often with the intention of detecting auditory pathophysiology. Examples include the use of digit triplets in speech-shaped noise to screen for hearing loss, or of spatially separated streams of digits to reveal temporal processing deficits associated with cochlear synaptopathy. Yet speech-perception tasks are affected by factors beyond auditory function, such as cognition, motivation, and language. The present work aims to assess the extent to which digit-perception tasks are influenced by a range of potential nuisance factors, and how these influences vary with masker characteristics.
The exploratory analysis exploits data from 220 otologically healthy teenagers, gathered at baseline of a longitudinal study. The influences of non-auditory nuisance factors should emerge more clearly in this homogeneous cohort than in older and more diverse populations. Participants completed three digit-perception tasks, each with a diotically presented digit-triplet target. The maskers were: (A) diotic speech-shaped noise, (B) spatially separated 10-talker babble, and (C) two spatially separated digit triplets, distinguishable from the target only via spatial cues. Condition A represents traditional digits-in-noise testing; condition B greater ecological validity; and condition C maximum reliance on spatial processing. Performance was tested for relations to educational attainment, auditory working memory, musicianship, and socio-economic status.
Perception of digits in diotic speech-shaped noise and in spatially separated babble was generally unrelated to the tested nuisance factors. In contrast, the task relying solely on spatial cues showed significant associations with educational attainment, musicianship, and socio-economic status in univariate analyses. Multiple linear regression models revealed that educational attainment had the strongest influence on performance, with a smaller contribution from musicianship. Model comparisons indicated that an apparent association between musicianship and listening performance was partly attributable to confounding factors.
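A minimal sketch of the kind of multiple-regression and model-comparison analysis described above; the file name, column names, and variable coding are hypothetical assumptions, not the authors' actual pipeline:

    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    # Hypothetical columns: srt_spatial (threshold on the spatial digit task, dB),
    # education, working_memory, musicianship, ses (candidate nuisance factors).
    df = pd.read_csv("digit_task_scores.csv")

    full = smf.ols("srt_spatial ~ education + working_memory + musicianship + ses",
                   data=df).fit()
    reduced = smf.ols("srt_spatial ~ education + working_memory + ses", data=df).fit()

    print(full.summary())            # which nuisance factors carry independent weight
    print(anova_lm(reduced, full))   # does musicianship explain variance beyond the rest?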
Digits-in-noise and digits-in-babble tasks appear relatively free of influence from the tested nuisance factors, even when some spatial cues are present; not so for digit-discrimination tasks reliant solely on spatial cues. The latter are often employed due to their expected sensitivity to spatial processing deficits. However, caution may be required when interpreting their results, especially in cross-sectional study designs with high risk of confounding. Finally, observed relations between musicianship and non-musical auditory performance are sometimes attributed to “training effects” from musical practice, but may also reflect confounding factors; teenagers in our cohort who play music are more privileged and highly educated than those who do not.
[RETRACTED] Neural processing of the global properties of natural auditory scenes
ABSTRACT. Conscious perception of an auditory scene is often assumed to rely on the identification and segregation of multiple objects making sounds around the same time. However, it is possible that a more global process may also occur when we evaluate auditory scenes. Studies in the visual domain have identified global properties (e.g., openness, naturalness) that aid in our rapid recognition of scenes – even without identifying each individual object within it. Recent behavioral work from our lab has extended these findings by providing preliminary evidence for global processing of complex, real-world auditory scenes.
The aim of the present study is to expand these behavioral findings by investigating the neural processing of natural auditory scenes. Forty-five normal-hearing participants listened to 200 scenes (4 s each) and completed separate object (“Did you hear a bird?”) and setting (“Could this be a café?”) identification tasks during EEG recording.
Non-parametric cluster-based analyses were conducted to compare performance on the setting and object identification tasks. Preliminary results indicate similar early ERP responses (N1, P2) for both tasks, but a more negative sustained potential for the object identification task.
An additional set of cluster analyses will compare setting- and object-identification performance to the average global property ratings of the scenes from McMullin et al. (2024). For these analyses, the scenes were categorized into “high” and “low” groups based on their scores on Factors 1 and 2 of the exploratory factor analysis conducted on those ratings. We expect to find distinct patterns of activity for scene- and object-related processing at the nine frontocentral electrodes, which often reflect activity associated with auditory cortex activation. We will also conduct a time-frequency analysis to assess changes in power across frequencies and will measure inter-trial phase coherence across trials for the same comparisons made in the cluster analysis.
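As an illustration of the planned inter-trial phase coherence measure, a minimal sketch of one common way to compute it (band-pass filtering plus the Hilbert transform); the abstract does not specify the implementation, so the filter settings and array layout here are assumptions:

    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def itpc(trials, fs, band):
        """Inter-trial phase coherence for one channel.
        trials: array (n_trials, n_samples); fs: sampling rate in Hz; band: (low_hz, high_hz)."""
        b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
        filtered = filtfilt(b, a, trials, axis=1)             # band-limit each trial
        phase = np.angle(hilbert(filtered, axis=1))           # instantaneous phase per trial
        return np.abs(np.mean(np.exp(1j * phase), axis=0))    # phase consistency across trials

ITPC ranges from 0 (random phase across trials) to 1 (identical phase at every trial) at each time point.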
The results of this study will deepen our understanding of how object-level and scene-level information relates to the global properties of auditory scenes, as well as of the neural activity associated with identifying object and setting information. Although EEG will not allow us to localize the pathways for scene and/or object processing, it may reveal amplitude differences between the setting and object identification tasks in regions associated with the auditory ventral stream, which could motivate future fMRI studies of natural auditory scene perception.
Auditory thresholds and temporal integration of loudness of infrasound and very low frequency signals
ABSTRACT. We now know that sounds below 20 Hz can be heard, provided they are loud enough. Hearing thresholds in the so-called “audible” frequency range (20 - 20,000 Hz) have been the subject of many studies, leading to the establishment of international standards. However, auditory perception at very low frequencies (between 20 and about 200 Hz) and in the infrasound range (f < 20 Hz) has been little studied in comparison with higher frequencies. To date, there is no standard for auditory thresholds at infrasonic frequencies, and standards for very low frequencies have been established on the basis of few studies. In practice, many studies refer to ISO 226 for thresholds above 20 Hz, and to Møller and Pedersen's (Noise & Health, 2004) proposal for those below 20 Hz.
In order to study infrasound perception, a special cabin was built that makes it possible to reproduce these frequencies without audible distortion. To reproduce very low and infrasonic frequencies, the room is equipped with 40 bass speakers and 32 subwoofers on opposing walls, which allows the levels needed to make these frequencies audible to be reached. Hearing thresholds were measured between 4 and 125 Hz; we found lower thresholds than those reported in the literature. By calculating the excitation patterns of our signals, we verified that the system did not generate audible distortions. At four frequencies from 20 to 125 Hz, we tested several threshold measurement methods (two different 2I-2AFC procedures with a 2-down, 1-up rule, and the method of limits) with different reproduction systems (in the cabin, in a dedicated listening room with a subwoofer, and under headphones) on a panel of 30 listeners, confirming the thresholds we previously found at these frequencies. Inter-individual differences were also highlighted in these experiments.
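A minimal sketch of a generic 2-down, 1-up adaptive track of the kind used in the 2I-2AFC procedures (such a rule converges on the level yielding about 70.7% correct); the step size, number of reversals, and the averaging rule for the threshold estimate are illustrative assumptions:

    def two_down_one_up(respond, start_db, step_db=2.0, n_reversals=8):
        """Run a 2-down, 1-up staircase. `respond(level_db)` returns True if the
        listener responds correctly at that level; returns a threshold estimate in dB."""
        level, correct_in_row, last_direction = start_db, 0, None
        reversal_levels = []
        while len(reversal_levels) < n_reversals:
            if respond(level):
                correct_in_row += 1
                if correct_in_row == 2:      # two correct in a row -> decrease level
                    correct_in_row = 0
                    direction = -1
                else:
                    continue                 # one correct: stay at the same level
            else:                            # any incorrect response -> increase level
                correct_in_row = 0
                direction = +1
            if last_direction is not None and direction != last_direction:
                reversal_levels.append(level)    # record the level at each reversal
            last_direction = direction
            level += direction * step_db
        last = reversal_levels[-6:]              # average the final reversals
        return sum(last) / len(last)

Here `respond` could wrap a simulated listener with a known psychometric function, in which case the track should converge near that function's 70.7%-correct point.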
We also measured temporal integration for infrasonic frequencies. We confirmed that integration time increases with decreasing frequency, and showed that at 4 Hz, integration is still not complete for a duration of 5 s.