Evaluating listener perception of synthetic Head-Related Transfer Functions
ABSTRACT. Headphone-based spatial audio uses head-related transfer functions (HRTFs) to recreate real-world acoustic interactions and environments. Since HRTFs are unique to individuals based on their morphology, synthetic HRTFs generated from 3D scans (via Mesh2HRTF) may offer a personalized alternative to conventionally (acoustically) measured HRTFs. Whilst prior studies focus on numerical comparisons and auditory models, this study comprehensively evaluates perceptual differences between synthetic, acoustically measured, and non-individual (KEMAR) HRTFs, and their impact on spatial audio perception.
Experiment 1 (n = 19) tested subjects' ability to localize noise bursts in a virtual reality environment 360° around the listener, capturing the influence of each respective HRTF on binaural and monaural spatial cues. After a training phase with feedback and free-field presentation, subjects localised the noise bursts rendered with KEMAR, measured, and synthetic HRTFs presented over headphones. Experiment 2 (n = 19) examined the impact of each HRTF type on spatial release from masking. Subjects identified a target talker’s message (colour and number) amid two competing talkers at varying spatial separations (0°, 50°, 100° in elevation). Experiment 3 (ongoing) examines how the type of HRTF affects listening effort in a non-semantic spatial release from masking task. Listeners identify gaps in a target stream amid competing streams, with behavioural and pupil responses measured under the different HRTF types, spatialisation, and number of concurrent streams to assess perceptual load.
Behavioural results from Experiment 1 show all HRTF types perform similarly for lateral localisation. However, in the polar plane, the KEMAR HRTF performs significantly worse than the synthetic and measured HRTFs. Although synthetic and measured HRTFs perform comparably overall, the latter slightly outperform synthetic ones in reducing front-back confusions. In Experiment 2, spatial release from masking enhanced performance, but no influence of HRTF type was observed. Data collection and analysis for Experiment 3 are still ongoing.
Overall, individual synthetic HRTFs enhance localization compared to non-individual HRTFs; however, it remains unclear how this improvement transfers to more ecologically valid scenarios, such as auditory scene analysis and spatial release from masking. These findings suggest that 3D scan-based HRTFs are a viable alternative to non-individual HRTFs when acoustically measured ones are unavailable. However, acoustically measured HRTFs remain the gold standard for binaural audio, as 3D scans alone do not seem able to fully replicate their precision.
Localization of free-field sound sources in the chronic phase of mild ischemic stroke
ABSTRACT. The ability to perceive the horizontal directions of sounds relies on interaural differences in sound level and timing. While this information is initially encoded by binaurally sensitive neurons in the brainstem, auditory processing for sound localization is not complete until later, more central stages of the auditory pathway. Although the effects of peripheral deficits on auditory perception have been studied extensively, the current understanding of the effects of damage to various central structures on directional hearing is incomplete. Stroke is a relatively common pathology whose multimodal effects are often hemisphere or even direction specific. For instance, it is known that in the visual modality, hemifield neglect can be induced by stroke, but a full understanding of its auditory homologues remains elusive. Systematic characterization of putative spatial hearing deficits related to stroke could potentially enable the development of individualized algorithmic interventions that seek to compensate for the associated spatial distortions. Previously, the effects of stroke on binaural hearing have been mainly investigated using headphone stimuli that provide combinations of interaural differences that are atypical for natural sounds. While these experiments enable precise control over low-level binaural stimulation parameters, such experiments may not be informative about the behavioral relevance of stroke-related spatial hearing deficits outside of the laboratory. Here, we investigated the localization performance of 14 chronic-phase mild stroke survivors in a free-field source identification task. Lesion locations across participants included both cortical and subcortical structures. We computed localization performance metrics (RMS error, bias, and standard deviation) from the response data and compared them to results from participants who had not suffered a stroke. The results show that localization performance varied across stroke survivors. While 10 of 14 performed normally, four subjects had localization deficits. However, comparable deficits were also observed in one of five age-matched control subjects. As such, the observed deficits are not necessarily stroke specific. Further, we compared the localization responses to the responses from a single-cue headphone-based lateralization task and found that for 7 subjects the two tasks yielded qualitatively different results. Overall, the data suggest that behavioral assessments of functional binaural hearing would benefit from both lateralization and localization tasks. Specifically, lateralization experiments may be more sensitive to detecting cue-specific processing deficits that can remain undetected in localization tasks and may be of diagnostic value. Localization tasks, on the other hand, can be more informative about the relevance of the impairment in out-of-the-laboratory listening scenarios.
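As a minimal illustration (not the study's analysis code), the three localization metrics named above can be computed from paired target and response azimuths as follows; variable names and the signed-error convention are assumptions.

```python
# Illustrative sketch: RMS error, bias, and response standard deviation
# computed from paired target/response azimuths in degrees.
import numpy as np

def localization_metrics(target_az_deg, response_az_deg):
    target = np.asarray(target_az_deg, dtype=float)
    response = np.asarray(response_az_deg, dtype=float)
    error = response - target                  # signed error per trial
    rms_error = np.sqrt(np.mean(error ** 2))   # overall accuracy
    bias = np.mean(error)                      # systematic shift
    sd = np.std(error, ddof=1)                 # response variability
    return rms_error, bias, sd

# Example: three trials with targets at -45, 0, and +45 degrees azimuth
print(localization_metrics([-45, 0, 45], [-30, 5, 60]))
```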
Adaptation rate and persistence across multiple sets of spectral cues for sound localisation
ABSTRACT. The adult auditory system adapts to changes in spectral cues for sound localization. This plasticity was demonstrated by modifying the shape of the pinnae with molds. Previous studies investigating this adaptation process have focused on the effects of learning one additional set of spectral cues. However, adaptation to multiple pinna shapes could reveal limitations in the auditory system's ability to encode discrete spectral-to-spatial mappings without interference and thus help to determine the mechanism underlying spectral cue relearning.
In the present study, 15 listeners learned to localize sounds with two different sets of earmolds within consecutive five-day adaptation periods. To establish both representations in quick succession, participants underwent daily sessions of gamified sensory-motor training. Acoustic and behavioral effects of the earmolds were recorded, as well as the trajectories of individual adaptation to the modified pinnae throughout the experiment. To test whether learning a new set of spectral cues interferes with a recently learned mapping, the persistence of the initial adaptation was measured after participants adapted to the second set of earmolds. Earmolds were removed after each adaptation period, and participants’ localization accuracy was immediately measured again to test for aftereffects of the adaptation.
Both pinna modifications severely disrupted vertical sound localization, but participants recovered within each 5-day adaptation period. After the second adaptation, listeners were able to access three different sets of spectral cues for sound localization. Learning a second set of modified cues did not interfere with the previous adaptation, and the adaptation rate did not increase with repeated cue relearning. Participants' localization accuracy with their native ears remained unchanged once the molds were removed.
Modified pinna shapes were sufficiently different to cause repeated disruption of vertical localization. Learning rates and adaptation persistence exceeded those observed in previous studies. Adaptation persistence did not differ between the successive earmolds, suggesting that adaptation to the second mold did not interfere with the previously learned representation. Participants adapted to both sets of spectral cues with equal success, indicating a pre-attentive process that is not subject to metaplasticity or procedural learning. The ability to store multiple sets of spectral cues without interference suggests a surprisingly large capacity for this representation, which may be indicative of cortical participation.
On the transformation of external sound-localization to internal lateralization space
ABSTRACT. Introduction:
The everyday experience of humans is to perceive sound sources in a three-dimensional space. Distance perception is determined through several cues; critically, sound sources are perceived outside the head. Horizontal-plane (azimuthal angle) perception is determined by interaural time differences (ITDs) and interaural level differences (ILDs). In controlled spatial-hearing experiments, such as in a sound booth and using headphones, sounds are perceived as being inside the head (i.e., internalized). Despite years of research using both paradigms separately, it is still not understood how humans transform the external localization space in the horizontal plane (i.e., azimuthal angle) to a one-dimensional internal lateralization space. The purpose of this study was to understand this transformation. It was hypothesized that humans could perform one of two types of transformation: (1) a radial mapping that respects the azimuthal angle or (2) a flat-mirror mapping that compresses large angles to a small lateralization range near the ears.
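To make the two hypotheses concrete, the sketch below shows one possible formalization; the exact functional forms (linear-in-angle for the radial mapping, a sine-like frontal projection for the flat-mirror mapping) are our illustrative assumptions, not taken from the abstract.

```python
# Hypothetical formalization of the two candidate azimuth-to-lateralization mappings.
import numpy as np

azimuth_deg = np.arange(0, 91, 15)       # 15-degree resolution, one hemifield
theta = np.deg2rad(azimuth_deg)

radial_map = azimuth_deg / 90.0          # lateralization proportional to azimuthal angle
flat_mirror_map = np.sin(theta)          # projection onto a frontal line; the slope
                                         # flattens near 90 deg, compressing large angles
                                         # toward the ears

for az, r, f in zip(azimuth_deg, radial_map, flat_mirror_map):
    print(f"{az:3d} deg  radial={r:.2f}  flat-mirror={f:.2f}")
```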
Methods:
Twenty young normal-hearing listeners participated in the study. They were presented with stimuli over (1) loudspeakers (natural interaural differences) or (2) circumaural open-back headphones (non-individualized head-related transfer functions). They responded using (1) a radial response localization interface (i.e., 180-degree arc with a fixed distance viewed from above) or (2) a one-dimensional lateralization interface (i.e., a bar across the head viewed from the front). Therefore, both interfaces were utilized for both stimulus presentation modes, resulting in four conditions. Stimuli were presented with 15-degree azimuth angle resolution, and interaural differences were matched across tasks. The stimuli were a low-pass noise (<800 Hz) and a high-pass noise (>4000 Hz). There were 10 trials per condition. Finally, head size measurements were also made to estimate interaural differences for the free-field condition.
Results:
For high-frequency noises where ILDs are dominant, preliminary analyses showed support for the flat-mirror mapping hypothesis. This was a result of a steeper response slope for lateralization than localization; responses at large ILDs were compressed to a smaller lateralization range near the ears. For low-frequency noises, there was support for the radial-mapping hypothesis.
Discussion:
Spatial mapping may be different depending on the response interface and the type of interaural difference that is being utilized. Results will be discussed in terms of procedural effects and the physics of the head. These results are important because they help us understand how the response method affects the perception of binaural cues.
Three-dimensional sound localization of nearby sources in echoic rooms
ABSTRACT. Sound localization is typically examined separately in the three spatial dimensions of azimuth, elevation, and distance. While multiple studies have examined localization performance for sources varying simultaneously in two of the dimensions, very few have considered sources varying simultaneously in all three dimensions. Santarelli et al. (J. Acoust. Soc. Am. 105, 1024, 1999) performed an experiment in a reverberant classroom in which subjects were asked to point to the perceived position of broadband-noise sound sources presented from a random location in the right hemifield within 1 m of the listener’s head. However, while the source locations varied simultaneously in all three dimensions and the responses were recorded in 3-D, prior analysis only considered each spatial dimension separately.
Here, a new analysis examines how localization response biases varied with source location simultaneously in all three dimensions.
Directional 2-D localization performance was examined simultaneously in azimuth and elevation by projecting the data onto the surface of a unit sphere with the observer at the origin and separately considering near (distance < 50 cm) and far (distance > 50 cm) source locations. After binning the data for the two distances into 25 directional bins, the mean stimulus and response directions were determined using the Cartesian coordinates, while mean distances were determined on a logarithmic scale.
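A short sketch of the averaging steps described above (assumed variable names, not the study's code): directions are averaged via their Cartesian coordinates on the unit sphere, and distances on a logarithmic scale.

```python
# Illustrative sketch: mean direction (Cartesian average of unit vectors) and
# geometric mean distance for a set of 3-D positions relative to the head.
import numpy as np

def mean_direction_and_distance(xyz):
    """xyz: (n, 3) array of source or response positions in metres."""
    xyz = np.asarray(xyz, dtype=float)
    dist = np.linalg.norm(xyz, axis=1)
    unit = xyz / dist[:, None]                 # directions on the unit sphere
    mean_dir = unit.mean(axis=0)
    mean_dir /= np.linalg.norm(mean_dir)       # renormalize the Cartesian mean
    mean_dist = np.exp(np.log(dist).mean())    # log-scale (geometric) mean distance
    return mean_dir, mean_dist

# Example: two nearby sources to the right of the head
print(mean_direction_and_distance([[0.3, -0.1, 0.0], [0.5, 0.1, 0.1]]))
```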
Cone-of-confusion error analysis showed relatively large error rates, with approximately 20% of responses falling into a different quadrant than the source quadrant, out of which 87% were front-back, 6% up-down, and 6% front-back & up-down confusions, approximately independent of the distance. Directional errors in responses varied in a complex way across the three dimensions. For instance, the largest errors occurred for sources behind and below the subject’s head, with responses biased medially and downwards for near sources, while reversing to an upwards bias for far sources. For near sources only, there was a general upwards bias trend. For both near and far sources close to the horizontal plane and away from the medial vertical plane, a lateral bias was observed. Additional analysis will consider dispersion of responses, biases within and across subjects, and comparisons to anechoic performance.
These results provide a normative characterization of 3-D localization performance for nearby sources in reverberation, showing that errors can be larger than observed previously when stimuli vary simultaneously in all three dimensions.
(Work supported by EU HORIZON-MSCA-2022-SE-01 101129903, APVV-23-0054, SK-AT-23-0002)
Randomization in pitch ranking leads to performance underestimation in cochlear implant users and normal-hearing listeners
ABSTRACT. In pitch ranking, subjects decide which of two successive stimuli had a higher pitch. Each stimulus pair is presented in both orders, and average performance is reported. However, we observed that the presentation order influenced performance in electrode-ranking (cochlear implant users) and pitch-ranking tasks (normal-hearing listeners) when all stimulus pairs were presented in random order.
Our cochlear implant (CI) study involved 10 MED-EL users (13 ears) aged 22 to 55 years (mean: 44 years). Each of the 12 CI-electrodes (E1-E12, numbering starting at the apex) was compared to its three basal neighbors (direct stimulation via RIB2). Per electrode pair, ascending (apical to basal) and descending presentation order (basal to apical) were measured five times in random order. A comparable experiment included 8 normal-hearing (NH) subjects aged 22 to 28 years (mean: 26 years). Electrical stimulation was emulated using bandpass-filtered (6 dB/oct slopes) uniform exciting noise with center frequencies logarithmically distributed between 241 and 5334 Hz (F1-F12).
Pitch-ranking performance was comparable between the two groups (adjacent CI-electrodes: 74 ± 10 %, adjacent filters: 74 ± 13 %) and improved with increasing distance between electrodes and filters. A Greenhouse-Geisser-corrected repeated measures ANOVA showed a significant interaction between presentation order and stimulation site for adjacent electrodes (p = .003) and filters (p < .001). Bonferroni-corrected post-hoc tests indicated that in apical locations, descending pitch pairs outperformed ascending pairs in CI users (E1/E2: 46 ± 32 %, E2/E1: 71 ± 26 %, p = .03) and NH subjects (F1/F2: 42 ± 38 %, F2/F1: 96 ± 9 %, p = .002). Conversely, ascending pitch pairs performed significantly better in basal regions (E7/E8: MDiff = -20 %, p = .02; F11/F12: MDiff = -36 %, p = .03). In a second NH experiment, this discrepancy was addressed by testing the stimulus pairs in isolation (i.e., only F1/F2 and F2/F1) rather than randomizing all pairs (F1/F2: MDiff = -15 % and F11/F12: MDiff = -13 %, p > .05).
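For illustration, order-specific performance of the kind reported above can be summarized by scoring ascending and descending presentations of each pair separately before averaging; the data layout below is hypothetical.

```python
# Hypothetical trial records: (pair label, presentation order, correct 0/1).
import numpy as np

trials = [("E1/E2", "ascending", 0), ("E1/E2", "descending", 1),
          ("E1/E2", "ascending", 1), ("E1/E2", "descending", 1)]

scores = {}
for pair, order, correct in trials:
    scores.setdefault((pair, order), []).append(correct)

# Percent correct per pair and presentation order, so order effects stay visible
for (pair, order), vals in sorted(scores.items()):
    print(f"{pair} {order}: {100 * np.mean(vals):.0f}% correct (n={len(vals)})")
```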
Our results indicate that CI users and NH subjects exhibit similar directional effects in challenging pitch perception tasks. Randomizing stimulus pairs and presenting them in both directions to minimize biases introduces a new interaction that underestimates pitch-ranking performance. Memory and psychological factors could influence how individuals mentally represent pitch, potentially resulting in a “central tendency” and “pooling” effect, where the memory of the first stimulus shifts closer to the average, leading to the observed presentation order effect.
Neural and behavioral processing of pitch relations within spectrally sparse musical chords in cochlear implant listeners
ABSTRACT. Musical harmony is essential in western music perception. However, in cochlear implant (CI) listeners, harmony perception is severely limited. One potential limitation in CI listeners is that the simultaneous occurrence of music chord components (voices) leads to complex interactions between electrodes in the cochlea, especially for stimuli composed of broadband real-world sounds such as piano tones. A possible solution might be a reduction of spectral complexity of sounds. In a previous experiment, we confirmed that behavioral discrimination of two successive musical triads (i.e., three simultaneous complex tones) was improved for triads with low spectral complexity (i.e., three spectral components per voice) compared to triads with higher spectral complexity (i.e., nine spectral components per voice). To further investigate harmony perception in CI listeners with these spectrally sparse musical triads by targeting higher-level harmonic integration instead of only peripheral discrimination, here, we measured behavioral as well as neural responses to changes in tone relations of triad voices in an oddball paradigm.
So far, we tested five post-lingually implanted CI listeners and ten normal-hearing (NH) controls. Participants were presented with frequent standard triads and infrequent deviant triads. Depending on the task, standard triads were either (A) physically similar (i.e., identical major chords), (B) structurally similar (i.e., major chords) but arranged at continuously ascending or descending F0s, or (C) structurally similar (i.e., major chords) but arranged at randomly varying F0s. Deviant triads differed from standard triads by one semitone in one or two voices and were, thus, structurally different (i.e., with respect to voice relations) from standard triads. Critically, the design of tasks B and C procedurally ensured that participants judge voice relations rather than track individual voices, with task C exhibiting the highest need for abstraction or harmonic integration. Participants responded by keypress whenever they perceived a deviant triad. In addition to behavioral judgements, we investigated whether participants exhibited significant mismatch negativity (MMN) responses following deviant triads, by using electroencephalography.
Results of NH listeners yielded significant MMN and behavioral sensitivity in all tasks, with highest sensitivity in task A and lowest sensitivity in task C. Data collection for CI listeners is ongoing; yet, so far, neural as well as behavioral results yield a similar pattern as observed in NH listeners and, thus, suggest that, similar to NH listeners, CI listeners can process voice relations within spectrally sparse musical chords on a neural as well as behavioral level.
Speech-on-speech perception in listeners with normal and impaired hearing – the role of periodicity and effects of fundamental-frequency dynamics manipulations
ABSTRACT. Understanding speech in the presence of interfering speech signals is a challenging auditory task that normal-hearing (NH) listeners typically perform successfully whereas many hearing-impaired (HI) listeners exhibit severe difficulties with it, even when provided with hearing aids. The periodicity information of the competing speech signals, which is connected to their fundamental-frequency (F0) characteristics, can provide useful auditory cues for segregating the target speech from the interfering speech. However, it is unclear how hearing deficits influence the availability of periodicity cues and the salience of F0 characteristics such as the F0 dynamics.
We employed a dedicated speech-test design using Danish HINT sentences embedded in connected speech as targets, spoken by three female and three male speakers. The interfering speech also consisted of connected speech, spoken by a different talker of the same sex as the target talker. Two experiments were conducted with 12 NH and 29 HI listeners using loudspeaker-based stimulus presentation and linear hearing-aid amplification for HI listeners. Experiment 1 investigated the effect of presence/absence of periodicity, measuring speech reception thresholds (SRTs) in conditions with periodicity information in target and/or masker signals either fully available (natural speech) or removed using noise vocoding (vocoded speech). Experiment 2 assessed the effect of F0-dynamic-range manipulations by compressing or expanding the F0 dynamic range of target and masker speech signals.
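As an illustration of the F0-dynamic-range manipulation, one common approach (an assumption about the procedure, not necessarily the authors' exact implementation) is to scale each voiced frame's deviation from the mean F0 on a logarithmic scale:

```python
# Sketch: compress (factor < 1) or expand (factor > 1) an F0 contour's dynamic range.
import numpy as np

def scale_f0_dynamics(f0_hz, factor):
    """Scale log-F0 deviations from the mean; unvoiced frames (f0 == 0) are unchanged."""
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0 > 0
    log_f0 = np.log(f0[voiced])
    mean_log_f0 = log_f0.mean()
    out = f0.copy()
    out[voiced] = np.exp(mean_log_f0 + factor * (log_f0 - mean_log_f0))
    return out

contour = [0, 180, 200, 230, 210, 0]      # Hz, with unvoiced frames coded as 0
print(scale_f0_dynamics(contour, 0.5))    # compressed F0 dynamics
print(scale_f0_dynamics(contour, 2.0))    # expanded F0 dynamics
```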
SRTs measured in experiment 1 for NH and HI listeners were lowest for natural target and masker speech, with HI listeners exhibiting strongly elevated SRTs. A relative worsening of NH listeners’ SRTs was only observed when both target and masker were vocoded, whereas HI listeners’ SRTs also worsened when the target was vocoded and the masker was not. SRTs measured in experiment 2 for NH listeners were unchanged when compressing the F0 dynamics of both signals and improved when expanding the F0 dynamics of target and/or masker. For HI listeners, the F0 manipulations led to worsened SRTs on average; however, SRTs differed strongly across HI listeners, with some participants showing substantial SRT improvements due to increased target-F0 dynamics.
The findings suggest that NH listeners can effectively utilize periodicity information contained in either target or masker signals whereas HI listeners rely mostly on the periodicity information in the target speech. F0-dynamics manipulations can improve speech segregation for NH listeners and some HI listeners. The HI listeners’ auditory profiles may help understand for which individuals such speech-perception improvements are attainable.
Weighting of place and temporal envelope cues for pitch, music, and emotion perception with cochlear implants
ABSTRACT. Pitch cues for spoken emotion and musical melody recognition can be poorly perceived in cochlear implant (CI) users. Two pitch cues available to CI users are cochlear place of stimulation and temporal cues, which can be encoded by pulse rate or by the temporal envelope [i.e., amplitude modulation (AM) of the pulse train]. Previous studies have shown place of stimulation to be weighted more strongly than pulse rate for CI pitch perception. However, the relative weighting of place and AM temporal envelope cues for CI pitch perception has not yet been investigated. The goal of this study was to measure the relative weighting of these cues in CI users for pitch, musical contour, and spoken emotion perception.
Eleven bilateral CI users (7 females; age range 14-39 years) were tested. Electrode discrimination (place cue) and AM rate discrimination (temporal envelope cue) were measured using direct stimulation in a 3-alternative forced choice, 2-up, 1-down adaptive procedure. Relative weighting of the two pitch cues was measured using a single-interval pitch magnitude estimation procedure on a scale of 1 to 10. Stimuli were biphasic pulse trains presented to an apical, middle, or basal electrode at 0 (no AM), 75, 100, 150, 200, or 300 Hz AM rates. Spoken emotion identification and melodic contour identification were assessed for low and high base F0s for three conditions: unprocessed, place-only, and temporal-only. Place- and temporal-only stimuli were created using a sine wave vocoder. For the place-only condition, the temporal envelope was low-pass filtered at 20 Hz; for the temporal only condition, stimuli were delivered only to a single middle electrode.
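A minimal sketch of the "place-only" processing described above, under the assumption that the channel envelope is extracted by Hilbert transform before the 20-Hz low-pass filtering and sine-carrier modulation (the study's exact vocoder chain may differ):

```python
# Sketch of one sine-vocoder channel with temporal-envelope pitch cues removed.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 16000
t = np.arange(0, 0.5, 1 / fs)
# Toy band-limited input with 150-Hz amplitude modulation (stands in for one analysis band)
band_signal = np.random.randn(t.size) * (1 + 0.8 * np.sin(2 * np.pi * 150 * t))

envelope = np.abs(hilbert(band_signal))          # channel temporal envelope
b, a = butter(4, 20 / (fs / 2), btype="low")     # 20-Hz low-pass removes AM pitch cues
slow_envelope = filtfilt(b, a, envelope)

carrier = np.sin(2 * np.pi * 1000 * t)           # sine carrier at the channel's place
place_only_channel = slow_envelope * carrier
```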
All participants showed better pitch discrimination with AM rate than electrode place cues, with AM rate discrimination limens as low as 3 Hz. In the pitch scaling task, weighting for AM rate was stronger than for place, completely dominating the pitch percept for the 75, 100, and 150 Hz rates in some participants. Spoken emotion and melodic contour performance were both best in the unprocessed condition, with poorer performance for the temporal-only and place-only conditions. Performance was better for the temporal-only conditions for low F0 stimuli, and better for the place-only condition for high F0 stimuli.
The findings suggest that for CI listeners, weighting of temporal envelope cues can be stronger than place cues, especially for low AM rates and F0s.
This study was funded by a grant from House Institute Foundation and NIH STEMM-HEAR grant R25DC020698.
How focused perceptual thresholds and neural responses reflect the sensitivity to electric stimuli in cochlear implant listeners
ABSTRACT. Introduction: Focused threshold measurements, a rapid procedure to assess perceptual detection levels in cochlear implant (CI) listeners, are sensitive to threshold variabilities along the CI electrode array. These variabilities might indicate the quality of the electrode-neuron interface (ENI), that is, the site-specific interface between a CI electrode and the neural activity evoked by electric stimulation. Based on previous studies, low ("good") threshold CI channels are assumed to lead to more accurate neural responses to electric stimuli, whereas high ("poor") threshold channels are suspected to reduce signal accuracy. Here we investigate the relationship between focused threshold measurements and neural electrically evoked compound action potentials (ECAPs), to ultimately explain the link between perceptual and physiological effects as the underlying cause of performance differences in CI listeners.
Methods: Fifteen adult Advanced Bionics CI subjects participated in the focused threshold and ECAP assessments. Focused thresholds were obtained using partial quadrupolar stimulation and current steering to generate a stimulus sweep across the CI electrode array ranging from channels 2 to 15. Threshold profiles were extracted from an average of two apical and two basal threshold sweeps. Individual ECAP responses were assessed with the standard forward-masking artifact subtraction technique in monopolar stimulation mode, extracted from the average of 50 masker-probe sweeps on all CI channels. The sweeps consisted of four frames with different masker-probe combinations and loudness-balanced stimulation levels, individual phase duration, and masker-probe intervals.
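For reference, the standard forward-masking artifact-subtraction scheme combines four recording frames; the frame labels below are the conventional ones and are assumed here rather than taken from the abstract.

```python
# Sketch of the conventional forward-masking subtraction: A - B + C - D cancels
# the probe and masker stimulation artifacts and leaves the probe-evoked ECAP.
import numpy as np

def forward_masking_ecap(probe_alone, masker_plus_probe, masker_alone, no_stim):
    """A = probe alone, B = masker + probe, C = masker alone, D = no stimulation."""
    A, B, C, D = (np.asarray(x, dtype=float)
                  for x in (probe_alone, masker_plus_probe, masker_alone, no_stim))
    return A - B + C - D

def ecap_amplitude(trace):
    """Peak-to-peak amplitude of the derived trace, used as the ECAP size."""
    return trace.max() - trace.min()
```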
Results: In an intermediate analysis, focused thresholds and ECAP peak amplitudes were correlated, which resulted in a significant linear relationship between the grouped outcomes. Low focused thresholds showed large ECAP amplitudes on the corresponding CI channels, and small ECAP amplitudes were recorded at high threshold channels. In addition, a first analysis of masker-probe stimulation levels and ECAP peak amplitudes revealed a complex input-output relationship. Some individuals showed large channel-specific input-output gaps along the CI array, while others showed matching profiles.
Discussion: Channel-to-channel threshold differences revealed a promising significant trend with low focused thresholds reflecting large ECAP peak amplitudes and vice versa. This relationship might be further supported by the ECAP input-output gap, a possible indicator of the neural survival of the corresponding cochlea region. This knowledge could be directly applied in the clinic by adjusting stimulation locations to improve the signal transmission in regard to their neural sensitivity and, therefore, to enhance CI programming and performance outcomes.
Cochlear implant outcomes: Cluster analysis of focused threshold level, etiology, and vowel recognition patterns
ABSTRACT. When the interface between cochlear implant (CI) electrodes and auditory neurons is poor, electrical current spreads to unintended neurons, causing spectrally ambiguous input and poorer speech recognition. Focused thresholds, a proxy for the electrode-neuron interface (ENI), are higher for channels with a suboptimal ENI, often due to large electrode-neuron distances or neuronal degeneration.
Postlingually deaf adults with the Advanced Bionics CI technology participated (N=54). Using focused thresholds, participants were clustered into three groups via generalized additive modeling (GAM), with lower cluster values indicating better thresholds. Cluster 1 had the lowest thresholds (M = 122 μA), cluster 2 was intermediate (M = 233 μA), and cluster 3 had the highest thresholds (M = 436 μA). A subset of 15 participants had medial vowel identification data in quiet and Auditec 4-talker babble. Medical records identified participants’ etiologies as genetic (n = 9), Meniere’s (n = 2), SSNHL (n = 2), ototoxic (n = 1), or idiopathic (n = 1). To avoid ceiling effects, SNR levels were adjusted to a challenging target of ~60% correct performance.
Participants with genetic and Meniere’s etiologies fell predominantly into cluster 1, while SSNHL, ototoxic, and idiopathic etiologies were more frequent in clusters 2 and 3. Cluster 3 participants were tested at the lowest mean SNR (M = 4.75 dB) but had the highest variability (SD = 7.12 dB) and widest range (-4 dB to 11.5 dB). Variability was lower in clusters 1 and 2 (SD = 2.56 and 5.59, respectively). Vowel recognition errors for cluster 3 participants, characterized by high thresholds in the middle of the array, were consistent with findings from DiNino et al. (2016), where errors in vocoded vowel recognition shifted from regions of spectral degradation toward preserved regions, particularly for vowels with F2 values in the degraded region, leading to confusion with vowels sharing similar F1 values. Listeners with Meniere’s, ototoxicity, and SSNHL demonstrated error patterns indicative of apical degradation, confusing vowels with similar F2 values.
These findings highlight the complex relationship between ENI quality, etiology, and focused thresholds during speech recognition. Vowel confusion patterns reinforce the importance of assessing the ENI to identify regions of potential spectral degradation. Tailored programming strategies that account for focused thresholds can optimize outcomes and improve patient satisfaction, improving quality of life for many cochlear implant users, particularly those with suboptimal electrode-neuron interfaces.
Advancing Hearing Diagnostics and Devices through Extracochlear Electric-Acoustic Stimulation
ABSTRACT. Introduction: Previous studies have shown that combined electric and acoustic stimulation of the auditory system can produce interaction mechanisms (e.g., Krueger et al., 2017; Imsiecke et al., 2018). These interactions may originate from electroneural stimulation of auditory nerve fibers or electrophonic stimulation of hair cells (e.g. Kipping et al. 2020). Understanding these mechanisms could pave the way for new diagnostic devices for hearing loss and enhance hearing by combining electric and acoustic stimulation. Assessing low-frequency hearing below ~500 Hz in clinical settings remains challenging due to noise vulnerability (e.g., Wilson et al., 2016; Frank et al., 2017). Current diagnostic measures rely solely on acoustic or electric stimulation responses. A scientific challenge is to utilize electric-acoustic interaction mechanisms for diagnosing hearing loss, which only occurs if acoustic hearing is present. Restoring audibility in the high frequencies, for example to treat age-related hearing loss, is also challenging, especially in severe cases. Cochlear implant (CI) users with residual hearing in the low frequencies benefit from additional electric stimulation to receive combined electric-acoustic stimulation (EAS) (Turner et al., 2004; Buechner et al., 2009; Wilson et al., 2012). However, cochlear implantation can cause trauma and compromise the low frequency acoustic hearing (Pfingst et al., 2015; Quesnel et al., 2016). A less traumatic alternative is extra-cochlear electric stimulation, inspired by EAS benefits, to deliver acoustic and minimally invasive electric stimulation.
Methods: We designed studies to explore the potential of extra-cochlear electric stimulation in auditory perception and its interaction with acoustic signals. Study 1 investigates whether sound sensations can be elicited through extra-cochlear electric stimulation alone. Study 2 examines the interaction between electric and acoustic stimulation when applied extra-cochlearly. Study 3 aims to measure these interactions objectively and through behavioral experiments. Study 4 evaluates whether combining acoustic and extracochlear electric stimulation can enhance speech understanding, potentially improving auditory solutions for hearing impairments. These studies are supported by simulations with a novel computational model of the electrically and acoustically stimulated auditory system (Kipping et al., 2024). Experiments were conducted with two subject populations: 1) partial-insertion CI users with some electrode contacts inside and others outside the cochlea, allowing investigation of electrical stimulation benefits near the round window, and 2) normal-hearing listeners with an electrode in the ear canal and a tube for simultaneous acoustic stimulation.
Results and Conclusions: Sections truncated (exceeding word limit).
Identification and Comprehension of Environmental Sounds in Cochlear Implant Users
ABSTRACT. Environmental sounds are semantically and acoustically complex signals that provide listeners with important information about objects and events in their vicinity. Despite their recognized importance for safety and well-being, extant research indicates deficits in environmental sound perception in CI users and a lack of post-CI improvement. However, the majority of past studies have relied almost exclusively on identification tasks that required naming of individual sounds. This contrasts with everyday listening that involves processing of sounds’ semantic properties in specific contexts without explicit naming.
We examined identification and comprehension of environmental sounds in CI and Hard-of-Hearing (HoH) adults in comparison to normal hearing (NH) peers on three tasks, two of which did not require sound naming and associated lexical processing. First, in an odd-one-out task, participants heard triplets of individual environmental sounds that differed in their semantic properties and were asked to pick the one sound that “did not fit in” (e.g., rooster – siren – duck). Second, in an identification task, participants heard each sound from the previous task and were asked to identify it by selecting one of 35 sound names. Lastly, in a comprehension task, participants heard naturalistic auditory scenes and were asked inference-based questions about objects and actions in each scene without explicitly naming them (e.g., ‘at what time of day do the events in this scene most likely take place?’).
Across the three tasks, results indicate a generally high accuracy for NH and HoH participants (means’ range: 93%-80%) and significantly poorer performance for CI users (means’ range: 81%-64%). Identification of individual sounds and comprehension of auditory scenes were the most challenging for CI users (means 64% [s.d. 12.0] and 65% [s.d. 16.8], respectively).
These findings indicate that perception of environmental sounds and their semantic properties is modulated by task requirements and involves different aspects of perceptual-cognitive-linguistic processing. They support the use of a battery of complementary environmental sound tests to 1) reveal underlying perceptual mechanisms, 2) suggest specific areas of deficit and 3) inform the counseling of CI candidates and continuing rehabilitation of CI users.
Cue weights in auditory time-to-collision estimation for approaching vehicles
ABSTRACT. Safe road crossing requires a sufficiently long time remaining until an approaching vehicle arrives at the pedestrian’s position. For a pedestrian standing at the curb, the dynamic spatial sound field generated by the vehicle provides various cues to the time-to-collision (TTC). Due to spherical spreading and air absorption, the sound intensity at the listener position increases with decreasing vehicle distance. Also, the azimuthal position of the vehicle and the angular separation of sub-sources (e.g., left and right tires) change with distance. So-called tau variables, which closely approximate the instantaneous TTC, can be calculated from both the intensity and angular changes. Here, we investigated which cues are most relevant for auditory TTC judgments.
To estimate cue weights using a psychophysical reverse-correlation approach, the TTC signaled by the intensity cues was shifted against the TTC signaled by the angular cues. This is only possible in virtual environments. In the real world, these cues are inseparable. Simulations of approaching vehicles were generated in TASCAR. The source signals were recordings made with microphones attached to the chassis of a conventional vehicle driving at different constant speeds. The simulated approaching vehicles were presented with 15th-order 2D Ambisonics. The TTC and speed signaled by the dynamic intensity cues were shifted against the TTC and speed signaled by the change in spatial position of the simulated sound sources (i.e., the angular cues). The source intensity and the vehicle size were also varied. Twenty participants estimated the vehicles’ TTCs in a prediction-motion task. Cue weights were estimated from the trial-by-trial data by regressing the estimated TTC on the different cues.
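A simplified sketch of this regression step: trial-by-trial TTC estimates are regressed on the cue values, here via standardized multiple regression (the abstract's general dominance analysis is more elaborate; cue names are placeholders).

```python
# Sketch: standardized regression weights for several TTC cues.
import numpy as np

def standardized_cue_weights(estimated_ttc, cue_matrix):
    """cue_matrix: (n_trials, n_cues), e.g. intensity-tau, angular-tau, loudness."""
    y = np.asarray(estimated_ttc, dtype=float)
    X = np.asarray(cue_matrix, dtype=float)
    yz = (y - y.mean()) / y.std(ddof=1)                 # z-score the responses
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # z-score each cue
    Xz = np.column_stack([np.ones(len(yz)), Xz])        # add intercept
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta[1:]                                     # standardized weight per cue
```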
The results showed a significant association of both intensity cues and angular cues with the estimated TTC. However, general dominance weights indicated a higher importance of intensity cues compared to angular cues. Also, the loudness at the time of TTC estimation (estimated with the Cambridge dynamic loudness model), a “heuristic” cue, showed a higher cue importance than the more reliable tau-variable based on loudness. At a given presented TTC, participants estimated shorter TTCs for louder vehicles, compatible with previous results. The angular separation of sub-sources at the time of estimation did not show a high importance.
Taken together, the data provide novel insights into which acoustic cues pedestrians use for auditorily estimating the TTC of an approaching vehicle. The observed cue weights are discussed in relation to the reliability of the different auditory cues to TTC.
Behavioral and Neurophysiological Approaches for Measuring the Relative Effectiveness of Envelope and Pulse Timing ITDs in Cochlear Implant Stimuli
ABSTRACT. Interaural time difference (ITD) discrimination is a major challenge for early-deafened patients with bilateral cochlear implants (biCI). State-of-the-art clinical CI processors employ asynchronous pulsatile stimulation in each ear, so that biCI users only experience ITDs in the pulse-train envelope (envITD) but not in the pulse timing (ptITD). Previous work from our groups has shown that neonatally deafened (ND) rats stimulated with temporally precise biCIs, providing informative ptITD from stimulation onset, can develop excellent ITD sensitivity. Here we studied the behavioral and neurophysiological sensitivity of the early-deafened auditory system to ptITD compared to envITD, using our animal model of ND rats supplied with biCIs.
For this purpose, we designed different stimuli comprising pulse trains of either 900 or 4500 pulses per second, which were modulated by 0.01-, 0.05-, or 0.2-s-long raised-cosine windows. The ptITD and the envITD could vary independently across the values [-0.08, 0, +0.08] ms. All stimuli were presented via biCIs chronically implanted in young adulthood. During behavioral training, biCI rats learned to lateralize these stimuli in a positively reinforced two-alternative forced-choice task. In “Honesty” trials with congruent ptITD and envITD, the animals had to lateralize correctly to be rewarded, whereas in “Probe” trials with incongruent ptITD and envITD, they were rewarded regardless of the side chosen. To identify the behavioral sensitivity to ptITD and envITD, a probit analysis was performed. The same electric stimuli were presented to measure the neurophysiological sensitivity of the inferior colliculus (IC) under anesthesia at the end of the behavioral test phase. Multi-units were recorded using multi-channel silicon probes. For each recorded multi-unit, electrical stimulation artifacts were removed and the analog multi-unit activity was computed, followed by calculation of the proportion of variance in neural responses explained by ptITD and envITD.
We observed that ptITDs completely dominate the lateralization decision of our biCI rats, while the influence of envITDs was almost negligible in comparison. Additionally, we verified electrophysiologically that IC neurons of our CI-experienced rats were more sensitive to ptITDs than to envITDs. Both findings were true irrespective of envelope shape.
Our results underline the importance of a fine-structure stimulation strategy in CI users, demonstrating both behaviorally and neurophysiologically the value of presenting spatial ITD information in the pulse timing of current CI devices in order to improve the ITD sensitivity of biCI patients.
Individual Differences in Physiological Response to Effortful Listening
ABSTRACT. In environments with a noisy background or degraded speech, individuals must exert more mental effort to understand speech. For hearing-impaired listeners, this can lead to withdrawal from activities and degrade their general wellbeing. However, our understanding of ‘Effort in Listening’ remains limited, and current tools to quantify this construct are not sufficiently robust.
The aim of the current research is to investigate individual differences in listening effort, with data collected from physiological measurements. Participants (n = 33, native speakers) were asked to complete the Oldenburg Matrix speech-in-noise tests. Sentences were formed from 5 words, and each word was randomly chosen from 10 choices.
During the experiment, physiological data such as EEG (electroencephalogram), ECG (electrocardiogram), pupillometry, GSR (galvanic skin response), and respiration were measured. Subjective ratings of invested effort and task difficulty, as well as response accuracy, were also collected.
To vary the difficulty of the tasks, there were four levels of SNR (signal-to-noise ratio): -16, -11, -6, and 12 dB, corresponding approximately to 20%, 50%, 80%, and 100% accuracy in pilot studies. Participants repeated the measurements after one week at about the same time of the day.
Results show a significant difference in subjective effort rating and accuracy between SNR levels. For the most difficult task (SNR: -16 dB), the average rating of task difficulty was higher than the rated subjective effort, indicating that participants recognized the task as highly difficult but invested less effort in completing it.
There was a significant difference in pupil diameter change between SNR levels. The average correlation between diameter change and SNR level was negative (r = -0.37), indicating that pupil diameter increases as the SNR decreases, i.e., in more difficult tasks. No significant difference was found in heart rate between SNR levels.
We expected and observed large individual differences in the data. Preliminary analysis shows that the correlation between datasets randomly selected from the first and second sessions varies widely, indicating large individual differences. EEG shows similar results. We also observe that the change in pupil diameter from the start to the end of the stimulus varies greatly between subjects but also shows common trends.
We will show a comprehensive analysis of group results and how individuals differ from each other on the poster/talk.
Speech intelligibility experiments and objective prediction with simulated hearing loss sounds to separate the effects of peripheral function from higher-level processes
ABSTRACT. The development of speech enhancement techniques for use in hearing aids is critical to assisting older adults with hearing loss (OHL). While evaluation with OHLs would be ideal, variable cognitive decline and higher-level knowledge make it difficult to interpret speech intelligibility (SI) results in terms of enhancement effects. To address this issue, we conducted SI experiments with young normal-hearing (YNH) listeners using a hearing loss (HL) simulator, WHIS, that mimics the HL of an OHL. A new objective intelligibility measure (OIM), GESI, was then used to predict the SI scores of both YNH and OHL listeners, in order to extract information about the effects of peripheral and higher-level functions.
In a previous study, we performed an SI experiment in which 16 OHLs reported four-syllable words presented in noise with/without an ideal speech enhancement algorithm using an Ideal Ratio Mask (IRM), at SNRs between -6 dB and +12 dB. The same paradigm was used for the current experiment with 14 YNHs, but the stimulus conditions were expanded to include HL conditions in which WHIS produced level-reduced sounds simulating one of the OHLs. The SI scores from the subjective experiments and the objective predictions from GESI were compared between the YNHs and the OHL.
The speech reception thresholds (SRTs) calculated from the average SI scores of the YNHs for unprocessed and IRM-enhanced sounds were significantly better than those of the OHL, as expected. However, when listening to HL-simulated sounds, the SRTs were significantly worse than those of the OHL. The GESI prediction was in good agreement with the perceptual SI scores of the YNHs. On the other hand, the objective SI scores of the OHL were underestimated, as observed in the YNH HL-simulated results.
Since GESI consists of auditory and modulation filterbanks and a similarity metric, it does not include a model of higher-level processes such as the mental lexicon. The underestimation by GESI implies that the OHL used higher-order knowledge more effectively than the YNHs. More detailed analysis showed that the OHL was better at identifying the second and fourth syllables based on co-articulation and lexical knowledge. Experiments with YNHs using WHIS and prediction by GESI would be useful to evaluate the enhancement algorithms, distinguishing the effects of peripheral and higher-level processes.
Reevaluating effective channel number for cochlear implant speech perception using an ASR model
ABSTRACT. Previous research suggests that increasing the number of stimulating electrodes beyond a certain threshold (e.g., 8 electrodes) does not significantly enhance speech perception in cochlear implant (CI) users, and that most users cannot fully utilize the available spectral information. However, recent studies indicate that when the number of electrodes exceeds nine, speech perception continues to improve up to the maximum number of usable electrodes. In this study, we reexamine the effect of the effective number of channels on CI speech perception using a Whisper Large-V3 ASR model and a Gaussian-enveloped tone vocoder.
We conducted two preliminary experiments. In Experiment 1, we examined the effects of varying maxima and electric dynamic ranges in the Advanced Combination Encoding (ACE) strategy on speech intelligibility in both simulated and actual CI subjects. In Experiment 2, we investigated how channel numbers (4 to 44) and inter-channel current interference (−8 to −32 dB/octave) affected speech perception under both quiet and noisy conditions.
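As a sketch of the zero-shot ASR evaluation, vocoded sentences can be transcribed with Whisper large-v3 and scored against the reference text; the file name, scoring metric (word error rate via jiwer), and package choices below are assumptions, not the study's exact pipeline.

```python
# Sketch: zero-shot transcription of a vocoded sentence and a simple recognition score.
import whisper   # pip install openai-whisper
import jiwer     # pip install jiwer

model = whisper.load_model("large-v3")

def recognition_score(wav_path, reference_text):
    result = model.transcribe(wav_path)                  # zero-shot, no fine-tuning
    hypothesis = result["text"]
    return 1.0 - jiwer.wer(reference_text, hypothesis)   # 1.0 = perfect recognition

# Hypothetical usage:
# score = recognition_score("vocoded_12ch.wav", "the boy ran home")
```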
Experiment 1 showed that using Whisper in a zero-shot manner produced trends consistent with those reported in published studies, thereby verifying the feasibility of the objective methods based on the ASR model and the specific vocoder. Experiment 2 demonstrated that under quiet conditions, lower inter-channel current interference required more channels to reach the saturation point in recognition rate. When the effective number of channels reached 12 and inter-channel interference was reduced to −20 dB/octave, the sentence recognition rate saturated. Further increases in the number of channels and reductions in interference did not yield additional improvements. However, under noisy conditions, the recognition rate did not saturate until 16 channels were used. Furthermore, when inter-channel interference was relatively high, such as −14 dB/octave, further increasing the number of channels beyond the saturation point caused the speech recognition rate to decrease, both in quiet and noisy conditions.
These findings suggest that the optimal number of effective channels is not fixed but highly dependent on the listening environment and the degree of inter-channel interference. This highlights the importance of personalized CI programming strategies, where both the user's auditory conditions and the specific noise environment are considered to achieve the best speech perception outcomes. Additionally, these findings highlight the potential of ASR models in predicting the impact of channel number on CI speech perception. The objective approach offers a more efficient alternative to human subject testing, which warrants further investigation through additional experiments.
Association between speech in noise performance and inferred cochlear response functions derived from otoacoustic emissions
ABSTRACT. Perceiving and understanding speech in background noise is crucial for everyday human communication. For this reason, the clinical assessment of speech in noise often forms an integral part of many clinical hearing assessments, providing a metric by which to understand the perceptual impact of hearing dysfunction. In some types of hearing dysfunction, a component of the speech difficulties may result from cochlear dysfunction involving reduced compression. This study investigated the association between measures of cochlear compression obtained from distortion product otoacoustic emission (DPOAE) input-output (I/O) functions and measures of speech intelligibility using the Matrix language speech test at noise levels of 45, 55, and 65 dB SPL.
Participants first underwent a clinical test battery of preliminary assessments and had thresholds within normal limits. Two measures of cochlear compression were obtained using DPOAE test signal frequencies spanning the speech range: a mid frequency (1414 Hz) and a higher frequency (4243 Hz). Values of the slope and kneepoint of the DPOAE I/O functions were then derived and compared to two speech intelligibility metrics derived from the Matrix speech test: an estimate of the psychometric function slope, and an estimate of the speech reception threshold to achieve 50% correct (SRT-50). It was hypothesised that higher estimates of compression may be associated with better speech intelligibility performance on the Matrix speech test. This association may be further moderated by the choice of DPOAE test signal frequency. In addition, the level of noise may affect speech intelligibility as reflected in the two speech metrics (slope and SRT-50), which in turn may influence the strength of the association between speech intelligibility and compression.
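One way to derive a slope and kneepoint from a DPOAE I/O function is a broken-stick (two-segment) fit, sketched below with synthetic data; the study's exact fitting procedure is not specified in the abstract, so this is illustrative only.

```python
# Sketch: piecewise-linear fit to a DPOAE I/O function to estimate kneepoint and slope.
import numpy as np
from scipy.optimize import curve_fit

def broken_stick(l2, knee, slope_low, slope_high, level_at_knee):
    """DPOAE level (dB SPL) as two line segments joined at the kneepoint."""
    l2 = np.asarray(l2, dtype=float)
    below = level_at_knee + slope_low * (l2 - knee)
    above = level_at_knee + slope_high * (l2 - knee)
    return np.where(l2 < knee, below, above)

l2 = np.arange(20, 75, 5)                                  # primary levels, dB SPL
dpoae = broken_stick(l2, 50, 1.0, 0.3, -5) + np.random.randn(l2.size) * 0.5  # synthetic data

params, _ = curve_fit(broken_stick, l2, dpoae, p0=[45, 1.0, 0.5, 0.0])
knee, slope_low, slope_high, _ = params
print(f"kneepoint ~{knee:.1f} dB SPL, compression slope ~{slope_high:.2f} dB/dB")
```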
Results appear to indicate that the correspondence between DPOAE I/O functions metrics and speech intelligibility performance may be influenced by the choice of test signal frequency (a test frequency of 4243 Hz may provide a closer correspondence with speech test performance). Furthermore, speech intelligibility function slopes may be affected by a combination of compressive and efferent responses to speech in noise. This interaction may influence the choice of the most appropriate speech intelligibility metric (slope or SRT-50) for comparing with compression estimates at different background noise levels.
Individual differences in cochlear compressive nonlinearity: Relation to extended high frequency hearing and implications for suprathreshold auditory processing
ABSTRACT. Introduction:
Compressive nonlinearity of the cochlea is fundamental to hearing sounds across a wide dynamic range. This study investigated how individual variability in cochlear compression relates to subclinical cochlear damage and whether such variability influences suprathreshold hearing.
Distortion product otoacoustic emission input/output (DPOAE i/o) functions provide a noninvasive assay of cochlear compressive nonlinearity. Evidence suggests that hearing loss at extended high frequencies (EHFs) is associated with subclinical cochlear deficits at lower frequencies. This raises the possibility that elevated EHF hearing thresholds, a marker of basal cochlear dysfunction, could contribute to individual variability in DPOAE estimates of cochlear compression in individuals with normal audiograms.
Reduced cochlear compression in the presence of a normal audiogram may not affect hearing at low sound levels but could impair suprathreshold processing. For instance, suprathreshold deficits could manifest in tasks like frequency modulation (FM) detection, which relies on different mechanisms depending on the modulation rate. Slow-rate FM detection (<5 Hz) is primarily mediated by temporal fine structure cues, while fast-rate FM detection (>10 Hz) depends more on excitation patterns. Literature suggests that cochlear factors can account for up to 30% of the variance in FM sensitivity.
The present study tested two hypotheses: (1) EHF hearing sensitivity contributes significantly to individual differences in DPOAE estimates of cochlear nonlinearity, and (2) variability in cochlear compression estimates predicts individual differences in fast-rate FM sensitivity but not slow-rate FM sensitivity.
Methods:
DPOAE i/o functions at 1 kHz were measured in adults aged 18 to 31 years (N = 33) with clinically normal audiograms. DPOAEs were recorded using an f2/f1 ratio of 1.22, with primary levels ranging from 20 to 70 dB SPL in 5 dB increments. FM detection thresholds were measured for a 1-kHz carrier with 2 Hz and 20 Hz modulation frequencies, presented at 70 dB SPL.
Results:
Preliminary analysis suggests reduced DPOAE estimates of cochlear compression in individuals with EHF hearing loss. Statistical analyses examining (1) the relationship between EHF thresholds and DPOAE i/o functions and (2) the association between DPOAE i/o functions and FM detection thresholds (FMDTs) will be presented.
Discussion:
Reduced cochlear compression associated with EHF hearing loss suggests that subclinical cochlear deficits may manifest as variability in DPOAE i/o functions. Whether this variability influences performance on suprathreshold tasks will be discussed. This study bridges the gap between subclinical cochlear damage, cochlear nonlinearity, and functional hearing outcomes, providing insights into the auditory deficits that can occur despite normal audiograms.
Direct Characteristics-Based Design of Filterbanks for Perceptual Studies and Speech and Hearing Technologies
ABSTRACT. We develop methods for auditory filterbank design that directly use specified frequency-domain filter characteristics such as peak frequencies, various bandwidths, and group delays. This ability to directly design filters based on specified characteristics enables (1) accurate design of auditory filters, and (2) systematic studies of the dependence of findings of perceptual studies on filter characteristics. In contrast, existing filter design methods generally do not allow for design based on filter characteristics, erroneously design higher-order auditory filters based on formulas for quality factors of second-order filters, or offer limited control over filter characteristics.
We design auditory filters by direct specification of any two frequency-domain characteristics alongside peak frequency and peak magnitude – e.g. n-dB quality factors, ERB, and group delay. We derive closed-form parameterizations of the transfer functions in terms of sets of filter characteristics, which ensure direct control over behavior and enable systematic variation. The methods are accurate, direct, simple, stable, computationally efficient, avoid iterative processes, and are appropriate for various species. To our knowledge, no existing methods exhibit this degree of accuracy, simplicity, and control.
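As a deliberately simple analogue of characteristics-based design, the sketch below builds a second-order peaking filter directly from a specified peak frequency and 3-dB quality factor; it does not reproduce the abstract's higher-order method or its joint magnitude and group-delay specifications.

```python
# Sketch: closed-form second-order peaking filter from (peak frequency, 3-dB Q).
import numpy as np
from scipy.signal import iirpeak, freqz

fs = 16000
peak_hz = 1000.0
q_3db = 8.0                               # 3-dB quality factor: f0 / (3-dB bandwidth)

b, a = iirpeak(peak_hz, q_3db, fs=fs)     # direct design from the two characteristics
w, h = freqz(b, a, worN=8192, fs=fs)
measured_peak = w[np.argmax(np.abs(h))]

print(f"target peak {peak_hz} Hz, measured peak {measured_peak:.0f} Hz, "
      f"3-dB bandwidth ~{peak_hz / q_3db:.0f} Hz")
```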
Our approach enables specification of various sets of characteristics depending on the needs of a particular study. For instance, it enables simultaneous specification of mixed magnitude- and phase-based characteristics (e.g., ERB and group delay), allowing for sharply tuned responses without excessive group delay and for control over both frequency selectivity and synchronization in filterbanks. It also enables control over the shape of the frequency response magnitude – e.g. through simultaneous specification of 3dB and 15dB quality factors.
The characteristics-based methods for the direct design of auditory filters address two critical needs: accurately designing auditory filters based on simultaneous specification of characteristics, and systematically studying how varying characteristics affects the findings of perceptual studies and technological advances. For instance, the methods (using our open-source code) may be used to systematically investigate the dependence of study outcomes on filter characteristics in studies reporting sensitivity based on ad hoc variation of certain characteristics. These include intelligibility scores of bandpass-filtered speech, accuracy of speech recognition, sound source localization models, mutual information between articulatory gestures of vocal tracts and acoustic and perceptual features, and accuracy of speech intelligibility models for cochlear implants. The methods may be extended to incorporate specifications on combined spectrotemporal characteristics as is relevant for studying certain perceptual functions. This work may also be used to understand the cochlea's role in perception via underlying unified models.
[RETRACTED] The Long-Term Stability of the Speech-to-Song Illusion
ABSTRACT. Despite sharing many structural acoustic similarities, music and language are easily distinguishable for the average adult listener and give rise to distinct percepts. In the Speech-to-Song (STS) illusion, multiple repetitions of a natural spoken utterance can give rise to a perceptual switch wherein the stimulus begins to sound like song to the listener. Anecdotally, once a speech excerpt transforms to song, listeners report that, even after long delays, they experience the stimulus as song when they re-encounter it, suggesting the STS illusion is temporally stable. However, to our knowledge, the long-term stability of the STS illusion has not yet been experimentally demonstrated.
In our study, we presented listeners with excerpts known to elicit the STS illusion and asked them to rate the degree to which each repetition sounded song-like for ten repetitions, across two sessions that occurred over delays from 0-56 days. At each session, we assessed the strength of the STS illusion by measuring ratings of each stimulus after repetition (“trial-final ratings”) as well as how many repetitions were necessary to elicit the illusion (“transformation position”). We also examined participants’ detection of subtle pitch manipulations using a same-different discrimination task.
Across sessions and regardless of delay duration, the number of repetitions needed to elicit the illusion decreased, and overall mean ratings of song-likeness increased. Moreover, listeners were better at detecting pitch changes that deviated from rather than conformed to the Western musical scale structure across all stimuli, and improved across sessions only for stimuli rated ≥ 3 on a scale of 1 (exactly like speaking) to 5 (exactly like singing).
These results provide strong support for the claim that once a stimulus transforms to song for a given listener, a stable, music-specific perceptual memory of that stimulus is formed that can be elicited upon future encounters. Moreover, these findings provide valuable insights into how the auditory system forms and maintains stable representations of auditory stimuli based on prior experience. Finally, understanding how the auditory system processes musical and linguistic stimuli may have implications for the development of effective hearing devices or for diagnostic assessments for individuals with communicative deficits.
The association between socio-economic and lifestyle factors and auditory function in middle-aged adults
ABSTRACT. Socioeconomic and lifestyle factors may increase the risk of age-related hearing loss. This study aimed to determine the associations between various indicators of socioeconomic status and lifestyle and hearing ability, in order to gain insights that could help audiologists identify high-risk demographics and ultimately contribute to addressing inequalities in hearing health.
A sample of 274 adults aged 45-65 years (M age = 53.84, SD = 5.91) was recruited based on Office for National Statistics (ONS) income groups using Prolific (www.prolific.com). Participants completed online questionnaires on demographics, lifestyle and socioeconomic factors, including age, gender, ethnicity, region of residence, income, education, occupation, exercise frequency, height and weight, smoking status, and weekly alcohol consumption. Participants also self-reported their auditory function using the Speech, Spatial and Qualities of Hearing Scale 12 (SSQ-12; Noble et al., 2013) and completed an online digits-in-noise (DiN) task to assess speech perception ability.
Multiple regression models were conducted to investigate the associations between socioeconomic and lifestyle factors and the outcome variables of self-reported auditory function (SSQ-12) and speech perception ability (DiN). We observed that being a regular smoker/tobacco consumer was significantly associated with worse self-reported auditory function (SSQ-12 scores) as compared to never-smokers. There were no significant predictors of speech perception ability (DiN task scores).
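A sketch of the regression analysis described above, using synthetic stand-in data and hypothetical column names:

```python
# Sketch: SSQ-12 scores regressed on socioeconomic and lifestyle predictors,
# with smoking status as a categorical factor (data below are synthetic).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 274
df = pd.DataFrame({
    "ssq12": rng.normal(7, 1.5, n),
    "age": rng.integers(45, 66, n),
    "income": rng.normal(30000, 8000, n),
    "smoking_status": rng.choice(["never", "former", "regular"], n),
    "alcohol_units": rng.normal(8, 5, n),
})

model = smf.ols("ssq12 ~ age + income + C(smoking_status) + alcohol_units", data=df).fit()
print(model.summary())
```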
These data suggest that tobacco consumption may be associated with perceived hearing ability. It is possible that increased oxidative stress, induced by smoking, damages the inner ear, affecting hearing ability. However, we did not find an association between behavioural speech perception ability and tobacco consumption, which raises questions about the underlying mechanism. It is also possible that the measure of speech perception ability (DiN scores) was not sensitive enough to detect variations in hearing ability within our target sample. Tobacco consumption rates are higher among disadvantaged groups, which relates to previous findings that lower socioeconomic status correlates with hearing loss. Future research should explore the neurobiological mechanisms behind the effect of lifestyle and socioeconomic factors, such as smoking, on auditory function.