Research Article - (2018) Volume 9, Issue 3
Keywords: Click-evoked auditory brainstem responses; Speech-evoked auditory brainstem responses; Cortical auditory responses; Subcortical generators; Cortical generators
How speech and non-speech sounds are processed at the subcortical and cortical levels is relatively poorly understood. This knowledge is important for evaluating hearing impairment and providing patients with optimal rehabilitation strategies and communication training, especially for young children or people unable to provide reliable feedback on their hearing experience. A first step toward achieving this goal is to elucidate the biological mechanisms that underlie auditory processing. The currently available methods for studying these mechanisms include both subjective and objective techniques with varying degrees of invasiveness. Currently, auditory evoked responses represent the optimal compromise for a thorough assessment that is also noninvasive. Auditory evoked responses stem from the neural activity generated by subcortical regions located in the brainstem (auditory brainstem responses, ABRs), the thalamus (middle latency responses), and also cortical generators (cortical auditory evoked potentials, CAEPs) [1,2]. Analyses of the response morphology of ABRs and CAEPs offer a temporal window to noninvasively observe the neural representation of speech processing and how the subcortical and central auditory mechanisms interact.
For years, auditory evoked potentials have been used in children and adults to explore in a noninvasive, reliable manner the neural transmission of various types of stimuli, including clicks , chirps and tone-bursts [4-6]; steady-state signals such as amplitude modulated (AM) tones in healthy hearing [7,8] and rehabilitated patients  have also been employed. While these stimuli can be easily implemented in a clinical setting, they do not reflect the complex nature of the information transmitted and integrated by the auditory system during traditional daily communication [10,11]. Toward bridging this gap in both research and clinical communities, using speech-elicited auditory brainstem responses (speech ABRs) has become of growing interest [12-16]. Speech ABR is an objective, noninvasive electrophysiological approach for studying auditory neural coding at the brainstem level [13,14,17]. This neural response to speech includes both a transient response (onset response, OR) to the non-periodic part of the stimulus and a sustained phase-locked response (frequency following response, FFR) to the periodic portions [18,19]. Importantly, the FFR has been found to be highly replicable [20,21] and can provide robust biomarkers of auditory processing at the brainstem level in humans [13,14,19,22-25] as well as top-down interactivity of the auditory system [17,26]. These reports emphasize the role of the brainstem as a hub of interconnected ascending and descending pathways, prone to neural adaptation in response to learning . Studies have focused on the relationship between subcortical encoding and CAEPs  and their potential integration into clinical practice [29,30]. Despite great strides toward understanding the anatomical and functional organization of the auditory brainstem system, its interconnectivity with cortical structures remains only partially understood. 
The mechanisms proposed for accurately encoding the many speech-related acoustic cues are still largely speculative. Indeed, controversies remain regarding the relationship between the OR and the click-elicited wave V, with some authors arguing for similar mechanisms at the subcortical level while others suggest different encoding processes [10,22]. In line with the uncertainty about the mechanisms generating wave V of the click-evoked potentials and the OR of the speech ABR, the location of the FFR generators at the subcortical level is also still debated. Some reports suggest the FFR emerges from regions lower than the inferior colliculus [31,32], while others propose a strong contribution from the inferior colliculus [33-36], supported by FFR latency analyses [37,38]. Previously, a magneto-encephalographic study proposed an additional right hemispheric predominant contribution from the auditory cortex at the 100 Hz FFR fundamental frequency. Conversely, other recent reports favor the idea of the FFR representing a composite of activity from different sources in the auditory system. Although one may argue that previous studies correlated speech ABR and cortical encoding, there is still a lack of knowledge regarding the potential clinical use of EEG techniques to assess the topographical and qualitative relationship between click and speech ABR as well as the connectivity between brainstem and cortical speech auditory potentials.
We have developed a methodology based on the high temporal accuracy of a multichannel EEG system; the information it generates allows for advanced processing and analysis methods that may be used in children. We therefore aimed to compare click and speech processing as a function of intensity at both the brainstem and cortical levels. We performed a direct investigation of the relationship between subcortical and cortical activity for wave V of the click ABR and for the OR and the FFR of the speech ABR. Since we found interesting differences between the characteristics of the OR and FFR of the speech ABR and the CAEP, we conducted an exploratory analysis in adult listeners using validated source modeling techniques to identify their underlying generators in the brainstem.
Eight French native young adult speakers (mean age: 24.7 years, SD: 0.88 years) with similar educational levels participated in the present study. All participants provided signed informed consent documents prior to their enrollment. This study and its related methods were approved by the University Hospital Research Ethics Committee of Lausanne (#PB_2016-02008) and were performed in accordance with the ethical standards and good practices guidelines as put forth by the Declaration of Helsinki.
No participant had been diagnosed with a hearing, language, or neurological disorder. In order to avoid the influence of musical training on speech processing, participants with formal musical training were excluded . All participants were strongly right-handed (>72% laterality) according to the Edinburgh Handedness Inventory . Prior to inclusion, participants underwent clinical examination consisting of an otoscopy, otoacoustic emission test, and audiometric testing to ensure typical hearing thresholds from 125 to 8000 Hz on both sides and to determine pure-tone averages (PTAs) for each participant at 500, 1000, 2000 and 4000 Hz (mean=8.8; SD=3.3; max=15 dB HL).
Stimulation setting and stimuli characteristics
Setting: Stimuli were sent using a SoundBlaster Audigy® X-FI 5.1 Surround Sound Card and delivered to the insert earphones. To avoid any time delay between the recorded brainstem and cortical signals, the soundcard was connected to a trigger that delivered a transistor-transistor logic (TTL) pulse to the EEG recording system. For all EEG recordings, participants sat comfortably in an electromagnetically shielded, soundproofed room while watching a subtitled movie without a soundtrack. In order to avoid any attention-induced modification of neural activation in the auditory cortex, participants were instructed not to focus on the sound.
We recorded ABRs to clicks and speech stimulus tokens at five intensities to better match settings commonly used in clinical practice and cortical responses at three intensities. To avoid any stimulus artifacts, ER-3A Insert Earphones (Etymotic Research, Elk Grove Village, IL, USA) were used. Auditory brainstem and cortical potentials were scalp-recorded separately in response to both click and speech stimuli. To emulate realistic conditions and obtain larger and more robust responses, binaural stimulation was used [41,44,46].
Click stimulation: Similar to conditions routinely used in clinics, clicks of 200 μs with a repetition rate of 20/s were presented in alternate polarity . These clicks were delivered binaurally through the insert earphones along a seven-step intensity continuum from 60 dB SPL to 0 dB SPL at the subcortical level and from 60 dB SPL to 30 dB SPL for cortical responses (according to the hearing threshold (dB HL) previously identified for each participant). A total of 2000 epochs were presented to the listeners for subcortical responses. Cortical responses were elicited from 300 epochs (alternate polarity) of the same click with a random jitter of 200-300 ms (to avoid α-band entrainment) and an average inter-stimulus interval (ISI) of 750 ms.
Speech stimulus: Given evidence illustrating the importance of using natural sound, we used a 202 ms natural consonant-vowel (CV) /ba/ syllable (/b/=110 ms; F0: 200 Hz; F1: 750 Hz; F2: 1500 Hz) for both subcortical and cortical recordings. The CV syllable /ba/ was chosen based on both clinical evidence that adult phoneme perception depends on the subject’s native language [49,50] and its common use in the French literature regarding speech ABR [38,51]. Moreover, in line with a possible future application in infants, the choice of a natural voice quite similar to a mother’s seemed highly relevant. The /ba/ syllable was presented binaurally through ER-3A Insert Earphones (Etymotic Research, Elk Grove Village, IL, USA) for 2000 epochs (alternate polarity, to enable canceling of the cochlear microphonic) at a rate of 3.1/s with an ISI of 75 ms for subcortical stimulation and with an average ISI of 750 ms for cortical stimulation (jitter 200 ms). Stimulation intensities for both subcortical and cortical responses ranged from 30 to 60 dB SPL in 10 dB steps. The test presentation order was counterbalanced across intensities and stimuli.
EEG Recording and Preprocessing
Recording: EEG data were recorded from 32 channels using an actiCHamp EEG recording system with actiCAP active electrodes (Brain Products GmbH, Germany) with electrode impedance kept below 25 kΩ (thereby preventing overly noisy recordings). EEG signals were referenced against Fz, amplified by an actiCHamp amplifier (Brain Products GmbH, Germany), sampled at 10 kHz and stored for offline analysis. In order to optimize recording length, brainstem and cortical-evoked responses were collected separately.
Subcortical potentials: Click- and speech-evoked potentials were obtained by averaging EEG epochs from -25 to 25 ms (click) and -25 to 300 ms (/ba/) post-stimulus onset. Traces were filtered between 80 and 2000 Hz (Butterworth filter, notch filter 50 Hz). Epochs with amplitude deviations greater than ± 80 μV in any channel were considered artifacts and thus rejected. The traces were analyzed using an average reference and a classical mastoid reference (mean mastoids). Each run and recording included the responses to 2000 clicks (alternate polarity; 40 dB SPL: 3801 epochs (3148-3976), 50 dB SPL: 3800 (3465-3959), 60 dB SPL: 3894 (3696-3995)) or 2000 /ba/ (40 dB SPL: 3706 (3438-3881), 50 dB SPL: 3636 (3292-3911), 60 dB SPL: 3719 (3495-3928)). Validity was statistically assessed: 6 × 1 one-way ANOVA F (5, 42)=1.81; p=0.13.
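The amplitude criterion and averaging step described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names `reject_artifacts` and `average_epochs` are hypothetical, and epochs are assumed to be nested lists indexed as `[epoch][channel][sample]` with values in μV.

```python
def reject_artifacts(epochs, threshold_uv=80.0):
    """Keep only epochs whose absolute amplitude stays within
    threshold_uv on every channel (assumed layout: [epoch][channel][sample])."""
    kept = []
    for epoch in epochs:
        # Largest absolute deviation over all channels and samples of this epoch
        peak = max(abs(value) for channel in epoch for value in channel)
        if peak <= threshold_uv:
            kept.append(epoch)
    return kept


def average_epochs(epochs):
    """Average the accepted epochs sample by sample, channel by channel."""
    n_epochs = len(epochs)
    n_channels = len(epochs[0])
    n_samples = len(epochs[0][0])
    return [
        [sum(e[c][s] for e in epochs) / n_epochs for s in range(n_samples)]
        for c in range(n_channels)
    ]
```

With alternate-polarity stimulation, averaging the accepted epochs of both polarities together is what allows polarity-following components to cancel; the sketch does not model that separately.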
Brain potentials acquisition and pre-processing: Event-related potentials were obtained from 32 active electrodes (impedances <25 kΩ, Fz reference, 0.1-40 Hz bandpass filter, notch filter 50 Hz, 1000 Hz sampling rate). For auditory ERP calculation, EEG epochs were time-locked to the presentation of the sound and spanned 100 ms pre-stimulus and 500 ms post-stimulus. Epochs with amplitude deviation greater than ± 80 μV at any channel were considered artifacts and were rejected. Data from ‘bad’ channels were interpolated using 3D splines. Prior to grand-averaging, data were re-calculated to an average reference and a baseline correction was applied using the 100 ms pre-stimulus period. For each participant, eight auditory ERPs were calculated following the two test conditions (/ba/ and clicks). The number of accepted sweeps per condition was (mean, range) /ba/ 40 dB SPL: 472 epochs (421-556), /ba/ 50 dB SPL: 463 (339-531), /ba/ 60 dB SPL: 483 (299-567) and click 40 dB SPL: 478 (412-550), click 50 dB SPL: 471 (354-530), click 60 dB SPL: 472 (286-570). Validity was statistically assessed: 6 × 1 one-way ANOVA F (5, 42)=0.08; p=0.99.
Brainstem evoked potentials: Experienced observers identified waves I-III and V for the click-evoked responses and waves V-A (peaks of the OR complex) for the speech ABRs of each subject and intensity. Observers were blinded to the test conditions. To ensure correct peak identification, OR and FFR latencies were also measured using dynamic time warping because the standard cross-correlation technique usually used at 60 dB SPL did not provide reliable results at 40 dB SPL. Wave V of the click ABR was measured; for the speech ABR, wave V latency, wave A latency, and the V-A interpeak slope, duration and amplitude (voltage difference) were measured. Source generators as well as intensity effects were evaluated as described in the following section.
General overview: ERP analyses were performed using Cartool freeware (http://sites.google.com/site/fbmlab/cartool/cartooldownload), Python-based LINEViewer and STEN utilities (http://unil.ch/line/home/menuinst/about-the-line/software-analysistools.html). Effects were identified with an analysis procedure referred to as electrical neuroimaging [54,55], which allows for direct assessment of reference-independent global measures of the electric field at the scalp as well as distributed source estimations. Using these reference-independent global measure analyses, we were able to differentiate effects due to modulations in the strength of responses of statistically indistinguishable brain generators from alterations in the configuration of the active generators (inferred from the topography of the electric field at the scalp).
Global field power measures: Brain microstate variations have been proposed to reflect rapid switching between neural networks. These variations are reflected in the brain’s electric field configuration and can be ascertained by calculating the global field power (GFP) [58,59]. GFP acts as a reference-independent descriptor of the potential field, allowing for determination of component latency and topographical changes across all EEG electrodes as a function of time. The ERP components were extracted using peaks of GFP. Statistical analyses were performed on GFP values at each time point. The statistical design was a repeated measures 2 × 3 ANOVA using the within-subject factors sound (click, /ba/) and intensity (40, 50 and 60 dB SPL). Temporal auto-correlation at GFP levels was corrected by applying a temporal criterion of 150 contiguous data points (15 ms at 10 kHz sampling) for the persistence of differential effects.
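For readers unfamiliar with the measure, GFP at a given time point is commonly defined as the standard deviation of the potentials across all electrodes. The sketch below illustrates that standard definition; it is not the Cartool implementation, and the function name `global_field_power` and the `[channel][sample]` data layout are assumptions made for illustration.

```python
import math


def global_field_power(data):
    """Return one GFP value per sample for data laid out as [channel][sample].

    GFP(t) = sqrt( mean_i (u_i(t) - mean_u(t))^2 ), i.e. the spatial
    standard deviation of the potential field at time t.
    """
    n_channels = len(data)
    n_samples = len(data[0])
    gfp = []
    for t in range(n_samples):
        column = [data[c][t] for c in range(n_channels)]
        mean = sum(column) / n_channels
        gfp.append(math.sqrt(sum((v - mean) ** 2 for v in column) / n_channels))
    return gfp
```

Because the spatial mean is subtracted at every time point, adding the same constant to all electrodes (a change of reference) leaves the result unchanged, which is the sense in which the measure is reference-independent.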
Topography consistency testing: The topographical change occurring in subsequent potential field distributions was analyzed using global dissimilarity (DISS). The DISS is directly related to the spatial correlation coefficient and provides a measure of topographic instability between two electric fields. DISS values at each time point were compared with RAGU software using a nonparametric repeated measures 2 × 3 ANOVA with the within-subject factors sound (click, /ba/) and intensity (40, 50 and 60 dB SPL). In addition to this time-based measure, a topographic consistency test (TCT) was conducted across six conditions for each time point. Based on this analysis, we ascertained the consistency of the observed effect across subjects and for each condition. To account for temporal auto-correlation, only effects (p<0.05) persisting for at least 150 time frames (>15 ms at 10 kHz sampling) were considered reliable.
Source estimation: To estimate generator sources involved in click and /ba/ processing, we conducted source estimation analyses at both subcortical and cortical (ERP) levels. The LAURA algorithm was used to estimate the neural sources of the electric signal recorded at the 32 head-surface active sensors (31 recording channels and one reference electrode) by using an inverse solution matrix consisting of 5104 nodes equally distributed within the grey matter of the Montreal Neurological Institute (MNI) average brain and generated with the Spherical Model with Anatomical Constraints (SMAC). For each subject, 4000 epochs were randomly chosen and processed to illustrate baseline activity. The activity of each node is provided in μA/mm2 with a spatial accuracy of 6 × 6 × 6 mm [65,66]. Only activities above the 95th percentile were used for source estimation identification. The time periods used for source estimations were determined for each subject as follows: the beginning and the end of the OR component, and the beginning and the end of the FFR; to ensure the entire FFR was fit, a Fast Fourier Transform was used to localize the FFR F0. Source estimations were then performed over a window from -1 ms to +1 ms around wave V, as well as over the P50 and P300 periods extracted from the GFP grand mean levels.
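The spectral step used to localize the FFR F0 can be illustrated with a short sketch. It evaluates a direct discrete Fourier transform bin by bin rather than calling an FFT library, which yields the same spectrum on the bins it visits. The function name `dominant_frequency`, the search band and the synthetic signal in the usage note are assumptions for illustration, not parameters reported in this study.

```python
import cmath
import math


def dominant_frequency(signal, fs, f_min, f_max):
    """Return the DFT bin frequency (Hz) with the largest magnitude
    between f_min and f_max for a signal sampled at fs Hz."""
    n = len(signal)
    best_freq, best_mag = None, -1.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if freq < f_min or freq > f_max:
            continue
        # Direct DFT coefficient for bin k
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal)
        )
        if abs(coeff) > best_mag:
            best_freq, best_mag = freq, abs(coeff)
    return best_freq
```

For example, a 50 ms synthetic segment sampled at 10 kHz that contains a 200 Hz component and a weaker 400 Hz harmonic has a frequency resolution of 20 Hz, and the search between 80 and 1000 Hz returns the 200 Hz bin.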
Phase-locking activity and reproducibility of speech ABR versus click-evoked ABR
Click-elicited ABR revealed a well-defined and reproducible wave V down to 0 dB SPL for all subjects. The OR and FFR components of the speech ABR were identified in the grand average of the individual neural responses and were found to be very well-defined when recorded at stimulation intensities from 60 dB SPL down to 40 dB SPL (Figure 1A). However, the FFR peak synchronization to the periodic part of the /ba/ was less reproducible at 30 dB SPL (peak amplitude not above the pre-stimulus amplitude) and unrecognizable at lower intensities. This observation is consistent with previous reports documenting the brainstem response elicited by speech. In contrast, speech ABR components were present and reproducible across participants from 60 to 40 dB SPL, with spectral features of the grand average phase locked up to the second harmonic, in accordance with phase-locking activity in brainstem nuclei (lower brainstem), inferior colliculus and medial geniculate body. Neurophysiological spectral information regarding encoding was clearly identified at high intensities (50 and 60 dB SPL) but partially blurred at 40 dB SPL (Figure 1B). The FFR mimicked the temporal features of the /ba/ stimulus with a 15.9 ms (± 1.3) interval at 60 dB SPL that decreased as intensity increased (16.6 ± 1.5 ms at 50 dB SPL; 17.5 ± 1.9 ms at 40 dB SPL; Figure 2A). OR and click wave V latencies shortened as stimulation intensity increased (Figures 1A-1C, 2B and 2C). At intensities lower than 40 dB SPL, the OR and FFR components were not clearly identified and were poorly reproducible. Unlike wave V, which was evidenced down to 0 dB SPL, consistent with behavioral hearing testing data, the OR and FFR require intensities of at least 40 dB SPL. Given that brainstem responses elicited by a 30 dB SPL /ba/ stimulus were not reproducible across participants, only components recorded at 40, 50 and 60 dB SPL were further evaluated.
Figure 1: Single subject onset response (OR) and Frequency Following Response (FFR) components, in response to the /ba/ stimulus (upper row), were clearly identified on temporal representation from 40 to 60 dBSL (A). Corresponding spectrograms (B) show elicited activity in the F0 bandwidth. Click responses of the same subject at the 3 corresponding intensities are shown in C.
Figure 2: Effect of intensity on recorded latencies of OR and FFR components of the speech-elicited ABR and of wave V of the click-evoked ABR. Bar plot representations of the stimulation intensity effect on FFR (A, SD=1.92 at 40 dBSL, SD=1.5 at 50 dBSL and SD=1.30 at 60 dBSL), OR (B, SD=0.77 at 40 dBSL, SD=0.87 at 50 dBSL, SD=0.61 at 60 dBSL) and wave V latencies (C, SD=0.82 at 40 dBSL, SD=0.74 at 50 dBSL and SD=0.68 at 60 dBSL). Pairwise t-test values are provided. Linear regression revealed direct linkage between wave V latency and OR latency evolution patterns (R=0.414; p=0.044). Also shown is the 95% confidence interval of the regression.
Subcortical linear relation between waves V and OR as a function of intensity
Speech ABR and click ABR component latencies varied with intensity. The latency of the FFR start tended to shorten as stimulation intensity increased (mastoid referenced RMANOVA F (2, 14)=6.84; p<0.01; average referenced RMANOVA F (2, 14)=6.45; p=0.01), with a significant decrease in latency from 50 to 60 dB SPL (mastoid referenced p=0.007, Figure 2A; average referenced p=0.01). Results for the FFR start were similar whether mastoid or average referenced, with a significant latency difference between 60 and 40 dB SPL (average referenced p=0.02) but not between 40 and 50 dB SPL (average referenced p=0.41). A significant effect of stimulation intensity on both onset latency and wave V latency was found overall and between each of the three intensity conditions (onset: mastoid referenced F (2, 14)=30.09, p<0.01, all pairwise p<0.02; average referenced F (2, 14)=15.92, p<0.01, all pairwise p<0.03; wave V: F (2, 14)=51.27, p<0.01) (Figures 2B and 2C). The onset duration (waves V-A of the speech ABR) and slope did not vary as a function of stimulation intensity, but its amplitude showed a tendency to decrease with decreasing intensity (p=0.08). Although FFR component amplitudes tended to decrease as the intensity decreased (Figure 1A), this relationship was not statistically significant. As shown in Figure 2D, latencies of OR and wave V across intensities were correlated (R (23)=0.414; p=0.044).
Brainstem source location analysis of the neurophysiological mechanisms involved in click and speech processing
Spatiotemporal mapping of brainstem auditory responses: Brainstem click and speech ABR were characterized at 40, 50 and 60 dB SPL. Three time periods of activity were distinguished, corresponding to the time range of wave V of the click-evoked response and of the speech-evoked FFR and OR components. Despite some inter-subject variability, source localization methods revealed a common spatiotemporal pattern of activity involving the upper brainstem (midbrain) in the inferior colliculus area (Figure 3). At 60 dB SPL, during the time frame of wave V, we observed a progressive activation of generator sources in the dorsal part of the upper brainstem in the inferior colliculus, with a strength of brain activity similar to that of the onset response. However, the strength of the activity related to the subcortical source generators involved in FFR processing was 50-fold lower than that of the OR and wave V. Even though the source generators of the OR were mostly identified in the dorsal upper brainstem, some activity was revealed in the ventral part of the midbrain (Figure 3, upper panel). Activity related to the FFR was predominantly localized to the caudal part of the upper brainstem. Although the inverse solution did not differentiate the location of the OR and FFR in the dorsal part of the upper brainstem, sources involved in FFR processing exhibited lower subcortical activity intensity compared to the OR (Figure 3).
Figure 3: Identification of generator sources performed using a distributed linear inverse solution (ELECTRA) applying the local autoregressive average (LAURA) regularization approach to address the non-uniqueness of the inverse problem. The mean of the individual baseline-corrected LAURA source images is presented, with generators of the OR (black arrow; upper panel), FFR (black arrow; middle panel) and wave V (black arrow; lower panel) shown in sagittal (left) and axial (right) views. All views depict mean average activity in response to a 60 dBSL stimulus (either click or /ba/) while subjects were watching a silent movie (inducing the occipital activation seen on the different views).
Brainstem source generators involved in OR and FFR processing show different sensitivity to stimulus level: Source estimation analysis one-way repeated measures ANOVA (condition=three stimulation intensities) revealed a main effect of intensity related to the onset processing in the hypothalamus (Figure 4A). FFR processing was more sensitive to stimulus intensity in the thalamic area, predominantly in the right thalamus (Figure 4B). Post hoc analyses of the effect per intensity revealed distinct activity patterns between OR and FFR processing (Figures 4C and 4D). While brainstem sources involved in OR processing showed sensitivity to stimulus intensity variation between 50 and 60 dB SPL, the activity related to the FFR generator sources varied greatly at lower intensities, between 40 and 50 dB SPL (Figure 4E).
Figure 4: Source estimation analysis: one-way repeated measures ANOVA (three stimulation intensities) revealed a main effect of intensity from 40-60 dBSL on the time period related to the OR (A) and to the FFR processing (B). Panels C and D display the mean (SD) scalar values within the cluster and across subjects and stimulus intensities for the OR (mean=2.83 × 10⁻⁵ ± 1.39 (40 dBSL), mean=2.21 × 10⁻⁵ ± 1.07 (50 dBSL), mean=4.17 × 10⁻⁵ ± 1.61 (60 dBSL)) and FFR (mean=3.36 × 10⁻⁷ ± 2.39 (40 dBSL), mean=5.94 × 10⁻⁷ ± 3.47 (50 dBSL), mean=5.77 × 10⁻⁷ ± 2.71 (60 dBSL)).
Cortical click-elicited response versus speech-evoked potential: Suggested patterns of processing
Cortical response peaks P50, N100, N200, P200, P300 and N400 for each stimulus and intensity (/ba/, click at 40, 50, and 60 dB SPL) are presented in Figure 5A. The 3 × 2 time-wise RMANOVA (p<0.05, >15 ms) at GFP level showed a main effect of intensity over 57-82.1 ms and 122.5-146.8 ms (Figure 5B) and a main effect of sound over 27.6-57.5 ms (P50 component period) and 179.9-500 ms (P300-N400 component period) post-stimulus (Figure 5C). Visual inspection of the significant periods for the main effect of intensity revealed a difference between 40 dB SPL and the other intensities only for the first period (Figure 5B). The second period showed an association between intensity and GFP values (60>50>40 dB SPL). Visual inspection of the relevant periods for the main effect of sound (Figure 5C) showed larger GFP for clicks than for /ba/ during the P50 component period, whereas the P300-N400 component period showed larger GFP for /ba/ than for clicks.
Figure 5: Grand mean Cz waveforms recorded with a mastoid reference (A) across all three conditions in response to the click stimulus (left panels) and /ba/ (right panels). (B) Mean GFP cortical activity in response to speech at 3 stimulation intensities. Significant effect of intensity is seen over two time periods: 57-82.1 ms and 122.5-146.8 ms (green bands). (C) Mean GFP cortical activity in response to speech and click. Significant effect of stimulus type is seen over two time periods: 27.6-57.5 ms (P50 component period) and 179.9-500 ms (P300-N400 component period). GFP activity in response to the click stimulus is higher between 50 and 80 ms whereas GFP activity in response to /ba/ is higher after 205 ms.
The 3 × 2 time course analysis of variance (TANOVA) provided similar results regarding the main effect of sound related to GFP. In addition, the consistency of statistical topography maps across subjects and for each experimental condition showed the influence of sound (click and /ba/) on the periods showing a main effect of sound (Figure 6B). Indeed, the pattern of active sources related to the P50 component period was consistent across all intensities for the click conditions. In contrast, the P300-N400 component period was consistent for the /ba/ conditions across all intensities (Figure 6B).
Figure 6: (A) Statistical analysis of topographical differences as a function of time (TANOVA analysis) revealed four time periods of statistically significant topographical differences between click and /ba/ (green bars). (B) Consistency analysis within each stimulus and condition (2 stimuli, 3 stimulation intensities) depicted the reproducibility of the topographical effect across participants as a function of time (significant periods: green bars). Analysis across intensities (lower row) shows a major effect around the P50 component period for the click and around the P300-N400 components period for the /ba/ condition. (C) Source location analysis over P50 and P300 periods (lower right panel) revealed greater activity in the right hemisphere in response to the click and in the left hemisphere to the /ba/.
The GFP and topography main effects of sound (click and /ba/) for the time periods of interest were used to define the source estimation parameters. The consistency analysis showed a clear relationship between the P50 component period and the click condition, while the P300-N400 component period was related to the /ba/ condition. As expected, P50 component source estimation showed higher brain activity located in the right auditory cortex in response to the click (Figure 6C). In contrast, the response to speech (/ba/) displayed maximum activity at P300-N400 located in the left auditory cortex, consistent with the well-known leftward specialization for linguistic function. Results showed the location of the generators involved in click versus /ba/ processing under typical circumstances. Cortical areas activated during /ba/ versus click processing were statistically different (Figures 5C and 6A) and these results were consistent across all participants (Figure 6B).
Neurophysiological encoding at the brainstem level influences cortical processing
Given the aforementioned timeline and topography of speech and click processing in the cortex, a linear regression time-wise analysis was performed to evaluate the relationship between cortical GFP peak activity and brainstem encoding characteristics. There was a significant negative linear relationship between OR latency and GFP (cortical) over the 125.9-149.4 ms period (max at 139.6 ms; p<0.05; >15 ms) and a positive trend around 273.4 ms (Figure 7A). This demonstrated with a high temporal precision that an increase of OR latency induced a decreased activation related to the P100-N100 peak activity (139.6 ms; R (23)=-0.596; p<0.01; Figure 7B). Similarly, P300-N400 peak activity tended to increase as the OR latency increased (273.4 ms; R (23)=0.354; p=0.089; Figure 7C). However, no effect on the OR slope, OR duration, FFR latency, FFR duration or FFR amplitude was found (all p>0.05). We hypothesized that increases in wave V latency would correspond to statistically significant increases in cortical activity related to the P50 peak. All participants showed significant changes in P50 and P100 periods over a time period <10 ms (p<0.05).
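The duration criterion applied to these time-wise tests (p<0.05 persisting beyond a minimum number of contiguous samples) can be sketched as follows. This is an illustrative sketch only: the function name `significant_periods` is hypothetical, and the default of 150 samples assumes the 15 ms at 10 kHz setting quoted in the methods.

```python
def significant_periods(p_values, alpha=0.05, min_samples=150):
    """Return (start, end) sample index pairs, end exclusive, for every
    run of contiguous p-values below alpha lasting at least min_samples."""
    periods = []
    start = None
    for i, p in enumerate(p_values):
        if p < alpha:
            if start is None:
                start = i  # a new run of significant samples begins
        else:
            if start is not None and i - start >= min_samples:
                periods.append((start, i))
            start = None
    # Close a run that extends to the end of the time series
    if start is not None and len(p_values) - start >= min_samples:
        periods.append((start, len(p_values)))
    return periods
```

For data sampled at a different rate, the same 15 ms criterion corresponds to a different sample count, so `min_samples` would need to be set accordingly (for instance 15 samples at 1000 Hz).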
Figure 7: Estimation of the relationship between onset latency and cortical GFP in response to the /ba/ stimulus. (A) Pearson correlation between GFP at each time point and significance thresholds (dotted line). The latency on the OR showed one significant time period (>10 ms duration) between 125.9-149.4 ms and a trend around 273.4 ms. The linear regression analysis revealed a relationship between the latency of the OR at the subcortical level and the GFP activity (cortical) at 139.6 ms (R(23)=-0.596; p<0.01) (B), as well as at 273.4 ms (R(23)=0.354; p=0.089) (C).
In this study, we investigated subcortical and cortical encoding of click and speech ABR at different stimulation intensities in healthy-hearing young adults. Findings suggested (1) a linear relationship between wave V and OR processing at the brainstem level that confirms shared mechanisms between the two components but with distinctive additional processing for the /ba/ stimulus, (2) location of the source generator of wave V, OR and FFR in the upper brainstem, (3) a robust effect of intensity in the thalamus and the upper brainstem for FFR and OR processing respectively, thereby providing direct evidence for differential processing of ORs and FFRs at the brainstem level, and (4) a positive temporal relationship between OR latencies at the brainstem level and cortical processing.
Wave V relationship to OR processing in the brainstem and comparison to previous studies of sound level on speech ABR
The present data regarding wave V and OR patterns suggest that while they share some underlying processing mechanisms (statistical correlation (Figure 2D) and a similar range of latencies), there is also a separate component distinctive to the processing of the /ba/.
Since the early report by Picton et al. that described the OR by analogy to click ABR as the result of inferior colliculus cell activity, several reports have suggested similar generators between wave V and OR. However, the latencies of wave V and OR differ. The click ABR wave V typically occurs at 5.47 ms from stimulus onset for adults, while mean OR latencies vary from 5 to 10 ms. The existence of a distinctive mechanism is corroborated by animal studies [70,71] and modeling studies in well-hearing adult humans. Similarly, previous works in learning-impaired children revealed typical click-evoked latencies while speech ABR latencies were disrupted, suggesting this process can be compromised in children with delayed speech ABRs. Another facet is related to the structure of the stimulus itself: a speech stimulus such as the /ba/ syllable is a longer, more complex stimulus than a click. Moreover, stop consonant identification relies on multiple acoustic cues conveying, for instance, voicing and manner of articulation (both primarily conveyed in the temporal domain with possible contributions from spectral cues) and factors related to the place of articulation (encoded in the spectral domain [73-75]), but with less high-frequency information than a click stimulus. A syllable evokes an OR (waves V-A) that is a transient complex component including responses to the onset of the sound, the onset of vocal cord vibration, and the offset of the sound. This complexity was highlighted in this study by the lack of reproducibility of the OR below 40 dB SPL, whereas wave V of the click ABR was reproducible down to 0 dB SPL. In addition to revealing that high stimulation intensity is mandatory for precise analysis of speech ABR components, this also corroborates the concept that understanding language and speech requires higher intensities than the perception of sounds (clicks). An additional argument is made by Song et al.: the encoding of click and speech auditory stimuli requires recruitment of different neural populations. The click-evoked brainstem potentials attest to the integrity of the cochlea and ascending pathway, while the speech ABR provides insight into the quality of the neural processing of complex sounds.
Source generators of the FFR, OR and wave V are located in the brainstem
The present data reveal the brainstem locations of the neural mechanisms responsible for different acoustic aspects of speech sound processing. Although it is generally accepted that the inferior colliculus (IC) houses click ABR wave V generators, the location of the FFR generators remains debatable. Electrophysiological recordings of the latencies and spectral components of the FFR provide indirect evidence of a generator located in the brainstem [37,38,70,76-78]. Animal studies focusing on the discharge rate in the different nuclei strengthened the hypothesis of IC involvement. A study using MEG suggested a contribution of the auditory cortex to the FFR, which would then consist of cortical and subcortical components. Other works support the concept of subcortical and cortical FFR generators [80,81]. However, evidence has also been reported that EEG activity emerges from upper brainstem generators for the OR and FFR. In Bidelman’s report, the location, strength, and orientation of the generator source involved in speech-evoked FFRs were estimated using only a single pair of dipoles and thus may not have been able to dissociate multiple sources. Of note, it was previously suggested that dipole sources and source analysis should be further validated using co-recordings of MRI and functional EEG. The present data corroborate this hypothesis by suggesting that the generator sources of both OR and FFR are located bilaterally in the caudal part of the upper brainstem. Our source imaging results provide further evidence for human FFR sources arising bilaterally from the IC.
However, there are some limitations to our study. We used a passive listening set-up with a unique stimulus frequency to match the conditions relevant to clinical applications. However, the present stimulation intensity is above the F0 intensity generally used in the English literature [83,84]. Moreover, a 200 Hz stimulation is known to be above cortical neurons’ phase-locking [85,86]. Together, these data and our source imaging results corroborate the idea of a stronger subcortical response to stimulus frequencies over 100 Hz. As previously suggested, the relative involvement of the subcortical and cortical FFR sources may vary with stimulus frequency. As a practical implication for the clinical investigation of hearing impairment location, the FFR should be conceptualized as a spin-off from different generators whose involvement varies according to the stimulus characteristics [39,87,88]. Therefore, a 200 Hz F0 speech stimulus could be considered as a way to better separate subcortical from cortical FFR.
OR and FFR processing involve different structures of the auditory system
Consistent with previous reports [13,38,44], OR and FFR latencies increased with decreasing intensities, suggesting a common mechanism between OR and FFR processing. Although OR and FFR are thought to be processed in the same brainstem nuclei, the existence of distinct underlying mechanisms has been previously suggested [34,38,89-92]. We report for the first time results that suggest different sensitivities to intensity between OR and FFR at the brainstem level, with the OR generally requiring higher levels to elicit sufficient neural synchrony for its generation. Furthermore, OR and FFR exhibited direct relationships to two distinct areas (hypothalamus and thalamus respectively). Recent data showing the FFR to be different from the succession of wave V, as well as our present data, provide evidence that the onset and FFR components rely on different mechanisms at distinct portions of the auditory system, a finding directly visualized in the current study.
The thalamocortical network is known to be engaged in different brain functions including language, music and cognition. Among its numerous roles, this network relays peripheral sensory signals to the primary sensory cortex and carries information related to tone and rhythm through separate projection channels. Involvement of the medial geniculate body (MGB) was previously suggested by fMRI studies revealing sound-related activation in the cochlear nucleus, superior olivary complex, inferior colliculus and MGB [95,96]. FFR integration of sound intensity in the MGB suggests preprocessing before transmitting information through projections to the auditory cortex for cortical emotional and cognitive appraisal [97,98]. In line with this observation, the MGB may be a part of the plasticity of subcortical encoding. The MGB has been implicated in the analysis of auditory communicative signals as well as the processing of communicative signals loaded with emotions [100,101]. A proposed model for auditory communication promotes the role of the inferior colliculus in decoding the spectral and temporal features of the signal, while the MGB is involved in the analysis of affect, highlighting the subcortical contribution as one of pre-processing before thorough cortical processing.
The mammillary bodies play a role in recognition memory (together with the anterior and dorsomedial thalamic nuclei) as well as in spatial memory and learning, and the hypothalamus acts as a control center for the autonomic nervous system.
Temporality of click and speech CAEP: comparison with previous observations
Clicks are predominantly encoded in the right hemisphere while speech is left lateralized. The present data are congruent with the concept of a left hemispheric lateralization for vowel processing [102-104]. Our data support previous work suggesting there is a strong predisposition for speech sounds to be processed by the left hemisphere and non-speech signals by the right auditory cortex of the temporal lobe. Subsequent mapping of phonology is more left lateralized. Cortical GFP activity for clicks depends mainly on the P50 period, while a speech stimulus influences the P300-N400 GFP-related activity. In the cortical area, we found greater activity strength in the P50 time frame for the click and at P300-N400 for the /ba/.
Subcortical encoding and cortical processing: disclosure of a direct temporal relationship
In the present study, we found a direct linkage between OR latency at the subcortical level and cortical GFP activity at two time points related to N100 and P300. Data gathered from MEG studies of evoked activity show possible phonological processing starting as early as 100 ms after sound onset. However, cortical semantic and lexical processing begins between 200 and 300 ms after sound onset. Therefore, we speculate that an increase in OR latencies at the brainstem level induces a decrease in P100 GFP cortical activity while it increases the cognitive effort of semantic processing (P300). Previous reports suggest a relationship between OR latency at the subcortical level and higher incidence of language processing disorders, highlighting the influence of poor neural encoding at the brainstem level on higher cortical abilities. Other studies strengthen the hypothesis of a subcortical effect on cortical speech processing, for instance, reports related to hearing in adverse listening conditions [108,109] or related to optimized subcortical encoding in musicians [110,111]. The present data strengthen previous reports [110,111] by providing a direct temporal relationship between encoding by subcortical structures and cortical activity.
Comments on the methodology and importance of present findings
The 32-channel EEG system used in the present study provided a straightforward, rapid and non-invasive sensor application for subcortical and cortical auditory potential analyses, but it does not permit differentiation between one and multiple generators in the brainstem. Auditory scalp-recorded potentials reflect the engagement of multiple subcortical and cortical networks overlapping in both space and time. As such, it is difficult to ascribe intensity-related changes to a single neural generator. Different methods are based on varying assumptions related to the geometrical, anatomical, or electromagnetic properties of the brain that constrain the inverse problem. Nevertheless, processing techniques developed and previously used in our laboratory [42,112] have established their reliability and reproducibility. Aside from stimulus characteristics, the underlying principles of EEG and MEG may explain some differences observed in generator contribution between studies [39,83]. EEG fundamentals provide a better ability to establish accurate localization of neural generators compared to MEG [113,114]. Therefore, the EEG technique allows for direct visualization of gross changes in brainstem activation and functional involvement of the upper brainstem in the millisecond range, while the MEG technique may jeopardize deep source signal extraction and interpretation and thereby underestimate their contribution to FFR generation. Although MEG and EEG’s distinct properties render the two modalities complementary in many respects [115,116], the operating costs of the MEG sensing technology still limit its implementation in clinical practice.
In the present report, we demonstrate that EEG is a reliable, affordable, practical, and straightforward modality applicable toward assessing the quality of speech encoding and identifying the neural generators that contribute to the scalp-recorded measures.
This exploratory study provides further information regarding the link between subcortical and cortical auditory circuitry. In addition, it showed the feasibility of a direct, noninvasive assessment of the location of subcortical generators involved in the processing of OR and FFR components. The approach described herein has great potential for enabling direct qualitative and topographical evaluation of auditory deficits and their mechanisms toward providing patients with optimized diagnoses and care strategies.
This work was supported by an FBM grant (Faculté de Biologie et de Médecine de Lausanne #29747).
We thank all the participants who volunteered their time. We are very grateful to Pr Clarke and Pr Murray for their help. Additionally, the authors thank Karen I. Berliner, Ph.D. for manuscript editing.