"Seeing Blue": The S-cone Pathway in the Mammalian Retina
ABSTRACT. The highly conserved ability of animals to detect short-wavelength signals (blue or ultraviolet light) plays a crucial role in various biological functions and ecological interactions. This detection is critically dependent on the precise synaptic connections between short-wavelength sensitive cone photoreceptors (S-cones) and S-cone bipolar cells (SCBCs), which create a distinct spectral channel. Utilizing a cone-dominant animal model, the 13-lined ground squirrel, we investigated the structures and functions of the S-cone pathway in the mammalian retina. Additionally, we employed a unique deep RNA sequencing technique to probe the molecular underpinnings of the S-cone specific synaptic wiring.
ABSTRACT. Neural processing of facial texture information is crucial for estimating health. We investigated neuronal activities in the macaque inferior temporal cortex (ITC), an area important for facial recognition and the processing of texture information, such as gloss.
Visual stimuli were eighty colored images, including monkey and human faces with varying expressions and identities, altered for gloss and style, and simple shapes filled with pixel-randomized monkey or human images. These stimuli were presented to two macaque monkeys (Macaca fuscata) performing a fixation task. The activities of neuronal populations were recorded using ITC-implanted electrodes. For comparison, the visual stimuli were input into four ImageNet-pretrained CNNs (AlexNet, VGG16, GoogLeNet, ResNet).
The mean firing rate within a 50–249 ms post-stimulus window was calculated for each stimulus and neuron, and centroid distances of facial expressions, identity, and texture were examined in population vectors of the firing rates. Centroid distances of facial expressions, identity, and texture were also assessed using embeddings from the fully connected layers of the CNNs.
The average normalized centroid distances for facial texture in the ITC (monkey faces 0.315 ± 0.04, human faces 0.405 ± 0.018) were smaller than those in the CNNs (monkey 0.477 ± 0.017, human 0.567 ± 0.018), indicating greater separation of facial textures in CNN embeddings. The ITC showed more distinct separation between monkey and human facial expressions and identities than the CNNs. These results suggest that ImageNet-trained CNNs emphasize texture more than the ITC does in their representation space.
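A minimal sketch of the centroid-distance measure described above, applicable to either population firing-rate vectors or CNN embeddings; the array names and the normalization scheme (dividing by the mean pairwise distance between individual stimuli) are illustrative assumptions rather than the authors' exact procedure:

```python
import numpy as np

def normalized_centroid_distance(vectors, labels):
    """vectors: (n_stimuli, n_dims) firing-rate population vectors or CNN embeddings.
    labels: (n_stimuli,) category label per stimulus (e.g., texture class).
    Returns the mean pairwise distance between category centroids, normalized
    by the mean pairwise distance between individual stimuli."""
    vectors, labels = np.asarray(vectors, dtype=float), np.asarray(labels)
    centroids = np.array([vectors[labels == c].mean(axis=0) for c in np.unique(labels)])

    def mean_pairwise(x):
        d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        return d[np.triu_indices(len(x), k=1)].mean()

    return mean_pairwise(centroids) / mean_pairwise(vectors)
```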
ABSTRACT. Visual processing of faces during natural viewing typically involves making a saccadic eye movement towards a face which is first viewed, and selected, using extra-foveal vision. Previous studies have shown that this extra-foveal preview of a face can be used to make perceptual judgments about that face, after it is fixated, more accurate and efficient. However, the mechanisms underlying trans-saccadic face perception remain a matter of debate. In this study, participants performed a gaze-contingent task in which an extrafoveal face stimulus (the preview) was replaced with a tilted face (the target) during a saccade directed to it. On separate trials, the preview face was either identical to the target (valid preview) or different (invalid preview). We ran two separate experiments, with invalid previews consisting of either inverted faces or faces with a different identity. Additionally, on some trials we presented foveal or peripheral noise patches, based on previous studies suggesting that around the time of saccades visual processing combines information from the future retinal location of the stimulus, or uses foveal resources to process attended peripheral information in more detail. The results showed a significant preview effect for face inversion, as expected, but not for face identity. The added noise stimuli had no significant main effect or interaction with preview validity, in contrast to the predictions from previous studies finding such interactions. Overall, these findings suggest that saccadic programming and processing of the extrafoveal saccade target create a category-level prediction which underlies the parafoveal preview effects in trans-saccadic perception.
ABSTRACT. This study investigates how cultural background and encoding instructions (trustworthiness vs. distinctiveness judgements) influence own-race bias (ORB) in face recognition memory. We used eye-tracking to unveil the underlying processing strategies in a cross-cultural design. British and Malaysian Chinese participants viewed faces of both Malaysian Chinese and Caucasian origin while performing either trustworthiness or distinctiveness ratings. Following a brief filler task, a recognition memory test assessed memory accuracy for previously encountered faces. Eye-tracking data analysed fixation patterns on facial regions of interest (AOIs) such as the nose and eyes. We predict that trustworthiness encoding, compared to distinctiveness encoding, will lead to reduced ORB as reflected in recognition sensitivity (d'). Furthermore, we anticipate greater holistic processing (fixation count and duration on the nose region relative to the eyes and mouth) for Chinese faces during trustworthiness encoding. By identifying encoding tasks that mitigate ORB (e.g., trustworthiness judgements), this research can inform the development of training programs for improved cross-racial face recognition accuracy. Understanding how to minimise ORB has significant practical implications in our increasingly global and multicultural societies, potentially fostering better social and professional interactions across racial and ethnic lines.
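Recognition sensitivity (d') in such designs is typically computed from hit and false-alarm rates; the sketch below is a generic illustration with a standard log-linear correction, not the authors' scoring script, and the counts shown are hypothetical:

```python
import numpy as np
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    # Log-linear correction keeps z-scores finite when a rate would be 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical counts for one participant, own-race vs. other-race faces:
print(d_prime(22, 8, 6, 24))    # own-race
print(d_prime(17, 13, 11, 19))  # other-race (a lower d' would reflect the ORB)
```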
ABSTRACT. Race often serves as a key indicator influencing how we recognize and remember faces. Perceivers use categorical information when recognising other-race faces, leading to poorer face recognition compared to own-race faces (the Other-Race Effect).
The Categorisation-Individuation Model suggests that situational cues emphasizing the importance of other-race faces promote attention to individuating details, enhancing other-race face recognition. While group motivators like gender, university, and political affiliation have been studied, the impact of national identity as a motivator for other-race face recognition has not been tested. Singapore and Malaysia, neighbouring multiracial countries, provide an ideal context to test this idea, as ethnic Chinese are the majority in Singapore but the minority in Malaysia. Singaporeans, on average, have a stronger national identity, while Chinese Malaysians have a stronger racial identity as minorities.
We investigate how national identity impacts other-race face perception. Ethnic Chinese Malaysians (N=82) and Singaporeans (N=108) read articles depicting inter-country disputes or domestic affairs to influence their national identity. They viewed Singaporean and Malaysian Chinese and Malay faces, followed by a recognition task involving the same faces and foil options. We predicted high national identity would improve recognition of same-country faces, with Malaysian participants also showing better recognition of Malay (majority-race) faces.
As expected, participants in the high national identity condition showed better recognition of same-country faces, supporting the Categorisation-Individuation Model. Interestingly, only Malaysian participants in the high national identity condition showed better recognition of Malay compared to Chinese faces, suggesting that these ethnic minority participants needed additional motivation to individuate the other-race, racial-majority, faces.
These findings reflect the dynamic nature of face perception and demonstrate how national identity can influence the motivation to individuate other-race faces. In increasingly diverse communities, the results have implications for understanding cross-race interactions, and highlight the importance of considering situational factors in face perception research.
ABSTRACT. When we look at a visual scene, salient features, such as a region of high contrast or a recognizable object, attract our attention. Here, we examined how image saliency—modeled on eye-tracking data—influences how we expect the visual attention of others to be directed. We created a 3D scene in which a cone-shaped object (the agent) ‘looks’ at an image on a screen in the scene. The observer’s vantage point was on the other side of the screen, opposite the agent. The screen was semi-transparent so the observer could see the image being displayed on the screen and the agent on the other side. On each trial, an image would appear on the in-scene screen and the agent would turn to ‘look’ at the image. The agent’s movement was controlled by a gaze pattern from a real observer, whose eye movements were recorded while they were looking at the same image as the agent or a different image. Participants (N = 24) judged whether the agent’s ‘gaze’ was matched to the image displayed on the in-scene screen. We find that participants are able to detect a mismatch between the agent’s movements and the displayed image. Discrimination sensitivity was modulated by the overlap between the agent’s gaze and the salient image features: participants struggled to identify a mismatch when the mismatched gaze aligned with the salient features of the displayed image. Further analysis suggests that participant performance was not solely driven by the salient low-level image features, with participants likely using a combination of low- through to high-level image features to determine how consistent the agent’s gaze was with the expected gaze behaviour. Our findings indicate that the same processes that drive an individual’s attention may also contribute to how we perceive other people’s visual attention.
ABSTRACT. Completely locked-in syndrome (CLIS) is a condition in which individuals lose voluntary muscle control, leaving them unable to talk, move the body, show facial expressions, or even blink. In this study, we propose a novel approach for the development of a low-cost, practice-free, and non-invasive communication assistance system for individuals with impaired extremity and ocular mobility, including those with CLIS. Given the loss of traditional communication channels in CLIS patients, pupil responses become essential for effective communication. Leveraging object-based visual attention mechanisms, we investigated the potential of a method for inferring the target image based on shifts of attention to spatial frequencies. Previous studies have shown that images with high spatial frequency (HSF) induce pupil dilation, while images with low spatial frequency (LSF) induce pupil constriction. Our previous investigation also revealed that the pupil change ratio becomes larger upon shifts of attention to different spatial frequencies. Based on this phenomenon and attentional concentration, we attempted to predict the image that the participant is focusing on within a hybrid image. The spatial-frequency components of the images were alternated at varying frequencies to examine the resulting pupil responses, and one or both images contained motion to attract participants' attention. The results showed an amplified magnitude of pupil dilation and constriction when employing dynamic hybrid images rather than static hybrid images. From the pupil change ratio and the tendency of pupil changes, we estimated which image the participants were focusing on and calculated the estimation accuracy. The method achieved an average accuracy of more than 88% regardless of alternation frequency or the dynamic/static condition of the stimulus. These findings show the potential for future advances in assistive technology to improve communication and quality of life for people with severe physical impairments.
ABSTRACT. Nonverbal communication, particularly eye contact, is crucial in human-human interaction. However, bridging the gap between vision science and robotics presents notable challenges in integrating eye contact functionality into robotic behavior generation. This study reviews the effectiveness of the biomimetic eye model and machine learning techniques in robots to mimic human eye-head gaze movements, specifically Vestibulo-Ocular Movement (VOM). VOM is considered primarily for stabilizing visual perception, but we assume it also plays an important role in conveying social cues like attention and interest in face-to-face interactions. Drawing inspiration from the human visual system, we introduce an innovative approach for enhancing the user’s perception of the robot’s direct gaze, departing from conventional methods. In our approach, we positioned the camera on the robot's forehead to synchronize the robot's visual field with its head movements, providing a stable subjective vision for the robot. We then used an attention-retina model to generate eye movements, combining visual attention and the structure of the retina, while coordinating head movements based on eye-head coordination. The resulting eye and head gaze behaviors effectively replicated human gaze movements with VOM, enhancing the user’s perception of the robot's direct gaze, particularly with its coordinated eye and head movements. We demonstrate a specific application of this approach, anticipating its broader utility in generating varied robot behaviors and enriching human-robot interaction, thereby fostering advancements in robotics seamlessly integrated into diverse human-centric environments.
ABSTRACT. Biological motion perception (BMP) refers to humans' ability to perceive and recognize the actions of living beings solely from their motion patterns, sometimes as minimal as those depicted on point-light displays. While humans excel at these tasks without any prior training, current AI models struggle with poor generalization performance. To close this research gap, we propose the Motion Perceiver (MP). MP solely relies on patch-level optical flows from video clips as inputs. During training, it learns prototypical flow snapshots through a competitive binding mechanism and integrates invariant motion representations to predict action labels for the given video. During inference, we evaluate the generalization ability of all AI models and humans on 62,656 video stimuli spanning 24 BMP conditions using point-light displays in neuroscience. Remarkably, MP outperforms all existing AI models with a maximum improvement of 29% in top-1 action recognition accuracy on these conditions. Moreover, we benchmark all AI models in point-light displays of two standard video datasets in computer vision. MP also demonstrates superior performance in these cases. More interestingly, via psychophysics experiments, we found that MP recognizes biological movements in a way that aligns with human behavioural data. All data and code will be made public.
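As an illustration of the kind of patch-level optical-flow input described above (and not of the Motion Perceiver architecture itself), the sketch below computes dense flow between consecutive frames and averages it within fixed-size patches; the patch size and Farneback parameters are assumptions:

```python
import cv2
import numpy as np

def patch_flows(frames, patch=16):
    """frames: list of H x W uint8 grayscale frames from one video clip.
    Returns an array of shape (T-1, H//patch, W//patch, 2): mean (dx, dy) per patch."""
    out = []
    for prev, nxt in zip(frames[:-1], frames[1:]):
        # Dense optical flow between consecutive frames.
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        h, w = flow.shape[:2]
        hp, wp = h // patch, w // patch
        flow = flow[:hp * patch, :wp * patch]
        out.append(flow.reshape(hp, patch, wp, patch, 2).mean(axis=(1, 3)))
    return np.stack(out)
```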
[1] The Precision Test of Metacognitive Sensitivity and Confidence Criteria
ABSTRACT. Humans experience feelings of confidence in their decisions. In perception, these feelings are typically accurate – we tend to feel more confident about correct decisions relative to incorrect decisions. The degree of insight people have into the accuracy of their decisions is known as metacognitive sensitivity. Currently popular methods of estimating metacognitive sensitivity are subject to interpretive ambiguities, because they assume people have normally shaped distributions of different experiences when they are repeatedly exposed to a single input. If this normality assumption is violated, calculations can erroneously underestimate metacognitive sensitivity. We have devised a new method of estimating metacognitive sensitivity that is more robust to violations of the normality assumption. The improved method can easily be added to standard behavioral experiments, and we have made Matlab code freely available to help other researchers implement our analyses and experimental procedures.
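The precision test itself is not detailed in the abstract; for orientation only, the sketch below computes a standard distribution-free index of metacognitive sensitivity (the area under the type-2 ROC), which likewise does not rely on a normality assumption:

```python
import numpy as np

def type2_auroc(confidence, correct):
    """confidence: per-trial confidence ratings (any ordinal scale);
    correct: boolean array marking correct decisions."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    pos, neg = confidence[correct], confidence[~correct]
    # Probability that a random correct trial carries higher confidence than a
    # random incorrect trial (ties count 0.5): a rank-based, model-free index.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```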
[3] Evidence against levels of processing theories of visual awareness
ABSTRACT. There is debate about whether awareness during visual perception occurs abruptly (all-or-none) or gradually. One influential view is the levels of processing (LOP) theory which states that visual awareness depends on the stimulus. Low-level stimuli, such as color, evoke gradual awareness, while high-level stimuli, such as object identity, elicit abrupt, all-or-none perception. A critical source of evidence supporting LOP is that self-reported perceptual clarity measures reveal more intermediate values of perceptual clarity for low- (e.g. color) than high- (e.g. letter) level stimuli. Here we provide several pieces of evidence inconsistent with this theory.
First, previous studies confound stimulus level with category-flatness. Does increased perceptual clarity of X versus blue reflect that a noisy perception of X is perceived as X due to large category priors for letter stimuli (e.g. there is no meaningful halfway point between X and M but there is between blue and red)? Consistent with this, when we generated a high-level stimulus set that lacked meaningful category boundaries (by morphing unfamiliar faces into a continuous space) our results revealed gradual awareness.
Second, by varying foil-target similarity, we show that an assumption underlying perceptual clarity measures—that they measure stimulus clarity not the difficulty of perceptual judgments—is incorrect.
Finally, existing studies do not equate task performance. This is problematic because performance is better for high-level stimuli, and the apparent all-or-none nature of categorical stimuli may reflect few intermediate perceptual clarity ratings due to high confidence. Consistent with this, preliminary data suggest gradual awareness for all stimuli, including letters, when task performance is equated across stimuli. Taken together, our findings reject the notion of separate awareness pathways for high- vs. low-level stimuli and suggest that gradual versus all-or-none perception depends on other methodological properties.
[5] Melanopsin modulation of the background alters our attentional state
ABSTRACT. Recently, studies suggesting that blue light exposure alters our cognitive state and mood have attracted much attention. Blue light, i.e., short-wavelength light, is known to strongly stimulate melanopsin-containing retinal ganglion cells (ipRGCs). Since ipRGCs have been implicated in capturing or coding the ambient light level, ambient light could enhance our cognition or attention. However, most previous studies have used "blue light" to stimulate melanopsin, which stimulates not only melanopsin but also the cones. In other words, those results could have been confounded by the effect of the "blue" color itself on cognition as well as melanopsin. In the current study, we used an in-house four-primary display system to control the amount of melanopsin stimulation independently. Observers viewed a background light that modulated melanopsin stimulation for 6 minutes of initial adaptation and then started the attentional task. A conventional attentional blink task was used to test whether melanopsin stimulation affects overall attentional levels or attentional processing of the target. The results showed that accuracy for T1 was better in the low melanopsin-stimulation condition. However, accuracy for T2 did not differ significantly, and the shape of the attentional blink curve was similar between conditions. Our results suggest that background melanopsin modulation enhances our overall attentional state and arousal level, possibly by changing our concentration on the relevant task. Moreover, the higher accuracy in the low melanopsin-stimulation condition indicates that there is "an appropriate amount of melanopsin stimulation" for our arousal level. Our results revealed that an appropriate amount or duration of melanopsin stimulation, even for just 6 minutes, can enhance our attentional level. This may help in developing illumination environments that benefit everyday activities such as driving or learning.
[7] Metacognition in Working Memory: The Role of Prior Beliefs
ABSTRACT. Research on metacognition in long-term memory (LTM) suggests that people infer their LTM performance based on their prior beliefs about memory ability. However, given that the contents of working memory (WM) are immediately accessible to consciousness, it is plausible that monitoring of WM contents relies primarily on memory quality itself and is unlikely to be influenced by prior beliefs. To test this hypothesis, we examined whether item size would influence prior beliefs and subsequently influence meta-WM judgments, but not objective performance, as observed in meta-LTM contexts. In two experiments, we manipulated item size and tasked participants with performing a color WM task followed by a confidence report. Results from Experiment 1 showed no significant difference in objective WM performance between the two item sizes; however, participants reported significantly higher confidence for larger items. Experiment 2 further tested whether size cues influence participants' prior beliefs. By manipulating feedback, we established three different prior beliefs across participant groups: larger items are remembered less well, larger items are remembered better, and items of both sizes are remembered similarly. The results showed that in the group holding the belief that larger items are less well remembered, participants no longer reported higher confidence for larger items. Conversely, participants in the other two groups continued to report higher confidence for larger items. These findings reveal a generality between meta-WM and meta-LTM: people make metamemory judgments based on their prior beliefs about overall memory ability.
[9] The Effect of writing direction change on object-based attention in words
ABSTRACT. Due to object-based attention, attention captured by a single object can spread from the fixed gaze position to the entire area of that object. The dynamic updating hypothesis suggests that when the boundaries of the attended object change, the direction of attention spreading captured by the object is immediately adjusted to match the altered boundaries. Word stimuli composed of combinations of letters can be perceived as grouped based on semantic information. In word stimuli without physical connections distinguishing groups, the direction of attention spreading might not be immediately adjusted when the positions of the letters determining the word boundaries change. This study investigated whether the direction of attention spreading captured by the word changes according to the changed writing direction using Korean words. Two Korean words, each consisting of four letters arranged in a 2x2 grid, were presented. When the letters were read in a direction different from the writing direction, they were perceived as two non-words. A cue was presented in one of the four letters, and the target stimulus was presented either in the same position as the cue, a different position within the same word, or in a different word. There were conditions where the writing direction changed or did not change simultaneously with the presentation of the target. The experimental results showed that the response time was faster when the target was presented in a different position within the same word compared to when it was presented in a different word, based on the writing direction before the change. The results suggest that attention captured by the word might have spread according to the writing direction before the target appeared. This implies that understanding the word's meaning might be important in determining the direction of attention spreading within the word.
[11] Developmental Changes in Left-Lateralized Word-Related N170 in Chinese Children: A Longitudinal Study
ABSTRACT. Previous research using Event-Related Potentials (ERPs) indicated that the left-lateralized word-related N170 component is a crucial physiological indicator associated with proficient word processing. Cao et al. (2011) demonstrated that the left-lateralized N170 emerged in Chinese children as early as the second grade of primary school. However, gaps remain regarding the developmental trajectory of the left-lateralized N170 in relation to reading experience from the first to the second grade.
To elucidate the developmental characteristics of the word-related N170, we conducted a longitudinal investigation involving 30 Chinese children at grade one (mean age: 7.1 years, 16 males), tracking changes in the word-related N170 from grade one to grade two over an 8-month interval. We found that in the first grade, no significant difference existed between the amplitudes of the left and right N170 within occipito-temporal brain regions. However, by the second grade, differences in word-related N170 amplitude between the right and left occipito-temporal cortex were evident, with the right N170 exhibiting significantly diminished amplitudes compared to the left N170. Comparative analysis between grade two and grade one data indicated that while the amplitude of the left word-related N170 remained stable, there was a notable reduction in the amplitude of the right N170 in the second grade compared to the first grade. Furthermore, this reduction in right N170 amplitude positively correlated with increasing levels of literacy. These findings suggest that the establishment of the left-lateralized word-related N170 predominantly arises from a decrease in right hemisphere activation. Moreover, they underscore the influence of reading experience on the neural substrates of word processing, with reduced engagement of the right hemisphere observed as literacy skills advance.
These findings contribute to our understanding of the neural mechanisms underlying the development of reading in children.
Acknowledgement: This work was supported by a grant from The National Social Science Fund of China (21BYY109).
[13] Effects of refractive imbalance and text difficulty on steady-state visual evoked potentials in the visual cortex and the visual word form area during reading
ABSTRACT. Introduction
Normal binocular vision (accommodation and convergence) is thought to be helpful for reading tasks at short distances, as visual signals from each eye can combine to give improved vision (binocular summation). The aim of this study was to understand how binocular vision disruption, induced through refractive imbalance between the eyes, affects neural processing in the visual cortex (Oz) and the visual word form area (VWFA).
Method
Adult participants (≤35 years of age) who were expert readers (at least one year of a Bachelor's degree and no history of reading deficits) had steady-state visual evoked potentials (SSVEPs) recorded while reading text (easy, complex, and scrambled) flashing at 5.7 Hz. Refractive imbalance was simulated by placing lenses (-2, 0, +2 DS) in front of the dominant eye over the full refractive correction. Active electrodes were placed at Oz and over the left posterior parietal cortex (VWFA) to measure neural responses (amplitude at 5.7 Hz) evoked by dominant-eye, non-dominant-eye, and binocular stimulation (viewing condition). The SSVEPs were analysed with a fast Fourier transform (data with SNR < 3.0 were excluded) and linear mixed models.
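A minimal sketch of the SSVEP quantification described in the Method: FFT of an EEG epoch, amplitude at the 5.7 Hz tagging frequency, and an SNR against neighbouring frequency bins (the number of neighbouring bins used here is an assumption):

```python
import numpy as np

def ssvep_amplitude_snr(eeg, fs, f_tag=5.7, n_neighbours=10):
    """eeg: 1-D voltage trace for one electrode and condition; fs: sampling rate (Hz)."""
    spectrum = np.abs(np.fft.rfft(eeg)) / len(eeg)
    freqs = np.fft.rfftfreq(len(eeg), d=1.0 / fs)
    k = int(np.argmin(np.abs(freqs - f_tag)))  # bin closest to the tagging frequency
    # Neighbouring bins, skipping the two bins immediately adjacent to the target.
    neighbours = np.r_[k - n_neighbours:k - 1, k + 2:k + n_neighbours + 1]
    amp = spectrum[k]
    snr = amp / spectrum[neighbours].mean()
    return amp, snr  # epochs with SNR < 3.0 would be excluded, as in the Method
```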
Results
Fifteen subjects (20-34 years) participated. The SSVEPs indicated that the refractive imbalance condition was not significantly associated with the neural response (amplitude) at Oz or the VWFA for any viewing condition (P > 0.05). No significant amplitude difference was found at Oz across text difficulty; however, in the VWFA, scrambled and complex text evoked higher amplitudes than easy text (P < 0.001 and P = 0.004, respectively).
Conclusions
Refractive imbalance (simulated) had no effect on the neural response at the visual cortex or the VWFA. Reading the complex and scrambled text resulted in higher neural activity at the VWFA, but not at Oz, than reading simple text. The potential mechanism is increased cognitive burden due to the effort of reading complex and scrambled words.
[15] Effect of Pupillary Synchronization Phenomenon on Facial Impression
ABSTRACT. Pupil synchronization, the phenomenon in which one's pupil size mirrors that of the person they are observing, has been extensively studied and interpreted as a form of conforming behaviour in communication. Previous research has primarily focused on identifying the circumstances under which this phenomenon occurs, including variations in race, ethnicity, and age. However, limited attention has been given to exploring how pupil synchronization influences the impressions of observers. In this study, we aimed to investigate the effects of pupil synchronization on facial impressions by manipulating pupil synchronization and asynchronization of stimulus faces. Specifically, we conducted experiments to confirm the occurrence of pupil synchronization when observing faces with varying pupil diameters and to assess how differences in the patterns of varying pupil diameters affected the impressions formed by participants. To achieve this, we utilized stimuli comprising average faces selected from a face database. Participants were then asked to evaluate facial impressions while observing faces with synchronized and desynchronized pupil sizes. During the observation, we measured the participants' pupil responses to confirm whether pupil synchronization occurred. The results showed that the presence or absence of pupil synchronization in the stimulus face changed the subjects' impressions of familiarity, social desirability, and activity for that face. We confirmed that participants' pupils changed along with the pupil change of the stimulus face under the expansion condition. Based on the results, we calculated the correlation between the subjects' pupil diameter and their ratings on the impression assessment items; however, we did not find any significant correlation between them. These results suggest that while subjects' evaluations of others may change depending on pupil synchronization, the observers' own pupil synchronization does not accurately reflect their evaluation of others.
[17] A comparative definition of the ventral frontal-temporal white matter fasciculi in the human and macaque monkey
ABSTRACT. In the human and the macaque monkey brain, the cross-talk between the frontal and temporal regions of the cerebral cortex is crucial for the high-level regulation of behaviors based on previously stored and current incoming perceptual information. In this work, we provide a comparative definition of the major "ventral" white matter fasciculi or bundles that interconnect the frontal and temporal cortical regions in the human and monkey brain: the Uncinate Fasciculus, and the Dorsal and Ventral Frontal-Temporal Extreme Capsule Fasciculus. We first performed a careful review of existing tract-tracing studies in the macaque monkey to elucidate the monosynaptic connections that link the frontal and temporal regions through the ventral pathway (i.e. the connections that run beneath the insula). Next, based on cortical homologies between the human and macaque monkey brain, we defined various cortical ROIs in the human brain that corresponded to the terminations of the various frontal-temporal monosynaptic connections observed in the monkey. Finally, using these homologous ROIs, we reconstructed the Uncinate Fasciculus, and the Dorsal and Ventral Frontal-Temporal Extreme Capsule Fasciculus in the human brain using diffusion-weighted MRI tractography. By providing a comparative definition of the frontal-temporal pathways in the human and monkey brain, this work will be useful in bridging anatomical and functional studies relating to the frontal and temporal regions across the two species.
[19] Reproducibility of Low Voltage Fast Activity (LVFA) across Seizures Facilitates Localizing the Epileptogenic Zone (EZ)
ABSTRACT. Low voltage fast activity (LVFA) constitutes a characteristic electrophysiological marker of the epileptogenic zone (EZ) in focal seizures of human epilepsy. However, extensive clinical practice has shown that it is difficult to distinguish the EZ from the propagation zone (PZ) based solely on LVFA. Drug-refractory focal epilepsies are network diseases involving the EZ, PZ, and non-involved networks (NIN). Moreover, for the same patient, seizure semiology usually remains similar across multiple seizures. Nevertheless, the feasibility of using the reproducibility of individual electrode ictal patterns, and of within- and between-network connectivity patterns, across seizures to delineate the EZ remains to be elucidated. First, a semi-automated method was proposed to precisely localize the onset time of LVFA at each electrode. Subsequently, non-linear correlation and mutual information methods were used to evaluate the similarity of waveforms in the time domain and of the power spectrum in the frequency domain across seizures for the same electrode in the same patient, and to compare the reproducibility of the EZ, PZ, and NIN. Representational Similarity Analysis (RSA) was employed to calculate the reproducibility of connectivity patterns within and between the EZ, PZ, and NIN across seizures in the same patient. Then, a Support Vector Machine (SVM) was used to test whether these reproducibilities could assist in localizing the EZ. The individual electrodes within the EZ and their internal connectivity patterns exhibited higher reproducibility across seizures from the preictal period to the ictal period. Between networks, the connectivity patterns between EZ and PZ demonstrated significantly higher similarity from the interictal phase to the ictal phase, and even NIN-EZ connectivity patterns showed a certain degree of repeatability across seizures during the ictal phase. Furthermore, the SVM results indicate that reproducibility across seizures contributed to the identification of the EZ. These results indicate that reproducibility across seizures enables more precise delineation of the EZ, contributing to the control of seizures through surgical resection.
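A minimal sketch of a cross-seizure reproducibility index for a single electrode. The study used non-linear correlation and mutual information; for brevity, this illustration substitutes a plain Pearson correlation between per-seizure power spectra, and the spectral parameters are assumptions:

```python
import numpy as np
from scipy.signal import welch

def spectral_reproducibility(seizure_traces, fs):
    """seizure_traces: list of 1-D iEEG traces (one per seizure) for one electrode,
    each at least 1 s long; fs: sampling rate (Hz)."""
    spectra = [welch(x, fs=fs, nperseg=int(fs))[1] for x in seizure_traces]
    n = len(spectra)
    corrs = [np.corrcoef(spectra[i], spectra[j])[0, 1]
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(corrs))  # higher values -> more reproducible ictal pattern
```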
[21] Biological Motion Cues Modulate Visual Working Memory
ABSTRACT. Previous research has demonstrated that biological motion (BM) cues can induce a reflexive attentional orienting effect, a phenomenon referred to as social attention. However, it remains unknown whether BM cues can further affect higher-order cognitive processes, such as visual working memory (WM). The present study aimed to probe this issue by adopting traditional WM change detection tasks with non-predictive (50% valid) pre-cues or retro-cues. Specifically, point-light BM stimuli were used as the non-predictive central cues, which were either provided prior to the presentation of the memory items (pre-cue) or presented during the retention interval (retro-cue). We also adopted feet motion sequences as non-predictive cues to further investigate whether WM performance would be affected by local BM cues without global configuration. Results showed that, for both pre-cues and retro-cues, items presented at the location cued by the walking direction of BM were remembered better than those at non-cued locations. Crucially, this effect could be extended to feet motion cues, reflecting the key role of local BM signals in modulating WM. More importantly, such BM-induced modulation effect was not observed with inanimate cues (i.e., dot motion and arrow), which can also elicit similar attentional effects. Our findings suggest that the attentional effect induced by life motion signals can penetrate to higher-order cognitive processes, and provide compelling evidence for the existence of “life motion detector” in the human brain from a high-level cognitive function perspective.
[23] Enhanced Semantic Encoding of Biological Motions in Working Memory: Insights from Intracranial EEG Analysis
ABSTRACT. We navigate our dynamic environment through discrete yet significant events, with biological motions—movements of animate entities—being particularly salient. People demonstrate superior memory performance for biological motions compared to non-biological motions, yet the mechanisms underlying this phenomenon remain unclear. This study investigated the encoding mechanisms of working memory (WM) for biological motions using intracranial electroencephalogram (iEEG) with high temporal and spatial resolution. iEEG data were recorded from seven epilepsy patients who viewed biological and non-biological motions and were tasked with memorization.
Results from representational similarity analysis revealed that brain regions associated with semantics, motor control, and social cognition encoded biological motion information more robustly. Specifically, high-frequency iEEG signals (60–160 Hz), reflecting neuronal discharge rates, were analyzed. Comparison of these signals across brain regions during memory tasks showed stronger responses in areas linked to motion and semantic comprehension (left anterior central gyrus, bilateral middle temporal gyrus, right superior temporal gyrus) when memorizing biological motions compared to non-biological motions. Additionally, we conducted connectivity analysis and found that brain regions associated with semantics, motor control, and social cognition exhibited stronger connections with memory-related brain regions during the encoding of biological motions.
These findings suggest a deeper semantic processing of biological motion information in the brain, highlighting the role of regions involved in motion perception and semantic comprehension. This enhanced processing likely contributes to better memory encoding of biological motions, which convey vital information crucial for human survival.
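A minimal sketch of the representational-similarity logic described above: a neural dissimilarity matrix built from trial-wise high-gamma response patterns in one region is compared with a model matrix separating biological from non-biological motions; the metric choices are illustrative assumptions:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_score(patterns, is_biological):
    """patterns: (n_trials, n_channels) high-gamma (60-160 Hz) responses;
    is_biological: boolean array marking biological-motion trials."""
    neural_rdm = pdist(patterns, metric='correlation')  # 1 - r between trial patterns
    model_rdm = pdist(np.asarray(is_biological, dtype=float)[:, None],
                      metric='cityblock')               # 0 = same category, 1 = different
    rho, _ = spearmanr(neural_rdm, model_rdm)
    return rho  # higher rho -> the region separates biological from non-biological motion
```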
[25] Effects of walking on visually induced self-motion perception
ABSTRACT. When an observer moves in depth, an expanding (or contracting) optical flow of the environment around the observer is presented to the observer's eye. Optical flow is known to induce illusory self-motion perception even in stationary observers. This phenomenon is referred to as vection. Many studies have examined the various factors affecting vection in stationary observers. This study examined vection while walking on a treadmill. In the experiment, participants were asked to walk (walking condition) or stand (standing condition) on a treadmill and to view expanding or contracting optical flows. The walking speed was fixed at 4 km/h, and the optical flow speed was either 4 or 20 km/h. The participants were asked to press and hold a key whenever they felt vection during the stimulus presentation. Additionally, after the stimulus motion ceased, the participants were asked to rate the magnitude of vection (from 0 to 100) they felt during the stimulus presentation. Overall, treadmill walking reduced vection; on average, vection onset latency was longer, and the total duration of vection was shorter in the walking condition than in the standing condition. Inconsistency in direction and speed between walking and optical flow did not affect vection.
[27] Comparative Analysis of Dynamic Vision in Action and Non-Action Video Gamers
ABSTRACT. Video games have become a major leisure activity, significantly influencing human perception through long-term exposure. Dynamic vision is crucial for responsiveness and accuracy in fast-paced game environments. This study investigated whether different video gaming experiences correlate with three dynamic visual perception abilities. The dynamic visual acuity task measured the ability to recognize moving objects (Landolt-C ring) in detail. The motion coherence task assessed the ability to integrate local moving signals by measuring the minimal percentage of local signals needed to recognize the motion direction embedded in noise motion dots. The motion suppression task measured the ability to inhibit surrounding moving information by determining the difference in duration required for participants to discriminate the direction of a moving Gabor patch in small and large visual fields. We compared three types of video game players: action gaming players (AGPs), non-action gaming players (NAGPs), and individuals with minimal or no gaming experience (Control). Results showed that AGPs and NAGPs outperformed Control in dynamic visual acuity tasks, with comparable performance between AGPs and NAGPs (AGPs = 12.31 cpd; NAGPs = 12.20 cpd; Control = 9.52 cpd). Similar performance was also observed in motion coherence tasks (AGPs = 35.05%; NAGPs = 32.99%; Control = 47.15%). However, no significant differences were found in motion suppression tasks (AGPs = 125 ms; NAGPs = 121 ms; Control = 126 ms). These findings suggest that long-term exposure to video games influences dynamic vision, enhancing performance in dynamic visual acuity and the ability to integrate local information. The similarity in motion suppression tasks among groups may stem from inadequate consideration of gaming patterns. Overall, this study offers preliminary insights into the relationship between video game types and dynamic vision, emphasizing the potential impact of this form of entertainment on visual perception.
ABSTRACT. PURPOSE: It has generally been assumed that sensitivity to visual objects does not differ between the left and right visual hemifields (VF). However, we have found that motion sensitivity with isoluminant stimuli is higher in the left VF than in the right (Asaoka, Kojima, Yoshizawa, 2022, Vis. Res.). We therefore investigated whether such a condition also exists with monochromatic stimuli. Here we report an experimental result showing that motion sensitivity was also higher in the left VF with achromatic stimuli. METHODS: We used achromatic Gabor patches 4 deg in size. The spatial frequency (SF) of the gratings was either 0.5, 4.0, or 6.0 cpd. Five Gabors were placed along the circumference of a half circle with a radius of 7 deg, either in the left or right hemifield. The Gabors moved together concentrically or eccentrically, so that contracting or expanding motion was perceived. The moving stimuli were presented for 180 ms while an alphanumeric character was presented at the center of the visual field, i.e. the fixation point. We measured the contrast thresholds for detecting the direction of the motion stimuli by means of the method of constant stimuli, while asking participants to name the character to prevent eye movements (dual task). Fifteen naïve university students (mean age 22.87 years), with normal or corrected-to-normal visual acuity, participated in this experiment. RESULTS and CONCLUSION: The thresholds in each condition were examined by repeated-measures ANOVA. The factors Spatial Frequency, Motion Direction (Contraction, Expansion), and Visual Field (Left, Right) showed significant main effects (SF: p < .001, η2 = 0.481; MD: p < .001, η2 = 0.035; VF: p = 0.015, η2 = 0.006). The difference in sensitivity for detecting global motion between the left and right visual fields suggests a hemispheric difference in visual cortical function.
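A minimal sketch of threshold estimation from a method-of-constant-stimuli run: fit a cumulative-Gaussian psychometric function to direction-discrimination accuracy and read off the contrast at a criterion level. The 75% criterion, the lapse-free form, and the example data are assumptions, not taken from the abstract:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def psychometric(contrast, mu, sigma):
    # 2AFC-style curve rising from 0.5 (guessing) to 1.0.
    return 0.5 + 0.5 * norm.cdf(contrast, loc=mu, scale=sigma)

def contrast_threshold(contrasts, prop_correct, criterion=0.75):
    (mu, sigma), _ = curve_fit(psychometric, contrasts, prop_correct,
                               p0=[np.median(contrasts), np.std(contrasts)])
    # Invert the fitted curve at the criterion level.
    return norm.ppf((criterion - 0.5) / 0.5, loc=mu, scale=sigma)

# Hypothetical data for one condition (contrast vs. proportion correct):
thr = contrast_threshold(np.array([0.01, 0.02, 0.04, 0.08, 0.16]),
                         np.array([0.52, 0.60, 0.78, 0.93, 0.99]))
```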
[2] Prioritization in working memory via metacognitive judgements
ABSTRACT. We have control over the contents of working memory: Retro-cue paradigms reveal that explicitly cued items are prioritized, resulting in retinotopic neural enhancement and improved behavioral performance.
But how flexible is this prioritization? Here we examine whether participants can prioritize items in working memory based on metacognitive judgments in the absence of external cues. We combined the standard retro-cue paradigm, where participants selectively focus on one of three oriented lines during memory maintenance, and a ‘choose best’ paradigm (Fougnie et al. 2012), where participants use metacognitive information to select a best remembered item. Critically, the to-be-prioritized item was indicated either by an external spatial cue or by instructions to prioritize the best remembered item. These prioritized trials were compared with trials where no item was prioritized during memory maintenance and the to-be-reported item was indicated either by an external spatial cue (neutral trials) or by instructions to report the best remembered item. Participants performed better in cue-based prioritization than in the neutral condition of remembering three orientations without prioritization, replicating past retro-cue findings. Our results also reveal an ability to use meta-cognitive information. Participants performed better when allowed to choose the best item (compared to reporting a random item in neutral trials). Further, the combined condition (participants choose the best item to selectively maintain during maintenance) showed additive benefits of both prioritization and self-selection. Our findings add to a growing literature showing that participants not only have metacognitive access but can use this information to make meaningful decisions. Further, the results reveal that prioritization in working memory can be flexibly directed by purely top-down and self-directed means, similar to how attention operates in perception. In ongoing work, we are collecting neural data to investigate whether the internal metacognitive driven prioritization shares similar neural mechanisms with cue-driven prioritization.
[4] Causes and Consequences of losing visual consciousness: intracranial electroencephalography evidence in humans
ABSTRACT. In reality, even with obvious visual stimuli presented before us, we sometimes lose visual consciousness and do not respond. Using intracranial electroencephalography (iEEG), we found that instances of loss of consciousness (misses), compared to correct hits, exhibit varying degrees of difference in the early visual cortex, ventral visual cortex, and higher-level networks. Specifically, before the stimulus appears, the power of high gamma (70-180 Hz) is higher in the early visual cortex, ventral visual cortex, default mode network (DMN), and dorsal attention network (DAN) for misses compared to hits, with the greatest differences observed in the early visual cortex. After the stimulus appears, misses exhibit a higher peak power of high gamma in the early visual cortex compared to hits, and the timing of high gamma activity in both the early visual cortex and ventral visual cortex is delayed for misses. However, different patterns of high gamma activity were observed in the DAN. At later stages, differences in high gamma activity between misses and hits were also noted in the frontoparietal network (FPN), suggesting some form of higher-level network regulation over the primary cortex. These findings differ from previous observations of consciousness loss under near-threshold ambiguous visual stimuli and hold significant implications for real-life situations.
[6] Unraveling the Achromatic Dunhuang Murals: Visual Imagery for Deeper Understanding
ABSTRACT. There is a wide spectrum of visual imagery ability among individuals, with some having exceptionally vivid mental imaginations, while others may struggle to form clear mental images of visual stimuli (Pearson, 2019). These individual differences may contribute to the large variation in aesthetic appreciation of visual images. Here we aim to investigate the association between visual imagery ability and the aesthetic appreciation of Dunhuang murals, particularly the role of color in this association.
We selected 24 high-resolution Dunhuang murals and created a set of achromatic versions by altering the color. A total of 41 Chinese undergraduate students (26 females and 15 males) with normal color vision were asked to view the achromatic mural images for five seconds and then close their eyes to visualize the images for 5 seconds. They were then instructed to rate each mural on two 9-point Likert scales based on two dimensions: like vs dislike and understanding vs. not understanding. After two days, they went through the same procedure for the original 24 murals. At the end of the experiment, we evaluated participants’ visual imagery ability with Vividness of Visual Imagery Questionnaire.
Participants demonstrated a 12.5% higher level of comprehension for the achromatic images (5.39 ± .78) compared to the original color versions (4.80 ± .76, t = 4.94, p < .001). Despite this increased comprehension, participants expressed a preference for the chromatic images (5.21 ± .58 vs 5.45 ± .59, t = -3.61, p = .0010). Further analysis revealed a strong correlation between visual imagery ability and the level of understanding for achromatic images (r = .41, p = .010), but not for color versions. These results suggest that (1) preference and understanding are separate in aesthetic appreciation of Dunhuang murals, and (2) visual imagery ability facilitates a deeper comprehension of the achromatic Dunhuang Murals.
[8] Multi-Modal BCI Based Effects of Different Cognitive States on Arousal and Executive Vigilance Assessment
ABSTRACT. Vigilance is critical in many contexts, such as driving, piloting, operating heavy machinery, and surveillance/monitoring, where users are required to maintain high vigilance levels for long periods of time. There is a substantial literature on vigilance prediction using BCI, as well as on vigilance enhancement techniques. Two types of vigilance, arousal (AV) and executive (EV), also need to be taken into account; however, most research has focused on only one type in isolation. In contrast, our study aims to use a human-in-the-loop (HITL) approach to provide users with feedback on their vigilance states and recommendations for those who require high vigilance levels in day-to-day operations, distinguishing between AV and EV and their applications. We also examined the effect on vigilance of the user's cognitive states (attention, workload, cognitive flexibility, emotional arousal), which also affect performance. We collected EEG data from participants (n=30) while they followed an experimental protocol comprising baseline cognitive states and baseline vigilance, AV, and EV tasks in real-world contexts. Six band-power features were extracted from the EEG data, and statistical analysis methods were applied. We used t-tests on band-power features, binary classification accuracy, and Pearson's correlation on model evaluation scores to analyse the data. We found that workload correlates negatively and arousal positively with vigilance, but found no such correlations for attention or cognitive flexibility. EEG was able to differentiate between AV and EV. These findings allow for HITL systems with better recommendations and interventions in appropriate contexts for users with different requirements, enhancing task performance in real-life situations.
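A minimal sketch of band-power feature extraction for one EEG channel via a Welch power-spectral-density estimate; the six bands listed are a common convention and an assumption, since the abstract does not name the bands used:

```python
import numpy as np
from scipy.signal import welch

BANDS = {'delta': (1, 4), 'theta': (4, 8), 'alpha': (8, 13),
         'low_beta': (13, 20), 'high_beta': (20, 30), 'gamma': (30, 45)}

def band_powers(eeg, fs):
    """eeg: 1-D trace for one channel; fs: sampling rate (Hz). Returns one feature per band."""
    freqs, psd = welch(eeg, fs=fs, nperseg=int(2 * fs))  # 2-second windows
    feats = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        feats[name] = float(np.trapz(psd[mask], freqs[mask]))  # integrated power in the band
    return feats
```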
[10] Chinese deaf readers showed bilateral VWFA for visual word form processing
ABSTRACT. Many deaf readers have reading difficulties and struggle to achieve the same reading level as their hearing peers. The Visual Word Form Area (VWFA) in the left occipito-temporal cortex is crucial to expert reading, and the left lateralization of the VWFA is closely related to the left lateralization of spoken language. So, do deaf readers form a VWFA for expert reading, and is it left-lateralized in deaf sign language users?
We recruited 20 Chinese profoundly deaf subjects and 21 hearing controls. The two groups were matched in educational level, literacy, and non-verbal intelligence. In the first fMRI experiment, subjects performed a semantic or phonetic matching task for Chinese blocks and a character matching task for Korean control blocks (none of the subjects knew Korean). We found that, compared to Korean, Chinese characters activated both the left and the right occipito-temporal cortex in the deaf group, but only the left occipito-temporal cortex in the hearing controls. We defined the activation for Chinese > Korean in the left and right occipito-temporal cortex as the left VWFA and right VWFA, respectively, and then used an fMRI-adaptation paradigm to investigate whether they play the same role in word processing. We found that in the deaf readers, the left VWFA and the right VWFA showed similar neural adaptation for the real Chinese character repetition condition and the unpronounceable pseudo-character repetition condition. This neural selectivity for visual word form in the left and right VWFAs of deaf readers was similar to that in the hearing controls' left VWFA.
These results suggest that deaf readers form a bilateral VWFA, with the two hemispheres coordinating in visual word form processing. The development of the VWFA may be influenced by language experience.
Acknowledgement: This work was supported by a grant from The National Social Science Fund of China (21BYY109).
[12] Visualizing Internal Representations of Two Different Faces Using Classification Images
ABSTRACT. We have internal visual representations of individual faces, yet presenting these face images to others poses challenges. This study explored whether internal representations of two different faces (target faces A and B) could be depicted as visual images using the classification image technique (CI: Dotsch & Todorov, 2012). In the experiments, paired face images with random noise added or subtracted from the average female face (i.e., base image) were presented, and participants were asked to select one of two images that resembled the target face. In Experiment 1, participants could view the target during the session; however, in Experiment 2, they could not see the target and instead memorized it beforehand. We computed the mean of all noise patterns a participant selected for each target, then generated the CIs for the two different targets. In both experiments, the CIs effectively and distinctly represented the features of the target faces (e.g., eye shape and cheek protrusion), and independent raters confirmed that each CI resembled the target face. Therefore, the current results suggest that internal representations of two different faces can be appropriately visualized using CIs.
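A minimal sketch of the classification-image computation described above: average the noise patterns selected for one target and add the result to the base (average) face; the array names and the display scaling are assumptions:

```python
import numpy as np

def classification_image(chosen_noises, base_face, scale=1.0):
    """chosen_noises: list of H x W noise patterns the participant selected for one target;
    base_face: H x W average face (0-255 grayscale). Returns the visualized CI."""
    mean_noise = np.mean(np.stack(chosen_noises), axis=0)
    ci = base_face + scale * mean_noise   # scale only controls visualization contrast
    return np.clip(ci, 0, 255)
```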
[14] Differences in production effects of different fonts for displaying lyrics in music videos
ABSTRACT. This study focuses on promotional video works that are mainly musical in nature, so-called music videos. In recent years in Japan, music videos often display the song lyrics as textual information on the screen. These letters are thought to convey the lyrics accurately to the viewer, much like a karaoke screen. However, since the display method is not uniform, the same lyrics may come across differently depending on how the text is displayed. We hypothesized that, in addition to the accurate transmission of lyrics, the visual presentation of text in music videos may also have a staging effect, such as enhancing the expressive intent of the music. To test this hypothesis, we produced several music videos for a comparative experiment and investigated how the font used to display the lyrics changes the production effect on the viewer. The aim was to examine the relationship between the expressive impression produced by the fonts and the image and quality of the music videos as perceived by viewers, by tabulating viewers' subjective evaluations of the fonts. The survey asked about 100 people six questions about whether the font matched the music video. Text mining was then used to analyze the survey and questionnaire results, examining how different fonts affect the production effect of music videos on the viewer and testing the hypothesis.
[16] Facial Clues and Vocal Blues: Deciphering the Guilt Code
ABSTRACT. The present study examines how facial and vocal cues jointly influence perceptions of guilt and sentencing decisions across ambiguous and concrete case types, bridging the gap between research on first impressions and their implications for legal proceedings. It investigates whether racial biases based on stereotypes and unfamiliarity with racial differences may undermine the credibility of judicial sentencing in Singapore's multicultural society. The study also explores the potential role of racial intergroup contact in alleviating such biases. Participants (N=150) rated 18 targets on judgements of guilt, desire to punish, and severity of punishment. Their levels of intergroup contact with major racial groups in Singapore were also measured. The results showed significant main effects of face race and case type on judgements of guilt and desire to punish the target. Specifically, Malay faces were judged as less guilty and less deserving of punishment compared to Chinese faces. With concrete evidence, participants rated the target as more guilty and more deserving of punishment. Intergroup contact exhibited complex effects, with target-race-specific and non-target-race-specific influences. For guilt judgements, increased Chinese contact reduced perceived guilt when paired with an Indian voice, whereas greater Malay contact increased guilt perceptions for concrete cases. For ratings of desire to punish, higher Malay contact reduced willingness to punish Malay targets in concrete cases, while Malay and Indian contact moderated punishment severity for concrete cases, with greater contact linked to more lenient punishments. The lack of interaction effects between ethnic faces, voices, and case type suggests that concrete evidence overshadows implicit biases in legal judgements. These findings emphasise the need for further research into the underlying mechanisms behind intergroup contact to account for such complexities in diverse contexts. Facilitating positive intergroup contact remains crucial to reduce prejudice and discrimination through increased interaction and dialogue rather than punitive approaches to racial discourse.
[18] Distinguishing the roles of parahippocampal cortex and retrosplenial cortex in scene integration
ABSTRACT. Humans perceive a coherent visual world across time and space. Previous research suggested the complementary roles of parahippocampal place area (PPA) and retrosplenial cortex (RSC) in scene processing, with PPA updating incoming sensory information and RSC maintaining a stable representation of the environment (Park & Chun, 2009). Using fMRI, we further tested how the PPA and RSC might integrate scene information by manipulating spatial-temporal sequences of scene segments. Specifically, each scene image was divided into three segments, with 66% overlap between the first and second segments and 33% overlap between the first and third segments. On each trial, participants (N=20) viewed 1) identical scene segments three times, 2) three segments of a scene sequentially (e.g., first-second-third segments), 3) three segments of a scene with a displaced order (e.g., first-third-second segments), or 4) segments from three scenes. Univariate analysis on response amplitudes and multivoxel pattern analysis on response patterns were conducted in PPA and RSC. Unsurprisingly, identical scene segments showed adaptation in both PPA and RSC, with lower response amplitude and different response patterns, compared with other conditions. Importantly, we expected that the role of updating information would be revealed by sensitivity to sequential vs. displaced scene sequences, whereas the role of maintaining a stable representation of the environment would be revealed by differences between same scenes (regardless of spatial-temporal sequences of segments) vs. different scenes. Comparing sequential vs. displaced conditions, we found no differences in either response amplitude or response patterns in PPA, but these conditions elicited significantly different response patterns in RSC. In contrast, PPA, but not RSC, showed significantly lower response amplitudes for segments from a scene presented either in sequential or displaced conditions, compared with completely different scenes. These findings provide new insights into the complementary roles of PPA and RSC in updating and maintaining scene information.
[20] Shared and Distinct Neural Codes for Within- and Cross-Species Biological Motion Perception in Humans and Macaque Monkeys
ABSTRACT. Throughout evolution, living organisms have honed the ability to swiftly recognize biological motion (BM) across various species. This ability is particularly pronounced when recognizing the BM of the same species, likely driven by social interactions. However, our understanding of how the brain processes within- and cross-species BM, and the evolutionary progression of these processes, remains limited. To investigate these questions, we examined brain activity in the lateral temporal areas of humans and monkeys as they passively observed upright and inverted human and macaque BM stimuli. In humans, we found that the human middle temporal area (hMT+) responded generally to both human and macaque BM stimuli. In contrast, the right posterior superior temporal sulcus (pSTS) exhibited selective responses to human BM stimuli. This selectivity was further evidenced by an increased feedforward connection from hMT+ to pSTS during the processing of human BM stimuli. In monkeys, the MT region processed BM stimuli from both species, but no subregion in the STS anterior to MT was identified as being specific to the conspecific BM stimuli. A comparison of these findings in humans and monkeys suggests that upstream brain regions (i.e., MT) may retain homologous functions across species, while downstream brain regions (i.e., STS) may have undergone differentiation and specialization throughout evolution. Taken together, these results provide insights into the commonalities and differences in the specialized visual pathway engaged in processing within- and cross-species BMs, as well as their functional divergence during evolution.
[22] Effects of the number and the motion type of Gabor patches inducing illusory global rotation
ABSTRACT. After viewing a stimulus moving in one direction (adaptation stimulus), an observer perceives a stationary stimulus (test stimulus) as moving in the direction opposite to the adaptation stimulus. This phenomenon is known as the motion aftereffect (MAE). When an observer views static Gabor patches whose carriers drift vertically or horizontally at different speeds, arranged to form a square without its vertices, the observer perceives rotation of the square. Using this illusory rotation as an adaptation stimulus, we previously reported a rotational MAE of the test stimulus (i.e., a solid black square) in the direction opposite to that of the adaptation stimulus. We also investigated the MAE by manipulating the number and motion type of the Gabor patches and found that neither affected the duration of the MAE (ECVP2022). The purpose of the current study was to strengthen the previous findings by investigating the spatial extent of the influence of the illusory motion of the adaptation stimulus, i.e., the Gabor patch array. In two experiments, Gabor patches drifting vertically or horizontally at different speeds were presented together with a stationary square frame whose edges passed through the centers of the Gabor patches. Participants were asked to press and hold a key corresponding to the rotational direction (clockwise or counterclockwise) whenever they perceived rotation of the square frame. In Experiment 2, only patches drifting either outward or inward were presented together with the square frame. The results of Experiments 1 and 2 showed that rotational perception of the stationary square frame occurred irrespective of the number and motion type of the Gabor patches, and its duration did not differ across these conditions. We will discuss the present results in terms of the neural mechanisms of motion processing and from a psychological viewpoint.
[24] Square versus triangular wave modulation in temporal asynchrony segmentation: an insight into underlying computation
ABSTRACT. Asynchronous changes in visual attributes across areas lead to visual segmentation. We recently found that motion direction is a less effective cue than luminance and color in segmentation (Chen et al., VSS2023). The dynamics of motion direction change differ from those of luminance and color: it can be considered a smooth (triangular-wave) modulation of spatial phase rather than an instantaneous (square-wave) modulation. It has been pointed out that a triangular wave incurs a perceptual delay compared to a square wave, potentially interfering with visual segmentation based on temporal asynchrony. Poorer performance in motion direction tasks might therefore result from the difficulty of processing smooth changes rather than from motion direction itself. This study compares square-wave and triangular-wave modulation to investigate the influence of waveform on visual segmentation based on luminance, color, and motion direction. Our stimulus consisted of an array of 16×16 elements divided into four quadrants. Each element was a Gaussian blob in the luminance and color change conditions and a Gabor patch in the direction change condition. The temporal pattern in a single trial was either a square or a triangular repetitive alternation at a given temporal frequency. Direction change under square-wave modulation is based on apparent motion, whereas under triangular-wave modulation it is ordinary continuous motion. Two waveforms, A and B, with a 90-degree phase shift, were assigned to target and non-target quadrants. The observer’s task was to detect the target quadrant (4AFC). The results suggested that under square-wave modulation the segmentation task reaches perfect performance within 1~5 Hz for luminance, 1~3 Hz for color, and 1~5 Hz for direction change, whereas under triangular-wave modulation none of the attribute conditions reaches the perfect level. The experiment supports our hypothesis and further suggests that the underlying computation may be sensitive to the waveform difference.
ABSTRACT. Most motion-sensitivity studies are performed with observers sitting in darkened rooms, often constrained by a forehead support and/or a chinrest. This is fine if one is interested in the sensitivity of motion systems as such. However, organisms move, and it has become increasingly clear that movement alters perceptual performance, with vestibular input clearly playing a role (see Davidson et al., Nature Communications, 2024).
In the current experiment, observers perform a directional motion discrimination task (2AFC) while being moved by a motion platform (CKAS W25R 7DOF Motion System). To ensure that the vestibular system is active, the platform accelerates while the task is performed.
The stimulus is a moving Random Pixel Array (RPA). Random pixel noise is added whenever the observer indicates the correct motion direction 3 times in a row, and a single mistake decreases the noise level, yielding a threshold at 79% correct performance (LSNR method, see Frederiksen et al., Vision Research, 1993). The staircase ends after 12 reversals, and the threshold is calculated by averaging the last 6 reversals. The motion platform moves in the direction of the stimulus, in the opposite direction, or in a direction perpendicular to the stimulus motion.
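For readers unfamiliar with this kind of transformed staircase, the sketch below illustrates the 3-correct-up / 1-wrong-down noise rule described above, which converges near 79% correct. It is a simplified illustration, not the authors' code; the starting level, step size, and simulated observer are arbitrary assumptions.

```python
# Minimal sketch of the adaptive noise staircase described above (illustrative only).
# run_trial(noise) is assumed to return True for a correct response at that noise level.
import random

def staircase(run_trial, start_noise=0.1, step=0.05, n_reversals=12):
    noise = start_noise          # proportion of noise pixels in the RPA (assumed scale)
    correct_streak = 0
    last_direction = None        # +1 = noise was increased, -1 = noise was decreased
    reversals = []

    while len(reversals) < n_reversals:
        if run_trial(noise):
            correct_streak += 1
            if correct_streak < 3:
                continue                         # need 3 correct in a row to change the level
            correct_streak = 0
            direction = +1                       # 3 correct in a row -> add noise (harder)
            noise = min(1.0, noise + step)
        else:
            correct_streak = 0
            direction = -1                       # a single error -> remove noise (easier)
            noise = max(0.0, noise - step)

        if last_direction is not None and direction != last_direction:
            reversals.append(noise)              # the staircase changed direction here
        last_direction = direction

    return sum(reversals[-6:]) / 6               # mean of the last 6 reversals (~79% correct)

# Toy simulated observer whose accuracy falls as noise increases (purely for demonstration).
print(staircase(lambda n: random.random() < 1.0 - 0.6 * n))
```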
The results show, surprisingly, little or no effect of being moved by the motion platform on thresholds compared with the static condition, suggesting that the influence of vestibular input is minimal under passive movement conditions.
[28] Decoding Emotion from Motion: A Comparison of Spatio-Temporal Graph Convolutional Networks (ST-GCN) and Human Accuracy
ABSTRACT. Human expression of emotion through body movement varies across individuals and contexts, posing significant challenges for both humans and machines to interpret. Traditional computer vision research has focused predominantly on action recognition and facial expressions for emotion detection, often overlooking the nuanced interpretation of bodily movements. To bridge this gap, we introduce a machine learning algorithm that harnesses motion capture data to recognize human emotions. We tested this algorithm with the Diverse Bodily E-Motion Dataset (DBEMO), which comprises motion capture data from 34 Japanese professional performers. This dataset includes 13 emotions (7 basic emotions, 5 social emotions, and neutral) across three personalized scenarios, each with three levels of intensity (low, middle, high).
We applied a Spatio-Temporal Graph Convolutional Network (ST-GCN) to the emotion recognition task using data from DBEMO, achieving accuracies of 36% and 26% under two conditions: split by motion and split by performer, respectively. To benchmark the performance of our model, we assessed human accuracy using dynamic point-light animations derived from six performers’ data, focusing on the middle intensity of eight emotions (7 basic + neutral). Twenty-two participants engaged in an 8-alternative forced choice (8AFC) emotion judgment task. When trained solely on basic emotions for a fair comparison, the model attained an overall accuracy of 37.5%, closely following human performance at 41.6%. Both human judges and the ST-GCN model were particularly good at recognizing 'Neutral' and 'Anger', but less effective at identifying 'Contempt'.
This study evaluated the emotion recognition capabilities of ST-GCN on body expressions within diverse scenarios and directly compared the accuracies of human versus machine assessments. The findings indicate the nuanced capabilities of automated systems in approximating human emotion recognition patterns, providing valuable perspectives for future psychological studies on human perception and interaction in the realm of affective computing.
[30] Intuitive knowledge of physical movements compresses subjective duration
ABSTRACT. Humans possess an innate understanding of physical motion that is essential for navigating and interacting with their environment. While extensive studies have examined how this intuitive physics influences cognition and subsequent behaviors, the focus has primarily been on spatial aspects, such as spatial working memory and attention. The temporal dimension, however, remains underexplored. This study aims to bridge that gap by investigating how intuitive physics affects duration judgments. We conducted a duration-discrimination task using animations of ball collision as experimental stimuli. Participants were asked to judge which of two sequentially presented animations was shorter in duration. One animation involved balls moving according to Newtonian mechanics, while the other showed balls moving randomly, serving as the baseline. Our findings revealed a temporal compression effect for movements congruent with Newtonian mechanics (Experiment 1), even when low-level noise was introduced (Experiment 2). Further investigation (Experiments 3A and 3B) indicated that the reasonable directionality of the movements was a key factor in the time-compression effect. Specifically, this time-compression effect persisted even when only the directions after collision adhered to Newtonian mechanics, while the velocities were random. In contrast, when the directions after collision were unreasonable, but the velocities remained consistent with Newtonian mechanics, perceived durations were dilated instead. Moreover, as angular noise increased from low to high, the temporal distortion transitioned from compression to dilation, with a turning point occurring around 50% (Experiment 4). Overall, our study demonstrates that intuitive physical knowledge can elicit a time compression effect, highlighting a significant interaction between intuitive physics and temporal cognition.
ABSTRACT. Saccadic localisation of targets of various properties has been extensively studied, but rarely for texture-defined figures. Previously (Sidhu, Allen & Keeble, 2023), we investigated how information from a texture target is processed in order to provide a signal for eye movement control. These texture targets comprised line element arrays, with the orientations of the lines configured to form figure-ground percepts. However, psychophysical studies have shown that segregation can occur not only when there are abrupt changes in orientation across space, that is, a texture edge, but also with gradual changes. Thus, we created various orientation profile configurations to examine the role of edge profiles in driving eye movements. The orientation change of figure from background was either abrupt (Block), varied spatially according to a Cornsweet profile (Cornsweet), or varied spatially according to a logistic curve (Blur). We found that irrespective of the orientation profile used, the visual system effectively segregates a texture figure from the background to accurately plan a saccade to the target figure. Importantly, despite the different profiles having distinct local salient regions based on orientation contrast cues, saccadic landing position was largely unaffected by this and was instead driven by the representation of the whole target shape. Here, texture-defined stimuli with various orientation profiles were once again created to have rectangular-shaped target figures at eccentricities of 4.6° and 9.1°. In addition, we differentially adjusted the weight, i.e. orientation contrast, on either side of the target edge to investigate whether it would influence the saccadic landing position. We found that this manipulation did indeed affect the saccadic landing position, whereby saccades were influenced by the target edge with the greater weight. The results indicate that saccades are planned not only on the representation of the target shape, but also on the weight/mass of the target.
ABSTRACT. Statistical learning, a fundamental human capability, has been demonstrated to play a crucial role in various cognitive processes, such as memory, language, and attentional control. While statistical learning has been extensively investigated in neuroscience, particularly in relation to higher-order brain regions associated with learning, it remains unclear whether such learning extends to our oculomotor system, which is integral to our perceptual processing.
To explore this, we employed a visual search task while concurrently recording eye movements and electroencephalography (EEG) signals. During the task, participants searched for a unique shape (target) among other search elements. In some trials, a salient distractor (colored differently) was present. Two conditions were included: 1) in the learning condition, the salient distractor appeared more frequently at one location (high-probability location) than at other locations (low-probability locations); 2) in the baseline condition, the salient distractor appeared with equal probability across all locations.
The behavioral findings showed that when a distractor appeared at the high-probability location, response times were significantly reduced compared to other locations, suggesting a learned suppression effect due to statistical learning. Notably, in the learning condition, a reduction in microsaccade rate was observed before the search onset compared to the baseline condition, indicating the involvement of the oculomotor system in preparing for learned suppression. Additionally, microsaccades were more frequently directed towards high-probability locations, with their latency correlating with behavioral response times. Further analysis of the EEG signals revealed a reduction in alpha (8-12 Hz) power before the search onset in the learning condition, accompanied by reduced inter-trial alpha phase coherence persisting throughout the pre-search onset period. These findings suggest that statistical learning proactively modulates late attentional selection, involving both the oculomotor system and neural oscillations, indicating an intricate interplay between them to support learned suppression.
ABSTRACT. Attention experiments typically use endogenous cues to trigger top-down attention shifts. To investigate attention controlled by the self (self-initiated attention), we developed a new type of visual search experiment. Four discs were shown on a display, each containing a rapid serial visual presentation (RSVP) of letter stimuli. The task was to report the number of discs whose RSVP sequences contained targets (between 0 and 4). To perform the task, participants searched each disc to judge whether targets were present in its sequence. They shifted gaze to another disc either when they found a target at the current disc, or when they decided to move on, judging the disc to be one without a target. The attention shift was initiated by target detection (cued) in the first case and by the participants themselves (self-initiated) in the second. We simultaneously recorded EEG and gaze data while participants overtly shifted attention. We analyzed the EEG data time-locked to the onset of the saccade used to shift gaze, and compared amplitudes in several frequency bands among the different types of gaze shift. In this presentation, we focus on alpha power (8-13 Hz) for self-initiated and cued attention. The analysis revealed that differences in occipital and parietal alpha power between self-initiated and cued attention shifts were statistically significant when the average over all electrodes was analyzed. Since attention is known to reduce alpha power, this suggests that more attention is required for self-initiated than for cued attention shifts. The analysis also showed a significant difference in alpha power, averaged over all electrodes, between the two types of attention shift during the period from -0.4 s to 0 s before a saccade. These differences should reflect differences in the mechanisms underlying the two types of attention. This is a first step toward understanding the mechanisms involved in self-initiated visual attention.
ABSTRACT. Previous studies have identified various social and behavioural indicators of trust toward social media content. However, physiological measures of trust perception, such as eye movements, have not been fully understood. This study investigated whether trust can be predicted from eye movement parameters.
Thirty participants viewed 384 composite images of individuals reporting news items (3 emotional expressions * 4 attires * 4 information veracities * 2 genders * 4 ethnicities) in a random order, and rated the speaker and content on trustworthiness (from 0 to 100, least to most trustworthy). Their eye movements were also tracked.
Using analysis of variance to assess fixed effects in linear mixed models, we found that larger mean pupil size was significantly associated with lower trustworthiness ratings for speakers (F(1, 7181.70) = 14.06, p < 0.001), with a marginal effect on news content (F(1, 4477.60) = 3.59, p = 0.06, η² = 0.0008). Notably, higher saccade counts were significantly linked to higher trust in speakers (F(1, 11125.90) = 6.19, p = 0.01).
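As a rough illustration of this kind of analysis (not the authors' pipeline, which reports ANOVA F-tests on fixed effects), a linear mixed model relating trust ratings to eye-movement measures could be set up as follows; the file and column names are hypothetical placeholders.

```python
# Illustrative sketch only: a linear mixed model of the general form described above,
# fitted with statsmodels. Column names (trust, pupil, saccades, subject) are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ratings.csv")   # one row per trial: trust rating plus eye-movement measures

# Fixed effects of mean pupil size and saccade count on speaker trust,
# with a random intercept per participant.
model = smf.mixedlm("trust ~ pupil + saccades", data=df, groups=df["subject"])
result = model.fit()
print(result.summary())           # tests on the fixed-effect coefficients
```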
Consistent with prior research, individuals who appeared happy (F(2, 11194.40) = 1094.22, p < 0.001) and wore uniforms (F(3, 11196.80) = 180.26, p < 0.001) were perceived as more trustworthy. Similarly, emotion (F(2, 8363.10) = 52.08, p < 0.001) and attire of the speaker (F(3, 8364.7) = 27.8766, p < 0.001) significantly influenced news content trustworthiness. Interestingly, speaker ethnicity only impacted speaker trustworthiness (F(3, 11194.20) = 22.41, p < 0.001).
We found that pupil size and saccade count can predict trust. An enlarged pupil size reflects increased cognitive processing or arousal in response to untrustworthy individuals or information. A higher saccade count suggests the search and use of additional information to foster trust in speakers. Our findings provide new insights into trust perception and fake news detection.
The human face has special significance as a visual cue, helping us to track the emotional reactions and attentional focus of others, shaping social trait impressions (e.g., attractiveness and trustworthiness), and helping us to identify those people familiar to us. While face processing has received much attention in vision science, the mechanisms that shape the everyday experience of faces are still only partially understood. What are the core dimensions of facial information represented in the visual system? How is this information extracted from the visual signals relayed to the brain from the retina? How do implicit processes, such as physiological responses or evolutionary pressures, align with our perceptual experience of faces? This symposium showcases recent discoveries and novel approaches to understanding the visual processing of faces in the human brain. Talks range from the use of intracranial neural recordings to uncover cortical and subcortical responses underlying face perception, data-driven approaches to defining the social dimensions observers perceive in faces, characterisation of the link between face features, perception and physiology using psychophysics and computational models, and analysis of the biological and evolutionary factors that shape face impressions. Together this provides a snapshot of exciting developments occurring at a key interface between vision science and social behaviour.
Organizer: Colin Palmer (National University of Singapore)
ABSTRACT. Understanding the mechanisms of face perception is crucial for elucidating how the human brain processes complex social signals. Utilizing intracranial electroencephalogram (iEEG) recordings in the human brain, this talk highlights novel findings regarding subcortical and cortical mechanisms of face perception.
Our first study investigates whether the amygdala can rapidly encode invisible fearful faces. Using iEEG recordings, we measured responses in the human amygdala to faces with fearful, happy, and neutral emotions rendered invisible with backward masking. We discovered a short-latency (88 ms) intracranial event-related potential (iERP) in the amygdala, selectively evoked by invisible fearful faces compared to happy or neutral ones. Specifically, this rapid iERP showed a preference for the low spatial frequency (LSF) component of fearful faces. These findings provide converging evidence for the existence of a subcortical pathway dedicated to rapid fear detection in the amygdala.
Our second study focuses on facial recognition beyond human identity, encompassing human subjects and deep convolutional neural networks (DCNNs), exploring how both entities process human and non-human faces. We identified neuronal populations showing preference for human or animal faces, respectively. These two neuronal populations exhibited distinct representational geometries in response to human and animal faces. Interestingly, the representational geometries of human- and animal-face-selective neuronal populations were both more closely aligned with visual DCNN models than with semantic models. The analysis also showed that as the DCNN layers deepen, they become more closely aligned with face-selective neuronal populations, highlighting the role of feedforward processing in forming the differentiation between human and animal face perception. This finding supports a two-pathway hypothesis for face processing in the human visual cortex and DCNNs, suggesting a convergent evolutionary mechanism between the human brain and artificial neural networks.
These studies show distinct pathways for processing emotional and categorical facial information, advancing our understanding of neural mechanisms in face perception.
ABSTRACT. Our ability to recognise facial expressions is the bedrock of human social communication. However, most of what we understand about how expression recognition is accomplished is based on research that has employed staged facial expressions as stimuli, potentially disconnecting facial morphology from genuine emotion and circumstance. Therefore, a reliance on staged stimuli might be obscuring our understanding of how faces are perceived and recognised during everyday life. In this study, our goal was to identify the core dimensions underlying the mental representation of expressive facial stimuli using a data-driven approach. In two behavioural experiments (Experiment 1, N = 940; Experiment 2, N = 489), we used an odd-one-out task to measure perceived dissimilarity among two sets of faces: 900 highly-variable, naturalistic, expressive stimuli from the Wild Faces Database (Long, Peluso, et al., 2023 Sci Reports, 13: 5383) and 670 highly-controlled, staged stimuli from the NimStim database (Tottenham, Tanaka, et al., 2009 Psychiatry Res, 168: 3). Using Representational Similarity Analysis, we mapped the representation of the faces in the Wild and NimStim databases, separately, and compared these representations to behavioral and computational models. We also employed the state-of-the-art VICE algorithm (Muttenthaler, Zheng, et al., 2022 Adv Neural Inf Process Syst) to uncover the latent dimensions that reliably predicted behaviour towards both sets of faces. Collectively, these results indicate that the representation of the Wild Faces was best characterised by perceived social categories, such as gender, and emotional valence. By comparison, facial expression category explained more of the perceived dissimilarity among the NimStim faces than the Wild Faces. These findings underscore the importance of stimulus selection in visual cognition and suggest that, under naturalistic circumstances, humans spontaneously use information about both social category and expression to evaluate faces.
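As a schematic of the representational similarity logic used in comparisons like the one above (not the authors' code), one typically correlates the off-diagonal entries of two dissimilarity matrices; the matrix names below are placeholders.

```python
# Schematic of a representational similarity comparison (illustrative, not the authors' pipeline):
# rank-correlate the upper triangles of a behavioural and a model dissimilarity matrix.
import numpy as np
from scipy.stats import spearmanr

def rsa_correlation(rdm_a, rdm_b):
    """rdm_a, rdm_b: square, symmetric n_items x n_items dissimilarity matrices."""
    iu = np.triu_indices_from(rdm_a, k=1)        # off-diagonal upper triangle
    rho, p = spearmanr(rdm_a[iu], rdm_b[iu])     # rank correlation of dissimilarities
    return rho, p

# Hypothetical usage, assuming rdm_behaviour and rdm_model have been computed elsewhere:
# rho, p = rsa_correlation(rdm_behaviour, rdm_model)
```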
ABSTRACT. Faces convey a wealth of social information that guides our impressions and interactions. While extensive research has examined the mechanisms of explicit social judgments, the implicit encoding of social traits remains underexplored. This study investigates the spontaneous encoding of social traits through eye movements, pupillary responses, and neuronal activity in the amygdala and hippocampus. Participants viewed a set of faces varying in their facial features while their ocular and neuronal responses were recorded. Computational models were built to quantify how variations in facial features influenced these physiological measures. To reveal what kind of social information is implicitly encoded in these physiological responses, these models were then correlated with previously validated models that represent the facial information used for social trait judgments. In a preliminary study (n=22), eye movement and pupillary response models revealed that pupil dilation encoded the valence and power dimensions of social judgments, while fixation durations were primarily predicted by the power dimension. Study 1 (n=68) replicated these findings in a larger sample and found heightened sensitivity to these dimensions in individuals with higher levels of social anxiety. Study 2 (n=6) demonstrated that amygdala neurons primarily encoded valence in face evaluation (general positivity and negativity), while hippocampal neurons encoded both valence and power dimensions. These findings shed light on the physiological underpinnings of face-based social perceptions, offering novel insights into the implicit processes shaping social impressions. The multi-modal approach, combining eye tracking, pupillometry, and single-neuron recordings with computational modeling, reveals how social traits are spontaneously inferred from facial features and encoded in our physiological responses.
The evolutionary basis of preferences for male facial masculinity
ABSTRACT. Why do we find certain facial features attractive? Given that our facial preferences are linked to who we choose as romantic partners, theorists argue that there may be a biological basis to our preferences. It is suggested that we have evolved to be attracted to certain features because they provide cues to a high-quality mate. But what are these evolved cues, and what qualities do they actually signal? I will address these questions, focusing on a facial cue that has received much attention in the evolutionary literature – facial masculinity in men. I will describe a series of studies linking facial masculinity to physical health, lifestyle habits, and social behaviours in men. Together, the results suggest that facial masculinity in men may signal some aspects of good health as well as their tendency to be sexually unfaithful. Therefore, selecting a partner based on facial masculinity may involve balancing the benefits of having a healthy partner vs the costs associated with having an unfaithful one.
ABSTRACT. A subtle yet ubiquitous feature of the human face is the presence of “eye glint” – specular reflections from the surface of the eye that vary with the position of light sources in the surrounding environment. The present study tested whether eye glint plays a role in face perception, particularly in how an observer perceives the gaze direction of a person they are viewing. Participants were shown life-sized images of faces on a computer screen that varied in eye direction, head rotation, and illumination. Images were produced using 3D-scanned models of human faces and physically-based rendering, allowing the presence of eye glint to be manipulated while controlling all other image features. Participants made judgements about when they shared eye contact with the face or reported perceived gaze direction using a matching method. In Experiment 1, the presence of eye glint had little influence on the accuracy or precision of perceived gaze direction when faces were viewed under simplified conditions, namely with the face oriented directly towards the viewer and centrally illuminated. In Experiment 2, a repulsive bias in the perception of gaze direction caused by changes in head orientation was slightly reduced when eye glint was present compared to when it was absent. In Experiment 3, biases in the perception of gaze direction caused by non-central illumination of the face were significantly reduced when eye glint was present. Together this suggests that eye glint can help an observer to maintain a robust sense of another person’s gaze direction despite variability in the appearance of the eye region caused by changes in head orientation and illumination direction. This may occur partly because eye glint provides a cue to the direction of illumination. More broadly, these findings highlight how very subtle visual features can shape the information gleaned from a face.
ABSTRACT. In daily life, objects form organized visual scenes through mutual associations, aiding in faster visual search. For instance, books in a library are organized using fixed codes, facilitating quicker retrieval. Studies on contextual effects suggest that attention can utilize background information for rapid target localization. However, direct target localization remains challenging even in structured scenes; individuals often prioritize non-targets despite the scene's regularity.
This study examines how regularity in visual scenes guides attention during search, using various basic visual features. Three experiments were conducted, each focusing on arrow, Landolt ring, or size stimuli within regular and random scenes. Results showed significantly shorter search times and a logarithmic relationship between search time and search space in regular scenes, contrasting with the linear patterns found in random scenes. Eye-tracking and computational modeling revealed that attention prioritizes the positions most likely to reduce the search space (i.e., those with the highest expected information gain) when the search space is large, and the targets with the highest probability as the search space narrows.
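The expected-information-gain idea mentioned above can be made concrete with a toy example. This is an illustration under a simplified observation model, not the authors' computational model: the value of fixating a location is taken to be the expected reduction in uncertainty about where the target is, assuming a look reveals with certainty whether that location holds the target.

```python
# Toy illustration of expected information gain for candidate fixation locations
# (illustrative only; the observation model and probabilities are assumptions).
import math

def entropy(p):
    return -sum(q * math.log2(q) for q in p if q > 0)

def expected_info_gain(p_target, i):
    """p_target[i] = current probability that location i contains the target."""
    h_before = entropy(p_target)
    p_hit = p_target[i]
    # If location i turns out to be the target, uncertainty drops to zero;
    # otherwise the remaining probability is renormalised over the other locations.
    rest = [q / (1 - p_hit) for j, q in enumerate(p_target) if j != i and p_hit < 1]
    return h_before - (1 - p_hit) * entropy(rest)

beliefs = [0.4, 0.3, 0.2, 0.1]                      # hypothetical target probabilities
gains = [expected_info_gain(beliefs, i) for i in range(len(beliefs))]
print(gains)   # such a model would fixate the location with the largest gain
```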
The second part expanded the scenes to combinations of two features changing regularly along the horizontal or vertical dimension, requiring participants to search for specific sizes and orientations. Compared to fully random scenes, scenes regular in both dimensions significantly reduced search times, again following logarithmic patterns, and similar attention mechanisms to those in the first part were observed. Disrupting regularity in one dimension affected search times, while the other dimension's regularity still aided attention guidance, resulting in shorter search times than in fully random conditions.
These findings show how the visual system leverages scene regularities during search, revealing an attention mechanism focused on reducing the search space. The stability of this mechanism across various feature combinations in regular scenes highlights its role in promoting visual search efficiency. Further exploration is needed to understand its application in real visual scenes and how the visual system flexibly switches attention in response to dynamic changes.
Hierarchical and Cascaded Acquisition of Spatial Contextual Cueing
ABSTRACT. In our daily lives, we constantly carry out visual searches to locate task-relevant information while ignoring irrelevant information, such as when searching for a mobile phone in a messy living room. During such daily visual search tasks, the target typically appears within an invariant spatial context of other visual objects. Studies have shown that such invariant spatial contexts are learned and facilitate visual search performance. This phenomenon is known as spatial contextual cueing. In this talk, I will discuss two theories that can account for the acquisition of spatial contextual cueing, namely, the instance theory of automatization and the reverse hierarchy theory of visual perceptual learning. In addition, I will discuss why and how spatial contextual cueing is acquired in a hierarchical and cascaded manner, as predicted by reverse hierarchy theory.
ABSTRACT. In visual search asymmetry, certain targets are easier to detect than others with opposite or different features. For example, it is easier to find a target with reversed vertical shading than one with reversed horizontal shading among uniformly shaded stimuli. Here, we show that this asymmetry persists even in inattentional blindness (IAB). During the experiment, subjects viewed naturalistic simulations of moving objects that were vertically or horizontally shaded. They counted the number of times a black ball crossed the midline and indicated when they saw an unexpected target. The unexpected target had a reversed shading gradient and was introduced at random times in 20% of the trials. Our findings showed that almost twice as many vertically shaded targets were detected compared to horizontally shaded targets, and this finding was not due to target visibility, false target detection rate or average counting accuracy. To elucidate these results, we propose a biologically inspired, computational IAB model based on predictive coding. Trained in an unsupervised manner using naturalistic video sequences, our model predicts upcoming video frames by minimizing expected errors inherited from preceding predictions. Remarkably, when we tested our model on the same videos used in our psychophysics experiments, we observed a more pronounced variance in predictive errors for horizontally shaded targets than for vertically shaded targets. Together, these findings suggest that the asymmetry in IAB is driven by top-down expectation biases derived from prior visual experience in humans and machines.
ABSTRACT. In our daily interactions with a complex environment, we receive a wealth of visual information but cannot process all of it at once. Salient distractors often capture our attention, disrupting ongoing tasks. Recent studies suggest that, through statistical learning, prior experience regarding distractor locations can reduce distraction by suppressing the corresponding locations. However, the proactive neural mechanisms, especially the involvement of alpha oscillations, supporting this learned suppression remain unclear.
To systematically examine the neural evidence for learned suppression, we recorded electroencephalogram (EEG) signals from humans performing the additional singleton task, in which participants were required to search for a unique shape (target) while ignoring a salient distractor. Notably, in the learning condition the salient distractor was presented more often in one specific location (high-probability location) than in the other three locations (low-probability locations), introducing statistical learning that should result in suppression of the frequent distractor location. We employed frequency tagging in the EEG recordings to separately assess the tagging response to different distractor locations, and measured alpha activity simultaneously. If learned suppression operates proactively, we would expect the corresponding neural activity to emerge before stimulus onset.
The results showed significantly different tagging responses to the high-probability location and the low-probability locations, and a general decrease in alpha (8-12 Hz) power before the search onset, supporting a proactive mechanism of learned suppression. Notably, changes in alpha oscillations preceded the tagging responses, spanning frontal, parietal, and occipital regions, and influenced subsequent neural modulation in discriminating between high- and low-probability locations. These findings highlight the crucial role of pre-search alpha activity in the higher-level control of learned suppression, and its connection to proactive attentional selection.
ABSTRACT. Visual search is a crucial cognitive process in everyday activities, enabling individuals to distinguish specific targets from surrounding distractors. The target salience significantly impacts search efficiency, with more salient targets typically identified more quickly. Previous studies have shown that highly salient targets can elicit responses within about 400 ms after the presentation of the search array. During this short period, our brain undergoes considerable neurophysiological activity; yet, little is known about the specific neural responses occurring within this period.
To explore this, we recorded electroencephalography (EEG) signals while participants engaged in a visual search task, in which they identified a tilted bar (tilts of 3° or 25°) among distractors while maintaining fixation on a central point.
Behavioral results demonstrated a transition from inefficient to efficient search as target salience increased, evident in faster response times and reduced errors. With a multivariate analysis of the EEG data over parietal and occipital electrodes, we successfully tracked target processing across the different salience levels. The results revealed enhanced target selectivity within the low-frequency band (1-13 Hz) for the most salient target (25° tilt), bolstered by concurrent high-frequency (HF, 20-60 Hz) activity aiding target detection. Moreover, neural synchronization increased with target salience, evidenced by maximal inter-trial phase coherence (ITPC) within 1-5 Hz for the most salient target. We also observed enhanced coupling between HF activity and the ongoing low-frequency phase (1-15 Hz) preceding the response, for the most salient target relative to the other targets. Lastly, as target salience increased, the N2pc component, a hallmark of attentional processing, exhibited shorter latencies and increased amplitudes, localized to the precuneus and superior temporal gyrus (STG). These findings collectively underscore the role of target salience in modulating visual search through the coordination of different brain activities, shaping a network of neural responses that supports the processing of salient signals.
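For reference, inter-trial phase coherence is the length of the mean resultant vector of single-trial phase angles at a given frequency and time point. A minimal sketch follows, assuming the phases have already been extracted by a time-frequency decomposition (e.g. wavelet convolution, not shown); the simulated data are purely illustrative.

```python
# Minimal sketch of inter-trial phase coherence (ITPC): the length of the mean
# resultant vector of single-trial phase angles at one frequency and time point.
import numpy as np

def itpc(phases):
    """phases: array of shape (n_trials,), in radians."""
    return np.abs(np.mean(np.exp(1j * phases)))

rng = np.random.default_rng(1)
aligned = rng.normal(0.0, 0.3, size=200)        # phases clustered across trials -> ITPC near 1
scattered = rng.uniform(-np.pi, np.pi, 200)     # uniform phases across trials   -> ITPC near 0
print(itpc(aligned), itpc(scattered))
```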
ABSTRACT. The “X’ (extreme) periphery” can be defined as visual eccentricities larger than 60 deg, up to the limit of the visual field (approximately 90 deg). It should be a central focus for vision scientists interested in situations where the brain must resolve the maximum degree of ambiguity in its visual inputs. Ironically, very little has been studied or is known about it. We have therefore conducted a series of experimental studies over the past decade or so to test our “brain compensation hypothesis”, which are reviewed in this talk under three main themes.
1) Auditory-visual integration. A flicker appears faster in the periphery than a flicker of the same frequency (say 5 Hz) in the fovea, but the two can be entrained by synchronous sounds. An auditory primer does not always facilitate visual detection, but it does so when the motion direction is consistent (visual motion paired with an auditory Doppler stimulus) or at a particular frequency (e.g., 300 Hz).
2) Color/location cluttering. Colors and locations are often misperceived in the X’periphery, especially when crowded. Not only are there color and relative-location errors; new phenomenological effects are also reported in the X’periphery, such as flashing, dynamic changes of color, filling-in, etc. A cortical-magnification-like factor tends to be found when the critical size for such illusions is measured at each eccentricity.
3) Action capture. When observers move their own hand behind the display, which can show either dynamic random noise or a low-frequency (say 2-3 Hz) flicker, the visual stimulus is often “captured” by the hand, thus appearing to move along with it. One’s own action tends to yield a somewhat stronger effect than observing another person’s, but the latter still yields a substantial capture effect.
In short, effects that have been found in the periphery are enhanced or qualitatively extended (i.e., new effects are found) in the X’periphery. A large part of the findings can be accounted for by cortical magnification, in that the same size-dependent perceptual processing operates across eccentricities (including the X’periphery) and only the size scaling differs. Related to this, we still need to make two points. First, even if it is just cortical magnification, the psychophysical findings in the X’periphery would have wide and profound impacts in real-world situations, including driving, VR and display technology, sports, entertainment, etc. Second, and more significantly, there is something more noticeable on top of what would be expected from cortical magnification, especially related to the clarity/stability of, and confidence in, the percept in the X’periphery. At a glance, it may be puzzling why an invalid (illusory) percept held with confidence should be adaptive or beneficial in any fashion. From a biological and evolutionary perspective, however, a quick “false alarm” can often be better than a slow “hit” or no decision, especially when it comes to risk detection.
The “gist” of cancer and other adventures in “use-inspired basic research”
ABSTRACT. We perform visual searches all the time (Where is the cat? Where is the light switch?). Some of those search tasks are more consequential than others (Is there a tumor in this mammogram? Is there a weapon in that carry-on bag at the airport?). Those socially important search tasks raise issues that are worth studying. The result is “use-inspired basic research” that teaches us new facts about visual attention in general and may help improve performance in specific real-world tasks. We will consider three cases. First, radiologists sometimes get a ‘feeling’ that there is something wrong in an image before they actually locate any problem. Is this feeling something real? Is there really a ‘gist’ of cancer that an expert can assess in the ‘blink of an eye’? It turns out that there is a global signal that can be detected and that might be clinically useful. Second, many radiologic exams generate 3D volumes of image data (e.g. the stack of images generated for a lung CT exam). How do radiologists search through such stimuli? Search through a stack of images is different than searching a 2D image or looking around the 3D world. By using eye tracking methods, we can see radiologists’ different strategies and ask if one strategy is better than another. Finally, tasks like airport security screening and breast cancer screening are searches for targets that are very rare (“low prevalence”). After all, you don’t want to be at the airport if 50% of bags contain weapons! It turns out that humans are not built for low prevalence search. Low prevalence induces us to miss targets. Why? And what can we do about this? Will artificial intelligence save us?