APCV 2015: 11TH ASIA-PACIFIC CONFERENCE ON VISION
PROGRAM FOR FRIDAY, JULY 10TH

09:15-10:15 Session 1: Keynote: Computational

Keynote in Computational, given by Dr. Tomaso Poggio.

Location: FunctionHall
09:15
Visual cortex and deep convolutional architectures: towards a theory
SPEAKER: Tomaso Poggio
10:15-13:00 Session 2: Poster: Spatial vision, Color & Light, Multisensory processing

Poster Session in Spatial vision, Color & Light, Multisensory processing, and Computer vision.

Location: SemRMs
10:15
Language background changes audio-visual mapping of shapes-to-sounds
SPEAKER: unknown

ABSTRACT. Previous research has shown that visual stimuli with high spatial frequency are associated with audio stimuli with high temporal frequency (Evans and Treisman, 2010). These effects have not been tested for sounds from natural languages, despite some estimates suggesting that as many as 70% of the world’s languages employ pitch as a contrastive speech feature (Yip, 2002). We recorded linguistic sounds representing /i/ (the vowel in ‘feet’) and /u/ (the vowel in ‘shoe’) in the four lexical tones of Mandarin Chinese, and tested the mapping to objects with different spatial frequency characteristics, in two groups of participants: naïve listeners, who were unfamiliar with the tonal language, and speakers of Mandarin Chinese (bilinguals dominant in Chinese). For naïve listeners, we replicated the association between spatial frequency and pitch: participants chose sounds with higher-pitched onsets for higher spatial frequency images (p=.001), whereas they chose sounds with lower-pitched onsets for lower spatial frequency images (p=.004). By contrast, Chinese speakers responded differently: they matched sounds with abruptly changing pitch (Tone 4) with high spatial frequency images (p=.004) and sounds with stable pitch (Tone 1) with low spatial frequency images (p=.004). Thus, while the association between spatial and temporal frequency in the naïve group depends more on absolute temporal frequency (as predicted by theories of spatio-temporal coding), the association between spatial and temporal frequency in the Mandarin Chinese group is determined by the dynamic pitch change of the tones – an important feature of Mandarin Chinese tone discrimination. The findings align with the Pitch Change hypothesis previously described by Shang & Styles (2014), but only for people whose dominant language involves monitoring the dynamics of pitch.

10:15
Was Colavita effect more likely a perceptual phenomenon?
SPEAKER: unknown

ABSTRACT. The Colavita effect is a robust visual-dominance phenomenon in which participants tend to neglect the auditory stimulus when visual and auditory stimuli are presented together. The cause of this effect is still under debate: one account suggests attentional competition, while another suggests a perceptual phenomenon. In this study, we varied an early visual processing feature, visual target polarity, to address this controversy. Participants were required to complete 450 trials: 150 visual-stimulus-only (uni-modal visual) trials, 150 auditory-stimulus-only (uni-modal auditory) trials, and 150 bimodal trials in which the visual and auditory stimuli were presented simultaneously. Stimulus duration was 50 ms. Participants were required to respond as fast and as accurately as possible, pressing one key for uni-modal visual stimuli, another for uni-modal auditory stimuli, and both keys for bimodal stimuli. The key assignments were counterbalanced between participants. Accuracy and response time were recorded. The visual stimulus was a circular disk and the auditory stimulus was a 1000 Hz pure tone. In Experiment 1, both visual and auditory stimuli were at a suprathreshold level and the visual background was black. We replicated the Colavita effect (N=9, p-value=0.0198): in bimodal error trials, participants reported the visual stimulus alone significantly more frequently than the auditory stimulus alone. In Experiment 2, both visual and auditory stimuli were at each participant’s detection threshold level. The visual target was either brighter or darker than the background for different sets of participants. With the brighter target, visual dominance appeared in bimodal error trials (N=8, p-value=0.0258), whereas the result was not significant with the darker target (N=6, p-value=0.0919). These results indicate that the Colavita effect can be affected by perceptual factors such as visual target polarity, but only in strength, not in direction.

Acknowledgement: This work was supported by NSC-101-2401-H-006-003-MY2 and NSC 102-2420-H-006 -010 -MY2 to PCH.

10:15
Explaining the perceptual fluorescence with two approaches: optimal color and spatial luminance distribution
SPEAKER: unknown

ABSTRACT. Fluorescence is the emission of light by a material that has absorbed light or other electromagnetic radiation; it is a form of luminescence. However, the perception of fluorescence occurs not only for fluorescent materials but also for non-fluorescent materials. This study investigates the perception of fluorescence in terms of the optimal color hypothesis and spatial luminance distribution. Our idea is that a non-fluorescent material is perceived as light-emitting when its luminance exceeds the limit of the optimal color calculated under a broad spectrum such as daylight. To test this hypothesis, we performed a psychophysical experiment measuring fluorescent color judgments of non-fluorescent materials under 27 achromatic illuminations with the same chromaticity but different spectra. Response probabilities for fluorescent perception were fitted by three models: (1) simple stimulus luminance, (2) optimal colors calculated from the actual illuminant spectra, and (3) optimal colors calculated from a pre-assumed daylight. The coefficient of determination of model (3), pre-assumed daylight, was significantly higher than those of the other models. Furthermore, fluorescent surfaces of 3D objects may show less shading than reflective 3D objects. This study therefore also proposes a spatial-luminance-distribution hypothesis for fluorescent 3D objects: an object with less shading is more readily perceived as fluorescent. Psychophysical experiments were performed with CG stimuli whose shading was adjusted to be stronger or weaker, using two observer tasks: a 2AFC task (which object looks more fluorescent) and a scaling task (how fluorescent the object looks). The results of both experiments supported the proposed hypothesis, suggesting that CG objects with less shading are more readily perceived as fluorescent.

10:15
Images on a transparent display with a uniform gray background evaluated by visibility matching and degradation category rating
SPEAKER: unknown

ABSTRACT. Several types of transparent displays have been developed for use in fields such as augmented reality and mobile/wearable computing. The quality of transparent displays depends significantly on the viewing environment. In particular, displayed images overlap with the background scene behind the display, degrading visibility. There is therefore a need to develop indices describing the image quality of transparent displays. In this study, we conducted two experiments to evaluate images degraded by an image transmitted from the background. In the first experiment, the image was evaluated using a visibility matching procedure in which an observer adjusted the luminance and contrast of the original image to replicate the visibility of the degraded image. Superimposing the background image on the displayed image simulated the image of a transparent display with a transmitted background. Two images, N6 (harbor scene) and S3 (cat painting), were selected as original display images from the standard image database of JIS X 9204. Background images were three different shades of uniform gray. The superimposed test image and the matching image for luminance-contrast adjustment were squares subtending 24 degrees of visual angle, and were presented adjacent to each other on a non-glare IPS-LCD monitor. An observer, at a distance of 60 cm, compared the two images and performed the matching in a dark booth. In the second experiment, the quality of the images was evaluated using the degradation category rating (DCR). Five male observers participated in both experiments. In the visibility matching, all observers set higher luminance and lower contrast for backgrounds with higher luminance. The observers’ settings agreed well with those estimated from the superimposed images. In DCR, the observers’ ratings declined monotonically as the image contrast decreased. On the other hand, ratings declined when the image luminance was either increased or decreased.

10:15
Implicit learning of association between feedback and action in visual search
SPEAKER: unknown

ABSTRACT. Animals, including humans, can learn appropriate behavior from positive or negative feedback on their actions. This study investigated whether participants learn appropriate behavior without knowledge of the relationship between their actions and the feedback. Participants performed a visual search task (finding a ‘T’ among ‘L’s) with six displays surrounding them, without restriction of body movement. An alert sound and a penalty were delivered contingent on head movement: the alert sound was played if head movement was less than 45 degrees during the preceding second, and the trial was terminated if head movement remained smaller than 45 degrees during the second after the alert sound. No instruction about the sound or the penalty was given during the experiment. The results showed that the number of penalties decreased as the experiment progressed. We analyzed the head movements associated with the feedback and penalty, and found that the mean velocity of head movements after the alert sound increased as participants experienced more alert sounds. These results imply that participants learned the relationship between head movement and feedback, and changed their behavior so as to avoid the penalty. We also asked participants for their introspections at the end of the experiment. None of them had realized the criteria for the feedback and penalty, nor had they changed their behavior consciously, indicating that the learning occurred implicitly. These results suggest that, at least in visual search, humans can learn the association between feedback and their own actions, and change their behavior, even without awareness of the relationship between them.

10:15
Degradation of display image due to glare of ambient light evaluated by using a visibility matching technique and analysis of their spatial frequency characteristics
SPEAKER: unknown

ABSTRACT. Glare from ambient light degrades the visibility of a display and can be distracting to viewers. Optical properties such as reflectance and haze factor are commonly used to evaluate the quality of any anti-glare treatment on the display surface. In practical situations, however, these optical properties often disagree with actual appearance or subjective evaluation. It is therefore necessary to develop new measures that are closely correlated with appearance. In this study, using a visibility matching technique, we evaluated the quality of display images on which a glare reflection image was superimposed. In the matching procedure, an observer adjusted the luminance and contrast of an original display image to replicate the visibility of the degraded image. The display image with glare was simulated by superimposing a reflection image on a display image, and was then converted to a gray-scale image. Two images, i.e., the test image with glare and a matching image for luminance-contrast manipulations, were presented adjacent to each other on a non-glare IPS-LCD monitor. Both images were squares and subtended a visual angle of 24 degrees to the observer, at a distance of 60 cm. Five male observers carried out the visibility matching in a dark booth. The results showed that all observers set higher luminance and lower contrast to replicate the visibility of images degraded by reflection images with higher luminance. However, the observers’ settings were substantially lower, in both luminance and contrast, than those calculated from the superimposed images. This indicates that observers had a tendency to perceptually segregate the appearance of the original display image from the reflection image. Finally, the spatial frequency characteristics of the images were analyzed using the 2D Fourier transform to derive a prediction function for the luminance and contrast settings in the visibility matching.
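
As a concrete illustration of the kind of spatial-frequency analysis mentioned above, a radially averaged 2D amplitude spectrum is one common summary. The sketch below (Python/NumPy) is a minimal version of ours, not the authors' code; the mean-subtraction and binning choices are assumptions.

```python
import numpy as np

def radial_amplitude_spectrum(img, n_bins=64):
    """Radially averaged 2D amplitude spectrum: one common way to
    summarize an image's spatial-frequency content (the abstract does
    not specify the exact summary used)."""
    img = np.asarray(img, float)
    amp = np.abs(np.fft.fftshift(np.fft.fft2(img - img.mean())))
    ny, nx = img.shape
    y, x = np.indices(img.shape)
    r = np.hypot(x - nx / 2, y - ny / 2)          # radial frequency index
    bins = np.linspace(0, r.max(), n_bins + 1)
    idx = np.digitize(r.ravel(), bins) - 1
    spectrum = np.bincount(idx, weights=amp.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return spectrum[:n_bins] / np.maximum(counts[:n_bins], 1)
```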

10:15
Role of cardinal orientations in perceived upright of natural images
SPEAKER: unknown

ABSTRACT. The perception of up in a scene (scene upright) depends on its orientation content (Haji-Khamneh and Harris, 2010), with observers’ uncertainty reduced for manmade scenes, which are believed to be richer in (cardinal) orientation content. Here we examine how alterations of specific orientations within a scene affect perceived upright, by transforming a manmade outdoor image (a photo of a house taken in the fronto-parallel plane at eye level). Aerial and side views of the image were created using 2D projective transformations that move the vanishing points of either the vertical or horizontal directions by different amounts. These non-uniform transformations were parameterized by their maximum effect on local image orientation (0°, 5°, 15°). Using a 2-IFC task with the method of constant stimuli, observers (n=5) judged which of two centrally presented images (ISI=1000 ms) appeared more upright. Within a block, images were of the same viewpoint and rotated in the fronto-parallel plane (-15°, -12°, -9°, -6°, -3°, 0°, 3°, 6°, 9°, 12°, 15°). Measures of perceptual bias were obtained for each viewpoint. Compared to the undistorted (frontal) image, significant shifts in perceived upright were observed for both horizontal (p<0.001) and vertical distortions (p<0.001), although not in a consistent direction. We found that the size of the shifts in perceived upright (irrespective of the direction of shift) was significantly larger (p<0.05) for horizontal distortions (side views, mean=1.85°, SD=1.23°) than for vertical distortions (mean=0.97°, SD=0.60°). These results reveal the importance of vertical and horizontal structure in the perception of upright, and suggest that when these orientations are altered (through image transformation) observers combine them into an average estimate (of horizontal or vertical) that is subsequently used to estimate upright. The inconsistency in the direction of shifts suggests idiosyncratic weighting of different edges of the transformed image in deriving this average estimate.

10:15
Preverbal infants’ sound-shape association with single-syllables
SPEAKER: unknown

ABSTRACT. Sound-shape correspondence (e.g. ‘kiki’ corresponds to spiky shapes and ‘bubu’ to curvy shapes) has been demonstrated in adults and infants with two-syllable pseudo-words (e.g. ‘kiki’) as auditory stimuli. It is unclear whether this correspondence operates at a word-object level or at a perceptual feature level. We investigated this question by testing single syllables (e.g. ‘ki’) and shapes with 8-10-month-old infants in Hong Kong.

We adopted 12 syllables (‘di’, ‘de’, ‘gi’, ‘ge’, ‘ki’, ‘ke’, ‘pi’, ‘pe’, ‘mo’, ‘mu’, ‘lo’, ‘lu’) and 8 shapes (4 curvy and 4 spiky) from previous studies (Ozturk et al., 2012; Peña et al., 2011; Kirkland & Nielsen, 2009) and had 30 Chinese-speaking (Mandarin and Cantonese) adults validate the best-matched shape (spiky or curvy) for each of the 12 selected syllables. Only syllables that received at least 70% adult agreement were used in the subsequent infant testing; 4 sharp sounds (‘di’, ‘gi’, ‘ki’, ‘pi’) and 4 round sounds (‘lo’, ‘lu’, ‘mo’, ‘mu’) passed this criterion.

For infant testing, we displayed a spiky and a curvy shape side by side on the screen while a syllable selected from the adult ratings was played. Infants’ gazes were recorded by a Tobii T120 eye-tracker. We found that infants looked significantly longer at congruent pairs (spiky shape with sharp sound) for ‘pi’, ‘di’ and ‘gi’. They also looked longer at the incongruent pairs (according to the adult ratings) of spiky shape with round sound ‘mu’ and round shape with sharp sound ‘ki’. However, there was no significant difference in looking time between shapes when the syllables ‘lo’, ‘lu’ and ‘mo’ were played.

These results suggest that the lip shapes used to produce single-syllable stimuli may be critical for Chinese infants’ sound-shape associations. Our tested infants reacted less to combinations involving consonants that can be produced without a salient (e.g. protruding) lip shape (/l/, /m/) to trigger a sound-shape association. Future studies should take note of this when selecting appropriate stimuli.

10:15
The effect of scene inversion on egocentric direction and position perceptions
SPEAKER: unknown

ABSTRACT. Driving requires drivers to perceive where they are in the lane (i.e., egocentric position perception) and whether they are following a desired path (i.e., egocentric direction perception). Previous studies have provided evidence that motion perception is essential during driving in the context of visuomotor control (e.g., Donges, 1987; Godthelp, 1986). The current study, however, focused on a more primitive level: static information in the visual scene itself (e.g., lane edges) and the human visual perception system. We prepared two image structure conditions to examine whether an “uprightness factor” works similarly or differently in egocentric position perception and egocentric direction perception. In a static road landscape image, “far” information appears at a higher location than “near” information (normal image condition); in the inverted image condition these locations are vertically swapped. The experiment consisted of two detection tasks: an egocentric direction detection task and an egocentric position detection task. In each task, two road images (250 ms) were presented sequentially with a blank display in between (900 ms). Participants were asked to judge which image was viewed from straight ahead in the direction detection task, and which image was viewed from the center of the lane in the position detection task. Accuracy in the egocentric direction detection task was lower in the inverted image condition than in the normal image condition, while accuracies in the egocentric position detection task did not differ between the two image structure conditions. These results indicate that the “uprightness factor” is important for egocentric direction perception, whereas egocentric position perception is robust against image inversion. The two visual perceptions supporting visuomotor control are therefore dissociable.

10:15
The integration of local features in the global coherence task: the interactions between orientation and motion
SPEAKER: Pi-Chun Huang

ABSTRACT. It is still not clear how the visual system forms a global percept when its local features provide insufficient or inconsistent information. In this study, we used an array of moving Gabor elements (speed = 1.72 degrees/sec, spatial frequency = 2.35 cycles/degree, bandwidth = 0.73 octaves) whose motion direction and orientation were manipulated independently, so that interactions between local features and the global percept could be investigated systematically. In each display, signal elements that had either a coherent motion direction or a coherent orientation were embedded among noise elements whose orientation and motion direction were randomized. In separate runs, the observer was required to determine the motion direction of the signal elements (motion coherence task) or their orientation (global orientation task). A two-alternative forced choice with the method of constant stimuli was used to determine the coherence threshold. In Experiment 1, the coherence rate of the task-irrelevant feature (e.g. motion direction) was manipulated from 0% to 75%, and the coherence threshold for detecting the relevant feature (e.g. orientation) was measured. In Experiment 2, the correlation between motion and orientation was manipulated from -1 to 1; a correlation of 1 means the motion direction of a Gabor is always orthogonal to its orientation. Our results (N=5) showed the motion coherence threshold (20%) to be slightly lower than the orientation coherence threshold (25%). Neither the coherence rate of the irrelevant feature nor the correlation between the two features influenced the coherence threshold. These results indicate independent visual processing of motion direction and orientation, and show that the visual system can efficiently ignore irrelevant features in order to complete a global task.
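
For readers who want to see how such a display can be parameterized, here is a minimal sketch (ours, not the authors' stimulus code) of assigning a motion direction and an orientation to each Gabor independently, with separate coherence rates for the two features; the function name and angle conventions are assumptions.

```python
import numpy as np

def assign_features(n, coh_motion, coh_orient, signal_dir, signal_ori, rng=None):
    """Assign each of n Gabors a motion direction and an orientation.
    For each feature, a coherence fraction of elements takes the signal
    value, chosen independently of the other feature; the rest stay random."""
    rng = np.random.default_rng() if rng is None else rng
    dirs = rng.uniform(0, 360, n)            # noise directions (deg)
    oris = rng.uniform(0, 180, n)            # noise orientations (deg)
    dirs[rng.choice(n, int(round(coh_motion * n)), replace=False)] = signal_dir
    oris[rng.choice(n, int(round(coh_orient * n)), replace=False)] = signal_ori
    return dirs, oris

# e.g. a display at the reported motion coherence threshold of 20%:
dirs, oris = assign_features(100, coh_motion=0.20, coh_orient=0.0,
                             signal_dir=90.0, signal_ori=0.0)
```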

10:15
Influence of display type and rendering method on contrast sensitivity assessment
SPEAKER: unknown

ABSTRACT. We compared the contrast sensitivity function (CSF) measured on various display technologies through different software and hardware techniques that increase luminance resolution (8-bit to 16-bit, including a quasi-continuous resolution). Contrast thresholds for 6 spatial frequencies (0.5–16 cpd) were measured using a staircase method (1-up/3-down) with a horizontal/vertical discrimination task. Each trial consisted of a Gabor patch (sigma = 2 degs) presented for 0.5 seconds with a random orientation. Within a session, the spatial frequencies were randomly interleaved, and a full CSF was obtained in less than 10 minutes. All measurements were made using the same software and the same computer. All display configurations were gamma-corrected with a mean luminance of 60 cd/m2 and viewed at a distance that provides a Nyquist frequency of 32 cpd. To assess the reliability and accuracy of each configuration, the average of 10 CSF measurements for the same subject was fitted with a Difference-of-Gaussians model to extract the peak frequency, cut-off frequency, bandwidth and DC level. We found a positive correlation between peak sensitivity and luminance resolution, which suggests that the quasi-continuous resolution technique (noisy-bit) can provide up to 20-bit luminance resolution. The 3 best techniques, Bits# (14-bit), Display++ (16-bit) and noisy-bit, provided similar estimates of the CSF properties despite using 3 different displays (CRT, IPS-LCD and TN-LCD, respectively), while the bit-stealing technique (11.6-bit) appeared insufficient to provide a reliable CSF: peak frequency was under-estimated and both cut-off frequency and bandwidth were over-estimated. We conclude that more than 12 bits of luminance resolution are required to fully assess the CSF, which can be reliably estimated with the noisy-bit method even on a low-quality LCD display.
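
The noisy-bit technique referenced above (Allard & Faubert's method of adding one quantization step of uniform noise before truncation, so that the expected displayed value equals the requested one) can be sketched in a few lines. This is a minimal illustration of the technique, not the authors' implementation, and the Gabor parameters below are placeholders.

```python
import numpy as np

def noisy_bit_quantize(img, levels=256, rng=None):
    """Quantize a float image in [0, 1] to integer levels with the
    noisy-bit method: add uniform noise of one quantization step before
    truncating, so the *expected* output equals the requested value and
    the effective luminance resolution becomes quasi-continuous."""
    rng = np.random.default_rng() if rng is None else rng
    v = img * (levels - 1)                     # target value in DAC units
    out = np.floor(v + rng.random(img.shape))  # stochastic rounding
    return np.clip(out, 0, levels - 1).astype(np.uint8)

# Example: a low-contrast Gabor needing finer-than-8-bit contrast steps.
x = np.linspace(-2, 2, 256)
xx, yy = np.meshgrid(x, x)
gabor = 0.5 + 0.001 * np.exp(-(xx**2 + yy**2) / 2) * np.cos(2 * np.pi * 2 * xx)
frame = noisy_bit_quantize(gabor)  # redraw each frame for temporal dithering
```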

10:15
The first and third order statistics of background element size modulate perceived target size
SPEAKER: unknown

ABSTRACT. As demonstrated by the Ebbinghaus illusion, the perceived size of an object depends not only on the size of its retinal image but also on the size of background elements. We investigated the effect of the statistics of the background element size distribution on the perceived size of a target. The target was a disk (240 arcmin diameter) on a frontoparallel plane. The background texture consisted of 5000 randomly distributed disks whose sizes were drawn from distributions varying in mean (60 to 600 arcmin), standard deviation (0 to 0.27 times the mean) and skewness (-0.37 to 0.37). We used a two-interval forced-choice paradigm to measure perceived target size against the various background textures. In each trial, the target with a background was presented in one interval while a reference disk on a blank background was presented in the other. The observers’ task was to determine which interval contained the larger disk. We measured the point of subjective equality (PSE) for perceived target size with a staircase procedure. In general, perceived target size decreased with mean background disk size until the mean reached 360 arcmin; beyond that, perceived target size changed little with further increases in mean background disk size. The variance of the background element sizes did not affect perceived target size. Perceived target size on a background with a negatively skewed distribution was smaller than on one with a symmetric distribution when the mean of the distribution was small, but larger when the mean was large. Our results show that only the first and third order statistics, but not the second order statistics, of the background modulate perceived target size.
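
The abstract does not say how the disk-size distributions were generated; as one plausible sketch, a skew-normal distribution can be moment-matched to the reported mean, SD and skewness. The generator itself is our assumption; the numbers below are taken from the ranges quoted above.

```python
import numpy as np
from scipy import optimize, stats

def disk_sizes(n, mean, sd, skew, rng=None):
    """Draw n disk diameters (arcmin) from a skew-normal distribution
    matched to the requested mean, SD, and skewness.  The skew-normal
    is our choice of generator; the abstract does not specify one."""
    rng = np.random.default_rng() if rng is None else rng
    # Find the shape parameter a whose standardized skewness matches.
    f = lambda a: stats.skewnorm.stats(a, moments='s') - skew
    a = optimize.brentq(f, -20, 20)
    m, v = stats.skewnorm.stats(a, moments='mv')
    scale = sd / np.sqrt(v)                    # match the SD
    loc = mean - scale * m                     # then match the mean
    return stats.skewnorm.rvs(a, loc=loc, scale=scale, size=n, random_state=rng)

sizes = disk_sizes(5000, mean=360, sd=0.27 * 360, skew=-0.37)
```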

10:15
Scenic views through a window affect the perception of space brightness of a room
SPEAKER: unknown

ABSTRACT. Daylight is an easily available natural energy source, and there is increasing demand for its use in maintaining bright environments while conserving energy in today’s society. Recent studies have reported that daylight does not enhance space brightness as efficiently as expected from the illuminance it adds. However, these studies used frosted glass windows, so no scenic view was visible. We often observe scenic views through windows in daily life; scenic views could therefore modulate the perception of space brightness. In this study, we investigated this possibility using a window that provided scenic views as well as daylight. The experiment involved two scale models simulating offices: a room with a window (test room) and one without (reference room). There were two types of scenic views (natural or urban landscape), with or without a human-shaped board covered with a full-length photograph of a man. Daylight was simulated by fluorescent lamps and its intensity was manipulated by changing the number of lamps. Room illuminance from ceiling lights without daylight (i.e., base illuminance) was also manipulated. Participants viewed the two models repeatedly and rated the space brightness of the test room relative to that of the reference room. The results revealed that the efficiency of simulated daylight for brightness enhancement was much lower than that of horizontal illuminance. A comparison between the present results and those of a previous study revealed that brightness enhancement by daylight was lower with a scenic view than without. The results did not change with the presence/absence of the human-shaped board. These results suggest that scenic views can modulate the perception of space brightness.

10:15
Is a tablet PC with an OLED display suitable for color vision experiments?
SPEAKER: unknown

ABSTRACT. Tablet PCs with OLED displays offer a wide gamut at an acceptable price. Are they suitable for color vision experiments? We chose two tablet PCs with OLED displays (SM-T800 and GT-P6800, Samsung) and evaluated their display characteristics. Stimulus images were generated and saved as 16-bit color-depth PNG files by software running on a separate PC. The images were displayed on the devices through a web browser (Chrome for Android, software version 39.0.2171.93, Google). The spectral distribution was measured with a spectroradiometer (Eyeone, GretagMacbeth). A physiologically relevant LMS color space (Smith, V.C., Pokorny, J., 1996. Color Res. & App. 21, 375–383) was used in the experiment. Results showed that the display error rate ((measured_value − expected_value) / expected_value) of the luminance coordinate ranged within ±5%, while the error rates of the L, M and S values in the LMS space were much larger than that of the luminance coordinate, especially that of the S value.
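
The error-rate formula quoted above is simple enough to state as code. The sketch below is a trivial helper of ours, and the numeric values are hypothetical, chosen only to mimic the reported pattern (luminance within ±5%, S error much larger).

```python
import numpy as np

def display_error_rate(measured, expected):
    """Relative display error, (measured - expected) / expected,
    as defined in the abstract; returned as a signed fraction."""
    measured, expected = np.asarray(measured, float), np.asarray(expected, float)
    return (measured - expected) / expected

# Hypothetical check of one patch: expected vs. measured L, M, S, luminance.
expected = np.array([20.0, 15.0, 1.00, 35.0])   # arbitrary units
measured = np.array([21.1, 15.4, 1.22, 34.3])
print(display_error_rate(measured, expected))    # S error far exceeds +-5%
```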

10:15
Mental pressure might enhance vection
SPEAKER: unknown

ABSTRACT. Exposure to a visual motion field that mimics the retinal flow produced by locomotion typically induces a compelling illusion of self-motion (vection). We recorded vection strength with and without mental pressure. Participants held either a full glass of water (high pressure) or a half-full glass of water (no pressure) while experiencing vection, and were instructed not to spill any of the water. The experiment employed a between-subjects design with 20 undergraduate and graduate students (10 in each condition). All participants reported normal vision and had no history of vestibular disease. Stimuli were generated and controlled by a computer (MacBook Pro, MD101J/A; Apple) and presented on a plasma display (3D Viera, 65 inch, Panasonic, 1920 × 1080 resolution, 60 Hz refresh rate). The experiment was conducted in a dark chamber. Optic flow displays consisted of 1240 randomly positioned dots per frame, with projected global dot motion that simulated forward self-motion (20 m/sec). Stimulus duration was 40 s. The stimuli subtended 100 deg (horizontal) × 81 deg (vertical) of visual angle at a viewing distance of 57 cm. In the full-glass condition, vection latency was shorter, vection duration was longer, and vection magnitude was larger than in the half-glass condition; that is, vection was enhanced under high pressure. We propose that vection was facilitated when participants consciously thought about not moving, and thus that vection strength can be modulated by mental state. We assume that the pressure, anxiety, and stress resulting from trying not to spill the water enhanced vection. Having a participant hold a full glass of water is very easy to replicate, and this method can be used in various visual perceptual tasks, e.g. motion perception, shape perception, and face perception. We hope that our method will be utilized in various fields of psychology.
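
For illustration, a forward-self-motion dot display like the one described can be generated by streaming dots through a 3D volume and perspective-projecting them onto the screen plane. This is a generic sketch under assumed parameters (dot volume, respawn rule, frame rate), not the authors' stimulus code.

```python
import numpy as np

# Forward self-motion optic flow: dots in a 3D volume stream past the
# observer and are perspective-projected onto the screen plane.
RNG = np.random.default_rng(0)
N, SPEED, DT = 1240, 20.0, 1 / 60            # dots, m/s, 60 Hz frame step
X_MAX, Y_MAX, Z_NEAR, Z_FAR = 30.0, 25.0, 1.0, 50.0  # assumed volume (m)
VIEW_DIST = 0.57                             # screen distance (m)

def spawn(n):
    """Uniform random dot positions in the simulated volume."""
    return np.column_stack([RNG.uniform(-X_MAX, X_MAX, n),
                            RNG.uniform(-Y_MAX, Y_MAX, n),
                            RNG.uniform(Z_NEAR, Z_FAR, n)])

dots = spawn(N)
for frame in range(int(40 / DT)):            # 40 s stimulus
    dots[:, 2] -= SPEED * DT                 # observer moves forward
    passed = dots[:, 2] < Z_NEAR             # recycle dots behind the screen
    dots[passed] = spawn(passed.sum())
    dots[passed, 2] = Z_FAR                  # respawn at the far plane
    # Perspective projection to screen coordinates (meters on the screen).
    sx = VIEW_DIST * dots[:, 0] / dots[:, 2]
    sy = VIEW_DIST * dots[:, 1] / dots[:, 2]
    # ...draw the (sx, sy) that fall within the screen for this frame...
```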

10:45-12:30 Session 3: Talk: Learning & Adaptation

Talk Session in Learning & Adaptation.

Location: FunctionHall
10:45
Developmental changes in face identity processing: fNIRS-adaptation studies
SPEAKER: unknown

ABSTRACT. Previous developmental studies using near-infrared spectroscopy (NIRS) revealed that infants' bilateral temporal areas, especially the right temporal area, are involved in face processing (e.g., Otsuka et al., 2007; Nakato et al., 2009). However, it remains unclear whether facial identity is processed in these face-sensitive temporal regions in infants. To address this question, we applied the neural adaptation paradigm (e.g., Grill-Spector et al., 1999) to NIRS measurement. Using NIRS, we measured hemodynamic responses in infants aged 5 to 8 months during 10-sec presentations of the same person's face (same-face condition) or of five different faces (different-face condition), while manipulating the size, viewpoint and expression of the faces. By comparing brain activity between the two conditions, we examined whether 5- to 8-month-olds showed a significantly lower response (adaptation) in the same-face condition than in the different-face condition. In Experiment 1, when face stimuli were presented only in frontal view and without any facial changes, we confirmed that 5- to 8-month-olds showed adaptation in the same-face condition. Even when the sizes of the face stimuli were changed (Experiment 2), adaptation in the same-face condition also occurred in 5- to 8-month-olds. However, when the faces were presented with changes in viewpoint (Experiment 3) or expression (Experiment 4), only 7- to 8-month-olds, but not 5- to 6-month-olds, showed adaptation in the same-face condition. Our results suggest that (1) infants’ bilateral temporal areas are involved in the processing of facial identity, (2) facial identity processing is size-invariant by at least 5 months of age, and (3) facial identity processing invariant to changes in view and facial expression becomes robust and efficient after 7 months of age. We are currently examining infants' hemodynamic changes for female faces of their own and another race; in addition to the NIRS-adaptation study, we will present the patterns of hemodynamic response to own- and other-race faces.

11:00
Amodal completion and autistic traits in facial expression aftereffect
SPEAKER: unknown

ABSTRACT. The human visual system is highly intelligent: most objects we see in daily life are partially occluded by other objects, yet instead of perceiving them as isolated fragments, we have no trouble viewing them as continuous objects. This phenomenon, amodal completion, has attracted much attention over the last three decades. However, whereas most studies have focused on the processing of low-level stimuli (e.g., simple geometric shapes), the present study tested amodal completion for high-level facial expressions and its relation to autistic traits via visual adaptation. We first generated a set of test faces whose expressions ranged from happy to sad. To interfere with the process of amodal completion, six sets of adapting faces were generated by manipulating the dynamics of facial parts. All adapting faces were displayed at the same location and size as the test faces. Participants judged the facial expression of the test faces as “happy” or “sad” in a two-alternative forced-choice (2AFC) paradigm via a key press. A baseline condition without any adapting stimulus was also included, and we measured participants’ Autism-Spectrum Quotient (AQ) during the experiment. Significant aftereffects were found when the adapting face was perceived as coherent (amodal completion occurred), but not in the disrupted condition. Furthermore, AQ scores were partially correlated with the adaptation aftereffect. This suggests that amodal completion occurs for high-level emotional faces and influences the perception of a subsequent face, and that it may be related to autistic traits. Our findings may shed light on the mechanisms of face perception and autism.

11:15
Motion smoothness aftereffect is based on adaptation to local differences in motion vectors.
SPEAKER: unknown

ABSTRACT. The human visual system can extract various types of information, such as locomotion direction, the 3D shapes of objects, and object material, from the spatio-temporal pattern of local motion signals. Among the many potential visual cues contained in motion flows, the spatial smoothness of motion flow plays an important role, for example, in perceiving liquids from motion signals (Kawabe et al., Vision Research, 2015). How the visual system analyzes the smoothness of motion flow, however, remains unclear. To approach this problem, we are investigating the mechanism underlying a novel visual aftereffect, termed the motion smoothness aftereffect, in which the perceived smoothness of a motion field becomes higher after adaptation to non-smooth flows than after adaptation to smooth flows (Maruya, Kawabe, & Nishida, VSS 2015). In our original demonstration, the adaptation stimuli consisted of a dense array of moving noise patches with various levels of smoothness. In this study, we rendered the adaptation stimuli spatially sparse by replacing half of the patches, in alternate rows and columns of the original adaptation patterns, with uniform gray fields (sparse patterns). This manipulation selectively removed short-range relative motions among adjacent patches, while affecting the overall smoothness ratings of the adaptation stimuli only modestly. We examined how the decrease in relative motion components affected the aftereffect, using sparse patterns made from smooth and non-smooth motion patterns as adaptation stimuli and original dense patterns of intermediate smoothness as test stimuli. Observers rated the perceived smoothness of the test stimulus on a 1-5 scale after adapting to a sequence of either smooth or non-smooth sparse patterns. We found no smoothness aftereffect after adaptation to the sparse patterns. This implies that the change in perceived motion flow smoothness in the motion smoothness aftereffect is mainly produced by adaptation of the neural mechanisms encoding short-range relative motions among nearby motion signals.

11:30
Inter-trial adaptation to fast translating dots reveals direction and orientation effects: evidence for motion streaks
SPEAKER: unknown

ABSTRACT. Using a long series of brief, rapid motion trials, we investigated short-term adaptation using an inter-trial analysis. In Experiment 1, observers viewed brief motion bursts of 200 ms drawn from a range of directions spanning vertical and indicated whether the direction appeared vertical or not. Responses were compiled into Gaussian ‘perceived vertical’ distributions. Subjective vertical was very accurate, with a tight bandwidth. In an inter-trial analysis, response distributions for the current trial (t) were re-compiled based on the direction in the preceding trial (t-1). This revealed clear effects of inter-trial adaptation: for small negative directions (1.5° & 3°) on trial t-1, distributions shifted positively, and small positive directions on t-1 shifted distributions negatively. This pattern shows the classical repulsive aftereffect. For larger t-1 directions, the adaptation effect changed to an attractive aftereffect. In Experiment 2, we interleaved upward and downward dot motions, each varying around vertical with the same angular range. The inter-trial analysis revealed much stronger adaptation effects, and all effects were attractive. Because direction-selective neurons have a preferred direction and a null response to the opposite direction (whereas up/down does not change orientation), interleaving up/down motion cannot produce inter-trial motion adaptation. Instead, we attribute the strong attractive effects to motion streaks, the oriented ‘trails’ produced by neural temporal integration, which are presumed to activate orientation-selective units. We conclude that Experiment 1 contained adaptation due to motion and to orientation, that Experiment 2 isolated the orientation component, and that both components adapt rapidly enough to reveal inter-trial adaptation. Subtracting Experiment 2’s results from Experiment 1’s predicts the motion-only component. Experiment 3 tested this by interleaving upward grating motion with upward dot motion, finding that grating adaptation (containing no streaks) produced strong, broad repulsive adaptation, as predicted.
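
The inter-trial re-compilation described above can be expressed compactly. The sketch below is a hypothetical reconstruction of the analysis (the data layout and the Gaussian fit are our assumptions, and it presumes enough trials per cell), not the authors' code.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, mu, sigma, amp):
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def intertrial_shifts(directions, said_vertical):
    """For each direction shown on trial t-1, re-compile the distribution
    of 'vertical' responses on trial t and fit a Gaussian; the fitted mean
    is the conditional subjective vertical (signed shift from 0 deg)."""
    d = np.asarray(directions, float)
    r = np.asarray(said_vertical, bool)
    shifts = {}
    for prev in np.unique(d):
        sel = np.flatnonzero(d[:-1] == prev) + 1   # trials following 'prev'
        dirs = np.unique(d[sel])
        p_vert = np.array([r[sel][d[sel] == x].mean() for x in dirs])
        (mu, sigma, amp), _ = curve_fit(gauss, dirs, p_vert,
                                        p0=(0.0, 3.0, 1.0))
        shifts[prev] = mu
    return shifts
```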

11:45
Under-stimulation at untrained orientation may explain orientation specificity in perceptual learning
SPEAKER: unknown

ABSTRACT. Perceptual learning (PL) can transfer completely to an orthogonal orientation if the latter is exposed through an irrelevant task in a Training-plus-Exposure (TPE) paradigm (Zhang et al., 2010). This, together with additional evidence for learning transfer to new locations/hemispheres after double training (Xiao et al., 2008), suggests that PL reflects cognitive changes beyond the early visual areas. However, it is unclear why PL is orientation specific in the first place and why exposure to the transfer orientation enables learning transfer. Here we used a continuous flash suppression paradigm to investigate the role of orientation exposure in TPE training. Foveal orientation discrimination was always trained at one orientation. In other blocks of trials, flashing white noise was presented to one eye, which suppressed awareness of an orthogonal Gabor (sometimes a letter C) presented to the other eye. In Experiment I, the observers reported the color (red/green) of a small dot centered on the flashing noise images. They were not told that an orthogonal Gabor was shown to the other eye. This bottom-up orientation exposure produced partial learning transfer to the orthogonal orientation. In Experiment II, the observers guessed whether a Gabor/C was presented, but the orthogonal Gabor was not shown. Such top-down-only “orientation exposure” led to no learning transfer. In Experiment III, when the orthogonal Gabor was actually shown, learning transfer was complete with this combined bottom-up and top-down orientation exposure. These results indicate that bottom-up orientation exposure is required for learning transfer, and that orientation specificity may result from under-stimulation of untrained orientations, possibly because these orientations are unstimulated or even suppressed during training. Although top-down influence by itself has no impact on learning transfer, it can boost the effect of bottom-up exposure, so that high-level learning can functionally connect to new orientation inputs for complete learning transfer.

12:00
Adaptation to Symmetry Axis --- Towards Understanding the Cortical Representation of Shape ---
SPEAKER: unknown

ABSTRACT. Symmetry has long been considered an influential factor in grouping and figure-ground segregation, as well as a candidate for representing shape as a medial axis. A recent psychophysical study has further reported adaptation to symmetry in random dot patterns [Gheorghiu, Bell & Kingdom, VSS 2014]. However, natural images are not precisely symmetric in terms of geometry; thus, a quantification of the degree of symmetry (DoS) is needed. We have proposed a DoS that is computed from the degree of overlap of the contours on the two sides of the optimal axis of reflection symmetry [Sakai & Kurematsu, VSS 2014]. DoS agreed with perception in judgments of the symmetry axis of natural contours, indicating that DoS reflects the perception of symmetry and suggesting that participants unconsciously use the symmetry axis to perceive symmetry. We performed psychophysical experiments to examine whether the symmetry axis is an adaptable feature of the visual system; specifically, we tested whether the perceived tilt of a symmetry axis is altered by adaptation. We generated a set of stimuli that consisted of mirror-symmetric arrangements of random dots. The stimuli comprised a small number of random dots so that their symmetry axes were invisible. A pair of stimuli whose axes were tilted ±10° from vertical was presented for adaptation; another pair of stimuli with a distinct dot pattern/contrast was then presented without tilt. Using a staircase procedure, we measured the apparent tilt of the symmetry axes. The results showed significant adaptation to the symmetry axis. We also performed another set of experiments with natural contours as adapters, and again observed significant adaptation. These results indicate that the symmetry axis is an adaptable feature in the visual system, suggesting that the perception of the symmetry axis is a basis for symmetry perception.
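
As a toy version of the DoS idea (overlap of contours across a candidate reflection axis), one can reflect contour points about axes through the centroid and score the fraction that land near an original point. The tolerance, the centroid constraint and the exhaustive angle search are simplifications of ours; this is not the published measure.

```python
import numpy as np

def degree_of_symmetry(points, tol=2.0, n_angles=180):
    """Toy DoS: for candidate axes through the centroid, reflect the
    contour points and score the fraction that land within `tol` of some
    original point; return the best score and its axis angle."""
    pts = np.asarray(points, float)
    pts = pts - pts.mean(axis=0)                   # axis through the centroid
    best = (0.0, None)
    for theta in np.linspace(0, np.pi, n_angles, endpoint=False):
        u = np.array([np.cos(theta), np.sin(theta)])   # axis direction
        proj = pts @ u
        refl = 2 * np.outer(proj, u) - pts             # reflect about the axis
        d = np.linalg.norm(refl[:, None, :] - pts[None, :, :], axis=2)
        overlap = (d.min(axis=1) < tol).mean()
        if overlap > best[0]:
            best = (overlap, np.degrees(theta))
    return best  # (DoS in [0, 1], axis angle in degrees)
```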

12:15
Adapting to sad scenes and words can lead to changes in face emotion perception.
SPEAKER: unknown

ABSTRACT. Adapting to an emotional face can lead to aftereffects when judging a subsequently presented face; for example, after being shown a sad face for a few seconds, participants are more likely to rate a following face as happier. It remains unclear, however, whether these adaptation aftereffects arise only from the low-level visual properties of a face, such as a mouth’s curve, or whether the emotional valence of a non-face stimulus can also influence subsequent face emotion judgments. In the present experiment, we tested whether sad scenes and words could also lead to adaptation aftereffects when judging face emotion. Using morphing software, we created a series of faces ranging from happy through to sad and asked participants to rate whether the faces were happy or sad in a baseline condition. Participants then judged whether these morphs were happy or sad again after being adapted to sad scenes, sad words, or a sad face. We found that sad scenes, sad words and the sad face all led to adaptation aftereffects in face emotion perception relative to the baseline condition: faces presented after the adapting stimuli appeared happier to the participants. These results suggest that emotion adaptation aftereffects are not solely driven by the low-level visual properties of a face's emotion, but can also be produced by the emotional valence of non-face stimuli.

13:30-15:00 Session 4: Invited Talks: Psychophysics

Two invited talks in Psychophysics given by Dr. Fred Kingdom and Dr. Sheng He.

Location: FunctionHall
13:30
New adventures with dichoptic colours
SPEAKER: Fred Kingdom
14:15
Functional significance of feedback signals in early visual cortex
SPEAKER: Sheng He
15:00-17:45 Session 5: Poster: Faces, Objects, Perceptual organization

Poster Session in Faces, Objects, Perceptual organization.

Location: SemRMs
15:00
Holistic and featural processing for 2D and 3D face recognition
SPEAKER: unknown

ABSTRACT. Face perception is special: face recognition proceeds holistically, whereas the recognition of inverted faces proceeds featurally. However, these findings were based on 2-dimensional (2D) faces, so the extent to which they apply to more realistic stimuli, such as 3-dimensional (3D) faces, is unclear. The current study examined whether 2D and 3D faces were processed with different degrees of holistic and featural processing using the face inversion paradigm. Twenty-five participants completed a face-matching task consisting of upright and inverted faces presented in both 2D and 3D formats in the form of stereoscopic images; participants wore 3D glasses during the experiment. We found that 3D upright faces were recognised with significantly greater accuracy than 2D upright faces, providing evidence that the enriched visual information in 3D improves the precision of face recognition mechanisms. On the other hand, we did not find any significant difference in accuracy or reaction time between 2D and 3D inverted faces. Since upright faces preserve holistic structure, whereas inverted faces disrupt holistic structure while preserving individual features, the current findings suggest that the 3D advantage occurs at the holistic processing level but not at the featural processing level for facial stimuli. Our study sheds light on the mechanisms of face recognition, and of object recognition in general.

This research is supported by BeingThere Centre (NTU), funded by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office.

15:00
What is learned to discriminate eye of origin of visual inputs after multiple weeks of training?
SPEAKER: unknown

ABSTRACT. Humans cannot discriminate ocular origin of visual input when informative cues such as feeling-in-the-eye (from input luminance) are made uninformative (Ono and Barbeito, 1985). One week of daily practice with feedback is insufficient (Blake and Cormack, 1979). We consider extended training.

Two authors practiced 400 trials a day over multiple weeks (bar occasional holidays). A trial involved a central, binocular, fixation stimulus; then a 200 ms dichoptic test stimulus containing monocular bars of random luminance arranged on a slightly jittered Manhattan grid, with vergence anchored by binocular dots at the centroid of each quartet of bars; and finally a binocular mask. Observers reported the central target bar's ocular origin, receiving immediate feedback. The target's orientation (vertical or horizontal) was constant on each day. Distractor bars were oblique; independent of the target's ocular origin, a randomly chosen (about) half were clockwise from vertical and shown to one eye; the others were anticlockwise and shown to the other. A random third of trials contained no target bar and demanded a distinct response. Training started with vertical targets.

Performance rose above chance consistently after 14-20 training days, ultimately reaching 70-80% correct. Observers became more confident, remaining unable to name discriminating features. On her 30th day of vertical target training, one observer noticed an ocular bias in the apparent tilt of the target, perhaps from astigmatism. Her performance dropped to chance when the target was randomly tilted within 10 degrees of vertical on each trial. The second subject's performance survived this manipulation, despite his astigmatism. Performance dropped to chance when the target was first switched to horizontal; then roughly followed the original learning curve. Subsequent retesting on vertical targets showed no interference. New glasses lacking astigmatism correction worsened performance to the lower bounds of daily performance fluctuations. More diagnosis will hopefully identify what he learned.

15:00
Alternation between- and within-class selectivity across the ventral occipital cortex: evidence by region-of-interest and whole-brain multi-voxel pattern analyses
SPEAKER: unknown

ABSTRACT. The ventral occipito-temporal (VOT) cortex has been implicated in face and object processing, but how shape influences this processing remains largely unclear. An earlier study (Wong et al., 2009, PLOS ONE 4(12): e8405) found that, compared with the pre-training condition, Ziggerin training led to increased activity in a medial (rather than lateral) and anterior (rather than posterior) fashion in the categorization (rather than individuation) task, suggesting increased between-category selectivity in medial/anterior regions after training. In this study, we partly addressed this hypothesis by having 18 novices (11 from NCKU, 7 from CUHK) view between- and within-category Ziggerin examples while making either categorization or individuation judgments (manipulated across runs) in a blocked fMRI design with 5 repetitions plus 1 test item per block (6 TRs in total). ROI MVPA was applied to 50 ventral occipito-temporal areas (5x5 ROIs along each 10-mm y and z axis, 25 ROIs along the posterior occipital lobe of each hemisphere), and between-class and within-class classification (BCC and WCC) under the categorization (Cat) and individuation (Ind) tasks were calculated by both ROI MVPA and searchlight mapping. The ROI MVPA results showed a significantly alternating up-down trend in the right-hemisphere ROIs, and a fall-then-rise trend in the left-hemisphere ROIs, in only the BCC-Cat condition, consistent with our earlier (PLOS ONE, 2009) findings. This alternation of MVPA classification accuracies was both consistent and idiosyncratic across subjects, as revealed by the searchlight mapping results. Together, these data lend extra support to a “division of labor” among VOT areas under different task demands. Possible mechanisms are also discussed.
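
The abstract does not name the classifier behind the BCC/WCC accuracies; a common default, shown here purely as an illustration (scikit-learn), is a linear SVM decoded with leave-one-run-out cross-validation. The data-loading helper is hypothetical.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

def roi_classification_accuracy(patterns, labels, runs):
    """ROI MVPA in its usual form: linear-SVM decoding of condition
    labels from voxel patterns, cross-validated across runs.
    `patterns` is (n_blocks, n_voxels); `labels`/`runs` are per block."""
    clf = SVC(kernel='linear', C=1.0)
    scores = cross_val_score(clf, patterns, labels,
                             groups=runs, cv=LeaveOneGroupOut())
    return scores.mean()

# Hypothetical use for between-class classification (BCC) in one ROI:
# patterns, classes, runs = load_roi_data(...)   # loader not shown
# print(roi_classification_accuracy(patterns, classes, runs))
```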

15:00
The effect of luminance values of edges on figure-ground assignments
SPEAKER: unknown

ABSTRACT. We examined whether the luminance values of edges affect figure-ground assignments. A dark- or light-gray region was presented on the right or left side of an edge, or on the upper or lower side of an edge, and subjects reported which region was perceived as the figure. The results showed that the light-gray region was perceived as the figure significantly more frequently in the high-luminance edge condition than in the low-luminance edge condition, while the dark-gray region was perceived as the figure significantly more frequently in the low-luminance edge condition than in the high-luminance edge condition. There was no anisotropy in figure-ground assignments between upper and lower locations of the regions. These results show that the luminance values of edges affect figure-ground assignments, and that regions with luminance values closer to those of the edges are perceived as the figure more frequently.

15:00
Anger or disgust? Adults are confused about it!
SPEAKER: unknown

ABSTRACT. The degree of distinctiveness between emotions is not uniform (Ekman, 2004). In children, facial expressions of disgust have been found to be less well recognized than expressions of other basic emotions (Gagnon, 2010). We aimed to investigate whether this confusion also manifests in adults. In Experiment 1, we used four basic negative facial expressions from the Taiwanese Facial Expression Image Database (TFEID; Chen, 2007): disgust, anger, fear, and sadness. The intensity of each facial expression was either high or low. The participants were asked to judge the emotional category and intensity of each expression, and we analyzed the data with multidimensional scaling. The results indicate that participants were more likely to confuse disgust with anger, especially low-intensity disgust with high-intensity anger, than to confuse disgust with fear or sadness. In Experiment 2, we used an emotion priming paradigm to investigate whether the confusion between low-intensity disgust and high-intensity anger resulted from overlap between the facial features of these two expressions. We used faces expressing high-intensity negative emotions (disgust, anger, fear, or sadness) as primes and corresponding low-intensity negative expressions as targets. Presentation times of the primes were manipulated to be short (33 ms) or long (100 ms). The participants were asked to judge whether the target face expressed disgust or not. The results reveal that when the prime expressed anger, it facilitated participants’ judgments of the target face as disgust; no such priming effect was found when the primes expressed fear or sadness. Collectively, our findings suggest that anger and low-intensity disgust are confused even by adults, and that this confusion results from overlap between the facial features of the presented faces.

15:00
Perceived displacement of dots in Giovanelli’s illusion: apart from the center of the circle?
SPEAKER: unknown

ABSTRACT. Dots aligned linearly are perceived as misaligned when each of them lies within an irregularly arranged circle (Giovanelli, 1966). This illusion is generally explained by an illusory displacement away from the center of the circle, but the explanation remains an open question. In the present study we examined the illusion using a paired comparison method. Stimulus figures contained 4 dots arranged in a square, each surrounded by a misaligned larger circle, called an inducer here. In Exp. 1 we manipulated the radius of the inducers while keeping their center positions fixed. In each trial, a pair of stimuli was displayed side by side on an LCD monitor, and participants were asked to choose the one that appeared more distorted (forced choice). This procedure was repeated for all stimulus pairs. The results suggested that participants perceived the dot configuration as more distorted as the inducers’ radius decreased, which is inconsistent with the idea of displacement away from the inducer's center. In Exp. 2 we examined the spatial interval between the dot and the edge of the inducer separately from the position of the inducer's center. For this purpose we adopted ellipses as inducers instead of circles; the spatial interval and the center position were controlled systematically by manipulating the lengths of the ellipse axes. The results were again inconsistent with the conventional “away from the center” idea. Rather, both results are well explained by the spatial relation between the dot and the inducer’s edge: in Giovanelli’s illusion, the dots are likely to be perceived as displaced not away from the center of the inducer, but towards the inducer’s edge. This interpretation is discussed in relation to the gravity lens illusion (Naito and Cole, 1994).

15:00
The effect of orientation and length of added segments over the mortar line on the Cafe wall illusion
SPEAKER: unknown

ABSTRACT. The Café wall illusion refers to the illusion in which the gray line (mortar line) between displaced rows of alternating black and white blocks is perceived as tilted. The present study investigated how external elements added to the illusion stimulus affect the pattern of the illusion and observers' confidence in what they perceive. Specifically, short line segments with various orientations (horizontal, additive to the perceived tilt, or opposite to the perceived tilt) were placed over the mortar line. Importantly, in one condition the line segments lay within the mortar line, while in the other condition the line segments were lengthened so that they reached beyond the mortar line. The results showed that when the line segments lay within the mortar line, the additive-oriented segments magnified the Café wall illusion. By contrast, when the line segments reached beyond the mortar line, the opposite-oriented segments evoked the largest illusion. Confidence in the percept was similar across conditions. These results suggest that a Zöllner effect generated by the segments placed over the mortar line affects the Café wall illusion.

15:00
Influence of color on the facial attractiveness judgment
SPEAKER: unknown

ABSTRACT. Facial attractiveness can be judged even with exposure durations as short as several tens of milliseconds. However, it is known that perceived facial attractiveness tends to be higher after shorter exposure durations than after longer ones. In the current study, we examined how facial attractiveness evaluations differ between faces presented in full color and in gray scale, and how the influence of color information changes with exposure duration. In the experiments, 58 Asian female facial photos and 40 Asian male facial photos were divided into two groups with similar levels of attractiveness and presented at two exposure durations (20 or 1000 ms), either in the original full color or in gray scale with similar luminance levels. Separate experiments were conducted for the female facial photos (43 participants) and the male facial photos (45 participants). The participants evaluated the facial attractiveness of the photos on a 7-point scale; the photos were presented for the pre-assigned exposure duration in randomized order. The results showed that perceived attractiveness was generally lower at the 1000-ms duration. In addition, color information had no effect on attractiveness evaluation, except that the gray-scale photos were rated more attractive when the female facial photos were presented for 1000 ms. This finding may have implications for the difference in cues used to judge the attractiveness of male and female faces. This work was partly supported by grants from the Cosmetology Research Foundation, JSPS, and JST.

15:00
Do I Know You? The Own-Race Bias and Eye Tracking for Face Recognition in Malaysians and Caucasians
SPEAKER: unknown

ABSTRACT. The own-race bias (ORB) is the phenomenon in which people recognise faces of their own race better than faces of other races. Although the ORB is a robust phenomenon supported by a meta-analytic study (Meissner & Brigham, 2001), only recently have researchers questioned whether observers of different races use identical underlying strategies when perceiving faces. To further understand the mechanisms underlying the ORB and how Malaysian and Caucasian perceivers process faces, the current study aimed to: (1) explore the ORB in Malaysian (i.e., Malaysian Malay, Malaysian Chinese, and Malaysian Indian) and Caucasian observers by comparing their recognition performance; and (2) investigate how the eye movement patterns of Caucasians differ from those of individuals from the three major ethnic groups in Malaysia when performing an old/new face recognition task involving own- and other-race faces. In the face recognition task, participants viewed a set of faces during the learning phase and subsequently viewed half of the previously presented faces intermixed with distractor faces. During the recognition phase, participants decided whether each face had been seen in the learning phase. We measured recognition accuracy, sensitivity, and response bias in 94 young adults in order to examine their face-processing ability; participants' eye movements were also recorded with an eye tracker. Broadly in line with previous findings on the ORB, the behavioural results showed that young adults across the different ethnic groups recognised own-race faces better. The eye-tracking results showed that participants from the different ethnic groups tended to adopt broadly similar eye movement patterns, although some significant differences were found.
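
Sensitivity and response bias in an old/new recognition task are conventionally derived from hit and false-alarm rates via signal detection theory. The abstract does not specify the exact formulas, so the following is a standard sketch with made-up counts:

```python
from scipy.stats import norm

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Sensitivity (d') and response bias (criterion c) from old/new
    recognition counts, with a standard correction so that perfect
    hit or false-alarm rates stay finite."""
    h = (hits + 0.5) / (hits + misses + 1.0)
    f = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(h) - norm.ppf(f), -0.5 * (norm.ppf(h) + norm.ppf(f))

# Hypothetical counts for one observer viewing own-race faces:
d_prime, criterion = sdt_measures(40, 10, 12, 38)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```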

15:00
The effect of horizontal and vertical strokes on the efficiency of Chinese character recognition in central vision
SPEAKER: unknown

ABSTRACT. Objective: Horizontal information has been found to be more important than vertical information in recognizing faces (Dakin & Watt, 2009, Journal of Vision 9(4):2, 1-10), a category of objects that shares a homogeneous internal structure with Chinese characters. Chinese readers learn strokes first when they learn to write Chinese characters. This study examines the fundamental mechanism of Chinese character recognition with respect to the use of stroke orientation information.

Methods: 26 Chinese characters with comparable spatial complexity were digitally filtered into horizontal-stroke and vertical-stroke conditions. The filtered stimuli were presented on screen either in noise or noiseless. Three native Chinese and two non-Chinese participants took part. Participants performed a 26-alternative forced-choice (26-AFC) recognition task in which the contrast threshold for recognizing noisy and noiseless filtered characters was measured. To calculate recognition efficiency, human performance was compared with that of an "ideal observer" (i.e., an optimal computational model) for this task.
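
The standard definition of efficiency in this paradigm (which the abstract appears to follow) is the ratio of squared contrast thresholds, i.e., of threshold stimulus energies. A sketch with hypothetical threshold values chosen to land near the 1.6% efficiency reported below:

```python
def efficiency(ideal_threshold, human_threshold):
    """Statistical efficiency: the ratio of squared contrast thresholds
    (equivalently, of threshold stimulus energies) -- the fraction of
    the energy used by the human that the ideal observer would need
    (Tanner & Birdsall, 1958)."""
    return (ideal_threshold / human_threshold) ** 2

# Hypothetical contrast thresholds for the ideal observer vs. a human:
print(f"{efficiency(0.02, 0.16):.1%}")  # -> 1.6%
```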

Results: Native Chinese readers used vertical-stroke information (efficiency = 1.6%) around 3 times more efficiently than horizontal-stroke information (efficiency = 0.6%) when recognizing Chinese characters (p = 0.048). In contrast, non-Chinese participants used vertical-stroke (efficiency = 1.2%) and horizontal-stroke (efficiency = 0.8%) information similarly (p = 0.534).

Conclusion: Native Chinese readers were more sensitive in differentiating vertical-stroke from horizontal-stroke information than non-Chinese readers. The results imply that expertise facilitates component (i.e., stroke orientation) processing in Chinese character recognition. This mechanism differs from the recognition mechanism for other object categories, such as faces, for which expertise might be marked by holistic processing.

15:00
Facial attractiveness modulates temporal attention in rapid serial visual presentation
SPEAKER: unknown

ABSTRACT. Attractive faces catch the viewer's eye in a rapid and mandatory fashion. Recent research has demonstrated that an attractive face captures greater spatial attention than an unattractive face, even when the appraisal of facial attractiveness is task-irrelevant. Little is known, however, about the temporal characteristics of visual attention to facial attractiveness. We used a rapid serial visual presentation (RSVP) procedure to examine whether an attractive face captures greater temporal attention in a stream of face images presented in rapid succession. In our experiments, faces were presented successively in a short time, and participants were required to identify two female faces embedded among multiple distractor faces (male faces in Experiment 1; animal faces in Experiment 2). Each face was presented for 160 ms in Experiment 1 and 120 ms in Experiment 2. In Experiment 1, we manipulated the facial attractiveness of the first female target (attractive, neutral, or unattractive) and the stimulus onset asynchrony (SOA) between the first and second female targets (320, 640, or 1280 ms). We found that identification of the second female target (T2) was impaired when the first female target (T1) was attractive rather than neutral or unattractive at the 320-ms SOA, suggesting that an attractive T1 enhances the attentional blink. In Experiment 2, we manipulated the facial attractiveness of the second female target and the SOA between the two targets (360, 720, or 1080 ms). Results showed that an attractive T2 was identified more accurately than an unattractive T2. Taken together, our findings indicate that facial attractiveness is appraised spontaneously, and an attractive face captures greater temporal attention even in a rapid stream of multiple face presentations.

15:00
Evaluating Human Performance in Dynamic Perspective Invariant Face Recognition
SPEAKER: unknown

ABSTRACT. The aim of this study is to derive plausible, consistent eye-gaze scan paths and sets of facial features learnt from unfamiliar faces in unconstrained dynamic motion (rigid and non-rigid) for subsequent recognition tasks, using psychophysical experiments. The existing literature reports a shared observation that face recognition performance, in terms of accuracy for face verification and identification, is enhanced when subjects have been trained on faces in dynamic motion compared with learning from static face images only (Knight and Johnston, 1997; Lander and Chuang, 2005; Xiao et al., 2013). Although prior work suggests that dynamic motion provides additional information about the identity of a face, increasing with the number of frames, beyond what static images provide (O'Toole et al., 2002; Schultz et al., 2013), the type of additional information (e.g., facial features, gaze scan path) underpinning that conclusion has not been identified and explained. Our experiments aim to identify the features that human subjects learn from dynamic motion, to gain insight into the strategies underlying humans' superior performance across difficult verification conditions in unconstrained face recognition tasks, such as variations in illumination, viewing perspective (pose), expression, and age. Given that participants are generally unfamiliar with the face stimuli presented during the experiment, we are probing for generalized eye-gaze scan paths and/or sets of frequently fixated facial features for each verification condition. Such scan-path strategies and features will later be evaluated for potential translation into computational models, to emulate the competence of the human recognition system and, we hope, improve current state-of-the-art face recognition technology in the field of artificial intelligence.

The corresponding authors contributed equally to this work.

15:00
Ladies, don’t be angry: coarse and negative facial expressions tend to be judged as male
SPEAKER: unknown

ABSTRACT. We investigated whether our ability to identify gender in emotional faces is influenced by coarse information in the face image, which was low-pass filtered (cut-off frequency 6 cycles/image) in Experiment 1 and presented briefly (30 ms, followed by a 100-ms mask) in Experiment 2. Participants identified the gender of faces bearing four kinds of expression (happy, sad, fearful, and angry). In both experiments, participants were more accurate for male faces than for female faces when the expressions were negative. Among low-pass filtered faces, identification of an angry man and of a happy woman was at an advantage; with brief presentation, however, only the angry man retained an advantage. We conclude that the influence of emotional expression on face-gender identification is processed quickly, through an early, coarse visual pathway.
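
The abstract does not say which filter shape was used; an ideal low-pass filter in the Fourier domain is one common choice. A minimal NumPy sketch under that assumption, with the 6 cycles/image cut-off from Experiment 1:

```python
import numpy as np

def lowpass_cycles_per_image(img, cutoff=6.0):
    """Ideal low-pass filter keeping spatial frequencies up to `cutoff`
    cycles/image. `img` is a 2-D grayscale float array. The study's
    exact filter (e.g., Gaussian vs. ideal) is unspecified; this is
    one standard implementation."""
    h, w = img.shape
    fy = np.fft.fftfreq(h) * h           # vertical frequency, cycles/image
    fx = np.fft.fftfreq(w) * w           # horizontal frequency, cycles/image
    radius = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * (radius <= cutoff)))
```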

15:00
Modest effect of perspective distortion on object recognition
SPEAKER: Ryosuke Niimi

ABSTRACT. The viewing distance from an object alters not only the object image’s size but also its shape. When viewing distance is very short (i.e., close-up, wide-angle), the object image has strong perspective distortion. In contrast, long/infinite viewing distance results in little/no perspective distortion—namely, a telephoto or parallel-projection image. Does perspective distortion affect visual object recognition? Experiment 1 examined naming accuracy for briefly presented and masked images of common objects. Perspective distortion was varied across three levels: strong (wide-angle), normal, and no distortion (parallel projection). Object image size was kept constant. The "normal" condition simulated a 50 mm focal length lens for a 35-mm film camera. A subjective rating experiment on goodness of view confirmed that the stimulus images looked better in the normal condition than in the other two conditions. Strong distortion yielded lower naming accuracy than normal and no distortion. Experiment 2 examined whether the accuracy reduction was due to inconsistency between the strong perspective distortion and the participants' observing distance (approximately 57 cm) from the stimulus image. In Experiment 2, the observing distance was shortened to 15 cm; the stimulus images of the strong distortion condition were optically consistent with this. The same result pattern was observed, signifying that strong perspective distortion per se affects object recognition. However, the effect size was not strong (less than 10% in either experiment). Further, the effect was even absent in a preliminary experiment using a speeded word-picture verification task. It may be that perspective distortion affects object recognition through some collateral factors such as familiarity.
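
To see why distortion weakens with viewing distance, consider the relative image sizes of an object's near and far parts: under perspective projection the ratio is (d + depth) / d for viewing distance d, approaching 1 (parallel projection, no distortion) as d grows. In the illustrative loop below, the 15 cm and 57 cm distances are from the abstract; the 10-cm object depth is hypothetical:

```python
# Near/far image-size ratio under perspective projection for a
# hypothetical object 10 cm deep; 1e9 cm approximates parallel projection.
for d in (15.0, 57.0, 1e9):              # wide-angle, "normal", ~parallel
    print(f"viewing distance {d:g} cm -> near/far size ratio {(d + 10) / d:.3f}")
```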

15:45-16:30 Session 6: Talk: Perception, Action, & Decision making

Talk Session in Perception, Action, & Decision making.

Location: FunctionHall
15:45
Anisotropy is all around me – perceived distance changes in near space

ABSTRACT. Perceived distance is anisotropic in the sense that people tend to perceive vertical distances above them as larger than horizontal distances in front of them. We have argued that this effect might be due to the integration of gravity into our perception-action schemes: if one reaches for something above, the movement opposes gravity and therefore requires more effort than reaching for something in the horizontal direction. If perceived distance above were enlarged, this would aid action performance, since it would call for more effort. Surprisingly, in previous studies we obtained perceived-distance anisotropy at larger distances, 3 m and beyond, but not at near distances such as 1 m, where it would also be expected under this hypothesis. We therefore performed two experiments in a reduced-cue situation, in which participants visually matched the distances of two dim light stimuli in two directions, vertical and horizontal. Participants (14 and 13) were in an upright position, and the only difference between the experiments was the standard distances: 1 m, 3 m, and 5 m in the first experiment, and 0.4 m, 0.6 m, 0.8 m, and 1 m in the second. The results of the first experiment show a significant difference between the two directions only at the 3 m and 5 m distances, not at 1 m. By contrast, the second experiment shows significant differences between the two viewing directions at all examined distances: 0.4 m, 0.6 m, 0.8 m, and 1 m. We conclude that the absence of a significant difference at closer distances in the first experiment, as in all previous experiments, is probably a statistical artefact: errors grow with distance, which does not occur when only closer distances are used. The results are in line with the gravity-integration hypothesis, since they show that anisotropy exists in both far and near space.

16:00
Grasping slanted objects under conditions with varied depth information
SPEAKER: unknown

ABSTRACT. To accurately grasp a planar object that is slanted in depth, the visual-motor system must use depth information to determine its 3D size and shape. We investigated whether conditions that produce errors in perceived 3D slant produce similar errors in the trajectory of the fingers when grasping a slanted object. We compared binocular and monocular viewing, and stimuli with and without cue conflicts. Monocular viewing has been found to produce underestimation of slant, and perceived slant in cue-conflict conditions is generally intermediate between the slants specified by monocular and binocular cues. Stimuli were objects with planar faces and 0.75 cm thickness. Four base objects had faces that were random, isotropic shapes with an average diameter of 5.4 cm. We also created compressed versions of each base object, scaled by 0.82 in the vertical direction; for these objects, the aspect ratio of the projected contour would suggest a higher slant. Objects were positioned 35 cm in front of the observer at a slant of 0°, 35°, 45°, or 55° relative to the line of sight. Subjects reached to pick up the object with thumb and index finger while wearing markers that tracked the 3D positions and orientations of the fingers. We found that subjects oriented their fingers in preparation for grasping, such that the different slant conditions could be clearly distinguished when the fingers were close to the objects. There was less preparatory rotation of the grasp axis in monocular than in binocular conditions, consistent with underestimation of perceived slant. In cue-conflict conditions, the orientation of the grasp axis fell between the orientations expected from monocular and binocular cues, consistent with an intermediate perceived slant. The results suggest that a common 3D representation underlies the perception of slant and the control of the hand during grasping.
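
The link between the 0.82 compression and a "higher suggested slant" can be made concrete. Under roughly orthographic projection, an isotropic shape slanted by angle s projects with aspect ratio cos(s); a pre-compressed version therefore projects 0.82·cos(s), the ratio an uncompressed shape would produce at slant arccos(0.82·cos(s)). An illustrative sketch (not the authors' stimulus code, and an approximation that ignores perspective at the 35-cm viewing distance):

```python
import numpy as np

# For each physical slant used in the study, the slant implied by the
# projected aspect ratio of the 0.82-compressed objects:
for s in (0, 35, 45, 55):                # physical slants (deg)
    implied = np.degrees(np.arccos(0.82 * np.cos(np.radians(s))))
    print(f"slant {s:2d} deg -> aspect-ratio-implied slant {implied:.1f} deg")
```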

16:15
Active vision: experience in badminton influences how you allocate vision in this dynamic sport.
SPEAKER: unknown

ABSTRACT. Badminton differs from other racquet sports in that the target (a shuttlecock) is very light. Flight trajectories are fast and unpredictable, placing high demands on the visuo-motor system, and the absence of any bounce point reduces the visual information available to the player for planning an effective response (Chajka et al., 2006; Hayhoe et al., 2012). Previous studies of gaze in cricket and squash show that, by combining head and eye movements, players anticipate critical locations such as an expected bounce point; experienced players in these sports have been shown to initiate predictive saccades to bounce points where task-relevant information about the trajectory can be obtained (Land & McLeod, 2000). Here, we examine how gaze is used in the absence of any bounce point, and whether experience makes a difference. Using a portable eye tracker, we recorded gaze in novice and experienced badminton players. When receiving a shot, all players' gaze was tightly coupled to the shuttlecock trajectory. However, the experienced players kept their gaze ahead of the moving target, while novices relied on pursuit and catch-up saccades (t(30) = -3.52, p = 0.001). Additionally, novices were unable to disengage their gaze from the shuttlecock until they had hit it, whereas experienced players had time to saccade back towards the opponent well before executing the return shot (t(10) = 5.37, p < 0.001). Third, while head movements were the most pre-emptive gaze component in all players, the experienced players initiated head movements much earlier than novices (p < 0.001). As in squash and cricket, then, vision is predictive even in the absence of a bounce point, and experience brings better use of high-level prediction.

17:00-19:00 Session 7: Symposium: Perception & Action

Symposium: Perception & Action

Location: FunctionHall
17:00
Visual information for interception
SPEAKER: Eli Brenner

ABSTRACT. People are much better at intercepting moving targets than one would expect on the basis of their temporal precision in various other visual and motor tasks. It is also not clear how people can hit an accelerating (falling) ball despite being extremely poor at visually judging acceleration. Or how they deal with retinal, neuronal and muscular delays. I will present evidence that the high temporal precision in interception is achieved by a combination of continuously controlling the hand’s position on the basis of visual information about the target’s position and velocity, and moving in a way that minimizes the influence of errors in visual judgments. Support for this proposal can be found by comparing the precision of interception under many circumstances. An important finding from such a comparison is that it shows that people mainly fine-tune where they try to hit the target during the movement, rather than when they try to do so.

17:17
Extracting self-motion and depth information from monocular 2-D image sequences using the properties of primate visual motion neurons
SPEAKER: John Perrone

ABSTRACT. Humans have an amazing ability to extract 3-D depth information from the 2-D retinal image motion occurring in a single eye during movement through the environment. On paper, this is a difficult theoretical problem that requires: (1) The accurate measurement of the image velocity despite variations in contrast and spatial scale, (2) The global estimation of the observer’s heading direction from velocity estimates spread over widely distributed areas of the visual field, (3) The removal of image motion components generated by eye and body rotations, and (4) Estimation of relative depth from the appropriate local velocity measurements specified by the derived heading direction. We know that humans can solve this problem but it is an open question as to how it occurs. Over the years we have developed a model of the V1-Middle Temporal (MT/V5)-Medial Superior Temporal (MST) motion pathway that is considered to be the locus of many of the mechanisms responsible for extracting depth from motion. We have recently completed some initial successful tests of a system that uses the properties of V1, MT and MST neurons to extract depth from 2-D video sequences. This scheme represents a possible neural mechanism by which our visual system is able to generate depth maps that correspond to our perception of a 3-dimensional world. It could also provide the foundation for a more compact and economical sensor for robots and autonomous vehicles that currently rely on a multitude of sensors to extract information about the environment in front of them.
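
The talk's model is not given as code here, but the geometry behind step 4 can be sketched: for a purely translating observer, flow radiates from the focus of expansion (FOE) with speed proportional to distance-from-FOE divided by depth, so relative depth can be read off as |p − FOE| / |v|. A schematic Python version, assuming steps 1–3 (velocity measurement, heading estimation, rotation removal) have already been carried out:

```python
import numpy as np

def relative_depth(points, flow, foe):
    """Relative depth from translational optic flow (step 4 above,
    in schematic form). `points` and `flow` are (N, 2) arrays of image
    positions and flow vectors; `foe` is the (2,) focus of expansion
    from the estimated heading. Assumes rotational flow components
    have already been removed (step 3)."""
    dist = np.linalg.norm(points - foe, axis=1)   # distance from FOE
    speed = np.linalg.norm(flow, axis=1)          # local flow speed
    return dist / np.maximum(speed, 1e-9)         # depth up to a common scale
```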

17:34
Visual effects of haptic feedback are large but brief and local
SPEAKER: Qasim Zaidi

ABSTRACT. Perceiving the correct shapes of objects is necessary for inferring object qualities, manipulating tools, avoiding obstacles, and other aspects of functioning successfully in the world. Since observers can estimate object properties from larger distances using vision than they can from touch, generally vision makes predictions that touch relies on, such as the shape of a handle or chair. However, since the information in retinal images is inherently under-determined, the inferential power of vision arises from employing intelligent heuristics/assumptions/priors, but this inevitably leads to illusory percepts in some cases. What are the possible functions of touch in such cases? Observers could rely entirely on the haptic percept and ignore the erroneous visual percept, or touch could temporarily correct the visual percept, or there could be longer lasting effects if observers learn to change their visual prior assumptions or weights for different visual cues. We tested these possibilities by measuring the effects of various types of haptic feedback on the perception of images that evoke incorrect visual percepts despite being proper perspective projections of 3-D surfaces. Observers viewed 8×8° images at the proper distance through a monocular aperture, while actively “touching” the virtual 3-D surface with a SensAble PHANTOM Omni stylus. The results show that in the perception of 3-D shapes from texture cues, haptic information can dominate vision in some cases, changing percepts qualitatively from convex to concave and concave to slant. The effects take time to develop, are attenuated by distance, drastically reduced by gaps in the surface, and fade rapidly after the cessation of the feedback. These dynamic shifts in qualitative perceived shapes could be a key to whether haptic feedback modifies the gain of neurons responsible for percepts of 3-D curvatures and slants, or the shape-tuning, or whether haptic-visual interactions happen after independent decisions in the two modalities.

17:51
Motion Perception and the Moving Observer.

ABSTRACT. Traditionally, motion perception research has been performed under strict conditions: observers immobilized by chinrests, often in dark rooms, with sparse stimuli presented on standard CRT displays. This has yielded important insight into the thresholds of our motion detection system. However, most of the motion we process and perceive is generated by our own movements (optic flow), and under these conditions other senses, in particular the vestibular system, also supply the brain with essential information. In the current experiments we therefore measured sensitivity to motion stimuli in a moving environment. Observers were secured on a motion base with 6 degrees of freedom, with which we could physically move them in line with the visual stimuli. In the main experiment, participants were asked to find a single dot that moved inconsistently with the flow field. We tested performance with the observer stationary and moving, in both cases as a function of time after the onset of the stimulus movement. Surprisingly, the results show that observers are at least as good under moving conditions as under stationary conditions, if not better.

18:08
Where do we pay attention when driving in underground tunnels?
SPEAKER: Hong Xu

ABSTRACT. Driving in underground tunnels imposes a heavier cognitive load than driving in an open-road environment. Previous studies have found that drivers under greater cognitive load tend to develop tunnel vision while driving, suggesting that their visual attention is focused primarily on the road ahead, oblivious to the periphery, mirrors, and instruments. To test this possibility in a real setting, we recorded the behaviour of 21 drivers and tracked their eye movements while they drove in an underground tunnel expressway (Kallang-Paya Lebar Expressway, KPE) in Singapore. We found that, when instructed to drive straight ahead, most drivers fixated the vehicles in front of them (in the same lane or the lane next to theirs) and rarely looked at the mirrors; by comparison, eye movements are more broadly distributed across environmental features when driving in an open-road environment. When instructed to change to the left or right lane, drivers looked at the mirrors and dashboard before and after the manoeuvre. They also tended to sweep the tunnel ceiling near the ceiling lights and the lane boundaries ahead of them. At curves, they tended to look at the tunnel wall on the inner side of the curve, which is covered with reflective material and is thus brighter than other areas. Since we simply instructed the subjects to drive ahead, this free eye-movement pattern may reflect the key elements of scene saliency in the underground tunnel environment: motion and illumination.

18:25
Playing Physical Sport Improves Visual Functions
SPEAKER: Rui Ni
18:42
Action video game play improves visuomotor control
SPEAKER: Li Li

ABSTRACT. Can action video game play improve visuomotor control? If so, can it be used to train everyday visuomotor tasks such as driving? In this talk, I will present experiments that addressed these questions by testing non-video-game players on a commonly used manual control task. After playing a driving or first-person-shooter video game, participants improved significantly on the manual control task, and this improvement was highly correlated with improvement in lane keeping. In contrast, no improvement in control performance was observed for participants who played a non-action video game. Our model-driven analysis revealed that action gaming generally improved the responsiveness of the sensory-motor system to visual input signals for motor control. The study provides the first empirical evidence for a causal link between action gaming (for as little as 5 hours) and enhanced visuomotor control, and suggests that action video games can be beneficial training tools for driving.