Zoom Link: https://uni-sydney.zoom.us/j/88901281313?pwd=aDNEMk9YeFpLdit2RlFXbjFBQjVvQT09 Meeting ID: 889 0128 1313, Password: cgi2023
Prof. Hongbo Fu, Professor, City University of Hong Kong, School of Creative Media
Talk title: Towards More Accessible Tools for Content Creation
Abstract: Traditional game and film industries heavily rely on professional artists to make 2D and 3D visual content. In contrast, future industries such as the metaverse and 3D printing highly demand digital content from personal users. With modern software, ordinary users can easily produce text documents, create simple drawings, make simple 3D models consisting of primitives, take images/videos, and possibly edit them with pre-defined filters. However, creating photorealistic images from scratch, fine-grained image retouching (e.g., for body reshaping), detailed 3D models, vivid 3D animations, etc., often require extensive training with professional software and are time-consuming, even for skillful artists. Generative AI, e.g., ChatGPT and Midjourney, has recently taken a big step and allows the easy generation of unique and high-quality images from text prompts. However, various problems, such as controllability and generation beyond images, still need to be solved. Besides AI, recent advances in Augmented/Virtual Reality (AR/VR) software and hardware bring unique challenges and opportunities for content creation. In this talk, I will introduce my attempts to lower the barrier to content creation, making such tools more accessible to novice users. I will mainly focus on sketch-based portrait generation and content creation with AR/VR.
Bio: Hongbo Fu is a Professor at the School of Creative Media, City University of Hong Kong. Before joining CityU, he received postdoctoral research training at the Imager Lab, University of British Columbia, Canada, and the Department of Computer Graphics, Max-Planck-Institut für Informatik, Germany. He received a Ph.D. degree in computer science from the Hong Kong University of Science and Technology in 2007 and a BS degree in information sciences from Peking University, China, in 2002. His primary research interests lie in computer graphics, human-computer interaction, and computer vision. His research has led to over 100 scientific publications, including 60+ papers in the best graphics/vision journals (ACM TOG, IEEE TVCG, IEEE PAMI) and 20+ papers in the best vision/HCI conferences (CVPR, ICCV, ECCV, CHI, UIST). His recent works have received a Silver Medal from the Special Edition 2022 Inventions Geneva Evaluation Days (IGED), the Best Demo awards at the Emerging Technologies program of SIGGRAPH Asia in 2013 and 2014, and the Best Paper awards from CAD/Graphics 2015 and UIST 2019. He was the Organization Co-Chair of Pacific Graphics 2012; the Program Chair/Co-chair of CAD/Graphics 2013 & 2015, SIGGRAPH Asia 2013 (Emerging Technologies) & 2014 (Workshops), Pacific Graphics 2018, and Computational Visual Media 2019; and the Conference Chair of SIGGRAPH Asia 2016 and Expressive 2018. He was on the SIGGRAPH Asia Conference Advisory Group and is currently Vice-Chairman of the Asia Graphics Association. He has served as an Associate Editor of The Visual Computer, Computers & Graphics, and Computer Graphics Forum.
Zoom Link: https://uni-sydney.zoom.us/j/89602498532?pwd=VGJqbW5xTUJ1RkVNK0l1Ky9HYmkydz09 Meeting ID: 896 0249 8532, Password: cgi2023
09:00 | AMDNet: Adaptive Fall Detection based on Multi-scale Deformable Convolution Network PRESENTER: Keyi Zhang ABSTRACT. Human falls are a critical health issue: recent studies by the World Health Organization (WHO) have shown that falls have become a major cause of injury and death worldwide. Therefore, human fall detection is becoming an increasingly important research topic. Deep learning models have potential in fall detection, but they face challenges such as limited utilization of global contextual information, insufficient feature extraction, and high computational requirements. These limitations result in problems such as low accuracy, poor generalization, and slow inference. To overcome these challenges, this study proposes an Adaptive Fall Detection Network (AMDNet) based on multi-scale deformable convolutions. The main ideas of this method are as follows: 1) An improved multi-scale fusion module (CDCC block) enhances the network's ability to learn object details and semantic features, thereby reducing the likelihood of false negatives and false positives during detection, especially for small objects. 2) Wise-IoU v3, with two layers of attention mechanisms and a dynamic non-monotonic focusing mechanism (FM), is used as the bounding box loss function of AMDNet, improving the model's robustness to low-quality samples and enhancing object detection performance. This work also proposes a diversified fall dataset that covers as many real-world fall scenarios as possible. Experimental results show that the proposed method outperforms current state-of-the-art methods on the self-made dataset. |
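As a concrete illustration of the loss in point 2), here is a minimal PyTorch sketch of a Wise-IoU v3-style bounding-box loss, written from the published Wise-IoU formulation rather than AMDNet's own code; the alpha/delta values and the running-mean bookkeeping are assumptions.

```python
import torch

def wiou_v3_loss(pred, target, mean_liou, alpha=1.9, delta=3.0):
    """Wise-IoU v3-style loss sketch. Boxes are (x1, y1, x2, y2) tensors
    of shape (N, 4); mean_liou is a caller-maintained running mean of
    L_IoU (e.g. an EMA). alpha/delta are assumptions, not AMDNet's."""
    # IoU term
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    l_iou = 1.0 - inter / (area_p + area_t - inter + 1e-7)

    # Distance attention over the smallest enclosing box (WIoU v1)
    c_wh = torch.max(pred[:, 2:], target[:, 2:]) - torch.min(pred[:, :2], target[:, :2])
    p_ctr = (pred[:, :2] + pred[:, 2:]) / 2
    t_ctr = (target[:, :2] + target[:, 2:]) / 2
    r_wiou = torch.exp(((p_ctr - t_ctr) ** 2).sum(1) /
                       (c_wh ** 2).sum(1).detach().clamp(min=1e-7))

    # Dynamic non-monotonic focusing (v3): down-weight outlier boxes
    beta = l_iou.detach() / max(mean_liou, 1e-7)
    r = beta / (delta * alpha ** (beta - delta))
    return (r * r_wiou * l_iou).mean()
```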
09:12 | A HRNet-Transformer Network Combining Recurrent-tokens for Remote Sensing Image Change Detection PRESENTER: Lingjie Hu ABSTRACT. Deep learning is developing rapidly and has achieved significant results in the field of remote sensing image change detection. Compared with deep learning, manual inspection is time-consuming and labor-intensive, making its replacement an inevitable trend. In this paper, we propose a Siamese network structure called HTRNet (a HRNet-Transformer Network Combining Recurrent-tokens) for detecting changes in typical elements in bitemporal remote sensing images. Our approach addresses the following challenges. Most current convolution-based change detection algorithms obtain high-level semantic features from images by increasing the network depth and expanding the receptive field; however, this leads to the loss of spatial information in the image features. To overcome this, we utilize HRNet to better preserve spatial and channel information in the image features. Transformer-based approaches require converting basic features into token sets; however, tokens with a non-uniform distribution of bitemporal semantic information can cause the model to learn spurious correlations and deviate from the true semantics. To address this, we propose a Recurrent-tokens module to enrich contextual information and reduce the model's inherent bias. The final output is a binary mask map that considers both the classification result of pixels and the classification position, and we employ the Cosine-embedding loss to measure the similarity between the generated mask and the ground truth. Experimental results demonstrate that HTRNet outperforms SOTA methods in several metrics on the LEVIR-CD and DSIFN-CD datasets. Additionally, the predicted results show smoother edges, and our model exhibits good robustness. |
09:24 | UPDN: Pedestrian Detection Network for Unmanned Aerial Vehicle Perspective PRESENTER: Yulin Wang ABSTRACT. Pedestrian detection from the Unmanned Aerial Vehicle (UAV) perspective has significant potential in the fields of computer vision and intelligent systems. However, current methods have limitations in accuracy and real-time detection of small targets, which severely affects their practical application. To address these challenges, we propose UPDN, a novel network designed to improve comprehensive detection performance while maintaining high speed. To achieve this, UPDN incorporates two key modules: the Spatial Pyramid Convolution and Pooling Module (SPCPM) for enhancing small-target features and the Efficient Attention Module (EAM) for improving network efficiency. The SPCPM effectively captures multi-scale features from pedestrian regions, enabling better detection of small targets. The EAM optimizes network operations by selectively focusing on informative regions, enhancing the overall efficiency of the detection process. Experimental results on the constructed dataset demonstrate that UPDN outperforms other classic detection methods. It achieves state-of-the-art results in both Average Precision (AP) and F1 score, with a detection speed of 107.37 frames per second (FPS). In summary, UPDN provides an efficient and accurate solution for pedestrian detection from a UAV perspective, offering a more feasible and reliable approach for real-world applications. |
09:36 | Learning Local Features of Motion Chain for Human Motion Prediction PRESENTER: Zhuoran Liu ABSTRACT. Extracting local features is a key technique in the field of human motion prediction. However, due to incorrect partitioning of strongly correlated joint sets, existing methods ignore some strongly correlated joint pairs during local feature extraction, leading to prediction errors in end joints. In this paper, a Motion Chain Learning Framework is proposed to address prediction errors in end joints, such as hands and feet. The key idea is to mine and build strong correlations for joints belonging to the same motion chain. To be specific, all human joints are first divided into five parts according to the human motion chains. Then, the local interaction relationships between joints on each motion chain are learned by a GCN. Finally, a novel Weights-Added Mean Per Joint Position Error loss function is proposed to assign different weights to each joint based on their importance in human biomechanics. Extensive evaluations demonstrate that our approach significantly outperforms state-of-the-art methods on datasets such as H3.6M, CMU-Mocap, and 3DPW. Furthermore, the visual results confirm that our Motion Chain Learning Framework reduces errors in end joints while working well for the other joints. |
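To make the loss concrete: a weighted per-joint position error of this kind reduces to scaling each joint's Euclidean error before averaging. A minimal PyTorch sketch follows; the joint count, indices, and weight values are illustrative assumptions, not the paper's.

```python
import torch

def weighted_mpjpe(pred, gt, joint_weights):
    """Weights-Added MPJPE sketch.
    pred, gt: (batch, joints, 3) joint positions;
    joint_weights: (joints,) biomechanical importance weights."""
    per_joint_err = torch.norm(pred - gt, dim=-1)   # (batch, joints)
    return (per_joint_err * joint_weights).mean()

# Hypothetical usage: up-weight end joints (wrists/ankles).
weights = torch.ones(22)
weights[[7, 10, 15, 21]] = 2.0   # assumed end-joint indices
loss = weighted_mpjpe(torch.randn(8, 22, 3), torch.randn(8, 22, 3), weights)
```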
09:48 | Hand Movement Recognition and Analysis Based on Deep Learning in Classical Hand Dance Videos PRESENTER: Qingtao Lu ABSTRACT. Hand movement recognition is one of the hot research topics in the field of computer vision and has received extensive research interest. However, current classical hand dance movement recognition suffers from high computational complexity and low accuracy. To address these problems, we present a classical hand dance movement recognition and analysis method based on deep learning. Firstly, our method extracts key frames from the input classical hand dance video using an inter-frame difference method. Secondly, we use a stacked-hourglass-network-based method to estimate the 2D hand poses of the key frames. Thirdly, a network named HandLinearNet, with spatial and channel attention mechanisms, is constructed for 3D hand pose estimation. Finally, our method uses ConvLSTM for classical hand dance movement recognition and outputs the corresponding classical hand dance movements. The method can recognize 12 basic classical hand dance movements, with which users can better analyze and study classical hand dance. |
10:00 | 4RATFNet: Four-dimensional residual-attention improved-transfer few-shot semantic segmentation network for landslide detection PRESENTER: Shiya Huang ABSTRACT. Landslides are hazardous and in many cases cause enormous economic losses and human casualties. Their suddenness makes it difficult to detect landslides quickly and effectively. Therefore, to address the problem of intelligent analysis of geological landslides, we propose a four-dimensional convolutional neural network based on a residual-attention mechanism and transfer learning (4RATFNet) for few-shot semantic segmentation when the number of labeled landslide images is insufficient. First, a residual-attention module is designed to fuse channel and spatial features through residual connections. Second, improved transfer learning is used to optimize the parameters of the pre-trained network. Third, the network decomposes the four-dimensional convolutional kernel into a pair of two-dimensional convolutional kernels. Finally, the few-shot semantic segmentation network is used to extract support image features and detect landslides with the same features in the query image. Experimental results show that, with both ResNet50 and ResNet101 backbones, the method obtains better segmentation results and achieves a higher mean intersection over union than traditional semantic segmentation methods when labeled landslide images are scarce, indicating that our network has obvious advantages and wider applicability. |
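For intuition, the residual-attention idea in the first step can be sketched as a CBAM-style block: channel attention, then spatial attention, then a residual sum. The wiring below is an assumption for illustration, not 4RATFNet's exact module (assumes channels >= reduction).

```python
import torch
import torch.nn as nn

class ResidualAttention(nn.Module):
    """CBAM-style channel + spatial attention with a residual connection."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        out = x * self.channel_mlp(x)                      # channel attention
        pooled = torch.cat([out.mean(dim=1, keepdim=True),
                            out.max(dim=1, keepdim=True)[0]], dim=1)
        out = out * self.spatial(pooled)                   # spatial attention
        return x + out                                     # residual fusion

y = ResidualAttention(64)(torch.randn(2, 64, 32, 32))      # sanity check
```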
10:12 | Reinforce Model Tracklet for Multi-Object Tracking PRESENTER: Jianhong Ouyang ABSTRACT. Recently, most multi-object tracking algorithms have adopted the idea of tracking-by-detection, and related studies have shown significant improvements with the development of detectors. However, missed and false detections remain serious in occlusion situations, so trackers use tracklets (short trajectories) to generate more complete trajectories. There are many tracklet generation algorithms, but the fragmentation problem is still prevalent in crowded scenes: fixed-window tracklet generation strategies are not suitable for dynamic environments with occlusions. To solve this problem, we propose a reinforcement-learning-based framework for tracklet generation, where we regard tracklet generation as a Markov decision process and utilize reinforcement learning to dynamically predict the window size for generating tracklets. Additionally, we introduce a novel scheme that incorporates the temporal order of tracklets for association. Experiments on the MOT17 dataset demonstrate the effectiveness of our method, achieving competitive results compared to the most advanced methods. |
10:24 | Analysis of corporate community of interest relationships in combination with multiple network PRESENTER: Yipan Liu ABSTRACT. Visualizing the complex relationships among enterprises is valuable for helping enterprises and related institutions find potential risks. Small and medium-sized enterprises' (SMEs) loans have higher risk and non-performing rates than those of other types of enterprises, and SMEs are prone to forming complex relationships. Current analysis of enterprise relationship networks mainly focuses on guarantee relationships among enterprises and lacks a holistic analysis of the enterprise community of interest. To address these issues, we propose the concepts of the enterprise community of interest and the investment model within the enterprise community of interest. The centrality, density, mean path, and network diameter algorithms from graph theory are used to evaluate the network of the enterprise community of interest; graph isomorphism is used to query the relationship networks of enterprises that users are interested in; and the enterprise portrait is used to evaluate the enterprise community of interest. Finally, we study the impact of debt relationships within the enterprise community of interest. To verify the effectiveness of the method, an enterprise relationship network analysis system covering 6,745 enterprise nodes and 7,435 enterprise relationships from Shanghai is developed to analyze enterprise relationships. |
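The graph-theoretic measures listed in this abstract are all standard; a small sketch of how they could be computed with networkx is below, using a toy graph in place of the 6,745-node Shanghai dataset.

```python
import networkx as nx

# Toy enterprise relationship graph (placeholder data, not the paper's).
G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")])

centrality = nx.degree_centrality(G)             # per-enterprise centrality
density = nx.density(G)                          # overall network density
mean_path = nx.average_shortest_path_length(G)   # mean path length
diameter = nx.diameter(G)                        # network diameter
print(centrality, density, mean_path, diameter)
```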
Zoom Link: https://uni-sydney.zoom.us/j/88901281313?pwd=aDNEMk9YeFpLdit2RlFXbjFBQjVvQT09 Meeting ID: 889 0128 1313, Password: cgi2023
09:00 | MS-GTR: Multi-Stream Graph Transformer for Skeleton-Based Action Recognition PRESENTER: Na Lv ABSTRACT. Skeleton-based action recognition has made great strides with the use of graph convolutional neural networks (GCN) to model correlations among body joints. However, GCN has limitations in establishing long-term dependencies and is constrained by the natural connections of human body joints. To overcome these issues, we propose a Graph relative TRansformer (GTR) that captures temporal features through learnable topology and invariant joint adjacency graphs. GTR provides a high-level representation of the spatial skeleton structure that harmoniously fits into the time series. Moreover, a Multi-Stream Graph Transformer (MS-GTR) is introduced to integrate various dynamic information for an end-to-end human action recognition task. The MS-GTR applies a double-branch structure, where the GTR is implemented as the master branch to extract joint-level and bone-level features, and an auxiliary branch processes lightweight kinematic content. Finally, a cross-attention mechanism links the master branch and the auxiliary branch to complement the information in stages. Experimental results on the HDM05 and NTU RGB+D datasets demonstrate the potential of the proposed MS-GTR model for improving action recognition. |
09:12 | GFENet: Group-Free Enhancement Network for Indoor Scene 3D Object Detection PRESENTER: Feng Zhou ABSTRACT. The state-of-the-art group-free network (GFNet) has achieved superior performance for indoor scene 3D object detection. However, we find there is still room for improvement in three aspects. First, seed point features extracted by the multi-layer perceptron (MLP) in the backbone (PointNet++) neglect the different importance of each level's features. Second, the single-scale transformer module in GFNet, which handles hand-crafted grouping via Hough voting, cannot adequately model the relationship between points and objects. Finally, GFNet directly utilizes the decoders to predict detection results, disregarding the different contributions of the decoders at each stage. In this paper, we propose the group-free enhancement network (GFENet) to tackle the above issues. Specifically, our network mainly consists of three lifting modules: the weighted MLP (WMLP) module, the hierarchical-aware module, and the stage-aware module. The WMLP module adaptively combines features of different levels in the backbone before max-pooling for informative feature learning. The hierarchical-aware module formulates a hierarchical way to mitigate the negative impact of insufficient modeling of points and objects. The stage-aware module aggregates multi-stage predictions adaptively for better detection performance. Extensive experiments on the ScanNet V2 and SUN RGB-D datasets demonstrate the effectiveness and advantages of our method against existing 3D object detection methods. |
09:24 | Research on Fabric Defect Detection Technology Based on RDN-LTE and Improved DINO PRESENTER: Zhongqin Chen ABSTRACT. To solve the problem of detecting various types of complex fabric defects generated in actual production environments, such as defects of different scales, defects highly fused with the background, and defects with extreme aspect ratios, this paper proposes a detection method that combines super-resolution reconstruction with object detection. Firstly, the dataset is reconstructed using the super-resolution reconstruction technology RDN-LTE, which effectively alleviates the high fusion between defects and background. Furthermore, copy-paste augmentation is employed to enhance model robustness. The dataset is then fed into the detection network DINO for training. To enlarge the receptive field of the model, the Swin Transformer is used as the backbone instead of ResNet-50, and the number of scale features extracted by the model is increased from 4 to 5. A deformable attention mechanism is also introduced in the third and fourth stages of the Swin Transformer to enhance global relationship modeling. Finally, multi-scale training is introduced to capture defect features at different scales, further improving detection performance and training speed. Three kinds of comparative experiments show that the method based on RDN-LTE and improved DINO achieves a better overall recognition rate for multiple kinds of fabric defects than other current methods. |
09:36 | Anomaly Detection of Industrial Products Considering both Texture and Shape Information ABSTRACT. Anomaly detection of industrial products is an important issue in modern industrial production, where abnormal samples are in short supply. Although significant progress has been made in extracting rich information from nominal data for anomaly detection, it is still challenging to solve the shape-bias problem caused by the local limitations of convolutional neural networks. To overcome this problem, we design a novel framework for unsupervised anomaly detection and localization. Our method aims to learn a global and compact distribution from image-level and feature-level processing of normal images. For image-level information, we present a self-supervised shape-biased module (SBM) aimed at fine-tuning the pre-trained model to recognize object shape information. For feature-level information, we propose a pre-trained feature attentive module (PFAM) to extract multi-level information from features. Moreover, given the limited and relatively small amount of texture-based class feature information in existing datasets, we prepare a multi-textured leather anomaly detection (MTL AD) dataset with both texture and shape information to shed new light on this research field. Finally, by integrating our method with multiple state-of-the-art neural models for anomaly detection, we achieve significant improvements on both the MVTec AD and MTL AD datasets. |
09:48 | An Interpretability Study of Unknown Unknowns for Clothing Image Classification PRESENTER: Huan Li ABSTRACT. “Unknown unknowns” are instances to which predictive models assign incorrect labels with high confidence, greatly reducing the generalization ability of models. In practical applications, unknown unknowns may lead to significant decision-making mistakes and reduce the application value of models. As unknown unknowns are agnostic to models, it is extremely difficult to figure out why models would make highly confident but incorrect predictions. In this paper, based on unknown unknowns identification, we investigate the interpretability of unknown unknowns arising from convolutional neural network models in image classification tasks by interpretable methods. We employ visualization methods to interpret prediction results on unknown unknowns, further understand predictive models, and analyze the predictive basis of unknown unknowns. We focus the application scenario of interpretability of unknown unknowns on a clothing category recognition task (dress vs shorts) in e-commerce platforms, and observe some patterns of models making wrong classifications that lead to unknown unknowns, which indicates that a CNN model that lacks common sense can make mistakes even for a large dataset. |
10:00 | Classification of toric surface patches PRESENTER: Lanyin Sun ABSTRACT. Triangular Bézier patches and tensor-product Bézier patches are widely used in Computer Aided Design (CAD). The toric surface patch is a multi-sided generalization of the Bézier patch. In this paper, we study the classification of toric surface patches with the theory of equivalent polygons from combinatorics and obtain the different types of toric surface patches. Moreover, a recursive algorithm is proposed to generate lists of arbitrary toric surface patches. Furthermore, several geometric models are given to present the shape possibilities of toric surface patches in Computer Aided Geometric Design (CAGD). |
10:12 | Integrate depth information to enhance the robustness of object level SLAM PRESENTER: Shinan Huang ABSTRACT. vSLAM (Visual Simultaneous Localization and Mapping) is a fundamental function in various robot applications. With the development of downstream applications, scene semantic understanding and stable operation in different scenarios are increasingly challenging. In this paper, we propose an object-level RGB-D SLAM system that reconstructs objects using quadric surfaces and extracts planar information with lower measurement noise compared to point features. The extracted planes and the original point features are tightly coupled as landmarks to enhance the robustness of the system in different scenarios. Moreover, we utilize the edges of planes to infer unseen planes and obtain more structural constraints. Experiments conducted on publicly available datasets demonstrate the competitive performance of our framework compared to state-of-the-art object-based algorithms. |
10:24 | PRESENTER: Hang Sun ABSTRACT. Given the low image quality and limited number of samples in existing remote sensing image recognition, it is difficult to adequately extract the concealed distinguishing features of images with a single attention mechanism. In this paper, a method is proposed to detect region destruction in remote sensing images by integrating an attention mechanism and a capsule network module. The method begins with super-resolution processing of the raw destruction data using the BSRGAN super-resolution model, followed by data expansion of the processed images using various data augmentation operations. Then the multi-attention capsule encoder-decoder network (MA-CapsNet) proposed in this paper is adopted for further processing: low-level features are extracted with a cascading attention mechanism consisting of the Swin Transformer and the Convolutional Block Attention Module (CBAM). Finally, the CapsNet module captures precise object features and delivers the feature map to the classifier to detect the destruction region. In experiments on region destruction detection in remote sensing images after the 2010 earthquake in Jacmel, Haiti, the MA-CapsNet model achieved 99.64% accuracy, better than the most advanced ResNet, GoogLeNet, and Vision Transformer (ViT) models as well as the ablation network models. The method improves the characterization capability of the model and addresses the poor detection accuracy of destruction regions in remote sensing images with complicated backgrounds, providing theoretical guidance for the rapid acquisition and evaluation of remote sensing image destruction. |
Zoom Link: https://uni-sydney.zoom.us/j/88901281313?pwd=aDNEMk9YeFpLdit2RlFXbjFBQjVvQT09 Meeting ID: 889 0128 1313, Password: cgi2023
11:00 | Visual analytics of CO2 emissions from individuals' daily travel based on large-scale taxi trajectories PRESENTER: Dongliang Ma ABSTRACT. Understanding the patterns of traffic-related carbon dioxide (CO2) emissions from different trip purposes is of great significance for the development of low-carbon transportation. However, most existing research focuses on calculating road CO2 emissions while ignoring traffic-related CO2 emissions from daily trips. Accurately inferring trip purposes is a prerequisite for analyzing the patterns of traffic-related CO2 emissions from daily trips. Existing research on inferring trip purposes based on machine learning, probability, and rules has been proven effective, but it ignores door-to-door service (DTD) and the time-varying attractiveness of Points of Interest (POIs). In this paper, we propose a Bayesian method to infer trip purposes. It identifies DTD through spatial relation operations and constructs a dynamic function of POI attractiveness using kernel density estimation (KDE). A visual analysis system is also developed to help users explore the spatio-temporal patterns of traffic-related CO2 emissions from daily trips. Finally, the effectiveness of the method and the system is verified through a case study based on real data and positive feedback from experts. |
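The KDE step is the standard one-dimensional density estimate over time; a small sketch of how a time-varying POI attractiveness curve could be built follows, with made-up drop-off hours standing in for the taxi trajectory data.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical taxi drop-off hours near one POI category.
dropoff_hours = np.array([7.5, 8.0, 8.2, 9.0, 12.1, 12.4, 18.5, 19.0, 19.2])
kde = gaussian_kde(dropoff_hours)

hours = np.linspace(0, 24, 97)
attractiveness = kde(hours)            # relative attractiveness over the day
peak_hour = hours[np.argmax(attractiveness)]
print(f"peak attractiveness around {peak_hour:.1f} h")
```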
11:12 | Visual analytics of air pollution transmission among urban agglomerations PRESENTER: Shijie Chen ABSTRACT. Air quality research exhibits dynamic, dependent, and complex characteristics. To effectively address air pollution problems, it is essential to scientifically reflect the internal structure of air quality distribution and reveal the dynamic evolution of air pollution. In this study, a novel visual analytics method is proposed to address these challenges. Initially, the spatio-temporal characteristics of air quality data are mined to complete urban agglomeration division based on dimensionality reduction and clustering. Subsequently, the air pollution transmission network (APTN) is constructed through particle transport and correlation analysis. A progressive exploration analysis method based on multidimensional space transformation is then employed to explore the process of air pollution transmission. Furthermore, a visual analytics system is developed to facilitate the interpretation of the results. Finally, we demonstrate the effectiveness of our proposed method through a real data set, three case studies, and positive feedback from domain experts. |
11:24 | Visual Analysis of Machine Tool Operation Mode Correlation Based on Parameter Category Coding PRESENTER: Jinxin Long ABSTRACT. Machine tool operation data has many dimensions and complex parameter relationships, and its abnormal patterns and hidden correlations are difficult to fully mine. To address this, this paper proposes a visual analysis method for the correlation of CNC machine tool operation modes. Firstly, parameter category encoding is carried out from two aspects of the operation data, sliding windows and time points. Then the multi-parameter category encoding combinations are clustered and mined for association rules to extract machine tool operation modes and parameter state association patterns, and visual mappings are established. Combining ease of use, flexibility, and interpretability, the visual analysis system MachineVis is then constructed, with a variety of interactions designed to support users in discovering abnormal patterns in machine tool data, analyzing the changes of the various parameters, and capturing the relationships between parameters. Finally, the validity and practicability of the system are demonstrated through case studies. |
11:36 | CAGviz: a visual analysis method to explore cyber asset graphs of cybercrime gangs PRESENTER: Yinuo Liu ABSTRACT. In recent decades, cybercrime has become more and more frequent in people’s lives, causing serious consequences. Cybercrime groups usually hold a range of large and complex cyber assets to support the operation of their industries. Analyzing information about the cyber assets and the relationships among them can help people better understand the mechanism of cybercrime gangs. In this paper, based on an open-source cybercrime gang asset dataset, we propose a visual analysis system: CAGviz. The system helps to achieve tasks such as mining subgraphs of gang assets and extracting important assets. In addition, it also allows users to interactively view and analyze cyber asset information through a visual interface. Finally, two case studies illustrate the effectiveness of the system. |
11:48 | KDEM: A Knowledge-Driven Exploration Model for Indoor Crowd Evacuation Simulation PRESENTER: Yuji Shen ABSTRACT. Knowledge plays an important role in indoor crowd evacuation. However, most evacuation simulation models assume that the agent is familiar with the simulated scene and do not apply posterior knowledge to the simulation, which also makes it difficult for these models to judge the real situation of unfamiliar and complex scenes effectively. This study proposes a Knowledge-Driven Evacuation Model (KDEM) focusing on indoor crowd evacuation. To adapt the model to more complex scenarios, we refine building structure knowledge, add bridge knowledge, and clarify the role of building object knowledge such as safety signs and danger sources. The SEIR model is used to construct the functions of knowledge dissemination and decay, and knowledge discovery is proposed as another way of knowledge acquisition. An exploration module using the Dijkstra and ORCA models is proposed to help agents plan goals and paths and avoid obstacles. The experimental results show that the KDEM model conforms to real situations and can provide practical guidance for public safety. |
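For readers unfamiliar with SEIR, the dissemination/decay dynamics can be sketched as the classical four-compartment system, reinterpreted here for knowledge spread; the rate constants below are illustrative assumptions, not KDEM's calibrated values.

```python
def seir_step(s, e, i, r, beta, sigma, gamma, dt=0.1):
    """One Euler step of a standard SEIR model, read as knowledge
    dissemination: S unaware, E exposed to knowledge, I spreaders,
    R decayed/forgotten. beta/sigma/gamma are assumed rates."""
    n = s + e + i + r
    ds = -beta * s * i / n          # unaware agents meet spreaders
    de = beta * s * i / n - sigma * e
    di = sigma * e - gamma * i      # exposed become spreaders, spreaders decay
    dr = gamma * i
    return s + ds * dt, e + de * dt, i + di * dt, r + dr * dt

s, e, i, r = 990.0, 0.0, 10.0, 0.0
for _ in range(1000):               # simulate 100 time units
    s, e, i, r = seir_step(s, e, i, r, beta=0.4, sigma=0.3, gamma=0.1)
print(round(s), round(e), round(i), round(r))
```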
12:00 | GVPM: Garment simulation from video based on a priori movements PRESENTER: Jiazhe Miao ABSTRACT. Garment simulation plays an essential role in virtual try-on and the film industry, and has been studied extensively in computer graphics. Our proposed GVPM method is cheaper to run and easier to deploy than physical-simulation-based methods for 3D garment animation. Previous methods generate videos with insufficiently smooth transitions between adjacent frames; we therefore propose alleviating this problem with an a priori motion generation model that uses similar poses as references for the expected motion. The motion sequences are then passed to a physics-based garment model we set up, which recovers the pose grid from monocular video frames and extracts semantic information about the human body and garment, and then accurately predicts how the garments in the video will deform according to the human pose. Thus, in contrast to methods that use 2D images as input, GVPM is unaffected by body proportions and postures, and the simulated garment results are diverse. Finally, combined with our temporal cue attention optimization module, the movements, joints, and forms are combined to optimize dynamic garment deformation. As a result, our new approach can simulate 3D garment animations with both virtual and real-world behaviors and extract unknown body motions from other motion capture datasets. |
12:12 | Deep reinforced navigation of agents in 2D platform video games PRESENTER: Emanuele Balloni ABSTRACT. Artificial Intelligence in Computer Graphics can be applied to video games to a great extent, from human-computer interaction to character animation. The development of increasingly complex environments and, consequently, ever larger state spaces has brought the need for new AI approaches. This is why Deep Reinforcement Learning is becoming widespread in this domain as well, by enabling the training of agents capable of outperforming humans. This work develops a methodology to train intelligent agents to interact with and navigate through complex 2D environments, achieving different goals. Two platform video games have been examined: one is a level-based platformer, which provides a "static" environment, while the other is an endless-type video game, in which elements change randomly every game, making the environment more "dynamic". Different experiments have been performed with different configuration settings; in both cases, the trained agents showed good performance, proving the effectiveness of the proposed method. In particular, in both scenarios the stable cumulative reward achieved corresponds to the highest value across all trainings performed, and the policy and value losses obtained are very low. |
12:24 | A Novel Approach to Curved Layer Slicing and Path Planning for Multi-degree-of-freedom 3D Printing PRESENTER: Yuqin Zeng ABSTRACT. In this study, we propose a novel approach to overcome the limitations of traditional 3D printing, including restricted degrees of freedom, the stair effect, and the need for additional support when manufacturing overhanging features. Our method includes a curved layer slicing algorithm and a surface path planning algorithm. This work presents five key contributions: (1) it mitigates the stair effect commonly seen in 3D printing; (2) it eliminates the need for the support structures typically required by traditional 3D printing; (3) it is applicable to complex topological shapes, including 0-loss and 1-loss lattices; (4) it achieves B-spline interpolation through equidistant arc-length sampling, which is more efficient than Gauss-Legendre and other existing methods; and (5) it has a collision-free path planning strategy based on hierarchical priority to prevent collisions between the printing nozzle and the model being printed. Through rigorous simulation and comparison with other state-of-the-art algorithms, we have validated the feasibility and effectiveness of our approach. |
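Contribution (4) rests on a standard construction: sample the spline densely, accumulate chord lengths, and invert the arc-length function at equal spacings. A numpy/scipy sketch with toy knots and control points (not the paper's data) follows.

```python
import numpy as np
from scipy.interpolate import BSpline

# Toy cubic B-spline in the plane.
knots = [0, 0, 0, 0, 1, 2, 3, 3, 3, 3]
ctrl = np.array([[0, 0], [1, 2], [2, -1], [3, 3], [4, 0], [5, 1]], float)
spline = BSpline(knots, ctrl, k=3)

# Dense sampling and cumulative chord-length approximation of arc length.
t_dense = np.linspace(0, 3, 2000)
pts = spline(t_dense)
seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
arclen = np.concatenate([[0.0], np.cumsum(seg)])

# Invert the (monotone) arc-length function at equidistant targets.
n_samples = 20
target = np.linspace(0, arclen[-1], n_samples)
t_equal = np.interp(target, arclen, t_dense)
path_points = spline(t_equal)      # points at equal arc-length spacing
```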
Zoom Link: https://uni-sydney.zoom.us/j/89602498532?pwd=VGJqbW5xTUJ1RkVNK0l1Ky9HYmkydz09 Meeting ID: 896 0249 8532, Password: cgi2023
11:00 | Human Joint Localization Method for Virtual Reality Based on Multi-Device Data Fusion PRESENTER: Zihan Chang ABSTRACT. Virtual reality (VR) utilizes computer vision, artificial intelligence, and other techniques to enable interaction between users and virtual environments. Currently, human joint localization for VR mainly relies on a single VR device. However, the localization accuracy of a single VR device often cannot satisfy practical application standards, due to limitations in sensor accuracy, self-occlusion of the human body, and environmental lighting conditions. In this paper, a human joint localization method for VR based on multi-device data fusion is proposed to achieve higher localization accuracy. Firstly, Kinect and HTC Vive devices are both utilized to separately capture motion data of human joints, and the two sets of data are aligned temporally and unified in coordinates. Then weights are assigned to the two sets of data based on the location of each part of the human body. Next, particle filtering is adopted to combine the two sets of data. Finally, a bidirectional long short-term memory (Bi-LSTM) neural network model is deployed, where a bone length loss is incorporated into the loss function to further improve localization accuracy. Experimental results show that the proposed multi-device data fusion method outperforms single-device localization. |
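The bone length loss mentioned here penalizes predicted skeletons whose bone lengths drift from the subject's constant bone lengths. A short PyTorch sketch, with a hypothetical parent-child bone table:

```python
import torch

# Hypothetical bone list as (parent, child) joint indices.
BONES = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5)]

def bone_length_loss(pred_joints, ref_lengths):
    """pred_joints: (batch, joints, 3); ref_lengths: (len(BONES),)
    measured once for the subject. Penalizes length deviation."""
    lengths = torch.stack(
        [torch.norm(pred_joints[:, a] - pred_joints[:, b], dim=-1)
         for a, b in BONES], dim=1)            # (batch, bones)
    return torch.mean((lengths - ref_lengths) ** 2)

# Assumed combined objective: position error plus weighted bone term.
# total = mse(pred, gt) + lambda_bone * bone_length_loss(pred, ref_lengths)
```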
11:12 | VRGestures: Controller and Hand Gesture Datasets for Virtual Reality PRESENTER: Georgios Papadopoulos ABSTRACT. Gesture recognition is attracting increasing attention over the years and has been adopted in many applications, both in the real world and in the virtual one. New-generation Virtual Reality (VR) headsets like the Meta Quest 2 support hand tracking very efficiently and are challenging the research community for more breakthrough discoveries in hand gesture recognition. VR controllers have also been quietly improving recently, becoming wireless and more practical to use. However, when it comes to VR gesture datasets, and especially controller gesture datasets, limited data are available. Point-and-click methods are widely accepted, which, combined with the shortage of available datasets, is the main reason gestures are being neglected. To address this gap, we provide two datasets, one with controller gestures and one with hand gestures, capable of recording with either controllers or hands, and even with both hands simultaneously. We created two VR applications that record the position, the orientation, and the timestamp of each sample for controllers and hands. We then trained off-the-shelf time-series classifiers to test our data, export metrics, and compare different subsets of our datasets. Hand gesture recognition is far more complicated than controller gesture recognition, as it takes almost three times as much input; the difference is analyzed and discussed with findings and metrics. |
11:24 | Mobile AR-based Robot Motion Control from Sparse Finger Joints PRESENTER: Di Wu ABSTRACT. Robot control aims to design low-cost, flexible systems that incorporate human intelligence. However, it is challenging to balance natural interaction and lightweight systems. Advances in Augmented Reality (AR) technology allow users to interact through mid-air finger gestures. In this paper, we explore how to easily use 3D gestures on AR devices for robot motion control. To this end, we present a mobile AR-based system for users to intuitively control a robot on various tasks using gestures represented by sparse finger joints. First, we predefine primitive robot motions and the corresponding user gestures. Then, we train a neural network that maps finger joint poses to robot motions. Further, we improve the performance of the mapping to obtain a high-accuracy system through floating calibration and anchor-joint alignment. Finally, we conduct an ablation experiment on our proposed data alignment method and a usability study on robot motion control in real environments. The results show that our system offers great mobility, achieves high accuracy, and realizes lightweight interaction. |
11:36 | The Role of the Field Dependence-independence Construct on the Curvature Gain of Redirected Walking Technology in Virtual Reality PRESENTER: Rui Jing ABSTRACT. Redirected walking (RDW) enables users to physically remain in a relatively small area while moving around large-scale virtual environments (VEs) by purposefully introducing scene motion into the VEs. Using curvature gains in perceptual manipulation can improve the effectiveness of redirection. Considering the influence of field dependence-independence (FDI) cognitive style and gender, users may have different curvature gain detection thresholds. However, most previous studies have not considered such differences. Therefore, to investigate in more detail the impact of the FDI construct on human sensitivity to curvature gain in VR, we create a VR experimental system and use the psychophysical "method of limits" to re-estimate sensitivity to curvature gain. The results show that FDI has an obvious relationship with the curvature gain detection threshold, and gender moderates the relationship between them. Field-dependent (FD) male individuals have a lower curvature gain perception threshold. These findings can inform personalized RDW design. |
11:48 | A new camera calibration algorithm based on adaptive image selection PRESENTER: Jian Huang ABSTRACT. Camera calibration plays an important role in 3D reconstruction tasks. However, in the calibration process, users need to select key images from a large number of calibration board images before performing maximum likelihood estimation of the camera model parameters. Due to the subjectivity of this selection, it is difficult to guarantee consistency of the results obtained by different testers. In this paper, a new camera calibration image selection algorithm is proposed to obtain high-accuracy intrinsic parameters. Users only need to acquire a checkerboard image sequence; the algorithm randomly selects one image from the sequence at a time and determines whether the image can be used for camera calibration by calculating the angle error of the single-frame checkerboard corners. This method adaptively selects a small number of images from the sequence for camera calibration. Experimental results show that this calibration algorithm is not only simple to operate but also achieves higher accuracy and consistency than traditional calibration methods. |
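A hedged sketch of such an accept/reject loop with OpenCV is below. It gates frames on the standard RMS reprojection error from cv2.calibrateCamera, standing in for the paper's corner angle-error criterion; all paths, pattern sizes, and thresholds are assumptions.

```python
import glob
import random
import cv2
import numpy as np

PATTERN, SQUARE = (9, 6), 0.025     # inner corners and square size (assumed)
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

paths = glob.glob("calib/*.png")    # hypothetical image sequence
random.shuffle(paths)               # draw images in random order

obj_pts, img_pts = [], []
for path in paths:
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    ok, corners = cv2.findChessboardCorners(gray, PATTERN)
    if not ok:
        continue
    obj_pts.append(objp)
    img_pts.append(corners)
    if len(obj_pts) < 3:
        continue                    # need a few views before calibrating
    rms, K, dist, _, _ = cv2.calibrateCamera(
        obj_pts, img_pts, gray.shape[::-1], None, None)
    if rms > 0.5:                   # acceptance threshold is an assumption
        obj_pts.pop()
        img_pts.pop()               # reject frames that hurt calibration
```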
12:00 | Hybrid Prior-based Diminished Reality for Indoor Panoramic Images PRESENTER: Liu Jiashu ABSTRACT. Due to advances in hardware technology, e.g., head-mounted display devices, augmented reality (AR) has been widely used. In AR, virtual objects added to the real environment may partially overlap with objects in the real world, leading to a degraded display. Thus, beyond adding virtual objects to the real world, diminished reality (DR), which virtually removes, hides, and sees through real objects in panoramas, is an urgent task. In this paper, we propose a pipeline for diminished reality in indoor panoramic images with rich prior information. In particular, to restore structure information, a structure restoration module is developed to aggregate the layout boundary features of the masked panoramic image. Subsequently, we design a structured region texture extraction module to assist real texture restoration after removing the target object. Ultimately, to explore the relations between structure and texture, we design a fast Fourier convolution fusion module to generate inpainting results that respect real-world structures and textures. Moreover, we create a structured panoramic image diminished reality dataset (SD) for the diminished reality task. Extensive experiments illustrate that the proposed pipeline produces more realistic results, consistent with the human eye's perception of structural changes in indoor panoramic images. |
12:12 | Virtual Reality for the Preservation and Promotion of Historical Real Tennis PRESENTER: Sony Saint-Auret ABSTRACT. Real tennis, or "courte paume" in its original French name, is a racket sport that has been played for centuries and is considered the ancestor of tennis. It was a very popular sport in Europe during the Renaissance period, practiced in every layer of society. It is still practiced today in a few courts around the world, especially in the United Kingdom, France, Australia, and the USA, and has been listed in the Inventory of Intangible Cultural Heritage in France since 2012. The goal of our project is to elicit interest in this historical sport and let new and future generations experience it. We developed a virtual environment that enables its users to experience the game of real tennis. This environment was then tested to assess its acceptability and usability in different contexts of use. We found that such use of virtual reality enabled our participants to discover the history and rules of this sport in a didactic and pleasant manner. We hope that our VR application will encourage younger and future generations to play real tennis. |
Zoom Link: https://uni-sydney.zoom.us/j/88901281313?pwd=aDNEMk9YeFpLdit2RlFXbjFBQjVvQT09 Meeting ID: 889 0128 1313, Password: cgi2023
13:30 | SLf-UNet: Improved UNet for Brain MRI Segmentation by Combining Spatial and Low-Frequency Domain Features PRESENTER: Jiacheng Lu ABSTRACT. Deep learning-based methods have shown remarkable performance in brain tumor image segmentation. However, there is a lack of research on segmenting brain tumor lesions using the frequency-domain features of images. To address this gap, an improved network, SLf-UNet, is proposed in this paper: a two-dimensional U-Net-based encoder-decoder architecture combining spatial and low-frequency domain features. The proposed model effectively learns information from the spatial and frequency domains. Herein, we present a novel upsampling approach using zero padding in the high-frequency region, and replace part of the convolution operations with a convolution block combining spatial and frequency domain features. Our experimental results demonstrate that our method outperforms current mainstream approaches on the BraTS 2019 and BraTS 2020 datasets. |
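Zero padding in the high-frequency region is the classical Fourier-domain (sinc) interpolation; a minimal numpy sketch for a single-channel image follows (the single-channel setting and the factor-squared energy rescaling are the usual textbook choices, not SLf-UNet's exact layer).

```python
import numpy as np

def fft_zeropad_upsample(img, factor=2):
    """Upsample a 2D array by embedding its centered spectrum in a larger
    zero-filled spectrum, i.e. zero padding the high-frequency region."""
    h, w = img.shape
    H, W = h * factor, w * factor
    spec = np.fft.fftshift(np.fft.fft2(img))
    padded = np.zeros((H, W), dtype=complex)
    top, left = (H - h) // 2, (W - w) // 2
    padded[top:top + h, left:left + w] = spec   # low frequencies kept
    up = np.fft.ifft2(np.fft.ifftshift(padded)).real
    return up * factor ** 2                      # energy correction

big = fft_zeropad_upsample(np.random.rand(64, 64))   # (128, 128)
```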
13:42 | Exploring the Transferability of a Foundation Model for Fundus Images: Application to Hypertensive Retinopathy PRESENTER: Julio Silva-Rodríguez ABSTRACT. Using deep learning models pre-trained on ImageNet is the traditional solution for medical image classification to deal with data scarcity. Nevertheless, relevant literature supports that this strategy may offer limited gains due to the high dissimilarity between domains. Currently, the paradigm of adapting domain-specialized foundation models is proving to be a promising alternative. However, how to perform such knowledge transfer, and the benefits and limitations it presents, are under study. The CGI-HRDC challenge for Hypertensive Retinopathy diagnosis on fundus images introduces an appealing opportunity to evaluate the transferability of a recently released vision-language foundation model of the retina, FLAIR [35]. In this work, we explore the potential of using FLAIR features as a starting point for fundus image classification, and we compare its performance with ImageNet initialization on two popular transfer learning methods: Linear Probing (LP) and Fine-Tuning (FT). Our empirical observations suggest that the traditional strategy provides only limited performance gains in either case. In contrast, direct transferability from the FLAIR model allows gains of 2.5%. When fine-tuning the whole network, the performance gap increases up to 4%. In this case, we show that avoiding feature deterioration via LP initialization of the classifier allows the best re-use of the rich pre-trained features. Although direct transferability using LP still offers limited performance, we believe that foundation models such as FLAIR will drive the evolution of deep-learning-based fundus image analysis. |
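The two transfer strategies compared here differ only in which parameters are trained; a hedged PyTorch sketch is below, assuming a generic `backbone` module (FLAIR- or ImageNet-initialized) that returns D-dimensional features for a binary HR label.

```python
import torch.nn as nn

def linear_probe(backbone, feat_dim=512):
    """LP: freeze the pre-trained features, train only a linear head."""
    for p in backbone.parameters():
        p.requires_grad = False
    return nn.Sequential(backbone, nn.Linear(feat_dim, 2))

def fine_tune(backbone, feat_dim=512):
    """FT: adapt the whole network end-to-end."""
    for p in backbone.parameters():
        p.requires_grad = True
    return nn.Sequential(backbone, nn.Linear(feat_dim, 2))

# Per the abstract, the best variant first fits the LP head, then uses it
# to initialize the FT classifier, avoiding early feature deterioration.
```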
13:54 | Research on Deep Learning-Based Lightweight Object Grasping Algorithm for Robots PRESENTER: Yancheng Zhao ABSTRACT. In order to enhance the efficiency and accuracy of robots in automated production lines and address issues such as inaccurate positioning and limited real-time capabilities in robot-controlled grasping, a deep learning-based lightweight algorithm for robot object grasping is proposed. This algorithm optimizes the lightweight network GG-CNN2 as the base model. Firstly, the depth of the backbone network is increased, and transpose convolutions are replaced with dilated convolutions to enhance the network's feature extraction for grasping detection. Secondly, the ASPP module is introduced to obtain a wider receptive field and multi-scale feature information. Furthermore, the shallow feature maps are merged with the deep feature maps to incorporate more semantic and detailed information from the images. Experimental results demonstrate that the algorithm achieves an accuracy of 81.27% on the Cornell dataset. Compared to the original GG-CNN2 network, the accuracy has improved by 11.68% with only a slight speed loss, achieving a balance between speed and accuracy. Finally, grasping verification is conducted on the Panda robot arm, with an average success rate of 89.62%, which validates the superiority of the algorithm and showcases the theoretical and practical value of this research. |
14:06 | The ST-GRNN cooperative training model based on complex network for air quality prediction PRESENTER: Shijie Chen ABSTRACT. In recent years, the importance of air pollution has been increasingly discussed by both the public and governments. When making environmental policies, air pollution forecasts are an essential reference for governments. However, due to the sparse spatial distribution of atmospheric monitoring stations, accurately predicting regional air quality is a challenge. To address this issue, a neural network cooperative training and prediction model called "ST-GRNN" is proposed in this paper. The model incorporates complex networks, Extreme Learning Machines (ELM), Long Short-Term Memory networks (LSTM), and Generalized Regression Neural Networks (GRNN) to identify temporal and spatial characteristics and accurately predict regional air quality. Specifically, the model captures the spatial and temporal dependencies of air quality by constructing a complex air quality network through particle transport simulation. Comparative experiments on real datasets for predicting PM2.5 concentrations demonstrate that the ST-GRNN model outperforms other methods in accuracy. |
14:18 | Multi-sensory consistency experience: A 6DOF motion system based on video automatically generated motion simulation PRESENTER: Hongqiu Luan ABSTRACT. As we perceive our surroundings in the real world through the integration of multiple senses, perceiving multi-sensory consistency in a virtual environment may enhance our presence. In this paper, we present a multi-sensory, perception-consistent 6DOF motion system. The system automatically extracts the motion trajectory of the virtual camera from video as motion data, maps the motion data to a 6DOF Stewart motion platform through a human-perception-based washout algorithm, and incorporates multi-sensory simulation of visual, auditory, tactile, and proprioceptive perceptual consistency of the motion effect. The results of the user study show that the system effectively enhances participants' sense of realism and reduces the subjective perception of simulator discomfort. In addition, the system supports users in creating their own motion virtual environments from video, making the public designers of motion experience content in the metaverse. |
14:30 | 3DP Code-based Compression and AR Visualization for Cardiovascular Palpation Training PRESENTER: Zhendong Chen ABSTRACT. The palpation procedure in cardiovascular examination involves touching a patient's skin to assess the health of their heart and blood vessels. In medical education, it is essential to master the techniques and methods of palpation. Nevertheless, traditional palpation training methods, such as textbook descriptions and subjective evaluations, may fail to adequately differentiate between tactile image patterns under varying vascular conditions. Moreover, the large amount of data involved in creating dynamic 3D tactile images for palpation complicates their storage, transmission, and display. This paper presents an augmented reality (AR) visualisation based on the 3DP code (three-dimensional palpation code), using a typical artery palpation example. By providing students with interactive 3D visualisations and simulations, AR technology may improve the effectiveness of palpation training. A set of encoding and decoding tools was also developed, along with high-efficiency compression techniques to deal with the large amount of tactile data. According to the assessment results, the proposed method offers a data compression ratio (CR) of 1/360 while preserving over 95% of the physiological information. Additionally, performance testing revealed that the proposed WebAR program is highly adaptable and runs smoothly on most mobile platforms. We believe this tool could benefit both the medical and academic communities in creating and presenting content to facilitate a better teaching experience. |
14:42 | Determine the camera eigenmatrix from large parallax images PRESENTER: Zhenlong Du ABSTRACT. Recovering camera parameters from a group of image pairs is an important problem in computer vision. The traditional 4-point solution is vulnerable to an insufficient or overly concentrated set of corresponding point pairs when the viewpoint changes greatly, which leads to failure of camera parameter recovery. Epipolar geometry underpins the recovery of camera parameters from a set of parallax image pairs. This paper presents a method to quickly determine corresponding point pairs from a group of image pairs with large parallax by computing the epipolar geometry. It exploits the following facts: for a pixel p in camera (viewpoint) A, all pixels corresponding to p in camera (viewpoint) B lie on the same epipolar line; similarly, the line through the center of camera A and p projects to an epipolar line in camera B. Therefore, when cameras A and B are synchronized, the instantaneous points of two objects projecting to the same pixel p at times t1 and t2 in camera A lie on an epipolar line of camera B. The camera eigenmatrix (essential matrix) is calculated using epipolar line pairs instead of point pairs, and the search space for epipolar line matching is greatly reduced by using pixels recording multiple depths, accelerating the computation so that camera parameters are obtained quickly and accurately. |
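For contrast with the point-based baseline the abstract improves upon: with known intrinsics K, the essential matrix and relative pose can be recovered from matched points in a few OpenCV calls (pts_a, pts_b, and K are assumed inputs, not from the paper).

```python
import cv2
import numpy as np

# Assumed inputs: pts_a, pts_b are (N, 2) float32 arrays of matched
# pixel coordinates; K is the 3x3 camera intrinsic matrix.
E, inliers = cv2.findEssentialMat(pts_a, pts_b, K,
                                  method=cv2.RANSAC,
                                  prob=0.999, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K, mask=inliers)

# Underlying epipolar constraint (in homogeneous pixel coordinates):
#   x_b^T  K^-T E K^-1  x_a = 0
# The paper's contribution is to satisfy such constraints with epipolar
# line pairs instead of point pairs, shrinking the matching search space.
```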
14:54 | Action Recognition via Fine-tuned CLIP Model and Temporal Transformer ABSTRACT. Adapting contrastive image-text pre-trained models such as CLIP to video classification has shown success in learning joint visual-text representations from web-scale data, demonstrating remarkable "zero-shot" generalization across various datasets. However, most research is based on datasets like Kinetics and UCF-101, which focus more on appearance than on temporal-order information; in other words, these datasets may not reward good temporal understanding of videos. How to balance spatial and temporal information in video remains an open problem. In this paper, we address it by evaluating on the action-centered dataset Something-Something V2, which contains a large proportion of temporal classes. We adopt pre-trained language-image models such as CLIP to further study their zero-shot ability, and apply a Transformer-based temporal decoder to capture more detailed spatiotemporal information. |
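As a rough illustration of this kind of pipeline, per-frame CLIP embeddings can be mixed across time with a small Transformer, and the pooled video embedding compared against CLIP text embeddings for zero-shot classification. The sketch below uses a generic Transformer encoder stack in place of the paper's exact decoder, whose architecture the abstract does not specify:

```python
import torch
import torch.nn as nn

class TemporalDecoder(nn.Module):
    """Transformer over per-frame CLIP embeddings; a minimal sketch of
    the general idea, not the paper's exact architecture."""
    def __init__(self, dim=512, heads=8, layers=2):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, frame_feats):            # (B, T, dim) CLIP features
        x = self.temporal(frame_feats)         # mix information across time
        return x.mean(dim=1)                   # (B, dim) video embedding

def classify(video_emb, text_embs):
    """Zero-shot step: cosine similarity between the video embedding and
    CLIP text embeddings of the class names (both L2-normalized)."""
    v = video_emb / video_emb.norm(dim=-1, keepdim=True)
    t = text_embs / text_embs.norm(dim=-1, keepdim=True)
    return (v @ t.T).argmax(dim=-1)
```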
Zoom Link: https://uni-sydney.zoom.us/j/89602498532?pwd=VGJqbW5xTUJ1RkVNK0l1Ky9HYmkydz09 Meeting ID: 896 0249 8532, Password: cgi2023
13:30 | An efficient algorithm for degree reduction of MD-splines PRESENTER: Zushang Xiao ABSTRACT. This paper analyzes the computational time complexity of the previously proposed methods for constructing dual basis functions. It presents a method that employs discrete numerical summation for computing the integral of a polynomial, enabling rapid calculation of the inner product of polynomial basis functions. Building upon this approach, an efficient algorithm is devised to address the degree reduction problem in MD-spline curves. This algorithm efficiently computes the control points after degree reduction, ensuring an "exact" optimal least squares approximation. |
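The abstract's idea of computing polynomial inner products by discrete numerical summation can be illustrated with Gauss-Legendre quadrature, which is exact for polynomial integrands of the degrees involved. The sketch below builds the Gram matrix of the Bernstein basis on [0, 1] for concreteness; the paper works with MD-spline bases, so this is an analogy rather than the paper's algorithm:

```python
import numpy as np
from math import comb

def bernstein(n, i, t):
    """i-th Bernstein basis polynomial of degree n, evaluated at t."""
    return comb(n, i) * t**i * (1 - t)**(n - i)

def gram_matrix(n):
    """Inner products <B_i, B_j> on [0, 1] via Gauss-Legendre quadrature;
    n + 1 nodes integrate the degree-2n products exactly."""
    nodes, weights = np.polynomial.legendre.leggauss(n + 1)
    t = 0.5 * (nodes + 1.0)          # map [-1, 1] -> [0, 1]
    w = 0.5 * weights
    G = np.empty((n + 1, n + 1))
    for i in range(n + 1):
        for j in range(n + 1):
            G[i, j] = np.sum(w * bernstein(n, i, t) * bernstein(n, j, t))
    return G
```

Because the quadrature is exact for these integrands, the discretely summed Gram matrix equals the analytic one, which is what makes the resulting least-squares degree reduction "exact."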
13:42 | Dynamic ball B-spline curves PRESENTER: Ciyang Zhou ABSTRACT. A ball B-spline curve (BBSC) is an extension of a B-spline curve and a skeleton-based 3D geometric representation. BBSCs can represent 3D tubular objects with varying radii, such as trunks, plants, and blood vessels. To enhance the capabilities of BBSCs, in this paper we propose a physics-based generalization called the dynamic BBSC (D-BBSC), which describes the deformation behavior of a BBSC over time. We provide the mathematical expression of the D-BBSC and prove several of its mathematical properties. We derive its equations of motion from Lagrangian mechanics and investigate them under linear geometric constraints. Additionally, a D-BBSC physical simulation system based on the finite difference method (FDM) is implemented, and several experimental results are presented. |
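For a physics-based curve, Lagrangian mechanics typically yields a second-order system $M\ddot{q} + D\dot{q} + Kq = f$ over the stacked control coefficients $q$, which an FDM simulator advances step by step. A generic central-difference step under that assumption (the paper's actual mass, damping, and stiffness matrices for D-BBSCs are not reproduced here):

```python
import numpy as np

def fdm_step(q, q_prev, M, D, K, f, dt):
    """One central-difference step of M q'' + D q' + K q = f, where q
    stacks the curve's control coefficients.
    Approximations: q'' ~ (q_next - 2q + q_prev) / dt^2,
                    q'  ~ (q_next - q_prev) / (2 dt)."""
    A = M / dt**2 + D / (2 * dt)
    b = (f - K @ q
         + (M / dt**2) @ (2 * q - q_prev)
         + (D / (2 * dt)) @ q_prev)
    return np.linalg.solve(A, b)   # q at the next time step
```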
13:54 | Schatten Capped $p$ Regularization for Robust Principal Component Analysis PRESENTER: Lan Yang ABSTRACT. Robust Principal Component Analysis (RPCA) is an elegant framework with wide applications in image processing, video surveillance, and face recognition. Generally, RPCA decomposes a matrix into low-rank and sparse parts, which can be approximated by the matrix nuclear norm and the $\ell_1$ norm, respectively. However, the nuclear norm ignores the differences among the singular values of the target matrix. To address this issue, nonconvex low-rank regularizers have been widely used, but existing methods suffer from drawbacks such as inefficiency and inaccuracy. To better capture the low-rank part, in this paper we study a new model for the RPCA problem via Schatten capped $p$ regularization, and derive a solver within the Alternating Direction Method of Multipliers (ADMM) framework. Experimentally, our algorithm is compared to state-of-the-art methods in practical applications such as image denoising, video background and foreground separation, and face de-shadowing. In particular, for image denoising our algorithm separates noise better than other algorithms at low noise levels. |
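For reference, the classical nuclear-norm-plus-$\ell_1$ RPCA model has a well-known ADMM solver built on singular value thresholding; the paper's contribution replaces the nuclear-norm proximal step with a Schatten capped $p$ shrinkage rule, which is not shown here. A minimal sketch of the classical baseline:

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: prox of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

def shrink(X, tau):
    """Soft thresholding: prox of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0)

def rpca_admm(M, lam=None, mu=None, iters=200):
    """ADMM for min ||L||_* + lam ||S||_1 s.t. L + S = M.
    The Schatten capped p variant changes only the singular-value
    shrinkage inside svt."""
    m, n = M.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    mu = mu or (m * n) / (4 * np.abs(M).sum())
    L, S, Y = (np.zeros_like(M) for _ in range(3))
    for _ in range(iters):
        L = svt(M - S + Y / mu, 1.0 / mu)       # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)    # sparse update
        Y = Y + mu * (M - L - S)                # dual update
    return L, S
```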
14:06 | Sparse Graph Hashing with Spectral Regression PRESENTER: Zhihao He ABSTRACT. Learning-based hashing has received increasing research attention due to its promising efficiency for large-scale similarity search. However, most existing manifold-based hashing methods cannot capture the intrinsic structure and discriminative information of image samples. In this paper, we propose a new learning-based hashing method, namely Sparse Graph Hashing with Spectral Regression (SGHSR), for approximate nearest neighbor search. We first propose a sparse graph model to learn real-valued codes that not only preserve the manifold structure of the data but also adaptively select sparse and discriminative features. Then, we use spectral regression to convert the real-valued codes into high-quality binary codes so that the information loss between the original space and the Hamming space is minimized. Extensive experimental results on three widely used image databases demonstrate that our SGHSR method outperforms state-of-the-art unsupervised manifold-based hashing methods. |
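The second stage the abstract describes, converting real-valued codes to binary ones by spectral regression, reduces in its simplest form to a ridge regression followed by a sign function. A minimal sketch under that reading (the regularization weight `lam` is illustrative):

```python
import numpy as np

def spectral_regression_codes(X, Y, lam=1.0):
    """Fit W by ridge regression so X @ W approximates the real-valued
    embedding Y, then binarize with sign.
    X: n x d zero-centered features, Y: n x bits real-valued codes."""
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
    B = np.where(X @ W >= 0, 1, -1)   # binary codes in {-1, +1}
    return W, B

def hamming_rank(query_codes, db_codes):
    """Rank database items by Hamming distance; for {-1, +1} codes,
    sorting by descending dot product is equivalent."""
    return np.argsort(-(query_codes @ db_codes.T), axis=1)
```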
14:18 | A Crowd Behavior Analysis Method for Large-scale Performances PRESENTER: Qian Zhang ABSTRACT. This study combines visual and motion information to analyze crowd performances, using performance density entropy and performance consistency as visual descriptors and group collectivity as a motion descriptor. We used these descriptors to develop a crowd performance behavior classification algorithm that can distinguish between different behaviors in large-scale performances. We found that the descriptors were only weakly correlated, indicating that they capture different dimensions of a performance. Crowd behavior classification experiments showed that the descriptors are valid for qualitative analysis and consistent with human perception. The proposed algorithm successfully differentiated and described performance behavior on a dataset of a large-scale crowd performance and was shown to be effective. |
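The abstract does not define the descriptors formally; one plausible reading of "performance density entropy" is the Shannon entropy of the crowd's spatial occupancy histogram (uniform spread gives high entropy, tight clusters low). A hypothetical sketch under that assumption:

```python
import numpy as np

def density_entropy(positions, bins=16):
    """Shannon entropy of the crowd's spatial occupancy histogram --
    a plausible reading of 'performance density entropy'; the paper's
    exact definition may differ. positions: (n, 2) array of (x, y)."""
    H, _, _ = np.histogram2d(positions[:, 0], positions[:, 1], bins=bins)
    p = H.ravel() / H.sum()
    p = p[p > 0]                      # drop empty cells (0 log 0 = 0)
    return float(-(p * np.log(p)).sum())
```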
14:30 | PRESENTER: Liang Wang ABSTRACT. In engineering applications, lines, circles, arcs, and points are collectively referred to as primitives, and they play a crucial role in path planning, simulation analysis, and manufacturing. When designing CAD models, engineers typically start by sketching the model's orthographic view on paper or a whiteboard and then translate the design intent into a CAD program. Although this design method is powerful, it often involves challenging and repetitive tasks, requiring engineers to perform many similar operations in each design. To streamline this conversion process, we propose an efficient and accurate end-to-end method that avoids the inefficiency and error-accumulation issues of using auto-regressive models to infer parametric primitives from hand-drawn sketch images. Since our model's outputs match the representation format of standard CAD software, they can be imported into CAD software for solving and editing, and applied to downstream design tasks. |
14:42 | A ReSTIR GI method based on sample space filtering PRESENTER: Jie Jiang ABSTRACT. In real-time ray tracing applications, limited by the computational power of the hardware, only a small number of samples per pixel can be traced. Maximizing rendering quality at a low sampling rate is therefore an important issue for real-time ray tracing. ReSTIR GI applies screen-space spatiotemporal resampling to multi-bounce paths, improving rendering quality at low sampling rates, but the noise introduced by Monte Carlo sampling still remains. We propose a lightweight and efficient sample-space filtering method for ReSTIR GI that filters the sample distribution before resampling, thus reducing noise in the final rendering. Compared to the original ReSTIR GI, our method reduces MSE by a factor of 1.1x to 5.6x at the cost of an average 12% increase in rendering time. |
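ReSTIR-style methods are built on weighted reservoir sampling, which keeps one surviving candidate per pixel and realizes resampled importance sampling; the paper's contribution filters the sample distribution before this step. A minimal reservoir sketch for orientation (not the paper's filtering itself):

```python
import random

class Reservoir:
    """Weighted reservoir for resampled importance sampling (RIS),
    the building block ReSTIR GI resamples with; a minimal sketch."""
    def __init__(self):
        self.sample = None   # the surviving candidate
        self.w_sum = 0.0     # running sum of RIS weights
        self.count = 0       # number of candidates seen

    def update(self, candidate, weight):
        """Stream in one candidate; it survives with probability
        proportional to its weight."""
        self.w_sum += weight
        self.count += 1
        if self.w_sum > 0 and random.random() < weight / self.w_sum:
            self.sample = candidate

def merge(a, b):
    """Spatiotemporal reuse: fold reservoir b into reservoir a."""
    a.w_sum += b.w_sum
    a.count += b.count
    if b.w_sum > 0 and random.random() < b.w_sum / a.w_sum:
        a.sample = b.sample
    return a
```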
14:54 | Group Perception Based Self-adaptive Fusion Tracking PRESENTER: Yiyang Xing ABSTRACT. Multi-object tracking (MOT) is an important and representative task in computer vision, and tracking-by-detection is its most mainstream paradigm, so detection quality, feature representation ability, and the association algorithm greatly affect tracking performance. On the one hand, pedestrians moving together in the same group maintain similar motion patterns and can therefore indicate each other's moving state. We extract groups from detections and maintain the group relationships of trajectories during tracking. We propose a state transition mechanism that smooths detection bias, recovers missed detections, and rejects false detections, and we build a two-level group-detection association algorithm that improves association accuracy. On the other hand, different areas of the tracking scene affect the detections' appearance features in diverse and varying ways, which weakens their representational ability. We propose a self-adaptive feature fusion strategy based on the tracking scene and the group structure, which yields fused features with stronger representational ability for the trajectory-detection association, improving tracking performance. In summary, we propose a novel Group Perception based Self-adaptive Fusion Tracking (GST) framework, comprising the group concept and a Group Exploration Net, a group-perception-based state transition mechanism, and a self-adaptive feature fusion strategy. Experiments on the MOT17 dataset demonstrate the effectiveness of our method, which achieves competitive results compared to state-of-the-art methods. |
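For context, the standard single-level trajectory-detection association step that a two-level group-detection scheme would refine is Hungarian matching on an IoU cost. A minimal sketch of that baseline (the `min_iou` threshold is illustrative):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(tracks, dets, min_iou=0.3):
    """Hungarian matching on 1 - IoU; returns (track, detection) index
    pairs whose overlap clears the threshold."""
    cost = np.array([[1 - iou(t, d) for d in dets] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= 1 - min_iou]
```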
15:06 | Cross-Modal Information Aggregation and Distribution Method for Crowd Counting PRESENTER: Yin Chen ABSTRACT. Crowd counting is a fundamental and challenging task in computer vision. However, existing methods are limited in dealing with scale and illumination changes simultaneously. To improve counting accuracy under these challenges, we adopt the concept of crowding-degree information. Because a count map can accurately capture the number of people in an image and mitigate the occlusion problem, we use the count map as a concrete form of crowding-degree information and propose a new cross-modal information aggregation and distribution model for crowd counting. We first feed the crowding-degree information into LibraNet, modified with Information Aggregation Transfer (IAT) and Information Distribution Transfer (IDT) modules, to obtain a count map. Then, illumination, thermal, and crowding-degree information are input into the network via the RGB image, the thermal image, and the count map, respectively. A more accurate density map is obtained through multiple convolution operations and IADM processing, and the density map is finally integrated to obtain the number of people. Experiments demonstrate that our method provides superior quality and higher parallelism, yielding higher-accuracy density maps from the combined illumination, thermal, and crowding-degree information. |
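The last step the abstract mentions, integrating the density map to get the count, is standard in density-based counting: ground-truth maps place a unit Gaussian at each annotated head, so the map sums to the person count. A minimal sketch (the smoothing `sigma` is illustrative and fixed rather than geometry-adaptive; head points are assumed to lie inside the image):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(points, shape, sigma=4.0):
    """Ground-truth density map: one unit impulse per annotated head,
    blurred with a Gaussian, so the map integrates to the count."""
    dm = np.zeros(shape, dtype=np.float32)
    for x, y in points:
        dm[int(y), int(x)] += 1.0
    return gaussian_filter(dm, sigma)

def count_from_density(dm):
    """The predicted count is the integral (sum) of the density map."""
    return float(dm.sum())
```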
15:18 | PRESENTER: Xinrui Ju ABSTRACT. The advancement of deep learning has significantly increased the effectiveness of image dehazing techniques. However, convolutional neural networks are difficult to deploy on portable FPGA devices because of their high computation, storage, and energy requirements. In this paper, we propose a generic solution for porting CNN-based image dehazing models to mobile FPGAs. The proposed solution designs a lightweight network using depth-wise separable convolution and a channel attention mechanism, and uses an accelerator to increase the system's processing efficiency. We implemented the entire system on a custom, low-cost FPGA SoC platform (Xilinx Inc. ZYNQ$^{TM}$ XC7Z035). Experiments show that our approach achieves performance comparable to GPU-based methods with much lower resource usage. |
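The two lightweight ingredients named in the abstract are standard building blocks; below is a minimal PyTorch sketch of a depthwise separable convolution followed by squeeze-and-excitation style channel attention (channel sizes and the reduction ratio are illustrative, not the paper's configuration):

```python
import torch
import torch.nn as nn

class DSConvCA(nn.Module):
    """Depthwise separable conv + channel attention: far fewer weights
    and multiplies than a standard conv, which is what keeps such a
    dehazing network FPGA-friendly."""
    def __init__(self, in_ch, out_ch, reduction=4):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch),
        # then pointwise 1x1 to mix channels.
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)
        # Squeeze-and-excitation style channel attention.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // reduction, 1), nn.ReLU(),
            nn.Conv2d(out_ch // reduction, out_ch, 1), nn.Sigmoid())

    def forward(self, x):
        x = self.pointwise(self.depthwise(x))
        return x * self.attn(x)   # reweight channels by learned importance
```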