Ebook: Image Processing, Electronics and Computers
Computers and electronics have become so ubiquitous in daily life that we rarely give them a second thought, and image processing, too, is now largely taken for granted.
This book presents the proceedings of IPEC2024, the 5th Asia-Pacific Conference on Image Processing, Electronics and Computers, held from 12 to 14 April 2024 in Dalian, China. The aim of the IPEC conferences is to facilitate exchange and collaboration between academic researchers and industry, driving interdisciplinary integration and innovative development in image processing, electronics, and computer engineering. A total of 207 full-paper submissions were received, of which 61 were ultimately selected for presentation at the conference and publication here after a rigorous review process that took into account significance, novelty, and technical quality. The papers are divided into three sections: Image Processing; Electronics and Automation; and Computer Application. The topics covered transcend the boundaries of traditional disciplines and reflect the interdisciplinary nature of the conference; they include graphics, computer vision, signal and information processing, distributed systems, software information engineering, computer science, electrical engineering, and automation and control engineering.
The book provides a current overview of the latest developments, and will serve as a valuable resource for researchers, practitioners, and enthusiasts alike. It will be of interest to all those working in the fields of image processing, electronics, and computers.
Welcome to the Proceedings of the 2024 5th Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC2024). As one of the premier academic events in the Asia-Pacific region, IPEC2024 aims to facilitate exchange and collaboration between academia and industry, driving interdisciplinary integration and innovative development in image processing, electronics, and computer engineering. The conference assembled a highly diverse organizing committee, technical committee, and list of speakers featuring 39 experts and scholars from 15 countries, including the United States, the United Kingdom, Canada, Italy, China, Spain, Poland, India, and Malaysia.
These proceedings encompass a wide range of topics, spanning the boundaries of traditional disciplines and embracing interdisciplinary approaches. These include, but are not limited to, graphics, computer vision, signal and information processing, distributed systems, software information engineering, computer science, electrical engineering, and automation and control engineering. The diversity and depth of the papers presented here reflect the multidisciplinary nature of the conference and highlight the interconnectedness of image processing, electronics, and computers.
The conference received a total of 207 full paper submissions. All papers were evaluated on the basis of their significance, novelty, and technical quality, and after careful review, the program committee selected 61 papers for inclusion in the conference. We would like to extend our heartfelt gratitude to those authors who contributed their valuable research to these proceedings. Their dedication, expertise, and passion have enriched scholarly discourse within our community. We would also like to express our sincere appreciation to the reviewers for their meticulous evaluation and insightful feedback, which ensured the high standard and scientific rigor of the papers included in this volume.
We hope that the conference proceedings will serve as a valuable resource for researchers, practitioners, and enthusiasts alike, inspiring further advancements and breakthroughs in the field of image processing, electronics, and computers. May this collection of papers ignite new ideas, foster innovation, and contribute to the betterment of our society.
With warm regards,
Conference Chair
Prof. Ljiljana Trajković, Simon Fraser University, Canada
The camouflage effect can be efficiently evaluated through computer digital image processing, and decorative pieces on a camouflage net can effectively improve the net's camouflage effect. To evaluate the camouflage effect of decorative pieces on the camouflage net, our research group designed and conducted a series of experiments. This study used decorative pieces with different edge shapes and determined the type of camouflage net suited to the background through color extraction. The study then compared camouflage performance before and after adding decorative pieces to obtain results based on computer image processing. As the results clearly show, decorative pieces can effectively heighten the effectiveness of the camouflage net, and the study provides a method for quantitatively evaluating the camouflage effect of decorative pieces on the camouflage net through computer digital image processing.
The development of computer vision provides a new path for improving camouflage effects. As an important means of anti-reconnaissance, camouflage plays an important role in the contest for information superiority. To improve the camouflage effect of camouflage nets based on computer vision, a method of applying decorative pieces to the camouflage net was proposed, and methods for determining and fixing the colors of the decorative pieces were studied. Taking a forest-type camouflage net as an example, this paper compared the similarity of edge and color distributions before and after adding decorative pieces. Through a comprehensive analysis of cost-effectiveness and universality, it determined the color types and specific color information for producing the decorative pieces, and it determined how the decorative pieces and camouflage net should be used from the average time over repeated experiments. On the basis of these experiments and analyses, the color variety, layout, shape, and size of decorative pieces that fuse well with the camouflage net were determined, providing a new method for improving the camouflage effect of camouflage nets based on computer vision.
This study focuses on enhancing the security of image transmission in Networking Systems of Artificial Intelligence (NSAI) by implementing an advanced encryption algorithm (AEA) based on chaotic algorithms. The research begins by exploring the fundamental concepts, structures, and practical applications of NSAI. It then introduces the development of the AEA, specifically designed to safeguard image data during transmission. The methodology involves integrating a chaotic algorithm into the encryption and decryption processes, thereby improving the confidentiality and integrity of the images. Additionally, an encryption protocol consistent with NSAI specifications is developed to further enhance the security of transmitted images. Through simulations, the AEA demonstrates exceptional performance, with minimal errors, high accuracy, and outstanding efficiency in encryption tasks. Overall, this study concludes that the AEA represents a significant advancement in securing image transmission within NSAI, providing robust protection for future applications.
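The abstract does not name the specific chaotic map used by the AEA, but the core idea of chaos-based image encryption can be illustrated with a minimal sketch, assuming a logistic-map keystream XORed with the pixel bytes (the seed x0 and parameter r below are illustrative stand-ins for the secret key; XOR makes the same routine both encrypt and decrypt):

```python
import numpy as np

def logistic_keystream(length, x0=0.654321, r=3.99):
    """Byte keystream from the logistic map x -> r * x * (1 - x)."""
    x, stream = x0, np.empty(length, dtype=np.uint8)
    for i in range(length):
        x = r * x * (1.0 - x)
        stream[i] = int(x * 256) % 256
    return stream

def chaotic_xor(image, x0=0.654321, r=3.99):
    """Encrypt or decrypt a uint8 image (XOR is its own inverse)."""
    flat = image.ravel()
    ks = logistic_keystream(flat.size, x0, r)
    return (flat ^ ks).reshape(image.shape)

# Round-trip check on a random "image".
img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
assert np.array_equal(chaotic_xor(chaotic_xor(img)), img)
```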
Existing image style transfer algorithms can only handle a limited number of styles. Some common application scenarios require large-scale style transfer to generate differentiated images, and a limited number of styles is clearly insufficient for such scenarios. To address this issue, this paper proposes a style transfer algorithm based on a variational auto-encoder (VST). The algorithm encodes the styles in the training set into a high-dimensional continuous space and randomly samples a point from this space to obtain a style encoding; conditioned on the content information, the encoded and decoded images can then be stylized. Because the space in which the style encodings live is continuous, the number of available styles is infinite. Comparative experiments and style sampling experiments were conducted on the COCO and WikiArt datasets. The results show that the VST algorithm can sample styles, fix content, and quickly synthesize high-quality stylized images; in addition, at a similar synthesis quality, the synthesis speed of VST is 50 times that of GST.
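As a rough sketch of why a continuous style space yields unlimited styles: a VAE-style encoder outputs a mean and log-variance per style, a style code is drawn with the reparameterization trick, and at inference time new styles can be sampled directly from the prior without any reference image. Names and dimensions here are hypothetical, not taken from the paper:

```python
import torch

style_dim = 64  # assumed size of the continuous style space

def sample_style(mu, logvar):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

# During training, (mu, logvar) come from a style encoder; at inference,
# a brand-new style is simply a draw from the prior N(0, I):
z_new = torch.randn(1, style_dim)
```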
Disease image analysis is vital in various real-world applications, particularly in smart agriculture, where it plays a crucial role in crop disease diagnosis. Given the uncertainties associated with environmental conditions, there is a pressing need to improve the visual quality of crop images. In this paper, we propose an improved FPAGAN algorithm to address the color distortion and blurred texture details of tea disease images caused by low-light environments. Firstly, we combine a feature pyramid module, an attention module, and a multi-scale aggregation module to improve the generative network, which effectively reduces noise while increasing attention to the features of the disease region. To better distinguish generated images from real images, the PatchGAN in the discriminative network is improved and a local discriminative strategy is adopted. A Nash equilibrium is then reached through the continuous interaction between the generator and the discriminator. To guide network training, a joint loss function is introduced that suppresses image noise and makes the colors of the enhanced image more natural and balanced. Finally, ablation experiments are conducted on the FPANet architecture to verify its advantages. The results show that the FPAGAN algorithm reaches average values of 19.065, 0.735, and 7.435 on the PSNR, SSIM, and information entropy metrics, respectively, and the processed disease images outperform other image enhancement techniques in color preservation, contrast improvement, and detail enhancement. This is significant for the prompt detection of tea diseases and for reducing cultivation costs.
Super-resolution tasks have been receiving widespread attention, and contemporary research on single-image super-resolution has seen notable performance gains, largely attributable to the integration of self-attention mechanisms. However, self-attention significantly increases the amount of computation, which is hard to accept in lightweight image super-resolution tasks. In this paper, a lightweight image super-resolution model, Large Kernel Super Resolution (LKSR), is proposed. LKSR simply uses neural network layers based on large convolutional kernels as its backbone. Experimental results show that the proposed LKSR is effective and offers a promising solution for applications where computational resources are constrained. With its large convolutional kernels, LKSR is designed to capture more contextual information within an image, which is essential for enhancing resolution without compromising computational efficiency. This allows the model to achieve high-quality upscaling while maintaining a balance between performance and computational demands, making it suitable for a variety of applications that require both sharp image details and efficient processing.
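The abstract does not detail the architecture; the sketch below assumes a common large-kernel design (a depthwise large-kernel convolution followed by pointwise channel mixing, with a residual connection) and a PixelShuffle upsampler, purely to make the idea concrete:

```python
import torch
import torch.nn as nn

class LargeKernelBlock(nn.Module):
    """Depthwise large-kernel conv + pointwise mixing (assumed structure)."""
    def __init__(self, dim, kernel_size=13):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, kernel_size,
                            padding=kernel_size // 2, groups=dim)
        self.pw = nn.Conv2d(dim, dim, 1)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.pw(self.act(self.dw(x)))

class TinyLKSR(nn.Module):
    def __init__(self, dim=32, n_blocks=4, scale=2):
        super().__init__()
        self.head = nn.Conv2d(3, dim, 3, padding=1)
        self.body = nn.Sequential(*[LargeKernelBlock(dim) for _ in range(n_blocks)])
        self.tail = nn.Sequential(nn.Conv2d(dim, 3 * scale ** 2, 3, padding=1),
                                  nn.PixelShuffle(scale))

    def forward(self, x):
        return self.tail(self.body(self.head(x)))

y = TinyLKSR()(torch.randn(1, 3, 48, 48))  # -> (1, 3, 96, 96)
```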
To address the problem that the spatial structure and temporal dynamics of skeletal data are not clearly and fully exploited when hand bone data are used for action recognition, a spatio-temporal synchronous graph convolutional network with a combined attention mechanism is designed. Using 2D estimation and triangulation, features are projected into a single 3D volume and a 3D heat map is output; the 3D joint coordinates are then obtained by a soft-argmax operation on the heat map. The spatial and temporal dimensions of the bone data are separated: the spatial dimension is encoded according to the order of the related nodes, and the same related nodes are encoded along the temporal dimension, yielding a spatial embedding matrix and a temporal embedding matrix. These matrices are synchronously added to the spatio-temporal network sequence. The method is tested on the SHREC2017 dataset and compared with several representative gesture recognition methods; the results show that the algorithm achieves good results in hand action recognition.
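The soft-argmax step can be made concrete: a softmax over the whole heat map turns it into a probability volume, and the expected coordinate along each axis gives differentiable joint positions. This is the generic formulation, not necessarily the paper's exact variant:

```python
import torch

def soft_argmax_3d(heatmap):
    """Differentiable joint coordinates from a (D, H, W) heat map."""
    D, H, W = heatmap.shape
    probs = torch.softmax(heatmap.reshape(-1), dim=0).reshape(D, H, W)
    # Expected index along each axis = sum(marginal * coordinate grid).
    z = (probs.sum(dim=(1, 2)) * torch.arange(D, dtype=torch.float32)).sum()
    y = (probs.sum(dim=(0, 2)) * torch.arange(H, dtype=torch.float32)).sum()
    x = (probs.sum(dim=(0, 1)) * torch.arange(W, dtype=torch.float32)).sum()
    return torch.stack([x, y, z])
```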
Computer image processing technology is applied ever more widely, in fields such as geological survey, disaster inspection, urban planning, medical imaging, industry, agriculture, and the military. Among image denoising methods, restoration models based on the variational principle have long been a research focus in digital imaging. To recover image edge and texture information while retaining an efficient optimization algorithm, this paper proposes a fractional-order total variation denoising method with shearlet regularization terms. The model combines the advantages of the fractional-order variational model and the shearlet transform: it can effectively recover edges and other image details, enhance the mid-frequency information of the image by using more neighboring-pixel information, and retain the low-frequency components of the image nonlinearly. Anisotropic features such as edges and curves are thus effectively preserved. Numerically, the model is solved with the alternating direction method of multipliers (ADMM), and all ADMM subproblems turn out to have closed-form solutions. Finally, extensive experiments on both denoising and texture extraction illustrate the validity of the model.
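The abstract does not print the model itself, but a fractional-order TV term combined with a shearlet sparsity term would plausibly take a form like the following, with notation assumed rather than taken from the paper (u the restored image, f the noisy image, the alpha-order gradient the fractional-order TV operator, SH the shearlet transform, and lambda_1, lambda_2 regularization weights):

```latex
\min_{u}\; \frac{1}{2}\|u - f\|_2^2
  \;+\; \lambda_1 \|\nabla^{\alpha} u\|_1
  \;+\; \lambda_2 \|\mathrm{SH}(u)\|_1
```

ADMM then splits the two nonsmooth terms into separate subproblems, each of which (per the paper) admits a closed-form solution.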
This paper presents a computer-assisted rapid trajectory extraction technique for soccer robots’ global vision systems, focusing on Low-Rank Trajectory Recovery (LRTR). Such a system is crucial for providing robot posture data and for enhancing decision-making, trajectory planning, and maneuver strategies. Given competition rules that demand quick onsite vision system setup and adjustment, this method offers a fast and accurate extraction process. It aims to refine trajectory tracking precision in high-level soccer matches, especially by processing the varied, noisy images that arise in real scenarios. The study covers both theoretical and practical aspects, highlighting the method’s effectiveness in minimizing extraction errors amid the unpredictability of robot movement in competition. Robust subspace learning within LRTR significantly enhances tracking accuracy, object recognition, and scene understanding. The trajectory feature extraction method evaluated here shows considerable promise in efficacy and adaptability. The findings advance computer vision system development and improve trajectory interpretation for diverse applications, including sports tracking and autonomous systems. Compared with other algorithms, the method stands out for its extraction precision and efficiency in robot operation, achieving an extraction accuracy of ±2.67 mm without specialized targets.
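LRTR's exact formulation is not given in the abstract, but robust subspace learning over trajectories is commonly posed as robust PCA: stack the trajectory observations into a matrix and split it into a low-rank part (smooth motion) plus a sparse part (noise and outliers). The following is the standard ADMM-style iteration for that generic decomposition, not the paper's implementation:

```python
import numpy as np

def shrink(X, tau):
    """Soft-thresholding operator."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def rpca(M, iters=200):
    """Decompose M into low-rank L + sparse S (generic robust PCA)."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))        # standard regularization weight
    mu = 0.25 * m * n / np.abs(M).sum()   # common step-size heuristic
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(iters):
        L = svt(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        Y += mu * (M - L - S)
    return L, S
```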
This study introduces an innovative trajectory prediction algorithm for table tennis, employing an optimized Unscented Kalman Filter (UKF) combined with a Simple Physical Motion (SPM) model. The conventional UKF algorithm, while effective in real-time predictions, often encounters significant deviations in short-term forecasts, especially when dealing with abrupt changes in a table tennis ball’s motion. To address this, our approach integrates UKF with SPM, effectively predicting the ball’s trajectory pre- and post-collision. The method begins by using UKF to predict the ball’s trajectory and landing point before collision, taking into account factors such as air resistance, gravity, and the Magnus force caused by the ball’s rotation. After collision, the trajectory is forecasted using a simplified collision rebound model and a kinematic model. This dual-phase approach significantly reduces trajectory prediction errors post-collision. This algorithm’s practical application is demonstrated in a constructed table tennis robot system, highlighting its superior real-time performance and accuracy, particularly in post-collision trajectory prediction. This makes it a valuable tool for advanced table tennis training and robotic interaction systems. This study contributes to the field of machine vision and robotic interaction by presenting a more efficient and accurate method for trajectory prediction, particularly in dynamic environments like table tennis. The algorithm’s lower hardware requirements, combined with its robustness and simplicity, underscore its potential in broader applications where accurate real-time trajectory prediction is crucial. This development not only advances the field of sports robotics but also has implications for various industrial and research applications where precise object tracking and prediction are essential.
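A minimal sketch of the pre-collision motion model described here, with gravity, drag opposing velocity, and a Magnus term from the spin; the coefficients KD and KM are illustrative placeholders, not values from the paper:

```python
import numpy as np

G = np.array([0.0, 0.0, -9.81])  # gravity (m/s^2)
KD, KM = 0.10, 0.01              # assumed drag and Magnus coefficients

def ball_step(state, omega, dt):
    """One Euler step; state = [x, y, z, vx, vy, vz], omega = spin (rad/s)."""
    p, v = state[:3], state[3:]
    a = G - KD * np.linalg.norm(v) * v + KM * np.cross(omega, v)
    return np.concatenate([p + v * dt, v + a * dt])
```

Inside a UKF, a function like this would serve as the state-transition model applied to each sigma point.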
Focusing on behavior recognition technology for mobile vision devices in security scenarios, this paper first reviews the research and application progress of behavior recognition in security settings and describes the difficulties in actual detection tasks, such as camera movement, behavior occlusion, illumination variation, background interference, multi-view variation, and inter-class similarity. Next, according to their network structures, the architectures and recognition characteristics of models based on the two-stream convolutional architecture, the 3D convolutional architecture, and the self-attention mechanism are analyzed in detail. The relevant behavior recognition algorithms are then deployed on both high-performance GPU and embedded microcomputer platforms, and their accuracy, detection rate, parameter count, and practical detection performance are compared and analyzed. Finally, based on the theoretical analysis and comparative experiments, the limitations of current behavior recognition algorithms are summarized, and directions for making the algorithms lighter and more adaptable are pointed out.
Low-altitude flying objects are widely used in many fields, bringing great convenience to society, and have important strategic significance in both military and civilian domains. However, they often pose a significant threat to the safety of low-altitude airspace, and in recent years the number of accidents caused by drones has been increasing. Low-altitude target detection technology has accordingly received growing attention from researchers and has made significant progress across various research methods, so a systematic review of effective detection technologies for low-altitude targets is urgently needed. This article provides a comprehensive overview and analysis of the latest progress in low-altitude target detection. It first outlines the contemporary research landscape, including detection technologies based on radar, radio, and vision. It then reviews and analyzes these methods in depth and explores the main challenges in the field. Finally, it integrates the key insights drawn from this analysis.
Edge detection greatly reduces the amount of data in the original image, eliminates much meaningless information, and retains the important structural attributes of the image. It is an important foundation for image analysis tasks such as image segmentation, region shape extraction, and target detection, and an important attribute for feature extraction in image recognition. Image edge detection based on the Riesz transform is a new edge detection approach with multi-resolution and multi-scale characteristics. To address the poor detection performance of traditional medical image methods under illumination changes, this paper proposes a medical ultrasound image information extraction model based on the Riesz transform. The Riesz transform replaces the traditional Hilbert transform in processing the image: the model uses Riesz transforms at different scale factors to construct a transform space dedicated to computing phase congruency, and successfully obtains the feature image. Non-maximum suppression is then used to detect the edge information of the image. Experimental results show that the model can effectively and quickly identify and retain edge features while suppressing non-edge responses under uneven illumination.
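The first-order Riesz transform is conveniently computed in the frequency domain, where its two components have transfer functions -j·u/|w| and -j·v/|w|. A small generic NumPy sketch (not the paper's multi-scale construction):

```python
import numpy as np

def riesz_transform(img):
    """First-order Riesz transform pair of a grayscale image via the FFT."""
    h, w = img.shape
    u = np.fft.fftfreq(w)[None, :]
    v = np.fft.fftfreq(h)[:, None]
    mag = np.sqrt(u ** 2 + v ** 2)
    mag[0, 0] = 1.0  # avoid division by zero at the DC term
    F = np.fft.fft2(img)
    r1 = np.real(np.fft.ifft2(F * (-1j * u / mag)))
    r2 = np.real(np.fft.ifft2(F * (-1j * v / mag)))
    return r1, r2
```

Combined with band-pass filtering at several scales, the triple (img, r1, r2) yields the local phase and orientation from which phase congruency is computed.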
Detecting targets located in non-line-of-sight (NLOS) areas within urban environments has attracted widespread attention in recent years. However, imaging localization algorithms based on multipath superposition often generate false target points, leading to interference in target location acquisition. This article proposes a new multi-target localization method specifically designed for L-shaped building scenarios without producing target ghosting, by studying the correlation between times of arrival (TOAs) and multipath types. Specifically, the propagation mode of the signal is first analyzed to construct a multipath echo model. The principle of specular reflection is utilized to introduce virtual radars, and virtual radar coordinates corresponding to each order of reflection are obtained. Then, the numerical relationship between different times of arrival and corresponding multipath types in the scene is studied. This, combined with an improved ellipse-cross-localization method, is used to determine the target position. Finally, numerical simulation results demonstrate that the proposed multi-target localization method effectively establishes the correlation between times of arrival and multipath types, enabling accurate localization of multiple targets in NLOS areas.
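The virtual-radar construction via specular reflection reduces to mirroring the radar position across the wall; in 2D this is a reflection across the wall line, as in this small sketch (coordinates are illustrative):

```python
import numpy as np

def mirror_point(p, a, b):
    """Reflect point p across the line through wall endpoints a and b:
    the first-order virtual radar position."""
    p, a, b = map(np.asarray, (p, a, b))
    d = (b - a) / np.linalg.norm(b - a)
    foot = a + np.dot(p - a, d) * d  # foot of the perpendicular from p
    return 2 * foot - p

# Radar at (0, 5), wall along the x-axis: virtual radar at (0, -5).
print(mirror_point([0.0, 5.0], [0.0, 0.0], [10.0, 0.0]))
```

Higher-order virtual radars follow by repeating the reflection across successive walls.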
The widespread adoption of portable photography devices such as smartphones has led to an increased occurrence of moire patterns when photographing screens, which significantly degrades visual quality. While most research has focused on removing moire patterns from images, video moire removal has not been well investigated. This paper focuses on the removal of moire patterns in videos and proposes a two-stage video moire removal network. The first stage aligns information between adjacent frames through a multi-scale frame alignment network, accomplishing the temporal aggregation of information. The second stage is the moire removal network, which restores the contaminated areas from both texture and color perspectives. A novel dataset construction method is also introduced that allows precise alignment between moire-affected images and reference images by capturing pure white background images to extract the moire patterns. Extensive experiments on the TCL-V1 dataset and the new dataset demonstrate the effectiveness of the proposed scheme compared with state-of-the-art results in video moire removal.
This study addresses the challenges of facial detail loss and discriminator bias in image style transfer tasks. Building upon the DualStyleGAN network, we propose the MixStyleGAN network, which improves upon the existing method by introducing a de-biased discriminator. The de-biased discriminator removes style bias by generating mixed features of the original and reference images in the discriminator’s feature space and enforcing consistency between the predictions for the mixed features and the original image. An AdaIN network is employed for feature map fusion across different resolution layers to achieve the de-biasing effect. Additionally, localized discriminators are introduced to preserve facial details: separate discriminators for specific facial attributes, such as the eyes and mouth, enhance the representation of local details through adversarial training. Experimental results on the FFHQ dataset demonstrate that MixStyleGAN achieves a 29.03% improvement in the ArcFace metric and a 12% improvement in the LPIPS metric.
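AdaIN itself is standard: it re-normalizes a content feature map channel-wise to the mean and standard deviation of a style feature map. A minimal PyTorch version, independent of the paper's surrounding architecture:

```python
import torch

def adain(content, style, eps=1e-5):
    """Match channel-wise mean/std of content (N, C, H, W) to style's."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean
```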
In today’s interconnected world, ensuring the safety of airports is of paramount importance, given their pivotal role in facilitating global transportation. Rapidly evolving face recognition technology has emerged as a key tool for strengthening airport security measures. This study examines three prominent convolutional neural network (CNN) models: VGGNet, GoogLeNet, and ResNet, evaluating their effectiveness in recognizing faces across diverse and uncontrolled scenarios using the extensive LFW dataset. The findings demonstrate ResNet’s superiority in unconstrained face recognition compared to its counterparts: after rigorous training on the LFW dataset, the ResNet-50 model achieves an accuracy of 71.9%. The study therefore concludes that the ResNet-50 model is well suited for deployment within airport environments, offering heightened security protocols. This research underscores the role of CNN-based face recognition technology in enhancing the robustness of airport security measures, thereby contributing to the safety and efficiency of global transportation hubs.
In the field of plant taxonomy, species often exhibit high levels of similarity in morphological features, color expression, and surface textures, while also containing rich and complex detail. These characteristics pose significant challenges for identification and classification, and traditional machine learning methods cannot extract features comprehensively and accurately. This study combines the Swin Transformer with image enhancement algorithms for plant image classification. On one hand, enlarging the inter-class distance improves classification accuracy; on the other, the approach addresses the high computational complexity of large-scale plant image processing. Integrating the Swin Transformer with advanced image enhancement techniques yields a significant performance improvement: compared to using the Swin Transformer alone, the integrated strategy achieves superior results, with an accuracy of 89.03% on plant classification tasks. This paper details the plant image classification process based on the Swin Transformer with image enhancement algorithms.
To address the blurring of images captured in foggy conditions, an improved defogging algorithm based on AOD-Net is designed, targeting the color differences and unclear texture details left in images defogged by the original AOD-Net algorithm. Firstly, the CBAM module is introduced: its attention mechanism improves the network’s ability to extract both local and global characteristics of the input imagery, while decreasing the computational load and improving the robustness of the model. Then a sparse feature reactivation (SFR) module is added to boost the effectiveness of feature reuse and improve defogging quality. Finally, a hybrid loss combining the MS-SSIM loss and the L1 loss is used to improve the contrast, brightness, and color saturation of the defogged image. The findings indicate that images defogged by the improved algorithm achieve superior peak signal-to-noise ratio and structural similarity index scores, which is crucial for improving image quality, enhancing the performance of the visual system, and broadening its application in various fields, with the potential to significantly improve productivity and quality of life.
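The abstract does not give the weighting between the two loss terms; a common form of such a hybrid, sketched here with the third-party pytorch_msssim package and a weight alpha = 0.84 borrowed from the loss-function literature rather than from this paper, is:

```python
import torch.nn.functional as F
from pytorch_msssim import ms_ssim  # pip install pytorch-msssim

def hybrid_loss(pred, target, alpha=0.84):
    """Weighted MS-SSIM + L1 loss; inputs are image batches in [0, 1]."""
    loss_msssim = 1.0 - ms_ssim(pred, target, data_range=1.0)
    loss_l1 = F.l1_loss(pred, target)
    return alpha * loss_msssim + (1.0 - alpha) * loss_l1
```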
With the rapid development of artificial intelligence technology, research on autonomous driving is becoming increasingly popular. In an autonomous driving panoramic imaging system, image stitching technology must stitch, in real time, the image information collected by multiple cameras around the vehicle body, which demands processing of large data volumes at high speed and low power. The parallel computing characteristics of Field Programmable Gate Arrays (FPGAs) can accelerate key image stitching algorithms. This article focuses on hardware acceleration of the RANSAC (Random Sample Consensus) algorithm in image stitching. The algorithm is parallelized using high-level synthesis (HLS) and optimized to find a suitable optimization strategy. Compared with the OpenCV software implementation, the hardware running time of the algorithm is only 2.816 ms, a 38-fold speedup. The RANSAC algorithm implemented on the FPGA meets the requirements of real-time image processing and can be applied in real-time image stitching systems.
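The paper's implementation is HLS-generated hardware; purely to show the structure of the algorithm (and why its independent iterations parallelize well), here is a plain NumPy sketch of RANSAC homography estimation for stitching, with illustrative iteration count and inlier threshold:

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform from >= 4 point correspondences."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 3)

def ransac_homography(src, dst, iters=500, thresh=3.0, seed=0):
    """src, dst: (n, 2) arrays of matched keypoints."""
    rng = np.random.default_rng(seed)
    n, best_H, best_inliers = len(src), None, 0
    for _ in range(iters):  # iterations are independent: what HLS parallelizes
        idx = rng.choice(n, 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        pts = np.c_[src, np.ones(n)] @ H.T
        proj = pts[:, :2] / pts[:, 2:3]
        inliers = int((np.linalg.norm(proj - dst, axis=1) < thresh).sum())
        if inliers > best_inliers:
            best_H, best_inliers = H, inliers
    return best_H, best_inliers
```

A degenerate random sample may produce NaNs in the projection; those points simply fail the inlier test.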
Object detection is an important branch of panoramic image scene understanding. Panoramic images possess a wide field of view, significant distortion, and rich content; the convolutional domain therefore varies continuously across the image, so a convolutional kernel of a single fixed shape is insufficient for convolutional feature extraction on panoramic images. Traditional perspective-based object detection algorithms consequently perform poorly on panoramic images. An enhanced YOLOX-based panoramic image object detection method is proposed to address this problem. To improve feature extraction for distorted objects in panoramic images, an effective feature extraction network is built by integrating deformable convolution v2 and atrous spatial pyramid pooling (ASPP) into the backbone feature extraction network. Furthermore, detection accuracy is greatly increased by strengthening the extraction of image channel features through an enhanced attention mechanism placed between the feature extraction network and the backbone network. Experimental results demonstrate that the proposed panoramic object detection model achieves a mean average precision (mAP) of 73.35% on a self-built panoramic image dataset, a significant improvement over existing traditional detection models.
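The ASPP component is standard and easy to sketch: parallel dilated 3x3 convolutions at several rates, concatenated and projected back. The dilation rates and channel sizes below are assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling: multi-rate context aggregation."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

y = ASPP(64, 64)(torch.randn(1, 64, 32, 32))  # -> (1, 64, 32, 32)
```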
The 3D reconstruction and visualization of plants are of great research value for studying their physiological characteristics. Large-scale datasets are required to train 3D reconstruction algorithms, yet most current 3D plant datasets cover only the parts above the ground surface. Roots, the below-ground half of the plant, are also critical for plant growth, but few efforts have been made to generate 3D root datasets. Existing 3D plant root datasets contain only a very limited number of reconstructed 3D models and lack ground truth, mainly because of the complexity of root structures and the difficulty of obtaining root images and ground-truth 3D structures. This paper creates a large virtual 3D plant root benchmark dataset that not only contains extensive 3D root models but also provides 2D images captured from various angles, comprising 1.2K 3D root structures and over 80K 2D root images. This large plant dataset enables multiple 3D reconstruction algorithms to be tested and their reliability verified. Finally, several commonly used 3D reconstruction algorithms and the most recent state-of-the-art 3D modeling methods are evaluated on this dataset, demonstrating its research value for advancing 3D root modeling.
In response to the growing importance of detecting maliciously altered images to mitigate their harmful effects, we propose a deep learning-based image tampering detection method that incorporates multiscale fusion and anomaly assessment. This approach addresses the limitations of existing methods that often struggle to detect diverse tampering types and exhibit suboptimal precision and localization performance. The proposed method employs a Channel-Spatial Attention module to enhance feature representations extracted from multiscale input images, thereby capitalizing on both spatial and channel-wise dependencies within the data. Furthermore, it uses a Z-score scoring mechanism and an LSTM-based mechanism to effectively capture and evaluate anomalous regions within the image. These components collectively contribute to a more robust identification of the manipulated content. For training supervision, we introduce a binary cross entropy loss, which jointly optimizes pixel-level classification and regression tasks, ensuring accurate tampering detection and localization. Experimental evaluations demonstrate that our method significantly outperforms prevailing tampering detection techniques, exhibiting an increase in AUC values ranging from 21% to 62% and achieving up to a 99.8% improvement in the best F1 score. Specifically, on benchmark datasets CASIA1.0, Coverage, and NIST16, our method attains F1 scores of 0.673, 0.714, and 0.981, respectively, underscoring its superior performance across diverse scenarios and tampering types.
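The abstract does not detail the Z-score mechanism; a generic per-pixel version over a feature map might look like the following purely illustrative sketch, where pixels whose channel activations deviate strongly from the image-wide statistics receive high anomaly scores:

```python
import torch

def zscore_anomaly_map(features):
    """Per-pixel anomaly score from a (C, H, W) feature map."""
    C, H, W = features.shape
    flat = features.reshape(C, -1)
    mean = flat.mean(dim=1, keepdim=True)   # per-channel mean over pixels
    std = flat.std(dim=1, keepdim=True) + 1e-6
    z = (flat - mean) / std                 # channel-wise Z-scores
    return z.abs().mean(dim=0).reshape(H, W)  # high value = likely tampered
```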