Ebook: Artificial Intelligence and Human-Computer Interaction
There is no denying the increasing importance of AI and human-computer interaction for societies worldwide. The potential for good in these fields is undeniable, but the challenges which arise during research and in practice must be carefully managed if this potential for good is to be realized without harm.
This book presents the proceedings of ArtInHCI2023, the 1st International Conference on Artificial Intelligence and Human-Computer Interaction, held online on 27-28 October 2023 and attended by around 70 participants from around the world. The aim of the conference was to promote academic exchange within and across disciplines, addressing theoretical and practical challenges and advancing current understanding and application. A total of 72 submissions were received, of which 41 were selected for presentation and publication following a thorough peer review process, an acceptance rate of 57%. Topics covered included deep learning, artificial neural networks, computer vision, and pattern recognition, and the papers focused on the challenges of research as well as of application.
Providing a fascinating overview of developments and innovation in the field, the book will be of interest to all those working with AI or human-computer interaction.
This volume in the series Frontiers in Artificial Intelligence and Applications (FAIA) presents the proceedings of the 1st International Conference on Artificial Intelligence and Human-Computer Interaction (ArtInHCI2023). The conference was held online on 27–28 October 2023 and was attended by around 70 participants from around the world.
ArtInHCI2023 organized discussions on hot topics, including deep learning, artificial neural networks, computer vision and pattern recognition, among others, and focused on the challenges of research as well as application. The conference consisted of one morning session and one afternoon session, showcasing various items such as keynote speeches, oral reports, poster presentations, Q&A, etc. We were fortunate to have with us experts and scholars from around the globe to share their latest findings and insights, including Professor Teh Sin Yin from Universiti Sains Malaysia, who also acted as the conference host; Professor Ganeshsree Selvachandran from Monash University Malaysia; Professor Huiyu Zhou from the University of Leicester, UK; Professor Matthias Rauterberg from Eindhoven University of Technology, the Netherlands; Professor Mohamed Quafafou from Aix-Marseille University, France; Professor Chris W. J. Zhang from the University of Saskatchewan, Canada; and Professor Jiangtao Wang from Coventry University, UK.
The ArtInHCI conference was conceived with the aim of promoting academic exchange within and across disciplines, addressing theoretical and practical challenges and advancing current understanding and application, during which process we hope that amity will also be spread, connections established and future collaborations enabled.
The ArtInHCI organizing committee extend their sincerest gratitude to all who have supported the conference in their various ways; to the authors who have chosen this platform to publish their works and communicate with peers, the participants who took an interest and attended the conference, the chairs and committee members who have been indispensable in lending their professional expertise and judgment, the keynote speakers who so generously shared their vision and passion, and to the reviewers who upheld the faith in scholarship and contributed their experience and honest opinions. It has been a pleasure and honor to work alongside them, and we look forward to further cooperation with them at future ArtInHCI conferences.
This research focuses on the role of trust in human-agent interaction, particularly in the context of home robot systems. The research involved 36 participants working with an agent across three different interfaces to complete household tasks. The study found that participants were more likely to collaborate when they understood the agent’s intentions, and the results showed that clear agent intentions can increase cooperation and calibrate trust to an appropriate level.
Knowledge graph-based dialogue systems are capable of generating more informative responses and can implement sophisticated reasoning mechanisms. However, these models do not take into account the sparseness and incompleteness of knowledge graphs (KGs) and cannot be applied to dynamic KGs. This paper proposes a dynamic knowledge graph-based dialogue generation method with improved adversarial meta-learning (ADML). ADML formulates dynamic knowledge triples as a problem of adversarial attack and incorporates the objective of quickly adapting to dynamic knowledge-aware dialogue generation. The model can initialize the parameters and adapt to previously unseen knowledge, so that training can be quickly completed based on only a few knowledge triples. We show that our model significantly outperforms other baselines, and we evaluate and demonstrate that our method adapts extremely fast and well to dynamic knowledge graph-based dialogue generation.
Deep manifold learning has achieved significant success in handling visual tasks by using Symmetric Positive Definite (SPD) matrices, particularly within multi-scale submanifold networks (MSNet). This network is capable of extracting a series of main diagonal submatrices from SPD matrices; however, it does not take into account the distribution of the submanifolds themselves. To address this limitation and introduce batch normalization tailored to submanifolds, we devise a submanifold-specific normalization approach that incorporates submanifold distribution information. Additionally, for submanifolds mapped into Euclidean space, and considering the weight relationships between different submanifolds, we propose an attention mechanism tailored to log-mapped submanifolds, termed submanifold attention, which is decomposed into multiple 1D feature encodings. This approach enables the capture of dependencies between different submanifolds, promoting a more comprehensive understanding of the data structure. To demonstrate the effectiveness of this method, we conducted experiments on various visual databases. Our results indicate that the approach outperforms MSNet.
The traveling salesman problem (TSP) is a well-known optimization problem that seeks the shortest possible route visiting a set of points and returning to the starting point. In this paper, we apply heuristics for the TSP to the task of restoring a DNA matrix, a restoration problem often considered in biocybernetics. Here, the matrix of distances between DNA sequences must be recovered when not all of its elements are known at the input. We consider the possibility of using this method in testing algorithms that calculate the distance between a pair of DNA sequences, in order to restore the partially filled matrix.
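As an illustration of the kind of TSP heuristic that can operate directly on a distance matrix, the following is a minimal nearest-neighbour sketch; the function name and the sample matrix are hypothetical, not taken from the paper.

```python
# Illustrative nearest-neighbour heuristic for the TSP over a distance
# matrix; names and data are hypothetical, not taken from the paper.

def nearest_neighbour_tour(dist):
    """Greedy tour over a symmetric distance matrix (list of lists)."""
    n = len(dist)
    unvisited = set(range(1, n))
    tour = [0]                       # start from point 0
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: dist[last][j])
        tour.append(nxt)             # always move to the closest unvisited point
        unvisited.remove(nxt)
    return tour                      # returning to tour[0] closes the cycle

dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
tour = nearest_neighbour_tour(dist)  # [0, 1, 3, 2]
```

A heuristic like this produces a feasible tour quickly, which is what makes such methods attractive as building blocks when only part of the distance matrix is reliable.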
Although Rank SVM and its derivative model, a novel support vector machine for multi-label classification (SVM-ML), have achieved very good results in multi-label classification problems, neither of these models can achieve zero empirical risk on the training set. Drawing on the memory mechanism used in binary SVM to solve this problem, we propose the MCMSVM model. The experimental results confirm the superior performance of MCMSVM on small datasets.
The relevance of the subject area under consideration is due to the need to effectively solve discrete optimization problems that arise in the analysis of high-dimensional communication networks. Specifically, the article examines decision-making procedures in the presence of several predictor functions, i.e. very fast auxiliary functions that a priori evaluate the effectiveness of choosing a separating element for some iterative algorithm. Cases of one, two or three predictors are considered, as well as various schemes of so-called voting, that is, choosing one predictor from those proposed with an assessment of the probability of making the right decision. The qualitative result of the research is a set of recommendations for the use of predictors. This approach can be extended to situations not directly related to discrete optimization algorithms. In particular, the results obtained can be used to organize the obtaining of expert opinions when one, two or three experts of, in general, different qualifications are available, together with possible schemes for their decision-making.
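As a worked illustration of such voting schemes, the hypothetical sketch below computes the probability that a strict majority of independent yes/no predictors makes the right decision; for three predictors each right 80% of the time, this is 3·0.8²·0.2 + 0.8³ = 0.896.

```python
# Hypothetical illustration of "voting" among independent predictors:
# the probability that a simple majority vote picks the right decision.

from itertools import product

def majority_correct_prob(accuracies):
    """P(strict majority of independent yes/no predictors is correct)."""
    total = 0.0
    for outcome in product([True, False], repeat=len(accuracies)):
        if 2 * sum(outcome) > len(outcome):          # strict majority correct
            prob = 1.0
            for correct, acc in zip(outcome, accuracies):
                prob *= acc if correct else (1.0 - acc)
            total += prob
    return total

# Three predictors, each right 80% of the time:
p = majority_correct_prob([0.8, 0.8, 0.8])           # ≈ 0.896
```

The same computation applies to experts of differing qualifications by passing unequal accuracies, which is the setting the article generalizes to.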
Visual display systems have wide-ranging applications in training simulators, serving as comprehensive systems that expand the display field of view by seamlessly combining multiple display units. This paper introduces an innovative spherical display system based on LED technology, designed to achieve close-proximity real-image display within a spherical enclosure. Additionally, we propose a channel division method tailored for this novel LED spherical display to enable efficient display driving. To address geometric distortion issues that arise when rendering flat graphics on a spherical surface, we present a geometric correction method specially designed for LED spherical displays, providing a comprehensive explanation of its theoretical principles and correction process. Experimental results demonstrate that with the implementation of our proposed channel division and geometric correction methods, the number of driving channels for the LED spherical display is reduced, image cropping is minimized, and there is no need for image fusion processing. Corrected images exhibit no distortion, and channel alignment remains continuous without any displacement. This method offers a straightforward operation, requires minimal parameter settings, and is easily implementable, providing a solid foundation for the widespread application of LED spherical vision display systems in the future.
Portrait matting refers to separating the portrait part from the background in an image. The difficulty of the problem lies in accurately identifying the pixels of the person while also maintaining the contour details. In this paper, we propose a fully automatic deep learning approach to portrait matting. Firstly, semantic segmentation is used to predict the probabilities of pixels belonging to the portrait, background, and unknown regions, from which a trimap is obtained. To remove misclassified pixels, we refine the head-contour portion of the trimap: the result of facial landmark detection is introduced, and an erosion operation is performed on the head region while maintaining the integrity of the portrait's facial contour. After that, we use a deep matting method to predict the alpha values in the image to obtain the matting results at the level of details. We then propose a novel framework that integrates the optimised trimap, the deep matting result, and the original image to obtain the final matting result. Both qualitative and quantitative experiments verify the effectiveness of the proposed method.
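The erosion step on the head region can be pictured with a minimal sketch. A real matting pipeline would likely use a library such as OpenCV; the plain-NumPy function below is hypothetical, not the paper's code.

```python
import numpy as np

# Hypothetical plain-NumPy erosion of a binary mask with a 3x3 square
# structuring element; a real matting pipeline would likely use OpenCV.

def erode(mask, iterations=1):
    """Shrink the foreground: keep a pixel only if its whole 3x3
    neighbourhood is foreground."""
    m = mask.astype(bool)
    for _ in range(iterations):
        padded = np.pad(m, 1, constant_values=False)
        nxt = np.ones_like(m)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                nxt = nxt & padded[1 + dy:1 + dy + m.shape[0],
                                   1 + dx:1 + dx + m.shape[1]]
        m = nxt
    return m.astype(mask.dtype)

mask = np.ones((4, 4), dtype=int)    # a 4x4 all-foreground "head region"
eroded = erode(mask)                 # only the 2x2 interior survives
```

Eroding the head region this way trims uncertain boundary pixels into the trimap's unknown band, which is what lets the later matting stage resolve them from image evidence.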
Regression trackers have been shown to perform well in visual tracking. However, existing research on regression trackers mainly explores deep models for feature extraction and then uses sophisticated architectures for online detection; such systems must optimize a massive number of trainable parameters. In this paper, we present a simple yet effective visual tracking system, called LiteCNT. Our algorithm consists of only three convolutional layers for the whole tracking process. In addition, a multi-region convolutional operator is introduced for regression output. This idea is simple but powerful, as it enables our tracker to capture more details of the target object. We further derive an efficient and effective operator to approximate multi-region aggregation.
The most effective approach to achieving color consistency lies in accurate spectrum modeling, and the key to recovering a faded spectrum is to recall the chromogenic metamer. In this paper, a spectral modeling mechanism is designed with three primary colors at its core. Spectral recovery has been completed for all 1269 Munsell colors with corresponding RGB parameters. With both maximum entropy (ME) and least mean square error (LS) objectives, the mechanism works well, with an average mean square error of 0.0046 over the whole Munsell color space. The contribution of our approach lies not only in the accurate conversion from RGB to spectrum, but also in developing a set of color metamers for chromogenic methods of color calibration.
Nowadays, the security of neural networks has attracted more and more attention. Adversarial examples are one of the problems that affect the security of neural networks. The gradient-based attack is a typical attack method, and the Momentum Iterative Fast Gradient Sign Method (MI-FGSM) is a typical algorithm among gradient-based attacks. However, this method may suffer from excessive gradient growth and low efficiency. In this paper, we propose RMS-FGSM, a gradient-based attack algorithm based on Root Mean Square Propagation (RMSProp). RMS-FGSM avoids excessive gradient growth through an exponentially weighted moving average and an adaptive learning rate during gradient updates. Experiments on MNIST and CIFAR-100 with several models show that the attack success rate of our approach is higher than that of the baseline methods. Moreover, our generated adversarial examples have a smaller perturbation at the same attack success rate.
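The general shape of such an attack can be sketched as follows, assuming white-box access to the loss gradient `grad_fn`; the hyper-parameter names (`eps`, `alpha`, `rho`) and this exact update rule are illustrative, not necessarily the paper's formulation.

```python
import numpy as np

# A minimal sketch of an RMSProp-style iterative gradient attack, assuming
# white-box access to the loss gradient `grad_fn`; the hyper-parameter
# names (eps, alpha, rho) and this exact update rule are illustrative.

def rms_fgsm(x, grad_fn, eps=0.3, alpha=0.1, rho=0.9, steps=10):
    x_adv = x.astype(float).copy()
    cache = np.zeros_like(x_adv)                    # running mean of squared grads
    for _ in range(steps):
        g = grad_fn(x_adv)
        cache = rho * cache + (1 - rho) * g ** 2    # exponential moving average
        x_adv = x_adv + alpha * g / (np.sqrt(cache) + 1e-8)  # adaptive-rate ascent
        x_adv = np.clip(x_adv, x - eps, x + eps)    # stay inside the eps-ball
    return x_adv

# Toy linear "loss" x @ w, whose gradient is constant w:
w = np.array([1.0, -2.0, 0.5])
x_adv = rms_fgsm(np.zeros(3), lambda x: w)          # ends at [0.3, -0.3, 0.3]
```

Normalizing each step by the running root-mean-square of past gradients is what keeps any single coordinate's update from growing without bound, the failure mode attributed to plain momentum above.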
The aim is to study the set of subsets of grids of the Waterloo automaton and the set of covering automata defined by the grid subsets. The study was carried out using NFALib, a C# library for working with nondeterministic finite automata implemented by one of the authors (M. Abramyan). The results are regularities obtained when considering semilattices of covering automata for the Waterloo automaton. A complete description of the obtained semilattices from the point of view of equivalence of the covering automata to the original Waterloo automaton is given, and a criterion for the equivalence of a covering automaton to the Waterloo automaton is formulated in terms of properties of the subset of grids defining the covering automaton. The relevance of the subject area under consideration is due to the need to research the set of regular languages and, in particular, to describe their various subclasses, as well as the problems that may arise in some of those subclasses. This will give, among other things, the possibility of describing new algorithms for the equivalent transformation of nondeterministic finite automata.
In visual SLAM algorithms, the assumption of scene rigidity serves as the foundation for algorithm operation. However, this assumption restricts the use of most visual SLAM systems in densely populated or vehicle-intensive environments, limiting their application in scenarios involving service robots or autonomous driving. In this article, we introduce IVI-SLAM, a visual SLAM system based on DynaSLAM. IVI-SLAM extends DynaSLAM by adding background restoration capabilities for both monocular and stereo modes, along with a method for depth recovery. We detect dynamic objects using deep learning techniques and restore the background occluded by these objects through iterative training of image inpainting models. Subsequently, we utilize a depth recovery approach to restore the depth values in the affected region. We evaluate our system on publicly available monocular and stereo datasets, achieving promising results.
With digital learning as the background, this paper employs Python and other technical means to study the impact of human-computer interaction on student learning behaviors, providing technical support for digital education decision-making. The study uses questionnaire surveys to collect student data on human-computer interaction in the learning process, and utilizes Python to conduct statistical, correlation and regression analyses to explore the impact of perceived value and flow experience on sustained usage intention, validate the research model, and emphasize the importance of optimizing human-computer interaction and focusing on student needs for successful learning. The technical innovation lies in applying Python and other means to demonstrate the impact of human-computer interaction on student learning behaviors, providing technical support for digital transformation decision-making in education. Starting from the customer value perspective, this study focuses on improving students’ learning experience and provides theoretical and practical references for student-centered digital transformation in education.
In recent years, there has been a surge in the development of path query-based applications, leading to significant research efforts dedicated to addressing the path query problem. Previous studies have primarily focused on either path queries in temporal graphs or skyline path queries in static graphs, often overlooking the critical temporal aspect. Moreover, edge-labeled temporal graphs, frequently encountered in real-world scenarios such as traffic networks with labels like “expressway” or “provincial road,” have not been adequately considered in the context of skyline path queries that account for path stability amidst graph updates. In this paper, we therefore introduce a novel approach for addressing the stability of skyline path queries in the context of edge-labeled temporal graphs. To tackle this challenge, we first introduce a globally updated Main Point (MP) index. Building upon this index, we propose a partition-based method to facilitate stable skyline path queries in temporal graphs. Our extensive experimental evaluations demonstrate the effectiveness and efficiency of the algorithm we present.
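The skyline notion over multi-attribute path costs can be illustrated with a small hypothetical sketch: a path belongs to the skyline when no other path dominates it, i.e. is at least as good on every cost attribute and strictly better on at least one.

```python
# Hypothetical sketch of a skyline over two-attribute path costs,
# e.g. (travel_time, toll); lower is better on every attribute.

def dominates(a, b):
    """a dominates b: no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(paths):
    """Keep only the paths no other path dominates."""
    return [p for p in paths if not any(dominates(q, p) for q in paths)]

costs = [(3, 5), (4, 4), (5, 3), (6, 6)]
result = skyline(costs)          # (6, 6) is dominated by (3, 5) and dropped
```

In the temporal, edge-labeled setting of the paper the candidate set and the cost vectors change as the graph updates, which is precisely why an index supporting stable skyline answers is needed.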
In this paper we study prefix codes and their application to the problem of machine learning for deterministic finite state automata. We give an example for the problem of constructing an inverse morphism, also parameterized by the number of transitions of the automata. We investigate how the factorization of prefix codes can give a simpler DFA structure for understanding its behavior. To verify the correctness of the proposed approach, we implemented it in the computer algebra system GAP, accurately performing the logical flow of the algorithm cycle by cycle.
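A basic property of a prefix code is that no code word is a prefix of another. The hypothetical helpers below (not the paper's GAP implementation) check this and compute the Kraft sum; sorting makes any offending pair adjacent, so checking neighbours suffices.

```python
# Hypothetical helpers (not the paper's GAP code): a set of words is a
# prefix code iff no word is a prefix of another; sorting makes any
# offending pair adjacent, so checking neighbours suffices.

def is_prefix_code(words):
    ws = sorted(words)
    return all(not ws[i + 1].startswith(ws[i]) for i in range(len(ws) - 1))

def kraft_sum(words, alphabet_size=2):
    """Kraft inequality: a prefix code over k symbols has sum <= 1."""
    return sum(alphabet_size ** -len(w) for w in words)

code = ["0", "10", "110", "111"]
ok = is_prefix_code(code)        # True; kraft_sum(code) == 1.0 (complete code)
```

The prefix-freeness property is what makes such a code decodable by a DFA that resets to its start state after each accepted word.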
The classification of workshop targets based on deep learning is the foundation of intelligent workshop management. There are various types of targets in the workshop, with variable geometric shapes and a disorderly distribution. In this article, we propose a deep learning-based solution that can improve the speed and accuracy of target classification. A deep neural network model can train specialized models for specific work environments. Experiments have shown that the scheme brings significant improvements in accuracy and visual effects.
Small and medium-sized enterprises (SMEs) have developed rapidly in China, bringing enormous opportunities and challenges. In this study, we aim to investigate methods that can accurately assess the credit risks of SMEs using machine learning algorithms, focusing on explainability, customer default forecasting, and delinquency. This study focused on the enterprises’ performance data and used the authorized invoice data of 425 SMEs in Chongqing. Machine learning algorithms, such as logistic regression, random forest, support vector machine, and soft voting ensemble learning methods, were used to establish a prediction classifier, which was combined with SHAP values to explain the feature contribution to a specific output. Our study revealed a strong correlation between the derived features and future delinquencies, which will help in forecasting enterprises’ business performance.
For a long time, investors have been influenced by their own experience and the advice of investment experts. Machine learning in quantitative investment is an advanced approach that replaces subjective judgments with artificial intelligence models to improve transaction accuracy. The authors have implemented a composite stock price prediction model based on multi-layer training networks, which is more suitable for predicting future stock prices than traditional methods. This model starts from the time series of stock data and deeply integrates the characteristics of stocks with artificial intelligence. The deep training sub-model is a combination of machine learning models and traditional statistical methods. This design addresses both the one-sidedness of machine learning methods and the rigidity of classical financial methods. Through comparative experiments on multiple prediction models over the same time period, the model proposed in this paper has been shown to be the most effective.
Stock market prediction and trading strategies have been extensively studied in finance and AI. Due to market volatility, it is challenging for investors to achieve high returns. To address this, we explore deep reinforcement learning using LSTM-DQN and FC-DQN models for stock trading. LSTM-DQN combines recurrent neural networks and Q-learning to capture patterns in stock data, while FC-DQN uses a fully connected neural network. Experiments were conducted after applying a discrete wavelet transform to the raw stock data for denoising and smoothing. Comparing the two models in terms of accuracy and cumulative returns, the LSTM-DQN model outperforms the FC-DQN model: it generates greater profits in terms of investment returns, making it a more suitable choice for investors. Finally, this paper presents an analysis and discussion of the experimental results.
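As one concrete instance of the wavelet denoising step, the sketch below performs a single-level Haar transform with hard thresholding of the detail coefficients; the paper does not specify its wavelet family or thresholding rule, so this is illustrative only.

```python
import numpy as np

# Illustrative single-level Haar wavelet denoising with hard thresholding
# of detail coefficients; the paper's wavelet family and threshold rule
# are not specified here, so treat this purely as a sketch.

def haar_denoise(signal, threshold):
    s = np.asarray(signal, dtype=float)             # length must be even
    approx = (s[0::2] + s[1::2]) / np.sqrt(2)       # low-pass (trend) coefficients
    detail = (s[0::2] - s[1::2]) / np.sqrt(2)       # high-pass (noise) coefficients
    detail = np.where(np.abs(detail) < threshold, 0.0, detail)  # hard threshold
    out = np.empty_like(s)
    out[0::2] = (approx + detail) / np.sqrt(2)      # inverse Haar transform
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

prices = [10.0, 10.1, 10.0, 9.9]
smoothed = haar_denoise(prices, threshold=0.2)      # [10.05, 10.05, 9.95, 9.95]
```

Zeroing small high-frequency coefficients before reconstruction removes tick-level jitter while preserving the trend, giving the DQN agents a smoother input series.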
Nowadays, artificial intelligence (AI) is applied in many high-stakes decision-making tasks. Black-box AI models, which lack explainability, can cause serious problems in practice. In the justice domain, an explainable model becomes more and more important. Since tree-based machine learning models are explainable, we propose an explainable legal judgment prediction model using concept trees with a collegiate bench mechanism in this paper. A concept tree is constructed to check the classification labels predicted by the original multi-classifier. A revising process is designed to deal with the scenario in which the results of the original multi-classifier and the concept trees conflict. Meanwhile, the concept trees grow into a concept forest owing to the existence of arbitration classifiers. The judicial judgment process is simulated, which not only delivers good classification performance with the collegiate bench mechanism, but also provides model explanation from the features at the conceptual level. The experiments confirm the validity of our model, with both better explainability and better accuracy.
This paper builds a neural network decision model based on a back-propagation neural network and the deep Q-learning algorithm, and studies intelligent business data analysis and decision-making driven by simulation technology. The decision model is trained on both actual and simulated business data, improving the accuracy and efficiency of decision output, enabling decisions to be updated with business data in real time, and providing a decision-making basis for business operation.