
Ebook: Data, Information and Computing Science

Computer science, and the processing of data in particular, has become an intrinsic part of almost every field of scientific or commercial endeavor.
This book presents the proceedings of CDICS 2024, the 2nd International Conference on Data, Information and Computing Science, held from 6 to 8 December 2024 in Singapore. The conference offered a platform for academics, scientists, researchers, and experts with an interest in data, information and computing science to share their research results and discuss potential scientific and engineering developments arising from their work. The conference received 44 paper submissions, of which 10 papers were accepted for presentation and publication following a thorough review process conducted by members of the technical program committee and professional reviewers. The papers are divided into two sections, Image Processing and Pattern Recognition, and Applications and Intelligent Computing, and cover all aspects of computing, information and data, ranging from theoretical foundations to novel models, algorithms and applications, including computer vision, image processing, machine learning, data analysis, networking, and artificial intelligence.
The book will serve as an important source of reference and knowledge for research, and will be of interest to all those working in the fields of data, information and computing science.
The 2nd International Conference on Data, Information and Computing Science (CDICS 2024) was held in Singapore from 6 to 8 December 2024. This conference offered a platform for academics, scientists, researchers, and experts to express and discuss their interests in data, information and computing science.
The conference invited three keynote speakers to share their latest research: Prof. Sergei Gorlatch from the University of Muenster; Prof. Teh Ying Wah from the University of Malaya; and Prof. Anand Nayyar from Duy Tan University. Each speaker delivered a 45-minute keynote. The conference also included two oral sessions and one poster session. CDICS 2024 provided an effective communication platform for all participants, who took the opportunity to share their research results and discuss potential scientific and engineering developments arising from their work.
The CDICS 2024 conference received 44 paper submissions, of which 10 papers were accepted. These cover all aspects of computing, information and data, ranging from theoretical foundations to novel models, algorithms and applications, including computer vision, image processing, machine learning, data analysis, networking, and artificial intelligence. All the papers included here passed a rigorous peer-review process conducted by members of the technical program committee and professional reviewers. The variety and novelty of the research topics presented at the conference and published in this book demonstrate the impact of CDICS 2024.
We would like to acknowledge all of those who supported CDICS 2024; the help received from individuals and institutions was very important for the success of this conference. In particular, we would like to thank committee chairs, committee members and reviewers for their tremendous contribution to the organization of the conference and the peer reviewing of papers.
CDICS 2024 was a forum for excellent discussions that put forward new ideas, promoted collaborative research, and will support researchers as they take their work forward. We are sure that the book will serve as an important source of reference and knowledge for research, leading not only to scientific and engineering progress, but also to novel products and processes.
The Editors
This research delves into the comparative assessment of diverse deep learning architectures for the automated identification of pulmonary diseases in chest X-ray images, aligning with the diagnostic framework of the International Classification of Diseases, 10th Revision (ICD-10). The evaluated architectures encompass Convolutional Neural Network (CNN), MobileNetV2, DenseNet, VGG16, and InceptionV3. Through an extensive investigation conducted on a comprehensive dataset comprising 108,948 chest X-ray images, we scrutinize the performance metrics and diagnostic accuracy of each architecture. Notably, our findings reveal MobileNetV2 as the frontrunner, exhibiting an outstanding accuracy rate of 97% for emphysema. This remarkable accuracy underscores the robustness and efficiency of MobileNetV2, even amidst large-scale datasets, positioning it as a promising solution for the automated detection of pulmonary ailments from medical imaging. Additionally, we analyze the computational efficiency and resource requirements of each architecture, providing insights into their practical applicability in real-world clinical settings. The implications of our study extend beyond technical benchmarks, offering valuable insights for healthcare practitioners, researchers, and developers working towards enhancing diagnostic processes in pulmonary healthcare. By elucidating the comparative strengths and limitations of various deep learning architectures, this research contributes to the advancement of computer-aided diagnosis systems, paving the way for more accurate and efficient detection of lung diseases from chest X-ray imaging data.
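To illustrate how such an architecture comparison is typically set up, the following is a minimal sketch of fine-tuning an ImageNet-pretrained MobileNetV2 for multi-label chest X-ray classification in Keras. The class count, image size, and training hyperparameters are illustrative assumptions and do not reproduce the authors' exact configuration.

# Illustrative sketch: fine-tuning MobileNetV2 for chest X-ray classification.
# Dataset pipelines, class count, and hyperparameters are hypothetical.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 14          # hypothetical number of pulmonary findings
IMG_SIZE = (224, 224)

# Pre-trained ImageNet backbone without the classification head
base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet")
base.trainable = False    # freeze the backbone for the first training stage

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="sigmoid"),  # multi-label output
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="auc")])

# train_ds / val_ds would be tf.data.Dataset pipelines over the X-ray images
# model.fit(train_ds, validation_data=val_ds, epochs=10)

The other backbones compared in the paper (DenseNet, VGG16, InceptionV3) could be evaluated in the same way by swapping the base model while keeping the same head and training loop.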
Fruit detection refers to a method within image processing and computer vision that focuses on automatically recognizing and distinguishing different types of fruits using advanced algorithms and techniques. Fruits are essential and widely used in ceremonial offerings (Banten), and they are pivotal in ensuring the completeness of those offerings. The main goal of fruit detection is to recognize fruits in image form, making it applicable in various applications, including inventory management, automatic classification in the agricultural industry, and medical applications for monitoring dietary patterns. Commonly used methods include artificial neural networks (deep learning), digital image processing, and feature extraction techniques to distinguish between shapes, colours, textures, and other visual features of different types of fruits. In this research, a Convolutional Neural Network (CNN), a deep learning technique that excels at accurate image recognition and classification, is employed. A total of 500 fruit images are used for five types of ceremonial fruits commonly used in religious activities, namely coconut, banana, candlenut, nutmeg, and areca nut. The residual learning architecture ResNet152 is used as the CNN backbone. According to the test findings, the highest accuracy rate achieved was 93% in correctly identifying ceremonial fruit images.
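As an illustration of how the described classifier might be assembled, here is a minimal transfer-learning sketch using a ResNet152 backbone in Keras for the five ceremonial fruit classes. The directory layout, image size, and hyperparameters are assumptions for illustration only, not the authors' exact setup.

# Illustrative sketch: ResNet152 transfer learning for five-class
# ceremonial-fruit classification. Paths and hyperparameters are assumed.
import tensorflow as tf
from tensorflow.keras import layers, models

CLASS_NAMES = ["coconut", "banana", "candlenut", "nutmeg", "areca_nut"]

base = tf.keras.applications.ResNet152(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False    # reuse ImageNet features for a small dataset

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(len(CLASS_NAMES), activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# A hypothetical folder-per-class dataset could be loaded with:
# train_ds = tf.keras.utils.image_dataset_from_directory(
#     "fruit_images/", image_size=(224, 224), batch_size=16)
# model.fit(train_ds, epochs=20)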
This research aims to develop an advanced material detection system for conveyor belts, utilizing state-of-the-art image processing and machine learning techniques to automate the identification of various materials, thereby enhancing operational efficiency and accuracy in industrial settings. Current methods in material detection, such as traditional manual sorting and basic automated systems, often lack the precision and adaptability required in dynamic industrial environments. This paper identifies a gap in the practical and reliable detection of diverse materials under varying environmental conditions. The primary research questions addressed include: How can modern image processing and machine learning techniques improve material detection accuracy? What optimizations can be applied to ensure real-time processing efficiency? The proposed solution integrates a strategic camera setup with controlled lighting, robust image preprocessing algorithms (noise reduction, normalization, and resizing), and custom-designed detection algorithms using machine learning models. This system outperforms existing solutions by offering higher detection accuracy and adaptability to diverse industrial conditions. An interface is developed to display detection results and provide intuitive controls for system adjustments, ensuring practical use by industrial professionals. Rigorous testing and validation processes are implemented to enhance detection accuracy and processing speeds, with specific performance metrics established to measure efficacy. This paper provides a transformative impact on the manufacturing and processing industries by addressing environmental variability and material diversity.
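The preprocessing stage mentioned above (noise reduction, normalization, and resizing) could look roughly like the following OpenCV sketch; the blur kernel and target resolution are assumed values rather than the paper's settings.

# Illustrative sketch of frame preprocessing for conveyor-belt images.
import cv2
import numpy as np

def preprocess_frame(frame: np.ndarray, size=(320, 320)) -> np.ndarray:
    """Denoise, resize, and normalize a BGR camera frame to [0, 1]."""
    denoised = cv2.GaussianBlur(frame, (5, 5), 0)    # suppress sensor noise
    resized = cv2.resize(denoised, size, interpolation=cv2.INTER_AREA)
    normalized = resized.astype(np.float32) / 255.0  # scale pixel values
    return normalized

# Example usage with a synthetic frame in place of a real camera capture
dummy = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
ready = preprocess_frame(dummy)
print(ready.shape, ready.min(), ready.max())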
Potholes are defects in the road surface where cracking has left a hole behind. Reporting potholes to accountable bodies at an early stage can save many lives; therefore, timely inspection and maintenance of potholes are required for smooth transportation. Traditional pothole detection methods are labor-intensive and time-consuming. This research addresses these gaps and presents an intelligent detection system that uses a smartphone camera, sensors, and gyroscope for real-time detection of potholes. The proposed model covers two essential functions: (i) automated identification of potholes, and (ii) notifying users so that they can avoid probable accidents. The Single Shot Multi-Box Detector (SSD) technique is trained on the pothole image dataset. To develop the dataset, pothole images are captured and labeled with the TensorFlow Object Detection API. The method achieved 90% accuracy in detecting potholes in the image datasets used. Study outcomes can help stakeholders obtain road profiling information and alerts about potholes for smooth road transportation.
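For readers unfamiliar with this workflow, the sketch below shows how inference with an SSD detector exported from the TensorFlow Object Detection API is commonly performed after training; the model path and score threshold here are hypothetical.

# Illustrative sketch: running inference with an exported SSD SavedModel.
import numpy as np
import tensorflow as tf

detect_fn = tf.saved_model.load("exported_pothole_ssd/saved_model")

def detect_potholes(image: np.ndarray, score_threshold: float = 0.5):
    """Return normalized bounding boxes and scores above the threshold."""
    input_tensor = tf.convert_to_tensor(image[np.newaxis, ...], dtype=tf.uint8)
    outputs = detect_fn(input_tensor)
    scores = outputs["detection_scores"][0].numpy()
    boxes = outputs["detection_boxes"][0].numpy()
    keep = scores >= score_threshold
    return boxes[keep], scores[keep]

# frame = ...  # an RGB image captured from the smartphone camera
# boxes, scores = detect_potholes(frame)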
Detecting speaking intentions in multi-user VR environments can facilitate turn-taking, thereby making group interactions in VR more effective. This study aims to establish the recognition of speaking intentions based on motion and gaze data from VR devices during interactions involving multiple participants. Through in-depth statistical data analysis, we identified head and right-hand features associated with speaking intentions and discovered different temporal dependencies for motion and gaze features. Using these features, we show that speaking intentions can be detected effectively, with the random forest (RF) classifier achieving the highest F1 score of 0.824 when motion and gaze features with different data window sizes are combined.
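A minimal sketch of the classification step, assuming windowed motion and gaze features have already been extracted, might look as follows; the feature dimensionality and the synthetic data are placeholders rather than the study's actual features.

# Illustrative sketch: random forest on windowed motion/gaze features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Each row: summary statistics of head/right-hand motion and gaze over a window
n_samples, n_features = 1000, 24
X = rng.normal(size=(n_samples, n_features))
y = rng.integers(0, 2, size=n_samples)      # 1 = intends to speak

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("F1:", f1_score(y_test, clf.predict(X_test)))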
Segmentation of the retinal vessels is extremely important in the diagnosis and management of various eye diseases, including diabetic retinopathy and glaucoma. This work presents an improved methodology using an IS-Net model trained on the high-resolution FIVES dataset, which includes 800 annotated retinal images. The proposed approach begins with pre-processing, consisting of normalization and horizontal flipping, followed by segmentation with IS-Net and histogram-based thresholding for vessel-structure binarization. The IS-Net architecture is designed with multi-scale RSU blocks to capture both fine and broad vessel details comprehensively for segmentation. The results show that IS-Net achieves a good balance between recall and specificity, with a high F1 score, and outperforms other models in specificity by reducing false positives. These findings underline the effectiveness of IS-Net for clinical applications and emphasize the value of high-resolution data for refining segmentation performance.
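As a rough illustration of the histogram-based binarization step, the sketch below applies Otsu's threshold to a synthetic vessel probability map; a real pipeline would operate on the IS-Net output for FIVES images, and the choice of Otsu's method here is an assumption.

# Illustrative sketch: histogram-based (Otsu) binarization of a probability map.
import cv2
import numpy as np

# Synthetic stand-in for a segmentation network's output, values in [0, 1]
prob_map = np.random.rand(512, 512)

# Scale to 8-bit and let Otsu's method pick the threshold from the histogram
prob_u8 = (prob_map * 255).astype(np.uint8)
thresh_value, vessel_mask = cv2.threshold(
    prob_u8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

print("Chosen threshold:", thresh_value)
print("Vessel pixel fraction:", (vessel_mask > 0).mean())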
The usage of Machine Learning in the clinical domain enables better decision-making and improves the quality of healthcare services rendered to patients. This paper describes a new MLOps framework which addresses the challenges faced when applying AI in the clinical domain. These include handling heterogeneous, incomplete and evolving medical datasets, ensuring patient privacy, and controlling the versions of the datasets and models adopted. The proposed framework supports a repository of evolving anonymized patient medical data, tracks the feature selections and models adopted using a version control system, and records the accuracy of their outcomes. It also enables audit tracking and ensures transparency and accountability, thus helping to increase medical workers' confidence in Machine Learning and encouraging the use of ML in clinical work. For evaluation, the proposed framework is set up in a clinical environment using existing patient records. Evidence, such as records of model training using specific datasets traceable to the actual patient records, is collected. This evidence fulfills the audit requirements and forms the basis for justifying research funding for future AI-related medical work. Finally, the study also proposes future work and directions to extend the framework into other domains.
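One simple way to realize the traceability such a framework requires is to hash the exact dataset and model artefacts used in each training run and append the result to an audit log. The sketch below is a minimal illustration of this idea; the file names, record fields, and log format are hypothetical and not the framework's actual schema.

# Illustrative sketch: content-addressed audit records for training runs.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Content hash used as an immutable version identifier."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def record_training_run(dataset: Path, model: Path, accuracy: float,
                        log_file: Path = Path("audit_log.jsonl")) -> dict:
    """Append one traceable training-run record to an append-only log."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "dataset_version": sha256_of(dataset),
        "model_version": sha256_of(model),
        "accuracy": accuracy,
    }
    with log_file.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

# entry = record_training_run(Path("anonymized_patients.csv"),
#                             Path("model.pkl"), accuracy=0.91)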
To address the issue of highly specialized fault text and the challenge of sparse character vectors in high-dimensional space resulting from repetitive characters and limited character types, a named entity recognition method based on Domain BERT (DBERT) is proposed. The DBERT model achieves effective dimensionality reduction and refinement of fault text features by introducing a feature compression strategy. It also undergoes domain-specific pre-training to fully learn and adapt to the unique characteristics and specializations of fault text. Subsequently, the DBERT model extracts context-related features of characters in the text and combines these features with specific character representations after a weighting operation. Named entity recognition is then performed using a combination of BiLSTM and CRF models. Finally, DBERT-BiLSTM-CRF is compared with LSTM-CRF and BiLSTM-CRF on an automobile maintenance domain dataset, demonstrating superior performance in terms of recall, precision, and F1 score.
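To make the model structure concrete, the following is a minimal sketch of a BERT encoder feeding a BiLSTM that produces per-token emission scores, on top of which a CRF layer (for example from the pytorch-crf package) would decode the tag sequence. The checkpoint name, tag-set size, and hidden dimensions are assumptions; the domain-specific pre-training and feature compression described in the paper are not reproduced here.

# Illustrative sketch: BERT + BiLSTM emission scores for sequence labeling.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

NUM_TAGS = 9  # hypothetical BIO tag set

class BertBiLSTMEmitter(nn.Module):
    def __init__(self, checkpoint="bert-base-chinese", hidden=256):
        super().__init__()
        self.bert = AutoModel.from_pretrained(checkpoint)
        self.lstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)
        self.emissions = nn.Linear(2 * hidden, NUM_TAGS)

    def forward(self, input_ids, attention_mask):
        encoded = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.lstm(encoded)
        return self.emissions(lstm_out)   # per-token scores for a CRF to decode

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = BertBiLSTMEmitter()
batch = tokenizer(["发动机无法启动"], return_tensors="pt")  # example fault phrase
scores = model(batch["input_ids"], batch["attention_mask"])
print(scores.shape)   # (batch, sequence_length, NUM_TAGS)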
With the rapid expansion of the industrial IoT (IIoT), maintaining robust cybersecurity is essential for the smooth operation of industrial processes. Industrial environments require adaptive solutions to effectively mitigate evolving cyber threats and protect sensitive operations. This research aims to improve the cybersecurity of industrial IoT environments. The research intends to design and implement an adaptive, real-time intrusion detection system with edge computing integration that improves the reliability of operations in the industrial IoT. We incorporated machine learning approaches to classify cyber threats using XGBoost and Deep Neural Networks (DNN). A comparative analysis of results obtained from two datasets shows that the XGBoost model was slightly more accurate than the DNN model, with an accuracy of 79% for dataset D1 and approximately 99.42% for dataset D2. This analysis also clearly demonstrates the usefulness of these machine learning approaches and the need to select a model depending on the requirements for detecting particular attacks. Confusion matrix analysis shows that both models have particular strengths in recognizing different types of cyber threats.
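As an illustration of the XGBoost side of the comparison, the sketch below trains a gradient-boosted classifier on synthetic flow-level features and reports accuracy and a confusion matrix; the data and hyperparameters stand in for the paper's datasets D1 and D2, which are not reproduced here.

# Illustrative sketch: XGBoost classifier for binary intrusion detection.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from xgboost import XGBClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 20))            # flow-level features
y = rng.integers(0, 2, size=5000)          # 1 = attack, 0 = benign

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

clf = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, pred))
print(confusion_matrix(y_test, pred))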
Globally, skin cancer—especially melanoma—is a major health concern, and better patient outcomes depend on early and precise diagnosis. Manual examination is a common component of traditional diagnostic techniques, but it is labor-intensive and subject to error. Promising approaches to automate skin lesion analysis have been made possible by recent developments in deep learning. In order to overcome the shortcomings of current approaches, this work investigates the creation of a hybrid deep learning model that blends VGG16 and ResNet50 architectures. The suggested approach incorporates sophisticated preprocessing methods, such as image normalization and data augmentation, to improve feature extraction and classification accuracy using datasets like ISIC 2018 and HAM10000. Important results show that the hybrid model outperforms state-of-the-art benchmarks in performance metrics, achieving 98.75% training accuracy and 97.50% validation accuracy. The model's enhanced precision (97.60%), recall (97.55%), and F1 score (97.58%) highlight how reliable it is at differentiating between benign and malignant tumors. The study also emphasizes how crucial it is to strike a balance between model complexity and computing performance in order to support practical clinical deployment. This study advances computer-aided diagnostic (CAD) systems for the identification of skin cancer by tackling issues including class imbalance and dataset unpredictability. The suggested method has a great deal of promise to improve early diagnosis, lessen the need for invasive treatments, and assist dermatologists in making clinical decisions. This paper thoroughly reviews current methodologies used in skin cancer detection, offering a timely resource for researchers developing automated and precise melanoma detection models.
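A hybrid of this kind is often built by pooling and concatenating the features of both backbones before a shared classification head. The following Keras sketch shows one such arrangement under an assumed input size and head layout; it is illustrative only and not the authors' exact architecture.

# Illustrative sketch: concatenating VGG16 and ResNet50 features for
# binary skin-lesion classification.
import tensorflow as tf
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(224, 224, 3))   # raw RGB lesion image, 0-255

vgg = tf.keras.applications.VGG16(include_top=False, weights="imagenet")
resnet = tf.keras.applications.ResNet50(include_top=False, weights="imagenet")
vgg.trainable = False
resnet.trainable = False

# Each backbone applies its own preprocessing to the shared input
vgg_feats = layers.GlobalAveragePooling2D()(
    vgg(tf.keras.applications.vgg16.preprocess_input(inputs)))
res_feats = layers.GlobalAveragePooling2D()(
    resnet(tf.keras.applications.resnet50.preprocess_input(inputs)))

merged = layers.Concatenate()([vgg_feats, res_feats])
x = layers.Dense(256, activation="relu")(merged)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)  # benign vs. malignant

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()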