Jeny Jijo1, Supreet Ronad2, Sathvik Saya2, Sampreeth Naik2 and Priyadarshini V2, 1Assistant Professor, Dept of CSE, PES University, Electronic City Campus, Bengaluru, 2Dept of CSE, PES University, Electronic City Campus, Bengaluru
An automated system that makes the working of a restaurant more efficient is described. In today's age of rapid meals and the need for social distancing due to COVID-19, ensuring hygiene and safety has become a top priority. But in businesses that involve serving people at short range, this is difficult to achieve. The solution is to make the whole system contact-less using technology. The proposed system uses a mobile app and a line-following robot to deliver food to customers in a restaurant. Ordering through the application gives users fast visual confirmation of their choices and ensures that the items in the order placed are exactly what the customer intended. This technology is paired with a robot that can bring meals to a given table. The whole system is low-cost compared to existing systems.
Automated service, Bot, Application, Flutter, Arduino, IoT, Restaurant.
Mr.P Salman Raju1, Dr.P Venkateswara Rao1 and Prof. S S Murthy2, 1Department of CSE, AKNU, Rajahmundry, India, 2Director, IPE, Hyderabad, India
Cryptographic encryption is the most successful method of securing data. Modern cryptographic techniques contribute significantly to today's accelerated cybersecurity needs. With the growth of data, it has become ever more important to secure it against external interference. Various algorithms have been deployed with the ultimate goal of securing data, of which the Advanced Encryption Standard (AES) is one. This paper provides a substantial modification of AES, aimed at cybersecurity and easy implementation, focusing on the symmetric-key cryptosystem and the AES algorithm. Performance was compared using different parameters such as data block size, encryption/decryption speed, run time, and compile time. The main purpose of this experiment was to evaluate the effectiveness of the newly modified AES against the original AES in a real-life application. A new key implementation process, the right shift key technique, and the elimination of the MixColumns step were introduced. This paper provides calculations based on the AES-128 version, but the proposed algorithm works on other versions as well. The goal was to achieve fast execution time and low memory usage while keeping the security hard to breach. The results show that our method achieved 88% throughput and better net execution time, a substantial improvement in efficiency over traditional AES.
Cryptography, Cyber Security, Symmetric Key Cryptography, Block Cipher, AES, Modified AES, Performance Evaluation, Result and Discussion.
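The abstract above names a "right shift key" technique without specifying its construction. The sketch below is a hypothetical illustration of the general idea only: deriving each AES-128 round key by rotating the previous key's bytes one position to the right and folding in a round constant. It is not the paper's actual key schedule.

```python
# Hypothetical "right shift key" round-key derivation (illustrative only).

def rotate_right(key: bytes, n: int = 1) -> bytes:
    """Rotate a byte string right by n positions."""
    n %= len(key)
    return key[-n:] + key[:-n]

def derive_round_keys(master_key: bytes, rounds: int = 10) -> list:
    """Derive round keys by repeated right rotation plus a simple round constant."""
    assert len(master_key) == 16, "AES-128 uses a 16-byte key"
    keys = [master_key]
    for r in range(1, rounds + 1):
        prev = rotate_right(keys[-1])
        # XOR the round number into the first byte as a toy round constant
        keys.append(bytes([prev[0] ^ r]) + prev[1:])
    return keys

round_keys = derive_round_keys(bytes(range(16)))
```

In real AES the key schedule also applies the S-box and fixed round constants; the point here is only that a rotation-based schedule is cheap to compute, which matches the paper's stated goal of fast execution and low memory usage.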
Darshan B1, Moniak Rao2, S Saraf3 and Dr. A S Poornima4, 1&4Department of Computer Science Engineering, Siddaganga Institute of Technology, Tumkur, India, 2JSS Science and Technology University, Mysore, 3Commscope Inc
Human detection plays an important role in various real-life applications. Most conventional techniques rely on hand-made features that are problem-dependent and optimal only for particular tasks. Moreover, they are fairly vulnerable to dynamic events such as illumination changes, camera jitter, and variations in object size. In contrast, the proposed feature-learning approaches are less expensive and less complicated thanks to their abstraction and discriminative capabilities. In the field of indoor monitoring, researchers have shown interest in deep learning for classifying daily human activities, detecting falls, and tracking gait abnormalities. Driving these interests are emerging applications related to smart and secure homes, assisted living, and medical diagnosis. The success of deep learning in providing an accurate real-time account of observed human motion articulations essentially depends on the neural network architecture, the input data representation, and proper training. Experimental results demonstrate that the proposed methods are successful for the human detection task. A pretrained TensorFlow model produces an average accuracy of 95%.
Motion Detection, Machine Learning, TensorFlow, Deep Learning, Multilayer Perceptron.
Dharshika S, Sahreen Sajad and Sushmitha N, Department of Information Science and Engineering, R.V. College of Engineering Bengaluru, Karnataka, India
Epilepsy has severe impacts on patients, including disrupted social relationships and reduced mobility. Prediction of the disease can help patients prevent the onset of seizures with appropriate medication. Since traditional methods of studying EEG are prone to misdiagnosis, machine learning can provide a more accurate diagnosis. In this paper, we survey models and methodologies for building a high-precision model to predict epilepsy in patients.
Seizure, EEG (Electroencephalogram), Machine Learning, KNN (K-Nearest Neighbour), Logistic Regression, Decision Tree.
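One of the classifiers the survey covers, KNN, fits in a few lines. In the sketch below the feature vectors and the "interictal"/"preictal" labels are toy stand-ins for EEG-derived features (e.g., band power), not real patient data.

```python
# Minimal k-nearest-neighbour classifier of the kind surveyed for seizure prediction.
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label); returns the majority label of the k nearest."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy EEG-feature points: two "interictal" (between seizures) and two "preictal".
train = [((0.1, 0.2), "interictal"), ((0.2, 0.1), "interictal"),
         ((0.9, 0.8), "preictal"), ((0.8, 0.9), "preictal")]
result = knn_predict(train, (0.85, 0.85))  # -> "preictal"
```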
Radhwan Adnan Dakhil and Ali Retha Hasoon Khayeat, Department of Computer Science, University of Kerbala, Karbala, Iraq
Repair and maintenance of underwater structures as well as marine science rely heavily on the results of underwater object detection, which is a crucial part of the image processing workflow. Although many computer vision-based approaches have been presented, no one has yet developed a system that reliably and accurately detects and categorizes objects and animals found in the deep sea. This is largely due to obstacles that scatter and absorb light in an underwater setting. With the introduction of deep learning, scientists have been able to address a wide range of issues, including safeguarding the marine ecosystem, saving lives in emergencies, preventing underwater disasters, and detecting, tracking, and identifying underwater targets. However, the benefits and drawbacks of these deep learning systems remain unclear. Therefore, the purpose of this article is to provide an overview of the datasets that have been utilized in underwater object detection and to discuss the advantages and disadvantages of the algorithms employed for this purpose.
Underwater Object Detection, Deep Learning, Convolutional Neural Network (CNN), Underwater Imaging.
Xuanxi Kuang1 and Yu Sun2, 1University High school, 4771 Campus Drive, Irvine, CA 92612, 2California State Polytechnic University, Pomona, CA, 91768, Irvine, CA 92620
Communities specific to Autism Spectrum Disorder face difficulties both socially and communicatively. Autism Spectrum Disorder affects their expression and response to society, and they have a hard time learning and following complex directions. This paper proposes software to promote one's collaborative and drawing skills through interaction with an AI system. At the same time, it also tries to raise awareness of this special group in our society. As an open platform, each individual will have opportunities to cooperate with other users, and they will have a chance to learn drawing step by step from drawings contributed by more than 15 million players around the world. They can decorate objects with a color adjective to enhance their sense of beauty. To test the usability of the software, we ran two experiments measuring the accuracy of the graph and color combination. The results show that the software achieves high accuracy on color input and obtains a correct graph from the input.
Interactive, Artificial Intelligence, Self-learning Process.
Pinar Yildirim, Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Istanbul Okan University, Istanbul, Turkey
In this paper, a research study to extract knowledge from online patient reviews for rheumatoid arthritis is introduced. Rheumatoid arthritis is a long-term and disabling autoimmune disease that affects a large number of people in the world today. Considering the importance of medication for rheumatoid arthritis, we aimed to investigate patient reviews in the WebMD database and extract useful information about this disease. Our results revealed that etanercept treatment has the highest number of reviews. Data analysis was applied to discover knowledge about this drug. A deep learning approach was used to predict the effectiveness of etanercept, and the classification results were compared with those of traditional classifiers. According to the comparison, the deep neural network has better accuracy metrics than the others. The results therefore highlight that deep learning can be encouraging for medical data analyses. We hope that our study can contribute to intelligent data analysis in the medical domain.
Classification, Deep Learning, Etanercept, Online Drug Reviews.
Saranyanath K P1, Wei Shi2 and Jean-Pierre Corriveau1, 1School of Computer Science, Carleton University, Ottawa, Canada, 2School of Information Technology, Carleton University, Ottawa, Canada
Cyberbullying is a form of bullying that occurs across social media platforms using electronic messages. In this paper, we propose three different approaches and five models to identify cyberbullying on a generated social media dataset derived from multiple online platforms. Our initial approach consists in enhancing a Support Vector Machine (SVM). Our second approach is based on DistilBERT, a lighter and faster Transformer model than BERT. Stacking the first three models, we obtain two more ensemble models. Contrasting the ensemble models with the other three, we observe that the ensemble models outperform the base models on all evaluation metrics except precision. While the highest accuracy, 89.6%, was obtained using an ensemble model, we achieved the lowest accuracy, 85.53%, with the SVM model. The DistilBERT model exhibited the highest precision, at 91.17%. The model developed using different granularities of features outperformed the simple TF-IDF one.
Machine Learning, Natural Language Processing, Support Vector Machine, DistilBERT, Cyberbullying.
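The simple TF-IDF baseline mentioned above can be sketched in a few lines. The mini-corpus below is invented for illustration; real pipelines would also apply tokenization and normalization beyond lowercasing.

```python
# Minimal TF-IDF weighting: term frequency scaled by inverse document frequency.
import math
from collections import Counter

def tfidf(corpus):
    """Return, for each document, a dict of term -> tf-idf weight."""
    N = len(corpus)
    tokenized = [doc.lower().split() for doc in corpus]
    df = Counter()                      # document frequency of each term
    for toks in tokenized:
        df.update(set(toks))
    out = []
    for toks in tokenized:
        tf = Counter(toks)
        out.append({t: (tf[t] / len(toks)) * math.log(N / df[t]) for t in tf})
    return out

docs = ["you are mean", "you are kind", "stop being mean"]
weights = tfidf(docs)
```

Terms appearing in every document get weight zero (log of 1), which is exactly why TF-IDF down-weights uninformative words.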
Tina Yazdizadeh and Wei Shi, School of Information Technology, Carleton University, Ottawa, Ontario, Canada
Communication using modern internet technologies has revolutionized the ways humans exchange information. Despite all the advantages made available by information and communication technology, its applicability is still limited by problems caused by personal attacks or pseudo-attacks. These toxic contents may be in the form of texts (e.g., online chats, emails), speech, or even images or movie clips on social media platforms. Because cyberbullying of an individual via such toxic digital content may have severe consequences, it is essential to design and implement techniques to automatically detect cyberbullying in social media content using machine learning approaches. During a cyberbullying detection process, word embedding techniques are used to represent words for text analysis, typically in the form of a real-valued vector that encodes the meaning of words such that words closer in the vector space are expected to be similar in meaning. The extracted embeddings are then used to decide whether a digital input contains cyberbullying content. Supplying strong word representations to classification methods is therefore an important issue. In this paper, we evaluate the ELMo word embedding against three other word embeddings, namely TF-IDF, Word2Vec, and BERT, using three basic machine learning models and four deep learning models. The results show that the ELMo word embeddings perform best when combined with neural network-based machine learning models.
Cyberbullying, Natural Language Processing, Word Embeddings, ELMo, Machine Learning
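The claim that "words closer in the vector space are similar in meaning" reduces to a cosine-similarity comparison. The 3-dimensional vectors below are made up for illustration; they are not real ELMo or Word2Vec outputs, which have hundreds of dimensions.

```python
# Cosine similarity between word vectors: the geometric notion behind word embeddings.
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Toy vectors: "bully" and "harass" point in similar directions; "picnic" does not.
vecs = {"bully": (0.9, 0.1, 0.2),
        "harass": (0.85, 0.15, 0.25),
        "picnic": (0.1, 0.9, 0.1)}

similar = cosine(vecs["bully"], vecs["harass"])    # close to 1.0
unrelated = cosine(vecs["bully"], vecs["picnic"])  # much smaller
```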
Bakkialakshmi, Hindustan Institute of Technology and Science, India
The goal is the investigation of human subjects, where the value and authenticity of human emotions are paramount. Affect theory suggests that one must be aware of one's sentiments and emotions to forecast one's behaviour. The proposed system considers the AMIGOS dataset, in which 20 participants are shown 40 test videos; the dataset collects the participants' ECG, EEG, and GSR values during exposure to each video. The proposed line of inquiry focuses on developing a reliable model that maps neurophysiological data to actual feelings, since any change in emotional affect directly elicits a response in the body's physiological systems. The described system uses the AMIGOS dataset to develop a prediction method based on the Gaussian expectation-maximization (EM) technique. The suggested method is evaluated against a state-of-the-art technique in terms of statistical parameters such as population mean and standard deviation. An in-depth analysis of emotions requires comparative analyses of several people, each with their own set of covariate points. Anger, hatred, disgust, happiness, and sadness are among the emotional states that can be identified with the provided technique. The proposed system determines an individual's emotional state after a minimum of six iterations of the Gaussian EM statistical model, with iterations continuing until the error approaches zero. Each iteration improves the predictions while increasing the amount of value extracted.
Affective Computing, Emotion Detection, Machine Learning, AMIGOS, Emotional Psychology.
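A minimal one-dimensional, two-component version of the Gaussian expectation-maximization technique the abstract describes might look like the sketch below. Real use would run on multichannel ECG/EEG/GSR features, and the fixed iteration count here is an illustrative simplification of the paper's error-driven stopping rule.

```python
# Two-component 1-D Gaussian mixture fitted by expectation-maximization.
import math

def em_gmm(data, iters=20):
    mu = [min(data), max(data)]     # crude initialisation at the data extremes
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k]) *
                 math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in (0, 1)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate weights, means, and variances
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, data)) / nk, 1e-6)
    return mu, var, pi

# Two well-separated toy clusters around 0 and 5.
data = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
mu, var, pi = em_gmm(data)
```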
Yifei Tong1 and Yu Sun2, 1Trinity Grammar School, 119 Prospect Rd, Summer Hill NSW2130, 2California State Polytechnic University, Pomona, CA, 91768, Irvine, CA92620
How can the efficiency of volunteers be improved in performing bushcare within the limited amount of time that can be spent caring for each location every month? Bushcare is a volunteer activity with a steep difficulty curve for volunteers just starting out, as the crucial skill of distinguishing native plants from harmful invasive species only comes with experience and memorization. The inability to distinguish targeted plants greatly reduces the efficiency of volunteers as they work through the limited time they have at each location each month, while also discouraging newly joined volunteers from continuing the activity. To assist newly joined volunteers, most of whom would likely come from a younger demographic, we provide a digital app that helps the user distinguish the species of a plant, making it easier for them to start familiarizing themselves with both the native and invasive species in their area. The user simply takes a picture of the plant they wish to identify, and the software uses its image recognition algorithm, trained with a database of different species of plants, to identify the type of plant and whether it needs to be removed. At the same time, more experienced volunteers can continue to use the app, identifying errors in its identifications to make it more reliable.
Flutter, Machine learning, Firebase, Image recognition.
Tony Zheng1 and Yu Sun2, 1Troy High School, 2200 Dorothy Ln, Fullerton, CA 92831, 2California State Polytechnic University, Pomona, CA, 91768, Irvine, CA 92620
Many swimmers constantly incorporate new and different training regimes to improve quickly. However, it is difficult for a swimmer to see their progress instantly. This paper develops a tool for swimmers to predict their future results. We applied machine learning and conducted a qualitative evaluation of the approach. The results show that it is possible to determine future performance with decent accuracy. The application considers the swimmer's performance history, age, weight, and height to predict the most accurate results.
Machine Learning, Mobile App, Database.
Calvin Huang1 and Yu Sun2, 1University High School, 4771 Campus Dr, Irvine, CA 92612, 2California State Polytechnic University, Pomona, CA, 91768, Irvine
To use the full power of artificial intelligence, many people are required to navigate a complex process that involves reading and understanding code. This process can be especially intimidating to domain experts who wish to use AI to develop a project but have no prior experience with programming. This paper develops an application that allows any domain expert (or layperson) to gather data, assign labels, and train models automatically without writing code. Our application, through a server, allows the user to send HTTP API requests to train models, upload images to the database, add models/labels, and access models/labels.
Tensorflow Lite, Flask, Flutter, Google Colab.
Tianyu Li1, Yu Sun2, 1St. George’s School, 4175 W 29th Ave, Vancouver, BC V6S 1V1, Canada, 2California State Polytechnic University, Pomona, CA, 91768, Irvine, CA 92620
Due to technological advancements, humans are able to produce more food than ever before. In fact, the food production level is so high that the entire population could be supported if food resources were distributed correctly. Yet it is common to see items left expiring on supermarket shelves, wasting food that could otherwise be useful, and the adverse impacts of food disposal on the climate are in no one's interest. This paper proposes an application to identify the stock status of supermarket items, specifically food items, so that supermarket managers can react to selling status and prevent oversupply. The key tool implemented in the application is computer vision, specifically YOLOv5, which uses convolutional neural networks. The model automatically recognizes and counts the items in a photo. We applied our computer vision model to numerous supermarket shelf photos and evaluated the model's precision and speed. The results show that the application is a useful tool for logging supermarket stock information, since the computer vision model, despite lacking slightly in object detection precision, returns a reliable count for well-taken photos. As a platform where such information is shared, the application is therefore a viable tool for store managers to import appropriate amounts of food and for the public to be informed and make smart buying choices.
Flutter, Roboflow, Computer Vision, Inventory Management.
Mikołaj Płachta and Artur Janicki, Warsaw University of Technology, Warsaw, Poland
This paper addresses the problem of detecting image steganography in JPEG files. We analyze the detection of the most popular steganographic algorithms, J-UNIWARD, UERD, and nsF5, using DCTR, GFR, and PHARM features. Our goal was to find a single neural network model that best detects the different algorithms at different data-hiding densities. We propose a three-layer neural network with a Dense + BatchNormalization architecture trained with the Adam optimizer. The research was conducted on the publicly available BOSS dataset. The best configuration achieved an average detection accuracy of 72%.
Steganography, Deep Learning, Malware Detection, BOSS Database, Image Processing.
Sarah Fan1 and Yu Sun2, 1Sage Hill School, 20402 Newport Coast Dr, Newport Beach, CA 92657, 2California State Polytechnic University, Pomona, CA, 91768, Irvine, CA 92620
Parkinson's disease (PD) is a progressive neurodegenerative disorder that causes uncontrollable movements and difficulty with balance and coordination. Early detection of Parkinson's disease is highly important so that patients can receive proper treatment. This paper aims to aid the early detection of Parkinson's disease by using a convolutional neural network (CNN) for PD detection from drawing movements. The CNN consists of 2 convolutional layers, 2 max-pooling layers, 2 dropout layers, 2 dense layers, and a flatten layer. Additionally, our approach explores multiple types of drawings, specifically spiral, meander, and wave datasets hand-drawn by patients and healthy controls, to find the most effective one in the discrimination process. The models can be continuously trained, and test data can be input to differentiate between healthy controls and PD patients. By analyzing training and validation accuracy and loss, we found the most appropriate model and dataset combination to be the spiral drawing, with an accuracy of 80%. With a proper model and a larger dataset for increased accuracy, this approach has the potential to be implemented in a clinical setting.
Machine Learning, Deep Learning, Parkinson's Disease.
Masoumeh Mohammadi and Shadi Tavakoli, Department of Data Science & Machine Learning, Telewebion, Tehran, Iran
Applications require the ability to perceive others' opinions as one of the most important forms of knowledge. Finding the positive or negative feelings in sentences is called sentiment analysis (SA). Businesses use it to understand customer sentiment in comments on websites or social media. An optimized loss function and novel data augmentation methods are proposed in this study, based on Bidirectional Encoder Representations from Transformers (BERT). First, a dataset crawled from Persian movie comments on various sites was prepared. Then, balancing and augmentation techniques were applied to the dataset. Next, several deep models and the proposed BERT were applied to the dataset. We focus on customizing the loss function, which achieves an overall accuracy of 94.06% for multi-label (positive, negative, neutral) sentences. Comparative experiments conducted on the dataset reveal that the performance of the proposed model is significantly superior to that of other models.
Bidirectional Encoder Representations from Transformers (BERT), Bidirectional Long Short-Term Memory (Bi-LSTM), Comment Classification, Convolutional Neural Network (CNN), Deep Learning, Opinion Mining (OM), Natural Language Processing (NLP), Persian Language Sentiment Classification, Persian Sentiment Analysis, Text Mining.
Byeong-Cheol Jo1,*, Tak-Sung Heo1,*, Yeongjoon Park1, Yongmin Yoo1, Won Ik Cho2 and Kyungsun Kim1, 1AI R&D Group, NHN, Seoul, Republic of Korea, 2Department of Electrical and Computer Engineering and INMC, Seoul National University, Seoul, Republic of Korea
Text classification has exhibited excellent performance since the advent of pre-trained language models based on the Transformer architecture. However, in pre-trained language models, underfitting often occurs because the model is very large compared to the amount of available training data. In light of this, we introduce three data augmentation schemes that help reduce the underfitting problem of large-scale language models. First, we use a generation model for data augmentation, which we call Data Augmentation with Generation (DAG). Next, we augment data using text modification techniques such as corruption and word order change (Data Augmentation with Modification, DAM). Finally, we propose Data Augmentation with Generation And Modification (DAGAM), which combines the DAG and DAM techniques. We perform data augmentation for six benchmark datasets of the text classification task and verify the usefulness of DAG, DAM, and DAGAM through BERT-based fine-tuning and evaluation.
Data Augmentation, Text Generation, Text Modification, Summarization, Character Order Change.
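The modification-based augmentation (DAM) named above, word order change and character corruption, can be sketched as below. The exact corruption rules and rates used in the paper may differ; these two helpers only show the general shape of such transforms.

```python
# Sketch of DAM-style text augmentation: word-order swap and character corruption.
import random

def swap_word_order(sentence, rng):
    """Swap two randomly chosen word positions (word order change)."""
    words = sentence.split()
    if len(words) > 1:
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

def corrupt_characters(sentence, rng, rate=0.1):
    """Replace each alphabetic character with a random letter at the given rate."""
    chars = list(sentence)
    for i, c in enumerate(chars):
        if c.isalpha() and rng.random() < rate:
            chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)

rng = random.Random(0)  # seeded for reproducibility
aug = swap_word_order("the movie was great", rng)
```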
Erion Çano and Benjamin Roth, Digital Philology, Research Group Data Mining and Machine Learning, University of Vienna
Collections of research article data harvested from the web have become common recently since they are important resources for experimenting on tasks such as named entity recognition, text summarization, or keyword generation. In fact, certain types of experiments require collections that are both large and topically structured, with records assigned to separate research disciplines. Unfortunately, the current collections of publicly available research articles are either small or heterogeneous and unstructured. In this work, we perform topic segmentation of a paper data collection that we crawled and produce a multitopic dataset of roughly seven million paper data records. We construct a taxonomy of topics extracted from the data records and then annotate each document with its corresponding topic from that taxonomy. As a result, it is possible to use this newly proposed dataset in two modalities: as a heterogeneous collection of documents from various disciplines or as a set of homogeneous collections, each from a single research topic.
Research Articles, Topic Segmentation, Multitopic Dataset, Keyword Generation, Research Resources.
Rita Hijazi 1, 2, Bernard Espinasse1 and Núria Gala2, 1Laboratoire Informatique et Systèmes, Aix-Marseille University, Marseille, France, 2Laboratoire Parole et Langage, Aix-Marseille University, Aix-en-Provence, France
Automatic Text Simplification (ATS) is the process of reducing the linguistic complexity of a text to improve its understandability and readability, while still maintaining its original information, content, and meaning. Several text transformation operations can be performed, such as splitting a sentence into several shorter sentences, substitution of complex elements, and reorganization. It has been shown that implementing these operations purely at the syntactic level causes several problems that could be solved by using semantic representations. In this paper, we present GRASS (GRAph-based Semantic representation for syntactic Simplification), a rule-based automatic syntactic simplification system that uses semantic representations. The system allows the syntactic simplification of complex constructions, such as subordination clauses, appositive clauses, coordination clauses, and passive forms. It is based on transformations of a graph-based meaning representation of the text expressed in DMRS (Dependency Minimal Recursion Semantics) notation using rewriting rules. The experimental results obtained on a reference corpus, according to specific metrics, outperform the results obtained by other state-of-the-art systems on the same reference corpus.
Syntactic Text Simplification, Graph-Based Meaning Representation, DMRS, Graph-Rewriting.
Boago Okgetheng, Gabofetswe Malema, Ariq Ahmer, Boemo Lenyibi, Ontiretse Ishmael, Department of Computer Science, University of Botswana, Gaborone, Botswana
Automatic spelling correction for a language is critical, since the modern world is almost entirely dependent on digital devices that employ electronic keyboards. Correct spelling contributes to the accessibility and readability of textual documents. Automatic spelling correction is essential for many NLP applications such as web search engines, text summarization, and sentiment analysis. A few efforts on automatic spelling correction in Bantu languages have been completed; however, they remain insufficient. We propose a spell checker for typed words based on a Modified Minimum Edit Distance Algorithm (MEDA) and a Syllable Error Detection Algorithm (SEDA). In this study, we adjusted the minimum edit distance algorithm by including a frequency score for letters and ordered operations. The SEDA identifies the component of the word and the position of the letter containing an error. The Setswana language was used for testing, and other languages related to Setswana can use this spell checker as well. Setswana is a Bantu language spoken mostly in Botswana, South Africa, and Namibia, and its automatic spelling correction is still in its early stages. Setswana is Botswana's national language and is widely used in schools and government offices. Accuracy was measured on 2,500 Setswana words. The SEDA discovered incorrect Setswana words with 99% accuracy. When evaluating MEDA, the edit distance algorithm was used as the baseline, yielding 52% accuracy. In comparison, the edit distance algorithm with ordered operations achieved 64% accuracy, and MEDA achieved 92% accuracy. The model failed on closely related terms.
Bantu Spell Checker, Edit Distance algorithm, morphologically rich, Syllable Error Detection Algorithm.
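A frequency-weighted edit distance in the spirit of MEDA might look like the sketch below: substituting between letters that occur often in the corpus costs less than substituting a rare letter. The letter-frequency weights and the cost formula are assumptions for illustration, not the paper's specification.

```python
# Levenshtein distance with a frequency-aware substitution cost.
def weighted_edit_distance(a, b, freq):
    """Edit distance where substitution between frequent letters is cheaper."""
    def sub_cost(x, y):
        if x == y:
            return 0.0
        # Letters common in the corpus are more easily confused, so cost less.
        return 1.0 - 0.5 * min(freq.get(x, 0.0), freq.get(y, 0.0))

    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = float(i)
    for j in range(n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1.0,                        # deletion
                          d[i][j - 1] + 1.0,                        # insertion
                          d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]))
    return d[m][n]

# Hypothetical letter frequencies; real values would come from a Setswana corpus.
freq = {"a": 0.9, "e": 0.8, "q": 0.01}
```

With these weights, "dumele" is a cheaper correction of "dumela" than "dumelq", because a/e are frequent while q is rare.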
Michael DeLeo and Erhan Guven, Whiting School of Engineering, Johns Hopkins University, Baltimore, USA
Representing a board game and its positions by text-based notation enables NLP applications. Language models can help gain insight into a variety of interesting problems such as unsupervised learning of the rules of a game, detecting player behavior patterns, player attribution, and ultimately learning the game to beat the state of the art. In this study, we first applied BERT models to the simple game of Nim to analyze their performance in the presence of noise in a few-shot learning setup. We analyzed model performance via three virtual players, namely Nim Guru, a Random player, and a Q-learner. In the second part, we applied the game-learning language model to chess, with a large set of grandmaster games annotated with exhaustive encyclopaedia openings. Finally, we show that the model practically learns the rules of chess and can survive games against Stockfish at a category-A rating level.
Natural Language Processing, Chess, BERT, Sequence Learning.
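A perfect-play Nim player of the kind the "Nim Guru" virtual player implies can be realized with the classical nim-sum rule: always move so that the XOR of the heap sizes becomes zero. Whether the paper's Guru uses exactly this implementation is an assumption; the rule itself is the standard optimal strategy for Nim.

```python
# Optimal Nim play via the nim-sum (XOR of heap sizes).
from functools import reduce
from operator import xor

def guru_move(heaps):
    """Return (heap_index, new_size) for an optimal move, or None from a losing position."""
    nim_sum = reduce(xor, heaps)
    if nim_sum == 0:
        return None  # every move loses against perfect play
    for i, h in enumerate(heaps):
        target = h ^ nim_sum
        if target < h:           # reduce this heap to make the nim-sum zero
            return i, target
    return None

move = guru_move([3, 4, 5])  # -> (0, 1): heaps become [1, 4, 5], whose XOR is 0
```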
Ibrahim Hussein Musa1, Guilin Qi1* and Kang Xu2, 1School of Computer Science and Engineering, Southeast University, Nanjing 211189, Jiangsu, China, 2School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu, China
Recently, topic modeling has been widely used to determine the abstract topics in text corpora. In an unsupervised way, probabilistic topic models have enjoyed much success in extracting and analyzing topics from document collections. Prevailing topic models are mainly founded on the idea that each document is depicted as a probability distribution over topics, and each topic is a probability distribution over words. However, this assumption is not optimal; a common deficiency of existing topic models is that they do not work well for extracting cross-lingual topics, simply because words in different languages generally do not co-occur with each other. This paper proposes an integrated novel framework for extracting topics from document collections by incorporating a concept layer between the topic and word layers, then transferring the knowledge to a cross-lingual model in order to improve topic classification for the target language, with the ultimate objective of simplifying the extraction of shared topics in text data among different languages. Specifically, we propose a novel multilingual document concept topic model (MDCTM) and derive its inference algorithm based on Gibbs sampling. The empirical evaluation was carried out on two state-of-the-art datasets, using the Porter stemmer for English documents and word-form restoration for Chinese documents, with jieba for word segmentation; the results show that the MDCTM model can effectively extract concept topic models from multilingual text data. Moreover, a noticeable advantage of our proposed model is that it can be combined with state-of-the-art approaches to achieve better performance. We hope that this will eventually enable machines to better understand human concepts, helping reduce ambiguities in multi-language scenarios. Furthermore, we also hope that our system sets up a new baseline for future concept-level methods applied to a much wider class of corpora.
Topic models, concept topic models, bilingual topic models, multilingual.
Abhigyan Ghosh and Radhika Mamidi, Language Technologies Research Center, IIIT Hyderabad, India
The hearing-challenged communities all over the world face difficulties communicating with others. Machine translation has been one of the prominent technologies to facilitate communication with the deaf and hard-of-hearing community worldwide. We have explored and formulated the fundamental rules of Indian Sign Language (ISL) and implemented them as a translation mechanism from English text to Indian Sign Language glosses. According to the formulated rules and sub-rules, the source text structure is identified and transferred to the target ISL gloss. This target language is such that it can be easily converted to videos using the Indian Sign Language dictionary. This research work also describes the intermediate phases of the transfer process and innovations in the process, such as Multi-Word Expression detection and synonym substitution, to handle the limited vocabulary size of Indian Sign Language while producing semantically accurate translations.
Indian Sign Language (ISL), Machine Translation, Sign Language (SL), Low Resource Languages, people with hearing loss, human-computer interaction, text to sign, communication, speech to sign
Arisandy, Department of English, Kent State University
In order to is a conjunction that can commonly be replaced by the infinitive marker to. However, in Indonesia, it is still taught in middle schools to convey its social function, text structure and linguistic features. Since English language teachers question the importance of teaching in order to, this study focused on finding the frequency of in order to in the contexts covered by COCA. It was found that in order to has been used 91,548 times across all modalities in the last three decades, and 24,010 times for academic purposes. Academically, in order to is used most in geography or social science, accounting for 4,122 occurrences (20%) across all fields. However, the highest trend of use was in business, with a score of 310.97. Lastly, it was found that the word avoid was the most frequent collocate of in order to, followed by the word protect.
in order to, COCA, Frequency, Collocation.
Gabriel Melo, Kayke Bonafé and Guilherme Wachs-Lopes, Department of Computer Science, University Center of FEI, São Paulo, Brazil
Depression is a topic that has gained prominence in recent years. According to the WHO, depression affects more than 294 million people around the world. Prior work indicates that early diagnosis is an important field of research since, in more severe cases, depression can lead to suicide. Therefore, this work proposes, implements and evaluates a computational model based on natural language processing to classify depressive tendencies of Twitter users through their posts over time. As a result, an F-Measure of 83.58% was obtained using not only textual content but also sentiment analysis of the documents. With this data, it is possible to perform a comparison to check whether the detection of depression is more related to the constant variation of emotions or to the message conveyed by the text.
Depression, Natural Language Processing, Machine Learning.
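The F-Measure cited in the abstract above is the harmonic mean of precision and recall. A minimal sketch of the metric follows; the precision and recall values used in the example are hypothetical, not figures from the paper:

```python
def f_measure(precision, recall, beta=1.0):
    # F-beta score: weighted harmonic mean of precision and recall.
    # beta = 1 gives the balanced F1, i.e. the usual F-Measure.
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical classifier with 80% precision and 87.5% recall
score = f_measure(0.80, 0.875)  # ~0.8358, i.e. 83.58%
```

A balanced F-Measure rewards classifiers that keep both precision and recall high, which matters when depressive posts form a minority class.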
Aayush Jadhav and Shubh Gupta, Department of Information Technology, Thadomal Shahani Engineering College, Mumbai, India
Crude oil is arguably the most important resource on the planet right now. Fluctuations in its price affect every commodity due to the direct effect on transportation. Getting ahead of the uncertainty around crude oil prices can prove to be a game changer for businesses. It is extremely challenging to predict the price of crude oil because of its high volatility and its dependence on several external factors. People have been trying to build models using different machine learning algorithms and appropriate datasets to make the best price predictions. In this paper, we present a comparative analysis between three such algorithms, namely SVM, ANN and GARCH-GED. The vital information from the WTI crude oil market dataset is used in all these models, and they are evaluated on the basis of the RMSE value obtained.
Crude oil, Artificial Neural Network (ANN), Generalized Autoregressive Conditional Heteroskedasticity (GARCH), Support Vector Machine (SVM), Root Mean Square Error (RMSE).
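RMSE, the evaluation metric named above, is the square root of the mean squared prediction error. A minimal sketch follows; the price values are made up for illustration and are not taken from the WTI dataset:

```python
import math

def rmse(actual, predicted):
    # Root Mean Square Error: square root of the mean squared residual.
    assert len(actual) == len(predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

# Hypothetical daily prices (USD/barrel) vs. model predictions
actual = [70.1, 71.3, 69.8, 72.0]
predicted = [69.9, 71.0, 70.5, 71.2]
error = rmse(actual, predicted)  # ~0.56 USD/barrel
```

Because the residuals are squared before averaging, RMSE penalizes large misses more heavily than mean absolute error, which is why it is a common choice for volatile series.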
MD. Motahar Mahtab, Monirul Haque and Mehedi Hasan, Department of Computer Science & Engineering, BRAC University, Dhaka, Bangladesh
Intentionally luring readers to click on certain content by exploiting their curiosity is what defines a title as clickbait. Although several studies have focused on detecting clickbait titles in English articles, low-resource languages like Bangla have not been given adequate attention. To tackle clickbait titles in Bangla, we created an annotated dataset of ~15K news articles and ~65K unannotated news articles which can be used to create supervised or semi-supervised classification models. For the supervised approach, we ran a detailed comparison among traditional linguistic feature-based models, deep neural networks and state-of-the-art Transformer models. We also investigated whether training the Transformer models in a semi-supervised learning approach with the unannotated dataset can help improve the performance of the system. Extensive experiments on these two approaches show that the semi-supervised approach does not provide a significant performance gain over its supervised counterparts. We expect that this dataset and the detailed analysis and comparison of these clickbait detection systems will provide a fundamental basis for future research into detecting clickbait titles in Bengali articles.
Bangla Clickbait News, Bangla News, Clickbait News, Text Classification, NLP, Low Resource Language.
Jose Armando Hernandez Gonzalez, Centre Borelli, Paris Saclay University, Gif-sur-Yvette, France
The present work proposes a legal assistance tool based on NLP and a predictive ML model adapted to the particularities of ANDJE (Agencia Nacional de Defensa Juridica del Estado Colombiano). It is divided into four parts associated with specific requirements: an exploratory data analysis (EDA) of the available data; NLP pre-processing of large volumes of information, both structured in databases and unstructured in scanned PDFs; development of predictive models; and a Transformer-based assistance tool for the State defence attorney analyst. Considerations for future work are also presented.
NLP, BERT, Transformers, Machine Learning.
Sarah Alnefaie1,2, Eric Atwell2 and Mohammad Ammar Alsalka2, 1King Abdulaziz University, Jeddah, Saudi Arabia, 2University of Leeds, Leeds, UK
A question-and-answer dataset is essential in many fields, and there is a lack of Quran question-and-answer corpora. Therefore, this paper creates a valuable dataset for the research community. We first reviewed all the tools as black boxes, not as computational linguistics algorithms, compared them, and explored their features and drawbacks. Then we identified the freely available tools: Explore AI Question Generation demo, Cathoven Question Generator, Questgen Question Generator, and Lumos Learning Question Generator. Finally, we created a corpus of Quran questions and answers using these web service tools. Our experiment showed that tool performance varies by many criteria, such as criteria that measure a tool’s performance in general and standards that measure the quality of the generated text. The Cathoven Question Generator outperformed the rest of the tools in general. These tools produced 40,585 questions and answers about the English Quran.
Quran question and answer dataset, Automatic question generation tools, question and answer corpus.
Mariam Titilope Gobir, Department of English and Linguistics, Faculty of Humanities, Kwara State University, Malete, Nigeria
Since the 20th century, with the launch of natural language processing software such as the Praat software in 1995, speech analysis has transcended subjective interpretations to scientific evaluations. This study aimed to analyse hypoarticulation and hyperarticulation in selected Nigerian newscasters’ renditions using the Praat software. Ten (10) words were selected from a corpus of spoken texts collected through the survey method, namely the interview of selected Nigerian newscasters. From the data analysis, it was found that hyperarticulations are marked by phonemic substitution, /r/ intrusion and vowel length elongation, while hypoarticulations are indicated by lack of pitch variation, phonemic reduction and syllabic reduction. These findings are reflections of the attitudes of the Nigerian newscasters, as second-language speakers of English, towards the attainment of proficiency in the standard variety. Beyond attitude, the variations indicative of hypoarticulation and hyperarticulation in the newscasters’ renditions are not devoid of cognitive interference.
Hypoarticulation, Hyperarticulation, Speech Variation, Attitude, Cognition.
Prashant Kapil and Asif Ekbal, Department of Computer Science and Engineering, IIT Patna, India
The increase in usage of the internet has also led to an increase in antisocial activities, hate speech being one of them. The growth of hate speech over the past few years has been one of the biggest problems, and automated techniques need to be developed to detect it. This paper uses eight publicly available Hindi datasets and explores different deep neural network techniques to detect aggression, hate, abuse, etc. We experimented with multilingual bidirectional encoder representations from transformers (M-BERT) and multilingual representations for Indian languages (MuRIL) in four settings: (i) a single-task learning (STL) framework; (ii) transferring the encoder knowledge to a recurrent neural network (RNN); (iii) multi-task learning (MTL), where the eight Hindi datasets were jointly trained; and (iv) pre-training the encoder with English tweets translated to Devanagari script and the same Devanagari scripts transliterated to romanized Hindi tweets, then fine-tuning it in the MTL fashion. Experimental evaluation shows that cross-lingual information in MTL helps in improving the performance on all the datasets by a significant margin, hence outperforming the state-of-the-art approaches in terms of weighted-F1 score. Qualitative and quantitative error analysis is also done to show the effects of the proposed approach.
M-BERT, MuRIL, Weighted-F1, RNN, cross-lingual.
Muhammad Saad Amin, Luca Anselma and Alessandro Mazzei, Department of Computer Science, The University of Turin, Turin, Italy
Information extraction is one of the core fundamentals of natural language processing. Different recurrent neural network-based models have been implemented to perform text classification tasks like named entity recognition. Several factors play a vital role in increasing the performance of recurrent networks, activation functions being one of them. Yet, no study has thoroughly analysed the effect of the activation function on a named entity recognition classification task over textual data. In this paper, the authors implement a Bi-LSTM-based CRF model for named entity recognition on a semantically annotated corpus, i.e., the Groningen Meaning Bank (GMB), and analyse the impact of all non-linear activation functions on the performance of the network. Our analysis shows that only the Sigmoid, Exponential, SoftPlus, and SoftMax activation functions performed efficiently in the NER task, achieving average accuracies of 95.17%, 95.14%, 94.38%, and 94.76% respectively.
Activation Functions, Groningen Meaning Bank, Named Entity Recognition, Recurrent Neural Networks.
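The activation functions singled out in the abstract have simple closed forms. Below is a minimal plain-Python sketch of Sigmoid, SoftPlus, and SoftMax (the Exponential activation is simply math.exp); this illustrates the functions themselves, not the authors' Bi-LSTM-CRF network:

```python
import math

def sigmoid(x):
    # Logistic sigmoid: squashes any real input into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    # Smooth approximation of ReLU: log(1 + e^x).
    return math.log1p(math.exp(x))

def softmax(scores):
    # Numerically stable softmax: subtracting the max avoids overflow,
    # and the result is a probability distribution over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

SoftMax is typically used on the output layer (e.g. over entity tags), while Sigmoid and SoftPlus appear as gate and hidden-unit nonlinearities.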
Xiaohan Feng1 and Makoto Murakami2, 1Graduate School of Information Sciences and Arts, Toyo University, Kawagoe, Saitama, Japan, 2Dept. of Information Sciences and Arts, Toyo University, Kawagoe, Saitama, Japan
The advantages of using serious games for education have already been proven in many studies, especially narrative VR games, which allow players to remember more information. On the other hand, game walkthroughs can compensate for the disadvantages of gaming in terms of pervasiveness and convenience. This study investigates whether walkthroughs of serious games can have the same learning effect as the serious games themselves. Using game samples and questionnaires, this study compares the information that viewers remember from game walkthroughs and actual game play, analyzes their strengths and weaknesses, and examines the impact of the VR format on the results. The results show that while game walkthroughs allow subjects to follow the experiences of actual game players with a certain degree of empathy, they have limitations compared with actual gameplay, especially for topics that require subjects to think for themselves. Meanwhile, walkthroughs of VR games are not a medium suitable for making the viewer memorize information. For prevalence and convenience, however, serious game walkthroughs are a viable educational option outside the classroom.
Serious game, multimedia, educational game, virtual reality, narratology, Education Outside the Classroom (EOTC).
David Tang1 and Yu Sun2, 1Irvine High School, 4321 Walnut Avenue, Irvine, CA 92604, 2California State Polytechnic University, Pomona, CA, 91768, Irvine, CA 92620
The well-known puzzle game Tetris, where arrangements of 4 squares (tetrominoes) fall onto the field like meteors, has been found to increase the brain’s efficiency. Many variations have come into existence since its invention. Sometimes the leveling can become a double-edged sword, so this game is essentially a Zen mode without a leveling system. It is built for people who want to play a 3D version of Tetris at a speed they themselves have set. This paper designs a game to exercise spatial visualization. This study uses a Unity/C++-based game. The game will be tested by kids on the autism spectrum, and we will conduct a qualitative evaluation of the approach. No results have been shown yet, because this study is still a work in progress. We are trying to make the game comply with the latest Tetris design guidelines available online (that is, the 2009 guideline).
Tetris, Spatial visualization, 3-Dimensional Perception.
Shin-Hwan Kim1, Kyung-Yup Kim2, Sang-Wook Kim3 and Jae-Hyung Koo4, 1Access Network Technology Team, Korea Telecom, Seoul, Korea, 2Access Network Technology Team, Korea Telecom, Seoul, Korea, 3Access Network Technology Department, Korea Telecom, Seoul, Korea, 4Network Research Technology Unit, Korea Telecom, Seoul, Korea
The recent MU-MIMO (Multi User-Multi Input Multi Output) scheme is one of the important and advanced technologies. In particular, it is a suitable technique to increase capacity from the point of view of relieving cell load, which is one of the big issues in 5G commercial field optimization. While MU-MIMO technology has the important advantage of cell capacity expansion, it has the disadvantage of interference between multi-user beams. It is important to use advanced beamforming technology for MU-MIMO to overcome this disadvantage. Therefore, applying interference-cancelling technology among inter-UE (User Equipment) beams to improve each UE’s performance will contribute to improving cell throughput. This paper introduces various techniques for eliminating interference in a MU-MIMO system. It is also important that the UE reports a rank indicator reflecting the interference of multi-user beams. This paper analyses the problem of the conventional method of rank decision in a MU-MIMO system, estimates the vehicular speed quickly with the proposed rank optimization technique, and shows that the DL (Downlink) UE’s performance is improved by applying a proposed rank value suitable for the vehicular speed. This technique can be applied effectively to increase the overall cell capacity by improving the DL UE’s throughput in the MU-MIMO system.
MU-MIMO, 5G, multi-user, interference, UE, DL, rank indicator, cell capacity.
Shana L and Dr C Seldev Christopher, Department of Computer Science and Engineering, St. Xavier’s Catholic College of Engineering, Chunkankadai, Nagercoil, India
Human activity is an individual biometric behaviour that can be detected at a distance and has different applications in social security, forensic detection and crime prevention. Hence, in this paper, the Deep Recurrent Neural Network-based Chimp Optimization Algorithm (DRNN-COA) is developed to identify human actions from images. The proposed methodology is designed with three phases: keyframe extraction, feature extraction and classification. In the keyframe extraction stage, the Structural Similarity Measure (SSIM) is utilized. In the feature extraction stage, the Scale-Invariant Feature Transform (SIFT), coverage factor and Space-Time Interest (STI) features are used, and the essential features are extracted accordingly. The classification process is then done by utilizing the proposed DRNN-COA algorithm. Based on the proposed method, human activity classification is achieved, which is utilized to identify the actions from the images. The proposed method is validated on the KTH database, implemented on the MATLAB platform, and the corresponding performances/outputs are evaluated. Moreover, the statistical measures of the proposed method are also determined and compared with existing methods, namely the Artificial Neural Network (ANN), Random Forest (RF) and Support Vector Machine (SVM).
Human Activity Recognition, Similarity Index, Keyframe Extraction, Feature Extraction & Statistical Measurements.
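The SSIM used above for keyframe extraction compares two images by their luminance, contrast, and structure. A minimal global (non-windowed) sketch over flattened grayscale patches follows, for illustration only; production implementations apply the same formula over local sliding windows:

```python
def ssim(x, y, L=255):
    # Global Structural Similarity between two equal-length
    # grayscale patches (flattened pixel lists with values in [0, L]).
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2  # stabilizing constants
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n            # means
    vx = sum((a - mx) ** 2 for a in x) / n     # variances
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

identical = ssim([10, 20, 30, 40], [10, 20, 30, 40])  # = 1.0
```

SSIM close to 1 means two frames are nearly identical, so consecutive frames with a low SSIM mark a scene change, i.e. a candidate keyframe.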
Janete Amaral, Alberto S. Lima, José Neuman de Souza, Lincoln S. Rocha, MDCC, Universidade Federal do Ceará, Fortaleza/CE – Brasil
Researchers consider that the first edition of the book “The Art of Software Testing”, by Myers (1979), initiated research in software testing. Since then, software testing has gone through evolutions that have driven standards and tools. This evolution has accompanied the complexity and variety of software deployment platforms. The migration to the cloud allowed benefits such as scalability, agility, and better return on investment. Cloud computing requires a greater involvement of software testing to ensure that services work as expected. In addition to testing cloud applications, cloud computing has paved the way for testing in the Test-as-a-Service model. This work aims to characterize Test-as-a-Service in the context of cloud computing. Based on the knowledge presented here, we sought to trace the evolution of software testing, characterizing its fundamental points, and to compose a synthesis of the body of knowledge in software testing as expanded by the cloud computing paradigm.
Cloud computing, Software Testing, Test-as-a-Service.
Ricardo Souza Barreto Barcelos1 and Carlos Leonardo Ramos Póvoa2, 1Science and Technology Center, UENF, Campos dos Goitacazes, Brazil, 2Science and Technology Center, UENF, Campos dos Goitacazes, Brazil
The delivery of packages and products using bicycles is a worldwide trend in business logistics. The challenge for large cities is to use bicycles to bring sustainability to logistics without losing the efficiency and productivity of the delivery service. The absence of fossil fuel burning and the agility of displacement make bicycles an excellent transportation option in urban centers. This work consists of a case study carried out in a bicycle delivery company that assembles its delivery routes empirically. Using the Geo-Route route planner, the routes were optimized; compared with the routes used by the company, this produced a reduction in distance as well as in the number of bicycles used for delivery. With the proposed routes, an increase in productivity and a more efficient division between the teams was obtained. Package delivery by bicycle encourages studies that explore more and more alternative transportation options. The Geo-Route system provided a cost reduction, making its use in the company satisfactory.
Bicycle touring, Logistics, Transportation.
Qinqin Guo1 and Yu Sun2, 1Portola High School, 1001 Cadence, Irvine, CA 92618, 2California State Polytechnic University, Pomona, CA, 91768, Irvine, CA 92620
Pet tracking has been an important service in the pet supply industry, as it is constantly needed by countless pet owners. As of 2021, about 90 million families in the U.S. alone have a pet; that is about 70% of all American households. However, for most owners of smaller pets such as cats, hamsters, and more, not being able to find the pet within the house has been a problem bothering them. This paper proposes a tool that uses a Raspberry Pi to gather signal strength data from Bluetooth devices and Artificial Intelligence to interpret the gathered data in order to get the precise location of an indoor moving object. The system is applied to determine the location of pets within the house to an accuracy level where the room that the pet is located in is correctly predicted. A qualitative evaluation of the approach has been conducted. The results show that the intelligent system is effective at correctly locating indoor pets that are constantly moving.
Raspberry Pi, Firebase, machine learning, Artificial Intelligence (AI).
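One common way to turn Bluetooth signal strength into a distance estimate, which a system like the one above would presumably refine with machine learning, is the log-distance path-loss model. A hedged sketch follows; the tx_power and path-loss exponent values are illustrative assumptions, not the paper's calibration:

```python
def rssi_to_distance(rssi, tx_power=-59.0, n=2.0):
    # Log-distance path-loss model: estimated distance (metres) from
    # received signal strength (dBm). tx_power is the RSSI measured
    # at 1 m from the beacon; n is the path-loss exponent
    # (~2 in free space, higher indoors due to walls and furniture).
    return 10 ** ((tx_power - rssi) / (10 * n))

d_ref = rssi_to_distance(-59.0)  # 1.0 m at the reference RSSI
d_far = rssi_to_distance(-79.0)  # 10.0 m under free-space assumptions
```

Distance estimates from several receivers can then be combined (e.g. by a classifier over the RSSI vector) to predict which room the pet is in.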
Ehlimana Krupalija, Emir Cogo, Šeila Bećirović, Irfan Prazina and Ingmar Bešić, Department of Computer Science and Informatics, Faculty of Electrical Engineering, University of Sarajevo, Sarajevo, Bosnia and Herzegovina
Cause-effect graphs are often used as a method for deriving test cases for black-box testing. This paper presents a comparison of the different available algorithms for converting cause-effect graph specifications to test case suites and the problems which may arise when using different approaches. Different types of representation and graphical notation for describing nodes, logical relations and constraints used when creating a cause-effect graph specification are also discussed. An overview of available tools for creating cause-effect graph specifications and deriving test case tables is given. The systematic approach in this paper is meant to aid domain experts and end users in choosing the most appropriate algorithm for deriving test case tables in accordance with specific system priorities, and in gaining a better understanding of the notation used for specifying cause-effect graphs so that the most common mistakes can be avoided.
Cause-effect graphs, Test case suites, Black-box testing, Software Testing, Software Quality.
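A cause-effect graph ultimately encodes a Boolean function from causes to effects, and the simplest (if exponential) way to derive a test case table from it is brute-force enumeration; the algorithms such papers compare exist precisely to prune this table. A minimal sketch with a hypothetical three-cause specification:

```python
from itertools import product

def decision_table(causes, effect):
    # Enumerate every combination of cause truth values and record
    # the resulting effect: a brute-force test case table.
    rows = []
    for values in product([False, True], repeat=len(causes)):
        env = dict(zip(causes, values))
        rows.append((values, effect(env)))
    return rows

# Hypothetical spec: effect E fires when C1 AND (C2 OR C3) hold
table = decision_table(["C1", "C2", "C3"],
                       lambda e: e["C1"] and (e["C2"] or e["C3"]))
```

For three causes this yields 8 rows; real derivation algorithms select a much smaller subset that still exercises each logical relation and constraint.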
Mohamed Azouz Mrad, Kristóf Csorba, Dorián László Galata, Zsombor Kristóf Nagy and Brigitta Nagy, Budapest University of Technology and Economics, Budapest, Hungary
Dissolution testing is part of the target product quality that is essential in approving new products in the pharmaceutical industry. The prediction of the dissolution profile based on spectroscopic data is an alternative to the current destructive and time-consuming method. Raman and near-infrared (NIR) spectroscopies are two fast and complementary methods that provide information on the tablets’ physical and chemical properties and can help predict their dissolution profiles. This work aims to compare the information collected by these spectroscopy methods to support the decision of which measurements should be used so that the accuracy requirement of the industry is met. Artificial neural network models were created, in which the spectroscopy data and the measured compression curves were used as input, individually and in different combinations, in order to estimate the dissolution profiles. Results showed that using only the NIR transmission method along with the compression force data, or the Raman and NIR reflection methods, the dissolution profile was estimated within the acceptance limits of the f2 similarity factor. Adding further spectroscopy measurements increased the prediction accuracy.
Artificial Neural Networks, Dissolution prediction, Comparing spectroscopy measurement, Raman spectroscopy, NIR spectroscopy & Principal Component Analysis.
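The f2 similarity factor mentioned above is a standard regulatory measure comparing a reference and a test dissolution profile, with f2 ≥ 50 as the usual acceptance criterion. A minimal sketch follows; the profile percentages are illustrative, not measured data:

```python
import math

def f2_similarity(reference, test):
    # f2 similarity factor between two dissolution profiles
    # (percent dissolved at matched time points):
    #   f2 = 50 * log10( 100 / sqrt(1 + mean squared difference) )
    n = len(reference)
    msd = sum((r - t) ** 2 for r, t in zip(reference, test)) / n
    return 50 * math.log10(100 / math.sqrt(1 + msd))

identical = f2_similarity([20, 45, 70, 90], [20, 45, 70, 90])  # = 100.0
close = f2_similarity([20, 45, 70, 90], [22, 47, 72, 92])      # > 50
```

Identical profiles score 100; the score falls as the predicted profile drifts from the measured one, so a predictor is acceptable while its f2 stays at or above 50.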
Albeiro Espinal1,2, Yannis Haralambous1, Dominique Bedart2 and John Puentes1, 1IMT Atlantique, Lab-STICC, CNRS UMR 6285, Brest, France, 2DSI Global Services, Le Plessis Robinson, France
Automated resume ranking aims at selecting and sorting pertinent resumes among those sent to answer a given job offer. Most of the screening and elimination process relies on the resumes’ content, only marginally including information from the job offer. In this sense, currently available resume ranking approaches lack accuracy in detecting relevant information in job offers, which is imperative to assure that selected resumes are pertinent. To improve the extraction of relevant terms that represent significant information in job offers, we study the uncertainty-oriented selection of 16 textual markers – 10 obtained by examining the behaviour of expert recruiters and 6 from the literature – according to two approaches: fuzzy logistic regression and fuzzy decision trees. Results indicate that globally, fuzzy decision trees improve the F1 and recall metrics by 27% and 53% respectively, compared to a state-of-the-art term extraction approach.
Recruiters Behavior Modeling, Relevant Term Extraction, Textual Relevance Marker Evaluation, Uncertainty Measure, Fuzzy Machine Learning.
Lemlem Kassa1, Jianhua Deng1, Mark Davis2 and Jingye Cai1, 1School of Information and Software Engineering, University of Electronic Science and Technology China (UESTC), Chengdu 610054, China, 2Communication Network Research Institute (CNRI), Technological University, D08 NF82 Dublin, Ireland
Machine Learning (ML) is an innovative solution that can autonomously extract patterns and predict trends based on environmental measurements and performance indicators as input, providing self-driven intelligent network systems that can configure and optimize themselves. Under the effects of heterogeneous traffic demand among users and varying channel conditions in WLAN downlink MU-MIMO channels, it is challenging to achieve the maximum system throughput performance. In addressing these issues, existing studies have proposed different approaches; however, most of them did not consider a machine-learning-based optimization solution. The main contribution of this paper is to propose a machine-learning-based adaptive approach that optimizes the system frame size to maximize the system throughput of WLAN in the downlink MU-MIMO channel. In this approach, the Access Point (AP) performs the maximum system throughput measurement and collects the “frame size-system throughput” patterns, which contain knowledge about the effects of traffic condition, channel condition, and number of stations (STAs). Based on these patterns, our approach uses neural networks to correctly model the system throughput as a function of the system frame size. After training the neural network, we obtain the gradient information to adjust the system frame size. The performance of the proposed ML approach is evaluated against the FIFO aggregation algorithm under the effects of heterogeneous traffic patterns for VoIP and video traffic applications, channel conditions, and number of STAs.
Frame Size Optimization, Downlink MU-MIMO, WLAN, Network Traffic, Machine Learning, Neural Network, Throughput Optimization.
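The gradient-based adjustment step described above (model throughput as a function of frame size, then climb the modelled gradient) can be sketched without the neural network itself. In this hedged sketch, a hypothetical unimodal throughput curve stands in for the trained model, and a numeric gradient-ascent step adjusts the frame size toward the modelled optimum:

```python
def modelled_throughput(frame_size):
    # Stand-in for the trained neural network: a hypothetical
    # unimodal throughput curve (Mbps) peaking at frame_size = 48.
    return 30.0 - (frame_size - 48.0) ** 2 / 100.0

def gradient_step(f, x, lr=5.0, h=1e-3):
    # One gradient-ascent step using a central-difference estimate
    # of the modelled curve's slope at x.
    grad = (f(x + h) - f(x - h)) / (2.0 * h)
    return x + lr * grad

frame_size = 20.0
for _ in range(200):
    frame_size = gradient_step(modelled_throughput, frame_size)
# frame_size converges toward the modelled optimum of 48
```

In the real system the curve, and hence its gradient, would shift with traffic, channel conditions, and station count, which is why the adaptation must run continuously rather than once.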