Sentiment Analysis of Customer Feedback in Online Food Ordering Services

Background: E-commerce websites have become important online communication platforms. Through them, users can easily perform online transactions such as shopping or ordering food and share their experiences or feedback. Objectives: Businesses analyze customers' views and sentiments to assess consumer behavior and attitudes toward particular products or services. Methods/Approach: This research proposes a method to extract customers' opinions and analyze sentiment based on a collected dataset of 236,867 online Vietnamese reviews published from 2011 to 2020 on foody.vn and diadiemanuong.com. Machine learning models were then applied and assessed to choose the optimal model. Results: According to the experimental findings, the proposed approach achieves an accuracy of up to 91.5 percent. Conclusions: The research results can help enterprise managers and service providers gain insight into customers' satisfaction with their products or services and understand their feelings, so that they can make adjustments and sound business decisions. They also help food e-commerce managers ensure better e-commerce service design and delivery.


Introduction
Today, advanced information technology has changed the way people communicate; it helps users easily access information and exchange their opinions about products and services on a large scale in real time. The advent of social media and review websites allows users to express their opinions (Akila et al., 2020). With the explosion of big data, online community comments and reviews need to be collected and mined automatically, allowing enterprises to track customers' shopping behavior, interests, and satisfaction with products and services (Yadav, 2015; Akter et al., 2016). Hidden in those comments are feelings of happiness, sadness, love, and hate. Recognizing such "emotional" content is a big challenge for computers, which lack human reading and understanding. From an e-commerce standpoint, detecting customers' emotions correctly helps display better advertising content. For example, on spotting a person in a tired mood, a system can suggest energy drinks, an entertainment venue, or simply play a piece of gentle music. This research direction is not new. However, each method has its advantages and disadvantages, and no method is completely accurate. Because of the intricacy of the Vietnamese language structure, using a lexicon-based technique for opinion mining poses a significant barrier for researchers. There are few emotional vocabulary sets or processing methods for the Vietnamese language. Small businesses are beginning to see the value of social media in achieving their objectives (Balan et al., 2017). Recently, Nguyen et al. (2020) proposed exploring user experience in the hotel sector by using the Topic Model, which is also an effective method for analyzing and extracting information from a corpus of customers' opinions. Therefore, it is necessary to apply machine learning methods and evaluate their accuracy on the collected datasets in order to choose the most suitable method.
The goal of this study is to review opinion mining studies and propose a machine learning approach to exploit consumer comments in Vietnamese. This research applies knowledge mining to data collected by automatic programs, comprising 236,867 customer reviews from online food ordering and restaurant review channels, namely foody.vn and diadiemanuong.com, which are well-known e-commerce websites in Vietnam. Data preprocessing was then conducted, and machine learning methods were applied to find the best model and predict sentiment scores for the rest of the corpus.
The structure of this paper is divided into five sections. Section 1 describes the necessity of the research. Theoretical bases related to the research are presented in Section 2. In Section 3, the author describes the research method and experimental designs. The research results are detailed in Section 4. Finally, conclusions and future research are presented in Section 5.

Related works
This section explores related research in customer opinion mining and sentiment analysis, especially in the online service field. The machine learning and lexicon-oriented approaches used in prior research are also explored and analyzed to form the basis of this research.

Customer Opinion Mining in online services
The development of technology and the large-scale use of social media have created opportunities to gain useful insights from data without a proper schema. Opinion mining in big data is used to categorize customers' opinions by emotion and to gauge customer mood. Opinion mining has achieved significant results over time thanks to the many comments available online. Customers share their opinions on products and services in restaurants, schools, hospitals, vacation destinations, etc. The value of a user's comment, review, or rating of a product or service lies in the thoughts, judgments, and feelings it conveys about quality, appearance, or price. Depending on individual perceptions, opinions can be positive, negative, or neutral. Thanks to social media, users can now express their opinions and make them visible to anybody on the internet. Based on these opinions, enterprises can improve their products, services, and marketing strategies, detect the latest trends and opportunities, or measure the effectiveness of their marketing activities (Pejić Bach et al., 2019). Currently, the scientific community has produced much research on opinion mining methods and on applications of opinion mining at many different levels. Akila et al. (2020) and Nagpal et al. (2020) proposed tools and methods to collect and analyze customer comments using machine learning and topic models. In another study, Patel et al. (2020) analyzed users' emotions based on the customer rating scores of the products and services they used in food services. From the results of domestic and international research, the author found that there are two popular approaches in opinion mining: lexicon-based and machine learning-based. One of the limitations of the machine learning-based method is its dependence on the size of the labeled training dataset, which must be large enough. However, labeled data is often scarce, especially in narrowly specialized domains.
Most research teams must spend time and money on labeling the data. Sentiment analysis has mostly been characterized as "the study computation of views, feelings, and emotions represented in the text" (Nagpal et al., 2020). In other words, opinion mining, as a way of obtaining the viewpoint of the person who generated a certain document, has lately been the most popular study topic in general social networks (

Lexicon-based customer sentiment analysis
Customers' opinions and comments are written in natural language (Liu, 2012). Maks et al. (2012) and Akter et al. (2016) presented methods and techniques of natural language processing for analyzing customers' opinions and sentiment through online comments. Previous research mainly focuses on lexicon-based and machine learning-based methods. For the lexicon-based approach, the outcome depends heavily on the quality of the emotional words. Less obviously, the outcomes of machine learning-based approaches, such as SVM and Naïve Bayes, rely significantly on feature selection methods, such as n-gram or lexicon-based features. The research of Vu et al. (2011) provided approaches for exploring words in Vietnamese comments in general, but it largely does not address user emotions.
The lexicon-based method of analysis depends on emotional vocabulary sources. An emotional vocabulary source, often understood as a dictionary, is a collection of words expressing emotions, with each word's polarity scored by a real number. These dictionaries can be built by hand or semi-automatically. The advantage of this approach is that no training is required, since there is no need for labeled data. This method is commonly used for sentiment analysis of common text types: blog posts and comments on films, products, or forums.
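The dictionary lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any cited study: the lexicon entries and their scores are invented for the example, and comments are tokenized by whitespace only.

```python
# Hypothetical polarity lexicon: word -> real-valued polarity score.
# The words and scores are illustrative only, not taken from a real dictionary.
LEXICON = {"ngon": 1.0, "tệ": -1.0}


def lexicon_score(comment, lexicon=LEXICON):
    """Sum the polarity scores of all known words in the comment."""
    return sum(lexicon.get(token, 0.0) for token in comment.lower().split())


def lexicon_label(comment, lexicon=LEXICON):
    """Map the summed score to a positive, negative, or neutral label."""
    score = lexicon_score(comment, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Note that this sketch ignores negation: "không ngon" (not delicious) would still be scored as positive, which illustrates why the outcome depends so heavily on the quality of the lexicon and its handling rules.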
The research of Ohana et al. (2009) used the SentiWordNet dictionary to evaluate the polarity of film comments. SentiWordNet is an automatically generated dictionary based on the WordNet database, and the best result reached an accuracy of 69.35%. The authors conclude that using the SentiWordNet dictionary is as effective as using a hand-built dictionary. Other studies have built their own dictionaries based on different sources. Research by Taboada et al. (2011) and Liu (2012) affirms that dictionary building helps to establish a solid foundation for this approach.

Support Vector Machine - A classification algorithm
SVM is a machine learning classifier that uses a kernel function to map data points that cannot be linearly separated into a new space in which they can be separated with minimal classification error. For an introduction to SVMs and their details, we refer readers to Burges (1998). A detailed treatment of the application of these models to text classification can be found in Joachims (2002).
SVM is essentially an optimization problem; the goal of this algorithm is to find a space F and a decision hyperplane f over F such that the classification error is lowest. Let the sample set be $\{(x_1, y_1), (x_2, y_2), \ldots, (x_f, y_f)\}$ with $x_i \in \mathbb{R}^n$ belonging to two classes, where $y_i \in \{-1, 1\}$ is the class label of $x_i$ ($-1$ represents class I, $1$ represents class II). The hyperplane equation containing the vector $x_i$ in space is: $x_i \cdot w + b = 0$ (1). Thus, in the equation (1)
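To make the role of the hyperplane concrete, the sketch below evaluates x·w + b for a point and assigns a class by its sign. It assumes the common convention that non-negative values map to label 1 and negative values to label -1; the weights and bias in the usage note are hypothetical and not learned from the paper's data.

```python
def decision_value(x, w, b):
    """Compute x . w + b, the signed position of x relative to the hyperplane."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b


def classify(x, w, b):
    """Return 1 (class II) if x lies on the non-negative side, else -1 (class I)."""
    return 1 if decision_value(x, w, b) >= 0 else -1
```

For example, with the hypothetical parameters `w = [1.0, -1.0]` and `b = -0.5`, `classify([2.0, 1.0], w, b)` gives 1 and `classify([1.0, 2.0], w, b)` gives -1. Training an SVM amounts to choosing `w` and `b`; that step is not shown here.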

Methodology
This section describes the general model that the research proposes, followed by the steps to preprocess the data, train and evaluate the model, and analyze the data with the time factor.

Overview model and methods
The research data, collected for research purposes, consists of raw data from the foody.vn and diadiemanuong.com websites. Before the machine learning procedure, the raw data is preprocessed, sampled, and labeled. The sampled data is divided into three sets: training, validation, and test. The training dataset is used during the learning process to fit the parameters; the validation dataset is a set of examples used to tune the hyperparameters of a classifier. The test dataset is used only once, as the final step, to report estimated error rates for future predictions. Figure 1 gives an overview of the research model.
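A three-way split of this kind can be sketched as below. The 80/10/10 proportions and the fixed random seed are assumptions made for illustration; the paper does not state the ratios it used.

```python
import random


def split_dataset(records, train=0.8, validation=0.1, seed=42):
    """Shuffle records and split them into training, validation, and test lists.

    The proportions are illustrative defaults; whatever remains after the
    training and validation portions becomes the test set.
    """
    items = list(records)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_train = int(len(items) * train)
    n_val = int(len(items) * validation)
    train_set = items[:n_train]
    val_set = items[n_train:n_train + n_val]
    test_set = items[n_train + n_val:]
    return train_set, val_set, test_set
```

Keeping the test set untouched until the final evaluation is what allows it to give an unbiased estimate of the error on future data.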

Figure 1 Proposed Overview Model and Methods
Accessing API website portals and collecting raw data

Data crawling
The Beautiful Soup and Selenium libraries in Python are used to collect data from the websites. Data collection is based on the Hypertext Markup Language (HTML) structures of foody.vn and diadiemanuong.com. To collect a given piece of information, we retrieve the data from the HTML tag that contains it. The result of this step is the full set of website data in HTML or TXT format. This data is processed in the following steps.
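The idea of retrieving data from the HTML tag that contains it can be sketched as follows. To keep the example dependency-free, it uses Python's standard `html.parser` module as a stand-in for Beautiful Soup, and it parses an in-memory string rather than a live page. The `review-text` class name and the sample markup are hypothetical; the real tag structure of foody.vn and diadiemanuong.com is not given in the paper.

```python
from html.parser import HTMLParser

# Hypothetical markup; the real sites' HTML structure is not documented here.
SAMPLE_HTML = (
    '<div class="review-text">Món ngon</div>'
    '<p>unrelated</p>'
    '<div class="review-text"> Thất vọng </div>'
)


class ReviewParser(HTMLParser):
    """Collect the text inside every <div> carrying the target class."""

    def __init__(self, target_class="review-text"):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching <div>
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1             # nested <div> inside a review
        elif self.target_class in classes:
            self.depth = 1
            self.reviews.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.reviews[-1] += data


def extract_reviews(html):
    """Return the stripped, non-empty review texts found in the HTML string."""
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.reviews if text.strip()]
```

With Beautiful Soup, the same selection would typically be a single query for the relevant tag and class; pages rendered by JavaScript are the usual reason for adding Selenium.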

Result of data collecting
The collected dataset has 236,867 records, shown in Table 1, with the following fields: store name, address, name of the commenting customer, comment time, comment content, and the customer's rating of the store. Of these, 214,835 comments were gathered from foody.vn and 22,032 from diadiemanuong.com. This dataset goes into the preprocessing and cleaning step to provide input for the later steps of the model.

Data preprocessing
The collected data is raw and unprocessed, so it may be empty, misspelled, too short, too long, or contain icons. This affects the analysis results, so the data needs to be cleaned. The steps are as follows:
• Remove icons and special characters: special characters do not carry any definite meaning and instead cause interference in the analysis.
• Convert all text to lower case: each character is represented by a binary sequence in computer memory. Because upper-case characters have Unicode code points different from their lower-case counterparts, even though the two are semantically the same, the computer would treat them as different input, and the prediction may be affected. Therefore, converting the entire text to lowercase is reasonable for the analysis and prediction system.
• Transform words to normal form: converting words to their clear, standard form is required when preprocessing the data. Comments on Foody (written by users in Vietnamese) may contain acronyms or misspellings, for example "ko ngon" (not delicious), "vs" (with), or "15k" (15,000 VND), or the data may otherwise not be normalized or standardized. This interferes with the results of the analysis. If the training input is "không ngon" but the phrase to be predicted is "ko ngon", which did not appear during training, it will be difficult to identify the emotion and predict the result.
• Remove blank/NULL data: the collected dataset contains a lot of blank data, which is meaningless in the analysis process and wastes storage memory.
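The cleaning steps above can be sketched as a single function. This is a minimal illustration under stated assumptions: the abbreviation map contains only the two examples mentioned in the text, the "15k" expansion rule is a guess at how such amounts might be normalized, and the paper's actual normalization list is not given.

```python
import re
import unicodedata

# Hypothetical abbreviation map built from the examples in the text;
# a real normalization dictionary would be much larger.
ABBREVIATIONS = {"ko": "không", "vs": "với"}


def clean_comment(text):
    """Return a normalized comment, or None if nothing usable remains."""
    if text is None:
        return None
    # NFC keeps Vietnamese letters as single code points so \w matches them.
    text = unicodedata.normalize("NFC", text).lower()
    # Replace icons, emoji, and punctuation with spaces.
    text = re.sub(r"[^\w\s]", " ", text)
    # Expand amounts such as "15k" to "15000" (assumed convention).
    text = re.sub(r"\b(\d+)k\b", lambda m: str(int(m.group(1)) * 1000), text)
    # Replace known abbreviations token by token.
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens) or None
```

Returning `None` for blank input lets the caller drop empty records in one pass, for example with `[c for c in map(clean_comment, raw) if c is not None]`.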

Data labeling
Normally, data labels in research applying machine learning are built by hand. However, after randomly reviewing the content of the collected comment dataset against the rating (the rating field in the dataset), we found that comments with a rating less than 5.0 have a negative meaning and, conversely, comments with a rating equal to or greater than 5.0 have a positive meaning. To label the data before training, the research applied the method of classifying emotions according to the customer rating (Liu, 2017; Patel et al., 2020) to divide the collected dataset into two datasets, labeled according to the following rules:
• Rate < 5: reviews rated below 5 stars are labeled negative.
• Rate >= 5: reviews rated 5 stars or above are labeled positive.
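The two labeling rules translate directly into code. The sketch below applies the 5.0 threshold stated above; the function names and the helper that computes each label's share are illustrative additions, not part of the original study.

```python
from collections import Counter


def label_from_rating(rating, threshold=5.0):
    """Label a review positive if its rating is >= threshold, else negative."""
    return "positive" if rating >= threshold else "negative"


def label_shares(ratings, threshold=5.0):
    """Return the percentage of each label among the given ratings."""
    counts = Counter(label_from_rating(r, threshold) for r in ratings)
    total = sum(counts.values())
    if not total:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}
```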
The labeling results show that most of the data are positive comments, which account for 81.9% of the total; negative comments account for 18.1% of the total, as shown in Table 2 below:

Training and Evaluating model
Normally, the efficiency of opinion classification models is evaluated based on four indicators: Accuracy, Precision, Recall, and F1_Score (the harmonic mean of Precision and Recall), as in Table 3. They are given by formulas (2), (3), (4), and (5), respectively. In addition, this research also considers the training time and the prediction time of each model. There is,
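The four indicators can be computed from the counts of a binary confusion matrix. The sketch below uses the standard textbook definitions, which are assumed to match the paper's formulas; the argument names `tp`, `fp`, `tn`, and `fn` (true/false positives and negatives) are this example's own notation.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute Accuracy, Precision, Recall, and F1 from confusion-matrix counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

On an imbalanced dataset, Accuracy alone can look high for a model that mostly predicts the majority class, which is one reason to report Precision, Recall, and F1 alongside it.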

Results and Discussion
The results of data preprocessing, training, and model evaluation are presented in this section. Along with that, the results are visualized, and discussions related to the research topic are presented.

Result of training and Evaluating model
This is the most important stage of opinion mining research: determining whether a customer comment is "positive" or "negative". This research applies several classification methods from the supervised machine learning group that are considered the best. Based on the results of previous research related to the topic, it seeks the most suitable model for the dataset of classified comments. The selected model is then used to predict unclassified or newly arriving comment data without retraining. Table 4 shows the experimental results of the methods. The Accuracy of Decision Tree is 89%, Naïve Bayes 82.5%, Logistic Regression 90%, and Support Vector Machine 91%. In addition, it also shows the training and prediction time of each method. The Decision Tree method has a training time of 1h 4m 32s and a prediction time of 14,300 ms, while the Support Vector Machine (SVM) has a training time of 6,320 ms and a prediction time of 31.25 ms. The clustered bar chart in Figure 2 below shows the experimental results of the models. In this chart, the column for the SVM algorithm's Accuracy is the highest, at 91.5%.
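Training and prediction times such as those reported above could be measured with a small wrapper around each call. This is only a sketch of one possible approach; the paper does not describe how its timings were taken, and the `timed` helper is a hypothetical name.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms
```

Wrapping a model's fitting call and its prediction call separately would give the two timings for each method on the same footing.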

Figure 2
Results of training and evaluating model (Precision, Recall, F1_Score, and Accuracy)

Result of visualization
The visualization results in Figure 3 include the following four charts: Rating by Store, Top stores with a high review, Criteria Scores by Year, and Sentiment by District. The reports are filtered to display information for 2020 only.
The Rating by Store chart shows the average customer rating for each store. It also shows the average rating across all stores, 5.904, so that each store's rating can be compared with the average. For example, the fried chicken store "3 Râu" has an average rating of 10.00, and R&B milk tea has 9.7. The Top Stores with high reviews chart shows the total number of customer comments for each store. The chart shows that "Mực nướng Đảo Ngọc" and "Baozi -Ẩm thực Đài" attract more interest and comments from customers than the other stores.
The Criteria Scores by Year chart shows the total customer rating according to the criteria (location, price, quality, service, space). In 2020, the total rating by location is 2366, by price is 2319, by services is 2520, by quality is 2411, and by space is 2452.
The Sentiment by District chart shows the total negative and positive comments distributed by district in Ho Chi Minh City. For example, District 1 has a positive comment rate of 63% and a negative comment rate of 37%, while Binh Thanh district has a positive comment rate of 64% and a negative comment rate of 36%.
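Rates like those in the Sentiment by District chart come from grouping labeled comments and counting each label per group. The sketch below shows one way to do this; the input format of (group, label) pairs is an assumption, and the same function would work with a month or year as the group key.

```python
from collections import Counter, defaultdict


def sentiment_rates(rows):
    """Compute positive and negative percentages per group.

    rows: iterable of (group, label) pairs, with label "positive" or "negative".
    Groups without any positive or negative label are skipped.
    """
    counts = defaultdict(Counter)
    for group, label in rows:
        counts[group][label] += 1

    rates = {}
    for group, by_label in counts.items():
        total = by_label["positive"] + by_label["negative"]
        if total == 0:
            continue
        rates[group] = {
            "positive": 100.0 * by_label["positive"] / total,
            "negative": 100.0 * by_label["negative"] / total,
        }
    return rates
```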

Figure 3 Dashboard Sentiment Analytics
Source: Authors' work
The Word Cloud chart represents negative and positive keywords, making it easy for viewers to grasp and compare them. In Figure 4, it is easy to see which words are mentioned most in customers' comments: the bigger the word, the more often it is mentioned. In the WordCloud_Positive chart, the phrase "món ngon" (delicious dishes) appears most in customers' reviews. Similarly, in the WordCloud_Negative chart, the phrase "thất vọng" (disappointed) was mentioned most.
The research also conducted experiments on the dataset for the SVM method in combination with the time factor. The results are shown in Figure 5; the Sentiment by Month-Year chart shows the percentage of positive and negative comments over time. For example, in February 2016, the rate of positive comments was 83.14%, and the rate of negative comments was 16.31%; in September 2016, the rate of positive comments was 88.01%, and the rate of negative comments was 11.99%. This dashboard lets managers capture customers' emotions promptly, which is valuable in business and management.
Figure 6 below shows the accuracy of the SVM method from 2015 to 2020. The chart presents the experimental results of the SVM method for the dataset grouped by year, comprising six datasets (2015, 2016, 2017, 2018, 2019, and 2020). The SVM accuracy was 89% for the 2015 dataset, 92% for 2016, and 92% for 2020.

Conclusion
In this paper, the research experimented with, compared, and selected suitable machine learning methods to analyze and classify sentiment based on customers' opinions. The applications of opinion categorization depend on the field, the analysis model, and the source of the collected data. In this research, we have proposed an applied solution in natural language analysis, namely customer sentiment analysis based on comments posted on the foody.vn and diadiemanuong.com websites. The solution was tested with several machine learning methods to compare the pros and cons of the models and select the best model through F1-Score measurement. The research results on the corpus from 2011 to 2020 show that the SVM algorithm has the highest Accuracy, at 91.5%. In particular, the research creates visual reports and combines the analysis with the time factor to serve the decision-making needs of businesses. It addresses the data explosion problem by providing customer experience information by location. The research provides a fundamental architecture for exploiting customer opinions from Vietnamese text data on social networks, creating the basis for further research on exploiting Big Data in each industry and creating value for businesses and consumers. In addition, the research results contribute significantly to the practical application of social network data mining in understanding users' needs, thereby supporting appropriate business and management decisions in an enterprise. At the same time, the results open an application direction for regulators in gathering people's comments, through social networks, on drafts and management policies before they are promulgated. The food and beverage sector will be able to devise strategies to develop better services and products to better attract and retain customers. In addition, the