Depression Detection from Arabic Tweets Using Machine Learning Techniques
Received Date: February 19, 2022 Accepted Date: March 19, 2022 Published Date: March 21, 2022
doi: 10.17303/jcssd.2022.1.103
Citation: Amjad A Alaskar (2022) Depression Detection from Arabic Tweets Using Machine Learning Techniques. J Comput Sci Software Dev 1: 1-10
Abstract
Social media platforms are used increasingly, as many people around the world interact, communicate, and share content with others. Social media users often reveal their feelings and emotions in their posts. Social media has therefore become a vital online resource for studying the language users employ to express mental health issues, which can help identify individuals at risk of harm. Researchers have become increasingly interested in studying mental health through social media, and Twitter has been used successfully to explore several mental health conditions, including anxiety, depression, thoughts of self-harm, and suicide. Depression is a leading cause of ill health and disability worldwide, and the number of people with common mental disorders is rising globally. In this paper, we build a model that classifies Arabic tweets based on depression attributes selected by health professionals. We collect tweets through the Twitter API, apply supervised machine learning techniques to classify tweets according to the depression attributes, and then compare the accuracy of the applied algorithms to identify the best one for our model. We believe that this work can be used by clinicians to aid in diagnosis and to provide help to depressed Twitter users.
Keywords: Data Mining; Depression; Machine Learning; Sentiment Analysis; Social Media; Twitter
Introduction
In recent decades, the use of social media platforms has increased. People communicate and share content with others, with Facebook and Twitter being the most popular platforms, and users often reveal their feelings in their posts.
In recent years, researchers have become increasingly interested in studying mental health through social media platforms. Many studies have investigated the association of language and social media usage patterns with several mental illnesses, including stress, depression, anxiety, and suicidality. Social media users often share their emotions, thoughts, and opinions with others, and the content of their activities can be a valuable source of information for identifying and detecting depression symptoms in social media users [23]. Social media posts help capture behavioral features that are relevant to an individual’s thinking, mood, communication, opinions, and activities. The language and emotion used in social media posts can reflect the feelings of worthlessness, helplessness, guilt, and self-hatred that characterize major depression [4].
In this paper, we constructed a model to identify depression symptoms. The model classifies Arabic tweets based on depression attributes to help doctors make decisions. We collected tweets through the Twitter API, applied supervised machine learning techniques to classify tweets according to the depression attributes, and then compared the accuracy of the applied algorithms to identify the best one for our model.
Background
Machine learning is a fast-growing field due to the increase in data mining applications. In machine learning, data are processed to perform tasks of assembling, sorting, assimilating, and classifying information [13]. Machine learning techniques are closely associated with data mining: they show how models can learn from data or improve their performance based on it. The main objective of machine learning is for the model to automatically learn to recognize and extract complex patterns and make intelligent decisions based on the data. Machine learning techniques are classified into supervised learning and unsupervised learning [8].
Supervised Algorithms
Supervised learning is equivalent to classification [8]. A training set and a test set are used to categorize the data. The training set includes the input attributes and their corresponding class labels and is used to construct the classification model, which aims to map the input features onto the matching class labels; the test set is used to validate the model by predicting the class labels of unseen examples. To categorize datasets, machine learning algorithms such as Naive Bayes (NB), decision trees (C4.5, ID3, and C5), and support vector machines (SVMs) are used [5].
Unsupervised Algorithms
Clustering is equivalent to unsupervised learning. Because the input datasets are not class labeled, the learning process is unsupervised. Typically, clustering can be used to identify classes within the data. The clustering technique groups a set of objects into clusters based on how similar their properties are: objects in the same cluster are similar to each other, while objects from different clusters are dissimilar. Similarities and dissimilarities are calculated from the attribute values that describe the objects and the distance measurements between them. Clustering is used in a variety of fields, including security, spam filters, business intelligence, biology, and web search [8].
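As a minimal illustration of this grouping-by-similarity idea (a sketch only, not part of this work's supervised pipeline), k-means clustering in scikit-learn assigns synthetic points to clusters based on distance:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two synthetic groups of objects with similar attribute values within each group.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])

# Cluster into two groups; objects assigned to the same cluster lie close together.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)           # cluster assignment of each object
print(kmeans.cluster_centers_)  # centers used for the distance measurements
```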
Related Work
A large number of researchers are interested in depression detection in different fields. Research has been performed in the psychology, medicine, and sociolinguistic fields to identify and correlate major depressive disorder and its symptoms. In addition, data mining studies have analyzed depression symptom features, posts, feelings, dialect, and sentiment to investigate indications of depression on social media networks. In this section, we discuss relevant research and methods for identifying and detecting depression in social media networks.
The study by Islam et al. [10] used Facebook data from a public online source for depression analysis. A group of psycholinguistic features was used in the proposed model, and machine learning methods were applied as an effective and scalable practice. They used four popular classifiers: decision tree, ensemble, support vector machine, and k-nearest neighbor. In terms of accuracy, the decision tree surpassed the other machine learning algorithms used to evaluate Facebook comments for depression detection.
Based on Twitter activity over a year, De Choudhury et al. [4] used crowdsourcing to assemble a group of Twitter users who had been diagnosed with clinical depression. They measured behavioral traits related to social activity, language, feelings, and linguistic patterns to create a statistical classifier that can identify depression risk before its onset is reported. They used an SVM classifier to estimate the likelihood of depression before an individual is reported to be depressed, and the classifier achieved a classification accuracy of 70%. The study showed that depressed users show decreased social activity, greater negative feelings, a higher focus on self-attention, increased relational and medical concerns, and an increased expression of religious ideas.
Daimi et al. [3] proposed a classification-based approach to predict which patients are potentially depressed or already depressed. The classification model was trained and tested using synthetic data, and the symptoms were chosen based on surveys and interviews with depression experts. The C4.5 decision tree technique was used, and the WEKA tool was adopted for this research. The results on the synthetic datasets showed reasonable accuracy, precision, and recall (sensitivity).
Resnik et al. [18] explored supervised topic models for analyzing linguistic signals of depression. They showed that LDA, a common topic-extraction technique in machine learning, can reveal significant and potentially helpful latent structure, reported good outcomes with supervised LDA (SLDA) and supervised anchor (SANCHOR) topic models, and presented a preliminary exploration of a new supervised nested LDA model (SNLDA). The experiments used a dataset created by Coppersmith from Twitter that included 3 million tweets from approximately 2,000 Twitter users; these were reviewed by a qualified clinical psychiatrist to determine which topics were most likely to be relevant to the depression assessment. According to the quantitative experiments, more sophisticated topic models that utilize supervision, such as SLDA and SANCHOR, can improve on LDA alone.
To evaluate an individual's risk of depression, Nadeem et al. [16] used a set of 2.5 million tweets to build a classifier, treating the task as a text classification problem on social media. Decision trees, logistic regression, a linear support vector classifier, and Naive Bayes were all used. With a ROC AUC score of 0.94, a precision of 82%, and an accuracy of 86%, the Naive Bayes algorithm surpassed all other classifiers and was therefore considered the best model for predicting a user's mental health condition.
Sonawane et al. [21] developed a web application that takes social media posts and questionnaire tests as input and predicts the user's depression level. They used the Naive Bayes (NB) classifier. The system can identify whether the user is stressed based on the user's Facebook posts and a set of questionnaires supported by the system, and it then suggests an appropriate doctor near the user's location.
Tadesse et al. [22] studied users' posts from Reddit to discover depressive traits in online users. They applied machine learning methods and natural language processing to classify the text and built a glossary of the words most commonly used in depressive accounts. To extract features from users' linguistic usage in their posts, they employed the LIWC dictionary, LDA topics, and N-gram features. The proposed framework was developed using logistic regression, SVM, random forest, adaptive boosting, and multilayer perceptron classifiers. The results showed that the MLP classifier with LIWC + LDA + bigram features reached 91% accuracy, the highest result for determining the presence of depression on Reddit. Among the single feature sets, bigram features with the SVM classifier performed best, with 80% accuracy.
Li et al. [12] analyzed a collection of 15,879 posts from Weibo, a Chinese social media network. Simple logistic regression, random forest, support vector machines, and multilayer perceptron neural networks were used in the study. Two classification tasks were defined based on linguistic characteristics: one to distinguish posts with depression stigma from posts without it (stigma/non-stigma) and another to distinguish among posts expressing three different types of depression stigma (unpredictability/weakness/false illness). The results showed that random forest achieved the highest F-measure, 75.2% for distinguishing stigma from non-stigma and 86.2% for distinguishing among the three types of depression stigma. Because the coefficients of the predictors in three of the four models (the SVM, RF, and MLPNN models) could not clearly indicate relationships between linguistic features and the stigma of depression, they estimated the coefficients of the indicators in the simple logistic regression (SLR) model to examine these relationships.
Methodology
In this section, we discuss the methodology of the model and its phases. We build a model that identifies and classifies Arabic tweets based on depression attributes selected by health professionals, and we then apply classification techniques so that tweets are classified according to the depression attributes. Classification methods such as decision trees, random forests, k-nearest neighbor (KNN), multinomial Naive Bayes (MNB), and support vector machines (SVMs) are suitable for this kind of classification. As illustrated in Figure 1, the model contains the following phases:
Data Collection
We collect a set of 3,424 tweets from the Twitter social network. A connection to the Twitter API is created to collect Arabic tweets, which we use to explore and detect depressive behavior. The dataset covers the Arabic language and Saudi dialects. The collected tweets must contain some kind of sad or depressed feeling, and the goal of our project is to extract valuable information from these tweets to identify depression in Twitter users.
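A minimal collection sketch in Python using the tweepy library (v4-style API) is shown below. The credentials, query terms, tweet count, and output file are placeholders and assumptions for illustration; they are not the exact configuration used in this study.

```python
# Illustrative Twitter API collection sketch (tweepy v4). Credentials and the
# Arabic query terms below are hypothetical placeholders, not this study's setup.
import csv
import tweepy

auth = tweepy.OAuth1UserHandler(
    "CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET"
)
api = tweepy.API(auth, wait_on_rate_limit=True)

# Hypothetical Arabic search terms related to sadness/depression, excluding retweets.
query = "حزين OR اكتئاب -filter:retweets"

with open("tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "created_at", "text"])
    for tweet in tweepy.Cursor(
        api.search_tweets, q=query, lang="ar", tweet_mode="extended"
    ).items(3424):
        writer.writerow([tweet.id, tweet.created_at, tweet.full_text])
```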
Text Preprocessing
Several preprocessing steps must be performed on the collected tweets to make the depression detection process more effective. These steps are as follows [2] (an illustrative sketch of the pipeline is given after the list):
- Data cleaning: missing and noisy data are handled, and symbols such as exclamation marks, punctuation, digits, and hashtags (!, $, %, &, #, etc.) are removed. In addition, misspellings are corrected, repeated letters are removed, and non-Arabic tweets are discarded.
- Stop word removal: all forms of Arabic stop words are removed. Some stop words help convey the full meaning of the tweet, while others are just additional characters that need to be removed.
- Normalization: the tweets must be normalized. For example, variant forms of the letter alef, such as (أ), can appear at the beginning of a word and are normalized to the bare form (ا).
- Tokenization: tokenization is an important process that reduces the typographical variation of words and is required for feature extraction. Dealing with the Arabic language requires a component that uses a feature dictionary to convert the words into feature vectors, so that each feature index (word) in the vocabulary is associated with its frequency in the entire training set.
- Stemming: the words in tweets are stemmed by eliminating attached suffixes, prefixes, and infixes. This reduces derived or inflected words to their stem, base, or root form to improve the classification process; for instance, different derived forms of a word are mapped to a single root.
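The sketch below illustrates these steps in Python. The regular expressions, the deliberately small stop word list, and the choice of NLTK's ISRI stemmer are assumptions made for illustration only; they are not the exact pipeline used in this study.

```python
# Illustrative preprocessing sketch for Arabic tweets (assumptions noted above).
import re
from nltk.stem.isri import ISRIStemmer

# Hypothetical, deliberately small stop word list; a real list would be larger.
ARABIC_STOP_WORDS = {"في", "من", "على", "عن", "الى"}
stemmer = ISRIStemmer()

def clean(text):
    text = re.sub(r"http\S+|@\w+|#", " ", text)        # URLs, mentions, hashtag signs
    text = re.sub(r"[^\u0600-\u06FF\s]", " ", text)     # keep Arabic letters only
    text = re.sub(r"(.)\1{2,}", r"\1", text)            # collapse repeated letters
    return text

def normalize(text):
    text = re.sub(r"[أإآ]", "ا", text)                  # unify alef variants
    text = re.sub(r"ى", "ي", text)                      # unify alef maqsura / ya
    text = re.sub(r"ة", "ه", text)                      # unify ta marbuta / ha
    return text

def preprocess(tweet):
    tokens = normalize(clean(tweet)).split()            # whitespace tokenization
    tokens = [t for t in tokens if t not in ARABIC_STOP_WORDS]
    return [stemmer.stem(t) for t in tokens]            # reduce to stem/root form
```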
Feature Extraction
After text preprocessing, the collected dataset is used to extract the attributes or features that will be used to train our classifier model. Feature selection helps to increase classification accuracy by removing infrequent terms from both the training and testing sets of tweets. Feature extraction aims to capture the important content of a tweet, extracting the feature words that carry the user's message, regardless of whether the tweet is depressive or not [2]. Machine learning algorithms require a representation of the key attributes or features of the data; this reduces the dimensionality of the feature space and enhances the performance of the classifiers. General feature selection and extraction methods may also be used for outlier detection [8]. However, features are not always easy to extract.
The selection of features or attributes related to depression is the most important phase in constructing the classification model. The symptoms were selected based on surveys and interviews with experts in the field of depression to identify the attributes needed for classifying depression. An example displaying the final set of attributes is presented in Table 1 below, with translations of the Arabic and Saudi dialect tweets [3].
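As an illustration, bag-of-words or TF-IDF features can be extracted with scikit-learn; the toy corpus and parameter values below (for example, min_df for dropping infrequent terms and ngram_range for adding bigrams) are assumptions for the sketch, not the settings used in this work.

```python
# Illustrative feature extraction with scikit-learn (assumed parameter values).
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for the preprocessed tweets.
texts = ["حزن شديد وضيق", "يوم جميل وسعيد", "تعب وقلة نوم"]

vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=1)
X = vectorizer.fit_transform(texts)          # sparse document-term matrix
print(X.shape, len(vectorizer.vocabulary_))  # documents x features, vocabulary size
```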
Machine Learning Classification Algorithms
In supervised machine learning, we define the classes in advance, annotate the tweets, and label them to train the classifier. After the data collection process was completed, we split the dataset into 80% training and 20% testing sets. The training set contains input features and their corresponding class labels; using it, the classification model is developed, which attempts to map the input features onto the corresponding class labels. The model is then validated by predicting the class labels of unseen examples in the test set.
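A minimal sketch of this split with scikit-learn is shown below; the random toy data stands in for the vectorized tweets and their labels, and the stratification and fixed random seed are added assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for the feature matrix (X) and the annotated labels (y).
rng = np.random.default_rng(0)
X = rng.random((100, 20))
y = rng.integers(0, 2, size=100)

# 80% training / 20% testing split, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```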
In this work, we applied supervised machine learning algorithms such as support vector machines (SVMs), k-nearest neighbor (KNN), multinomial Naive Bayes (MNB), decision trees, and random forests, which are suitable for this kind of classification.
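Continuing the sketch above (reusing X_train, X_test, y_train, and y_test from the split), the five classifiers can be instantiated and fitted with scikit-learn as follows; apart from k = 3 for KNN, which is stated later in this section, the hyperparameters are library defaults or illustrative assumptions.

```python
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

classifiers = {
    "SVM": SVC(kernel="linear"),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "MNB": MultinomialNB(),          # expects non-negative count/TF-IDF features
    "DT": DecisionTreeClassifier(random_state=42),
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, "test accuracy:", clf.score(X_test, y_test))
```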
Multinomial Naive Bayes (MNB) is a specialized variant of Naive Bayes (NB) designed for text documents; it takes into account how often words occur in the training documents of each class. Whereas simple Naive Bayes models a document only through the presence or absence of particular words, multinomial Naive Bayes explicitly models word counts [15] and thus uses word frequency information in documents [14].
Support vector machines (SVMs) are a prediction model that uses features or variables extracted from data; the features are treated as independent variables in an algorithm that forecasts a dependent outcome variable of interest [6]. The SVM classifier is a non-probabilistic binary classifier that finds the separating plane between two classes with maximal margin [17]. Unlike Naive Bayes, the SVM is a non-probabilistic learning algorithm: it represents the features as points in space and assigns each point to one of the classes [2]. The SVM classifier uses a large margin for classification and separates the tweets using a hyperplane. In linear classification, SVM creates the hyperplane that divides the data into two sets with the maximum margin; a maximum-margin hyperplane is one whose distances to the nearest points on the two sides are equal [1].
K-nearest neighbor (KNN) is considered the simplest and most fundamental classification method when there is little or no prior knowledge about the data distribution. This rule simply keeps the whole training dataset during learning and assigns a class to each query based on the majority label of its k nearest neighbors in the training set. Majority voting among the data records in the neighborhood is usually used to determine the classification of a data record t, with or without distance-based weighting [9]. However, to implement KNN, we have to choose an appropriate value for k, and the success of classification depends greatly on this value; the KNN method is therefore biased by the choice of k [10], [7]. In our model, we chose k = 3.
Decision trees are commonly applied in machine learning techniques. They simply create a sequence of carefully designed questions in an attempt to classify the data. Decision trees are classifiers that forecast class labels for data items. Many scientific problems require labeling data items using a specific class based on data item features. Decision trees are built by analyzing a training dataset for which the class labels are known. Then, they are used to classify previously unseen data examples. If decision trees are trained with high-quality data, they can provide highly accurate predictions [11].
In the random forest technique, several decision trees are built by a randomized tree-building algorithm, and the predictions of the resulting group of decision trees are combined by taking the most common prediction among the trees. Maintaining a set of good hypotheses, rather than committing to a single tree, reduces the chance that a new example is misclassified because many of the trees assign it the incorrect class [11].
Evaluation Results
In this section, we discuss the testing of the depression detection model using standard text classification evaluation metrics: accuracy, precision, recall (sensitivity), and F-score.
The following are the most commonly used measures for binary classification based on the confusion matrix values [19], [20] (the corresponding formulas are given after the list):
Accuracy: it approximates the probability of predicting the true value of the class label and evaluates the overall effectiveness of the algorithm.
Precision: the number of positive examples correctly classified divided by the total number of examples classified as positive; it estimates the predictive value of a label for the class on which it is calculated and evaluates the predictive power of the algorithm.
Recall (sensitivity): the number of positive examples correctly classified divided by the total number of positive examples in the data; it evaluates the effectiveness of the algorithm on a single class.
F-score or F-measure: a combination of precision and recall that expresses the relationship between the positive labels in the data and those given by the classifier. The F-measure is a composite measure that benefits algorithms with higher sensitivity and challenges algorithms with higher specificity; it is evenly balanced when β = 1.
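For reference, these measures can be written in terms of the confusion matrix counts, where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives:

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad
\text{Precision} = \frac{TP}{TP + FP}, \quad
\text{Recall} = \frac{TP}{TP + FN}
\]
\[
F_\beta = \frac{(\beta^2 + 1)\,\text{Precision} \cdot \text{Recall}}{\beta^2\,\text{Precision} + \text{Recall}}, \qquad
F_1 = \frac{2\,\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]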
Table 2 and Figure 2 illustrate the evaluation results of the applied algorithms, support vector machine (SVM), k-nearest neighbor (KNN), multinomial Naive Bayes (MNB), decision tree (DT), and random forest (RF), used to select the best algorithm for detecting depression tweets.
Figure 3 shows that the k-nearest neighbor (KNN) classifier has the highest accuracy at 87.70%, followed by support vector machine (SVM) at 87.50% and random forest (RF) at 85.30%. Based on the results, the k-nearest neighbor (KNN) classifier outperformed the other classification techniques in almost all measures, followed by support vector machine (SVM) and random forest (RF).
Conclusion
We built a model that classifies Arabic tweets based on depression attributes selected by health professionals. We applied classification techniques in which tweets are classified according to the depression attributes, and we evaluated the accuracy of the applied supervised machine learning techniques to select the best algorithm for our model. The evaluation results showed that the k-nearest neighbor (KNN) classifier outperformed the other classification techniques in almost all measures, followed by support vector machine (SVM) and random forest (RF).
In future work, we hope to understand how social media behavior analysis can help develop widely scalable techniques for automated public health tracking in the Arab world. We are also interested in using the potential of social media platforms to accurately track mental health in the Arabic-speaking population and to identify suffering individuals so that prevention can be targeted and awareness messages sent by doctors to enhance mental health. Furthermore, we are interested in applying more evolutionary techniques to Arabic text to extract more emotional features related to the depression attributes, which could help improve the results. More Arabic datasets are also needed to verify the effectiveness and efficiency of our model, given the complexity of the Arabic language and the lack of resources and tools available for extracting Arabic sentiments; in addition, Arabic dialects change over time and do not follow the formal grammatical structure of Modern Standard Arabic.
We believe that this research can be used by doctors and health care agencies to help in diagnosis, provide help to depressed Twitter users, and identify suffering individuals in order to target prevention and send awareness messages to enhance mental health.
Conflict of Interest:
The authors declare that they have no conflicts of interest. The authors thank the Deanship of Scientific Research and RSSU at King Saud University for their technical support.

References

1. Alsaleem, Saleh (2011) Automated Arabic Text Categorization Using SVM and NB. Int. Arab J. e-Technol. 2: 124-8.
2. Atoum, Jalal Omer, Mais Nouman (2019) Sentiment analysis of Arabic Jordanian dialect tweets. Int. J. Adv. Comput. Sci. Appl. 10: 256-62.
3. Daimi, Kevin, Shadi Banitaan (2014) Using data mining to predict possible future depression cases. International Journal of Public Health Science (IJPHS) 3: 231-40.
4. De Choudhury, Munmun, Michael Gamon, Scott Counts, Eric Horvitz (2013) Predicting depression via social media. In Seventh International AAAI Conference on Weblogs and Social Media.
5. Elbagir, Shihab, Jing Yang (2018) Sentiment Analysis of Twitter Data Using Machine Learning Techniques and Scikit-learn. In Proceedings of the 2018 International Conference on Algorithms, Computing and Artificial Intelligence 1-5.
6. Guntuku, Sharath Chandra, David B. Yaden, Margaret L. Kern, Lyle H. Ungar, Johannes C. Eichstaedt (2017) Detecting depression and mental illness on social media: an integrative review. Current Opinion in Behavioral Sciences 18: 43-9.
7. Guo, Gongde, Hui Wang, David Bell, Yaxin Bi, Kieran Greer (2003) KNN model-based approach in classification. In OTM Confederated International Conferences "On the Move to Meaningful Internet Systems". Springer, Berlin, Heidelberg 986-96.
8. Han, Jiawei, Jian Pei, Micheline Kamber (2011) Data Mining: Concepts and Techniques. Elsevier.
9. Imandoust, Sadegh Bafandeh, Mohammad Bolandraftar (2013) Application of k-nearest neighbor (KNN) approach for predicting economic events: theoretical background. International Journal of Engineering Research and Applications 3: 605-10.
10. Islam (2018) Depression detection from social network data using machine learning techniques. Health Information Science and Systems 6: 8.
11. Kingsford, Carl, Steven L Salzberg (2008) What are decision trees? Nature Biotechnology 26: 1011-3.
12. Li, Ang, Dongdong Jiao, Tingshao Zhu (2018) Detecting depression stigma on social media: a linguistic analysis. Journal of Affective Disorders 232: 358-62.
13. Lloyd, Seth, Masoud Mohseni, Patrick Rebentrost (2013) Quantum algorithms for supervised and unsupervised machine learning. arXiv preprint arXiv:1307.0411.
14. McCallum, Andrew, Kamal Nigam (1998) A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization 752: 41-8.
15. Mohana R, S Sumathi (2014) Document classification using Multinomial Naïve Bayesian Classifier. International Journal of Science, Engineering and Technology Research: 1557.
16. Nadeem, Moin (2016) Identifying depression on Twitter. arXiv preprint arXiv:1607.07384.
17. Nguyen (2017) Using linguistic and topic analysis to classify sub-groups of online depression communities. Multimedia Tools and Applications 76: 10653-76.
18. Resnik (2015) Beyond LDA: exploring supervised topic modeling for depression-related language in Twitter. In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality 99-107.
19. Sokolova, Marina, Guy Lapalme (2009) A systematic analysis of performance measures for classification tasks. Information Processing & Management 45: 427-37.
20. Sokolova, Marina, Nathalie Japkowicz, Stan Szpakowicz (2006) Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. In Australasian Joint Conference on Artificial Intelligence, Springer, Berlin, Heidelberg 1015-21.
21. Sonawane NP (2018) Predicting Depression Level Using Social Media Posts. IJIRSET 7: 6016-9.
22. Tadesse, Michael M, Hongfei Lin, Bo Xu, Liang Yang (2019) Detection of Depression-Related Posts in Reddit Social Media Forum. IEEE Access 7: 44883-93.
23. Tsugawa (2015) Recognizing depression from Twitter activity. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems 3187-96.