Sentiment analysis for Arabizi in social media. (c2015)

LAUR Repository

dc.contributor.author Tobaili, Taha
dc.date.accessioned 2015-11-27T11:10:31Z
dc.date.available 2015-11-27T11:10:31Z
dc.date.copyright 8/26/2015 en_US
dc.date.issued 2016-02-02
dc.identifier.uri http://hdl.handle.net/10725/2702
dc.description.abstract With the vast increase in social media users over the past few years, millions of product reviews are discussed and posted in online forums and on social media platforms such as Facebook and Twitter. Sentiment analysis and opinion mining have many applications: governments and stock market observers use social media data to study public opinion and predict election results or stock fluctuations, and companies use it to collect feedback on their product releases. Filling out rating surveys is no longer efficient when there is a freely growing database full of the public’s opinion. It is therefore natural to use the textual data of social media to build automated software that predicts public sentiment; however, analyzing informal languages is a challenge. Most sentiment analysis research is currently conducted on formal English, and one major challenge is applying sentiment analysis techniques to other languages. With approximately four million tweets posted daily in several Arabizi dialects (informal Arabic written with Latin letters and numerals, e.g. Yalla 7abibi), a data mining tool that can analyze the sentiment of Twitter users in the Arab world would be very useful. We took the initiative to make use of this abundance of data by analyzing it and predicting its sentiment. Applying the sentiment analysis techniques used for English to Arabic is not a simple task because of the semantic and structural differences between the two languages, and because Arabic is a morphologically rich language. Informal Arabic lacks standardization and has no formal grammar, so sentiment analysis in this area is considered a complex process. Sentiment analysis has been studied for MSA (Modern Standard Arabic) but rarely for informal Arabic, and not at all for Arabizi, even though most young people in Lebanon text in Arabizi, claiming that it is easier than texting in Arabic. 
The prevalence of this expanding linguistic trend motivated us to take on this NLP challenge. In this study, we created a novel lexicon of around 10,000 informal opinion words, using regular expressions to match over 50,000 words. We also created an algorithm that lemmatizes Arabizi words and classifies input sentences into positive, negative, or neutral categories. We collected around 400,000 lines of Arabizi data from WhatsApp, Facebook, and Twitter, filtered them, and tested a small sample with our classifier, achieving 80% classification accuracy. The dialect chosen for the lexicon is Lebanese, our native language. en_US
dc.language.iso en en_US
dc.subject Data mining -- Analysis en_US
dc.subject Public opinion -- Data processing en_US
dc.subject Natural language processing (Computer science) en_US
dc.subject Arabic language -- Lexicology -- Data processing en_US
dc.subject Web 2.0 -- Terminology en_US
dc.subject Dissertations, Academic en_US
dc.subject Lebanese American University -- Dissertations en_US
dc.title Sentiment analysis for Arabizi in social media. (c2015) en_US
dc.type Thesis en_US
dc.term.submitted Summer II en_US
dc.author.degree MS in Computer Science en_US
dc.author.school SAS en_US
dc.author.idnumber 201204908 en_US
dc.author.commembers Hajj, Hazem
dc.author.commembers Tarhini, Abbas
dc.author.woa OA en_US
dc.author.department Computer Science and Mathematics en_US
dc.description.embargo N/A en_US
dc.description.physdesc 1 hard copy: x, 65 leaves; ill., col. map; 30 cm. available at RNL. en_US
dc.author.advisor Sharafeddine, Sanaa
dc.keywords Sentiment Analysis en_US
dc.keywords Natural Language Processing en_US
dc.keywords Lexical-Based Classification en_US
dc.description.bibliographiccitations Bibliography: leaves 54-56. en_US
dc.identifier.doi https://doi.org/10.26756/th.2015.27 en_US
dc.publisher.institution Lebanese American University en_US
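The abstract describes three components: an opinion lexicon whose entries are regular expressions covering many spellings of each word, a lemmatization step for Arabizi words, and a classifier that labels sentences as positive, negative, or neutral. The Python sketch that follows illustrates that general lexicon-based approach under stated assumptions; it is not the thesis's code or lexicon. The four lexicon entries, the helper names normalize, score, and classify, the repeated-letter collapsing used as a crude stand-in for real lemmatization, and the sum-of-polarities decision rule are all hypothetical choices made for illustration.

```python
import re

# Hypothetical mini-lexicon (illustrative assumption, NOT the thesis lexicon):
# each regex is meant to cover several spellings of one Lebanese Arabizi
# opinion word, paired with a polarity of +1 (positive) or -1 (negative).
LEXICON = [
    (re.compile(r"(7|h)[ei]lo+"), 1),      # "7elo", "7ilo", "helo": nice
    (re.compile(r"mn(i|ee)7"), 1),         # "mni7", "mnee7": good
    (re.compile(r"ba(ch|sh)[ai]3"), -1),   # "bachi3", "basha3": ugly
    (re.compile(r"zbel[ea]"), -1),         # "zbele", "zbela": garbage
]


def normalize(token: str) -> str:
    """Lowercase a token, strip surrounding punctuation, and collapse any
    character repeated 3+ times (e.g. "7elooooo" -> "7elo").
    Hypothetical stand-in for a real Arabizi lemmatizer."""
    token = token.lower().strip(".,!?;:\"'()")
    return re.sub(r"(.)\1{2,}", r"\1", token)


def score(sentence: str) -> int:
    """Sum the polarities of all tokens that fully match a lexicon pattern."""
    total = 0
    for raw in sentence.split():
        word = normalize(raw)
        for pattern, polarity in LEXICON:
            if pattern.fullmatch(word):
                total += polarity
                break  # count each token at most once
    return total


def classify(sentence: str) -> str:
    """Map the summed score to one of the three sentiment categories."""
    s = score(sentence)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in ["el film 7elooooo ktir!", "l akel kan bachi3", "ana bel beit"]:
        print(text, "->", classify(text))
```

Summing polarities is the simplest possible decision rule; it ignores negation and intensifiers such as "ktir" (very), which a practical Arabizi classifier would likely need to handle.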
