Pioneering studies on opinion mining and sentiment analysis

Opinion mining and sentiment analysis
 
   Opinion mining and sentiment analysis, a practical application of natural language processing (NLP), has attracted attention for more than a decade because a large amount of opinionated information is generated and disseminated across various internet platforms. On the one hand, people are willing to share their feelings and thoughts online; on the other hand, people often consult majority opinions before making decisions. The Natural Language Processing Laboratory led by Professor Hsin-Hsi Chen investigates this topic from several angles, including opinion retrieval, extraction, summarization, tracking, and question answering [1]. Figure 1 shows simultaneous opinion tracking for four persons in the 2000 presidential election in Taiwan [2]. Figure 2 plots trends in the semiconductor industry over time based on opinions mined from the web [3].
 
 
Figure 1. Opinions about four persons in a presidential election.
 
 
 
Figure 2. Opinions of industry trends.
 
Fundamental technology breakthroughs
 
   Opinionated information may be expressed at different granularities, from words, clauses, sentences, and single documents up to multiple documents. A dictionary is a fundamental lexical resource for NLP applications. We released the National Taiwan University Sentiment Dictionary (NTUSD), the first Chinese dictionary for sentiment analysis [2]. We also addressed the effects of domain-dependent terms in sentiment analysis [4]: a term may carry a polarity only within a particular domain. For example, the name of a highly ranked entity in the restaurant domain, such as “鼎泰豐” (Din Tai Fung), becomes a positive term in that domain. We further extended NTUSD to NTUSD-Fin [5] to cover market sentiment in financial social media data.
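
   To make the idea of domain-dependent polarity concrete, the following is a minimal sketch, assuming a toy general lexicon and a hypothetical per-domain override table. The entries, the names `GENERAL_LEXICON`, `DOMAIN_LEXICONS`, and `word_polarity`, and the lookup order are illustrative assumptions; they are not the NTUSD file format or the estimation method of [4].

```python
from typing import Dict, Optional

# Toy general lexicon (+1 positive, -1 negative). These entries are
# illustrative only and are NOT copied from the released NTUSD files.
GENERAL_LEXICON: Dict[str, int] = {
    "好": 1,     # good
    "年輕": 1,   # young
    "難吃": -1,  # tastes bad
}

# Hypothetical domain lexicons holding polarities that apply only in one domain,
# e.g. terms whose polarity was estimated from restaurant reviews.
DOMAIN_LEXICONS: Dict[str, Dict[str, int]] = {
    "restaurant": {"鼎泰豐": 1},  # a highly ranked restaurant name acts as a positive term
}


def word_polarity(word: str, domain: Optional[str] = None) -> int:
    """Return the polarity of `word`, letting a domain lexicon override the general one."""
    if domain is not None:
        domain_lexicon = DOMAIN_LEXICONS.get(domain, {})
        if word in domain_lexicon:
            return domain_lexicon[word]
    return GENERAL_LEXICON.get(word, 0)  # 0 means neutral or unknown


print(word_polarity("鼎泰豐"))                # 0: unknown in the general lexicon
print(word_polarity("鼎泰豐", "restaurant"))  # 1: positive in the restaurant domain
print(word_polarity("好", "restaurant"))      # 1: falls back to the general lexicon
```

   Consulting the domain table first means a general lexicon can stay domain-neutral while each application supplies its own overrides.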
 
   In the lexicon-based approach, the polarity of a sentence is determined by the sentiments of the words it contains. In the sentence “到地鐵出入口僅十米,地段好” (Only ten meters to the subway entrance, good location), “好” (good) is a positive word modifying “地段” (location), so the sentence expresses a positive review. However, people do not always use overt words to express their opinions. Implicit opinions contain no overt opinion words on which a polarity decision can rely. The sentences “附近有很多餐廳” (There are many restaurants nearby) and “房間裡有很多螞蟻” (There are many ants in the room) express implicitly positive and negative opinions, respectively.
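
   The lexicon-based approach and its blind spot for implicit opinions can be sketched as follows. This is a minimal illustration, assuming a toy lexicon and a greedy longest-match scan that stands in for proper Chinese word segmentation; the names `LEXICON`, `find_opinion_words`, and `sentence_polarity` are hypothetical.

```python
from typing import Dict, List

# Toy lexicon for illustration; a real tagger would load a resource such as NTUSD.
LEXICON: Dict[str, int] = {"好": 1, "棒": 1, "不好": -1, "差": -1, "髒": -1}
MAX_WORD_LEN = max(len(w) for w in LEXICON)


def find_opinion_words(sentence: str) -> List[str]:
    """Scan left to right, greedily matching the longest lexicon entry at each position."""
    found: List[str] = []
    i = 0
    while i < len(sentence):
        for length in range(min(MAX_WORD_LEN, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            if candidate in LEXICON:
                found.append(candidate)
                i += length
                break
        else:
            i += 1  # no lexicon entry starts here; move to the next character
    return found


def sentence_polarity(sentence: str) -> int:
    """Sum word polarities: >0 positive, <0 negative, 0 no overt opinion found."""
    return sum(LEXICON[w] for w in find_opinion_words(sentence))


print(sentence_polarity("到地鐵出入口僅十米,地段好"))  # 1: overt positive word 好
print(sentence_polarity("附近有很多餐廳"))            # 0: implicit positive is missed
print(sentence_polarity("房間裡有很多螞蟻"))          # 0: implicit negative is missed
```

   Longest match lets “不好” (not good) win over the embedded “好”, but no lexicon lookup can recover the implicit opinions, which is why they need separate treatment [8][9].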
 
   Even when overt opinion words appear, the actual meaning may not follow from their surface forms. In the sentence “點餐都要等半小時,服務還真是好阿” (I have to wait half an hour just to order. The service is really good), “好” (good) is a positive word modifying “服務” (service). However, the sentence is negative: the context states that the customer must wait a long time before even ordering at the restaurant. Here, an ironic expression implies the opposite of its literal meaning, which causes problems for opinion mining and sentiment analysis. We also find examples whose arguments are pragmatically opposite, which is challenging for a lexicon-based sentiment tagger. For instance, both arguments in the sentence “他很年輕,但已經是世界上最棒的足球運動員之一” (He is young, but he is already one of the best soccer players in the world) are semantically positive. Yet the connective “但” (but) signals a contrast between them: the adjective “年輕” (young), which is defined as positive in NTUSD, may imply a lack of experience or skill when it describes an athlete.
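
   One way a lexicon-based tagger could at least notice the pragmatically opposite case is to check whether a contrastive connective joins two clauses whose lexical polarities agree. The sketch below is a hypothetical heuristic, not the discourse-marker method of [6] or the irony analysis of [7]; the toy lexicon, the marker list, and the function names are assumptions, and a fuller version would also exclude non-contrastive uses such as “不但” (not only).

```python
from typing import Dict, Optional, Tuple

# Toy lexicon (illustrative, not NTUSD); 年輕 is positive, as in the text.
LEXICON: Dict[str, int] = {"年輕": 1, "最棒": 1, "好": 1, "差": -1}
# Longer markers first so that 但是 is not split as 但 + 是.
CONTRAST_MARKERS: Tuple[str, ...] = ("但是", "可是", "但")


def clause_polarity(clause: str) -> int:
    """Naive lexicon score: add each entry's polarity once per occurrence."""
    return sum(score * clause.count(word) for word, score in LEXICON.items())


def contrast_conflict(sentence: str) -> Optional[Tuple[int, int]]:
    """Return (left, right) clause scores if a contrastive marker joins two
    clauses with the same lexical polarity sign; otherwise return None.

    Agreement is suspicious because 'but' normally links opposing arguments,
    so one argument probably carries a pragmatically opposite sense.
    """
    for marker in CONTRAST_MARKERS:
        left, found, right = sentence.partition(marker)
        if found:
            left_score, right_score = clause_polarity(left), clause_polarity(right)
            if left_score * right_score > 0:
                return left_score, right_score
            return None
    return None


print(contrast_conflict("他很年輕,但已經是世界上最棒的足球運動員之一"))  # (1, 1): flagged
print(contrast_conflict("他很年輕,但經驗很差"))  # None: the clauses already contrast
```

   The check leaves the ironic restaurant sentence unflagged, because that sentence has no contrastive marker; irony needs other cues.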
 
   In a series of pioneering studies on opinion mining and sentiment analysis, we addressed basic opinion mining technologies and potential applications [2][3], web-based analysis of Chinese discourse markers for opinion mining [6], Chinese irony corpus construction and ironic structure analysis [7], implicit opinion analysis [8], implicit polarity and implicit aspect recognition [9], and identification of false-alarm hashtag usage via a learning-based sentiment analyzer [10]. Moreover, we organized a multilingual opinion analysis task in NTCIR, an international evaluation of information access technologies [11], where we defined the problems, constructed evaluation datasets, and led researchers in this research direction.
 
Fake opinion detection
 
   Opinions are useful, but fake opinions may mislead people. To influence customers’ buying decisions, fake opinions are produced to promote particular targets and/or denounce their competitors. Filtering out untruthful information has therefore become an important issue in opinion mining. We explored the characteristics of opinion spam and spammers in a web forum to gain insights, and presented features that could help detect spam opinions in threads [12]. In addition, we explored the detection of opinion spammers themselves [13]. The methodology was extended to analyze the behavior of cyber armies on web forums during election campaigns [14] and to identify paid reviews and paid writers [15].
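
   Spammer detection typically starts from per-account behavioural features. The sketch below computes three generic signals (share of posts about a promoted target, share of duplicated text, and mean interval between posts) from toy data; these features, the `Post` record, and `account_features` are hypothetical illustrations and are not the feature set reported in [12] or [13].

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Post:
    author: str
    text: str
    time: datetime
    mentions_target: bool  # does the post mention the product or brand being promoted?


def account_features(posts: List[Post]) -> Dict[str, Optional[float]]:
    """Compute illustrative behavioural features from one account's posts."""
    if not posts:
        raise ValueError("the account has no posts")
    ordered = sorted(posts, key=lambda p: p.time)
    n = len(ordered)
    # Share of posts that talk about the promoted target.
    target_ratio = sum(p.mentions_target for p in ordered) / n
    # Share of posts that repeat text the same account already posted.
    duplicate_ratio = 1 - len({p.text for p in ordered}) / n
    # Mean gap between consecutive posts in hours; small gaps suggest bursty posting.
    gaps = [(b.time - a.time).total_seconds() / 3600 for a, b in zip(ordered, ordered[1:])]
    mean_gap_hours = mean(gaps) if gaps else None
    return {
        "target_ratio": target_ratio,
        "duplicate_ratio": duplicate_ratio,
        "mean_gap_hours": mean_gap_hours,
    }


# Toy data for a single hypothetical account.
start = datetime(2015, 1, 1, 12, 0)
posts = [
    Post("user_a", "Great phone, buy it now!", start, True),
    Post("user_a", "Great phone, buy it now!", start + timedelta(hours=1), True),
    Post("user_a", "Best camera I have ever used.", start + timedelta(hours=2), True),
]
print(account_features(posts))  # target_ratio 1.0, duplicate_ratio about 0.33, mean_gap_hours 1.0
```

   Vectors like these would then be fed to a classifier trained on accounts labelled as spammers or genuine users.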
 
   False statements and exaggerated content can be observed in online advertisements, and such statements can also be regarded as opinion spam. Most inappropriate food-related advertisements contain overstated health claims, and medical effects and curative claims may also appear in cosmetic advertising. Authorities, advertisers, websites, and consumers all need to detect illegal advertising quickly and automatically. We proposed methods for detecting false online advertisements [16][17].
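
   A simple keyword baseline shows what detecting overstated health claims involves, and why learned models are attractive. The patterns and the `flag_advertisement` function below are hypothetical assumptions for illustration; this is neither the legality recognition approach of [16] nor the DCNN model of [17].

```python
import re
from typing import List

# Hypothetical patterns for overstated health or curative claims. A deployed
# system would derive them from regulations or learn them from labelled ads.
SUSPICIOUS_PATTERNS = [
    re.compile(r"治療|治癒|根治"),     # treat / cure / cure completely
    re.compile(r"100\s*%|百分之百"),  # absolute guarantees
    re.compile(r"降血壓|降血糖"),      # lowers blood pressure / blood sugar
]


def flag_advertisement(text: str) -> List[str]:
    """Return every phrase in `text` that matches a suspicious-claim pattern."""
    return [m.group(0) for pattern in SUSPICIOUS_PATTERNS for m in pattern.finditer(text)]


print(flag_advertisement("本產品能根治高血壓,效果100%"))  # ['根治', '100%']
print(flag_advertisement("新鮮水果,當天直送"))            # []
```

   Fixed patterns are easy to audit but miss paraphrased claims and can fire on legitimate medical text, which motivates trained classifiers.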
 
References
1. Ku, L. W., Liang, Y. T., & Chen, H. H. (2006). Opinion extraction, summarization and tracking in news and blog corpora. In Proceedings of AAAI-2006 Spring Symposium on Computational Approaches to Analyzing Weblogs (pp. 100-107). Stanford University, California, USA.
2. Ku, L. W., & Chen, H. H. (2007). Mining opinions from the Web: Beyond relevance retrieval. Journal of the American Society for Information Science and Technology, 58(12), 1838-1850.
3. Ku, L. W., Ho, H. W., & Chen, H. H. (2009). Opinion mining and relationship discovery using CopeOpi opinion analysis system. Journal of the American Society for Information Science and Technology, 60(7), 1486-1503.
4. Yu, H. C., Huang, T. H., & Chen, H. H. (2012). Domain dependent word polarity analysis for sentiment classification. International Journal of Computational Linguistics and Chinese Language Processing, 17(4), 33-48.
5. Chen, C. C., Huang, H. H., & Chen, H. H. (2018). NTUSD-Fin: a market sentiment dictionary for financial social media data applications. In Proceedings of the First Financial Narrative Processing Workshop. Miyazaki, Japan.
6. Huang, H. H., Yu, C. H., Chang, T. W., Lin, C. K., & Chen, H. H. (2014). Web-based analysis of Chinese discourse markers for opinion mining. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (pp. 100-107). Warsaw, Poland.
7. Tang, Y. J., & Chen, H. H. (2014). Chinese irony corpus construction and ironic structure analysis. In Proceedings of the 25th International Conference on Computational Linguistics (pp. 1269-1278). Dublin, Ireland.
8. Huang, H. H., Wang, J. J., & Chen, H. H. (2017). Implicit opinion analysis: extraction and polarity labelling. Journal of the Association for Information Science and Technology, 68(9), 2076-2087.
9. Chen, H. Y., & Chen, H. H. (2016). Implicit polarity and implicit aspect recognition in opinion mining. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (pp. 20-25). Berlin, Germany.
10. Huang, H. H., Chen, C. C., & Chen, H. H. (2018). Disambiguating false-alarm hashtag usages in tweets for irony detection. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Melbourne, Australia.
11. Seki, Y., Evans, D. K., Ku, L. W., Sun, L., Chen, H. H., & Kando, N. (2008). Overview of multilingual opinion analysis task at NTCIR-7. In Proceedings of the 7th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access (pp. 185-203). Tokyo, Japan.
12. Chen, Y. R., & Chen, H. H. (2015). Opinion spam detection in Web forum: A real case study. In Proceedings of the 24th International World Wide Web Conference (pp. 173-183). Florence, Italy.
13. Chen, Y. R., & Chen, H. H. (2015). Opinion spammer detection in Web forum. In Proceedings of the 38th Annual ACM SIGIR Conference (pp. 759-762). Santiago, Chile.
14. Ko, M. C., & Chen, H. H. (2015). Analysis of cyber army’s behaviours on Web forum for elect campaign. In Proceedings of the Asia Information Retrieval Societies Conference (pp. 394-399). Brisbane, Australia.
15. Ko, M. C., Huang, H. H., & Chen, H. H. (2017). Paid review and paid writer detection. In Proceedings of the 2017 IEEE/WIC/ACM International Conference on Web Intelligence (pp. 637-645). Leipzig, Germany.
16. Tang, Y. J., Lin, C. K., & Chen, H. H. (2012). Advertising legality recognition. In Proceedings of the 24th International Conference on Computational Linguistics (pp. 1219-1228). Mumbai, India.
17. Huang, H. H., Wen, Y. W., & Chen, H. H. (2017). Detection of false online advertisements with DCNN. In Proceedings of the 26th International World Wide Web Conference (pp. 795-796). Perth, Australia.
 
Hsin-Hsi Chen
Professor, Department of Computer Science and Information Engineering
