A commonly used corpus: the IMDB Polarity Data 2.0.
Goal: polarity detection: is this review positive or negative?
Steps:
Tokenization
Feature Extraction
Classification using different classifiers
1.1 Sentiment Tokenization
Besides the issues ordinary tokenization has to deal with, such as handling HTML/XML markup, sentiment analysis may also need to handle things like emoticons and capitalization (words in ALL CAPS often signal emphasis).
Useful tokenizer code: Christopher Potts' sentiment-aware tokenizer and Brendan O'Connor's Twitter tokenizer (twokenize.py).
For feature extraction there are two important questions: first, how to handle negation; second, which words to use as features.
I didn't like this movie
I really like this movie
didn't like this movie, but I
=>
didn't NOT_like NOT_this NOT_movie but I
For details, see:
Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79–86.
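The NOT_ marking shown above can be sketched in a few lines of Python (a minimal sketch: the negation word list and punctuation set are simplified, and everything is lowercased):

```python
import re

def mark_negation(text):
    """Prepend NOT_ to every token between a negation word and the
    next punctuation mark (Das & Chen 2001; Pang et al. 2002)."""
    negations = {"not", "no", "never", "didn't", "don't", "isn't", "wasn't"}
    out, negating = [], False
    for token in re.findall(r"[\w']+|[.,!?;]", text.lower()):
        if token in {".", ",", "!", "?", ";"}:
            negating = False           # punctuation ends the negation scope
            out.append(token)
        elif negating:
            out.append("NOT_" + token)
        else:
            out.append(token)
            if token in negations:
                negating = True        # start marking following tokens
    return " ".join(out)

print(mark_negation("didn't like this movie, but I"))
# didn't NOT_like NOT_this NOT_movie , but i
```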
There are generally two options: use only adjectives as features, or use all words. Using all words usually works better, because verbs and nouns also provide useful information.
As the baseline model we use Naive Bayes. No surprises here; the computation is:
Prior: how likely we are to see a positive movie review at all.
Likelihood: for each review, how likely each word is to be generated by a positive (resp. negative) movie review.
Use Laplace/add-one smoothing for the likelihoods.
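A toy implementation of this computation, assuming documents come as (token list, label) pairs (the function names are mine, not from any paper):

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (tokens, label). Returns the class priors, a smoothed
    log-likelihood function, and the vocabulary."""
    labels = Counter(label for _, label in docs)
    prior = {c: n / len(docs) for c, n in labels.items()}
    counts = {c: Counter() for c in labels}
    for tokens, label in docs:
        counts[label].update(tokens)
    vocab = {w for c in counts for w in counts[c]}

    def loglik(w, c):
        # Laplace/add-one smoothing: (count(w,c) + 1) / (total(c) + |V|)
        return math.log((counts[c][w] + 1) /
                        (sum(counts[c].values()) + len(vocab)))

    return prior, loglik, vocab

def classify(tokens, prior, loglik, vocab):
    """Pick the class maximizing log P(c) + sum of log P(w|c)."""
    scores = {c: math.log(p) + sum(loglik(w, c) for w in tokens if w in vocab)
              for c, p in prior.items()}
    return max(scores, key=scores.get)
```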
A variant, or improvement, is Binarized (Boolean-feature) Multinomial Naive Bayes. The intuition: for sentiment analysis, whether a word occurs matters more than how many times it occurs. For example, one occurrence of "fantastic" already signals positive, while five occurrences of "fantastic" do not give us much more information. Boolean multinomial Naive Bayes simply clips all word counts greater than 1 down to 1.
The algorithm:
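The only change relative to standard multinomial Naive Bayes is clipping each document's word counts to 1 before training; a sketch:

```python
def binarize(docs):
    """Boolean multinomial NB preprocessing: each document contributes
    each word at most once, so per-document counts > 1 become 1."""
    return [(sorted(set(tokens)), label) for tokens, label in docs]

# Three occurrences of "fantastic" collapse to one:
print(binarize([(["fantastic", "fantastic", "fantastic", "plot"], "pos")]))
# [(['fantastic', 'plot'], 'pos')]
```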
Some work finds that the intermediate log(freq(w)) weighting works even better; relevant papers:
B. Pang, L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79–86.
V. Metsis, I. Androutsopoulos, G. Paliouras. 2006. Spam Filtering with Naive Bayes – Which Naive Bayes? CEAS 2006 – Third Conference on Email and Anti-Spam.
K.-M. Schneider. 2004. On word frequency information and negative evidence in Naive Bayes text classification. ICANLP, 474–485.
J. D. Rennie, L. Shih, J. Teevan. 2003. Tackling the poor assumptions of naive bayes text classifiers. ICML 2003.
Of course, in practice MaxEnt and SVM work considerably better than Naive Bayes.
1.4 Problems
Some sentences contain no sentiment words at all. The following review expresses a negative attitude, yet that cannot be derived from sentiment words:
「If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut.」
Another problem is order effects: a review may pile up positive sentiment words and then reverse everything at the end, for example praising the plot and the actors at length before closing with "however, it can't hold up". Naive Bayes, which ignores word order, cannot handle this.
Next, let's look at the sentiment lexicons that already exist (e.g. the General Inquirer, LIWC, MPQA, Bing Liu's Opinion Lexicon, SentiWordNet),
and at the disagreements between these polarity lexicons.
So how do we analyze the polarity of each word in IMDB?
How likely is each word to appear in each sentiment class?
Likelihood: P(w|c) = f(w, c) / Σ_w f(w, c)
To make values comparable between words, use the scaled likelihood: P(w|c) / P(w)
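The scaled likelihood can be computed with a small sketch (the count table below is invented for illustration):

```python
def scaled_likelihood(counts):
    """counts: {class: {word: freq}}. Returns {word: {class: P(w|c)/P(w)}},
    i.e. the class likelihood normalized by the word's overall probability,
    so values are comparable between frequent and rare words."""
    total = sum(f for c in counts.values() for f in c.values())
    p_w = {}
    for c in counts.values():
        for w, f in c.items():
            p_w[w] = p_w.get(w, 0) + f / total   # overall P(w)
    result = {}
    for c, wc in counts.items():
        n_c = sum(wc.values())
        for w, f in wc.items():
            result.setdefault(w, {})[c] = (f / n_c) / p_w[w]
    return result
```

A value above 1 for a class means the word is over-represented in that class.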
For more, see Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636–659.
2 Learning Sentiment Lexicons
Besides the existing lexicons, we can also train our own sentiment lexicon on our own corpus.
2.1 Semi-supervised learning of lexicons
Using a small amount of labeled data plus hand-built rules, bootstrap the learning of a lexicon.
Hatzivassiloglou and McKeown's intuition for identifying word polarity.
Paper: Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting the Semantic Orientation of Adjectives. ACL, 174–181.
The underlying assumption: adjectives conjoined by "and" tend to have the same polarity ("fair and legitimate"), while adjectives conjoined by "but" tend to have opposite polarity ("fair but brutal").
The method of the paper:
1. Label a seed set of 1336 adjectives: 657 positive, 679 negative.
2. Expand the set by querying Google for conjoined adjectives, e.g. "was nice and".
3. A supervised classifier assigns each word pair a polarity similarity from count(AND) and count(BUT), producing a graph.
4. Partition the graph into a positive cluster and a negative cluster.
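A toy version of steps 3–4: instead of a weighted graph partition, this sketch just propagates seed polarity over conjunction edges, treating "and" as same-polarity and "but" as opposite-polarity (a deliberate simplification of the paper's clustering):

```python
from collections import deque

def propagate(seeds, edges):
    """seeds: {word: +1 or -1}; edges: list of (w1, w2, sign) where sign is
    +1 for an 'and' conjunction (same polarity) and -1 for 'but' (opposite).
    Returns the polarity of every word reachable from the seeds."""
    polarity = dict(seeds)
    graph = {}
    for a, b, s in edges:
        graph.setdefault(a, []).append((b, s))
        graph.setdefault(b, []).append((a, s))
    queue = deque(seeds)
    while queue:
        w = queue.popleft()
        for nbr, s in graph.get(w, []):
            if nbr not in polarity:
                polarity[nbr] = polarity[w] * s   # flip sign across 'but'
                queue.append(nbr)
    return polarity
```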
But this method has trouble handling phrases.
Paper: Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews.
Steps:
1. Extract two-word phrases containing adjectives from the reviews.
2. Learn the polarity of each phrase.
How do we measure the polarity of a phrase? The assumption: positive phrases co-occur more with "excellent", negative phrases co-occur more with "poor".
Pointwise mutual information: how much more do events x and y co-occur than if they were independent:
PMI(x, y) = log2 [ P(x, y) / (P(x) P(y)) ]
The probabilities are again obtained from search-engine (Altavista) hit counts:
P(word) = hits(word)/N
3. Rate a review by the average polarity of its phrases.
The baseline accuracy is about 59%; the Turney algorithm raises it to 74%.
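Turney's polarity score for a phrase is PMI(phrase, "excellent") minus PMI(phrase, "poor"); when written out with hit counts, the N and hits(phrase) terms cancel. A sketch with invented hit counts standing in for Altavista queries:

```python
import math

def turney_polarity(hits_near_excellent, hits_near_poor,
                    hits_excellent, hits_poor):
    """PMI(phrase, 'excellent') - PMI(phrase, 'poor'). Expanding both PMIs
    with hit-count probabilities, N and hits(phrase) cancel, leaving
    log2[(hits(p NEAR excellent) * hits(poor)) /
         (hits(p NEAR poor) * hits(excellent))]  (Turney 2002)."""
    return math.log2((hits_near_excellent * hits_poor) /
                     (hits_near_poor * hits_excellent))

# A phrase seen 10x more often near "excellent" than near "poor"
# (with equally frequent anchor words) gets a positive score:
print(turney_polarity(100, 10, 1000, 1000))
```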
Using WordNet to learn polarity
Papers:
S.M. Kim and E. Hovy. 2004. Determining the sentiment of opinions. COLING 2004
M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of KDD, 2004
Steps:
1. Start from a small set of positive/negative seed words.
2. Find the seed words' synonyms and antonyms in WordNet:
Positive set: synonyms of positive words + antonyms of negative words
Negative set: synonyms of negative words + antonyms of positive words
3. Repeat step 2 until some stopping condition is met.
4. Filter out inappropriate words.
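The loop above can be sketched with a toy synonym/antonym table standing in for WordNet lookups (the table entries below are invented; in practice they would come from WordNet's synsets):

```python
def grow_lexicon(pos_seeds, neg_seeds, synonyms, antonyms, iterations=2):
    """Each round: Positive += synonyms of positive + antonyms of negative;
    Negative += synonyms of negative + antonyms of positive."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(iterations):
        new_pos = ({s for w in pos for s in synonyms.get(w, [])} |
                   {a for w in neg for a in antonyms.get(w, [])})
        new_neg = ({s for w in neg for s in synonyms.get(w, [])} |
                   {a for w in pos for a in antonyms.get(w, [])})
        pos |= new_pos - neg   # crude filter: drop words already negative
        neg |= new_neg - pos
    return pos, neg
```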
The benefit of inducing lexicons with such semi-supervised methods is that the lexicon can be adapted to our own domain and corpus.
Intuition:
Start with a seed set of words ("good", "poor")
Find other words that have similar polarity:
• Using 「and」 and 「but」
• Using words that occur nearby in the same document
• Using WordNet synonyms and antonyms
4.1 Finding aspects/attributes/target
Papers:
M. Hu and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of KDD.
S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop.
Often a review cannot simply be classified as positive or negative: it may discuss several aspects, approving of some and criticizing others, as in the sentence
The food was great but the service was awful!
This review is positive about the food but negative about the service. In such cases we should not classify the whole review as positive/negative; instead we classify its attitude along each of the food and service dimensions. Where do dimensions like food and service (the attributes/aspects/targets) come from? Two approaches: extract frequent phrases from the text and filter them with rules, or define the attributes/aspects in advance.
First find high-frequency phrases in the product reviews, then filter them with rules, for example keeping phrases that immediately follow a sentiment word: "…great fish tacos" suggests that fish tacos is a candidate aspect.
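The "phrase right after a sentiment word" rule might look like this (the sentiment word set and the sample sentence are invented):

```python
import re

def candidate_aspects(text, sentiment_words):
    """Return the up-to-two-word phrases that immediately follow a
    sentiment word; these are candidate aspects."""
    tokens = re.findall(r"[a-z]+", text.lower())
    aspects = []
    for i, tok in enumerate(tokens):
        if tok in sentiment_words and i + 1 < len(tokens):
            aspects.append(" ".join(tokens[i + 1:i + 3]))
    return aspects

print(candidate_aspects("they have great fish tacos", {"great", "awful"}))
# ['fish tacos']
```

In practice the candidates would then be intersected with the high-frequency phrase list.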
Supervised classification
For domains such as restaurants/hotels the aspects are fairly standardized, so in practice we can hand-label product reviews with aspects (e.g. food, décor, service, value, NONE) and then classify each sentence/phrase into one of these aspects.
The steps:
1. Extract sentences/phrases from the reviews.
2. Classify each sentence/phrase by sentiment.
3. Label each sentence/phrase with its aspect.
4. Aggregate into a summary.
Note that the baseline method assumes all classes occur with equal probability. When the classes are imbalanced (as is usually the case in practice), we cannot evaluate with accuracy and need F-scores instead. The more severe the imbalance, the worse the classifier tends to perform.
There are two ways to deal with this:
Resampling in training
That is, if there are 10^6 positive examples but only 10^4 negative ones, draw the training data for both classes at the 10^4 scale.
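A sketch of downsampling every class to the size of the rarest class:

```python
import random

def downsample(docs, seed=0):
    """docs: list of (features, label). Sample every class down to the
    size of the rarest class so the training data is balanced."""
    by_label = {}
    for doc in docs:
        by_label.setdefault(doc[1], []).append(doc)
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)          # fixed seed for reproducibility
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n))
    return balanced
```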
Cost-sensitive learning
Penalize misclassification of the rarer class more heavily (penalize an SVM more for misclassifying the rare class).
4.2 How to deal with 7 stars
論文: Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. ACL, 115–124
How do we handle reviews that come with numeric ratings?
Map to binary
Collapse the scale to positive/negative, e.g. treat ratings above 3.5 as positive and the rest as negative.
Use linear or ordinal regression
or specialized models like metric labeling
4.3 Summary on Sentiment
Sentiment is usually modeled as classification or regression: predict a binary or ordinal label.
On feature extraction: handle negation, and use all words rather than adjectives only.
Similar techniques apply to related affect-detection tasks:
Emotion:
• Detecting annoyed callers to dialogue system
• Detecting confused/frustrated versus confident students
Mood:
• Finding traumatized or depressed writers
Interpersonal stances:
• Detection of flirtation or friendliness in conversations
Personality traits:
• Detection of extroverts
E.g., Detection of Friendliness
Friendly speakers use collaborative conversational style
- Laughter
- Less use of negative emotional words
- More sympathy
That's too bad / I'm sorry to hear that!
- More agreement
I think so too!
- Fewer hedges
kind of sort of a little …
Recommended reading
Basics | TreeLSTM Sentiment Classification
Basics | Dependency trees explained: origins and usage
Basics | Attention-based seq2seq networks
Original | Simple Recurrent Unit For Sentence Classification
Original | Highway Networks For Sentence Classification