Analytics Magazine, July/August 2014 | Page 38

REAL-TIME TEXT ANALYTICS

for each feature in the sentence is then calculated by summing the feature-opinion scores for that sentence. (Each feature-opinion score is obtained by multiplying the sentiment polarity of the opinion word by the multiplicative inverse of the distance between the feature and the opinion word. Opinion words far from the feature are assumed to be less associated with it than nearer words.) For example, consider “The phone is useful and a great work of art.” Let the feature be “phone” and the opinion words be “useful” and “great.”

Semantic orientation of “useful” = 1
Semantic orientation of “great” = 1
Distance between “useful” and “phone” = 2
Distance between “great” and “phone” = 5
score(f) = 1/2 + 1/5 = 0.7

Aggregating opinions for tweets: The sentiment score for a tweet is the sum of the scores of all opinion words present in the tweet. For example, in “The phone is useful and a great work of art,” the opinion words are “useful” and “great.”

Semantic orientation of “useful” = 1
Semantic orientation of “great” = 1
score(t) = 1 + 1 = 2

Negation rule: This rule identifies a negation word (which can appear one or two places before the opinion word) and reverses the opinion expressed in the sentence. For example, in “The phone is not good,” the feature “phone” gets a negative orientation.

Context-dependent rules: For features that have no opinion word of their own, context-dependent constructs are used to determine the orientation score. For example, in “The phone is good but battery-life is short,” the only opinion word is “good” (“short” is a context-dependent word). The feature “phone” gets a positive orientation because of “good,” while “battery-life” gets a negative orientation because the word “but” appears between “good” and “battery-life.”

Topic Evolution. The next step in topic modeling is to understand how topics and trends develop, evolve and go viral over time.
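The sentiment-scoring rules described before this topic-evolution step (feature scores, tweet scores, the negation rule and the “but” rule) can be pulled together in a short Python sketch. It is not the authors’ implementation: the lexicon, the negation list and the regex tokenizer are hypothetical, distance is counted in word positions as in the “phone” example, and the context-dependent rule is simplified to reversing any opinion word separated from the feature by “but,” which reproduces the article’s example but is only one reading of that rule.

```python
import re

# Hypothetical sentiment lexicon (word -> semantic orientation); a real system
# would use a much larger curated lexicon. "short" is deliberately absent,
# since the article treats it as a context-dependent word.
LEXICON = {"useful": 1, "great": 1, "good": 1, "bad": -1, "poor": -1}

# Hypothetical negation words.
NEGATIONS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase word tokens; hyphenated terms like 'battery-life' stay whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def orientations(tokens):
    """Map each opinion-word position to its orientation after the negation rule."""
    result = {}
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            score = LEXICON[tok]
            # Negation rule: reverse the opinion if a negation word
            # appears one or two places before it.
            if any(t in NEGATIONS for t in tokens[max(0, i - 2):i]):
                score = -score
            result[i] = score
    return result


def feature_score(sentence, feature):
    """Sum of orientation / word distance over the sentence's opinion words.

    Simplifying assumption: an opinion word separated from the feature by
    'but' contributes with reversed sign. Uses the feature's first occurrence.
    """
    tokens = tokenize(sentence)
    f = tokens.index(feature)
    total = 0.0
    for i, score in orientations(tokens).items():
        if i == f:
            continue  # the feature itself is not its own opinion word
        lo, hi = sorted((i, f))
        if "but" in tokens[lo + 1:hi]:
            score = -score
        total += score / (hi - lo)
    return total


def tweet_score(tweet):
    """Sum of the (negation-adjusted) orientations of all opinion words."""
    return sum(orientations(tokenize(tweet)).values())


if __name__ == "__main__":
    s = "The phone is useful and a great work of art."
    print(round(feature_score(s, "phone"), 2))  # 0.7
    print(tweet_score(s))  # 2
    print(tweet_score("The phone is not good."))  # -1
    s2 = "The phone is good but battery-life is short."
    print(feature_score(s2, "phone"), feature_score(s2, "battery-life"))  # 0.5 -0.5
```

The feature score is rounded before printing because 1/2 + 1/5 is computed in floating point; the tweet score stays an integer since it simply adds orientations.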
The algorithm maintains a fixed number of topic streams and their statistics. Each tweet is processed as it comes in and is assigned to the “closest” topic stream (the topic stream most similar to it). If no topic stream is close enough, then a new stream is created and a stale stream is killed to maintain a fixed number