Sentiment: feeling/opinion
Sentiment analysis: the use of NLP/ML/statistics to analyze the emotional tone (positive, negative, neutral) of a given text unit
Other names:
opinion mining
sentiment classification
emotion AI
Books (novels)
Media content: newspaper articles, song lyrics, movie/TV show transcripts,...
Social media posts/comments (especially Twitter)
Customer reviews (hotels, restaurants,...)
SA is a relatively young area of research
Public opinion analysis during WWII
Association for Computational Linguistics (ACL), founded in 1962
Hatzivassiloglou and McKeown (1997): "Predicting the Semantic Orientation of Adjectives"
Turney (2002): "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews"
Pang et al. (2002): "Thumbs up? Sentiment Classification using Machine Learning Techniques"
Turney and Littman (2003): "Measuring Praise and Criticism: Inference of Semantic Orientation from Association"
SA is also a fast-growing field
According to Mäntylä et al. (2017):
About 7,000 papers have been published
99% of those papers were published after 2004
A glimpse of the dataset
listing_id | id | date | reviewer_id | reviewer_name | comments |
---|---|---|---|---|---|
15523233 | 314412197 | 2018-08-26 | 28104838 | Richard | Mike’s place was a great location for me and my family to explore Boston. A short (10 mins) walk from The bustling Harvard Square, a short subway ride from there to centre of Boston. |
1225831 | 8895725 | 2013-11-25 | 403089 | Bruce | Paul and Larry are very considerate hosts, and their place is comfortable and well-equipped. I was attending a conference at the Hynes in Boston, and the #1 bus was close by and very convenient. Free parking at their house certainly beats $39/day at the Prudential Center! I was also able to prepare some of my own meals for further cost savings. Most of all, it's really pleasant to experience the city in a residential neighborhood. I highly recommend this "comfy 1BR suite for business or pleasure! |
4932029 | 314313623 | 2018-08-26 | 116449651 | Alec | Nora’s apartment is extraordinary. Beautifully put together, with all the necessary amenities and clear recommendations for local activities. |
16013859 | 537387338 | 2019-09-28 | 6376355 | Hezzy | Nice for a night. I wish shared bath accommodations in Cambridge were just cheaper though! |
23769190 | 607229788 | 2020-02-17 | 66340221 | Vishal | Great place! |
(Data as of July 18, 2021. Source: http://insideairbnb.com/get-the-data.html)
(Source: Silge, Julia, and David Robinson. 2021. Text Mining with R: A Tidy Approach. https://www.tidytextmining.com)
A sample row from the original data
listing_id | id | date | reviewer_id | reviewer_name | comments |
---|---|---|---|---|---|
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | Our short stay was perfect. The house is extremely close to Harvard buildings like the Museum of Natural History and the Science Center. Completely independent from the rest of the house, the rented facilities are big enough to stay and even work. A permit for parking in the street is supplied by Bernd. It will be our next rental in Cambridge for sure. |
Split comments into tokens, and convert the table into "one-token-per-row".
listing_id | id | date | reviewer_id | reviewer_name | word |
---|---|---|---|---|---|
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | our |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | short |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | stay |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | was |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | perfect |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | the |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | house |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | is |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | extremely |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | close |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | to |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | harvard |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | buildings |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | like |
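The tokenization step above can be sketched in a few lines of Python. This is only an illustration, not the code behind the slides: the function name `unnest_tokens` and the simple regular-expression tokenizer are assumptions chosen for the example, and a real tokenizer handles punctuation and Unicode more carefully.

```python
import re

def unnest_tokens(rows, text_key="comments", token_key="word"):
    """Return one output row per token, copying all other columns unchanged."""
    out = []
    for row in rows:
        # Keep every column except the free-text one.
        meta = {k: v for k, v in row.items() if k != text_key}
        # Lowercase, then keep runs of letters, digits, and apostrophes.
        for token in re.findall(r"[a-z0-9']+", row[text_key].lower()):
            out.append({**meta, token_key: token})
    return out

# The sample review from the slide (comment text shortened for the example).
review = {
    "listing_id": 326170,
    "id": 8376671,
    "date": "2013-10-28",
    "reviewer_id": 438982,
    "reviewer_name": "Sergio",
    "comments": "Our short stay was perfect. The house is extremely close "
                "to Harvard buildings like the Museum of Natural History.",
}

tokens = unnest_tokens([review])
print([t["word"] for t in tokens[:5]])
```

Each token row keeps the review's identifying columns, so later steps can still group by review, listing, or date.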
Remove stop words.
listing_id | id | date | reviewer_id | reviewer_name | word |
---|---|---|---|---|---|
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | short |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | stay |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | perfect |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | house |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | extremely |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | close |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | harvard |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | buildings |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | museum |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | natural |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | history |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | science |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | center |
326170 | 8376671 | 2013-10-28 | 438982 | Sergio | completely |
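Stop-word removal is essentially an anti-join between the token table and a stop-word list. A minimal Python sketch follows; the `STOP_WORDS` set here is a tiny hand-picked assumption for the example, whereas real stop-word lists contain hundreds of entries.

```python
# A tiny illustrative stop-word list (assumption; real lists are much longer).
STOP_WORDS = {"our", "was", "the", "is", "to", "like", "of", "and", "a", "in"}

def remove_stop_words(token_rows, stop_words=STOP_WORDS, token_key="word"):
    """Keep only the rows whose token is not a stop word."""
    return [row for row in token_rows if row[token_key] not in stop_words]

# The first tokens of the sample review, one token per row.
words = ["our", "short", "stay", "was", "perfect", "the", "house", "is",
         "extremely", "close", "to", "harvard", "buildings", "like"]
rows = [{"id": 8376671, "word": w} for w in words]

kept = [row["word"] for row in remove_stop_words(rows)]
print(kept)
```

With this small list, the surviving tokens match the first eight rows of the table above.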
Total reviews over the years
Word frequency
Sentiment analysis dictionaries ("lexicons"):
afinn: scores between -5 (most negative) and 5 (most positive)
bing: "positive", "negative"
nrc: "positive", "negative", "anger", "anticipation", "disgust", "fear", "joy", "sadness", "surprise", "trust"
loughran: "positive", "negative", "litigious", "uncertainty", "constraining", "superfluous" (for accounting & finance)
listing_id | id | date | reviewer_id | reviewer_name | word | sentiment |
---|---|---|---|---|---|---|
16361055 | 668120483 | 2020-09-26 | 39827094 | Sarah | amazing | positive |
12489082 | 419683325 | 2019-03-04 | 27867701 | Tessa | comfortable | positive |
19183167 | 244354550 | 2018-03-18 | 128948662 | Jaden | nice | positive |
32055447 | 431208879 | 2019-03-31 | 228816868 | Angel | popular | positive |
13156403 | 702729118 | 2020-10-22 | 27922051 | Rosa | excellent | positive |
14251369 | 444227216 | 2019-04-27 | 85650589 | Thomas | noisy | negative |
12220649 | 84744254 | 2016-07-08 | 26337003 | Kyra | exceptional | positive |
3474824 | 23136923 | 2014-11-23 | 23910258 | Lisa | recommend | positive |
24442402 | 444175927 | 2019-04-27 | 80597464 | Amanda | spacious | positive |
4534572 | 339384675 | 2018-10-21 | 211714480 | Colleen | perfect | positive |
12220649 | 439704107 | 2019-04-19 | 68690106 | Philippe | recommend | positive |
19346436 | 210273202 | 2017-11-08 | 8059524 | Sara | clean | positive |
16146887 | 489832515 | 2019-07-17 | 69996899 | Carlo | weakness | negative |
21460557 | 289980046 | 2018-07-13 | 28306072 | Caitlin | love | positive |
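A table like the one above can be produced by an inner join between the token rows and a lexicon: tokens missing from the lexicon are dropped, and the rest gain a sentiment column. The sketch below assumes a tiny `BING_SAMPLE` dictionary built only from labels shown in the table; it is not the full bing lexicon, and `label_sentiment` is a hypothetical helper name.

```python
from collections import Counter

# Hand-copied subset of word/label pairs from the table (not the full lexicon).
BING_SAMPLE = {
    "amazing": "positive",
    "comfortable": "positive",
    "nice": "positive",
    "perfect": "positive",
    "noisy": "negative",
    "weakness": "negative",
}

def label_sentiment(token_rows, lexicon, token_key="word"):
    """Inner join: keep tokens found in the lexicon and attach their label."""
    return [
        {**row, "sentiment": lexicon[row[token_key]]}
        for row in token_rows
        if row[token_key] in lexicon
    ]

# Made-up token rows for illustration; "house" is not in the lexicon.
rows = [
    {"id": 1, "word": "perfect"},
    {"id": 1, "word": "house"},
    {"id": 2, "word": "noisy"},
    {"id": 2, "word": "nice"},
]

labeled = label_sentiment(rows, BING_SAMPLE)
counts = Counter(row["sentiment"] for row in labeled)
print(counts)
```

Counting the labels this way gives the positive and negative word totals that word clouds or bar charts are typically built from.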
Word cloud
Obtain sentiment score from afinn lexicon
Plot average score vs. month
listing_id | id | date | reviewer_id | reviewer_name | word | value |
---|---|---|---|---|---|---|
28988139 | 457193152 | 2019-05-23 | 8214737 | Tegan Joseph | clean | 2 |
6662157 | 508249692 | 2019-08-12 | 1140295 | Amia | comfortable | 2 |
9729190 | 547586551 | 2019-10-15 | 78615078 | Allison | nice | 3 |
14536322 | 557274834 | 2019-10-31 | 31998820 | Mayra | amazing | 4 |
16842241 | 363357163 | 2018-12-28 | 232275710 | Lorenzo | clean | 2 |
33242374 | 465848742 | 2019-06-08 | 215913636 | Ralene | noisy | -1 |
6106691 | 111235376 | 2016-10-30 | 76811411 | Louis | easy | 1 |
1893287 | 267715073 | 2018-05-22 | 46717530 | Judy | responsive | 2 |
14974786 | 571008805 | 2019-11-30 | 3686354 | Hathaway | perfect | 3 |
33072838 | 493446940 | 2019-07-22 | 12719744 | Thomas | easy | 1 |
715532 | 5342650 | 2013-06-25 | 5916992 | Tziporah | regretted | -2 |
18676360 | 302468180 | 2018-08-05 | 143853797 | Bingnan | helpful | 2 |
20797694 | 464018797 | 2019-06-04 | 12177218 | John | helpful | 2 |
4109594 | 532826615 | 2019-09-20 | 113133591 | JianHua | happy | 3 |
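One way to get from scored words to an "average score vs. month" series is to group the scored tokens by the year-month part of the date and take the mean. The sketch below is an assumption about how that aggregation might look, not the slides' own code; `AFINN_SAMPLE` contains only a few word/value pairs copied from the table, and `monthly_mean_score` is a hypothetical helper.

```python
from collections import defaultdict

# A few word/value pairs copied from the table (not the full afinn lexicon).
AFINN_SAMPLE = {"nice": 3, "amazing": 4, "noisy": -1, "helpful": 2, "clean": 2}

def monthly_mean_score(token_rows, lexicon):
    """Average lexicon score per 'YYYY-MM', ignoring tokens not in the lexicon."""
    totals = defaultdict(lambda: [0, 0])  # month -> [sum of scores, count]
    for row in token_rows:
        score = lexicon.get(row["word"])
        if score is None:
            continue
        month = row["date"][:7]  # ISO dates: the first 7 characters are YYYY-MM
        totals[month][0] += score
        totals[month][1] += 1
    return {month: s / n for month, (s, n) in sorted(totals.items())}

rows = [
    {"date": "2019-10-15", "word": "nice"},
    {"date": "2019-10-31", "word": "amazing"},
    {"date": "2019-06-08", "word": "noisy"},
    {"date": "2019-06-04", "word": "helpful"},
    {"date": "2019-06-04", "word": "kitchen"},  # hypothetical token, unscored
]

monthly = monthly_mean_score(rows, AFINN_SAMPLE)
print(monthly)
```

The resulting dictionary maps each month to its mean score and can be handed directly to a plotting library.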
Expanding beyond one single word
Sentence sentiment
Relationships between words: n-grams, correlations, network analysis
Building a predictive model
Comparing sentiments of different cities
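As a small illustration of why moving beyond single words matters, the sketch below extracts bigrams and picks out those that start with a negation word: a single-word lexicon would score "clean" as positive even in "not clean". The function name `bigrams`, the `NEGATIONS` set, and the example sentence are all assumptions made for this example.

```python
import re

# A short, illustrative list of negation words (assumption).
NEGATIONS = {"not", "no", "never", "without"}

def bigrams(text):
    """Return consecutive word pairs from a lowercased text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return list(zip(tokens, tokens[1:]))

pairs = bigrams("The room was not clean")
negated = [(first, second) for first, second in pairs if first in NEGATIONS]
print(negated)
```

Words that appear in `negated` are candidates for having their lexicon score flipped or excluded before aggregation.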
Slides/data/code are available on my site and GitHub.