
Text Analysis:
Sentiment, Workflow, Data Viz, and more

Quang Nguyen

qntkhvn
@qntkhvn
qntkhvn.netlify.app

1 / 22

What is sentiment analysis?

  • Sentiment: feeling/opinion

  • Sentiment analysis: the use of NLP/ML/statistics to analyze the emotional tone (positive, negative, neutral) of a given text unit

  • Other names:

    • opinion mining

    • sentiment classification

    • emotion AI

2 / 22

Examples

  • Books (novels)

  • Media content: newspaper articles, song lyrics, movie/TV show transcripts, ...

  • Social media posts/comments (especially Twitter)

  • Customer reviews (hotels, restaurants, ...)

3 / 22

History

  • Sentiment analysis (SA) is a relatively young area of research

  • Public opinion analysis during WWII

  • Association for Computational Linguistics (ACL), founded in 1962

  • Hatzivassiloglou and McKeown (1997): "Predicting the Semantic Orientation of Adjectives"

  • Turney (2002): "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews"

  • Pang et al. (2002): "Thumbs up? Sentiment Classification using Machine Learning Techniques"

  • Turney and Littman (2003): "Measuring Praise and Criticism: Inference of Semantic Orientation from Association"

4 / 22

History (cont'd)

  • SA is also a fast-growing field

  • According to Mäntylä et al. (2017):

    • About 7,000 papers have been published

    • 99% of the papers have come after 2004

5 / 22

Case Study: Airbnb Reviews - Cambridge, MA

A glimpse of the dataset

listing_id id date reviewer_id reviewer_name comments
15523233 314412197 2018-08-26 28104838 Richard Mike’s place was a great location for me and my family to explore Boston. A short (10 mins) walk from The bustling Harvard Square, a short subway ride from there to centre of Boston.
1225831 8895725 2013-11-25 403089 Bruce Paul and Larry are very considerate hosts, and their place is comfortable and well-equipped. I was attending a conference at the Hynes in Boston, and the #1 bus was close by and very convenient. Free parking at their house certainly beats $39/day at the Prudential Center! I was also able to prepare some of my own meals for further cost savings. Most of all, it's really pleasant to experience the city in a residential neighborhood. I highly recommend this "comfy 1BR suite for business or pleasure!
4932029 314313623 2018-08-26 116449651 Alec Nora’s apartment is extraordinary. Beautifully put together, with all the necessary amenities and clear recommendations for local activities.
16013859 537387338 2019-09-28 6376355 Hezzy Nice for a night. I wish shared bath accommodations in Cambridge were just cheaper though!
23769190 607229788 2020-02-17 66340221 Vishal Great place!

(Data as of July 18, 2021. Source: http://insideairbnb.com/get-the-data.html)

6 / 22

Workflow

(Source: Silge, Julia, and David Robinson. 2021. Text Mining with R: A Tidy Approach. https://www.tidytextmining.com)

7 / 22

Data Prep

A sample row from the original data

listing_id id date reviewer_id reviewer_name comments
326170 8376671 2013-10-28 438982 Sergio Our short stay was perfect. The house is extremely close to Harvard buildings like the Museum of Natural History and the Science Center. Completely independent from the rest of the house, the rented facilities are big enough to stay and even work. A permit for parking in the street is supplied by Bernd. It will be our next rental in Cambridge for sure.
8 / 22

Split the comments into tokens (here, single words) and convert the table to a one-token-per-row format.

listing_id id date reviewer_id reviewer_name word
326170 8376671 2013-10-28 438982 Sergio our
326170 8376671 2013-10-28 438982 Sergio short
326170 8376671 2013-10-28 438982 Sergio stay
326170 8376671 2013-10-28 438982 Sergio was
326170 8376671 2013-10-28 438982 Sergio perfect
326170 8376671 2013-10-28 438982 Sergio the
326170 8376671 2013-10-28 438982 Sergio house
326170 8376671 2013-10-28 438982 Sergio is
326170 8376671 2013-10-28 438982 Sergio extremely
326170 8376671 2013-10-28 438982 Sergio close
326170 8376671 2013-10-28 438982 Sergio to
326170 8376671 2013-10-28 438982 Sergio harvard
326170 8376671 2013-10-28 438982 Sergio buildings
326170 8376671 2013-10-28 438982 Sergio like
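
The one-token-per-row step above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code behind these slides: the helper name `unnest_tokens` is borrowed from tidytext's function of the same name, but this stand-in uses a simple regular expression for tokenizing, and the comment text is just the first two sentences of the sample row.

```python
import re

# The sample row from the "Data Prep" slide (comment cut to its first two sentences).
review = {
    "listing_id": 326170,
    "id": 8376671,
    "date": "2013-10-28",
    "reviewer_id": 438982,
    "reviewer_name": "Sergio",
    "comments": (
        "Our short stay was perfect. The house is extremely close to Harvard "
        "buildings like the Museum of Natural History and the Science Center."
    ),
}


def tokenize(text):
    """Lowercase the text and pull out runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def unnest_tokens(row, text_col="comments", out_col="word"):
    """Hypothetical helper: return one dict per token, repeating the other columns."""
    rest = {key: value for key, value in row.items() if key != text_col}
    return [{**rest, out_col: token} for token in tokenize(row[text_col])]


tokens = unnest_tokens(review)
for row in tokens[:5]:
    print(row["listing_id"], row["reviewer_name"], row["word"])
```

Each output row keeps the listing and reviewer columns, so later steps can still group words by review, listing, or date.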
9 / 22

Remove stop words.

listing_id id date reviewer_id reviewer_name word
326170 8376671 2013-10-28 438982 Sergio short
326170 8376671 2013-10-28 438982 Sergio stay
326170 8376671 2013-10-28 438982 Sergio perfect
326170 8376671 2013-10-28 438982 Sergio house
326170 8376671 2013-10-28 438982 Sergio extremely
326170 8376671 2013-10-28 438982 Sergio close
326170 8376671 2013-10-28 438982 Sergio harvard
326170 8376671 2013-10-28 438982 Sergio buildings
326170 8376671 2013-10-28 438982 Sergio museum
326170 8376671 2013-10-28 438982 Sergio natural
326170 8376671 2013-10-28 438982 Sergio history
326170 8376671 2013-10-28 438982 Sergio science
326170 8376671 2013-10-28 438982 Sergio center
326170 8376671 2013-10-28 438982 Sergio completely
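
Stop-word removal amounts to filtering tokens against a list of very common words. The sketch below assumes a tiny hand-picked list (`STOP_WORDS` is hypothetical; real stop-word lists hold hundreds of entries) and applies it to the first 14 tokens from the previous table.

```python
# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"our", "was", "the", "is", "to", "like", "of", "and"}

# First 14 tokens of the sample review, as listed in the one-token-per-row table.
words = [
    "our", "short", "stay", "was", "perfect", "the", "house", "is",
    "extremely", "close", "to", "harvard", "buildings", "like",
]


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop-word set, preserving order."""
    return [token for token in tokens if token not in stop_words]


kept = remove_stop_words(words)
print(kept)
```

With this list, the surviving words match the first eight rows of the table above, from "short" through "buildings".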
10 / 22

EDA

  • Total reviews over the years

  • Word frequency
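
Both EDA summaries listed above reduce to counting. A minimal sketch, assuming the review dates are ISO strings (the five dates below come from the dataset glimpse) and using a made-up token list for the word counts:

```python
from collections import Counter

# Review dates taken from the five rows in the dataset glimpse.
review_dates = ["2018-08-26", "2013-11-25", "2018-08-26", "2019-09-28", "2020-02-17"]

# Total reviews per year: the year is the first four characters of an ISO date.
reviews_per_year = Counter(date[:4] for date in review_dates)

# Hypothetical tokens (after stop-word removal), invented for this illustration.
tokens = ["great", "place", "great", "location", "clean", "place", "great"]

# Word frequency: count each token and list the most common ones first.
word_counts = Counter(tokens)

print(sorted(reviews_per_year.items()))
print(word_counts.most_common(2))
```

On the full dataset, the same two counts would feed a line or bar chart of reviews per year and a bar chart of the most frequent words.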

11 / 22

12 / 22

13 / 22

Sentiment analysis

  • Sentiment analysis dictionaries ("lexicons"):

    • afinn: scores between -5 (most negative) and 5 (most positive)

    • bing: "positive", "negative"

    • nrc: "positive", "negative", "anger", "anticipation", "disgust", "fear", "joy", "sadness", "surprise", "trust"

    • loughran: "positive", "negative", "litigious", "uncertainty", "constraining", "superfluous" (for accounting & finance)

14 / 22
  • Get the sentiment of each word using the bing lexicon
listing_id id date reviewer_id reviewer_name word sentiment
16361055 668120483 2020-09-26 39827094 Sarah amazing positive
12489082 419683325 2019-03-04 27867701 Tessa comfortable positive
19183167 244354550 2018-03-18 128948662 Jaden nice positive
32055447 431208879 2019-03-31 228816868 Angel popular positive
13156403 702729118 2020-10-22 27922051 Rosa excellent positive
14251369 444227216 2019-04-27 85650589 Thomas noisy negative
12220649 84744254 2016-07-08 26337003 Kyra exceptional positive
3474824 23136923 2014-11-23 23910258 Lisa recommend positive
24442402 444175927 2019-04-27 80597464 Amanda spacious positive
4534572 339384675 2018-10-21 211714480 Colleen perfect positive
12220649 439704107 2019-04-19 68690106 Philippe recommend positive
19346436 210273202 2017-11-08 8059524 Sara clean positive
16146887 489832515 2019-07-17 69996899 Carlo weakness negative
21460557 289980046 2018-07-13 28306072 Caitlin love positive
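
Attaching a sentiment label to each word is effectively an inner join between the token table and the lexicon: words missing from the lexicon drop out. The sketch below assumes a six-word excerpt whose labels are copied from the table above and the AFINN sample later in the deck (`bing_excerpt` is not the real lexicon, which has thousands of entries), and the token list is invented.

```python
from collections import Counter

# Small excerpt with labels as shown in the table above; NOT the full lexicon.
bing_excerpt = {
    "amazing": "positive",
    "comfortable": "positive",
    "nice": "positive",
    "clean": "positive",
    "noisy": "negative",
    "weakness": "negative",
}

# Hypothetical tokens from one review, after stop-word removal.
tokens = ["place", "amazing", "bit", "noisy", "clean", "harvard"]

# Inner join: keep only the words that appear in the lexicon.
matched = [(word, bing_excerpt[word]) for word in tokens if word in bing_excerpt]

# Summarize as counts per label and a simple net score (positive minus negative).
counts = Counter(label for _, label in matched)
net_sentiment = counts["positive"] - counts["negative"]

print(matched)
print(dict(counts), net_sentiment)
```

The net score is one possible summary, chosen here for illustration; the counts per label can also be reported directly.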
15 / 22

16 / 22

Word cloud

17 / 22

Sentiment vs. date/time

  • Obtain sentiment score from afinn lexicon

  • Plot average score vs. month

18 / 22
  • Recall: afinn scores take integer values in {-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5}
listing_id id date reviewer_id reviewer_name word value
28988139 457193152 2019-05-23 8214737 Tegan Joseph clean 2
6662157 508249692 2019-08-12 1140295 Amia comfortable 2
9729190 547586551 2019-10-15 78615078 Allison nice 3
14536322 557274834 2019-10-31 31998820 Mayra amazing 4
16842241 363357163 2018-12-28 232275710 Lorenzo clean 2
33242374 465848742 2019-06-08 215913636 Ralene noisy -1
6106691 111235376 2016-10-30 76811411 Louis easy 1
1893287 267715073 2018-05-22 46717530 Judy responsive 2
14974786 571008805 2019-11-30 3686354 Hathaway perfect 3
33072838 493446940 2019-07-22 12719744 Thomas easy 1
715532 5342650 2013-06-25 5916992 Tziporah regretted -2
18676360 302468180 2018-08-05 143853797 Bingnan helpful 2
20797694 464018797 2019-06-04 12177218 John helpful 2
4109594 532826615 2019-09-20 113133591 JianHua happy 3
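
Averaging the scores by month can be sketched as a group-by on the date. The rows below are six (date, word, value) triples taken from the table above; the sketch assumes "month" means a calendar year-month such as 2019-10, which is one possible reading of the plot described earlier.

```python
from collections import defaultdict

# (date, word, value) triples copied from the scored-word table above.
scored = [
    ("2019-05-23", "clean", 2),
    ("2019-10-15", "nice", 3),
    ("2019-10-31", "amazing", 4),
    ("2019-06-08", "noisy", -1),
    ("2018-08-05", "helpful", 2),
    ("2019-06-04", "helpful", 2),
]


def monthly_average(rows):
    """Group values by the YYYY-MM prefix of the date and average each group."""
    by_month = defaultdict(list)
    for date, _word, value in rows:
        by_month[date[:7]].append(value)
    return {month: sum(vals) / len(vals) for month, vals in sorted(by_month.items())}


averages = monthly_average(scored)
for month, avg in averages.items():
    print(month, avg)
```

The resulting month-to-average mapping is what would be plotted as average score against month.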
19 / 22

20 / 22

Other topics to explore

  • Expanding beyond one single word

    • Sentence sentiment

    • Relationships between words: n-grams, correlations, network analysis

  • Building a predictive model

    • Predict ratings from customer reviews
  • Comparing sentiments of different cities
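
As a starting point for the n-gram idea in the list above, consecutive tokens can be grouped with a sliding window. This is a minimal sketch; `ngrams` is a hypothetical helper, and the tokens are the first five words of the sample review.

```python
def ngrams(tokens, n=2):
    """Return every run of n consecutive tokens as a tuple (bigrams when n=2)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


words = ["our", "short", "stay", "was", "perfect"]

bigrams = ngrams(words)
trigrams = ngrams(words, n=3)

print(bigrams)
print(trigrams)
```

Pairs such as ("was", "perfect") keep some of the context that single-word lexicon lookups discard, which matters for phrases where a neighboring word changes the meaning.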

21 / 22

Cheers!

Slides/data/code are available on my site and GitHub.


    qntkhvn

    @qntkhvn

    qntkhvn.netlify.app


Slides created via the R package xaringan.

22 / 22
