Airbnb sentiment text analysis
*Group Project*
Goal
The goal of Airbnb’s aspiring hosts is to use the Airbnb website to attract guests willing to pay the highest rates for brief stays in their homes. Airbnb’s goal is to increase profits by improving customer review performance.
Implementation
Text mining is a technique that lets businesses scour websites, decipher the meaning of groups of words, and assign those words a sentiment proxy using a software package.
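To make "sentiment proxy" concrete, here is a minimal sketch of dictionary-based scoring in R. The lexicon and the reviews are made up for illustration and are not the dictionary the project used. Each word found in the lexicon contributes +1 or -1, and a review's score is the sum.

```r
# Hypothetical mini-lexicon (NOT the dictionary used in this project):
# each word maps to a polarity, +1 positive or -1 negative.
lexicon <- c(clean = 1, great = 1, friendly = 1, comfortable = 1,
             dirty = -1, noisy = -1, rude = -1, broken = -1)

# Score one review: lowercase it, split on anything that is not a letter or
# apostrophe, then sum the polarities of the words found in the lexicon.
score_review <- function(text, lex = lexicon) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words <- words[nzchar(words)]
  sum(lex[words], na.rm = TRUE)  # words missing from the lexicon index to NA
}

reviews <- c("Great location, very clean and comfortable!",
             "The room was dirty and the street was noisy.",
             "Check-in was fine.")
scores <- vapply(reviews, score_review, numeric(1), USE.NAMES = FALSE)
print(scores)  # 3 -2 0
```

Real sentiment dictionaries are far larger and may also tag emotions, but the principle is the same: known words are looked up and their scores are aggregated per review.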
Opportunities
The goal of Airbnb’s marketing team is to improve its users’ performance so that the company can reap the benefits of ongoing host and renter fees. Airbnb is a perfect test case for the practicability and value of text analytics.
Text analysis - all opinions
Sentiment text analysis of Airbnb customers' reviews for Miami, FL, was performed in R using a dictionary-based sentiment package, with the ggplot2 package used to present our results.
The first graph shows customers' overall emotions, based on key words used in their reviews.
The second graph displays customers' overall sentiment (positive or negative) in their reviews.
library(ggplot2)

# corp_sent is assumed to hold one row per review and ten category columns:
# eight emotions (1-8), then negative and positive (9-10).
# Transpose so each row is a category, then total the first 135 reviews.
h.sent <- data.frame(t(corp_sent))
h.sent_1 <- data.frame(Quantity = rowSums(h.sent[1:135]))
h.sent_1 <- cbind(Category = rownames(h.sent_1), h.sent_1)
rownames(h.sent_1) <- NULL

# First graph: emotions (rows 1-8)
h.sent_emo <- h.sent_1[1:8, ]
ggplot(h.sent_emo, aes(x = Category, y = Quantity, fill = Category)) +
  geom_col() +
  labs(x = "Emotions", y = "Count", fill = "Emotions") +
  ggtitle("Emotions Graph - All opinions")

# Second graph: overall sentiment (rows 9-10: negative, positive)
h.sent_pol <- h.sent_1[9:10, ]
ggplot(h.sent_pol, aes(x = Category, y = Quantity, fill = Category)) +
  geom_col() +
  labs(x = "Sentiments", y = "Count", fill = "Sentiments") +
  ggtitle("Sentiment Graph - All opinions")
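The aggregation above depends on the layout of corp_sent, which the project does not show. The sketch below uses a toy table and assumes one row per review and ten columns in the order that the 1:8 and 9:10 ranges imply: eight emotions, then negative and positive. This matches, for example, the output of get_nrc_sentiment() in the syuzhet package, though which package the project actually used is an assumption. Summing each column directly gives the same totals as transposing and calling rowSums().

```r
# Ten NRC-style categories: eight emotions followed by negative and positive.
categories <- c("anger", "anticipation", "disgust", "fear", "joy",
                "sadness", "surprise", "trust", "negative", "positive")

# Toy stand-in for corp_sent: 3 reviews (rows) x 10 categories (made-up counts).
corp_sent <- as.data.frame(matrix(
  c(0, 1, 0, 0, 2, 0, 1, 2, 0, 3,
    1, 0, 1, 1, 0, 1, 0, 0, 3, 0,
    0, 1, 0, 0, 1, 0, 0, 1, 0, 2),
  nrow = 3, byrow = TRUE, dimnames = list(NULL, categories)))

# Summing each column gives the same totals as rowSums(t(corp_sent)).
totals <- colSums(corp_sent)

emotions  <- data.frame(Emotions   = names(totals)[1:8],
                        Quantity   = unname(totals[1:8]))
sentiment <- data.frame(Sentiments = names(totals)[9:10],
                        Quantity   = unname(totals[9:10]))
print(sentiment)
```

Using colSums() avoids the transpose and the hard-coded 1:135 column range, so the totals cover however many reviews the table holds.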
Text analysis - least popular opinion
The same dictionary-based analysis was then applied to a single review: the least popular opinion (review 13 in the sentiment table).
The first graph shows the emotions expressed in the least popular opinion, based on key words used in that review.
The second graph displays the overall sentiment (positive or negative) of that review.
library(ggplot2)

# The least popular opinion is review 13: keep only its column of the
# transposed category-by-review table (the original referenced an undefined object here).
h.sent_all <- data.frame(t(corp_sent))
h.sent_leastpop <- data.frame(Category = rownames(h.sent_all),
                              Quantity = h.sent_all[[13]])

# First graph: emotions in that review (rows 1-8)
ggplot(h.sent_leastpop[1:8, ], aes(x = Category, y = Quantity, fill = Category)) +
  geom_col() +
  labs(x = "Emotions", y = "Count", fill = "Emotions") +
  ggtitle("Emotions Graph - Least popular opinion")

# Second graph: its sentiment (rows 9-10: negative, positive)
ggplot(h.sent_leastpop[9:10, ], aes(x = Category, y = Quantity, fill = Category)) +
  geom_col() +
  labs(x = "Sentiments", y = "Count", fill = "Sentiments") +
  ggtitle("Sentiment Graph - Least popular opinion")
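Pulling out one review's category counts can be wrapped in a small helper so the same plots can be drawn for any review. This is a sketch under the same assumed layout as above (one row per review, eight emotion columns, then negative and positive); review_profile() is a hypothetical name and the counts are made up.

```r
categories <- c("anger", "anticipation", "disgust", "fear", "joy",
                "sadness", "surprise", "trust", "negative", "positive")

# Toy stand-in for corp_sent: 2 reviews x 10 categories (made-up counts).
corp_sent <- as.data.frame(matrix(
  c(0, 1, 0, 0, 2, 0, 1, 2, 0, 3,
    1, 0, 1, 1, 0, 1, 0, 0, 3, 0),
  nrow = 2, byrow = TRUE, dimnames = list(NULL, categories)))

# Hypothetical helper: category counts for a single review i, split into
# the emotions table (columns 1-8) and the sentiment table (columns 9-10).
review_profile <- function(sent, i) {
  counts <- unlist(sent[i, ])  # named numeric vector, one entry per category
  list(
    emotions  = data.frame(Emotions   = names(counts)[1:8],
                           Quantity   = unname(counts[1:8])),
    sentiment = data.frame(Sentiments = names(counts)[9:10],
                           Quantity   = unname(counts[9:10]))
  )
}

one_review <- review_profile(corp_sent, 2)
print(one_review$sentiment)
```

With the project's data the call would be review_profile(corp_sent, 13), and each returned table can be passed straight to ggplot().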
Regression models in R
Before fitting regression models to identify the drivers of property revenue, we cleaned the data, starting by omitting every row with NA values so that missing data would not influence our final results.
We knew that Airbnb charged guests a service fee ranging from 6% to 12%, and that the higher the price of the property, the lower this percentage. We also knew that hosts paid a 3% service fee. We therefore added two variables to both the Paris and Miami datasets: one representing the percentage reservation fee, and another giving the revenue Airbnb earns from each booking once both fees are applied.
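# Drop every row that contains a missing value
miami <- na.omit(miami)
paris <- na.omit(paris)

# perc_resv_fee: price rescaled onto [0.5, 1], equal to 1 for the cheapest
# listing and 0.5 for the most expensive, so the reservation fee
# (0.12 * perc_resv_fee) falls from 12% to 6% as price rises.
miami$perc_resv_fee <- 1 - 0.5 * (miami$price - min(miami$price)) /
  (max(miami$price) - min(miami$price))
# revenue: Airbnb's take per booking = (3% service fee + reservation fee) * price
miami$revenue <- (0.03 + 0.12 * miami$perc_resv_fee) * miami$price

paris$perc_resv_fee <- 1 - 0.5 * (paris$price - min(paris$price)) /
  (max(paris$price) - min(paris$price))
paris$revenue <- (0.03 + 0.12 * paris$perc_resv_fee) * paris$price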
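A worked example of the fee logic described above may help. The sketch uses three made-up nightly prices and follows the text: a reservation fee that slides from 12% for the cheapest listing down to 6% for the most expensive (via min-max scaling of price), plus a flat 3% service fee. The function names are illustrative and do not come from the project code.

```r
# Price scaled onto [0.5, 1]: 1 for the cheapest listing, 0.5 for the most
# expensive, so the reservation fee (0.12 * scale) runs from 12% down to 6%.
# Assumes at least two distinct prices (otherwise the denominator is zero).
fee_scale <- function(price) {
  1 - 0.5 * (price - min(price)) / (max(price) - min(price))
}

# Total fee rate: 3% service fee + 6%-12% reservation fee.
fee_rate <- function(price) 0.03 + 0.12 * fee_scale(price)

# Airbnb's revenue from a booking at each price.
airbnb_revenue <- function(price) fee_rate(price) * price

prices <- c(50, 150, 250)      # made-up nightly prices
print(fee_rate(prices))        # 0.15 0.12 0.09
print(airbnb_revenue(prices))  # 7.5 18.0 22.5
```

Because the scaling uses the minimum and maximum of each dataset, the same nightly price can carry a different fee rate in Miami than in Paris.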
Miami - regression model
To obtain the regression model for Miami, we ran stepwise regression in R with stepAIC() from the MASS package. It starts from the full model with all 15 candidate predictors and, at each step, drops a variable or adds back a previously dropped one, choosing the change that lowers the AIC the most, until no change improves it. Our final model shows that price and the number of reviews are the most significant variables in predicting Airbnb revenue, followed by whether the property charges a security deposit and a cleaning fee, its rating, and the number of bathrooms.
library(MASS)  # provides stepAIC()

# Full model with all 15 candidate predictors
model_miami <- lm(revenue ~ price + reviews + rating + accommodates + extpeop + savwish +
                    min_stay + sentiment + secdep + cleanfee + weekfee + monthfee +
                    bedroom + bathroom + beds,
                  data = miami)
summary(model_miami)

# direction = "both": stepwise selection by AIC; trace = TRUE prints each step
stepmodel_miami <- stepAIC(model_miami, direction = "both", trace = TRUE)
summary(stepmodel_miami)
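To see how bidirectional stepAIC() behaves without the project data, here is a sketch on synthetic listings. The variable names echo the project's, but the data are made up: revenue is generated from price and reviews only, while rating and beds are pure noise that the AIC criterion may drop.

```r
library(MASS)  # stepAIC() ships with MASS, a recommended package bundled with R

set.seed(1)
n <- 200
# Synthetic listings (made-up): revenue depends on price and reviews only.
toy <- data.frame(price   = runif(n, 50, 300),
                  reviews = rpois(n, 20),
                  rating  = runif(n, 3, 5),
                  beds    = sample(1:4, n, replace = TRUE))
toy$revenue <- 0.1 * toy$price + 0.5 * toy$reviews + rnorm(n, sd = 2)

# Start from the full model, as the project code does.
full_fit <- lm(revenue ~ price + reviews + rating + beds, data = toy)

# direction = "both": each step may drop a term or add back a dropped one,
# keeping the move that lowers AIC the most; stops when no move helps.
# With no scope given, the full model is the upper bound of the search.
step_fit <- stepAIC(full_fit, direction = "both", trace = FALSE)
print(formula(step_fit))
```

AIC rewards fit but penalizes each extra term, so a noise variable is sometimes retained by chance; the strong signals (price, reviews) are kept reliably.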
Paris - regression model
To find the regression equation for Paris, we ran the same stepwise regression as for Miami. Price and the number of bathrooms were the most significant variables in predicting revenue, followed by the number of people the property can accommodate, the fee for extra people, whether there is a discount for weekly stays, the minimum stay, the sentiment score obtained from reviews, and the discount for monthly stays.
library(MASS)  # provides stepAIC()

# Full model with all 15 candidate predictors
model_paris <- lm(revenue ~ price + reviews + rating + accommodates + extpeop + savwish +
                    min_stay + sentiment + secdep + cleanfee + weekfee + monthfee +
                    bedroom + bathroom + beds,
                  data = paris)
summary(model_paris)

# direction = "both": stepwise selection by AIC; trace = TRUE prints each step
stepmodel_paris <- stepAIC(model_paris, direction = "both", trace = TRUE)
summary(stepmodel_paris)
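The Miami and Paris results are compared by which predictors survived selection in each city. A small hypothetical helper can list that overlap directly from two fitted models. It is demonstrated on R's built-in mtcars data so it runs on its own; with the project objects the call would be compare_models(stepmodel_miami, stepmodel_paris).

```r
# Hypothetical helper: predictors retained by a fitted lm (e.g. a stepAIC result).
retained_terms <- function(fit) attr(terms(fit), "term.labels")

# Which predictors two models share, and which are unique to each.
compare_models <- function(a, b) {
  ta <- retained_terms(a)
  tb <- retained_terms(b)
  list(both        = intersect(ta, tb),
       only_first  = setdiff(ta, tb),
       only_second = setdiff(tb, ta))
}

# Demonstration on built-in data (stand-ins for the Miami and Paris models).
fit_a <- lm(mpg ~ wt + hp, data = mtcars)
fit_b <- lm(mpg ~ wt + qsec, data = mtcars)
cmp <- compare_models(fit_a, fit_b)
str(cmp)
```

This lists term names only; coefficient sizes and p-values still need to be read from each summary().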