Airbnb Seattle Project: A Data Analysis Case Study

Muhammad Khan
5 min read · Jan 17, 2021


This post was written for Project 1 of the Udacity Data Scientist Nanodegree program.

Introduction to Airbnb Seattle

Airbnb Seattle (a nice place to live)

I am enrolled in the Udacity Data Scientist Nanodegree program, and as part of the Project 1 requirements I analyze the 2016 Airbnb Seattle dataset (the calendar and listings files).

The data mining process consists of the following steps:

· Business Values of the project

· Data Consolidation

· Data Analysis of Calendar data

· Data Analysis of Listings Data

· Evaluate the Results

· Conclusion

The dataset is sourced from Kaggle and covers Airbnb listings in Seattle in 2016.

Business Values of the project

I will answer the following questions:

1. What is the highest listing price?

2. What is the average price charged per month?

3. How quickly do hosts respond?

4. Which neighborhoods are the most popular, and which are the most expensive and the cheapest in Seattle?

5. What are the effects of reviewer ratings?

6. Do certain facilities affect price?

Data Consolidation

The data consolidation process consists of data gathering, data wrangling, and deriving results from the data. During wrangling, missing price values are replaced with 0 and the price column is converted from strings to floating-point numbers. Dates are converted to months. The full consolidation process is documented in the accompanying Jupyter notebook.
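A minimal pandas sketch of the two wrangling steps described above. It assumes the calendar file has `date` and `price` columns and that prices are strings such as `$1,000.00`; the helper names `clean_price` and `add_month` are hypothetical and not taken from the author's notebook.

```python
import pandas as pd


def clean_price(price: pd.Series) -> pd.Series:
    """Turn strings like '$1,000.00' into floats; missing prices become 0 (the post's rule)."""
    stripped = price.str.replace("$", "", regex=False).str.replace(",", "", regex=False)
    return pd.to_numeric(stripped).fillna(0.0)


def add_month(df: pd.DataFrame, date_col: str = "date") -> pd.DataFrame:
    """Parse the date column and add a numeric 'month' column (1-12)."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    out["month"] = out[date_col].dt.month
    return out


# Toy rows shaped like calendar.csv (column names are an assumption).
calendar = pd.DataFrame({
    "listing_id": [1, 1, 2],
    "date": ["2016-01-04", "2016-06-15", "2016-12-24"],
    "price": ["$85.00", None, "$1,000.00"],
})
calendar["price"] = clean_price(calendar["price"])
calendar = add_month(calendar)
print(calendar[["listing_id", "month", "price"]])
```

Filling missing prices with 0 follows the rule stated in the post. Because those zeros pull averages down, dropping the missing rows before averaging is a common alternative.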

Data Analysis of Calendar data

The following tasks were performed on the calendar dataset to extract business value:

· Find the number of rows and columns

· Find the Missing values

· Calendar data statistics and datatypes

· Data Wrangling:

1. Clean and convert Price from String to Float

2. Convert dates to Months
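The first three checks in the list above (shape, missing values, statistics and datatypes) can be sketched in pandas as follows. This is a minimal sketch on made-up rows, assuming calendar-style column names; `profile` is a hypothetical helper, not from the original notebook.

```python
import pandas as pd


def profile(df: pd.DataFrame) -> dict:
    """Collect the basic facts checked first: shape, missing values per column, dtypes."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "missing": df.isnull().sum().to_dict(),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


# Toy rows shaped like calendar.csv (column names are an assumption).
calendar = pd.DataFrame({
    "listing_id": [1, 1, 2, 2],
    "date": ["2016-01-04", "2016-01-05", "2016-01-04", "2016-01-05"],
    "available": ["t", "f", "t", "t"],
    "price": ["$85.00", None, "$120.00", "$120.00"],
})

summary = profile(calendar)
print(summary["rows"], "rows x", summary["columns"], "columns")
print("Missing values:", summary["missing"])
print("Datatypes:", summary["dtypes"])
print(calendar.describe(include="all"))  # summary statistics for every column
```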

The graphs below show results from the calendar dataset only:

Figure 1: Monthly Earning Chart
Figure 2: Monthly Average price of listings
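Both monthly charts can be produced from a single group-by on the cleaned calendar data. This sketch assumes that "monthly earning" means the sum of listed nightly prices in each month; the post does not spell out its exact definition, and the rows below are made up.

```python
import pandas as pd

# Toy cleaned calendar rows (numeric price, month 1-12); values are made up.
calendar = pd.DataFrame({
    "listing_id": [1, 2, 1, 2, 1],
    "month": [1, 1, 6, 6, 12],
    "price": [80.0, 100.0, 150.0, 130.0, 200.0],
})

# One row per month: total of listed prices ("earning", assumed definition) and mean price.
monthly = calendar.groupby("month")["price"].agg(earning="sum", avg_price="mean")
print(monthly)

# To draw bar charts similar to Figures 1 and 2 (requires matplotlib):
# monthly.plot(kind="bar", subplots=True)
```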

Data Analysis of Listings data

The following tasks were performed on the listings dataset to extract business value:

· Find the number of rows and columns and the missing values

· Describe the variables in the dataset and look at the datatypes

· Data Wrangling: Clean/convert Price from String to Float

· Find all the categorical and quantitative variables
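One way to separate categorical from quantitative variables is by dtype. The sketch below treats numeric columns as quantitative and everything else as categorical. The column names are assumed from the Kaggle listings file, and `split_variables` is a hypothetical helper.

```python
import pandas as pd


def split_variables(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Return (categorical, quantitative) column names based on dtype."""
    quantitative = df.select_dtypes(include="number").columns.tolist()
    categorical = [col for col in df.columns if col not in quantitative]
    return categorical, quantitative


# Toy rows shaped like listings.csv (column names are an assumption).
listings = pd.DataFrame({
    "id": [1, 2],
    "price": [85.0, 150.0],
    "bedrooms": [1.0, 2.0],
    "host_response_time": ["within an hour", "within a day"],
    "neighbourhood_cleansed": ["Belltown", "Broadway"],
})

categorical, quantitative = split_variables(listings)
print("Categorical:", categorical)
print("Quantitative:", quantitative)
```

In the raw file, price is stored as text (for example `$85.00`), so a dtype-based split only classifies it as quantitative after the string-to-float wrangling step.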

Evaluate the Results

The results analysis addresses the following business questions:

1. Price Values change Analysis

2. Host Response Analysis

3. Popular Neighborhood Analysis

4. Reviewer’s ratings affect the customer or price

1. Price Values change analysis

In this task, the calendar and listings data are analyzed; the graphs below show the findings.

Figure 3: Top 10 listings that charged the highest price. Average price is $814.
Figure 4: Listings average charged price per month. Average monthly price is $92.
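A ranking like Figure 3 amounts to sorting listings by price and keeping the top ten. The sketch below assumes that the caption's average is the mean of those ten prices. The prices are made up, so the output will not reproduce the $814 figure.

```python
import pandas as pd

# Toy cleaned listings (made-up prices).
listings = pd.DataFrame({
    "id": range(1, 13),
    "price": [50.0, 75.0, 1000.0, 999.0, 950.0, 900.0,
              850.0, 800.0, 750.0, 700.0, 650.0, 60.0],
})

top10 = listings.nlargest(10, "price")  # the 10 highest-priced listings
print(top10)
print("Mean price of the top 10:", top10["price"].mean())
print("Mean price of all listings:", listings["price"].mean())
```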

Charged Price Observation:

The highest prices are recorded in June (summer vacation) and December (Christmas holidays), which suggests that vacation periods are the busiest times and tend to push prices up. In Q1 2016, prices were below the average for the whole year, likely because demand is low at that time.
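The comparison in this observation reduces to two averages and a ranking. This sketch uses made-up monthly averages indexed by month number, not the real Seattle values. It shows how the Q1 average can be checked against the annual average and how the two peak months can be picked out.

```python
import pandas as pd

# Made-up monthly average prices indexed by month 1-12 (not the real Seattle figures).
monthly_avg = pd.Series(
    [80.0, 82.0, 85.0, 90.0, 92.0, 110.0, 105.0, 100.0, 95.0, 90.0, 88.0, 108.0],
    index=range(1, 13),
    name="avg_price",
)

annual_avg = monthly_avg.mean()
q1_avg = monthly_avg.loc[1:3].mean()  # January-March (label slice is inclusive)
peak_months = monthly_avg.nlargest(2).index.tolist()

print(f"Annual average: {annual_avg:.2f}")
print(f"Q1 average: {q1_avg:.2f} ({'below' if q1_avg < annual_avg else 'above'} the annual average)")
print("Peak months:", peak_months)
```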

2. Host Response Analysis

The host response time is analyzed to find how quickly hosts respond. The most common response time is within an hour. The graph below shows the distribution of host response times.

Figure 5: Host response time analysis.

Host Response Times Observation:

Approximately half of hosts respond within an hour; the impact of this is discussed in the last task.
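The "approximately half" figure is a normalized count of response-time categories. This sketch assumes the column is named `host_response_time` and uses category labels similar to those in the Kaggle listings file. The rows are made up, and missing responses are left out of the shares.

```python
import pandas as pd

# Toy host_response_time values (column name and category labels are assumptions).
listings = pd.DataFrame({
    "host_response_time": [
        "within an hour", "within an hour", "within a few hours",
        "within a day", None,
    ],
})

# Share of hosts per response-time bucket; missing values are excluded by default.
shares = listings["host_response_time"].value_counts(normalize=True)
print(shares.round(2))
```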

3. Popular Neighborhood Analysis

In this task, neighborhoods are ranked by number of listings and by price. The graphs below show the results.

Figure 6: Top 10 most popular neighborhoods in Seattle.
Figure 7: Top 10 most expensive neighborhoods in Seattle.
Figure 8: Top 10 cheapest neighborhoods in Seattle.

Neighborhood Observation:

Broadway, Belltown and Wallingford together account for 20% of the properties.

Southeast Magnolia, Portage Bay and Westlake are the most expensive areas.

Rainier Beach, Olympic Hills and South Delridge are the cheapest areas.
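The three neighborhood rankings come from one count and one group-by. This is a sketch on made-up rows, assuming the neighborhood column is `neighbourhood_cleansed` (its name in the Kaggle listings file) and that popularity means number of listings.

```python
import pandas as pd

# Toy listings; 'neighbourhood_cleansed' is the assumed neighborhood column.
listings = pd.DataFrame({
    "neighbourhood_cleansed": ["Broadway", "Broadway", "Belltown", "Wallingford",
                               "Westlake", "Rainier Beach", "Broadway", "Belltown"],
    "price": [120.0, 100.0, 150.0, 110.0, 300.0, 60.0, 140.0, 170.0],
})

# Popularity: number of listings per neighborhood, and the share held by the top 3.
counts = listings["neighbourhood_cleansed"].value_counts()
top3_share = counts.head(3).sum() / len(listings)

# Price ranking: mean listing price per neighborhood.
avg_price = listings.groupby("neighbourhood_cleansed")["price"].mean()
most_expensive = avg_price.nlargest(10)
cheapest = avg_price.nsmallest(10)

print(counts.head(10))
print(f"Top-3 share of listings: {top3_share:.0%}")
print(most_expensive)
print(cheapest)
```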

4. Reviewer’s ratings affect the customer or price

In this task, the reviewer rating data is analyzed to find the correlations between the different rating variables and their effects on customers and price.

Here I used a correlation matrix plot (heatmap), which shows how strongly each variable is correlated with every other variable.

In the matrix plot, correlation is color-coded from blue through grey to red (from low to high correlation).

The dark red boxes running diagonally from top left to bottom right show each variable's correlation with itself, which is always 1.

Figure 9: Reviewer’s Ratings Matrix Plot.

Reviewer’s ratings Observation:

Ratings: Cleanliness is very important for a good rating: among the individual scores, it has the highest correlation with the overall rating in the matrix plot. Good communication with guests is highly correlated with the check-in rating.

Ratings and price: Ratings have little impact on the price charged by the host. Location has a low positive correlation with price.
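A matrix like Figure 9 is a pairwise Pearson correlation table drawn as a heatmap. This sketch uses made-up scores and assumes `review_scores_*` column names like those in the Kaggle listings file. The `coolwarm` colormap in the commented plotting line is a guess at the blue-grey-red scheme described above.

```python
import pandas as pd

# Toy review scores and prices; column names follow the assumed listings.csv schema.
scores = pd.DataFrame({
    "review_scores_rating": [95, 88, 100, 80, 92],
    "review_scores_cleanliness": [10, 9, 10, 8, 9],
    "review_scores_communication": [10, 9, 10, 9, 10],
    "review_scores_checkin": [10, 10, 10, 9, 9],
    "review_scores_location": [9, 10, 10, 9, 10],
    "price": [120.0, 85.0, 150.0, 95.0, 110.0],
})

corr = scores.corr()  # pairwise Pearson correlation; the diagonal is always 1
print(corr.round(2))

# A heatmap similar to Figure 9 (requires seaborn and matplotlib):
# import seaborn as sns
# sns.heatmap(corr, cmap="coolwarm", vmin=-1, vmax=1, annot=True)
```

Correlation only measures linear association, so a high value (for example between cleanliness and the overall rating) does not by itself show that one causes the other.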

Figure 10: Facilities Ratings Matrix Plot.

Observation:

Facilities and price: Beds, bedrooms and bathrooms are positively correlated with price, while minimum and maximum nights show no noticeable relationship with price.
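The facilities observation corresponds to reading off the `price` column of a correlation matrix. This is a minimal sketch on made-up listings, assuming the facility columns are named as in the Kaggle listings file (`beds`, `bedrooms`, `bathrooms`, `minimum_nights`, `maximum_nights`).

```python
import pandas as pd

# Toy listings; facility column names are assumptions based on listings.csv.
listings = pd.DataFrame({
    "price": [85.0, 150.0, 250.0, 60.0, 120.0, 300.0],
    "beds": [1, 2, 3, 1, 2, 4],
    "bedrooms": [1, 1, 2, 0, 1, 3],
    "bathrooms": [1.0, 1.0, 2.0, 1.0, 1.5, 2.5],
    "minimum_nights": [2, 1, 3, 1, 2, 2],
    "maximum_nights": [30, 1125, 365, 14, 1125, 60],
})

# Correlation of each facility with price, strongest positive first.
price_corr = listings.corr()["price"].drop("price").sort_values(ascending=False)
print(price_corr.round(2))
```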

Conclusion

Below are the final findings on prices, neighborhoods, ratings and facilities:

1. The analysis of the Airbnb listings dataset shows that June and December are the peak-price months. Hosts could use this to plan their listings' availability and increase their earnings. I am not confident that this is a seasonal price trend, because data for the following years is not available.

2. The ratings analysis shows that cleanliness and good communication also affect overall ratings, which makes them key areas for hosts to focus on.

3. The neighborhood analysis shows which areas can help maximize revenue and initial bookings: Broadway was the most popular neighborhood and Southeast Magnolia the most expensive.

These findings suggest how hosts can create more opportunities to attract visitors. A more detailed analysis would need data from additional years to confirm whether a seasonal price trend exists.
