Predicting House Prices
Tableau / Python Project
conducted June 2024
Contents:
Goals
Tools & Skills Used
Overview
Final Results & Recommendations
Goals:
Using a Kaggle dataset, create an interactive Tableau dashboard that shows what drives real estate prices, based on exploratory analysis methods.
Links:
Tools:
Tableau
Python
Excel
Knowledge Levels:
Tableau 3 out of 5
Python 2 out of 5
Project Difficulty: 3 out of 5
Skills Used:
Sourcing Data
Exploring relationships
Geographical Visualizations with Python
Regression
Clustering
Sourcing & Analyzing Time Series Data
Overview:
House Sales in King County, Washington, USA
About the Data:
The Google Sheets link here shows the data profile for the Kaggle dataset and how I cleaned the Excel sheet. This dataset contains house sale prices for King County, Washington, covering homes sold from May 2014 through 2015. It includes the sale date, price, zip code, number of bedrooms and bathrooms, and other fields useful for real estate analysis.
Linear Regression:
Linear regression is a type of model that measures the relationship between quantitative variables to make a prediction.
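As a minimal sketch of the idea, the ordinary least squares fit below estimates price from a single variable. It is not the model built for this project: the function names are hypothetical, and the numbers are made-up illustrative values rather than rows from the Kaggle dataset.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted y for a new x on the fitted line."""
    return intercept + slope * x


# Made-up illustrative values, NOT taken from the dataset.
sqft_living = [1000, 1500, 2000, 2500, 3000]
price = [250_000, 350_000, 450_000, 550_000, 650_000]

slope, intercept = fit_line(sqft_living, price)
estimate = predict(2200, slope, intercept)
print(f"price = {intercept:.0f} + {slope:.0f} * sqft; estimate for 2200 sqft: {estimate:.0f}")
```

The slope is the estimated change in price per additional square foot, which is the "relationship" the model measures.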
Exploring relationships revealed four variables with the most influence on price:
Square Feet of Living
Number of Bathrooms
Grade
Has Waterfront
Number of Bathrooms is not directly correlated with price. The most expensive houses have five or more bathrooms, but other homes with the same number of bathrooms sell for less.
Prices increase as the square footage of living space increases.
The same can be said for Grade, which rates the quality level of construction and design.
Homes with a Waterfront have higher prices.
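One way to check relationships like these is to rank each variable by its Pearson correlation with price. The sketch below assumes column names in the style of the Kaggle file (sqft_living, bathrooms, grade, waterfront) and uses made-up values, so the resulting ranking says nothing about the real data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Made-up illustrative values; the column names are assumptions.
homes = {
    "sqft_living": [1200, 1800, 2400, 3000, 3600, 4200],
    "bathrooms": [1.0, 2.0, 1.5, 2.5, 2.0, 3.5],
    "grade": [6, 7, 7, 8, 9, 10],
    "waterfront": [0, 0, 0, 0, 1, 1],
}
price = [300_000, 420_000, 500_000, 640_000, 900_000, 1_200_000]

correlations = {name: pearson_r(values, price) for name, values in homes.items()}
ranking = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
for name in ranking:
    print(f"{name:12s} r = {correlations[name]:+.2f}")
```

A value near +1 or -1 indicates a strong linear relationship; a value near 0 indicates little linear relationship, though a nonlinear one may still exist.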
Cluster Analysis:
A cluster analysis groups data points into “clusters.” Comparing these new groups could reveal new patterns and relationships.
The cluster analysis yielded four distinct price groups:
High Price (orange)
Mid-High Price (green)
Mid-Low Price (red)
Low Price (blue)
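The write-up does not name the clustering algorithm, so the sketch below assumes k-means, a common choice for this kind of grouping. It is a bare-bones version with a deterministic start (the first k points become the initial centroids), and the points are made-up (sqft in thousands, price in millions of USD), not project data.

```python
from math import dist


def k_means(points, k, iterations=100):
    """Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Made-up points: (living space in thousands of sqft, price in millions of USD).
points = [
    (1.0, 0.3), (2.5, 1.0), (4.0, 2.5), (6.0, 5.0),
    (1.2, 0.35), (2.7, 1.1), (4.2, 2.6), (6.3, 5.2),
]
labels, centroids = k_means(points, k=4)
print(labels)
print(centroids)
```

On real data the features would normally be standardized first, since otherwise the variable with the largest numeric range (price) dominates the distance calculation.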
Cluster Analysis Results:
With one exception, each price group has high values in one or more categories, but not in all four. For example, Low Price (blue) has high values in Bathrooms, Sqft Living, and Grade, but not Waterfront. The exception is the Mid-Low Price (red) group, which has high values in all four categories.
Average Prices
High Price (orange) 6.8M USD
Mid-High Price (green) 3.4M USD
Mid-Low Price (red) 2.8M USD
Low Price (blue) 1.3M USD
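Averages like these come from grouping sale prices by cluster label and taking the mean of each group. A minimal sketch, with a hypothetical function name and made-up prices and labels:

```python
from collections import defaultdict


def average_price_by_group(prices, labels):
    """Mean price for each cluster label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for price, label in zip(prices, labels):
        totals[label] += price
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


# Made-up prices and labels, NOT the project's cluster assignments.
prices = [1_200_000, 1_400_000, 3_000_000, 7_000_000]
labels = ["Low", "Low", "Mid-Low", "High"]

averages = average_price_by_group(prices, labels)
for label, avg in averages.items():
    print(f"{label}: {avg / 1e6:.1f}M USD")
```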
Time Series Analysis:
To compare my analysis to a larger source, I used Zillow Real Estate Data from the Quandl marketplace. Source: https://data.nasdaq.com/databases/ZILLOW
This decomposition plot shows Zillow Real Estate Data after subsetting the data to display only home prices over time.
Overall, the average price of homes grew ~16% from 2014 to 2015, which is quite high.
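The write-up does not say which tool produced the decomposition plot. As a rough sketch of what a decomposition does, the code below applies a classical additive decomposition (centered moving average for the trend, then per-position averages of what remains for the seasonal pattern) and computes a percent change of the kind quoted above. The function names are hypothetical and the series is synthetic, not Zillow data.

```python
def centered_trend(values, period):
    """Centered moving average; None where the window does not fit."""
    half = period // 2
    trend = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half : i + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Even period: the two end points get half weight (a 2 x period average).
            total -= 0.5 * (window[0] + window[-1])
        trend[i] = total / period
    return trend


def seasonal_means(values, trend, period):
    """Average detrended value at each position in the cycle, centered on zero."""
    buckets = [[] for _ in range(period)]
    for i, (value, level) in enumerate(zip(values, trend)):
        if level is not None:
            buckets[i % period].append(value - level)
    raw = [sum(b) / len(b) if b else 0.0 for b in buckets]
    offset = sum(raw) / period
    return [r - offset for r in raw]


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# Synthetic series: a straight-line trend plus a repeating 4-step pattern.
pattern = [1, -1, 2, -2]
values = [10 + i + pattern[i % 4] for i in range(12)]

trend = centered_trend(values, period=4)
seasonal = seasonal_means(values, trend, period=4)
growth = percent_change(100, 116)
print(trend)
print(seasonal)
print(f"{growth:.1f}%")
```

For monthly prices the period would be 12, and the seasonal component is where a spring increase would show up.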
Final Results & Recommendations:
Homes with more bathrooms generally cost more than homes with more bedrooms and fewer bathrooms. That said, some units with 3 or more bathrooms can be found at around 1.3M USD.
The average house has about 2,500 sqft. Homes with a waterfront tend to have less square footage overall.
The average grade across all homes is 7.1. A higher grade increases the overall value, so buying a home with a grade around 7.1 can save money without losing quality.
Real estate prices rise in the spring (March, April, May). I assume many people buy homes in the summer, so prices are raised in preparation.
Limitations:
Data was collected from 2014 to 2015. Having more recent data would improve prediction and forecasting.
The data set contains many extreme outliers, mainly luxury homes.
Next Steps:
Clean the data to remove luxury homes and other outliers, then conduct the analysis again to examine any differences in results.
Compare price accuracy to other real estate websites.
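The first next step, removing outliers, could be sketched with the interquartile range (IQR) rule. This is one common convention and an assumption on my part, not a method already used in the project; the 1.5 multiplier is the conventional default, and the prices are made-up.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Lower and upper fences: Q1 - k * IQR and Q3 + k * IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


# Made-up sale prices with one luxury-home outlier.
prices = [300_000, 350_000, 400_000, 450_000, 500_000, 550_000, 600_000, 7_000_000]
kept = drop_outliers(prices)
print(kept)
```

Rerunning the regression and cluster analysis on the filtered prices would show how much the luxury homes shape the results.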