Predicting House Prices

Tableau / Python Project

conducted June 2024

Contents:

  1. Goals

  2. Tools & Skills Used

  3. Overview

  4. Final Results & Recommendations

Goals:

  • Using a Kaggle dataset, create an interactive Tableau dashboard that shows what drives real estate prices, based on exploratory analysis.

Tools:

  • Tableau

  • Python

  • Excel

Knowledge Levels:

  • Tableau 3 out of 5

  • Python 2 out of 5

  • Project Difficulty: 3 out of 5

Skills Used:

  • Sourcing Data

  • Exploring relationships

  • Geographical Visualizations with Python

  • Regression

  • Clustering

  • Sourcing & Analyzing Time Series Data

Overview:

House Sales in King County, Washington, USA

About the Data:

  • The Google Sheets link here shows the data profile for the Kaggle dataset and how I cleaned the Excel sheet. The dataset contains sale prices for houses in King County, Washington, sold from May 2014 through 2015. It includes the sale date, price, zip code, number of bedrooms and bathrooms, and other fields useful for real estate analysis.
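The kind of data profile described above could be sketched in Python as follows. This is a minimal illustration, not the author's actual cleaning steps: the sample rows are made up, and the column names and the `20141013T000000` date shape are assumptions about how the Kaggle file is laid out.

```python
import csv
import io
from datetime import datetime

# Made-up three-row sample; column names are assumed, not taken from the write-up.
SAMPLE_CSV = """id,date,price,bedrooms,bathrooms,sqft_living,waterfront,grade,zipcode
1,20141013T000000,221900,3,1,1180,0,7,98178
2,20141209T000000,538000,3,2.25,2570,0,7,98125
3,20150225T000000,,2,1,770,0,6,98028
"""


def profile(rows):
    """Count rows and missing (empty) values per column."""
    rows = list(rows)
    missing = {}
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] = missing.get(column, 0) + 1
    return {"rows": len(rows), "missing": missing}


def parse_sale_date(text):
    """Parse dates shaped like 20141013T000000 (assumed format)."""
    return datetime.strptime(text, "%Y%m%dT%H%M%S")


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = profile(rows)
first_sale = parse_sale_date(rows[0]["date"])
```

Here `summary` reports the row count and which columns have gaps, which is the information a data profile typically records before cleaning.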

Linear Regression:

  • Linear regression models the relationship between a quantitative outcome (here, price) and one or more predictor variables, and uses that relationship to make predictions.

  • Exploring the relationships showed four variables with the most influence on price:

    • Square Feet of Living

    • Number of Bathrooms

    • Grade

    • Has Waterfront

  • Number of Bathrooms is not directly correlated with price. The most expensive houses have five or more bathrooms, but homes with the same number of bathrooms can also be found at lower prices.

  • Prices increase as Square Feet of Living increases.

  • The same holds for Grade, which rates the quality of construction and design.

  • Homes with a Waterfront have higher prices.
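To make the regression idea concrete, here is a small dependency-free sketch of a Pearson correlation and a one-variable least-squares fit. The numbers are made up and deliberately perfectly linear so the result is easy to check by hand. The names `fit_line` and `pearson_r` are hypothetical helpers, and the original analysis may well have used a library such as scikit-learn instead.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept


# Made-up illustrative values, not rows from the dataset.
sqft_living = [1000, 1500, 2000, 2500, 3000]
price = [300_000, 400_000, 500_000, 600_000, 700_000]

slope, intercept = fit_line(sqft_living, price)
r = pearson_r(sqft_living, price)
predicted = intercept + slope * 2200  # predicted price for a 2,200 sqft home
```

With real data the points scatter around the line, so `r` falls below 1 and the slope is only an average effect per extra square foot.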

Cluster Analysis:

  • A cluster analysis groups data points into “clusters.” Comparing these groups can reveal patterns and relationships that are not visible in the raw data.

  • The cluster analysis yielded four distinct price groups:

    • High Price (orange)

    • Mid-High Price (green)

    • Mid-Low Price (red)

    • Low Price (blue)
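The write-up does not name the clustering algorithm, so the sketch below assumes k-means with four clusters, one common choice. The points are made-up (price in millions USD, living area in thousands of sqft) pairs, and the seeding from evenly spaced points is a simplification chosen to keep the example deterministic.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means, seeded deterministically from evenly spaced points."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Made-up (price in millions USD, sqft_living in thousands) pairs.
points = [
    (0.4, 1.2), (0.5, 1.5),
    (1.0, 2.4), (1.1, 2.6),
    (2.0, 3.8), (2.2, 4.0),
    (5.0, 7.0), (5.4, 7.5),
]
labels, centroids = kmeans(points, k=4)
```

Each centroid is the average of its cluster's members, so the first coordinate of each centroid is that group's average price. On real data the features should be scaled first, since k-means is sensitive to units.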

Cluster Analysis Results:

  • Each price group scores high in one or more categories, but never in all four. For example, Low Price (blue) has high values in Bathrooms, Sqft Living, and Grade, but not Waterfront. The exception is the Mid-Low Price (red) group, which has high values in all categories.

  • Average Prices

    • High Price (orange) 6.8M USD

    • Mid-High Price (green) 3.4M USD

    • Mid-Low Price (red) 2.8M USD

    • Low Price (blue) 1.3M USD

Time Series Analysis:

  • This decomposition plot shows Zillow Real Estate Data after subsetting the data to only display home prices over time.

  • Overall, the average price of homes grew ~16% from 2014 to 2015, which is quite high.
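A year-over-year growth figure like the ~16% quoted for 2014 to 2015 comes from comparing average prices per year. The sketch below shows that arithmetic on made-up records, plus a trailing moving average as a rough stand-in for the trend component of a decomposition. The helper names are hypothetical; a full decomposition would more likely use a library routine such as `seasonal_decompose` from statsmodels.

```python
from collections import defaultdict


def percent_growth(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


def mean_by_key(records):
    """Average the values of (key, value) pairs per key."""
    totals = defaultdict(lambda: [0.0, 0])
    for key, value in records:
        totals[key][0] += value
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def moving_average(values, window):
    """Trailing moving average over a fixed window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Made-up (year, sale price) records, not values from the dataset.
sales = [(2014, 500_000), (2014, 520_000), (2015, 580_000), (2015, 603_200)]
yearly = mean_by_key(sales)
growth = percent_growth(yearly[2014], yearly[2015])
smoothed = moving_average([1, 2, 3, 4, 5], window=3)
```

Passing (month, price) pairs to `mean_by_key` instead of (year, price) pairs gives monthly averages, which is one simple way to look for seasonal patterns.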

Final Results & Recommendations:

  • Homes with more bathrooms generally cost more than homes with more bedrooms and fewer bathrooms. That said, some homes with three or more bathrooms can be found at around 1.3M USD.

  • The average house has about 2,500 sqft of living space. Homes with a waterfront tend to have less overall square footage.

  • The average grade across all homes is 7.1. A higher grade increases overall value, so buying a home graded around 7.1 can save money without sacrificing quality.

  • Real estate prices rise in the spring (March, April, May). I assume that people commonly buy homes in the summer, so prices are raised in preparation.

  • Limitations:

    • Data was collected from 2014 to 2015. Having more recent data would improve prediction and forecasting.

    • The data set contains many extreme outliers, mainly luxury homes.

  • Next Steps:

    • Clean the data to remove luxury homes and other outliers, then conduct the analysis again to examine any differences in results.

    • Compare price accuracy to other real estate websites.
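One possible way to carry out the outlier-removal step is the 1.5 × IQR rule, sketched below. This rule is an assumption for illustration: the write-up does not say how luxury homes would be identified, and the prices are made up.

```python
from statistics import quantiles


def remove_outliers_iqr(values, factor=1.5):
    """Keep values within [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if low <= v <= high]


# Made-up prices with one luxury-home outlier at the end.
prices = [300_000, 350_000, 400_000, 450_000, 500_000, 550_000, 600_000, 7_000_000]
kept = remove_outliers_iqr(prices)
```

Rerunning the regression and clustering on the filtered prices would show how much the luxury homes were driving the earlier results.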
