Data Science Project - 4 : Outlier Removal (Real Estate Price Prediction Project)
Description
This data science project series walks through, step by step, how to build a real estate price prediction website. We first build a model using sklearn and linear regression on the Bangalore home prices dataset from kaggle.com. The second step is to write a Python Flask server that uses the saved model to serve HTTP requests. The third component is a website built in HTML, CSS, and JavaScript that lets the user enter a home's square-foot area, number of bedrooms, and so on, and calls the Python Flask server to retrieve the predicted price. During model building we cover many core data science concepts, such as data loading and cleaning, outlier detection and removal, feature engineering, dimensionality reduction, GridSearchCV for hyperparameter tuning, and k-fold cross-validation. In terms of technology and tools, this project covers:
1) Python
2) Numpy and Pandas for data cleaning
3) Matplotlib for data visualization
4) Sklearn for model building
5) Jupyter Notebook, Visual Studio Code, and PyCharm as IDEs
6) Python Flask for the HTTP server
7) HTML/CSS/JavaScript for the UI
In this particular video we will load the Bangalore home prices data into a pandas DataFrame and then handle NA values. We will then remove some unnecessary features and also normalize property size. We will convert a range of property sizes (such as 2100-3250) into the average of the min and max.
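The range-to-average step described above can be sketched roughly as follows. This is a minimal illustration, not necessarily the code in the linked notebook: the helper name convert_sqft_to_num and the sample values are made up for this example, and returning None for values that cannot be parsed is an assumption about how such entries might be handled.

```python
def convert_sqft_to_num(value):
    """Turn a size string into a float.

    A range such as "2100-3250" becomes the average of its two ends.
    Anything that cannot be parsed returns None (an assumed convention).
    """
    if value is None:
        return None
    parts = str(value).split("-")
    try:
        if len(parts) == 2:
            low, high = float(parts[0]), float(parts[1])
            return (low + high) / 2
        return float(value)
    except ValueError:
        return None


# Made-up sample values, for illustration only
samples = ["1200", "2100-3250", "2100 - 3250", "34.46Sq. Meter", None]
converted = [convert_sqft_to_num(s) for s in samples]
print(converted)
```

With pandas, a helper like this could be applied to the size column through Series.apply, and rows where it returns None could then be dropped along with the other NA values. The name of that column depends on the dataset and is not given here.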
Code: https://github.com/codebasics/py/blob/master/DataScience/BangloreHomePrices/model/banglore_home_prices_final.ipynb
Parent Code Repository: https://github.com/codebasics/py/tree/master/DataScience/BangloreHomePrices
Website: http://codebasicshub.com/
Facebook: https://www.facebook.com/codebasicshub
Twitter: https://twitter.com/codebasicshub
Patreon: https://www.patreon.com/codebasics