Predicting the Resale Price of a BMW Using a Neural Network
When I moved to Canada a couple of years ago, I was looking to buy a pre-owned/pre-certified car. However, the prices quoted by dealers for the cars I liked always felt very high to me: they consistently beat my own estimate by $2000–$3000. Since I am a mechanical engineer who also happens to work in the automotive domain, I always wondered which factors influence the price of a pre-certified car. With this data set on Kaggle, I got a chance to predict just that.
I am sure many more people face the same question. I hope this data analysis and subsequent deep learning prediction model can help them.
First, let's describe the variables in the data set:
Maker key: The brand of the car
Model key: The model of the car
Mileage: Total miles driven
Engine power: Engine capacity
Registration date: Date car was registered
Fuel: Type of fuel (diesel, petrol, ...)
Paint color: The color of the car
Car type: The type of car (sedan, SUV, ...)
Feature 1 to 8: Boolean features which the company wants to explore
Price: The price at which it was auctioned
Sold at: The date on which it was sold
I explored the data with matplotlib and seaborn, Python's data visualization packages, and was able to get some insights into the data set. I am going to present a few of them in this post, with a link to my Kaggle workspace for more detailed graphs.
First, in the resale market the paint color of the car barely influences its final price. Below is the violin plot of paint color vs. price.
We see that the median price and the interquartile range (IQR) are nearly the same for all colors.
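The same comparison can also be checked numerically rather than read off the plot. Below is a minimal sketch that computes the median and IQR of price per paint color using only the Python standard library. The column names paint_color and price, the helper name price_spread_by_group, and the toy rows are assumptions for illustration; the rows are made up and are not values from the Kaggle data.

```python
from collections import defaultdict
from statistics import median, quantiles


def price_spread_by_group(rows, group_key="paint_color", value_key="price"):
    """Return {group: (median, IQR)} for a list of dict rows."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[group_key]].append(row[value_key])

    summary = {}
    for group, values in grouped.items():
        # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        summary[group] = (median(values), q3 - q1)
    return summary


# Made-up rows, only to show the call (assumed column names).
toy_rows = [
    {"paint_color": "black", "price": 10000},
    {"paint_color": "black", "price": 12000},
    {"paint_color": "black", "price": 14000},
    {"paint_color": "black", "price": 16000},
    {"paint_color": "black", "price": 18000},
    {"paint_color": "white", "price": 11000},
    {"paint_color": "white", "price": 13000},
    {"paint_color": "white", "price": 14000},
    {"paint_color": "white", "price": 15000},
    {"paint_color": "white", "price": 17000},
]

summary = price_spread_by_group(toy_rows)
for color, (med, iqr) in summary.items():
    print(f"{color}: median={med}, IQR={iqr}")
```

If the medians and IQRs printed for the real data are close across colors, that supports what the violin plot suggests visually.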
Second, among all the BMW models sold, SUVs and coupes command a higher price range, even with more miles driven. This could be partly due to the higher original price of these vehicles and partly because demand for these models is higher in the resale market.
Third, as the mileage on the car increases, the price it commands decreases. The price also decreases with age, even if the mileage on the car is low.
The graph below shows price vs. mileage, color coded by the registration date of the car. We can see that older cars with fewer miles command a lower price.
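One way to make the effect of time explicit is to turn the registration date and the sale date into a single "age at sale" number. The sketch below is only an illustration of that idea and not necessarily what my notebook does; the function name age_at_sale_years is hypothetical, and it assumes both dates have already been parsed into date objects.

```python
from datetime import date


def age_at_sale_years(registration_date, sold_at):
    """Approximate age of the car, in years, on the day it was sold."""
    # 365.25 accounts for leap years on average.
    return (sold_at - registration_date).days / 365.25


# Example dates chosen for illustration only.
age = age_at_sale_years(date(2012, 2, 1), date(2018, 2, 1))
print(f"Age at sale: {age:.2f} years")
```

A numeric age like this can be plotted or fed to a model directly, which is harder to do with two raw date columns.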
Finally, I fit a 3-layer neural network on the data to predict the price of a BMW car from the above features. After hyperparameter tuning, I was able to get an R-squared value of about 0.82, which means that the model explains about 82% of the variation in price with the given features.
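For readers who have not met R-squared before, it is one minus the ratio of the model's squared error to the squared error of always predicting the mean price. The sketch below computes it in plain Python. The prices are made-up numbers for illustration, not my model's output, and the function name r_squared is my own.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of a baseline that always predicts the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up prices, only to show the call.
actual = [10000, 15000, 20000, 25000]
predicted = [11000, 14000, 21000, 24000]
score = r_squared(actual, predicted)
print(f"R-squared: {score:.3f}")
```

A value of 1 means perfect predictions, and a value near 0 means the model does no better than guessing the average price every time.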
The side snapshots show the R-squared value I obtained from my deep neural network. Below that are the training loss and validation loss plotted with respect to iterations. I will avoid going into the initialization and other fine details of the fitted neural net here.
All of that information can be found in my Kaggle notebook.
The one thing I was not able to find out was what the 8 boolean features represent. If anyone knows what they are, please let me know.
For additional questions, please feel free to connect with me via LinkedIn here: https://www.linkedin.com/in/sawantsumeet/