Car Purchase Prediction

We work as car salespeople and need a model that predicts the total dollar amount a customer is willing to pay for a car. The dataset contains the following fields:

  • Customer Name
  • Customer e-mail
  • Country
  • Gender
  • Age
  • Annual Salary
  • Credit Card Debt
  • Net Worth

Our target variable (the value we want to predict) is Car Purchase Amount.

Import libraries and dataset

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('Car_Purchasing_Data.csv', encoding='ISO-8859-1')
       Customer Name    Customer e-mail                                    Country       Gender  Age        Annual Salary  Credit Card Debt  Net Worth    Car Purchase Amount
    0  Martina Avila    cubilia.Curae.Phasellus@quisaccumsanconvallis.edu  Bulgaria      0       41.851720  62812.09301    11609.380910      238961.2505  35321.45877
    2  Naomi Rodriquez  vulputate.mauris.sagittis@ametconsectetueradip...  Algeria       1       43.152897  53798.55112    11160.355060      638467.1773  42925.70921
    3  Jade Cunningham  malesuada@dignissim.com                            Cook Islands  1       58.271369  79370.03798    14426.164850      548599.0524  67422.36313
    4  Cedric Leach     felis.ullamcorper.viverra@egetmollislectus.net     Brazil        1       57.313749  59729.15130    5358.712177       560304.0671  55915.46248

Data Visualization

  • The plots suggest a roughly linear relationship between Car Purchase Amount and Age, Annual Salary, and Net Worth.
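
One way to put a number on this kind of linear relationship is the Pearson correlation coefficient. The sketch below uses synthetic data rather than the real CSV: the feature names match the dataset, but the values and the linear rule that generates the purchase amount are made up for illustration.

```python
import numpy as np

# Synthetic stand-in for the dataset (assumption: values are made up for illustration).
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(20, 70, n)
salary = rng.uniform(20_000, 100_000, n)
debt = rng.uniform(100, 20_000, n)

# Hypothetical rule: the amount depends linearly on age and salary, not on debt.
amount = 400 * age + 0.5 * salary + rng.normal(0, 2_000, n)

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

correlations = {
    "Age": pearson(age, amount),
    "Annual Salary": pearson(salary, amount),
    "Credit Card Debt": pearson(debt, amount),
}
for name, r in correlations.items():
    print(f"{name}: r = {r:+.2f}")
```

On the real data, df[['Age', 'Annual Salary', 'Net Worth', 'Car Purchase Amount']].corr() should give the same kind of summary; values near +1 or -1 indicate a strong linear relationship, values near 0 a weak one.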

Creating testing and training dataset

– First, we construct the feature set:

We drop the features that are not useful for our purpose (name, e-mail, and country). We also drop the target variable, since it must not be part of the inputs.

X = df.drop(['Customer Name', 'Customer e-mail', 'Country', 'Car Purchase Amount'], axis = 1)
    Gender  Age  Annual Salary  Credit Card Debt  Net Worth

– Second, we extract the target variable:

y = df['Car Purchase Amount']

Data normalization

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
scaler.data_max_
    array([1.e+00, 7.e+01, 1.e+05, 2.e+04, 1.e+06])
scaler.data_min_
    array([    0.,    20., 20000.,   100., 20000.])
print(X_scaled[:,0])
    [0. 0. 1. 1. 1. 1. 1. 1. 0. 1. 1. 1. 0. 0. 0. 1. 0. 1. 1. 0. 1. 0. 1. 1.
     0. 0. 0. 0. 1. 1. 1. 1. 1. 0. 0. 0. 1. 1. 0. 1. 0. 1. 0. 1. 1. 0. 1. 0.
     1. 0. 1. 0. 0. 1. 1. 0. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 0.
     0. 1. 1. 0. 1. 1. 0. 0. 1. 1. 1. 0. 1. 1. 1. 0. 1. 1. 1. 1. 1. 0. 1. 1.
     0. 1. 0. 0. 1. 1. 1. 0. 0. 0. 0. 1. 1. 1. 1. 1. 0. 0. 1. 1. 1. 0. 0. 0.
     1. 0. 1. 0. 0. 0. 0. 0. 1. 0. 1. 1. 0. 0. 0. 0. 0. 1. 1. 1. 0. 1. 0. 1.
     0. 0. 1. 0. 1. 1. 1. 1. 0. 0. 1. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 1. 1. 1.
     0. 1. 1. 1. 0. 1. 0. 0. 1. 0. 1. 1. 1. 0. 1. 1. 0. 1. 0. 0. 1. 1. 1. 1.
     0. 0. 0. 0. 0. 0. 1. 1. 1. 0. 1. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 1. 1. 0.
     1. 1. 0. 0. 1. 1. 0. 1. 1. 1. 1. 0. 1. 0. 1. 0. 1. 1. 1. 1. 1. 0. 0. 1.
     0. 1. 0. 1. 1. 0. 1. 0. 1. 0. 0. 0. 1. 1. 0. 1. 1. 1. 1. 0. 0. 0. 1. 1.
     0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 1. 1. 1. 0. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0.
     0. 0. 0. 1. 1. 1. 1. 1. 1. 0. 0. 1. 0. 0. 0. 1. 1. 1. 0. 1. 0. 0. 1. 0.
     0. 1. 0. 1. 0. 1. 1. 1. 0. 1. 0. 0. 0. 0. 0. 0. 1. 1. 0. 1. 0. 1. 0. 0.
     0. 1. 0. 1. 1. 1. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 1. 1. 1. 0. 1.
     1. 0. 1. 1. 1. 0. 1. 1. 1. 0. 1. 0. 1. 0. 1. 1. 0. 0. 0. 1. 0. 0. 1. 1.
     0. 1. 0. 1. 1. 0. 1. 1. 1. 1. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 1.
     0. 1. 0. 1. 1. 0. 0. 1. 1. 0. 1. 1. 0. 0. 1. 0. 1. 0. 1. 0. 0. 0. 0. 0.
     1. 1. 1. 1. 0. 0. 0. 0. 1. 0. 1. 0. 1. 0. 1. 1. 1. 0. 1. 1. 0. 0. 1. 1.
     0. 1. 1. 0. 0. 1. 0. 1. 0. 0. 0. 1. 0. 0. 0. 0. 1. 1. 0. 1. 0. 1. 1. 0.
     0. 1. 1. 0. 0. 1. 1. 0. 0. 0. 1. 1. 0. 0. 0. 0. 1. 1. 1. 1.]
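y = y.values.reshape(-1,1)
y.shape
    (500, 1)
scaler_y = MinMaxScaler()  # separate scaler, so the fit on the features is not overwritten
y_scaled = scaler_y.fit_transform(y)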
           [0.4111198 ],
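
MinMaxScaler maps each column to [0, 1] via (x - min) / (max - min). A minimal NumPy sketch of that formula follows, using the per-feature minima and maxima printed above. The helper names are illustrative assumptions, and the sample row is the customer used in the prediction example at the end of this notebook.

```python
import numpy as np

# Per-feature minima and maxima as printed above
# (Gender, Age, Annual Salary, Credit Card Debt, Net Worth).
data_min = np.array([0.0, 20.0, 20000.0, 100.0, 20000.0])
data_max = np.array([1.0, 70.0, 100000.0, 20000.0, 1000000.0])

def min_max_scale(x, lo, hi):
    """Map x from [lo, hi] to [0, 1], column by column."""
    return (x - lo) / (hi - lo)

def min_max_unscale(x_scaled, lo, hi):
    """Inverse of min_max_scale: map [0, 1] back to [lo, hi]."""
    return x_scaled * (hi - lo) + lo

# Sample customer row, used here for illustration only.
customer = np.array([[1.0, 50.0, 50000.0, 10985.0, 629312.0]])

scaled = min_max_scale(customer, data_min, data_max)
restored = min_max_unscale(scaled, data_min, data_max)
print(scaled)
print(restored)
```

The same inverse mapping is what turns a scaled prediction back into dollars, which is why the target needs its own fitted minimum and maximum rather than those of the features.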

Building and training the neural network

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_scaled, test_size=0.25)  # hold out 25% of the data for testing
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
#First Layer
model.add(Dense(30, input_dim=5, activation='relu')) #30 neurons
#Second Layer
model.add(Dense(60, activation='relu')) #60 neurons
#Output Layer
model.add(Dense(1, activation='linear'))
model.summary()
    Model: "sequential"
    Layer (type)                 Output Shape              Param #   
    dense (Dense)                (None, 30)                180       
    dense_1 (Dense)              (None, 60)                1860      
    dense_2 (Dense)              (None, 1)                 61        
    Total params: 2,101
    Trainable params: 2,101
    Non-trainable params: 0
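
The parameter counts in the summary follow from each Dense layer holding an (inputs x units) weight matrix plus one bias per unit. The NumPy sketch below mirrors the 5 -> 30 -> 60 -> 1 architecture with randomly initialised weights (an assumption for illustration; these are not the trained Keras weights) to show the shapes and the forward computation.

```python
import numpy as np

rng = np.random.default_rng(42)
layer_sizes = [5, 30, 60, 1]  # input features, two hidden layers, one output

# Random placeholder weights (not the trained model's weights).
weights = [rng.normal(0.0, 0.1, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

# Parameters per layer: n_in * n_out weights plus n_out biases.
params_per_layer = [w.size + b.size for w, b in zip(weights, biases)]
total_params = sum(params_per_layer)

def forward(x):
    """ReLU on the hidden layers, linear (identity) activation on the output."""
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(0.0, x @ w + b)
    return x @ weights[-1] + biases[-1]

batch = rng.uniform(0.0, 1.0, size=(4, 5))  # four scaled feature rows
output = forward(batch)
print(params_per_layer, total_params)  # [180, 1860, 61] 2101
print(output.shape)                    # (4, 1)
```

The counts match the Param # column: (5 + 1) * 30 = 180, (30 + 1) * 60 = 1860, and (60 + 1) * 1 = 61.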
model.compile(optimizer='adam', loss='mean_squared_error')

epochs_hist = model.fit(X_train, y_train, epochs=100, batch_size=25, verbose=1, validation_split=0.2)
    Epoch 1/100
    12/12 [==============================] - 0s 28ms/step - loss: 9.0786e-04 - val_loss: 9.8518e-04
    Epoch 2/100
    12/12 [==============================] - 0s 26ms/step - loss: 7.9939e-04 - val_loss: 8.7099e-04
    Epoch 3/100
    12/12 [==============================] - 0s 25ms/step - loss: 7.3383e-04 - val_loss: 7.9677e-04
    Epoch 4/100
    12/12 [==============================] - 0s 24ms/step - loss: 6.5883e-04 - val_loss: 7.0038e-04
    Epoch 5/100
    12/12 [==============================] - 0s 25ms/step - loss: 6.1143e-04 - val_loss: 6.2026e-04
    Epoch 6/100
    12/12 [==============================] - 0s 21ms/step - loss: 5.2812e-04 - val_loss: 5.4793e-04
    Epoch 7/100
    12/12 [==============================] - 0s 27ms/step - loss: 4.9849e-04 - val_loss: 4.8312e-04
    Epoch 97/100
    12/12 [==============================] - 0s 28ms/step - loss: 7.6140e-06 - val_loss: 2.1254e-05
    Epoch 98/100
    12/12 [==============================] - 0s 15ms/step - loss: 7.6621e-06 - val_loss: 2.1219e-05
    Epoch 99/100
    12/12 [==============================] - 0s 24ms/step - loss: 9.1768e-06 - val_loss: 2.3739e-05
    Epoch 100/100
    12/12 [==============================] - 0s 14ms/step - loss: 8.3331e-06 - val_loss: 2.2158e-05
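
The 12/12 in the log is the number of batches per epoch, and it follows from the split sizes. A quick check, assuming the 500 rows reported by y.shape and the usual behaviour of train_test_split (test rows taken first) and validation_split (a fraction of the remaining training rows set aside):

```python
import math

n_samples = 500          # rows in the dataset, per y.shape
test_size = 0.25
validation_split = 0.2
batch_size = 25

n_test = math.ceil(n_samples * test_size)        # held out by train_test_split
n_train_total = n_samples - n_test               # passed to model.fit
n_val = round(n_train_total * validation_split)  # set aside by validation_split
n_fit = n_train_total - n_val                    # rows actually used for weight updates
steps_per_epoch = math.ceil(n_fit / batch_size)

print(n_test, n_train_total, n_val, n_fit, steps_per_epoch)  # 125 375 75 300 12
```

So each epoch updates the weights on 300 rows and reports val_loss on 75 rows, while the 125 test rows are not seen during training.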

Testing the model

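epochs_hist.history.keys()
    dict_keys(['loss', 'val_loss'])

plt.plot(epochs_hist.history['loss'])
plt.plot(epochs_hist.history['val_loss'])
plt.title('Model Loss Progression During Training/Validation')
plt.ylabel('Training and Validation Losses')
plt.legend(['Training Loss', 'Validation Loss'])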
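
Because the target was min-max scaled, the losses in the log are mean squared errors in scaled units, not dollars. A rough conversion multiplies the square root of the loss by the target's range. In the sketch below the final val_loss is taken from the training log, while the target minimum and maximum are placeholders (assumptions), since the notebook does not print them; substitute the real y.min() and y.max().

```python
import math

final_val_loss = 2.2158e-05  # val_loss at epoch 100, from the training log

# Placeholder target range in dollars (assumed; replace with y.min() and y.max()).
y_min_assumed = 10_000.0
y_max_assumed = 80_000.0

rmse_scaled = math.sqrt(final_val_loss)
rmse_dollars = rmse_scaled * (y_max_assumed - y_min_assumed)

print(f"RMSE (scaled units): {rmse_scaled:.5f}")
print(f"RMSE (dollars, under the assumed range): {rmse_dollars:.2f}")
```

This works because min-max scaling is linear: an error of e in scaled units corresponds to e * (max - min) in the original units.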

Example with our model

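# Gender, Age, Annual Salary, Credit Card Debt, Net Worth
X_Testing = np.array([[1, 50, 50000, 10985, 629312]])

# The model was trained on min-max scaled data, so scale the input the same way
# and map the prediction back to dollars with scalers fitted on X and y.
x_scaler = MinMaxScaler().fit(X.values)
y_scaler = MinMaxScaler().fit(y)
y_predict = y_scaler.inverse_transform(model.predict(x_scaler.transform(X_Testing)))
y_predict.shape
    (1, 1)
print('Expected Purchase Amount=', y_predict[:,0])

Passing the unscaled X_Testing straight to model.predict prints Expected Purchase Amount= [244686.44], which is not a valid dollar figure because the inputs lie far outside the [0, 1] range the network was trained on.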