In [1]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
In [2]:
#@title MIT License
#
# Copyright (c) 2017 François Chollet
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.

Basic regression: Predict fuel efficiency

In a regression problem, the aim is to predict a continuous value, such as a price or a probability. Contrast this with a classification problem, where the aim is to select a class from a list of classes (for example, a picture contains an apple or an orange, and the task is to recognize which fruit it shows).

This tutorial uses the classic Auto MPG dataset and demonstrates how to build models that predict the fuel efficiency of late-1970s and early-1980s automobiles. To do this, you will provide the models with a description of many automobiles from that time period. This description includes attributes like cylinders, displacement, horsepower, and weight.

This example uses the Keras API. (Visit the Keras tutorials and guides to learn more.)

In [3]:
# Use seaborn for pairplot.
!pip install -q seaborn
In [4]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Make NumPy printouts easier to read.
np.set_printoptions(precision=3, suppress=True)
In [5]:
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers

print(tf.__version__)
2.15.0

The Auto MPG dataset

The dataset is available from the UCI Machine Learning Repository.

Get the data

First download and import the dataset using pandas:

In [6]:
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

raw_dataset = pd.read_csv(url, names=column_names,
                          na_values='?', comment='\t',
                          sep=' ', skipinitialspace=True)
In [7]:
dataset = raw_dataset.copy()
dataset.tail()
Out[7]:
MPG Cylinders Displacement Horsepower Weight Acceleration Model Year Origin
393 27.0 4 140.0 86.0 2790.0 15.6 82 1
394 44.0 4 97.0 52.0 2130.0 24.6 82 2
395 32.0 4 135.0 84.0 2295.0 11.6 82 1
396 28.0 4 120.0 79.0 2625.0 18.6 82 1
397 31.0 4 119.0 82.0 2720.0 19.4 82 1

Clean the data

The dataset contains a few unknown values:

In [8]:
dataset.isna().sum()
Out[8]:
MPG             0
Cylinders       0
Displacement    0
Horsepower      6
Weight          0
Acceleration    0
Model Year      0
Origin          0
dtype: int64

Drop those rows to keep this initial tutorial simple:

In [9]:
dataset = dataset.dropna()

The "Origin" column is categorical, not numeric. So the next step is to one-hot encode the values in the column with pd.get_dummies.

Note: You can set up the tf.keras.Model to do this kind of transformation for you but that's beyond the scope of this tutorial. Check out the Classify structured data using Keras preprocessing layers or Load CSV data tutorials for examples.

In [10]:
dataset['Origin'] = dataset['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})
In [11]:
dataset = pd.get_dummies(dataset, columns=['Origin'], prefix='', prefix_sep='')
dataset.tail()
Out[11]:
MPG Cylinders Displacement Horsepower Weight Acceleration Model Year Europe Japan USA
393 27.0 4 140.0 86.0 2790.0 15.6 82 False False True
394 44.0 4 97.0 52.0 2130.0 24.6 82 True False False
395 32.0 4 135.0 84.0 2295.0 11.6 82 False False True
396 28.0 4 120.0 79.0 2625.0 18.6 82 False False True
397 31.0 4 119.0 82.0 2720.0 19.4 82 False False True

Split the data into training and test sets

Now, split the dataset into a training set and a test set. You will use the test set in the final evaluation of your models.

In [12]:
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)

Inspect the data

Review the joint distribution of a few pairs of columns from the training set.

The top row suggests that the fuel efficiency (MPG) is a function of all the other parameters. The other rows indicate that those features are correlated with each other.

In [13]:
sns.pairplot(train_dataset[['MPG', 'Cylinders', 'Displacement', 'Weight']], diag_kind='kde')
Out[13]:
<seaborn.axisgrid.PairGrid at 0x7f08aba41fc0>
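The pairplot itself renders as a figure. As a numeric complement, you could also inspect the pairwise correlations of the same columns (a minimal sketch; corr is standard pandas, not part of the tutorial's own code):

# Sketch: pairwise correlations of the columns shown in the pairplot above.
train_dataset[['MPG', 'Cylinders', 'Displacement', 'Weight']].corr()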

Let's also check the overall statistics. Note how each feature covers a very different range:

In [14]:
train_dataset.describe().transpose()
Out[14]:
count mean std min 25% 50% 75% max
MPG 314.0 23.310510 7.728652 10.0 17.00 22.0 28.95 46.6
Cylinders 314.0 5.477707 1.699788 3.0 4.00 4.0 8.00 8.0
Displacement 314.0 195.318471 104.331589 68.0 105.50 151.0 265.75 455.0
Horsepower 314.0 104.869427 38.096214 46.0 76.25 94.5 128.00 225.0
Weight 314.0 2990.251592 843.898596 1649.0 2256.50 2822.5 3608.00 5140.0
Acceleration 314.0 15.559236 2.789230 8.0 13.80 15.5 17.20 24.8
Model Year 314.0 75.898089 3.675642 70.0 73.00 76.0 79.00 82.0

Split features from labels

Separate the target value—the "label"—from the features. This label is the value that you will train the model to predict.

In [15]:
train_features = train_dataset.copy()
test_features = test_dataset.copy()

train_labels = train_features.pop('MPG')
test_labels = test_features.pop('MPG')

Normalization

In the table of statistics it's easy to see how different the ranges of each feature are:

In [16]:
train_dataset.describe().transpose()[['mean', 'std']]
Out[16]:
mean std
MPG 23.310510 7.728652
Cylinders 5.477707 1.699788
Displacement 195.318471 104.331589
Horsepower 104.869427 38.096214
Weight 2990.251592 843.898596
Acceleration 15.559236 2.789230
Model Year 75.898089 3.675642

It is good practice to normalize features that use different scales and ranges.

One reason this is important is that the features are multiplied by the model weights, so the scale of the outputs and the scale of the gradients are affected by the scale of the inputs.

Although a model might converge without feature normalization, normalization makes training much more stable.

Note: There is no advantage to normalizing the one-hot features—it is done here for simplicity. For more details on how to use the preprocessing layers, refer to the Working with preprocessing layers guide and the Classify structured data using Keras preprocessing layers tutorial.
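Concretely, "normalize" here means standardizing each feature to zero mean and unit standard deviation. A minimal pandas sketch of the same idea (for illustration only; the tutorial itself uses the Keras Normalization layer introduced next):

# Sketch: manual z-score standardization, equivalent in spirit to the
# Normalization layer used below. astype('float32') converts the boolean
# one-hot columns to 0.0/1.0 so they can be standardized too.
numeric_features = train_features.astype('float32')
manually_scaled = (numeric_features - numeric_features.mean()) / numeric_features.std()
manually_scaled.describe().transpose()[['mean', 'std']]  # means ~0, stds ~1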

The Normalization layer

The tf.keras.layers.Normalization layer is a clean and simple way to add feature normalization into your model.

The first step is to create the layer:

In [17]:
normalizer = tf.keras.layers.Normalization(axis=-1)

Then, fit the state of the preprocessing layer to the data by calling Normalization.adapt:

In [21]:
normalizer.adapt(np.array(train_features).astype('float32'))

The adapt step computed the mean and variance and stored them in the layer:

In [21]:
print(normalizer.mean.numpy())
[[   5.478  195.318  104.869 2990.252   15.559   75.898    0.178    0.197
     0.624]]

When the layer is called, it returns the input data, with each feature independently normalized:

In [22]:
first = np.array(train_features[:1]).astype('float32')

with np.printoptions(precision=2, suppress=True):
  print('First example:', first)
  print()
  print('Normalized:', normalizer(first).numpy())
First example: [[   4.    90.    75.  2125.    14.5   74.     0.     0.     1. ]]

Normalized: [[-0.87 -1.01 -0.79 -1.03 -0.38 -0.52 -0.47 -0.5   0.78]]
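Under the hood the layer applies a z-score: it subtracts the stored mean and divides by the standard deviation (up to a small numerical epsilon). A quick sketch to check that against the output above, assuming the adapted statistics are exposed as normalizer.mean and normalizer.variance:

# Sketch: reproduce the layer's output by hand from its adapted statistics.
manual = (first - normalizer.mean.numpy()) / np.sqrt(normalizer.variance.numpy())
with np.printoptions(precision=2, suppress=True):
  print('Manually normalized:', manual)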

Linear regression

Before building a deep neural network model, start with linear regression using one and several variables.

Linear regression with one variable

Begin with a single-variable linear regression to predict 'MPG' from 'Horsepower'.

Training a model with tf.keras typically starts by defining the model architecture. Use a tf.keras.Sequential model, which represents a sequence of steps.

There are two steps in your single-variable linear regression model:

  • Normalize the 'Horsepower' input features using the tf.keras.layers.Normalization preprocessing layer.
  • Apply a linear transformation ($y = mx+b$) to produce 1 output using a linear layer (tf.keras.layers.Dense).

The number of inputs can either be set by the input_shape argument, or inferred automatically when the model is run for the first time.

First, create a NumPy array made of the 'Horsepower' features. Then, instantiate the tf.keras.layers.Normalization and fit its state to the horsepower data:

In [23]:
horsepower = np.array(train_features['Horsepower']).astype('float32')

horsepower_normalizer = layers.Normalization(input_shape=[1,], axis=None)
horsepower_normalizer.adapt(horsepower)

Build the Keras Sequential model:

In [24]:
horsepower_model = tf.keras.Sequential([
    horsepower_normalizer,
    layers.Dense(units=1)
])

horsepower_model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 normalization_1 (Normaliza  (None, 1)                 3         
 tion)                                                           
                                                                 
 dense (Dense)               (None, 1)                 2         
                                                                 
=================================================================
Total params: 5 (24.00 Byte)
Trainable params: 2 (8.00 Byte)
Non-trainable params: 3 (16.00 Byte)
_________________________________________________________________
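These counts line up with $y = mx+b$: the Dense layer contributes one weight ($m$) and one bias ($b$), hence 2 trainable parameters, while the Normalization layer holds 3 non-trainable values (the adapted mean, variance, and sample count).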

This model will predict 'MPG' from 'Horsepower'.

Run the untrained model on the first 10 'Horsepower' values. The output won't be good, but notice that it has the expected shape of (10, 1):

In [25]:
horsepower_model.predict(horsepower[:10])
1/1 [==============================] - 0s 164ms/step
Out[25]:
array([[ 0.346],
       [ 0.196],
       [-0.639],
       [ 0.486],
       [ 0.439],
       [ 0.172],
       [ 0.52 ],
       [ 0.439],
       [ 0.114],
       [ 0.196]], dtype=float32)

Once the model is built, configure the training procedure using the Keras Model.compile method. The most important arguments to compile are the loss and the optimizer, since these define what will be optimized (mean_absolute_error) and how (using tf.keras.optimizers.Adam).

In [26]:
horsepower_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),
    loss='mean_absolute_error')

Use Keras Model.fit to execute the training for 100 epochs:

In [27]:
%%time
history = horsepower_model.fit(
    train_features['Horsepower'],
    train_labels,
    epochs=100,
    # Suppress logging.
    verbose=0,
    # Calculate validation results on 20% of the training data.
    validation_split = 0.2)
CPU times: user 5.96 s, sys: 2.42 s, total: 8.37 s
Wall time: 5.67 s

Visualize the model's training progress using the stats stored in the history object:

In [28]:
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
hist.tail()
Out[28]:
loss val_loss epoch
95 3.806211 4.183092 95
96 3.807321 4.198089 96
97 3.804003 4.165492 97
98 3.808394 4.157538 98
99 3.803519 4.183074 99
In [29]:
def plot_loss(history):
  plt.plot(history.history['loss'], label='loss')
  plt.plot(history.history['val_loss'], label='val_loss')
  plt.ylim([0, 10])
  plt.xlabel('Epoch')
  plt.ylabel('Error [MPG]')
  plt.legend()
  plt.grid(True)
In [30]:
plot_loss(history)

Collect the results on the test set for later:

In [31]:
test_results = {}

test_results['horsepower_model'] = horsepower_model.evaluate(
    test_features['Horsepower'],
    test_labels, verbose=0)

Since this is a single variable regression, it's easy to view the model's predictions as a function of the input:

In [32]:
x = tf.linspace(0.0, 250, 251)
y = horsepower_model.predict(x)
8/8 [==============================] - 0s 1ms/step
In [33]:
def plot_horsepower(x, y):
  plt.scatter(train_features['Horsepower'], train_labels, label='Data')
  plt.plot(x, y, color='k', label='Predictions')
  plt.xlabel('Horsepower')
  plt.ylabel('MPG')
  plt.legend()
In [34]:
plot_horsepower(x, y)

Linear regression with multiple inputs

You can use an almost identical setup to make predictions based on multiple inputs. This model still does the same $y = mx+b$ except that $m$ is a matrix and $x$ is a vector.

Create a two-step Keras Sequential model again, with the first layer being the normalizer (tf.keras.layers.Normalization(axis=-1)) you defined earlier and adapted to the whole dataset:

In [38]:
linear_model = tf.keras.Sequential([
    normalizer,
    layers.Dense(units=1)
])

When you call Model.predict on a batch of inputs, it produces units=1 outputs for each example:

In [40]:
linear_model.predict(train_features.astype('float32')[:10])
1/1 [==============================] - 0s 61ms/step
Out[40]:
array([[-0.959],
       [-1.158],
       [ 1.826],
       [-2.627],
       [-1.609],
       [-0.388],
       [-1.636],
       [ 0.812],
       [ 0.364],
       [ 0.119]], dtype=float32)

When you call the model, its weight matrices will be built—check that the kernel weights (the $m$ in $y=mx+b$) have a shape of (9, 1):

In [41]:
linear_model.layers[1].kernel
Out[41]:
<tf.Variable 'dense_2/kernel:0' shape=(9, 1) dtype=float32, numpy=
array([[ 0.525],
       [-0.273],
       [ 0.243],
       [ 0.514],
       [ 0.367],
       [-0.724],
       [ 0.43 ],
       [-0.1  ],
       [-0.185]], dtype=float32)>
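To tie this back to $y = mx+b$: the model's output is the normalized feature vector multiplied by this kernel, plus the layer's bias. A minimal sketch checking that on a single example (using the untrained weights shown above):

# Sketch: reproduce the model's prediction by hand from its layers.
x_first = np.array(train_features.astype('float32')[:1])
manual = normalizer(x_first).numpy() @ linear_model.layers[1].kernel.numpy() \
         + linear_model.layers[1].bias.numpy()
print('By hand:', manual)
print('predict:', linear_model.predict(x_first))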

Configure the model with Keras Model.compile and train with Model.fit for 100 epochs:

In [42]:
linear_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),
    loss='mean_absolute_error')
In [44]:
%%time
history = linear_model.fit(
    train_features.astype('float32'),
    train_labels,
    epochs=100,
    # Suppress logging.
    verbose=0,
    # Calculate validation results on 20% of the training data.
    validation_split = 0.2)
CPU times: user 5.5 s, sys: 2.13 s, total: 7.62 s
Wall time: 5.15 s

Using all the inputs in this regression model achieves a much lower training and validation error than the horsepower_model, which had one input:

In [45]:
plot_loss(history)

Collect the results on the test set for later:

In [47]:
test_results['linear_model'] = linear_model.evaluate(
    test_features.astype('float32'), test_labels, verbose=0)

Regression with a deep neural network (DNN)

In the previous section, you implemented two linear models for single and multiple inputs.

Here, you will implement single-input and multiple-input DNN models.

The code is basically the same except the model is expanded to include some "hidden" non-linear layers. The name "hidden" here just means not directly connected to the inputs or outputs.

These models will contain a few more layers than the linear model:

  • The normalization layer, as before (with horsepower_normalizer for a single-input model and normalizer for a multiple-input model).
  • Two hidden, non-linear, Dense layers with the ReLU (relu) activation function nonlinearity.
  • A linear Dense single-output layer.

Both models will use the same training procedure, so the compile method is included in the build_and_compile_model function below.

In [48]:
def build_and_compile_model(norm):
  model = keras.Sequential([
      norm,
      layers.Dense(64, activation='relu'),
      layers.Dense(64, activation='relu'),
      layers.Dense(1)
  ])

  model.compile(loss='mean_absolute_error',
                optimizer=tf.keras.optimizers.Adam(0.001))
  return model

Regression using a DNN and a single input

Create a DNN model with only 'Horsepower' as input and horsepower_normalizer (defined earlier) as the normalization layer:

In [49]:
dnn_horsepower_model = build_and_compile_model(horsepower_normalizer)

This model has quite a few more trainable parameters than the linear models:

In [50]:
dnn_horsepower_model.summary()
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 normalization_1 (Normaliza  (None, 1)                 3         
 tion)                                                           
                                                                 
 dense_3 (Dense)             (None, 64)                128       
                                                                 
 dense_4 (Dense)             (None, 64)                4160      
                                                                 
 dense_5 (Dense)             (None, 1)                 65        
                                                                 
=================================================================
Total params: 4356 (17.02 KB)
Trainable params: 4353 (17.00 KB)
Non-trainable params: 3 (16.00 Byte)
_________________________________________________________________
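The counts follow directly from the layer sizes: the first hidden layer maps 1 input to 64 units ($1 \times 64 + 64 = 128$ parameters), the second maps 64 to 64 ($64 \times 64 + 64 = 4160$), and the output layer maps 64 to 1 ($64 \times 1 + 1 = 65$), for 4,353 trainable parameters; the remaining 3 non-trainable values are the normalizer's adapted statistics.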

Train the model with Keras Model.fit:

In [51]:
%%time
history = dnn_horsepower_model.fit(
    train_features['Horsepower'],
    train_labels,
    validation_split=0.2,
    verbose=0, epochs=100)
CPU times: user 7.66 s, sys: 2.38 s, total: 10 s
Wall time: 7 s

This model does slightly better than the linear single-input horsepower_model:

In [52]:
plot_loss(history)

If you plot the predictions as a function of 'Horsepower', you should notice how this model takes advantage of the nonlinearity provided by the hidden layers:

In [53]:
x = tf.linspace(0.0, 250, 251)
y = dnn_horsepower_model.predict(x)
8/8 [==============================] - 0s 2ms/step
In [54]:
plot_horsepower(x, y)

Collect the results on the test set for later:

In [55]:
test_results['dnn_horsepower_model'] = dnn_horsepower_model.evaluate(
    test_features['Horsepower'], test_labels,
    verbose=0)

Regression using a DNN and multiple inputs

Repeat the previous process using all the inputs. The model's performance slightly improves on the validation dataset.

In [56]:
dnn_model = build_and_compile_model(normalizer)
dnn_model.summary()
Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 normalization (Normalizati  (None, 9)                 19        
 on)                                                             
                                                                 
 dense_6 (Dense)             (None, 64)                640       
                                                                 
 dense_7 (Dense)             (None, 64)                4160      
                                                                 
 dense_8 (Dense)             (None, 1)                 65        
                                                                 
=================================================================
Total params: 4884 (19.08 KB)
Trainable params: 4865 (19.00 KB)
Non-trainable params: 19 (80.00 Byte)
_________________________________________________________________
In [58]:
%%time
history = dnn_model.fit(
    train_features.astype('float32'),
    train_labels,
    validation_split=0.2,
    verbose=0, epochs=100)
CPU times: user 7.16 s, sys: 2.18 s, total: 9.34 s
Wall time: 6.48 s
In [59]:
plot_loss(history)

Collect the results on the test set:

In [61]:
test_results['dnn_model'] = dnn_model.evaluate(test_features.astype('float32'), test_labels, verbose=0)

Performance

Now that all the models are trained, you can review their test-set performance:

In [62]:
pd.DataFrame(test_results, index=['Mean absolute error [MPG]']).T
Out[62]:
Mean absolute error [MPG]
horsepower_model 3.647106
linear_model 2.510668
dnn_horsepower_model 2.936707
dnn_model 1.740302

These results match the validation error observed during training.
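As a quick cross-check, you can compare the final validation error from training with the test-set error collected above (a minimal sketch; history here is the multiple-input DNN's training history):

# Sketch: final validation MAE vs. test MAE for the multiple-input DNN.
print('val_loss (mean of last 5 epochs):', np.mean(history.history['val_loss'][-5:]))
print('test MAE:', test_results['dnn_model'])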

Make predictions

You can now make predictions with the dnn_model on the test set using Keras Model.predict and review the loss:

In [64]:
test_predictions = dnn_model.predict(test_features.astype('float32')).flatten()

a = plt.axes(aspect='equal')
plt.scatter(test_labels, test_predictions)
plt.xlabel('True Values [MPG]')
plt.ylabel('Predictions [MPG]')
lims = [0, 50]
plt.xlim(lims)
plt.ylim(lims)
_ = plt.plot(lims, lims)
3/3 [==============================] - 0s 2ms/step

It appears that the model predicts reasonably well.

Now, check the error distribution:

In [65]:
error = test_predictions - test_labels
plt.hist(error, bins=25)
plt.xlabel('Prediction Error [MPG]')
_ = plt.ylabel('Count')
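You can also summarize the errors numerically (a small sketch using the error series computed above):

# Sketch: basic statistics of the prediction error.
print('Mean error [MPG]:', error.mean())
print('Std of error [MPG]:', error.std())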

If you're happy with the model, save it for later use with Model.save:

In [67]:
dnn_model.save('/tmp/dnn_model.keras')

If you reload the model, it should give identical output. (In this particular run, however, load_model raised a ValueError while rebuilding the Normalization layer, as shown in the traceback below, so no reloaded entry appears in the final results table.)

In [69]:
reloaded = tf.keras.models.load_model('/tmp/dnn_model.keras')

test_results['reloaded'] = reloaded.evaluate(
    test_features.astype('float32'), test_labels, verbose=0)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[69], line 1
----> 1 reloaded = tf.keras.models.load_model('/tmp/dnn_model.keras')
      3 test_results['reloaded'] = reloaded.evaluate(
      4     test_features.astype('float32'), test_labels, verbose=0)

[... intermediate Keras saving/deserialization frames elided ...]

File /usr/local/lib/python3.10/dist-packages/keras/src/layers/preprocessing/normalization.py:188, in Normalization.build(self, input_shape)
    186 for d in self._keep_axis:
    187     if input_shape[d] is None:
--> 188         raise ValueError(

ValueError: All `axis` values to be kept must have known shape. Got axis: (-1,), input shape: [None, None], with unknown axis at index: 1
In [70]:
pd.DataFrame(test_results, index=['Mean absolute error [MPG]']).T
Out[70]:
Mean absolute error [MPG]
horsepower_model 3.647106
linear_model 2.510668
dnn_horsepower_model 2.936707
dnn_model 1.740302

Conclusion

This notebook introduced a few techniques to handle a regression problem. Here are a few more tips that may help:

  • Mean squared error (MSE) (tf.keras.losses.MeanSquaredError) and mean absolute error (MAE) (tf.keras.losses.MeanAbsoluteError) are common loss functions used for regression problems. MAE is less sensitive to outliers. (See the sketch after this list for a model compiled with MSE as the loss and MAE as a metric.) Different loss functions are used for classification problems.
  • Similarly, evaluation metrics used for regression differ from classification.
  • When numeric input data features have values with different ranges, each feature should be scaled independently to the same range.
  • Overfitting is a common problem for DNN models, though it wasn't a problem for this tutorial. Visit the Overfit and underfit tutorial for more help with this.
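As an example of the first point, a regression model can optimize mean squared error while still reporting MAE as a metric. A minimal sketch (a hypothetical mse_model with the same architecture as the DNNs above; it would still need a normalization layer and Model.fit as before):

# Sketch: optimize MSE while also tracking MAE as a metric.
mse_model = keras.Sequential([
    layers.Dense(64, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)
])
mse_model.compile(
    loss=tf.keras.losses.MeanSquaredError(),
    optimizer=tf.keras.optimizers.Adam(0.001),
    metrics=[tf.keras.metrics.MeanAbsoluteError()])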