PyTorch Workflow Fundamentals
This chapter goes through some of the most fundamental parts of building a machine learning model, all the way from splitting the data into training and testing sets to saving the trained model.
The workflow here is one with a known ground truth, meaning that we create the data ourselves from a linear regression formula. Even though we already know the parameters that our model is supposed to find in the data, this is still a good first model to build.
Exercise 1
Create a straight line dataset using the linear regression formula (weight * X + bias). Set weight to 0.3 and bias to 0.9, split the data into 80% training and 20% testing, and plot the data.
I created the data by utilizing the torch.arange() function and then using the linear regression formula to create the outputs. I then split the data by finding the index at 80% of the data and using slicing to divide it into training and testing sets. I also created a plotting function for this data, which can additionally be used to plot predictions once the model is run on the testing data. The figure below shows only the training and testing data.
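A minimal sketch of these steps, where the range and step size passed to torch.arange() are assumptions:

```python
import torch
import matplotlib.pyplot as plt

# Known ground-truth parameters
weight = 0.3
bias = 0.9

# Create inputs and outputs with the linear regression formula
# (the range 0 to 1 and step 0.01 are assumptions)
X = torch.arange(0, 1, 0.01).unsqueeze(dim=1)
y = weight * X + bias

# Split into 80% training and 20% testing data via slicing
train_split = int(0.8 * len(X))
X_train, y_train = X[:train_split], y[:train_split]
X_test, y_test = X[train_split:], y[train_split:]

def plot_predictions(train_data=X_train, train_labels=y_train,
                     test_data=X_test, test_labels=y_test,
                     predictions=None):
    """Plot training data, testing data and, optionally, model predictions."""
    plt.figure(figsize=(10, 7))
    plt.scatter(train_data, train_labels, c="b", s=4, label="Training data")
    plt.scatter(test_data, test_labels, c="g", s=4, label="Testing data")
    if predictions is not None:
        plt.scatter(test_data, predictions, c="r", s=4, label="Predictions")
    plt.legend()
    plt.show()

plot_predictions()
```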
Exercise 2
Build a PyTorch model by subclassing nn.Module. In the model, parameters should be initialized using nn.Parameter, set to random values: one parameter for the weight and one for the bias. A forward() method should also be created to compute the linear regression. Also make an instance of the model and print the current parameters of the untrained model.
The way to do this is to create a new class that inherits from the nn.Module class. The nn.Module class contains more or less every building block necessary for any kind of neural net. Below is a sketch of my implementation:
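The parameter names weight and bias in this sketch are assumptions; the class name LinearRegressionModel is the one referred to in Exercise 5:

```python
import torch
from torch import nn

class LinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Parameters initialized to random values, tracked for gradient descent
        self.weight = nn.Parameter(torch.randn(1, requires_grad=True))
        self.bias = nn.Parameter(torch.randn(1, requires_grad=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compute the linear regression formula on the input data
        return self.weight * x + self.bias
```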
As instructed, I created parameters for the weight and the bias using the nn.Parameter class, setting those parameters to random values with torch.randn. As seen in their creation, I set the argument requires_grad to True. This is needed when gradient descent is used to find these values during training, which is the standard way of solving a problem of this kind.
I also implemented a forward() method for this model, which always has to be done when inheriting from nn.Module. This method propagates data through the neural net and produces the predictions from the input data.
To initialize this model I simply instantiate the class, and to print out the current parameters I call the method state_dict() on the newly initialized model object, as seen below.
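A sketch of this, where the manual seed is an assumption added so the random values are reproducible:

```python
torch.manual_seed(42)  # seed value is an assumption, for reproducibility
model = LinearRegressionModel()
print(model.state_dict())  # shows the randomly initialized 'weight' and 'bias'
```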
Exercise 3
Create a loss function and optimizer using nn.L1Loss() and torch.optim.SGD(params, lr) respectively. Set the learning rate of the optimizer to be 0.01 and the parameters to optimize should be the model parameters from the model you created in 2. Write a training loop to perform the appropriate training steps for 300 epochs. The training loop should test the model on the test dataset every 20 epochs.
The loss function and the optimizer are two of the most fundamental components in machine learning.
- The loss function dictates how the loss is measured, meaning how much the predicted data deviates from the expected output. Which function to use depends on what type of problem you are trying to solve, but in this exercise I was instructed to use the nn.L1Loss() function, which measures the mean absolute error.
- The optimizer is a function that tells the model how its parameters should be updated. These functions also take the hyperparameter lr, the learning rate. This hyperparameter is very important because it dictates how aggressively the model updates its parameters. The optimizer we were instructed to use here implements stochastic gradient descent and is found in the torch.optim module.
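A sketch of creating the two as instructed, assuming the model instance from Exercise 2 is named model:

```python
loss_fn = nn.L1Loss()  # measures the mean absolute error
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.01)
```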
The training loop typically follows the pattern below, and a sketch of my implementation comes after the list:
- forward pass (moving the input data through the network) to produce an output with the current parameters
- calculate the loss with the loss function
- optimizer zero grad: reset the gradients of the optimized tensors
- backpropagation: move backwards through the network to calculate the loss gradients
- optimizer step: refine the parameters (gradient descent)
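A sketch of this loop for 300 epochs, testing every 20 epochs as instructed (variable names are carried over from the sketches above):

```python
epochs = 300

for epoch in range(epochs):
    model.train()

    # 1. Forward pass on the training data
    y_pred = model(X_train)
    # 2. Calculate the loss
    loss = loss_fn(y_pred, y_train)
    # 3. Reset the gradients of the optimized tensors
    optimizer.zero_grad()
    # 4. Backpropagation: calculate the loss gradients
    loss.backward()
    # 5. Gradient descent step: refine the parameters
    optimizer.step()

    # Test the model every 20 epochs
    if epoch % 20 == 0:
        model.eval()
        with torch.inference_mode():
            test_pred = model(X_test)
            test_loss = loss_fn(test_pred, y_test)
        print(f"Epoch: {epoch} | Train loss: {loss:.4f} | Test loss: {test_loss:.4f}")
```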
Below is a figure illustrating how the loss evolves during the training loop.
Exercise 4
Make predictions with the trained model on the test data. Visualize these predictions against the original training and testing data (note: you may need to make sure the predictions are not on the GPU if you want to use non-CUDA-enabled libraries such as matplotlib to plot).
This exercise is all about visualizing how well our model predicts the testing data. To get the predictions I first put the model in evaluation mode with the method eval() and make the predictions in inference mode with torch.inference_mode(). This can be seen in the code snippet below, and the visualization of the training, testing and predicted data can be seen in the figure below.
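A sketch of this, reusing the plot_predictions() helper from Exercise 1:

```python
model.eval()  # put the model in evaluation mode
with torch.inference_mode():  # disables gradient tracking during prediction
    y_preds = model(X_test)

# .cpu() makes sure the tensors can be used by non-CUDA-enabled
# libraries such as matplotlib (a no-op if everything is already on the CPU)
plot_predictions(predictions=y_preds.cpu())
```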

Exercise 5
Save your trained model's state_dict() to file. Create a new instance of your model class you made in 2. and load in the state_dict() you just saved to it. Perform predictions on your test data with the loaded model and confirm they match the original model predictions from 4.
To save the trained model I created a separate function, as can be seen below.
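A sketch of such a function, where the name save_model and the file name of the saved parameters are assumptions:

```python
from pathlib import Path

def save_model(model: torch.nn.Module, name: str):
    """Save the model's state_dict() under models/<name>."""
    model_dir = Path("models")
    model_dir.mkdir(parents=True, exist_ok=True)  # create models/ if missing
    torch.save(obj=model.state_dict(), f=model_dir / name)

save_model(model, "linear_regression_model.pth")  # file name is an assumption
```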
Here I pass in the model as one argument and a second argument, name, which determines the file name under which the model's parameters are saved. The function also creates a directory called models/ where the model is stored. I also created a function to load models, as seen in the code snippet below.
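A sketch of the loading counterpart, assuming the name load_model:

```python
def load_model(model: torch.nn.Module, name: str) -> torch.nn.Module:
    """Load a saved state_dict() from models/<name> into the given model instance."""
    model.load_state_dict(torch.load(f=Path("models") / name))
    return model
```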
To load a model, you create a new instance of LinearRegressionModel and pass it in as one of the arguments, together with the name of the model you want to load. The code snippet below and the figure below depict the training data, testing data, predictions from the trained model and predictions from the loaded model.
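A sketch of loading the saved parameters and confirming that the predictions match:

```python
# Create a fresh instance of the model class and load the saved parameters into it
loaded_model = load_model(LinearRegressionModel(), "linear_regression_model.pth")

loaded_model.eval()
with torch.inference_mode():
    loaded_preds = loaded_model(X_test)

# Confirm the loaded model's predictions match the original ones from Exercise 4
print(y_preds == loaded_preds)  # should be True for every element
plot_predictions(predictions=loaded_preds.cpu())
```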
