How to serve a PyTorch model

Overview

This tutorial will guide you through the process of deploying a PyTorch AI model, specifically ResNet18, to an edge device.

ResNet18 Model

ResNet18 is a convolutional neural network that is very good at recognizing images. It can tell the difference between a cat and a dog, or identify specific objects in a photo, and it is used in many different applications, from self-driving cars to medical image analysis.

We'll cover three main steps:

  1. Download a pre-trained ResNet18 Model: We'll download a pre-trained ResNet18 model for an image classification task. Then, the model will be saved in TorchScript format.
  2. Deploying to an Edge Node: We'll upload the trained model to your Panel's Library and deploy it to a single edge node. This will make the model accessible for real-time inference on the edge device.
  3. Making Inference Requests: We'll demonstrate how to send image data to the deployed model and receive classification predictions, showcasing the practical application of the edge-deployed model.

By the end of this tutorial, you'll have a solid understanding of deploying PyTorch models to edge devices and be able to apply these techniques to your own AI projects.

All the necessary scripts for this tutorial are available in Barbara's GitHub profile.

GitHub Repository

Clone the ResNet18 PyTorch Tutorial from our GitHub.

Preparing the ResNet18 model

ResNet18 Model

You can get the ResNet18 model from the PyTorch site.

Download the ResNet18 model using the torch Python module and the following script.

warning

Before running the script locally, you must install PyTorch.

import torch
from torchvision import models

# Download the pre-trained ResNet18 model
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Set the model to evaluation mode
model.eval()

# Convert the model to TorchScript format
# Option 1: Trace the model using a random example input
dummy_input = torch.randn(1, 3, 224, 224)
traced_script_module = torch.jit.trace(model, dummy_input)

# Option 2: Script the model without tracing (more flexible; preserves control flow)
script_module = torch.jit.script(model)

# Save the model in TorchScript format
torch.jit.save(traced_script_module, "resnet18_traced.pt")
# or
torch.jit.save(script_module, "resnet18.pt")

Using the above script, we download the model and save it locally as resnet18_traced.pt (or resnet18.pt for the scripted version).
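
Before uploading, it can be worth sanity-checking that a saved TorchScript file reloads and reproduces the original model's output. The sketch below illustrates the same trace/save/load round trip with a small stand-in module (`TinyNet` is hypothetical, used here only so the example runs without downloading ResNet18 weights; with the real model you would call torch.jit.load("resnet18_traced.pt") the same way):

```python
import torch
from torch import nn

# A small stand-in module; the tutorial itself uses the traced ResNet18
class TinyNet(nn.Module):
    def forward(self, x):
        return x.mean(dim=(2, 3))

model = TinyNet().eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Trace, save, and reload, mirroring the ResNet18 workflow above
traced = torch.jit.trace(model, dummy_input)
torch.jit.save(traced, "tiny_traced.pt")
loaded = torch.jit.load("tiny_traced.pt")

# The reloaded TorchScript module should reproduce the original output
with torch.no_grad():
    print(torch.allclose(model(dummy_input), loaded(dummy_input)))
```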

How to set the config.pbtxt file

The config.pbtxt file is a crucial component in the Triton Inference Server. It provides essential configuration details about a model, enabling Triton to efficiently serve inference requests.

note

Check here the model configuration reference on the official NVIDIA Triton site.

For a PyTorch model deployed in an Edge Node and served using the NVIDIA Triton engine, the minimal configuration template necessary for it to work is:

name: "resnet"
platform: "pytorch_libtorch"
max_batch_size: 1
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [3, -1, -1]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [1000]
  }
]

The meaning of every field is:

  - name (string): Name of the model. The model name must match the folder name.
  - platform (tensorflow_savedmodel, onnxruntime_onnx, pytorch_libtorch): The model's format.
  - max_batch_size (integer): The maximum number of requests that Triton can combine into a single batch for processing.
  - input/output name (string): Name of the input/output layer.
  - input/output data_type (TYPE_BOOL, TYPE_INT8, TYPE_UINT8, TYPE_INT16, TYPE_UINT16, TYPE_INT32, TYPE_UINT32, TYPE_INT64, TYPE_UINT64, TYPE_FP16, TYPE_FP32, TYPE_FP64, TYPE_STRING): The data type of the input or output tensor.
  - input/output dims (array): The shape of the input or output tensor, given as a list of integers. Use -1 to indicate a variable-sized dimension.
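
As an illustration of the dims field: since the preprocessing later in this tutorial always produces fixed-size 224x224 images, you could pin the input dimensions instead of using variable ones. This variant of the template above is optional and not required for the tutorial:

```protobuf
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [3, 224, 224]
  }
]
```

Note that with max_batch_size greater than 0, Triton prepends the batch dimension automatically, so dims never include it.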

Preparing the .zip file

Before uploading, you must place your model.pt file and your config.pbtxt file within the specific folder structure depicted in the image below. Then just compress it into a .zip file.

Pytorch model folder structure

resnet.zip

Download here the zipped file containing the folder structure for the ResNet18 model.
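
If you prefer to build the archive yourself, a short standard-library script can create the layout and compress it. This is a sketch assuming the zip follows Triton's model-repository convention of resnet/config.pbtxt plus resnet/1/model.pt; adjust names and contents to match the structure shown above (the placeholder files here stand in for your real config.pbtxt and traced model):

```python
import zipfile
from pathlib import Path

# Build the folder structure: resnet/config.pbtxt and resnet/1/model.pt
root = Path("resnet")
(root / "1").mkdir(parents=True, exist_ok=True)

# In the real workflow these files already exist; these are placeholders
(root / "config.pbtxt").write_text('name: "resnet"\nplatform: "pytorch_libtorch"\n')
(root / "1" / "model.pt").write_bytes(b"")  # stand-in for resnet18_traced.pt

# Compress the whole folder into resnet.zip, preserving relative paths
with zipfile.ZipFile("resnet.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for path in sorted(root.rglob("*")):
        zf.write(path, path.as_posix())

print(sorted(zipfile.ZipFile("resnet.zip").namelist()))
```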

Uploading the model to the Panel's Library

Once you have your model saved in that compressed file, follow the next steps to upload it to the Library:

  1. Access the Library: Navigate to the "Library" section of the Barbara Panel.
  2. Create a New Model: Click the New button within the library.

Press New model button

  3. Select Model Type: In the Select Library Item form, choose Model.

Select Library Item

  4. Select Name: A new form titled New Model will appear. Enter a descriptive name for your model, such as ResNet18 in this example.

Select name

  5. Select Framework: Under Select a model, choose PyTorch. The system will automatically select Triton as the serving engine.

Select Model Format

  6. Select the Model .zip file: Click Select file and choose the compressed file containing your model (e.g., resnet.zip).

Select ZIP file

  7. Select version name: Write 1.0.0 as your version name.
  8. Advanced (Optional): You can expand the Advanced section to provide a description and release notes for your model version.

Select Version Name

  9. Upload and Create: Press the Create button to add your model to the library.

Press Create Button

Once uploaded, navigate back to the Library section. Under the Models tab, you should see your newly created model (e.g., "ResNet18"). This signifies successful upload and readiness for deployment to your edge nodes.

New model added

Deploy the model in an Edge Node

Let's continue this tutorial by deploying the model we have just uploaded to the library to one of our nodes.

  1. Select one node to deploy the model and navigate to its details view. Then press the Add Card button.

Node details view

  2. Select Model in the card selector.

Select model

  3. The Model Install wizard will appear.
tip

To use your node's GPU, select the Serve model using GPU checkbox.

Deploy model

  4. Select the ResNet18 model we have just uploaded to the library.

Select added

  5. Select the 1.0.0 version and press the Next button.

Select version

  6. In the Compose Config tab, we can adjust the gRPC and REST API ports where the model will be served. Finally, press the Send Model button.

Select ports

Finally, a ResNet18 model card will appear in the node details view. Once the model's status is Started, the model is being served. You can check the Inference REST URLs in the card to send requests and receive predictions.

Model card in node details

Making inference requests to our model

resnet_inference.ipynb

Download here the Jupyter Notebook file used to make inference requests to our ResNet18 model.

Jupyter Notebook

To make inference requests to our model (using the REST API port), we just need to:

  1. Send a POST request to the external inference URL shown in the card: http://10.30.248.75:9084/v2/models/resnet/infer
warning

If the node serving the model is not on the same local network as the machine running the Jupyter Notebook, you must enable a VPN to reach it.

  2. We have to attach a JSON payload with the following format:
# Create the payload for the request
data = {
    "inputs": [
        {
            "name": input_name,          # name of the input layer, e.g. "INPUT__0"
            "shape": input_shape,        # dimensions of the input tensor
            "datatype": input_datatype,  # datatype of the input tensor, e.g. "FP32"
            "data": input_data           # input tensor flattened as a list
        }
    ],
    "outputs": [
        {
            "name": output_name          # name of the output layer, e.g. "OUTPUT__0"
        }
    ]
}
  3. As a response, we will receive an output tensor that we must interpret. In this case, it will be a tensor of 1000 values, one score per image label the model has been trained on. To interpret it, we will download a file containing these 1000 labels and sort them according to their score, obtaining the n most probable classifications.

Points 1 and 2 are implemented in the function infer_image, available in the Jupyter Notebook.
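
As a concrete illustration of the payload format, the standalone sketch below builds and serializes a request body for a 1x3x224x224 input using only the standard library. The zero-filled list is a stand-in for the flattened, preprocessed image values; no network call is made here:

```python
import json

# Stand-in values for a preprocessed 1x3x224x224 image tensor
shape = [1, 3, 224, 224]
n_values = 1 * 3 * 224 * 224
flat_tensor = [0.0] * n_values  # placeholder for the flattened pixel values

# Inference payload in the format Triton's REST endpoint expects
data = {
    "inputs": [
        {
            "name": "INPUT__0",
            "shape": shape,
            "datatype": "FP32",
            "data": flat_tensor,
        }
    ],
    "outputs": [{"name": "OUTPUT__0"}],
}

# Serialize to JSON, ready to POST to the /v2/models/resnet/infer URL
payload = json.dumps(data)
print(len(json.loads(payload)["inputs"][0]["data"]) == n_values)
```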

Before sending the image, we must preprocess it according to the model's needs:

  1. Scale the pixel values to [0, 1], resize the image to 256x256, and center-crop it to 224x224 pixels (3 color channels).
  2. Normalize it using the model's mean and standard deviation.
  3. Permute the dimensions to the CHW (Channel, Height, Width) order.
  4. Next, convert the values to floating-point format and add a batch dimension.
  5. Finally, flatten the array into a linear vector to be included in the JSON payload sent to the model.
import numpy as np
import cv2
import json
import requests

def infer_image(image_path: str, triton_url: str, input_name: str, output_name: str) -> np.ndarray:
    """
    Performs inference on an image using a Triton server.

    Args:
        image_path (str): Path to the image.
        triton_url (str): URL of the Triton server.
        input_name (str): Name of the input tensor in Triton.
        output_name (str): Name of the output tensor in Triton.

    Returns:
        np.ndarray: Results of the inference.
    """
    # Load the image (OpenCV reads BGR, so convert to the RGB order the model expects)
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # Scale to [0, 1], resize to 256x256, and center-crop to 224x224
    img = img / 255.0
    img = cv2.resize(img, (256, 256))
    h, w = img.shape[:2]
    y0 = (h - 224) // 2
    x0 = (w - 224) // 2
    img = img[y0:y0 + 224, x0:x0 + 224, :]

    # Normalize with the model's mean and standard deviation
    img = (img - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]

    # Permute HWC -> CHW, cast to float32, and add a batch dimension
    img = np.transpose(img, axes=[2, 0, 1])
    img = img.astype(np.float32)
    img = np.expand_dims(img, axis=0)

    # Flatten the image for the payload
    img_flatten = img.flatten()

    # Create the payload for the request
    data = {
        "inputs": [
            {
                "name": input_name,
                "shape": list(img.shape),
                "datatype": "FP32",
                "data": img_flatten.tolist()
            }
        ],
        "outputs": [
            {
                "name": output_name
            }
        ]
    }

    inference_json = json.dumps(data)

    # Send the request to Triton
    response = requests.post(triton_url, data=inference_json,
                             headers={"Content-Type": "application/json"})

    # Check the response
    if response.status_code != 200:
        raise Exception(f"Error during request: {response.text}")

    # Extract the output tensor from the response
    result = np.array(response.json()['outputs'][0]['data'])

    return result

Next, set some basic parameters to make the inference:

image_path = 'kitten.jpg'
triton_url = 'http://10.30.248.75:9084/v2/models/resnet/infer'
input_name = 'INPUT__0'
output_name = 'OUTPUT__0'
results_number = 3

And finally we will pass the image to the infer_image function to get the results:

# Obtain the output tensor result
result = infer_image(image_path, triton_url, input_name, output_name)

# Get the indices of the top N predictions (argpartition does not sort,
# so order the selected indices by descending score afterwards)
indices = np.argpartition(result, -results_number)[-results_number:]
indices = indices[np.argsort(result[indices])[::-1]]

# Retrieve the top N predictions using the indices
top_n_predictions = result[indices]
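
Note that the raw values returned by the model are logits rather than calibrated probabilities; if you want values that sum to 1 before ranking or displaying them, you can apply a softmax. A minimal NumPy sketch (the three-element array is hypothetical stand-in data for the model's 1000-value output):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical logits standing in for the model's 1000-value output
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)

# The softmax preserves the ranking and yields values summing to 1
print(np.isclose(probs.sum(), 1.0))
```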

Finally, the obtained results will be interpreted using the contents of the synset.txt file, which contains the labels the model has been trained on. The labels with the highest score for the selected image will be displayed.

from PIL import Image, ImageDraw, ImageFont
import matplotlib.pyplot as plt

# Load the model's labels from the synset file
with open('synset.txt', 'r') as f:
    labels = [l.rstrip() for l in f]

# Print the results
for i in range(results_number):
    text = '%s --- prob: %f' % (labels[indices[i]][10:], top_n_predictions[i])
    print(text)

# Load the image
image = Image.open(image_path)

# Create a drawing context
draw = ImageDraw.Draw(image)

# Define the font for the overlaid text
font = ImageFont.truetype('arial.ttf', 28)

# Draw each prediction on the image
for i in range(results_number):
    text = '%s --- prob: %f' % (labels[indices[i]][10:], top_n_predictions[i])
    draw.text((10, 20 + i * 28), text, font=font, fill='red')

plt.imshow(image)
plt.axis('off')
plt.show()

Then we can get some results for different example images available in the repository:

  1. kitten.jpg

Jupyter Notebook - Kitten Result

  2. dog.jpg

Jupyter Notebook - Dog Result

  3. elephant.jpg

Jupyter Notebook - Elephant Result

Jupyter Notebook - Elephant Result