How to serve a PyTorch model
Overview
This tutorial guides you through deploying a PyTorch AI model, specifically ResNet18, to an edge device.
ResNet18 is a convolutional neural network for image classification. It can tell the difference between a cat and a dog, or identify specific objects in a photo, and it is used in many applications, from self-driving cars to medical image analysis.
We'll cover three main steps:
- Download a pre-trained ResNet18 Model: We'll download a pre-trained ResNet18 model for image classification and save it in TorchScript format.
- Deploying to an Edge Node: We'll upload the trained model to your Panel's Library and deploy it to a single edge node. This will make the model accessible for real-time inference on the edge device.
- Making Inferences: We'll demonstrate how to send image data to the deployed model and receive classification predictions, showcasing the practical application of the edge-deployed model.
By the end of this tutorial, you'll have a solid understanding of deploying PyTorch models to edge devices and be able to apply these techniques to your own AI projects.
All the necessary scripts for this tutorial are available on Barbara's GitHub profile.
Clone the ResNet18 PyTorch Tutorial from our GitHub.
Preparing the ResNet18 model
You can get the ResNet18 model from the PyTorch site.
Download the ResNet18 model using the torchvision Python module and the following script.
Before running the script locally, you must install PyTorch and torchvision (e.g., pip install torch torchvision).
import torch
from torchvision import models

# Download the pre-trained ResNet18 model
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Switch the model to evaluation mode
model.eval()

# Convert the model to TorchScript format
# Option 1: Trace the model using a random example input
dummy_input = torch.randn(1, 3, 224, 224)
traced_script_module = torch.jit.trace(model, dummy_input)

# Option 2: Script the model without tracing (no example input needed,
# and Python control flow is preserved)
script_module = torch.jit.script(model)

# Save the model in TorchScript format
torch.jit.save(traced_script_module, "resnet18_traced.pt")
# or
torch.jit.save(script_module, "resnet18.pt")
Using the above script we download the model and save it locally as resnet18_traced.pt (traced) or resnet18.pt (scripted).
How to set the config.pbtxt file
The config.pbtxt file is a crucial component of the Triton Inference Server. It provides essential configuration details about a model, enabling Triton to serve inference requests efficiently.
Check here the model configuration reference on the official NVIDIA Triton site.
For a PyTorch model deployed on an Edge Node and served using the NVIDIA Triton engine, the minimal configuration template needed is:
name: "resnet"
platform: "pytorch_libtorch"
max_batch_size: 1
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [3, -1, -1]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [1000]
  }
]
The meaning of every field is:
| Parameter | Values Accepted | Description |
|---|---|---|
| name | string | Name of the model. The model name must match the folder name. |
| platform | tensorflow_savedmodel, onnxruntime_onnx, pytorch_libtorch | Model's format. |
| max_batch_size | integer | Maximum number of requests that Triton can combine into a single batch for processing. |
| input/output name | string | Name of the input/output layer. |
| input/output data_type | TYPE_BOOL, TYPE_INT8, TYPE_UINT8, TYPE_INT16, TYPE_UINT16, TYPE_INT32, TYPE_UINT32, TYPE_INT64, TYPE_UINT64, TYPE_FP16, TYPE_FP32, TYPE_FP64, TYPE_STRING | Data type of the input or output tensor. |
| input/output dims | array | Shape of the input or output tensor: a list of integers representing the tensor's dimensions. Use -1 to indicate a variable-sized dimension. |
Preparing the .zip file
Before uploading, place your model.pt file and your config.pbtxt file within the specific folder structure depicted in the image below, then compress it into a .zip file.

Pytorch model folder structure
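The standard Triton model repository layout places config.pbtxt next to a numbered version folder containing the TorchScript file renamed to model.pt. As a sketch (assuming the model name resnet from the config above and the resnet18_traced.pt file from the previous step; the helper name is ours), the .zip can also be built programmatically:

```python
import zipfile

def build_model_zip(model_file: str, config_file: str, model_name: str = "resnet",
                    version: str = "1", out_path: str = "resnet.zip") -> str:
    """Pack a TorchScript model and its config.pbtxt into the Triton layout:

    resnet/
    ├── config.pbtxt
    └── 1/
        └── model.pt
    """
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # config.pbtxt sits at the root of the model folder
        zf.write(config_file, arcname=f"{model_name}/config.pbtxt")
        # the model itself goes inside the version subfolder, renamed model.pt
        zf.write(model_file, arcname=f"{model_name}/{version}/model.pt")
    return out_path

# Example (assumes the files from the previous steps exist locally):
# build_model_zip("resnet18_traced.pt", "config.pbtxt")
```

Note that the file inside the version folder must be called model.pt regardless of the name you used when saving it.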
Download here the zipped file containing the folder structure for the ResNet18 model.
Uploading the model to the Panel's Library
Once you have your model saved in that compressed file, follow these steps to upload it to the Library:
- Access the Library: Navigate to the "Library" section of the Barbara Panel.
- Create a New Model: Click the "New" button within the library.

Press New model button
- Select Model Type: In the "Select Library Item" form, choose "Model".

Select Library Item
- Select Name: A new form titled "New Model" will appear. Enter a descriptive name for your model, such as "ResNet18" in this example.

Select name
- Select Framework: Under "Select a model", choose "PyTorch". The system will automatically select "Triton" as the serving engine.

Select Model Format
- Select the Model .zip file: Click "Select file" and choose the compressed file containing your model (e.g., resnet.zip).

Select ZIP file
- Select version name: Write "1.0.0" as your version name.
- Advanced (Optional): You can expand the "Advanced" section to provide a description and release notes for your model version.

Select Version Name
- Upload and Create: Press the "Create" button to add your model to the library.

Press Create Button
Once uploaded, navigate back to the Library section. Under the Models tab, you should see your newly created model (e.g., "ResNet18"). This signifies successful upload and readiness for deployment to your edge nodes.

New model added
Deploy the model in an Edge Node
Let's continue this tutorial by deploying the model we have just uploaded to the library to one of our nodes.
- Select one node to deploy the model and navigate to its details view. Then press the "Add Card" button.

Node details view
- Select "Model" in the card selector.

Select model
- The "Model Install" wizard will appear. To use your node's GPU, select the "Serve model using GPU" checkbox.

Deploy model
- Select the "ResNet18" model we have just uploaded to the library.

Select added
- Select the "1.0.0" version and press the "Next" button.

Select version
- In the "Compose Config" tab we can adjust the gRPC and REST API ports where the model will be served. Finally, press the "Send Model" button.

Select ports
Finally, a ResNet18 model card will appear in the node details view. Once the model's status is "Started", the model is being served. You can check the Inference REST URLs in the card to send requests and receive predictions.

Model card in node details
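Before sending images, you can verify that the model is actually being served. This sketch assumes the standard KServe v2 REST endpoints that Triton exposes (/v2/health/ready and /v2/models/&lt;name&gt;/ready) and the example host/port from the model card; adjust both to your node, and the helper names are ours:

```python
from urllib.request import urlopen
from urllib.error import URLError

def ready_urls(base, model):
    """Build the Triton server and model readiness URLs (KServe v2 protocol)."""
    base = base.rstrip("/")
    return f"{base}/v2/health/ready", f"{base}/v2/models/{model}/ready"

def check_ready(base, model):
    """Return True if both the server and the model report ready (HTTP 200)."""
    try:
        return all(urlopen(url, timeout=5).status == 200
                   for url in ready_urls(base, model))
    except URLError:
        return False

# Example with the host/port from the model card:
# check_ready("http://10.30.248.75:9084", "resnet")
```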
Making inference requests to our model
Download here the Jupyter Notebook file to make inferences against our ResNet18 model.

Jupyter Notebook
To make inferences against our model (using the REST API port) we just need to:
- Send a POST request to the external inference URL shown in the card:
http://10.30.248.75:9084/v2/models/resnet/infer
If the node is not on the same local network as the machine running the Jupyter Notebook, you must enable a VPN to reach the model served on the node.
- We have to attach a JSON payload with the following format:
# Create the payload for the request
data = {
    "inputs": [
        {
            "name": ...,      # input layer name
            "shape": ...,     # dimensions of the input tensor
            "datatype": ...,  # data type of the input tensor
            "data": ...       # input tensor, flattened as a list
        }
    ],
    "outputs": [
        {
            "name": ...       # output layer name
        }
    ]
}
- As a response, we will receive an output tensor that we must interpret. In this case, it will be a tensor of 1000 values, one score per image label the model has been trained on. To interpret it, we will download a file containing these 1000 labels and sort them according to their score, obtaining the n most probable classifications.
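As a concrete illustration of that payload format, here is a minimal sketch that fills it in for the resnet model above, using a dummy all-zeros 1x3x224x224 FP32 tensor instead of a real image (the helper name is ours):

```python
import json

def build_payload(tensor_flat, shape, input_name="INPUT__0", output_name="OUTPUT__0"):
    """Build a KServe v2 inference payload for Triton's REST API."""
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": "FP32",
                "data": list(tensor_flat),  # flattened tensor values
            }
        ],
        "outputs": [{"name": output_name}],
    }

# Dummy 1x3x224x224 input filled with zeros (a real request would carry
# the preprocessed image instead)
shape = (1, 3, 224, 224)
payload = build_payload([0.0] * (3 * 224 * 224), shape)
body = json.dumps(payload)  # this JSON string is POSTed to the /infer URL
```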
Points 1 and 2 are implemented in the infer_image function available in the Jupyter Notebook.
Before sending the image, we must preprocess it to match the model's expected input:
- Resize it to 256x256 pixels and center-crop it to 224x224, keeping the 3 color channels.
- Normalize it using the model's mean and standard deviation.
- Permute the dimensions to CHW (Channel, Height, Width) order.
- Convert the values to floating-point format and add a batch dimension.
- Finally, flatten the tensor into a linear vector to be included in the JSON payload sent to the model.
import numpy as np
import cv2
import json
import requests

def infer_image(image_path: str, triton_url: str, input_name: str, output_name: str) -> np.ndarray:
    """
    Performs inference on an image using a Triton server.

    Args:
        image_path (str): Path to the image.
        triton_url (str): URL of the Triton server.
        input_name (str): Name of the input tensor in Triton.
        output_name (str): Name of the output tensor in Triton.

    Returns:
        np.ndarray: Results of the inference.
    """
    # Load the image (OpenCV loads BGR, so convert to the RGB order the model expects)
    img = cv2.imread(image_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # Preprocess the image: scale to [0, 1], resize, and center-crop to 224x224
    img = img / 255.0
    img = cv2.resize(img, (256, 256))
    h, w = img.shape[:2]
    y0 = (h - 224) // 2
    x0 = (w - 224) // 2
    img = img[y0:y0 + 224, x0:x0 + 224, :]

    # Normalize with the ImageNet mean and standard deviation
    img = (img - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]

    # HWC -> CHW, cast to float32, and add the batch dimension
    img = np.transpose(img, axes=[2, 0, 1])
    img = img.astype(np.float32)
    img = np.expand_dims(img, axis=0)

    # Flatten the image for the payload
    img_flatten = img.flatten()

    # Create the payload for the request
    data = {
        "inputs": [
            {
                "name": input_name,
                "shape": list(img.shape),
                "datatype": "FP32",
                "data": img_flatten.tolist()
            }
        ],
        "outputs": [
            {
                "name": output_name
            }
        ]
    }
    inference_json = json.dumps(data)

    # Send the request to Triton
    response = requests.post(triton_url, data=inference_json)

    # Check the response
    if response.status_code != 200:
        raise Exception(f"Error during request: {response.text}")

    # Get the results
    result = np.array(response.json()['outputs'][0]['data'])
    return result
Next, set some basic parameters to make the inference:
image_path = 'kitten.jpg'
triton_url = 'http://10.30.248.75:9084/v2/models/resnet/infer'
input_name = 'INPUT__0'
output_name = 'OUTPUT__0'
results_number = 3
And finally we will pass the image to the infer_image function to get the results:
# Obtain the output tensor result
result = infer_image(image_path, triton_url, input_name, output_name)
# Get the indices of the top N predictions
indices = np.argpartition(result, -results_number)[-results_number:]
# argpartition returns the top N in arbitrary order, so sort by descending score
indices = indices[np.argsort(result[indices])[::-1]]
# Retrieve the top N predictions using the indices
top_n_predictions = result[indices]
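Note that a TorchScript ResNet18 returns raw logits rather than normalized probabilities. If you want values that sum to 1 for the top-N scores, you can apply a softmax over the output vector first; a small plain-Python sketch (the helper name is ours):

```python
import math

def softmax(scores):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Example: the largest logit maps to the largest probability,
# so the ranking of the predictions is unchanged
probs = softmax([2.0, 1.0, 0.1])
```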
Finally, the obtained results are interpreted using the contents of the synset.txt file, which contains the labels the model was trained on. The labels with the highest score for the selected image will be displayed.
from PIL import Image, ImageDraw, ImageFont
import matplotlib.pyplot as plt

# Load the model's labels from the archive
with open('synset.txt', 'r') as f:
    labels = [l.rstrip() for l in f]

# Print the results
for i in range(results_number):
    text = '%s --- prob: %f' % (labels[indices[i]][10:], top_n_predictions[i])
    print(text)

# Load the image
image = Image.open(image_path)

# Create a drawing context
draw = ImageDraw.Draw(image)

# Define the font and draw each prediction on the image
font = ImageFont.truetype('arial.ttf', 28)
for i in range(results_number):
    text = '%s --- prob: %f' % (labels[indices[i]][10:], top_n_predictions[i])
    draw.text((10, 20 + i*28), text, font=font, fill='red')

plt.imshow(image)
plt.axis('off')
plt.show()
Then we can get some results for different example images available in the repository:
- Kitten.jpg

Jupyter Notebook - Kitten Result
- Dog.jpg

Jupyter Notebook - Dog Result
- Elephant.jpg

Jupyter Notebook - Elephant Result