
Serve a PyTorch model with Triton

This article refers to Platform v3.0.0. The current Platform version is v3.2.0.

Overview

This tutorial walks you through deploying a PyTorch model on a Barbara edge node, served by NVIDIA Triton Inference Server. We use ResNet18 as the example — a small, well-known image classifier — but the same workflow applies to any TorchScript-exported model. The three phases are:

  1. Prepare the model — export ResNet18 to TorchScript and write its Triton config.pbtxt.
  2. Deploy it — upload the zipped model to the App Library, then deploy it to an edge node with the model wizard.
  3. Run inference — send images to the model's REST endpoint and interpret the predictions.

About ResNet18

ResNet18 is a small image-classification network (about 11.7 million parameters). Its pre-trained weights classify images into the 1000 ImageNet classes, and the model is small enough to fit comfortably on most edge hardware.

Source code

The full source code for this tutorial — model export, payloads, Jupyter notebook — lives at Barbaraedge/resnet18_pytorch_tutorial on GitHub.

1. Prepare the model

Export ResNet18 to TorchScript

Warning: Install PyTorch and torchvision on your laptop before running this script.

import torch
from torchvision import models

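# Download ResNet18 with its pre-trained ImageNet weights
# (weights=True is a deprecated legacy form; pass the weights enum instead)
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)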

# Switch to evaluation mode
model.eval()

# Option 1: trace the model with a sample input
dummy_input = torch.randn(1, 3, 224, 224)
traced_script_module = torch.jit.trace(model, dummy_input)

# Option 2: scripted export (more flexible, supports control flow)
script_module = torch.jit.script(model)

# Save the TorchScript artefact
torch.jit.save(traced_script_module, "resnet18_traced.pt")
# or
torch.jit.save(script_module, "resnet18.pt")

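The script saves two TorchScript artefacts: resnet18_traced.pt (traced) and resnet18.pt (scripted). Either one can be served by Triton, so keep the one you prefer. Triton's PyTorch backend looks for a file named model.pt by default, so rename your chosen file to model.pt when you package it.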

Write config.pbtxt

Triton needs a config.pbtxt that describes the model — name, format, batch size, and the shape and type of each input and output. A minimal config for ResNet18 looks like this:

name: "resnet"
platform: "pytorch_libtorch"
max_batch_size: 1
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [3, -1, -1]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [1000]
  }
]
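| Field | Accepted values | Description |
| --- | --- | --- |
| name | string | Model name. Must match the name of the folder the model lives in. |
| platform | pytorch_libtorch, tensorflow_savedmodel, onnxruntime_onnx | Backend that serves the model. |
| max_batch_size | integer | Largest batch Triton may send to the model. When it is greater than 0, Triton adds an implicit batch dimension in front of the dims listed for each input and output. |
| input / output name | string | Tensor name. For TorchScript models, Triton's PyTorch backend matches tensors by position using the <name>__<index> convention (for example, INPUT__0 and OUTPUT__0). |
| input / output data_type | TYPE_BOOL, TYPE_INT8, TYPE_UINT8, TYPE_INT16, TYPE_UINT16, TYPE_INT32, TYPE_UINT32, TYPE_INT64, TYPE_UINT64, TYPE_FP16, TYPE_FP32, TYPE_FP64, TYPE_STRING | Tensor element type. |
| input / output dims | array of integers | Tensor shape, excluding the batch dimension when max_batch_size is greater than 0. Use -1 for a variable-size dimension. |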
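When you adapt this workflow to another model, only a handful of values in config.pbtxt change. As a minimal sketch, assuming you only need the fields described in the table, the hypothetical make_config helper below renders the file from Python values. It is not part of Triton or Barbara Panel, and real Triton configs support many more fields.

```python
from typing import List, Sequence, Tuple

# (name, data_type, dims), e.g. ("INPUT__0", "TYPE_FP32", [3, -1, -1])
TensorSpec = Tuple[str, str, Sequence[int]]


def tensor_block(spec: TensorSpec) -> str:
    """Render one input/output entry in protobuf text format."""
    name, data_type, dims = spec
    dims_text = ", ".join(str(d) for d in dims)
    return (
        "  {\n"
        f'    name: "{name}"\n'
        f"    data_type: {data_type}\n"
        f"    dims: [{dims_text}]\n"
        "  }"
    )


def make_config(name: str, platform: str, max_batch_size: int,
                inputs: List[TensorSpec], outputs: List[TensorSpec]) -> str:
    """Hypothetical helper: build a minimal config.pbtxt as a string."""
    lines = [
        f'name: "{name}"',
        f'platform: "{platform}"',
        f"max_batch_size: {max_batch_size}",
        "input [",
        ",\n".join(tensor_block(s) for s in inputs),  # entries are comma-separated
        "]",
        "output [",
        ",\n".join(tensor_block(s) for s in outputs),
        "]",
    ]
    return "\n".join(lines) + "\n"


config = make_config(
    "resnet", "pytorch_libtorch", 1,
    inputs=[("INPUT__0", "TYPE_FP32", [3, -1, -1])],
    outputs=[("OUTPUT__0", "TYPE_FP32", [1000])],
)
print(config)
```

Write the returned string to resnet/config.pbtxt, and keep the name argument identical to the model folder name.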

Package as a zip

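Lay out the files in the folder structure Triton expects: a top-level folder named after the model (resnet, matching the name field in config.pbtxt) that contains config.pbtxt and a version subfolder, 1, holding the TorchScript file renamed to model.pt. Then zip the top-level folder: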

Triton model folder structure

You can download the ready-made package here: resnet.zip.
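If you prefer to script the packaging step, the sketch below builds a comparable archive with the Python standard library. It assumes Triton's standard model-repository layout (resnet/config.pbtxt plus resnet/1/model.pt), so compare its output with the ready-made resnet.zip before relying on it. package_model is a hypothetical helper, not part of Barbara Panel.

```python
import shutil
import tempfile
from pathlib import Path


def package_model(model_file: str, config_file: str, model_name: str = "resnet",
                  version: str = "1", out_dir: str = ".") -> Path:
    """Zip <model_name>/config.pbtxt and <model_name>/<version>/model.pt.

    Hypothetical helper; assumes Triton's standard model-repository layout.
    """
    with tempfile.TemporaryDirectory() as staging:
        model_dir = Path(staging) / model_name
        version_dir = model_dir / version
        version_dir.mkdir(parents=True)
        # config.pbtxt sits next to the numbered version folder
        shutil.copy(config_file, model_dir / "config.pbtxt")
        # Triton's PyTorch backend loads <version>/model.pt by default
        shutil.copy(model_file, version_dir / "model.pt")
        # Zip the top-level folder so the archive root is <model_name>/
        archive = shutil.make_archive(
            str(Path(out_dir) / model_name), "zip",
            root_dir=staging, base_dir=model_name,
        )
    return Path(archive)
```

For example, package_model("resnet18_traced.pt", "config.pbtxt") writes resnet.zip to the current directory, ready to upload to the App Library.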

2. Upload the model to the App Library

  1. Open the App Library in Barbara Panel.
  2. Click New.
New model button

  3. Choose Model in the Select Library Item popup.
Select Library Item popup

  4. Name the model (for example, ResNet18).
Name the model

  5. Under Select a model, pick PyTorch. Barbara Panel automatically selects Triton as the serving engine.
Select PyTorch + Triton

  6. Click Select file and upload the zip (resnet.zip).
Select the model zip

  7. Set the version (for example, 1.0.0).
  8. Optionally expand Advanced to add a description and release notes.
Set the version name

  9. Click Create.
Create the model

The model appears under the Models tab of the Library.

Model added to the Library

3. Deploy the model to an edge node

  1. Open the Node Details page of the target node and click Add card.
Node Details page

  2. Pick Model in the card picker.
Add a Model card

  3. The Model Install wizard opens. Tick Serve model using GPU if the node has a GPU.
Model Install wizard

  4. Pick the ResNet18 model.
Select ResNet18

  5. Pick version 1.0.0 and click Next.
Select version 1.0.0

  6. On the Compose Config step, adjust the gRPC and REST ports if needed. The defaults work for the first model on a node; if other workloads on the node already use those ports, choose free ones. Click Send Model.
Compose Config — choose ports

A ResNet18 workload card appears on the Node Details page. When the status indicator turns green (Started), the model is being served. The card surfaces the REST and gRPC URLs you will use for inference.

ResNet18 workload card with inference URLs
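Before sending images, you can confirm from your laptop that Triton has actually loaded the model. Triton implements the KServe v2 health and metadata endpoints, GET /v2/models/<name>/ready and GET /v2/models/<name>, on the same REST port used for inference. The sketch below polls the first and then returns the second. wait_until_ready is a hypothetical helper that uses only the Python standard library and assumes a base URL of the form http://<NODE_IP>:<REST_PORT>.

```python
import json
import time
import urllib.request


def wait_until_ready(base_url: str, model: str = "resnet", timeout_s: float = 60.0) -> dict:
    """Poll Triton until `model` reports ready, then return its metadata.

    base_url is assumed to look like "http://<NODE_IP>:<REST_PORT>".
    """
    ready_url = f"{base_url}/v2/models/{model}/ready"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Triton answers 200 once the model is loaded; other codes raise HTTPError
            with urllib.request.urlopen(ready_url, timeout=5) as resp:
                if resp.status == 200:
                    break
        except OSError:
            pass  # connection refused, timeout, or a non-200 status (HTTPError)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"model {model!r} not ready at {base_url} after {timeout_s}s")
        time.sleep(2)

    # Metadata lists the model's input/output names, datatypes and shapes
    with urllib.request.urlopen(f"{base_url}/v2/models/{model}", timeout=5) as resp:
        return json.load(resp)
```

For example, wait_until_ready("http://<NODE_IP>:<REST_PORT>") with your node's values should return metadata whose inputs entry names INPUT__0, which is a quick way to double-check the names you will put in the inference payload.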

4. Run inference

Download the Jupyter notebook for this section, resnet_inference.ipynb, packaged as resnet_inference.zip.

Jupyter notebook with the inference helper

To classify an image, send a POST request to the model's REST endpoint:

http://<NODE_IP>:<REST_PORT>/v2/models/resnet/infer

Warning: If your laptop is on a different network from the edge node, enable the VPN and use the node's VPN IP address.

Payload structure

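The Triton REST API follows the KServe v2 inference protocol and expects a JSON body shaped like this. The shape field is a list of integers, data is the tensor flattened into a single list in row-major order, and datatype uses the protocol's names (for example, FP32) rather than the TYPE_ names from config.pbtxt: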

{
  "inputs": [
    {
      "name": "<input layer name>",
      "shape": "<input tensor shape>",
      "datatype": "<input tensor datatype>",
      "data": "<input tensor data, as a flat list>"
    }
  ],
  "outputs": [
    {
      "name": "<output layer name>"
    }
  ]
}
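To see how the placeholders map onto real values, here is a minimal sketch that builds the body for one input and checks the constraint that is easiest to get wrong: the flat data list must hold exactly as many values as the shape implies. make_infer_request is a hypothetical helper, and the tiny [1, 3, 2, 2] tensor stands in for a real [1, 3, 224, 224] image.

```python
import json
import math


def make_infer_request(input_name, shape, data, output_name, datatype="FP32"):
    """Build a KServe v2 / Triton REST inference body for a single input."""
    # The flat data list must hold exactly prod(shape) values, in row-major order
    expected = math.prod(shape)
    if len(data) != expected:
        raise ValueError(f"shape {shape} needs {expected} values, got {len(data)}")
    return {
        "inputs": [
            {"name": input_name, "shape": list(shape), "datatype": datatype, "data": list(data)}
        ],
        "outputs": [{"name": output_name}],
    }


body = make_infer_request("INPUT__0", [1, 3, 2, 2], [0.0] * 12, "OUTPUT__0")
print(json.dumps(body))
```

For the real model, shape would be [1, 3, 224, 224] and data would hold 1 × 3 × 224 × 224 = 150,528 values.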

Preprocess the image

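ResNet18 expects its input in the same format as its ImageNet training data: a normalised 3×224×224 RGB image plus a batch dimension. To get there:

  1. Convert to RGB channel order (OpenCV loads images as BGR) and scale pixel values to [0, 1].
  2. Resize to 256×256 and centre-crop to 224×224.
  3. Normalise each channel with the standard ImageNet mean (0.485, 0.456, 0.406) and standard deviation (0.229, 0.224, 0.225).
  4. Transpose the dimensions from HWC to CHW (Channel, Height, Width).
  5. Cast to float32 and add a batch dimension, giving shape [1, 3, 224, 224].
  6. Flatten the array into the linear data list for the JSON body.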

The reference implementation:

import cv2
import numpy as np
import requests

# ImageNet statistics, in RGB channel order
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def infer_image(image_path: str, triton_url: str, input_name: str, output_name: str) -> np.ndarray:
    """Run inference on an image against a Triton-served model."""
    img = cv2.imread(image_path)  # HWC, uint8, BGR channel order
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")

    # Preprocess
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # match the RGB order of the mean/std
    img = img / 255.0  # scale to [0, 1]
    img = cv2.resize(img, (256, 256))
    h, w = img.shape[:2]
    y0 = (h - 224) // 2
    x0 = (w - 224) // 2
    img = img[y0:y0 + 224, x0:x0 + 224, :]  # centre crop to 224x224
    img = (img - IMAGENET_MEAN) / IMAGENET_STD
    img = np.transpose(img, axes=[2, 0, 1])  # HWC -> CHW
    img = img.astype(np.float32)
    img = np.expand_dims(img, axis=0)  # add batch dimension -> [1, 3, 224, 224]

    data = {
        "inputs": [
            {
                "name": input_name,
                "shape": list(img.shape),
                "datatype": "FP32",
                "data": img.flatten().tolist(),
            }
        ],
        "outputs": [{"name": output_name}],
    }

    response = requests.post(triton_url, json=data, timeout=30)
    if response.status_code != 200:
        raise RuntimeError(f"Error during request: {response.text}")

    return np.array(response.json()["outputs"][0]["data"])

Call the model

Set the parameters and run inference:

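image_path = "kitten.jpg"
triton_url = "http://10.30.248.75:9084/v2/models/resnet/infer"  # use your node's IP and REST port
input_name = "INPUT__0"
output_name = "OUTPUT__0"
results_number = 3

result = infer_image(image_path, triton_url, input_name, output_name)
# Indices of the results_number highest scores (argpartition leaves them unordered)
indices = np.argpartition(result, -results_number)[-results_number:]
# Sort those indices so the best prediction comes first
indices = indices[np.argsort(result[indices])[::-1]]
top_n_predictions = result[indices]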

Interpret the output

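The output is a vector of 1000 scores, one per ImageNet class. torchvision's ResNet18 has no final softmax layer, so these are raw logits: a higher score means a more likely class, but the values are not probabilities and do not sum to 1. Map the top indices back to human-readable labels using the synset.txt file shipped with the notebook. The [10:] slice drops the 9-character WordNet ID and the space after it at the start of each line (lines look like n01440764 tench, Tinca tinca):

with open("synset.txt") as f:
    labels = [line.rstrip() for line in f]

for i in range(results_number):
    print(f"{labels[indices[i]][10:]} --- score: {top_n_predictions[i]:.6f}")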
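Because OUTPUT__0 holds raw logits, you may want to report actual probabilities. The sketch below applies a numerically stable softmax and then sorts the top k, since np.argpartition on its own returns them in arbitrary order. top_k_softmax is a hypothetical helper, not part of the notebook, and the 5-value list stands in for the 1000 ResNet18 scores.

```python
import numpy as np


def top_k_softmax(logits, k: int = 3):
    """Return (indices, probabilities) of the k highest-scoring classes, best first."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the max before exponentiating to avoid overflow
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    # argpartition finds the top k in arbitrary order; argsort then orders them
    top = np.argpartition(probs, -k)[-k:]
    top = top[np.argsort(probs[top])[::-1]]
    return top, probs[top]


# Toy 5-class example standing in for the 1000 ResNet18 logits
idx, p = top_k_softmax([2.0, 0.5, 3.0, -1.0, 1.0], k=2)
print(idx, p.round(3))
```

With the notebook's variables, indices, probs = top_k_softmax(result, results_number) replaces the argpartition step, and labels[i][10:] still gives each class name.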

Example top predictions for the sample images in the GitHub repository:

Top predictions for kitten.jpg

Top predictions for dog.jpg

Top predictions for elephant.jpg

Summary

You exported a PyTorch model to TorchScript, packaged it with a Triton config.pbtxt, uploaded it as a model to the App Library, deployed it to an edge node with the Model Install wizard, and ran image inferences against the REST endpoint. The same workflow generalises to any TorchScript model — change the config.pbtxt input/output shapes and adapt the pre-processing in the client.