Skip to content

Creating an OpenVINO Classifier Capsule

Introduction

This tutorial will guide you through encapsulating an OpenVINO classifier model. For this tutorial, we will be using the vehicle-attributes-recognition-barrier-0039 model from the Open Model Zoo, but the concepts shown here apply to all OpenVINO classifiers. You can find the complete capsule on the Capsule Zoo. This model is able to classify the color of a detected vehicle.

This capsule will rely on the detector created in the previous tutorial to find vehicles in the video frame before they can be classified.

Getting Started

Like in the previous tutorial, we will create a new directory for the classifier capsule. This time we will name it classifier_vehicle_color_openvino. We will also add a meta.conf with the same contents, declaring that our capsule relies on version 0.2 or higher of OpenVisionCapsules.

[about]
api_compatibility_version = 0.2

We will also add the weights and model files to this directory so that they can be loaded by the capsule.

The Capsule Class

The Capsule class defined here will be very similar in structure to the one in the detector capsule.

from vcap import BaseCapsule, NodeDescription, DeviceMapper

from .backend import Backend
from . import config


class Capsule(BaseCapsule):
    name = "classifier_vehicle_color_openvino"
    description = "OpenVINO vehicle color classifier."
    version = 1
    device_mapper = DeviceMapper.map_to_openvino_devices()
    input_type = NodeDescription(
        size=NodeDescription.Size.SINGLE,
        detections=["vehicle"])
    output_type = NodeDescription(
        size=NodeDescription.Size.SINGLE,
        detections=["vehicle"],
        attributes={"color": config.colors})
    backend_loader = lambda capsule_files, device: Backend(
        model_xml=capsule_files[
            "vehicle-attributes-recognition-barrier-0039.xml"],
        weights_bin=capsule_files[
            "vehicle-attributes-recognition-barrier-0039.bin"],
        device_name=device
    )

Let's take a look at some of the differences between this Capsule class and the detector's.

input_type = NodeDescription(
    size=NodeDescription.Size.SINGLE,
    detections=["vehicle"])

This capsule takes vehicle detections produced by the detector capsule as input. Each vehicle found in the video frame is processed one at a time.

output_type = NodeDescription(
    size=NodeDescription.Size.SINGLE,
    detections=["vehicle"],
    attributes={"color": config.colors})

This capsule provides a vehicle detection with a color attribute as output. Note that classifier capsules do not create new detections. Instead, they augment the detections provided to them by other capsules. We've moved the list of colors out into a separate config.py file so that it can also be referenced by the backend, which we will define in the next section.

# config.py
colors = ["white", "gray", "yellow", "red", "green", "blue", "black"]

You may have noticed that this capsule does not have any options. The options field can be omitted when the capsule doesn't have any parameters that can be modified at runtime.

The Backend Class

We will once again create a file called backend.py where the Backend class will be defined. It will still subclass BaseOpenVINOBackend and we will only need to implement the process_frame method.

from collections import namedtuple
from typing import Dict

import numpy as np

from vcap import (
    Resize,
    DETECTION_NODE_TYPE,
    OPTION_TYPE,
    BaseStreamState)
from vcap_utils import BaseOpenVINOBackend

from . import config


class Backend(BaseOpenVINOBackend):

    def process_frame(self, frame: np.ndarray,
                      detection_node: DETECTION_NODE_TYPE,
                      options: Dict[str, OPTION_TYPE],
                      state: BaseStreamState) -> DETECTION_NODE_TYPE:
        crop = Resize(frame).crop_bbox(detection_node.bbox).frame

        input_dict, _ = self.prepare_inputs(crop)
        prediction = self.send_to_batch(input_dict).get()

        max_color = config.colors[prediction["color"].argmax()]

        detection_node.attributes["color"] = max_color

Let's review this method line-by-line.

crop = Resize(frame).crop_bbox(detection_node.bbox).frame

Capsules always receive the entire video frame, so we need to start by cropping the frame to the detected vehicle.

input_dict, _ = self.prepare_inputs(crop)

We then prepare the cropped video frame to be fed into the model. The video frame is resized to fit into the model and formatted in the way the model expects. We can ignore the second return value, the resize object, because classifiers don't provide any coordinates that need adjusting.

prediction = self.send_to_batch(input_dict).get()

Next, the input data is sent into the model for batch processing. The call to get causes the backend to block until the result is ready. The results are objects with raw OpenVINO prediction information.

max_color = config.colors[prediction["color"].argmax()]

We then pull the color information from the prediction, and choose the color with the highest confidence. We then convert the color from its integer representation to a human-readable string using the colors list defined in config.py.

detection_node.attributes["color"] = max_color

Finally, we augment the vehicle detection with the new "color" attribute. This capsule does not need to return anything because no new detections have been created.

Wrapping Up

Finally, the capsule is complete! Your data directory should look something like this:

your_data_directory
├── volumes
└── capsules
    └── classifier_vehicle_color_openvino
        ├── backend.py
        ├── capsule.py
        ├── config.py
        ├── meta.conf
        ├── vehicle-attributes-recognition-barrier-0039.bin
        └── vehicle-attributes-recognition-barrier-0039.xml

When you restart BrainFrame, your capsule will be packaged into a .cap file and initialized. You'll see its information on the BrainFrame client.

Load up a video stream to see classification results.