OpenVisionCapsules Docs
Introduction
What is a Capsule?
A capsule is a single file with a .cap file extension. It contains the code, metadata, model files, and any other files the capsule needs to operate.
Capsules take a frame and information from other capsules as input, run some kind of analysis, and provide metadata about the frame as output. For example, a person detection capsule would take a frame as input and output person detections in that frame. A gender classifier capsule would take this frame and each person detection as input, and output a male or female classification for that detection.
Capsules provide metadata describing these inputs and outputs alongside other information about the capsule. Applications that are compatible with OpenVisionCapsules use this metadata to determine when to run the capsule and with what input.
File Structure
All capsules start off their life as unpackaged capsules. An unpackaged capsule is simply a directory containing all files that will be packaged into the capsule. This directory must contain, at minimum, a meta.conf file and a capsule.py file.
detector_person
├── meta.conf
└── capsule.py
The meta.conf file is a simple configuration file that specifies the major and minor version of OpenVisionCapsules that this capsule requires. Applications use this information to decide whether your capsule is compatible with the version of OpenVisionCapsules the application uses. A capsule with a compatibility version of 0.1 is expected to be compatible with applications that use OpenVisionCapsules versions 0.1 through 0.x, but not 1.x or 2.x.
[about]
api_compatibility_version = 0.1
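As a rough illustration of the compatibility rule described above, the sketch below reads a meta.conf with Python's standard configparser module and compares it against an application's OpenVisionCapsules version. The is_compatible helper is hypothetical (not part of vcap), and it assumes "compatible" means the same major version and an application minor version at least as high as the capsule's.

```python
import configparser


def is_compatible(meta_conf_text: str, app_version: str) -> bool:
    """Hypothetical check: same major version, app minor >= capsule minor."""
    parser = configparser.ConfigParser()
    parser.read_string(meta_conf_text)
    required = parser["about"]["api_compatibility_version"]

    req_major, req_minor = (int(part) for part in required.split("."))
    app_major, app_minor = (int(part) for part in app_version.split("."))

    # A capsule targeting 0.1 runs on 0.1 through 0.x, but not on 1.x or 2.x
    return app_major == req_major and app_minor >= req_minor


META_CONF = """
[about]
api_compatibility_version = 0.1
"""

print(is_compatible(META_CONF, "0.3"))  # True
print(is_compatible(META_CONF, "1.0"))  # False
```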
The capsule.py file is the meat of the capsule. It contains the actual behavior of the capsule. We will talk more about the contents of this file in a later section.
If your capsule uses other files for its operation, like a model file, it should be included in this directory as well. All files in the capsule’s directory will be included and made accessible once it’s packaged.
person_detection_capsule
├── meta.conf
├── capsule.py
└── frozen_inference_graph.pb
Runtime Environment
Loading
When a capsule is loaded, the capsule.py file is imported as a module and an instance of the Capsule class defined in that module is instantiated. Then, for each compatible device, an instance of the capsule’s Backend class is created using the provided backend_loader function.
Importing
Capsules have access to a number of helpful libraries, including:
The entirety of the Python standard library
NumPy (import numpy)
OpenCV (import cv2)
TensorFlow (import tensorflow)
scikit-learn (import sklearn)
OpenVINO (import openvino)
Applications may provide more libraries in addition to these. Please see that application’s documentation for more information.
Importing From Other Files in the Capsule
In order to allow for more complex capsules that have code reuse within them, capsules may consist of multiple Python files. These files are made available through relative imports.
For example, with the following directory structure:
capsule_dir/
├── capsule.py
├── backend.py
└── utils/
├── img_utils.py
└── ml_utils.py
The capsule.py file may import the other Python files like so:
from . import backend
from .utils import img_utils, ml_utils
Note that non-relative imports to these files will not work:
import backend
from utils import img_utils, ml_utils
Limiting GPU Memory Growth
By default, OpenVisionCapsules maps all available memory on every visible CUDA-configured GPU. To prevent this when using TensorFlow, set the following environment variable:
TF_FORCE_GPU_ALLOW_GROWTH=True
This environment variable only applies to TensorFlow.
For details, see the TensorFlow guide: https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth
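If you would rather set the variable from Python than from the shell, a minimal sketch follows. It assumes this code runs before TensorFlow is imported or first touches the GPU (for example, at the top of the script that launches the application); once TensorFlow has reserved GPU memory, changing the variable is unlikely to have any effect.

```python
import os

# Must run before TensorFlow initializes the GPU.
# setdefault keeps any value the user already exported in the shell.
# "true" is the value used in the TensorFlow GPU guide.
os.environ.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")

print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```

The TensorFlow guide linked above also describes a programmatic alternative, tf.config.experimental.set_memory_growth, which is called per GPU device.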
The Capsule Class
Introduction
The Capsule class provides information to the application about what a capsule is and how it should be run. Every capsule defines a Capsule class that extends BaseCapsule in its capsule.py file.
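from vcap import (
    BaseCapsule,
    NodeDescription,
    options
)


class Capsule(BaseCapsule):
    name = "detector_person"
    version = 1
    # StreamState (a BaseStreamState subclass) and backend_loader_func are
    # assumed to be defined earlier in capsule.py
    stream_state = StreamState
    # Takes no input nodes: the capsule runs on the whole frame
    input_type = NodeDescription(size=NodeDescription.Size.NONE)
    # Outputs all person detections found in the frame
    output_type = NodeDescription(
        size=NodeDescription.Size.ALL,
        detections=["person"])
    backend_loader = backend_loader_func
    # Runtime-configurable options, keyed by option name
    options = {
        "threshold": options.FloatOption(
            default=0.5,
            min_val=0.1,
            max_val=1.0)
    }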
class BaseCapsule(capsule_files: Dict[str, bytes], inference_mode=True)
An abstract base class that all capsules must subclass. It defines the interface that capsules are expected to implement.
Subclasses of this class are expected to be defined in a capsule’s capsule.py file.
Parameters
capsule_files – A dict of {“file_name”: FILE_BYTES} containing the files that were found and loaded in the capsule
inference_mode – If True, the model will be loaded and its backends will be started. If False, the capsule will never be able to run inference, but its various readable attributes will still be available.
abstract static backend_loader(capsule_files: Dict[str, bytes], device: str) → vcap.backend.BaseBackend
A function that creates a backend for this capsule.
Parameters
capsule_files – Provides access to all files in the capsule. The keys are file names and the values are bytes.
device – A string specifying the device that this backend should use. For example, the first GPU device is specified as “GPU:0”.
property description
A human-readable description of what the capsule does.
property device_mapper
A device mapper contains a single field, filter_func, which is a function that takes a list of all available device strings and returns a list of the device strings that are compatible with this capsule.
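A filter_func is simply a function from a list of device strings to a list of device strings. The sketch below shows two hypothetical filters, one that keeps only GPUs and one that falls back to the CPU when no GPU is present. It assumes device strings follow the “GPU:0” style mentioned for backend_loader, with “CPU:0” for the CPU (an assumption); how a filter is attached to a capsule’s device mapper depends on vcap and is not shown.

```python
from typing import List


def gpus_only(devices: List[str]) -> List[str]:
    """Hypothetical filter: keep only GPU device strings such as "GPU:0"."""
    return [device for device in devices if device.startswith("GPU")]


def prefer_gpus(devices: List[str]) -> List[str]:
    """Hypothetical filter: use every GPU if any exist, otherwise the CPU."""
    gpus = gpus_only(devices)
    if gpus:
        return gpus
    return [device for device in devices if device.startswith("CPU")]


available = ["CPU:0", "GPU:0", "GPU:1"]
print(gpus_only(available))      # ['GPU:0', 'GPU:1']
print(prefer_gpus(["CPU:0"]))    # ['CPU:0']
```

Because the application creates one backend per returned device string, prefer_gpus would lead to two backends on the three-device machine above and one backend on a CPU-only machine.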
property input_type
Describes the types of DetectionNodes that this capsule takes as input.
property name
The name of the capsule. This value uniquely identifies the capsule and cannot be shared by other capsules.
By convention, the capsule’s name is prefixed by a short description of the role it plays (“detector”, “recognizer”, etc.) followed by the kind of data it relates to (“person”, “face”, etc.) and, if necessary, some differentiating factor (“fast”, “close_up”, “retail”, etc.). For example, a face detector that is optimized for quick inference would be named “detector_face_fast”.
property options
A dict of zero or more options, keyed by option name, that can be configured at runtime.
property output_type
Describes the types of DetectionNodes that this capsule produces as output.
property stream_state
(Optional) A class, an instance of which will be created for every new video stream that the capsule is run on and de-initialized when that stream is deleted. It is intended to be overridden by capsules that have stateful operations across a single stream.
property version
The version of the capsule.
When should you bump the version of a capsule? When:
You’ve changed the usage of existing capsule options
You’ve changed the model or algorithm
You’ve changed the input/output node descriptions
When shouldn’t you bump the version of a capsule? When:
You only did code restructuring
You’ve updated the capsule to work with a newer API version
You’ve added (but not removed or changed previous) capsule options
In summary, the version is most useful for differentiating a capsule from its previous versions.
Backends
Introduction
A backend is what provides the low-level analysis on a video frame. For machine learning, this is the place where the frame would be fed into the model and the results would be returned. Every capsule must define a backend class that subclasses the BaseBackend class.
The application will create an instance of the backend class for each device string returned by the capsule’s device mapper.
Required Methods
All backends must subclass the BaseBackend abstract base class, which means they must implement its abstract process_frame method. Backends that hold resources should also override close.
class BaseBackend
An object that provides low-level prediction functionality for batches of frames.
close() → None
De-initializes the backend. This is called when the capsule is being unloaded. This method should be overridden by any backend that needs to release resources or close other threads.
The backend will stop receiving frames before this method is called, and will not receive frames again.
abstract process_frame(frame: numpy.ndarray, detection_node: Union[None, vcap.detection_node.DetectionNode, List[vcap.detection_node.DetectionNode]], options: Dict[str, Union[int, float, bool, str]], state: vcap.stream_state.BaseStreamState) → Union[None, vcap.detection_node.DetectionNode, List[vcap.detection_node.DetectionNode]]
A method that does the pre-processing, inference, and post-processing work for a frame.
If the capsule uses an algorithm that benefits from batching, this method may call self.send_to_batch, which will asynchronously send work out for batching. Doing so requires that the batch_predict method is overridden.
Parameters
frame – A NumPy array representing a frame. It has shape (height, width, num_channels), and the channels are in BGR order.
detection_node – The input DetectionNode(s), of the type specified by the capsule’s input_type.
options – A dictionary of key (string) value pairs. The key is the name of a capsule option, and the value is its configured value at the time of processing. Capsule options are specified using the options field in the Capsule class.
state – A StreamState object of the type specified by the stream_state attribute on the Capsule class. If no StreamState class was specified, a simple BaseStreamState object will be passed in. The StreamState will be the same object for all frames in the same video stream.
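Because frames arrive as (height, width, num_channels) arrays in BGR order, a common first step in process_frame is converting them to the layout the model was trained on. The sketch below assumes a hypothetical model that expects a batch of one RGB image scaled to [0, 1]; the preprocess helper is illustrative and uses only NumPy.

```python
import numpy as np


def preprocess(frame: np.ndarray) -> np.ndarray:
    """Hypothetical preprocessing: BGR uint8 frame -> RGB float batch of one."""
    if frame.ndim != 3 or frame.shape[2] != 3:
        raise ValueError(f"expected (height, width, 3), got {frame.shape}")

    rgb = frame[:, :, ::-1]                   # reverse channel order: BGR -> RGB
    scaled = rgb.astype(np.float32) / 255.0   # uint8 [0, 255] -> float [0, 1]
    return scaled[np.newaxis, ...]            # add batch dimension: (1, h, w, 3)


frame = np.zeros((480, 640, 3), dtype=np.uint8)
frame[:, :, 0] = 255                          # pure blue in BGR order
batch = preprocess(frame)
print(batch.shape)     # (1, 480, 640, 3)
print(batch[0, 0, 0])  # [0. 0. 1.]
```

Since OpenCV is available to capsules, cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) is an equivalent way to do the channel swap.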
Batching Methods
Batching refers to the process of collecting more than one video frame into a “batch” and sending them all out for processing at once. Certain algorithms see performance improvements when batching is used, because doing so decreases the number of round trips the video frames take between devices.
If you wish to use batching in your capsule, you may call the send_to_batch method in process_frame instead of doing the analysis in that method directly.
The send_to_batch method sends the input to a BatchExecutor, which collects inference requests for this capsule from different streams. The BatchExecutor then routinely calls your backend’s batch_predict method with a list of the collected inputs. As a result, users of send_to_batch must override the batch_predict method in addition to the other required methods.
The send_to_batch method is asynchronous. Instead of immediately returning analysis results, it returns a concurrent.futures.Future where the result will be provided.
Simple batching capsules may call send_to_batch, then immediately call result on the returned future to block until the result is available.
result = self.send_to_batch(frame).result()
An argument of any type may be provided to send_to_batch, as the argument will be passed in a list to batch_predict without modification. In many cases only the video frame needs to be provided, but additional metadata may be included as necessary to fit your algorithm’s needs.
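To make the send_to_batch / batch_predict contract concrete, the sketch below models the pattern with the standard library only. MiniBatchExecutor is a hypothetical stand-in, not vcap’s BatchExecutor: it queues inputs, hands them to a batch_predict-style function as a list, and resolves one concurrent.futures.Future per input, which is what lets process_frame block on result().

```python
import queue
import threading
from concurrent.futures import Future
from typing import Any, Callable, List


class MiniBatchExecutor:
    """Hypothetical stand-in for a batch executor (not vcap's implementation)."""

    def __init__(self, batch_fn: Callable[[List[Any]], List[Any]],
                 max_batch_size: int = 8):
        self._batch_fn = batch_fn
        self._max_batch_size = max_batch_size
        self._requests: "queue.Queue" = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, input_data: Any) -> Future:
        # Executor implementations may create and resolve Futures directly
        future: Future = Future()
        self._requests.put((input_data, future))
        return future

    def _worker(self) -> None:
        while True:
            # Block for the first request, then drain whatever else is queued
            batch = [self._requests.get()]
            while len(batch) < self._max_batch_size:
                try:
                    batch.append(self._requests.get_nowait())
                except queue.Empty:
                    break

            inputs = [input_data for input_data, _ in batch]
            try:
                results = self._batch_fn(inputs)
                if len(results) != len(inputs):
                    raise ValueError("batch_fn must return one result per input")
            except Exception as exc:  # report the failure to every waiting caller
                for _, future in batch:
                    future.set_exception(exc)
                continue

            for (_, future), result in zip(batch, results):
                future.set_result(result)


def batch_predict(input_data_list: List[Any]) -> List[Any]:
    # Toy "model": one output per input, in the same order
    return [value * 2 for value in input_data_list]


executor = MiniBatchExecutor(batch_predict)
futures = [executor.submit(value) for value in range(5)]
print([future.result(timeout=5) for future in futures])  # [0, 2, 4, 6, 8]
```

In a capsule, executor.submit(frame).result() plays the role of self.send_to_batch(frame).result().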
class BaseBackend
An object that provides low-level prediction functionality for batches of frames.
batch_predict(input_data_list: List[Any]) → List[Any]
This method takes in a batch as input and provides a list of result objects of any type as output. What the result objects are will depend on the algorithm being defined, but the number of prediction objects returned must match the number of video frames provided as input.
Parameters
input_data_list – A list of objects, one per frame, each containing whatever the model requires for that frame.
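A typical batch_predict stacks the per-frame inputs into one array, runs the model once, and splits the output back into one result per input. In the sketch below, run_model is a hypothetical placeholder for a real inference call, and the inputs are assumed to be frames already resized to a common shape (np.stack requires identical shapes). In a real backend this would be a method taking self.

```python
from typing import Any, List

import numpy as np


def run_model(batch: np.ndarray) -> np.ndarray:
    """Hypothetical model: one score per image (its mean pixel value)."""
    return batch.reshape(len(batch), -1).mean(axis=1)


def batch_predict(input_data_list: List[Any]) -> List[Any]:
    # (N, height, width, channels): every frame must share one shape
    batch = np.stack(input_data_list)
    scores = run_model(batch)
    results = [float(score) for score in scores]

    # The contract: exactly one result per input, in the same order
    assert len(results) == len(input_data_list)
    return results


frames = [np.full((4, 4, 3), value, dtype=np.uint8) for value in (0, 10, 20)]
print(batch_predict(frames))  # [0.0, 10.0, 20.0]
```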
Inputs and Outputs
Introduction
Capsules are defined by the data they take as input and the information they give as output. Applications use this information to connect capsules to each other and schedule their execution. These inputs and outputs are defined by NodeDescription objects and realized by DetectionNode objects.
class DetectionNode(*, name: str, coords: List[List[Union[int, float]]], attributes: Dict[str, str] = None, children: List[DetectionNode] = None, encoding: Optional[numpy.ndarray] = None, track_id: Optional[uuid.UUID] = None, extra_data: Dict[str, object] = None)
Capsules use DetectionNode objects to communicate results to other capsules and the application itself. A DetectionNode contains information on a detection in the current frame. Capsules that detect objects in a frame create new DetectionNodes. Capsules that discover attributes about detections add data to existing DetectionNodes.
Parameters
name – The detection class name. This describes what the detection is. A detection of a person would have name="person".
coords – A list of coordinates defining the detection as a polygon in-frame, in the format [[x, y], [x, y], ...].
attributes – A key-value store where the key is the type of attribute being described and the value is the attribute’s value. For instance, a capsule that detects gender might add a “gender” key to this dict, with a value of either “masculine” or “feminine”.
children – Child DetectionNodes that are a “part” of the parent. For instance, a head DetectionNode might be a child of a person DetectionNode.
encoding – An array of float values that represent an encoding of the detection. This can be used to recognize specific instances of a class. For instance, given a picture of a person’s face, the encoding of that face and the encodings of future faces can be compared to find that person in the future.
track_id – If this object is tracked, this is the unique identifier for this detection node that ties it to other detection nodes in future and past frames within the same stream.
extra_data – A dict of miscellaneous data. This data is provided directly to clients without modification, so it’s a good way to pass extra information from a capsule to other applications.
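Since coords is a polygon in [[x, y], [x, y], ...] form, an axis-aligned box is usually stored as its four corners. The helpers below convert between a box and that polygon format in plain Python; box_to_coords and coords_to_box are hypothetical, not vcap functions, and vcap may offer its own utilities for this.

```python
from typing import List, Tuple, Union

Number = Union[int, float]


def box_to_coords(x1: Number, y1: Number,
                  x2: Number, y2: Number) -> List[List[Number]]:
    """Hypothetical helper: box corners, clockwise from the top-left."""
    return [[x1, y1], [x2, y1], [x2, y2], [x1, y2]]


def coords_to_box(coords: List[List[Number]]) -> Tuple[Number, Number, Number, Number]:
    """Hypothetical helper: tightest axis-aligned box around any polygon."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


coords = box_to_coords(10, 20, 110, 220)
print(coords)                 # [[10, 20], [110, 20], [110, 220], [10, 220]]
print(coords_to_box(coords))  # (10, 20, 110, 220)
```

The resulting list can be passed straight to the constructor, e.g. DetectionNode(name="person", coords=box_to_coords(10, 20, 110, 220)).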
class NodeDescription(*, size: vcap.node_description.NodeDescription.Size, detections: List[str] = None, attributes: Dict[str, List[str]] = None, encoded: bool = False, tracked: bool = False, extra_data: List[str] = None)
Capsules use NodeDescriptions to describe the kinds of DetectionNodes they take in as input and produce as output.
A capsule may take a DetectionNode as input and produce zero or more DetectionNodes as output. Capsules use NodeDescriptions to define what information input DetectionNodes must have and what information output DetectionNodes will have.
For example, a capsule that encodes person and face detections would use NodeDescriptions to define its inputs and outputs like so:
>>> input_type = NodeDescription(
...     detections=["person", "face"])
>>> output_type = NodeDescription(
...     detections=["person", "face"],
...     encoded=True)
A capsule that uses a car’s encoding to classify the color of a car would look like this:
>>> input_type = NodeDescription(
...     detections=["car"],
...     encoded=True)
>>> output_type = NodeDescription(
...     detections=["car"],
...     attributes={"color": ["blue", "yellow", "green"]},
...     encoded=True)
A capsule that detects dogs and takes no existing input would look like this:
>>> input_type = NodeDescription(size=NodeDescription.Size.NONE)
>>> output_type = NodeDescription(
...     size=NodeDescription.Size.ALL,
...     detections=["dog"])
Parameters
size – The number of DetectionNodes that this capsule either takes as input or provides as output
detections – A list of acceptable detection class names. A node that meets this description must have a class name that is present in this list.
attributes – A dict whose key is the classification type and whose value is a list of possible attributes. A node that meets this description must have a classification for each classification type.
encoded – If true, the DetectionNode must be encoded to meet this description
tracked – If true, the DetectionNode must be tracked to meet this description
extra_data – A list of keys in a DetectionNode’s extra_data. A DetectionNode that meets this description must have extra data for each name listed here.
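The parameter descriptions above amount to a set of checks on each node. The sketch below spells those checks out over plain dicts so the rules are easy to see; describes() and the dict layout are hypothetical and are not how vcap implements matching. size is left out because it concerns how many nodes are passed, not the contents of any single node, and a detections value of None is assumed to mean “any class”.

```python
from typing import Dict, List, Optional


def describes(node: dict, *,
              detections: Optional[List[str]] = None,
              attributes: Optional[Dict[str, List[str]]] = None,
              encoded: bool = False,
              tracked: bool = False,
              extra_data: Optional[List[str]] = None) -> bool:
    """Hypothetical check of one node (a plain dict) against description fields."""
    # detections: the node's class name must appear in the list
    if detections is not None and node["name"] not in detections:
        return False
    # attributes: the node needs a classification for every listed type
    node_attributes = node.get("attributes", {})
    if any(category not in node_attributes for category in (attributes or {})):
        return False
    # encoded / tracked: the node must carry an encoding / a track ID
    if encoded and node.get("encoding") is None:
        return False
    if tracked and node.get("track_id") is None:
        return False
    # extra_data: every listed key must be present in the node's extra_data
    node_extra = node.get("extra_data", {})
    return all(key in node_extra for key in (extra_data or []))


car = {"name": "car", "attributes": {"color": "blue"}, "encoding": [0.1, 0.7]}
print(describes(car, detections=["car"], encoded=True))                   # True
print(describes(car, attributes={"color": ["blue", "yellow", "green"]}))  # True
print(describes(car, detections=["car", "truck"], tracked=True))          # False
print(describes(car, detections=["person"]))                              # False
```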
Examples
detections
A capsule that can encode cars or trucks would use a NodeDescription like this as its input_type:
NodeDescription(detections=["car", "truck"])
A capsule that can detect people and dogs would use a NodeDescription like this as its output_type:
NodeDescription(detections=["person", "dog"])
attributes
A capsule that operates on detections that have been classified for gender and color would use a NodeDescription like this as its input_type:
NodeDescription(
attributes={
"gender": ["male", "female"],
"color": ["red", "blue", "green"]
})
A capsule that can classify people’s gender as either male or female would have the following NodeDescription as its output_type:
NodeDescription(
detections=["person"],
attributes={
"gender": ["male", "female"]
})
encoded
A capsule that operates on detections of cars that have been encoded would use a NodeDescription like this as its input_type:
NodeDescription(
detections=["car"],
encoded=True)
A capsule that encodes people would use a NodeDescription like this as its output_type:
NodeDescription(
detections=["person"],
encoded=True)
tracked
A capsule that operates on person detections that have been tracked would use a NodeDescription like this as its input_type:
NodeDescription(
detections=["person"],
tracked=True)
A capsule that tracks people would use a NodeDescription like this as its output_type:
NodeDescription(
detections=["person"],
tracked=True)
extra_data
A capsule that operates on person detections with a “process_extra_fast” extra_data field would use a NodeDescription like this as its input_type:
NodeDescription(
detections=["person"],
extra_data=["process_extra_fast"])
A capsule that adds an “is_special” extra_data field to its person detections would use a NodeDescription like this as its output_type:
NodeDescription(
detections=["person"],
extra_data=["is_special"])
Options
Introduction
Capsules can provide runtime configuration options that change the way the capsule operates. These options will appear on the client and can also be changed in the UI. Options have a type and constraints that define what values are valid.
class FloatOption(*, default: float, min_val: Optional[float], max_val: Optional[float], description: Optional[str] = None)
A capsule option that holds a floating point value with defined boundaries.
Parameters
default – The default value of this option
min_val – The minimum allowed value for this option, inclusive, or None for no lower limit
max_val – The maximum allowed value for this option, inclusive, or None for no upper limit
description – The description for this option
class IntOption(*, default: int, min_val: Optional[int], max_val: Optional[int], description: Optional[str] = None)
A capsule option that holds an integer value.
Parameters
default – The default value of this option
min_val – The minimum allowed value for this option, inclusive, or None for no lower limit
max_val – The maximum allowed value for this option, inclusive, or None for no upper limit
description – The description for this option
class EnumOption(*, default: str, choices: List[str], description: Optional[str] = None)
A capsule option that holds a choice from a discrete set of string values.
Parameters
default – The default value of this option
choices – A list of all valid values for this option
description – The description for this option
class BoolOption(*, default: bool, description: Optional[str] = None)
A capsule option that holds a boolean value.
Parameters
default – The default value of this option
description – The description for this option
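The constraint rules above (inclusive bounds, None meaning no limit, a fixed set of enum choices) can be summarized as a small validator. The sketch below is a hypothetical illustration using plain functions, not vcap’s option-checking code; the real classes may accept or reject values differently (for example, whether an int is accepted for a FloatOption is an assumption here).

```python
from typing import List, Optional, Union

Number = Union[int, float]


def in_bounds(value: Number, min_val: Optional[Number],
              max_val: Optional[Number]) -> bool:
    """Inclusive bounds; None disables that side of the check."""
    if min_val is not None and value < min_val:
        return False
    if max_val is not None and value > max_val:
        return False
    return True


def valid_float(value, min_val=None, max_val=None) -> bool:
    # Assumption: ints are acceptable floats, but bools (an int subclass) are not
    return (isinstance(value, (int, float)) and not isinstance(value, bool)
            and in_bounds(value, min_val, max_val))


def valid_int(value, min_val=None, max_val=None) -> bool:
    return (isinstance(value, int) and not isinstance(value, bool)
            and in_bounds(value, min_val, max_val))


def valid_enum(value, choices: List[str]) -> bool:
    return value in choices


def valid_bool(value) -> bool:
    return isinstance(value, bool)


# Same constraints as the "threshold" FloatOption in the Capsule class example
print(valid_float(1.0, min_val=0.1, max_val=1.0))   # True (bounds are inclusive)
print(valid_float(0.05, min_val=0.1, max_val=1.0))  # False
print(valid_int(7, min_val=None, max_val=10))       # True (no lower limit)
print(valid_int(2.5, min_val=0, max_val=10))        # False (not an integer)
print(valid_enum("fast", ["fast", "accurate"]))     # True
```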
Creating a Capsule
For application developers, OpenVisionCapsules provides a function to package up unpackaged capsules. The optional key argument encrypts the capsule with AES.
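from pathlib import Path

from vcap import package_capsule

# Package the unpackaged "detector_person" directory, encrypting it with an AES key
package_capsule(Path("detector_person"),
                Path("capsules", "detector_person.cap"),
                key="[AES Key]")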
If you are developing capsules for an existing application, that application is responsible for providing a way to package them. Please see the documentation for the application you are using for more information.
Creating an Object Detector Capsule with Supervisely
If you’ve trained an object detector with the TensorFlow Object Detection API using Supervisely, follow these steps to deploy your model as a capsule:
Set up the TF Object Detection API
First, set up the TensorFlow Object Detection API on your machine by cloning the tensorflow/models repository and following the Object Detection API installation instructions. Make sure the tests pass before continuing; otherwise, you might have forgotten to set up certain environment variables!
Freeze your trained Supervisely model
Next, download your trained Supervisely model and extract it. Inside, you should see the following directory structure:
<DOWNLOADED MODEL DIR>
├── config.json
├── model.config
└── model_weights
├── checkpoint
├── model.ckpt.data-00000-of-00001
├── model.ckpt.index
└── model.ckpt.meta
Now, freeze the model to produce frozen_inference_graph.pb. To do that, run the models/research/object_detection/export_inference_graph.py script inside your downloaded model directory:
python PATH/TO/export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path model.config \
--trained_checkpoint_prefix model_weights/model.ckpt \
--output_directory .
There should now be a frozen_inference_graph.pb in the current directory. This is the model file that has been optimized for inference, and is much more portable for production use.
Stream State
It is sometimes desirable to carry state throughout the lifetime of an entire video stream, rather than on a frame-by-frame basis. This is where StreamState comes in.
If the stream_state field of the capsule’s Capsule class is set, the process_frame method of your capsule’s backend will be passed an instance of the provided class. Any state that should persist for the duration of the video stream may be saved there.
This is commonly used by capsules that track objects between video frames. Information on previous detections can be stored in the StreamState object and read when new detections are found. This can also be useful for result smoothing, for caching frames (think RNN models), and many other use cases.
A capsule’s StreamState class does not need to implement any methods and has no functional purpose outside of the capsule.
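As an example of the result-smoothing use case, the sketch below keeps an exponential moving average of one detection score per stream. In a real capsule this class would subclass BaseStreamState and be assigned to the Capsule’s stream_state field; here it is a plain class (SmoothingState and its alpha parameter are hypothetical) so the logic runs on its own. Because the application, not your code, creates the instance, any constructor arguments should have defaults.

```python
from typing import Optional


class SmoothingState:
    """Hypothetical per-stream state: exponential moving average of a score."""

    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha                     # weight given to the newest frame
        self.smoothed: Optional[float] = None  # no history for a new stream

    def update(self, score: float) -> float:
        if self.smoothed is None:
            self.smoothed = score
        else:
            self.smoothed = self.alpha * score + (1 - self.alpha) * self.smoothed
        return self.smoothed


# One state object per stream, reused for every frame of that stream
state = SmoothingState(alpha=0.5)
print([state.update(score) for score in (1.0, 0.0, 0.0, 1.0)])
# [1.0, 0.5, 0.25, 0.625]
```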
This is a guide on how to encapsulate an algorithm using OpenVisionCapsules. Capsules are discrete components that define new ways to analyze video streams.
This guide discusses the OpenVisionCapsules system in general, not how to write a capsule of a certain type or with a certain technology. Example capsules that show how to encapsulate models from various popular machine learning frameworks are available under vcap/examples.