A backend provides the low-level analysis of a video frame. For machine learning capsules, this is where the frame is fed into the model and the results are returned. Every capsule must define a backend class that subclasses BaseBackend.

The application will create an instance of the backend class for each device string returned by the capsule’s device mapper.

Required Methods

Because every backend subclasses the BaseBackend abstract base class, there are methods that each backend must implement.

class BaseBackend

An object that provides low-level prediction functionality for batches of frames.

close() → None

De-initializes the backend. This is called when the capsule is being unloaded. Any backend that needs to release resources or stop threads should override this method.

The backend will stop receiving frames before this method is called, and will not receive frames again.
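
As a rough sketch of the kind of cleanup close might perform, the example below shuts down a worker pool that the backend owns. PooledBackend, its run helper, and the closed flag are hypothetical names used only for illustration, and the class does not subclass the real BaseBackend so that the snippet runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor


class PooledBackend:
    """Hypothetical backend-like class that owns a thread pool."""

    def __init__(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=2)
        self.closed = False

    def run(self, values):
        # Hand a small job to the pool and wait for its result.
        return self._pool.submit(sum, values).result()

    def close(self) -> None:
        # Wait for in-flight work, then release the worker threads.
        self._pool.shutdown(wait=True)
        self.closed = True


backend = PooledBackend()
value = backend.run([1, 2, 3])
backend.close()
```

After close returns, the pool rejects new work, which matches the guarantee above that the backend receives no further frames.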

abstract process_frame(frame: numpy.ndarray, detection_node: Union[None, vcap.detection_node.DetectionNode, List[vcap.detection_node.DetectionNode]], options: Dict[str, Union[int, float, bool, str]], state: vcap.stream_state.BaseStreamState) → Union[None, vcap.detection_node.DetectionNode, List[vcap.detection_node.DetectionNode]]

A method that does the preprocessing, inference, and postprocessing work for a frame.

If the capsule uses an algorithm that benefits from batching, this method may call self.send_to_batch, which will asynchronously send work out for batching. Doing so requires that the batch_predict method is overridden.

  • frame – A numpy array representing a frame. It has shape (height, width, num_channels), with the channels in BGR order.

  • detection_node – None, a DetectionNode, or a list of DetectionNodes, as specified by the input_type

  • options – A dictionary of key (string) value pairs. The key is the name of a capsule option, and the value is its configured value at the time of processing. Capsule options are specified using the options field in the Capsule class.

  • state – This will be a StreamState object of the type specified by the stream_state attribute on the Capsule class. If no StreamState object was specified, a simple BaseStreamState object will be passed in. The StreamState will be the same object for all frames in the same video stream.
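
To illustrate the overall shape of a backend, the sketch below implements process_frame on a subclass. So that it runs without the library installed, it defines a stand-in BaseBackend inline and uses a nested list in place of a numpy array. The stand-in class, the BrightnessBackend name, the threshold option, and the dict return value are all assumptions made for illustration; a real capsule would subclass the library's BaseBackend and return the types shown in the signature above.

```python
from abc import ABC, abstractmethod


class BaseBackend(ABC):
    """Stand-in for the real BaseBackend so this sketch is runnable."""

    @abstractmethod
    def process_frame(self, frame, detection_node, options, state):
        ...

    def close(self) -> None:
        """By default there is nothing to release."""


class BrightnessBackend(BaseBackend):
    """Hypothetical backend that flags frames brighter than a threshold."""

    def process_frame(self, frame, detection_node, options, state):
        # Preprocessing: flatten the (height, width, channels) frame.
        values = [c for row in frame for pixel in row for c in pixel]
        # "Inference": a trivial mean-brightness computation.
        mean = sum(values) / len(values)
        # Postprocessing: compare against a configured option value.
        return {"bright": mean > options["threshold"], "mean": mean}


# A 1x2 "frame" with three channel values per pixel.
frame = [[[10, 20, 30], [50, 60, 70]]]
backend = BrightnessBackend()
result = backend.process_frame(frame, None, {"threshold": 30}, state=None)
backend.close()
```

Because process_frame is declared abstract on the stand-in base class, a subclass that omits it cannot be instantiated.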

Batching Methods

Batching refers to the process of collecting more than one video frame into a “batch” and sending them all out for processing at once. Certain algorithms see performance improvements when batching is used, because doing so reduces the number of round trips that video frames make between devices.

If you wish to use batching in your capsule, you may call the send_to_batch method in process_frame instead of doing analysis in that method directly. The send_to_batch method sends the input to a BatchExecutor which collects inference requests for this capsule from different streams. Then, the BatchExecutor routinely calls your backend’s batch_predict method with a list of the collected inputs. As a result, users of send_to_batch must override the batch_predict method in addition to the other required methods.

The send_to_batch method is asynchronous. Instead of immediately returning analysis results, it returns a concurrent.futures.Future through which the result will be delivered. Simple batching capsules may call send_to_batch, then immediately call result() on the returned Future to block until the result is available.

result = self.send_to_batch(frame).result()

An argument of any type may be provided to send_to_batch, as the argument will be passed in a list to batch_predict without modification. In many cases only the video frame needs to be provided, but additional metadata may be included as necessary to fit your algorithm’s needs.
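
The relationship between send_to_batch, the returned Future, and batch_predict can be sketched roughly as follows. This is a simplified stand-in, not the library's implementation: SketchBackend and its flush helper are hypothetical, and flush is called by hand here to play the role that the BatchExecutor plays in the background. Each input is a (frame, metadata) tuple to show that extra data can travel alongside the frame.

```python
from concurrent.futures import Future
from typing import Any, List, Tuple


class SketchBackend:
    """Stand-in showing how send_to_batch relates to batch_predict."""

    def __init__(self) -> None:
        self._pending: List[Tuple[Any, Future]] = []

    def send_to_batch(self, input_data: Any) -> Future:
        # Queue the input and hand back a Future for its eventual result.
        future: Future = Future()
        self._pending.append((input_data, future))
        return future

    def flush(self) -> None:
        # Hypothetical helper standing in for the BatchExecutor: call
        # batch_predict once with everything collected so far.
        if not self._pending:
            return
        inputs = [item for item, _ in self._pending]
        futures = [fut for _, fut in self._pending]
        self._pending = []
        for fut, output in zip(futures, self.batch_predict(inputs)):
            fut.set_result(output)

    def batch_predict(self, input_data_list: List[Any]) -> List[Any]:
        # Inputs arrive unmodified, here as (frame, metadata) tuples.
        return [f"{meta}:{len(frame)}" for frame, meta in input_data_list]


backend = SketchBackend()
# Two "streams" submit work before any batch has been processed.
future_a = backend.send_to_batch(([0, 0, 0], "stream-a"))
future_b = backend.send_to_batch(([0, 0], "stream-b"))
done_before_flush = future_a.done()
backend.flush()
results = [future_a.result(), future_b.result()]
```

In this sketch the futures only complete once the batch has been processed, which is why calling result() blocks in a real capsule until the batch containing that input has run.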

class BaseBackend

An object that provides low-level prediction functionality for batches of frames.

batch_predict(input_data_list: List[Any]) → List[Any]

This method takes in a batch as input and provides a list of result objects of any type as output. What the result objects are will depend on the algorithm being defined, but the number of result objects returned must match the number of items provided in the input list.


  • input_data_list – A list of objects, one per frame, each containing whatever the model requires for that frame.
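
One way to keep the output length equal to the input length is to emit a placeholder result, such as an empty list, for inputs that produce nothing, rather than skipping them. The sketch below assumes a hypothetical ThresholdBackend whose "model" keeps values above 100; the class name, the threshold, and the use of plain lists as frames are illustrative only.

```python
from typing import Any, List


class ThresholdBackend:
    """Hypothetical class showing only the batch_predict contract."""

    def batch_predict(self, input_data_list: List[Any]) -> List[Any]:
        results = []
        for frame in input_data_list:
            # Keep values above the threshold; this may be an empty list.
            detections = [value for value in frame if value > 100]
            # Always append, so every input has a matching result.
            results.append(detections)
        return results


batch = [[10, 200], [5, 7], [150, 160, 90]]
outputs = ThresholdBackend().batch_predict(batch)
```

The second input yields no detections, yet it still contributes an entry, so the output list has one result per input.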