What is VisionCapsule

A VisionCapsule is a lightweight, encapsulated AI model or algorithm used by BrainFrame. It is self-contained, so end users can download it from an AI marketplace and run the model within BrainFrame without any engineering work. If you launch the AI marketplace application included in BrainFrame, you will see a list of VisionCapsules in categories such as detectors, classifiers, encoders, and trackers.

An important property of VisionCapsule is interoperability: two VisionCapsules developed independently can still connect and exchange data through BrainFrame with negligible compute overhead. For example, a user can download a tracker capsule and a basketball detector capsule written by different developers; when these two capsules are loaded into BrainFrame at runtime, a basketball can be detected and its trajectory tracked in real time.

The Foundation: OpenVisionCapsules

The foundation of these properties is OpenVisionCapsules, the interoperable AI model and algorithm encapsulation format invented by our team and released through OpenCV. This powerful format enables dynamic loading and unloading at runtime, as well as automated algorithm fusion in AI inference pipelines. Together with support for leading AI accelerator chipsets and various machine learning frameworks, BrainFrame achieves “One-Click” AI deployment, allowing users to download and run sophisticated inference pipelines in mass-scale production environments.
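One way to picture automated algorithm fusion: if every capsule declares what it consumes and what it produces, a runtime can work out a valid execution order without hand-written glue code. The sketch below is illustrative only. `CapsuleSpec`, `plan_pipeline`, and the data-kind labels are hypothetical names chosen for this example, not the OpenVisionCapsules or BrainFrame API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapsuleSpec:
    """Hypothetical description of a capsule's declared interface."""
    name: str
    consumes: frozenset  # data kinds the capsule needs as input
    produces: frozenset  # data kinds the capsule adds to the pipeline


def plan_pipeline(capsules, available=frozenset({"frame"})):
    """Order capsules so each one runs only after its inputs exist."""
    available = set(available)
    pending = list(capsules)
    order = []
    while pending:
        runnable = [c for c in pending if c.consumes <= available]
        if not runnable:
            missing = {c.name: sorted(c.consumes - available) for c in pending}
            raise ValueError(f"unsatisfied inputs: {missing}")
        for capsule in runnable:
            order.append(capsule.name)
            available |= capsule.produces
            pending.remove(capsule)
    return order


tracker = CapsuleSpec(
    "tracker", frozenset({"frame", "detections"}), frozenset({"tracks"})
)
detector = CapsuleSpec(
    "basketball_detector", frozenset({"frame"}), frozenset({"detections"})
)

# The tracker is listed first, but the planner schedules the detector first
# because the tracker cannot run until detections are available.
pipeline_order = plan_pipeline([tracker, detector])
print(pipeline_order)
```

Under these assumptions, loading or unloading a capsule at runtime only requires re-running the planner over the capsules currently loaded.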

Why OpenVisionCapsules?

AI algorithms often require use-case-specific data to train or fine-tune a model. In real-world scenarios, the number of model variants quickly grows into the hundreds or thousands, even if the algorithm itself remains the same. While it is feasible to invest engineering effort into “mega-applications,” it is often impossible to do so for “long-tail” applications.

OpenVisionCapsules allows independent developers to collaborate and solve real-world problems in production together, whether for detection, tracking, classification, or recognition. For example, a developer can fine-tune a detector for a specific use case while reusing a universal IOU tracking algorithm developed by someone else.
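For readers unfamiliar with IOU (intersection over union) tracking, the idea is to link a detection in the current frame to the existing track whose last box overlaps it most. The following is a minimal sketch of that idea, not the tracker capsule itself; the function names, the `(x1, y1, x2, y2)` box layout, and the 0.3 threshold are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def match_tracks(tracks, detections, threshold=0.3):
    """Greedily pair tracks and detections in descending IoU order.

    `tracks` maps a track id to its last known box. Returns a dict mapping
    detection index to track id; unmatched detections are omitted.
    """
    pairs = sorted(
        (
            (iou(box, det), track_id, i)
            for track_id, box in tracks.items()
            for i, det in enumerate(detections)
        ),
        reverse=True,
    )
    assigned, used_tracks = {}, set()
    for score, track_id, i in pairs:
        if score < threshold:
            break  # remaining pairs overlap even less
        if i in assigned or track_id in used_tracks:
            continue
        assigned[i] = track_id
        used_tracks.add(track_id)
    return assigned


tracks = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
detections = [(51, 50, 61, 60), (1, 0, 11, 10), (200, 200, 210, 210)]
matches = match_tracks(tracks, detections)
print(matches)
```

Because the matching step only needs bounding boxes, it works with any detector that emits them, which is why such a tracker can be reused across use cases.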

OpenVisionCapsules is a portable AI model and algorithm encapsulation format. It is a lightweight software layer on top of the underlying neural network model framework, which makes it compatible with all AI model frameworks. By defining standardized inputs, outputs, and configurations, OpenVisionCapsules provides a self-contained and interoperable smart-vision standard designed for both developers and non-developers.

Key Technical Properties

  • Self-contained portable format: Defines standardized preprocessing, postprocessing, and algorithm control/tuning configurations for model inference.
  • Universal Compatibility: Compatible with all AI model frameworks (e.g., TensorFlow, PyTorch, Keras, Caffe, ONNX) and all hardware acceleration frameworks (e.g., OpenVINO, TensorRT).
  • Tiny footprint: Negligible computing and memory overhead for optimal inference performance.
  • Developer-Friendly: Easy to learn with templates requiring ~50 lines of code.
  • High Performance: Near-instantaneous loading/unloading and AI algorithm fusion at runtime to support sophisticated AI use cases.

Industry Impact

OpenVisionCapsules has been contributed to OpenCV.org as an industry standard for the following purposes:

  • For Users: An open architecture designed to provide transparency and trust, enabling an end-user experience of “one-click” downloading and running from an AI Marketplace.
  • For Hardware Vendors & System Integrators: An open standard for interoperability across machines, from edge to cloud, to enable distributed AI inference. It remains agnostic to the underlying hardware acceleration architectures and machine learning frameworks.
  • For AI Developers: Enables developers to encapsulate any model and collaborate globally with automated algorithm fusion technology. It also includes encryption to protect the intellectual property (IP) of algorithms and neural network models.

[Figure: comparison of neural network model formats with the OpenVisionCapsules format]

OpenVisionCapsules architecture

The architecture defines how an AI model is encapsulated into the OpenVisionCapsules format. As a developer, you use the Capsule SDK to package your model into a .cap file (a zipped package, which can be encrypted or non-encrypted).

Capsule code is written in Python. You specify the hardware acceleration type, define the input and output data types, and implement customized preprocessing and postprocessing logic. In addition, you expose the configuration of the algorithm with a standardized API so it can be fine-tuned at runtime. With all the above information encapsulated, the OpenVisionCapsules format becomes completely self-contained, enabling an AI OS like BrainFrame to load and execute the capsule seamlessly.
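The pieces listed above can be pictured as one Python object. The sketch below deliberately does not use the Capsule SDK: `TunableFloat`, `SketchCapsule`, and all field names are hypothetical stand-ins that mirror the paragraph (device preference, declared input and output, preprocessing and postprocessing hooks, and a runtime-tunable option). The "model" is a placeholder; consult the OpenVisionCapsules documentation for the real API.

```python
from dataclasses import dataclass, field


@dataclass
class TunableFloat:
    """Hypothetical runtime-tunable setting with a default and bounds."""
    default: float
    min_val: float
    max_val: float

    def validate(self, value):
        if not self.min_val <= value <= self.max_val:
            raise ValueError(f"{value} outside [{self.min_val}, {self.max_val}]")
        return value


@dataclass
class SketchCapsule:
    """Hypothetical stand-in for a capsule definition (not the SDK class)."""
    name: str
    device: str        # preferred hardware, e.g. "CPU" or "GPU"
    input_type: str    # what the capsule consumes
    output_type: str   # what the capsule emits
    options: dict = field(default_factory=dict)

    def preprocess(self, frame):
        # Placeholder: scale 0-255 pixel values into the 0-1 range.
        return [[pixel / 255.0 for pixel in row] for row in frame]

    def infer(self, tensor):
        # Placeholder "model": use the brightest pixel as a confidence score.
        return max(max(row) for row in tensor)

    def postprocess(self, score, settings):
        # Apply the runtime threshold to decide whether to emit a detection.
        threshold = self.options["threshold"].validate(settings["threshold"])
        if score < threshold:
            return []
        return [{"class_name": "basketball", "confidence": score}]

    def process_frame(self, frame, settings=None):
        # Fall back to each option's default when no settings are supplied.
        settings = settings or {k: o.default for k, o in self.options.items()}
        return self.postprocess(self.infer(self.preprocess(frame)), settings)


capsule = SketchCapsule(
    name="detector_basketball_sketch",
    device="CPU",
    input_type="frame",
    output_type="detections",
    options={"threshold": TunableFloat(default=0.5, min_val=0.0, max_val=1.0)},
)

frame = [[0, 128], [204, 64]]  # tiny fake grayscale image
default_result = capsule.process_frame(frame)
strict_result = capsule.process_frame(frame, {"threshold": 0.9})
print(default_result, strict_result)
```

Passing a different `threshold` on each call stands in for tuning the algorithm at runtime without repackaging the capsule.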

Resources

OpenVisionCapsules is released through OpenCV.org under a BSD license. The source code is available here:

https://github.com/opencv/open_vision_capsules

OpenVisionCapsules downloads: https://aotu.ai/docs/downloads/

Open-source capsules (capsule zoo): https://github.com/aotuai/capsule_zoo.git

OpenVisionCapsules documentation: https://openvisioncapsules.readthedocs.io/en/latest/

OpenVisionCapsules development tutorials: https://aotu.ai/docs/tutorials/capsules/installing_a_capsule/