MediaTek Genio board running object detection inference on a live camera feed
Tags: mediatek, genio, computer vision, gstreamer, opencv, tflite, npu, camera, edge ai

MediaTek Genio for computer vision: a practical guide

Aarón Angulo

Genio is well-matched to computer vision applications: it has MIPI CSI-2 interfaces with hardware ISP, a Mali GPU for OpenCL/Vulkan compute, and an MDLA NPU for inference. The challenge is wiring these together efficiently — the wrong capture path adds unnecessary CPU copies that hurt latency and throughput. This guide covers the full stack from camera to inference to output.

Key Insights

  • GStreamer is the right capture layer: it uses the hardware ISP and keeps frames in device memory, whereas OpenCV's default V4L2 backend (cv2.VideoCapture(0)) forces a CPU copy on MIPI cameras
  • NNStreamer bridges GStreamer and inference — it runs TFLite/ONNX inference as a GStreamer element with no copy between the pipeline and the model
  • INT8 quantized models on the NPU are 4–6× faster than FP32 on CPU; quantize your models before deployment
  • USB cameras are simpler to start — they appear as standard V4L2 devices, no ISP bring-up required; switch to MIPI CSI for production
  • OpenCV for post-processing, GStreamer for capture: use each for what it’s good at, and open MIPI cameras through a GStreamer pipeline (cv2.CAP_GSTREAMER) rather than cv2.VideoCapture(0)

Camera capture options

Option 1: USB camera (V4L2 / UVC)

The easiest starting point. USB cameras appear as /dev/videoN on Genio and work with standard V4L2 tools immediately.

# List connected cameras
v4l2-ctl --list-devices

# Check supported formats
v4l2-ctl -d /dev/video0 --list-formats-ext

# Capture test frame
v4l2-ctl -d /dev/video0 --stream-mmap --stream-count=1 \
  --stream-to=frame.raw
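
A quick sanity check on the captured frame.raw is its size: for an uncompressed format it should be close to width × height × bytes per pixel. The sketch below computes that figure, plus the raw bandwidth of a stream, for a few common formats. The helper names and the example resolutions are assumptions for illustration, and some drivers pad each row, so a slightly larger file is not necessarily an error.

```python
from fractions import Fraction

# Average bytes per pixel for a few common uncompressed formats.
BYTES_PER_PIXEL = {
    "YUY2": Fraction(2),     # packed YUV 4:2:2
    "NV12": Fraction(3, 2),  # semi-planar YUV 4:2:0
    "RGB": Fraction(3),
    "BGR": Fraction(3),
}


def raw_frame_bytes(fmt, width, height):
    """Expected size in bytes of one tightly packed frame (no row padding)."""
    return int(BYTES_PER_PIXEL[fmt] * width * height)


def stream_megabytes_per_second(fmt, width, height, fps):
    """Raw bandwidth of an uncompressed stream, in MB/s (1 MB = 10**6 bytes)."""
    return raw_frame_bytes(fmt, width, height) * fps / 1e6


print(raw_frame_bytes("YUY2", 640, 480))                     # 614400
print(stream_megabytes_per_second("NV12", 1920, 1080, 30))   # 93.312
```

At roughly 93 MB/s for a single 1080p30 NV12 stream, every extra CPU copy in the capture path is a measurable cost, which is why the rest of this guide tries to avoid them.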

GStreamer capture from USB camera:

gst-launch-1.0 \
  v4l2src device=/dev/video0 ! \
  video/x-raw,format=YUY2,width=640,height=480,framerate=30/1 ! \
  videoconvert ! \
  video/x-raw,format=RGB ! \
  autovideosink

Option 2: MIPI CSI camera (ISP pipeline)

MIPI CSI cameras connect to the Genio CSI connector and go through the hardware ISP. The ISP handles auto-exposure, auto-white-balance, and noise reduction automatically.

# Check that the camera sensor was probed
dmesg | grep -i "sensor\|imx\|ov"

# List V4L2 devices including subdevices
v4l2-ctl --list-devices
media-ctl -d /dev/media0 --print-topology

GStreamer MIPI CSI capture (using libcamera on Ubuntu):

gst-launch-1.0 \
  libcamerasrc camera-name="/base/soc/seninf@1a040000/port@0" ! \
  video/x-raw,format=NV12,width=1920,height=1080,framerate=30/1 ! \
  videoconvert ! autovideosink

On Yocto with V4L2 backend:

gst-launch-1.0 \
  v4l2src device=/dev/video0 ! \
  video/x-raw,format=NV12,width=1920,height=1080 ! \
  videoconvert ! autovideosink

OpenCV with GStreamer capture

Avoid cv2.VideoCapture(0) for MIPI CSI cameras — it goes through V4L2 with a CPU copy. Use a GStreamer pipeline with appsink to feed frames into OpenCV:

import cv2

# GStreamer pipeline that feeds into OpenCV.
# videoconvert is included because most cameras do not output BGR natively.
pipeline = (
    "v4l2src device=/dev/video0 ! "
    "video/x-raw,width=640,height=480,framerate=30/1 ! "
    "videoconvert ! "
    "video/x-raw,format=BGR ! "
    "appsink name=sink max-buffers=1 drop=true sync=false"
)

cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
if not cap.isOpened():
    raise RuntimeError(
        "Could not open pipeline; check that OpenCV was built with GStreamer support"
    )

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # frame is a 480x640x3 BGR numpy array
    # Process with OpenCV
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)

    cv2.imshow("Edges", edges)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
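
Since the same capture string is reused with different devices and resolutions throughout this guide, it can help to build it in one place. The following is a minimal sketch: `build_capture_pipeline` is a hypothetical helper name, and the defaults simply mirror the values used above.

```python
def build_capture_pipeline(device="/dev/video0", width=640, height=480,
                           fps=30, out_format="BGR"):
    """Return a GStreamer launch string that ends in an appsink.

    The camera negotiates its own pixel format; videoconvert then
    produces `out_format` for the application.
    """
    return (
        f"v4l2src device={device} ! "
        f"video/x-raw,width={width},height={height},framerate={fps}/1 ! "
        "videoconvert ! "
        f"video/x-raw,format={out_format} ! "
        # Keep only the newest frame so the application never reads stale data
        "appsink max-buffers=1 drop=true sync=false"
    )


print(build_capture_pipeline(device="/dev/video2", width=1280, height=720))
```

The returned string is passed to cv2.VideoCapture together with cv2.CAP_GSTREAMER, exactly as in the example above.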

TFLite inference with Neuron Stable Delegate

For object detection (SSD MobileNet example):

import tflite_runtime.interpreter as tflite
import cv2
import numpy as np

# Load model with NPU acceleration
interpreter = tflite.Interpreter(
    model_path="ssd_mobilenet_v2_int8.tflite",
    experimental_delegates=[
        tflite.load_delegate("libNeuronStableDelegate.so")
    ]
)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_shape = input_details[0]['shape']  # [1, 300, 300, 3]
height, width = int(input_shape[1]), int(input_shape[2])

# GStreamer capture. OpenCV's GStreamer backend works with BGR at the appsink,
# so frames arrive as BGR and are converted to RGB for the model below.
pipeline = (
    "v4l2src device=/dev/video0 ! "
    "video/x-raw,width=640,height=480 ! "
    "videoconvert ! videoscale ! "
    f"video/x-raw,format=BGR,width={width},height={height} ! "
    "appsink max-buffers=1 drop=true sync=false"
)
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
if not cap.isOpened():
    raise RuntimeError("Could not open the GStreamer capture pipeline")

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Preprocess: the model expects RGB
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    input_data = np.expand_dims(rgb, axis=0)
    if input_details[0]['dtype'] == np.uint8:
        input_data = input_data.astype(np.uint8)
    else:
        # Float model: scale pixels to [-1, 1]
        input_data = ((input_data / 255.0 - 0.5) / 0.5).astype(np.float32)

    # Inference
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()

    # SSD MobileNet outputs. The order can differ between model exports,
    # so check output_details if the boxes look wrong.
    boxes   = interpreter.get_tensor(output_details[0]['index'])[0]
    classes = interpreter.get_tensor(output_details[1]['index'])[0]
    scores  = interpreter.get_tensor(output_details[2]['index'])[0]

    # Draw boxes with score >= 0.5
    h, w = frame.shape[:2]
    for i, score in enumerate(scores):
        if score < 0.5:
            continue  # do not assume the scores are sorted
        ymin, xmin, ymax, xmax = boxes[i]
        cv2.rectangle(frame,
            (int(xmin * w), int(ymin * h)),
            (int(xmax * w), int(ymax * h)),
            (0, 255, 0), 2)

    cv2.imshow("Detection", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
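
The box-drawing step is easier to test and reuse when it is pulled out of the capture loop. Here is a small sketch of that post-processing in plain Python, assuming the normalized [ymin, xmin, ymax, xmax] layout used above. The function name `to_pixel_boxes` is illustrative; it also clamps coordinates to the frame, since detectors can return values slightly outside [0, 1].

```python
def to_pixel_boxes(boxes, scores, frame_w, frame_h, threshold=0.5):
    """Convert normalized [ymin, xmin, ymax, xmax] boxes to pixel rectangles.

    Returns a list of (x1, y1, x2, y2, score) tuples for detections whose
    score is at or above `threshold`, clamped to the frame bounds.
    """
    def clamp(value, upper):
        return min(max(int(value), 0), upper)

    results = []
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score < threshold:
            continue  # skip weak detections without assuming sorted scores
        results.append((
            clamp(xmin * frame_w, frame_w - 1),
            clamp(ymin * frame_h, frame_h - 1),
            clamp(xmax * frame_w, frame_w - 1),
            clamp(ymax * frame_h, frame_h - 1),
            float(score),
        ))
    return results


detections = to_pixel_boxes(
    boxes=[[0.125, 0.25, 0.5, 0.75],    # fully inside the frame
           [-0.25, 0.875, 0.5, 1.25],   # spills over the top and right edges
           [0.0, 0.0, 1.0, 1.0]],       # below the score threshold
    scores=[0.9, 0.7, 0.3],
    frame_w=640, frame_h=480,
)
print(detections)
# [(160, 60, 480, 240, 0.9), (560, 0, 639, 240, 0.7)]
```

Each returned tuple can be passed straight to cv2.rectangle as two corner points.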

NNStreamer: inference inside GStreamer

NNStreamer runs TFLite inference as a native GStreamer element, eliminating the copy between the pipeline and the model:

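# Object detection with NNStreamer + NPU.
# A tee splits one camera stream: one branch is the video background,
# the other runs the model and draws the boxes on top.
# mobilenet-ssd-postprocess is the decoder mode for models with built-in
# detection post-processing; option3 maps the output tensor indices.
gst-launch-1.0 \
  compositor name=mix sink_0::zorder=1 sink_1::zorder=2 ! \
    videoconvert ! waylandsink \
  v4l2src device=/dev/video0 ! \
    videoconvert ! \
    video/x-raw,format=RGB,width=640,height=480,framerate=30/1 ! \
    tee name=t \
  t. ! queue ! mix.sink_0 \
  t. ! queue leaky=downstream max-size-buffers=2 ! \
    videoscale ! video/x-raw,width=300,height=300 ! \
    tensor_converter ! \
    tensor_filter \
      framework=tflite \
      model=ssd_mobilenet_v2_int8.tflite \
      accelerator=true:npu ! \
    tensor_decoder \
      mode=bounding_boxes \
      option1=mobilenet-ssd-postprocess \
      option2=labels.txt \
      option3=0:1:2:3 \
      option4=640:480 \
      option5=300:300 ! \
    mix.sink_1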

NNStreamer is included in packagegroup-rity-ai-ml in the RITY Yocto image.

Multi-camera setup

For applications requiring multiple cameras simultaneously:

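import threading
import cv2

class CameraThread(threading.Thread):
    def __init__(self, device, name):
        # daemon=True so a stuck camera cannot keep the process alive
        super().__init__(name=name, daemon=True)
        self.cap = cv2.VideoCapture(
            f"v4l2src device={device} ! "
            "video/x-raw,width=640,height=480 ! "
            "videoconvert ! video/x-raw,format=BGR ! "
            "appsink max-buffers=1 drop=true sync=false",
            cv2.CAP_GSTREAMER
        )
        self.frame = None
        self.running = True
        self.lock = threading.Lock()

    def run(self):
        while self.running:
            ret, frame = self.cap.read()
            if ret:
                with self.lock:
                    self.frame = frame
        self.cap.release()

    def latest(self):
        # Most recent frame, or None if nothing has been captured yet
        with self.lock:
            return self.frame

    def stop(self):
        self.running = False
        self.join()

cam0 = CameraThread("/dev/video0", "cam0")
cam1 = CameraThread("/dev/video2", "cam1")
cam0.start()
cam1.start()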

Performance tips for CV pipelines on Genio

Use INT8 quantized models. On the Genio NPU, INT8 inference is 2–3× faster than FP16 and uses less memory bandwidth. Quantize with TFLite’s post-training quantization before deployment.
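
To make the INT8 mapping concrete: TFLite uses an affine scheme in which real_value = (quantized_value - zero_point) × scale, and the (scale, zero_point) pair for each tensor is reported under the 'quantization' key of get_input_details() and get_output_details(). The sketch below shows the arithmetic in plain Python. The scale of 1/128 with zero point 0 is an assumed example for an input normalized to [-1, 1), not a value taken from any particular model.

```python
def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a real value to its int8 representation, saturating at the range limits."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate real value from a quantized one."""
    return (q - zero_point) * scale


scale, zero_point = 1 / 128, 0   # assumed example parameters

print(quantize(0.5, scale, zero_point))     # 64
print(quantize(1.0, scale, zero_point))     # 127 (128 saturates to the int8 maximum)
print(dequantize(64, scale, zero_point))    # 0.5
```

The saturation at 127 is where quantization error comes from: values outside the calibrated range are clipped, which is why post-training quantization needs a representative calibration dataset.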

Skip frames if needed. If your pipeline can’t sustain real-time at 30fps, drop frames at the capture stage rather than queuing them. Use max-buffers=1 drop=true on the GStreamer appsink.
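
Before deciding to drop frames, it is worth measuring what the pipeline actually sustains. A rolling frame-rate meter is enough for that. This is a minimal sketch; the class name `FpsMeter` and the 30-frame window are arbitrary choices.

```python
from collections import deque


class FpsMeter:
    """Rolling frames-per-second estimate over the last `window` timestamps."""

    def __init__(self, window=30):
        self.stamps = deque(maxlen=window)

    def tick(self, now):
        """Record the arrival time (in seconds) of one frame."""
        self.stamps.append(now)

    def fps(self):
        """Average rate across the window, or 0.0 until two frames have arrived."""
        if len(self.stamps) < 2:
            return 0.0
        span = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / span if span > 0 else 0.0


meter = FpsMeter(window=5)
for t in (0.0, 0.25, 0.5, 0.75, 1.0):   # simulated frame arrival times
    meter.tick(t)
print(meter.fps())  # 4.0
```

In a real loop, call meter.tick(time.monotonic()) after each successful cap.read() and log meter.fps() periodically. If the reading sits well below the camera's frame rate, the processing stage is the bottleneck and dropping at the appsink is doing its job.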

Separate capture and inference threads. Camera capture and NPU inference are independent hardware blocks. Running them in separate threads allows full hardware utilization — the NPU runs the previous frame while the ISP captures the next.
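
One way to structure the two threads is a single-slot handoff: the capture thread always overwrites the slot with the newest frame, and the inference thread takes whatever is there when it becomes free. The sketch below demonstrates the pattern with stand-in callables so it runs anywhere. `capture` and `infer` are placeholders for cap.read() and interpreter.invoke(), and the function name is hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the stream


def run_capture_and_inference(capture, infer, num_frames):
    """Run capture and inference in separate threads, handing over only the newest frame."""
    handoff = queue.Queue(maxsize=1)  # holds at most one pending frame
    results = []

    def capture_loop():
        for i in range(num_frames):
            frame = capture(i)
            while True:
                try:
                    handoff.put_nowait(frame)
                    break
                except queue.Full:
                    # Inference is still busy: discard the stale frame and retry
                    try:
                        handoff.get_nowait()
                    except queue.Empty:
                        pass
        handoff.put(_STOP)  # blocks until the final frame has been taken

    def inference_loop():
        while True:
            frame = handoff.get()
            if frame is _STOP:
                break
            results.append(infer(frame))

    threads = [threading.Thread(target=capture_loop),
               threading.Thread(target=inference_loop)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_capture_and_inference(
    capture=lambda i: i,        # stand-in for reading frame number i
    infer=lambda f: f * 2,      # stand-in for running the model
    num_frames=100,
)
print(len(results), results[-1])
```

How many frames get processed depends on thread timing, but the results always arrive in capture order and the final frame is always processed. The same drop-oldest behavior is what max-buffers=1 drop=true gives you at the appsink.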

Avoid cv2.VideoCapture(0) for MIPI cameras. The default V4L2 backend forces a CPU copy at every frame. Open a GStreamer pipeline ending in appsink with cv2.CAP_GSTREAMER instead and receive frames as numpy arrays directly.

For the NPU inference stack details including model conversion and quantization, see on-device AI without the cloud on Genio. For MIPI CSI camera driver bring-up, see MIPI CSI camera driver setup on Genio.

Frequently Asked Questions

What camera interfaces does MediaTek Genio support for computer vision?

Genio supports MIPI CSI-2 cameras (2–3 four-lane interfaces depending on platform) and USB UVC cameras. MIPI CSI cameras go through the hardware ISP via libcamera or V4L2. USB cameras appear as standard V4L2 video devices. For multi-camera setups, MIPI CSI is preferred due to lower latency and synchronization support.

Should I use GStreamer or OpenCV for camera capture on Genio?

Use GStreamer for camera capture and preprocessing. GStreamer uses hardware-accelerated elements (mtk-video decode, ISP pipeline) and keeps frames in device memory. OpenCV VideoCapture works for USB cameras but copies frames through CPU memory on MIPI CSI cameras. Feed preprocessed frames from GStreamer's appsink into OpenCV or TFLite for inference.

What is the fastest way to run object detection on Genio?

The fastest end-to-end path is: GStreamer MIPI CSI capture → hardware ISP → tensor_filter with TFLite INT8 model + NeuronStableDelegate → NNStreamer tensor_decoder → overlay on Wayland display. This path keeps data in device memory between pipeline stages and uses the NPU for inference.

Does OpenCV support hardware acceleration on Genio?

OpenCV on Genio uses the ARM CPU (NEON SIMD) for most operations. OpenCV's OpenCL backend can offload some ops to the Mali GPU if built with OpenCL support. For AI inference, TFLite and ONNX Runtime with NeuronEP are faster than OpenCV's DNN module on Genio because they use the dedicated MDLA NPU.

Written by

Aarón Angulo

Co-Founder & CEO · ProventusNova

Obsessed with client outcomes. Aarón ensures every engagement delivers real results, on time, on scope, no exceptions.
