GStreamer pipeline performance on Jetson: how to find the bottleneck
Key Insights
- The main bottleneck on Jetson GStreamer pipelines is almost always memory copies between system RAM and NVMM, not compute — keep data in NVMM from camera to output
- Replace videoconvert with nvvidconv everywhere in a Jetson pipeline; the VIC hardware handles format conversion without touching the CPU
- Use nvv4l2decoder instead of avdec_h264 for decode — hardware decode on Jetson handles 4K at a fraction of the CPU cost of software decode
- appsink with a slow Python callback is the most common hidden bottleneck — the pipeline backs up waiting for your callback to return
- GST_DEBUG_DUMP_DOT_DIR lets you visualize the full pipeline graph and spot where formats are negotiated incorrectly
The unified memory architecture, and why it matters for pipelines
Jetson uses unified memory — the CPU and GPU share the same physical DRAM. This is different from a discrete GPU setup where CPU memory and GPU VRAM are separate. On Jetson, zero-copy between CPU and GPU is theoretically possible, but only when both sides use the right memory type.
NVIDIA’s GStreamer plugins use NVMM buffers: memory that the GPU hardware engines (VIC, NVDEC, NVENC, DLA) can access directly, without a copy into CPU-managed system RAM. The problem comes when you insert a CPU-based element into a pipeline that is otherwise entirely NVMM.
nvarguscamerasrc → nvvidconv → nvinfer → nvvidconv → nveglglessink
(every link carries NVMM buffers)
This is a zero-copy pipeline. Everything stays in NVMM.
nvarguscamerasrc → videoconvert → appsink
(NVMM → system RAM copy at videoconvert)
This copies every frame from NVMM to system RAM at videoconvert. On a 4K 30fps stream, that’s roughly 720MB/s of unnecessary memory bandwidth.
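You can watch this cost directly: run sudo tegrastats while the pipeline is live and compare the EMC (memory controller) utilization of the NVMM-only pipeline against the one with videoconvert in it.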
Identify the element causing the bottleneck
The fastest diagnostic is the GST pipeline graph. Set this before running your pipeline:
export GST_DEBUG_DUMP_DOT_DIR=/tmp
Then run your pipeline. After it starts (or crashes), .dot files appear in /tmp. Convert to PNG:
sudo apt install graphviz
dot -Tpng /tmp/pipeline*.dot -o /tmp/pipeline.png
Open the PNG. Look for:
- Elements showing video/x-raw without memory:NVMM in their caps — these are CPU-path elements causing copies
- Caps negotiation failures — elements trying to send NVMM buffers to a CPU-only sink
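If your pipeline lives inside a Python application rather than gst-launch, you can trigger the same dump programmatically, which is useful for capturing the graph after caps have settled. A minimal sketch (the videotestsrc pipeline is just a stand-in for your own):

import os
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

# Must be set before Gst.init(); the dump functions read it during init
os.environ["GST_DEBUG_DUMP_DOT_DIR"] = "/tmp"
Gst.init(None)

pipeline = Gst.parse_launch(
    "videotestsrc num-buffers=60 ! videoconvert ! fakesink"
)
pipeline.set_state(Gst.State.PLAYING)
pipeline.get_state(Gst.CLOCK_TIME_NONE)  # block until caps are negotiated

# Writes /tmp/negotiated.dot; convert to PNG with dot as above
Gst.debug_bin_to_dot_file(pipeline, Gst.DebugGraphDetails.ALL, "negotiated")
pipeline.set_state(Gst.State.NULL)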
For runtime profiling, use GST_DEBUG with specific elements:
GST_DEBUG=nvvidconv:4,nvinfer:4,appsink:4 gst-launch-1.0 \
nvarguscamerasrc ! nvvidconv ! nvinfer config-file-path=det.cfg ! \
nvvidconv ! nvdrmvideosink
The timestamped logs show caps negotiation and buffer flow for each element, so you can see where frames stall.
The 4 patterns that kill throughput
Pattern 1: videoconvert in a hardware pipeline
The symptom is high CPU usage (70%+) on a pipeline that should be GPU-accelerated. The cause is one videoconvert element in the middle, forcing everything before and after it through system RAM.
Replace every videoconvert with nvvidconv:
# Slow (CPU path)
... ! videoconvert ! video/x-raw,format=BGR ! appsink
# Fast (hardware path)
... ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! nvvidconv ! \
video/x-raw,format=BGRx ! appsink
The second nvvidconv handles the final conversion out of NVMM before the CPU-side sink.
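If the consumer is OpenCV, the same split applies. A sketch, assuming an OpenCV build with GStreamer support (the default on JetPack): nvvidconv does the heavy NV12-to-BGRx conversion on the VIC, and the single trailing videoconvert only repacks BGRx into the BGR layout OpenCV expects, which is far cheaper than a full CPU colorspace conversion.

import cv2

# nvvidconv handles colorspace conversion on the VIC; videoconvert only
# repacks 4-byte BGRx pixels into the 3-byte BGR layout OpenCV expects
pipeline = (
    "nvarguscamerasrc ! "
    "video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1 ! "
    "nvvidconv ! video/x-raw,format=BGRx ! "
    "videoconvert ! video/x-raw,format=BGR ! "
    "appsink drop=1 max-buffers=1"
)

cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # frame is a regular BGR numpy array; process it here
cap.release()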
Pattern 2: Software decode
avdec_h264 is a CPU-based H.264 decoder. On Jetson, use nvv4l2decoder instead:
# Slow
... ! h264parse ! avdec_h264 ! videoconvert ! autovideosink
# Fast
... ! h264parse ! nvv4l2decoder ! nvvidconv ! nvdrmvideosink
nvv4l2decoder uses the NVDEC hardware engine. On Jetson Orin, this handles 4K H.264 at roughly 1% CPU vs 40%+ for the software decoder.
Pattern 3: Slow appsink callback
appsink is how you pull frames into Python or C++ code. If your callback takes longer to process a frame than the pipeline produces them, the queue fills up and GStreamer stalls.
The default queue behavior is to block. Your pipeline backs up, your camera drops frames, and latency climbs. Fix it by setting a maximum buffer count and drop policy:
appsink = pipeline.get_by_name("appsink0")
appsink.set_property("max-buffers", 1)
appsink.set_property("drop", True)
appsink.set_property("emit-signals", True)
drop=True means the sink drops old frames rather than blocking the pipeline. You lose frames but maintain real-time processing. If you can’t afford to drop frames, you need to speed up the callback — move heavy work off the GStreamer thread with a separate processing queue.
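A minimal sketch of that split, reusing the appsink configured above (worker and frame_queue are illustrative names): the new-sample callback only copies the frame out and returns immediately, while a worker thread does the heavy lifting.

import threading
import queue
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

frame_queue = queue.Queue(maxsize=2)  # small and bounded: stale frames get dropped

def on_new_sample(sink):
    # Runs on the streaming thread, so keep it fast
    sample = sink.emit("pull-sample")
    if sample is None:
        return Gst.FlowReturn.OK
    buf = sample.get_buffer()
    ok, info = buf.map(Gst.MapFlags.READ)
    if not ok:
        return Gst.FlowReturn.ERROR
    data = bytes(info.data)  # copy out so the GStreamer buffer can be released
    buf.unmap(info)
    try:
        frame_queue.put_nowait(data)
    except queue.Full:
        pass  # drop rather than block the pipeline
    return Gst.FlowReturn.OK

def worker():
    while True:
        data = frame_queue.get()
        # heavy processing (inference, encoding, ...) goes here

threading.Thread(target=worker, daemon=True).start()
appsink.connect("new-sample", on_new_sample)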
Pattern 4: Missing queue elements
GStreamer runs elements in the same thread by default unless you insert queue elements to split into separate threads. A pipeline with a slow sink element (like a network sink) will block the camera source thread without a queue between them:
nvarguscamerasrc ! nvvidconv ! queue max-size-buffers=4 ! \
nvv4l2h264enc ! rtph264pay ! udpsink host=192.168.1.100 port=5000
The queue element decouples the camera thread from the encoder/network thread. Without it, network jitter stalls the camera.
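For live sources you can go further and set leaky=downstream on the queue, which discards the oldest queued buffers instead of blocking when the queue fills; combined with max-size-buffers this puts a hard bound on the latency the queue can add.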
A profiling pipeline for benchmarking
This pipeline measures pure throughput from camera to nowhere — useful for finding your theoretical maximum before you add processing:
gst-launch-1.0 -v \
nvarguscamerasrc num-buffers=300 ! \
'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! \
nvvidconv ! \
'video/x-raw(memory:NVMM),format=NV12' ! \
fakesink sync=false
fakesink drops all frames immediately with no rendering overhead. If this pipeline runs at full framerate, your hardware can handle the resolution. If it drops frames here, the bottleneck is upstream of processing — check camera driver configuration and MIPI bandwidth.
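To put a number on throughput without parsing -v output, you can count buffers with a pad probe and divide by wall-clock time. A rough sketch in Python (the measured time includes pipeline startup, which is fine for a ballpark figure):

import time
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.parse_launch(
    "nvarguscamerasrc num-buffers=300 ! "
    "video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1 ! "
    "nvvidconv ! fakesink name=sink sync=false"
)

count = 0

def on_buffer(pad, info):
    # Called once per buffer reaching the sink
    global count
    count += 1
    return Gst.PadProbeReturn.OK

pad = pipeline.get_by_name("sink").get_static_pad("sink")
pad.add_probe(Gst.PadProbeType.BUFFER, on_buffer)

start = time.monotonic()
pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
bus.timed_pop_filtered(
    Gst.CLOCK_TIME_NONE, Gst.MessageType.EOS | Gst.MessageType.ERROR
)
elapsed = time.monotonic() - start
pipeline.set_state(Gst.State.NULL)
print(f"{count} buffers in {elapsed:.1f}s = {count / elapsed:.1f} fps")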
For network streaming pipelines, see the GStreamer development service page. If you’re comparing building this yourself against using an external specialist like RidgeRun, the RidgeRun vs ProventusNova comparison has a direct breakdown.
Frequently Asked Questions
Why is my GStreamer pipeline slow on Jetson even though it runs fine on a desktop?
Desktop CPUs have more cores and higher memory bandwidth. On Jetson, mixing CPU-based elements with GPU-accelerated elements causes data copies between system memory and NVMM. Each copy is expensive. Keep data in NVMM by using hardware-accelerated elements throughout the pipeline.
What is NVMM memory in GStreamer on Jetson?
NVMM is a zero-copy memory buffer in the GPU’s address space. Elements like nvvidconv, nvinfer, and nvarguscamerasrc produce and consume NVMM buffers without copying to system RAM. When a CPU-based element receives an NVMM buffer, it copies it to system memory — that copy is usually where latency comes from.
What is the difference between nvvidconv and videoconvert?
videoconvert is CPU-based. nvvidconv runs on Jetson’s VIC hardware engine and operates on NVMM buffers without touching the CPU. On 4K or multi-stream workloads, replacing videoconvert with nvvidconv can cut CPU usage by 60–80%.
How do I enable GStreamer debug logs?
Set GST_DEBUG=3 for general logs: GST_DEBUG=3 gst-launch-1.0 .... For element-specific: GST_DEBUG=nvvidconv:5,nvinfer:4. For a visual pipeline graph: export GST_DEBUG_DUMP_DOT_DIR=/tmp, run the pipeline, then convert with dot -Tpng /tmp/*.dot -o pipeline.png.
How many camera streams can Jetson Orin handle in GStreamer?
Jetson AGX Orin can handle 8–16 1080p30 streams using hardware decode (nvv4l2decoder) and nvstreammux for batching. Using software decode (avdec_h264) drops that to 2–4 streams. The NVDEC engine count and VIC bandwidth are the real limits.
GStreamer pipeline tuning on Jetson gets complex fast — especially when you’re mixing camera inputs, model inference, and encoding for network output. If you’re hitting a wall on throughput or latency, talk to our team.