Loitering Detection

| Property | Value |
|---|---|
| Category | Object Detection + Tracking + Zone Analytics (GstAnalytics) |
| Source Framework | PyTorch (Ultralytics) |
| Supported Precisions | FP32, FP16, INT8 (mixed-precision) |
| Inference Engine | OpenVINO |
| Hardware | CPU, GPU, NPU |
| Detected Class | person (COCO class 0) |

Overview

Loitering Detection is a Metro Analytics use case that flags people who remain inside a configurable region of interest for longer than a dwell-time threshold. It is built on YOLO26 for person detection, paired with a multi-object tracker that assigns persistent IDs across frames. DLStreamer's gvaanalytics element defines the monitoring zone and automatically attaches GstAnalyticsZoneMtd metadata to every tracked person whose center falls inside the polygon. A Python probe reads this GstAnalytics metadata to accumulate per-person dwell time and raises a loitering event when the threshold is exceeded.

Typical Metro deployments include:

  • Restricted-Area Monitoring -- raise alerts when a person lingers near tracks, equipment rooms, or after-hours zones.
  • Platform Edge Safety -- detect prolonged presence inside a yellow-line buffer.
  • ATM and Ticketing Security -- identify suspicious dwell at unattended kiosks.
  • Crowd-Free Zone Enforcement -- monitor emergency exits and corridors that must remain clear.

Available variants: yolo26n, yolo26s, yolo26m, yolo26l, yolo26x. Smaller variants (yolo26n, yolo26s) are recommended for high-FPS edge deployment.


Prerequisites

Create and activate a Python virtual environment before running the scripts:

python3 -m venv .venv --system-site-packages
source .venv/bin/activate

Note: The --system-site-packages flag is required so the virtual environment can access the system-installed OpenVINO and DLStreamer Python packages.
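
To confirm that the virtual environment can see the system packages, a quick import check such as the one below may help. It is a minimal sketch: the module names `openvino` and `gi` are assumptions about how the packages are exposed to Python and may differ on your installation.

```python
import importlib.util


def missing_packages(names):
    """Return the names in `names` that cannot be imported from this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names are assumptions; adjust them to match your installation.
    missing = missing_packages(["openvino", "gi"])
    print("missing:", ", ".join(missing) if missing else "none")
```

If a package is reported as missing, recreate the environment with --system-site-packages.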


Getting Started

Download and Quantize Model

Run the provided script to download, export to OpenVINO IR, and optionally quantize:

chmod +x export_and_quantize.sh
./export_and_quantize.sh

This exports the default yolo26n model in FP16 precision.

Optional: Select a Different Variant or Precision

./export_and_quantize.sh yolo26n FP32   # full-precision
./export_and_quantize.sh yolo26n INT8   # quantized
./export_and_quantize.sh yolo26s        # larger variant, default FP16

Replace yolo26n with any variant (yolo26s, yolo26m, yolo26l, yolo26x). The second argument selects the precision (FP32, FP16, INT8); the default is FP16.

The script performs the following steps:

  1. Installs dependencies (openvino, ultralytics; adds nncf for INT8).
  2. Downloads the sample surveillance video (VIRAT_S_000101.mp4) from the Intel Metro AI Suite project into the current directory.
  3. Downloads the PyTorch weights and exports to OpenVINO IR.
  4. (INT8 only) Quantizes the model using NNCF post-training quantization.
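
The export step above can be approximated directly in Python. The sketch below is not taken from export_and_quantize.sh: the weights file name (`<variant>.pt`), the helper names, and the choice to export INT8 candidates as FP32 IR before a separate NNCF pass are all assumptions made for illustration.

```python
def export_kwargs(precision):
    """Map a precision label to keyword arguments for Ultralytics' export()."""
    precision = precision.upper()
    if precision not in ("FP32", "FP16", "INT8"):
        raise ValueError(f"unsupported precision: {precision}")
    # Assumption: INT8 starts from an FP32 IR that is quantized later with NNCF.
    return {"format": "openvino", "half": precision == "FP16"}


def export_model(variant="yolo26n", precision="FP16"):
    """Download the weights (if needed) and export them to OpenVINO IR."""
    from ultralytics import YOLO  # imported lazily so the helper above has no dependencies

    model = YOLO(f"{variant}.pt")  # weights file name is an assumption
    return model.export(**export_kwargs(precision))
```

Calling export_model("yolo26s", "FP32") would then correspond roughly to running ./export_and_quantize.sh yolo26s FP32, without the video download and quantization steps.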

Output files:

  • yolo26n_openvino_model/ -- FP32 or FP16 OpenVINO IR model directory.
  • yolo26n_loitering_int8.xml / yolo26n_loitering_int8.bin -- INT8 quantized model (only when INT8 is selected).
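
When switching between variants and precisions, it can be convenient to derive the model path instead of hard-coding it. The helper below is a sketch that extrapolates the yolo26n file names listed above to other variants; that naming pattern is an assumption and should be checked against the files the script actually produces.

```python
from pathlib import Path


def model_xml_path(variant="yolo26n", precision="FP16"):
    """Return the expected .xml path for a variant/precision pair."""
    if precision.upper() == "INT8":
        # INT8 output sits next to the script (pattern assumed for non-yolo26n variants)
        return Path(f"{variant}_loitering_int8.xml")
    # FP32 and FP16 share the same IR directory name
    return Path(f"{variant}_openvino_model") / f"{variant}.xml"


if __name__ == "__main__":
    print(model_xml_path("yolo26n", "FP16").as_posix())
    print(model_xml_path("yolo26n", "INT8").as_posix())
```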

Precision / Device Compatibility

| Precision | CPU | GPU | NPU |
|---|---|---|---|
| FP32 | Yes | Yes | No |
| FP16 | Yes | Yes | Yes |
| INT8 | Yes | Yes | Yes |
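
The compatibility table can be encoded as a small lookup so that a launcher script rejects unsupported combinations early. This is an illustrative sketch; the names `SUPPORTED` and `is_supported` are hypothetical and not part of the sample.

```python
# Precision -> devices, transcribed from the compatibility table above
SUPPORTED = {
    "FP32": {"CPU", "GPU"},
    "FP16": {"CPU", "GPU", "NPU"},
    "INT8": {"CPU", "GPU", "NPU"},
}


def is_supported(precision, device):
    """Return True if the table lists `device` as supported for `precision`."""
    return device.upper() in SUPPORTED.get(precision.upper(), set())


if __name__ == "__main__":
    for precision, device in [("FP32", "NPU"), ("INT8", "NPU")]:
        print(precision, device, is_supported(precision, device))
```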

Note: The INT8 calibration uses frames from the bundled sample video. For production accuracy, replace it with a representative set of frames from the target deployment site.
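
One simple way to assemble a representative calibration set is to sample frames evenly across a recording from the target site. The sketch below only computes which frame indices to extract; how the frames are decoded and passed to NNCF is left out, and the function name is hypothetical.

```python
def calibration_indices(total_frames, num_samples):
    """Pick evenly spaced frame indices, one from the middle of each segment."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    # The half-step offset avoids always sampling the very first frame
    return [int(i * step + step / 2) for i in range(num_samples)]


if __name__ == "__main__":
    print(calibration_indices(100, 4))
```

Sampling across the whole recording, rather than taking the first N frames, tends to cover more lighting and crowd-density conditions.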

Defining the Monitoring Zone

The zone is a polygon defined in JSON and passed to DLStreamer's gvaanalytics element, which automatically detects when tracked objects are inside the zone using GstAnalytics metadata -- no Python polygon math required. A typical surveillance-zone configuration on a 1280x720 source might be:

[
  {
    "id": "loiter_zone",
    "type": "polygon",
    "points": [
      {"x": 0, "y": 200},
      {"x": 300, "y": 200},
      {"x": 300, "y": 400},
      {"x": 0, "y": 400}
    ]
  }
]

The dwell threshold is a separate constant in the Python sample:

LOITERING_SECONDS = 5.0       # dwell threshold, in seconds (demo value)

Note: The sample uses a 5-second threshold so that loitering events are triggered quickly on the short demo video. For production deployments, increase this to 10--30 seconds depending on the site's operational requirements.

The gvaanalytics element attaches GstAnalyticsZoneMtd (read in the Python sample as DLStreamerMeta.ZoneMtd) to each detection whose center falls inside the polygon. The Python probe checks for this metadata on every frame and accumulates per-person dwell time while it is present.
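
The dwell-time bookkeeping can be isolated from GStreamer and tested on its own. The class below is a minimal sketch of that logic, assuming timestamps in seconds and one update per tracked person per frame; `DwellTracker` is a hypothetical name, and in this sketch time spent outside the zone is not counted.

```python
from collections import defaultdict


class DwellTracker:
    """Accumulate in-zone time per track ID and flag each ID once."""

    def __init__(self, threshold_s):
        self.threshold_s = threshold_s
        self.dwell = defaultdict(float)  # track_id -> seconds spent in the zone
        self.last_seen = {}              # track_id -> timestamp of last in-zone frame
        self.flagged = set()             # track IDs that already raised an event

    def update(self, track_id, in_zone, now):
        """Return True only on the frame where track_id first crosses the threshold."""
        if not in_zone:
            # Drop the timestamp so the gap outside the zone is not counted later
            self.last_seen.pop(track_id, None)
            return False
        self.dwell[track_id] += now - self.last_seen.get(track_id, now)
        self.last_seen[track_id] = now
        if self.dwell[track_id] >= self.threshold_s and track_id not in self.flagged:
            self.flagged.add(track_id)
            return True
        return False


if __name__ == "__main__":
    tracker = DwellTracker(threshold_s=5.0)
    for second in range(7):
        if tracker.update(track_id=1, in_zone=True, now=float(second)):
            print(f"LOITERING id=1 dwell={tracker.dwell[1]:.1f}s")
```

Dwell here is cumulative across visits: a person who leaves and re-enters continues from the previous total. Resetting the total on exit would be an equally valid policy, depending on the site.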

Note: The zone polygon supports arbitrary shapes (not just rectangles). Use draw-zones=true (the default) so that gvawatermark renders the zone boundary on the output video.
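
Because the example zone is given for a 1280x720 source, a stream at another resolution needs the points rescaled. The helper below is a sketch that assumes zone coordinates are absolute pixel positions in the frame; `scale_zone` is a hypothetical name, not a DLStreamer API.

```python
import json


def scale_zone(zone, src_size, dst_size):
    """Return a copy of `zone` with points rescaled from src_size to dst_size (width, height)."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    points = [{"x": round(p["x"] * sx), "y": round(p["y"] * sy)} for p in zone["points"]]
    return {**zone, "points": points}


if __name__ == "__main__":
    zone = {
        "id": "loiter_zone",
        "type": "polygon",
        "points": [{"x": 0, "y": 200}, {"x": 300, "y": 200},
                   {"x": 300, "y": 400}, {"x": 0, "y": 400}],
    }
    # Rescale the 1280x720 example zone for a 1920x1080 stream
    print(json.dumps([scale_zone(zone, (1280, 720), (1920, 1080))]))
```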

DLStreamer Sample

Set up the environment:

source /opt/intel/openvino_2026/setupvars.sh
source /opt/intel/dlstreamer/scripts/setup_dls_env.sh
export PYTHONPATH=/opt/intel/dlstreamer/python:/opt/intel/dlstreamer/gstreamer/lib/python3/dist-packages:${PYTHONPATH:-}

Run loitering detection with the following Python script:

from collections import defaultdict
import json
import sys
import gi
gi.require_version("Gst", "1.0")
gi.require_version("GstAnalytics", "1.0")
gi.require_version("DLStreamerMeta", "1.0")
gi.require_version("DLStreamerWatermarkMeta", "1.0")
from gi.repository import Gst, GLib, GstAnalytics, DLStreamerMeta, DLStreamerWatermarkMeta

Gst.init([])

# Register DLStreamerMeta types so GstAnalytics iteration can handle them
_ov = sys.modules["gi.overrides.GstAnalytics"]
_ov.__mtd_types__[DLStreamerMeta.ZoneMtd.get_mtd_type()] = DLStreamerMeta.relation_meta_get_zone_mtd
_ov.__mtd_types__[DLStreamerMeta.TripwireMtd.get_mtd_type()] = DLStreamerMeta.relation_meta_get_tripwire_mtd

MODEL = "yolo26n_openvino_model/yolo26n.xml"
VIDEO = "VIRAT_S_000101.mp4"
ZONE_JSON = json.dumps([{
    "id": "loiter_zone",
    "type": "polygon",
    "points": [{"x": 0, "y": 200}, {"x": 300, "y": 200},
               {"x": 300, "y": 400}, {"x": 0, "y": 400}]
}])
LOITERING_SECONDS = 5.0

pipeline = Gst.parse_launch(
    f"filesrc location={VIDEO} ! decodebin3 ! videoconvert ! "
    f"gvadetect model={MODEL} device=GPU threshold=0.5 ! queue ! "
    f"gvatrack tracking-type=short-term-imageless ! queue ! "
    f"gvaanalytics name=analytics draw-zones=true ! "
    f"gvafpscounter ! identity name=probe ! gvawatermark name=watermark ! "
    f"videoconvert ! video/x-raw,format=I420 ! "
    f"openh264enc ! h264parse ! mp4mux ! filesink location=output_dlstreamer.mp4"
)

pipeline.get_by_name("analytics").set_property("zones", ZONE_JSON)
pipeline.get_by_name("watermark").set_property("displ-cfg", "hide-roi=person")

dwell = defaultdict(float)
last_seen = {}
flagged = set()

def on_buffer(pad, info):
    buf = info.get_buffer()
    now = buf.pts / Gst.SECOND if buf.pts != Gst.CLOCK_TIME_NONE else 0.0
    rmeta = GstAnalytics.buffer_get_analytics_relation_meta(buf)
    if not rmeta:
        return Gst.PadProbeReturn.OK

    # Iterate only over object-detection entries
    for od in rmeta.iter_on_type(GstAnalytics.ODMtd):
        label = GLib.quark_to_string(od.get_obj_type())
        if label != "person":
            continue

        # Find tracking ID via direct relation
        track_id = None
        for trk in od.iter_direct_related(GstAnalytics.RelTypes.RELATE_TO, GstAnalytics.TrackingMtd):
            success, tracking_id, *_ = trk.get_info()
            if success:
                track_id = tracking_id
            break
        if track_id is None:
            continue

        # Check if gvaanalytics placed this detection inside the zone
        in_zone = False
        for zone in od.iter_direct_related(GstAnalytics.RelTypes.RELATE_TO, DLStreamerMeta.ZoneMtd):
            in_zone = True
            break

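        if not in_zone:
            # Forget the last in-zone timestamp so time spent outside the zone
            # is not added to the dwell total when the person re-enters
            last_seen.pop(track_id, None)
            continue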

        # Accumulate dwell time for persons inside the zone
        dwell[track_id] += now - last_seen.get(track_id, now)
        last_seen[track_id] = now

        if dwell[track_id] >= LOITERING_SECONDS and track_id not in flagged:
            flagged.add(track_id)
            _, x, y, w, h, _ = od.get_location()
            print(f"LOITERING id={track_id} dwell={dwell[track_id]:.1f}s pos=({int(x + w/2)},{int(y + h)})")

    return Gst.PadProbeReturn.OK

pipeline.get_by_name("probe").get_static_pad("src").add_probe(Gst.PadProbeType.BUFFER, on_buffer)
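pipeline.set_state(Gst.State.PLAYING)
# Block until the stream ends or the pipeline reports an error
msg = pipeline.get_bus().timed_pop_filtered(Gst.CLOCK_TIME_NONE, Gst.MessageType.EOS | Gst.MessageType.ERROR)
if msg and msg.type == Gst.MessageType.ERROR:
    err, debug = msg.parse_error()
    print(f"Pipeline error: {err.message} ({debug})", file=sys.stderr)
pipeline.set_state(Gst.State.NULL)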

Expected output:

LOITERING id=26 dwell=5.0s pos=(147,341)
LOITERING id=27 dwell=5.0s pos=(122,337)
...

The annotated video is saved to output_dlstreamer.mp4. The gvaanalytics element also draws the zone polygon on each frame via gvawatermark.
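
To forward events to another system, the printed lines can be parsed back into structured records. The sketch below matches the log format shown above; the function name and the dictionary keys are hypothetical.

```python
import re

# Matches lines such as: LOITERING id=26 dwell=5.0s pos=(147,341)
EVENT_RE = re.compile(r"LOITERING id=(\d+) dwell=([\d.]+)s pos=\((-?\d+),(-?\d+)\)")


def parse_event(line):
    """Return a dict for a LOITERING log line, or None if the line does not match."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    return {
        "id": int(match.group(1)),
        "dwell_s": float(match.group(2)),
        "pos": (int(match.group(3)), int(match.group(4))),
    }


if __name__ == "__main__":
    print(parse_event("LOITERING id=26 dwell=5.0s pos=(147,341)"))
```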

Expected Output

[Image: DLStreamer expected output]

Device targets:

  • device=GPU -- default in the sample code.
  • device=CPU -- change device=GPU to device=CPU.
  • device=NPU -- change device=GPU to device=NPU; use batch-size=1 and nireq=4 for best NPU utilization.
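
Rather than editing the pipeline string by hand for each target, the gvadetect segment can be generated from the device name. This is a sketch built from the settings listed above; the function name is hypothetical, and the NPU values are the ones recommended in this document rather than independently tuned.

```python
def gvadetect_element(model, device="GPU", threshold=0.5):
    """Build the gvadetect portion of the pipeline string for a device target."""
    parts = [f"gvadetect model={model}", f"device={device}", f"threshold={threshold}"]
    if device.upper() == "NPU":
        # NPU settings suggested in the device-target list above
        parts += ["batch-size=1", "nireq=4"]
    return " ".join(parts)


if __name__ == "__main__":
    print(gvadetect_element("yolo26n_openvino_model/yolo26n.xml", device="NPU"))
```

The returned string can replace the hard-coded gvadetect segment passed to Gst.parse_launch in the sample.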

License

Licensed under the MIT License. See LICENSE for details.
