Under Review at the ACM MM 2026 Dataset Track

IMPACT: A Multi-view Industrial Assembly Benchmark
for Fine-grained Action Understanding, Procedural Reasoning, and Error Analysis

Di Wen1, Zeyun Zhong1, David Schneider1, Manuel Zaremski1, Linus Kunzmann1

Yitian Shi1, Ruiping Liu1, Yufan Chen1, Junwei Zheng1,2, Jiahang Li1

Jonas Hemmerich1, Qiyi Tong3, Patric Grauberger1, Arash Ajoudani3, Danda Pani Paudel4

Sven Matthiesen1, Barbara Deml1, Jürgen Beyerer1, Luc Van Gool4, Rainer Stiefelhagen1, Kunyu Peng1,4,*

1 Karlsruhe Institute of Technology, Germany
2 ETH Zurich, Switzerland
3 Italian Institute of Technology, Italy
4 INSAIT, Sofia University, Bulgaria

* Corresponding author: kunyu.peng@kit.edu

A synchronized five-view RGB-D dataset and benchmark for industrial assembly under viewpoint shift, partial observability, multi-route execution, and anomaly recovery.

Benchmark code, task protocols, and released baselines are maintained in the GitHub repository.

Assembly Task and Recording Setup

IMPACT is collected around real angle grinder assembly and disassembly with synchronized ego and exo RGB-D, gaze, audio, and cognitive metadata.

Two angle grinder configurations and associated assembly parts used in IMPACT.
Manual book showing standardized installation and dismantling order.
Data acquisition pipeline for the IMPACT dataset.
IMPACT data acquisition workstation.
Trials: 112 recordings
Participants: 13 subjects
Duration: 39.5 hours
Views: 1 ego + 4 exo RGB-D

Dataset Overview

IMPACT comprises 112 trials from 13 participants and 39.5 hours of synchronized industrial assembly data.

Dataset statistics summary for the IMPACT benchmark.
Assembly object

Two physical product configurations

Two angle grinder configurations provide a natural cross-configuration generalization setting.

Execution structure

Partial-order procedural execution

Valid trajectories follow a prerequisite graph rather than a single rigid sequence.
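As a minimal sketch of what partial-order validity means, the check below accepts any step ordering that respects a prerequisite graph. The step names and the `PREREQS` mapping are hypothetical placeholders for illustration, not the released IMPACT procedure metadata.

```python
from typing import Dict, List, Set

# Hypothetical prerequisite graph: each step maps to the set of steps
# that must already be completed before it may start.
PREREQS: Dict[str, Set[str]] = {
    "mount_guard": set(),
    "insert_flange": set(),
    "place_disc": {"mount_guard", "insert_flange"},
    "tighten_nut": {"place_disc"},
}


def is_valid_execution(sequence: List[str], prereqs: Dict[str, Set[str]]) -> bool:
    """Return True if `sequence` completes every step exactly once
    and never starts a step before all of its prerequisites are done."""
    done: Set[str] = set()
    for step in sequence:
        if step not in prereqs or step in done:
            return False  # unknown or repeated step
        if not prereqs[step] <= done:
            return False  # a prerequisite is still missing
        done.add(step)
    return done == set(prereqs)
```

Under this toy graph, `mount_guard` and `insert_flange` may be performed in either order, so two distinct routes are both valid, while any route that places the disc first is rejected.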

Supervision

Anomaly and compliance supervision

Explicit anomaly recovery supervision is paired with compliance phases and a six-category anomaly taxonomy.

Human metadata

Trial-level cognitive workload

Each trial is paired with NASA-TLX workload measurement.
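For readers unfamiliar with NASA-TLX, the sketch below computes the common unweighted ("raw") score as the mean of the six standard subscale ratings. This page does not state whether IMPACT stores raw or weighted scores, and the dictionary keys used here are assumed names rather than the released metadata fields.

```python
from statistics import mean
from typing import Dict

# The six standard NASA-TLX subscales; the key spellings are assumptions.
TLX_SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings: Dict[str, float]) -> float:
    """Unweighted (raw) TLX: the mean of the six subscale ratings."""
    missing = [name for name in TLX_SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    return float(mean(ratings[name] for name in TLX_SUBSCALES))
```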

Benchmark Tasks

Temporal understanding task family: TAS-S, TAS-BL, and TAS-BR.
Cross-view understanding task family: CV-TA, CV-SMR, and CV-SMC.
Action forecasting task family: AF-S and AF-L.
State and reasoning task family: PSR, ASR, PPR, and ATR.

Download

The public release combines repository-hosted protocol assets with Google Drive bundles for annotations, multi-view media, ego eye-tracking traces, released feature packs, and a quick-start sample subset.

Type | Status | Content
Protocol Assets | In Repo | Official splits, mappings, labels, and procedure metadata included in this repository.
Starter Code | In Repo | Task wrappers, evaluation scripts, and released baselines included in this repository.
Full Annotation Bundle | Released | Official release folder containing annotations_v1.zip.
RGB Videos | Released | Five-view RGB archives distributed via the Google Drive release: videos_ego, videos_front, videos_left, videos_right, and videos_top.
Depth Streams | Released | Exocentric depth archives distributed via the Google Drive release: depth_front, depth_left, depth_right, and depth_top.
Ego Audio | Released | audio_ego.zip for the egocentric audio stream, distributed via the Google Drive release.
Ego Eye Tracking | Released | eye_tracking_ego.zip for egocentric eye-tracking TSV traces, distributed via the Google Drive release.
Pre-extracted Features | Released | Feature bundles for I3D, MViTv2, and VideoMAEv2, distributed via the Google Drive release.
Sample Pack | Released | Quick-start subset with 3 executions, multi-view media, eye-tracking traces, released features, and task annotations for download verification and review.
Hugging Face Mirror | Live | Public dataset mirror for the IMPACT release bundles.
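After downloading, a quick completeness check against the bundle names listed above might look like the following sketch. It matches on the name before the first dot because the archive extensions of the video and depth bundles are not stated here. The feature packs are omitted since their file names are not given, and the `downloads` folder in the usage note is a hypothetical local path.

```python
from pathlib import Path
from typing import Iterable, List

# Bundle names as listed on this page, without extensions.
EXPECTED_STEMS = [
    "annotations_v1",
    "videos_ego", "videos_front", "videos_left", "videos_right", "videos_top",
    "depth_front", "depth_left", "depth_right", "depth_top",
    "audio_ego",
    "eye_tracking_ego",
]


def missing_bundles(filenames: Iterable[str],
                    expected: List[str] = EXPECTED_STEMS) -> List[str]:
    """Return the expected bundle names with no matching file or folder.

    Matching uses the part of each name before the first dot, so
    `videos_ego.zip`, `videos_ego.tar.gz`, and an extracted `videos_ego`
    directory all count as present.
    """
    present = {Path(name).name.split(".")[0] for name in filenames}
    return [stem for stem in expected if stem not in present]
```

A typical call would be `missing_bundles(p.name for p in Path("downloads").iterdir())`, which returns an empty list once every listed bundle is in place.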

Release Update

Planned dataset maintenance updates will be announced here.

Planned · Late Jun 2026

Annotation Version Update

A refined annotation release is planned for late June 2026. It will reduce residual annotation noise, improve label accuracy, and include a detailed changelog documenting the differences and corrections between versions. The current public release, Annotation v1, is intended for academic and early use; the upcoming release will supersede it with denoised annotations.

Annotation Schema

Aligned supervision spans interaction dynamics, procedural structure, compliance, and state evolution.

Annotation demo for the IMPACT dataset.
Data annotation workflow for the IMPACT dataset.

Fine-grained actions

Hand-specific labels capture coordinated bimanual interaction.

Procedural structure

Coarse steps and completion events connect segmentation to graph-based reasoning.

Assembly states

Component-wise ternary states support direct ASR and indirect PSR.
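One possible reading of the "indirect" route is that step events can be inferred from changes in component states over time. The sketch below illustrates that idea only. The three state values, their integer encoding, and the component names are assumptions made for this example and may differ from the released annotation schema.

```python
from typing import Dict, List, Tuple

# Hypothetical ternary encoding; the released schema may differ.
NOT_INSTALLED, INSTALLED, FAULTY = 0, 1, 2


def completion_events(states: List[Dict[str, int]]) -> List[Tuple[int, str]]:
    """Return (time index, component) pairs at which a component
    first switches into the INSTALLED state.

    `states[t]` maps each component name to its state at time t.
    """
    events: List[Tuple[int, str]] = []
    for t in range(1, len(states)):
        for component, value in states[t].items():
            previous = states[t - 1].get(component, NOT_INSTALLED)
            if previous != INSTALLED and value == INSTALLED:
                events.append((t, component))
    return events
```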

Compliance and anomalies

Compliance phases are paired with anomaly categories for diagnosis.

Benchmark

Released methods are organized by task family, with detailed protocols maintained in the task-specific pages.

Released coverage

  • TAS, CV-TA, CV-SM (CV-SMR and CV-SMC), AF-S, PSR, ASR, PPR, and ATR are released.
  • AF-L is documented in the repository with its task protocol and baseline notes.
  • Detailed model lists and launch commands are maintained in the repository.

Metrics

  • Dense prediction metrics for temporal and compliance tasks.
  • Retrieval and classification metrics for cross-view tasks.
  • Task-specific forecasting, state, and procedural reasoning metrics.
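This page does not name the individual metrics, so the following is only an illustrative sketch of two building blocks that dense temporal evaluation typically relies on: frame-wise accuracy, and the conversion of per-frame labels into contiguous segments. The function names and label strings are placeholders, and the official evaluation scripts in the repository remain the reference.

```python
from typing import List, Sequence, Tuple


def frame_accuracy(pred: Sequence[str], gt: Sequence[str]) -> float:
    """Fraction of frames whose predicted label equals the ground truth."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and equally long")
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)


def to_segments(labels: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-frame labels into (label, start, end) runs,
    where `end` is exclusive."""
    segments: List[Tuple[str, int, int]] = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the sequence or on a label change.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments
```

Segment-level scores are usually computed on the output of a conversion like `to_segments`, which is why over-segmented predictions can score well frame-wise yet poorly at the segment level.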

Leaderboard

Leaderboard links and submission policy will be added when the public benchmark release is finalized.

Resources

The repository hosts the code and protocol files needed for reproducible benchmarking and protocol inspection.

Paper, Citation, and License

These items are visible early because they matter to both users and reviewers.

Paper & Citation

Paper PDF and citation links will be added here when the public release is finalized.

@inproceedings{impact2026,
  title     = {IMPACT: A Dataset for Multi-Granularity Human Procedural Action Understanding in Industrial Assembly},
  author    = {TBD},
  booktitle = {ACM Multimedia},
  year      = {2026}
}

License

  • Repository-authored code: repository root LICENSE.
  • Dataset assets and documentation: LICENSE-DATA.
  • third_party/: per-directory provenance and license notices.

Acknowledgements

We gratefully acknowledge Lei Qi, Weitong Kong, Chen Zhang, Haiwen Sun, and Yuwei Hu for their valuable support throughout the data collection and annotation process for IMPACT.