Under Review at the ACM MM 2026 Dataset Track

IMPACT: A Multi-view Industrial Assembly Benchmark
for Fine-grained Action Understanding, Procedural Reasoning, and Error Analysis

Di Wen¹, Zeyun Zhong¹, David Schneider¹, Manuel Zaremski¹, Linus Kunzmann¹

Yitian Shi¹, Ruiping Liu¹, Yufan Chen¹, Junwei Zheng¹,², Jiahang Li¹

Jonas Hemmerich¹, Qiyi Tong³, Patric Grauberger¹, Arash Ajoudani³, Danda Pani Paudel⁴

Sven Matthiesen¹, Barbara Deml¹, Jürgen Beyerer¹, Luc Van Gool⁴, Rainer Stiefelhagen¹, Kunyu Peng¹,⁴,*

¹Karlsruhe Institute of Technology, Germany ²ETH Zurich, Switzerland ³Italian Institute of Technology, Italy ⁴INSAIT, Sofia University, Bulgaria

* Corresponding author: kunyu.peng@kit.edu

A synchronized five-view RGB-D dataset and benchmark for industrial assembly under viewpoint shift, partial observability, multi-route execution, and anomaly recovery.

Benchmark code, task protocols, and released baselines are maintained in the GitHub repository.

Assembly Task and Recording Setup

IMPACT is collected around real angle grinder assembly and disassembly with synchronized ego and exo RGB-D, gaze, audio, and cognitive metadata.

Two angle grinder configurations and associated assembly parts used in IMPACT.

Instruction manual showing the standardized assembly and disassembly order.

Data acquisition pipeline for the IMPACT dataset.

Trials 112 recordings

Participants 13 subjects

Duration 39.5 hours

Views 1 ego + 4 exo RGB-D

Dataset Overview

IMPACT comprises 112 trials from 13 participants and 39.5 hours of synchronized industrial assembly data.

Dataset statistics summary for the IMPACT benchmark.

Assembly object

Two physical product configurations

Two angle grinder configurations provide a natural cross-configuration generalization setting.

Execution structure

Partial-order procedural execution

Valid trajectories follow a prerequisite graph rather than a single rigid sequence.
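As a minimal sketch of what partial-order execution implies, the snippet below checks an observed step sequence against a prerequisite graph. The graph, the step names, and the function `first_violation` are all hypothetical illustrations; they are not taken from the IMPACT procedure metadata, which is maintained in the repository.

```python
from typing import Dict, List, Set

# Hypothetical prerequisite graph: step -> steps that must be completed first.
# Step names are illustrative only and do not come from the IMPACT release.
PREREQS: Dict[str, Set[str]] = {
    "mount_guard": set(),
    "insert_flange": set(),
    "place_disc": {"mount_guard", "insert_flange"},
    "tighten_nut": {"place_disc"},
}


def first_violation(sequence: List[str], prereqs: Dict[str, Set[str]]) -> int:
    """Return the index of the first step with unmet prerequisites, or -1 if none."""
    done: Set[str] = set()
    for i, step in enumerate(sequence):
        # An unknown step, or a step whose prerequisites are not all done, is a violation.
        if step not in prereqs or not prereqs[step] <= done:
            return i
        done.add(step)
    return -1
```

Under this toy graph, both `mount_guard, insert_flange, ...` and `insert_flange, mount_guard, ...` pass the check, which is the sense in which several routes can be valid, while placing the disc before the flange is flagged at that step.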

Supervision

Anomaly and compliance supervision

Explicit anomaly recovery supervision is paired with compliance phases and a six-category anomaly taxonomy.

Human metadata

Trial-level cognitive workload

Each trial is paired with NASA-TLX workload measurement.
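For readers unfamiliar with NASA-TLX, the sketch below computes the unweighted "Raw TLX" score as the mean of the six standard subscales. It assumes ratings on a 0 to 100 scale and hypothetical key names; the page does not state which TLX variant, scale, or file format IMPACT uses, so treat this purely as an illustration of the measure.

```python
from statistics import mean
from typing import Mapping

# The six standard NASA-TLX subscales. The key spellings are an assumption.
TLX_SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings: Mapping[str, float]) -> float:
    """Unweighted mean of the six subscale ratings (Raw TLX)."""
    missing = [name for name in TLX_SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    return mean(float(ratings[name]) for name in TLX_SUBSCALES)
```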

Benchmark Tasks

Temporal understanding task family: TAS-S, TAS-BL, and TAS-BR.

Cross-view understanding task family: CV-TA, CV-SMR, and CV-SMC.

Action forecasting task family: AF-S and AF-L.

State and reasoning task family: PSR, ASR, PPR, and ATR.

Download

The public release combines repository-hosted protocol assets with Google Drive bundles for annotations, multi-view media, ego eye-tracking traces, released feature packs, and a quick-start sample subset.

Type	Status	Content
Protocol Assets	In Repo	Official splits, mappings, labels, and procedure metadata included in this repository.
Starter Code	In Repo	Task wrappers, evaluation scripts, and released baselines included in this repository.
Full Annotation Bundle	Released	Official release folder containing `annotations_v1.zip`.
RGB Videos	Released	Five-view RGB archives distributed via the Google Drive release: `videos_ego`, `videos_front`, `videos_left`, `videos_right`, and `videos_top`.
Depth Streams	Released	Exocentric depth archives distributed via the Google Drive release: `depth_front`, `depth_left`, `depth_right`, and `depth_top`.
Ego Audio	Released	`audio_ego.zip` for the egocentric audio stream, distributed via the Google Drive release.
Ego Eye Tracking	Released	`eye_tracking_ego.zip` for egocentric eye-tracking TSV traces, distributed via the Google Drive release.
Pre-extracted Features	Released	Released feature bundles for `I3D`, `MViTv2`, and `VideoMAEv2`, distributed via the Google Drive release.
Sample Pack	Released	Quick-start subset with 3 executions, multi-view media, eye-tracking traces, released features, and task annotations for download verification and review.
Hugging Face Mirror	Live	Public dataset mirror for the IMPACT release bundles.
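After downloading, a quick completeness check can save time. The sketch below compares a local folder against the bundle names listed in the table. It assumes all bundles sit directly in one directory and matches by name stem only, because the table does not give file extensions for the video and depth archives; feature packs are left out since their archive names are not stated. The function `missing_bundles` is a hypothetical helper, not part of the released starter code.

```python
from pathlib import Path
from typing import List

# Bundle names as they appear in the release table (extensions intentionally omitted).
EXPECTED_BUNDLES = [
    "annotations_v1",
    "videos_ego", "videos_front", "videos_left", "videos_right", "videos_top",
    "depth_front", "depth_left", "depth_right", "depth_top",
    "audio_ego",
    "eye_tracking_ego",
]


def missing_bundles(release_dir: str) -> List[str]:
    """Return expected bundle names with no matching file or folder in release_dir."""
    root = Path(release_dir)
    # Compare on the part of the name before the first dot, so both
    # "audio_ego.zip" and an extracted "audio_ego/" folder count as present.
    present = {entry.name.split(".")[0] for entry in root.iterdir()}
    return [name for name in EXPECTED_BUNDLES if name not in present]
```

An empty return value means every listed bundle was found; any returned names point to downloads that are still missing or were renamed.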

Release Update

Planned dataset maintenance updates will be announced here.

Planned · Late Jun 2026

Annotation Version Update

A refined annotation release is planned for late June 2026. It will reduce residual annotation noise, improve label accuracy, and ship with a detailed changelog documenting the differences and corrections between versions. The current public release is Annotation v1, intended for academic and early use.

Annotation Schema

Aligned supervision spans interaction dynamics, procedural structure, compliance, and state evolution.

Data annotation workflow for the IMPACT dataset.

Fine-grained actions

Hand-specific labels capture coordinated bimanual interaction.

Procedural structure

Coarse steps and completion events connect segmentation to graph-based reasoning.

Assembly states

Component-wise ternary states support direct ASR and indirect PSR.
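One way to picture component-wise ternary states is as a per-component value with three possible levels, where step-level events can be read off indirectly from changes between consecutive state vectors. The sketch below illustrates that idea only. The three value names, the component names, and the function `state_changes` are hypothetical; the page says the states are ternary but does not define the values or the file layout.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class ComponentState(IntEnum):
    # Hypothetical value names; the source only states that states are ternary.
    REMOVED = 0
    PARTIAL = 1
    INSTALLED = 2


StateVector = Dict[str, ComponentState]
Change = Tuple[int, str, ComponentState, ComponentState]


def state_changes(timeline: List[StateVector]) -> List[Change]:
    """List (time index, component, old state, new state) for every state change."""
    changes: List[Change] = []
    for t in range(1, len(timeline)):
        prev, curr = timeline[t - 1], timeline[t]
        for component in sorted(curr):
            before = prev.get(component)
            # Only report components present in both vectors whose state differs.
            if before is not None and before != curr[component]:
                changes.append((t, component, before, curr[component]))
    return changes
```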

Compliance and anomalies

Compliance phases are paired with anomaly categories for diagnosis.

Benchmark

Released methods are organized by task family, with detailed protocols maintained in the task-specific pages.

Released coverage

TAS, CV-TA, CV-SM, AF-S, PSR, ASR, PPR, and ATR are released.
AF-L is documented in the repository with its task protocol and baseline notes.
Detailed model lists and launch commands are maintained in the repository.

Metrics

Dense prediction metrics for temporal and compliance tasks.
Retrieval and classification metrics for cross-view tasks.
Task-specific forecasting, state, and procedural reasoning metrics.
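The page does not name the exact metrics, so the following is only a sketch of two measures commonly used for these task types: frame-wise accuracy for dense temporal prediction and Recall@K for retrieval. The function names and input layouts are assumptions; the official definitions are those in the repository's evaluation scripts.

```python
from typing import Sequence


def frame_accuracy(pred: Sequence[str], gt: Sequence[str]) -> float:
    """Fraction of frames whose predicted label equals the ground-truth label."""
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)


def recall_at_k(ranked: Sequence[Sequence[str]], targets: Sequence[str], k: int) -> float:
    """Fraction of queries whose target appears among the top-k ranked candidates."""
    if len(ranked) != len(targets) or len(targets) == 0:
        raise ValueError("ranked and targets must be non-empty and of equal length")
    hits = sum(target in candidates[:k] for candidates, target in zip(ranked, targets))
    return hits / len(targets)
```

Frame-wise accuracy alone does not penalize over-segmentation, which is why segmentation benchmarks usually report it alongside segment-level scores.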

Leaderboard

Leaderboard links and submission policy will be added when the public benchmark release is finalized.

Resources

The repository is the operational layer for reproducible benchmarking and protocol inspection.

Paper, Citation, and License

These items are listed up front because they matter to both users and reviewers.

Paper & Citation

Paper PDF and citation links will be added here when the public release is finalized.

@inproceedings{impact2026,
  title     = {IMPACT: A Dataset for Multi-Granularity Human Procedural Action Understanding in Industrial Assembly},
  author    = {TBD},
  booktitle = {ACM Multimedia},
  year      = {2026}
}

License

Repository-authored code: repository root LICENSE.
Dataset assets and documentation: LICENSE-DATA.
third_party/: per-directory provenance and license notices.

Acknowledgements

We gratefully acknowledge Lei Qi, Weitong Kong, Chen Zhang, Haiwen Sun, and Yuwei Hu for their valuable support throughout the data collection and annotation process for IMPACT.
