§ Abstract
Action Quality Assessment (AQA)—the ability to quantify the quality of human motion, actions, or skill levels and provide feedback—has far-reaching implications in areas such as low-cost physiotherapy, sports training, and workforce development. As such, it has become a critical field in computer vision and video understanding over the past decade. In this paper, we present a thorough survey of the AQA landscape, systematically reviewing over 200 research papers using the PRISMA framework. We begin by covering foundational concepts and definitions, then move to general frameworks and performance metrics, and finally discuss the latest advances in methodologies and datasets. This survey provides a detailed analysis of research trends, performance comparisons across 33 datasets and 7 principal research trends, challenges, and future directions. Through this work, we aim to offer a valuable resource for both newcomers and experienced researchers, promoting further exploration and progress in AQA.
★ Survey at a Glance
- Largest AQA survey to date — 214 papers, 33 datasets, 9 domains, systematically reviewed via PRISMA
- Comprehensive dataset taxonomy across Surgery, Rehabilitation, Daily Activities, Music, Fitness, Industrial Manufacturing, Dance, AI-Generated Videos, and Sports
- 7 principal research trends: Fine-grained, Multimodal, Generalization, Continual Learning, Explainable, Comprehensive, Self-supervised AQA
- Detailed performance comparison tables with Spearman's ρ and Relative ℓ₂ for all major methods across MTL-AQA, FineDiving, AQA-7 and more
- In-depth analysis of challenges & future opportunities: dataset scale, subject diversity, physiological multimodality, interpretability, real-time deployment, and AGI in Action
- First survey to cover AI-Generated Video (GAIA), Continual Learning, and Causal Inference paradigms in AQA
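The two evaluation metrics named above can be computed in a few lines. This is a minimal pure-Python sketch. It assumes that Spearman's ρ is the Pearson correlation of tie-averaged ranks. It also assumes that Relative ℓ₂ follows the common contrastive-regression definition, R-ℓ₂ = (1/K) Σ_k (|s_k − ŝ_k| / (s_max − s_min))², where s_max and s_min bound the score range of the action class. Papers often report R-ℓ₂ multiplied by 100. The function names are illustrative and are not taken from the survey. The same applies to the Fisher-z helper, which averages per-action correlations as is customary on multi-action benchmarks such as AQA-7.

```python
import math
from typing import List, Optional, Sequence


def rankdata(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(pred: Sequence[float], true: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation between the two rank vectors."""
    return pearson(rankdata(pred), rankdata(true))


def relative_l2(pred: Sequence[float], true: Sequence[float],
                s_min: Optional[float] = None, s_max: Optional[float] = None) -> float:
    """Mean squared absolute error normalized by the score range (s_max - s_min).

    If no range is given, the min/max of the ground-truth scores are used;
    pass the action class's official range when it is known.
    """
    s_min = min(true) if s_min is None else s_min
    s_max = max(true) if s_max is None else s_max
    span = s_max - s_min
    return sum((abs(p - t) / span) ** 2 for p, t in zip(pred, true)) / len(true)


def fisher_z_average(rhos: Sequence[float]) -> float:
    """Average per-action correlations in Fisher z-space (requires |rho| < 1)."""
    z = [math.atanh(r) for r in rhos]
    return math.tanh(sum(z) / len(z))


# Usage with made-up predictions and ground-truth scores for four clips.
pred = [80.0, 65.5, 90.2, 70.1]
true = [82.0, 60.0, 95.0, 72.0]
print(spearman_rho(pred, true), 100 * relative_l2(pred, true))
```

ρ only measures whether the predicted ranking matches the judges' ranking. R-ℓ₂ also penalizes how far each predicted score is from the true score, which is why the comparison tables report both metrics.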
1 AQA Datasets — 33 Benchmarks Across 9 Domains
| Dataset | Year | Samples | Avg. Duration (s = seconds, fr = frames) | Annotation | Domain | Link |
|---|---|---|---|---|---|---|
| MIT-Dive | 2014 | 159 | ~2.5s | Score | Sport | 🔗 |
| MIT-Skate | 2014 | 150 | ~175s | Score | Sport | 🔗 |
| JIGSAWS | 2014 | 103 | ~92s | Score, Action | Surgery | 🔗 |
| UNLV-Dive | 2017 | 370 | ~3.8s | Score | Sport | 🔗 |
| UNLV-Vault | 2017 | 176 | ~2.8s | Score | Sport | 🔗 |
| UI-PRMD | 2018 | 100 | — | Grade, Action | Rehab | |
| EPIC-Skill | 2018 | 216 | ~86.6s | Rank, Action | Daily | 🔗 |
| AQA-7 | 2019 | 1,189 | ~6.7s | Score, Action | Sport | 🔗 |
| MTL-AQA | 2019 | 1,412 | ~4.1s | Score, Action, Desc. | Sport | 🔗 |
| Fis-V | 2019 | 500 | ~170s | Score | Sport | 🔗 |
| BEST | 2019 | 500 | ~187.6s | Grade, Rank, Action | Daily | 🔗 |
| KIMORE | 2019 | 353 | ~29.9s | Score, Action | Rehab | |
| TASD-2 | 2020 | 606 | ~4.1s | Score, Action | Sport | |
| Rhythmic Gym. | 2020 | 1,000 | ~95s | Score, Action | Sport | 🔗 |
| PISA (Piano-Skills) | 2021 | 992 | ~160fr | Grade, Difficulty | Music | 🔗 |
| FR-FS | 2021 | 417 | ~103fr | Grade, Action | Sport | 🔗 |
| FS1000 | 2021 | 1,000 | — | Score, Action | Sport | 🔗 |
| SMART | 2021 | 5,000 | ~420fr | Score, Action | Sport | |
| SimSurgSkill | 2021 | 315 | — | Score, Action | Surgery | |
| Fitness-AQA | 2022 | 21,284 | ~4.1s | Grade, Action | Fitness | 🔗 |
| FineDiving | 2022 | 3,000 | ~4.2s | Score, Action, Step | Sport | 🔗 |
| Assembly101 | 2022 | 4,321 | ~426s | Score, Action | Industrial | 🔗 |
| LOGO | 2023 | 200 | ~204.2s | Score, Action, Form. | Sport | 🔗 |
| FineFS | 2023 | 1,167 | ~215s | Score, Action | Sport | 🔗 |
| PaSk | 2023 | 1,018 | ~10.7s | Score | Sport | |
| CDRG | 2023 | 240 | ~14.7s | Rank, Action | Dance | |
| GAIA | 2024 | 9,180 | ~2.8s | Score, Action | AIGV | 🔗 |
| EgoExo-4D | 2024 | 5,035 | ~312s | Grade, Action | Daily | 🔗 |
| EgoExo-Learn | 2024 | 3,304 | ~10s | Rank, Action | Daily | 🔗 |
| EgoExo-Fitness | 2024 | 6,131 | ~18.8s | Score, Action, Desc. | Fitness | 🔗 |
| AVOS | 2024 | 1,997 | — | Grade, Action | Surgery | |
| UJ-AQA | 2025 | 8,540 | ~28fr | Score | Sport | 🔗 |
| BASKET | 2025 | 32,232 | ~500s | Grade, Action | Sport | 🔗 |
| FLEX | 2025 | 7,512 | ~234fr | Score, Action, Desc., sEMG | Fitness | 🔗 |
*Figure: Domain distribution of the AQA datasets listed above.*
2 Seven Principal Research Trends
🔥 Fine-grained AQA
The dominant trend (2018–2025). Four sub-directions:
- Segment-aware: S3D, TSA-Net, FineParser, GOAT — temporal decomposition into atomic sub-actions
- Actor-object centric: JR-GCN, EAGLE-Eye, DuRA — joint-relation and object-context modeling
- Uncertainty-aware: USDL, DAE, UD-AQA — distributional scoring with confidence estimation
- Contrastive regression: CoRe, RGR, MCoRe — relative quality via pairwise/group ranking
Top method: T²CR (ρ=0.9638 on MTL-AQA).
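Contrastive regression, the idea behind CoRe and the related methods listed above, predicts a *relative* score. A model estimates how much better or worse a query clip is than an exemplar clip whose judge score is known. The final score is the exemplar score plus that difference, averaged over several exemplars. The sketch below assumes pre-extracted clip features. `toy_delta` is a hypothetical linear stand-in for the learned difference regressor; CoRe itself uses a group-aware regression tree for this step. All names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Feature = List[float]


@dataclass
class Exemplar:
    feature: Feature  # pre-extracted clip feature (assumed available)
    score: float      # known ground-truth score


def make_pairs(features: Sequence[Feature],
               scores: Sequence[float]) -> List[Tuple[Feature, Feature, float]]:
    """Training triples (query, exemplar, target difference s_q - s_e) for all ordered pairs."""
    n = len(features)
    return [(features[i], features[j], scores[i] - scores[j])
            for i in range(n) for j in range(n) if i != j]


def toy_delta(query: Feature, ref: Feature,
              weights: Sequence[float] = (2.0, 1.0)) -> float:
    """Hypothetical stand-in for a learned regressor: linear in the feature difference."""
    return sum(w * (q - r) for w, q, r in zip(weights, query, ref))


def contrastive_predict(query: Feature, exemplars: Sequence[Exemplar],
                        delta_fn: Callable[[Feature, Feature], float]) -> float:
    """Score = mean over exemplars of (exemplar score + predicted relative difference)."""
    estimates = [ex.score + delta_fn(query, ex.feature) for ex in exemplars]
    return sum(estimates) / len(estimates)


exemplars = [Exemplar([1.0, 0.0], 80.0), Exemplar([0.0, 1.0], 70.0)]
print(contrastive_predict([1.0, 1.0], exemplars, toy_delta))  # 76.5
```

Averaging over several exemplars reduces the effect of any single poorly matched reference. The number of training pairs grows quadratically with dataset size, which is one reason relative regression helps on small AQA datasets.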
🎯 Multimodal AQA
Beyond RGB: fusion of multiple sensing modalities.
- Skeleton: ST-GCN, EGCN++, HGCN — pose-graph based
- Audio: PISA, Skating-Mixer — audio-visual fusion
- Vision-language: NAE (ρ=0.9790), VATP-Net, SGN — text-guided scoring
- Optical flow: PAMFN — motion-aware fusion
- Physiological: FLEX — sEMG signal integration
Adding skeleton input improves ρ by 7.76% over RGB alone; late fusion of modality-specific branches avoids gradient interference between modalities.
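Late (decision-level) fusion, as referenced above, fits in a few lines. Each modality branch is trained on its own to output a quality score, and only those scores are combined, so no gradients are shared between modality encoders. The sketch below assumes the per-modality scores are already available. The modality names and fusion weights are hypothetical; in practice the weights would be tuned on a validation split. If a modality is missing for a clip, the sketch renormalizes the weights over the modalities that are present.

```python
from typing import Mapping


def late_fusion(modality_scores: Mapping[str, float],
                weights: Mapping[str, float]) -> float:
    """Weighted average of per-modality score predictions (decision-level fusion).

    Only modalities present in `modality_scores` contribute, and their weights
    are renormalized, so a clip missing e.g. skeleton data still gets a score.
    """
    total_w = sum(weights[m] for m in modality_scores)
    if total_w <= 0:
        raise ValueError("at least one available modality needs a positive weight")
    return sum(weights[m] * s for m, s in modality_scores.items()) / total_w


# Hypothetical weights favoring the skeleton branch.
weights = {"rgb": 1.0, "skeleton": 3.0, "audio": 0.5}
print(late_fusion({"rgb": 80.0, "skeleton": 90.0}, weights))  # 87.5
print(late_fusion({"rgb": 80.0}, weights))                    # 80.0
```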
🌐 Generalization
Cross-action, cross-dataset, cross-domain deployment.
- Adaptive Net, ASGTN: neural architecture search (NAS)
- AdaST: adaptive skill transfer (0.8832 Acc on EPIC-Skill)
- CoFInAl, PHI: coarse-to-fine instruction alignment
Domain shift from action recognition pretraining remains the core challenge.
🔄 Continual Learning
Addressing catastrophic forgetting as new action types emerge.
- PECoP: parameter-efficient continual pretraining (ρ=0.9520)
- Continual-AQA: feature-score correlation rehearsal
- MAGR: manifold-aligned graph regularization
Rapidly emerging direction (2024–2025); key for real-world deployment across expanding action sets.
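Rehearsal-based continual learning, in its simplest generic form, keeps a small memory of past samples and mixes them into the training batches for new action types. The sketch below uses plain reservoir sampling over (feature, score) pairs. This is a swapped-in textbook technique, not the feature-score correlation rehearsal used by Continual-AQA. The class and function names are hypothetical.

```python
import random
from typing import Any, List, Sequence, Tuple

Sample = Tuple[Any, float]  # (clip feature, quality score)


class ReservoirBuffer:
    """Fixed-capacity memory holding a uniform random subset of every sample seen so far."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[Sample] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, feature: Any, score: float) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append((feature, score))
        else:
            # Keep the new sample with probability capacity / seen (Algorithm R).
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = (feature, score)

    def sample(self, k: int) -> List[Sample]:
        """Draw up to k distinct stored samples without replacement."""
        return self.rng.sample(self.items, min(k, len(self.items)))


def mixed_batch(new_batch: Sequence[Sample], buffer: ReservoirBuffer,
                replay_k: int) -> List[Sample]:
    """Current-task samples plus up to replay_k rehearsed samples from earlier tasks."""
    return list(new_batch) + buffer.sample(replay_k)


buffer = ReservoirBuffer(capacity=5)
for i in range(100):  # pretend these came from an earlier action type
    buffer.add([float(i)], float(i))
batch = mixed_batch([([0.5], 42.0)], buffer, replay_k=3)
print(len(batch))  # 4
```

With reservoir sampling, every past sample is equally likely to remain in memory regardless of when it arrived. This makes it a neutral baseline. AQA-oriented rehearsal methods instead choose what to keep based on how features relate to scores.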
🔍 Explainable AQA
From black-box regression to transparent, interpretable assessment.
- Process-level: NeuroSymbolic-AQA (rule-based, EP=0.9610), FineCausal (causal intervention, ρ=0.9447)
- Outcome-level: TechCoach (descriptive coaching text), ExpertAF (corrective pose generation), SkillNet (human-AI collaboration)
Fastest-growing trend in 2024–2025; unlocks AI coaching and physiotherapy applications.
🧩 Comprehensive AQA
Accounting for all contributing factors — not just the actor.
- VTPE: surgical tools + operative field + events + surgeon interaction
- PSGCN-RTCN: skeleton + appearance + facial expression + scene context
Necessary for real-world settings where external factors (equipment, environment, collaboration) influence quality.
🧪 Self-Supervised AQA
Overcoming expensive expert annotation burden.
- SAP-Net: self-supervised sub-action parsing (ρ=0.8450 on FineDiving)
- S⁴AQA: semi-supervised segment feature recovery
- TRS: teacher-reference-student architecture
- VQD-Net: quantized vector decoupling for semi-supervised AQA
Critical for democratizing AQA into low-resource domains and new action types.
3 Challenges & Future Directions
§ Citation
If you find this survey useful, please cite our IJCV 2026 paper:
```bibtex
@article{yin2026decadeactionqualityassessment,
  title   = {A Decade of Action Quality Assessment: Largest Systematic Survey of Trends, Challenges, and Future Directions},
  author  = {Yin, Hao and Parmar, Paritosh and Xu, Daoliang and Zhang, Yang and Zheng, Tianyou and Fu, Weiwei},
  journal = {International Journal of Computer Vision (IJCV)},
  year    = {2026},
  doi     = {10.1007/s11263-025-02672-4},
  url     = {https://github.com/HaoYin116/Survey_of_AQA}
}
```