FLEX Dataset

FLEX: A Largescale Multimodal, Multiview Dataset for Learning Structured Representations for Fitness Action Quality Assessment

¹ School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, USTC

² Suzhou Institute of Biomedical Engineering and Technology, CAS

³ Institute of High-Performance Computing, A*STAR

⁴ School of Psychology, Beijing Sports University

⁵ School of Competitive Sports, Beijing Sports University

* Indicates equal contribution

Under review, 2025

Abstract

Action Quality Assessment (AQA)—the task of quantifying how well an action is performed—has great potential for detecting errors in gym weight training, where accurate feedback is critical to prevent injuries and maximize gains. Existing AQA datasets, however, are limited to single-view competitive sports and RGB video, lacking multimodal signals and professional assessment of fitness actions. We introduce FLEX, the first large-scale, multimodal, multiview dataset for fitness AQA that incorporates surface electromyography (sEMG). FLEX contains over 7,500 multi-view recordings of 20 weight-loaded exercises performed by 38 subjects of diverse skill levels, with synchronized RGB video, 3D pose, sEMG, and physiological signals. Expert annotations are organized into a Fitness Knowledge Graph (FKG) linking actions, key steps, error types, and feedback, supporting a compositional scoring function for interpretable quality assessment. FLEX enables multimodal fusion, cross-modal prediction—including the novel Video→EMG task—and biomechanically oriented representation learning. Building on the FKG, we further introduce FLEX-VideoQA, a structured question–answering benchmark with hierarchical queries that drive cross-modal reasoning in vision–language models. Baseline experiments demonstrate that multimodal inputs, multi-view video, and fine-grained annotations significantly enhance AQA performance. FLEX thus advances AQA toward richer multimodal settings and provides a foundation for AI-powered fitness assessment and coaching.

BibTeX


@article{yin2025flex,
  title={FLEX: A Large-Scale Multi-Modal Multi-Action Dataset for Fitness Action Quality Assessment},
  author={Hao Yin and Lijun Gu and Paritosh Parmar and Lin Xu and Tianxiao Guo and Weiwei Fu and Yang Zhang and Tianyou Zheng},
  journal={arXiv preprint arXiv:2506.03198},
  year={2025}
}


Four cinema cameras and one smartphone were fixed at the four corners of the collection area. Video, sEMG, heart rate, and breathing rate were recorded synchronously during collection.

(a) Visualization of frequently used annotation words. (b) FLEX-KG: the structure of the knowledge graph. (c1) Mapping between actions and action knots. (c2) Mapping between action knots and error types.

(a) Number of samples for each of the 20 actions. (b) Average duration and score for each of the 20 actions. (c) Overall score distribution of the dataset and the 20 most frequent error types.
