Aeriform in-action is a multi-view dataset for recognizing human actions in aerial videos. The dataset consists of 32 high-resolution videos covering 13 action classes, with 55,477 frames (without augmentation) and 400,000 annotations, captured at 30 fps and a resolution of 3840 × 2160 pixels. It addresses several challenges such as camera motion, illumination changes, diversity of actions, and dynamic transitions between actions. The action classes can be categorized as atomic actions, human-human interactions, and human-object interactions. The 13 actions are carrying, drinking, handshaking, hugging, kicking, lying, punching, reading, running, sitting, standing, walking, and waving. This dataset provides a baseline for recognizing human actions in aerial videos and is intended to encourage researchers to advance the field.
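For convenience, the 13 action classes can be collected into a small label map. The sketch below (in Python) assumes a simple alphabetical index assignment purely for illustration; the dataset's own annotation files are the authoritative reference for any label ordering.

```python
# The 13 action classes of Aeriform in-action.
# The index assignment here is alphabetical and illustrative only;
# consult the dataset's annotation files for the authoritative mapping.
ACTION_CLASSES = [
    "carrying", "drinking", "handshaking", "hugging", "kicking",
    "lying", "punching", "reading", "running", "sitting",
    "standing", "walking", "waving",
]

LABEL_TO_INDEX = {name: idx for idx, name in enumerate(ACTION_CLASSES)}
INDEX_TO_LABEL = {idx: name for name, idx in LABEL_TO_INDEX.items()}
```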
The videos are annotated with DarkLabel, an open-source video annotation tool.
All videos are converted into frames, and annotations are provided for each frame. The annotations are then mapped back onto the original frames to validate the dataset.
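A minimal sketch of the frame-extraction step using OpenCV is given below, assuming videos are read with `cv2.VideoCapture` and frames are written as numbered images; the file naming is illustrative and not the dataset's own convention.

```python
import os
import cv2  # OpenCV

def video_to_frames(video_path: str, out_dir: str) -> int:
    """Split a video into individual frames saved as PNG images."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    count = 0
    while True:
        ok, frame = cap.read()  # ok becomes False once the video ends
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f"frame_{count:06d}.png"), frame)
        count += 1
    cap.release()
    return count

# Example (illustrative paths):
# num_frames = video_to_frames("video_01.mp4", "frames/video_01")
```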
Each annotation is represented by seven attributes: name, action, ID, and the bounding-box coordinates (x, y, w, h), as shown in the table below; a parsing sketch follows the table.
| Name | Action | ID | Coordinates (x, y, w, h) |
|---|---|---|---|
| Person | Punching | 0 | 2121, 1579, 60, 112 |
| Person | Standing | 4 | 1637, 1523, 73, 101 |
| Person | Walking | 5 | 1567, 1542, 55, 121 |
| Person | Lying | 7 | 1278, 1826, 90, 117 |
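To show how the seven attributes map onto a frame, the sketch below parses one annotation row and draws its bounding box with OpenCV. The pipe-separated layout mirrors the table above and is assumed only for illustration, since DarkLabel can export annotations in several formats.

```python
import cv2

def parse_annotation(row: str) -> dict:
    """Parse 'Name | Action | ID | x, y, w, h' into a dict (illustrative layout)."""
    name, action, obj_id, coords = [part.strip() for part in row.strip(" |").split("|")]
    x, y, w, h = (int(v) for v in coords.split(","))
    return {"name": name, "action": action, "id": int(obj_id), "box": (x, y, w, h)}

def draw_annotation(frame, ann: dict):
    """Draw the bounding box and action label on a frame (modifies it in place)."""
    x, y, w, h = ann["box"]
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(frame, f'{ann["action"]} #{ann["id"]}', (x, max(y - 10, 0)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return frame

# Example (illustrative paths):
# ann = parse_annotation("Person | Punching | 0 | 2121, 1579, 60, 112 |")
# frame = cv2.imread("frames/video_01/frame_000000.png")
# cv2.imwrite("annotated.png", draw_annotation(frame, ann))
```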
University Institute of Engineering and Technology
South Campus, Panjab University, Sector 25
Chandigarh, India 160014