Aerial Person Detector

Developed by: Raphael Makrigiorgis

This AI-based software for identifying individuals in aerial footage consists of two main components: detection and tracking. It operates on footage captured during various test missions using UAVs (Unmanned Aerial Vehicles). Person detection is performed with computer vision algorithms and a Convolutional Neural Network, while tracking is achieved through a combination of algorithms. Once persons are detected and tracked, all data collected during the process is saved to CSV files for further analysis.

For person detection, the YOLOv4 [1] model was trained. YOLOv4 is an object detection algorithm that evolved from YOLOv3, a real-time object recognition system that can recognize multiple objects in a single frame. To train the model to recognize persons, a dataset of around 2,500 annotated images was created; some were captured during real-life test missions and the rest were taken from the HERIDAL database [2]. To increase the detector's accuracy and enlarge the dataset, image augmentations were applied to several images, which were then added to the training set as separate samples. These augmentations include brightness, contrast, sharpness, and saturation adjustments. Augmentation makes the detector more robust to varying imaging conditions, which is important since the altitude at which the drones fly is usually high. Training was done using the Darknet framework, an open-source neural network framework written in C and CUDA.
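The augmentation step described above can be sketched as follows. This is a minimal illustration using Pillow's `ImageEnhance` module, not the project's actual preprocessing code; the jitter range of 0.7–1.3 is an assumed example value.

```python
import random

from PIL import Image, ImageEnhance


def augment(image: Image.Image) -> Image.Image:
    """Return a copy of the image with randomly jittered brightness,
    contrast, sharpness, and saturation (ImageEnhance.Color)."""
    for enhancer_cls in (ImageEnhance.Brightness, ImageEnhance.Contrast,
                         ImageEnhance.Sharpness, ImageEnhance.Color):
        factor = random.uniform(0.7, 1.3)  # 1.0 leaves the image unchanged
        image = enhancer_cls(image).enhance(factor)
    return image


if __name__ == "__main__":
    original = Image.open("frame.jpg")
    # Save each augmented copy as a separate training sample
    for i in range(3):
        augment(original).save(f"frame_aug_{i}.jpg")
```

Each augmented copy is saved alongside the original, so the training set grows while the annotations (bounding boxes) can be reused unchanged, since these enhancements do not move any pixels.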

Once persons are detected in a frame, the tracking algorithm starts processing them. Tracking first applies the Hungarian Algorithm [3], using the Intersection over Union (IoU) between the current detections and the previous detections as a similarity score: each detection in the current frame is associated with the track from previous frames that yields the highest IoU. When a previously tracked person cannot be matched by IoU, a distance-matching algorithm is used instead: it computes the Euclidean distance from the person's last known position to every detection in the current frame, and if the nearest bounding box approximately matches the area and size of the previously tracked person's box, that detection is assigned to the existing track. In addition, a Kalman filter estimates the position of a tracked person during occlusions, such as behind trees, or when the detector misses a person who was moving. The filter updates its state with the x, y coordinates of the matched box in the current frame, assuming a nearly constant velocity model. Finally, all data collected during the whole process is saved in CSV files for further processing and analysis. These files contain the location and trajectory of each person in x, y pixel coordinates, and the moving direction for each person in each frame.
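The IoU-based association step can be sketched as below. This is an illustrative implementation, not the project's own code: it uses `scipy.optimize.linear_sum_assignment` as a standard Hungarian-algorithm solver, and the 0.3 IoU threshold is an assumed example value.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(tracks, detections, iou_threshold=0.3):
    """Associate current-frame detections with existing tracks.

    Returns (matches, unmatched_tracks, unmatched_detections), where
    matches is a list of (track_index, detection_index) pairs.
    """
    if not tracks or not detections:
        return [], list(range(len(tracks))), list(range(len(detections)))

    # Hungarian solvers minimize cost, so use 1 - IoU as the cost.
    cost = np.zeros((len(tracks), len(detections)))
    for t, tb in enumerate(tracks):
        for d, db in enumerate(detections):
            cost[t, d] = 1.0 - iou(tb, db)

    rows, cols = linear_sum_assignment(cost)
    matched_t, matched_d, matches = set(), set(), []
    for t, d in zip(rows, cols):
        if 1.0 - cost[t, d] >= iou_threshold:  # reject weak overlaps
            matches.append((t, d))
            matched_t.add(t)
            matched_d.add(d)

    unmatched_tracks = [t for t in range(len(tracks)) if t not in matched_t]
    unmatched_dets = [d for d in range(len(detections)) if d not in matched_d]
    return matches, unmatched_tracks, unmatched_dets
```

Tracks left unmatched here are exactly the cases the text hands over to the distance-matching step and to the Kalman filter's predicted position.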
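The Kalman prediction step for occlusions can likewise be sketched with a small constant-velocity filter over the box centre. This is a generic textbook formulation, not the project's implementation; the state is (x, y, vx, vy) and the noise covariances are assumed example values.

```python
import numpy as np


class ConstantVelocityKalman:
    """Minimal constant-velocity Kalman filter for a 2-D point track."""

    def __init__(self, x, y):
        self.state = np.array([x, y, 0.0, 0.0])      # (x, y, vx, vy)
        self.P = np.eye(4) * 10.0                     # state covariance
        self.F = np.array([[1, 0, 1, 0],              # transition: pos += vel
                           [0, 1, 0, 1],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],              # we observe (x, y) only
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * 0.01                     # process noise
        self.R = np.eye(2) * 1.0                      # measurement noise

    def predict(self):
        """Advance one frame; used when the detector misses the person."""
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[:2]

    def update(self, x, y):
        """Correct the estimate with the matched box's (x, y) coordinates."""
        residual = np.array([x, y]) - self.H @ self.state
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)      # Kalman gain
        self.state = self.state + K @ residual
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

On frames where a track has a matched detection, `update` is called with the box coordinates; on occluded or missed frames, `predict` alone carries the person forward along the estimated velocity.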

[1] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal speed and accuracy of object detection," arXiv preprint arXiv:2004.10934, 2020.

[2] D. Božić-Štulić, Ž. Marušić, and S. Gotovac, "Deep learning approach in aerial imagery for supporting land search and rescue missions," International Journal of Computer Vision, 2019.

[3] G. A. Mills-Tettey, A. Stentz, and M. B. Dias, "The dynamic Hungarian algorithm for the assignment problem with changing costs," Robotics Institute, Pittsburgh, PA, Tech. Rep. CMU-RI-TR-07-27, 2007.


Code written in Python is available here:
