IIC Journal of Innovation 15th Edition | Page 9

Physical Distancing and Crowd Density Monitoring
Our solution uses the Tiny-Yolo-V2 [4] deep learning model for person detection. We chose Tiny-Yolo-V2 because it has fewer layers, and thus fewer parameters, than Yolo-V2, which allows fast inference on the edge server, an important requirement for real-time deployment. To train the model, we annotated 5,000 video frames with a bounding box around each person.
The original Tiny-Yolo-V2 architecture uses a combination of convolution and max-pooling layers to generate object detection feature maps of size 13 × 13 from an input image of size 416 × 416. We modified the original architecture to generate feature maps over a finer grid of size 26 × 26, so that the model can detect people who appear small in the video frame. After training the model for several epochs and validating its person detection performance on a holdout dataset, we deployed the trained model on the edge server to detect people in each frame of the input video stream.
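The relationship between input size and grid size described above can be illustrated with a little arithmetic. The sketch below assumes that each downsampling step is a stride-2 layer that halves the width and height, so five such steps take 416 to 13 and four take 416 to 26. The function names are hypothetical, and removing one downsampling step is only one possible way to obtain a 26 × 26 grid; the article does not state how the architecture was actually modified.

```python
# Illustrative sketch only: helper names are hypothetical, and the assumption
# that the finer grid comes from one fewer stride-2 step is ours, not the article's.

def grid_side(input_size: int, num_stride2_layers: int) -> int:
    """Side length of the feature map after repeated 2x downsampling."""
    side = input_size
    for _ in range(num_stride2_layers):
        side //= 2  # each stride-2 layer halves width and height
    return side


def grid_summary(input_size: int, num_stride2_layers: int) -> dict:
    """Grid side, total number of cells, and input pixels covered per cell side."""
    side = grid_side(input_size, num_stride2_layers)
    return {
        "grid_side": side,
        "num_cells": side * side,
        "pixels_per_cell": input_size // side,
    }


if __name__ == "__main__":
    print(grid_summary(416, 5))  # coarse grid, as in the original architecture
    print(grid_summary(416, 4))  # finer grid, under the stated assumption
```

Under these assumptions, each cell of the 13 × 13 grid covers a 32 × 32 pixel patch of the input, while each cell of the 26 × 26 grid covers a 16 × 16 patch, which is why the finer grid is better suited to people who occupy only a few pixels in the frame.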
[4] Redmon J, Farhadi A. YOLO9000: Better, faster, stronger. Proc - 30th IEEE Conf Comput Vis Pattern Recognition, CVPR 2017. 2017;2017-Janua:6517-6525. doi:10.1109/CVPR.2017.690.
- 4 - November 2020