Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video

Abstract

Object detection is considered one of the most challenging problems in the field of computer vision, as it involves the combination of object classification and object localization within a scene. Recently, deep neural networks (DNNs) have been demonstrated to achieve superior object detection performance compared to other approaches, with YOLOv2 (an improved You Only Look Once model) being one of the state-of-the-art DNN-based object detection methods in terms of both speed and accuracy. Although YOLOv2 can achieve real-time performance on a powerful GPU, it remains very challenging to leverage this approach for real-time object detection in video on embedded computing devices with limited computational power and memory. In this paper, we propose Fast YOLO, a fast You Only Look Once framework that accelerates YOLOv2 to perform object detection in video on embedded devices in a real-time manner. First, we leverage the evolutionary deep intelligence framework to evolve the YOLOv2 network architecture and produce an optimized architecture (referred to here as O-YOLOv2) that has 2.8X fewer parameters with just a 2% IOU drop. To further reduce power consumption on embedded devices while maintaining performance, a motion-adaptive inference method is introduced into the proposed Fast YOLO framework to reduce the frequency of deep inference with O-YOLOv2 based on temporal motion characteristics. Experimental results show that the proposed Fast YOLO framework can reduce the number of deep inferences by an average of 38.13% and achieve an average speedup of 3.3X for object detection in video compared to the original YOLOv2, enabling Fast YOLO to run at an average of 18 FPS on an Nvidia Jetson TX1 embedded system.
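The motion-adaptive inference idea can be illustrated with a minimal sketch. The sketch below assumes simple frame differencing as the motion measure; the names detect_video, deep_detector, and MOTION_THRESHOLD are hypothetical, and the paper's actual motion-adaptive module may compute a richer motion representation than a single mean-difference score.

```python
import cv2
import numpy as np

# Hypothetical threshold on the mean absolute pixel difference between
# frames; the abstract does not specify one, so this is an illustrative
# assumption.
MOTION_THRESHOLD = 8.0

def detect_video(frames, deep_detector):
    """Motion-adaptive inference: run the deep detector (e.g., O-YOLOv2)
    only when the scene has changed enough since the last inference;
    otherwise reuse the most recent detections."""
    ref_gray = None
    last_detections = []
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if ref_gray is None or np.mean(cv2.absdiff(gray, ref_gray)) > MOTION_THRESHOLD:
            # Significant temporal motion: run deep inference and
            # update the reference frame.
            last_detections = deep_detector(frame)
            ref_gray = gray
        # Low motion: skip deep inference and keep previous detections.
        yield last_detections
```

Skipping deep inference on low-motion frames in this manner is what yields the reported average 38.13% reduction in deep inferences, which compounds with the smaller O-YOLOv2 architecture to produce the overall 3.3X speedup.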
