Optical Flow-Enhanced Thermal UAV Detection Under Camera Ego-Motion for Real-Time Tactical C-UAV
Abstract
Thermal UAV detection from mobile platforms is difficult because camera ego-motion corrupts the motion cues needed to detect small airborne targets. We present an optical-flow-enhanced YOLO detector that fuses thermal appearance with dense horizontal and vertical optical-flow channels through a custom OpticalFlowConv stem. On the Anti-UAV thermal benchmark (233,667 frames across 205 sequences), using a sequential per-sequence split that preserves temporal order, the proposed detector reaches 35.8% mAP50 and 21.9% mAP50-95. It outperforms a single-frame YOLO11 baseline by 11.3 mAP50 points and a frame-differencing baseline by 9.1 mAP50 points. These results support motion-enhanced thermal detection as a practical sensing component for mobile tactical counter-UAV (C-UAV) systems operating under significant camera motion.
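The abstract does not specify how the thermal frame and the two flow channels are prepared before they reach the OpticalFlowConv stem. The following is a minimal sketch of one plausible input layout, not the authors' pipeline. It assumes a dense flow field of shape (H, W, 2), in pixels, has already been computed by some flow estimator. It also assumes that thermal intensities are min-max scaled to [0, 1], that flow components are clipped to a symmetric bound, and that the three channels are stacked channel-first. The names `fuse_thermal_and_flow`, `frame_difference`, `normalize_channel` and the `max_mag` bound are hypothetical. The absolute frame difference shows the kind of motion cue a frame-differencing baseline relies on. Unlike flow, it carries no direction and is dominated by ego-motion when the camera moves.

```python
import numpy as np


def normalize_channel(x, lo=None, hi=None):
    """Min-max scale a 2-D array to [0, 1]; a constant array maps to zeros."""
    x = np.asarray(x).astype(np.float32)
    lo = float(x.min()) if lo is None else lo
    hi = float(x.max()) if hi is None else hi
    if hi - lo < 1e-8:
        return np.zeros_like(x)
    return np.clip((x - lo) / (hi - lo), 0.0, 1.0)


def fuse_thermal_and_flow(thermal, flow, max_mag=20.0):
    """Stack a thermal frame with horizontal (u) and vertical (v) flow.

    thermal: (H, W) raw thermal intensities.
    flow:    (H, W, 2) per-pixel displacement in pixels; [..., 0] = u, [..., 1] = v.
    max_mag: hypothetical clipping bound; displacements in [-max_mag, max_mag]
             are mapped linearly to [0, 1], so zero motion becomes 0.5.
    Returns a (3, H, W) float32 array ordered [thermal, u, v].
    """
    thermal = np.asarray(thermal)
    flow = np.asarray(flow)
    if flow.ndim != 3 or flow.shape[-1] != 2 or thermal.shape != flow.shape[:2]:
        raise ValueError("expected thermal (H, W) and flow (H, W, 2)")
    t = normalize_channel(thermal)
    u = normalize_channel(flow[..., 0], -max_mag, max_mag)
    v = normalize_channel(flow[..., 1], -max_mag, max_mag)
    return np.stack([t, u, v], axis=0)


def frame_difference(prev, curr):
    """Absolute difference of consecutive frames, scaled to [0, 1].

    Encodes only *that* intensity changed, not the direction of motion.
    """
    diff = np.abs(np.asarray(curr, np.float32) - np.asarray(prev, np.float32))
    return normalize_channel(diff)


if __name__ == "__main__":
    thermal = np.arange(12, dtype=np.float32).reshape(3, 4)
    flow = np.zeros((3, 4, 2), dtype=np.float32)   # a static scene
    fused = fuse_thermal_and_flow(thermal, flow)
    print(fused.shape)  # (3, 3, 4): channels, height, width
```

Mapping zero displacement to 0.5 keeps the sign of the motion, so the stem can tell left from right and up from down. A difference image cannot make that distinction.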
Author Biography
Bob Maser
I am a researcher working at the intersection of computer vision and machine learning. My work spans both classical vision and modern deep-learning methods, with a strong focus on event-based perception, optical-flow modeling, multi-object tracking, and signal-based defect analysis for industrial applications. I have published as a first author and as a co-author with colleagues and supervisors. My current research includes: event-based computer vision (motion estimation, ego-motion compensation, and event-pillar feature extraction for high-speed tracking); multi-object tracking (drone and flying-object detection using asynchronous event streams, optical-flow residual analysis, and hybrid RGB–event fusion); optical flow and motion analysis (RAFT-based flow estimation, rotational homography rectification, and learned flow-subspace projection); image and texture segmentation (satellite image segmentation, sealed-surface detection, and classical and deep segmentation pipelines); residual-based texture classification (statistical feature engineering and ML classifiers under compression and noise); and deep learning for welding signal analysis (current–voltage waveform interpretation, defect detection, signal segmentation, and reinforcement-learning models for predictive current estimation). Across these projects I work extensively with modern deep-learning frameworks, large-scale datasets, GPU-accelerated pipelines, and advanced visualization systems, primarily within the Vision and Image Processing (VIP) Lab and the Centre for Advanced Materials Joining (CAMJ).
John Zelek
John Zelek is a Professor and co-director of the Vision and Image Processing (VIP) Lab. He formerly served as Associate Graduate Chair of Systems Design Engineering from 2013 to 2017.
Professor Zelek’s current research interests include autonomous robotic mapping and localization, 3D scene understanding, assessment of man-made infrastructure (e.g., roads, buildings, bridges), understanding of eye images (fundus, OCT) for disease detection, learning 3D models from single views, and athletic sport tracking and biomechanical understanding of play and ability from video feeds, among others. Some of these projects use AI and deep-learning techniques.
Professor Zelek’s past interests have included assistive devices, social engineering, haptics, and robot navigation, among others. Professor Zelek co-founded two startup companies: Tactile Sight and Sweep3D. Tactile Sight commercialized a haptic navigation device for people with cognitive (e.g., dementia) or perceptual (e.g., blindness) disabilities. Sweep3D commercialized technology that produces 3D models by sweeping a camera around objects or spaces, for applications including clothes fitting, orthotics, and remote exploration of real estate premises. Professor Zelek also sits on the advisory boards of Intelligent Health Solutions Inc. and EyeCheck.
