Robots require innovative, intelligent systems to interact effectively with humans, potentially several at once. A combination of specialised systems may be necessary to understand and respond properly to a variety of human behaviours. This extended abstract presents relevant imaging systems, including visual voice activity detection, gaze estimation, and identification of the angular positions of humans relative to the robot. We show that video data alone provides a framework for interacting with humans, a capability of high importance in multi-modal robotics systems. Data collection, processing, and initial results for these algorithms are presented.