Modout: Learning to Fuse Modalities via Stochastic Regularization
Abstract
Model selection methods based on stochastic regularization such
as Dropout have been widely used in deep learning due to their
simplicity and effectiveness. The standard Dropout method treats
all units, visible or hidden, in the same way, thus ignoring any a priori
information related to grouping or structure. Such structure is
present in multi-modal learning applications, where subsets of units
may correspond to individual modalities. In this abstract we describe
Modout, a model selection method based on stochastic regularization,
which is particularly useful in the multi-modal setting.
Different from previous methods, it is capable of learning whether
or when to fuse two modalities in a layer. Evaluated on the Montalbano
gesture recognition dataset, Modout improves performance over other
stochastic regularization methods and is on par with a state-of-the-art,
carefully designed fusion architecture.
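As a rough illustration of the underlying idea (not the authors' exact algorithm), modality-wise stochastic regularization can be sketched as sampling one Bernoulli mask per modality instead of per unit, as standard Dropout does. In the sketch below the keep probabilities are hypothetical fixed values, whereas Modout learns such probabilities during training; the modality names and shapes are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def modality_dropout(features, keep_probs, training=True):
    """Drop entire modalities at random: a simplified, modality-wise
    analogue of Dropout. (Modout additionally *learns* the keep/fuse
    probabilities; here they are fixed for illustration.)

    features:   dict mapping modality name -> activation array
    keep_probs: dict mapping modality name -> probability of keeping it
    """
    if not training:
        return features  # at test time, all modalities are used
    out = {}
    for name, x in features.items():
        p = keep_probs[name]
        keep = rng.random() < p  # one Bernoulli draw per modality
        # inverted-dropout scaling keeps expected activations unchanged
        out[name] = x * (1.0 / p) if keep else np.zeros_like(x)
    return out

# Hypothetical two-modality example (e.g. skeleton and depth streams)
feats = {"skeleton": np.ones((4, 8)), "depth": np.ones((4, 8))}
masked = modality_dropout(feats, {"skeleton": 0.8, "depth": 0.8})
```

Because a whole modality is zeroed at once, downstream fusion layers are trained to cope with either modality being absent, which is the structural prior that per-unit Dropout ignores.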