Modout: Learning to Fuse Modalities via Stochastic Regularization

Fan Li; Natalia Neverova; Christian Wolf; Graham Taylor

doi:10.15353/vsnl.v2i1.103

Vol. 2 No. 1 (2016)

Published 2016-10-03

How to Cite

Li, F., Neverova, N., Wolf, C., & Taylor, G. (2016). Modout: Learning to Fuse Modalities via Stochastic Regularization. Journal of Computational Vision and Imaging Systems, 2(1). https://doi.org/10.15353/vsnl.v2i1.103

Abstract

Model selection methods based on stochastic regularization, such
as Dropout, have been widely used in deep learning because of their
simplicity and effectiveness. The standard Dropout method treats
all units, visible or hidden, in the same way, and so ignores any
a priori information about grouping or structure. Such structure is
present in multi-modal learning applications, where subsets of units
may correspond to individual modalities. In this abstract we describe
Modout, a model selection method based on stochastic regularization
that is particularly useful in the multi-modal setting.
Unlike previous methods, it can learn whether, and when, to fuse
two modalities in a layer. Evaluated on the Montalbano gesture
recognition dataset, Modout improves performance over other
stochastic regularization methods and performs on par with a
carefully designed state-of-the-art fusion architecture.
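
To make the contrast with standard Dropout concrete, the sketch below compares a per-unit mask with a mask that respects modality groups. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the masking mechanism, so the function names, the block-wise Bernoulli draw, and the fixed `fuse_prob` parameter are all hypothetical. In particular, the abstract says Modout learns whether to fuse modalities, whereas this sketch holds the fusion probability fixed.

```python
import numpy as np


def unit_dropout_mask(n_units, keep_prob, rng):
    """Standard Dropout: keep each unit independently, ignoring modality groups."""
    return (rng.random(n_units) < keep_prob).astype(np.float64)


def block_connection_mask(in_groups, out_groups, fuse_prob, rng):
    """Hypothetical modality-aware mask over a weight matrix (an assumption).

    in_groups / out_groups give an integer modality label for every input /
    output unit. Connections within one modality are always kept. All
    connections between a given pair of different modalities are kept or
    dropped together, with probability fuse_prob (fixed here, not learned).
    """
    in_groups = np.asarray(in_groups)
    out_groups = np.asarray(out_groups)
    n_mod = int(max(in_groups.max(), out_groups.max())) + 1
    # One Bernoulli draw per (output modality, input modality) pair.
    block_keep = (rng.random((n_mod, n_mod)) < fuse_prob).astype(np.float64)
    np.fill_diagonal(block_keep, 1.0)  # same-modality connections always stay
    # Expand the per-block decisions to a (n_out, n_in) mask.
    return block_keep[out_groups[:, None], in_groups[None, :]]


rng = np.random.default_rng(0)
in_groups = [0, 0, 0, 1, 1]   # three units of modality 0, two of modality 1
out_groups = [0, 0, 1, 1]     # two output units per modality

mask = block_connection_mask(in_groups, out_groups, fuse_prob=0.5, rng=rng)
W = rng.standard_normal((4, 5))
x = rng.standard_normal(5)
h = (W * mask) @ x            # masked forward pass for one layer
```

With `fuse_prob=0.0` the layer keeps the two modalities fully separate, and with `fuse_prob=1.0` it is an ordinary fully connected layer; intermediate values sample between those two architectures on each call.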
