Understanding BatchNorm in Ternary Training
Abstract
Neural networks are composed of two components: weights and
activation functions. Ternary weight neural networks (TNNs) achieve
good performance and offer up to a 16x compression ratio. TNNs
are difficult to train without BatchNorm, yet no prior study has
clarified the role of BatchNorm in a ternary network. Building
on a study of binary networks, we show how BatchNorm helps
resolve the exploding-gradients issue.
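To make the compression claim concrete, below is a minimal NumPy sketch of one common threshold-based ternarization scheme; the 0.7 * mean(|W|) threshold is an assumption for illustration and is not necessarily the scheme studied in this paper. Storing each ternary weight in 2 bits instead of a 32-bit float yields the up-to-16x ratio.

    # Minimal sketch of threshold-based weight ternarization.
    # The 0.7 * mean(|W|) threshold is an assumed heuristic,
    # not necessarily the scheme used in this paper.
    import numpy as np

    def ternarize(w: np.ndarray) -> np.ndarray:
        """Map full-precision weights to {-1, 0, +1} by thresholding."""
        delta = 0.7 * np.mean(np.abs(w))  # assumed threshold heuristic
        t = np.zeros_like(w)
        t[w > delta] = 1.0
        t[w < -delta] = -1.0
        return t

    w = np.random.randn(4, 4).astype(np.float32)
    print(ternarize(w))
    # Each ternary weight needs only 2 bits versus 32 bits for
    # float32, which is where the up-to-16x compression comes from.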