
MonolithNet: Training monolithic deep neural networks via a partitioned training strategy


In this study, we explore how to train monolithic deep neural networks effectively. One of the biggest challenges in training such networks to the desired level of accuracy is the difficulty of converging to a good solution with iterative optimization methods such as stochastic gradient descent, owing to the enormous number of parameters that must be learned. To address this challenge, we introduce a partitioned training strategy in which proxy layers are connected to different partitions of a deep neural network, enabling a much smaller number of parameters to be trained in isolation to convergence. To illustrate the efficacy of this training strategy, we introduce MonolithNet, a massive residual deep neural network comprising 437 million parameters. The trained MonolithNet achieved a top-1 accuracy of 97% on the CIFAR-10 image classification dataset, demonstrating the feasibility of the proposed strategy for training monolithic deep neural networks to high accuracy.
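The abstract's core idea, training each partition of a large network in isolation by attaching a temporary proxy output layer to it, can be sketched in miniature. The code below is a hypothetical illustration, not the paper's implementation: it uses a toy two-partition linear network trained stage-wise with NumPy, where `train_partition`, the stage structure, and all dimensions are assumptions introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 64, 8, 16  # toy sizes: samples, input dim, partition width

# Toy regression data standing in for a real training set.
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=(d, 1))

def train_partition(inp, target, w, head, steps=1000, lr=0.05):
    """Jointly train one partition (w) and its proxy head by plain
    gradient descent on mean-squared error (a stand-in for SGD)."""
    for _ in range(steps):
        hid = inp @ w
        err = hid @ head - target
        g_head = hid.T @ err * (2.0 / len(inp))
        g_w = inp.T @ (err @ head.T) * (2.0 / len(inp))
        head -= lr * g_head
        w -= lr * g_w
    return w, head

# Stage 1: train partition 1 to convergence through a throwaway proxy head,
# so only this partition's (small) parameter set is being optimized.
W1 = rng.normal(size=(d, h)) * 0.1
P1 = rng.normal(size=(h, 1)) * 0.1
W1, P1 = train_partition(X, y, W1, P1)

# Stage 2: freeze partition 1, discard its proxy head, and train
# partition 2 plus the final head on partition 1's frozen outputs.
H1 = X @ W1
W2 = rng.normal(size=(h, h)) * 0.1
P2 = rng.normal(size=(h, 1)) * 0.1
W2, P2 = train_partition(H1, y, W2, P2)

final_loss = np.mean((H1 @ W2 @ P2 - y) ** 2)
```

At no point does a single optimization step touch all parameters at once; each stage fits only one partition and its (proxy or final) head, which is the isolation property the abstract describes.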