
Disentangling Shape and Orientation with Affine Variational Autoencoders


Is it possible to disentangle an object's orientation from its shape? In this work we create compressed representations of an object by disentangling its orientation and shape with a variational autoencoder augmented with affine transform layers. Even when trained on randomly oriented data, shape and orientation are disentangled during training while the model learns to encode objects at a fixed orientation. We show that this process results in a more compressed latent representation for 2D digits on the MNIST dataset and for 3D objects on the ModelNet dataset.
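
To make the idea of separating orientation from shape concrete, the following is a minimal sketch and not the model described in this work. It assumes a 2D point-set shape and a pure rotation as the affine transform; the names `rotation_affine`, `apply_affine`, and the rectangle data are hypothetical and chosen only for illustration. It shows that an observed, rotated shape can be mapped back to a fixed canonical pose once the rotation parameter is known, so only the canonical shape would need to be encoded.

```python
import numpy as np

def rotation_affine(theta):
    """Return a 3x3 homogeneous matrix for a 2D rotation by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def apply_affine(points, matrix):
    """Apply a 3x3 homogeneous affine matrix to an (N, 2) array of points."""
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.hstack([points, ones])   # (N, 3)
    return (homogeneous @ matrix.T)[:, :2]    # back to (N, 2)

# Hypothetical shape: corners of a rectangle in its canonical pose.
canonical = np.array([[-1.0, -0.5],
                      [ 1.0, -0.5],
                      [ 1.0,  0.5],
                      [-1.0,  0.5]])

# Orientation is a single parameter, kept separate from the shape itself.
theta = np.pi / 3
observed = apply_affine(canonical, rotation_affine(theta))

# Undoing the rotation returns the shape to its canonical pose.
recovered = apply_affine(observed, rotation_affine(-theta))
```

In this sketch the angle is given; in a learned setting it would have to be inferred from the data, which the sketch does not attempt.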