Self-supervised representation learning is fundamental to modern
machine learning; however, existing approaches often rely on conventional random image augmentations. This study introduces a
paradigm shift, moving from traditional random augmentations to a
contextually aware approach to feature-space generation. We propose a novel augmentation technique that leverages object detection
to capture spatial relationships between objects and enrich feature representations.
We modify the SimCLR approach by integrating object detection, enabling the model to focus on relevant objects and their relationships.
Our experiments demonstrate that this augmentation yields semantically meaningful and contextually relevant feature representations.
Additionally, we employ the enriched feature space for multi-object detection, showcasing its versatility. Although the observed improvements are modest, the
strategic integration of object detection points to its potential to strengthen self-supervised methods. Our work underlines the significance
of contextual augmentation in self-supervised learning, paving the
way for improved performance on downstream tasks and presenting promising directions
for future research.
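
The following is a minimal sketch of how object-detection-guided view generation could be plugged into a SimCLR-style pipeline. The choice of a pretrained Faster R-CNN detector, the score threshold, and the single-box crop policy are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch: object-detection-guided positive pairs for a
# SimCLR-style contrastive setup. Detector choice, threshold, and crop
# policy are illustrative assumptions.
import torch
import torchvision
from torchvision import transforms as T

# Pretrained detector used only to propose object regions (assumption).
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

# Standard SimCLR-style appearance augmentations applied after cropping.
simclr_aug = T.Compose([
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
    T.RandomGrayscale(p=0.2),
    T.Resize((224, 224)),
])

@torch.no_grad()
def object_guided_views(image, score_thresh=0.7):
    """Return two augmented views cropped around a detected object.

    `image` is a 3xHxW float tensor in [0, 1]. Falls back to the full
    image when no confident detection is found.
    """
    preds = detector([image])[0]
    keep = preds["scores"] > score_thresh
    boxes = preds["boxes"][keep]

    if len(boxes) > 0:
        # Pick one confident object region (here simply the first kept box).
        x1, y1, x2, y2 = boxes[0].int().tolist()
        crop = image[:, y1:y2, x1:x2]
    else:
        crop = image

    # Two stochastic views of the same object-centred crop form a positive pair.
    return simclr_aug(crop), simclr_aug(crop)
```

The two returned views would replace the purely random crops usually fed to the SimCLR contrastive loss, so that positive pairs are anchored on object regions rather than arbitrary image patches.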