Seeing the Forest from the Trees: A Novel Deep Learning-Driven Aggregate Embedding for Group-Level Analysis of Public Health Data


Since the COMPASS dataset initiative began, its large body of health information on high school students across Canada has been used to investigate many important research questions, and the findings have guided many decisions made by policy makers [1]. However, traditional statistical methods require researchers to select specific data points to include in an analysis, so unexpected relationships and connections across the study's 280 data points may be missed. In addition, most analysis is done on a per-student basis, while policies are often implemented at the school level; understanding behaviours across a school's whole population can therefore make findings easier for school decision makers to interpret. Motivated by these goals, this study introduces a novel deep learning-driven aggregate embedding method that derives group-level representations for individual schools from student-level survey responses, building on the architecture introduced in Variational Autoencoders [2]. The study aims to produce a method that allows new patterns to be identified in the COMPASS data and whose embedded representations can be applied in future analysis.