Generating Synthetic Geolocation Data using Conditional Tabular Generative Adversarial Networks: Issues and Challenges


The advent of generative adversarial networks (GANs) has opened a new frontier for generating synthetic datasets that share the statistical properties of the real data from which they are generated. Real data typically mixes several variable types, which complicates synthesis because specialized GANs are needed to handle such heterogeneous data. In this context, the Conditional Tabular GAN (CTGAN) can be very useful, as it has been found to handle various types of non-spatial data successfully. However, the use of CTGAN for generating spatial or geolocation data remains largely unexplored. This study uses the Traffic Collisions Open Data from the Toronto Police Service to demonstrate the challenges involved in modeling geolocation data with CTGAN and reports the potential limitations of this deep learning data synthesizer in generating synthetic datasets with substantial geolocation and spatial components.