
A Hyperparameter Tuning Framework for Tabular Synthetic Data Generation Methods


This paper examines synthetic tabular data as a viable alternative to real data, one that retains the essential information without compromising confidential information about individuals. Our experiments use the Conditional Tabular Generative Adversarial Network (CTGAN) model to generate the synthetic data. Because healthcare data are intricate and demand precision, the study emphasizes hyperparameter optimization of the generation process. By tuning these hyperparameters, we aim to improve the authenticity and relevance of the synthesized data, bringing it closer to real-world datasets. With the goal of improving data availability in healthcare research, we study different objective functions and the correlations among them to identify the combination of hyperparameters that yields the highest-quality synthetic data.
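As a rough illustration of the workflow the abstract describes, the sketch below runs a random search over a small hyperparameter grid, records several objective scores per trial, and then measures how strongly two objectives agree. It is a minimal sketch under stated assumptions, not the paper's framework: the search space, the objective names `marginal_similarity` and `downstream_utility`, and the scoring function `toy_train_and_score` are all hypothetical, and random search is swapped in for whatever tuning strategy the paper uses. No CTGAN model is trained here; the toy scorer only stands in for "train a generator, then evaluate its samples".

```python
import math
import random

# Hypothetical search space. The names resemble common GAN settings,
# but neither the names nor the values are taken from the paper.
SEARCH_SPACE = {
    "epochs": [100, 300, 500],
    "batch_size": [100, 500, 1000],
    "generator_lr": [1e-4, 2e-4, 1e-3],
}


def sample_config(rng):
    """Draw one configuration uniformly at random from the grid."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def toy_train_and_score(config, rng):
    """Stand-in for training a generator and scoring its synthetic data.

    The scores are made up: they depend on the configuration plus noise,
    purely so the search loop has something to optimize.
    """
    base = config["epochs"] / 500 - abs(config["generator_lr"] - 2e-4) * 500
    return {
        "marginal_similarity": base + rng.gauss(0, 0.05),
        "downstream_utility": 0.8 * base + rng.gauss(0, 0.05),
    }


def random_search(train_and_score, n_trials, seed=0):
    """Evaluate n_trials random configurations; return (config, scores) pairs."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        config = sample_config(rng)
        trials.append((config, train_and_score(config, rng)))
    return trials


def best_config(trials, objective):
    """Return the configuration with the highest score on one objective."""
    return max(trials, key=lambda trial: trial[1][objective])[0]


def objective_correlation(trials, first, second):
    """Pearson correlation between two objectives across all trials.

    Raises ZeroDivisionError if either objective is constant across trials.
    """
    xs = [scores[first] for _, scores in trials]
    ys = [scores[second] for _, scores in trials]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    trials = random_search(toy_train_and_score, n_trials=30, seed=1)
    print(best_config(trials, "marginal_similarity"))
    print(objective_correlation(trials, "marginal_similarity", "downstream_utility"))
```

In a real study, `toy_train_and_score` would be replaced by a function that fits the generator with the given configuration and evaluates its samples against held-out real data. A high correlation between two objectives suggests that tuning for one also tends to improve the other, while a low or negative correlation signals a trade-off.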