Evaluation Methods for Synthetic Data in Pursuit of Open Data

Bing Hu; Mohammad Ahmed Basri; Abu Yousuf Md Abdullah; Shu-Feng Tsao; Zahid Butt; Helen Chen

doi:10.15353/jcvis.v9i1.10008

Vol. 9 No. 1 (2023)
Special Issue: Proceedings of CVIS 2023




Published 2024-04-30


How to Cite

Hu, B., Basri, M. A., Abdullah, A. Y. M., Tsao, S.-F., Butt, Z., & Chen, H. (2024). Evaluation Methods for Synthetic Data in Pursuit of Open Data. Journal of Computational Vision and Imaging Systems, 9(1), 30–33. https://doi.org/10.15353/jcvis.v9i1.10008


Abstract

Access to real data containing sensitive or personal information often requires lengthy approval processes and is subject to stringent restrictions. Synthetic data that is generated from the real data, resembles it, and follows FAIR standards is a promising route to open data for administrative data. Although progress has been made in establishing accepted evaluations for synthetic data models, key holistic metrics that help policymakers decide on open data initiatives are still missing. In this paper, we introduce and demonstrate three metrics: a privacy measure, the identity disclosure risk assessment (IDR); a quantitative univariate measure of distributional similarity, the Hellinger distance (HD); and a quantitative bivariate measure, the differential pairwise correlation (DPC). Including these privacy, univariate, and bivariate metrics in standard synthetic data evaluation can help policymakers better understand and use synthetic data models and methods in pursuit of open data.
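
As a rough illustration of the univariate and bivariate metrics named in the abstract, the sketch below computes a Hellinger distance between the category frequencies of a real and a synthetic column, and a pairwise correlation difference between two numeric tables. The abstract does not give formulas, so the details are assumptions: HD uses the standard discrete definition, and DPC is taken here to be the mean absolute difference between the Pearson correlations of each column pair, which may differ from the paper's exact definition. The function names are hypothetical, and IDR is omitted because its definition cannot be inferred from the abstract.

```python
from collections import Counter

import numpy as np


def hellinger_distance(real, synth):
    """Hellinger distance between the category frequencies of two samples.

    Standard discrete definition: 0 for identical distributions,
    1 for distributions with no category in common.
    """
    real_counts, synth_counts = Counter(real), Counter(synth)
    categories = sorted(set(real_counts) | set(synth_counts))
    p = np.array([real_counts[c] for c in categories], dtype=float)
    q = np.array([synth_counts[c] for c in categories], dtype=float)
    p /= p.sum()
    q /= q.sum()
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def differential_pairwise_correlation(real, synth):
    """Mean absolute difference in Pearson correlation over column pairs.

    Assumption: this averaging is one plausible reading of DPC, not
    necessarily the definition used in the paper.
    Inputs are 2-D arrays with rows as records and columns as variables.
    """
    real_corr = np.corrcoef(real, rowvar=False)
    synth_corr = np.corrcoef(synth, rowvar=False)
    upper = np.triu_indices_from(real_corr, k=1)  # each pair once, no diagonal
    return float(np.mean(np.abs(real_corr[upper] - synth_corr[upper])))


if __name__ == "__main__":
    # Toy values for illustration only; not data from the paper.
    print(hellinger_distance(["a", "a", "b"], ["a", "b", "b"]))
    real = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
    synth = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
    print(differential_pairwise_correlation(real, synth))
```

Under these assumptions, lower values of both quantities indicate that the synthetic data more closely reproduces the real data's marginal distributions and pairwise relationships.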
