The increasing use of artificial intelligence in healthcare has created a strong need for large amounts of high-quality data, while privacy and ethical regulations continue to limit data access. Synthetic data offers a practical solution by allowing data to be shared and analysed without exposing information about real individuals. However, its adoption in healthcare has been slow due to uncertainty about how to assess data quality and privacy, as well as limited integration into real-world data systems.
This thesis explores how synthetic tabular data can be generated, evaluated, and safely used in healthcare settings. It shows that synthetic data can closely resemble real-world data and remain useful for analysis while significantly reducing privacy risks. The work also demonstrates how synthetic data can be incorporated into secure data-sharing infrastructures, supporting responsible data use while protecting sensitive information. Overall, the results highlight synthetic data as a valuable tool for enabling privacy-preserving analytics in healthcare, while pointing to the need for further development to ensure robust and widespread real-world adoption.
