Synthetic Data Generation
From Foundations to Frontier
Welcome to Synthetic Data Generation: A Comprehensive Guide. This free, open digital book takes you from core statistical concepts through cutting-edge generative models, privacy-preserving techniques, and real-world applications.
Whether you're a data scientist augmenting small datasets, a privacy engineer navigating GDPR constraints, or a researcher exploring generative AI, this book provides the theory, code, and context to generate high-quality synthetic data responsibly.
At a Glance
What You'll Learn
Statistical sampling, deep generative models, LLM-based text synthesis, tabular and time-series generation, multimodal synthesis, differential privacy, evaluation metrics, and production governance patterns.
Authors
Written by Mohammad Khalil and Sam Urmian.
How to Read
Use the sidebar to navigate. Start from Chapter 1 for a structured path, or jump directly to the topic you need. Most technical chapters include runnable code and implementation guidance.
License & Contact
Book content is licensed under CC BY 4.0, with attribution to Mohammad Khalil and Sam Urmian. Code is licensed under the MIT License. Contact the authors through the repository issue tracker.