Transactions on
Data Privacy
Foundations and Technologies

http://www.tdp.cat


Volume 13 Issue 1


The Impact of Synthetic Data Generation on Data Utility with Application to the 1991 UK Samples of Anonymised Records

Jennifer Taub(a),(*), Mark Elliot(a), Joseph W. Sakshaug(b)

Transactions on Data Privacy 13:1 (2020) 1 - 23

(a) Cathie Marsh Institute, The University of Manchester, Manchester, UK.

(b) Institute for Employment Research, Ludwig Maximilian University of Munich, and University of Mannheim, Germany.

e-mail: jennifer.taub@manchester.ac.uk; mark.elliot@manchester.ac.uk; joe.sakshaug@iab.de


Abstract

Synthetic data generation has been proposed as a flexible alternative to more traditional statistical disclosure control (SDC) methods for minimising disclosure risk. However, a barrier to the use of synthetic data is the uncertainty about the reliability and validity of the results that are derived from these data. Surprisingly, there has been a relative dearth of research on how to measure the utility of synthetic data. Utility measures developed to date have been either information theoretic abstractions or somewhat arbitrary collations of statistics, and replication of previously published results has been rare. In this paper, we adopt a methodology previously used by Purdam and Elliot (2007), in which they replicated published analyses using disclosure-controlled versions of the same microdata used in said analyses and then evaluated the impact of disclosure control on the analytic outcomes. We utilise the same studies as Purdam and Elliot, based on the 1991 UK Samples of Anonymised Records, to facilitate comparisons of synthetic data utility between different utility metrics.

* Corresponding author.


ISSN: 1888-5063; ISSN (Digital): 2013-1631; D.L.:B-11873-2008; Web Site: http://www.tdp.cat/
Contact: Transactions on Data Privacy; Vicenç Torra; Umeå University; 90187 Umeå (Sweden); e-mail:tdp@tdp.cat

 


Vicenç Torra, Last modified: 00:08 May 19 2020.