Transactions on
Data Privacy
Foundations and Technologies

http://www.tdp.cat



Volume 7 Issue 3


PeGS: Perturbed Gibbs Samplers that Generate Privacy-Compliant Synthetic Data

Yubin Park(a),(*), Joydeep Ghosh(a)

Transactions on Data Privacy 7:3 (2014) 253 - 282

(a) Department of Electrical and Computer Engineering, The University of Texas at Austin, USA.


Abstract

This paper proposes a categorical data synthesizer algorithm that guarantees a quantifiable disclosure risk. Our algorithm, named Perturbed Gibbs Sampler (PeGS), can handle high-dimensional categorical data that are intractable if represented as contingency tables. PeGS involves three intuitive steps: 1) disintegration, 2) noise injection, and 3) synthesis. We first disintegrate the original data into building blocks that (approximately) capture essential statistical characteristics of the original data. This process is efficiently implemented using feature hashing and non-parametric distribution approximation. In the next step, an optimal amount of noise is injected into the estimated statistical building blocks to guarantee differential privacy or l-diversity. Finally, synthetic samples are drawn using a Gibbs sampler approach. California Patient Discharge data are used to demonstrate statistical properties of the proposed synthetic methodology. Marginal and conditional distributions as well as regression coefficients obtained from the synthesized data are compared to those obtained from the original data. Intruder scenarios are simulated to evaluate disclosure risks of the synthesized data from multiple angles. Limitations and extensions of the proposed algorithm are also discussed.
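The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the CRC32 bucketing, the additive pseudo-count `alpha` used as the "noise", the random starting row, and the default parameter values are all assumptions made for illustration. In particular, the sketch does not reproduce the paper's calibration of the noise level to a differential privacy or l-diversity guarantee.

```python
import random
import zlib
from collections import defaultdict


def bucket(row, skip, n_buckets):
    """Hash every feature except `skip` into one of n_buckets keys (hypothetical scheme)."""
    rest = tuple(v for j, v in enumerate(row) if j != skip)
    return zlib.crc32(repr(rest).encode()) % n_buckets


def disintegrate(data, n_buckets):
    """Step 1: for each feature i, count values of x_i per hashed bucket of the other features."""
    n_features = len(data[0])
    blocks = [defaultdict(lambda: defaultdict(int)) for _ in range(n_features)]
    for row in data:
        for i in range(n_features):
            blocks[i][bucket(row, i, n_buckets)][row[i]] += 1
    return blocks


def perturbed_conditional(blocks, domains, row, i, n_buckets, alpha):
    """Step 2: add a pseudo-count alpha to every category, then normalise.

    Assumption: additive smoothing stands in for the paper's noise injection.
    """
    counts = blocks[i].get(bucket(row, i, n_buckets), {})
    weights = [counts.get(v, 0) + alpha for v in domains[i]]
    total = sum(weights)
    return [w / total for w in weights]


def synthesize(data, domains, n_samples, n_buckets=64, alpha=1.0, sweeps=5, seed=0):
    """Step 3: draw synthetic rows by Gibbs sweeps over the perturbed conditionals."""
    rng = random.Random(seed)
    blocks = disintegrate(data, n_buckets)
    out = []
    for _ in range(n_samples):
        # Assumption: start each chain from a uniformly random row.
        row = [rng.choice(dom) for dom in domains]
        for _ in range(sweeps):
            for i in range(len(domains)):
                probs = perturbed_conditional(blocks, domains, row, i, n_buckets, alpha)
                row[i] = rng.choices(domains[i], weights=probs)[0]
        out.append(tuple(row))
    return out


# Toy usage with made-up categorical records.
toy_domains = [["a", "b"], ["x", "y", "z"]]
toy_data = [("a", "x"), ("a", "y"), ("b", "z"), ("a", "x"), ("b", "z")]
toy_synthetic = synthesize(toy_data, toy_domains, n_samples=10)
```

Because every category receives at least `alpha` pseudo-counts, no conditional probability in this sketch is ever zero. A larger `alpha` pushes the conditionals toward uniform, trading statistical fidelity for less dependence on any single original record.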

* Corresponding author.


ISSN: 1888-5063; ISSN (Digital): 2013-1631; D.L.:B-11873-2008; Web Site: http://www.tdp.cat/
Contact: Transactions on Data Privacy; Vicenç Torra; Umeå University; 90187 Umeå (Sweden); e-mail:tdp@tdp.cat
Note: TDP's web site does not use cookies. TDP does not keep information on IP addresses or browsers. See the web site for the privacy policy.

 


Vicenç Torra, Last modified: 10:37, June 27, 2015.