Publishing exact statistical data creates mathematical risks and vulnerabilities that have
only recently been appreciated. In 2010, the U.S. Census Bureau collected
information on more than 308 million residents and published more than 8
billion statistics. Last year a Census Bureau red team performed a simulated
attack against this public dataset and was able to reconstruct all of the
confidential microdata used in these tabulations with very limited error. They
matched 45% of these reconstructed records to commercial datasets acquired
between 2009 and 2011. Of these matches, 38% were confirmed in the original 2010
confidential microdata. These rates represent vulnerability levels more than a
thousand times higher than had been previously considered acceptable. As a
result of this internal test, the Census Bureau has adopted a new privacy
protection methodology called differential privacy to protect the data
publications of the 2020 Census. Differential privacy is based on
systematically adding statistical "noise" to data products prior to
publication. By carefully controlling the method by which the noise is added,
and through the use of advanced post-processing, the Census Bureau is able to
ensure the analytical validity of its statistical publications while protecting
the underlying confidential data on which those publications are based. It is hypothesized
that similar approaches could be used to protect other kinds of data products
that must be shared outside of a trusted community, such as statistical models
and cyber threat intelligence.
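
The noise-addition idea described above can be illustrated with the Laplace mechanism, a textbook way to make a single count differentially private. This is only a minimal sketch under stated assumptions, not the Census Bureau's production method: the function names (`laplace_noise`, `noisy_count`, `post_process`), the block counts, and the privacy-loss value `epsilon = 0.5` are all hypothetical, and the sensitivity of 1 assumes that adding or removing one person changes a count by at most 1. Floating-point sampling like this is also known to be unsafe for real deployments, so the code is for illustration only.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution."""
    # The difference of two independent exponential variables, each with
    # mean `scale`, follows a Laplace distribution centered at zero.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return the count plus Laplace noise calibrated to sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def post_process(value):
    """Round to a non-negative integer so the output looks like a count.

    Post-processing uses only the noisy value, so it does not weaken the
    privacy guarantee of the noisy value it is applied to.
    """
    return max(0, round(value))


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so the illustration is repeatable
    epsilon = 0.5              # hypothetical privacy-loss parameter
    # Hypothetical confidential counts for three small geographic blocks.
    block_counts = {"block A": 12, "block B": 3, "block C": 0}
    for name, count in block_counts.items():
        released = post_process(noisy_count(count, epsilon, rng))
        print(f"{name}: true={count} released={released}")
```

A smaller `epsilon` gives a larger noise scale and therefore stronger protection at the cost of accuracy. Choosing that trade-off, and deciding how to post-process noisy values so published tables stay consistent, is where most of the practical difficulty lies.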