Differential Privacy and You

Posted by Guest on July 12, 2019 in Blog

In yesterday's blog post, we explained the Census Bureau’s legal reasons for protecting your data, how they’ve protected your data in the past, and why they are moving to a new method of data protection: differential privacy.

But how does differential privacy work? There’s a lot of detailed math behind it, which won’t be covered here. Instead, we’ll address the basic principles.

At its most basic, a calculation is considered differentially private if an outside observer cannot reliably tell whether it was run on a database containing your data or on an otherwise identical database without it. This effect is achieved by adding carefully calibrated noise to the results, so that the details of individuals’ records are obscured but the overall patterns across many individuals are preserved. Researchers decide how much noise to add based on sensitivity: a measure of how much a calculation’s result could change if any one individual’s record were removed. Most Census data consists of population counts, which have a sensitivity of 1, because removing one person from the data changes a count by at most one.
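To make this concrete, here is a small sketch (a simplified illustration, not the Census Bureau’s actual implementation) of the standard Laplace mechanism: noise drawn from a Laplace distribution is added to a count, with the noise scale set by the sensitivity divided by the privacy parameter, conventionally written epsilon.

```python
import math
import random

def laplace_sample(scale: float) -> float:
    # Inverse-transform sampling from a Laplace(0, scale) distribution.
    u = random.uniform(-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    # A count query has sensitivity 1: adding or removing one person
    # changes the result by at most one, so the noise scale is
    # sensitivity / epsilon.
    return true_count + laplace_sample(sensitivity / epsilon)

# Two "neighboring" databases that differ by exactly one person produce
# statistically similar noisy counts, so an observer cannot reliably
# tell which one a published statistic came from.
with_you = dp_count(1000, epsilon=0.5)
without_you = dp_count(999, epsilon=0.5)
```

Because the noise averages out to zero, totals computed over many people stay close to the truth even though any single record is hidden.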

Essentially, no single individual meaningfully affects the statistics the Census Bureau reports; only the aggregate behavior of many individuals does. This prevents data reconstruction and reidentification, but it comes at a corresponding cost in accuracy.

Every disclosure avoidance method reduces the accuracy and usability of the data, sometimes in hard-to-detect ways. Unlike traditional methods, however, differential privacy makes this trade-off explicit through a privacy budget: a finite numeric value (conventionally denoted epsilon) representing the balance between privacy and accuracy. Much like your own personal budget, there is only so much you can spend to get the product you want. The more accurate you make your data, the less private it becomes, and vice versa.

Every piece of data that is released consumes a part of the privacy budget. Because differential privacy is quantifiable, unlike traditional methods, policymakers can calibrate where they want to fall on the privacy/accuracy curve based on the needs of the population they serve. Particularly sensitive data can be made more private and less accurate, and less sensitive data can be made less private and more accurate. The Census Bureau’s goal is to strike the right balance and release data that is private enough to be future-proof, but accurate enough to be usable.
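The budget arithmetic above can be sketched in a few lines. The toy tracker below is illustrative only (the class name and interface are hypothetical, not a Census Bureau tool): each release spends part of a fixed total, the per-release amounts add up under sequential composition, and a smaller per-release epsilon buys more privacy at the price of more noise.

```python
class PrivacyBudget:
    """Toy tracker for a total privacy budget (epsilon) that is
    consumed piece by piece as data products are released."""

    def __init__(self, total_epsilon: float):
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon: float) -> float:
        # Sequential composition: the epsilons of individual releases
        # add up, and the total may never be exceeded.
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        # Noise scale for a sensitivity-1 count at this epsilon:
        # smaller epsilon (more privacy) means a larger scale (more noise).
        return 1.0 / epsilon

budget = PrivacyBudget(total_epsilon=1.0)
scale_sensitive = budget.spend(0.1)  # very private, very noisy release
scale_routine = budget.spend(0.8)    # less private, more accurate release
```

Once the budget is exhausted, no further data can be released without weakening the overall privacy guarantee, which is why policymakers must decide in advance which tables deserve the largest shares.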

However, the use of differential privacy is still controversial. Differential privacy as a practice is less than twenty years old, and it has never before been implemented by a government for a project at the scale of the Census. Many researchers who rely on Census data are unconvinced that Census confidentiality is truly threatened by data reconstruction and reidentification attacks, since many people move during the ten years between one Census and the next, changing their physical locations and the makeup of their households.

The Census Bureau is bound by Title 13, a law that requires Census employees to keep data confidential under penalty of a fine of up to $250,000 and up to five years in prison. However, some researchers have argued that differential privacy is based on an excessively strict reinterpretation of Title 13 that prevents revealing a person’s characteristics, not just their identity. The Census Bureau, on the other hand, has stated that while Title 13 allows for the release of data containing any number of personal characteristics, it is impermissible to release that data if it can be used to actually identify individuals through reidentification.

Perhaps the biggest concern, however, is that differential privacy will affect our level of access to a variety of detailed data. Currently, some loss of access seems inevitable - the Census Bureau does not yet have solutions for applying differential privacy to tables representing questions with many response options, like detailed race/ethnicity data, or to data generated by connecting multiple tables, like the count of children under eighteen by which relatives (if any) they live with. Among data researchers and public policy workers, there are concerns that the restrictions differential privacy brings to the Census and the American Community Survey will render many scientific and public policy research projects impossible to complete. The data that differential privacy may restrict is used by everyone from local school districts to national policy advocates, and its loss could affect important work: educators studying trends in the living situations of low-income children, and minority advocacy organizations like the Arab American Institute serving their communities.

The Census Bureau is working with stakeholders, including Census Information Centers and State Data Centers, to determine what the must-have tables are for the 2020 Census data. The Bureau is discussing various solutions to the problems posed by differential privacy, and actively seeking to make as much data as possible publicly available. While the Census Bureau has never yet been threatened by attackers, caution is key; the data the Census collects may not help an attacker steal identities or make money, but the prospect of privacy violations – particularly with the specter of the citizenship question looming large over many Americans – could cause unease.

At this point, we do not know how differential privacy will affect us in practice. While debates about data security and the balance between privacy and accuracy may seem abstract, they are anything but: privacy affects the security of millions of citizens, and accuracy represents the ability of those same citizens to have their needs met. For the next ten years of American life, the data generated by the 2020 Census will have a real, measurable effect on how we live. The Census influences how the federal government distributes billions of dollars, and it enables advocates across the nation to research what our communities need for success.

One thing is sure, though: the 2020 Census will be a landmark moment and a key opportunity for Americans to directly engage with the political process. Make sure your voice is heard - join the YallaCountMeIn campaign today and help promote a fair and accurate count of Arab Americans!

All posts in this series are guest authored by Summer 2019 Ph.D. Fellow Emma Drobina. 
