How Unique are You?

     Try it

     Samples

About this Service

This service reports how unique your demographics may be using Census data.

Birthdate (month, day, and year of birth), gender, and 5-digit postal code (ZIP) together uniquely identify most people in the United States. Surprised? Perhaps at first, but a quick calculation helps: 365 days in a year x 100 years x 2 genders = 73,000 possible combinations, and because most postal codes have fewer than 73,000 residents, the surprise fades. If you are still not convinced, consider that there are more than 32,000 5-digit ZIP codes in the United States, so 73,000 x 32,000 gives more than 2 billion possible combinations, but there are only about 310 million people in the United States. In 1997, Latanya Sweeney did this kind of uniform calculation on populations reported in the U.S. Census for age groups in each postal code and summed the results to predict that at most 87 percent of the U.S. population had unique combinations [1].
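The arithmetic in the paragraph above can be checked with a few lines of Python. This is only a sketch that uses the rounded figures quoted in the text; the constant names are ours and are not part of the service.

```python
# Rounded figures quoted in the text; the constant names are illustrative.
DAYS_PER_YEAR = 365
BIRTH_YEARS = 100
GENDERS = 2
ZIP_CODES = 32_000            # "more than 32,000" 5-digit ZIP codes
US_POPULATION = 310_000_000   # roughly 310 million people

# Possible (birthdate, gender) combinations within a single ZIP code.
combos_per_zip = DAYS_PER_YEAR * BIRTH_YEARS * GENDERS

# Possible (birthdate, gender, ZIP) combinations nationwide.
total_combos = combos_per_zip * ZIP_CODES

print(combos_per_zip)                # 73000
print(total_combos)                  # 2336000000
print(total_combos > US_POPULATION)  # True
```

There are several times more possible combinations than people, which is why so many combinations end up with one person or none.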

Of course, the percentage drops as you give less specific information. Below is a figure from Sweeney's report that shows how the maximum percent of unique combinations may drop as you move from date of birth to age, and from 5-digit ZIP code to county. Notice that even knowing only the county, age, and gender can make some people unique. They are few, and they tend to live in remote locations, but the number is not 0.

This service allows an individual to see how unique his or her demographics may be using the latest Census data [2]. For example, the Census data reports 4 males of age 20 living in ZIP code 01008. A birthdate can fall on any of the 365 days in a year, so at most all 4 can be unique based on date of birth, gender, and 5-digit ZIP code. On the other hand, the Census data reports 1523 males residing in ZIP code 01003. Assuming all birthdays are equally likely and evenly distributed, about 4 people (1523 / 365 ≈ 4.2) would share the same date of birth, and no one would be expected to be uniquely identified.
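The two ZIP code examples above can be reproduced with a short Python sketch. The function names and the all-or-nothing rule (everyone in a group may be unique if the group is no larger than the number of possible birthdays, otherwise no one is expected to be) are our simplified reading of the paragraph, not necessarily the exact computation the service performs.

```python
DAYS_PER_YEAR = 365


def people_per_birthday(group_size, days=DAYS_PER_YEAR):
    """Average number of people sharing a birthday if birthdays are spread evenly."""
    return group_size / days


def max_unique_under_uniform_assumption(group_size, days=DAYS_PER_YEAR):
    """Simplified rule (an assumption): if the group fits within the available
    birthdays, everyone may be unique; otherwise no one is expected to be."""
    return group_size if group_size <= days else 0


# Group sizes quoted in the text for ZIP codes 01008 and 01003.
for zip_code, group_size in [("01008", 4), ("01003", 1523)]:
    shared = round(people_per_birthday(group_size), 2)
    unique = max_unique_under_uniform_assumption(group_size)
    print(zip_code, shared, unique)
# 01008 0.01 4
# 01003 4.17 0
```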

Why is this a problem? Often people share data about you with your name, address, and Social Security number removed, but your demographics may remain. If your demographics are unique in the general population, then they will be unique everywhere they are recorded. So if information about you is in one dataset without your name, other information about you is in another dataset with your name, and both datasets contain your demographics, the two datasets can be linked together by matching your demographics.
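The linking described above amounts to a join on the shared demographic fields. The following Python sketch shows the idea; every record, name, and field name in it is invented for illustration and does not come from any real dataset.

```python
# Hypothetical records: all values and field names are invented for illustration.
deidentified = [
    {"birthdate": "1970-03-14", "gender": "F", "zip": "01008", "diagnosis": "asthma"},
    {"birthdate": "1985-11-02", "gender": "M", "zip": "01003", "diagnosis": "flu"},
]
identified = [
    {"name": "Alice Example", "birthdate": "1970-03-14", "gender": "F", "zip": "01008"},
    {"name": "Bob Example", "birthdate": "1990-01-01", "gender": "M", "zip": "01003"},
]

KEY_FIELDS = ("birthdate", "gender", "zip")


def demographic_key(record):
    """Build the (birthdate, gender, ZIP) tuple used to match records."""
    return tuple(record[field] for field in KEY_FIELDS)


def link_on_demographics(anonymous, named):
    """Attach a name to each anonymous record whose demographics match
    exactly one named record."""
    by_key = {}
    for person in named:
        by_key.setdefault(demographic_key(person), []).append(person)

    linked = []
    for record in anonymous:
        matches = by_key.get(demographic_key(record), [])
        if len(matches) == 1:  # an ambiguous or missing match yields no link
            linked.append({**record, "name": matches[0]["name"]})
    return linked


linked = link_on_demographics(deidentified, identified)
print(linked)
```

Only the first anonymous record is linked here, because its demographics match exactly one named record. The second has no match, so it stays unnamed.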

In 1997, Sweeney showed how demographics appearing in medical data that did not include patient names can be linked to registries of people (e.g., voter lists) to restore name and contact information to the medical data [3]. Her earliest example was identifying the medical information of William Weld, former governor of Massachusetts, using just his date of birth, gender, and ZIP code, which appeared in a voter list. Numerous experiments have been done since then. Most recently, members of the Data Privacy Lab linked demographics found in publicly available health and genomic profiles in the Personal Genome Project to voter lists and other public information to put names to the profiles, and the results were 84-97% accurate for those profiles for which names were predicted [4].

References

[1] Simple Demographics Often Identify People Uniquely

[2] American Fact Finder. U.S. Bureau of the Census, 2010.

[3] Sharing Medical Data

[4] Re-identification, Personal Genome Project

Contact

Email the project leader, Latanya Sweeney, at latanya@mit.edu or follow @LatanyaSweeney on Twitter.


Copyright © 2013. President and Fellows Harvard University. | IQSS | Data Privacy Lab