Anonymity not guaranteed: Identity of personal genomic DNA revealed by Web search

A few decades ago, people might have looked at you funny if you asked them to publicly share the intimate details of their personal lives—where they live, their age, what they had for dinner a few nights ago, photos of their children and more. However, between Facebook, Google, LinkedIn and the rest, it’s almost a trivial matter to find out people’s private details today. And soon, a new study suggests, your entire genome could get added to that list of personal information so easily found online—whether you want it or not.

“The issue is the current status of privacy,” says Yaniv Erlich, a geneticist at the Whitehead Institute for Biomedical Research in Cambridge, Massachusetts, who led the research. “We need [sponsors of genomic studies] to be respectful to participants, to tell them the truth: that someone can identify you.”

To unmask genomic data that had seemingly been stripped of identifying information, Erlich and his team focused on the Y-chromosome, which is typically passed from father to son along with the surname. Genetic ancestry services such as FamilyTreeDNA and Ancestry.com allow customers to trace their paternal genealogy through an analysis of a series of genetic markers on the Y-chromosome known as short tandem repeats (Y-STRs). Many of these companies also make their large databases of Y-STRs, with accompanying surnames and built-in search engines, freely available to the public. Because demographic information, including year of birth and state of residency, is often included in published scientific reports and can also be linked to surname records on sites such as PeopleFinders.com or USApeople-search.com, it proved relatively straightforward for Erlich and his colleagues to narrow the identity of DNA contributors down to small lists of likely suspects.
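The two-step logic described above (match a Y-STR profile to a surname, then filter public records by birth year and state) can be illustrated with a minimal Python sketch. This is not the researchers' actual procedure: the databases, names, repeat counts, mismatch threshold and function names below are all invented for illustration, and a real analysis would need many more markers and a proper statistical measure of confidence.

```python
from collections import Counter

# Hypothetical toy genealogy database: (surname, {marker: repeat count}).
# The marker names are real Y-STR loci, but every value here is invented.
GENEALOGY_DB = [
    ("Smith", {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}),
    ("Smith", {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 13}),
    ("Jones", {"DYS19": 15, "DYS390": 23, "DYS391": 10, "DYS393": 12}),
    ("Brown", {"DYS19": 14, "DYS390": 25, "DYS391": 11, "DYS393": 14}),
]

# Hypothetical public records of the kind a people-search site might hold.
PUBLIC_RECORDS = [
    {"name": "Adam Smith", "surname": "Smith", "birth_year": 1950, "state": "UT"},
    {"name": "Ben Smith", "surname": "Smith", "birth_year": 1950, "state": "CA"},
    {"name": "Carl Smith", "surname": "Smith", "birth_year": 1972, "state": "UT"},
    {"name": "Dan Jones", "surname": "Jones", "birth_year": 1950, "state": "UT"},
]


def count_mismatches(query, reference):
    """Count shared markers whose repeat counts differ (None if none are shared)."""
    shared = query.keys() & reference.keys()
    if not shared:
        return None
    return sum(query[marker] != reference[marker] for marker in shared)


def infer_surnames(query, database, max_mismatches=1):
    """Tally the surnames of database profiles that closely match the query."""
    votes = Counter()
    for surname, profile in database:
        mismatches = count_mismatches(query, profile)
        if mismatches is not None and mismatches <= max_mismatches:
            votes[surname] += 1
    return votes


def narrow_candidates(surname, birth_year, state, records):
    """Keep only the records that match the surname and published demographics."""
    return [
        record["name"]
        for record in records
        if record["surname"] == surname
        and record["birth_year"] == birth_year
        and record["state"] == state
    ]


# Y-STR profile read from an "anonymous" genome (invented values).
query_profile = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}

votes = infer_surnames(query_profile, GENEALOGY_DB)
best_surname = votes.most_common(1)[0][0]
print(best_surname)  # Smith
print(narrow_candidates(best_surname, 1950, "UT", PUBLIC_RECORDS))  # ['Adam Smith']
```

Even in this toy setting, the surname alone leaves three candidates; it is the combination with year of birth and state that reduces the list to one person.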

As an example, they tested their procedure on 10 ‘anonymous’ personal genomes, taken from the 1000 Genomes Project and the European Nucleotide Archive. They recovered surnames for half of these men with a high probability of accuracy. After an internet search, they identified not only the individuals to whom the genomes belonged, but their entire family trees. The findings were published today in Science.

Other research teams have previously developed methods for identifying participants from large public DNA databases, but their techniques have proven considerably more complex. In 2008, for example, a group from the Translational Genomics Research Institute in Phoenix, Arizona, used thousands of DNA probes in a microarray to prove that single nucleotide polymorphisms (SNPs), or single letter changes in the genetic code, could be used to identify individuals in genomic databases. Last year, another group at the Mount Sinai School of Medicine in New York generated a unique ‘SNP barcode’ from publicly available RNA data using a sophisticated statistical method, which could reveal individuals in genetic studies.

Neither study prompted major changes in privacy policy at the US National Institutes of Health. The ease with which the method devised by Erlich’s team can pinpoint identities with nothing more than an internet search engine is potentially more troubling, and it opens a new realm of legal and ethical questions for the public.

The Genetic Information Nondiscrimination Act prevents insurers and employers in the US from discriminating based on DNA information, but few other legislative restrictions exist over the use of genetic data. There are already social consequences to publicly sharing personal information—even faces, which reveal details on ancestry, disease, family and mood—for advertising, shopping, law enforcement, news media and more. “With DNA, we’re just beginning to turn that page,” says George Church, a geneticist at Harvard Medical School in Boston.

Church believes that DNA could soon become an area of opportunity for businesses, the government and anyone with imagination. He envisions scenarios in which people could scrutinize a potential date’s genome for psychiatric disorders, or law enforcement agencies could obtain names and family trees from forensic information.

For now, both Erlich and Church believe it’s important to raise awareness among individuals who participate in genomic studies thinking their identities will remain anonymous. “I don’t want to say there are only risks. There are some, some that we don’t even know how to anticipate, but there are many benefits,” says Erlich. “In the long run, we want to promote legislation that will make it harder to misuse the data.”

