TechBlog: Mike Goodstadt: A circuitous route to bioinformatics

Mike Goodstadt (2)

{credit}CNAG-CRG{/credit}

Most coders come to bioinformatics by one of two routes. They’re either biologists skilled in programming, or programmers with an interest in biology. Mike Goodstadt, the programmer behind the genome-visualization tool TADkit, took a different approach.

In the early-to-mid 1990s, Goodstadt was a student at the University of Bath in the UK. His course of study: Architecture.

25 years of Nature Genetics

 

This April marks the 25th anniversary of the first issue of Nature Genetics, and I think it’s safe to say that the field of genetics has come quite a long way. In 1992, we were still nearly a decade away from the draft human genome sequence, “omics” was not yet a word in common usage, and CRISPR/Cas9 gene editing wasn’t even a pipe dream.

Most of the content in our current issue would have possibly seemed like far-fetched science fiction to geneticists in 1992. Take for instance the new-and-improved domestic goat genome assembly reported on page 643 of this issue, for which multiple, relatively new technologies were employed to create one of the most complete and contiguous genome assemblies to date. However, as the News & Views by Kim Worley exemplifies, science marches on. While the geneticists of the past might have marveled at the possibility of a whole-genome shotgun assembly (indeed, a major advance reported in that first issue was a new technology allowing for automated sequencing of 106kb), Worley refers to the scientists of the present who are “frustrated with the highly fragmented genome sequences available for most species.”

Still, many things have remained the same.

Taking a look back at the very first editorial published in the journal, much of the journal’s mission in 1992 is still applicable to 2017. Take this passage:

“Researchers should not be dismayed that developments like this are widely reported in the general press. That is merely a measure of the widespread compassionate interest in inheritable disease. Who can be but flattered by such public testimony to the importance of a field of research?

“The research community’s interest, rather, is that there should also be a wide general understanding that the identification of an aberrant gene does not imply that there is a cure at hand for the condition for which it is responsible. […] The elucidation of the mechanisms by which genes determine the behaviour of the cells that carry them will be a general preoccupation in the years ahead. Nature Genetics intends to play its part in the publication of this important research, and also of course, in classical genetics that throws light on the human genome.”

NG1992

{credit}doi:10.1038/ng0492-1{/credit}

While there is no denying that important medical advances have been enabled by the identification of disease genes, it is still painfully true that simply finding the gene does not directly lead to a cure on its own. Thus, both the identification of new disease-causing genetic alterations and studies that bring new mechanistic understanding of how a given mutation gives rise to disease are still core to the journal’s scope and aims.

The focus of the journal, as can be seen from this first editorial, was very much on human genetics at the beginning. Model organisms were considered just that, models for human biology. One of the major changes in the journal since that time has been our expansion to genetics (and genomics) more broadly, as represented by the many reference genomes and population genetics studies published for other organisms.

Too many landmarks to count

The editorial published in this month’s issue highlights a few selected articles from among our more than 5,000 research publications over the years. These are obviously a restricted set of examples, and they are by no means the “best” papers, as such a ranking system would be ill-advised and ultimately useless. But the papers selected cover a wide range (though not all) of the sub-fields represented by the journal. This list includes landmark papers in human genome mapping (Kong et al. 2002) and cataloging of genetic variation (Iafrate et al. 2004); statistical methods that helped drive an entire field of research (Price et al. 2006); Mendelian disease gene discoveries that shed new light on biological mechanisms (Amir et al. 1999); key advances in the field of epigenetics (Heintzman et al. 2007); and advances in crop plant improvement (Ren et al. 2005).

We invite you to take a trip down memory lane and revisit these and other landmark papers from our archives. As a part of the celebration of 25 years of Nature Genetics, the editors will be blogging throughout April to highlight some of our past content.

A brief history of Nature Genetics

Nature Genetics was launched as the first of the Nature Research journals (if we ignore the very brief existence of Nature New Biology and Nature Physical Science in the early 1970s and the earlier version of Nature Biotechnology, Bio/Technology, published first in 1983).

While the history of genetics as a field is far more interesting than the history of a single journal, the occasion of our 25th anniversary has us thinking about our roots. For our 15th anniversary, founding editor Kevin Davies contributed a guest editorial telling the story of how Nature Genetics came about. I highly recommend that you check it out, if you haven’t seen it before.

Another feature of our 15th birthday celebration was the Question of the year. What would you do if the $1,000 genome were a reality today? To read the nearly 50 replies we received from leaders in the field, see the Question of the Year special here: https://go.nature.com/2mTMKBf.

The next 25 years

Just as researchers in 1992 would have been very unlikely to predict the many breakthroughs that have occurred in genetics over the past 25 years, we have no idea where the next 25 years will take us. The goals will remain the same: to elucidate the mechanisms by which the genetic material produces the many phenotypic variations we see in nature and to identify the causes (and, more hopefully, cures) for human genetic disease.

That said, let’s take a stab at looking toward the future. What do you think will be the next major breakthrough in genetics? What will the field of genetics look like in another 25 years? Tell us below in the comments.

25 years from now, I hope to still be watching as geneticists make some of the greatest discoveries in biology. And I am confident that Nature Genetics will be there, playing its small role in announcing those discoveries to the world.

 

Escape gene name-mangling with ‘Escape Excel’

It’s been nearly a decade since Eric Welsh first noticed some weirdness with Microsoft Excel. A senior staff scientist in the Cancer Informatics Core at the H. Lee Moffitt Cancer Center and Research Institute in Tampa, Florida, Welsh was using Microsoft’s venerable spreadsheet application to view mouse and human gene expression data, the better to sort and understand the numbers. But a quick glance revealed the import hadn’t gone exactly as planned. “Excel would screw them up every time,” he says.

How so? When data are imported into Excel, the program works hard to figure out what kind of value each cell holds. Most of the time, Excel is smart enough to do that correctly, and values like ‘BRCA1’ and ‘12345’ are converted into text and integers, as expected. But “Excel is a little too smart for its own good,” Welsh says. If a cell reads “SEPT7,” the program assumes the author meant to write a date, and converts it automatically. It also sometimes translates what appear to be numbers in scientific notation – say, ‘2310009E13’ – into actual scientific notation (‘2.31E+13’). The problem is, those two terms are neither dates nor numbers – they are proper names, scientifically speaking: gene names, sample identifiers or accession numbers. And by autoconverting them, those names are lost, or at least, obscured.
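To make the failure mode concrete, here is a minimal Python sketch that flags identifiers likely to be mangled before a file is ever opened in a spreadsheet. It is not how Escape Excel works, and the two patterns are rough assumptions about what looks like a date or like scientific notation, not a reproduction of Excel's actual type-inference rules. The function names `conversion_risk` and `flag_risky` are hypothetical.

```python
import re

# Heuristic patterns only (an assumption for illustration);
# Excel's real parsing rules are broader and are not reproduced here.
MONTH_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?\d{1,2}$",
    re.IGNORECASE,
)
SCI_LIKE = re.compile(r"^\d+(\.\d+)?E[+-]?\d+$", re.IGNORECASE)


def conversion_risk(identifier):
    """Return 'date', 'scientific', or None for a single identifier."""
    value = identifier.strip()
    if MONTH_LIKE.match(value):
        return "date"
    if SCI_LIKE.match(value):
        return "scientific"
    return None


def flag_risky(identifiers):
    """Map each at-risk identifier to the kind of conversion it might suffer."""
    flagged = {}
    for name in identifiers:
        risk = conversion_risk(name)
        if risk is not None:
            flagged[name] = risk
    return flagged


if __name__ == "__main__":
    print(flag_risky(["BRCA1", "SEPT7", "12345", "2310009E13"]))
```

Run on the examples from the paragraph above, the sketch flags ‘SEPT7’ as date-like and ‘2310009E13’ as scientific-notation-like, and leaves ‘BRCA1’ and ‘12345’ alone. A check like this only warns; protecting the values still means importing the affected columns as text.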

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-5-80


Farm to Genomes: African Rice

Meyer et al., Nature Genetics, 2016


Rice is one of the most important crops on the planet, responsible for feeding billions of people. Given this global significance, studying rice in different geographies can be useful and aid in harnessing genetic diversity underlying particular traits and adaptations favorable to different environments. African rice (Oryza glaberrima Steud.) is mainly grown in sub-Saharan Africa and known for its stress tolerance. In a new article this week in Nature Genetics, Michael Purugganan and colleagues report the whole genome re-sequencing of 93 African rice landraces from various regions of Western coastal and sub-Saharan Africa. They create a genome-wide SNP map and through comparative genomic analysis study the domestication and population history of African rice. They use their map to perform GWAS for salt tolerance and find 11 significantly associated regions, highlighting the value of this unique genetic resource.

Meyer et al., Nature Genetics, 2016


By studying various regions with distinct environments, the authors were able to get clues about adaptation and geographic spread of the populations. They focused on coastal Senegal and inland Togo, which have higher and lower levels of soil salinity, respectively, and interviewed farmers in the region to understand the agricultural practices they employ in each region. The knowledge of the farmers helped to inform the genetic analysis and contributed to the model of African rice domestication and dispersal.

You can watch some of the interviews with the farmers here:

African rice farmers- interviews

Additionally, we spoke with authors Michael Purugganan and Rachel Meyer to get some background on this research.

Why do you think that rice is understudied in Africa compared to other places?

MP: I think it’s because it is not widely grown, unlike its Asian counterpart which has pretty much taken over the world.  But there definitely is more interest in African rice as breeders are trying to figure out how to increase food production in Africa, as well as to try to see what genes in African rice can be used to improve Asian rice.

RM: There is a lot of great research on improving Asian rice for African farmers that is being done by brilliant AfricaRice scientists, and they are working hard on the social science side too. But there are so many challenges that Africa disproportionately faces – particularly climate variation – that demand ramping up rice research. There is insufficient support for programs that integrate crop experiments and trials into the different farmlands. A better connection between scientists and small-scale farmers would really help farmers adopt new varieties too, because there is sometimes resistance to trying new ones.

How did you choose which samples to include in your analysis?

RM: Recognizing that a lot of NGO work encouraging farmers to grow Asian rice ramped up in the 80’s and 90’s, we took advantage of the germplasm largely donated in the 70’s to the West Africa Rice Development Association, which were duplicated and available through IRRI (International Rice Research Institute). We chose accessions with the most metadata available, preferring ones with georeferenced location and a cultivar name. It wasn’t until later that we realized water tables far inland were high in salinity, so we just tried to make sure we had a fair number of samples within 250km of the coast, or along rivers connecting to the ocean.

Were you surprised by any of your findings?

MP: There definitely were a few surprises in the data, but the big revelation for me was the long time for the population bottleneck that led to domestication.  We found from the genomic data that it may have taken more than 10,000 years of steady population decline before full-blown domesticated African rice shows up in the archaeological record.  This suggests the possibility that humans were already cultivating or managing its ancestor for thousands of years, and I think if this pattern holds for other domesticated crop species it will change our thinking on how domestication has taken place.

RM: I was surprised we got nice GWAS results with so few samples, and even more surprised that we saw several of those exhibiting signatures of geographic selection. We were lucky to find a broad distribution of traits in the landraces we chose to sequence, for we had made the DNA libraries ahead of the phenotyping experiments.

What was it like to meet and talk with the farmers?

RM: It was one of the highlights of my life to meet the farmers! I’m grateful to have gotten a glimpse of their heritage, their pride, and their struggles. We were all so impressed with the generosity of women, in particular, to help each other. We were also shocked by how many farms are run by the elderly; their children don’t see farming as profitable and many have left. For the three of us in the field, it made us think hard about how we can give back to the communities that gave us their time. I hope that crop science, publicity (like this blog) and policy changes can raise the profile of the small-scale farmer.

In each interview, the farmers also had a chance to interview us, and that part was especially interesting. Several asked really good questions about African and Asian rice domestication. You could see the cultural value of the basic science.

You chose to focus on salinity tolerance as a trait particularly relevant to farming in Africa.  In what ways do you see your results being used for crop improvement?

RM: One of the authors, from AfricaRice, Dr. Kofi Bimpong, had actually been working on salt tolerance separately as well, and has two graduate students collecting African rice landraces in Casamance. If from this paper we can consider that domestication possibly occurred in the Inner Niger Delta region and also in the West, then these collecting efforts are all the more important because they are from a center of origin, promising more genetic variation than people would have ever estimated. If you look through the available germplasm there is so little that has been collected or studied from Casamance. It’s tricky collecting there, for there is social unrest, and landmines. Hats off to the young graduate students, Mamadou Sock and Bathe Diop, doing that fieldwork; I’m sure there is a lot of discovery to be made with those collections, and more promising salt tolerant landraces to integrate into breeding programs.

In addition, our results suggesting many of the salt tolerance genes are shared in both rice species make them more valuable to explore in other crops.  Shared adaptive mechanisms are especially fascinating to evolutionary biologists and are powerful assets of the breeder’s toolbox.

June issue cover: What’s going on here?

Carrot canang sari by Rachel Meyer


As June comes to a close, it’s time to look back at our June issue and ask “what’s going on here?” with the cover image. As you may have guessed, the image is related to the publication of the carrot genome sequence in this month’s issue.

The cover image was provided by Rachel Meyer, a scientist who was not a co-author of the genome paper. Dr. Meyer was previously a postdoctoral researcher with Michael Purugganan at NYU and is an AAAS Science and Technology Policy fellow. She is also a co-founder of Shoots & Roots in New York.

Dr. Meyer gave us the following information about the carrot canang sari on the June cover:

Celebrating the recent availability of rainbow carrots year-round in Washington DC, I cut them in various ways and laid them out in a public dirt plot between the sidewalk and the street that was still bare because Spring had barely started and planting was far from beginning. The cold kept the carrots nicely preserved for three days. The installation took about 6 hours, and the design itself was lifted from a Persian carpet, sharing an origin with some of the earliest domesticated carrots. I had no intention to leave the installation there but people in the busy U-street/Shaw district, coming home late at night from the bars, would stop and photograph it, and even some of the suits interrupted their morning power walks to work to investigate it. After a few days, to my surprise it was not rats, but a middle-aged man who had decimated the carrots for a meal.

Shelby Ellison, an author of the carrot genome article this cover references, did this research as part of her NSF Plant Genome Postdoctoral Fellowship. We were in the same class of Fellows together and became friends because we would look for cool restaurants around DC together during our brief visits to NSF for annual Plant Genome meetings. I’m grateful to be able to explore the subject of her science through installation.

For more about the carrot genome paper, see our previous blog post, featuring Q&A with the corresponding author.

Cancer clones- mixing and spreading


McPherson et al., Nature Genetics 2016

The trajectory of tumor cells during metastasis can be influenced by many factors, including the physical environment and the genetic makeup of metastatic clones. In high-grade serous ovarian cancer, there are limited barriers in the intraperitoneal space, allowing for extensive spreading and mixing of tumor cells. A recent article published in Nature Genetics explores these different patterns of clonal evolution in metastatic ovarian cancer using a combination of bulk and single cell sequencing.

The authors characterized the mutation landscapes of different metastatic tumors and found both monophyletic and polyphyletic clones. While in most patients there was unidirectional seeding from the original ovarian tumor, two patients exhibited polyclonal spread and reseeding. Therefore, high-grade serous ovarian cancer cells can migrate through and establish metastasis within the intraperitoneal space via different evolutionary routes.

McPherson et al., Nature Genetics 2016


We spoke to lead author, Sohrab Shah, to get some background on this research.

What features of this particular cancer made you want to study its metastasis? Were you surprised by your findings?

High grade serous ovarian cancers are often widespread through the peritoneal cavity at diagnosis. We wanted to ask what are the characteristics of cells that spread and what is the distribution of these cells throughout the abdominal lesions. The focus was to study the disease state prior to any treatment to characterize the diversity and take an inventory of the ‘substrate’ of clones upon which treatment selective pressures may be acting. Many patients experience relapse after initial response to treatment. Mapping which clones lead to relapse remains a key question in the field. This was borne out in one patient in our study where specific clones that led to relapses were already present at diagnosis but only represented a minority of branches in the clonal phylogeny.

It is important to note that the mode of spread in this disease differs from most solid cancers, where spread is achieved through the bloodstream or lymphatics.  Ovarian cancer represents a unique opportunity to study disease spread through a relatively physically unencumbered anatomic space.  One might expect that in such an environment the potential for clonal intermixing is high.   This might lead to many clones co-existing at many sites.  But the majority of intraperitoneal samples were clonally pure, suggesting unidirectional spreading from ovary sites with diverse clonal repertoires, and a lack of clonal intermixing.

You provide evidence that the microenvironment influences the metastatic success of tumors. What does this say about in vitro cancer models that don’t account for tissue context?

One of the intriguing findings suggested that specific clones were present in specific sites.  This may indicate that particular microenvironments are differently suited to particular clones. Another surprising finding was that every patient harbored at least one lesion that was very diverse in its clonal make-up (typically within primary ovary sites).  This leads to the natural question of whether properties of specific microenvironments in some way promote or ‘tolerate’ clonal diversity.  If this were the case, then both in vitro and in vivo model systems such as cell lines, organoids and mouse xenografts may not adequately represent the natural disease state we find in patients prior to treatment.

How did you choose your sampling strategy?

The study results are naturally biased by the sampling strategy.  The study design was subject to what material could be obtained during the provision of care.  In our setup, we consented patients for collection and study of all material removed at primary debulking surgery.  Wherever possible tissue was cryopreserved, but inevitably many deposits were preserved in formalin.   Our strategy led to acquisition of a median n=10 samples per patient.  The nature of the samples and their locations are presented in Figure 4 and are also available in interactive web-form at:

https://compbio.bccrc.ca/research/tumour-evolution/

Users can click on the links for each patient and explore the clonal maps.

You utilize both bulk and single cell sequencing as complementary approaches to elucidating tumor evolution. Can you comment on the trade offs between cost and throughput and how you chose your sample sizes?

The field is entering an interesting time.   There are several limitations to both bulk and single cell sequencing strategies to define the clonal constituents of a tumor sample.  Most single cell techniques suffer from vast under-sampling of the clonal repertoire since they are limited in throughput and may only practically yield data from 100s of cells.  Furthermore, single cell techniques are prone to two key experimental sources of noise: missing data and allele-dropout.  We used targeted, multiplexed single cell sequencing as a form of validation from inferences made from the bulk sampling including validating co-occurrence of point mutations and structural variations in the same cells.  Hypotheses were generated from multi-site bulk analysis and were then tested using orthogonal single cell approaches.   Accordingly, the sample sizes in single cell were chosen to identify clones that were detected in bulk samples – in the range of 5% prevalence.  Notably, the noise properties of targeted multiplexed single cell data required some careful statistical treatment, the results of which were published as a standalone contribution in Nature Methods simultaneously with this publication.  As the field moves forward, it may become practical to sequence the whole genomes of 1000s of cells per sample. I look forward to the day when a single experimental design would be sufficient to dissect the important clones present in a cancer.  This would enable studying evolutionary properties at scale, leveraging richly defined principles and statistical models from the field of population genetics.

You find that there are differences in the potential for migration and metastasis across the tumors from your patients. What clinical implications might this have?

Our study is underpowered to provide a clear answer on this.  Our results hint anecdotally that cases with strong patterns of unidirectional spread fared poorly in their treatment trajectories.  Whether cancers harboring clones with strong potential to invade new micro-environments and dominate their local landscapes indicates potential to evade chemotherapy remains an important question to consider.  As we take this study forward in model systems derived from spatially distinct sites, reproducible treatment selection experiments can be carried out to robustly address this question.

 

May issue cover: What’s going on here?

This month’s cover image is inspired by the Article on p. 528 of this issue, by Jeff Wall, Nicola Illing, Nadav Ahituv and colleagues. The paper reports the genome of the bat Miniopterus natalensis and transcriptional dynamics in the developing bat wing. This species, one of a group known as vesper bats, is also known as the Natal long-fingered bat and is found in parts of Africa.

The image chosen for the cover is a frontal view of a bat embryo at a late stage of development (stage CS21) taken by study co-author Mandy Mason. This developmental stage is known as “Translucent Wing”, as you can clearly see the skeletal structures in the wing and the membrane between the outstretched digits. The embryo in this image was stained with Alizarin red (maroon-red-pink) for bone and Alcian blue (blue-cyan) for cartilage. The image was actually taken as part of an earlier study to understand the progression of limb development in this species and to compare it with that of the mouse.

The current study presents not only the genome sequence of the Natal long-fingered bat, but also RNA-seq and ChIP-seq (for H3K27ac and H3K27me3) profiling of the developing limbs. The authors identified more than 7,000 genes that were differentially expressed between the forelimbs—the eventual wings—and the hindlimbs. Through comparative genomics analyses, they found nearly 3,000 regions showing evidence of accelerated evolution along the bat lineage that overlapped with H3K27ac peaks, suggesting that these are candidate enhancer regions for wing development. “This study offers a comprehensive resource for future work in comparative limb development,” co-author Mandy Mason told us. “Aside from the results that we have presented in this paper, these open datasets can be queried to help answer additional questions that may be asked by both our and other research groups.”

 

April issue cover: What’s going on here?

Tlalcacahuatl gold by Erin Dewalt


This month’s cover image is a visual tribute to the peanut and its importance to both the ancient civilizations of the Americas and modern agriculture. The genome sequences of the two progenitor species to the cultivated peanut were published in this month’s issue by David Bertioli and colleagues. The genome sequences are the first step to characterizing the genome of cultivated peanut, which was formed by the hybridization of these two species thousands of years ago. The genome sequences give us valuable clues about the evolution of these species. The authors also identified candidate genes for pest resistance, which could lead to advances in peanut cultivation in the future.

The image was inspired by a gold and silver necklace with beads in the shape of peanuts that was found in the tomb of the Great Lord of Sipan of the ancient Peruvian Moche culture. The necklace (c. 300) is now at the Museo Arqueológico Nacional Brüning in Peru. You can see an image of the necklace here and with more context here. The peanuts in the cover image have the same wavy shape as the beads in the necklace. The speckled texture and symmetric division of gold and silverish-blue in the cover image are also inspired by this ancient artifact.

Erin Dewalt, senior graphic designer for Nature Publishing Group, developed the image concept. She shows the peanuts underground, almost dangling from the plant above like beads. Peanut seeds develop underground after the flowers are fertilized. The ovary develops into a “peg” (gynophore) that drives back down into the soil, where it develops into the fruit that we cultivate as peanuts.


Peanut pegs growing into the soil. The tip of the peg, once buried, swells and develops into a peanut fruit. {credit}H. Zell via Wikimedia Commons{/credit}

The title of the image, Tlalcacahuatl gold, is a reference to the ancient Aztec name for peanut, tlalcacahuatl. But it is also a reference to the wealth represented by the peanut, both for ancient cultures and for modern agriculture. Because peanut plants fix nitrogen, thanks to the symbiotic bacteria in their root nodules, they return nutrients to the soil and improve cultivation of other crops (a fact famously advertised to farmers in the U.S. by George Washington Carver).

Tangential reading: The peanut necklace of the Great Lord of Sipan was almost lost to history forever. As this LA Times article from 1988 reported, grave robbers nearly made off with the treasures of the Lord of Sipan, including the necklace.  

 

December issue cover: What’s going on here?

December

{credit}Sahve Greef & Aurora Lupus{/credit}

This month’s cover image is related to the pineapple genome paper, but is also a celebration of all things genome. The cover art is from a collage produced by young artists Sahve Greef and Aurora Lupus. The image shows a pineapple outline with genome tracks or chromosomes contained within the scales of the outer fruit, all set on a background reminiscent of outer space.

We asked Sahve to give us some insight into the process that led to this design:

I was working on the cover and had a difficult time creating my original concept which would have been genomes shaped like pineapples and then it became a pineapple silhouette with genomes shaped like pineapples inside, pineapple inception! It was becoming too complicated so I was thinking it over. Aurora suggested creating the pineapple’s scales from genome tracks, and we began working together. Originally, I was composing the collage on an orange bristol board, but felt that it made the pineapple appear flat, and disappearing into the background too much. I wanted to create a dynamic image, one that exploded off the cover and made people wonder, “Hey, what’s going on with that crazy pineapple that just punched me in the eyeballs???” I’m amazed by how incredibly small genomes are in relation to just about everything, and it makes me really think about how small we are in the universe.

To see more of Sahve’s art, visit her Facebook and Tumblr sites. Aurora’s art can be found at her Tumblr site.

 


Original artwork (via Aurora Lupus on Instagram){credit}Sahve Greef & Aurora Lupus{/credit}

On the history of pigs

USDA_ARS_Meishan_pig-Cropped

{credit}Agricultural Research Service via Wikipedia{/credit}

Understanding the genomic changes that occurred during the domestication of animals and plants by humans is important on many levels. Such insights can provide information about human history and our interactions with other species, as is the case with genetic studies of dog and cat domestication. These studies can also help us to improve crop plants (such as tomato) and livestock (such as cattle) for human consumption or other use. Finally, genetic studies on domestication can help to identify disease-causing mutations that have been selected for as a by-product of selection for beneficial traits (for example, in cats and dogs).

Though humans have a huge influence on important traits in domesticated species, those species are still responding to natural selection during the domestication process, which in turn may affect traits important for agricultural purposes. Identifying genomic regions influenced by positive natural selection in domesticated animals  can lead to important insights into the biology of specific breeds.

In this respect, the pig is an excellent model to study. Humans domesticated pigs approximately 10,000 years ago in the Near East and China, but a relatively open method of keeping pigs allowed for continued interbreeding with wild boars for some time. In a study published this week in Nature Genetics, Lusheng Huang, Jun Ren and colleagues from Jiangxi Agricultural University sequenced the genomes of 69 diverse domestic and wild pigs in China to better understand their evolutionary history.

Pig sampling in China{credit}Lusheng Huang{/credit}

The study included pigs from 11 diverse breeds (and 3 populations of wild boar) within China in order to compare the adaptations in breeds from cold vs. hot areas. They identified over 700 genomic regions that showed evidence of selective sweeps. Many of the genes in these regions were involved in processes important for regulation of temperature during cold or heat stress, such as hair development, energy metabolism and blood circulation.

However, one of the most striking results was the identification of a large (~14 Mb) sweep region on the X chromosome. More than 94% of the single nucleotide polymorphisms (SNPs) in the 69-pig sample that had extreme allele frequency differences between Northern and Southern populations were located within the X-linked sweep region. All Northern Chinese samples showed a strong signature of selection in this region. Upon further analysis, the authors were able to determine that the most likely scenario, given their data, was that this region was introgressed from a now-extinct species of Sus. This region of the X chromosome undergoes very little recombination. This fact, combined with the strong signal of positive selection in the region, meant the introgressed sequence remained mostly preserved for more than 8 million years.

We asked one of the study’s senior authors, Lusheng Huang, to tell us a little more about the work:

How did you collect the DNA samples from the pigs for your study? Were any of the samples difficult to get?

We collected DNA samples from 4,100 pigs that were consanguineously unrelated within three generations, representing all 68 indigenous breeds that are distributed in 24 provinces of China. It took us four and a half years to complete the sample collection. Some native pigs living in high-altitude regions (Yunnan, Guizhou, Sichuan and Tibet) were very hard to get. Afterwards, we constructed a DNA bank for indigenous pigs from the whole of China. As a pilot study, we first genotyped 520 unrelated pigs (no common ancestor within 3 generations) from 32 Chinese breeds for 60K SNPs on the Illumina porcine beadchip. Then, we selected 69 representative pigs from the 520 pigs according to their genetic relationships in the neighbor-joining tree constructed with the 60K SNP data. The 69 pigs selected for whole-genome sequencing are highly representative of populations at the geographical extremes of China.

{credit}Lusheng Huang{/credit}

Most of the sampled pigs were originally raised in government-sponsored conservation farms. We selected animals to cover a majority of the consanguinity of each breed according to their pedigree information. However, samples of several breeds were collected from isolated villages or farms in rural areas. For example, it was a big challenge for us to collect samples of Tibetan pigs from different geographic populations in the vast region of the Tibet Plateau. To find purebred Tibetan pigs that were not influenced by human-mediated hybridization with exotic breeds, we had to travel to remote pastoral areas at high altitudes and make an in-depth field investigation with the kind help of local residents. To cover the consanguinity of each Tibetan population as broadly as possible, we preferentially collected samples from Tibetan boars, which are usually aggressive like wild boars and were really difficult to get (see above picture).

What do the positively selected regions tell us about the history of pig domestication?

These regions clearly illustrate that pigs have experienced natural selection for local fitness before (ancient event) or after (recent event) domestication. The selection footprints in the pig genomes can be visualized by whole-genome sequencing, and are characterized by reduced heterozygosity, an excess of low-frequency variants, and extended and differentiated haplotypes. The selective sweep regions harbor functional genes that play a role in adaptation to local environments. DCF17 and VPS13A are two such examples highlighted in this study.
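
To make the "reduced heterozygosity" footprint concrete, here is a minimal Python sketch of how one might scan for low-diversity windows. It is not the method used in the study: the function names, the window size, the threshold and the toy allele frequencies are all assumptions chosen for illustration, and it uses the textbook expected heterozygosity 2p(1-p) at biallelic SNPs.

```python
from typing import Dict, List, Tuple


def expected_heterozygosity(alt_freq: float) -> float:
    """Expected heterozygosity 2p(1-p) at a biallelic site with allele frequency p."""
    return 2.0 * alt_freq * (1.0 - alt_freq)


def windowed_heterozygosity(
    sites: List[Tuple[int, float]], window_size: int
) -> List[Tuple[int, float, int]]:
    """Average per-SNP heterozygosity in non-overlapping windows.

    sites: (position, alternate-allele frequency) pairs (hypothetical input format).
    Returns (window_start, mean_heterozygosity, snp_count), sorted by window_start,
    for every window that contains at least one SNP.
    """
    sums: Dict[int, Tuple[float, int]] = {}
    for pos, freq in sites:
        start = (pos // window_size) * window_size
        total, count = sums.get(start, (0.0, 0))
        sums[start] = (total + expected_heterozygosity(freq), count + 1)
    return [(start, total / count, count) for start, (total, count) in sorted(sums.items())]


def low_diversity_windows(
    windows: List[Tuple[int, float, int]], threshold: float
) -> List[int]:
    """Start positions of windows whose mean heterozygosity falls below the threshold."""
    return [start for start, het, _ in windows if het < threshold]


# Toy data (invented): the window starting at 1000 has nearly fixed alleles.
toy_sites = [(100, 0.5), (400, 0.4), (1200, 0.02), (1700, 0.01), (2300, 0.5)]
toy_windows = windowed_heterozygosity(toy_sites, window_size=1000)
print(low_diversity_windows(toy_windows, threshold=0.1))
```

A real scan would work from genotype calls across many individuals and would combine several signals (allele frequency spectrum, haplotype length, population differentiation) rather than rely on a single threshold.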

What do you think was the most unexpected result in this study? Did you believe it at first?

The extremely divergent haplotype in the X-linked sweep region between Southern and Northern Chinese pigs, an indication of a possible ancient interspecies introgression event, was the most unexpected result in this study. It is a big surprise. Frankly speaking, we did not believe it at first.

The pattern of haplotype sharing in diverse populations. The haplotypes were reconstructed for each individual using all of the variants on the X chromosome. Alleles that are identical to or different from the ones in the Wuzhishan reference genome are indicated by red and blue, respectively. Adapted from Fig. 4a in Huashui Ai et al. 2014{credit}Nature Genetics{/credit}

Why is the finding of a large introgression region on the X chromosome important?

Although evidence of adaptive evolution driven by introgression from archaic species has recently been identified in some species, including humans, the X-linked introgression region shows that adaptive introgression is not limited to closely related species, but that in some cases introgression from very divergent species can provide the basis for the evolution of radically new traits in a species. This radical example of so-called ‘reticulate evolution’ in mammals shakes the foundation of most modern evolutionary biology and provides a new view of adaptive evolution that emphasizes saltationist (sudden) processes driven by introgression. Moreover, as discussed in the paper, our ability to detect this potentially quite old introgression event is facilitated by the fact that the introgressed fragment falls in a region of reduced recombination. This has allowed the introgressed haplotype to be maintained for a prolonged period. Our results may suggest that introgression generally plays a much more dominant role in adaptive evolution than previously thought, but has been difficult to detect because introgressed fragments in other systems degenerate quickly due to recombination.

Do you think similar ancient introgressions have occurred in other domesticated species? If so, how would you test this?

We cannot rule out the possibility. To test this hypothesis, we would suggest using a research strategy similar to that used in this study. First, we would need to get the genome sequences of multiple species divergent from a domesticated species. Then, we can perform a genome-wide scan of the domestic species for possible regions introgressed from another divergent species. Several approaches, including the ABBA and F4 statistics, haplotype sharing and phylogenetic analysis, can be explored to identify such ancient introgressions.
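
As an illustration of the ABBA-type test mentioned above, here is a small Python sketch of Patterson's D statistic as it is commonly defined, computed from one aligned sequence per population. The function name, the input format and the toy sequences are assumptions made for this example, not details from the paper. The outgroup allele is treated as ancestral, and a positive D indicates excess allele sharing between P2 and P3.

```python
def patterson_d(p1: str, p2: str, p3: str, outgroup: str) -> float:
    """Patterson's D = (ABBA - BABA) / (ABBA + BABA) from four aligned sequences.

    Site patterns are read in the order (P1, P2, P3, outgroup); 'A' is the
    outgroup (ancestral) allele and 'B' is the derived allele.
    """
    abba = 0
    baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # keep biallelic sites only
        if c == o:
            continue  # P3 must carry the derived allele
        if a == o and b == c:
            abba += 1  # P2 shares the derived allele with P3
        elif b == o and a == c:
            baba += 1  # P1 shares the derived allele with P3
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


# Toy alignment (invented): two ABBA sites and one BABA site.
p1 = "AAGTGA"
p2 = "ATGCCA"
p3 = "ATCCGA"
out = "AAGTCA"
print(patterson_d(p1, p2, p3, out))
```

In practice D is computed from genome-wide allele frequencies, and its significance is usually assessed with a block jackknife, which this sketch leaves out.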

Erhualian

{credit}Lusheng Huang{/credit}

Bonus question: What is your favorite breed of domestic pig?

Erhualian, the most prolific pig breed in the world.