How we built a better tomato


One species of wild tomato, Solanum lacerdae{credit}Sandy Knapp{/credit}

Most wild tomato species bear little resemblance to the large, red fruits you’re used to seeing in the supermarket. This is because humans have been molding the tomato to their own taste for thousands of years, by selecting for larger, tastier and (of course) redder fruits.

As a consequence of this selective breeding, we have significantly altered the tomato genome. A new paper published online this week in Nature Genetics analyzed the genomes of 360 tomato accessions, including multiple wild species and cultivated varieties, to understand exactly how and where humans have left their mark on the tomato genome.

This study, the product of a collaboration between many groups around the world, found that human selection on the tomato has led to vast improvement in certain traits at the cost of dramatically reducing genetic variation in large swaths of the genome. An unintended consequence of historical selective breeding in tomato is that there is now little room for improvement on many traits that we care about. By identifying these regions, the study will allow tomato breeders to make more strategic plans for future crop improvement.

We asked one of the study’s senior authors, Sanwen Huang, to tell us a little more about the work and why it is important:

This study was obviously a huge undertaking. How did collaborations come about, and what were the major difficulties in the project?

As an international consortium, we sequenced the tomato genome together (Nature 2012) and this project was regarded as another milestone of tomato research. The difficulty in the current project was deciding what to sequence. Fortunately, our team includes experts who understand tomato germplasm and have studied the natural variation of tomatoes for a long time. As a result, we combined tomato lines from many well-studied core collections from several countries, such as the US (Roger Chetelat), Israel (Dani Zamir), France (Mathilde Causse), Italy (Andrea Mazzucato), and China (Yongchen Du, Zhibiao Ye, and Jingfu Li).

What do you see as the most important aspect of your study’s results?

There are several important results that came out of this work. First, the evolution of tomato fruit size had two stages, from the wild progenitor of the modern cultivated tomato, Solanum pimpinellifolium, to cherry tomato (from ~1g to ~10g), and from cherry tomato to big-fruited tomato (from ~10g to ~100g). We found that there are two independent sets of QTLs or genes that have been selected during the two evolutionary stages. Second, there is a huge genomic signature of the divergence between fresh tomato and processing tomato [tomatoes used for commercial canning], on chromosome 5. This genomic region harbors several genes related to higher soluble solid content and fruit firmness that were selected during breeding for processing tomato. And more interestingly, we noticed that in recent fresh tomato F1 breeding, this region was also exploited for better taste and longer shelf-life. Third, we identified the causal variants for the pink tomato, which can be used for selective breeding. Pink tomato is a favorite in North China and I prefer it too, as it tastes better than the red ones. Finally, we found there have been costs to historical selection. For example, about 25% of the tomato genome is nearly fixed due to genetic hitchhiking that occurred during domestication and improvement sweeps, and there is also linkage drag associated with wild introgression.

Cover of Nature, May 2012

Were you at all surprised to find such a large number of domestication and improvement sweeps? Did these results differ at all from other prominent vegetables, such as cucumber or potato?

The number and genomic proportion of domestication sweeps in tomato are similar to those in cucumber. However, the linkage disequilibrium blocks are bigger in tomato than in cucumber, possibly due to the fact that tomato is a self-pollinating species. Based on our data, we predict that the effective population size of tomato at domestication was about 300, similar to that of cucumber (~500), and both are significantly smaller than that of maize (~150,000). This means these two vegetables have undergone much more severe bottlenecks during domestication as compared to maize.
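To give a rough feel for why these effective population sizes matter, here is a minimal sketch using the textbook drift approximation, in which expected heterozygosity shrinks by a factor of 1 - 1/(2Ne) per generation. It assumes an idealized random-mating population of constant size, which tomato (a selfing species) does not strictly satisfy. It is not the method used in the paper. The effective sizes are the ones quoted above, while the generation count and the function name are hypothetical, chosen only for illustration.

```python
def heterozygosity_retained(ne: float, generations: int) -> float:
    """Expected fraction of initial heterozygosity remaining after drift.

    Assumes an idealized random-mating population with constant
    effective size `ne` (a simplification, not the paper's analysis).
    """
    return (1.0 - 1.0 / (2.0 * ne)) ** generations


# Effective population sizes quoted in the interview.
EFFECTIVE_SIZES = {"tomato": 300, "cucumber": 500, "maize": 150_000}

# Hypothetical bottleneck duration, for illustration only.
GENERATIONS = 100

retained = {
    crop: heterozygosity_retained(ne, GENERATIONS)
    for crop, ne in EFFECTIVE_SIZES.items()
}

for crop, fraction in retained.items():
    print(f"{crop}: {fraction:.3f} of initial heterozygosity retained")
```

Under these assumptions the two vegetables lose a noticeable share of their variation over the same period in which maize loses almost none, which is the sense in which their bottlenecks were more severe.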

How do you envision tomato breeders using the results of your study?

As a result of this work, tomato breeders will have a panoramic view of tomato variation and a better understanding of the raw materials used in their own breeding programs. From a practical standpoint, they will have access to a database of 11 million SNPs, from which they can pick the ones best suited to their molecular breeding programs. For example, they can combine the SNP dataset with their phenotypic data, to elucidate the genetic bases of important traits. Finally, and importantly I think, they will better understand the limitations of conventional breeding and the cost of historical selection, which will give them clues to improve their future programs.


{credit}Photo courtesy of USDA Natural Resources Conservation Service{/credit}

Congratulations on your recent move to the Agricultural Genome Institute at Shenzhen where you are a co-founding director. Can you tell us a little about this new institute and what its goals are?

Thanks! The leadership of the Chinese Academy of Agricultural Sciences set up the institute (AGIS) to innovate agricultural research using genomics.

AGIS is located in the Dapeng District of Shenzhen, a beautiful bay area. The Shenzhen municipal government is developing the Dapeng Peninsula as the International Bio-valley and high-tech agriculture is one of the highlights. AGIS will recruit ~200 scientists who will decode, analyze, and utilize agricultural genomes. There will be three themes of research: the first theme is to develop basic algorithms and bioinformatic tools tailored for agricultural genomes, many of which are quite different from the human genome that has been the focus for most bioinformaticians; the second theme is to empower agricultural breeding with genomics, to increase the efficiency and effectiveness of breeding that is essential to global food security; and the third theme is to provide genomic surveillance of food safety and agricultural environment, which is a huge concern of society and a need for sustainable development.


A vegetable market in Shanghai, China{credit}nadja robot via Flickr.com{/credit}

Bonus question: What is your favorite vegetable?

China is a country of vegetables, as there are over 200 kinds of vegetables that are regularly consumed in the country. I enjoy the diversity. For fruit vegetables I like tomato, cucumber, and chili; for leaf vegetables, I like Chinese cabbage, lettuce, and coriander.

 

You can read more about this exciting study at The Scientist. Read the full paper here

Patients should learn about secondary genetic risk factors, say recommendations

Imagine getting a chest X-ray to identify the cause of a serious cough. The radiologist finds a shadow that wasn’t causing the cough but could be a tumour. In many cases, it is obvious what to do upon uncovering these sorts of secondary or incidental findings — most doctors would follow up on the search for a possible lung tumour, for example.

But genomic information presents a special case: genes are predictive, but not perfectly so, making some results murky. And many genetic diseases and predispositions to disease don’t have clear and obvious paths for clinical management, potentially making them a lifelong psychological burden.

Today, the American College of Medical Genetics and Genomics (ACMG) released recommendations for how genome-sequencing laboratories should report incidental findings after a doctor orders a full or partial genome sequence. They define a minimum list of about 60 genes and 30 conditions that should be reported to the doctor as part of a patient’s care, whether the patient wants to know them or not. But the guidelines stop far short of recommending that all risk factors be passed on to doctors and patients.


DNA has limits, but so does study questioning its value, geneticists say

Scientists are irked over a paper claiming, as The New York Times reported on Monday, that “DNA’s power to predict illness is limited.” “Yes,” geneticists have replied. “What else is new?”

Geneticists don’t dispute the idea that genes aren’t the only factor that determines whether we get sick; many of them agree with that point. The problem, geneticists say, is not that the study, published on 2 April in Science Translational Medicine, arrived at a false conclusion, but that it arrived at an old, familiar one via questionable methods and is now being portrayed by the media as a new discovery that undermines the value of genetics. Here are the main criticisms of the new study and the resulting press coverage:

1. This study critiques the power of genomic medicine but does not contain any genome data. The paper is titled, “The predictive power of personal genome sequencing,” but it doesn’t include any sequence data. Instead, the authors analysed data on how often twins developed the same diseases. Because twins have very similar genomes but don’t always develop similar ailments, the authors, led by Bert Vogelstein and Victor E. Velculescu of the Johns Hopkins Kimmel Cancer Center in Baltimore, Maryland, assumed that the frequency with which the twins got the same illnesses reflects the power of their underlying genome sequences to determine their health. This assumption is not true (see point 4), and isn’t a good basis on which to dismiss the value of genome sequencing in the absence of data from large genome-sequencing studies, which are just now getting underway.

“Let’s fast-forward a year or two, when we’ve sequenced a million or two million people in whole-genome sequencing studies,” says Eric Topol, a cardiologist at Scripps Health in La Jolla, California, and author of The Creative Destruction of Medicine: How The Digital Revolution Will Create Better Health Care. “Then let’s see whether or not the predictive capacity is limited, or limited for certain conditions but not others.”

2. This study is beating a dead horse. Many other studies have already found that genes alone don’t predict a person’s risk for developing most diseases very well. They’ve also specifically questioned the value of commercial genetic tests that promise to reveal users’ risk for various illnesses. The new study doesn’t acknowledge any of the previous studies that have already arrived at the same answer and have done a better job of it, geneticists say (see point 3).

3. The mathematical model used in the study is unrealistic. Geneticists have developed a slew of mathematical models that try to predict how likely a person is to develop various diseases. Scientists debate how well these models work, but the models are largely based on how diseases actually behave in the real world. The Vogelstein–Velculescu model is not, say statisticians.

Vogelstein, Velculescu and their colleagues first developed a model that poses a theoretical idea of how diseases might behave. They then tested their model against data from twin studies. The model divides the universe of human genomes into 20 groups, or “genometypes.” Each of the genometypes encodes a certain disease risk and occurs with a certain frequency, but the authors don’t know how often different genometypes carrying various disease risks occur. To figure this out, they ask which combinations of disease risk and genometype frequency are realistic by comparing them to what they actually see in twin studies.

The problem with this approach, statistical geneticists say, is that it uses flawed data to test unrealistic assumptions. Geneticists know how often certain genetic risk variants for various diseases occur in the general population, and how much risk each of these variants confers. The new model ignores this information, and instead allows diseases to behave in ways that differ from how they behave in real life. “The particular parameters in the model don’t really correspond to anything in terms of real world behaviour of genetic risk variants,” explains Luke Jostins, a statistical geneticist at Cambridge University, UK. “This divorces the model from population-genetic plausibility, making the results potentially meaningless.”

By ignoring information about how diseases act in the real world, the new model also allows the authors to sidestep some controversial unanswered questions, such as whether standard models overestimate the genetic contribution to disease in twin studies. That could be a nice feature of the model, geneticists say. But because of the limitations of twin data, combined with the authors’ flawed analysis of these data (see point 4), there’s nothing in the paper to ground the new model in reality. If this were the first-ever paper to try to define the limits of genetic-disease prediction, it wouldn’t be convincing, says Jostins, who also blogs at Genomes Unzipped. “It’s very hard to interpret this model,” he says.
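The genometype idea described in point 3 can be made concrete with a minimal sketch. This is not the authors' code or their fitting procedure. It only illustrates how a set of genometype frequencies and risks implies a population prevalence and an expected concordance between identical twins, under the assumption that both twins share one genometype and that, given it, each develops the disease independently (the genes-only assumption criticized in point 4). The function names and all numbers are hypothetical.

```python
from typing import Sequence


def prevalence(freqs: Sequence[float], risks: Sequence[float]) -> float:
    """Population disease prevalence: frequency-weighted average risk."""
    return sum(f * r for f, r in zip(freqs, risks))


def mz_concordance(freqs: Sequence[float], risks: Sequence[float]) -> float:
    """Probability that a co-twin is affected, given one twin is affected.

    Assumes identical twins share a genometype and that their outcomes
    are independent given that genometype (no shared-environment effect).
    """
    both_affected = sum(f * r * r for f, r in zip(freqs, risks))
    return both_affected / prevalence(freqs, risks)


# Hypothetical example: 20 equally common genometypes, 19 low-risk
# and one high-risk. These values are illustrative, not from the study.
freqs = [0.05] * 20
risks = [0.01] * 19 + [0.5]

prev = prevalence(freqs, risks)
conc = mz_concordance(freqs, risks)
print(f"prevalence: {prev:.4f}, MZ twin concordance: {conc:.4f}")
```

In the approach described above, the direction of inference is reversed: observed twin concordance is used to decide which combinations of frequency and risk are plausible. The sketch also shows why a shared environment matters, since any extra correlation between twins would be wrongly attributed to the genometype.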

4. The study doesn’t correct for errors that can affect twin studies. The study assumes that genetics is the sole factor that determines whether two twins develop the same disease. But twins also grow up in a common environment, and the study doesn’t account for this, as the authors admit.

It’s also rare for both members of a twin pair to develop the same disease. So even a study such as this, which combines data from many different twin studies, suffers from a relatively small overall sample size of affected twins. That lowers the statistical reliability of its findings and introduces unpredictable errors into the study, Jostins says. Again, there are ways to account for these errors, but this study doesn’t try to do that.

5. The media coverage of the study could weaken support for genetic research. Geneticists have lobbed some pretty heavy artillery at the Science Translational Medicine study, even though it claims to affirm what they already know. That’s because the new study has received more press coverage than your run-of-the-mill statistical genetics paper, and geneticists are concerned that the coverage has overblown the study’s conclusions in ways that could harm public support for science. “I don’t see the harm in telling the public yet again that there is no such thing as genetic determinism,” says Leonid Kruglyak, a geneticist at Princeton University in New Jersey. “But I worry about the message being distorted to mean that genes have no value, or that genetic research is not worthwhile.”

Follow Erika on Twitter at @Erika_Check.