Finding the hidden variation in the human genome

A new method from researchers at the Broad Institute improves variant discovery in the human genome. {credit}Webridge via Wikipedia{/credit}

Identifying novel sequence variants is a crucial first step toward understanding the genetic basis of many diseases. However, current methods for variant calling, while very good in general, miss the variants in about 10% of the human genome. This 10% of the genome presents unique challenges, such as high GC content, low-complexity sequences and duplications.

In a paper published online this week at Nature Genetics, David Jaffe and colleagues present a two-hit improvement over existing methods. First, they modify existing methods for generating 250bp paired-end reads using a PCR-free protocol. Because PCR amplification isn’t used, the method significantly reduces coverage bias in the final sequence. Second, they present a new algorithm, called DISCOVAR, that is specifically designed to analyze these data and to call variants in the trickiest parts of the genome.

As a proof of principle, the authors applied their methods to identify all sequence variants in approximately 4 Mb of sequence from the human cell line GM12878, as compared to the human reference sequence. They found that the Illumina Platinum variant call set, which was based on 100bp reads, missed about 25% of the variants, mostly due to low coverage in challenging genomic regions.

The new sequencing and assembly method is comparable in cost to existing methods and paves the way for significant improvements in disease-associated variant discovery. We asked the study’s senior author, David Jaffe, to tell us a little more about the background of this work.

Your background is in mathematics, but you became interested in bioinformatics about 15 years ago. What inspired you to apply your skills to biological problems? What was the major difficulty you encountered in switching fields?

Well, suppose you had been a bricklayer for twenty years. It’s good, but after a while your love affair with the bricks wears off. And then you see this new exciting thing that you can do, that people seem to care a lot about. So you jump.

As for the major difficulty, imagine you’re making a change but you don’t really know anything about the new field—and other people know that! So you have to really listen to what they have to say, and hang in there.

How did the idea for DISCOVAR initially come about?

What our group does is look for ways to combine laboratory and computational improvements so as to achieve a better view of the genome (without breaking the bank). We’re always looking for new approaches, and the Broad is a good place to do that because there is a lot of lab innovation and the culture encourages lab/computational interaction. In the summer of 2012, we first saw 250 base-pair reads generated from PCR-free libraries (using Illumina technology), and we realized that these data had enormous power. So we set about designing an algorithm that might work exceptionally well on this data type. Also, we had two generations of assembly algorithms under our belt (ARACHNE and ALLPATHS-LG) and knew we could do better. The DISCOVAR laboratory protocols are available online at our DISCOVAR blog.

What was the most surprising result of your study?

The most surprising thing was the level of contiguity and completeness that could be achieved in local assemblies. The older methods yield a break every 20kb or so, but the new methods just keep going. One reason is that we no longer had PCR dropouts, that is, loci with little or no coverage. Also, much of our computational effort went into error correction that could correct almost any sequence, reducing the incidence of assembly holes attributable to polymerase slippage. Consequently, in most cases, it would be possible to find nearly all the differences with a reference genome.

The DISCOVAR algorithm is designed to work on a specific type of sequencing data—is this sequencing method commonly used, or do you envision it becoming so? Do you think DISCOVAR could become the gold standard for variant calling?

Nearly concurrent with publication, Illumina announced official support for 250 base reads, thus eliminating the need to hack their protocol. We think people will switch because the new data give better results! Variant calling covers a lot of ground. For example, there are very good (economic) reasons why people will still want to sequence exomes. But for cases where the goal is to get a nearly complete inventory of all genomic changes, we think we have the best tool to date.

Image from the DISCOVAR demo site. The magenta edge represents a 30 kb heterozygous insertion in the reference sequence. Each edge represents a DNA sequence. Red vertices “continue on” in full graph. {credit}David Jaffe{/credit}

What were the major hurdles, if any, during the course of this research?

Squeezing the maximum information from the data is just a really hard problem, depending on the fine detail of the data properties (like exactly what sorts of sequences did we get wrong, and why). But to know right from wrong, we had to build a set of reference sequences for our control sample (NA12878), and getting these exactly right was itself a significant undertaking. All of this required an R&D effort with many iterations.

Bonus question: What does the name “DISCOVAR” stand for? Who came up with the name?

Coauthor Iain MacCallum came up with DISCOVAR, which stands for “discover variants.” We actually went through a series of other names first. We thought about Varitas, but somebody else was using it and we thought Harvard might sue us (ha ha). We found a similar name that nobody was using, but dropped it after a colleague told us its street meaning in Brazil…

Click here to read the full paper describing DISCOVAR

Make sure to check out the DISCOVAR blog from the Broad Institute and the online demo tool!

Promising sequencing contender Oxford Nanopore and market leader Illumina sever financial ties

Two closely watched genetic sequencing technology firms who had been unhappily affiliated have now divorced. UK-based Oxford Nanopore announced on 15 November that it has raised £56.4 million — mostly by selling the 13.5% of its shares that had been owned by San Diego-based Illumina since 2009.

Illumina had purchased the shares for $18 million in pursuit of an alliance that would give it a foothold in nanopore sequencing technology, in which different genetic bases are identified by changes in electric conductance caused when they are fed through a nanoscale pore. The technology is seen as highly promising because it offers the potential for very rapid sequencing at low cost. But after Oxford announced in 2012 that it was commercializing a version of its technology that is slightly different from the one in which Illumina invested, the two companies severed commercial ties and Illumina licensed a competing nanopore technology.

Oxford also said that it will begin allowing scientists to register to test its MinION portable genetic sequencer on 25th November in a “substantial but initially controlled programme designed to give life science researchers access to nanopore sequencing technology at no risk and for a refundable deposit of $1,000.”

The impetus for Oxford’s divestiture of Illumina shares isn’t yet clear. As computational biologist Mick Watson of the University of Edinburgh writes on his blog, Illumina may have figured that it would never make much money from the investment, as Oxford is now staking out a competitive position. “The simple answer may be that Illumina had nowhere to go with this,” Watson writes. “Therefore this is probably the logical conclusion — sell the shares and compete, try and beat [Oxford Nanopore] at their own game.”

So far, financial analysts give Illumina the edge in this game: “We continue to believe that [Illumina] has the dominant platform for the foreseeable future,” wrote Goldman Sachs analyst Isaac Ro in a research note on 15 November.

Scientists who have tested MinION so far have agreed, though they have been impressed with the technology. Geneticist Yaniv Erlich of the Whitehead Institute for Biomedical Research in Cambridge, Massachusetts, wrote earlier this month that “MinION (and presumably its GridION scale-up) is far from being a threat to Illumina.”

Follow Erika on Twitter @Erika_Check.

Illumina, BGI spar over Complete Genomics

In a battle that will shape the market for DNA sequencing services, sequencing company Complete Genomics has received letters from rival suitors. Illumina, a dominant supplier of high-throughput DNA sequencing machines, and BGI, a giant sequencing-services firm, have both offered to purchase the Mountain View, California-based company, which has highly accurate, proprietary technology for sequencing human genomes.

Complete Genomics accepted BGI’s merger offer in September, but the deal still requires US regulatory approval. In addition, shareholders have filed suit to block the deal, saying that BGI’s purchase price of US$3.15 per share is too low.

Today’s letter from China-based BGI-Shenzhen says that its offer is in the best interest of stockholders, customers, and employees and argues that the offer from Illumina, based in San Diego, California, to buy Complete Genomics is intended merely to eliminate competition. “It is a thinly veiled attempt by Illumina to disrupt and interfere with our Merger Agreement in order to prevent Complete’s technology from posing a competitive threat to Illumina’s market dominance,” wrote BGI chief operational officer Ye Yin in a letter that also accuses Illumina of hypocrisy and knowingly making false assertions.

Yin also rejected allegations that the deal raised US national security issues, noting that BGI has been a major purchaser of Illumina machines and reagents and is a private company, not a state-owned enterprise.

BGI’s letter is a response to one sent last week by Illumina that touted the merits of its unsolicited proposal: a 5% premium over BGI’s bid and no need for approval by a US committee that considers foreign investment. What’s more, pointed out Illumina chief executive Jay Flatley, BGI’s proposal still requires receipt of financing and other approvals that had still not been completed two months after Complete Genomics and BGI announced their agreement.

Experts at Leerink Swann have noted that the offer makes sense for Illumina. If the deal goes through, Illumina keeps a competitor away from a large customer and expands its technology.

Several sequencing platforms are on the market, including ones made by 454, Life Technologies and Pacific Biosciences, and other technologies are in development (see ‘The battle for sequencing supremacy’; subscription required). However, Illumina, which successfully fought off a takeover bid earlier this year, dominates the space. The company itself claims that more than 90% of sequencing data comes from its machines. When the agreement between BGI and Complete Genomics was first announced, researchers largely welcomed it, saying that it would encourage competition and keep a valued technology available.

Meanwhile, Complete Genomics and Illumina are engaged in a patent dispute. In another announcement today, Illumina said that a judge would reconsider an earlier ruling that invalidated claims in one of its patents, which Illumina believes Complete Genomics has infringed upon.

Roche calls off Illumina takeover effort

Roche has backed away from its hostile takeover bid for Illumina. The move came after shareholders of the San Diego-based gene-sequencing technology company, at its annual meeting on 18 April, rebuffed Roche’s efforts to install board members favorable to a merger.

In a statement, Severin Schwan, CEO of Swiss-based Roche, said, “We continue to hold Illumina and its management in very high regard but, with access only to public information about Illumina’s business and prospects, we do not believe that a price above Roche’s offer for Illumina of $51.00 per share would be in the interest of Roche’s shareholders.”

Roche had initially offered shareholders $44.50 in January, then raised its offer price to $51 in March. It was also trying to expand Illumina’s board and install favorable directors.

Illumina shareholders’ rejection of the measures was not surprising, as Illumina has fought the merger, and shareholder advisory firms had recommended against Roche’s advances.

“We are pleased that Roche has decided not to extend its inadequate offer to acquire Illumina and that we can now return our full focus to growing our business, making the most of the expanding opportunities in our space, and delivering superior results for our customers and stockholders,” Illumina CEO Jay Flatley said in a statement.

But Roche’s decision today that its offer for Illumina will expire on 20 April did surprise some analysts; most had predicted that Roche would be willing to extend or increase its offer even further to see the merger through, as the company did when it acquired Genentech and Ventana Medical Systems.

“I anticipated an extension of the offer at unchanged terms,” wrote analyst Martin Vögtli of Kepler Capital Markets in an email. “Initially, I thought that Roche is playing a tactical game,” perhaps hoping that Illumina would be more receptive to an offer later this year if it falters amid heavy competitive pressure and tightening government funding. “But after talking to Roche representatives I firmly believe now that this was the end of the bid.”

The move now raises pressure on Illumina to continue to dominate the sequencing market by both holding off its larger competitor, Life Technologies of Carlsbad, Calif., whose Ion Torrent technology debuted last year, and fending off companies with newer, potentially disruptive technologies such as UK-based Oxford Nanopore, which has said it will release its first commercial systems this year.

“Illumina will need to figure out longer term how to fight against Life’s much bigger sales channel,” wrote analyst David Ferreiro of New York-based Oppenheimer in an email. He notes that Illumina is under increasing pressure to cut the cost of both its machines and of their output: “Pricing will continue to be an issue, especially if Life’s Ion Proton delivers the $1000 genome,” as it has promised to do by the end of the year.

Vögtli says that Oxford’s potentially powerful platform may be one of the factors that dissuaded Roche from continuing to pursue the merger, along with Illumina’s resistance to a deal and with pressure from Roche shareholders not to pay too much for the acquisition.

“I think the move sends out a strong signal that cost discipline is high on the agenda and that Roche is no longer willing to overpay, especially for risky technologies,” Vögtli wrote.

Roche signaled that it may be interested in pursuing other sequencing companies, saying in its statement that it “will continue to consider options and opportunities to develop further its portfolio of businesses in order to expand its diagnostics leadership position.” But it is unclear who Roche could target; Life is a large company, while others are too new and have too little market share to be attractive targets for a firm whose main focus is bringing sequencing to the clinic.

The unraveling of the Roche bid heightens the competition in the race for the $1000 genome. Sequencing industry veterans had predicted that Illumina would become a much less innovative company under Roche’s management. But Illumina, which is under increasing pressure from competitors that are just entering the market or are on the immediate horizon, hasn’t clearly spelled out what technology will replace its current one. Illumina has a partnership with Oxford Nanopore, but it covers a different technique from the one that Oxford itself is commercializing.

Analysts will be watching closely as Illumina announces its first quarter 2012 results on 23 April to see how well positioned the company is to profit from its continuing independence.

Illumina board rejects Roche offer

The board of directors of San Diego-based Illumina today rejected a takeover bid launched by Roche, based in Basel, Switzerland. The move by Illumina, which estimates that its machines produce 90% of the world’s genetic-sequencing output, was not surprising, given the company’s earlier moves to deter the offer.

Illumina’s board “unanimously determined that the $44.50 per share cash offer is grossly inadequate in multiple respects, dramatically undervalues Illumina and is contrary to the best interests of Illumina’s stockholders,” the company said in a press release.

Illumina said that the timing of Roche’s bid was opportunistic, coming weeks after Illumina announced weak third-quarter earnings and said that it would lay off 8% of its workforce, sending its stock price into a dive. The company’s stock had traded as high as US$79.40 in July 2011.

Illumina’s fourth-quarter 2011 earnings, announced today, were 6.3% higher than its earnings in the last quarter of 2010, although when one-time charges were accounted for, the company’s earnings fell 70%.

Illumina said that it was best positioned to capture growth in the sequencing industry in areas such as “molecular diagnostics, reproductive health, cancer management and industrial-end markets such as agricultural biotechnology, veterinary medicine and forensics.” It also said that the company has “a robust line of new products and services, which the Board believes will create powerful new tools in the armaments of researchers and healthcare providers.”

Illumina’s press release says that two new platforms, the HiSeq 2500 and MiSeq, will diversify the company’s customer base beyond genome research centres, but does not mention what platforms may eventually replace these, which employ Illumina’s current technology. Illumina faces serious competition in the race to deliver the $1000 genome from Life Technologies of Carlsbad, California, and its Ion Torrent platform.

Illumina has a commercialization agreement with the UK-based company Oxford Nanopore Technologies, which is developing a new genetic-sequencing platform. However, Oxford Nanopore said earlier this month that it will commercialize its own DNA sequencing system this year, and that this system will employ a separate analysis technique from the one licensed to Illumina. Oxford will present the first data readout using its technology on 17 February at the Advances in Genome Biology and Technology conference in Marco Island, Florida.

Other documents released by Illumina today give more details of the Roche offer and of a lawsuit filed on 30 January against the Illumina board by shareholders who favour the offer.

Roche did not immediately respond to the rejection of its offer. Roche succeeded in previous hostile takeover bids for the companies Ventana Medical Systems and Genentech by considerably raising its offer price.

The $1,000 genome: are we there yet?

The race to the US$1,000 genome heated up today as Life Technologies, based in Carlsbad, California, announced that it will debut a new sequencing machine this year that will eventually be capable of decoding entire human genomes in a day for less than $1,000. The machine, called the Ion Proton, will be the successor to the Personal Genome Machine made by the company Ion Torrent, a subsidiary of Life Technologies.

Not to be outdone, Illumina, the present market leader based in San Diego, California, said that it will release its own genome-in-a-day contender, the HiSeq 2500, in the second half of this year. Unlike Life Technologies, which is asking customers to buy an entirely new machine, Illumina says that it will be able to upgrade existing customers’ HiSeq 2000 machines for a relatively low price.

So how will this battle of the sequencers shake out?

Ion Torrent is positioning its new machine as a lower-cost alternative to Illumina’s $690,000 HiSeq. Scientists seem willing to believe that the Ion Proton will reach its speed goals, largely because Ion Torrent’s present model, the Personal Genome Machine, is performing well for its customers. That sets Ion Torrent apart from other companies with novel technologies that couldn’t deliver on their first-generation models, such as Pacific Biosciences of Menlo Park, California, which switched CEOs last week amid financial and legal hiccups, and the Cambridge, Massachusetts-based Helicos, which continues to struggle with lackluster demand for its machines.
