News blog

The German E. coli outbreak: 40 lives and hours of crowdsourced sequence analysis later

Posted on behalf of Marian Turner.

The outbreak of E. coli infections in Germany is fortunately on the wane, although 122 new infections were still reported over the weekend. Since the first infection was identified on 1 May, 3,593 people have been infected, of whom 849 developed the severe complication haemolytic uraemic syndrome (HUS) and 40 died.

On 10 June, the Robert Koch Institute announced that the bacterial strain responsible for the outbreak, O104:H4, had been found on an organic sprouts farm in northern Germany. While this answered the public’s most burning question of where the bacteria came from, scientists are looking for deeper origins.

Information is pouring out of collaborative open-source genomic analyses that have sprung up since the Beijing Genomics Institute released the full genome sequence of the outbreak strain on 2 June. An international group of scientists jumped on the opportunity to align and annotate the genome, which was sequenced using Ion Torrent technology, giving the infant platform a massive boost.

Five isolates of the strain have now been sequenced, and the scientists are sharing their analyses on a GitHub site and discussing them on Twitter. Kat Holt, a microbiologist at the University of Melbourne, and Mike the Mad Biologist are still posting regular updates on their blogs.

A detailed analysis of the sequence compared to previously isolated strains and a summary of the findings were also posted by Eurosurveillance on 16 June.

These analyses are crucial for understanding the evolution of the outbreak strain. Early analyses by Holt found that the strain is most genetically similar to strain Ec55989, isolated in Africa over a decade ago. The Eurosurveillance study links it most closely to strain SRX067313, isolated in Germany in 2002. Scientists are still waiting for Helge Karch’s German reference laboratory at the University of Münster to release the genome sequence of an O104:H4 strain isolated in Germany in 2001.

All of the analyses show the strain to be an enteroaggregative E. coli (EAEC) that has acquired the gene for Shiga toxin. The fact that it is an EAEC strain means it is highly likely that it resided unnoticed in humans until it gained its toxin-producing abilities. This is in contrast to the majority of pathogenic E. coli, which are spread to humans from ruminant animals via contamination of food.

Germany saw last week that the bacteria can not only arise in humans, but can be readily passed between them. An asymptomatic woman who was carrying the O104:H4 bacteria passed the infection to at least 20 others while working in a catering firm in northern Germany. The outbreak strain was also isolated during routine testing of water from a stream near Frankfurt on Friday. There is no word yet as to how the bacteria got into the water.

The scientists are calling for more isolates from the outbreak to be sequenced, and for historical samples to be dug out of the freezers of reference laboratories for sequencing too. “We need the sequences of multiple isolates to understand what mutations accumulate in a short time and which parts of the genome are under selective pressure”, says Holt. This might give clues as to the origins and line of transmission of the strain, she says.

Nick Loman, bioinformatician at the University of Birmingham, agrees, saying “There’s little historical data on these strains, and we really need multiple genomes to see if this strain is clonal or has multiple phylogenies.”

The scientists are hoping that further comparative genetic analysis, and tracking the genetic history of the strain, will show how the bacteria might have travelled from Africa to the sprout farm. It might also provide clues as to whether more pathogenic strains are likely to arise from a human reservoir.


  1. Peter Grandics said:

    What happened to the definitive identification of the strain in Spanish cucumbers, then tomatoes? What is the guarantee that this latest identification is any more correct than the previous ones?
