A new computational technique developed at The University of Texas at Austin has enabled an international consortium to produce an avian tree of life that points to the origins of various bird species. A graduate student at the university is a leading author on papers describing the new technique and sharing the consortium’s findings about bird evolution in the journal Science.
The results of the four-year effort — which relied in part on supercomputers at the university's Texas Advanced Computing Center (TACC) — shed light on the timing of a "big bang" in bird evolution, rearrange evolutionary relationships between some bird species and provide new insights on the origins of song pattern recognition in birds, as well as a host of other avian traits.
To build the new bird tree of life, researchers first sequenced the complete genomes of 48 living bird species. With about 14,000 genomic regions per species, the size of the data sets and the complexity of analyzing them required a new computing method. Its development was led by computer scientists Tandy Warnow, an adjunct professor at The University of Texas at Austin and professor at the University of Illinois at Urbana-Champaign, and Siavash Mirarab, a graduate student at The University of Texas at Austin.
Previous bird evolutionary trees were based on analyses of a few dozen genes, whereas this latest study analyzed entire bird genomes. Those earlier studies did use more bird species (about 200 compared with 48), but with hundreds of times more genetic data per species, the new bird family tree draws on far more data. The result is some surprising findings, such as that flamingoes are more closely related to pigeons than to pelicans and other water birds.
"In the computer science community, we often focus on how to make faster tools to analyze big data sets," said Mirarab, co-lead author on one of Science’s major papers about the project. "This project is exciting because it shows that it's not just about being bigger and faster. Simply having more data doesn't make you more accurate. You have to come up with more intelligent ways to analyze your data."
By testing the new technique, called statistical binning, on simulated data sets, the team demonstrated that their approach is more accurate than previous techniques.
The entire effort to construct an avian evolutionary tree took 400 years of CPU time and required the use of parallel processing supercomputers at TACC, the Munich Supercomputing Center and the San Diego Supercomputer Center. For the statistical binning portion alone, developing and testing the method took over 100 years of CPU time, divided between TACC and the Condor Cluster in the university's Department of Computer Science.
"TACC was essential," said Mirarab. "It's where most of the work on the statistical binning paper was done. We couldn't have done it without these supercomputers."
Mirarab and Warnow are part of the Avian Phylogenomics Consortium, which has so far involved more than 200 scientists from 80 institutions in 20 countries.
The consortium is led by Guojie Zhang of the National Genebank at BGI in China and the University of Copenhagen, Erich D. Jarvis of Duke University and the Howard Hughes Medical Institute, and M. Thomas P. Gilbert of the Natural History Museum of Denmark.
The group's first findings are being reported nearly simultaneously in 23 papers — eight in a Dec. 12 special issue of Science and 15 more in Genome Biology, GigaScience and other journals.
Mirarab was also co-lead author on a paper in the Proceedings of the National Academy of Sciences in October that used a different computational technique to reveal important details about key transitions in the evolution of plant life on our planet.
The National Science Foundation and the Howard Hughes Medical Institute funded Warnow and Mirarab.
For an interactive graphic of the new bird tree of life, go to: http://news.illinois.edu/infographics/birdtree.html
For a slideshow highlighting some of the most striking research results, go to: http://news.sciencemag.org/biology/2014/12/slideshow-untangling-bird-family-tree
For an audio slideshow of the 48 bird species and their sounds, go to: http://youtu.be/jM2BRSeb7S8
For more information, contact: Marc Airhart, College of Natural Sciences, University of Texas at Austin, mairhart@austin.utexas.edu; Siavash Mirarab, College of Natural Sciences, University of Texas at Austin, smirarab@gmail.com.
MORE INFO
Research Highlights
The Avian Phylogenomics Consortium, using a range of techniques including the new statistical binning technique, has yielded many new insights about bird evolution, including:
- When dinosaurs went extinct 66 million years ago, only a few bird lineages survived. But during the next few million years, new bird species proliferated. This contradicts the idea that the "bird big bang" happened 10 million to 80 million years earlier, as some recent studies suggested.
- The genes in the specialized song-learning brain circuitry of songbirds, parrots and hummingbirds evolved in a similar way to one another and to speech regions in human brains, making the four an example of convergent evolution.
- Flamingoes are more closely related to pigeons than to pelicans and other water birds.
- Falcons are more closely related to parrots than to eagles and other birds of prey.
- Researchers were able to infer a significant portion of the genome sequence of the common ancestor of all birds, dinosaurs and crocodilians, which lived more than 250 million years ago.
A Better Way
Until now, two primary computational methods, called concatenation and coalescence, have been used for drawing evolutionary trees based on genetic data. But both have inherent weaknesses and often yield different trees, so scientists have argued for years about which one is better under which circumstances. A new technique developed by researchers Tandy Warnow and Siavash Mirarab, called statistical binning, helps bridge the two by essentially improving on coalescence. On the avian data set, its results are similar to those of concatenation, and in simulation studies on synthetic data sets it is as accurate as or more accurate than concatenation.
The coalescence-based methods require that the evolutionary history can be inferred with high accuracy for many positions randomly distributed across the genome. Their main drawback is that for many genes across the genome, inferring the evolutionary history (the so-called gene tree) with high accuracy is impossible because of the limited data available. In the statistical binning technique, genes or other short pieces of genetic information are combined into larger chunks called bins. There are, on average, about seven genes in each bin for the avian data set. By carefully picking which genes go into which bin, the method reduces noise in estimating the evolutionary histories for various parts of the genome. The less noisy data result in a more accurate species tree.
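The binning idea described above can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the consortium's implementation: the function names, the `conflicts` input (pairs of genes assumed to have incompatible evolutionary histories) and the `max_bin_size` cap are all hypothetical, and the greedy placement rule is just one plausible way of "carefully picking" which genes share a bin.

```python
from typing import Dict, List, Set, Tuple


def assign_bins(
    genes: List[str],
    conflicts: Set[Tuple[str, str]],
    max_bin_size: int,
) -> List[List[str]]:
    """Greedily group genes into bins (hypothetical helper).

    A gene joins the smallest existing bin that still has room and
    contains no gene it conflicts with; otherwise it starts a new bin.
    """
    # Build a lookup of which genes each gene conflicts with.
    conflict_map: Dict[str, Set[str]] = {g: set() for g in genes}
    for a, b in conflicts:
        conflict_map[a].add(b)
        conflict_map[b].add(a)

    bins: List[List[str]] = []
    # Place the most-conflicted genes first; ties are broken by name
    # so the result is deterministic.
    for gene in sorted(genes, key=lambda g: (-len(conflict_map[g]), g)):
        candidates = [
            b for b in bins
            if len(b) < max_bin_size and not conflict_map[gene].intersection(b)
        ]
        if candidates:
            min(candidates, key=len).append(gene)
        else:
            bins.append([gene])
    return bins


def concatenate_bin(
    bin_genes: List[str],
    alignments: Dict[str, Dict[str, str]],
) -> Dict[str, str]:
    """Join the per-gene sequences of one bin into a single longer sequence.

    Assumes (for illustration) that `alignments` maps gene -> species ->
    sequence and that every gene has a sequence for every species.
    """
    species = sorted(alignments[bin_genes[0]])
    return {sp: "".join(alignments[g][sp] for g in bin_genes) for sp in species}
```

In a workflow of this shape, each concatenated bin would then be used to estimate one tree from more data than any single gene provides, and those less noisy trees would feed the coalescence-based species tree step.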
"We've addressed a weakness of the current approaches and found what may be one of the critical solutions to improving accuracy," said Warnow. "So it's actually addressing a big controversy in the field and perhaps resolving some of the controversy."