Wired Magazine’s cover story this month, “The End of Science / The Dawning of the Petabyte Age” (Anderson, Chris, vol. 16, no. 7, July 2008, pp. 107-121), has a very mundane answer to John’s enthusiasm: just scoop up tons of seawater, sequence every piece of DNA that you find, and compare it to a database of known DNA. The system will then be able to flag each strand as belonging to an existing species or a new one.
We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
The best practical example of this is the shotgun gene sequencing by J. Craig Venter. Enabled by high-speed sequencers and supercomputers that statistically analyze the data they produce, Venter went from sequencing individual organisms to sequencing entire ecosystems. In 2003, he started sequencing much of the ocean, retracing the voyage of Captain Cook. And in 2005 he started sequencing the air. In the process, he discovered thousands of previously unknown species of bacteria and other life-forms.
Unfortunately, this doesn’t do much to tell us what the creature is like.
If the words “discover a new species” call to mind Darwin and drawings of finches, you may be stuck in the old way of doing science. Venter can tell you almost nothing about the species he found. He doesn’t know what they look like, how they live, or much of anything else about their morphology. He doesn’t even have their entire genome. All he has is a statistical blip — a unique sequence that, being unlike any other sequence in the database, must represent a new species.
This sequence may correlate with other sequences that resemble those of species we do know more about. In that case, Venter can make some guesses about the animals — that they convert sunlight into energy in a particular way, or that they descended from a common ancestor. But besides that, he has no better model of this species than Google has of your MySpace page. It’s just data.
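To make the "flag each strand as known or new" idea concrete, here is a toy sketch in Python. It is not how Venter's pipeline works: it assumes a tiny in-memory reference of made-up sequences and uses a simple k-mer overlap (Jaccard) score with an arbitrary threshold, where real work relies on dedicated alignment tools and vastly larger databases. The names `TOY_REFERENCE`, `kmers`, `jaccard`, and `classify` are all invented for illustration.

```python
from typing import Dict, Optional, Set, Tuple

# Made-up reference "database": species label -> DNA string (illustrative only).
TOY_REFERENCE: Dict[str, str] = {
    "species_a": "ATGCGTACGTTAGCATGCGT",
    "species_b": "TTTTGGGGCCCCAAAATTTT",
}


def kmers(seq: str, k: int = 4) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared k-mers divided by total distinct k-mers (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(
    read: str,
    reference: Dict[str, str],
    k: int = 4,
    threshold: float = 0.5,  # arbitrary cutoff, chosen only for this sketch
) -> Tuple[str, Optional[str], float]:
    """Label a read 'known' or 'novel' based on its best match in reference.

    Returns (label, best_matching_species_or_None, best_score).
    """
    read_kmers = kmers(read, k)
    best_name: Optional[str] = None
    best_score = 0.0
    for name, ref_seq in reference.items():
        score = jaccard(read_kmers, kmers(ref_seq, k))
        if score > best_score:
            best_name, best_score = name, score
    label = "known" if best_score >= threshold else "novel"
    return label, best_name, best_score
```

Calling `classify("ATGCGTACGTTAGCATGCGT", TOY_REFERENCE)` labels the read as known, while a read that shares no k-mers with either reference entry comes back as novel. A read with a middling score is the interesting case: it matches nothing well enough to be called known, yet its nearest neighbour is the only hint about what the organism might be like.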
But who knows? Soon enough we’ll have software that takes a DNA sequence as input and produces a virtual model of a creature, complete with visualization and tables of physiological data (bone density, blood chemistry, synapse count, etc.). We’ll never even have to find an instance of the creature.
Update, 25 June 2008: I think I’ve got my references a little crossed here. I titled the post “The Jules Verne of the Future Will be a Computer Scientist” for symmetry with John’s post, but Jules Verne is the author of the exploration stories, not the explorer himself, whereas the hypothetical computer scientist to which I am referring would be one of Jules Verne’s characters. The proper title should have been “The Captain Nemo of the Future Will be a Computer Scientist.”