Apr 19, 2011

ID: CSI

Having gotten rather distracted by personal life and other subjects of interest, I now return to the Intelligent Design movement to have a look at their other signature concept, complex specified information, or CSI.  CSI--sometimes called 'functional information'--is the manna from heaven of creationist arguments.  "What is it?" we ask, and as was the case with the Israelites of old, no one knows.

In information theory, strings carry information, which can be measured in a couple of different ways: Shannon information, which is a measure of how much the string reduces our uncertainty, and Kolmogorov information.  Kolmogorov information is the harder of the two to get a handle on.  If I have understood it correctly, the Kolmogorov complexity of a string X is the length of the shortest program that outputs X; equivalently, the shortest string S and function F such that F(S)=X.  In other words, it is a measure of compressibility.  To the layperson, myself included, both theories are strange and counter-intuitive.  Nevertheless, they are rigorously defined, grounded in solid mathematics, and useful within their fields.  (Kolmogorov complexity is not computable in general, but it is precisely defined, and compression algorithms give usable upper bounds on it.)
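For the curious, here is a rough Python sketch of both measures.  The function names are my own inventions, zlib compression is only a crude stand-in for true Kolmogorov complexity (which no program can compute exactly), and the entropy here is estimated from the string's own symbol frequencies, which is not the same thing as the one-bit-per-fair-coin-toss figure used further down:

    import math
    import zlib
    from collections import Counter

    def shannon_bits(s):
        """Empirical Shannon information of s, in bits: its length
        times the entropy of its symbol frequencies."""
        n = len(s)
        return n * -sum((c / n) * math.log2(c / n)
                        for c in Counter(s).values())

    def compressed_size(s):
        """Kolmogorov complexity is uncomputable, but the size in bytes
        of a compressed encoding is a serviceable upper bound on it."""
        return len(zlib.compress(s.encode("utf-8")))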

William Dembski, however, proposes a third kind of information: complex specified information.  This information, he claims, is unlike either Shannon information or Kolmogorov complexity.

The first crucial difference is that, in both of the standard formulations of information theory, the string XX contains more information than the string X—not necessarily twice as much, but always more.  However, in complex specified information, we are repeatedly told by Dembski himself and his cohorts that the string XX does not carry more information than the string X.  Usually this point is driven home by a tortured analogy to plagiarism: students can't just copy someone else's paper and then claim it's 'new' information.

The analogy is like the witch's nose in Monty Python and the Holy Grail: false.  What professors want isn't 'information' in the information-theory sense: a page full of randomly generated characters would have rather more of that than the finest essay.  What they want is written evidence of thought, research, learning, and effort.  Complaining otherwise is as stupid as saying that what your employers want is 'work' in the physics sense--they want you to do your job, not apply force over a distance.  (This is where the counter-intuitive part comes in: written English carries less information in the formal sense than randomized characters do, because English falls into predictable, repetitive patterns--such as ' the ', ' and ', '. ', and ' a '--whereas randomized characters generally do not, and are therefore harder to compress.)
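That counter-intuitive point is easy to check for yourself.  The snippet below is only a sketch, again using zlib's compressed size as a rough proxy for information content; the exact byte counts will vary, but the ordering should not:

    import random
    import string
    import zlib

    random.seed(0)
    english = ("the cat sat on the mat and the dog sat on the log and "
               "the cat and the dog sat together on the mat by the log ") * 4
    noise = "".join(random.choice(string.ascii_lowercase + " ")
                    for _ in range(len(english)))

    # Patterned English compresses well; random characters barely compress.
    print(len(zlib.compress(english.encode())))  # smaller on a typical run
    print(len(zlib.compress(noise.encode())))    # larger on a typical run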

But let's look more specifically at the claim that, when dealing with CSI, the string X contains the same amount of information as the string XX.  Consider the binary string 0010.  Read as a four-digit binary number, it is the number 2.  It could also be a record of heads and tails in coin tosses.  Its Shannon information is 4 bits.  Now consider the string 00100010, a simple doubling of the previous string.  That's the number 34, a completely different number ('new information', to use the DI's buzzword).  It could also be a record of coin tosses, but it would be a record of twice as many of them.  It is immaterial that the second four fell out the same way as the first four: they might not have, and we didn't know until we saw the string (new information again).  It contains 8 bits of Shannon information--the extra four are 'new'.  Its Kolmogorov complexity is also increased (at the very minimum, the shortest program now has to contain a 'double the output string' instruction, whose size is non-zero)--the increased information is 'new'.  More importantly, whereas before it was only a binary string, it can now be interpreted as an ASCII code, specifically code 34, the double quote character: ".  Was this information present before?  No.  Is information that is present now but not before new?  Yes.  Are we justified in calling shenanigans when people try to weasel their way out of this problem by fudging definitions?  Absolutely.
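All three claims can be verified in a few lines of Python:

    x, xx = "0010", "00100010"

    # Read as records of fair coin tosses, each digit carries one bit:
    print(len(x), len(xx))          # 4 8

    # Read as binary numbers, the two strings denote different values:
    print(int(x, 2), int(xx, 2))    # 2 34

    # And 34 is the ASCII code for the double quote character:
    print(chr(int(xx, 2)))          # "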

These simple demonstrations might leave one wondering: why on earth would someone claim that doubling the string doesn't increase the information content?  Perhaps it is because of one of the main ways that new genes come into being: gene duplication.  Most genes are necessary to the organism to one degree or another.  If one is changed through mutation to code for a different protein, then the organism suffers for lack of the original gene, even if the new protein is also beneficial.  However, if a gene is first duplicated (through mechanisms such as unequal crossing over or retrotransposition), and the copy then mutates in a later generation, the organism isn't missing a necessary gene.  The mutated copy can then stand or fall on its own merits.  This is the process that produced the gene for nylonase, for example.  (In that particular case, one of the copies was altered by a frame-shift mutation.)  However, by defining 'information' in such a way that it is never increased by string doubling, the ID crowd gets to claim that this well-known and well-documented process by which new genes come into being does not create 'new genetic information'.
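As a toy illustration only (real genetics is vastly messier, and the sequence below is invented), duplication followed by mutation of the copy leaves the original intact:

    import random

    random.seed(1)
    BASES = "ACGT"

    def point_mutate(gene, rate=0.2):
        """Replace each base, at the given per-site rate, with a different base."""
        return "".join(random.choice(BASES.replace(b, ""))
                       if random.random() < rate else b
                       for b in gene)

    original = "ATGGCCATTGTAATGGGCCGCTGA"   # a made-up toy 'gene'

    # Duplicate first, then mutate the copy: the organism keeps a working
    # original while the spare copy is free to drift into something new.
    genome = original + point_mutate(original)

    print(genome.startswith(original))         # True: the original is intact
    print(genome[len(original):] != original)  # almost always True: the copy diverged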

But enough about string duplication.  Stay tuned for the thrilling two-part conclusion, in which we examine mutations as a vehicle for information increase and propose a series of experiments by which the idea of CSI could be definitively vindicated.
