FreeStyle:Information

From ResearchID.org

Layman's Definitions of Information

The word "information" has many intuitive meanings in common parlance. Two of the most popular can be described as "data that means something" (such as a work of Shakespeare) and "data that does something" (such as a computer program). The difference is critical - the former is fundamentally subjective, so it will not generally exist without an intelligence to perceive it, whilst the latter is fundamentally objective, so it may be generated by unintelligent processes such as genetic algorithms.

These definitions, intuitive as they are, have proven very hard to formalise, which makes it effectively impossible to make rigorous general statements about them. The two traditional mathematical definitions may be viewed as partial attempts to formalise these intuitions.

Traditional Mathematical Definitions of Information

Shannon Information

The Shannon information content of a string is the negative logarithm of the probability of receiving that string: I(s) = -log(p(s)). This has the nice feature of being non-negative and, when symbols are received independently of one another, additive (the Shannon information of "AB" is equal to the Shannon information of "A" plus that of "B"). Traditionally, Shannon information is calculated to the base 2, giving a result in bits, but that's just a convention.

A sample calculation:

Say we want to calculate the information content of the string "ILIKEMATHS" in the context of a given communications channel. Inherent in that channel will be a set of fundamental symbols - the dot-dash patterns of Morse code, the seven-bit characters of ASCII, the hand movements of sign language - and a set of probabilities for the appearance of each of those symbols.

For the sake of argument here we'll say that the symbols are letters and the probabilities are:

  • "A": 0.082
  • "E": 0.127
  • "H": 0.061
  • "I": 0.070
  • "K": 0.008
  • "L": 0.040
  • "M": 0.024
  • "S": 0.063
  • "T": 0.091

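Because the contributions of the individual letters simply add up, the order of the letters doesn't matter. Working to the base 2:

I("ILIKEMATHS") = I("AEHIIKLMST")
= -(log2(0.082) + log2(0.127) + log2(0.061) + 2*log2(0.070) + log2(0.008) + log2(0.040) + log2(0.024) + log2(0.063) + log2(0.091))
= 42.73 bits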
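
The sum can be checked with a few lines of Python. This is only a sketch: the function name `surprisal_bits` and the dictionary `LETTER_PROB` are illustrative names, not part of any standard library. The probabilities are the ones listed above, and the letters are assumed to be independent of one another.

```python
import math

# Letter probabilities copied from the list above (only the letters needed here).
LETTER_PROB = {
    "A": 0.082, "E": 0.127, "H": 0.061, "I": 0.070, "K": 0.008,
    "L": 0.040, "M": 0.024, "S": 0.063, "T": 0.091,
}

def surprisal_bits(text, prob=LETTER_PROB):
    """Sum of -log2(p) over the letters of text, assuming independent letters."""
    return sum(-math.log2(prob[ch]) for ch in text)

# Per-letter contributions, largest first, to see which letters dominate.
contributions = sorted(
    ((ch, -math.log2(p)) for ch, p in LETTER_PROB.items()),
    key=lambda item: item[1],
    reverse=True,
)

print(round(surprisal_bits("ILIKEMATHS"), 2))  # prints 42.73
print(contributions[0][0])                     # prints K
```

Swapping `math.log2` for `math.log` would give the same quantity in base e; only the unit changes.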

Note that the biggest single contributor to this total is the K, at about 6.97 bits - Ks don't appear very often. Shannon information has been graphically referred to as "surprisal" - it's a rough estimate of the surprise you feel on seeing a given string.

Note also that the information content of a random string - a string generated by selecting each letter with probability 1/26 - will on average be higher than that of a string produced according to the rules of English, when both are scored using the English letter probabilities, as rare, high-information letters will be more likely to crop up.

Shannon information doesn't measure the amount of meaning that a string contains - it merely measures the amount that it could contain.

The information entropy of a communications source is the "expectation" of the information content of its output, that is, the average information per symbol. Formally, H(A) = -sum_i(p(A_i)*log(p(A_i))). For example, the entropy of the English language defined in terms of letters, using the probabilities listed at the page linked to above, is about 4.18 bits per letter (base 2) or 2.90 nats per letter (base e).
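
As a minimal sketch of the entropy formula, the function below takes any list of symbol probabilities. The name `entropy` and the example distributions are illustrative assumptions; the full English letter table is not reproduced here, so the 4.18 figure is not recomputed.

```python
import math

def entropy(probs, base=2.0):
    """H = -sum(p * log(p)); symbols with zero probability contribute nothing."""
    nats = -sum(p * math.log(p) for p in probs if p > 0)
    return nats / math.log(base)

fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]
uniform_letters = [1 / 26] * 26  # hypothetical source: all 26 letters equally likely

print(entropy(fair_coin))
print(entropy(biased_coin))
print(entropy(uniform_letters))
print(entropy(uniform_letters, base=math.e))
```

A source that emits all 26 letters with equal probability has entropy log2(26), roughly 4.70 bits per letter, which is the maximum for a 26-symbol alphabet. The figure quoted for English is lower because its letters are not equally likely.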

Kolmogorov Complexity

Kolmogorov complexity, in common with other Algorithmic Information Theory definitions, is broadly speaking a measure of the computational resources necessary to generate a string. The Kolmogorov complexity of a string is the length of the shortest description (unique specification) of that string in some description language.

So, for example, using English as the description language the string "AAAAAAAAAAAAAAAAAAAA" could be written as "twenty 'A's" - that's two units of English. The string "NAILHBQHFSSARQBIYFDX" has no such short description so would have much higher K-complexity.

There is no general technique for measuring the K-complexity of an arbitrary string. The argument is a version of the Berry paradox, and proceeds as follows:

  • Step 1) Suppose that the K-complexity of any string can be assessed, and consider the set of all strings with K-complexity greater than 50 (i.e. they can't be described in 50 words of English or fewer).
  • Step 2) Select the shortest string. If there is more than one shortest string, pick the string that appears first when they're listed in alphabetical order.
  • Step 3) That string is now uniquely specified by the language used in steps 1 and 2 - a description of length 47. Therefore the K-complexity of that string is less than 50.
  • Step 4) This contradicts the fact that by definition the string has K-complexity greater than 50. Hence the premise that it's possible to assess the K-complexity of any string is wrong.

However, in general it's very easy to put an upper bound on the K-complexity of a string - it's just lower bounds that are problematic. Intuitively, this just means that there may be a shorter description that you haven't yet stumbled across.
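
One rough way to see an upper bound in practice is to compress the string and count the bytes. The sketch below assumes zlib's compressed format as the description language, rather than English, and uses the illustrative name `compressed_length`. The result bounds the description length in that language only, and says nothing about whether a shorter description exists.

```python
import zlib

def compressed_length(text):
    """Bytes in the zlib-compressed encoding of text: an upper-bound proxy only."""
    return len(zlib.compress(text.encode("utf-8"), 9))

regular = "A" * 20
irregular = "NAILHBQHFSSARQBIYFDX"

print(compressed_length(regular))
print(compressed_length(irregular))
```

The repetitive string compresses to fewer bytes than the irregular one, which matches the intuition from the "twenty 'A's" example. For strings this short the fixed overhead of the zlib format is a large part of the total, so the absolute numbers mean little.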

Note that, as with Shannon information, the K-complexity of a random string will generally be higher than the K-complexity of a non-random string (albeit for completely different reasons). As with Shannon information, K-complexity can be thought of as an upper bound on the amount of meaning carried by a string. Apart from these conceptual similarities, the two definitions will generally behave completely differently - for example, doubling a string will double the Shannon information of the string but only slightly increase its K-complexity.
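
The doubling behaviour can be sketched numerically. The assumptions here are loud ones: Shannon information is computed under a hypothetical uniform 26-letter model with independent letters, and zlib-compressed length stands in as an upper-bound proxy for K-complexity. The function names are illustrative.

```python
import math
import zlib

def uniform_surprisal_bits(text, alphabet_size=26):
    """Shannon information assuming independent, equally likely letters."""
    return len(text) * math.log2(alphabet_size)

def compressed_length(text):
    """Bytes in the zlib-compressed encoding of text: an upper-bound proxy only."""
    return len(zlib.compress(text.encode("utf-8"), 9))

s = "NAILHBQHFSSARQBIYFDX"
doubled = s + s

# Shannon information: exactly twice as large for the doubled string.
print(uniform_surprisal_bits(s), uniform_surprisal_bits(doubled))

# Compressed length: the second copy is encoded as "repeat the previous 20 characters".
print(compressed_length(s), compressed_length(doubled))
```

The compressor describes the second copy as a back-reference to the first, so the doubled string costs only a few bytes more than the original, far less than twice as much.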

Design-Theoretic Definitions of Information

Complex Specified Information

Copied verbatim from Defining Intelligent Design to avoid losing track of our definitions


Complex Specified Information, also known as Specified Complexity, is an attribute of events that are very unlikely (i.e. high Shannon Information), very complex (i.e. high Kolmogorov complexity), and are specified (i.e. there is a description of them that is in some sense independently given).

It is claimed that a high-information specified event, such as a set of Scrabble letters spelling out a long English word, cannot occur by chance. It is additionally claimed that a high-complexity event cannot occur as a result of natural regularity. If these claims are true, and the options of chance, natural regularity and intelligent design are exhaustive, then the conclusion can be drawn that an event with high CSI is the product of intelligent design.

It is generally accepted that there exist biological structures with high CSI, so such a conclusion, if accurate, would be problematic for evolutionary biology.

See the Specified Complexity page for more detail.
