Technologies!!! | Naqi Raza | Posted 2009-03-20 | Simulated reality

<p><b>Simulated reality</b> is the proposition that reality could be simulated—perhaps by computer simulation—to a degree indistinguishable from "true" reality. It could contain conscious minds which may or may not know that they are living inside a simulation. In its strongest form, the "simulation hypothesis" claims it is possible and even probable that we are actually living in such a simulation.</p> <p>This is different from the current, technologically achievable concept of virtual reality. Virtual reality is easily distinguished from the experience of "true" reality; participants are never in doubt about the nature of what they experience. Simulated reality, by contrast, would be hard or impossible to distinguish from "true" reality.</p> <p style="font-weight: bold; color: rgb(153, 0, 0);">The idea of a simulated reality raises several questions:</p> <ul><li>Is it possible, even in principle, to tell whether we are in a simulated reality?</li><li>Is there any difference between a simulated reality and a "real" one?</li><li>How should we behave if we knew that we were living in a simulated reality?</li></ul> <p><a name="Types_of_simulation" id="Types_of_simulation"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Types of simulation</span></span></h2> <p><a name="Brain-computer_interface"
id="Brain-computer_interface"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Brain-computer interface</span></span></h3> <p>In a brain-computer interface simulation, each participant enters from outside, directly connecting their brain to the simulation computer. The computer transfers sensory data to them and reads their desires and actions back; in this manner they interact with the simulated world and receive feedback from it. The participant may even be adjusted so as to temporarily forget that they are inside a virtual realm (e.g. "passing through the veil"). While inside the simulation, the participant's consciousness is represented by an avatar, which could look very different from the participant's actual appearance.</p><p><br /></p> <p><a name="Simulation-brain_communications" id="Simulation-brain_communications"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Simulation-brain communications</span></span></h2> <p>To communicate effectively with the brain, a code or protocol would have to be created or discovered for carrying information between the simulation and the parts of the brain that handle hearing and speech.</p> <p><a name="Virtual_people" id="Virtual_people"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Virtual people</span></span></h3> <p>In a virtual-people simulation, every inhabitant is a native of the simulated world. They do not have a "real" body in the external reality. Rather, each is a fully simulated entity, possessing an appropriate level of consciousness that is implemented using the simulation's own logic (i.e. using its own physics). As such, they could be downloaded from one simulation to another, or even archived and resurrected at a later date. 
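</p>

<p>The "archive and resurrect" idea can be made concrete: a virtual person is nothing more than state held by the simulation, so archiving reduces to serialization. The sketch below is purely illustrative; the class and field names are invented, not drawn from any real system.</p>

```python
import json

# Hypothetical sketch: a virtual person is just data inside the
# simulation, so "archiving" is serialization and "resurrection"
# is deserialization, possibly into a different simulation.

class VirtualPerson:
    def __init__(self, name, memories):
        self.name = name
        self.memories = list(memories)

    def archive(self) -> str:
        # Snapshot the entity's full state as plain data.
        return json.dumps({"name": self.name, "memories": self.memories})

    @classmethod
    def resurrect(cls, snapshot: str) -> "VirtualPerson":
        # Rebuild an equivalent entity at a later date, or elsewhere.
        state = json.loads(snapshot)
        return cls(state["name"], state["memories"])

alice = VirtualPerson("alice", ["first sunrise"])
restored = VirtualPerson.resurrect(alice.archive())
```

<p>The restored copy is a distinct object with identical state, which is all that "resurrection" means at this level of description.</p> <p>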
It is also possible that a simulated entity could be moved out of the simulation entirely by means of mind transfer into a synthetic body. Another way of getting an inhabitant of the virtual reality out of its simulation would be to "clone" the entity, by taking a sample of its virtual DNA and creating a real-world counterpart from that model. The result would not bring the "mind" of the entity out of its simulation, but its body would be born in the real world.</p> <p style="font-weight: bold; color: rgb(153, 0, 0);">This category subdivides into two further types:</p> <ul><li><b>Virtual people-virtual world</b>, in which an external reality is simulated separately from the artificial consciousnesses;</li><li><b>Solipsistic simulation</b>, in which consciousness is simulated and the "world" participants perceive exists only within their minds.</li></ul> <p><a name="Emigration" id="Emigration"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Emigration</span></span></h3> <p>In an emigration simulation, the participant enters the simulation from the outer reality, as in the brain-computer interface simulation, but to a much greater degree. On entry, the participant uses mind transfer to temporarily relocate their mental processing into a virtual-person. 
After the simulation is over, the participant's mind is transferred back into their outer-reality body, along with all new memories and experiences gained within (as in the movie The Thirteenth Floor, or when one flatlines in Neuromancer).</p> <p>Also worth mentioning is the possibility of a completely virtual person (born in the simulation) somehow becoming self-aware (after "waking up"), seeking to escape the simulation, and ultimately succeeding in being transferred into an outer-reality person (transcendent to the simulated world). This option can be related to Gurdjieff's view in the Fourth Way that "humans are not born with a soul. Rather, a man must create a soul through the course of his life".</p> <p>This "creation of a soul" for a (by its nature soulless) virtual person (part of the Program) would ultimately mean exiting (emigrating) the simulation and being transformed on exit into a real (outer-reality) person, assuming the outer reality is a realm of Spirit. The (right) "course of life" in the simulation would then be only the preparation for that final act of emigration (the transfer and the transformation that accompanies it).</p> <p>In this case, since the emigrating inhabitant of the simulation has no associated outer-reality person (no user with a "real body"), this virtual person would be transferred into either a new outer-reality person (assuming that is possible) or an already existing one, who may or may not be a <i>player</i> of the simulation. If a player, that outer-reality person would, as a user, previously have been associated with some other inhabitant of the simulated world; in "taking over" (or <i>merging with</i>) the emigrating inhabitant, he could choose to destroy that old inhabitant, or abandon him (leaving him in the simulated world without a user, temporarily or permanently). 
Or, if neither destroying nor abandoning him, but wishing to continue playing the simulation through that same old inhabitant (the one who did not emigrate), he would now do so as a 'transformed' user: 'enriched' by the emigrated virtual person, or even, if that were chosen and possible, entirely being that previously virtual person, and as such continuing to play the simulation through a 'new' virtual person.</p> <p>And the outer-reality person (who as a self is transcendent to the simulated world) may be 'something' completely indescribable from the point of view of the simulated world, but as self (= soul) essentially emanates from the Spirit, with a 'personality' manifesting the Spirit.</p> <p><a name="Intermingled" id="Intermingled"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Intermingled</span></span></h3> <div class="thumb tright"> <div class="thumbinner" style="width: 182px;"><a href="http://en.wikipedia.org/wiki/File:SimulatedReality_MorpheusAndNeoInSmallSimulation.jpg" class="image" title="Morpheus teaches Neo inside a small simulated reality"><img alt="" src="http://upload.wikimedia.org/wikipedia/en/thumb/f/f7/SimulatedReality_MorpheusAndNeoInSmallSimulation.jpg/180px-SimulatedReality_MorpheusAndNeoInSmallSimulation.jpg" class="thumbimage" width="180" border="0" height="75" /></a> <div class="thumbcaption"> Morpheus teaches Neo inside a small simulated reality</div> </div> </div> <p>An intermingled simulation supports both types of consciousness: "players" from the outer reality who are visiting (as in a brain-computer interface simulation) or emigrating, and virtual-people who are natives of the simulation and hence lack any physical body in the outer reality.</p> <p>The <i><a href="http://en.wikipedia.org/wiki/The_Matrix" title="The Matrix">Matrix</a></i> movies feature an intermingled type of simulation: they contain not only human minds (with their physical bodies remaining
outside), but also sentient software programs that govern various aspects of the computed realm.</p><p><br /></p> <p><a name="Arguments" id="Arguments"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Arguments</span></span></h2> <p><a name="We_are_living_in_a_simulation" id="We_are_living_in_a_simulation"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">We are living in a simulation</span></span></h3> <p><a name="Nick_Bostrom.27s_argument" id="Nick_Bostrom.27s_argument"></a></p> <h4><span class="editsection"></span><span style="color: rgb(153, 0, 0);" class="mw-headline">Nick Bostrom's argument</span></h4> <p style="font-weight: bold; color: rgb(153, 0, 0);">The philosopher Nick Bostrom investigated the possibility that we may be living in a simulation. A simplified version of his argument proceeds as follows:</p> <dl><dd>i. It is <i>possible</i> that a civilization could create a computer simulation which contains individuals with artificial intelligence.</dd><dd>ii. Such a civilization would <i>likely</i> run many—say billions—of these simulations (for fun, for research, etc.).</dd><dd>iii. A simulated individual inside the simulation <i>wouldn’t necessarily know</i> that it’s inside a simulation—it’s just going about its daily business in what it considers to be the "real world."</dd></dl> <p>The ultimate question, then—if one accepts that theses i, ii, and iii are at least <i>possible</i>—is which of the following is more likely?</p> <dl><dd>a. We are the <i>one</i> civilization which develops AI simulations and happens not to be in one itself? Or,</dd><dd>b. We are one of the many (<i>billions</i>) of simulations that have been run? 
(Remember point iii.)</dd></dl> <p style="font-weight: bold; color: rgb(153, 0, 0);">In greater detail, his argument attempts to prove the<span style="color: rgb(153, 0, 0);"> trichotomy</span> that:</p> <dl><dd><b>either</b></dd></dl> <ol><li>intelligent races will never reach a level of technology where they can run simulations of reality so detailed they can be mistaken for reality (or this is impossible in principle); <b>or</b></li><li>races who do reach such a level do not tend to run such simulations; <b>or</b></li><li>we are <i>almost certainly</i> living in such a simulation.</li></ol> <p>Bostrom's argument uses the premise that, given sufficiently advanced technology, it is possible to simulate entire inhabited planets, larger habitats, or even entire universes (perhaps as quantum simulations in pockets of space-time) on a computer, including all the people on them, and that simulated people can be fully conscious and are as much persons as non-simulated people.</p> <p>A particular case provided in the original paper poses the scenario where we assume that the human race could reach such a technological level without destroying themselves in the process (i.e. 
we deny the first hypothesis); and that once we reached such a level we would still be interested in history, the past, and our ancestors, and that there would be no legal or moral strictures on running such simulations (we deny the second hypothesis)—then</p> <ul><li>it is likely that we would run a very large number of so-called ancestor simulations to study our past;</li><li>and that, by the same line of reasoning, many of these simulations would in turn run other sub-simulations, and so on;</li><li>and that given the fact that right now it is impossible to tell whether we are living in one of the vast number of simulations or the original ancestor universe, the likelihood is that the former is true.</li></ul> <p>Assumptions as to whether the human race (or another intelligent species) could reach such a technological level without destroying themselves depend greatly on the value of the Drake equation, which gives the number of intelligent technological species communicating via radio in a galaxy at any given point in time. The expanded equation looks to the number of posthuman civilizations that ever would exist in any given universe. If the average for all universes, real or simulated, is greater than or equal to one such civilization existing in each universe's entire history, then odds are rather overwhelmingly in favor of the proposition that the average civilization is in a simulation, assuming that such simulated universes are possible and such civilizations would want to run such simulations.</p> <p><a name="Frank_J._Tipler.27s_Omega_Point" id="Frank_J._Tipler.27s_Omega_Point"></a></p> <h4><span class="editsection"></span> <span style="color: rgb(153, 0, 0);" class="mw-headline">Frank J. Tipler's Omega Point</span></h4> <p>Physicist Prof. Frank J. 
Tipler envisages a scenario similar to Nick Bostrom's argument, one that Tipler maintains is a physically required cosmological scenario in the far future of the universe: as the universe comes to an end in a solitary singularity during the Big Crunch, the computational capacity of the universe increases exponentially, fast enough to outpace the time running out. In principle, a simulation run on this universe-computer can thus continue forever in its own terms, even though proper time lasts only a finite duration.</p> <p>Prof. Tipler identifies this final singularity and its state of infinite information capacity with God. According to Prof. Tipler and Prof. David Deutsch, the implication of this theory for present-day humans is that this ultimate cosmic computer will essentially be able to resurrect everyone who has ever lived, by recreating all possible quantum brain states within the master simulation, somewhat reminiscent of the resurrection ideas of Nikolai Fyodorovich Fyodorov. This would manifest as a simulated reality. From the perspective of the inhabitant, the Omega Point represents an infinite-duration afterlife, which could take any imaginable form due to its virtual nature. At first glance, Tipler's hypothesis requires some means by which the inhabitants of the far future can recover historical information in order to reincarnate their ancestors into a simulated afterlife. However, if they really have access to infinite computing power, that is no problem at all: they can simply simulate "all possible worlds". (This line of thought is continued in Platonic simulation theories.) Tipler's argument can also be intertwined with Nick Bostrom's aforementioned argument from probability. 
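</p>

<p>The bookkeeping behind such probability arguments can be sketched in a few lines. The model below is an illustrative simplification only (it assumes every world, simulated or not, hosts the same number of observers); it is not taken from Bostrom's or Tipler's papers.</p>

```python
# Toy model of the indifference reasoning: among (n_sim + 1) worlds of
# equal population -- n_sim simulated plus one base reality -- a randomly
# chosen observer turns out to be simulated with probability
# n_sim / (n_sim + 1).

def probability_simulated(n_sim: int) -> float:
    return n_sim / (n_sim + 1)

print(probability_simulated(1))      # 0.5
print(probability_simulated(10**9))  # very close to 1
```

<p>As the number of simulated worlds grows without bound, the probability of being unsimulated tends to zero, which is the limiting case the Omega Point scenario appeals to.</p> <p>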
If the Omega Point simulates an infinite number of virtual worlds, then it is infinitely more likely that our reality is one of those simulated worlds rather than the lone real world that created the Omega Point.</p> <p>Prof. Tipler's Omega Point Theory is predicated on an eventual Big Crunch, thought by some to be an unlikely scenario in light of a number of recent astronomical observations. Tipler has recently amended his views to accommodate an accelerating universe due to a positive cosmological constant. He proposes baryon tunneling as a means of propelling interstellar spacecraft. He states that if the baryons in the universe were to be annihilated by this process, then this would force the Higgs field toward its absolute vacuum, cancelling the positive cosmological constant, stopping the acceleration, and allowing the universe to collapse into the Omega Point.</p> <p><a name="Computationalism_.26_Platonic_simulation_theories" id="Computationalism_.26_Platonic_simulation_theories"></a></p> <h4><span class="editsection"></span><span style="color: rgb(153, 0, 0);" class="mw-headline">Computationalism & Platonic simulation theories</span></h4> <p><a href="http://en.wikipedia.org/wiki/Computationalism" title="Computationalism" class="mw-redirect">Computationalism</a> is a philosophy-of-mind theory stating that cognition is a form of computation. It is relevant to the Simulation Hypothesis in that it illustrates how a simulation could contain conscious subjects, as required by a "virtual people" simulation. For example, it is well known that physical systems can be simulated to some degree of accuracy. If computationalism is correct, and if there is no problem in generating artificial consciousness from cognition, it would establish the theoretical possibility of a simulated reality. However, the relationship between cognition and phenomenal consciousness is disputed. 
It is possible that consciousness requires a substrate of "real" physics, and simulated people, while behaving appropriately, would be philosophical zombies. This would also seem to negate Nick Bostrom's simulation argument: we cannot be inside a simulation, as conscious beings, if consciousness cannot be simulated. However, we could still be within a simulation and yet be envatted brains. This would allow us to exist as conscious beings within a simulated environment, even if a simulated environment could not itself simulate consciousness.</p> <p>Some theorists have argued that if the "consciousness-is-computation" version of <a href="http://en.wikipedia.org/wiki/Computationalism" title="Computationalism" class="mw-redirect">computationalism</a> and mathematical realism (also known as mathematical Platonism) are both true, our consciousnesses must be inside a simulation. This argument states that a "Plato's heaven" or ultimate ensemble would contain every algorithm, including those which implement consciousness. Platonic simulation theories are also subsets of the multiverse theories and theories of everything.</p> <p><a name="Dreaming" id="Dreaming"></a></p> <h4><span class="editsection"></span><span style="color: rgb(153, 0, 0);" class="mw-headline">Dreaming</span></h4> <p>A dream could be considered a type of simulation capable of fooling someone who is asleep. As a result the "dream hypothesis" cannot be ruled out, although it has been argued that common sense and considerations of simplicity rule against it. One of the first philosophers to question the distinction between reality and dreams was Zhuangzi, a Chinese philosopher from the 4th century BC. He phrased the problem as the well-known "Butterfly Dream," which went as follows:</p> <blockquote> <p>Once Zhuangzi dreamt he was a butterfly, a butterfly flitting and fluttering around, happy with himself and doing as he pleased. He didn't know he was Zhuangzi. 
Suddenly he woke up and there he was, solid and unmistakable Zhuangzi. But he didn't know if he was Zhuangzi who had dreamt he was a butterfly, or a butterfly dreaming he was Zhuangzi. Between Zhuangzi and a butterfly there must be <i>some</i> distinction! This is called the Transformation of Things. (2, tr. Burton Watson 1968:49)</p> </blockquote> <p>The philosophical underpinnings of this argument were also taken up by Descartes, one of the first Western philosophers to do so. In Meditations on First Philosophy, he states "... there are no certain indications by which we may clearly distinguish wakefulness from sleep", and goes on to conclude that "It is possible that I am dreaming right now and that all of my perceptions are false".</p> <p style="font-weight: bold; color: rgb(153, 0, 0);">Chalmers (2003) discusses the dream hypothesis, and notes that this comes in two distinct forms:</p> <ul><li>that he is <i>currently</i> dreaming, in which case many of his beliefs about the world are incorrect;</li><li>that he has <i>always</i> been dreaming, in which case the objects he perceives actually exist, albeit in his imagination.</li></ul> <p>Both the dream argument and the Simulation hypothesis can be regarded as skeptical hypotheses; however, in raising these doubts, just as Descartes noted that his own thinking led him to be convinced of his own existence, the very existence of the argument is itself testament to the possibility of its own truth.</p> <p>Another state of mind in which an individual's perceptions have no physical basis in the real world is called psychosis.</p> <p><a name="Computability_of_physics" id="Computability_of_physics"></a></p> <h4><span class="editsection"></span> <span style="color: rgb(153, 0, 0);" class="mw-headline">Computability of physics</span></h4> <p>A decisive refutation of any claim that our reality is computer-simulated would be the discovery of some uncomputable physics, because if reality is doing something no computer can 
do, it cannot be a computer simulation. As it happens, however, known physics is held to be computable.</p> <p>The objection could be made that the simulation does not have to run in "real time". But this misses an important point: the shortfall is not linear; rather, it is a matter of performing an infinite number of computational steps in a finite time. This objection does not apply if the hypothetical simulation is being run on a hypercomputer, a machine more powerful than a Turing machine. Unfortunately, there is no way of working out whether the computers running a simulation are capable of doing things that computers in the simulation cannot do. No one has shown that the laws of physics inside a simulation and those outside it have to be the same, and simulations of different physical laws have been constructed. The problem is that there is no evidence that could conceivably be produced to show that the universe is not any kind of computer, making the Simulation Hypothesis unfalsifiable and therefore scientifically unacceptable, at least by Popperian standards.</p> <p><a name="CantGoTu_Environments" id="CantGoTu_Environments"></a></p> <h4><span class="editsection"></span> <span style="color: rgb(153, 0, 0);" class="mw-headline">CantGoTu Environments</span></h4> <p>The concept of a CantGoTu Environment takes the ideas embedded in the diagonal argument of Georg Cantor, the incompleteness theorems of Kurt Gödel, and the limits of computability highlighted by Alan Turing, and applies them to Virtual Reality environments. The argument is set out in The Fabric of Reality (1997) by David Deutsch, and runs thus:</p> <dl><dd>Imagine a computer built to render every possible Virtual Reality. Suppose all possible environments produced by this generator can be laid out sequentially, as Environment 1, Environment 2, etc. Take time slices through each of these of equal duration. (Deutsch specifies one minute, but this could, in principle, be anything, e.g. Planck time.) 
Now construct a new environment as follows. In the first time-period, generate in the environment anything which is different from Environment 1; in the second time-period, anything different from Environment 2; and so on. This new environment cannot be found in the sequential layout of environments specified earlier, as it differs from every listed environment in at least one particular time-slice. Hence no such universal VR generator can be created, and there are environments (infinitely many of them) which can <i>never</i> be rendered by any means.</dd></dl> <p>[Yet if all possible virtual-reality initial conditions have been simulated, and it is still possible to create a reality that plays out differently from those already created (despite starting from an initial condition shared with one already in existence), then that extra environment must obey slightly different cause-and-effect laws of reality; otherwise it would simply play out in the same way as one of those already simulated. This implies that Deutsch's argument is valid only if the laws that govern each virtual reality may differ: they would have to allow inconsistencies, such as objects suddenly disappearing or appearing out of nowhere, every time an environment transitions from one time slot to another. 
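</p>

<p>Deutsch's diagonal construction can be made concrete with a short sketch. Here each environment is modelled as a function from time-slice index to an integer "state"; this encoding is invented purely for illustration.</p>

```python
# Sketch of the diagonal construction: given any (finite prefix of an)
# enumeration of environments, build one that differs from the n-th
# environment at the n-th time-slice, so it cannot appear in the list.

def diagonal_environment(environments):
    def new_env(i):
        if i < len(environments):
            return environments[i](i) + 1  # differ at slice i
        return 0
    return new_env

envs = [lambda i, k=k: k * i for k in range(5)]  # five sample environments
diag = diagonal_environment(envs)
```

<p>By construction, diag disagrees with the n-th listed environment at time-slice n, which is exactly why no sequential layout of environments can contain it.</p> <p>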
If instead one simply assumes that there are infinitely many possible initial conditions, varying by infinitesimally small amounts, then (even if all follow the same laws) there will be infinitely many possible virtual realities that could be generated, which leads to the same conclusion as Deutsch's.]</p> <p>However, later in the book, Deutsch goes on to argue for a very strong version of the Turing principle, namely: "It is possible to build a virtual reality generator whose repertoire includes every <i>physically possible</i> environment."</p> <p>However, in order to include <i>every physically possible environment</i>, the computer would have to be able to include a full simulation of the environment containing <i>itself</i>. Even so, a computer running a simulation need not run every possible physical moment to be plausible to its inhabitants.</p> <p><a name="Computational_load" id="Computational_load"></a></p> <h4><span class="editsection"></span> <span style="color: rgb(153, 0, 0);" class="mw-headline">Computational load</span></h4> <dl><dt>Virtual people</dt></dl> <p>As of 2007, the computational requirements for molecular dynamics are such that it takes several months of computing time on the world's fastest computers to simulate 1/10th of one second of the folding of a single protein molecule. To simulate an entire galaxy would require more computing power than can presently be envisioned, assuming that no shortcuts are taken when simulating areas that nobody is 
observing.</p> <p>In answer to this objection, Bostrom calculated that simulating the brain functions of all humans who have ever lived would require roughly 10<sup>33</sup> to 10<sup>36</sup> calculations. He further calculated that a planet-sized computer built using known nanotechnological methods would perform about 10<sup>42</sup> calculations per second — and a planet-sized computer is not inherently impossible to build (although the speed of light could severely constrain the speed at which its subprocessors share data). In any case, a simulation need not compute every single molecular event that occurs inside it; it may only process events that its participants can actively perceive. This is particularly the case if the simulation contained only a handful of people; far less processing power would be needed to make them believe they were in a "world" much larger than was actually the case.</p> <dl style="font-weight: bold; color: rgb(153, 0, 0);"><dt>Brain-computer interface</dt></dl> <p>Some have argued that a dream is a reality being simulated for certain parts of the dreamer's brain by other parts of the dreamer's brain — possibly showing that a 'computer' less powerful than a whole human brain can simulate often believable realities for the senses. Similar arguments would apply to vivid recollections, imaginings, and especially hallucinations. However, all of these things are usually less vivid than waking experience, and they do not have to consistently obey the laws of physics, as our world does; that constraint presumably requires more computational power. (Another point some have made about hallucinations is that a hallucination cannot be interacted with in a rich, vivid way requiring the simulation of multiple senses, possibly because the brain knows it does not have the computing power to support such interaction.)</p> <p>Additionally, it is possible that the parts of our brains that question the validity of a situation are impaired when we sleep. The believability of a simulation is an important influence on the results it generates.</p> <dl><dt>Validity of the arguments</dt></dl> <p>In any case, it is perhaps erroneous to apply our current sense of feasibility to projects undertaken in an outer reality, where resources and physical laws may be very different. The objection also assumes that the designers would need to simulate reality beyond our natural senses.</p> <p>Also, a simulated reality need not run in real time. The inhabitants of a simulated universe would have no way of knowing that one day of subjective time actually required much longer to calculate in their host computer, or vice versa. Isaac Asimov pushed the limits of this by claiming that, unbeknownst to the inhabitants, the simulation could even run backwards, or in pieces on different computers, or with a million generations of monks working weekends on <a href="http://en.wikipedia.org/wiki/Abacus" title="Abacus">abacuses</a> — all without the simulation missing a beat 'in simulation time'.</p> <p><a name="Nested_simulations" id="Nested_simulations"></a></p> <h4><span class="editsection"></span><span style="color: rgb(153, 0, 0);" class="mw-headline">Nested simulations</span></h4> <p>The existence of simulated reality is unprovable in any concrete sense: any "evidence" that is directly observed could be another simulation itself. 
In other words, there is an infinite regress problem with the argument. Even if we are a simulated reality, there is no way to be sure the beings running the simulation are not themselves a simulation, and the operators of <i>that</i> simulation are not a simulation, ad infinitum. Given the premises of the simulation argument, any reality, even one running a simulation, has no better or worse a chance of being a simulation than any other.</p> <p><a name="Occam.27s_razor" id="Occam.27s_razor"></a></p> <h4><span class="editsection"></span> <span style="color: rgb(153, 0, 0);" class="mw-headline">Occam's razor</span></h4> <p>It has been noted that there is no definitive way to tell whether one is in a simulation. It is generally the case that any number of hypotheses can explain the same evidence. This situation often prompts the use of a heuristic rule called Occam's razor, which prefers simpler explanations over more complex ones, and is often implicated in skeptical criticisms of far-fetched hypotheses.</p> <p>Since it is a heuristic rule, and not a natural law, it is not an infallible guide as to what is ultimately the truth, but only what is usually best to believe, all other things being equal. If we assume Occam's Razor applies, then it would tell us to reject simulated reality as being too complex, in favor of reality being what it appears to be.</p><p><br /></p> <p><a name="Scientific_and_technological_approaches" id="Scientific_and_technological_approaches"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Scientific and technological approaches</span></span></h2> <p><a name="Software_Bugs" id="Software_Bugs"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Software Bugs</span></span></h3> <p>A computed simulation may have voids or other errors that manifest inside. 
A simple example occurs in the first-person shooter Doom: when the "hall of mirrors" effect appears, the game is attempting to display "nothing" and visibly failing to do so. If a void can be found and tested, and if the observers survive its discovery, then it may reveal the underlying computational substrate. However, lapses in physical law could be attributed to other explanations, for instance inherent instability in the nature of reality.</p> <p>In fact, bugs could be very common. An interesting question is whether knowledge of bugs or loopholes in a sufficiently powerful simulation is instantly erased the moment it is observed, since presumably all thoughts and experiences in a simulated world could be carefully monitored and altered. This would, however, require enormous processing capability in order to monitor billions of people at once. Of course, if this were the case, we would never be able to act on the discovery of bugs. Indeed, any simulation sufficiently determined to protect its existence could erase any proof that it was a simulation whenever such proof arose, provided it had the enormous capacity necessary to do so.</p> <p>To take this argument to an even greater extreme, a sufficiently powerful simulation could make its inhabitants think that erasing proof of its existence is difficult. This would mean that the computer actually has an easy time erasing glitches, while we merely believe that changing reality requires great power.</p> <p><a name="Hidden_messages_or_.22Easter_eggs.22" id="Hidden_messages_or_.22Easter_eggs.22"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Hidden messages or "Easter eggs"</span></span></h3> <p>The simulation may contain secret messages or exits, placed there by the designer, or by other inhabitants who have solved the riddle, in the way that computer games and other media sometimes do. 
People have already spent considerable effort searching for patterns or messages within the endless decimal places of the fundamental constants such as e and pi. In Carl Sagan's science fiction novel Contact, Sagan contemplates the possibility of finding a signature embedded in pi (in its base-11 expansion) by the creators of the universe.</p> <p>However, such messages have not been made public if they have been found, and the argument relies on the messages being truthful. As usual, other hypotheses could explain the same evidence. In any case, if such constants are in fact infinite, then at some point an apparently meaningful message will appear in them (this is known as the infinite monkey theorem), not necessarily because it was placed there.</p> <p>The Easter Egg Theory also assumes that a simulation would want to inform its inhabitants of its real nature; it may not. Otherwise, if we consider that the human race will eventually be capable of creating intelligent programs (i.e. machines) living inside a virtual subspace of our "real" world, then an interesting question would be to define whether or not we will be capable of suppressing from our sentient robots their capability of knowing their artificial nature.</p> <p><a name="Processing_power" id="Processing_power"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Processing power</span></span></h3> <p>A computer simulation would be limited to the processing power of its host computer, and so there may be aspects of the simulation that are not computed at a fine-grained (e.g. subatomic) level. This might show up as a limitation on the accuracy of information that can be obtained in particle physics.</p> <p>However, this argument, like many others, assumes that accurate judgments about the simulating computer can be made from within the simulation. 
If we are being simulated, we might be misled about the nature of computers.</p> <p>Taken one step further, the "fine grained" elements of our world could themselves be simulated since we never see the sub-atomic particles due to our inherent physical limitations. In order to see such particles we rely on other instruments which appear to magnify or translate that information into a format our limited senses are able to view: computer print out, lens of a microscope, etc. Therefore, we essentially take on faith that they're an accurate portrayal of the fine grained world which appears to exist in a realm beyond our natural senses. Assuming the sub-atomic could also be simulated then the processing power required to generate a realistic world would be greatly reduced.</p> <p><a name="Digital_physics_and_cellular_automata" id="Digital_physics_and_cellular_automata"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Digital physics and cellular automata</span></span></h3> <p>In theoretical physics, digital physics holds the basic premise that the entire history of our universe is computable in some sense. The hypothesis was pioneered in Konrad Zuse's book Rechnender Raum (translated by MIT into English as Calculating Space, 1970), which focuses on cellular automata. Juergen Schmidhuber suggested that the universe could be a Turing machine, because there is a very short program that outputs all possible programmes in an asymptotically optimal way. Other proponents include Edward Fredkin, Stephen Wolfram, and Nobel laureate Gerard 't Hooft. They hold that the apparently probabilistic nature of quantum physics is not incompatible with the notion of computability. A quantum version of digital physics has recently been proposed by Seth Lloyd. 
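The cellular-automaton premise is easy to demonstrate in miniature. The sketch below steps Wolfram's elementary Rule 110, which has been proved Turing-complete: a toy "universe" whose entire history is fixed by an eight-entry lookup table.

```python
# Elementary cellular automaton: each cell's next state depends only on
# itself and its two neighbours; rule 110 encodes the 8-entry update table
# in its binary representation.

RULE = 110

def step(cells):
    """One synchronous update of the whole row; the edges wrap around."""
    n = len(cells)
    return [
        (RULE >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

row = [0] * 31 + [1] + [0] * 31      # a "cosmos" seeded with one live cell
for _ in range(8):
    print("".join(".#"[c] for c in row))
    row = step(row)
```

A deterministic rule this small already generates the kind of complex history that digital physics conjectures for our own universe.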
None of these suggestions has been developed into a workable physical theory.</p> <p>It can be argued that the use of continua in physics constitutes a possible argument against the simulation of a physical universe. Removing the real numbers and uncountable infinities from physics would counter some of the objections noted above, and at least make computer simulation a possibility. However, digital physics must overcome objections of its own. For instance, cellular automata would appear to be a poor model for the non-locality of quantum mechanics.</p><p><br /></p> <p><a name="Other_issues" id="Other_issues"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Other issues</span></span></h2> <p><a name="Non-player_characters_or_.22bots.22" id="Non-player_characters_or_.22bots.22"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Non-player characters or "bots"</span></span></h3> <p>Some of the people in a simulated reality may be automatons, philosophical zombies, or 'bots' added to the simulation to make it more realistic, interesting, or challenging. Indeed, it is conceivable that every person other than oneself is a bot. Bostrom called this a "me-simulation", in which oneself is the only sovereign lifeform, or at least the only inhabitant who entered the simulation from outside.</p> <p style="font-weight: bold; color: rgb(153, 0, 0);">Bostrom further elaborated on the idea of bots:</p> <blockquote> <p>In addition to ancestor-simulations, one may also consider the possibility of more selective simulations that include only a small group of humans or a single individual. 
The rest of humanity would then be zombies or "shadow-people" – humans simulated only at a level sufficient for the fully simulated people not to notice anything suspicious. It is not clear how much [computationally] cheaper shadow-people would be to simulate than real people. It is not even obvious that it is possible for an entity to behave indistinguishably from a real human and yet lack conscious experience.</p> </blockquote> <p>The idea of "zombies" has a well-known analogue in the video game industry, where computer-generated characters are known as non-player characters ("NPCs"). The term 'bots' (short for 'robots') originated as the name for the simple AI opponents in video games.</p> <p><a name="Subjective_time" id="Subjective_time"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Subjective time</span></span></h3> <p>A <a href="http://en.wikipedia.org/wiki/Brain-computer_interface" title="Brain-computer interface">brain-computer interface</a> simulated reality may be required to progress at a rate that is near real time; that is, time within it may be required to pass at approximately the same rate as the outer reality which contains it. This might be the case because the players are interacting with the simulation using brains which still reside in the outer reality. Therefore, if the simulation were to run faster or slower, those brains could notice, because they are not contained within it.</p> <p>It is possible that time passes more slowly or more quickly for brains in a dream state (i.e., in a brain-computer interface trance); however, the point is that they still function at a finite, <i>biological</i> speed, and the simulation must track with them. 
The exception would be participants who are <a href="http://en.wikipedia.org/wiki/Augmented" title="Augmented" class="mw-redirect">augmented</a> and therefore capable of processing information at the same rate as the simulation itself.</p> <p>A virtual-people or emigration simulated reality, on the other hand, need not run in real time. This is because its inhabitants are using the simulation's own physics in order to experience, think, and react. If the simulation were slowed down or sped up, so also would be the inhabitants' own senses, brains, and muscles, as well as every other molecule inside. The inhabitants would perceive no change in the passage of time, simply because their method of measuring time depends on the very cosmic clock they are seeking to measure. (They could perform the measurement only if they had some access to data from the outer reality.)</p> <p>For that matter, they could not even detect whether the simulation had been completely halted: a pause in the simulation would pause every life and mind within it. When the simulation was later resumed, the inhabitants would continue exactly as they were before the pause, completely unaware that (for example) their cosmos had been paused and archived for a billion years before being resumed by a completely different director. A simulation could also be created with its inhabitants already possessing memories as though they had lived part of their lives before; such inhabitants would not be able to tell the difference unless informed of it by the simulation. (Compare the five-minute hypothesis and Last Thursdayism.)</p> <p>One practical implication of this is that a virtual-people or a hybrid simulation does not require a computer powerful enough to model its entire cosmos at full speed. 
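This indifference to host speed can be sketched directly: a simulated observer's recorded history depends only on the number of update steps, not on how quickly, or how continuously, the host executes them. All names below are illustrative.

```python
# A trivial "cosmos" whose inhabitant keeps a diary. Host delays and pauses
# model the outer computer running slowly or being halted and resumed; the
# inner record is identical either way.
import time

def run(world_steps, host_delay=0.0, pause_after=None, pause_len=0.0):
    diary = []
    for tick in range(world_steps):
        if tick == pause_after:
            time.sleep(pause_len)        # the cosmos is archived, then resumed
        time.sleep(host_delay)           # a slow host computer
        diary.append(f"tick {tick}: all seems normal")   # the inner experience
    return diary

fast = run(5)
slow = run(5, host_delay=0.01, pause_after=2, pause_len=0.05)
print(fast == slow)  # True: the inner record cannot reveal the host's pacing
```

From inside, there is simply no quantity that distinguishes the two runs.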
Because the outcome of a computation does not depend on how quickly it is produced, a simulation can progress at whatever speed its host computer can manage; it would be constrained by available memory, but not by computation rate.</p> <p><a name="Recursive_simulations" id="Recursive_simulations"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Recursive simulations</span></span></h3> <p>A simulated reality could contain a computer that is running a simulated reality. The 'parent' simulator would be simulating all of the atoms of the computer, atoms which happen to be calculating a 'child' simulation. By way of illustration: imagine that a human is playing a game of <a href="http://en.wikipedia.org/wiki/The_Sims" title="The Sims">The Sims</a> in which one of the player's Sims (simulated people) is itself playing a computer game within the game. Alternatively, imagine a Java Runtime Environment running a virtual computer on a "real-world" computer that is itself located within a simulation.</p> <p>This recursion could continue to infinitely many levels — a simulation containing a computer running a simulation containing a computer running a simulation and so on. 
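Because each child simulation must fit within its parent's memory, an attempted infinite tower bottoms out quickly. A hedged sketch, with purely illustrative numbers:

```python
# How deep can a tower of nested simulations go if every level must hand
# its child strictly less memory than it has itself? The overhead and the
# halving rule below are illustrative assumptions, not physics.

def nest(memory_cells, overhead=16):
    """Levels of child simulation that fit in `memory_cells`, assuming each
    level spends `overhead` cells on its own machinery and devotes half of
    the remainder to hosting its child."""
    depth = 0
    while memory_cells > overhead:
        memory_cells -= overhead     # the simulator's own bookkeeping
        memory_cells //= 2           # the strictly smaller child world
        depth += 1
    return depth

print(nest(10**6))   # a large but finite tower
print(nest(10**9))   # a thousandfold more memory buys only a few more levels
```

Depth grows only logarithmically with memory, which is why "turtles all the way down" needs either infinite resources or ever-coarser approximation.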
The recursion is subject to resource constraints: each 'nested' simulation must be:</p> <dl><dd> <ul><li><b>smaller</b> than its parent reality, because its own memory must be a subset of the parent's;</li></ul> </dd></dl> <p style="font-weight: bold; color: rgb(153, 0, 0);">...and must be at least one of the following:</p> <dl><dd> <ul><li><b>slower</b> than its parent reality, because its own calculations must be a subset of the parent's; <b>or</b></li><li><b>less complex</b> than its parent reality, via simplifications of processes that are computationally intensive in the parent reality; <b>or</b></li><li><b>less complete</b> than its parent reality, via approximations of objects that nobody is observing.</li></ul> </dd></dl> <p>The last of these is the basis of the idea that quantum uncertainties are circumstantial evidence that our own reality is a simulation. However, this assumes that there is a finite limitation somewhere in the chain; given an infinite tower of simulations within simulations, there need not be any noticeable difference between the levels.</p> <p><br /></p> <p><a name="Simulated_reality_in_fiction" id="Simulated_reality_in_fiction"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Simulated reality in fiction</span></span></h2> <p style="font-weight: bold; color: rgb(153, 0, 0);">Simulated reality is a theme that pre-dates science fiction. In medieval and Renaissance religious theatre, the concept of the world as a stage appears frequently. 
Works, early and contemporary, include:</p> <p><a name="Literature" id="Literature"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Literature</span></span></h3> <ul><li><i><a href="http://en.wikipedia.org/wiki/Neuromancer" title="Neuromancer"></a></i>Neuromancer (1984) and Mona Lisa Overdrive (1988) by William Gibson<a href="http://en.wikipedia.org/wiki/William_Gibson" title="William Gibson"></a></li><li><i><a href="http://en.wikipedia.org/wiki/Otherland" title="Otherland"></a></i>Otherland (1998) by Tad Williams<a href="http://en.wikipedia.org/wiki/Tad_Williams" title="Tad Williams"></a></li><li><i><a href="http://en.wikipedia.org/wiki/Permutation_City" title="Permutation City"></a></i>Permutation City (1994) by Greg Egan<a href="http://en.wikipedia.org/wiki/Greg_Egan" title="Greg Egan"></a></li><li><i><a href="http://en.wikipedia.org/wiki/Prime_Intellect" title="Prime Intellect" class="mw-redirect"></a></i>The Metamorphosis of Prime Intellect (1994) by Roger Williams</li><li><i><a href="http://en.wikipedia.org/wiki/The_Reality_Bug" title="The Reality Bug"></a></i>The Reality Bug, a novel by D. J. MacHale, is set on a world destroyed by simulated reality.</li><li><i><a href="http://en.wikipedia.org/wiki/Realtime_Interrupt" title="Realtime Interrupt"></a></i>Realtime Interrupt, a novel by James P. Hogan, is set in near future, a cyber reality with its creator trapped inside.</li><li>The Remnants series by K. A. 
Applegate is set on a ship which creates virtual landscapes.</li><li><i><a href="http://en.wikipedia.org/wiki/Riverworld" title="Riverworld">Riverworld</a></i> (1979) by <a href="http://en.wikipedia.org/wiki/Philip_Jos%C3%A9_Farmer" title="Philip José Farmer">Philip José Farmer</a></li><li><i>The Seventh Sally</i> and <i>The Princess Ineffabelle</i> (from <i>The Cyberiad</i>) by <a href="http://en.wikipedia.org/wiki/Stanislaw_Lem" title="Stanislaw Lem" class="mw-redirect">Stanislaw Lem</a></li><li><i><a href="http://en.wikipedia.org/wiki/Simulacron_3" title="Simulacron 3" class="mw-redirect">Simulacron 3</a></i> (1964) by <a href="http://en.wikipedia.org/wiki/Daniel_F._Galouye" title="Daniel F. Galouye">Daniel F. Galouye</a></li><li><i><a href="http://en.wikipedia.org/wiki/Snow_Crash" title="Snow Crash">Snow Crash</a></i> (1992) by <a href="http://en.wikipedia.org/wiki/Neal_Stephenson" title="Neal Stephenson">Neal Stephenson</a></li><li><i><a href="http://en.wikipedia.org/wiki/Sophie%27s_World" title="Sophie's World">Sophie's World</a></i> (1991) by <a href="http://en.wikipedia.org/wiki/Jostein_Gaarder" title="Jostein Gaarder">Jostein Gaarder</a></li><li><i><a href="http://en.wikipedia.org/w/index.php?title=Words_Made_Flesh&action=edit&redlink=1" class="new" title="Words Made Flesh (page does not exist)">Words Made Flesh</a></i> (1987) by <a href="http://en.wikipedia.org/wiki/Ramsey_Dukes" title="Ramsey Dukes">Ramsey Dukes</a></li><li>"They", a 1941 short story by Robert Heinlein, focuses on a man who believes the universe was created to deceive him.</li></ul> <p><a name="Film.2C_plays_.26_TV_series" id="Film.2C_plays_.26_TV_series"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Film, plays & TV series</span></span></h3> <ul><li><i><a href="http://en.wikipedia.org/wiki/.hack//SIGN" title=".hack//SIGN" class="mw-redirect">.hack//SIGN</a></i>, an anime series about a person whose mind is trapped 
in an online computer role-playing game.</li><li><i><a href="http://en.wikipedia.org/wiki/Avalon_%28Japanese_film%29" title="Avalon (Japanese film)" class="mw-redirect"></a></i>Avalon by Mamoru Oshii<a href="http://en.wikipedia.org/wiki/Mamoru_Oshii" title="Mamoru Oshii"></a></li><li>The Red Dwarf episodes "Better Than Life" and "Back to Reality", by Rob Grant and Doug Naylor.</li><li><i><a href="http://en.wikipedia.org/wiki/The_Big_O" title="The Big O"></a></i>The Big O by Hajime Yatate and Chiaki J. Konaka. N.B. the reality in question has not been confirmed as simulated, but it is extremely likely.</li><li><i><a href="http://en.wikipedia.org/wiki/Brainscan" title="Brainscan"></a></i>Brainscan by John Flynn<a href="http://en.wikipedia.org/wiki/John_Flynn_%28director%29" title="John Flynn (director)"></a></li><li>"<a href="http://en.wikipedia.org/wiki/The_Cage_%28TOS_episode%29" title="The Cage (TOS episode)" class="mw-redirect"></a>The Cage" and "The Menagerie", the unaired pilot and later episodes (respectively) of Star Trek, screenplays by Gene Roddenberry.</li><li><i><a href="http://en.wikipedia.org/wiki/Cube_2:_Hypercube" title="Cube 2: Hypercube"></a></i>Cube 2: Hypercube (2002) written by Sean Hood</li><li><i><a href="http://en.wikipedia.org/wiki/The_Gamekeeper" title="The Gamekeeper" class="mw-redirect"></a></i>The Gamekeeper, an episode of Stargate SG-1.</li><li><i><a href="http://en.wikipedia.org/wiki/Danger_Room" title="Danger Room"></a></i>Danger Room A training simulator from the (X-Men) universe.</li><li><i><a href="http://en.wikipedia.org/wiki/Dark_City_%281998%29" title="Dark City (1998)" class="mw-redirect"></a></i>Dark City by Alex Proyas, in which the sim is halted every night at midnight, rearranged, and then restarted. 
People are given false memories of different lives than they led in the previous 24 hours, reminiscent of last Thursdayism.</li><li>"<a href="http://en.wikipedia.org/wiki/The_Deadly_Assassin" title="The Deadly Assassin"></a>The Deadly Assassin," an episode of Doctor Who written by Robert Holmes.</li><li><i><a href="http://en.wikipedia.org/wiki/Die_Another_Day" title="Die Another Day"></a></i>Die Another Day - James Bond, shows the protagonist wearing VR glasses which very closely reflect reality.</li><li><i><a href="http://en.wikipedia.org/wiki/Eternal_Family" title="Eternal Family"></a></i>Eternal Family, a 1997 surreal comedy anime OVA.</li><li><i><a href="http://en.wikipedia.org/wiki/EXistenZ" title="EXistenZ"></a></i>eXistenZ by David Cronenberg, in which level switches occur so seamlessly and numerously that at the end of the movie it is difficult to tell whether the main characters are back in "reality".</li><li><i><a href="http://en.wikipedia.org/wiki/Ghost_in_the_Shell" title="Ghost in the Shell"></a></i>Ghost in the Shell, a 1995 postcyberpunk anime film and series</li><li><i><a href="http://en.wikipedia.org/wiki/Good_Bye_Lenin%21" title="Good Bye Lenin!"></a></i>Good Bye Lenin!, by Wolfgang Becker, a Berlin family tries to make the feeble mother believe that East Germany did not fall.</li><li><i><a href="http://en.wikipedia.org/wiki/Aeon_Flux" title="Aeon Flux" class="mw-redirect"></a></i>Aeon Flux took place in a cartoon world.</li><li><i><a href="http://en.wikipedia.org/wiki/Harsh_Realm" title="Harsh Realm"></a></i>Harsh Realm the short lived TV series created by Chris Carter which took place in a virtual world.</li><li><i><a href="http://en.wikipedia.org/wiki/The_Island_%282005_film%29" title="The Island (2005 film)"></a></i>The Island, directed by Michael Bay.</li><li><i><a href="http://en.wikipedia.org/wiki/Jacob%27s_Ladder_%28film%29" title="Jacob's Ladder (film)"></a></i>Jacob's Ladder, a 1990 thriller film directed by Adrian Lyne</li><li><i><a 
href="http://en.wikipedia.org/wiki/Lost_Highway" title="Lost Highway">Lost Highway</a></i>, a 1997 movie by <a href="http://en.wikipedia.org/wiki/David_Lynch" title="David Lynch">David Lynch</a></li><li><i><a href="http://en.wikipedia.org/wiki/Lyoko" title="Lyoko">Lyoko</a></i>, the virtual world run by a supercomputer in the French animated series Code Lyoko.</li><li><i>The Matrix</i> series by the <a href="http://en.wikipedia.org/wiki/Wachowski_brothers" title="Wachowski brothers">Wachowski brothers</a></li><li><i><a href="http://en.wikipedia.org/wiki/Megazone_23" title="Megazone 23">Megazone 23</a></i> (1985-1989), an anime OVA series created by Noboru Ishiguro and Shinji Aramaki, about a simulated reality of Tokyo controlled by a supercomputer.</li><li><i><a href="http://en.wikipedia.org/wiki/The_Nines" title="The Nines">The Nines</a></i>, a 2007 film which, unbeknownst to the viewer at first, centers entirely on the subject of simulated reality.</li><li><i><a href="http://en.wikipedia.org/wiki/Noein" title="Noein">Noein</a></i>, a 24-episode anime directed by Kazuki Akane and Kenji Yasuda in which a simulated reality is created.</li><li><i><a href="http://en.wikipedia.org/wiki/Paranoia_Agent" title="Paranoia Agent">Paranoia Agent</a></i> by <a href="http://en.wikipedia.org/wiki/Satoshi_Kon" title="Satoshi Kon">Satoshi Kon</a></li><li><i>Possible Worlds</i>, both the play and the 2000 film adaptation of that play.</li><li><i><a href="http://en.wikipedia.org/wiki/The_Prisoner" title="The Prisoner">The Prisoner</a></i>, the TV series</li><li><i><a href="http://en.wikipedia.org/wiki/Robotech:_The_Movie" title="Robotech: The Movie">Robotech: The Movie</a></i>, a 1986 adaptation of Megazone 23.</li><li>"<a href="http://en.wikipedia.org/wiki/The_Sentence" title="The Sentence" 
class="mw-redirect"></a>The Sentence", an episode of The Outer Limits television series.</li><li><i><a href="http://en.wikipedia.org/wiki/Serial_Experiments_Lain" title="Serial Experiments Lain"></a></i>Serial Experiments Lain, a 13 episode anime series by Chiaki J. Konaka.</li><li>"Ship in a Bottle", episode of Star Trek: The Next Generation, in which the fictional Professor Moriarty of Sir Arthur Conan Doyle's Sherlock Holmes stories is allowed to exist in a simulation of the world.</li><li>In the Star Trek fictional universe, particularly in and since the series Star Trek: The Next Generation, holodecks are simulators aboard starships and other facilities used for training and recreation.</li><li><i><a href="http://en.wikipedia.org/wiki/The_Thirteenth_Floor" title="The Thirteenth Floor"></a></i>The 13th Floor, 1999 film loosely based on the novel Simulacron-3 by Daniel F. Galouye<a href="http://en.wikipedia.org/wiki/Daniel_F._Galouye" title="Daniel F. Galouye"></a></li><li><i><a href="http://en.wikipedia.org/wiki/Total_Recall" title="Total Recall"></a></i>Total Recall, 1990 Paul Verhoeven film, based on a Philip K. 
Dick's story We Can Remember It for You Wholesale.</li><li><i><a href="http://en.wikipedia.org/wiki/Tron_%28film%29" title="Tron (film)"></a></i>Tron (1982) by Walt Disney Pictures<a href="http://en.wikipedia.org/wiki/Walt_Disney_Pictures" title="Walt Disney Pictures"></a></li><li><i><a href="http://en.wikipedia.org/wiki/The_Truman_Show" title="The Truman Show"></a></i>The Truman Show, in which the titular character unknowingly lives his entire life in a false reality created to make a voyeur television show about him</li><li><i><a href="http://en.wikipedia.org/wiki/The_Twilight_Zone_%281959_TV_series%29" title="The Twilight Zone (1959 TV series)"></a></i>The Twilight Zone has featured a number of episodes involving false or simulated realities of some sort.</li><li><i><a href="http://en.wikipedia.org/wiki/Vanilla_Sky" title="Vanilla Sky"></a></i>Vanilla Sky by Cameron Crowe (a remake of Abre los ojos by Alejandro Amenábar).</li><li><i><a href="http://en.wikipedia.org/wiki/La_vida_es_sue%C3%B1o" title="La vida es sueño" class="mw-redirect"></a></i>La vida es sueño (Life is a Dream), a Spanish play by Pedro Calderón de la Barca (1600-1681) that evolved from the legends of the early years of Siddhartha Gautama, the Buddha.</li><li><i><a href="http://en.wikipedia.org/wiki/The_X-Files" title="The X-Files"></a></i>The X-Files has featured a number of episodes involving simulated realities of some sort.</li><li><i><a href="http://en.wikipedia.org/wiki/Welt_am_Draht" title="Welt am Draht"></a></i>Welt am Draht a 1973 German film adaptation of Simulacron-3 from Rainer Werner Fassbinder.</li><li><i><a href="http://en.wikipedia.org/wiki/Zegapain" title="Zegapain"></a></i>Zegapain, a 2006 anime series.</li></ul> <p><a name="Interactive_fiction" id="Interactive_fiction"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Interactive fiction</span></span></h3> <ul><li><i><a 
href="http://en.wikipedia.org/wiki/A_Mind_Forever_Voyaging" title="A Mind Forever Voyaging">A Mind Forever Voyaging</a></i> by <a href="http://en.wikipedia.org/wiki/Steve_Meretzky" title="Steve Meretzky">Steve Meretzky</a></li></ul> <p><a name="Video_games" id="Video_games"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Video games</span></span></h3> <ul><li><i><a href="http://en.wikipedia.org/wiki/.hack" title=".hack">.hack</a></i> series.</li><li><i><a href="http://en.wikipedia.org/wiki/Active_Worlds" title="Active Worlds">Active Worlds</a></i></li><li>Assassin's Creed</li><li>Chrono Trigger</li><li>Creatures</li><li>Darwinia</li><li>Deus Ex</li><li>Digital Devil Saga</li><li>Eternal Sonata</li><li>Fallout 3</li><li>Final Fantasy X</li><li>Harvester</li><li>Metal Gear Solid 2: Sons of Liberty</li><li>Persona</li><li>Planescape: Torment</li><li>Second Life</li><li>Shadowrun</li><li>Shin Megami Tensei</li><li>Spore</li><li>Star Ocean: Till the End of Time</li><li>The Sims</li><li>The World Ends With You</li><li>There.com</li><li><i><a href="http://en.wikipedia.org/wiki/Ultima_%28series%29" title="Ultima (series)">Ultima</a></i> (series), especially from <i>Ultima V</i> onward, which simulated people's daily activities using schedules, a novel feature at the time.</li><li><i><a href="http://en.wikipedia.org/wiki/Xenosaga_%28series%29" title="Xenosaga (series)" class="mw-redirect">Xenosaga</a></i> (series)</li></ul>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com2tag:blogger.com,1999:blog-226627663258700835.post-52395757518792330872009-03-20T05:56:00.000-07:002009-03-20T06:03:23.657-07:00Technological singularity<!-- start content --> <div class="thumb tright"> <div 
class="thumbinner" style="width: 402px;"><a href="http://en.wikipedia.org/wiki/File:ParadigmShiftsFrr15Events.svg" class="image" title="According to Ray Kurzweil, the logarithmic graph of 15 separate lists of paradigm shifts for key events in human history show an exponential trend. Lists prepared by, among others, Carl Sagan, Paul D. Boyer, Encyclopædia Britannica, American Museum of Natural History and University of Arizona; compiled by Kurzweil."><img alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/4/45/ParadigmShiftsFrr15Events.svg/400px-ParadigmShiftsFrr15Events.svg.png" class="thumbimage" width="400" border="0" height="311" /></a> <div class="thumbcaption"> According to Ray Kurzweil, the logarithmic graph of 15 separate lists of paradigm shifts for key events in human history show an exponential trend. Lists prepared by, among others, Carl Sagan, Paul D. Boyer, Encyclopædia Britannica, American Museum of Natural History and University of Arizona; compiled by Kurzweil.</div> </div> </div> <p>The <b>technological singularity</b> is a theoretical future point of unprecedented technological progress—typically associated with advancements in computer hardware or the ability of machines to improve themselves using artificial intelligence.</p> <p><a href="http://en.wikipedia.org/wiki/Statistics" title="Statistics"></a>Statistician I. J. Good first wrote of an "intelligence explosion", suggesting that if machines could even slightly surpass human intellect, they could improve their own designs in ways unforeseen by their designers, and thus recursively augment themselves into far greater intelligences. 
The first such improvements might be small, but as the machine became more intelligent it would become better at becoming more intelligent, which could lead to an exponential and quite sudden growth in intelligence.</p> <p><a href="http://en.wikipedia.org/wiki/Vernor_Vinge" title="Vernor Vinge"></a>Vernor Vinge later called this event "the Singularity" as an analogy between the breakdown of modern physics near a gravitational singularity and the drastic change in society he argues would occur following an intelligence explosion. In the 1980s, Vinge popularized the singularity in lectures, essays, and science fiction. More recently, some prominent technologists such as Bill Joy, founder of Sun Microsystems, voiced concern over the potential dangers of Vinge's singularity (Joy 2000). Following its introduction in Vinge's stories, particularly Marooned in Realtime and A Fire Upon the Deep, the singularity has also become a common plot element in science fiction.</p> <p>Others, most prominently Ray Kurzweil, define the singularity as a period of extremely rapid technological progress. Kurzweil argues such an event is implied by a long-term pattern of accelerating change that generalizes Moore's Law to technologies predating the integrated circuit and which he argues will continue to other technologies not yet invented. Critics of Kurzweil's interpretation consider it an example of static analysis, citing particular failures of the predictions of Moore's Law.</p> <p><a href="http://en.wikipedia.org/wiki/Robin_Hanson" title="Robin Hanson"></a>Robin Hanson proposes that multiple "singularities" have occurred throughout history, dramatically affecting the growth rate of the economy. Like the agricultural and industrial revolutions of the past, the technological singularity would increase economic growth between 60 and 250 times. 
An innovation that allowed for replacement of virtually all human labor could trigger this singularity.</p> <p>Critics allege that the singularity concept does not take into account increased energy resource usage by the new technologies, or the current physical (atomic) limits in electronic components miniaturization. However, by its nature, the theory implies the creation of currently unknown technologies and relies on the concept of improvements in one field affecting another — an event paralleled in the industrial revolution.</p><p><br /></p><p><br /></p><script type="text/javascript">//<![CDATA[ if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } //]]> </script> <p><a name="Intelligence_explosion" id="Intelligence_explosion"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Intelligence explosion</span></span></h2> <p style="font-weight: bold; color: rgb(153, 0, 0);"><a href="http://en.wikipedia.org/wiki/Technological_Singularity#CITEREFGood1965" title=""></a><span style="color: rgb(153, 0, 0);">Good (1965) specul</span>ated on the consequences of machines smarter than humans:</p> <blockquote> <p>Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.</p> </blockquote> <p>Mathematician and author Vernor Vinge greatly popularized Good’s notion of an intelligence explosion in the 1980s, calling the creation of the first ultraintelligent machine the Singularity. 
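Good's hypothetical feedback loop can be caricatured numerically: if a machine's rate of improvement grows with its current intelligence, growth outruns any fixed-rate improver. The update rule below is an illustrative toy, not a model anyone has actually proposed.

```python
# A cartoon "intelligence explosion": the improvement per step is itself
# proportional to current intelligence, so better designers design faster.

def explode(steps, gain=0.1):
    i = 1.0
    history = [i]
    for _ in range(steps):
        i += gain * i * i            # the self-reinforcing improvement step
        history.append(i)
    return history

fixed = [1.1 ** k for k in range(11)]   # ordinary exponential growth for comparison
feedback = explode(10)
print(feedback[-1] > fixed[-1])         # True: the feedback trajectory pulls ahead
```

The qualitative point survives any choice of parameters: once improvement feeds on itself, growth is faster than exponential.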
Vinge first addressed the topic in print in the January 1983 issue of Omni magazine. Vinge (1993) contains the oft-quoted statement, "Within thirty years, we will have the technological means to create superhuman intelligence. Shortly thereafter, the human era will be ended." Vinge refines his estimate of the time scales involved, adding, "I'll be surprised if this event occurs before 2005 or after 2030."</p> <p>Vinge continues by predicting that superhuman intelligences, however created, will be able to enhance their own minds faster than the humans that created them. "When greater-than-human intelligence drives progress," Vinge writes, "that progress will be much more rapid." This feedback loop of self-improving intelligence, he predicts, will cause large amounts of technological progress within a short period of time.</p> <p>Most proposed methods for creating smarter-than-human or transhuman minds fall into one of two categories: intelligence amplification of human brains and artificial intelligence. The means speculated to produce intelligence augmentation are numerous, and include bio- and genetic engineering, nootropic drugs, AI assistants, direct brain-computer interfaces, and mind transfer.</p> <p>Despite the numerous speculated means for amplifying human intelligence, non-human artificial intelligence (specifically seed AI) is the most popular option for organizations trying to advance the singularity, a choice addressed by Singularity Institute for Artificial Intelligence (2002). Hanson (1998) is also skeptical of human intelligence augmentation, writing that once one has exhausted the "low-hanging fruit" of easy methods for increasing human intelligence, further improvements will become increasingly difficult to find.</p> <p>It is difficult to directly compare silicon-based hardware with neurons. 
But Berglas (2008) notes that computer speech recognition is approaching human capabilities, and that this capability seems to require 0.01% of the volume of the brain. This analogy suggests that modern computer hardware is within a few orders of magnitude of the power of the human brain.</p> <p>One other factor potentially hastening the singularity is the ongoing expansion of the community working on it, resulting from the increase in scientific research within developing countries.</p> <p><a name="Economic_aspects" id="Economic_aspects"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Economic aspects</span></span></h3> <p>Dramatic changes in the rate of economic growth have occurred in the past because of technological advancement. Based on population growth, the economy doubled every 250,000 years from the Paleolithic era until the Neolithic Revolution. The new agricultural economy doubled every 900 years, a remarkable increase. In the current era, beginning with the Industrial Revolution, the world’s economic output doubles every fifteen years, sixty times faster than in the agricultural era. If the rise of superhuman intelligences causes a similar revolution, one would expect the economy to double at least quarterly and possibly on a weekly basis.</p> <p>Machines capable of performing most mental and physical tasks as well as humans would cause a rise in wages for the jobs at which humans can still outperform machines. However, a sudden proliferation of humanlike machines would likely cause a net drop in wages, as humans compete with robots for jobs. Also, the wealth of the technological singularity may be concentrated in the hands of only a few. 
These wealthy few would be those who own the means of mass producing the intelligent robot workforce.</p> <p><a name="Potential_dangers" id="Potential_dangers"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Potential dangers</span></span></h3> <p>Superhuman intelligences may have goals inconsistent with human survival and prosperity. AI researcher Hugo de Garis suggests AIs may simply eliminate the human race, and humans would be powerless to stop them.</p> <p><a href="http://en.wikipedia.org/wiki/Technological_Singularity#CITEREFBerglas2008" title=""></a>Berglas (2008) argues that unlike man, a computer based intelligence is not tied to any particular body, which would give it a radically different world view. In particular, a software intelligence would essentially be immortal and so have no need to produce independent children that live on after it dies. It would thus have no evolutionary need for love.</p> <p>Other oft-cited dangers include those commonly associated with molecular nanotechnology and genetic engineering. These threats are major issues for both singularity advocates and critics, and were the subject of Bill Joy's Wired magazine article "Why the future doesn't need us" (Joy 2000).</p> <p><a href="http://en.wikipedia.org/wiki/Technological_Singularity#CITEREFBostrom2002" title=""></a><span style="font-weight: bold; color: rgb(153, 0, 0);">Bostrom (2002) discusses human extinction scenarios, and lists superintelligence as a possible cause:</span></p> <blockquote> <p>When we create the first superintelligent entity, we might make a mistake and give it goals that lead it to annihilate humankind, assuming its enormous intellectual advantage gives it the power to do so. For example, we could mistakenly elevate a subgoal to the status of a supergoal. 
We tell it to solve a mathematical problem, and it complies by turning all the matter in the solar system into a giant calculating device, in the process killing the person who asked the question. <br /></p></blockquote>Moravec (1992) argues that although superintelligence in the form of machines may make humans in some sense obsolete as the top intelligence, there will still be room in the ecology for humans. <p><a href="http://en.wikipedia.org/wiki/Eliezer_Yudkowsky" title="Eliezer Yudkowsky"></a>Eliezer Yudkowsky proposed that research be undertaken to produce friendly artificial intelligence in order to address the dangers. He noted that if the first real AI was friendly it would have a head start on self-improvement and thus might prevent other unfriendly AIs from developing. The Singularity Institute for Artificial Intelligence is dedicated to this cause. Bill Hibbard also addresses issues of AI safety and morality in his book Super-Intelligent Machines. However, Berglas (2008) notes that there is no direct evolutionary motivation for an AI to be friendly to man.</p> <p><a href="http://en.wikipedia.org/wiki/Isaac_Asimov" title="Isaac Asimov"></a><span><span>Isaac Asimov’s Three Laws of Robotics are one of the earliest examples of proposed safety measures for AI. The laws are intended to prevent artificially intelligent robots from harming humans. In Asimov’s stories, any perceived problems with the laws tend to arise as a result of a misunderstanding on the part of some human operator; the robots themselves are merely acting on their best interpretation of their rules. In the 2004 film I, Robot, a possibility is explored in which an AI takes complete control over humanity for the purpose of protecting humanity from itself. (The movie was based loosely on Asimov's stories; the aspect of machines taking over bears closer resemblance to Capek's R.U.R., the play that introduced the term robot.) 
In 2004, the Singularity Institute launched an Internet campaign called 3 Laws Unsafe to raise awareness of AI safety issues and the inadequacy of Asimov’s laws in particular (Singularity Institute for Artificial Intelligence 2004).</span></span></p> <p>Man<span><span>y Singularitarians consider nanotechnology to be one of the greatest dangers facing humanity. For this reason, they often believe seed AI (an AI capable of making itself smarter) should precede nanotechnology. Others, such as the Foresight Institute, advocate efforts to create molecular nanotechnology, claiming nanotechnology can be made safe for pre-singularity use or can expedite the arrival of a beneficial singularity.</span></span></p><p><br /></p> <p><a name="Accelerating_change" id="Accelerating_change"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Accelerating change</span></span></h2> <div class="thumb tright"> <div class="thumbinner" style="width: 227px;"><a href="http://en.wikipedia.org/wiki/File:PPTMooresLawai.jpg" class="image" title="Kurzweil writes that, due to paradigm shifts, a trend of exponential growth extends from integrated circuits to earlier transistors, vacuum tubes, relays and electromechanical computers."><img alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/c/c5/PPTMooresLawai.jpg/225px-PPTMooresLawai.jpg" class="thumbimage" width="225" border="0" height="226" /></a> <div class="thumbcaption"> Kurzweil writes that, due to <a href="http://en.wikipedia.org/wiki/Paradigm_shift" title="Paradigm shift"></a>paradigm shifts, a trend of exponential growth extends from integrated circuits to earlier transistors, vacuum tubes, relays and electromechanical computers.<br /><br /></div> </div> </div> <div class="thumb tright"> <div class="thumbinner" style="width: 402px;"><a href="http://en.wikipedia.org/wiki/File:Kscaleprojections.png" class="image" title="Various Kardashev scale projections 
through 2100. One results in a singularity."><img alt="" src="http://upload.wikimedia.org/wikipedia/en/thumb/5/5f/Kscaleprojections.png/400px-Kscaleprojections.png" class="thumbimage" width="400" border="0" height="232" /></a> <div class="thumbcaption"> Various Kardashev scale projections through 2100. One results in a singularity.</div> </div> </div> <p>Some singularity proponents argue its inevitability through extrapolation of past trends, especially those pertaining to shortening gaps between improvements to technology. In one of the first uses of the term "singularity" in the context of technological progress, Ulam (1958) tells of a conversation with John von Neumann about accelerating change:</p> <blockquote> <p>One conversation centered on the ever accelerating progress of technology and changes in the mode of human life, which gives the appearance of approaching some essential singularity in the history of the race beyond which human affairs, as we know them, could not continue.</p> </blockquote> <p><a href="http://en.wikipedia.org/wiki/Technological_Singularity#CITEREFHawkins1983" title=""></a>Hawkins (1983) writes that "mindsteps", dramatic and irreversible changes to paradigms or world views, are accelerating in frequency as quantified in his mindstep equation. He cites the inventions of writing, mathematics, and the computer as examples of such changes.</p> <p><a href="http://en.wikipedia.org/wiki/Ray_Kurzweil" title="Ray Kurzweil" class="mw-redirect"></a>Ray Kurzweil's analysis of history concludes that technological progress follows a pattern of exponential growth, following what he calls The Law of Accelerating Returns. He generalizes Moore's Law, which describes geometric growth in integrated semiconductor complexity, to include technologies from far before the integrated circuit.</p> <p>Whenever technology approaches a barrier, Kurzweil writes, new technologies will cross it. 
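</p> <p>Kurzweil's picture can be sketched as a succession of saturating S-curves whose upper envelope is exponential: each paradigm follows a logistic curve, and when it flattens against its barrier a successor takes over at a higher ceiling. The paradigm parameters below are hypothetical; only the shape of the envelope matters.</p>

```python
import math

def logistic(t, ceiling, midpoint, rate=1.0):
    """One technology paradigm: an S-curve that saturates at `ceiling`."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))

def capability(t, paradigms):
    """Overall capability: the best paradigm available at time t."""
    return max(logistic(t, c, m) for c, m in paradigms)

# Hypothetical succession (think relays -> tubes -> transistors -> ICs):
# each paradigm arrives a decade later and saturates ten times higher.
paradigms = [(10, 5), (100, 15), (1000, 25), (10000, 35)]

for t in (10, 20, 30, 40):
    print(t, round(capability(t, paradigms), 1))
```

<p>Each individual curve stalls at its ceiling, yet the running maximum climbs roughly tenfold per decade, which is the exponential envelope Kurzweil describes.</p> <p>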
He predicts paradigm shifts will become increasingly common, leading to "technological change so rapid and profound it represents a rupture in the fabric of human history" (Kurzweil 2001). Kurzweil believes that the singularity will occur before the end of the 21st century, setting the date at 2045 (Kurzweil 2005). His predictions differ from Vinge’s in that he predicts a gradual ascent to the singularity, rather than Vinge’s rapidly self-improving superhuman intelligence.</p> <p>On this view, an artificial intelligence capable of improving on its own design would itself face a singularity. This idea is explored by Dan Simmons in his novel Hyperion, where a collection of artificial intelligences debate whether or not to make themselves obsolete by creating a new generation of "ultimate" intelligence.</p> <p>The Acceleration Studies Foundation, an educational non-profit foundation founded by John Smart, engages in outreach, education, research and advocacy concerning accelerating change (Acceleration Studies Foundation 2007). It produces the Accelerating Change conference at Stanford University, and maintains the educational site Acceleration Watch.</p> <p>Presumably, a technological singularity would lead to the rapid development of a Kardashev Type I civilization, where a Type I civilization has achieved mastery of the resources of its home planet, Type II of its planetary system, and Type III of its galaxy. 
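</p> <p>The Kardashev types can be placed on a continuous scale using Sagan's interpolation, K = (log<sub>10</sub> P − 6) / 10 for a civilization consuming P watts, which puts Type I at 10<sup>16</sup> W, Type II at 10<sup>26</sup> W, and Type III at 10<sup>36</sup> W:</p>

```python
import math

def kardashev(power_watts):
    """Sagan's continuous interpolation of the Kardashev scale:
    K = (log10(P) - 6) / 10, so Type I sits at 1e16 W."""
    return (math.log10(power_watts) - 6.0) / 10.0

print(kardashev(1e16))  # Type I
print(kardashev(2e13))  # a commonly quoted figure for humanity's current power use
```

<p>A commonly quoted figure of roughly 2 × 10<sup>13</sup> W for humanity's current power consumption gives K ≈ 0.73, consistent with the 0.7 value cited below.</p> <p>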
Given that, depending on the calculations used, humanity will reach 0.7 on the Kardashev scale by 2040 or sooner, a technological singularity between now and then would push us rapidly over that limit.</p><p><br /></p> <p><a name="Criticism" id="Criticism"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Criticism</span></span></h2> <p>Some critics assert that no computer or machine will ever achieve human intelligence, while others do not rule out the possibility. Theodore Modis and Jonathan Huebner argue that the rate of technological innovation has not only ceased to rise, but is actually now declining; John Smart has criticized Huebner's analysis. Some evidence for this decline is that the rise in computer clock speeds is slowing, even while Moore's prediction of exponentially increasing circuit density continues to hold. Although clock speed was once advertised as the main source of processor performance, that is no longer true: today's processors put their circuitry to different, more efficient uses than pushing raw clock speed. For instance, a Core i7 at 2 GHz is far more powerful than a Pentium 4 at 4 GHz.</p> <p>Others propose that other "singularities" can be found through analysis of trends in world population, world GDP, and other indices. Andrey Korotayev and others argue that historical hyperbolic growth curves can be attributed to feedback loops that ceased to affect global trends in the 1970s, and thus hyperbolic growth should not be expected in the future.</p> <p>In <i>The Progress of Computing</i>, William Nordhaus argued that, prior to 1940, computers followed the much slower growth of a traditional industrial economy, thus rejecting extrapolations of Moore's Law to 19th-century computers. 
Schmidhuber (2006) suggests that differences in memory of recent and distant events create an illusion of accelerating change, and that such phenomena may be responsible for past apocalyptic predictions.</p> <p>A recent study of patents per thousand persons shows that human creativity does not show accelerating returns, but in fact—as suggested by <a href="http://en.wikipedia.org/wiki/Joseph_Tainter" title="Joseph Tainter"></a>Joseph Tainter in his seminal The Collapse of Complex Societies—a law of diminishing returns. The number of patents per thousand persons peaked in the period 1850–1900 and has been declining since. The growth of complexity eventually becomes self-limiting and leads to a widespread "general systems collapse". Thomas Homer-Dixon, in The Upside of Down: Catastrophe, Creativity and the Renewal of Civilization, argues that declining energy returns on investment have led to the collapse of civilizations. Jared Diamond, in Collapse: How Societies Choose to Fail or Succeed, likewise shows that cultures self-limit when they exceed the sustainable carrying capacity of their environment, and that the consumption of strategic resources (frequently timber, soils or water) creates a deleterious positive feedback loop that leads eventually to social collapse and technological retrogression.</p><p><br /></p> <p><a name="Popular_culture" id="Popular_culture"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Popular culture</span></span></h2><p>While discussing the singularity's growing recognition, Vinge (1993) writes that "it was the science-fiction writers who felt the first concrete impact." In addition to his own short story "Bookworm, Run!", whose protagonist is a chimpanzee with intelligence augmented by a government experiment, he cites Greg Bear's novel Blood Music (1983) as an example of the singularity in fiction. 
In William Gibson's 1984 novel Neuromancer, AIs capable of improving their own programs are strictly regulated by special "Turing police" to ensure they never exceed a certain level of intelligence, and the plot centers on the efforts of one such AI to circumvent their control. The 1994 novel The Metamorphosis of Prime Intellect features an AI that augments itself so quickly as to gain low-level control of all matter in the Universe in a matter of hours. A more malevolent AI achieves similar levels of omnipotence in Harlan Ellison's short story I Have No Mouth, and I Must Scream (1967). William Thomas Quick's novels Dreams of Flesh and Sand (1988), Dreams of Gods and Men (1989), and Singularities (1990) present an account of the transition through the singularity; in the last of these, one of the characters states that mankind must integrate with the emerging machine intelligences or be crushed under the machines' dominance – the greatest risk to the survival of any species reaching this point. (The novels allude to large numbers of other species that either survived or failed this test, although no actual contact with alien species occurs.)</p> <p>The singularity is sometimes addressed in fictional works to explain the event's absence. Neal Asher's Gridlinked series features a future where humans living in the Polity are governed by AIs; while some are resentful, most believe the AIs are far better governors than any human. In the fourth novel, Polity Agent, it is mentioned that the singularity is far overdue, yet most AIs have decided not to partake in it for reasons that only they know. 
A flashback character in Ken MacLeod's 1998 novel The Cassini Division dismissively refers to the singularity as "the Rapture for nerds", though the singularity goes on to happen anyway.</p> <p>Popular movies in which computers become intelligent and overpower the human race include <i><a href="http://en.wikipedia.org/wiki/Colossus:_The_Forbin_Project" title="Colossus: The Forbin Project"></a></i>Colossus: The Forbin Project, the Terminator series, I, Robot, and The Matrix series. The television series Battlestar Galactica also explores these themes.</p> <p><a href="http://en.wikipedia.org/wiki/Isaac_Asimov" title="Isaac Asimov"></a>Isaac Asimov expressed ideas similar to a post-Kurzweilian singularity in his short story The Last Question. Asimov's future envisions a reality where a combination of strong artificial intelligence and post-humans consume the cosmos, during a time Kurzweil describes as when "the universe wakes up", the last of his six stages of cosmic evolution as described in The Singularity is Near. Post-human entities throughout various time periods of the story inquire of the artificial intelligence within the story as to how entropy death will be avoided. The AI responds that it lacks sufficient information to come to a conclusion, until the end of the story when the AI does indeed arrive at a solution, and demonstrates it by re-creating the universe, in godlike speech and fashion, from scratch. Notably, it does so in order to fulfill its duty to answer the humans' question.</p> <p><a href="http://en.wikipedia.org/wiki/St._Edward%27s_University" title="St. Edward's University"></a>St. Edward's University chemist Eamonn Healy discusses accelerating change in the film Waking Life. He divides history into increasingly shorter periods, estimating "two billion years for life, six million years for the hominid, a hundred-thousand years for mankind as we know it". 
He proceeds to human cultural evolution, giving time scales of ten thousand years for agriculture, four hundred years for the scientific revolution, and one hundred fifty years for the industrial revolution. Information is emphasized as providing the basis for the new evolutionary paradigm, with artificial intelligence its culmination. He concludes we will eventually create "neohumans" which will usurp humanity’s present role in scientific and technological progress and allow the exponential trend of accelerating change to continue past the limits of human ability.</p> <p>Accelerating progress features in some science fiction works, and is a central theme in Charles Stross's Accelerando. Other notable authors that address singularity-related issues include Karl Schroeder, Greg Egan, Ken MacLeod, David Brin, Iain M. Banks, Neal Stephenson, Tony Ballantyne, Bruce Sterling, Dan Simmons, Damien Broderick, Fredric Brown, Jacek Dukaj, Nagaru Tanigawa and Cory Doctorow. Another relevant work is Warren Ellis’ ongoing comic book series newuniversal.</p> <p>In the episode "The Turk" of Terminator: The Sarah Connor Chronicles, John Connor mentions the singularity. The Terminator franchise is predicated on the concept of a human-designed computer system becoming self-aware and deciding to destroy humankind. It eventually achieves superintelligence.</p> <p>In the film Screamers—based on Philip K. Dick's short story Second Variety—mankind's own weapons begin to design and assemble themselves. Self replicating machines (here, the screamers) are often considered to be a significant prerequisite "final phase"—almost like a catalyst to the accelerating progress leading to a singularity. Interestingly, screamers develop to a level where they will kill each other and one even professes her love for the human. 
This idea is common in Dick's stories, which explore beyond the simplistic "man vs machine" scenario in which our creations consider us a threat.</p> <p>The feature-length documentary film<span style="font-style: italic;"> Transcendent Man </span>is based on Ray Kurzweil and his book <i>The Singularity Is Near</i>. The film documents Kurzweil's quest to reveal what he believes to be mankind's destiny.</p> <p>On his album <i>People of Earth</i>, Dr. Steel has a song titled "The Singularity."</p><p><br /></p><p><br /></p>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com2tag:blogger.com,1999:blog-226627663258700835.post-55746980329983740352009-03-15T09:11:00.000-07:002009-03-15T09:20:36.725-07:00Applications of artificial intelligence<!-- start content --><p><a href="http://en.wikipedia.org/wiki/Artificial_intelligence" title="Artificial intelligence"></a>Artificial intelligence has been used in a wide range of fields including medical diagnosis, stock trading, robot control, law, scientific discovery and toys. 
However, many AI applications are not perceived as AI: "A lot of cutting edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore."<sup id="cite_ref-0" class="reference"></sup> "Many thousands of AI applications are deeply embedded in the infrastructure of every industry."<sup id="cite_ref-Kurzweil2005p264_1-0" class="reference"></sup> In the late 90s and early 21st century, AI technology became widely used as elements of larger systems,<sup id="cite_ref-2" class="reference"></sup><sup id="cite_ref-Kurzweil2005p264_1-1" class="reference"></sup> but the field is rarely credited for these successes.</p><p><br /></p><script type="text/javascript">//<![CDATA[ if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } //]]> </script> <p><a name="Computer_science" id="Computer_science"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" > <span class="mw-headline">Computer science</span></span></h2> <p>AI researchers have created many tools to solve the most difficult problems in computer science. Many of their inventions have been adopted by mainstream computer science and are no longer considered a part of AI. (See AI effect). According to Russell & Norvig (2003, p. 
15), all of the following were originally developed in AI laboratories:</p> <ul><li>Time sharing</li><li>Interactive interpreters</li><li>Graphical user interfaces and the computer mouse</li><li>Rapid development environments</li><li>The linked list data type</li><li>Automatic storage management</li><li>Symbolic programming</li><li>Dynamic programming</li><li>Object-oriented programming</li></ul><br /><p><a name="Finance" id="Finance"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Finance</span></span></h2> <p>Banks use artificial intelligence systems to organize operations, invest in stocks, and manage properties. 
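</p> <p>The kind of out-of-norm charge detection described in this section can be caricatured with a simple robust outlier test. Real deployments use trained neural networks, but the underlying idea of scoring each charge against an account's typical behavior looks roughly like this (illustrative code with made-up data, not any institution's actual system):</p>

```python
import statistics

def flag_outliers(amounts, threshold=3.5):
    """Flag charges far from the account's typical amount, using a robust
    z-score based on the median and MAD, for human investigation."""
    med = statistics.median(amounts)
    mad = statistics.median(abs(a - med) for a in amounts)
    if mad == 0:
        return []
    return [a for a in amounts if 0.6745 * abs(a - med) / mad > threshold]

# A run of routine charges plus one anomaly:
print(flag_outliers([19, 20, 21, 22, 23, 24, 25, 5000]))  # [5000]
```

<p>The median/MAD score is used here instead of a plain mean/standard-deviation z-score because a single large anomaly inflates the standard deviation enough to hide itself.</p> <p>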
In August 2001, robots beat humans in a simulated financial trading competition.</p> <p><a href="http://en.wikipedia.org/wiki/Financial_institution" title="Financial institution"></a>Financial institutions have long used artificial neural network systems to detect charges or claims outside of the norm, flagging these for human investigation.</p><p><br /></p> <p><a name="Medicine" id="Medicine"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Medicine</span></span></h2> <p>A medical clinic can use artificial intelligence systems to organize bed schedules, plan staff rotations, and provide medical information.</p> <p>They may also be used for medical diagnosis: artificial neural networks (such as the Concept Processing technology in EMR software) can function as a machine differential diagnosis.</p><p><br /></p> <p><a name="Heavy_industry" id="Heavy_industry"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Heavy industry</span></span></h2> <p><a href="http://en.wikipedia.org/wiki/Robot" title="Robot"></a>Robots have become common in many industries. They are often given jobs that are considered dangerous to humans. Robots have proven effective in jobs that are very repetitive, where a lapse in concentration may lead to mistakes or accidents, and in jobs which humans may find degrading. General Motors uses around 16,000 robots for tasks such as painting, welding, and assembly. Japan is the leader in using and producing robots in the world. 
In 1995, 700,000 robots were in use worldwide, more than 500,000 of them from Japan.</p> <p></p><p><br /></p> <p><a name="Transportation" id="Transportation"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" > <span class="mw-headline">Transportation</span></span></h2> <p><a href="http://en.wikipedia.org/wiki/Fuzzy_logic" title="Fuzzy logic"></a>Fuzzy logic controllers have been developed for automatic gearboxes in automobiles: the 2006 Audi TT, VW Touareg and VW Caravelle feature the DSP transmission, which utilizes fuzzy logic, and a number of Škoda variants (such as the Škoda Fabia) also include a fuzzy-logic-based controller.</p><p><br /></p> <p><a name="Telecommunications" id="Telecommunications"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Telecommunications</span></span></h2> <p>Many telecommunications companies make use of heuristic search in the management of their workforces; for example, BT Group has deployed heuristic search in a scheduling application that produces the work schedules of 20,000 engineers.</p><p><br /></p> <p><a name="Toys_and_games" id="Toys_and_games"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Toys and games</span></span></h2> <p>The 1990s saw some of the first attempts to mass-produce basic artificial intelligence aimed at the home, for education or leisure. This effort prospered greatly with the Digital Revolution, and helped introduce people, especially children, to a life of dealing with various types of AI, specifically in the form of Tamagotchis and Giga Pets, the Internet (basic search engine interfaces are one simple form), and the first widely released robot, Furby. A year later an improved type of domestic robot was released in the form of Aibo, a robotic dog with intelligent features and autonomy. 
AI has also been applied to video games.</p><p><br /></p> <p><a name="Music" id="Music"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" > <span class="mw-headline">Music</span></span></h2> <p>The evolution of music has always been affected by technology. With AI, scientists are trying to make the computer emulate the activities of the skillful musician. Composition, performance, music theory, and sound processing are some of the major areas on which research in music and artificial intelligence focuses.</p><p><br /></p> <p><a name="Aviation" id="Aviation"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Aviation</span></span></h2> <p>The Air Operations Division (AOD) uses rule-based expert systems. The AOD uses artificial intelligence for surrogate operators in combat and training simulators, mission management aids, support systems for tactical decision making, and post-processing of simulator data into symbolic summaries.</p> <p>The use of artificial intelligence in simulators is proving to be very useful for the AOD. Airplane simulators use artificial intelligence to process the data taken from simulated flights. Beyond simulated flying, there is also simulated aircraft warfare. The computers are able to come up with the best success scenarios in these situations, and can create strategies based on the placement, size, speed, and strength of the forces and counterforces. Pilots may also be given assistance in the air during combat by computers. The AI programs can sort the information and provide the pilot with the best possible maneuvers, as well as ruling out maneuvers that would be impossible for a human to perform. 
Multiple aircraft are needed to get good approximations for some calculations, so computer-simulated pilots are used to gather data. These computer-simulated pilots are also used to train future air traffic controllers.</p> <p>The system used by the AOD to measure performance was the Interactive Fault Diagnosis and Isolation System, or IFDIS. It is a rule-based expert system put together by collecting information from TF-30 documents and the expert advice of mechanics who work on the TF-30. The system was designed to be used for the development of the TF-30 for the RAAF F-111C. The performance system was also used to replace specialized workers: it allowed the regular workers to communicate with the system and avoid mistakes, miscalculations, or having to speak to one of the specialized workers.</p> <p>The AOD also uses artificial intelligence in speech recognition software. The air traffic controllers give directions to the artificial pilots, and the AOD wants the pilots to respond to the ATCs with simple responses. The programs that incorporate the speech software must be trained, which means they use neural networks. The program used, the Verbex 7000, is still a very early program with plenty of room for improvement. The improvements are imperative because ATCs use very specific dialog and the software needs to be able to communicate correctly and promptly every time.</p> <p>The Artificial Intelligence supported Design of Aircraft, or AIDA, is used to help designers in the process of creating conceptual designs of aircraft. This program allows the designers to focus more on the design itself and less on the design process. The software also allows the user to focus less on the software tools. AIDA uses rule-based systems to compute its data. This is a diagram of the arrangement of the AIDA modules. 
Although simple, the program is proving effective.</p> <p>In 2003, NASA’s Dryden Flight Research Center, together with many other companies, created software that could enable a damaged aircraft to continue flight until a safe landing zone could be reached. The Intelligent Flight Control System was tested on an F-15 that had been heavily modified by NASA. The software compensates for damaged components by relying on the undamaged ones. The neural network used in the software proved effective and marked a triumph for artificial intelligence.</p> <p>The Integrated Vehicle Health Management system, also used by NASA, must process and interpret data taken from the various sensors on board an aircraft. The system needs to be able to determine the structural integrity of the aircraft and to implement protocols in case of any damage taken by the vehicle.</p><p><br /></p> <p><a name="Other" id="Other"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Other</span></span></h2> <p>Neural networks are also being widely deployed in homeland security, speech and text recognition, data mining, and e-mail spam filtering.</p><p><br /></p> <p><a name="List_of_applications" id="List_of_applications"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" > <span class="mw-headline">List of applications</span></span></h2> <dl><dt>Typical problems to which AI methods are applied</dt></dl> <div> <table style="background-color: transparent; table-layout: fixed;" width="100%" border="0" cellpadding="0" cellspacing="0"> <tbody><tr valign="top"> <td> <div style="margin-right: 20px;"> <ul><li><a href="http://en.wikipedia.org/wiki/Pattern_recognition" title="Pattern recognition">Pattern recognition</a><ul><li>Optical character recognition</li><li>Handwriting recognition</li><li><a 
href="http://en.wikipedia.org/wiki/Speech_recognition" title="Speech recognition">Speech recognition</a></li><li><a href="http://en.wikipedia.org/wiki/Facial_recognition_system" title="Facial recognition system">Face recognition</a></li></ul></li><li><a href="http://en.wikipedia.org/wiki/Artificial_Creativity" title="Artificial Creativity" class="mw-redirect">Artificial Creativity</a></li></ul> <p><br /></p> </div> </td> <td> <div style="margin-right: 20px;"> <ul><li><a href="http://en.wikipedia.org/wiki/Computer_vision" title="Computer vision">Computer vision</a>, Virtual reality and <a href="http://en.wikipedia.org/wiki/Image_processing" title="Image processing">Image processing</a></li><li><a href="http://en.wikipedia.org/wiki/Diagnosis_%28artificial_intelligence%29" title="Diagnosis (artificial intelligence)">Diagnosis (artificial intelligence)</a></li><li><a href="http://en.wikipedia.org/wiki/Game_theory" title="Game theory">Game theory</a> and <a href="http://en.wikipedia.org/wiki/Strategic_planning" title="Strategic planning">Strategic planning</a></li><li><a href="http://en.wikipedia.org/wiki/Game_artificial_intelligence" title="Game artificial intelligence">Game artificial intelligence</a> and <a href="http://en.wikipedia.org/wiki/Computer_game_bot" title="Computer game bot">Computer game bot</a></li><li><a href="http://en.wikipedia.org/wiki/Natural_language_processing" title="Natural language processing">Natural language processing</a>, Translation and <a href="http://en.wikipedia.org/wiki/Chatterbot" title="Chatterbot">Chatterbots</a></li><li><a href="http://en.wikipedia.org/wiki/Nonlinear_control" title="Nonlinear control">Nonlinear control</a> and <a href="http://en.wikipedia.org/wiki/Robot" title="Robot">Robotics</a></li></ul> </div> </td> </tr> </tbody></table> </div> <dl><dt>Other fields in which AI methods are implemented</dt></dl> <table style="background-color: transparent; table-layout: fixed;" width="100%" border="0" cellpadding="0" 
cellspacing="0"><tbody><tr valign="top"><td> <div style="margin-right: 20px;"> <ul><li><a href="http://en.wikipedia.org/wiki/Artificial_life" title="Artificial life">Artificial life</a></li><li>Automated reasoning</li><li>Automation</li><li>Biologically-inspired computing</li><li>Concept mining</li><li>Data mining</li><li>Knowledge representation</li><li><a href="http://en.wikipedia.org/wiki/Semantic_Web" title="Semantic Web">Semantic Web</a></li><li><a href="http://en.wikipedia.org/wiki/E-mail_spam" title="E-mail spam">E-mail spam</a> filtering</li></ul> <p><br /></p> </div> </td> <td> <div style="margin-right: 20px;"> <ul><li><a href="http://en.wikipedia.org/wiki/Robot" title="Robot">Robotics</a></li><li><a href="http://en.wikipedia.org/wiki/Behavior-based_robotics" title="Behavior-based robotics">Behavior-based robotics</a></li><li>Cognitive robotics</li><li>Cybernetics</li><li>Developmental robotics</li><li>Epigenetic robotics</li><li>Evolutionary robotics</li><li><a href="http://en.wikipedia.org/wiki/Hybrid_intelligent_system" title="Hybrid intelligent system">Hybrid intelligent system</a></li><li>Intelligent agent</li><li>Intelligent control</li><li>Litigation</li></ul> </div></td></tr></tbody></table>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com6tag:blogger.com,1999:blog-226627663258700835.post-31498674669103969302009-03-15T09:09:00.000-07:002009-03-15T09:10:35.131-07:00Progress in artificial intelligence<!-- start content --> <p><a href="http://en.wikipedia.org/wiki/Artificial_intelligence" title="Artificial intelligence">Artificial intelligence</a> can be evaluated on constrained and well-defined problems that allow comparison with human performance. Such tests have been termed subject matter expert Turing tests. 
Smaller problems provide more achievable goals, and there is an ever-increasing number of positive results.</p><p><br /></p><p><br /></p><script type="text/javascript">//<![CDATA[ if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } //]]> </script> <p><a name="Performance_evaluation" id="Performance_evaluation"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Performance evaluation</span></span></h2> <p style="font-weight: bold; color: rgb(153, 0, 0);">The broad classes of outcome for an AI test are:</p> <ul><li><b>optimal</b>: it is not possible to perform better</li><li><b>strong super-human</b>: performs better than all humans</li><li><b>super-human</b>: performs better than most humans</li><li><b>sub-human</b>: performs worse than most humans</li></ul><br /><p><a name="Optimal" id="Optimal"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Optimal</span></span></h2> <ul><li><a href="http://en.wikipedia.org/wiki/Checkers" title="Checkers" class="mw-redirect">Checkers</a></li></ul><br /><p><a name="Super-human" id="Super-human"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Super-human</span></span></h2> <ul><li><a href="http://en.wikipedia.org/wiki/Backgammon" title="Backgammon">Backgammon</a>: strong super-human</li><li><a href="http://en.wikipedia.org/wiki/Computer_bridge" title="Computer bridge">Bridge</a>: nearing strong super-human</li><li><a href="http://en.wikipedia.org/wiki/Chess" title="Chess">Chess</a>: nearing strong super-human</li><li><a href="http://en.wikipedia.org/wiki/Reversi" title="Reversi">Reversi</a>: strong super-human</li><li><a href="http://en.wikipedia.org/wiki/Scrabble" title="Scrabble">Scrabble</a>: strong super-human</li></ul><br /><p><a name="Sub-human" 
id="Sub-human"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Sub-human</span></span></h2> <ul><li><a href="http://en.wikipedia.org/wiki/Go_%28board_game%29" title="Go (board game)" class="mw-redirect">Go</a></li><li><a href="http://en.wikipedia.org/wiki/Machine_translation" title="Machine translation">Machine translation</a></li><li>Most everyday tasks performed by humans.</li></ul>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com0tag:blogger.com,1999:blog-226627663258700835.post-47637522501458836792009-03-15T09:07:00.000-07:002009-03-15T09:09:07.302-07:00Intelligent control<!-- start content --> <p><b>Intelligent control</b> is a class of control techniques that use various AI computing approaches such as neural networks, Bayesian probability, fuzzy logic, machine learning, evolutionary computation and genetic algorithms.</p><p><br /></p><p><br /></p><script type="text/javascript">//<![CDATA[ if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } //]]> </script> <p><a name="Overview" id="Overview"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Overview</span></span></h2> <p style="font-weight: bold; color: rgb(153, 0, 0);">Intelligent control can be divided into the following major sub-domains:</p> <ul><li><a href="http://en.wikipedia.org/wiki/Neural_network" title="Neural network">Neural network control</a></li><li><a href="http://en.wikipedia.org/wiki/Bayesian_probability" title="Bayesian probability">Bayesian control</a></li><li><a href="http://en.wikipedia.org/wiki/Fuzzy_logic" title="Fuzzy logic">Fuzzy (logic) control</a></li><li><a href="http://en.wikipedia.org/wiki/Neuro-fuzzy" title="Neuro-fuzzy">Neuro-fuzzy control</a></li><li><a href="http://en.wikipedia.org/wiki/Expert_System" title="Expert System" 
class="mw-redirect">Expert systems</a></li><li>Genetic control</li><li><a href="http://en.wikipedia.org/wiki/Intelligent_agent" title="Intelligent agent">Intelligent agents</a> (Cognitive/Conscious control)</li></ul> <p>New control techniques are created continuously as new models of intelligent behavior emerge and the computational methods to support them are developed.</p> <p><a name="Neural_network_controllers" id="Neural_network_controllers"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Neural network controllers</span></span></h3> <p><span style="font-weight: bold; color: rgb(153, 0, 0);"><a href="http://en.wikipedia.org/wiki/Neural_networks" title="Neural networks" class="mw-redirect">Neural networks</a> have been used to solve problems in almost all spheres of science and technology. Neural network control basically involves two steps:</span></p> <ul><li>System identification</li><li>Control</li></ul> <p>It has been shown that a feedforward network with nonlinear, continuous and differentiable activation functions has universal approximation capability. Recurrent networks have also been used for system identification. Given a set of input-output data pairs, system identification aims to form a mapping among these data pairs. 
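The system-identification step can be sketched with a toy example. The "unknown system" below is a stand-in function (real data would come from measurements); a one-hidden-layer feedforward network with tanh activations (nonlinear, continuous and differentiable) is fitted to the recorded input-output pairs by plain gradient descent. Sizes, learning rate and iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 64).reshape(-1, 1)   # recorded inputs
y = np.sin(3.0 * x)                             # recorded outputs (stand-in system)

W1, b1 = rng.normal(0.0, 0.5, (1, 16)), np.zeros(16)   # 1 input -> 16 hidden
W2, b2 = rng.normal(0.0, 0.5, (16, 1)), np.zeros(1)    # 16 hidden -> 1 output

def predict(x):
    return np.tanh(x @ W1 + b1) @ W2 + b2       # network's mapping

loss_before = float(np.mean((predict(x) - y) ** 2))
lr = 0.05
for _ in range(2000):                           # batch gradient descent
    h = np.tanh(x @ W1 + b1)
    pred = h @ W2 + b2
    g = 2.0 * (pred - y) / len(x)               # d(mse)/d(pred)
    gh = (g @ W2.T) * (1.0 - h ** 2)            # error signal back through tanh
    W2 -= lr * (h.T @ g); b2 -= lr * g.sum(0)
    W1 -= lr * (x.T @ gh); b1 -= lr * gh.sum(0)
loss_after = float(np.mean((predict(x) - y) ** 2))
```

After training, the network itself serves as the identified model: it maps inputs to outputs the way the measured system did.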
Such a network is supposed to capture the dynamics of the system.</p> <p><a name="Bayesian_controllers" id="Bayesian_controllers"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Bayesian controllers</span></span></h3> <p><a href="http://en.wikipedia.org/wiki/Bayesian_probability" title="Bayesian probability">Bayesian probability</a> has produced a number of algorithms that are in common use in many advanced control systems, serving as state space estimators of some variables that are used in the controller.</p> <p>The <a href="http://en.wikipedia.org/wiki/Kalman_filter" title="Kalman filter">Kalman filter</a> and the Particle filter are two examples of popular Bayesian control components. The Bayesian approach to controller design often requires a significant effort in deriving the so-called system model and measurement model, the mathematical relationships linking the state variables to the sensor measurements available in the controlled system. In this respect, it is very closely linked to the system-theoretic approach to control design.</p><p><br /></p><p><br /></p>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com0tag:blogger.com,1999:blog-226627663258700835.post-80597457321741550592009-03-15T09:04:00.000-07:002009-03-15T09:06:52.285-07:00Connectionism<!-- start content --> <p><b>Connectionism</b> is a set of approaches in the fields of artificial intelligence, cognitive psychology, cognitive science, neuroscience and philosophy of mind that models mental or behavioral phenomena as the emergent processes of interconnected networks of simple units. 
There are many forms of connectionism, but the most common forms use neural network models.</p><p><br /></p><p><br /></p><script type="text/javascript">//<![CDATA[ if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } //]]> </script> <p><a name="Basic_principles" id="Basic_principles"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Basic principles</span></span></h2> <p>The central connectionist principle is that mental phenomena can be described by interconnected networks of simple and often uniform units. The form of the connections and the units can vary from model to model. For example, units in the network could represent neurons and the connections could represent synapses. Another model might make each unit in the network a word, and each connection an indication of semantic similarity.</p> <p><a name="Spreading_activation" id="Spreading_activation"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Spreading activation</span></span></h3> <p>In most connectionist models, networks change over time. A closely related and very common aspect of connectionist models is <i>activation</i>. At any time, a unit in the network has an activation, which is a numerical value intended to represent some aspect of the unit. For example, if the units in the model are neurons, the activation could represent the probability that the neuron would generate an action potential spike. If the model is a spreading activation model, then over time a unit's activation spreads to all the other units connected to it. 
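A minimal sketch of spreading activation, assuming an arbitrary four-unit network, decay factor and step count: activation placed on one unit leaks along weighted connections to its neighbours on each time step.

```python
import numpy as np

W = np.array([            # W[i, j]: connection strength from unit j to unit i
    [0, 1, 0, 0],         # a simple chain 0 - 1 - 2 - 3
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

def spread(activation, steps, decay=0.5):
    a = activation.copy()
    for _ in range(steps):
        a = a + decay * (W @ a)   # each unit passes a share of its activation on
        a = a / a.max()           # normalise so values stay bounded
    return a

a0 = np.array([1.0, 0.0, 0.0, 0.0])   # activate only unit 0
a = spread(a0, steps=3)
```

After three steps even unit 3, which has no direct connection to unit 0, carries some activation: it arrived by spreading through the intermediate units.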
Spreading activation is always a feature of neural network models, and it is very common in connectionist models used by cognitive psychologists.</p> <p><a name="Neural_networks" id="Neural_networks"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Neural networks</span></span></h3> <p>Neural networks are by far the most commonly used connectionist model today. Much research using neural networks is done under the more general name "connectionist". Though there is a large variety of neural network models, they almost always follow two basic principles regarding the mind:</p> <ol><li>Any mental state can be described as an (N)-dimensional vector of numeric activation values over neural units in a network.</li><li>Memory is created by modifying the strength of the connections between neural units. The connection strengths, or "weights", are generally represented as an (N×N)-dimensional <a href="http://en.wikipedia.org/wiki/Matrix_%28mathematics%29" title="Matrix (mathematics)"></a>matrix.</li></ol> <p style="font-weight: bold; color: rgb(153, 0, 0);">Most of the variety among neural network models comes from:</p> <ul><li><i>Interpretation of units</i>: units can be interpreted as neurons or groups of neurons.</li><li><i>Definition of activation</i>: activation can be defined in a variety of ways. For example, in a <a href="http://en.wikipedia.org/wiki/Boltzmann_machine" title="Boltzmann machine"></a>Boltzmann machine, the activation is interpreted as the probability of generating an action potential spike, and is determined via a logistic function on the sum of the inputs to a unit.</li><li><i>Learning algorithm</i>: different networks modify their connections differently. 
Generally, any mathematically defined change in connection weights over time is referred to as the "learning algorithm".</li></ul> <p>Connectionists are in agreement that recurrent neural networks (networks wherein connections of the network can form a directed cycle) are a better model of the brain than feedforward neural networks (networks with no directed cycles). Many recurrent connectionist models also incorporate dynamical systems theory. Many researchers, such as the connectionist Paul Smolensky, have argued that connectionist models will evolve towards fully continuous, high-dimensional, non-linear, dynamic systems approaches.</p> <p><a name="Biological_realism" id="Biological_realism"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Biological realism</span></span></h3> <p>The neural network branch of connectionism suggests that the study of mental activity is really the study of neural systems. This links connectionism to neuroscience, and models involve varying degrees of biological realism. Connectionist work in general need not be biologically realistic, but some neural network researchers, computational neuroscientists, try to model the biological aspects of natural neural systems very closely in so-called "neuromorphic networks". And, many authors find the clear link between neural activity and cognition to be an appealing aspect of connectionism. But some have criticized this as reductionism.</p> <p><a name="Learning" id="Learning"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Learning</span></span></h3> <p>Connectionists generally stress the importance of learning in their models. Thus, connectionists have created many sophisticated learning procedures for neural networks. Learning always involves modifying the connection weights. 
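A minimal sketch of learning as weight modification, assuming a toy linear associator and a squared-error measure (both invented for illustration): each connection weight is nudged downhill using a finite-difference estimate of the error gradient.

```python
import numpy as np

x = np.array([1.0, -1.0])        # input activation vector
target = np.array([0.5, 1.0])    # desired output activation vector
W = np.zeros((2, 2))             # connection weights, all starting at zero

def error(W):
    # squared error between produced and desired output activations
    return float(np.sum((W @ x - target) ** 2))

eps, lr = 1e-5, 0.1
for _ in range(200):
    grad = np.zeros_like(W)
    for i in range(2):
        for j in range(2):
            Wp = W.copy()
            Wp[i, j] += eps
            grad[i, j] = (error(Wp) - error(W)) / eps   # estimate of dE/dW[i, j]
    W = W - lr * grad            # move each weight against its partial derivative
```

The finite difference here stands in for the analytic partial derivative purely for clarity; practical procedures compute the same quantity in closed form.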
These generally involve mathematical formulas to determine the change in weights when given sets of data consisting of activation vectors for some subset of the neural units.</p> <p>By formalizing learning in such a way, connectionists have many tools. A very common strategy in connectionist learning methods is to incorporate gradient descent over an error surface in a space defined by the weight matrix. All gradient descent learning in connectionist models involves changing each weight by the partial derivative of the error surface with respect to the weight. Backpropagation, first made popular in the 1980s, is probably the most commonly known connectionist gradient descent algorithm today.</p><p><br /></p> <p><a name="History" id="History"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">History</span></span></h2> <p>Connectionism can be traced to ideas more than a century old, which were little more than speculation until the mid-to-late 20th century. It wasn't until the 1980s that connectionism became a popular perspective among scientists.</p> <p><a name="Parallel_distributed_processing" id="Parallel_distributed_processing"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Parallel distributed processing</span></span></h3> <p>The prevailing connectionist approach today was originally known as parallel distributed processing (PDP). It was a neural network approach that stressed the parallel nature of neural processing, and the distributed nature of neural representations. It provided a general mathematical framework for researchers to operate in. 
The framework involved eight major aspects:</p> <ul><li>A set of <i>processing units</i>, represented by a set of integers.</li><li>An <i>activation</i> for each unit, represented by a vector of time-dependent functions.</li><li>An <i>output function</i> for each unit, represented by a vector of functions on the activations.</li><li>A <i>pattern of connectivity</i> among units, represented by a matrix of real numbers indicating connection strength.</li><li>A <i>propagation rule</i> spreading the activations via the connections, represented by a function on the output of the units.</li><li>An <i>activation rule</i> for combining inputs to a unit to determine its new activation, represented by a function on the current activation and propagation.</li><li>A <i>learning rule</i> for modifying connections based on experience, represented by a change in the weights based on any number of variables.</li><li>An <i>environment</i> which provides the system with experience, represented by sets of activation vectors for some subset of the units.</li></ul> <p>These aspects are now the foundation for almost all connectionist models. A perceived limitation of PDP is that it is reductionistic. That is, all cognitive processes are explained by neural firing and communication. According to this view there is no room for rational thinking or emotion.</p> <p>A lot of the research that led to the development of PDP was done in the 1970s, but PDP became popular in the 1980s with the release of the books <i>Parallel Distributed Processing: Explorations in the Microstructure of Cognition - Volume 1 (foundations)</i> and <i>Volume 2 (Psychological and Biological Models)</i>, by James L. McClelland, David E. Rumelhart and the PDP Research Group. 
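The eight PDP aspects map naturally onto code. In the rough sketch below, every concrete choice (three units, a tanh output function, a Hebbian-style learning rule) is an illustrative assumption, not part of the PDP definition itself.

```python
import numpy as np

units = [0, 1, 2]                           # 1. a set of processing units
activation = np.zeros(3)                    # 2. an activation for each unit

def output(a):                              # 3. an output function
    return np.tanh(a)

W = np.array([[0.0, 0.5, 0.0],              # 4. pattern of connectivity
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])

def propagate(out):                         # 5. propagation rule
    return W @ out

def activation_rule(a, net):                # 6. combine current activation
    return 0.5 * a + 0.5 * net              #    with the propagated input

def learning_rule(W, out, lr=0.1):          # 7. Hebbian-style weight change
    return W + lr * np.outer(out, out)

environment = [np.array([1.0, 0.0, 0.0])]   # 8. experience as activation vectors

for pattern in environment:                 # one cycle of the system
    activation = activation_rule(activation, propagate(output(pattern)))
    W = learning_rule(W, output(activation))
```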
The books are now considered seminal connectionist works, and it is now common to fully equate PDP and connectionism, although the term "connectionism" is not used in the books.</p> <p><a name="Earlier_work" id="Earlier_work"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Earlier work</span></span></h3> <p>PDP's direct roots were the perceptron theories of researchers such as Frank Rosenblatt from the 1950s and 1960s. But perceptron models were made very unpopular by the book <i>Perceptrons</i> by Marvin Minsky and Seymour Papert, published in 1969. It elegantly demonstrated the limits on the sorts of functions which perceptrons can calculate, showing that even simple functions like the exclusive disjunction could not be handled properly. The PDP books overcame this limitation by showing that multi-level, non-linear neural networks were far more robust and could be used for a vast array of functions.</p> <p>Many earlier researchers advocated connectionist-style models, for example Warren McCulloch, Walter Pitts, Donald Olding Hebb, and Karl Lashley in the 1940s and 1950s. McCulloch and Pitts showed how neural systems could implement first-order logic: their classic paper "A Logical Calculus of Ideas Immanent in Nervous Activity" (1943) is important in this development. They were influenced by the important work of Nicolas Rashevsky in the 1930s. Hebb contributed greatly to speculations about neural functioning, and proposed a learning principle, Hebbian learning, that is still used today. 
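Hebb's principle, that the connection between two units strengthens in proportion to their joint activity, can be sketched in a few lines; the activity patterns and learning rate below are arbitrary illustrations.

```python
import numpy as np

def hebbian_update(W, pre, post, lr=0.1):
    # delta w_ij = lr * post_i * pre_j: co-active units strengthen their link
    return W + lr * np.outer(post, pre)

W = np.zeros((2, 2))
pre = np.array([1.0, 0.0])     # presynaptic activity
post = np.array([1.0, 0.0])    # postsynaptic activity
for _ in range(5):             # repeated co-activation
    W = hebbian_update(W, pre, post)
```

Only the weight between the two co-active units grows; connections involving the silent units are left untouched, which is the essence of the rule.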
Lashley argued for distributed representations as a result of his failure to find anything like a localized engram in years of lesion experiments.</p> <p><a name="Connectionism_apart_from_PDP" id="Connectionism_apart_from_PDP"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Connectionism apart from PDP</span></span></h3> <p>Though PDP is the dominant form of connectionism, other theoretical work should also be classified as connectionist.</p> <p>Many connectionist principles can be traced to early work in psychology, such as that of William James. Psychological theories based on knowledge about the human brain were fashionable in the late 19th century. As early as 1869, the neurologist John Hughlings Jackson argued for multi-level, distributed systems. Following this lead, Herbert Spencer's <i>Principles of Psychology</i>, 3rd edition (1872), and Sigmund Freud's <i>Project for a Scientific Psychology</i> (composed 1895) propounded connectionist or proto-connectionist theories. These tended to be speculative theories. But by the early 20th century, Edward Thorndike was experimenting on learning that posited a connectionist-type network.</p> <p>In the 1950s, Friedrich Hayek proposed that spontaneous order in the brain arose out of decentralized networks of simple units. Hayek's work was rarely cited in the PDP literature until recently.</p> <p>Another form of connectionist model was the relational network framework developed by the linguist Sydney Lamb in the 1960s. Relational networks have only been used by linguists, and were never unified with the PDP approach. As a result, they are now used by very few researchers.</p> <p>There are also hybrid connectionist models, mostly mixing symbolic representations with neural network models. 
The hybrid approach has been advocated by some researchers (such as Ron Sun).</p><p><br /></p> <p><a name="Connectionism_vs._computationalism_debate" id="Connectionism_vs._computationalism_debate"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Connectionism vs. computationalism debate</span></span></h2> <p>As connectionism became increasingly popular in the late 1980s, there was a reaction to it by some researchers, including Jerry Fodor, Steven Pinker and others. They argued that connectionism, as it was being developed, was in danger of obliterating what they saw as the progress being made in the fields of cognitive science and psychology by the classical approach of computationalism. Computationalism is a specific form of cognitivism which argues that mental activity is computational, that is, that the mind operates by performing purely formal operations on symbols, like a Turing machine. Some researchers argued that the trend in connectionism was a reversion towards associationism and the abandonment of the idea of a language of thought, something they felt was mistaken. In contrast, it was those very tendencies that made connectionism attractive for other researchers.</p> <p>Connectionism and computationalism need not be at odds, but the debate in the late 1980s and early 1990s led to opposition between the two approaches. Throughout the debate some researchers have argued that connectionism and computationalism are fully compatible, though no consensus has been reached. 
The differences between the two approaches that are usually cited are the following:</p> <ul><li>Computationalists posit symbolic models that do not resemble underlying brain structure at all, whereas connectionists engage in "low-level" modeling, trying to ensure that their models resemble neurological structures.</li><li>Computationalists generally focus on the structure of explicit symbols (mental models) and syntactical rules for their internal manipulation, whereas connectionists focus on learning from environmental stimuli and storing this information in the form of connections between neurons.</li><li>Computationalists believe that internal mental activity consists of the manipulation of explicit symbols, whereas connectionists believe that the manipulation of explicit symbols is a poor model of mental activity.</li><li>Computationalists often posit domain-specific symbolic sub-systems designed to support learning in specific areas of cognition (e.g. language, intentionality, number), while connectionists posit one or a small set of very general learning mechanisms.</li></ul> <p>But despite these differences, the two approaches may be compatible. For example, it is well known that connectionist models can implement symbol manipulation systems of the kind used in computationalist models. 
Hence the differences might be a matter of the personal choices that some connectionist researchers make rather than anything fundamental to connectionism.</p> <p>The recent popularity of dynamical systems in philosophy of mind (due to the works of authors such as Tim van Gelder) has added a new perspective on the debate; some authors now argue that any split between connectionism and computationalism is just a split between computationalism and dynamical systems, suggesting that the original debate was wholly misguided.</p> <p>All of these views have led to considerable discussion on the issue among researchers that is likely to continue.</p><p><br /></p>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com0tag:blogger.com,1999:blog-226627663258700835.post-69693808959651213352009-03-15T08:58:00.000-07:002009-03-15T09:03:53.846-07:00Neural network<div class="thumb tright"> <div class="thumbinner" style="width: 182px;"><a href="http://en.wikipedia.org/wiki/File:Neural_network_example.png" class="image" title="Simplified view of a feedforward artificial neural network"><img alt="" src="http://upload.wikimedia.org/wikipedia/en/thumb/1/1d/Neural_network_example.png/180px-Neural_network_example.png" class="thumbimage" width="180" border="0" height="184" /></a> <div class="thumbcaption"> Simplified view of a feedforward artificial neural network</div> </div> </div> <p style="font-weight: bold; color: rgb(153, 0, 0);">Traditionally, the term neural network had been used to refer to a network or circuit of biological neurons. The modern usage of the term often refers to artificial neural networks, which are composed of artificial neurons or nodes. 
Thus the term has two distinct usages:</p> <ol><li><a href="http://en.wikipedia.org/wiki/Biological_neural_network" title="Biological neural network"></a>Biological neural networks are made up of real biological neurons that are connected or functionally related in the peripheral nervous system or the central nervous system. In the field of neuroscience, they are often identified as groups of neurons that perform a specific physiological function in laboratory analysis.</li><li><a href="http://en.wikipedia.org/wiki/Artificial_neural_network" title="Artificial neural network"></a>Artificial neural networks are made up of interconnecting artificial neurons (programming constructs that mimic the properties of biological neurons). Artificial neural networks may either be used to gain an understanding of biological neural networks, or for solving artificial intelligence problems without necessarily creating a model of a real biological system. The real, biological nervous system is highly complex and includes some features that may seem superfluous based on an understanding of artificial networks.</li></ol> <p>This article focuses on the relationship between the two concepts; for detailed coverage of the two different concepts refer to the separate articles: Biological neural network and Artificial neural network.</p><p><br /></p><p><br /></p><script type="text/javascript">//<![CDATA[ if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } //]]> </script> <p><a name="Overview" id="Overview"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Overview</span></span></h2> <p>In general a biological neural network is composed of a group or groups of chemically connected or functionally associated neurons. A single neuron may be connected to many other neurons and the total number of neurons and connections in a network may be extensive. 
Connections, called synapses, are usually formed from axons to dendrites, though dendrodendritic microcircuits and other connections are possible. Apart from the electrical signaling, there are other forms of signaling that arise from neurotransmitter diffusion, which have an effect on electrical signaling. As such, neural networks are extremely complex.</p> <p><a href="http://en.wikipedia.org/wiki/Artificial_intelligence" title="Artificial intelligence">Artificial intelligence</a> and cognitive modeling try to simulate some properties of neural networks. Though similar in their techniques, the former has the aim of solving particular tasks, while the latter aims to build mathematical models of biological neural systems.</p> <p>In the artificial intelligence field, artificial neural networks have been applied successfully to speech recognition, image analysis and adaptive control, in order to construct software agents (in computer and video games) or autonomous robots. Most of the currently employed artificial neural networks for artificial intelligence are based on statistical estimation, optimization and control theory.</p> <p>The cognitive modelling field involves the physical or mathematical modeling of the behaviour of neural systems; ranging from the individual neural level (e.g. modelling the spike response curves of neurons to a stimulus), through the neural cluster level (e.g. modelling the release and effects of dopamine in the basal ganglia) to the complete organism (e.g. behavioural modelling of the organism's response to stimuli).</p><p><br /></p> <p><a name="History_of_the_neural_network_analogy" id="History_of_the_neural_network_analogy"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">History of the neural network analogy</span></span></h2> <p>The concept of neural networks started in the late 1800s as an effort to describe how the human mind performed. 
These ideas started being applied to computational models with Turing's B-type machines and the perceptron.</p> <p>In the late 1940s, Donald Hebb made one of the first hypotheses for a mechanism of neural plasticity (i.e. learning), Hebbian learning. Hebbian learning is considered to be a 'typical' unsupervised learning rule and it (and variants of it) was an early model for long-term potentiation. In the early 1950s, Friedrich Hayek was one of the first to posit the idea of spontaneous order in the brain arising out of decentralized networks of simple units (neurons).</p> <p>The Perceptron is essentially a linear classifier for classifying data <img class="tex" alt=" x \in R^n" src="http://upload.wikimedia.org/math/1/5/1/151a50d8a4d603d32da96befeb8fce48.png" /> specified by parameters <img class="tex" alt="w \in R^n, b \in R" src="http://upload.wikimedia.org/math/0/4/8/04879c895a3a4b40810e442c23b4e61b.png" /> and an output function <span class="texhtml"><i>f</i> = <i>w</i>'<i>x</i> + <i>b</i></span>. Its parameters are adapted with an ad hoc rule similar to stochastic steepest gradient descent. Because the inner product is a linear operator in the input space, the Perceptron can only perfectly classify a set of data for which different classes are linearly separable in the input space, while it often fails completely for non-separable data. While the development of the algorithm initially generated some enthusiasm, partly because of its apparent relation to biological mechanisms, the later discovery of this inadequacy caused such models to be abandoned until the introduction of non-linear models into the field.</p> <p>The <a href="http://en.wikipedia.org/w/index.php?title=Cognitron&action=edit&redlink=1" class="new" title="Cognitron (page does not exist)"></a>Cognitron (1975) was an early multilayered neural network with a training algorithm. 
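</p><p>As a concrete illustration of the perceptron described above — a sketch only, with invented toy data (two linearly separable clusters) — the classifier <i>f</i> = <i>w</i>'<i>x</i> + <i>b</i> and its ad hoc update rule can be written as:</p>

```python
import numpy as np

# Invented, linearly separable toy data: class +1 near (2, 2), class -1 near (-2, -2).
X = np.array([[2.0, 2.0], [2.5, 1.5], [-2.0, -2.0], [-1.5, -2.5]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)   # weights w in R^n
b = 0.0           # bias b in R

# Classic perceptron rule: update w and b only on misclassified examples.
for _ in range(100):
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:   # wrong side of the hyperplane (or on it)
            w += yi * xi
            b += yi

predictions = np.sign(X @ w + b)     # all four toy points end up correctly classified
```

<p>On data that is not linearly separable, this loop never settles — which is exactly the inadequacy discussed above.</p><p>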
The actual structure of the network and the methods used to set the interconnection weights change from one neural strategy to another, each with its advantages and disadvantages. Networks can propagate information in one direction only, or they can bounce back and forth until self-activation at a node occurs and the network settles on a final state. The ability for bi-directional flow of inputs between neurons/nodes was produced with Hopfield's network (1982), and specialization of these node layers for specific purposes was introduced through the first hybrid network.</p> <p>The parallel distributed processing of the mid-1980s became popular under the name connectionism.</p> <p>The rediscovery of the backpropagation algorithm was probably the main reason behind the repopularisation of neural networks after the publication of "Learning Internal Representations by Error Propagation" in 1986 (though backpropagation itself dates from 1974). The original network utilised multiple layers of weight-sum units of the type f = g(w'x + b), where g was a sigmoid or logistic function, as used in logistic regression. Training was done by a form of stochastic steepest gradient descent. The employment of the chain rule of differentiation in deriving the appropriate parameter updates results in an algorithm that seems to 'backpropagate errors', hence the nomenclature. However, it is essentially a form of gradient descent. Determining the optimal parameters in a model of this type is not trivial, and steepest gradient descent methods cannot be relied upon to give the solution without a good starting point. In recent times, networks with the same architecture as the backpropagation network are referred to as Multi-Layer Perceptrons. 
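</p><p>A minimal sketch of such a Multi-Layer Perceptron — layers of units of the form f = g(w'x + b) with a logistic g, trained by gradient descent with updates derived via the chain rule — is shown below on the XOR problem, a classic task no single linear unit can solve; the data, layer sizes and learning rate are invented for illustration:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
g = lambda z: 1.0 / (1.0 + np.exp(-z))         # logistic (sigmoid) function

# XOR: not linearly separable, so a lone perceptron fails on it.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([[0.], [1.], [1.], [0.]])

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)  # input -> hidden weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)  # hidden -> output weights
lr = 0.5

def forward(X):
    h = g(X @ W1 + b1)                         # hidden layer: g(w'x + b)
    return h, g(h @ W2 + b2)                   # output layer: g(w'h + b)

loss_before = ((forward(X)[1] - t) ** 2).mean()
for _ in range(5000):
    h, yhat = forward(X)
    d2 = (yhat - t) * yhat * (1 - yhat)        # chain rule at the output...
    d1 = (d2 @ W2.T) * h * (1 - h)             # ...'backpropagated' to the hidden layer
    W2 -= lr * h.T @ d2; b2 -= lr * d2.sum(0)
    W1 -= lr * X.T @ d1; b1 -= lr * d1.sum(0)
loss_after = ((forward(X)[1] - t) ** 2).mean()
```

<p>Each update is plain gradient descent on the squared error; the "backpropagated error" terms d2 and d1 are simply intermediate factors produced by the chain rule.</p><p>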
This name does not impose any limitations on the type of algorithm used for learning.</p> <p>The backpropagation network generated much enthusiasm at the time and there was much controversy about whether such learning could be implemented in the brain or not, partly because a mechanism for reverse signalling was not obvious at the time, but most importantly because there was no plausible source for the 'teaching' or 'target' signal.</p><p><br /></p> <p><a name="The_brain.2C_neural_networks_and_computers" id="The_brain.2C_neural_networks_and_computers"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">The brain, neural networks and computers</span></span></h2> <p>Neural networks, as used in artificial intelligence, have traditionally been viewed as simplified models of neural processing in the brain, even though the relation between this model and brain biological architecture is debated.</p> <p>A subject of current research in theoretical neuroscience is the question surrounding the degree of complexity and the properties that individual neural elements should have to reproduce something resembling animal intelligence.</p> <p>Historically, computers evolved from the von Neumann architecture, which is based on sequential processing and execution of explicit instructions. On the other hand, the origins of neural networks are based on efforts to model information processing in biological systems, which may rely largely on parallel processing as well as implicit instructions based on recognition of patterns of 'sensory' input from external sources. 
In other words, at its very heart a neural network is a complex statistical processor (as opposed to being tasked to sequentially process and execute).</p><p><br /></p> <p><a name="Neural_networks_and_artificial_intelligence" id="Neural_networks_and_artificial_intelligence"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Neural networks and artificial intelligence</span></span></h2> <p>An <i>artificial neural network</i> (ANN), also called a <i>simulated neural network</i> (SNN) or commonly just <i>neural network</i> (NN) is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing based on a connectionistic approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network.</p> <p>In more practical terms neural networks are non-linear statistical data modeling or decision making tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data.</p> <p><a name="Background" id="Background"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Background</span></span></h3> <p>An artificial neural network involves a network of simple processing elements (artificial neurons) which can exhibit complex global behaviour, determined by the connections between the processing elements and element parameters. Artificial neurons were first proposed in 1943 by Warren McCulloch, a neurophysiologist, and Walter Pitts, an MIT logician. One classical type of artificial neural network is the Hopfield net.</p> <p>In a neural network model simple nodes, which can be called variously "neurons", "neurodes", "Processing Elements" (PE) or "units", are connected together to form a network of nodes — hence the term "neural network". 
While a neural network does not have to be adaptive <i>per se</i>, its practical use comes with algorithms designed to alter the strength (weights) of the connections in the network to produce a desired signal flow.</p> <p>In modern software implementations of artificial neural networks the approach inspired by biology has more or less been abandoned for a more practical approach based on statistics and signal processing. In some of these systems neural networks, or parts of neural networks (such as artificial neurons) are used as components in larger systems that combine both adaptive and non-adaptive elements.</p> <p>The concept of a neural network appears to have first been proposed by Alan Turing in his 1948 paper "Intelligent Machinery".</p> <p><a name="Applications" id="Applications"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Applications</span></span></h3> <p>The utility of artificial neural network models lies in the fact that they can be used to infer a function from observations and also to use it. 
This is particularly useful in applications where the complexity of the data or task makes the design of such a function by hand impractical.</p> <dl><dt>Real life applications</dt></dl> <p style="font-weight: bold; color: rgb(153, 0, 0);">The tasks to which artificial neural networks are applied tend to fall within the following broad categories:</p> <ul><li><a href="http://en.wikipedia.org/wiki/Function_approximation" title="Function approximation"></a>Function approximation, or regression analysis, including time series prediction and modelling.</li><li><a href="http://en.wikipedia.org/wiki/Statistical_classification" title="Statistical classification"></a>Classification, including pattern and sequence recognition, novelty detection and sequential decision making.</li><li><a href="http://en.wikipedia.org/wiki/Data_processing" title="Data processing" class="mw-redirect"></a>Data processing, including filtering, clustering, blind signal separation and compression.</li></ul> <p>Application areas include system identification and control (vehicle control, process control), game-playing and decision making (backgammon, chess, racing), pattern recognition (radar systems, face identification, object recognition, etc.), sequence recognition (gesture, speech, handwritten text recognition), medical diagnosis, financial applications, data mining (or knowledge discovery in databases, "KDD"), visualization and e-mail spam filtering.</p> <p><a name="Neural_network_software" id="Neural_network_software"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Neural network software</span></span></h3> <p><b>Neural network software</b> is used to <a href="http://en.wikipedia.org/wiki/Simulation" title="Simulation">s</a>imulate, research, develop and apply artificial neural networks, biological neural networks and in some cases a wider array of adaptive systems.</p> <p><a name="Learning_paradigms" 
id="Learning_paradigms"></a></p> <h4><span class="editsection"></span><span style="color: rgb(153, 0, 0);" class="mw-headline">Learning paradigms</span></h4> <p>There are three major learning paradigms, each corresponding to a particular abstract learning task. These are supervised learning, unsupervised learning and reinforcement learning. Usually any given type of network architecture can be employed in any of those tasks.</p> <dl><dt>Supervised learning</dt></dl> <p>In supervised learning, we are given a set of example pairs <img class="tex" alt=" (x, y), x \in X, y \in Y" src="http://upload.wikimedia.org/math/1/e/3/1e39a260d546fe7c3cd1f9a42979dc71.png" /> and the aim is to find a function <span class="texhtml"><i>f</i></span> in the allowed class of functions that matches the examples. In other words, we wish to <i>infer</i> the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data.</p> <dl><dt>Unsupervised learning</dt></dl> <p>In unsupervised learning we are given some data x, and a cost function to be minimized, which can be any function of x and the network's output, f. The cost function is determined by the task formulation. Most applications fall within the domain of estimation problems such as statistical modeling, compression, filtering, blind source separation and clustering.</p> <dl><dt>Reinforcement learning</dt></dl> <p>In reinforcement learning, data <span class="texhtml"><i>x</i></span> is usually not given, but generated by an agent's interactions with the environment. At each point in time <span class="texhtml"><i>t</i></span>, the agent performs an action <span class="texhtml"><i>y</i><sub><i>t</i></sub></span> and the environment generates an observation <span class="texhtml"><i>x</i><sub><i>t</i></sub></span> and an instantaneous cost <span class="texhtml"><i>c</i><sub><i>t</i></sub></span>, according to some (usually unknown) dynamics. 
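</p><p>The interaction loop just described can be sketched generically. Everything concrete below is invented for illustration: a two-action toy environment with noisy costs, and an epsilon-greedy policy over running cost estimates standing in for the learned component:</p>

```python
import random

random.seed(0)

def environment(y_t):
    """Toy dynamics: action 1 is cheaper on average than action 0."""
    x_t = y_t                                       # observation (trivial here)
    c_t = (1.0 if y_t == 0 else 0.3) + random.uniform(-0.1, 0.1)
    return x_t, c_t

est = [0.0, 0.0]        # running estimate of each action's cost
n = [0, 0]              # how often each action has been tried

for t in range(1000):
    # epsilon-greedy policy: explore occasionally, otherwise pick the
    # action whose estimated cost is currently lowest.
    if random.random() < 0.1 or 0 in n:
        y_t = random.randrange(2)
    else:
        y_t = min((0, 1), key=lambda a: est[a])
    x_t, c_t = environment(y_t)                     # observation and instantaneous cost
    n[y_t] += 1
    est[y_t] += (c_t - est[y_t]) / n[y_t]           # incremental mean of observed costs
```

<p>As the estimates converge toward each action's mean cost, the greedy choice comes to minimise the long-term (cumulative) cost — the "policy" the paradigm asks for.</p><p>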
The aim is to discover a <i>policy</i> for selecting actions that minimises some measure of a long-term cost, i.e. the expected cumulative cost. The environment's dynamics and the long-term cost for each policy are usually unknown, but can be estimated. ANNs are frequently used in reinforcement learning as part of the overall algorithm. Tasks that fall within the paradigm of reinforcement learning are control problems, games and other sequential decision making tasks.</p> <p><a name="Learning_algorithms" id="Learning_algorithms"></a></p> <h4><span class="editsection"></span> <span style="color: rgb(153, 0, 0);" class="mw-headline">Learning algorithms</span></h4> <p>There are many algorithms for training neural networks; most of them can be viewed as a straightforward application of optimization theory and statistical estimation.</p> <p><a href="http://en.wikipedia.org/wiki/Evolutionary_computation" title="Evolutionary computation"></a>Evolutionary computation methods, simulated annealing, expectation maximization and non-parametric methods are among other commonly used methods for training neural networks. See also machine learning.</p> <p>Recent developments in this field also saw the use of particle swarm optimization and other swarm intelligence techniques used in the training of neural networks.</p><p><br /></p> <p><a name="Neural_networks_and_neuroscience" id="Neural_networks_and_neuroscience"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Neural networks and neuroscience</span></span></h2> <p>Theoretical and computational neuroscience is the field concerned with the theoretical analysis and computational modeling of biological neural systems. 
Since neural systems are intimately related to cognitive processes and behaviour, the field is closely related to cognitive and behavioural modeling.</p> <p>The aim of the field is to create models of biological neural systems in order to understand how biological systems work. To gain this understanding, neuroscientists strive to make a link between observed biological processes (data), biologically plausible mechanisms for neural processing and learning (biological neural network models) and theory (statistical learning theory and information theory).</p> <p><a name="Types_of_models" id="Types_of_models"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Types of models</span></span></h3> <p>Many models are used in the field, each defined at a different level of abstraction and trying to model different aspects of neural systems. They range from models of the short-term behaviour of individual neurons, through models of how the dynamics of neural circuitry arise from interactions between individual neurons, to models of how behaviour can arise from abstract neural modules that represent complete subsystems. 
These include models of the long-term and short-term plasticity of neural systems and its relation to learning and memory, from the individual neuron to the system level.</p> <p><a name="Current_research" id="Current_research"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Current research</span></span></h3> <p>While initially research had been concerned mostly with the electrical characteristics of neurons, a particularly important part of the investigation in recent years has been the exploration of the role of neuromodulators such as dopamine, acetylcholine, and serotonin on behaviour and learning.</p> <p><a href="http://en.wikipedia.org/wiki/Biophysics" title="Biophysics"></a>Biophysical models, such as BCM theory, have been important in understanding mechanisms for synaptic plasticity, and have had applications in both computer science and neuroscience. Research is ongoing in understanding the computational algorithms used in the brain, with some recent biological evidence for radial basis networks and neural backpropagation as mechanisms for processing data.</p><p><br /></p> <p><a name="Criticism" id="Criticism"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Criticism</span></span></h2> <p>A common criticism of neural networks, particularly in robotics, is that they require a large diversity of training for real-world operation. Dean Pomerleau, in his research presented in the paper "Knowledge-based Training of Artificial Neural Networks for Autonomous Robot Driving," uses a neural network to train a robotic vehicle to drive on multiple types of roads (single lane, multi-lane, dirt, etc.). 
A large amount of his research is devoted to (1) extrapolating multiple training scenarios from a single training experience, and (2) preserving past training diversity so that the system does not become overtrained (if, for example, it is presented with a series of right turns – it should not learn to always turn right). These issues are common in neural networks that must decide from amongst a wide variety of responses.</p> <p><a href="http://en.wikipedia.org/wiki/A._K._Dewdney" title="A. K. Dewdney" class="mw-redirect"></a>A. K. Dewdney, a former Scientific American columnist, wrote in 1997, "Although neural nets do solve a few toy problems, their powers of computation are so limited that I am surprised anyone takes them seriously as a general problem-solving tool." (Dewdney, p.82)</p> <p>Arguments against Dewdney's position are that neural nets have been successfully used to solve many complex and diverse tasks, ranging from autonomously flying aircraft to detecting credit card fraud.</p> <p style="font-weight: bold; color: rgb(153, 0, 0);">Technology writer Roger Bridgman commented on Dewdney's statements about neural nets:</p> <blockquote> <p>Neural networks, for instance, are in the dock not only because they have been hyped to high heaven, (what hasn't?) but also because you could create a successful net without understanding how it worked: the bunch of numbers that captures its behaviour would in all probability be "an opaque, unreadable table...valueless as a scientific resource". In spite of his emphatic declaration that science is not technology, Dewdney seems here to pillory neural nets as bad science when most of those devising them are just trying to be good engineers. An unreadable table that a useful machine could read would still be well worth having.</p> </blockquote> <p>Some other criticisms came from believers of hybrid models (combining neural networks and symbolic approaches). 
They advocate combining the two approaches and believe that hybrid models can better capture the mechanisms of the human mind (Sun and Bookman 1994).</p><p><br /></p>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com0tag:blogger.com,1999:blog-226627663258700835.post-1980207760653098582009-03-15T08:57:00.000-07:002009-03-15T08:58:03.258-07:00Automated reasoning<!-- start content --> <p><b>Automated reasoning</b> is an area of computer science dedicated to understanding different aspects of reasoning in a way that allows the creation of software enabling computers to reason completely or nearly completely automatically. As such, it is usually considered a subfield of artificial intelligence, but it also has strong connections to theoretical computer science and even philosophy.</p> <p>The most developed subareas of automated reasoning are probably automated theorem proving (and the less automated but more pragmatic subfield of interactive theorem proving) and automated proof checking (viewed as guaranteed correct reasoning under fixed assumptions), but extensive work has also been done in reasoning by analogy, induction and abduction. Other important topics are reasoning under uncertainty and non-monotonic reasoning. An important part of the uncertainty field is that of argumentation, where further constraints of minimality and consistency are applied on top of the more standard automated deduction. John Pollock's Oscar system is an example of an automated argumentation system that is more specific than being just an automated theorem prover. 
Formal argumentation is a subfield of artificial intelligence.</p> Tools and techniques include the classical logics and calculi from automated theorem proving, but also fuzzy logic, Bayesian inference, reasoning with maximal entropy and a large number of less formal ad hoc techniques.<br /><br /><p><br /></p><p><br /></p>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com0tag:blogger.com,1999:blog-226627663258700835.post-52681194169573991652009-03-15T08:53:00.000-07:002009-03-15T08:56:51.202-07:00Evolutionary computation<!-- start content --> <p>In computer science <b>evolutionary computation</b> is a subfield of artificial intelligence (more particularly computational intelligence) that involves continuous and combinatorial optimization problems.</p> <p>Evolutionary computation uses iterative progress, such as growth or development in a <a href="http://en.wikipedia.org/wiki/Population" title="Population"></a>population. This population is then selected in a guided random search using parallel processing to achieve the desired end. Such processes are often inspired by biological mechanisms of evolution.</p><p><br /></p><p><br /></p> <p><a name="History" id="History"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">History</span></span></h2> <p>The use of Darwinian principles for automated problem solving originated in the 1950s. It was not until the 1960s that three distinct interpretations of this idea started to be developed in three different places.</p> <p><a href="http://en.wikipedia.org/wiki/Evolutionary_programming" title="Evolutionary programming"></a>Evolutionary programming was introduced by Lawrence J. Fogel in the USA, while John Henry Holland called his method a genetic algorithm. 
In Germany, Ingo Rechenberg and Hans-Paul Schwefel introduced evolution strategies. These areas developed separately for about 15 years. From the early 1990s on they have been unified as different representatives (“dialects”) of one technology, called evolutionary computing. Also in the early 1990s, a fourth stream following the general ideas emerged – genetic programming.</p> <p>These terminologies denote the field of evolutionary computing and consider evolutionary programming, evolution strategies, genetic algorithms, and genetic programming as sub-areas.</p><p><br /></p> <p><a name="Techniques" id="Techniques"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Techniques</span></span></h2> <p style="font-weight: bold; color: rgb(153, 0, 0);">Evolutionary techniques mostly involve metaheuristic optimization algorithms such as:</p> <ul><li><a href="http://en.wikipedia.org/wiki/Evolutionary_algorithm" title="Evolutionary algorithm"></a>evolutionary algorithms (comprising genetic algorithms, evolutionary programming, evolution strategy, genetic programming and learning classifier systems)</li><li><a href="http://en.wikipedia.org/wiki/Swarm_intelligence" title="Swarm intelligence"></a>swarm intelligence (comprising ant colony optimization and particle swarm optimization)</li></ul> <p style="font-weight: bold; color: rgb(153, 0, 0);">and to a lesser extent also:</p> <ul><li><a href="http://en.wikipedia.org/wiki/Self-organization" title="Self-organization"></a>self-organization such as self-organizing maps, growing neural gas, competitive learning</li><li><a href="http://en.wikipedia.org/wiki/Differential_evolution" title="Differential 
evolution"></a>differential evolution</li><li><a href="http://en.wikipedia.org/wiki/Artificial_life" title="Artificial life"></a>artificial life</li><li><a href="http://en.wikipedia.org/wiki/Cultural_algorithm" title="Cultural algorithm"></a>cultural algorithms</li><li><a href="http://en.wikipedia.org/wiki/Harmony_search" title="Harmony search"></a>harmony search algorithm</li><li><a href="http://en.wikipedia.org/wiki/Artificial_immune_system" title="Artificial immune system"></a>artificial immune systems</li><li><a href="http://en.wikipedia.org/wiki/Learnable_Evolution_Model" title="Learnable Evolution Model"></a>Learnable Evolution Model</li></ul> <p><a name="Evolutionary_algorithms" id="Evolutionary_algorithms"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Evolutionary algorithms</span></span></h2> <p><a href="http://en.wikipedia.org/wiki/Evolutionary_algorithms" title="Evolutionary algorithms" class="mw-redirect"></a>Evolutionary algorithms form a subset of evolutionary computation in that they generally only involve techniques implementing mechanisms inspired by biological evolution such as reproduction, mutation, recombination, natural selection and survival of the fittest. Candidate solutions to the optimization problem play the role of individuals in a population, and the cost function determines the environment within which the solutions "live" (see also fitness function). Evolution of the population then takes place after the repeated application of the above operators.</p> <p>In this process, there are two main forces that form the basis of evolutionary systems: <b>Recombination</b> and <b>mutation</b> create the necessary diversity and thereby facilitate novelty, while <b>selection</b> acts as a force increasing quality.</p> <p>Many aspects of such an evolutionary process are stochastic. 
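</p><p>The loop described above can be sketched in a few lines; the cost function (a sum of squares to minimise) and all parameter values below are invented for the example:</p>

```python
import random

random.seed(1)

def cost(x):
    """The 'environment' the candidate solutions live in: smaller is fitter."""
    return sum(v * v for v in x)

POP, DIM = 20, 5
pop = [[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(POP)]
initial_best = min(cost(ind) for ind in pop)

for generation in range(200):
    pop.sort(key=cost)
    parents = pop[:POP // 2]                              # selection: fitter half survives
    children = []
    while len(parents) + len(children) < POP:
        a, b = random.sample(parents, 2)
        child = [random.choice(vs) for vs in zip(a, b)]   # recombination
        child = [v + random.gauss(0, 0.1) for v in child] # mutation
        children.append(child)
    pop = parents + children

final_best = min(cost(ind) for ind in pop)
```

<p>Recombination and mutation supply the diversity; truncation selection supplies the pressure toward lower cost.</p><p>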
The pieces of information changed by recombination and mutation are chosen at random. Selection operators, on the other hand, can be either deterministic or stochastic. In the latter case, individuals with a higher fitness have a higher chance of being selected than individuals with a lower fitness, but typically even the weak individuals have a chance to become a parent or to survive.</p><p><br /></p> <p><a name="Evolutionary_computation_practitioners" id="Evolutionary_computation_practitioners"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Evolutionary computation practitioners</span></span></h2> <ul><li><a href="http://en.wikipedia.org/wiki/Kalyanmoy_Deb" title="Kalyanmoy Deb"></a>Kalyanmoy Deb</li><li><a href="http://en.wikipedia.org/wiki/David_E._Goldberg" title="David E. Goldberg"></a>David E. Goldberg</li><li><a href="http://en.wikipedia.org/wiki/John_Henry_Holland" title="John Henry Holland"></a>John Henry Holland</li><li><a href="http://en.wikipedia.org/wiki/John_Koza" title="John Koza"></a>John Koza</li><li><a href="http://en.wikipedia.org/wiki/Ingo_Rechenberg" title="Ingo Rechenberg"></a>Ingo Rechenberg</li><li><a href="http://en.wikipedia.org/wiki/Hans-Paul_Schwefel" title="Hans-Paul Schwefel"></a>Hans-Paul Schwefel</li><li><a href="http://en.wikipedia.org/wiki/Edward_Tsang" title="Edward Tsang"></a>Edward Tsang</li></ul>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com0tag:blogger.com,1999:blog-226627663258700835.post-22726251009996858982009-03-09T09:15:00.001-07:002009-03-09T09:15:32.678-07:00GOFAI<!-- start content --> <p>In artificial intelligence research, <b>GOFAI</b> ("Good Old-Fashioned Artificial Intelligence") is an <a href="http://en.wikipedia.org/wiki/Artificial_intelligence#Approaches_to_AI" title="Artificial intelligence"></a>approach to achieving artificial intelligence. 
In robotics research, the term is extended to <b>GOFAIR</b> ("Good Old Fashioned Artificial Intelligence and Robotics"). The approach is based on the assumption that many aspects of intelligence can be achieved by the manipulation of <a href="http://en.wikipedia.org/wiki/Physical_symbol_system" title="Physical symbol system"></a>symbols, an assumption defined as the "physical symbol system hypothesis" by Allen Newell and Herbert Simon in the mid-1960s. The term "GOFAI" was coined by John Haugeland in his 1985 book Artificial Intelligence: The Very Idea, which explored the philosophical implications of artificial intelligence research.</p> <p>GOFAI was the dominant paradigm of AI research from the mid-1950s until the late 1980s. After that time, newer sub-symbolic approaches to AI became popular. Now, both approaches are in common use, often applied to different problems.</p> <p>Opponents of the symbolic approach include roboticists such as Rodney Brooks, who aims to produce autonomous robots without symbolic representation (or with only minimal representation), and computational intelligence researchers, who apply techniques such as neural networks and optimization to solve problems in machine learning and control engineering.</p><p><br /></p><p><br /></p>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com0tag:blogger.com,1999:blog-226627663258700835.post-18692785431055787102009-03-09T09:11:00.000-07:002009-03-09T09:13:43.569-07:00Computational neuroscience<!-- start content --><p><b>Computational neuroscience</b> is an interdisciplinary science that links the diverse fields of <a href="http://en.wikipedia.org/wiki/Neuroscience" title="Neuroscience"></a>neuroscience, cognitive science, electrical engineering, computer science, physics and mathematics. Historically, the term was introduced by Eric L. 
Schwartz, who organized a conference, held in 1985 in Carmel, California at the request of the Systems Development Foundation, to provide a summary of the current status of a field which until that point was referred to by a variety of names, such as neural modeling, brain theory and neural networks. The proceedings of this definitional meeting were later published as the book "Computational Neuroscience" (1990). The early historical roots of the field can be traced to the work of people such as Hodgkin & Huxley, Hubel & Wiesel, and David Marr, to name but a few. Hodgkin & Huxley developed the voltage clamp and created the first mathematical model of the action potential. Hubel & Wiesel discovered that neurons in primary visual cortex, the first cortical area to process information coming from the retina, have oriented receptive fields and are organized in columns. David Marr's work focused on the interactions between neurons, suggesting computational approaches to the study of how functional groups of neurons within the hippocampus and neocortex interact, store, process, and transmit information. Computational modeling of biophysically realistic neurons and dendrites began with the work of Wilfrid Rall, with the first multicompartmental model using cable theory.</p> <p>Computational neuroscience is distinct from psychological connectionism and theories of learning from disciplines such as machine learning, neural networks and statistical learning theory in that it emphasizes descriptions of functional and biologically realistic neurons (and neural systems) and their physiology and dynamics. These models capture the essential features of the biological system at multiple spatial-temporal scales, from membrane currents, protein and chemical coupling to network oscillations, columnar and topographic architecture and learning and memory. 
These computational models are used to test hypotheses that can be directly verified by current or future biological experiments.</p> <p>Currently, the field is undergoing rapid expansion. There are many software packages, such as <a href="http://en.wikipedia.org/wiki/GENESIS_%28software%29" title="GENESIS (software)">GENESIS</a> and NEURON, that allow rapid and systematic in silico modeling of realistic neurons. Blue Brain, a collaboration between IBM and École Polytechnique Fédérale de Lausanne, aims to construct a biophysically detailed simulation of a cortical column on the Blue Gene supercomputer.</p><script type="text/javascript">//<![CDATA[ if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } //]]> </script> <p><a name="Organizations" id="Organizations"></a></p> <h3><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Organizations</span></span></h3> <p>The Organization for Computational Neuroscience is a non-profit organization whose tasks include organizing the annual international Computational Neuroscience meeting.</p> <p><a name="Major_topics" id="Major_topics"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Major topics</span></span></h2> <p>Research in computational neuroscience can be roughly categorized into several lines of inquiry. Most computational neuroscientists collaborate closely with experimentalists in analyzing novel data and synthesizing new models of biological phenomena.</p> <p><a name="Single-neuron_modeling" id="Single-neuron_modeling"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Single-neuron modeling</span></span></h3> <p>Even single neurons have complex biophysical characteristics.
Hodgkin and Huxley's original model employed only two voltage-sensitive currents, the fast-acting sodium and the inward-rectifying potassium. Though successful in predicting the timing and qualitative features of the action potential, it nevertheless failed to predict a number of important features such as adaptation and shunting. Scientists now believe that there is a wide variety of voltage-sensitive currents, and the implications of the differing dynamics, modulations and sensitivities of these currents are an important topic of computational neuroscience.</p> <p>The computational functions of complex dendrites are also under intense investigation. There is a large body of literature regarding how different currents interact with the geometric properties of neurons.</p> <p>Some models also track biochemical pathways at very small scales such as spines or synaptic clefts.</p> <p><a name="Development.2C_axonal_patterning_and_guidance" id="Development.2C_axonal_patterning_and_guidance"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Development, axonal patterning and guidance</span></span></h3> <p>How do axons and dendrites form during development? How do axons know where to target, and how do they reach these targets? How do neurons migrate to the proper positions in the central and peripheral nervous systems? How do synapses form? We know from molecular biology that distinct parts of the nervous system release distinct chemical cues, from growth factors to hormones, that modulate and influence the growth and development of functional connections between neurons.</p> <p>Theoretical investigations into the formation and patterning of synaptic connections and morphology are still nascent.
One hypothesis that has recently garnered some attention is the <i>minimal wiring hypothesis</i>, which postulates that the formation of axons and dendrites effectively minimizes resource allocation while maintaining maximal information storage.</p> <p><a name="Sensory_processing" id="Sensory_processing"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Sensory processing</span></span></h3> <p>Early models of sensory processing within a theoretical framework are credited to <a href="http://en.wikipedia.org/wiki/Horace_Barlow" title="Horace Barlow">Horace Barlow</a>. Somewhat similar to the minimal wiring hypothesis described in the preceding section, Barlow understood the processing of the early sensory systems to be a form of efficient coding, in which neurons encode information in a way that minimizes the number of spikes. Experimental and computational work has since supported this hypothesis in one form or another.</p> <p>Current research in sensory processing is divided between biophysical modelling of different subsystems and more theoretical modelling of the function of perception. Current models of perception suggest that the brain performs some form of Bayesian inference, integrating different sensory information to generate our perception of the physical world.</p> <p><a name="Memory_and_synaptic_plasticity" id="Memory_and_synaptic_plasticity"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Memory and synaptic plasticity</span></span></h3> <p>Earlier models of memory are primarily based on the postulates of Hebbian learning. Biologically relevant models such as the Hopfield net have been developed to address the properties of the associative (content-addressable) style of memory that occurs in biological systems.
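</p>

<p>The associative-recall idea behind the Hopfield net can be sketched in a few lines: Hebbian outer-product storage makes stored patterns fixed points of the dynamics, so a corrupted cue settles back onto the nearest stored memory. The patterns and network size below are arbitrary toy choices:</p>

```python
import numpy as np

# Hebbian storage and recall in a tiny Hopfield network.
def train(patterns):
    n = patterns.shape[1]
    w = patterns.T @ patterns / n   # Hebbian outer-product rule
    np.fill_diagonal(w, 0.0)        # no self-connections
    return w

def recall(w, state, steps=10):
    for _ in range(steps):
        state = np.sign(w @ state)  # synchronous threshold update
    return state

# Two orthogonal +/-1 patterns of length 8.
p1 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
p2 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
w = train(np.vstack([p1, p2]))

cue = p1.copy()
cue[0] = -cue[0]                    # corrupt one bit of p1
print(recall(w, cue))               # settles back onto p1
```

<p>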
These attempts focus primarily on the formation of medium- and long-term memory, localized in the hippocampus. Models of working memory, relying on theories of network oscillations and persistent activity, have been built to capture some features of the prefrontal cortex in context-related memory.</p> <p>One of the major problems in biological memory is how it is maintained and changed over multiple time scales. Unstable synapses are easy to train but also prone to stochastic disruption. Stable synapses forget less easily, but they are also harder to consolidate. One recent computational hypothesis involves cascades of plasticity that allow synapses to function at multiple time scales. Stereochemically detailed models of the acetylcholine-receptor-based synapse, using the Monte Carlo method and working at the time scale of microseconds, have been built. It is likely that computational tools will contribute greatly to our understanding of how synapses function and change in relation to external stimuli in the coming decades.</p> <p><a name="Behaviors_of_networks" id="Behaviors_of_networks"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Behaviors of networks</span></span></h3> <p>Biological neurons are connected to each other in a complex, recurrent fashion. Unlike in most artificial neural networks, these connections are sparse and, most likely, specific. It is not known how information is transmitted through such sparsely connected networks, nor what the computational functions, if any, of these specific connectivity patterns are.</p> <p>The interactions of neurons in a small network can often be reduced to simple models such as the <a href="http://en.wikipedia.org/wiki/Ising_model" title="Ising model">Ising model</a>. The statistical mechanics of such simple systems are well characterized theoretically.
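</p>

<p>For a handful of binary neurons, such a pairwise (Ising-type) description can even be evaluated exhaustively. In this sketch the biases and couplings are arbitrary illustrative numbers, not values fitted to any recording:</p>

```python
import itertools
import math

# Pairwise (Ising-type) model over N binary neurons: P(s) is proportional
# to exp(-E(s)), with E(s) = -sum_i h_i s_i - sum_{i<j} J_ij s_i s_j.
N = 4
h = [0.3, -0.1, 0.0, 0.1]                      # biases ("intrinsic drive")
J = {(0, 1): 0.5, (1, 2): -0.3, (2, 3): 0.4}   # pairwise couplings

def energy(s):
    e = -sum(hi * si for hi, si in zip(h, s))
    e -= sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())
    return e

# Enumerate all 2^N states and normalize to get the Boltzmann distribution.
states = list(itertools.product([-1, 1], repeat=N))
z = sum(math.exp(-energy(s)) for s in states)          # partition function
probs = {s: math.exp(-energy(s)) / z for s in states}

best = max(probs, key=probs.get)   # most probable joint firing pattern
print(best, round(probs[best], 3))
```

<p>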
There is some recent evidence suggesting that the dynamics of arbitrary neuronal networks can be reduced to pairwise interactions (Schneidman et al., 2006; Shlens et al., 2006). It is unknown, however, whether such descriptive dynamics impart any important computational function. With the emergence of two-photon microscopy and calcium imaging, we now have powerful experimental methods with which to test new theories of neuronal networks.</p> <p>While many neuro-theorists prefer models with reduced complexity, others argue that uncovering structure-function relations depends on including as much neuronal and network structure as possible. Models of this type are typically built in large simulation platforms such as <a href="http://en.wikipedia.org/wiki/GENESIS_%28software%29" title="GENESIS (software)">GENESIS</a> or NEURON. There have been some attempts to provide unified methods that bridge and integrate these levels of complexity.</p> <p><a name="Cognition.2C_discrimination_and_learning" id="Cognition.2C_discrimination_and_learning"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Cognition, discrimination and learning</span></span></h3> <p>Computational modeling of higher cognitive functions has only recently begun. Experimental data come primarily from single-unit recording in primates. The frontal and parietal lobes function as integrators of information from multiple sensory modalities. There are some tentative ideas regarding how simple mutually inhibitory functional circuits in these areas may carry out biologically relevant computation.</p> <p>The brain seems able to discriminate and adapt particularly well in certain contexts. For instance, human beings seem to have an enormous capacity for memorizing and recognizing faces.
One of the key goals of computational neuroscience is to dissect how biological systems carry out these complex computations efficiently and potentially replicate these processes in building intelligent machines.</p> <p><a name="Consciousness" id="Consciousness"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Consciousness</span></span></h3> <p>The ultimate goal of neuroscience is to be able to explain the everyday experience of conscious life. Francis Crick and Christof Koch made some early attempts at formulating a consistent framework for future work on the neural correlates of consciousness (NCC), though much of the work in this field remains speculative.</p><p><i>Posted 2009-03-09 by Naqi Raza</i></p><h2><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">AI-complete</span></span></h2><p>In the field of artificial intelligence, the most difficult problems are informally known as <b>AI-complete</b> or <b>AI-hard</b>, implying that the difficulty of these computational problems is equivalent to solving the central artificial intelligence problem—making computers as intelligent as people, or strong AI.</p> <p>The term was coined by Fanya Montalvo by analogy with NP-complete and NP-hard in complexity theory, which formally describes the most famous class of difficult problems (Mallery 1988). Early uses of the term appear in Erik Mueller's 1987 Ph.D. dissertation and in Eric Raymond's 1991 Jargon File.</p> <p>To call a problem AI-complete reflects the attitude that it will not be solved by a simple algorithm, such as those used in ELIZA.
Such problems are hypothesised to include:</p> <ul><li><a href="http://en.wikipedia.org/wiki/Computer_vision" title="Computer vision">Computer vision</a> (and subproblems such as object recognition)</li><li><a href="http://en.wikipedia.org/wiki/Natural_language_understanding" title="Natural language understanding">Natural language understanding</a> (and subproblems such as text mining, machine translation, and word sense disambiguation)</li><li>Dealing with unexpected circumstances while solving any real-world problem, whether it is <a href="http://en.wikipedia.org/wiki/Robotic_mapping" title="Robotic mapping">navigation</a> or planning or even the kind of reasoning done by expert systems.</li></ul><script type="text/javascript">//<![CDATA[ if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } //]]> </script> <p><a name="Examples" id="Examples"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Examples</span></span></h2> <p>For example, consider a straightforward, limited and specific task: machine translation. To translate accurately, a machine must be able to understand the text. It must be able to follow the author's argument, so it must have some ability to reason. It must have extensive world knowledge so that it knows what is being discussed — it must at least be familiar with all the same commonsense facts that the average human translator knows. Some of this knowledge is in the form of facts that can be explicitly represented, but some knowledge is unconscious and closely tied to the human body: for example, the machine may need to understand how an ocean makes one feel to accurately translate a specific metaphor in the text. It must also model the author's goals, intentions, and emotional states to accurately reproduce them in a new language.
In short, the machine is required to have a wide variety of human intellectual skills, including reason, commonsense knowledge and the intuitions that underlie motion and manipulation, perception, and social intelligence. Machine translation, therefore, is believed to be AI-complete: it may require strong AI to be done as well as humans can do it.</p> <p>AI systems can solve very simple restricted versions of AI-complete problems, but never in their full generality. When AI researchers attempt to "scale up" their systems to handle more complicated, real-world situations, the programs tend to become excessively brittle without commonsense knowledge or a rudimentary understanding of the situation: they fail as unexpected circumstances outside of the original problem context begin to appear. When human beings deal with new situations in the world, they are helped immensely by the fact that they know what to expect: they know what all the things around them are, why they are there, what they are likely to do, and so on. They can recognize unusual situations and adjust accordingly. A machine without strong AI has no other skills to fall back on. (Lenat & Guha 1989, pp.<cite class="inline"> 1-5)</cite></p> <p><a name="Formalisation" id="Formalisation"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Formalisation</span></span></h2> <p><a href="http://en.wikipedia.org/wiki/Computational_complexity_theory" title="Computational complexity theory">Computational complexity theory</a> deals with the relative computational difficulty of computable functions. By definition it does not cover problems whose solutions are unknown or have not been characterised formally.
Since many AI problems have no formalisation yet, conventional complexity theory does not allow AI-completeness to be defined.</p> <p>To address this problem, a complexity theory for AI has been proposed. It is based on a model of computation that splits the computational burden between a computer and a human: one part is solved by the computer and the other part by the human. This is formalised by a <b>human-assisted Turing machine</b>. The formalisation defines algorithm complexity, problem complexity and reducibility, which in turn allow equivalence classes to be defined.</p> <p>The complexity of executing an algorithm with a human-assisted Turing machine is given by a pair:</p> <dl><dd><img class="tex" alt="\langle\Phi_{H},\Phi_{M}\rangle" src="http://upload.wikimedia.org/math/7/0/9/709d0220f482b1b5fede924acac44a03.png" /></dd></dl> <p>where the first element represents the complexity of the human's part and the second element the complexity of the machine's part.</p> <p><a name="Results" id="Results"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Results</span></span></h3> <p style="font-weight: bold; color: rgb(153, 0, 0);">The complexity of solving the following problems with a human-assisted Turing machine is:</p> <ul><li><a href="http://en.wikipedia.org/wiki/Optical_character_recognition" title="Optical character recognition">Optical character recognition</a> for printed text: <img class="tex" alt="\langle O(1), poly(n) \rangle " src="http://upload.wikimedia.org/math/4/7/8/478c1fdc5ea6052e349f944dfc301cc0.png" /></li><li><a href="http://en.wikipedia.org/wiki/Turing_test" title="Turing test">Turing test</a>: <ul><li>for an <span class="texhtml"><i>n</i></span>-sentence conversation where the oracle remembers the conversation history (persistent oracle): <img class="tex" alt="\langle O(n), O(n) \rangle "
src="http://upload.wikimedia.org/math/0/0/6/0069cc1ab0b9a8ddd7e986381b5b7759.png" /></li><li>for an <span class="texhtml"><i>n</i></span>-sentence conversation where the conversation history must be retransmitted: <img class="tex" alt="\langle O(n), O(n^2) \rangle " src="http://upload.wikimedia.org/math/9/d/3/9d37a2a09d9c7090cd65c106c0e2e379.png" /></li><li>for an <span class="texhtml"><i>n</i></span>-sentence conversation where the conversation history must be retransmitted and the person takes linear time to read the query: <img class="tex" alt="\langle O(n^2), O(n^2) \rangle " src="http://upload.wikimedia.org/math/c/8/0/c801d13044b6f8f6d69650b012465ed1.png" /></li></ul> </li><li><a href="http://en.wikipedia.org/wiki/ESP_game" title="ESP game">ESP game</a>: <img class="tex" alt="\langle O(n), O(n) \rangle " src="http://upload.wikimedia.org/math/0/0/6/0069cc1ab0b9a8ddd7e986381b5b7759.png" /></li><li>Image labelling (based on the Arthur–Merlin protocol): <img class="tex" alt="\langle O(n), O(n) \rangle " src="http://upload.wikimedia.org/math/0/0/6/0069cc1ab0b9a8ddd7e986381b5b7759.png" /></li><li><a href="http://en.wikipedia.org/wiki/Object_categorization_from_image_search" title="Object categorization from image search">Image classification</a>: human only: <img class="tex" alt="\langle O(n), O(n) \rangle " src="http://upload.wikimedia.org/math/0/0/6/0069cc1ab0b9a8ddd7e986381b5b7759.png" />, and with less reliance on the human: <img class="tex" alt="\langle O(\log n), O(n \log n) \rangle " src="http://upload.wikimedia.org/math/a/2/9/a29666c197834639245a012fae4ecfb0.png" />.</li></ul><p><i>Posted 2009-03-09 by Naqi Raza</i></p><h2><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Strong AI</span></span></h2><p><b>Strong AI</b> is artificial intelligence that matches or exceeds human intelligence—the intelligence of a machine that can successfully perform any intellectual task
that a human being can. It is a primary goal of artificial intelligence research and an important topic for science fiction writers and futurists. Strong AI is also referred to as "artificial general intelligence" or as the ability to perform "general intelligent action". Science fiction associates strong AI with such human traits as consciousness, sentience, sapience and self-awareness.</p> <p>Some references emphasize a distinction between strong AI and "applied AI" (also called "narrow AI" or "weak AI"): the use of software to study or accomplish specific problem-solving or reasoning tasks that do not encompass (or in some cases are completely outside of) the full range of human cognitive abilities.</p><script type="text/javascript">//<![CDATA[ if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } //]]> </script> <p><a name="Requirements_of_strong_AI" id="Requirements_of_strong_AI"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Requirements of strong AI</span></span></h2> <p>Many different definitions of intelligence have been proposed (such as being able to pass the <a href="http://en.wikipedia.org/wiki/Turing_test" title="Turing test">Turing test</a>), but there is to date no definition that satisfies everyone.
However, there is wide agreement among artificial intelligence researchers that intelligence is required to do all of the following:</p> <dl><dd> <ul><li><a href="http://en.wikipedia.org/wiki/Automated_reasoning" title="Automated reasoning">reason</a>, use strategy, solve puzzles, and make judgements under uncertainty;</li><li><a href="http://en.wikipedia.org/wiki/Knowledge_representation" title="Knowledge representation">represent knowledge</a>, including commonsense knowledge;</li><li><a href="http://en.wikipedia.org/wiki/Automated_planning_and_scheduling" title="Automated planning and scheduling">plan</a>;</li><li><a href="http://en.wikipedia.org/wiki/Machine_learning" title="Machine learning">learn</a>;</li><li>communicate in natural language;</li><li>and integrate all these skills towards common goals.</li></ul> </dd></dl> <p>Work is underway to design machines that have these abilities, and it is expected that strong AI would have most if not all of them.</p> <p style="font-weight: bold; color: rgb(153, 0, 0);">There are other aspects of the human mind besides intelligence that also bear on the concept of strong AI:</p> <dl><dd> <ul><li><a href="http://en.wikipedia.org/wiki/Consciousness" title="Consciousness">consciousness</a>: to be responsive to the environment;</li><li><a href="http://en.wikipedia.org/wiki/Self-awareness" title="Self-awareness">self-awareness</a>: to be aware of oneself as a separate individual, especially to be aware of one's own thoughts;</li><li><a href="http://en.wikipedia.org/wiki/Sentience" title="Sentience">sentience</a>: the ability to "<i>feel</i>";</li><li><a href="http://en.wikipedia.org/wiki/Sapience" title="Sapience">sapience</a>: the capacity for wisdom.</li></ul> </dd></dl> <p>None of these is necessary for strong AI—for example, it is not clear if consciousness is necessary for a machine to reason as well as human beings can.
It is also not clear whether any of these traits is sufficient for intelligence: if a machine were built with a device that simulates the neural correlates of consciousness, would it automatically have the ability to represent knowledge or use natural language? It is also possible that some of these properties, such as sentience, naturally emerge from a fully intelligent machine, or that it becomes natural to ascribe these properties to machines once they begin to act in a way that is clearly intelligent. For example, intelligent action may be sufficient for sentience, rather than the other way around.</p> <p><a name="Research_approaches" id="Research_approaches"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Research approaches</span></span></h2> <p><a name="History_of_mainstream_AI_research" id="History_of_mainstream_AI_research"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">History of mainstream AI research</span></span></h3> <p>Modern AI research began in the mid-1950s. The first generation of AI researchers was convinced that strong AI was possible and that it would exist in just a few decades. As AI pioneer <a href="http://en.wikipedia.org/wiki/Herbert_Simon" title="Herbert Simon">Herbert Simon</a> wrote in 1965: "machines will be capable, within twenty years, of doing any work a man can do." Their predictions were the inspiration for Stanley Kubrick and Arthur C. Clarke's character HAL 9000, who embodied what AI researchers believed they could create by the year 2001.
Notably, AI pioneer Marvin Minsky was a consultant on the project, charged with making HAL 9000 as realistic as possible according to the consensus predictions of the time; he himself said on the subject in 1967, "Within a generation...the problem of creating 'artificial intelligence' will substantially be solved."</p> <p>However, in the early 1970s, it became obvious that researchers had grossly underestimated the difficulty of the project. The agencies that funded AI became skeptical of strong AI and put researchers under increasing pressure to produce useful technology, or "applied AI". As the eighties began, Japan's fifth-generation computer project revived interest in strong AI, setting out a ten-year timeline that included strong AI goals like "carry on a casual conversation". In response to this and to the success of expert systems, both industry and government pumped money back into the field. However, the market for AI spectacularly collapsed in the late 1980s, and the goals of the fifth-generation computer project were never fulfilled. For the second time in 20 years, AI researchers who had predicted the imminent arrival of strong AI had been shown to be fundamentally mistaken about what they could accomplish.</p> <p>By the 1990s, AI researchers had gained a reputation for making promises they could not keep.
AI researchers became reluctant to make any kind of prediction at all and avoided any mention of "human level" artificial intelligence, for fear of being labeled "wild-eyed dreamers."</p> <p><a name="Mainstream_AI_research" id="Mainstream_AI_research"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Mainstream AI research</span></span></h3> <p>For the most part, researchers today choose to focus on specific sub-problems where they can produce verifiable results and commercial applications, such as neural nets, computer vision or data mining.</p> <p>Most mainstream AI researchers hope that strong AI can be developed by combining the programs that solve various subproblems using an integrated agent architecture, cognitive architecture or subsumption architecture. Hans Moravec wrote in 1988: "I am confident that this bottom-up route to artificial intelligence will one day meet the traditional top-down route more than half way, ready to provide the real world competence and the commonsense knowledge that has been so frustratingly elusive in reasoning programs. Fully intelligent machines will result when the metaphorical golden spike is driven uniting the two efforts."</p> <p><a name="Artificial_general_intelligence" id="Artificial_general_intelligence"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Artificial general intelligence</span></span></h3> <p>Artificial general intelligence research aims to create AI that can replicate human-level intelligence completely, often called an artificial general intelligence (AGI) to distinguish it from less ambitious AI projects. (The concept is derived from the psychometric notion of natural general intelligence, often denoted "g", though no adherence to any particular theory of g is implied.)
As yet, researchers have devoted little attention to AGI, with some claiming that intelligence is too complex to be completely replicated in the near term. Some small groups of computer scientists are doing AGI research, however. Organizations pursuing AGI include <a href="http://en.wikipedia.org/w/index.php?title=Adaptive_AI&action=edit&redlink=1" class="new" title="Adaptive AI (page does not exist)">Adaptive AI</a>, the Artificial General Intelligence Research Institute (AGIRI), the Singularity Institute for Artificial Intelligence and Texai. One recent addition is Numenta, a project based on the theories of Jeff Hawkins, the creator of the PalmPilot. While Numenta takes a computational approach to general intelligence, Hawkins is also the founder of the Redwood Neuroscience Institute, which explores conscious thought from a biological perspective. AND Corporation has been active in this field since 1990 and has developed machine intelligence processes based on phase-coherence principles, with strong similarities to digital holography and to quantum mechanics with respect to collapse of the wave function.</p> <p><a name="Simulated_human_brain_model" id="Simulated_human_brain_model"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Simulated human brain model</span></span></h3> <p>A simulated human brain model could be one of the quickest means of achieving strong AI, as it does not require a complete understanding of how intelligence works. Basically, a very powerful computer would simulate a human brain, often in the form of a network of neurons. For example, given a map of all (or most) of the neurons in a functional human brain, and a good understanding of how a single neuron works, it is theoretically possible for a computer program to simulate the working brain over time. Given some method of communication, this simulated brain might then be shown to be fully intelligent.
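</p>

<p>A toy caricature of this idea, at a vastly coarser grain than any serious brain-simulation proposal: simulate a population of abstract rate units coupled by a random sparse weight matrix and step the state forward in time. All sizes and constants below are arbitrary illustrative choices:</p>

```python
import numpy as np

# Toy "network of neurons" simulation: random sparse connectivity,
# saturating rate dynamics, stepped forward in discrete time.
rng = np.random.default_rng(0)
n = 100                                   # number of units
w = rng.normal(0.0, 1.0, (n, n))          # random synaptic weights
w *= rng.random((n, n)) < 0.1             # keep ~10% of connections (sparse)
w /= np.sqrt(0.1 * n)                     # scale to keep dynamics bounded

x = rng.normal(0.0, 1.0, n)               # initial state
for _ in range(50):                       # simulate 50 time steps
    x = np.tanh(w @ x)                    # saturating rate update

print(x.shape, float(np.max(np.abs(x))) <= 1.0)
```

<p>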
The exact form of the simulation varies: instead of neurons, a simulation might use groups of neurons, or alternatively, individual molecules might be simulated. It is also unclear which portions of the human brain would need to be modeled: humans can still function while missing portions of their brains, and some areas of the brain are associated with activities (such as breathing) that might not be necessary for thought.</p> <p>Speculation: human brains have developed to accommodate certain necessities, such as breathing and interpreting sensory input from a variety of sources. Without adequate simulations of these necessities (such as, for example, input that simulates the sensation of sufficient oxygen levels in the body), it is possible that an artificial brain would have difficulty functioning. In addition, human brains rely for stability on a number of mediating factors, including stages of development and external training. An artificial duplicate of the human brain, without such mediating input, could conceivably suffer from a number of cognitive and functional difficulties. The construction and sustenance of an artificial brain also raise moral questions, namely regarding personhood, freedom, and death. Does a "brain in a box" constitute a person? What rights would such an entity have, under law or otherwise? Once activated, would human beings have an obligation to continue its operation? Would the shutdown of an artificial brain constitute death, sleep, unconsciousness, or some other state for which no human description exists?
After all, an artificial brain is not subject to the post-mortem cellular decay (and associated loss of function) that human brains are, so an artificial brain could, in theory, resume functioning exactly as it was before it was shut down.</p> <p style="font-weight: bold; color: rgb(153, 0, 0);">This approach would require three things:</p> <ul><li><i>Hardware.</i> An extremely powerful computer would be required for such a model. Futurist Ray Kurzweil estimates ten petaflops (10<sup>16</sup> operations per second). At least one special-purpose petaflops computer has already been built (the Riken MDGRAPE-3), and there are nine current computing projects (such as Blue Gene/P) to build more general-purpose petaflops computers, all of which should be completed by 2008, if not sooner. Most other attempted estimates of the brain's computational power equivalent have been rather higher, ranging from 100 petaflops (10<sup>17</sup>) to 100,000 petaflops (10<sup>20</sup>). Using <a href="http://en.wikipedia.org/wiki/Top500" title="Top500" class="mw-redirect">Top500</a> projections, it might be estimated that such levels of computing power could be reached by the top-performing CPU-based supercomputers by ~2015 (for 100 petaflops), up to a more conservative estimate of ~2025 (for 100,000 petaflops). However, considering that GPU and stream-processing power appears to double every year, these levels may be reached much sooner using GPGPU processing: high-end GPUs arriving in early 2008 are already expected to process over 1 teraflop, roughly 20x more than a standard quad-core CPU.
It should also be noted, however, that the overhead introduced by modeling the biological, chemical, and physical details of neural behaviour (especially on a molecular scale) might require a simulator with computational power much greater than that of the brain itself, and that current simulations and estimates do not account for the importance of glial cells, which outnumber neurons 10:1.</li><li><i>Software.</i> Software to simulate the function of a brain would be required. This assumes that the human mind is a product of the central nervous system and is governed by physical laws. Constructing the simulation would require a great deal of knowledge about the physical and functional operation of the human brain, and might require detailed information about a particular human brain's structure. Information would be required both about the function of different types of neurons and about how they are connected. Note that the particular form of the software dictates the hardware necessary to run it. For example, an extremely detailed simulation including molecules or small groups of molecules would require enormously more processing power than a simulation that models neurons using a simple equation, and a more accurate model of a neuron would be expected to be much more expensive computationally than a simple model. The more neurons in the simulation, the more processing power it would require.</li><li><i>Understanding.</i> Finally, sufficient understanding of the brain would be required to model it mathematically. This could be achieved either by understanding the central nervous system or by mapping and copying it. Neuroimaging technologies are improving rapidly, and Kurzweil predicts that a map of sufficient quality will become available on a similar timescale to the required computing power.
However, the simulation would also have to capture the detailed cellular behaviour of neurons and glial cells, presently understood only in the broadest of outlines.</li></ul> <p>Once such a model is built, it will be easily altered and thus open to trial-and-error experimentation. This is likely to lead to huge advances in understanding, allowing the model's intelligence to be improved and its motivations altered.</p> <p>The Blue Brain project aims to use one of the fastest supercomputer architectures in the world, IBM's Blue Gene platform, to simulate a single neocortical column consisting of approximately 60,000 neurons and 5 km of interconnecting synapses. The eventual goal of the project is to use supercomputers to replicate an entire brain.</p> <p>The brain derives its power from performing many operations in parallel; a standard computer derives its power from performing operations very quickly. It should be noted, however, that supercomputers also perform many operations in parallel: the Cray and NEC vector computers operate as a single machine but perform many calculations at once, and any form of cluster computing likewise makes multiple single computers operate as one.</p> <p>The human brain has roughly 100 billion neurons operating simultaneously, connected by roughly 100 trillion synapses. By comparison, a modern computer microprocessor uses only 1.7 billion transistors. Although estimates put the brain's processing power at around 10<sup>14</sup> neuron updates per second, it is expected that the first unoptimized real-time simulations of a human brain will require a computer capable of 10<sup>18</sup> FLOPS.
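The contrast drawn above between a simple-equation neuron model and a molecular-scale one can be made concrete with a leaky integrate-and-fire update, one of the simplest standard neuron models. The sketch below is illustrative only; the parameter values are invented round numbers, not fitted to biology:

```python
# Leaky integrate-and-fire neuron: a deliberately simple "one equation" model
# of the kind referred to above. Parameter values are illustrative, not
# biological fits.

def lif_step(v, i_input, dt=0.001, tau=0.02,
             v_rest=-65.0, v_thresh=-50.0, v_reset=-65.0, r=10.0):
    """Advance membrane potential v (mV) by one time step dt (s).

    Returns (new_v, spiked). Each update costs only a handful of FLOPs,
    which is why a simple model is vastly cheaper than a molecular one.
    """
    dv = (-(v - v_rest) + r * i_input) * (dt / tau)  # leak toward rest plus driven input
    v = v + dv
    if v >= v_thresh:            # threshold crossing emits a spike, then reset
        return v_reset, True
    return v, False

# Drive one neuron with a constant input current and count spikes over
# one simulated second (1000 steps of 1 ms).
v, spikes = -65.0, 0
for _ in range(1000):
    v, spiked = lif_step(v, i_input=2.0)
    spikes += spiked
print(spikes)
```

Each update here costs only a handful of floating-point operations, which is why estimates for simple-equation models come out many orders of magnitude below molecular-scale ones.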
Non-real-time simulations of the human brain (10<sup>11</sup> neurons) were performed in 2005; it took 50 days on a cluster of 27 processors to simulate 1 second of the model. By comparison, a general-purpose CPU (circa 2006) operates at a few GFLOPS (10<sup>9</sup> FLOPS), and each FLOP may require as many as 20,000 logic operations.</p> <p>However, a neuron is estimated to spike 200 times per second (giving an upper limit on the number of operations). Signals between neurons are transmitted at a maximum speed of 150 meters per second. A modern 2 GHz processor operates at 2 billion cycles per second, or 10,000,000 times faster than a human neuron, and signals in electronic computers travel at roughly half the speed of light, faster than signals in humans by a factor of 1,000,000. The brain consumes about 20 W of power, whereas supercomputers may use as much as 1 MW, on the order of 100,000 times more (note: the Landauer limit is 3.5x10<sup>20</sup> operations per second per watt at room temperature).</p> <p>Neuro-silicon interfaces have also been proposed.<br /></p> <p>Critics of this approach believe that it is possible to achieve AI directly without imitating nature, and often draw the analogy that early attempts to construct flying machines modeled them after birds, yet modern aircraft do not look like birds.
One formal version of this direct approach holds that, given a formal definition of AI, a program satisfying it could be found by enumerating all possible programs and testing each of them to see whether it produces artificial intelligence.</p> <p><a name="Artificial_consciousness_research" id="Artificial_consciousness_research"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Artificial consciousness research</span></span></h3> <p>Artificial consciousness research aims to create and study artificially conscious systems. Igor Aleksander argues that the principles for creating a conscious machine already exist, but that it would take forty years to train such a machine to understand language.</p> <p><a name="Franklin.E2.80.99s_Intelligent_Distribution_Agent" id="Franklin.E2.80.99s_Intelligent_Distribution_Agent"></a></p> <h4><span class="editsection"></span><span style="color: rgb(153, 0, 0);" class="mw-headline">Franklin’s Intelligent Distribution Agent</span></h4> <p><a href="http://en.wikipedia.org/wiki/Stan_Franklin" title="Stan Franklin">Stan Franklin</a> (1995, 2003) defines an autonomous agent as possessing functional consciousness when it is capable of several of the functions of consciousness as identified by Bernard Baars’ Global Workspace Theory (GWT). His brainchild IDA (Intelligent Distribution Agent) is a software implementation of GWT, which makes it functionally conscious by definition. IDA’s task is to negotiate new assignments for sailors in the US Navy after they end a tour of duty, by matching each individual’s skills and preferences with the Navy’s needs. IDA interacts with Navy databases and communicates with the sailors via natural language email dialog while obeying a large set of Navy policies. The IDA computational model was developed during 1996-2001 at Stan Franklin’s "Conscious" Software Research Group at the University of Memphis.
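The enumerate-and-test idea described above can be made concrete in a toy setting, provided the "formal definition" is replaced by something mechanically testable. Everything below (the five-instruction stack language, the target specification) is invented for illustration; for any interesting definition of intelligence such a search is astronomically expensive:

```python
# Toy illustration of "enumerate all programs and test each": programs are
# strings over a tiny invented stack language, and the formal definition of
# the target behaviour is a trivial testable spec (compute 2*x + 1).
from itertools import product

OPS = "12+*d"  # push 1, push 2, add, multiply, duplicate top of stack

def run(prog, x):
    """Run a program on input x; return the top of the stack, or None on error."""
    stack = [x]
    for op in prog:
        try:
            if op == "1": stack.append(1)
            elif op == "2": stack.append(2)
            elif op == "d": stack.append(stack[-1])
            elif op == "+": stack.append(stack.pop() + stack.pop())
            elif op == "*": stack.append(stack.pop() * stack.pop())
        except IndexError:       # stack underflow: invalid program
            return None
    return stack[-1]

def satisfies_spec(prog):
    """The stand-in 'formal definition': correct on a handful of test inputs."""
    return all(run(prog, x) == 2 * x + 1 for x in range(5))

def enumerate_programs():
    """Yield every program, shortest first."""
    length = 1
    while True:
        for prog in product(OPS, repeat=length):
            yield "".join(prog)
        length += 1

found = next(p for p in enumerate_programs() if satisfies_spec(p))
print(found)
```

Because programs are enumerated shortest-first, the first program found is also a shortest one; the exercise mainly illustrates why the direct approach is a thought experiment rather than a practical method.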
It "consists of approximately a quarter-million lines of Java code, and almost completely consumes the resources of a 2001 high-end workstation." It relies heavily on codelets, which are "special purpose, relatively independent, mini-agent[s] typically implemented as a small piece of code running as a separate thread." In IDA’s top-down architecture, high-level cognitive functions are explicitly modeled; see Franklin (1995) and Franklin (2003) for details. While IDA is functionally conscious by definition, Franklin does “not attribute phenomenal consciousness to [his] own 'conscious' software agent, IDA, in spite of her many human-like behaviours. This in spite of watching several US Navy detailers repeatedly nodding their heads saying 'Yes, that’s how I do it' while watching IDA’s internal and external actions as she performs her task."</p> <p><a name="Haikonen.E2.80.99s_cognitive_architecture" id="Haikonen.E2.80.99s_cognitive_architecture"></a></p> <h4><span class="editsection"></span><span style="color: rgb(153, 0, 0);" class="mw-headline">Haikonen’s cognitive architecture</span></h4> <p>Pentti Haikonen (2003) considers classical rule-based computing inadequate for achieving AC: "the brain is definitely not a computer. Thinking is not an execution of programmed strings of commands. The brain is not a numerical calculator either. We do not think by numbers." Rather than trying to achieve mind and consciousness by identifying and implementing their underlying computational rules, Haikonen proposes "a special cognitive architecture to reproduce the processes of perception, inner imagery, inner speech, pain, pleasure, emotions and the cognitive functions behind these. This bottom-up architecture would produce higher-level functions by the power of the elementary processing units, the artificial neurons, without algorithms or programs". 
Haikonen believes that, when implemented with sufficient complexity, this architecture will develop consciousness, which he considers to be "a style and way of operation, characterized by distributed signal representation, perception process, cross-modality reporting and availability for retrospection." Haikonen is not alone in this process view of consciousness, or the view that AC will spontaneously emerge in autonomous agents that have a suitable neuro-inspired architecture of complexity; these are shared by many, e.g. Freeman (1999) and Cotterill (2003). A low-complexity implementation of the architecture proposed by Haikonen (2004) was reportedly not capable of AC, but did exhibit emotions as expected.</p> <p><a name="Emergence" id="Emergence"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Emergence</span></span></h3> <p>Some<sup class="noprint Inline-Template"><span title="The material in the vicinity of this tag may use weasel words or too-vague attribution. since October 2007" style="white-space: nowrap;"></span></sup> have suggested that intelligence can arise as an emergent quality from the convergence of random, man-made technologies. 
Human sentience, or any other biological and naturally occurring intelligence, arises out of the natural process of species evolution and an individual's experiences.</p> <p><a name="Origin_of_the_term:_John_Searle.27s_strong_AI" id="Origin_of_the_term:_John_Searle.27s_strong_AI"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Origin of the term: John Searle's strong AI</span></span></h2> <p>The term "strong AI" was adopted from the name of an argument in the philosophy of artificial intelligence, first identified by John Searle as part of his Chinese room argument in 1980. He wanted to distinguish between two different hypotheses about artificial intelligence:</p> <ul><li>An artificial intelligence system can <i>think</i> and have a <i>mind</i>.</li><li>An artificial intelligence system can (only) <i>act like</i> it thinks and has a mind.</li></ul> <p>The first is called "the <i>strong</i> AI hypothesis" and the second "the <i>weak</i> AI hypothesis", because the first makes the <i>stronger</i> statement: it assumes something special has happened to the machine that goes beyond all the abilities we can test. Searle referred to the "strong AI hypothesis" as "strong AI". This usage, which is fundamentally different from the subject of this article, is common in academic AI research and textbooks.</p> <p>The term "strong AI" is now used to describe any artificial intelligence system that acts like it has a mind, regardless of whether a philosopher would be able to determine whether it <i>actually</i> has one.
Dijkstra has been quoted as saying, "The question of whether a computer can think is no more interesting than the question of whether a submarine can swim." As Russell and Norvig write: "Most AI researchers take the weak AI hypothesis for granted, and don't care about the strong AI hypothesis." AI researchers are interested in a related statement (that some sources confusingly call "the strong AI hypothesis"):</p> <ul><li>An artificial intelligence system can think (<i>or</i> act like it thinks) <i>as well or better than people do</i>.</li></ul> <p>This assertion, which hinges on the breadth and power of machine intelligence, <i>is</i> the subject of this article.</p><p><br /></p>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com0tag:blogger.com,1999:blog-226627663258700835.post-61871251992654046892009-03-09T08:57:00.000-07:002009-03-09T09:02:33.530-07:00Computational creativity<!-- start content --> <p><b>Computational creativity</b> (also known as <b>artificial creativity</b>, <b>mechanical creativity</b> or <b>creative computation</b>) is a multidisciplinary endeavour that is located at the intersection of the fields of artificial intelligence, cognitive psychology, philosophy, and the arts.</p> <p style="font-weight: bold; color: rgb(153, 0, 0);">The goal of computational creativity is to model, simulate or replicate creativity using a computer, to achieve one of several ends:</p> <ul><li>to construct a program or computer capable of human-level creativity<a href="http://en.wikipedia.org/wiki/Creativity" title="Creativity"></a></li><li>to better understand human creativity and to formulate an algorithmic perspective on creative behavior in humans</li><li>to design programs that can enhance human creativity without necessarily being creative themselves</li></ul> <p>The field of computational creativity concerns itself with theoretical and practical issues in the study of creativity. 
Theoretical work on the nature and proper definition of creativity is performed in parallel with practical work on the implementation of systems that exhibit creativity, with one strand of work informing the other.</p><p><br /></p> <p><a name="Theoretical_issues" id="Theoretical_issues"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Theoretical issues</span></span></h2> <p>As measured by the amount of activity in the field (e.g., publications, conferences and workshops), computational creativity is a growing area of research. But the field is still hampered by a number of fundamental problems:</p> <ul><li>Creativity is very difficult, perhaps even impossible, to define in objective terms.</li><li>Creativity takes many forms in human activity, some <i>eminent</i> (meaning "recognized" or "ingenious", e.g., <a href="http://en.wikipedia.org/wiki/Albert_Einstein" title="Albert Einstein">Einstein</a>'s creativity; sometimes referred to as "Creativity" with a capital C) and some <i>mundane</i>.</li><li>Creativity can mean different things in different contexts: Is it a state of mind, a talent or ability, or a process? Does it describe a person, an activity or an end-product? Can collaborative work in which exceptional products emerge from simple interactions be considered creative?</li></ul> <p style="font-weight: bold; color: rgb(153, 0, 0);">These are problems that complicate the study of creativity in general, but certain problems attach themselves specifically to <i>computational</i> creativity:</p> <ul><li>Can creativity be hard-wired?
In existing systems to which creativity is attributed, is the creativity that of the system or that of the system's programmer or designer?</li><li>How do we evaluate computational creativity? What counts as creativity in a computational system? Are natural language generation systems creative? Are machine translation systems creative? What distinguishes research in computational creativity from research in artificial intelligence generally?</li><li>If eminent creativity is about rule-breaking or the disavowal of convention, how is it possible for an algorithmic system to be creative? In essence, this is a variant of the Ada Lovelace objection to machine intelligence, as recapitulated by modern theorists such as Teresa Amabile: If a machine can do only what it was programmed to do, how can its behavior ever be called <i>creative</i>?</li></ul><br /><p><a name="Defining_creativity_in_computational_terms" id="Defining_creativity_in_computational_terms"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Defining creativity in computational terms</span></span></h2> <p>Since no single perspective or definition seems to offer a complete picture of creativity, the AI researchers Newell, Shaw and Simon developed the combination of novelty and usefulness into the corner-stone of a multi-pronged view of creativity, one that uses the following four criteria to categorize a given answer or solution as creative:</p> <ol><li>The answer is novel and useful (either for the individual or for society)</li><li>The answer demands that we reject ideas we had previously accepted</li><li>The answer results from intense motivation and persistence</li><li>The answer comes from clarifying a problem that was originally vague</li></ol> <p>Notice how these criteria touch on many of the stereotypical themes that are typically associated with creativity: newness and value (1), transformation and revolution (2), passion 
and drive (3), vision and insight (4). These four criteria also combine elements of the producer-perspective and the product-perspective described earlier: criterion (1) characterizes the two most important qualities of a creative product, while criteria (2) – (4) characterize the attitude and actions of the producer of such a product. A given product may satisfy all or none of these criteria, but we should expect products that exhibit all four to be widely perceived as creative, while products that exhibit just some of these criteria will be judged with greater subjectivity and variation. Though no criterion is likely to be either necessary or sufficient, criterion (1) is perhaps the most common hallmark of creativity and thus serves to anchor the others. From a computational perspective, then, one can consider (1) to be a <i>must-have</i> feature, and (2) – (4) as desirable extras.</p> <p>Newell and Simon are best known for their contribution to the search-in-a-state-space paradigm of AI, sometimes caricatured as Good Old Fashioned AI (GOFAI), and it is interesting to consider how the GOFAI paradigm can incorporate these criteria. 
From a search perspective, criterion (1) characterizes the goal or end-state of a computational search, criterion (4) characterizes the starting state from which the search is launched, criterion (3) characterizes the scale of the search, suggesting that many dead-ends are likely to be encountered, while criterion (2) suggests that well-worn pathways through the search space are best avoided if a creative end-state is to be reached.</p><p><br /></p> <p><a name="Key_ideas" id="Key_ideas"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Key ideas</span></span></h2> <p>Some high-level and philosophical themes recur throughout the field of computational creativity.<sup class="noprint Inline-Template"><span title="The text in the vicinity of this tag needs clarification or removal of jargon from November 2008" style="white-space: nowrap;"></span></sup></p> <p><a name="P-creativity_and_H-creativity" id="P-creativity_and_H-creativity"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">P-creativity and H-creativity</span></span></h3> <p>Margaret Boden refers to creativity that is novel <i>merely to the agent that produces it</i> as "P-creativity" (or "psychological creativity"), and refers to creativity that is recognized as novel <i>by society at large</i> as "H-creativity" (or "historical creativity").</p> <p><a name="Exploratory_and_transformational_creativity" id="Exploratory_and_transformational_creativity"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Exploratory and transformational creativity</span></span></h3> <p>Boden also distinguishes between the creativity that arises from an exploration within an established conceptual space, and the creativity that arises from a deliberate transformation or transcendence of this space. 
She labels the former as "exploratory creativity" and the latter as "transformational creativity", seeing the latter as a form of creativity far more radical, challenging, and rare than the former. Following Newell and Simon’s criteria, we can see that both forms of creativity should produce results that are appreciably novel and useful (criterion 1), but exploratory creativity is more likely to arise from a thorough and persistent search of a well-understood space (criterion 3), while transformational creativity should involve the rejection of some of the constraints that define this space (criterion 2) or some of the assumptions that define the problem itself (criterion 4).</p> <p>Boden’s insights have guided work in computational creativity at a very general level, providing an inspirational touchstone for development work more than a technical framework of algorithmic substance. However, these insights have also been the subject of formalization, most notably in the work of Geraint Wiggins.</p> <p><a name="Generation_and_evaluation" id="Generation_and_evaluation"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Generation and evaluation</span></span></h3> <p>The criterion that creative products should be novel and useful means that creative computational systems are typically structured into two phases, generation and evaluation. In the first phase, constructs that are novel to the system itself (and thus P-creative) are generated; unoriginal constructs that are already known to the system are filtered out at this stage. This body of potentially creative constructs is then evaluated, to determine which constructs are meaningful and useful and which are not.
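A minimal version of this generate-then-evaluate loop might look like the sketch below; the construct space (adjective-noun phrases), the memory of known constructs, and the usefulness scores are all invented stand-ins for whatever a real system would use:

```python
# Two-phase sketch: generate candidate constructs, filter out the unoriginal,
# then evaluate what remains. All data and the scoring rule are illustrative.
from itertools import product

ADJECTIVES = ["silent", "liquid", "electric", "paper"]
NOUNS = ["storm", "library", "moon", "engine"]

known_constructs = {"silent storm", "paper moon"}    # already known => not P-creative

def generate():
    """Phase 1: produce constructs that are novel to the system itself."""
    for adj, noun in product(ADJECTIVES, NOUNS):
        phrase = f"{adj} {noun}"
        if phrase not in known_constructs:           # novelty filter
            yield phrase

def usefulness(phrase):
    """Phase 2: stand-in evaluator; a real system would need a far richer measure."""
    tension = {"liquid library": 3, "electric moon": 2, "paper engine": 2}
    return tension.get(phrase, 0)

candidates = list(generate())
best = max(candidates, key=usefulness)
print(best)                                          # prints "liquid library"
```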
This two-phase structure conforms to the Geneplore model of Finke, Ward and Smith, which is a psychological model of creative generation based on empirical observation of human creativity.</p> <p><a name="Combinatorial_creativity" id="Combinatorial_creativity"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Combinatorial creativity</span></span></h3> <p style="font-weight: bold; color: rgb(153, 0, 0);">A great deal, perhaps all, of human creativity can be understood as a novel combination of pre-existing ideas or objects. Common strategies for combinatorial creativity include:</p> <ul><li>placing a familiar object in an unfamiliar setting (e.g., Marcel Duchamp's Fountain) or an unfamiliar object in a familiar setting (e.g., a fish-out-of-water story such as The Beverly Hillbillies)</li><li>blending two superficially different objects or genres (e.g., a sci-fi story set in the Wild West, with robot cowboys, as in Westworld; Jewish haiku poems, etc.)</li><li>comparing a familiar object to a superficially unrelated and semantically distant concept (e.g., "Makeup is the Western burka"; "A zoo is a gallery with living exhibits")</li><li>adding a new and unexpected feature to an existing concept (e.g., adding a scalpel to a Swiss Army knife; adding a camera to a mobile phone)</li><li>compressing two incongruous scenarios into the same narrative to get a joke (e.g., the Emo Philips joke “Women are always using me to advance their careers. Damned anthropologists!”)</li><li>using an iconic image from one domain in a domain for an unrelated or incongruous idea or product (e.g., using the Marlboro Man image to sell cars, or to advertise the dangers of smoking-related impotence).</li></ul> <p>The combinatorial perspective allows us to model creativity as a search process through the space of possible combinations.
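As a toy illustration of such a combination space, the sketch below splices two pre-existing word forms at every possible cut point; the words chosen are arbitrary, and the same scheme could operate on far richer representations than strings:

```python
# All single-point crossovers of two parent strings: a prefix of the first
# spliced to a suffix of the second. A toy stand-in for blending richer
# representations (concept graphs, melodies, 3-D models, ...).
def blends(a, b):
    """Return every 'prefix of a + suffix of b' combination (both parts non-empty)."""
    return {a[:i] + b[j:] for i in range(1, len(a)) for j in range(1, len(b))}

out = blends("breakfast", "lunch")
print("brunch" in out)   # True: "br" + "unch"
```

Even this tiny space contains one recognized coinage among many useless strings, which is exactly the situation the evaluation phase described above is meant to address.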
The combinations can arise from composition or concatenation of different representations, or through a rule-based or stochastic transformation of initial and intermediate representations. Genetic algorithms and neural networks can be used to generate blended or crossover representations that capture a combination of different inputs.</p> <p><a name="Bisociation" id="Bisociation"></a></p> <h4><span class="editsection"></span><span style="color: rgb(153, 0, 0);" class="mw-headline">Bisociation</span></h4> <p><a href="http://en.wikipedia.org/wiki/Arthur_Koestler" title="Arthur Koestler"></a>Arthur Koestler proposes a very general model of creative combination in his 1964 book<span style="font-style: italic;"> The Act of Creation</span>, claiming that scientific discovery, art and humour are all linked by a common mechanism called "bisociation". Koestler lacked a formal, computational vocabulary for describing bisociation, which he defined as a reconciliation of two orthogonal matrices of thought (conceptual structures, mental spaces).</p> <p><a name="Conceptual_blending" id="Conceptual_blending"></a></p> <h4><span class="editsection"></span> <span style="color: rgb(153, 0, 0);" class="mw-headline">Conceptual blending</span></h4> <p>Mark Turner and Gilles Fauconnier propose a model called Conceptual Integration Networks that elaborates upon the ideas of Koestler by synthesizing ideas from Cognitive Linguistic research into mental spaces and conceptual metaphors. 
Their basic model defines an integration network as four connected spaces:</p> <ul><li>A first input space (contains one conceptual structure or mental space)</li><li>A second input space (to be blended with the first input)</li><li>A <i>generic space</i> of stock conventions and image-schemas that allow the input spaces to be understood from an integrated perspective</li><li>A <i>blend space</i> in which a selected projection of elements from both input spaces are combined; inferences arising from this combination also reside here, sometimes leading to emergent structures that conflict with the inputs.</li></ul> <p>Fauconnier and Turner describe a collection of optimality principles that are claimed to guide the construction of a well-formed integration network. In essence, they see blending as a compression mechanism in which two or more input structures are compressed into a single blend structure. This compression operates on the level of conceptual relations. For example, a series of similarity relations between the input spaces can be compressed into a single identity relationship in the blend.</p> <p>Blending theory is an elaborate framework that provides a rich terminology for describing the products of creative thinking, from metaphors to jokes to neologisms to adverts. It is most typically applied retrospectively, to describe how a blended conceptual structure could have arisen from a particular pair of input structures. These conceptual structures are often good examples of human creativity, but blending theory is not a theory of creativity, nor – despite its authors’ claims – does it describe a mechanism for creativity. 
The theory lacks an explanation for how a creative individual chooses the input spaces that should be blended to generate a desired result.</p> <p>Nonetheless, some computational success has been achieved with the blending model by extending pre-existing computational models of analogical mapping that are compatible by virtue of their emphasis on connected semantic structures. More recently, Francisco Câmara Pereira presented an implementation of blending theory that employs ideas both from GOFAI and from genetic algorithms to realize some aspects of blending theory in a practical form; his example domains range from the linguistic to the visual, and the latter most notably includes the creation of mythical monsters by combining 3-D graphical models.</p><p><br /></p> <p><a name="Linguistic_creativity" id="Linguistic_creativity"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Linguistic creativity</span></span></h2> <p>Language provides continuous opportunity for creativity, evident in the generation of novel sentences, phrasings, puns, neologisms, rhymes, allusions, sarcasm, irony, similes, metaphors, analogies, witticisms, and jokes. Native speakers of morphologically rich languages (including all Slavic languages) frequently create new word-forms that are easily understood, although they will never find their way to the dictionary. The area of natural language generation has been well studied, but these creative aspects of everyday language have yet to be incorporated with any robustness or scale.</p> <p><a name="Story_generation" id="Story_generation"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Story generation</span></span></h3> <p>Substantial work has been conducted in this area of linguistic creation since the 1970s, with the development of James Meehan's TALE-SPIN system. 
TALE-SPIN viewed stories as narrative descriptions of a problem-solving effort, and created stories by first establishing a goal for the story’s characters so that their search for a solution could be tracked and recorded. The MINSTREL system represents a complex elaboration of this basic approach, distinguishing a range of character-level goals in the story from a range of author-level goals for the story. Systems like Bringsjord's BRUTUS elaborate these ideas further to create stories with complex interpersonal themes like betrayal. Nonetheless, MINSTREL explicitly models the creative process with a set of Transform Recall Adapt Methods (TRAMs) to create novel scenes from old. The MEXICA model of Rafael Pérez y Pérez and Mike Sharples is more explicitly interested in the creative process of storytelling, and implements a version of the engagement-reflection cognitive model of creative writing.</p> <p><a name="Metaphor_and_simile" id="Metaphor_and_simile"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Metaphor and simile</span></span></h3> <p>The computational study of these phenomena has mainly focussed on interpretation as a knowledge-based process. Computationalists such as Yorick Wilks, James Martin, Dan Fass, John Barnden, and Mark Lee have developed knowledge-based approaches to the processing of metaphors, either at a linguistic level or a logical level. Tony Veale and Yanfen Hao have developed a system, called Sardonicus, that acquires a comprehensive database of explicit similes from the web; these similes are then tagged as bona-fide (e.g., "as hard as steel") or ironic (e.g., "as hairy as a bowling ball", "as pleasant as a root canal"); similes of either type can be retrieved on demand for any given adjective.
They use these similes as the basis of an on-line metaphor generation system called Aristotle that can suggest lexical metaphors for a given descriptive goal (e.g., to describe a supermodel as skinny, the source terms “pencil”, “whip”, “whippet”, “rope”, “stick-insect” and “snake” are suggested).</p> <p><a name="Analogy" id="Analogy"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Analogy</span></span></h3> <p>The process of analogical reasoning has been studied from both a mapping and a retrieval perspective, the latter being key to the generation of novel analogies. The dominant school of research, as advanced by Dedre Gentner, views analogy as a structure-preserving process; this view has been implemented in the structure mapping engine or SME, the MAC/FAC retrieval engine (Many Are Called, Few Are Chosen), ACME (Analogical Constraint Mapping Engine) and ARCS (Analogical Retrieval Constraint System). Other mapping-based approaches include Sapper, which situates the mapping process in a semantic-network model of memory. Analogy is a very active sub-area of creative computation and creative cognition; active figures in this sub-area include Douglas Hofstadter, Paul Thagard, and Keith Holyoak. Also worthy of note here is Peter Turney and Michael Littman's machine learning approach to the solving of SAT-style analogy problems; their approach achieves a score that compares well with average scores achieved by humans on these tests.</p> <p><a name="Joke_generation" id="Joke_generation"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Joke generation</span></span></h3> <p>Humour is an especially knowledge-hungry process, and the most successful joke-generation systems to date have focussed on pun-generation, as exemplified by the work of Kim Binsted and Graeme Ritchie.
This work includes the JAPE system, which can generate a wide range of puns that are consistently evaluated as novel and humorous by young children. An improved version of JAPE has been developed in the guise of the STANDUP system, which has been experimentally deployed as a means of enhancing linguistic interaction with children with communication disabilities. Some limited progress has been made in generating humour that involves other aspects of natural language, such as the deliberate misunderstanding of pronominal reference (in the work of Hans Wim Tinholt and Anton Nijholt), as well as in the generation of humorous acronyms in the HAHAcronym system of Oliviero Stock and Carlo Strapparava.</p> <p><a name="Neologisms" id="Neologisms"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Neologisms</span></span></h3> <p>The blending of multiple word forms is a dominant force for new word creation in language; these new words are commonly called "blends" or "portmanteau words" (after Lewis Carroll). Tony Veale has developed a system called ZeitGeist that harvests neological headwords from Wikipedia and interprets them relative to their local context in Wikipedia and relative to specific word senses in WordNet. ZeitGeist has been extended to generate neologisms of its own; the approach combines elements from an inventory of word parts that are harvested from WordNet, and simultaneously determines likely glosses for these new words (e.g., "food traveller" for "gastronaut" and "time traveller" for "chrononaut"). It then uses Web search to determine which glosses are meaningful and which neologisms have not been used before; this search identifies the subset of generated words that are both novel ("H-creative") and useful. 
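The generate-gloss-filter pipeline described for ZeitGeist can be sketched in miniature (this is not ZeitGeist's code; the word-part inventory and the known-word list standing in for the web-search novelty check are invented):

```python
# Hypothetical sketch of ZeitGeist-style neologism generation: word parts
# carry glosses, blends inherit a combined gloss, and a novelty check
# (Web search in the real system; a known-word list here) filters output.
PARTS = {
    "gastro-": "food",
    "chrono-": "time",
    "-naut": "traveller",
}
KNOWN_WORDS = {"astronaut"}  # stand-in for the Web-search novelty check


def blends():
    out = []
    for prefix, prefix_gloss in PARTS.items():
        if not prefix.endswith("-"):
            continue  # only combining forms that attach on the left
        for suffix, suffix_gloss in PARTS.items():
            if not suffix.startswith("-"):
                continue  # only combining forms that attach on the right
            word = prefix.rstrip("-") + suffix.lstrip("-")
            if word not in KNOWN_WORDS:  # keep only novel ("H-creative") blends
                out.append((word, f"{prefix_gloss} {suffix_gloss}"))
    return out


print(blends())
# [('gastronaut', 'food traveller'), ('chrononaut', 'time traveller')]
```

The real system harvests its parts from WordNet and validates glosses against web usage; the sketch keeps only the combinatorial core.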
Neurolinguistic inspirations have been used to analyze the process of novel word creation in the brain, to understand the neurocognitive processes responsible for intuition, insight, imagination and creativity, and to create a server that invents novel names for products based on their descriptions.</p> <p><a name="Poetry" id="Poetry"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Poetry</span></span></h3> <div style="border: 1px solid gray; margin: 0pt 0pt 1em 1em; padding: 5px; background: lightyellow none repeat scroll 0% 0%; float: right; text-align: right; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"> <p>More than iron, more than lead, more than gold I need electricity.<br />I need it more than I need lamb or pork or lettuce or cucumber.<br />I need it for my dreams. <small>Racter, from <i>The Policeman's Beard Is Half Constructed</i></small></p> </div> <p>Like jokes, poems involve a complex interaction of different constraints, and no general-purpose poem generator adequately combines the meaning, phrasing, structure and rhyme aspects of poetry. Nonetheless, Pablo Gervás has developed a noteworthy system called ASPERA that employs a case-based reasoning (CBR) approach to generating poetic formulations of a given input text via a composition of poetic fragments that are retrieved from a case-base of existing poems. Each poem fragment in the ASPERA case-base is annotated with a prose string that expresses the meaning of the fragment, and this prose string is used as the retrieval key for each fragment. Metrical rules are then used to combine these fragments into a well-formed poetic structure. 
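The case-based retrieval step described for ASPERA can be sketched as follows (an illustration only, not ASPERA's code; the case base is invented, and real metrical rules are reduced to a word-overlap lookup):

```python
# Minimal sketch of an ASPERA-style case-based step: each poem fragment
# is annotated with a prose string, and that string is the retrieval key.
CASE_BASE = [
    ("the moon climbs over silent hills", "night and moonrise"),
    ("my love is far across the sea", "distant beloved"),
    ("cold wind through the bare trees", "winter wind"),
]


def retrieve(prose):
    """Return the fragment whose prose annotation overlaps most with the input."""
    words = set(prose.lower().split())

    def score(case):
        # count shared words between the input and the fragment's prose key
        return len(words & set(case[1].split()))

    return max(CASE_BASE, key=score)[0]


print(retrieve("a poem about the moonrise at night"))
# the moon climbs over silent hills
```

In the full system the retrieved fragments would then be recombined under metrical constraints; the sketch stops at retrieval, which is the distinctly case-based part.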
Example software projects include:</p> <ul><li><a href="http://en.wikipedia.org/wiki/Racter" title="Racter">Racter</a></li><li><a href="http://nodebox.net/code/index.php/Flowerewolf" class="external text" title="http://nodebox.net/code/index.php/Flowerewolf" rel="nofollow">Flowerewolf</a> automatic poetry generator</li></ul> <p style="font-weight: bold; color: rgb(153, 0, 0);">And poetry collections include:</p> <ul><li><a href="http://www.ubu.com/historical/racter/index.html" class="external text" title="http://www.ubu.com/historical/racter/index.html" rel="nofollow"><i>The Policeman's Beard Is Half Constructed</i></a></li></ul> <p><a name="Musical_creativity" id="Musical_creativity"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Musical creativity</span></span></h2> <p>Computational creativity in the music domain has focussed both on the generation of musical scores for use by human musicians, and on the generation of music for performance by computers. The domain of generation has included classical music (with software that generates music in the style of Mozart and Bach) and jazz. Most notably, David Cope has written a software system called "Experiments in Musical Intelligence" (or "EMI") that is capable of analyzing and generalizing from existing music by a human composer to generate novel musical compositions in the same style. 
EMI's output is convincing enough to persuade human listeners that its music was generated by a competent human composer<sup class="noprint Template-Fact"><span title="This claim needs references to reliable sources since November 2008" style="white-space: nowrap;"></span></sup>.</p> <p>Creativity research in jazz has focussed on the process of improvisation and the cognitive demands that this places on a musical agent: reasoning about time, remembering and conceptualizing what has already been played, and planning ahead for what might be played next.</p><p><br /></p> <p><a name="Visual_and_artistic_creativity" id="Visual_and_artistic_creativity"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Visual and artistic creativity</span></span></h2> <p>Computational creativity in the generation of visual art has had some notable successes in the creation of both abstract art and representational art. The most famous program in this domain is Harold Cohen's AARON, which has been continuously developed and augmented since 1973. Though formulaic, AARON exhibits a range of outputs, generating black-and-white drawings or colour paintings that incorporate human figures (such as dancers), potted plants, rocks, and other elements of background imagery. These images are of a sufficiently high quality to be displayed in reputable galleries.</p> <p>Other software artists of note include the NEvAr system (for "Neuro-Evolutionary Art") of Penousal Machado. NEvAr uses a genetic algorithm to derive a mathematical function that is then used to generate a coloured three-dimensional surface. 
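The interactive evolutionary loop used by systems of this kind might be sketched as follows (a toy: NEvAr's real genomes are mathematical expressions rendered as images, whereas here a genome is just a short list of operator names, and the "user's picks" are hard-coded):

```python
# Toy sketch of an interactive genetic algorithm: candidate images are
# defined by small expression "genomes", the user's picks act as the
# fitness signal, and the next generation is bred from those picks.
import random

OPS = ["sin", "cos", "add", "mul"]


def random_genome(rng, length=4):
    return [rng.choice(OPS) for _ in range(length)]


def mutate(genome, rng):
    # copy the genome and randomly replace one operator
    g = list(genome)
    g[rng.randrange(len(g))] = rng.choice(OPS)
    return g


def next_generation(picks, size, rng):
    """Breed a new population from the user-selected genomes."""
    return [mutate(rng.choice(picks), rng) for _ in range(size)]


rng = random.Random(0)
population = [random_genome(rng) for _ in range(6)]
picks = population[:2]  # stand-in for the user's choices
population = next_generation(picks, 6, rng)
print(len(population))  # 6
```

The essential design point is that the fitness function is the human in the loop: each round of picks steers the search toward the regions of the space the user finds most appealing.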
A human user is allowed to select the best pictures after each phase of the genetic algorithm, and these preferences are used to guide successive phases, thereby pushing NEvAr’s search into pockets of the search space that are considered most appealing to the user.</p> <p>The Painting Fool is an ambitious system developed by Simon Colton that originated as a system for overpainting digital images of a given scene in a choice of different painting styles, colour palettes and brush types. Given its dependence on an input source image to work with, the earliest iterations of the Painting Fool raised questions about the extent of, or lack of, creativity in a computational art system. Nonetheless, in more recent work, The Painting Fool has been extended to create novel images, much as AARON does, from its own limited <i>imagination</i>. Images in this vein include cityscapes and forests, which are generated by a process of constraint satisfaction from some basic scenarios provided by the user (e.g., these scenarios allow the system to infer that objects closer to the viewing plane should be larger and more color-saturated, while those further away should be less saturated and appear smaller). Artistically, the images now created by the Painting Fool appear on a par with those created by AARON, though the extensible mechanisms employed by the former (constraint satisfaction, etc.) may well allow it to develop into a more elaborate and sophisticated painter.</p><p><br /></p> <p><a name="Events" id="Events"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Events</span></span></h2> <p>The community of computational creativity has held a dedicated workshop, the International Joint Workshop on Computational Creativity, every year since 1999. 
Usually held as part of a larger conference event, this workshop series is now autonomous; the most recent stand-alone event was held in September 2008 at the Facultad de Informática, Universidad Complutense de Madrid, Spain. Previous events in this series include:</p> <ul><li>IJWCC 2003, Acapulco, Mexico, as part of IJCAI'2003</li><li>IJWCC 2004, Madrid, Spain, as part of ECCBR'2004</li><li>IJWCC 2005, Edinburgh, UK, as part of IJCAI'2005</li><li>IJWCC 2006, Riva del Garda, Italy, as part of ECAI'2006</li><li>IJWCC 2007, London, UK, a stand-alone event</li></ul> <p style="font-weight: bold; color: rgb(153, 0, 0);">The steering committee for these events comprises the following researchers:</p> <ul><li>Amilcar Cardoso (University of Coimbra, Portugal)</li><li>Simon Colton (Imperial College London, UK)</li><li>Pablo Gervás (Universidad Complutense de Madrid, Spain)</li><li>Francisco C Pereira (University of Coimbra, Portugal)</li><li>Tony Veale (University College Dublin, Ireland)</li><li>Geraint A. 
Wiggins (Goldsmiths, University of London, UK)</li></ul><br /><p><a name="Publication_forums" id="Publication_forums"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Publication forums</span></span></h2> <p style="font-weight: bold; color: rgb(153, 0, 0);">In addition to the proceedings of these workshops, the computational creativity community has thus far produced two special journal issues dedicated to the topic:</p> <ul><li><i>Knowledge-Based Systems</i>, volume 19, issue 7, November 2006</li><li><i>New Generation Computing</i>, volume 24, issue 6, 2006</li></ul>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com0tag:blogger.com,1999:blog-226627663258700835.post-8963943338637482432009-03-09T08:52:00.000-07:002009-03-09T08:56:04.053-07:00Affective computing<p><b>Affective computing</b> is a branch of the study and development of artificial intelligence that deals with the design of systems and devices that can recognize, interpret, and process human emotions. It is an interdisciplinary field spanning computer sciences, psychology, and cognitive science. While the origins of the field may be traced as far back as early philosophical enquiries into emotion, the more modern branch of computer science originated with Rosalind Picard's 1995 paper on affective computing. A motivation for the research is the ability to simulate empathy. 
The machine should interpret the emotional state of humans and adapt its behaviour to them, giving an appropriate response for those emotions.</p><p><br /></p><p><br /></p><script type="text/javascript">//<![CDATA[ if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } //]]> </script> <p><a name="Areas_of_affective_computing" id="Areas_of_affective_computing"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Areas of affective computing</span></span></h2> <p><a name="Detecting_and_recognizing_emotional_information" id="Detecting_and_recognizing_emotional_information"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Detecting and recognizing emotional information</span></span></h3> <p>Detecting emotional information begins with passive sensors which capture data about the user's physical state or behavior without interpreting the input. The data gathered is analogous to the cues humans use to perceive emotions in others. For example, a video camera might capture facial expressions, body posture and gestures, while a microphone might capture speech. Other sensors detect emotional cues by directly measuring physiological data, such as skin temperature and galvanic resistance.</p> <p>Recognizing emotional information requires the extraction of meaningful patterns from the gathered data. 
This is done by parsing the data through processes such as speech recognition, natural language processing, or facial expression detection, all of which depend on models and rules crafted by human programmers.<sup class="noprint Template-Fact"><span title="This claim needs references to reliable sources since May 2008" style="white-space: nowrap;"></span></sup></p> <p><a name="Emotion_in_machines" id="Emotion_in_machines"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Emotion in machines</span></span></h3> <p>Another area within affective computing is the design of computational devices that either exhibit innate emotional capabilities or are capable of convincingly simulating emotions. A more practical approach, based on current technological capabilities, is the simulation of emotions in conversational agents<sup class="noprint Inline-Template"><span title="You can help -- from May 2008" style="white-space: nowrap;"></span></sup>. The goal of such simulation is to enrich and facilitate interactivity between human and machine<sup class="noprint Template-Fact"><span title="This claim needs references to reliable sources since May 2008" style="white-space: nowrap;"></span></sup><sup class="noprint Inline-Template"><span title="You can help -- from May 2008" style="white-space: nowrap;"></span></sup>. While human emotions are often associated with surges in hormones and other neuropeptides, emotions in machines might be linked to abstract states associated with progress (or lack of progress) in autonomous learning systems. 
In this view, affective emotional states correspond to time-derivatives (perturbations) in the learning curve of an arbitrary learning system.<sup class="noprint Template-Fact"><span title="This claim needs references to reliable sources since May 2008" style="white-space: nowrap;"></span></sup></p> <p><a href="http://en.wikipedia.org/wiki/Marvin_Minsky" title="Marvin Minsky">Marvin Minsky</a>, one of the pioneering computer scientists in artificial intelligence, relates emotions to the broader issues of machine intelligence, stating in <i>The Emotion Machine</i> that emotion is "not especially different from the processes that we call 'thinking.'"</p><p><br /></p> <p><a name="Technologies_of_affective_computing" id="Technologies_of_affective_computing"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Technologies of affective computing</span></span></h2> <p><a name="Emotional_speech" id="Emotional_speech"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Emotional speech</span></span></h3> <p>Emotional speech processing recognizes the user's emotional state by analyzing speech patterns. Vocal parameters and prosody features such as pitch variables and speech rate are analyzed through pattern recognition.</p> <p>Emotional inflection and modulation in synthesized speech, either through phrasing or acoustic features, is useful in human-computer interaction. Such capability makes speech natural and expressive. 
For example, a dialog system might modulate its speech to sound more child-like if it deems that its current user is a child.<sup class="noprint Template-Fact"><span title="This claim needs references to reliable sources since May 2008" style="white-space: nowrap;"></span></sup></p> <p><a name="Facial_expression" id="Facial_expression"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Facial expression</span></span></h3> <p>The detection and processing of facial expression is achieved through various methods such as <a href="http://en.wikipedia.org/wiki/Optical_flow" title="Optical flow">optical flow</a>, hidden Markov models, neural network processing or active appearance models.<sup class="noprint Template-Fact"><span title="This claim needs references to reliable sources since May 2008" style="white-space: nowrap;"></span></sup></p> <p><a name="Body_gesture" id="Body_gesture"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Body gesture</span></span></h3> <p>Body gesture refers to the position and movement of the body. Many methods have been proposed for detecting body gesture. Hand gestures have been a common focus of body gesture detection; appearance-based<sup class="noprint Inline-Template"><span title="You can help -- from May 2008" style="white-space: nowrap;"></span></sup> methods and 3-D modeling methods are traditionally used.</p><p><br /></p> <p><a name="Potential_applications" id="Potential_applications"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Potential applications</span></span></h2> <p>In e-learning applications, affective computing can be used to adjust the presentation style of a computerized tutor when a learner is bored, interested, frustrated, or pleased. Psychological health services, i.e. 
counseling, benefit from affective computing applications when determining a client's emotional state.<sup class="noprint Template-Fact"><span title="This claim needs references to reliable sources since May 2008" style="white-space: nowrap;"></span></sup> Affective computing could also send a message via color or sound to express a user's emotional state to others.<sup class="noprint Template-Fact"><span title="This claim needs references to reliable sources since May 2008" style="white-space: nowrap;"></span></sup></p> <p><a href="http://en.wikipedia.org/wiki/Robot" title="Robot">Robotic</a> systems capable of processing affective information exhibit greater flexibility when working in uncertain or complex environments. Companion devices, such as digital pets, use affective computing abilities to enhance realism and provide a higher degree of autonomy.<sup class="noprint Template-Fact"><span title="This claim needs references to reliable sources since May 2008" style="white-space: nowrap;"></span></sup></p><p><sup class="noprint Template-Fact"><span title="This claim needs references to reliable sources since May 2008" style="white-space: nowrap;"></span></sup>Other potential applications are centered around social monitoring. For example, a car can monitor the emotion of all occupants and engage in additional safety measures, such as alerting other vehicles if it detects the driver to be angry. 
Affective computing has potential applications in human-computer interaction, such as affective mirrors allowing the user to see how he or she performs; emotion monitoring agents sending a warning before one sends an angry email; or even music players selecting tracks based on mood.<sup class="noprint Template-Fact"><span title="This claim needs references to reliable sources since May 2008" style="white-space: nowrap;"></span></sup></p> <p>Affective computing is also being applied to the development of communicative technologies for use by people with autism.</p><p><br /></p> <p><a name="Application_examples" id="Application_examples"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Application examples</span></span></h2> <ul><li><a href="http://en.wikipedia.org/wiki/Wearable_computer" title="Wearable computer">Wearable computer</a> applications make use of affective technologies, such as detection of biosignals</li><li><a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_interaction" title="Human–computer interaction" class="mw-redirect">Human–computer interaction</a></li><li>Kismet</li></ul>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com0tag:blogger.com,1999:blog-226627663258700835.post-85340983267297795652009-03-09T08:45:00.000-07:002009-03-09T08:52:20.266-07:00Speech recognition<div class="thumb tright"> <div class="thumbinner" style="width: 182px;"><a href="http://en.wikipedia.org/wiki/File:Toshiba-Speech-Systems-Rabbit-Mascot.png" class="image" title="The display of the Speech Recognition screensaver on a Toshiba laptop, in which the character responds to questions, e.g. &quot;Where are you?&quot; or statements, e.g. 
"Hello.""><img alt="" src="http://upload.wikimedia.org/wikipedia/en/thumb/f/f1/Toshiba-Speech-Systems-Rabbit-Mascot.png/180px-Toshiba-Speech-Systems-Rabbit-Mascot.png" class="thumbimage" width="180" border="0" height="166" /></a> <div class="thumbcaption"> The display of the Speech Recognition screensaver on a Toshiba laptop, in which the character responds to questions, e.g. "Where are you?" or statements, e.g. "Hello."</div> </div> </div> <p><b>Speech recognition</b> (also known as <b>automatic speech recognition</b> or <b>computer speech recognition</b>) converts spoken words to machine-readable input (for example, to key presses, using the binary code for a string of character codes). The term "voice recognition" is sometimes incorrectly used to refer to speech recognition, when actually referring to speaker recognition, which attempts to identify the person speaking, as opposed to what is being said. Confusingly, journalists and manufacturers of devices that use speech recognition for control commonly use the term Voice Recognition when they mean Speech Recognition.</p> <p>Speech recognition applications include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), domotic appliance control and content-based spoken audio search (e.g., find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., word processors or emails), and in aircraft cockpits (usually termed Direct Voice Input).</p><p><br /></p><p><br /></p><script type="text/javascript">//<![CDATA[ if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } //]]> </script> <p><a name="History" id="History"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">History</span></span></h2> <p>One of the most notable 
domains for the commercial application of speech recognition in the United States has been health care and in particular the work of the medical transcriptionist (MT)<sup class="noprint Template-Fact"><span title="This claim needs references to reliable sources since March 2008" style="white-space: nowrap;"></span></sup>. According to industry experts, at its inception, speech recognition (SR) was sold as a way to completely eliminate transcription rather than make the transcription process more efficient; hence it was not accepted. It was also the case that SR at that time was often technically deficient. Additionally, to be used effectively, it required changes to the ways physicians worked and documented clinical encounters, which many, if not all, were reluctant to do. The biggest limitation to speech recognition automating transcription, however, is the software itself. The nature of narrative dictation is highly interpretive and often requires judgment that may be provided by a real human but not yet by an automated system. Another limitation has been the extensive amount of time required by the user and/or system provider to train the software.</p> <p>A distinction in ASR is often made between "artificial syntax systems" which are usually domain-specific and "natural language processing" which is usually language-specific. 
Each of these types of application presents its own particular goals and challenges.</p><p><br /></p> <p><a name="Applications" id="Applications"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Applications</span></span></h2> <p><a name="Health_care" id="Health_care"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Health care</span></span></h3> <p>In the health care domain, even in the wake of improving speech recognition technologies, medical transcriptionists (MTs) have not yet become obsolete. Many experts in the field<sup class="noprint Inline-Template"><span title="The material in the vicinity of this tag may use weasel words or too-vague attribution. since November 2008" style="white-space: nowrap;"> </span></sup> anticipate that with increased use of speech recognition technology, the services provided may be redistributed rather than replaced.</p> <p>Speech recognition can be implemented at the front end or the back end of the medical documentation process.</p> <p>Front-End SR is where the provider dictates into a speech-recognition engine, the recognized words are displayed right after they are spoken, and the dictator is responsible for editing and signing off on the document. The document never goes through an MT/editor.</p> <p>Back-End SR or Deferred SR is where the provider dictates into a digital dictation system, and the voice is routed through a speech-recognition machine and the recognized draft document is routed along with the original voice file to the MT/editor, who edits the draft and finalizes the report. Deferred SR is being widely used in the industry currently.</p> <p>Many Electronic Medical Records (EMR) applications can be more effective and may be performed more easily when deployed in conjunction with a speech-recognition engine. 
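The difference between the two dictation workflows described above can be sketched in a few lines (purely illustrative; `recognize` stands in for a real speech-recognition engine, and all names are invented):

```python
# Illustrative sketch of the two medical dictation workflows:
# front-end SR shows text to the dictating provider for immediate
# editing and sign-off, while deferred (back-end) SR routes a draft
# plus the original voice file to an MT/editor.
def recognize(audio):
    """Stand-in for a speech-recognition engine."""
    return f"draft transcript of {audio}"


def front_end_sr(audio, provider_edit):
    # The provider edits and signs off immediately; no MT/editor involved.
    return provider_edit(recognize(audio))


def back_end_sr(audio, mt_edit):
    # The draft travels with the original voice file to the MT/editor.
    draft = recognize(audio)
    return mt_edit(draft, audio)


report = front_end_sr("visit-001.wav", lambda d: d + " [signed: Dr. A]")
print(report)
```

The structural point is where the human editing step sits: inline with dictation in the front-end case, downstream with access to the original audio in the deferred case.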
Searches, queries, and form filling may all be faster to perform by voice than by using a keyboard.</p><p><br /></p> <p><a name="Military" id="Military"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Military</span></span></h3> <p><a name="High-performance_fighter_aircraft" id="High-performance_fighter_aircraft"></a></p> <h4><span class="editsection"></span><span style="color: rgb(153, 0, 0);" class="mw-headline">High-performance fighter aircraft</span></h4> <p>Substantial efforts have been devoted in the last decade to the test and evaluation of speech recognition in fighter aircraft. Of particular note are the U.S. program in speech recognition for the Advanced Fighter Technology Integration (AFTI)/F-16 aircraft (F-16 VISTA), the program in France on installing speech recognition systems on Mirage aircraft, and programs in the UK dealing with a variety of aircraft platforms. In these programs, speech recognizers have been operated successfully in fighter aircraft with applications including: setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons release parameters, and controlling flight displays. 
Generally, only very limited, constrained vocabularies have been used successfully, and a major effort has been devoted to integration of the speech recognizer with the avionics system.</p> <p style="font-weight: bold; color: rgb(153, 0, 0);">Some important conclusions from the work were as follows:</p> <ol><li>Speech recognition has definite potential for reducing pilot workload, but this potential was not realized consistently.</li><li>Achievement of very high recognition accuracy (95% or more) was the most critical factor for making the speech recognition system useful — with lower recognition rates, pilots would not use the system.</li><li>More natural vocabulary and grammar, and shorter training times would be useful, but only if very high recognition rates could be maintained.</li></ol> <p>Laboratory research in robust speech recognition for military environments has produced promising results which, if extendable to the cockpit, should improve the utility of speech recognition in high-performance aircraft.</p> <p>Working with Swedish pilots flying in the JAS-39 Gripen cockpit, Englund (2004) found recognition deteriorated with increasing G-loads. It was also concluded that adaptation greatly improved the results in all cases and that introducing models for breathing significantly improved recognition scores. Contrary to what might be expected, no effects of the broken English of the speakers were found. It was evident that spontaneous speech caused problems for the recognizer, as could be expected. A restricted vocabulary, and above all, a proper syntax, could thus be expected to improve recognition accuracy substantially.</p> <p>The Eurofighter Typhoon currently in service with the UK RAF employs a speaker-dependent system, i.e. it requires each pilot to create a template. 
The system is not used for any safety critical or weapon critical tasks, such as weapon release or lowering of the undercarriage, but is used for a wide range of other cockpit functions. Voice commands are confirmed by visual and/or aural feedback. The system is seen as a major design feature in the reduction of pilot workload, and even allows the pilot to assign targets to himself with two simple voice commands or to any of his wingmen with only five commands.</p> <p><a name="Helicopters" id="Helicopters"></a></p> <h4><span class="editsection"></span><span style="color: rgb(153, 0, 0);" class="mw-headline">Helicopters</span></h4> <p>The problems of achieving high recognition accuracy under stress and noise pertain strongly to the helicopter environment as well as to the fighter environment. The acoustic noise problem is actually more severe in the helicopter environment, not only because of the high noise levels but also because the helicopter pilot generally does not wear a facemask, which would reduce acoustic noise in the microphone. Substantial test and evaluation programs have been carried out in the past decade in speech recognition systems applications in helicopters, notably by the U.S. Army Avionics Research and Development Activity (AVRADA) and by the Royal Aerospace Establishment (RAE) in the UK. Work in France has included speech recognition in the Puma helicopter. There has also been much useful work in Canada. Results have been encouraging, and voice applications have included: control of communication radios; setting of navigation systems; and control of an automated target handover system.</p> <p>As in fighter applications, the overriding issue for voice in helicopters is the impact on pilot effectiveness. Encouraging results are reported for the AVRADA tests, although these represent only a feasibility demonstration in a test environment. 
Much remains to be done both in speech recognition and in overall speech technology in order to consistently achieve performance improvements in operational settings.</p> <p><a name="Battle_management" id="Battle_management"></a></p> <h4><span class="editsection"></span><span style="color: rgb(153, 0, 0);" class="mw-headline">Battle management</span></h4> <p>Battle management command centres generally require rapid access to and control of large, rapidly changing information databases. Commanders and system operators need to query these databases as conveniently as possible, in an eyes-busy environment where much of the information is presented in a display format. Human-machine interaction by voice has the potential to be very useful in these environments. A number of efforts have been undertaken to interface commercially available isolated-word recognizers into battle management environments. In one feasibility study, speech recognition equipment was tested in conjunction with an integrated information display for naval battle management applications. Users were very optimistic about the potential of the system, although capabilities were limited.</p> <p>Speech understanding programs sponsored by the Defense Advanced Research Projects Agency (DARPA) in the U.S. have focused on this problem of natural speech interfaces. Speech recognition efforts have focused on a database of continuous, large-vocabulary speech (continuous speech recognition, or CSR) designed to be representative of the naval resource management task. 
Significant advances in the state-of-the-art in CSR have been achieved, and current efforts are focused on integrating speech recognition and natural language processing to allow spoken language interaction with a naval resource management system.</p> <p><a name="Training_air_traffic_controllers" id="Training_air_traffic_controllers"></a></p> <h4><span class="editsection"></span> <span style="color: rgb(153, 0, 0);" class="mw-headline">Training air traffic controllers</span></h4> <p>Training for military (or civilian) air traffic controllers (ATC) represents an excellent application for speech recognition systems. Many ATC training systems currently require a person to act as a "pseudo-pilot", engaging in a voice dialog with the trainee controller, which simulates the dialog which the controller would have to conduct with pilots in a real ATC situation. Speech recognition and synthesis techniques offer the potential to eliminate the need for a person to act as pseudo-pilot, thus reducing training and support personnel. Air controller tasks are also characterized by highly structured speech as the primary output of the controller, hence reducing the difficulty of the speech recognition task.</p> <p>The U.S. Naval Training Equipment Center has sponsored a number of developments of prototype ATC trainers using speech recognition. Generally, the recognition accuracy falls short of providing graceful interaction between the trainee and the system. However, the prototype training systems have demonstrated a significant potential for voice interaction in these systems, and in other training applications. The U.S. Navy has sponsored a large-scale effort in ATC training systems, where a commercial speech recognition unit was integrated with a complex training system including displays and scenario creation. 
Although the recognizer was constrained in vocabulary, one of the goals of the training programs was to teach the controllers to speak in a constrained language, using a vocabulary specifically designed for the ATC task. Research in France has focused on the application of speech recognition in ATC training systems, directed at issues both in speech recognition and in the application of task-domain grammar constraints.</p> <p>The USAF, USMC, US Army, and FAA are currently using ATC simulators with speech recognition from a number of different vendors, including UFA, Inc., and Adacel Systems Inc (ASI). This software uses speech recognition and synthetic speech to enable the trainee to control aircraft and ground vehicles in the simulation without the need for pseudo-pilots.</p> <p>Another approach to ATC simulation with speech recognition has been created by Supremis. The Supremis system is not constrained by the rigid grammars imposed by the underlying limitations of other recognition strategies.</p> <p><a name="Telephony_and_other_domains" id="Telephony_and_other_domains"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Telephony and other domains</span></span></h3> <p>ASR is now commonplace in the field of telephony, and is becoming more widespread in the field of computer gaming and simulation. Despite the high level of integration with word processing in general personal computing, however, ASR in the field of document production has not seen the expected increases in use.</p> <p>The improvement of mobile processor speeds has made speech-enabled Symbian and Windows Mobile smartphones feasible. Current speech-to-text programs are too large and require too much CPU power to be practical for the Pocket PC. Speech is used mostly as part of the user interface, for creating pre-defined or custom speech commands. 
Leading software vendors in this field are: Microsoft Corporation (Microsoft Voice Command); Nuance Communications (Nuance Voice Control); Vito Technology (VITO Voice2Go); Speereo Software (Speereo Voice Translator).</p> <p><a name="People_with_Disabilities" id="People_with_Disabilities"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">People with Disabilities</span></span></h3> <p>People with disabilities are another part of the population that benefits from speech recognition programs. It is especially useful for people who have difficulty with or are unable to use their hands, whether from mild repetitive stress injuries or from severe disabilities that require alternative input for access to the computer. In fact, people who used the keyboard a lot and developed RSI became an urgent early market for speech recognition. Speech recognition is used in deaf telephony, such as SpinVox voice-to-text voicemail, relay services, and captioned telephone. Individuals with learning disabilities who have problems with thought-to-paper communication (essentially, they think of an idea but it is processed incorrectly, causing it to end up differently on paper) can benefit from the software, as it helps to compensate for that weakness.</p><p><a name="Further_applications" id="Further_applications"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Further applications</span></span></h3> <ul><li>Automatic translation</li><li>Automotive speech recognition (e.g., Ford Sync)</li><li>Telematics (e.g. 
vehicle Navigation Systems)</li><li>Court reporting (Realtime Voice Writing)</li><li><a href="http://en.wikipedia.org/wiki/Hands-free_computing" title="Hands-free computing">Hands-free computing</a>: voice command recognition computer <a href="http://en.wikipedia.org/wiki/User_interface" title="User interface">user interface</a></li><li><a href="http://en.wikipedia.org/wiki/Home_automation" title="Home automation">Home automation</a></li><li><a href="http://en.wikipedia.org/wiki/Interactive_voice_response" title="Interactive voice response">Interactive voice response</a></li><li><a href="http://en.wikipedia.org/wiki/Mobile_telephony" title="Mobile telephony">Mobile telephony</a>, including mobile email</li><li><a href="http://en.wikipedia.org/wiki/Multimodal_interaction" title="Multimodal interaction">Multimodal interaction</a></li><li><a href="http://en.wikipedia.org/wiki/Pronunciation" title="Pronunciation">Pronunciation</a> evaluation in computer-aided language learning applications</li><li><a href="http://en.wikipedia.org/wiki/Robotics" title="Robotics">Robotics</a></li><li><a href="http://en.wikipedia.org/wiki/Video_Games" title="Video Games" class="mw-redirect">Video Games</a>, possible expansion into the RTS genre following <a href="http://en.wikipedia.org/wiki/Tom_Clancy%27s_EndWar" title="Tom Clancy's EndWar">Tom Clancy's EndWar</a></li><li><a href="http://en.wikipedia.org/wiki/Transcription_%28linguistics%29" title="Transcription (linguistics)">Transcription</a> (digital speech-to-text)</li><li>Speech-to-Text (transcription of speech into mobile text messages), e.g. <a href="http://en.wikipedia.org/wiki/SpinVox" title="SpinVox">SpinVox</a></li><li><a href="http://supremis.co.uk/" class="external text" title="http://supremis.co.uk" rel="nofollow">Air Traffic Control Speech Recognition</a></li></ul><br /><p><a name="Performance_of_speech_recognition_systems" id="Performance_of_speech_recognition_systems"></a></p> <h2><span class="editsection"></span><span 
style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Performance of speech recognition systems</span></span></h2> <p>The performance of speech recognition systems is usually specified in terms of accuracy and speed. Accuracy is usually rated in terms of <a href="http://en.wikipedia.org/wiki/Word_error_rate" title="Word error rate">word error rate</a> (WER), whereas speed is measured with the real-time factor. Other measures of accuracy include Single Word Error Rate (SWER) and Command Success Rate (CSR).</p> <p>Most speech recognition users would agree that dictation machines can achieve very high performance in controlled conditions. There is some confusion, however, over the interchangeability of the terms "speech recognition" and "dictation".</p> <p>Commercially available speaker-dependent dictation systems usually require only a short period of training (sometimes also called "enrollment") and may successfully capture continuous speech with a large vocabulary at normal pace with very high accuracy. Most commercial companies claim that recognition software can achieve between 98% and 99% accuracy if operated under optimal conditions. "Optimal conditions" usually assume that users:</p> <ul><li>have speech characteristics which match the training data,</li><li>can achieve proper speaker adaptation, and</li><li>work in a low-noise environment (e.g. a quiet office or laboratory space).</li></ul> <p>This explains why some users, especially those whose speech is heavily accented, might achieve recognition rates much lower than expected. Speech recognition in video has become a popular search technology used by several video search companies.</p> <p>Limited-vocabulary systems, requiring no training, can recognize a small number of words (for instance, the ten digits) as spoken by most speakers. 
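The word error rate used above to quantify accuracy is simply the word-level Levenshtein (edit) distance between the recognizer's output and a reference transcript, divided by the length of the reference. A minimal illustrative sketch in Python (not any particular toolkit's scoring tool):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    computed as Levenshtein distance over whole words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / len(ref)

# One substitution ("two" -> "to") out of five reference words:
print(word_error_rate("set heading two seven zero",
                      "set heading to seven zero"))  # → 0.2
```

Real evaluations align the two word sequences the same way but report substitutions, insertions and deletions separately as well.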
Such systems are popular for routing incoming phone calls to their destinations in large organizations.</p> <p>Both acoustic modeling and language modeling are important parts of modern statistically-based speech recognition algorithms. Hidden Markov models (HMMs) are widely used in many systems. Language modeling has many other applications, such as smart keyboards and document classification.</p> <p><a name="Hidden_Markov_model_.28HMM.29-based_speech_recognition" id="Hidden_Markov_model_.28HMM.29-based_speech_recognition"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Hidden Markov model (HMM)-based speech recognition</span></span></h3> <p>Modern general-purpose speech recognition systems are generally based on hidden Markov models. These are statistical models which output a sequence of symbols or quantities. One reason why HMMs are used in speech recognition is that a speech signal can be viewed as a piecewise stationary signal, or a short-time stationary signal: over a short time-scale, on the order of 10 milliseconds, speech can be approximated as a stationary process. Speech can thus be modelled as a Markov process for many practical purposes.</p> <p>Another reason why HMMs are popular is that they can be trained automatically and are simple and computationally feasible to use. In speech recognition, the hidden Markov model would output a sequence of <i>n</i>-dimensional real-valued vectors (with <i>n</i> being a small integer, such as 10), outputting one of these every 10 milliseconds. The vectors would consist of cepstral coefficients, which are obtained by taking a Fourier transform of a short time window of speech, decorrelating the spectrum using a cosine transform, and then taking the first (most significant) coefficients. 
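The front end just described, a short-window Fourier transform followed by log compression, a cosine transform, and truncation, can be sketched naively in plain Python. This is a deliberately simplified illustration: real recognizers add windowing functions, mel-scale filterbanks, and fast FFT libraries, all omitted here:

```python
import math

def cepstral_coefficients(frame, n_coeffs=5):
    """Toy cepstral analysis of one short speech frame.

    Mirrors the text: Fourier transform -> log magnitude spectrum
    -> cosine transform -> keep the first (most significant) terms.
    """
    n = len(frame)
    # Naive discrete Fourier transform magnitudes (first half of spectrum).
    mags = []
    for k in range(n // 2):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    # Log compression of the spectrum (small floor avoids log of zero).
    log_spec = [math.log(m + 1e-10) for m in mags]
    # DCT-II decorrelates the log spectrum; keep the first n_coeffs terms.
    m = len(log_spec)
    return [sum(log_spec[j] * math.cos(math.pi * i * (j + 0.5) / m)
                for j in range(m))
            for i in range(n_coeffs)]

# Example: a toy 16-sample "frame" containing a single sinusoid.
frame = [math.sin(2 * math.pi * 3 * t / 16) for t in range(16)]
print(cepstral_coefficients(frame))
```

Each 10 ms frame of speech would be reduced to a short vector like this, and the sequence of such vectors is what the HMM models.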
The hidden Markov model will tend to have in each state a statistical distribution that is a mixture of diagonal covariance Gaussians which will give a likelihood for each observed vector. Each word, or (for more general speech recognition systems), each phoneme, will have a different output distribution; a hidden Markov model for a sequence of words or phonemes is made by concatenating the individual trained hidden Markov models for the separate words and phonemes.</p> <p>Described above are the core elements of the most common, HMM-based approach to speech recognition. Modern speech recognition systems use various combinations of a number of standard techniques in order to improve results over the basic approach described above. A typical large-vocabulary system would need context dependency for the phonemes (so phonemes with different left and right context have different realizations as HMM states); it would use cepstral normalization to normalize for different speaker and recording conditions; for further speaker normalization it might use vocal tract length normalization (VTLN) for male-female normalization and maximum likelihood linear regression (MLLR) for more general speaker adaptation. The features would have so-called delta and delta-delta coefficients to capture speech dynamics and in addition might use heteroscedastic linear discriminant analysis (HLDA); or might skip the delta and delta-delta coefficients and use splicing and an LDA-based projection followed perhaps by heteroscedastic linear discriminant analysis or a global semitied covariance transform (also known as maximum likelihood linear transform, or MLLT). Many systems use so-called discriminative training techniques which dispense with a purely statistical approach to HMM parameter estimation and instead optimize some classification-related measure of the training data. 
Examples are maximum mutual information (MMI), minimum classification error (MCE) and minimum phone error (MPE).</p> <p>Decoding of the speech (the term for what happens when the system is presented with a new utterance and must compute the most likely source sentence) would probably use the Viterbi algorithm to find the best path, and here there is a choice between dynamically creating a combination hidden Markov model which includes both the acoustic and language model information, or combining it statically beforehand (the finite state transducer, or FST, approach).</p> <p><a name="Dynamic_time_warping_.28DTW.29-based_speech_recognition" id="Dynamic_time_warping_.28DTW.29-based_speech_recognition"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Dynamic time warping (DTW)-based speech recognition</span></span></h3> <p>Dynamic time warping is an approach that was historically used for speech recognition but has now largely been displaced by the more successful HMM-based approach. Dynamic time warping is an algorithm for measuring similarity between two sequences which may vary in time or speed. For instance, similarities in walking patterns would be detected, even if in one video the person was walking slowly and if in another they were walking more quickly, or even if there were accelerations and decelerations during the course of one observation. DTW has been applied to video, audio, and graphics – indeed, any data which can be turned into a linear representation can be analyzed with DTW.</p> <p>A well known application has been automatic speech recognition, to cope with different speaking speeds. In general, it is a method that allows a computer to find an optimal match between two given sequences (e.g. time series) with certain restrictions, i.e. the sequences are "warped" non-linearly to match each other. 
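A minimal sketch of the dynamic time warping idea, assuming a simple absolute-difference cost between scalar features (real systems compare vectors of acoustic features):

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping cost between two sequences that may vary
    in speed: elements of one sequence are non-linearly aligned to
    one or more elements of the other."""
    INF = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i+1] with b[:j+1]
    cost = [[INF] * len(b) for _ in range(len(a))]
    cost[0][0] = dist(a[0], b[0])
    for i in range(len(a)):
        for j in range(len(b)):
            if i == 0 and j == 0:
                continue
            best = min(cost[i - 1][j] if i else INF,            # stretch a
                       cost[i][j - 1] if j else INF,            # stretch b
                       cost[i - 1][j - 1] if i and j else INF)  # advance both
            cost[i][j] = dist(a[i], b[j]) + best
    return cost[-1][-1]

# The same "utterance" spoken slowly and quickly still matches exactly:
slow = [1, 1, 2, 2, 3, 3, 4]
fast = [1, 2, 3, 4]
print(dtw_distance(slow, fast))  # → 0 (a perfect warp exists)
```

The dynamic-programming table here plays the same role as the trellis searched by the Viterbi algorithm in HMM decoding, which is one reason the two approaches are closely related.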
This sequence alignment method is often used in the context of hidden Markov models.</p><p><br /></p> <p><a name="Further_information" id="Further_information"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Further information</span></span></h2> <p>Popular speech recognition conferences held each year or two include ICASSP, Eurospeech/ICSLP (now named Interspeech) and the IEEE ASRU. Conferences in the field of Natural Language Processing, such as ACL, NAACL, EMNLP, and HLT, are beginning to include papers on speech processing. Important journals include the IEEE Transactions on Speech and Audio Processing (now named IEEE Transactions on Audio, Speech and Language Processing), Computer Speech and Language, and Speech Communication. Books like "Fundamentals of Speech Recognition" by Lawrence Rabiner (1993) are useful for acquiring basic knowledge, though they may not be fully up to date. Other good sources are "Statistical Methods for Speech Recognition" by Frederick Jelinek and "Spoken Language Processing" (2001) by Xuedong Huang et al. Even more up to date is "Computer Speech" by Manfred R. Schroeder, second edition published in 2004. A good insight into the techniques used in the best modern systems can be gained by paying attention to government-sponsored evaluations such as those organised by DARPA (the largest speech recognition-related project ongoing as of 2007 is the GALE project, which involves both speech recognition and translation components).</p> <p>In terms of freely available resources, the HTK book (and the accompanying HTK toolkit) is one place to start, both to learn about speech recognition and to start experimenting. Another such resource is Carnegie Mellon University's SPHINX toolkit. 
The AT&T GRM and DCD libraries are also general software libraries for large-vocabulary speech recognition.</p> <p>A useful review of the area of robustness in ASR is provided by Junqua and Haton (1995).</p><p><br /></p><p><a name="Commercial_software.2Fmiddleware" id="Commercial_software.2Fmiddleware"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Commercial software/middleware</span></span></h2> <p>Apart from over-the-counter dictation software, speech recognition is mostly embedded/integrated in other software or hardware, which is why even the main players are usually not known to the general public. Some of the major makers are (listed with the brand names of their proprietary speech recognition software/engines):</p> <ul><li><a href="http://en.wikipedia.org/w/index.php?title=Voice_on_the_Go&action=edit&redlink=1" class="new" title="Voice on the Go (page does not exist)">Voice on the Go</a>: available in 8 language versions</li><li><a href="http://en.wikipedia.org/wiki/SpinVox" title="SpinVox">SpinVox</a>: converts speech to text</li><li><a href="http://en.wikipedia.org/wiki/Asahi_Kasei" title="Asahi Kasei">Asahi Kasei</a>: Vorero</li><li><a href="http://en.wikipedia.org/wiki/IBM" title="IBM">IBM</a>: WebSphere Voice Server</li><li><a href="http://en.wikipedia.org/w/index.php?title=Loquendo&action=edit&redlink=1" class="new" title="Loquendo (page does not exist)">Loquendo</a></li><li><a href="http://en.wikipedia.org/wiki/Microsoft" title="Microsoft">Microsoft</a>: Microsoft Speech Server</li><li><a href="http://en.wikipedia.org/wiki/Nuance" title="Nuance">Nuance</a>: VoCon</li><li><a href="http://en.wikipedia.org/wiki/VoiceBox" title="VoiceBox" class="mw-redirect">VoiceBox</a> (the VoiceBox system is built around a licenced VoCon engine from Nuance)</li><li><a 
href="http://en.wikipedia.org/w/index.php?title=MScriber&action=edit&redlink=1" class="new" title="MScriber (page does not exist)"></a>mScriber: Indian Language Speech Recognition Solutions</li><li><a href="http://en.wikipedia.org/w/index.php?title=Vangard_Voice&action=edit&redlink=1" class="new" title="Vangard Voice (page does not exist)"></a>Vangard Voice: Voice-enable existing and new mobile applications</li></ul>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com0tag:blogger.com,1999:blog-226627663258700835.post-6879858902845333482009-03-09T08:40:00.000-07:002009-03-09T08:44:49.208-07:00Computer vision<!-- start content --> <p><b>Computer vision</b> is the science and technology of machines that see. As a scientific discipline, computer vision is concerned with the theory for building artificial systems that obtain information from images. The image data can take many forms, such as a video sequence, views from multiple cameras, or multi-dimensional data from a medical scanner.</p> <p>As a technological discipline, computer vision seeks to apply the theories and models of computer vision to the construction of computer vision systems. Examples of applications of computer vision systems include systems for:</p> <ul><li>Controlling processes (e.g. an industrial robot or an autonomous vehicle).</li><li>Detecting events (e.g. for visual surveillance or people counting).</li><li>Organizing information (e.g. for indexing databases of images and image sequences).</li><li>Modeling objects or environments (e.g. industrial inspection, medical image analysis or topographical modeling).</li><li>Interaction (e.g. as the input to a device for computer-human interaction).</li></ul> <p>Computer vision can also be described as a complement (but not necessarily the opposite) of biological vision. 
In biological vision, the visual perception of humans and various animals is studied, resulting in models of how these systems operate in terms of physiological processes. Computer vision, on the other hand, studies and describes artificial vision systems that are implemented in software and/or hardware. Interdisciplinary exchange between biological and computer vision has proven increasingly fruitful for both fields.</p> <p>Sub-domains of computer vision include scene reconstruction, event detection, tracking, object recognition, learning, indexing, motion estimation, and image restoration.</p><p><br /></p><p><br /></p><script type="text/javascript">//<![CDATA[ if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } //]]> </script> <p><a name="State_of_the_art" id="State_of_the_art"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">State of the art</span></span></h2> <p>The field of computer vision can be characterized as immature and diverse. Even though earlier work exists, it was not until the late 1970s that a more focused study of the field started, when computers could manage the processing of large data sets such as images. However, these studies usually originated from various other fields, and consequently there is no standard formulation of "the computer vision problem." Also, and to an even larger extent, there is no standard formulation of how computer vision problems should be solved. Instead, there exists an abundance of methods for solving various well-defined computer vision tasks, where the methods are often very task-specific and can seldom be generalized over a wide range of applications. 
Many of the methods and applications are still at the stage of basic research, but more and more methods have found their way into commercial products, where they often constitute a part of a larger system which can solve complex tasks (e.g., in the area of medical images, or quality control and measurements in industrial processes). In most practical computer vision applications, the computers are pre-programmed to solve a particular task, but methods based on learning are now becoming increasingly common.</p><p><br /></p> <p><a name="Related_fields" id="Related_fields"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Related fields</span></span></h2> <div class="thumb tright"> <div class="thumbinner" style="width: 501px;"><a href="http://en.wikipedia.org/wiki/File:CVoverview2.jpg" class="image" title="Relation between computer vision and various other fields"><img alt="" src="http://upload.wikimedia.org/wikipedia/en/d/d3/CVoverview2.jpg" class="thumbimage" width="499" border="0" height="374" /></a> <div class="thumbcaption">Relation between computer vision and various other fields</div> </div> </div> <p>A significant part of artificial intelligence deals with autonomous planning or deliberation for systems which can perform mechanical actions such as moving a robot through some environment. This type of processing typically needs input data provided by a computer vision system, acting as a vision sensor and providing high-level information about the environment and the robot. Other areas that are sometimes described as belonging to artificial intelligence, and that are used in relation to computer vision, are pattern recognition and learning techniques. 
As a consequence, computer vision is sometimes seen as a part of the artificial intelligence field or the computer science field in general.</p> <p><a href="http://en.wikipedia.org/wiki/Physics" title="Physics"></a>Physics is another field that is strongly related to computer vision. A significant part of computer vision deals with methods which require a thorough understanding of the process in which electromagnetic radiation, typically in the visible or the infra-red range, is reflected by the surfaces of objects and finally is measured by the image sensor to produce the image data. This process is based on optics and solid-state physics. More sophisticated image sensors even require quantum mechanics to provide a complete comprehension of the image formation process. Also, various measurement problems in physics can be addressed using computer vision, for example motion in fluids. Consequently, computer vision can also be seen as an extension of physics.</p> <p>A third field which plays an important role is neurobiology, specifically the study of the biological vision system. Over the last century, there has been an extensive study of eyes, neurons, and the brain structures devoted to processing of visual stimuli in both humans and various animals. This has led to a coarse, yet complicated, description of how "real" vision systems operate in order to solve certain vision related tasks. These results have led to a subfield within computer vision where artificial systems are designed to mimic the processing and behaviour of biological systems, at different levels of complexity. Also, some of the learning-based methods developed within computer vision have their background in biology.</p> <p>Yet another field related to computer vision is signal processing. Many methods for processing of one-variable signals, typically temporal signals, can be extended in a natural way to processing of two-variable signals or multi-variable signals in computer vision. 
However, because of the specific nature of images, there are many methods developed within computer vision which have no counterpart in the processing of one-variable signals. A distinguishing characteristic of these methods is that they are non-linear, which, together with the multi-dimensionality of the signal, defines a subfield of signal processing as a part of computer vision.</p> <p>Besides the above-mentioned views on computer vision, many of the related research topics can also be studied from a purely mathematical point of view. For example, many methods in computer vision are based on statistics, optimization or geometry. Finally, a significant part of the field is devoted to the implementation aspect of computer vision: how existing methods can be realized in various combinations of software and hardware, or how these methods can be modified in order to gain processing speed without losing too much performance.</p> <p>The fields most closely related to computer vision are image processing, image analysis, robot vision and machine vision. There is a significant overlap in terms of what techniques and applications they cover. This implies that the basic techniques that are used and developed in these fields are more or less identical, which can be interpreted as meaning that there is really only one field going under different names. 
On the other hand, it appears to be necessary for research groups, scientific journals, conferences and companies to present or market themselves as belonging specifically to one of these fields and, hence, various characterizations which distinguish each of the fields from the others have been presented.</p> <p style="font-weight: bold; color: rgb(153, 0, 0);">The following characterizations appear relevant but should not be taken as universally accepted:</p> <ul><li><a href="http://en.wikipedia.org/wiki/Image_processing" title="Image processing">Image processing</a> and image analysis tend to focus on 2D images: how to transform one image to another, e.g., by pixel-wise operations such as contrast enhancement, local operations such as edge extraction or noise removal, or geometrical transformations such as rotating the image. This characterization implies that image processing/analysis neither requires assumptions nor produces interpretations about the image content.</li></ul> <ul><li>Computer vision tends to focus on the 3D scene projected onto one or several images, e.g., how to reconstruct structure or other information about the 3D scene from one or several images. Computer vision often relies on more or less complex assumptions about the scene depicted in an image.</li></ul> <ul><li><a href="http://en.wikipedia.org/wiki/Machine_vision" title="Machine vision">Machine vision</a> tends to focus on applications, mainly in industry, e.g., vision-based autonomous robots and systems for vision-based inspection or measurement. This implies that image sensor technologies and control theory are often integrated with the processing of image data to control a robot, and that real-time processing is emphasized by means of efficient implementations in hardware and software. 
It also implies that the external conditions such as lighting can be and are often more controlled in machine vision than they are in general computer vision, which can enable the use of different algorithms.</li></ul> <ul><li>There is also a field called imaging which primarily focuses on the process of producing images, but sometimes also deals with processing and analysis of images. For example, medical imaging includes a great deal of work on the analysis of image data in medical applications.</li></ul> <ul><li>Finally, pattern recognition is a field which uses various methods to extract information from signals in general, mainly based on statistical approaches. A significant part of this field is devoted to applying these methods to image data.</li></ul> <p>A consequence of this state of affairs is that you can be working in a lab related to one of these fields, apply methods from a second field to solve a problem in a third field, and present the result at a conference related to a fourth field.</p><p><br /></p> <p><a name="Applications_for_computer_vision" id="Applications_for_computer_vision"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Applications for computer vision</span></span></h2> <p>One of the most prominent application fields is medical computer vision or medical image processing. This area is characterized by the extraction of information from image data for the purpose of making a medical diagnosis of a patient. Generally, image data is in the form of microscopy images, X-ray images, angiography images, ultrasonic images, and tomography images. An example of information which can be extracted from such image data is the detection of tumours, arteriosclerosis or other malignant changes. Another example is measurement of organ dimensions, blood flow, etc. 
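As a deliberately simple illustration of extracting a measurement from image data, the sketch below thresholds a toy grayscale "scan" (a list of pixel rows) and measures the area and bounding box of the bright region. Real medical image analysis uses far more sophisticated segmentation, but the principle, turning pixels into a quantitative measurement, is the same:

```python
def measure_bright_region(image, threshold):
    """Threshold a grayscale image (list of rows of intensities) and
    return the area (pixel count) and bounding box of the bright region."""
    pixels = [(r, c)
              for r, row in enumerate(image)
              for c, v in enumerate(row) if v >= threshold]
    if not pixels:
        return 0, None
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    bbox = (min(rows), min(cols), max(rows), max(cols))
    return len(pixels), bbox

# A 4x4 toy "scan" with one bright 2x2 structure in the middle:
scan = [[10, 12, 11, 10],
        [10, 200, 210, 11],
        [12, 190, 205, 10],
        [11, 10, 12, 10]]
print(measure_bright_region(scan, 128))  # → (4, (1, 1, 2, 2))
```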
This application area also supports medical research by providing new information, e.g., about the structure of the brain, or about the quality of medical treatments.</p> <p>A second application area of computer vision is in industry. Here, information is extracted for the purpose of supporting a manufacturing process. One example is quality control, where parts or final products are automatically inspected in order to find defects. Another example is measurement of the position and orientation of parts to be picked up by a robot arm.</p> <p>Military applications are probably one of the largest areas of computer vision. The obvious examples are detection of enemy soldiers or vehicles and missile guidance. More advanced systems for missile guidance send the missile to an area rather than a specific target, and target selection is made when the missile reaches the area, based on locally acquired image data. Modern military concepts, such as "battlefield awareness", imply that various sensors, including image sensors, provide a rich set of information about a combat scene which can be used to support strategic decisions. In this case, automatic processing of the data is used to reduce complexity and to fuse information from multiple sensors to increase reliability.</p> <div class="thumb tright"> <div class="thumbinner" style="width: 202px;"><a href="http://en.wikipedia.org/wiki/File:NASA_Mars_Rover.jpg" class="image" title="Artist's Concept of Rover on Mars, an example of an unmanned land-based vehicle. Notice the stereo cameras mounted on top of the Rover. (credit: Maas Digital LLC)"><img alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/d/d8/NASA_Mars_Rover.jpg/200px-NASA_Mars_Rover.jpg" class="thumbimage" width="200" border="0" height="160" /></a> <div class="thumbcaption"> Artist's Concept of Rover on Mars, an example of an unmanned land-based vehicle. Notice the stereo cameras mounted on top of the Rover. 
(credit: Maas Digital LLC) </div> </div> </div> <p>One of the newer application areas is autonomous vehicles, which include submersibles, land-based vehicles (small robots with wheels, cars or trucks), aerial vehicles, and unmanned aerial vehicles (UAV). The level of autonomy ranges from fully autonomous (unmanned) vehicles to vehicles where computer vision based systems support a driver or a pilot in various situations. Fully autonomous vehicles typically use computer vision for navigation, i.e. for knowing where they are, for producing a map of the environment (SLAM), and for detecting obstacles. Computer vision can also be used for detecting certain task specific events, e. g., a UAV looking for forest fires. Examples of supporting systems are obstacle warning systems in cars, and systems for autonomous landing of aircraft. Several car manufacturers have demonstrated systems for autonomous driving of cars, but this technology has not yet reached a level where it can be put on the market. There are ample examples of military autonomous vehicles ranging from advanced missiles, to UAVs for reconnaissance missions or missile guidance. Space exploration is already being carried out with autonomous vehicles using computer vision, e. 
g., NASA's Mars Exploration Rover.</p> <p style="font-weight: bold; color: rgb(153, 0, 0);">Other application areas include:</p> <ul><li>Support of visual effects creation for cinema and broadcast, e.g., camera tracking (matchmoving).</li><li><a href="http://en.wikipedia.org/wiki/Surveillance" title="Surveillance">Surveillance</a>.</li></ul><br /><p><a name="Typical_tasks_of_computer_vision" id="Typical_tasks_of_computer_vision"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Typical tasks of computer vision</span></span></h2> <p>Each of the application areas described above employs a range of computer vision tasks: more or less well-defined measurement problems or processing problems, which can be solved using a variety of methods. Some examples of typical computer vision tasks are presented below.</p> <p><a name="Recognition" id="Recognition"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Recognition</span></span></h3> <p>The classical problem in computer vision, image processing and machine vision is that of determining whether or not the image data contains some specific object, feature, or activity. This task can normally be solved robustly and without effort by a human, but is still not satisfactorily solved in computer vision for the general case: arbitrary objects in arbitrary situations. 
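As a toy illustration of the gap, the sketch below implements brute-force template matching in Python with NumPy (all data and names here are invented for illustration, not taken from any particular system): it can locate an exact copy of a known patch in an image, but nothing more general.

```python
import numpy as np

# Toy recognition-as-detection: slide a small template over every position
# of a synthetic image and score each position (illustrative data only).
rng = np.random.default_rng(1)
image = rng.random((32, 32))
template = image[10:14, 18:22].copy()  # plant the "object" we search for

def match(image, template):
    th, tw = template.shape
    best, best_pos = np.inf, None
    for y in range(image.shape[0] - th + 1):
        for x in range(image.shape[1] - tw + 1):
            # Sum of squared differences: 0 means a perfect match.
            ssd = np.sum((image[y:y + th, x:x + tw] - template) ** 2)
            if ssd < best:
                best, best_pos = ssd, (y, x)
    return best_pos, best

pos, score = match(image, template)
print(pos)    # (10, 18)
print(score)  # 0.0 for an exact copy
```

Any change in illumination, scale or pose would break this matcher, which is exactly why the general recognition problem remains hard.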
The existing methods for dealing with this problem can at best solve it only for specific objects, such as simple geometric objects (e.g., polyhedrons), human faces, printed or hand-written characters, or vehicles, and in specific situations, typically described in terms of well-defined illumination, background, and pose of the object relative to the camera.</p> <p style="font-weight: bold; color: rgb(153, 0, 0);">Different varieties of the recognition problem are described in the literature:</p> <ul><li><b>Recognition:</b> one or several pre-specified or learned objects or object classes can be recognized, usually together with their 2D positions in the image or 3D poses in the scene.</li><li><b>Identification:</b> an individual instance of an object is recognized. Examples: identification of a specific person's face or fingerprint, or identification of a specific vehicle.</li><li><b>Detection:</b> the image data is scanned for a specific condition. Examples: detection of possible abnormal cells or tissues in medical images or detection of a vehicle in an automatic road toll system. Detection based on relatively simple and fast computations is sometimes used for finding smaller regions of interesting image data which can be further analyzed by more computationally demanding techniques to produce a correct interpretation.</li></ul> <p style="font-weight: bold; color: rgb(153, 0, 0);">Several specialized tasks based on recognition exist, such as:</p> <ul><li><b><a href="http://en.wikipedia.org/wiki/Content-based_image_retrieval" title="Content-based image retrieval">Content-based image retrieval:</a></b> finding all images in a larger set of images which have a specific content. 
The content can be specified in different ways, for example in terms of similarity relative to a target image (give me all images similar to image X), or in terms of high-level search criteria given as text input (give me all images which contain many houses, are taken during winter, and have no cars in them).</li><li><b><a href="http://en.wikipedia.org/wiki/Pose_%28computer_vision%29" title="Pose (computer vision)">Pose estimation:</a></b> estimating the position or orientation of a specific object relative to the camera. An example application for this technique would be assisting a robot arm in retrieving objects from a conveyor belt in an assembly line situation.</li><li><b><a href="http://en.wikipedia.org/wiki/Optical_character_recognition" title="Optical character recognition">Optical character recognition</a> (or OCR):</b> identifying characters in images of printed or handwritten text, usually with a view to encoding the text in a format more amenable to editing or indexing (e.g. ASCII).</li></ul> <p><a name="Motion" id="Motion"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Motion</span></span></h3> <p>Several tasks relate to motion estimation, in which an image sequence is processed to produce an estimate of the velocity either at each point in the image or in the 3D scene. Examples of such tasks are:</p> <ul><li><b>Egomotion:</b> determining the 3D rigid motion of the camera.</li><li><b><a href="http://en.wikipedia.org/wiki/Video_tracking" title="Video tracking">Tracking:</a></b> following the movements of objects (e.g. 
vehicles or humans).</li></ul> <p><a name="Scene_reconstruction" id="Scene_reconstruction"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Scene reconstruction</span></span></h3> <p>Given one or (typically) more images of a scene, or a video, scene reconstruction aims at computing a 3D model of the scene. In the simplest case the model can be a set of 3D points. More sophisticated methods produce a complete 3D surface model.</p> <p><a name="Image_restoration" id="Image_restoration"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Image restoration</span></span></h3> <p>The aim of image restoration is the removal of noise (sensor noise, motion blur, etc.) from images. The simplest approaches to noise removal are various types of filters, such as low-pass filters or median filters. More sophisticated methods assume a model of what the local image structures look like, a model which distinguishes them from noise. By first analysing the image data in terms of local image structures, such as lines or edges, and then controlling the filtering based on local information from the analysis step, a better level of noise removal is usually obtained compared to the simpler approaches.</p><p><br /></p> <p><a name="Computer_vision_systems" id="Computer_vision_systems"></a><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span style="font-weight: bold;" class="mw-headline">Computer vision systems</span></span></p> <p>The organization of a computer vision system is highly application dependent. 
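As a deliberately minimal illustration of such a system, the following Python/NumPy sketch runs a toy pipeline on synthetic data (every size and threshold below is an illustrative assumption, not from any real system): acquisition of a fake image, median-filter denoising, gradient-based edge features, threshold segmentation, and a crude size/position estimate.

```python
import numpy as np

# "Image acquisition": a synthetic bright square on a dark, noisy background.
rng = np.random.default_rng(0)
image = rng.normal(10, 2, size=(64, 64))
image[20:40, 20:40] += 100  # the "object"

# Pre-processing: 3x3 median filter for noise reduction.
def median_filter(img):
    padded = np.pad(img, 1, mode="edge")
    windows = [padded[i:i + img.shape[0], j:j + img.shape[1]]
               for i in range(3) for j in range(3)]
    return np.median(np.stack(windows), axis=0)

smooth = median_filter(image)

# Feature extraction: simple gradient-magnitude "edges".
gy, gx = np.gradient(smooth)
edges = np.hypot(gx, gy)

# Detection/segmentation: threshold bright pixels into object vs background.
mask = smooth > smooth.mean() + 2 * smooth.std()

# High-level processing: estimate object size and position.
ys, xs = np.nonzero(mask)
print("object pixels:", mask.sum())
print("centroid:", ys.mean(), xs.mean())
```

Real systems replace each stage with far more sophisticated methods, but the stage structure itself matches the typical functions listed below.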
Some systems are stand-alone applications which solve a specific measurement or detection problem, while others constitute a sub-system of a larger design which, for example, also contains sub-systems for control of mechanical actuators, planning, information databases, man-machine interfaces, etc. The specific implementation of a computer vision system also depends on whether its functionality is pre-specified or whether some part of it can be learned or modified during operation. There are, however, typical functions which are found in many computer vision systems.</p> <ul><li><b>Image acquisition:</b> A digital image is produced by one or several image sensors, which, besides various types of light-sensitive cameras, include range sensors, tomography devices, radar, ultra-sonic cameras, etc. Depending on the type of sensor, the resulting image data is an ordinary 2D image, a 3D volume, or an image sequence. The pixel values typically correspond to light intensity in one or several spectral bands (gray images or colour images), but can also be related to various physical measures, such as depth, absorption or reflectance of sonic or electromagnetic waves, or nuclear magnetic resonance.</li><li><b>Pre-processing:</b> Before a computer vision method can be applied to image data in order to extract some specific piece of information, it is usually necessary to process the data in order to ensure that it satisfies certain assumptions implied by the method. 
Examples are <ul><li>Re-sampling in order to ensure that the image coordinate system is correct.</li><li>Noise reduction in order to ensure that sensor noise does not introduce false information.</li><li>Contrast enhancement to ensure that relevant information can be detected.</li><li><a href="http://en.wikipedia.org/wiki/Scale-space" title="Scale-space" class="mw-redirect">Scale-space</a> representation to enhance image structures at locally appropriate scales.</li></ul> </li><li><b>Feature extraction:</b> Image features at various levels of complexity are extracted from the image data. Typical examples of such features are <ul><li>Lines, edges and ridges.</li><li>Localized interest points such as corners, blobs or points.</li></ul> </li></ul> <dl><dd>More complex features may be related to texture, shape or motion.</dd></dl> <ul><li><b>Detection/Segmentation</b>: At some point in the processing a decision is made about which image points or regions of the image are relevant for further processing. Examples are <ul><li>Selection of a specific set of interest points.</li><li>Segmentation of one or multiple image regions which contain a specific object of interest.</li></ul> </li><li><b>High-level processing:</b> At this step the input is typically a small set of data, for example a set of points or an image region which is assumed to contain a specific object. 
The remaining processing deals with, for example: <ul><li>Verification that the data satisfy model-based and application specific assumptions.</li><li>Estimation of application specific parameters, such as object pose or object size.</li><li>Classifying a detected object into different categories.</li></ul></li></ul>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com2tag:blogger.com,1999:blog-226627663258700835.post-27606724549082239052009-03-09T08:38:00.000-07:002009-03-09T08:39:52.523-07:00Machine perception<p>In computing, <b>machine perception</b> is the ability of computing machines to sense and interpret images, sounds, or other contents of their environments, or of the contents of stored media.</p> <p>Real-time perception of a machine's environment is useful in industrial processes, such as assembly, inspection, diagnosis, vehicle guidance, etc. Off-line perception of stored media is useful in medical and aerial photo interpretation, content-based indexing and retrieval of movies and images, etc.</p> <p style="font-weight: bold; color: rgb(153, 0, 0);">Machine perception includes:</p> <ul><li><a href="http://en.wikipedia.org/wiki/Machine_vision" title="Machine vision">Machine vision</a></li><li><a href="http://en.wikipedia.org/wiki/Machine_hearing" title="Machine hearing" class="mw-redirect">Machine hearing</a></li><li><a href="http://en.wikipedia.org/w/index.php?title=Machine_touch&action=edit&redlink=1" class="new" title="Machine touch (page does not exist)">Machine touch</a></li></ul>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com0tag:blogger.com,1999:blog-226627663258700835.post-10422431201707557872009-03-09T08:31:00.000-07:002009-03-09T08:38:31.888-07:00Natural language processing<p><b>Natural language processing</b> (<b>NLP</b>) is a field of computer science concerned with the interactions between computers and human (natural) languages. 
Natural language generation systems convert information from computer databases into readable human language. Natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate. Many problems within NLP apply to both generation and understanding; for example, a computer must be able to model morphology (the structure of words) in order to understand an English sentence, but a model of morphology is also needed for producing a grammatically correct English sentence.</p> <p>NLP has significant overlap with the field of computational linguistics, and is often considered a sub-field of artificial intelligence. The term natural language is used to distinguish human languages (such as Spanish, Swahili or Swedish) from formal or computer languages (such as C++, Java or LISP). Although NLP may encompass both text and speech, work on speech processing has evolved into a separate field.</p><p><br /></p><p><br /></p><script type="text/javascript">//<![CDATA[ if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } //]]> </script> <p><a name="Tasks_and_limitations" id="Tasks_and_limitations"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Tasks and limitations</span></span></h2> <p>In theory, natural-language processing is a very attractive method of human-computer interaction. 
Early systems such as SHRDLU, working in restricted "blocks worlds" with restricted vocabularies, worked extremely well, leading researchers to excessive optimism, which was soon lost when the systems were extended to more realistic situations with real-world ambiguity and complexity.</p> <p>Natural-language understanding is sometimes referred to as an AI-complete problem, because natural-language recognition seems to require extensive knowledge about the outside world and the ability to manipulate it. The definition of "understanding" is one of the major problems in natural-language processing.</p><p><br /></p> <p><a name="Subproblems" id="Subproblems"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Subproblems</span></span></h2> <dl><dt><a href="http://en.wikipedia.org/wiki/Speech_segmentation" title="Speech segmentation">Speech segmentation</a></dt><dd>In most spoken languages, the sounds representing successive letters blend into each other, so the conversion of the analog signal to discrete characters can be a very difficult process. 
Also, in natural speech there are hardly any pauses between successive words; locating those word boundaries usually must take into account grammatical and semantic constraints, as well as the context.</dd></dl> <dl><dt><a href="http://en.wikipedia.org/wiki/Text_segmentation" title="Text segmentation">Text segmentation</a></dt><dd>Some written languages like Chinese, Japanese and Thai do not mark word boundaries either, so any significant text parsing usually requires the identification of word boundaries, which is often a non-trivial task.</dd></dl> <dl><dt><a href="http://en.wikipedia.org/wiki/Part-of-speech_tagging" title="Part-of-speech tagging">Part-of-speech tagging</a></dt></dl> <dl><dt><a href="http://en.wikipedia.org/wiki/Word_sense_disambiguation" title="Word sense disambiguation" class="mw-redirect">Word sense disambiguation</a></dt><dd>Many words have more than one meaning; we have to select the meaning which makes the most sense in context.</dd></dl> <dl><dt><a href="http://en.wikipedia.org/wiki/Syntactic_ambiguity" title="Syntactic ambiguity">Syntactic ambiguity</a></dt><dd>The grammar for natural languages is ambiguous, i.e. there are often multiple possible parse trees for a given sentence. Choosing the most appropriate one usually requires semantic and contextual information. Specific problem components of syntactic ambiguity include sentence boundary disambiguation.</dd></dl> <dl><dt>Imperfect or irregular input</dt><dd>Foreign or regional accents and vocal impediments in speech; typing or grammatical errors, OCR errors in texts.</dd></dl> <dl><dt><a href="http://en.wikipedia.org/wiki/Speech_acts" title="Speech acts" class="mw-redirect">Speech acts</a> and plans</dt><dd>A sentence can often be considered an action by the speaker. The sentence structure alone may not contain enough information to define this action. For instance, a question is actually the speaker requesting some sort of response from the listener. 
The desired response may be verbal, physical, or some combination. For example, "Can you pass the class?" is a request for a simple yes-or-no answer, while "Can you pass the salt?" is requesting a physical action to be performed. It is not appropriate to respond with "Yes, I can pass the salt," without the accompanying action (although "No" or "I can't reach the salt" would explain a lack of action).</dd></dl><br /><p><a name="Statistical_NLP" id="Statistical_NLP"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Statistical NLP</span></span></h2> <p>Statistical natural-language processing uses stochastic, probabilistic and statistical methods to resolve some of the difficulties discussed above, especially those which arise because longer sentences are highly ambiguous when processed with realistic grammars, yielding thousands or millions of possible analyses. Methods for disambiguation often involve the use of corpora and Markov models. Statistical NLP comprises all quantitative approaches to automated language processing, including probabilistic modeling, information theory, and linear algebra. 
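To make the role of corpora and Markov models concrete, here is a minimal bigram language model in Python, estimated from a made-up toy corpus (an illustrative sketch: there is no smoothing, so any unseen word pair gets probability zero, whereas real systems smooth these estimates).

```python
from collections import Counter

# Toy corpus standing in for a real corpus (invented for illustration).
corpus = ("the dog saw the cat . the cat saw the dog . "
          "the dog ran . the cat ran .").split()

# Estimate bigram probabilities P(w2 | w1) from raw counts.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def prob(sentence):
    """Probability of a word sequence under the bigram Markov model."""
    words = sentence.split()
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        p *= bigrams[(w1, w2)] / unigrams[w1]
    return p

# The model prefers word orders it has seen evidence for.
print(prob("the dog saw the cat"))   # > 0 (all transitions attested)
print(prob("the saw dog cat the"))   # 0.0 — an unseen transition
```

Ranking candidate analyses by such probabilities is the basic mechanism behind statistical disambiguation.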
The technology for statistical NLP comes mainly from machine learning and data mining, both of which are fields of artificial intelligence that involve learning from data.</p><p><br /></p> <p><a name="Major_tasks_in_NLP" id="Major_tasks_in_NLP"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Major tasks in NLP</span></span></h2> <ul><li><a href="http://en.wikipedia.org/wiki/Automatic_summarization" title="Automatic summarization">Automatic summarization</a></li><li>Foreign language reading aid</li><li>Foreign language writing aid</li><li><a href="http://en.wikipedia.org/wiki/Information_extraction" title="Information extraction">Information extraction</a></li><li><a href="http://en.wikipedia.org/wiki/Information_retrieval" title="Information retrieval">Information retrieval</a> (IR) - IR is concerned with storing, searching and retrieving information. It is a separate field within computer science (closer to databases), but IR relies on some NLP methods (for example, stemming). Some current research and applications seek to bridge the gap between IR and NLP.</li><li><a href="http://en.wikipedia.org/wiki/Machine_translation" title="Machine translation">Machine translation</a> - Automatically translating from one human language to another.</li><li><a href="http://en.wikipedia.org/wiki/Named_entity_recognition" title="Named entity recognition">Named entity recognition</a> (NER) - Given a stream of text, determining which items in the text map to proper names, such as people or places. 
Although named entities in English are marked with capitalized words, many other languages do not use capitalization to distinguish named entities.</li><li><a href="http://en.wikipedia.org/wiki/Natural_language_generation" title="Natural language generation">Natural language generation</a></li><li>Natural language understanding</li><li>Optical character recognition</li><li><a href="http://en.wikipedia.org/wiki/Anaphora_resolution" title="Anaphora resolution" class="mw-redirect">Anaphora resolution</a></li><li><a href="http://en.wikipedia.org/wiki/Question_answering" title="Question answering">Question answering</a> - Given a human-language question, the task of producing a human-language answer. The question may be closed-ended (such as "What is the capital of Canada?") or open-ended (such as "What is the meaning of life?").</li><li><a href="http://en.wikipedia.org/wiki/Speech_recognition" title="Speech recognition">Speech recognition</a> - Given a sound clip of a person or people speaking, the task of producing a text transcription of the speech. (The opposite of text-to-speech.)</li><li><a href="http://en.wikipedia.org/wiki/Spoken_dialogue_system" title="Spoken dialogue system" class="mw-redirect">Spoken dialogue system</a></li><li>Text simplification</li><li>Text-to-speech</li><li>Text-proofing</li></ul> <p><a name="Concrete_problems" id="Concrete_problems"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Concrete problems</span></span></h2> <p style="font-weight: bold; color: rgb(153, 0, 0);">Some examples of the problems faced by natural-language-understanding systems:</p> <ul><li>The sentences "<i>We gave the monkeys the bananas because they were hungry</i>" and "<i>We gave the monkeys the bananas because they were over-ripe</i>" have the same surface grammatical structure. 
However, the pronoun <i>they</i> refers to <i>monkeys</i> in one sentence and <i>bananas</i> in the other, and it is impossible to tell which without knowledge of the properties of monkeys and bananas.</li><li>A string of words may be interpreted in different ways. For example, the string "<i>Time flies like an arrow</i>" may be interpreted in a variety of ways: <ul><li>The common <a href="http://en.wikipedia.org/wiki/Simile" title="Simile">simile</a>: time moves quickly just like an arrow does;</li><li>measure the speed of flies like you would measure that of an arrow (thus interpreted as an imperative) - i.e. <i>(You should) time flies as you would (time) an arrow</i>;</li><li>measure the speed of flies like an arrow would - i.e. <i>Time flies in the same way that an arrow would (time them)</i>;</li><li>measure the speed of flies that are like arrows - i.e. <i>Time those flies that are like arrows</i>;</li><li>all of a type of flying insect, "time-flies," collectively enjoys a single arrow (compare <i>Fruit flies like a banana</i>);</li><li>each of a type of flying insect, "time-flies," individually enjoys a different arrow (similar comparison applies);</li><li>a concrete object, for example the magazine <i><a href="http://en.wikipedia.org/wiki/Time_%28magazine%29" title="Time (magazine)">Time</a></i>, travels through the air in an arrow-like manner.</li></ul> </li></ul> <p>English is particularly challenging in this regard because it has little inflectional morphology to distinguish between parts of speech.</p> <ul><li>English and several other languages don't specify which word an adjective applies to. For example, in the string "pretty little girls' school": 
<ul><li>Does the school look little?</li><li>Do the girls look little?</li><li>Do the girls look pretty?</li><li>Does the school look pretty?</li></ul> </li></ul> <ul><li>We often imply additional information in spoken language by the way we place stress on words. The sentence "I never said she stole my money" demonstrates the importance stress can have in a sentence, and thus the inherent difficulty a natural language processor can have in parsing it. Depending on which word the speaker places the stress on, this sentence could have several distinct meanings: <ul><li>"<b>I</b> never said she stole my money" - Someone else said it, but <i>I</i> didn't.</li><li>"I <b>never</b> said she stole my money" - I simply didn't ever say it.</li><li>"I never <b>said</b> she stole my money" - I might have implied it in some way, but I never explicitly said it.</li><li>"I never said <b>she</b> stole my money" - I said someone took it; I didn't say it was she.</li><li>"I never said she <b>stole</b> my money" - I just said she probably borrowed it.</li><li>"I never said she stole <b>my</b> money" - I said she stole someone else's money.</li><li>"I never said she stole my <b>money</b>" - I said she stole something, but not my money.</li></ul></li></ul><br /><p><a name="Evaluation_of_natural_language_processing" id="Evaluation_of_natural_language_processing"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Evaluation of natural language processing</span></span></h2> <p><a name="Objectives" id="Objectives"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Objectives</span></span></h3> <p>The goal of NLP evaluation is to measure one or more <i>qualities</i> of an algorithm or a system, in order to determine whether (or to what extent) the system answers the goals of its designers, or meets the needs of its users. 
Research in NLP evaluation has received considerable attention, because the definition of proper evaluation criteria is one way to specify precisely an NLP problem, thus going beyond the vagueness of tasks defined only as <i>language understanding</i> or <i>language generation.</i> A precise set of evaluation criteria, which includes mainly evaluation data and evaluation metrics, enables several teams to compare their solutions to a given NLP problem.</p> <p><a name="Short_history_of_evaluation_in_NLP" id="Short_history_of_evaluation_in_NLP"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Short history of evaluation in NLP</span></span></h3> <p>The first evaluation campaign on written texts seems to be a campaign dedicated to message understanding in 1987 (Pallet 1998). Then, the Parseval/GEIG project compared phrase-structure grammars (Black 1991). A series of campaigns within the Tipster project addressed tasks such as summarization, translation and search (Hirshman 1998). In 1994, in Germany, the Morpholympics compared German taggers. Then, the Senseval and Romanseval campaigns were conducted with the objective of semantic disambiguation. In 1996, the Sparkle campaign compared syntactic parsers in four different languages (English, French, German and Italian). In France, the Grace project compared a set of 21 taggers for French in 1997 (Adda 1999). In 2004, during the Technolangue/Easy project, 13 parsers for French were compared. Large-scale evaluations of dependency parsers were performed in the context of the CoNLL shared tasks in 2006 and 2007. In Italy, the EVALITA campaign was conducted in 2007 to compare various tools for Italian (see the EVALITA web site). In France, within the ANR-Passage project (end of 2007), 10 parsers for French were compared (see the Passage web site).</p> <p>Adda G., Mariani J., Paroubek P., Rajman M. 
1999 L'action GRACE d'évaluation de l'assignation des parties du discours pour le français. Langues vol-2<br />Black E., Abney S., Flickinger D., Gdaniec C., Grishman R., Harrison P., Hindle D., Ingria R., Jelinek F., Klavans J., Liberman M., Marcus M., Roukos S., Santorini B., Strzalkowski T. 1991 A procedure for quantitatively comparing the syntactic coverage of English grammars. DARPA Speech and Natural Language Workshop<br />Hirshman L. 1998 Language understanding evaluation: lessons learned from MUC and ATIS. LREC Granada<br />Pallet D.S. 1998 The NIST role in automatic speech recognition benchmark tests. LREC Granada</p> <p><a name="Different_types_of_evaluation" id="Different_types_of_evaluation"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Different types of evaluation</span></span></h3> <p>Depending on the evaluation procedures, a number of distinctions are traditionally made in NLP evaluation.</p> <ul><li>Intrinsic vs. extrinsic evaluation</li></ul> <p>Intrinsic evaluation considers an isolated NLP system and characterizes its performance mainly with respect to a <i>gold standard</i> result, pre-defined by the evaluators. Extrinsic evaluation, also called <i>evaluation in use</i>, considers the NLP system in a more complex setting, either as an embedded system or serving a precise function for a human user. The extrinsic performance of the system is then characterized in terms of its utility with respect to the overall task of the complex system or the human user. For example, consider a syntactic parser that is based on the output of some new part of speech (POS) tagger. An intrinsic evaluation would run the POS tagger on some labelled data, and compare the system output of the POS tagger to the gold standard (correct) output. 
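That intrinsic comparison boils down to per-token accuracy against the gold standard; here is a minimal Python sketch with invented tags and outputs (none of this data comes from a real tagger).

```python
# Hypothetical gold-standard POS tags and the outputs of two taggers
# (all data invented for illustration).
gold       = ["DET", "NOUN", "VERB", "DET", "NOUN", "PUNCT"]
old_tagger = ["DET", "NOUN", "NOUN", "DET", "NOUN", "PUNCT"]  # one error
new_tagger = ["DET", "NOUN", "VERB", "DET", "NOUN", "PUNCT"]  # no errors

def accuracy(system, reference):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    assert len(system) == len(reference)
    return sum(s == r for s, r in zip(system, reference)) / len(reference)

print("old tagger:", accuracy(old_tagger, gold))  # 5 of 6 correct
print("new tagger:", accuracy(new_tagger, gold))  # all correct
```

An intrinsic score like this says nothing by itself about downstream utility, which is exactly the gap extrinsic evaluation addresses.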
An extrinsic evaluation would run the parser with some other POS tagger, and then with the new POS tagger, and compare the parsing accuracy.</p> <ul><li>Black-box vs. glass-box evaluation</li></ul> <p>Black-box evaluation requires one to run an NLP system on a given data set and to measure a number of parameters related to the quality of the process (speed, reliability, resource consumption) and, most importantly, to the quality of the result (e.g. the accuracy of data annotation or the fidelity of a translation). Glass-box evaluation looks at the design of the system, the algorithms that are implemented, the linguistic resources it uses (e.g. vocabulary size), etc. Given the complexity of NLP problems, it is often difficult to predict performance only on the basis of glass-box evaluation, but this type of evaluation is more informative with respect to error analysis or future developments of a system.</p> <ul><li>Automatic vs. manual evaluation</li></ul> <p>In many cases, automatic procedures can be defined to evaluate an NLP system by comparing its output with the gold standard (or desired) one. Although the cost of producing the gold standard can be quite high, automatic evaluation can be repeated as often as needed without much additional cost (on the same input data). However, for many NLP problems, the definition of a gold standard is a complex task, and can prove impossible when inter-annotator agreement is insufficient. Manual evaluation is performed by human judges, who are instructed to estimate the quality of a system, or most often of a sample of its output, based on a number of criteria. Although, thanks to their linguistic competence, human judges can be considered the reference for a number of language processing tasks, there is also considerable variation across their ratings. 
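That variation, and the inter-annotator agreement mentioned above, is commonly quantified with a chance-corrected agreement measure such as Cohen's kappa; here is a minimal Python sketch with invented annotations (illustrative data only).

```python
from collections import Counter

# Two hypothetical annotators labelling the same 10 items (made-up data).
a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "pos"]
b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "pos", "pos"]

def cohens_kappa(x, y):
    """Agreement between two annotators, corrected for chance agreement."""
    n = len(x)
    observed = sum(i == j for i, j in zip(x, y)) / n
    # Chance agreement: probability both pick the same label independently.
    cx, cy = Counter(x), Counter(y)
    expected = sum(cx[label] * cy[label] for label in cx) / n ** 2
    return (observed - expected) / (1 - expected)

print(round(cohens_kappa(a, b), 3))  # 0.583: moderate agreement
```

A kappa near 1 indicates reliable annotations; a low kappa signals that the gold standard itself may be too unreliable to support automatic evaluation.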
This is why automatic evaluation is sometimes referred to as <i>objective</i> evaluation, while the human kind appears to be more <i>subjective.</i></p> <p><a name="Shared_tasks_.28Campaigns.29" id="Shared_tasks_.28Campaigns.29"></a></p> <h3><span class="editsection"></span><span class="mw-headline">Shared tasks (Campaigns)</span></h3> <ul><li><a href="http://en.wikipedia.org/wiki/BioCreative" title="BioCreative">BioCreative</a></li><li>Message Understanding Conference</li><li>Technolangue/Easy</li><li>Text Retrieval Conference</li></ul><br /><p><a name="Standardization_in_NLP" id="Standardization_in_NLP"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Standardization in NLP</span></span></h2> <p>An ISO sub-committee is working to ease interoperability between lexical resources and NLP programs. The sub-committee is part of ISO/TC37 and is called ISO/TC37/SC4. Some ISO standards are already published, but most are under construction, mainly on lexicon representation (see LMF), annotation and the data category registry.</p><p><br /></p> <p><a name="Journals" id="Journals"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Journals</span></span></h2> <ul><li><a href="http://en.wikipedia.org/wiki/Computational_Linguistics_%28journal%29" title="Computational Linguistics (journal)">Computational Linguistics</a></li><li>Language Resources and Evaluation</li><li>Linguistic Issues in Language Technology</li></ul><br /><p><a name="Organizations_and_conferences" id="Organizations_and_conferences"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span 
class="mw-headline">Organizations and conferences</span></span></h2> <p><a name="Associations" id="Associations"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Associations</span></span></h3> <ul><li><a href="http://en.wikipedia.org/wiki/Association_for_Computational_Linguistics" title="Association for Computational Linguistics"></a></li>Association for Computational Linguistics<li><a href="http://en.wikipedia.org/w/index.php?title=Association_for_Machine_Translation_in_the_Americas&action=edit&redlink=1" class="new" title="Association for Machine Translation in the Americas (page does not exist)"></a></li>Association for Machine Translation in the Americas<li><a href="http://en.wikipedia.org/wiki/AFNLP" title="AFNLP"></a>AFNLP - Asian Federation of Natural Language Processing Associations</li><li><a href="http://en.wikipedia.org/wiki/Australasian_Language_Technology_Association" title="Australasian Language Technology Association"></a>Australasian Language Technology Association (ALTA)</li></ul> <p><a name="Conferences" id="Conferences"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Conferences</span></span></h3> <ul><li><a href="http://en.wikipedia.org/wiki/LREC" title="LREC" class="mw-redirect"></a></li>Language Resources and Evaluation<br /><br /><br /></ul> <p><a name="Software_tools" id="Software_tools"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Software tools</span></span></h2> <ul><li><a href="http://en.wikipedia.org/wiki/General_Architecture_for_Text_Engineering" title="General Architecture for Text Engineering"></a></li>General Architecture for Text Engineering<li><a href="http://en.wikipedia.org/wiki/Natural_Language_Toolkit" title="Natural Language Toolkit"></a>Natural Language Toolkit (NLTK): a Python library 
suite</li><li><a href="http://en.wikipedia.org/wiki/Expert_System_S.p.A." title="Expert System S.p.A."></a></li>Expert System S.p.A.<li><a href="http://en.wikipedia.org/w/index.php?title=OpenNLP&action=edit&redlink=1" class="new" title="OpenNLP (page does not exist)"></a></li>OpenNLP<li><a href="http://en.wikipedia.org/wiki/MontyLingua" title="MontyLingua"></a></li>MontyLingua<li><a href="http://l2r.cs.uiuc.edu/%7Ecogcomp/software.php" class="external text" title="http://l2r.cs.uiuc.edu/~cogcomp/software.php" rel="nofollow"></a>NLP Software Packages - <b>Free</b> software packages for NLP research, including a Semantic Role Labeler, Named Entity Tagger, Coreference Resolution, and more. This is also the home of Learning-Based Java (Machine Learning Framework) and Sparse Network of Winnows (Learning Architecture).</li></ul>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com0tag:blogger.com,1999:blog-226627663258700835.post-28092706150523386252009-03-06T07:46:00.000-08:002009-03-06T07:48:04.977-08:00Machine learning<!-- start content --><br /><p><b>Machine learning</b> is the subfield of artificial intelligence that is concerned with the design and development of algorithms that allow computers to improve their performance over time based on data, such as from sensor data or databases. A major focus of machine learning research is to automatically produce (induce) models, such as rules and patterns, from data. 
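The idea of inducing a model from data can be illustrated with a deliberately tiny learner: given labelled one-dimensional examples, it induces a single threshold rule. The training data and the threshold rule here are made-up illustrations, not any standard algorithm:

```python
# A toy illustration of inducing a model (here, a one-feature threshold
# rule) from data. The training examples are invented for illustration.

def induce_threshold(examples):
    """Learn a decision threshold: the midpoint between the class means."""
    pos = [x for x, label in examples if label == 1]
    neg = [x for x, label in examples if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    """Apply the induced rule to a new observation."""
    return 1 if x >= threshold else 0

# Hypothetical sensor readings labelled 0 (normal) or 1 (anomalous).
training = [(1.0, 0), (1.5, 0), (2.0, 0), (6.0, 1), (6.5, 1), (7.0, 1)]
t = induce_threshold(training)
print(t)                 # induced threshold: 4.0
print(predict(t, 5.8))   # a new reading classified as anomalous: 1
```

However simple, this captures the pattern common to all the applications listed below: a model is induced from data and then applied to unseen inputs.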
Hence, machine learning is closely related to fields such as data mining, statistics, inductive reasoning, pattern recognition, and theoretical computer science.</p><p><br /></p><p><br /></p><script type="text/javascript">//<![CDATA[ if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } //]]> </script> <p><a name="Applications" id="Applications"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Applications</span></span></h2> <p>Applications for machine learning include natural language processing, syntactic pattern recognition, search engines, medical diagnosis, bioinformatics, brain-machine interfaces and cheminformatics, detecting credit card fraud, stock market analysis, classifying DNA sequences, speech and handwriting recognition, object recognition in computer vision, game playing, software engineering and robot locomotion.</p><p><br /></p> <p><a name="Human_interaction" id="Human_interaction"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Human interaction</span></span></h2> <p>Some machine learning systems attempt to eliminate the need for human intuition in data analysis, while others adopt a collaborative approach between human and machine. But human intuition cannot be entirely eliminated, since the system's designer must specify how the data is to be represented and what mechanisms will be used to search for a characterization of the data. 
Machine learning can be viewed as an attempt to automate parts of the scientific method.</p> <p>Some statistical machine learning researchers create methods within the framework of Bayesian statistics.</p><p><br /></p> <p><a name="Algorithm_types" id="Algorithm_types"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Algorithm types</span></span></h2> <p style="font-weight: bold; color: rgb(153, 0, 0);">Machine learning algorithms are organized into a taxonomy, based on the desired outcome of the algorithm. Common algorithm types include:</p> <ul><li><a href="http://en.wikipedia.org/wiki/Supervised_learning" title="Supervised learning"></a>Supervised learning — in which the algorithm generates a function that maps inputs to desired outputs. One standard formulation of the supervised learning task is the classification problem: the learner is required to learn (to approximate) the behavior of a function which maps a vector <img class="tex" alt="[X_1, X_2, \ldots X_N]\," src="http://upload.wikimedia.org/math/6/0/2/602bbca94a793283c84ac98bf1dbabea.png" /> into one of several classes by looking at several input-output examples of the function.</li><li><a href="http://en.wikipedia.org/wiki/Unsupervised_learning" title="Unsupervised learning"></a>Unsupervised learning — An agent which models a set of inputs: labelled examples are not available.</li><li><a href="http://en.wikipedia.org/wiki/Semi-supervised_learning" title="Semi-supervised learning"></a>Semi-supervised learning — which combines both labeled and unlabeled examples to generate an appropriate function or classifier.</li><li><a href="http://en.wikipedia.org/wiki/Reinforcement_learning" title="Reinforcement learning"></a>Reinforcement learning — in which the algorithm learns a policy of how to act given an observation of the world. 
Every action has some impact in the environment, and the environment provides feedback that guides the learning algorithm.</li><li><a href="http://en.wikipedia.org/wiki/Transduction_%28machine_learning%29" title="Transduction (machine learning)"></a>Transduction — similar to supervised learning, but does not explicitly construct a function: instead, tries to predict new outputs based on training inputs, training outputs, and test inputs which are available while training.</li><li><a href="http://en.wikipedia.org/wiki/Learning_to_learn" title="Learning to learn" class="mw-redirect"></a>Learning to learn — in which the algorithm learns its own inductive bias based on previous experience.</li></ul><br /><p><a name="Theory" id="Theory"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Theory</span></span></h2> <p>The computational analysis of machine learning algorithms and their performance is a branch of <a href="http://en.wikipedia.org/wiki/Theoretical_computer_science" title="Theoretical computer science"></a>theoretical computer science known as computational learning theory. Because training sets are finite and the future is uncertain, learning theory usually does not yield absolute guarantees of the performance of algorithms. Instead, probabilistic bounds on the performance are quite common.</p> <p>In addition to performance bounds, computational learning theorists study the time complexity and feasibility of learning. In computational learning theory, a computation is considered feasible if it can be done in polynomial time. There are two kinds of time complexity results. Positive results show that a certain class of functions can be learned in polynomial time. 
Negative results show that certain classes cannot be learned in polynomial time.</p><p><br /></p>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com0tag:blogger.com,1999:blog-226627663258700835.post-36044845043676919042009-03-06T07:45:00.000-08:002009-03-06T07:46:22.610-08:00Automated planning and scheduling<!-- start content --> <p><b>Automated planning and scheduling</b> is a branch of artificial intelligence that concerns the realisation of strategies or action sequences, typically for execution by intelligent agents, autonomous robots and unmanned vehicles. Unlike classical control and classification problems, the solutions are complex, unknown and have to be discovered and optimised in multidimensional space.</p> <p>In known environments with available models planning can be done offline. Solutions can be found and evaluated prior to execution. In dynamically unknown environments the strategy often needs to be revised online. Models and policies need to be adapted. Solutions usually resort to iterative trial and error processes commonly seen in artificial intelligence. These include dynamic programming, reinforcement learning and combinatorial optimization.</p> <p>A typical planner takes three inputs: a description of the initial state of the world, a description of the desired goal, and a set of possible actions, all encoded in a formal language such as STRIPS. The planner produces a sequence of actions that lead from the initial state to a state meeting the goal. An alternative language for describing planning problems is that of hierarchical task networks, in which a set of tasks is given, and each task can be either realized by a primitive action or decomposed in a set of other tasks.</p> <p>The difficulty of planning is dependent on the simplifying assumptions employed, e.g. atomic time, deterministic time, complete observability, etc. Classical planners make all these assumptions and have been studied most fully. 
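Under those classical assumptions, a planner can be sketched as a breadth-first forward search over STRIPS-like actions: states are sets of facts, and each action has preconditions, an add list, and a delete list. The two-action domain below is invented purely for illustration:

```python
# Minimal forward state-space search over STRIPS-like actions:
# states are sets of facts; an action has preconditions, an add list
# and a delete list. The domain below is a made-up illustration.
from collections import deque

def plan(initial, goal, actions):
    """Breadth-first search from the initial state to any state
    satisfying the goal; returns the action sequence or None."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:          # every goal fact holds
            return steps
        for name, pre, add, delete in actions:
            if pre <= state:       # preconditions satisfied
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None

actions = [
    ("pick",  {"hand-empty", "on-table"}, {"holding"}, {"hand-empty", "on-table"}),
    ("stack", {"holding"}, {"on-block", "hand-empty"}, {"holding"}),
]
print(plan({"hand-empty", "on-table"}, {"on-block"}, actions))
# -> ['pick', 'stack']
```

Real planners replace the blind breadth-first search with the heuristics and encodings mentioned next, but the state-transition model is the same.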
Some popular techniques include: forward chaining and backward chaining state-space search, possibly enhanced by the use of relationships among conditions (see graphplan) or heuristics synthesized from the problem, search through plan space, and translation to propositional satisfiability (satplan).</p> <p>If the assumption of determinism is dropped and a probabilistic model of uncertainty is adopted, then this leads to the problem of policy generation for a Markov decision process (MDP) or (in the general case) Partially observable Markov decision process (POMDP).</p><p><br /></p><script type="text/javascript">//<![CDATA[ if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } //]]> </script> <p><a name="Examples" id="Examples"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Examples</span></span></h2> <ul><li>The <a href="http://en.wikipedia.org/wiki/Hubble_Space_Telescope" title="Hubble Space Telescope"></a>Hubble Space Telescope uses a short-term system called SPSS and a long-term planning system called Spike.</li></ul>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com0tag:blogger.com,1999:blog-226627663258700835.post-42766528739536596572009-03-06T07:42:00.000-08:002009-03-06T07:45:04.048-08:00Commonsense knowledge base<!-- start content --> <p>In artificial intelligence research, <b>commonsense knowledge</b> is the collection of facts and information that an ordinary person is expected to know. The <b>commonsense knowledge problem</b> is the ongoing project in the field of knowledge representation (a sub-field of artificial intelligence) to create a commonsense knowledge base: a database containing all the general knowledge that most people possess, represented in such a way that it is available to artificial intelligence programs that use natural language or make inferences about the ordinary world. 
Such a database is a type of ontology of which the most general are called upper ontologies.</p> <p>The problem is considered to be among the hardest in all of AI research because the breadth and detail of commonsense knowledge is enormous. Any task that requires commonsense knowledge is considered AI-complete: to be done as well as a human being does it, it requires the machine to appear as intelligent as a human being. These tasks include machine translation, object recognition, text mining and many others. To do these tasks perfectly, the machine simply has to know what the text is talking about or what objects it may be looking at, and this is impossible in general unless the machine is familiar with all the same concepts that an ordinary person is familiar with.</p> <p style="font-weight: bold; color: rgb(153, 0, 0);">Information in a commonsense knowledge base may include, but is not limited to, the following:</p> <ul><li>An ontology of classes and individuals</li><li>Parts and materials of objects</li><li>Properties of objects (such as color and size)</li><li>Functions and uses of objects</li><li>Locations of objects and layouts of locations</li><li>Locations of actions and events</li><li>Durations of actions and events</li><li>Preconditions of actions and events</li><li>Effects (postconditions) of actions and events</li><li>Subjects and objects of actions</li><li>Behaviors of devices</li><li>Stereotypical situations or scripts</li><li>Human goals and needs</li><li>Emotions</li><li>Plans and strategies</li><li>Story themes</li><li>Contexts</li></ul><br /><p><a name="Commonsense_knowledge_bases" id="Commonsense_knowledge_bases"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Commonsense knowledge bases</span></span></h2> <ul><li>Cyc<br /></li><li>Open Mind Common Sense</li><li>ThoughtTreasure</li><li>WordNet</li><li>Basic Formal Ontology (BFO)</li><li>DOLCE and DnS</li><li>General 
Formal Ontology</li><li>Suggested Upper Merged Ontology</li><li>ConceptNet, Mindpixel</li></ul>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com0tag:blogger.com,1999:blog-226627663258700835.post-48736496557029446332009-03-06T07:39:00.000-08:002009-03-06T07:42:08.102-08:00Knowledge representation<p><b>Knowledge representation</b> is an area in artificial intelligence that is concerned with how to formally "think", that is, how to use a symbol system to represent "a domain of discourse" - that which can be talked about, along with functions that may or may not be within the domain of discourse that allow inference (formalized reasoning) about the objects within the domain of discourse to occur. Generally speaking, some kind of logic is used both to supply a formal semantics of how reasoning functions apply to symbols in the domain of discourse, as well as to supply (depending on the particulars of the logic), operators such as quantifiers, modal operators, etc. that along with an interpretation theory, give meaning to the sentences in the logic.</p> <p>When we design a knowledge representation (and a knowledge representation system to interpret sentences in the logic in order to derive inferences from them) we have to make trade-offs across a number of design spaces, described in the following sections. The single most important decision to be made, however, is the <i>expressivity</i> of the KR. The more expressive, the easier (and more compact) it is to "say something". However, more expressive languages are harder to automatically derive inferences from. An example of a less expressive KR would be propositional logic. An example of a more expressive KR would be autoepistemic temporal modal logic. Less expressive KRs may be both complete and consistent (formally less expressive than set theory). 
More expressive KRs may be neither complete nor consistent.</p> <p>The key problem is to find a KR (and a supporting reasoning system) that can make the inferences your application needs <i>in time</i>, that is, within the resource constraints appropriate to the problem at hand. This tension between the kinds of inferences an application "needs" and what counts as "in time" along with the cost to generate the representation itself makes knowledge representation engineering interesting.</p><p><br /></p><p><br /></p><script type="text/javascript">//<![CDATA[ if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } //]]> </script> <p><a name="Overview" id="Overview"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Overview</span></span></h2> <p>There are representation techniques such as frames, rules and semantic networks which have originated from theories of human information processing. Since knowledge is used to achieve intelligent behavior, the fundamental goal of knowledge representation is to represent knowledge in a manner as to facilitate inferencing (i.e. drawing conclusions) from knowledge.</p> <p style="font-weight: bold; color: rgb(153, 0, 0);">Some issues that arise in knowledge representation from an AI perspective are:</p> <ul><li>How do people represent knowledge?</li><li>What is the nature of knowledge and how do we represent it?</li><li>Should a representation scheme deal with a particular domain or should it be general purpose?</li><li>How expressive is a representation scheme or formal language?</li><li>Should the scheme be declarative or procedural?</li></ul> <p>There has been very little top-down discussion of the knowledge representation (KR) issues and research in this area is a well aged quiltwork. 
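The expressivity trade-off described earlier can be seen at the inexpressive end of the scale: in a propositional Horn-clause KR, inference is a simple fixed-point computation that always terminates. The facts and rules below are invented examples, not drawn from any real system:

```python
# Forward chaining over propositional Horn clauses: a deliberately
# inexpressive KR for which inference is simple and always terminates.
# The facts and rules are invented examples.

def forward_chain(facts, rules):
    """rules: list of (premises, conclusion) pairs. Repeatedly fire
    rules whose premises all hold until no new fact is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known

rules = [
    ({"elephant"}, "mammal"),
    ({"mammal"}, "animal"),
    ({"elephant"}, "grey"),
]
print(forward_chain({"elephant"}, rules))
```

A more expressive KR (say, with quantifiers or modal operators) could state far more, but its inference procedure would no longer be this trivially terminating loop.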
There are well known problems such as "spreading activation" (this is a problem in navigating a network of nodes), "subsumption" (this is concerned with selective inheritance; e.g. an ATV can be thought of as a specialization of a car but it inherits only particular characteristics) and "classification." For example a tomato could be classified both as a fruit and a vegetable.</p> <p>In the field of artificial intelligence, problem solving can be simplified by an appropriate choice of knowledge representation. Representing knowledge in some ways makes certain problems easier to solve. For example, it is easier to divide numbers represented in Hindu-Arabic numerals than numbers represented as Roman numerals.</p><p><br /></p> <p><a name="History_of_knowledge_representation" id="History_of_knowledge_representation"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">History of knowledge representation</span></span></h2> <p>In computer science, particularly artificial intelligence, a number of representations have been devised to structure information.</p> <p>KR is most commonly used to refer to representations intended for processing by modern <a href="http://en.wikipedia.org/wiki/Computers" title="Computers" class="mw-redirect"></a>computers, and in particular, for representations consisting of explicit objects (the class of all elephants, or Clyde a certain individual), and of assertions or claims about them ('Clyde is an elephant', or 'all elephants are grey'). Representing knowledge in such explicit form enables computers to draw conclusions from knowledge already stored ('Clyde is grey').</p> <p>Many KR methods were tried in the 1970s and early 1980s, such as heuristic question-answering, neural networks, theorem proving, and expert systems, with varying success. 
Medical diagnosis (e.g., Mycin) was a major application area, as were games such as chess.</p> <p>In the 1980s formal computer knowledge representation languages and systems arose. Major projects attempted to encode wide bodies of general knowledge; for example the "Cyc" project (still ongoing) went through a large encyclopedia, encoding not the information itself, but the information a reader would need in order to understand the encyclopedia: naive physics; notions of time, causality, motivation; commonplace objects and classes of objects.</p> <p>Through such work, the difficulty of KR came to be better appreciated. In computational linguistics, meanwhile, much larger databases of language information were being built, and these, along with great increases in computer speed and capacity, made deeper KR more feasible.</p> <p>Several programming languages have been developed that are oriented to KR. Prolog developed in 1972, but popularized much later, represents propositions and basic logic, and can derive conclusions from known premises. KL-ONE (1980s) is more specifically aimed at knowledge representation itself. In 1995, the Dublin Core standard of metadata was conceived.</p> <p>In the electronic document world, languages were being developed to represent the structure of documents, such as SGML (from which HTML descended) and later XML. 
These facilitated information retrieval and data mining efforts, which have in recent years begun to relate to knowledge representation.</p> <p>Development of the Semantic Web, has included development of XML-based knowledge representation languages and standards, including RDF, RDF Schema, Topic Maps, DARPA Agent Markup Language (DAML), Ontology Inference Layer (OIL), and Web Ontology Language (OWL).</p><p><br /></p> <p><a name="Topics_in_Knowledge_representation" id="Topics_in_Knowledge_representation"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Topics in Knowledge representation</span></span></h2> <p><a name="Language_and_notation" id="Language_and_notation"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Language and notation</span></span></h3> <p>Some people<sup class="noprint Inline-Template"><span title="The material in the vicinity of this tag may use weasel words or too-vague attribution. since February 2009" style="white-space: nowrap;"></span></sup> think it would be best to represent knowledge in the same way that it is represented in the human mind, or to represent knowledge in the form of human language.</p> <p><a href="http://en.wikipedia.org/wiki/Psycholinguistics" title="Psycholinguistics"></a>Psycholinguistics is investigating how the human mind stores and manipulates language. Other branches of cognitive science examine how human memory stores sounds, sights, smells, emotions, procedures, and abstract ideas. Science has not yet completely described the internal mechanisms of the brain to the point where they can simply be replicated by computer programmers.</p> <p>Various artificial languages and notations have been proposed for representing knowledge. They are typically based on logic and mathematics, and have easily parsed grammars to ease machine processing. 
They usually fall into the broad domain of ontologies.</p> <p><a name="Ontology_languages" id="Ontology_languages"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Ontology languages</span></span></h3> <p>After CycL, a number of ontology languages have been developed. Most are declarative languages, and are either frame languages, or are based on first-order logic. Most of these languages only define an upper ontology with generic concepts, whereas the domain concepts are not part of the language definition. Gellish English is an example of an ontological language that includes a full engineering English Dictionary.</p> <p><a name="Links_and_structures" id="Links_and_structures"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Links and structures</span></span></h3> <p>While hyperlinks have come into widespread use, the closely related semantic link is not yet widely used. The mathematical table has been used since Babylonian times. More recently, these tables have been used to represent the outcomes of logic operations, such as truth tables, which were used to study and model Boolean logic, for example. Spreadsheets are yet another tabular representation of knowledge. Other knowledge representations are trees, by means of which the connections among fundamental concepts and derivative concepts can be shown.</p> <p>Visual representations are relatively new in the field of knowledge management but give the user a way to visualise how one thought or idea is connected to other ideas enabling the possibility of moving from one thought to another in order to locate required information. 
The approach is not without its competitors.</p> <p><a name="Notation" id="Notation"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Notation</span></span></h3> <p>The recent fashion in knowledge representation languages is to use XML as the low-level syntax. This tends to make the output of these KR languages easy for machines to parse, at the expense of human readability and often space-efficiency.</p> <p><a href="http://en.wikipedia.org/wiki/First-order_predicate_calculus" title="First-order predicate calculus" class="mw-redirect"></a>First-order predicate calculus is commonly used as a mathematical basis for these systems, to avoid excessive complexity. However, even simple systems based on this simple logic can be used to represent data that is well beyond the processing capability of current computer systems: see computability for reasons.</p> <p style="font-weight: bold; color: rgb(153, 0, 0);">Examples of notations:</p> <ul><li><a href="http://en.wikipedia.org/wiki/DATR" title="DATR"></a>DATR is an example for representing lexical knowledge</li><li><a href="http://en.wikipedia.org/wiki/Resource_Description_Framework" title="Resource Description Framework"></a>RDF is a simple notation for representing relationships between and among objects<a href="http://en.wikipedia.org/wiki/Object_%28philosophy%29" title="Object (philosophy)"></a></li></ul> <p><a name="Storage_and_manipulation" id="Storage_and_manipulation"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Storage and manipulation</span></span></h3> <p>One problem in knowledge representation consists of how to store and manipulate knowledge in an information system in a formal way so that it may be used by mechanisms to accomplish a given task. 
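One of the simplest ways to store and query such knowledge follows RDF's subject-predicate-object notation, mentioned above. The toy in-memory triple store below, with invented data and a hypothetical `query` helper, is a sketch of the idea rather than a real RDF library:

```python
# A toy in-memory triple store in the spirit of RDF's
# subject-predicate-object notation. The data is invented and the
# query helper is hypothetical, not part of any RDF standard.

triples = {
    ("Clyde", "is-a", "elephant"),
    ("elephant", "subclass-of", "mammal"),
    ("elephant", "color", "grey"),
}

def query(s=None, p=None, o=None):
    """Return triples matching the given pattern; None is a wildcard."""
    return {(a, b, c) for (a, b, c) in triples
            if s in (None, a) and p in (None, b) and o in (None, c)}

print(query(p="is-a"))              # everything asserted with an is-a link
print(query(s="elephant"))          # all facts about elephants
```

Real RDF stores add schemas, serialization, and far richer query languages (such as SPARQL), but the underlying data model is this set of triples.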
Examples of applications are expert systems, machine translation systems, computer-aided maintenance systems and information retrieval systems (including database front-ends).</p> <p><a href="http://en.wikipedia.org/wiki/Semantic_network" title="Semantic network"></a>Semantic networks may be used to represent knowledge. Each node represents a concept and arcs are used to define relations between the concepts. One of the most expressive and comprehensively described knowledge representation paradigms along the lines of semantic networks is MultiNet (an acronym for Multilayered Extended Semantic Networks).</p> <p>From the 1960s, the knowledge frame or just <i>frame</i> has been used. Each frame has its own name and a set of <b>attributes</b>, or <b>slots</b> which contain values; for instance, the frame for <i>house</i> might contain a <i>color</i> slot, <i>number of floors</i> slot, etc.</p> <p>Using frames for expert systems is an application of object-oriented programming, with inheritance of features described by the "is-a" link. However, there has been no small amount of inconsistency in the usage of the "is-a" link: Ronald J. Brachman wrote a paper titled "What IS-A is and isn't", wherein 29 different semantics were found in projects whose knowledge representation schemes involved an "is-a" link. Other links include the "has-part" link.</p> <p>Frame structures are well-suited for the representation of schematic knowledge and stereotypical cognitive patterns. The elements of such schematic patterns are weighted unequally, attributing higher weights to the more typical elements of a schema. 
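The frame-and-slot scheme described above, with inheritance of features along "is-a" links, can be sketched directly. The frames below (a hypothetical `house` hierarchy, echoing the example in the text) are made up for illustration:

```python
# Frames as dictionaries of slots, with slot lookup inherited along
# an "is-a" link, as in frame-based KR systems. The frames are invented.

frames = {
    "building": {"has-roof": True},
    "house":    {"is-a": "building", "floors": 2, "color": "white"},
    "my-house": {"is-a": "house", "color": "red"},
}

def get_slot(frame, slot):
    """Look up a slot, walking up the is-a chain if it is missing."""
    while frame is not None:
        data = frames[frame]
        if slot in data:
            return data[slot]
        frame = data.get("is-a")
    return None

print(get_slot("my-house", "color"))     # local value overrides: red
print(get_slot("my-house", "floors"))    # inherited from house: 2
print(get_slot("my-house", "has-roof"))  # inherited from building: True
```

Note how a locally filled slot shadows the inherited default; the many conflicting readings of "is-a" that Brachman catalogued all concern what this inheritance walk is licensed to conclude.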
A pattern is activated by certain expectations: If a person sees a big bird, he or she will classify it as a sea eagle rather than as a golden eagle, assuming that his or her "sea-scheme" is currently activated and his or her "land-scheme" is not.</p> <p>Frame representations are object-centered in the same sense as semantic networks are: All the facts and properties connected with a concept are located in one place - there is no need for costly search processes in the database.</p> <p>A behavioral script is a type of frame that describes what happens temporally; the usual example given is that of describing going to a restaurant. The steps include waiting to be seated, receiving a menu, ordering, etc. The different solutions can be arranged in a so-called semantic spectrum with respect to their semantic expressivity.</p><p><br /></p><p><br /></p>Naqi Razahttp://www.blogger.com/profile/06767527964835702911noreply@blogger.com0tag:blogger.com,1999:blog-226627663258700835.post-87576135100449349932009-03-06T07:29:00.000-08:002009-03-06T07:38:06.282-08:00Philosophy of artificial intelligence<p style="font-weight: bold; color: rgb(153, 0, 0);">The philosophy of artificial intelligence considers the relationship between <i>machines</i> and <i>thought</i> and attempts to answer such questions as:</p> <ul><li>Can a machine act intelligently? Can it solve <i>any</i> problem that a person would solve by thinking?</li><li>Can a machine have a mind, mental states and consciousness in the same sense humans do? Can it <i>feel</i>?</li><li>Are human intelligence and machine intelligence the same? Is the human brain essentially a computer?</li></ul> <p>These three questions reflect the divergent interests of AI researchers, philosophers and cognitive scientists respectively. 
The answers to these questions depend on how one defines "intelligence" or "consciousness" and exactly which "machines" are under discussion.</p> <p style="font-weight: bold; color: rgb(153, 0, 0);">Important propositions in the philosophy of AI include:</p> <ul><li><a href="http://en.wikipedia.org/wiki/Turing_Test" title="Turing Test" class="mw-redirect"></a>Turing's "polite convention": <i>If a machine acts as intelligently as a human being, then it is as intelligent as a human being.</i></li><li>The Dartmouth proposal: <i>"Every aspect of learning or any other feature of intelligence can be so precisely described that a machine can be made to simulate it."</i></li><li><a href="http://en.wikipedia.org/wiki/Alan_Newell" title="Alan Newell" class="mw-redirect"></a>Newell and Simon's physical symbol system hypothesis: <i>"A physical symbol system has the necessary and sufficient means of general intelligent action."</i></li><li><a href="http://en.wikipedia.org/wiki/John_Searle" title="John Searle"></a>Searle's strong AI hypothesis: <i>"The appropriately programmed computer with the right inputs and outputs would thereby have a mind in exactly the same sense human beings have minds."</i></li><li><a href="http://en.wikipedia.org/wiki/Hobbes" title="Hobbes" class="mw-redirect"></a>Hobbes' mechanism: "<i>Reason is nothing but reckoning."</i></li></ul><br /><br /><script type="text/javascript">//<![CDATA[ if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } //]]> </script> <p><a name="Can_a_machine_display_general_intelligence.3F" id="Can_a_machine_display_general_intelligence.3F"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Can a machine display general intelligence?</span></span></h2> <p>Is it possible to create a machine that can solve <i>all</i> the problems humans solve using their intelligence? 
This is the question that AI researchers are most interested in answering. It defines the scope of what machines will be able to do in the future and guides the direction of AI research. It only concerns the <i>behavior</i> of machines and ignores the issues of interest to <a href="http://en.wikipedia.org/wiki/Psychology" title="Psychology"></a>psychologists, cognitive scientists and philosophers; to answer this question, it doesn't matter whether a machine is <i>really</i> thinking (as a person thinks) or is just <i>acting like</i> it is thinking.</p> <p style="font-weight: bold; color: rgb(153, 0, 0);">The basic position of most AI researchers is summed up in this statement, which appeared in the proposal for the Dartmouth Conferences of 1956:</p> <ul><li><i>Every aspect of learning or any other feature of intelligence can be so precisely described that a machine can be made to simulate it.</i></li></ul> <p>Arguments against the basic premise must show that building a working AI system is impossible, because there is some practical limit to the abilities of computers or that there is some special quality of the human mind that is necessary for thinking and yet can't be duplicated by a machine (or by the methods of current AI research). 
Arguments in favor of the basic premise must show that such a system is possible.</p> <p>The first step to answering the question is to clearly define "intelligence."</p> <p><a name="Intelligence" id="Intelligence"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Intelligence</span></span></h3> <p><a name="Turing_test" id="Turing_test"></a></p> <h4><span class="editsection"></span><span style="color: rgb(153, 0, 0);" class="mw-headline">Turing test</span></h4> <p>Alan Turing, in a famous and seminal 1950 paper, reduced the problem of defining intelligence to a simple question about conversation. He suggests that if a machine can answer <i>any</i> question put to it, using the same words that an ordinary person would, then we may call that machine intelligent. A modern version of his experimental design would use an online chat room, where one of the participants is a real person and one of the participants is a computer program. The program passes the test if no one can tell which of the two participants is human. Turing notes that no one (except philosophers) ever asks the question "can people think?" He writes, "instead of arguing continually over this point, it is usual to have a polite convention that everyone thinks." Turing's test extends this polite convention to machines:</p> <ul><li><i>If a machine acts as intelligently as a human being, then it is as intelligent as a human being.</i></li></ul> <p><a name="Human_intelligence_vs._intelligence_in_general" id="Human_intelligence_vs._intelligence_in_general"></a></p> <h4><span class="editsection"></span><span style="color: rgb(153, 0, 0);" class="mw-headline">Human intelligence vs. intelligence in general</span></h4> <p>One criticism of the Turing test is that it is explicitly anthropomorphic. 
If our ultimate goal is to create machines that are more intelligent than people, why should we insist that our machines must closely resemble people? Russell and Norvig write that "aeronautical engineering texts do not define the goal of their field as 'making machines that fly so exactly like pigeons that they can fool other pigeons.'" Recent AI research defines intelligence in terms of intelligent agents. An "agent" is something that perceives and acts in an environment. A "performance measure" defines what counts as success for the agent.</p> <ul><li><i>If an agent acts so as to maximize the expected value of a performance measure based on past experience and knowledge, then it is intelligent.</i></li></ul> <p>Definitions like this one try to capture the essence of intelligence. They have the advantage that, unlike the Turing test, they don't also test for human traits that we may not want to consider intelligent, like the ability to be insulted or the temptation to lie. They have the disadvantage that they fail to make the commonsense differentiation between "things that think" and "things that don't". 
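The agent framing can be made concrete in a few lines. The following is a minimal sketch (all class and function names are illustrative, not from any particular library): an agent maps percepts to actions, and a separate performance measure scores the resulting history.

```python
class ThermostatAgent:
    """A trivially simple 'intelligent agent': it perceives a temperature
    and acts so as to keep a room near a setpoint."""

    def __init__(self, setpoint=20.0):
        self.setpoint = setpoint

    def act(self, perceived_temp):
        # Choose the action expected to move the environment toward the goal.
        if perceived_temp < self.setpoint - 1.0:
            return "heat_on"
        elif perceived_temp > self.setpoint + 1.0:
            return "heat_off"
        return "no_op"


def performance_measure(temperature_history, setpoint=20.0):
    """Score the agent: higher is better; deviations from the setpoint are penalized."""
    return -sum(abs(t - setpoint) for t in temperature_history)


agent = ThermostatAgent()
print(agent.act(17.0))  # a cold percept triggers heating
```

Under this definition an agent needs no human-like traits at all, only percepts, actions, and a measure of success.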
By this definition, even a thermostat has a rudimentary intelligence.</p><p><br /></p> <p><a name="Arguments_that_a_machine_can_display_general_intelligence" id="Arguments_that_a_machine_can_display_general_intelligence"></a></p> <h3><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Arguments that a machine can display general intelligence</span></span></h3> <p><a name="The_brain_can_be_simulated" id="The_brain_can_be_simulated"></a></p> <h4><span class="editsection"></span><span style="color: rgb(204, 51, 204);" class="mw-headline">The brain can be simulated</span></h4> <div class="thumb tright"> <div class="thumbinner" style="width: 242px;"> <div id="ogg_player_1" style="width: 240px;"> <div><a href="http://en.wikipedia.org/wiki/File:MRI.ogg" class="image" title="MRI.ogg"><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/5/56/MRI.ogg/seek%3D2-MRI.ogg.jpg" alt="MRI.ogg" width="240" height="256" /></a></div> </div> <div class="thumbcaption"> An MRI scan of a normal adult human brain</div> </div> </div> <p>Marvin Minsky writes that "if the nervous system obeys the laws of physics and chemistry, which we have every reason to suppose it does, then .... we ... ought to be able to reproduce the behavior of the nervous system with some physical device." This argument, first introduced as early as 1943 and vividly described by Hans Moravec in 1988, is now associated with futurist Ray Kurzweil, who estimates that computer power will be sufficient for a complete brain simulation by the year 2029. 
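The distance between such estimates and present hardware can be gauged with back-of-the-envelope arithmetic. The 2005 thalamocortical run described next needed 50 days of wall-clock time per simulated second; the resulting slowdown factor, and a rough extrapolation under an assumed 18-month compute doubling time (an assumption for illustration, not a forecast), work out as:

```python
import math

# Reported figures: 50 days of wall-clock time to simulate 1 second of
# brain dynamics (2005 thalamocortical model on a 27-processor cluster).
seconds_per_day = 24 * 60 * 60
wall_clock_seconds = 50 * seconds_per_day      # 4,320,000 s
slowdown = wall_clock_seconds / 1.0            # per simulated second

# Assuming compute for this workload doubles every 18 months (a
# Moore's-law-style assumption, not a measured fact), real-time
# simulation would need about log2(slowdown) doublings:
doublings = math.log2(slowdown)
years_to_real_time = 1.5 * doublings
print(round(slowdown), round(years_to_real_time))  # 4320000 33
```

On those assumptions the gap is about 4.3 million times real time, or roughly three decades of doubling, which is the same order as the estimates quoted in the text.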
A non-real-time simulation of a thalamocortical model that has the size of the human brain (10<sup>11</sup> neurons) was performed in 2005, and it took 50 days to simulate 1 second of brain dynamics on a cluster of 27 processors.</p> <p>Few disagree that a brain simulation is possible in theory, even critics of AI such as Hubert Dreyfus and John Searle. However, Searle points out that, in principle, <i>anything</i> can be simulated by a computer, and so any process at all can be considered "computation", if you're willing to stretch the definition to the breaking point. "What we wanted to know is what distinguishes the mind from thermostats and livers," he writes. Any argument that involves simply copying a brain is an argument that admits that we know nothing about how intelligence works. "If we had to know how the brain worked to do AI, we wouldn't bother with AI."</p> <p><a name="Human_thinking_is_symbol_processing" id="Human_thinking_is_symbol_processing"></a></p> <h4><span class="editsection"></span> <span style="color: rgb(204, 51, 204);" class="mw-headline">Human thinking is symbol processing</span></h4> <p>In 1963, Allen Newell and Herbert Simon proposed that "symbol manipulation" was the essence of both human and machine intelligence. They wrote:</p> <ul><li><i>A physical symbol system has the necessary and sufficient means for general intelligent action.</i></li></ul> <p>This claim is very strong: it implies both that human thinking is a kind of symbol manipulation (because a symbol system is <i>necessary</i> for intelligence) and that machines can be intelligent (because a symbol system is <i>sufficient</i> for intelligence). 
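The symbol-system idea is easiest to see in miniature: intelligence as purely formal operations over discrete symbols. A toy forward-chaining production system (illustrative only; the facts and the single rule are invented for the example):

```python
# A toy physical symbol system: facts are tuples of symbols, and "inference"
# is purely formal manipulation of those symbols by a rule.
facts = {("dog", "is-a", "mammal"), ("mammal", "is-a", "animal")}

def forward_chain(facts):
    """Apply one rule -- transitivity of 'is-a' -- until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(facts):
            for (c, r2, d) in list(facts):
                if r1 == r2 == "is-a" and b == c and (a, "is-a", d) not in facts:
                    facts.add((a, "is-a", d))
                    changed = True
    return facts

derived = forward_chain(facts)
print(("dog", "is-a", "animal") in derived)  # True: derived by pure symbol shuffling
```

Nothing in the loop refers to dogs or animals; the conclusion falls out of the shapes of the symbols alone, which is exactly the hypothesis's point.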
Another version of this position was described by philosopher Hubert Dreyfus, who called it "the psychological assumption":</p> <ul><li><i>The mind can be viewed as a device operating on bits of information according to formal rules.</i></li></ul> <p>A distinction is usually made between the kind of high-level symbols that directly correspond with objects in the world, such as &lt;dog&gt; and &lt;tail&gt;, and the more complex "symbols" that are present in a machine like a neural network. Early research into AI, called "good old-fashioned artificial intelligence" (GOFAI) by John Haugeland, focused on these kinds of high-level symbols.</p> <p><a name="Arguments_against_symbol_processing" id="Arguments_against_symbol_processing"></a></p> <h4><span class="editsection"></span> <span style="color: rgb(153, 0, 0);" class="mw-headline">Arguments against symbol processing</span></h4> <p>These arguments show that human thinking does not consist (solely) of high-level symbol manipulation. They do <i>not</i> show that artificial intelligence is impossible, only that more than symbol processing is required.</p> <p><a name="Lucas.2C_Penrose_and_G.C3.B6del" id="Lucas.2C_Penrose_and_G.C3.B6del"></a></p> <h5><span class="editsection"></span><span style="color: rgb(153, 0, 0);" class="mw-headline">Lucas, Penrose and Gödel</span></h5> <p>In 1931 Kurt Gödel proved that it is always possible to create statements that a formal system (such as an AI program) could not prove. A human being, however, can (with some thought) see the truth of these "Gödel statements". This proved to philosopher John Lucas that human reason would always be superior to machines. He wrote "Gödel's theorem seems to me to prove that mechanism is false, that is, that minds cannot be explained as machines." 
Roger Penrose expanded on this argument in his 1989 book The Emperor's New Mind, where he speculated that quantum mechanical processes inside individual neurons gave humans this special advantage over machines.</p> <p><a href="http://en.wikipedia.org/wiki/Douglas_Hofstadter" title="Douglas Hofstadter"></a>Douglas Hofstadter, in his Pulitzer prize winning book Gödel, Escher, Bach: An Eternal Golden Braid, explains that these "Gödel-statements" always refer to the system itself, similar to the way the Epimenides paradox uses statements that refer to themselves, such as "this statement is false" or "I am lying". But, of course, the Epimenides paradox applies to anything that makes statements, whether they are machines <i>or</i> humans, even Lucas himself. Consider:</p> <ul><li><i>Lucas can't assert the truth of this statement.</i></li></ul> <p>This statement is true but can't be asserted by Lucas. This shows that Lucas himself is subject to the same limits that he describes for machines, as are all people, and so Lucas's argument is pointless.</p> <p>Further, Russell and Norvig note that Gödel's argument only applies to what can theoretically be proved, given an infinite amount of memory and time. In practice, real machines (including humans) have finite resources and will have difficulty proving many theorems. 
It is not necessary to prove everything in order to be intelligent.</p> <p><a name="Dreyfus:_the_primacy_of_unconscious_skills" id="Dreyfus:_the_primacy_of_unconscious_skills"></a></p> <h5><span class="editsection"></span><span style="color: rgb(153, 0, 0);" class="mw-headline">Dreyfus: the primacy of unconscious skills</span></h5> <p><a href="http://en.wikipedia.org/wiki/Hubert_Dreyfus" title="Hubert Dreyfus"></a>Hubert Dreyfus argued that human intelligence and expertise depended primarily on unconscious instincts rather than conscious symbolic manipulation, and argued that these unconscious skills would never be captured in formal rules.</p> <p><a href="http://en.wikipedia.org/wiki/Hubert_Dreyfus" title="Hubert Dreyfus"></a>Dreyfus's argument had been anticipated by Turing in his 1950 paper Computing machinery and intelligence, where he had classified this as the "argument from the informality of behavior." Turing argued in response that, just because we don't know the rules that govern a complex behavior, this does not mean that no such rules exist. He wrote: "we cannot so easily convince ourselves of the absence of complete laws of behaviour ... The only way we know of for finding such laws is scientific observation, and we certainly know of no circumstances under which we could say, 'We have searched enough. There are no such laws.'"</p> <p><a href="http://en.wikipedia.org/wiki/Stuart_J._Russell" title="Stuart J. Russell"></a>Russell and Norvig point out that, in the years since Dreyfus published his critique, progress has been made towards discovering the "rules" that govern unconscious reasoning. The situated movement in robotics research attempts to capture our unconscious skills at perception and attention. Computational intelligence paradigms, such as neural nets, evolutionary algorithms and so on are mostly directed at simulated unconscious reasoning and learning. 
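A tiny perceptron illustrates why such paradigms speak to Dreyfus's point: the trained network ends up obeying a rule (here, logical AND) that is never written down anywhere in its code; it is only induced from examples. This is a minimal sketch, not a claim about how any particular system works.

```python
# A one-neuron 'network' learns AND from examples; no rule for AND is ever
# stated explicitly -- it emerges in the learned weights.
def train_perceptron(samples, epochs=20, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - out
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(samples)
predict = lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
print([predict(x1, x2) for (x1, x2), _ in samples])  # learned AND: [0, 0, 0, 1]
```

Asked "what rule are you following?", the trained system can only exhibit its weights, much as an expert can only exhibit competence rather than recite the laws behind it.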
Research into commonsense knowledge has focused on reproducing the "background" or context of knowledge. In fact, AI research in general has moved away from high level symbol manipulation or "GOFAI", towards new models that are intended to capture more of our unconscious reasoning. Historian and AI researcher Daniel Crevier wrote that "time has proven the accuracy and perceptiveness of some of Dreyfus's comments. Had he formulated them less aggressively, constructive actions they suggested might have been taken much earlier."</p> <p><a name="Can_a_machine_have_a_mind.2C_consciousness_and_mental_states.3F" id="Can_a_machine_have_a_mind.2C_consciousness_and_mental_states.3F"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Can a machine have a mind, consciousness and mental states?</span></span></h2> <p style="font-weight: bold; color: rgb(153, 0, 0);">This is a philosophical question, related to the problem of other minds and the hard problem of consciousness. The question revolves around a position defined by John Searle as "strong AI":</p> <ul><li><i>A physical symbol system can have a mind and mental states.</i></li></ul> <p style="font-weight: bold; color: rgb(153, 0, 0);">Searle distinguished this position from what he called "weak AI":</p> <ul><li><i>A physical symbol system can act intelligently.</i></li></ul> <p><a href="http://en.wikipedia.org/wiki/John_Searle" title="John Searle"></a>Searle introduced the terms to isolate strong AI from weak AI so he could focus on what he thought was the more interesting and debatable issue. 
He argued that <i>even if we assume</i> that we had a computer program that acted exactly like a human mind, there would still be a difficult philosophical question that needed to be answered.</p> <p>Neither of Searle's two positions is of great concern to AI research, since they do not directly answer the question "can a machine display general intelligence?" (unless it can also be shown that consciousness is <i>necessary</i> for intelligence). Turing wrote "I do not wish to give the impression that I think there is no mystery about consciousness ... [b]ut I do not think these mysteries necessarily need to be solved before we can answer the question [of whether machines can think]." Russell and Norvig agree: "Most AI researchers take the weak AI hypothesis for granted, and don't care about the strong AI hypothesis."</p> <p>Before we can answer this question, we must be clear what we mean by "minds", "mental states" and "consciousness".</p> <p><a name="Consciousness.2C_minds.2C_mental_states.2C_meaning" id="Consciousness.2C_minds.2C_mental_states.2C_meaning"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Consciousness, minds, mental states, meaning</span></span></h3> <div class="thumb tright"> <div class="thumbinner" style="width: 182px;"><a href="http://en.wikipedia.org/wiki/File:RobertFuddBewusstsein17Jh.png" class="image" title="Representation of consciousness from the 17th century."><img alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/0/0c/RobertFuddBewusstsein17Jh.png/180px-RobertFuddBewusstsein17Jh.png" class="thumbimage" width="180" border="0" height="261" /></a> <div class="thumbcaption"> Representation of consciousness from the 17th century.</div> </div> </div> <p>The words "mind" and "consciousness" are used by different communities in different ways. 
Some new age thinkers, for example, use the word "consciousness" to describe something similar to Bergson's "élan vital": an invisible, energetic fluid that permeates life and especially the mind. Science fiction writers use the word to describe some essential property that makes us human: a machine or alien that is "conscious" will be presented as a fully human character, with intelligence, desires, will, insight, pride and so on. (Science fiction writers also use the words "sentience", "sapience," "self-awareness" or "ghost" (as in the Ghost in the Shell manga and anime series) to describe this essential human property.) For others, the words "mind" or "consciousness" are used as a kind of secular synonym for the soul.</p> <p>For philosophers, neuroscientists and cognitive scientists, the words are used in a way that is both more precise and more mundane: they refer to the familiar, everyday experience of having a "thought in your head", like a perception, a dream, an intention or a plan, and to the way we know something, or mean something or understand something. "It's not hard to give a commonsense definition of consciousness" observes philosopher John Searle. What is mysterious and fascinating is not so much <i>what</i> it is but <i>how</i> it is: how does a lump of fatty tissue and electricity give rise to this (familiar) experience of perceiving, meaning or thinking?</p> <p><a href="http://en.wikipedia.org/wiki/Philosopher" title="Philosopher" class="mw-redirect"></a>Philosophers call this the hard problem of consciousness. It is the latest version of a classic problem in the philosophy of mind called the "mind-body problem." A related problem is the problem of <i>meaning</i> or <i>understanding</i> (which philosophers call "intentionality"): what is the connection between our thoughts (i.e. patterns of neurons) and what we are thinking about (i.e. objects and situations out in the world)? 
A third issue is the problem of experience (or "phenomenology"): If two people see the same thing, do they have the same experience? Or are there things "inside their head" (called "qualia") that can be different from person to person?</p> <p><a href="http://en.wikipedia.org/wiki/Neurobiologist" title="Neurobiologist"></a>Neurobiologists believe all these problems will be solved as we begin to identify the neural correlates of consciousness: the actual machinery in our heads that creates the mind, experience and understanding. Even the harshest critics of artificial intelligence agree that the brain is just a machine, and that consciousness and intelligence are the result of physical processes in the brain. The difficult philosophical question is this: can a computer program, running on a digital machine that shuffles the binary digits of zero and one, duplicate the ability of the neurons to create minds, with mental states (like understanding or perceiving), and ultimately, the experience of consciousness?</p> <p><a name="Arguments_that_a_computer_can.27t_have_a_mind_and_mental_states" id="Arguments_that_a_computer_can.27t_have_a_mind_and_mental_states"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Arguments that a computer can't have a mind and mental states</span></span></h3> <p><a name="Searle.27s_Chinese_room" id="Searle.27s_Chinese_room"></a></p> <h4><span class="editsection"></span> <span style="color: rgb(153, 0, 0);" class="mw-headline">Searle's Chinese room</span></h4> <p><a href="http://en.wikipedia.org/wiki/John_Searle" title="John Searle"></a>John Searle asks us to consider a thought experiment: suppose we have written a computer program that passes the Turing Test and demonstrates "general intelligent action." Suppose, specifically that the program can converse in fluent Chinese. Write the program on 3x5 cards and give them to an ordinary person. 
Lock the person in a room and have him follow the instructions on the cards. He will copy out Chinese characters and pass them in and out of the room through a slot. From the outside, it will appear that the Chinese room contains a fully intelligent person who speaks Chinese. The question is this: is there anyone (or anything) in the room that understands Chinese? That is, is there anything that has the mental state of understanding, or which has conscious awareness of what is being discussed in Chinese? The man is clearly not aware. The room can't be aware. The cards certainly aren't aware. Searle concludes that the Chinese room, or any other physical symbol system, cannot have a mind.</p> <p>Searle goes on to argue that actual mental states and consciousness require (yet to be described) "actual physical-chemical properties of actual human brains." He argues there are special "causal properties" of brains and neurons that give rise to minds: in his words, "brains cause minds."</p> <p><a name="Related_arguments:_Leibniz.27_mill.2C_Block.27s_telephone_exchange_and_blockhead" id="Related_arguments:_Leibniz.27_mill.2C_Block.27s_telephone_exchange_and_blockhead"></a></p> <h4><span class="editsection"></span> <span style="color: rgb(153, 0, 0);" class="mw-headline">Related arguments: Leibniz' mill, Block's telephone exchange and blockhead</span></h4> <p>Gottfried Leibniz made essentially the same argument as Searle in 1714, using the thought experiment of expanding the brain until it was the size of a mill. In 1974, Lawrence Davis imagined duplicating the brain using telephone lines and offices staffed by people, and in 1978 Ned Block envisioned the entire population of China involved in such a brain simulation. This thought experiment is called "the Chinese Nation" or "the Chinese Gym". 
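Stripped to its logical skeleton, the room's program is just symbol lookup: a mapping from input strings to output strings, applied without any grasp of what the strings mean. A toy version (the rule contents are invented placeholders, not real Chinese):

```python
# The Chinese room reduced to mechanism: the man in the room matches each
# incoming string of (to him meaningless) symbols against a rule book and
# copies out the prescribed reply. The entries below are made-up placeholders.
RULE_BOOK = {
    "symbols-42": "symbols-17",   # "if you see this shape, write that shape"
    "symbols-7": "symbols-99",
}

def man_in_room(incoming):
    """Follow the instructions mechanically; nothing here 'understands'."""
    return RULE_BOOK.get(incoming, "symbols-0")  # default reply for unknown input

print(man_in_room("symbols-42"))  # symbols-17
```

Whether something over and above this mechanical lookup would be present in a system that passed the Turing Test is precisely what the argument disputes.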
Ned Block also proposed his "blockhead" argument, which is a version of the Chinese room in which the program has been refactored into a simple set of rules of the form "see this, do that", removing all mystery from the program.</p> <p><a name="Responses_to_the_Chinese_Room" id="Responses_to_the_Chinese_Room"></a></p> <h4><span class="editsection"></span> <span style="color: rgb(153, 0, 0);font-size:100%;" ><span class="mw-headline">Responses to the Chinese Room</span></span></h4> <p>Responses to the Chinese room emphasize several different points.</p> <ol><li><b>The systems reply</b> and the <b>virtual mind reply</b>: This reply argues that the system, including the man, the program, the room, and the cards, is what understands Chinese. Searle claims that the man in the room is the only thing which could possibly "have a mind" or "understand", but others disagree, arguing that it is possible for there to be two minds in the same physical place, similar to the way a computer can simultaneously "be" two machines at once: one physical (like a Macintosh) and one "virtual" (like a word processor).</li><li><b>Speed, power and complexity replies</b>: Several critics point out that the man in the room would probably take millions of years to respond to a simple question, and would require "filing cabinets" of astronomical proportions. This brings the clarity of Searle's intuition into doubt.</li><li><b>Robot reply</b>: Some believe that, to truly understand, the Chinese Room needs eyes and hands. Hans Moravec writes: "If we could graft a robot to a reasoning program, we wouldn't need a person to provide the meaning anymore: it would come from the physical world."</li><li><b>Brain simulator reply</b>: What if the program simulates the sequence of nerve firings at the synapses of an actual brain of an actual Chinese speaker? The man in the room would be simulating an actual brain. 
This is a variation on the "systems reply" that appears more plausible because "the system" now clearly operates like a human brain, which strengthens the intuition that there is something besides the man in the room that could understand Chinese.</li><li><b>Other minds reply</b> and the <b>epiphenomena reply</b>: Several people have noted that Searle's argument is just a version of the problem of other minds, applied to machines. Since it's difficult to decide if people are "actually" thinking, we shouldn't be surprised that it's difficult to answer the same question about machines. A related idea is that Searle's "causal properties" of neurons are epiphenomenal: they have no effect on the real world. Why would natural selection create them in the first place, if they make no difference to behavior?</li></ol><br /><p><a name="Is_thinking_a_kind_of_computation.3F" id="Is_thinking_a_kind_of_computation.3F"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Is thinking a kind of computation?</span></span></h2> <p>This issue is of primary importance to cognitive scientists, who study the nature of human thinking and problem solving.</p> <p>The computational theory of mind or "computationalism" claims that the relationship between mind and body is similar (if not identical) to the relationship between a running program and a computer. The idea has philosophical roots in Hobbes (who claimed reasoning was "nothing more than reckoning"), Leibniz (who attempted to create a logical calculus of all human ideas), Hume (who thought perception could be reduced to "atomic impressions") and even Kant (who analyzed all experience as controlled by formal rules). 
The latest version is associated with philosophers Hilary Putnam and Jerry Fodor.</p> <p>This question bears on our earlier questions: if the human brain is a kind of computer then computers can be both intelligent and conscious, answering both the practical and philosophical questions of AI. In terms of the practical question of AI ("Can a machine display general intelligence?"), some versions of computationalism make the claim that (as Hobbes wrote):</p> <ul><li><i>Reasoning is nothing but reckoning</i></li></ul> <p>In other words, our intelligence derives from a form of calculation, similar to arithmetic. This is the physical symbol system hypothesis discussed above, and it implies that artificial intelligence is possible. In terms of the philosophical question of AI ("Can a machine have mind, mental states and consciousness?"), most versions of computationalism claim that (as Stevan Harnad characterizes it):</p> <ul><li><i>Mental states are just implementations of (the right) computer programs</i></li></ul> <p>This is John Searle's "strong AI" discussed above, and it is the real target of the Chinese Room argument (according to Harnad).</p><p><br /></p> <p><a name="Other_related_questions" id="Other_related_questions"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Other related questions</span></span></h2> <p><a href="http://en.wikipedia.org/wiki/Alan_Turing" title="Alan Turing"></a><span style="font-weight: bold; color: rgb(153, 0, 0);">Alan Turing noted that there are many arguments of the form "a machine will never do X", where X can be many things, such as:</span></p> <blockquote> <p>Be kind, resourceful, beautiful, friendly, have initiative, have a sense of humor, tell right from wrong, make mistakes, fall in love, enjoy strawberries and cream, make someone fall in love with it, learn from experience, use words properly, be the subject of its own thought, have as much diversity 
of behaviour as a man, do something really new.</p> </blockquote> <p>Turing argues that these objections are often based on naive assumptions about the versatility of machines or are "disguised forms of the argument from consciousness". Writing a program that exhibits one of these behaviors "will not make much of an impression." All of these arguments are tangential to the basic premise of AI, unless it can be shown that one of these traits is essential for general intelligence.</p> <p><a name="Can_a_machine_have_emotions.3F" id="Can_a_machine_have_emotions.3F"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Can a machine have emotions?</span></span></h3> <p><a href="http://en.wikipedia.org/wiki/Hans_Moravec" title="Hans Moravec"></a>Hans Moravec believes "I think robots in general will be quite emotional about being nice people" and describes emotions in terms of the behaviors they cause. Fear is a source of urgency. Empathy is a necessary component of good human computer interaction. He says robots "will try to please you in an apparently selfless manner because it will get a thrill out of this positive reinforcement. You can interpret this as a kind of love." Daniel Crevier writes "Moravec's point is that emotions are just devices for channeling behavior in a direction beneficial to the survival of one's species."</p> <p>The question of whether the machine <i>actually feels</i> an emotion, or whether it merely acts as if feeling an emotion is the philosophical question, "can a machine be conscious?" 
in another form.</p> <p><a name="Can_a_machine_be_self_aware.3F" id="Can_a_machine_be_self_aware.3F"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Can a machine be self-aware?</span></span></h3> <p>"Self-awareness", as noted above, is sometimes used by science fiction writers as a name for the essential human property that makes a character fully human. Turing strips away all other properties of human beings and reduces the question to "can a machine be the subject of its own thought?" Can it think about itself? Viewed in this way, it is obvious that a program can be written that can report on its own internal states, as a debugger does.</p> <p><a name="Can_a_machine_be_original_or_creative.3F" id="Can_a_machine_be_original_or_creative.3F"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Can a machine be original or creative?</span></span></h3> <p>Turing reduces this to the question of whether a machine can "take us by surprise" and argues that this is obviously true, as any programmer can attest. He notes that, with enough storage capacity, a computer can behave in an astronomical number of different ways. It must be possible, even trivial, for a computer that can represent ideas to combine them in new ways. 
(Douglas Lenat's Automated Mathematician, as one example, combined ideas to discover new mathematical truths.)</p> <p><a name="Can_a_machine_have_a_soul.3F" id="Can_a_machine_have_a_soul.3F"></a></p> <h3><span class="editsection"></span><span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Can a machine have a soul?</span></span></h3> <p>Finally, those who believe in the existence of a soul would argue that</p> <ul><li><i>Thinking is a function of man’s immortal soul</i></li></ul> <p>Alan Turing called this “the theological objection” and writes:</p> <blockquote><p>In attempting to construct such machines we should not be irreverently usurping His power of creating souls, any more than we are in the procreation of children: rather we are, in either case, instruments of His will providing mansions for the souls that He creates.</p></blockquote> <h2>Technological singularity</h2> <p><i>Posted 2009-03-06 by Naqi Raza</i></p> <div class="thumb tright"> <div class="thumbinner" style="width: 402px;"><a href="http://en.wikipedia.org/wiki/File:ParadigmShiftsFrr15Events.svg" class="image" title="According to Ray Kurzweil, the logarithmic graph of 15 separate lists of paradigm shifts for key events in human history shows an exponential trend. Lists prepared by, among others, Carl Sagan, Paul D. 
Boyer, Encyclopædia Britannica, American Museum of Natural History and University of Arizona; compiled by Kurzweil."><img alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/4/45/ParadigmShiftsFrr15Events.svg/400px-ParadigmShiftsFrr15Events.svg.png" class="thumbimage" width="400" border="0" height="311" /></a> <div class="thumbcaption"> According to Ray Kurzweil, the logarithmic graph of 15 separate lists of paradigm shifts for key events in human history shows an exponential trend. Lists prepared by, among others, Carl Sagan, Paul D. Boyer, Encyclopædia Britannica, American Museum of Natural History and University of Arizona; compiled by Kurzweil.</div> </div> </div> <p>The <b>technological singularity</b> is a theoretical future point of unprecedented technological progress—typically associated with advancements in computer hardware or the ability of machines to improve themselves using artificial intelligence.</p> <p><a href="http://en.wikipedia.org/wiki/Statistics" title="Statistics"></a>Statistician I. J. Good first wrote of an "intelligence explosion", suggesting that if machines could even slightly surpass human intellect, they could improve their own designs in ways unforeseen by their designers, and thus recursively augment themselves into far greater intelligences. The first such improvements might be small, but as the machine became more intelligent it would become better at becoming more intelligent, which could lead to an exponential and quite sudden growth in intelligence.</p> <p><a href="http://en.wikipedia.org/wiki/Vernor_Vinge" title="Vernor Vinge">Vernor Vinge</a> later called this event "the Singularity" as an analogy between the breakdown of modern physics near a gravitational singularity and the drastic change in society he argues would occur following an intelligence explosion. In the 1980s, Vinge popularized the singularity in lectures, essays, and science fiction.
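</p> <p>The feedback loop Good describes can be caricatured numerically. In the toy model below, the growth rule and every constant are assumptions made purely for illustration: each design cycle improves the machine in proportion to its current intelligence, so per-cycle gains that start small end up accelerating.</p>

```python
# Toy model of an "intelligence explosion" (all numbers are illustrative):
# each design cycle, the machine improves itself in proportion to its
# current intelligence, so the per-cycle gain itself grows over time.
intelligence = 1.0       # 1.0 = human baseline, arbitrary units
improvement_rate = 0.1   # assumed coupling between intelligence and self-improvement
history = [intelligence]
for cycle in range(10):
    intelligence *= 1 + improvement_rate * intelligence
    history.append(intelligence)

# The gain achieved in each cycle is strictly larger than in the one before.
gains = [b - a for a, b in zip(history, history[1:])]
print([round(g, 2) for g in gains])
```

<p>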
More recently, some prominent technologists such as Bill Joy, co-founder of Sun Microsystems, have voiced concern over the potential dangers of Vinge's singularity (Joy 2000). Following its introduction in Vinge's stories, particularly Marooned in Realtime and A Fire Upon the Deep, the singularity has also become a common plot element in science fiction.</p> <p>Others, most prominently Ray Kurzweil, define the singularity as a period of extremely rapid technological progress. Kurzweil argues such an event is implied by a long-term pattern of accelerating change that generalizes Moore's Law to technologies predating the integrated circuit, and which he argues will continue in other technologies not yet invented. Critics of Kurzweil's interpretation consider it an example of static analysis, citing particular failures of the predictions of Moore's Law.</p> <p><a href="http://en.wikipedia.org/wiki/Robin_Hanson" title="Robin Hanson">Robin Hanson</a> proposes that multiple "singularities" have occurred throughout history, dramatically affecting the growth rate of the economy. Like the agricultural and industrial revolutions of the past, the technological singularity would increase economic growth by a factor of between 60 and 250. An innovation that allowed the replacement of virtually all human labor could trigger this singularity.</p> <p>Critics allege that the singularity concept does not take into account the increased energy use of the new technologies, or the current physical (atomic) limits on the miniaturization of electronic components.
However, by its nature, the theory implies the creation of currently unknown technologies and relies on the concept of improvements in one field affecting another — an event paralleled in the industrial revolution.</p><p><br /></p><p><br /></p><script type="text/javascript">//<![CDATA[ if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } //]]> </script> <p><a name="Intelligence_explosion" id="Intelligence_explosion"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Intelligence explosion</span></span></h2> <p><a href="http://en.wikipedia.org/wiki/Technological_singularity#CITEREFGood1965" title=""></a><span style="font-weight: bold; color: rgb(153, 0, 0);">Good (1965) speculated on the consequences of machines smarter than humans:</span></p> <blockquote> <p>Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.</p> </blockquote> <p>Mathematician and author Vernor Vinge greatly popularized Good’s notion of an intelligence explosion in the 1980s, calling the creation of the first ultraintelligent machine the Singularity. Vinge first addressed the topic in print in the January 1983 issue of Omni magazine. Vinge (1993) contains the oft-quoted statement, "Within thirty years, we will have the technological means to create superhuman intelligence. Shortly thereafter, the human era will be ended." 
Vinge refines his estimate of the time scales involved, adding, "I'll be surprised if this event occurs before 2005 or after 2030."</p> <p>Vinge continues by predicting that superhuman intelligences, however created, will be able to enhance their own minds faster than the humans that created them. "When greater-than-human intelligence drives progress," Vinge writes, "that progress will be much more rapid." This feedback loop of self-improving intelligence, he predicts, will cause large amounts of technological progress within a short period of time.</p> <p>Most proposed methods for creating smarter-than-human or transhuman minds fall into one of two categories: intelligence amplification of human brains and artificial intelligence. The means speculated to produce intelligence augmentation are numerous, and include bio- and genetic engineering, nootropic drugs, AI assistants, direct brain-computer interfaces, and mind transfer.</p> <p>Despite the numerous speculated means for amplifying human intelligence, non-human artificial intelligence (specifically seed AI) is the most popular option for organizations trying to advance the singularity, a choice addressed by Singularity Institute for Artificial Intelligence (2002). Hanson (1998) is also skeptical of human intelligence augmentation, writing that once one has exhausted the "low-hanging fruit" of easy methods for increasing human intelligence, further improvements will become increasingly difficult to find.</p> <p>It is difficult to directly compare silicon-based hardware with neurons. But Berglas (2008) notes that computer speech recognition is approaching human capabilities, and that this capability seems to require 0.01% of the volume of the brain. 
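</p> <p>The back-of-envelope arithmetic behind that comparison is just a logarithm; the 0.01% figure is the one quoted above, and the rest is an illustrative sketch, not Berglas's own calculation.</p>

```python
import math

# If near-human speech recognition needs ~0.01% of the brain's capacity,
# matching the whole brain needs a factor of about 10,000 more hardware,
# i.e. roughly 4 orders of magnitude.
fraction_of_brain = 0.0001            # 0.01%, the figure quoted in the text
scale_up = 1 / fraction_of_brain      # ~10,000x
orders_of_magnitude = math.log10(scale_up)
print(round(orders_of_magnitude, 2))  # 4.0
```

<p>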
This analogy suggests that modern computer hardware is within a few orders of magnitude of the power of the human brain.</p> <p>One other factor potentially hastening the singularity is the ongoing expansion of the community working on it, resulting from the increase in scientific research within developing countries.</p> <p><a name="Economic_aspects" id="Economic_aspects"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Economic aspects</span></span></h3> <p>Dramatic changes in the rate of economic growth have occurred in the past because of technological advancement. Based on population growth, the economy doubled every 250,000 years from the Paleolithic era until the Neolithic Revolution. This new agricultural economy began to double every 900 years, a remarkable increase. In the current era, beginning with the Industrial Revolution, the world’s economic output doubles every fifteen years, sixty times faster than in the agricultural era. If the rise of superhuman intelligences causes a similar revolution, one would expect the economy to double at least quarterly and possibly weekly.</p> <p>Machines capable of performing most mental and physical tasks as well as humans would cause a rise in wages for the jobs at which humans can still outperform machines. However, a sudden proliferation of humanlike machines would likely cause a net drop in wages, as humans compete with robots for jobs. Also, the wealth of the technological singularity may be concentrated in the hands of only a few.
These wealthy few would be those who own the means of mass-producing the intelligent robot workforce.</p> <p><a name="Potential_dangers" id="Potential_dangers"></a></p> <h3><span class="editsection"></span> <span style="color: rgb(204, 51, 204);font-size:100%;" ><span class="mw-headline">Potential dangers</span></span></h3> <p>Superhuman intelligences may have goals inconsistent with human survival and prosperity. AI researcher Hugo de Garis suggests AIs may simply eliminate the human race, and humans would be powerless to stop them.</p> <p><a href="http://en.wikipedia.org/wiki/Technological_singularity#CITEREFBerglas2008" title=""></a>Berglas (2008) argues that, unlike man, a computer-based intelligence is not tied to any particular body, which would give it a radically different world view. In particular, a software intelligence would essentially be immortal and so have no need to produce independent children that live on after it dies. It would thus have no evolutionary need for love.</p> <p>Other oft-cited dangers include those commonly associated with molecular nanotechnology and genetic engineering. These threats are major issues for both singularity advocates and critics, and were the subject of Bill Joy's Wired magazine article "Why the future doesn't need us" (Joy 2000).</p> <p><a href="http://en.wikipedia.org/wiki/Technological_singularity#CITEREFBostrom2002" title=""></a><span style="font-weight: bold; color: rgb(153, 0, 0);">Bostrom (2002) discusses human extinction scenarios and lists superintelligence as a possible cause:</span></p> <blockquote> <p>When we create the first superintelligent entity, we might make a mistake and give it goals that lead it to annihilate humankind, assuming its enormous intellectual advantage gives it the power to do so. For example, we could mistakenly elevate a subgoal to the status of a supergoal.
We tell it to solve a mathematical problem, and it complies by turning all the matter in the solar system into a giant calculating device, in the process killing the person who asked the question.</p> </blockquote> <p>Moravec (1992) argues that although superintelligence in the form of machines may make humans in some sense obsolete as the top intelligence, there will still be room in the ecology for humans.</p> <p><a href="http://en.wikipedia.org/wiki/Eliezer_Yudkowsky" title="Eliezer Yudkowsky">Eliezer Yudkowsky</a> proposed that research be undertaken to produce friendly artificial intelligence in order to address the dangers. He noted that if the first real AI were friendly it would have a head start on self-improvement and thus might prevent other, unfriendly AIs from developing. The Singularity Institute for Artificial Intelligence is dedicated to this cause. Bill Hibbard also addresses issues of AI safety and morality in his book Super-Intelligent Machines. However, Berglas (2008) notes that there is no direct evolutionary motivation for an AI to be friendly to man.</p> <p><a href="http://en.wikipedia.org/wiki/Isaac_Asimov" title="Isaac Asimov">Isaac Asimov</a>’s Three Laws of Robotics are one of the earliest examples of proposed safety measures for AI. The laws are intended to prevent artificially intelligent robots from harming humans. In Asimov’s stories, any perceived problems with the laws tend to arise from a misunderstanding on the part of some human operator; the robots themselves merely act according to their best interpretation of their rules. In the 2004 film I, Robot, a possibility is explored in which an AI takes complete control over humanity for the purpose of protecting humanity from itself. (The movie was based loosely on Asimov's stories; the aspect of machines taking over bears closer resemblance to Karel Čapek's play R.U.R., which introduced the term robot.)
In 2004, the Singularity Institute launched an Internet campaign called 3 Laws Unsafe to raise awareness of AI safety issues and the inadequacy of Asimov’s laws in particular (Singularity Institute for Artificial Intelligence 2004).</p> <p>Many <a href="http://en.wikipedia.org/wiki/Singularitarianism" title="Singularitarianism"></a>Singularitarians consider nanotechnology to be one of the greatest dangers facing humanity. For this reason, they often believe seed AI (an AI capable of making itself smarter) should precede nanotechnology. Others, such as the Foresight Institute, advocate efforts to create molecular nanotechnology, claiming nanotechnology can be made safe for pre-singularity use or can expedite the arrival of a beneficial singularity.</p><p><br /></p> <p><a name="Accelerating_change" id="Accelerating_change"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Accelerating change</span></span></h2> <div class="thumb tright"> <div class="thumbinner" style="width: 227px;"><a href="http://en.wikipedia.org/wiki/File:PPTMooresLawai.jpg" class="image" title="Kurzweil writes that, due to paradigm shifts, a trend of exponential growth extends from integrated circuits to earlier transistors, vacuum tubes, relays and electromechanical computers."><img alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/c/c5/PPTMooresLawai.jpg/225px-PPTMooresLawai.jpg" class="thumbimage" width="225" border="0" height="226" /></a> <div class="thumbcaption"> Kurzweil writes that, due to <a href="http://en.wikipedia.org/wiki/Paradigm_shift" title="Paradigm shift"></a>paradigm shifts, a trend of exponential growth extends from integrated circuits to earlier transistors, vacuum tubes, relays and electromechanical computers.<br /><br /></div> </div> </div> <div class="thumb tright"> <div class="thumbinner" style="width: 402px;"><a href="http://en.wikipedia.org/wiki/File:Kscaleprojections.png" 
class="image" title="Various Kardashev scale projections through 2100. One results in a singularity."><img alt="" src="http://upload.wikimedia.org/wikipedia/en/thumb/5/5f/Kscaleprojections.png/400px-Kscaleprojections.png" class="thumbimage" width="400" border="0" height="232" /></a> <div class="thumbcaption"> Various Kardashev scale projections through 2100. One results in a singularity.</div> </div> </div> <div class="rellink noprint relarticle mainarticle" style="font-style: italic; padding-left: 2em;"><br /><a href="http://en.wikipedia.org/wiki/Accelerating_change" title="Accelerating change"></a></div> <p>Some singularity proponents argue for its inevitability by extrapolating past trends, especially the shortening gaps between improvements to technology. In one of the first uses of the term "singularity" in the context of technological progress, Ulam (1958) tells of a conversation with John von Neumann about accelerating change:</p> <blockquote> <p>One conversation centered on the ever accelerating progress of technology and changes in the mode of human life, which gives the appearance of approaching some essential singularity in the history of the race beyond which human affairs, as we know them, could not continue.</p> </blockquote> <p><a href="http://en.wikipedia.org/wiki/Technological_singularity#CITEREFHawkins1983" title=""></a>Hawkins (1983) writes that "mindsteps", dramatic and irreversible changes to paradigms or world views, are accelerating in frequency, as quantified in his mindstep equation. He cites the inventions of writing, mathematics, and the computer as examples of such changes.</p> <p><a href="http://en.wikipedia.org/wiki/Ray_Kurzweil" title="Ray Kurzweil" class="mw-redirect">Ray Kurzweil</a>'s analysis of history concludes that technological progress follows a pattern of exponential growth, following what he calls The Law of Accelerating Returns.
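</p> <p>The kind of extrapolation this argument rests on is simple to state. A minimal sketch, assuming the common two-year doubling rule of thumb (the numbers are illustrative, not Kurzweil's own):</p>

```python
# A quantity that doubles every `doubling_period` years grows by a factor
# of 2 ** (years / doubling_period). Numbers below are illustrative only.
def extrapolate(start: float, years: float, doubling_period: float = 2.0) -> float:
    return start * 2 ** (years / doubling_period)

# 1e9 transistors, 20 years at a 2-year doubling period: 2**10 = 1024x growth.
print(f"{extrapolate(1e9, 20):.3e}")   # 1.024e+12
```

<p>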
He generalizes Moore's Law, which describes geometric growth in integrated semiconductor complexity, to include technologies from far before the integrated circuit.</p> <p>Whenever technology approaches a barrier, Kurzweil writes, new technologies will cross it. He predicts paradigm shifts will become increasingly common, leading to "technological change so rapid and profound it represents a rupture in the fabric of human history" (Kurzweil 2001). Kurzweil believes that the singularity will occur before the end of the 21st century, setting the date at 2045 (Kurzweil 2005). His predictions differ from Vinge’s in that he predicts a gradual ascent to the singularity, rather than Vinge’s rapidly self-improving superhuman intelligence.</p> <p>On this view, an artificial intelligence capable of improving on its own design would itself face a singularity. This idea is explored by Dan Simmons in his novel Hyperion, where a collection of artificial intelligences debate whether or not to make themselves obsolete by creating a new generation of "ultimate" intelligence.</p> <p>The Acceleration Studies Foundation, an educational non-profit foundation founded by John Smart, engages in outreach, education, research and advocacy concerning accelerating change (Acceleration Studies Foundation 2007). It produces the Accelerating Change conference at Stanford University and maintains the educational site Acceleration Watch.</p> <p>Presumably, a technological singularity would lead to the rapid development of a Kardashev Type I civilization, one that has achieved mastery of the resources of its home planet; a Type II civilization commands the resources of its planetary system, and a Type III those of its galaxy.
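</p> <p>Carl Sagan's interpolated version of the scale is what gives fractional ratings like 0.7 a meaning; a minimal sketch, in which the power figure is an assumed round number for illustration rather than a measurement:</p>

```python
import math

# Sagan's interpolation: K = (log10(P) - 6) / 10, with P the civilization's
# total power use in watts. A Type I civilization corresponds to 10^16 W.
def kardashev_rating(power_watts: float) -> float:
    return (math.log10(power_watts) - 6) / 10

# ~2e13 W of global power use (assumed round figure) gives a rating near 0.7:
print(round(kardashev_rating(2e13), 2))   # 0.73
```

<p>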
Since, depending on the calculation used, humanity is projected to reach 0.7 on the Kardashev scale by 2040 or sooner, a technological singularity occurring before then would push us rapidly past that threshold.</p><p><br /></p> <p><a name="Criticism" id="Criticism"></a></p> <h2><span class="editsection"></span> <span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Criticism</span></span></h2> <p>Some critics assert that no computer or machine will ever achieve human intelligence, while others do not rule out the possibility. <a href="http://en.wikipedia.org/wiki/Theodore_Modis" title="Theodore Modis">Theodore Modis</a> and Jonathan Huebner argue that the rate of technological innovation has not only ceased to rise, but is actually now declining; Smart (2005) criticizes Huebner's analysis. One piece of evidence cited for this decline is that the rise in computer clock speeds is slowing even while Moore's prediction of exponentially increasing circuit density continues to hold: clock speed was once marketed as the main measure of processor performance, but modern processors put their additional circuitry to more efficient uses than raising raw clock speed. For instance, a Core i7 at 2 GHz is far more powerful than a Pentium 4 at 4 GHz.</p> <p>Others propose that other "singularities" can be found through analysis of trends in world population, world GDP, and other indices. Andrey Korotayev and others argue that historical hyperbolic growth curves can be attributed to feedback loops that ceased to affect global trends in the 1970s, and thus hyperbolic growth should not be expected in the future.</p> <p>In <i>The Progress of Computing</i>, William Nordhaus argued that, prior to 1940, computers followed the much slower growth of a traditional industrial economy, thus rejecting extrapolations of Moore's Law to 19th-century computers.
Schmidhuber (2006) suggests that differences in memory of recent and distant events create an illusion of accelerating change, and that such phenomena may be responsible for past apocalyptic predictions.</p> <p>A recent study of patents per thousand persons shows that human creativity does not show accelerating returns, but in fact—as suggested by Joseph Tainter in his seminal <i>The Collapse of Complex Societies</i>—a law of diminishing returns. The number of patents per thousand persons peaked in the period from 1850–1900 and has been declining since. The growth of complexity eventually becomes self-limiting and leads to a widespread "general systems collapse". Thomas Homer-Dixon, in <i>The Upside of Down: Catastrophe, Creativity and the Renewal of Civilization</i>, argues that declining energy returns on investment have led to the collapse of civilizations. Jared Diamond, in <i>Collapse: How Societies Choose to Fail or Succeed</i>, likewise argues that cultures self-limit when they exceed the sustainable carrying capacity of their environment, and that the consumption of strategic resources (frequently timber, soils or water) creates a deleterious positive feedback loop that leads eventually to social collapse and technological retrogression.</p><p><br /></p> <p><a name="Popular_culture" id="Popular_culture"></a></p> <h2><span class="editsection"></span><span style="color: rgb(51, 51, 255);font-size:130%;" ><span class="mw-headline">Popular culture</span></span></h2> <p>While discussing the singularity's growing recognition, Vinge (1993) writes that "it was the science-fiction writers who felt the first concrete impact."
In addition to his own short story "Bookworm, Run!", whose protagonist is a chimpanzee with intelligence augmented by a government experiment, he cites Greg Bear's novel Blood Music (1983) as an example of the singularity in fiction. In William Gibson's 1984 novel Neuromancer, AIs capable of improving their own programs are strictly regulated by special "Turing police" to ensure they never exceed a certain level of intelligence, and the plot centers on the efforts of one such AI to circumvent their control. The 1994 novel The Metamorphosis of Prime Intellect features an AI that augments itself so quickly as to gain low-level control of all matter in the Universe in a matter of hours. A more malevolent AI achieves similar levels of omnipotence in Harlan Ellison's short story I Have No Mouth, and I Must Scream (1967). William Thomas Quick's novels Dreams of Flesh and Sand (1988), Dreams of Gods and Men (1989), and Singularities (1990) present an account of the transition through the singularity; in the last of these, one of the characters argues that mankind's survival requires integration with the emerging machine intelligences, or humanity will be crushed under the machines' dominance, which the novel presents as the greatest risk to the survival of any species reaching this point. (The novels allude to large numbers of other species that either survived or failed this test, although no actual contact with alien species occurs.)</p> <p>The singularity is sometimes addressed in fictional works to explain the event's absence. Neal Asher's Gridlinked series features a future where humans living in the Polity are governed by AIs; while some are resentful, most believe the AIs are far better governors than any human. In the fourth novel, Polity Agent, it is mentioned that the singularity is far overdue, yet most AIs have decided not to partake in it for reasons that only they know.
A flashback character in Ken MacLeod's 1998 novel The Cassini Division dismissively refers to the singularity as "the Rapture for nerds", though the singularity goes on to happen anyway.</p> <p>Popular movies in which computers become intelligent and overpower the human race include <i><a href="http://en.wikipedia.org/wiki/Colossus:_The_Forbin_Project" title="Colossus: The Forbin Project">Colossus: The Forbin Project</a></i>, the Terminator series, I, Robot, and The Matrix series. The television series Battlestar Galactica also explores these themes.</p> <p><a href="http://en.wikipedia.org/wiki/Isaac_Asimov" title="Isaac Asimov">Isaac Asimov</a> expressed ideas similar to a post-Kurzweilian singularity in his short story The Last Question. Asimov's future envisions a reality where a combination of strong artificial intelligence and post-humans consume the cosmos, during a time Kurzweil describes as when "the universe wakes up", the last of his six stages of cosmic evolution as described in <i>The Singularity Is Near</i>. Post-human entities throughout various time periods of the story ask the artificial intelligence how entropy death can be avoided. The AI responds that it lacks sufficient information to come to a conclusion, until the end of the story, when it does indeed arrive at a solution and demonstrates it by re-creating the universe, in godlike speech and fashion, from scratch. Notably, it does so in order to fulfill its duty to answer the humans' question.</p> <p><a href="http://en.wikipedia.org/wiki/St._Edward%27s_University" title="St. Edward's University">St. Edward's University</a> chemist Eamonn Healy discusses accelerating change in the film Waking Life. He divides history into increasingly shorter periods, estimating "two billion years for life, six million years for the hominid, a hundred-thousand years for mankind as we know it".
He proceeds to human cultural evolution, giving time scales of ten thousand years for agriculture, four hundred years for the scientific revolution, and one hundred fifty years for the industrial revolution. Information is emphasized as providing the basis for the new evolutionary paradigm, with artificial intelligence its culmination. He concludes we will eventually create "neohumans" which will usurp humanity’s present role in scientific and technological progress and allow the exponential trend of accelerating change to continue past the limits of human ability.</p> <p>Accelerating progress features in some science fiction works, and is a central theme in Charles Stross's Accelerando. Other notable authors who address singularity-related issues include Karl Schroeder, Greg Egan, Ken MacLeod, David Brin, Iain M. Banks, Neal Stephenson, Tony Ballantyne, Bruce Sterling, Dan Simmons, Damien Broderick, Fredric Brown, Jacek Dukaj, Nagaru Tanigawa and Cory Doctorow. Another relevant work is Warren Ellis’ ongoing comic book series newuniversal.</p> <p>In the episode "The Turk" of Terminator: The Sarah Connor Chronicles, John Connor mentions the singularity. The Terminator franchise is predicated on the concept of a human-designed computer system becoming self-aware and deciding to destroy humankind. It eventually achieves superintelligence.</p> <p>In the film Screamers—based on Philip K. Dick's short story Second Variety—mankind's own weapons begin to design and assemble themselves. Self-replicating machines (here, the screamers) are often considered a significant prerequisite "final phase"—almost a catalyst for the accelerating progress leading to a singularity. The screamers eventually develop to the point where they kill one another, and one even professes love for a human.
This theme is common in Dick's stories, which explore beyond the simplistic "man vs. machine" scenario in which our creations consider us a threat.</p> <p>The feature-length documentary film <i><a href="http://en.wikipedia.org/wiki/Transcendent_Man_%28film%29" title="Transcendent Man (film)">Transcendent Man</a></i> is based on Ray Kurzweil and his book <i>The Singularity Is Near</i>. The film documents Kurzweil's quest to reveal what he believes to be mankind's destiny.</p>