Knol

I find that Knol format is better suited for the stuff that I write, so the updates will now be posted there first. My knols are new edits of three posts here: Intelligence as a cognitive algorithm, Meta Evolution: evolution beyond biology, Cognitive focus: generalist vs specialist bias. See you there!

Cognition: hierarchical pattern discovery with incrementally scalable syntax.

Intelligence is a cognitive algorithm: it predicts/self-predicts (plans) by discovering & projecting patterns. This definition & the following opinions are mine, as the alternatives are scarce. For an excellent high-level discussion see "On Intelligence" by Jeff Hawkins, though consistency is lacking there.
General (scalable) intelligence must recursively self-improve: continuously develop new algorithms. This requires a criterion of improvement, & to be universal it must come from the very definition of intelligence. There is an opinion that intelligence can be recognized but not defined, which is absurd because recognition *is* a match between an input & a definition.
I think the lack of functional definition is the main reason for the failure of general AI attempts over the last half-century, although Algorithmic Information Theory and Bayesian logic are a good start.

We know of one mechanism that did produce an intelligence, although a pretty messed-up one: evolution. Initially algorithmically very simple, evolution changes heritable traits at random & evaluates results for reproductive fitness.
But biological evolution is ludicrously inefficient because intelligence is only one element of reproductive fitness, & selection is extremely coarse-grained: on the level of a whole genome rather than of individual traits.

By my definition, a fitness function specific to intelligence is predictive correspondence of input patterns.
Correspondence is a representational analog of reproduction, maximized by an internalized evolution:
- the "heritable traits" for predictions are past inputs, "variation" is a change of their location & resolution, driven by:
- the "fitness": their cumulative match to the following inputs, produced by comparison which also drives variation by derivation.

Match (fitness) should be quantified on the lowest level of comparison: this makes selection more incremental & efficient.
The lowest level is the comparison between two single-variable inputs, & the match is a partial identity: a complement of the difference, or the smaller of the variables. This is also a measure of analog compression: a sum of bitwise AND between uncompressed comparands (represented by strings of ones).
This adds a whole new dimension to Bayesian logic: I quantify partial match or occurrence (a micro-dimension of prediction) just like Bayesian logic adds quantified partial probability (a macro-dimension) to classical logic.
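The lowest-level match described above can be illustrated with a minimal Python sketch. The function names (match, miss, unary, match_by_and) are made up for this illustration, & it assumes non-negative integer inputs & an unsigned difference, neither of which is specified in the text:

```python
from __future__ import annotations


def match(a: int, b: int) -> int:
    """Partial identity of two non-negative magnitudes: the smaller of the two."""
    return min(a, b)


def miss(a: int, b: int) -> int:
    """Unsigned difference between the comparands (an assumption of this sketch)."""
    return abs(a - b)


def unary(n: int, width: int) -> list[int]:
    """Uncompressed representation: n ones, zero-padded to the given width."""
    return [1] * n + [0] * (width - n)


def match_by_and(a: int, b: int) -> int:
    """Same match, computed as a sum of bitwise AND over strings of ones."""
    width = max(a, b)
    return sum(x & y for x, y in zip(unary(a, width), unary(b, width)))


# The three formulations agree: the smaller variable, the larger variable
# minus the difference, & the sum of bitwise AND.
print(match(5, 3), max(5, 3) - miss(5, 3), match_by_and(5, 3))  # 3 3 3
```

The unary form is only there to show the equivalence; in practice min() gives the same number directly.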

To speed up, the search algorithm must incorporate increasingly complex shortcuts to discover better predictions (speed is what it’s all about, otherwise we can just sit back & let biological evolution do the job).
These more complex predictions (patterns) & pattern discovery methods (algorithms) are derived from the past inputs of increasing comparison range & derivation depth.

The most basic shortcuts are based on the assumption that the environment is not random:
- Input patterns are decreasingly predictive with distance.
- A pattern is increasingly predictive with the accumulated match, & decreasingly so with the difference between constituent inputs.

A core algorithm based on these assumptions would be an iterative step that selectively increases range & complexity of the patterns in proportion to their projected cumulative match:

The original inputs are single variables produced by senses, such as pixels of visual perception. Their subsequent comparison by iterative subtraction generates new variables: length & aggregate value for both partial match & miss (derivatives) for each variable of the comparands. The inputs are integrated into patterns (higher-level inputs) if the additional projected match is greater than the system's average for the computational resources necessary to record & compare additional syntactic complexity. This compressive syntax expansion repeats with every new level of search: each variable of an input pattern forms its own pattern.

On the other hand, if predictive value (projected match) falls below the system's average, the input pattern is aggregated with adjacent "subcritical" patterns by iterative addition, into a lower-resolution input. Aggregation results in a "fractional" projection range for constituent inputs, as opposed to "multiple" range for matching inputs. By increasing the magnitude of the input it increases its projected match: a subset of the magnitude. Aggregation also produces the averages to determine resolution of future inputs & evaluate their matches.
So, the alternative integrated/aggregated representations of inputs are produced by iterative subtraction/addition (the neural analogs are inhibition & excitation), both determined by comparison among the respective inputs.
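A rough Python sketch of one such step over a one-dimensional row of inputs follows. It is only my reading of the description above, under several assumptions that the text does not specify: inputs are non-negative integers, each input is compared with its immediate predecessor, & the mean match of the row stands in for "the system's average". The names Span & compare_row are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Span:
    kind: str    # "pattern" (above-average match) or "aggregate" (subcritical)
    length: int  # number of comparisons covered by the span
    match: int   # summed match: the smaller of each adjacent pair
    diff: int    # summed signed difference (the miss)
    total: int   # summed input magnitude: the lower-resolution value


def compare_row(inputs: list[int]) -> list[Span]:
    """Compare adjacent inputs by subtraction, then group them into spans."""
    # Each comparison yields (match, difference, current value).
    pairs = [(min(p, c), c - p, c) for p, c in zip(inputs, inputs[1:])]
    if not pairs:
        return []
    # Assumption: the row's own mean match stands in for the system's average.
    average = sum(m for m, _, _ in pairs) / len(pairs)
    spans: list[Span] = []
    for m, d, value in pairs:
        kind = "pattern" if m > average else "aggregate"
        if spans and spans[-1].kind == kind:
            # Extend the current span: accumulate length, match, miss & magnitude.
            last = spans[-1]
            last.length += 1
            last.match += m
            last.diff += d
            last.total += value
        else:
            spans.append(Span(kind, 1, m, d, value))
    return spans


for span in compare_row([5, 6, 6, 1, 0, 1, 7, 8]):
    print(span)
```

Each resulting span carries its own variables (length, match, difference, total), so the same comparison could in principle be repeated over those variables on the next level. The sketch does not implement that recursion or any cost-of-complexity term.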

Cognition is a form of evolution where variation does not proceed by altering the inputs directly: a prediction can only be derived from experience. Rather, it redefines coordinate & resolution of the inputs, & generates length & derivatives (higher-level inputs) by comparing them. It's not random: the range/resolution & the syntactic complexity of inputs increase or decrease in proportion to their relative cumulative match: the selection criterion. I consider this to be a higher phase of meta-evolution (see related post). Cognition is driven by predictive fitness, where the patterns themselves are dispensable, compared to biological evolution driven by reproductive fitness, where the patterns (genome) are the end in themselves.

The biggest hangup people usually have is that this kind of algorithm is obviously very simple, while working intelligence is obviously very complex. But, as I tried to explain, additional complexity is learnable and should only improve speed, rather than change the "direction" of cognitive evolution (although it may save a few zillion years). The main requirement for such an algorithm is that it continuously improves the ratio of benefit: predictive power, to cost: complexity.

I would summarize the algorithm as Comparison-Projection: a more constructive analog to Jeff Hawkins' Memory-Prediction.
Hope this makes sense. I have a far more advanced work-in-progress, but the core premises here (if correct) are already way ahead of any other approach that I know of.

Meta-Evolution: evolution beyond reproduction

This is the old version, the editing here is too difficult to bother. For a new one, see my knol.

This is an attempt to generalize the arrow of time: a common thread from the Second Law, through reproduction of the fittest, to technological progress, & beyond.
Evolution of "fitness": from Entropy in physics, to Reproduction in biology, to Prediction for cognitive systems. Many observers (such as Ray Kurzweil & Francis Heylighen) see matter, life, & cognition as the phases in the "evolution" of the known universe. But they seem to use an increasing scale & complexity of evolving systems as the main criterion for the trend across these phases. This presents a problem: galaxies are bigger than brains, & the only unambiguous definition of complexity is as the degree of randomness, which is hardly inspiring. I suggest that a system is defined by its conserved core: the common traits that it preserves & propagates. In my interpretation the complexity of a system itself is strictly instrumental, driven by functional differentiation. The ultimate criterion of progress is "correspondence concentration" of the core, which becomes more abstract on higher phases.

In the most general terms, evolution is a variation of heritable traits and selection of the results fit for a given environment: preserved over time and reproduced across space. Meta evolution can be defined as the evolution of the mechanisms of variation & selection themselves, both of which are determined by the environment. Thus, the driver of meta evolution is the environmental change, resulting from selective propagation of previous variations. New levels of evolving systems emerge in an ecosystem consisting largely of previously evolved specimens. Evolution on such higher levels should have more constrained (less random) modes of variation, & more abstract (differentiated) criteria for selection.

Any evolving system (a unit of evolution) can be conceptually subdivided into a conserved core & an adaptive hierarchy of its environmental interface. A core is a subject of variation & selection, while an adaptive interface facilitates propagation of the core at the expense of its own specifics. In biology, genotype is a core & phenotype is an interface, & similar distinctions can be applied beyond biology. Other things equal, the reduction of a core relative to the adaptive interface makes it more "fit": easier to propagate. Besides, a core is more fit if it's also functional for its own propagation. That means a core evolves as an abstract (stable across space-time) representation of the interface hierarchy. These statements are tautological, but so are the very concepts of increasing entropy, Darwinian evolution, & algorithmic complexity, not to mention all of math.
Thus, the fitness value of meta evolution is abstractness of a core, supported by the expansion/differentiation of its adaptive hierarchy. A higher core evolves as a subset of an existing core, the rest of which becomes "adaptive": conservation & propagation of the specifics is overridden to maximize those of a new core.
Three of the most fundamental phases in such incremental abstraction during "the brief history of time" are:

- Entropy growth: equalization of matter/energy levels over space-time, or continuous pattern expansion.
- Evolution in biology: reproduction of differentiated patterns, by metabolizing external matter/energy.
- Cognition: recognition of learnable/forgettable patterns to maximize predictive value of representations.

There's no beginning or end to meta evolution, so I'll start from the best-understood phase: reproducing core. There are at least six stages of adaptive expansion & differentiation within this phase, each sequentially increasing the stability of a core, such as genotype, and the complexity gap between genotype & an adaptive phenotype:

- Restoration: an ability of fermions/atoms/molecules to maintain their "pattern" in spite of exchanging energy & constituent particles. In a way, it's a spatially/temporally differentiated pattern preservation.
- Replication, or serial symmetrical restoration: the "replacement" atoms/molecules selected by the template from the environment subsequently form their own "replacements"(crystals), or get detached & then both sides "restore" themselves again (RNA).
- Differentiated Replication, or the most basic life: a reproducing genome consisting of cooperative genes that are sequentially activated to also function as mediated templates for proteins & so on: a multi-step process to accumulate resources for reproduction (4 nucleotides of DNA & 20 amino acids).
- Adaptive Reproduction, where the degree of this "alternative" reproduction (expression) for individual genes and their products is controlled by environmental feedback: an additional step of functional differentiation over environmental variation: "instinctive" internal reactions and external behaviour.
- Conditioned Adaptation, where inherited reactions & behaviour are supplemented/replaced by those proven to be instrumental to them. "Instrumental" means reliably preceding inherited or previously conditioned values: states maximized by the responses. This is a basic form of induction.
- Predictive Adaptation, via cognitive modeling, with adaptation & conditioning to projected vs experienced environment. Projection is a basic deduction, probably requiring some form of cortex.

The genomes of higher organisms are far more stable: the mutation rates are the highest in bacteria & decrease as the organisms become larger & longer-living. This is because of improved protection for the genome, as well as reduced reproduction rates & expanded lifespan (mutations accumulate during reproduction). In spite of this slowdown in variation, the increase in complexity of phenotypical functional differentiation in the course of evolution appears to accelerate. This can be explained by a growing proportion of acquired, as opposed to inherited, complexity.

"Complexity acquisition" occurs in the three "adaptive" stages listed above: reaction, induction, & deduction. These later stages are a gradual transition to representation-maximizing, vs reproduction-maximizing, phase of meta evolution. Most of the acquired complexity is procedural (behavioral) rather than structural, although the difference here is only in the speed of change (adaptation). Such complexity can be described as motivation: a set of patterns that define behavior. A motive is a stimulus or a pattern of stimuli that attracts or repels a subject: it is maximized or minimized depending on its intensity relative to an optimal value. Such value patterns increase in generality during the process of evolution: phylogeny, & maturation: ontogeny.

In the last three stages of the "reproduction" phase, the conserved core of motivation is, correspondingly: Instincts: inherited adaptive responses, Reflexes: sequentially conditioned responses, & Goals: a conserved core of responses to predictions. Instinctive motivation is encoded by a genotype, thus selected by reproductive fitness only. Conditioned motives, on the other hand, are selected by their consistent precedence to the more primitive motives. The more general "instrumental" motives sequentially displace the original ones. Such displacement is necessary because the value patterns for reproductive fitness of higher animals change too fast to evolve with their genotype, & are far too complex to fit in it. In turn, goal-directed motivation greatly accelerates such displacement because it's driven by theoretical predictions, rather than by actual experiences.

Cognition: evolution beyond reproduction.

The "value drift" during conditioning means that reproductive fitness may no longer be an ultimate core of motivation. For humans, the drive toward reproduction seems to be largely conditioned by culture, & is no longer decisive in the modern variety of it. This is obvious because birth rates & population actually decline in the wealthiest & the most educated countries or social groups. It's even more obvious that no one would spend his life savings to manufacture ever greater amounts of his DNA. So, the intrinsically conserved core for conditioned & goal-directed motivation is not reproduction, but sequential, & then projected, correspondence to previous motives. It may seem that acquired types of motivation don't belong in an ultimately conserved core of an evolving system, but in higher social animals they are also passed across generations. Human civilization is an extreme case, where cultural conditioning, & with the expanding learning phase & life span, increasingly individual conditioning, is a dominant carrier of motivation, which is all that really matters at this stage. The genes, on the other hand, may soon go extinct altogether. The only subjective value of the genes for humans is their capacity to maintain/propagate the phenotype, & we can develop better tools for the job.

Most basic cultural, or memetic, evolution is driven by a type of pattern reproduction, except these patterns are conditioned rather than inherited. In higher social animals conditioning accelerates learning by uncritical acceptance & imitation: a shortcut of authority for evidence. Thus, memes reproduce through upbringing & socialization, and can ultimately be legitimized by religion (probably uniquely human). Such "memetic" reproduction is a subject of group selection, which parallels genetic reproduction & selection of individuals. Some consider memetic evolution by group selection to be a higher phase of meta evolution, but I beg to differ. Value-loaded memes, just like genes, are still selected for their reproductive fitness, only on societal rather than individual level. Such selection is extremely coarse-grained & thus obscenely inefficient.
For a far more efficient mechanism of memetic selection we need to look at the originators of the memes in a society: authoritative individuals such as leaders, prophets, experts, & scientists. Such individuals use personal cognition or institutional science to derive memes they consider useful or accurate. This process is predictive: the memes don't initially come with any survival record. Social progress democratizes such meme generation: more people use their cognition to form or critically evaluate accuracy/utility of the memes, instead of blindly accepting them from an authority. Fittingly, cognition is "implemented" in the neocortex, vs. the more primitive limbic system for conditioning by authority or tradition.

Human motivation develops by conditioned self-identification with increasingly generalized instrumentals: from simple urges with inherited value: pain/pleasure, soft/hard, sweet/bitter, warm/cold, new/old.., to identification with the body as instrumental to these urges (resulting in self-preservation drive), & then to expanding social identification: family (including various stages of mating) > community > country > humanity. The drive for social status (formalized as money & power) seems to parallel social identity: the generality of its instrumental aspect is a subset of the corresponding level of society.
This development of broader "self" is directed by specific inherited values: somatosensory feedback, the patterns of human beauty, sexual attraction, childcare, the drive for social status, & societal empathy. However, instrumental conditioning increases/decreases relative value of a motive, & seems necessary for more general motives to significantly displace competing lower motives.
Rough Freudian parallels here are ID for urges, Ego for a body, & Super Ego for social identification.

As instrumental values eventually displace the inherited ones, the conserved core of motivation shifts toward higher generality instruments. Childish impulsiveness is substantially displaced by adolescent egocentrism, which in turn is displaced by increasingly broad socialization. The speed & degree of displacement vary greatly in proportion to the subject's "attention span": developing more general instruments requires longer-term investment. This attention span expansion is impeded by the urgency of lower motives, which is a combination of their objective intensity & subjective sensitivity. Sensitivity is a stimuli response for a given motive. The variation in sensitivity for individuals is both inherited & conditioned: sensitized by deprivation or desensitized by addiction.
All motives compete via winner-take-all inhibition. Generally, the attention span expansion is inversely proportional to the decay rate of a stimulus as it propagates into association areas of the neocortex. Such areas, especially the anterior prefrontal cortex, represent higher-generality patterns, which become motives through conditioning. A lower decay rate will make higher-generality motives relatively stronger & better able to inhibit lower ones. The decay rate variation is likely partly inherited via cortical trade-offs & gene expression for dopamine & serotonin receptors. I've detailed my speculations on this in my "Generalist vs Specialist" knol. In most cases, the generalization of motives is only traceable through the end of adolescence, by which time an individual runs out of neocortex to myelinate, & then neural plasticity decreases. Hopefully, we can find the means to expand the "adolescence".

This interpretation is similar to Maslow's hierarchy of needs & related ERG theory, except that they treat the higher "needs" as inherited, although latent, while I think they are largely instrumental, & gain additional value through conditioning.
The broadest inherited affinity drive seems to be social empathy / altruism. However, it's not independently definable: empathy works by recognizing a self-image in others. Such self-image changes radically during the growing-up process, & will change beyond recognition if this process extends indefinitely. Increasing cultural diversity makes social identification ever more tentative, & formalization /automation of interactions separates intrinsic affinity from the instrumental value of society for an individual.
Conditioning to theoretical projections, as opposed to past utility, should eventually displace social affinity by identification with broader, more abstract concepts instrumental to the creation of society/humanity itself: progress, evolution, or god for the believers (this is about motivation, not to be confused with intelligence).

Ultimately, curiosity is the only universally instrumental motive, - it drives cognition necessary to recognize what's "instrumental" in the first place. Therefore, it's the only ultimately conserved motive, & should eventually displace all others. Curiosity evolves from inherited novelty seeking to deeper pattern discovery drive, the conserved core for both being the increase in predictive correspondence of representations.
On its highest stage, intelligent life, pattern reproduction uses prediction (a higher phase of meta evolution) as a tool. But a purely cognitive system may have predictive power as an ultimate fitness criterion. A cognitive system develops a hierarchy of recognition/projection mechanisms that create increasingly predictive models. This hierarchy seems to parallel that of restoration/reproduction mechanisms (the most basic of which I listed above) that create increasingly stable patterns. However, that's a subject for my "Intelligence" knol.

Entropy growth: evolution before reproduction.

The only universal trend in a physical world is the growth of entropy, or a trend toward equilibrium. For repulsive interactions, such growth is equalization of energy levels across Space-Time. For attractive interactions, the direction would be reversed in space, but not in time, which is a "macro-dimension".
At first glance, entropy has nothing in common with (& is antagonistic to) higher fitness values: reproduction & prediction. I think this is because physical entropy is confused with the informational one, which is a measure of disorder. Physical entropy is exactly the opposite: stability/homogeneity is synonymous with order, on a given level of organization (the definition must be restricted to a specific level).
This confusion arises because in Information Theory the disorder is measured by the number of bits necessary for description, which is compressed, rather than the differences in actual physical values, which are not. S-T continuous causality means that variation within localized objects is lower than that between them. Thus, the interaction between the objects will reduce the difference between them, but increase internal differences within each. The resulting total variation within a "closed system" will be reduced in magnitude, but increased in the "number" of local smaller-magnitude differences. Overall, the record of all differences will require more bits because local homogeneity and greater magnitudes are more compressible.

So, both the growth of entropy & that of reproductive fitness represent increasing stability over time & homogeneity across space. The difference between them is in scope: entropy increases in all interacting objects, with no spatially distinct adaptive subsystem, while reproduction is restricted to a "genotype". Even so, the growth of entropy, while it drives all the biological processes, is vastly inferior to life in complexity & most wouldn't consider it to be part of the meta-evolutionary process. On the contrary, the "heat death" seems to be life's worst nightmare. But I would argue that reproduction at its ultimate extreme, "the grey goo" scenario, is just as regressive. And reproduction-driven evolution is just as inferior compared to prediction, judging by the technological civilization it already produced & "guesstimating" the potential.

Obviously, by utilizing increasing amounts of resources to maintain & advance adaptive systems, the higher forms of evolution accelerate the growth of entropy in the environment. A more abstract conserved core, being correspondingly detached from the environment, requires more complex adaptive subsystems. Thus, meta-evolution can be a universal, rather than local, trend only if the best more-complex-systems are a priori more fit than the best less-complex ones. The less fit systems become a resource for the more fit ones. I believe that complexity generally wins because it allows for greater functional differentiation in the adaptive subsystem. Functional differentiation is the essence of progress & the cause of its acceleration: specialized mechanisms are by definition more efficient than general-purpose ones. This doesn't mean that any given system/environment will evolve forever, only that meta evolution will predominate on the average.

This assumes that the universe is open, otherwise the evolution will run out of resources & die a heat death. There are two ways to defend this assumption a priori:

First, there’s a Bayesian premise: given that our knowledge about any "physical" infinity is a priori infinitesimal, all possibilities should be assigned equal probability. Since "open universe" represents an infinite number of possibilities, & "closed universe" only one, there's no contest.
Second, but more crucial, is a "projection decay" premise (my own, as far as I know):
Basically, the "closed universe" assumption means that the known patterns can be projected into infinity: no external impact is expected to terminate them. This position is popular among programmers & mathematicians because of the heavily deductive nature of their work. It's also popular among physicists, because fundamental physical laws do appear to be absolute: no limits/exceptions have been found for GR, QM, or Standard Model in a hundred years of intense research. But, however fundamental physics is, it covers only a small part of human knowledge. In all other sciences patterns (laws) do have limits & do decay with distance (in the frame of reference used to define them). A cross-science generalization seems to show that smaller-scale patterns are stronger (more evolved?). The laws of physics are at the extreme end of this trend, with no known maximal range. It might be logical to project these laws into infinity, except that this contradicts the cross-scale negative meta-pattern of decay with distance, which can be overridden only by laws with a proven infinite range.

To recapitulate:

Equalization, or simple entropic averaging, forms continuous patterns by increasing similarity between interacting objects (macro), at the expense of decreasing similarity within each (micro).
Reproduction increases core/environment differentiation by selectively maximizing internally differentiated genotype pattern, & by introducing externally differentiated adaptive phenotype on a macro-level. The "ecosystem" for reproduction must be non-random: "stabilized" by the previous entropic equilibration.
Recognition further increases such differentiation by representing only inputs: the "part" of represented object that impacts a detector, & by maximizing only match: the "part" that's common among the inputs. The core here is prediction, patterns & algorithms themselves are adaptively "metabolized" to maximize it. The "ecosystem" for representations (brain) must be "stabilized" by the previous reproduction of memory & recognition processing "cells".

A higher-phase variation is directed by a lower-phase selection: random mutations in genotype are driven by the growth of entropy, & new cognitive algorithms are ultimately selected by their reproductive fitness. The more abstract core allows for a more flexible & functionally differentiated adaptive hierarchy, which is better able to preserve/propagate the core. The competition among core levels is not apparent: life is based on matter & does not obsolete it, & cognition is based on life, or some form of pattern accumulation mechanism. But genetic pattern conservation is achieved at the expense of reduced entropy in its adaptive subsystem. And cognitive correspondence is maximized by the speedup of "pattern metabolism": less stable memory contents. In other words, maximization of a more-abstract core comes at the expense of reduced stability (fitness) of specifics in its adaptive hierarchy, resulting in local vs global correspondence concentration increase.

There must be lower selection criteria than entropy, and higher than prediction, along the line of increasing correspondence concentration of a relatively shrinking/abstract conserved core.
Lower than entropy may be quantum mechanics, where local matter/energy conservation does not apply. Quantum randomness effectively blocks entropic equalization/stabilization on a micro-scale.
More abstract than prediction is probably mathematics: computational compression. Such compression is not predictive per se, but is universally instrumental for prediction.

So, Ladies & Gentlemen, the meaning of life, universe, & everything: correspondence concentration growth for an increasingly differentiated/abstract conserved core: a fitness function of meta-evolution.

A mindset for AI discovery.

I've been utterly frustrated by Artificial Intelligentsia's apparent inability to comprehend the very nature of the problem they're trying to solve. This is what I've come up with by way of explanation:

The process of learning is the only thing in common for all the knowledge it generates, which is why I believe that understanding it requires a primary focus on generalization over one's entire world model. In effect, this will result in meta-generalization, the only product of which is the process of generalization itself.

As distinct from a more conventional focus on specifics, the focus on high-level generalization (& especially meta-generalization) comes at the expense of precision & might be useful only over the time/scope greater than most people's attention span.
Moreover, formalization of the generalization process was pointless prior to the development of computers: we can't really control our own low-level learning because it is subconscious & highly distributed, so the result can only be truly useful as a computer simulation.

(In other words, the extreme reduction necessary to formalize general intelligence makes it impossible to evaluate the resulting algorithm by trying to imagine how it can predict experience. The experience in this case is everything we know, & a theory/algorithm can produce meaningfully complex predictions only through extremely long recursion. That's the difference between a theory of intelligence & most scientific theories: it can't be "implemented" by a human doing consciously controlled sequential deduction.)

On the other hand, the trade-offs required for such focus are slow reaction/implementation (as analytical depth introduces delays at each level of analysis), & lower precision (as generalization is a necessarily lossy reduction). Needless to say, evolution had little use for such trade-offs in the Stone Age conditions, when one's very survival was never assured, or even in the "civilized" world, with its social pressures.

The fact that AI is historically a subfield of computer science makes the situation even worse. It's a dark irony that the most general/theoretical problem conceivable is tackled mostly by programmers: the ultimate specifics & implementation people. It's not an accident that many successful programmers (including Alan Turing, the patron saint of artificial intelligentsia) have milder forms of autism, which is basically an extreme focus on the detail & correspondingly impaired ability to generalize. (Physiologically, a necessary feature of autism spectrum disorder is a reduced intracortical communication, which is a key for the generalization process & intellectual integrity).

More generally, the people who attempt to implement AI originate in engineering, math, & hard sciences, - fields with a heavy deductive vs inductive bias. Such bias makes them less inclined to focus on the generalization (induction) process.
On the other hand, the inductive bias people, such as social scientists & philosophers, are interested in the products, rather than in the procedural implementation, of generalization. So, while I think an inductive bias is necessary to understand intelligence intuitively, this mindset is ill-suited for explicit algorithmic formalization.

Besides, social sciences & especially philosophy as social institutions have always been dysfunctional fields, dominated by rhetoric, moral/religious "reasoning", & glorified subjectivism. This, of course, is of necessity: being the highest levels of generalization they don't have the intellectual content that a society can make sense/use of in an acceptable time frame, so they must generously supplement with emotional appeal to "human interests" & legitimization by intellectual tradition or "higher authority".

Generalization can be meaningful only if combined with semantic clarity: consistent term definition (an introspective form of analysis by splitting a concept into its constituents). Without such consistency the concepts become incrementally more confused with each level of generalization. This confusion is apparent in philosophy, all the terms of which are relativistic (definable only through each other). However, doing the hard work of making the definitions explicit would mean destroying the mystique of philosophy (& of humanity), & nothing seems to be more abhorrent to the philosophers.

So, I think the real problem in AI research is the lack of focus & discipline rather than insufficient "raw" intelligence. It's hard to focus on the mechanics of generalization because of a global scope & off-putting lossiness of the process, as well as the social pressures & psychological urges driving one toward specific & immediate results.

This, of course, is mostly me looking in the mirror.
For my speculations on neurological details see Cognitive focus & cortical minicolumns.

Inductive & deductive cognitive phases.

This is a more "procedural" angle on Cognitive focus & cortical minicolumns & A mindset for AI discovery.
Induction is a generalization, or a pattern discovery process. It is lossy in proportion to selectiveness, - a "pattern" is a set of inputs with an acceptable degree of match. A higher threshold for match means more inputs are dismissed as "noise" & the search is longer.
Deduction, on the other hand, is an interactive projection of the resulting patterns, which forms new combinatorial patterns. Thus, it's a generative or creative "invention" process, rather than a selective or reductionist "discovery" process.
A generalist bias would cause a longer & more selective induction prior to generative deduction. The proportion between the two will not necessarily change, but there will be greater delay of a deduction-heavy phase, which will then last relatively longer. This is because both the input patterns & their combinatorial secondary patterns will be relatively less detailed but longer-range. So, the difference between generalist & specialist is that the former will take longer to transition from induction (knowledge discovery) to deduction (knowledge application).
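
The effect of selectiveness on induction can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not part of the argument above: the `match` measure (1.0 for identical values, falling toward 0 with distance), the fixed `template`, and the sample inputs. It only shows that a higher threshold dismisses more inputs as "noise" & needs a longer search to collect a pattern of the same size.

```python
def match(x, template):
    # Hypothetical match measure: 1.0 for identical values, approaching 0 with distance.
    return 1.0 / (1.0 + abs(x - template))

def discover_pattern(inputs, template, threshold):
    """Split inputs into a pattern (match >= threshold) and dismissed noise."""
    pattern = [x for x in inputs if match(x, template) >= threshold]
    noise = [x for x in inputs if match(x, template) < threshold]
    return pattern, noise

def search_length(inputs, template, threshold, size):
    """Count how many inputs must be scanned to collect `size` pattern members."""
    found = 0
    for scanned, x in enumerate(inputs, start=1):
        if match(x, template) >= threshold:
            found += 1
            if found == size:
                return scanned
    return None  # the pattern never reached the requested size

inputs = [10, 11, 9, 14, 30, 10, 2]

# Less selective induction: most inputs are accepted into the pattern.
loose_pattern, loose_noise = discover_pattern(inputs, template=10, threshold=0.15)
# More selective induction: more inputs are dismissed as noise.
strict_pattern, strict_noise = discover_pattern(inputs, template=10, threshold=0.6)

loose_search = search_length(inputs, template=10, threshold=0.15, size=2)
strict_search = search_length(inputs, template=10, threshold=0.6, size=2)
```

With these sample values the stricter threshold keeps only the two exact repeats & has to scan three times as many inputs to find them.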

In social terms, less critical acceptance of second-hand knowledge speeds up the deductive phase of thinking. I use "critical" not only in reference to the consistency/correctness of the data, but also to its relative importance / pattern strength in the grand scheme of things. Such "passive learning" is necessary for functioning in a differentiated society, where a large amount of data must be acquired from "mentors" prior to making an independent contribution. This is particularly important in math, engineering, & "hard sciences", where deduction is the name of the game & "lossy generalizations" are frowned upon.
Although induction is also used in math, programming, & engineering, it is secondary to deduction, which is a reversal of the "natural order of things". This may help to explain why these apparently "hardcore rationalist" fields produce a disproportionate number of religious extremists (see Engineers of Jihad).
On the other hand, a generalist will be more selective & intellectually independent, taking a longer time as a passive observer prior to reaching conclusions (deducing from observation). He will also be less likely to accept "authority", including peer pressure, & thus will be less "socialized". This delay & detachment in practically applying knowledge requires innate emotional security, - innate because it's not based on prior success & is less sensitive to conditions. It's very suggestive that Autism Phenotype (which I think is a good proxy for a Specialist Phenotype) seems to be partly caused by a low prenatal serotonin exposure (serotonin is a "peace of mind" neurotransmitter).

Generalist vs Specialist neuroarchitectural bias.

This is an old edit, the Blogger is too difficult to bother updating. For a new edit see my knol.

This is related to “A mindset for AI discovery”, but with a dose of neuroscience.
By a "generalist" I mean someone focused on finding common patterns across different fields of knowledge, rather than a "serial specialist" who develops professional competence in each of these fields. I am not a neuroscientist & the following is very speculative.
Cognitive bias toward a certain degree of generalization, as opposed to the level of detail in learning, appears to be partly inherited or developed prenatally / early postnatally. I did some literature search for possible mechanisms of variation in such bias; this is the result. Would appreciate informed comments.

In very general terms, a neocortex is a network of columns loosely organized in a hierarchy of generalization, from primary to association areas of both sensory & motor cortices (see "The columnar organization of the neocortex" by Vernon Mountcastle, "Cortex & Mind" by Joaquin Fuster, "On Intelligence" by Jeff Hawkins). Given a relatively fixed volume & resources, the neocortex must trade between the number & the range of connections in the network. In other words, this cortical hierarchy can be relatively dense or sparse.
Longer-range "sparse" connections have an exponentially greater number of possible targets, thus requiring longer time to find the best match, or to "wire" the network. Fewer total connections would reduce the amount of represented detail, but their increased range will improve match & generality of the resulting patterns. This is because learning, as distinct from passive recording, is selective: the inputs must be reinforced by matches to previous knowledge (as in coincidence detection). In a sparse hierarchy the choice of such reinforcement is greater, so the best match will be better, resulting in slower & more selective learning. So, the tradeoff is between learning speed & detail of a dense hierarchy, & the generality of patterns/concepts discoverable by a sparse hierarchy.
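
A toy sketch of this trade-off, under loudly stated assumptions: "range" is modeled as how far back a hypothetical `best_match_within` function may search a list of stored inputs, & "match" is simply the negative absolute difference (0 is a perfect match). It does not model cortical wiring or the exponential growth of targets; it only shows that a wider range can never give a worse best match, while costing more comparisons.

```python
def best_match_within(history, new_input, search_range):
    """Find the closest stored input among the last `search_range` entries.

    Returns (best_match, comparisons). Match is the negative absolute
    difference, an assumed measure chosen only for this illustration.
    """
    if search_range < 1:
        raise ValueError("search_range must be at least 1")
    candidates = history[-search_range:]
    best_match = max(-abs(new_input - old) for old in candidates)
    return best_match, len(candidates)

history = [52, 17, 40, 8, 31, 25]  # hypothetical stored inputs, oldest first
new_input = 50

# Short range: fast to search, but limited to nearby candidates.
short_match, short_cost = best_match_within(history, new_input, search_range=2)
# Long range: more candidates to compare, so the best match can only improve.
long_match, long_cost = best_match_within(history, new_input, search_range=6)
```

Here the short-range search settles for a poor match after two comparisons, while the long-range search finds a much closer one at three times the cost.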

A functional unit of neocortex is a minicolumn: a group of ~100 neurons vertically connected across six layers of neocortex, & derived from the same group of progenitor cells during embryonic development. Although functional separation of individual minicolumns is disputed, they are ontogenetically distinctive & their vertical differentiation (algorithm) is genetically determined. I suggest that this algorithm is an atomic recognition / generalization function, iterated by vertically connecting multiple minicolumns. This is based on the assumption that the main cortical function is cognition, which can be reduced to recursive generalization: comparison-projection steps (for more on that see my Intelligence knol). My half-educated guess is that lateral connections among minicolumns, from layer I to layers II & III, mostly mediate a winner-take-all inhibition. On the other hand, the vertical connections, from layers V & VI of a source minicolumn to layer IV of a target, via thalamus, are across generality: the output should be a compressed generalization of the inputs.

I know of three levels of neuroarchitectural differences that seem to bias a cognitive focus: hemispheric asymmetry, differences among individuals, & cortical features that distinguish humans from other animals.

First, the evidence on the correlation between neuroarchitectural & cognitive bias in cortical hemispheric asymmetry. It seems that the left hemisphere represents higher-generality, especially semantic concepts, while the right hemisphere works mostly in the background, likely searching for contextual patterns (Cortex & Mind, p. 184, Split Brain, Michael Gazzaniga). The difference, of course, is mostly in degree. Accordingly, Jeffrey Hutsler and Ralf A.W. Galuske showed in "Hemispheric asymmetries in cerebral cortical networks" that macro-columns in the left hemisphere contain relatively fewer mini-columns than corresponding areas in the right hemisphere. The axons in the left hemisphere are better myelinated, even though the total volume & number of synapses is the same in corresponding areas of both hemispheres. This asymmetry seems to be greater in humans than in other animals. The hemispheres do not normally operate independently: they are densely interconnected by the corpus callosum. Some of this connectivity is to provide simple fault-tolerance & sensory-motor field integration, as in animals. But because of the asymmetry ("lateralization") in humans, the transfer of data between hemispheres will likely be between different levels of generality. This mismatch means that the transfer will add another step of generalization to the hierarchy of the left hemisphere.

The best evidence for neuroarchitectural differences among individuals comes from research on autism spectrum disorder (ASD), or broader autism phenotype (BAP). Much of my info on this is via "A Shade of Gray" blog: an excellent review of relevant research, highly recommend. Among other things, BAP is known to increase a focus on specifics, at the expense of higher level generalization ability.
This bias seems to be partially caused by the fact that BAP individuals have a greater number of smaller & more densely packed minicolumns per macrocolumn. Their minicolumns contain the same number of smaller-size neurons, which probably drive signals over shorter range between the macrocolumns, producing local vs global connectivity bias in BAP (from Casanova - "Abnormalities Of Cortical Circuitry In The Brains Of Autistic Individuals", via A Shade of Gray). Weaker inter-macrocolumn signals likely result in inhibited transfer of information between the levels of generalization. This would leave higher levels (associative areas) under-utilized, & my personal guess is that they will re-specialize into more "primary" areas by re-orienting toward less mediated (attenuated) specific thalamocortical inputs. Suggestive research: Partially enhanced thalamocortical functional connectivity in autism. In other words, instead of differentiating by the generality of data, the areas will differentiate by its spatio-temporal & modality-specific origin. A very interesting study, "Comparison of the Minicolumnar Morphometry of Three Distinguished Neuroscientists and Controls" by Dr. Casanova, is reported in "Minicolumns, Genius, and Autism". The neural connectivity of the neuroscientists appears to be similar to that of autistics in the density & size of minicolumns, but differs in better inhibitory isolation between adjacent minicolumns. This should focus the output of minicolumns toward vertical vs lateral connections, increasing the vertical range even for smaller minicolumns. The other likely difference is in their corpora callosa, the structures that connect the left and right cerebral hemispheres, which have consistently been shown to be smaller in autistics.

Yet another set of evidence is the difference in cortical architecture between humans (with obviously vastly greater generalization ability) & other animals. Besides a much larger neocortex & hemispheric asymmetry, the most salient such difference is the spindle neurons, which are present only in humans &, to a far lesser extent, in other primates & whales. From Wikipedia, via "A Shade of Gray": "Spindle cells appear to play a central role in the development of intelligent behavior and adaptive response to changing conditions and cognitive dissonance. They emerge postnatally and eventually become widely connected with diverse parts of the brain, evidencing their essential contributions to the superior capacity of hominids to focus on difficult problems." Because they're much bigger, & their axons are longer & less branched than those of the pyramidal neurons they replace, the spindle neurons should radically extend the range of vertical connections between the minicolumns. This increased range is probably not free, but comes at the expense of reduced density of connections.

The above discussion considered neuroarchitecturally determined trade-offs. Cognitive focus is also biased by the variation in temporal attention span, which probably also affects the architectural bias during cortical development. Attention span, or a stimulus "decay rate" in the neocortex, is probably determined by the speed of reuptake for excitatory neurotransmitters. The most likely candidates are dopamine & norepinephrine, the "pay attention" neurotransmitters, necessary for signal propagation from primary to higher association areas.
The evidence here is contradictory because there are many feedback loops. I suspect that during prenatal / early postnatal development high levels of cortisol / low levels of serotonin increase the levels of phasic dopamine, which in turn upregulates dopamine reuptake. This leads to greater fluctuations in the levels of tonic dopamine and increased novelty seeking as opposed to long term focus. A tantalizing hint can be found in this study: http://jcn.sagepub.com/cgi/content/abstract/9/2/18: "To advance our understanding of attention-deficit hyperactivity disorder and medication effects we draw upon the evidence for (1) a neurotransmitter imbalance between norepinephrine and dopamine in attention-deficit hyperactivity disorder and (2) an asymmetric neural control system that links the dopaminergic pathways to left hemispheric processing and links the noradrenergic pathways to right hemispheric processing. It appears that attention-deficit hyperactivity disorder may involve a bihemispheric dysfunction characterized by reduced dopaminergic and excessive noradrenergic functioning. In turn, favorable medication effects may be mediated by a restoration in neurotransmitter balance and by increased control over the allocation of attentional resources between hemispheres." (J Child Neurol 1994;9:181-189).
It's also known that ADHD sufferers have fewer dopamine autoreceptors, leading to greater variations in its levels. This probably causes lower sensitivity to dopamine due to less efficient receptors, such as D1.
Faster dopamine reuptake should reduce "vertical" signal propagation, causing constant novelty seeking for "primary" stimulation to keep the neocortex busy. ADHD can be remedied by the use of stimulants, most efficiently by reuptake inhibitors such as Bupropion.

The generalist vs specialist trade-off is somewhat ambiguous in terms of modern societal utility:
- On one hand, speed & precision were far more important for survival "in the wild", which probably explains why apes likely have a photographic memory, superior to humans: Chimps beat humans in memory test.
- On the other hand, more recent functional differentiation of modern society rewards specialization & precision, & speed, probably more so than a generalization ability on the opposite end of cognitive diversity spectrum.

IQ tests are inherently incapable of capturing high generalization ability because of their time limits. The tests are supposed to be background-neutral, which means they can only measure an ability to discover patterns within data given to a subject during a relatively brief test (except for verbal & math IQ, which are not background-neutral). That means they’re biased toward the speed of learning, & "sparse & slow" subjects will be at a disadvantage.
The same bias is built into an educational system: the detail-oriented "dense" subjects would be better at passive knowledge acquisition. "Sparse" architecture will excel at independent knowledge discovery & critical thinking, but this is far more difficult to evaluate. Also, modern science accumulated a very substantial body of knowledge, which must be "passively acquired" prior to being able to make a novel discovery. This is a disadvantage for a generalist, & may help to explain why we haven't had a "new Einstein" in a century.
Moreover, it's a lot easier to recognize competence of a specialist than that of a generalist: we all share lower generality levels, which is where we get the original data, but the effective generality of the top associative levels definitely differs among individuals. I would speculate that this is why the quality of work in social sciences, & especially philosophy, is so vastly inferior to that in "hard" sciences.

On "On Intelligence" (edited)

Derek Zahn, via AGI list, with my response:

> It seems like a reasonable and not uncommon idea that an AI could be built as a mostly-hierarchical autoassiciative memory.
> As you point out, it's not so different from Hawkins's ideas. Neighboring "pixels" will correlate in space and time;
>"features" such as edges should become principle components given enough data, and so on.
>There is a bunch of such work on self-organizing the early visual system like this.
>That overall concept doesn't get you very far though; the trick is to make it work past the first few rather
> obvious feature extraction stages of sensory data, and to account for things like episodic memory,
> language use, goal-directed behavior, and all other cognitive activity that is not just statistical categorization.
> I sympathize with your approach and wish you luck.
> If you think you have something that produce more than Hawkins has with his HTM,
> please explain it with enough precision that we can understand the details.

I agree with you on Hawkins & HTM, but his main problem is conceptual.
He seems to be profoundly confused as to what the hierarchy should select for: generality or novelty. He nominates both, apparently not realizing that they're mutually exclusive. This creates a difficulty in defining a quantitative criterion for selection, which is a key for my approach. This inconsistency leads to haphazard hacking in the HTM. For example, he starts by comparing 2D frames in a binary fashion, which is pretty perverse for an incremental approach. I start from the beginning by comparing pixels: the limit of resolution. I quantify the degree of match right there, as a distinct variable, & also record & compare explicit coordinates & derivatives, while he simply junks all that information. HTM doesn't scale because it's not consistent & incremental enough.
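
A minimal sketch of what comparing pixels at the limit of resolution might look like, assuming a single row of intensity values. The post does not define the match measure here, so the use of `min` as "match" (the magnitude shared by both pixels), the function name `compare_pixels`, & the sample row are all assumptions for illustration only. The point is that match, difference (a derivative), & coordinate are each kept as explicit variables rather than collapsed into a binary same/different flag.

```python
def compare_pixels(row):
    """Compare each pixel with its left neighbor in one row of intensities."""
    results = []
    for x in range(1, len(row)):
        prior, pixel = row[x - 1], row[x]
        results.append({
            "x": x,                       # explicit coordinate of the later pixel
            "match": min(prior, pixel),   # assumed measure: magnitude shared by both
            "difference": pixel - prior,  # first derivative along the row
        })
    return results

row = [10, 12, 12, 40, 38]  # hypothetical pixel intensities
comparisons = compare_pixels(row)
```

For this row the third comparison records a low match & a large positive difference at x=3, which is exactly the information a binary frame comparison would discard.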

Both generality & novelty are valuable, but only because they both contribute to predictive power, - the ultimate value.
Generality is a macro-dimension of cortical hierarchy because it itself is a retrospective predictive power.
Besides, it takes an extended, hierarchically differentiated, search to recognize generality.
With novelty, there're two different aspects: proximity & change. Recent inputs are relatively more predictive than the old ones by virtue of their proximity to future inputs. Thus, proximity is a micro-dimension: order of search within every level of generality. It's not hierarchical because the range of search & the resulting complexity of match is lower for novel inputs.
Change, on the other hand, has a "contrast" effect: its value is determined by, & subtracted from, the recurrent pattern it interrupts. In other words, change has "negative" value, it's important only to the extent that it cancels positive predictive value of the interrupted pattern. The change within noise does not interrupt any pattern & has no independent value.

I disagree that we need to specifically code episodic memory, language, & action, - to me these are "emergent properties" (damn, I hate that word:)).

Related discussion on "DoxSpot".