If a computer program took the SAT verbal analogy test and scored as well as the average college-bound human, it would raise some serious questions about the nature and measurement of intelligence. Guess what?
Artificial intelligence with human-level performance on SAT verbal analogy questions has been achieved (warning: PDF) using corpus-based machine learning of relational similarity. Peter D. Turney's Interactive Information Group, Institute for Information Technology of the National Research Council Canada, achieved this milestone.
The timing of this achievement is ironic, since this is the first year that the College Board has given the SAT without the verbal analogy questions.
For the last hundred years many researchers have claimed that analogy tests are among the best predictors of future performance via their strong correspondence with the g factor or general intelligence, while others claimed this is a mismeasure of man with severe political ramifications and questionable motivations.
Is this a true breakthrough in AI or is it just the mismeasure of machine?
Dr. Turney's group developed a technique called Latent Relational Analysis (LRA) and used it to extract relational similarity from about a terabyte of natural language text. After reading a wide variety of documents, LRA achieved 56% on the 374 verbal analogy questions given in the 2002 SAT. The average college-bound student scores 57%. The difference between these two scores is not statistically significant.
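The claim that the two scores cannot be told apart statistically can be sanity-checked with rough arithmetic. The sketch below uses only the three figures given in the text (374 questions, 56%, 57%). It assumes a simple binomial model in which the questions are independent trials, and it treats the human average as a fixed benchmark. Both are simplifications of mine, not something the source states.

```python
import math

n_questions = 374   # from the text
lra_score = 0.56    # from the text
human_avg = 0.57    # from the text

# Standard error of a proportion under a binomial model.
# Assumption: the 374 questions behave like independent trials.
std_err = math.sqrt(lra_score * (1 - lra_score) / n_questions)

# Approximate 95% interval around the LRA score (normal approximation).
margin_95 = 1.96 * std_err
low, high = lra_score - margin_95, lra_score + margin_95

print(f"standard error: {std_err:.4f}")
print(f"95% interval:   {low:.3f} to {high:.3f}")
print("human average inside interval:", low <= human_avg <= high)
```

With only 374 questions the interval is about five percentage points wide on either side, so a one-point gap is well within sampling noise.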
Everyone is familiar with attribute similarity -- when we say two objects are similar we usually mean they share many attributes such as employer, color, shape, cost, age, etc. An example of a statement about attribute similarity is "Mary has the same employer as Sally." Relational similarity -- when we say two pairs of objects have similar intra-pair relationships -- is only a little less familiar. An example of a statement about relational similarity is "John's relationship to Mary is Thor's relationship to Mjolnir." (Perhaps John was the unnamed 'employer' in the attributional statement.)
We can see two things from this example:
- Relational similarity underlies analogy.
- Relational similarity underlies metaphor.
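To make the link between relational similarity and analogy questions concrete, here is a toy sketch. It is not Turney's implementation, which works from a terabyte of text. The joining patterns, the word pairs, and every count below are hypothetical values invented for illustration. Each word pair is described by how often it appears with each pattern, and the answer chosen is the pair whose vector has the highest cosine with the stem pair.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical joining patterns and hypothetical counts (illustration only).
PATTERNS = ["X cuts Y", "X works with Y", "X is a kind of Y"]
counts = {
    ("mason", "stone"):    [8, 5, 0],
    ("carpenter", "wood"): [7, 6, 0],
    ("teacher", "chalk"):  [0, 3, 0],
    ("sparrow", "bird"):   [0, 0, 9],
}

def best_choice(stem, choices):
    """Pick the choice pair whose relation vector is closest to the stem's."""
    return max(choices, key=lambda pair: cosine(counts[stem], counts[pair]))

stem = ("mason", "stone")
choices = [("teacher", "chalk"), ("carpenter", "wood"), ("sparrow", "bird")]
print(best_choice(stem, choices))
```

Note that nothing in the comparison looks at the attributes of a mason or a carpenter. Only the pattern of relations between the two words in each pair is compared.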
The basic idea of Gentner's structure-mapping theory is that an analogy is a mapping of knowledge from one domain (the base) into another (the target) which conveys that a system of relations which holds among the base objects also holds among the target objects. Thus an analogy is a way of noticing relational commonalities independently of the objects in which those relations are embedded.
But a mathematical theory of relational similarity was to have been the crowning achievement of the 1913 publication of the final volume of Principia Mathematica -- something Bertrand Russell called "relation arithmetic".
Russell was adamant that without relation arithmetic people are prone to misunderstand the concept of structure and thereby fail in the empirical sciences:
"I think relation-arithmetic important, not only as an interesting generalization, but because it supplies a symbolic technique required for dealing with structure. It has seemed to me that those who are not familiar with mathematical logic find great difficulty in understanding what is meant by 'structure', and, owing to this difficulty, are apt to go astray in attempting to understand the empirical world. For this reason, if for no other, I am sorry that the theory of relation-arithmetic has been largely unnoticed." -- Bertrand Russell, My Philosophical Development
Unfortunately, Russell and Whitehead's formulation of relation arithmetic had a defect.
I've had a career-long interest in subsuming information systems in a relational paradigm. When contracted to work on Hewlett-Packard's E-Speak project, I was able to hire (only after threatening to resign when told I had to hire only H-1B visa holders from India for this work -- but that's another story) a philosopher of science named Tom Etter, whose work I had heard of from Paul Allen's Interval Research. I set Tom to the task of reformulating relation arithmetic for use in HP's E-Speak project. The work lasted a few months before the E-Speak project ran into trouble, and out of it he produced a paper titled "Relation Arithmetic Revived", in which he describes the new formulation:
Here is relation-arithmetic in a nutshell:
Relations A and B are called similar if A can be turned into B by a 1-1 replacement of the things to which A applies by the things to which B applies. Similarity in this sense is a generalization of the algebraic concept of isomorphism. If, for instance, we think of a group (as defined in group theory) as a three-term relation x = yz, then isomorphic groups are similar as relations. The relation-number of a relation is defined as that which it has in common with similar relations. Relation-arithmetic was to be the study of various operators on relation-numbers.
For reasons that will become clear below, we'll substitute the word shape for Russell's term relation-number. Thus, in our current language, the shape of a relation is what is invariant under similarity. Note that these three words have analogous meanings in geometry.
If we substitute congruence for similarity in the [Russell's - JAB] definition of relation-number, then operators like product and join can in fact be defined in an invariant way, and Russell's conception of relation-arithmetic makes sense. Since Russell's definition of these words is not in general usage, this substitution should not produce confusion, so let us hereby make it:
A relation-number is defined as an equivalence class of partial relations under congruence.
In other words, relational congruence provides relations in context that can be composed to yield new relations -- and relational similarity provides relational shapes whose importance is more abstract. Russell and Whitehead failed because they were trying to come up with a way of composing shapes out of context. (The context-dependent relation numbers of Etter's relation arithmetic are a more general form of "attribute similarity" described above.)
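Etter's definition of similarity quoted above can be made concrete with a small brute-force check. This is a minimal sketch, assuming a relation is represented as a finite set of tuples. The function names and the example relations are mine, not Etter's. Following his group-theory example, a binary operation x = yz is encoded as the set of triples (x, y, z).

```python
from itertools import permutations

def field(relation):
    """All the things a relation applies to, in a stable order."""
    return sorted({x for row in relation for x in row})

def similar(a, b):
    """True if some 1-1 replacement of A's things by B's things turns A into B."""
    fa, fb = field(a), field(b)
    if len(fa) != len(fb) or len(a) != len(b):
        return False
    # Try every 1-1 replacement; fine for tiny examples, exponential in general.
    for image in permutations(fb):
        mapping = dict(zip(fa, image))
        if {tuple(mapping[x] for x in row) for row in a} == b:
            return True
    return False

# Addition mod 2 and multiplication of signs, each as triples (x, y, z) with x = y*z.
z2_add = {((y + z) % 2, y, z) for y in range(2) for z in range(2)}
sign_mul = {(y * z, y, z) for y in (1, -1) for z in (1, -1)}
# Logical AND on {0, 1}: same size, but a different shape.
and_rel = {(y & z, y, z) for y in range(2) for z in range(2)}

print(similar(z2_add, sign_mul))  # the two groups are isomorphic
print(similar(z2_add, and_rel))   # no 1-1 replacement works
```

The first pair is similar because replacing 0 with 1 and 1 with -1 turns one table into the other. The things the relations apply to differ, yet the shape is the same.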
Given this understanding of Russell and Whitehead's work, Turney's group has, at the very least, made a major advance toward bringing practical natural language processing into greater consilience with a wide range of science and philosophy, and conversely, brought those ranges of science and philosophy closer to practice.
Controversy In 'g'
For the last century a controversy has raged over the significance of something cognitive psychologists call the "g factor" or "general intelligence". Indeed, Charles Spearman invented factor analysis to test for the existence of an hypothesized general factor underlying all of what we think of as intelligent behavior. Spearman used a variety of tests for intelligence and then looked for correlations between them. He invented factor analysis so he could find common factors between these correlations. Spearman was strongly influenced by Charles Darwin's cousin, Francis Galton. Galton was one of the earliest proponents of eugenics, and invented the statistical definition of correlation to study the degree of heritability of various phenotypes, including intelligence. Eugenics is a highly controversial field so we should be unsurprised that the g factor, originating as it did with such a controversial area of research, has resulted in a long-standing dispute.
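The idea of looking for a common factor behind correlated test scores can be illustrated with a small calculation. The sketch below is not Spearman's original procedure. It uses the dominant eigenvector of a correlation matrix (the first principal component) as a crude stand-in for a general factor, and the correlation values are hypothetical numbers chosen only so that every test correlates positively with every other.

```python
import math

# Hypothetical correlations among four test scores (illustration only).
R = [
    [1.0, 0.6, 0.5, 0.4],
    [0.6, 1.0, 0.5, 0.4],
    [0.5, 0.5, 1.0, 0.3],
    [0.4, 0.4, 0.3, 1.0],
]

def dominant_eigen(matrix, iterations=200):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    return eigenvalue, v

eigenvalue, vector = dominant_eigen(R)
loadings = [math.sqrt(eigenvalue) * x for x in vector]  # one loading per test
share = eigenvalue / len(R)  # fraction of total variance on the first component

print("loadings:", [round(x, 2) for x in loadings])
print("share of variance:", round(share, 2))
```

When all the correlations are positive, every test loads positively on this first component, which is the pattern the g hypothesis was proposed to explain.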
What is not in dispute is that analogy tests correlate most strongly with Spearman's g. What is in dispute is whether verbal analogy tests are culturally neutral enough to be a fair measure of g independent of education. In other words, no one disputes that a high score on verbal analogy tests is evidence of high g -- they merely dispute whether low scores on verbal analogy tests imply low g.
Most objections to the use of analogy tests to measure general aptitude claim they are reducible to little more than "rote memory" tasks. Quoting the Victoria, BC health site on autistic savants:
In all cases of savant syndrome, the skill is specific, limited and most often reliant on memory.
This sounds a lot like the objections raised by the opponents of the use of verbal analogy tests. Finding an autistic savant whose specialized skill was to do exceedingly well on verbal analogies would go a long way toward validating this view of verbal analogies, and hence the view that Turney's accomplishment is not the AI breakthrough it might appear to be.
On the other hand, we must remember that a sufficiently compressed "rote memory" might be indistinguishable from intelligence. A genuine AI program, assuming one could exist, can itself be seen merely as a compressed representation of all the behavior patterns we consider "intelligent", and the Kolmogorov complexity of those behaviors might not be as great as we imagined. Taxonomies of intellectual capacity which place analogy and metaphor alongside critical thinking are quite possibly compatible with a sufficiently compressed description of a very large "rote memory".
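The compression argument can be given a rough, hands-on feel. Kolmogorov complexity itself is not computable, so the sketch below uses the size of zlib-compressed data as a loose upper-bound proxy. That substitution is my assumption, and the sample sentence is invented for illustration. A "memory" full of regular structure shrinks dramatically, while patternless data of the same length does not.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Bytes needed after zlib compression, a crude stand-in for complexity."""
    return len(zlib.compress(data, 9))

# A highly regular "rote memory": the same analogy repeated many times.
patterned = b"hot is to cold as up is to down; " * 200

# Patternless data of the same length, from a seeded generator for repeatability.
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(len(patterned)))

print("raw length:        ", len(patterned))
print("patterned, packed: ", compressed_size(patterned))
print("noise, packed:     ", compressed_size(noise))
```

The more regularity a body of remembered material has, the shorter the description needed to reproduce it. That is the sense in which a compact program and a huge but highly structured memory can amount to the same thing.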
The most widely read attack against the g theory to date has been Stephen Jay Gould's The Mismeasure of Man. Gould summarizes the objections to g theory:
"[...] the abstraction of intelligence as a single entity, its location within the brain, its quantification as one number for each individual, and the use of these numbers to rank people in a single series of worthiness, invariably to find that oppressed and disadvantaged groups--races, classes, or sexes--are innately inferior and deserve their status" (pp. 24-25).
Recently this ongoing controversy has boiled over into college admissions, with the College Board removing verbal analogies from the SAT as of 2005. Ironically, there is now an argument raging over whether this change biases the SAT against whites and for blacks and Hispanics or whether it biases the SAT against blacks and Hispanics and for whites. We can certainly expect this debate to continue without resolution, since it seems rooted at least as much in ethnic politics as in science.
And of course none of this has stemmed the century-long pattern of ongoing research indicating that analogy tests are highly predictive of future performance, nor the disputations of the validity of such research.
It is precisely the sustained acrimoniousness of this debate that renders Turney's accomplishment so refreshing -- for regardless of your viewpoint, machines are not a voting bloc. Either this work shows itself to be a turning point in the progress of artificial intelligence, or it will merely lead to mundane benefits such as better search engine results. This is just the start of what will undoubtedly be a long series of measurements of artificial intelligence quality.
The question before us now is whether Latent Relational Analysis' human-level performance on verbal analogies truly represents an artificial intelligence breakthrough or whether it merely represents the mismeasure of machine.