Monday, October 13, 2008

The Loebner Prize from a judge's perspective

This year, for the first time in its history, the Loebner Prize competition was held in England, at the University of Reading to be precise. It was organised by Kevin Warwick and Huma Shah.

Independently of whether Turing might have been pleased (he was not well treated in this country, recall?), there was a satisfying sense of “coming home” of the Turing Test (henceforth TT). Expectations were high, and they were very highly advertised too. The meeting was perfectly organised.

Having been invited to play the role of a judge, together with several other colleagues, including two members of the IEG, Mariarosaria Taddeo and Matteo Turilli (here are their pictures and Rosaria's interview), I enjoyed the opportunity to see the machinery and the TT up close. It was intriguing and great fun.

Because there were interviews with the BBC and other things going on, and because we were also supposed to take part in the parallel AISB Symposium on the TT, I had time to test only one couple, instead of the shortlisted four. It was sufficient to reassure me that our machines are not even close to resembling anything that might be open-mindedly called intelligent.

My first question was “if we shake hands, whose hand am I holding?”. The human, as expected, immediately answered, metalinguistically, that we should not talk about bodily interactions, signalling that he was human, as I hoped. Indeed, he turned out to be Andrew Hodges, recruited on the spot to interact with me on the other side of the screen. The computer failed to address the question entirely, and spoke about something else. It was the usual, give-away, tiring, Eliza-ish strategy, which we have now seen implemented for decades.

My next question was: “I have a jewellery box in my hand, how many CDs can I store in it?” Again, Andrew provided some explanation; the computer blew it badly. More Eliza. By then, we were running out of time, so I asked the computer one last question: “The four capitals of the UK are three, Manchester and Liverpool. What’s wrong with this sentence?” Once again, the computer went bananas.

During the Symposium, which was organised and moderated by Mark Bishop with his usual ability, several people, Andrew and myself included, defended the view that a serious TT would have to last much longer than five minutes. But this is as much because of the examined agents, and of the slow means of communication (you have to write/read everything on a screen), as because of the judges, and their lack of training. If you need to test, and I mean really test, an artefact, the higher the stakes are, the tougher the procedure should be. We do not have the same standards when it comes to testing the safety of a house’s central heating system or of an atomic power station. Why (artificial) intelligent behaviour should be left to be tested by the untrained “man in the street” remains a mystery to me. Unless that is the sort of dude you wish to fool. If the TT at Reading scored less badly than it could have, this is also because some of the judges were asking useless questions like “are you a computer?”. This means having missed two essential points of the whole exercise.

First, answers must be as informative as possible, which means that one must be able to maximise the useful evidence obtainable from the received message. It is the same rule applied in the 20 questions game: the answers have to make a difference to your previous state of information, and the bigger the difference the better. But in the example above, either “yes” or “no” will leave you absolutely unenlightened as to who is what, so that is a wasted bullet.
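For readers who like to see the 20 questions rule in numbers, here is a minimal sketch. It assumes the judge starts undecided (a 50/50 prior) and treats each answer as either "sensible" or "not sensible"; the function names and the probabilities in the usage note are illustrative assumptions, not measurements from Reading.

```python
from math import log2


def entropy(p):
    """Shannon entropy, in bits, of a yes/no variable with P(yes) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def expected_gain(prior_machine, p_sensible_if_human, p_sensible_if_machine):
    """Expected drop in the judge's uncertainty (bits) after one answer.

    prior_machine: judge's prior probability that the partner is a machine.
    p_sensible_if_*: assumed chance of a sensible answer from each kind of agent.
    """
    # Overall chance of receiving a sensible answer.
    p_sensible = (prior_machine * p_sensible_if_machine
                  + (1 - prior_machine) * p_sensible_if_human)
    gain = entropy(prior_machine)
    outcomes = (
        (p_sensible, p_sensible_if_machine),          # sensible answer
        (1 - p_sensible, 1 - p_sensible_if_machine),  # non-sensible answer
    )
    for p_answer, likelihood_machine in outcomes:
        if p_answer > 0:
            # Bayes' rule, clamped to guard against rounding error.
            posterior = prior_machine * likelihood_machine / p_answer
            posterior = min(1.0, max(0.0, posterior))
            gain -= p_answer * entropy(posterior)
    return gain
```

If both kinds of agent are equally likely to answer "no" to “are you a computer?”, say `expected_gain(0.5, 0.9, 0.9)`, the gain is zero to within rounding: the answer changes nothing. A question that a human handles almost always and a machine almost never, say `expected_gain(0.5, 0.95, 0.1)`, yields more than half a bit out of the one bit available.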

Second, questions must challenge the syntactic engine which is on the other side. The more a question can be answered only if one truly understands its meaning, the more that question has a chance of being a silver bullet. The first question I asked was already sufficient to discriminate between the human and the machine. It took a minute.

It might be that the Loebner Prize should be re-thought more like a chess tournament, where we could play imitation games with different levels of time control: long games (up to seven hours), short games (30/60 minutes), blitz games (three to fifteen minutes for each player), bullet games (under three minutes) and even one-question games (one minute). The computer I tested could not even pass the last of these. I gave it a zero.

Parallel to the Turing Test, the AISB Symposium was meant to provide plenty of food for the biological minds around. I enjoyed the lively interactions, and found the first half of Owen Holland’s talk about the Ratio Club interesting and informative.

I disagreed with several people, however, about the following issue. There seemed to be some coalescing consensus on the view that a machine will pass the TT only if it is conscious. This is certainly not the case. The TT is a matter of semantics and understanding. And although we might never be able to build truly semantic machines, as I suspect, consciousness need not play any role.

Which is not to say that a conscious machine would not pass the TT. For it would, of course. Nor is it to say that some smart applications might never be able to deal successfully with semantic problems by other means. Some already do (isn’t it handy that Google knows better and tells you that your keywords are misspelled and should be so and so?). But then my dishwasher needs no intelligence (let alone consciousness) to do a better job than me.

What it does mean is that, after half a century of failures and zero progress, some serious reconsideration of the actual feasibility of true AI is a must, and making things immensely more difficult cannot help (although it might give some breathing space to a dying paradigm).

Instead, the argument seems to be that, since we do not have the faintest idea about how to build a machine that can answer a few intelligent questions or even win the one-question TT, the best strategy might be to go full-blown and try to build a machine that is conscious. As if things were not already impossibly difficult as they stand. It is like being told that if you cannot make it crawl, you should make it run the hundred metres under ten seconds, because then it will be able to crawl. Surely there must be better ways of spending our research funds.

The fact that nobody agrees on what consciousness is can help only insofar as it makes cheating and fooling the judges easier. If anything may count as consciousness, the game becomes easier. Turing, of course, knew better. He refused to define intelligence, so we should follow his advice and perhaps adopt a test for consciousness. I provided one in Consciousness, Agents and the Knowledge Game (Minds and Machines 2005, 15(3-4), pp. 415-444), but I am sure others can be devised.

All in all, it was an instructive and entertaining experience. Congratulations to all the Humans for passing the test of a successful meeting.

5 Comments:

Anonymous said...

Check out this Web 2.0 approach to chatbots: http://chatbotgame.com.

Just as Deep Thought brute-forced it in chess with speed, the idea behind the Chatbot Game is to brute-force it with a huge number of user-submitted Google-like chat rules.

Tue Oct 14, 03:55:00 PM  
Blogger Luciano Floridi said...

Interesting, thank you!

Tue Oct 14, 04:28:00 PM  
Blogger Huma said...

Hi Professor Floridi:

The Loebner Prize has been held in the UK three times prior to the 18th manifestation of the contest in Reading.

In 2001, it was held at the Science Museum. In 2003 (my very first attendance) it was hosted by the University of Surrey. In 2006, it was held in UCL's Torrington campus (organised by Tim Childs, CEO of Televirtual, and myself).

Once again, thank you for agreeing to participate with such a busy schedule :-)

Thu Oct 23, 07:29:00 PM  
Blogger Huma said...

One other thing, Professor Floridi, it is Turing himself, in his 1950 paper, who used the term "average interrogator". There's a link to the paper on Hugh's Loebner Prize site:

http://www.loebner.net/Prizef/loebner-prize.html


I agree with you, re consciousness not being necessary in any machine that will pass Turing’s imitation game.


Finally, the machine that you interacted with, in Loebner 2008, was jabberwacky; it did not come first or second in the Reading University hosted, 18th Loebner Prize contest.

Eugene, the runner-up to the winning entry Elbot, managed to deceive one human interrogator (a Times newspaper journalist, no less) into believing that it was human.

During the preliminary phase of Loebner 2008, in June, this is what one of the judges wrote of Eugene: “You could ask it the following: "My car is red. What color is my car?" It gave the correct answer of "Red" whereas all but two other programs either couldn't comprehend the question (or that there was a question) or just took a random guess.”

http://en.wikipedia.org/wiki/Eugene_Goostman

Fri Oct 24, 10:42:00 AM  
Blogger Jarno said...

Slowly slipping into the abyss of my own neurological network by means of @Dawn21stcentury, intrigued as I am by the impact of accelerating technology on society, coming to grips with the fact that it is time for me to turn my back on my Business Background and focus instead on the field of Philosophy of Information, I would like to say: Thank you for this post and this blog in general! I've recently written this rather elaborate post on Artificial Intelligence (if I only knew why..) and your perspective as a judge was therefore great food for thought!

Hartelijk bedankt & greetings from Holland

Mon Oct 27, 11:56:00 AM  
