Copyright © 2001 The International Herald Tribune | www.iht.com
'Chatterbots' Fail to Fool the Judges
Conrad de Aenlle International Herald Tribune
Monday, October 15, 2001
Contest Shows Computerspeak Is Still Far From Human Conversation
LONDON: Distinguishing between a computer and a human being is
easy: Present a test subject with a nearly unsolvable problem. If it
performs millions of operations per second and comes up with the right
answer, it's a computer. If it gives up in frustration, shouts
obscenities, then grabs a beer from the fridge, it's a human being.
The nearly unsolvable problem for participants in the annual Loebner
Prize competition, staged Saturday at the Science Museum in London, is
to design an artificial-intelligence software program that mimics human
thought and communication so closely that half of a panel of judges
sitting at computer terminals, chatting with various programs, mistakes
one for the real thing. Hidden human operators hold up the other end of
some of the conversations.
The prize - a gold medal and $100,000 - is not likely to be awarded
soon. One of the stipulations made by Hugh Loebner, an American
business owner, when he created the contest 11 years ago was that a
program must be able to respond verbally and visually; it must sound
and look like a person as well as appearing to think like one.
The prize is based on the Turing Test, devised in the 1950s by Alan
Turing, a British mathematician and World War II code-breaker. An
optimist, Mr. Turing thought the test would have been passed by 2000.
There is a second Loebner prize, a silver medal and $25,000, for a
program that can fool half the judges by conversing in text. No one has
won that, either, but each year $2,000 is awarded to the program deemed
most human by the judges, who this year included a cybernetics
professor, a Science Museum director and a layman with no particular
technological expertise. Mr. Loebner also served as a judge.
The design of the type of program being showcased, called a
"chatterbot," or a robot that chats with people, is a
burgeoning field. It is still mainly experimental, but commercial
applications are anticipated once the technology improves. Chatterbots
could be used to build more sophisticated Web search engines, for
instance, or to serve as virtual secretaries or sales people.
The effort to perfect chatterbots reflects a fact confronting
practitioners of artificial intelligence: Computers need to think and
talk like people because people do not think and talk like computers.
We do not always mean what we say, we express the same ideas in
different ways, and we use the same words to express different ideas.
And sometimes we say stuff that doesn't mean much of anything.
"Humans and robots communicate very differently," said
Richard Wallace, chairman and co-founder of the Artificial Intelligence
Foundation in San Francisco, whose project is known as ALICE. "We
engage in idle chat with no real purpose. Computers give precise,
truthful answers."
ALICE, which can be found at www.alicebot.org, won the $2,000
prize this year and last year. It is something of a cybercelebrity,
serving on the Web site of the Steven Spielberg film "AI:
Artificial Intelligence," a pretty good gig for a chatterbot.
ALICE, short for Artificial Linguistic Internet Computer Entity, is
free, or "open source" technology, as opposed to proprietary
software.
Capturing human mannerisms - developing "the personality of the
bot," as one designer called it - is the tricky part of building a
chatterbot. Computers have lots of data stored in them, but they seem
geeky and mechanical when they communicate it, like Spock on "Star
Trek." Instead of saying it's about 12:30, a computer might say
it's 12:29 and 18 seconds - more accurate, maybe, but not very human.
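The time example hints at one trick available to designers: take the machine-precise value and deliberately blur it before answering. Here is a minimal sketch of that idea in Python; the function name, the rounding rule and the "about" hedging are illustrative assumptions, not a description of any program entered in the contest.

```python
from datetime import datetime

def humanize_time(t: datetime) -> str:
    """Round a precise clock reading to the loose phrasing a person would use."""
    hour = t.hour % 12 or 12
    # A person rounds to the nearest five minutes and hedges with "about".
    rounded = round(t.minute / 5) * 5
    if rounded == 60:              # rounding spilled into the next hour
        hour = hour % 12 + 1
        rounded = 0
    if rounded == 0:
        return f"about {hour} o'clock"
    return f"about {hour}:{rounded:02d}"

# The geeky "12:29 and 18 seconds" reading comes out as a human would say it.
print(humanize_time(datetime(2001, 10, 13, 12, 29, 18)))  # about 12:30
```

The same principle, trading accuracy for plausibility, applies to any quantity a bot reports, from dates to distances.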
But success has been limited. Chatterbots are still more bot than
chatter, which is why the text-based silver prize remains unwon. On
Saturday they were far too clumsy to fool the judges.
Some responses resembled infantile chatroom banter.
Judge: "Do you know what a non sequitur is?"
Program: "Yes, but I'm not going to tell you; trade
secret."
There was a lot of this puerile coyness, no doubt intended as a
cleverer way to paper over programming gaps.
Judge: "How are you feeling today?"
Program: "I know you know, but I don't know."
Other responses could have been spoken by HAL in a low-budget sequel
to "2001: A Space Odyssey."
Program: "Hello, Judge 04, how are you today?"
And there was the bizarre and vaguely psychopathological.
Judge: "I'm a scientist and adhere to the principle of parsimony. How
about you?"
Program: "I am god."
One stab was taken at the very human trait of humor. When told:
"You are making mistakes. Are you feeling OK?" the reply was,
"I need oil." It turned out that that response came from a
human operator.
"Some are giving very straight answers, some are giving very
convoluted answers," said Lisa Jamieson, a student who served as
the judge representing the nontechnical public. She wondered whether
the human confederates could have been trying to fake her out by
mimicking computers.
"They could be people who are too clever by half, or maybe it's
the computers that are too clever by half," she said. "I
don't know if they don't understand my questions or if they're trying
to double-bluff me."
In the end, no one was clever enough to win either of the big
prizes, and Mr. Loebner expressed disappointment with the overall
quality of the programs on display. After the judging, he offered an
uncomplimentary one-word description, then amended it, saying: "I
see a lot of room for improvement."
Miss Jamieson said the ALICE program was clearly best, maintaining
its conversation by giving complete, sensible answers and asking
questions of its own. Mr. Wallace stood up for ALICE and its rival
chatterbots, saying they had acted more human than the judges had given
them credit for.
"The contest is flawed," he said. "The judges know
they're trying to 'out' the robot. They ask aggressive questions. It's
different online. People don't know they're talking to a robot."
Maybe they are better off not knowing. Mr. Wallace said he had
received a letter from a woman who had threatened to sue after
discovering that his bot, of which she had grown fond, was not what it
seemed.
"She said she'd fallen in love with it," he said.
"Our robot broke her heart."