My Experience with the 1994 Loebner Competition
by Thom Whalen
The Loebner Prize Competition is a restricted Turing test to evaluate the "humanness" of computer programs which interact in natural language.
This is the true story of my experience as a contestant in the 1994 Loebner Prize Competition.
You can decide if this is tragedy, farce, or horror.
The first step in entering (after writing the program) was to submit an application for the first cut. The implication was that only a limited number of programs would be accepted for the contest. I suspect that any program that was even close to reasonable was accepted, but I am not sure.
After being notified that my program was accepted as a contestant, I received a schedule of testing dates when my modem should be turned on and I should be available by phone. These testing dates turned out to be only approximate, and I was not contacted by phone (as near as I can remember; maybe once or twice). In any event, about two weeks before the contest date, I was assured by the organizers that all was working fine, so I concentrated on final tune-ups to the natural language program: a tricky business, because there is a real danger that you will get a wonderful idea at the last minute that you cannot resist trying to implement, and end up introducing a disastrous bug into your program.
The day before the contest was a disaster. I came into my office, having decided that my safest strategy was to leave the program untouched until the contest was over, but found a message in my email box which said that final testing had found that my program was failing to handshake with their program, using the simple and obvious protocol specified in the official rules, and that I would be disqualified unless the problem was solved instantly.
The difficulty was that their turn-taking protocol sent two carriage returns to signal the end of the judge's turn. And, by the way, they would also send a line feed after every carriage return. I assume that the logic was that this is the protocol that is used on human chat systems like IRC.
I discovered, after a couple of hours of frantic debugging, that the problem was not in my program, but in UNIX. The UNIX shell automatically converted every line feed to a carriage return, so my program was receiving a plethora of newline characters and was responding badly to them. I missed the obvious solution, which was to modify my program to wait for four carriage returns in a row to signal an end of turn. Instead, I spent a frantic day running back and forth to my UNIX administrators, trying to figure out if I had to hack the shell, edit the termcap file, or sacrifice a goat under a full moon to convince the system to let the line feeds remain as line feeds.
The full moon was not available, but several dead goats later, we got the shell to stop converting the line feeds by modifying the login script to set the right environment variables inside a subshell that then spawned the program. Then I had to modify my program to end a turn on two line feeds instead of two carriage returns (in violation of the exact wording of the official protocol, which specified responding to carriage returns).
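
A minimal Python sketch of that end-of-turn logic follows, for illustration only; it is not the code from the actual entry. It assumes that each Enter from the judge arrives as a carriage return followed by a line feed, so the two-Enter end-of-turn signal is CR LF CR LF, and that a terminal layer may rewrite each line feed as a carriage return, turning the same signal into four carriage returns. Rather than matching either exact sequence, the sketch treats CR and LF alike and ends a turn after four consecutive line-ending characters, a more tolerant variant of the four-in-a-row idea. The names split_turns and END_OF_TURN_RUN are hypothetical.

```python
from typing import Iterable, Iterator

# Hypothetical threshold: two Enters arrive as "\r\n\r\n" (four characters),
# or as "\r\r\r\r" if a terminal layer rewrites each LF as CR.
END_OF_TURN_RUN = 4


def split_turns(chars: Iterable[str], run_length: int = END_OF_TURN_RUN) -> Iterator[str]:
    """Yield each judge turn, ending a turn after `run_length` consecutive CR/LF characters.

    CR and LF are treated alike, so the same threshold works whether or not
    line feeds were translated to carriage returns on the way in.
    """
    buffer: list[str] = []
    run = 0  # number of consecutive CR/LF characters seen so far

    for ch in chars:
        if ch in "\r\n":
            run += 1
            if run >= run_length:
                turn = "".join(buffer).strip()
                if turn:  # skip empty turns, e.g. a burst of bare carriage returns
                    yield turn
                buffer.clear()
                run = 0
        else:
            if run:
                # A shorter run was an ordinary line break inside the turn.
                buffer.append("\n")
            run = 0
            buffer.append(ch)
    # Any text left in `buffer` belongs to a turn that has not ended yet.


if __name__ == "__main__":
    raw = "Hello\r\n\r\nHow are you?\r\n\r\n"
    print(list(split_turns(raw)))                      # ['Hello', 'How are you?']
    print(list(split_turns(raw.replace("\n", "\r"))))  # ['Hello', 'How are you?']
```

Empty turns, such as a run of bare carriage returns with no text, are skipped rather than answered, and text that has not yet been followed by the end-of-turn run stays buffered.
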
I dwell on this, not only because it was so traumatic, but because I know for a fact that I was not the only contestant who had difficulties with the protocol at the last minute. The problem is that we are applications-level programmers, and the turn-taking protocol relies on transport-level interactions over which we have little control. And, as near as I can tell, the preliminary testing was conducted by sending a bunch of empty carriage returns to the programs without paying attention to the results, so our failures to adhere strictly to the turn-taking protocol were not discovered until the final testing.
As a consequence, when the day of the contest arrived, I did not have any idea whether my program had survived the final testing or had been disqualified. I sat in my office, thankful that I had not traveled to California to attend the contest but had stayed in Ottawa in case there were last-minute problems, and watched my modem for three hours, wondering if the contest organizers were going to log in.
Log in they did. And hung up. And logged in. And hung up. Finally, at
noon in my time zone, they phoned to tell me the schedule for the day. I
inferred that my program had not been disqualified and settled in for an
additional ten hours of nail-biting over how badly my program would perform. I had adopted a high-risk strategy and fully expected my natural language system to crash and burn ignominiously.
For reasons unrelated to the contest, I had been putting all my effort into a sex information system, and that was what I had submitted to the contest.
Sex is a difficult topic linguistically. It is an especially broad topic, covering
everything from how to meet a girl to statistics about herpes
infections. It is
also a topic in which synonyms abound. You would be amazed at the number of synonyms for the female breast. Many of these synonyms are culture- or gender-specific. It is also a topic in which oblique phrasings are de rigueur. Phrases like "do it" are common, both in a specific context, where they refer to a specific act, and in the general context, where they may refer either broadly to sexual activity or narrowly to sexual intercourse.
Its difficulty makes it the perfect topic to exercise a new natural language shell. It also makes it a terrible topic for a public competition, and I expected it would perform badly.
All the testing was done over the Internet.
I imagined the typical user as a young male computer scientist who has a
sexual fantasy life, but has never had an actual girlfriend. A typical question that my sex information system expected to answer was something like, "How do I find a girl who will rim me?"
You don't have to be Einstein to know that no middle-aged woman judge is going to stand in front of a television camera and type that on a computer terminal.
I was painfully aware that the judges were from a different generation, probably had a lot more sexual experience, and were in a very different situation from my intended user population.
I rationalized my choice of sex as a topic by telling myself that, at least, it was the most human topic that I could imagine, and that the judges might be impressed by its broad range of knowledge and its wonderfully detailed, and generally politically correct, answers. But there was no way on earth that anyone would ever mistake my program for a live human being.
I had the additional worry that I had deliberately not told my
supervisor, senior management, or anyone else in the government that I was working on a sex information system; that I had let approximately 10,000 people call a Canadian government computer and ask blunt questions about sex over the past four months; and that I was now displaying this system to the press without their knowledge or permission.
Even the hint that a question might be raised on the floor of the House by the Opposition as to why the Department of Industry was providing sex information to the public without the knowledge, consent, or participation of the Department of Health would have been sufficient to shut the project down and force my withdrawal from the contest. The Official Opposition loses no opportunity to embarrass the government, and the government never hesitates to protect itself from potential embarrassment.
Even though I managed
to get as far as the contest without being discovered and shut down, the
potential political fallout after the contest made me more than a little nervous. So I sat, watched the transcripts scroll up my screen, and waited to see it fail.
My most pessimistic predictions were dead-accurate.
In my laboratory, our rule of thumb is that natural language systems are usable if they answer 50 per cent of questions correctly, but they will not be liked. If they exceed 65 per cent, they will be liked, and if they exceed 75 per cent, they will be very well-liked. In the Internet testing, my sex system was exceeding 80 per cent, and people were indicating that they liked using it. When the competition started, it dropped below 20 per cent for the first judge.
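
The laboratory rule of thumb above amounts to a simple threshold table. Here is a minimal Python sketch of it; the function name predicted_reception is hypothetical, the labels paraphrase the 50, 65, and 75 per cent bands, and reading "below 50 per cent" as below the usability threshold is an inference rather than something the rule states directly.

```python
def predicted_reception(correct_fraction: float) -> str:
    """Classify a correct-answer rate (0.0 to 1.0) using the lab's rule of thumb."""
    if correct_fraction > 0.75:
        return "very well-liked"
    if correct_fraction > 0.65:
        return "liked"
    if correct_fraction >= 0.50:
        return "usable, but not liked"
    # Inference: the rule of thumb sets 50 per cent as the usability threshold.
    return "below the usability threshold"


if __name__ == "__main__":
    # Roughly: Internet testing, end of the contest, first judge.
    for rate in (0.80, 0.50, 0.20):
        print(f"{rate:.0%}: {predicted_reception(rate)}")
```

Applied to the figures here, the Internet testing (over 80 per cent) lands in the top band, while the first judge's session (under 20 per cent) falls well below the usability line.
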
I thought briefly about unplugging the modem and pleading unresolvable technical difficulties, or at least temporary insanity. I did not take the insanity plea, but hung in there and watched its performance come up to about 50 per cent by the end of the competition.
I suspect that performance was improving during the course of the contest because the judges were learning how to ask questions that were more likely to get meaningful answers. Bad as overall performance was, a detailed look reveals an even worse picture. As I expected, many of the questions were on the periphery of the topic, a clear consequence of the judges trying to avoid blunt questions about sex in a public forum, so that fully a third of the system's responses consisted of an appropriate "I have no information about that." Overall, [...] per cent of the questions typed by the judges elicited correct answers.
Just when it looked like things could not get any worse, my program lost its mind. In essence, my program navigates around a kind of map, and due to a programming error, it was able to navigate right off the edge. It no longer recognized any words at all. Fortunately, after a string of thirteen responses of "I cannot give you an answer to that," to simple, obvious questions from different judges, the human referee recognized that it had gone brain-dead and rebooted the system. I did not expect to win any points with those responses.
The contest organizers had promised to phone and let the losers know that they had lost, so that they would not have to spend the night waiting for nothing. In my time zone, after 10:00 PM, it looked like the contest had ended over an hour earlier, and I was already packed up and waiting for the "Better-luck-next-time" call which would send me home to commiserate. Instead, the caller told me that I had won. My first reaction was the rather graceless thought that the other programs must have really bombed if my program's miserable performance was rated the highest. It did occur to me that maybe I was the only computer contestant because all the other programs had been disqualified for failing to respond properly to carriage returns. But I had to deal with the immediate problem of trying to sound pleased and excited in the press conference after spending two days and a night in black despair.
Perversely, I was pleased when I got the final results and saw that the other programs did not do too badly. My program was the winner by a technical decision, not a knockout. Only three of the five judges ranked it first, and one judge ranked it as the worst of the bunch. There may be hope for AI yet.
Monday morning, I met with my director and confessed that I had won an
international competition and been interviewed on CNN. Neither one of us
wanted to dwell on the topic of my entry. When the cheque arrived, I signed
it over to the government — my research is government property — but I kept
the bronze medal, which is the real treasure in my mind.
Through the rose-tinted filter of hindsight, entering the Loebner Prize Competition was an adventure that I would not have wanted to miss.
I expect to do a lot better next year.