Thom's Participation in the Loebner Competition 1995
Contest and Re-Evaluated My Concept of Humanity
by Thom Whalen
The Loebner Prize Competition is a restricted Turing test to evaluate the "humanness" of
computer programs which interact in natural language.
In 1994, I won the Loebner Prize Competition in San Diego, California, by a hair.
Considering how poorly my program performed, I am still surprised that I won.
In 1995 the competition was held in New York on 16 December.
I felt that I could not help but do better.
The 1995 rules were changed from previous competitions. For the first
the judges would be permitted to ask any question that they wanted,
being restricted to a particular topic for each program. This was
make the competition more difficult.
To accommodate the "no-topic" rule, I decided that the best
would be to try to model a human being. I would not simply try to answer
questions, but would try to incorporate into my program a personality, a personal
history, and a unique view of the world. In short, I would try to invent a person.
This may sound daunting until you realize that people have been
inventing human beings for centuries; every novel or play is populated with
invented people. For example, Sir Arthur Conan Doyle created a complete
personality, personal history, and unique world view for Sherlock Holmes
was so compelling that many people believe he was a historical figure.
The only difference was that I would have to make a character which
respond to a variety of inputs. I had done this before in a simulation
conversation with a university undergraduate. This time, with more
experience and newer, more powerful software, I could surely make an
To limit the scope of the conversation, I decided to create a character who had a
narrow world view; who was only marginally literate and, therefore, did
books or newspapers; and who worked nights, and, therefore, was unable
Furthermore, to give the conversation some direction and to try to capture the
judge's attention, I
minor mystery plot. He would be a janitor who was about to lose his job.
conversing with him, you could find out that he was actually the victim
deliberate slander and learn enough to tell him how to keep his job.
I spent three months writing the conversation and testing it on the
As the deadline approached, I had second thoughts about entering the
competition at all. Unlike the previous four competitions, in 1995,
could not participate through any kind of communications medium. They
required to run on site at the Salmagundi Art Club in New York City.
My program had been developed on a Sun SPARC workstation, and would
on a PC. I did not relish the thought of trying to carry a SPARC to New
They do not work well when they are not connected to a network, they do
under an airplane seat, and I did not want to risk having my primary
platform stuck in some customs broker's office for weeks while I missed
Hugh Loebner agreed that I could enter the competition contingent
supplying a computer for me to use. And he did. Sun computers agreed to
SPARC workstation to the competition.
My program, Joe the Janitor, was in. I was committed.
As the date for the contest approached, I devoted a couple of weeks
implementing the mundane technical details that would be required for
I decided that the easiest way to configure my entry was to have the
communicate with a PC via the serial port. That way the appearance of the
would be identical to the human confederates' screens. An added bonus was that
would take responsibility for collecting the transcripts in the required
All I had to do was make my program communicate through the serial port on a
standalone SPARC using the communications protocol specified for the
To get control of the serial port, I pored over the UNIX
manuals to learn all about "ioctl()" and "termio.h" and "non-canonical
and other mysterious UNIX incantations.
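
For readers who have not met these incantations, here is a minimal sketch of what
non-canonical mode does: the terminal driver hands over each character as it arrives
instead of waiting for a complete line. It is an illustration only, not the code used in
the contest. It uses Python's standard termios module (the POSIX descendant of the
termio.h interface) rather than C, and the names set_noncanonical and read_key, along
with their defaults, are assumptions made for this example.

```python
import os
import termios


def set_noncanonical(fd, vmin=1, vtime=0):
    """Switch the terminal behind fd to non-canonical, no-echo input.

    vmin is the minimum number of bytes a read waits for; vtime is a
    timeout in tenths of a second (0 means wait indefinitely).
    """
    attrs = termios.tcgetattr(fd)
    # attrs is [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
    attrs[3] &= ~(termios.ICANON | termios.ECHO)  # clear line mode and echo
    attrs[6][termios.VMIN] = vmin
    attrs[6][termios.VTIME] = vtime
    termios.tcsetattr(fd, termios.TCSANOW, attrs)


def read_key(fd):
    """Block until one byte arrives, then return it (no newline needed)."""
    return os.read(fd, 1)
```

Called on a descriptor opened on the serial device (the device name depends on the
machine), set_noncanonical lets read_key return a single keystroke at a time, which is
what a character-by-character chat protocol needs.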
I also poked and prodded Loebner's communications program to learn
double carriage returns and "CCC99" handshakes and other arcane rites.
Next, I had to learn all about how Sun Workstations are administered
standalone mode. Sun's motto is "The network is the computer." My worst
nightmare was traveling all the way to New York and then finding that I
not get the SPARC running properly. I thought that Sun would probably
computer that worked in standalone mode, but I could not risk being
unawares if their machine expected to find a network plugged into the
port. So I learned about more obscure UNIX incantations called "boot
"localhost 127.0.0.1" and "hostname.xx0".
Finally, I had to introduce realistic keystroke delays, typing
thoughtful pauses into the output of my program. Unlike the previous
judges would be seeing the output of the program displayed character by
character. The program not only had to appear to understand English, it
look like there was a human being typing the answers.
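
A minimal sketch of how such pacing might be produced is shown below. It is not the
contest code: the function names keystroke_plan and type_out, and every timing constant
(base delay, jitter, thinking pause, punctuation pause), are assumptions chosen only to
illustrate the idea of a pause before answering followed by slightly irregular keystrokes.

```python
import random


def keystroke_plan(reply, rng, base=0.12, jitter=0.08, think=1.5):
    """Return a list of (character, delay_in_seconds) pairs for one reply."""
    plan = []
    for i, ch in enumerate(reply):
        delay = base + rng.uniform(0.0, jitter)  # uneven typing rhythm
        if i == 0:
            delay += think  # "thoughtful" pause before the first key
        elif reply[i - 1] in ".,?!":
            delay += 0.4  # brief hesitation after punctuation
        plan.append((ch, delay))
    return plan


def type_out(reply, write, sleep, rng):
    """Send reply one character at a time, waiting before each one."""
    for ch, delay in keystroke_plan(reply, rng):
        sleep(delay)
        write(ch)
```

Passing time.sleep as sleep and a function that writes one character to the serial line
as write would pace real output; passing a do-nothing sleep makes the plan easy to check
without waiting.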
Armed with my program disk, a sheet of instructions for configuring
standalone mode, another sheet of instructions for communicating with
program, and my own cables and manuals -- just in case -- I drove to the
airport on Thursday afternoon.
I toted my suitcase across the airport parking lot in -20 C, a wind
steadily at 30 km/hr, and 5 cm/hr snow accumulation -- in technical
Canadian mid-winter blizzard -- wondering if the airplane would be able
off at all. But my fears were for nought. Air Canada was not about to be
deterred by a little adverse weather.
In New York, I found that their balmy +3 C with no precipitation was
for my eiderdown parka with the fur-rimmed snorkel hood.
Something else to make me sweat through the next two days.
Friday morning I found the Salmagundi Club with the help of my cab
("What? Fifth Avenue? Where on Fifth Avenue? You don't know the cross
Are you sure you don't know the cross street? How about guessing. Maybe
street? Does that sound right? No? Well, pick a cross street!") and found
Loebner waiting for me.
Let me make this perfectly clear. I found that only
Loebner was waiting for me. ("Staff? Help? No, there's no one else. I'm
this contest myself. Like Blanche in 'A Streetcar Named Desire,' I'm
the kindness of strangers.")
He favored western-style string ties. ("Howdy, Stranger!").
There was a room stacked high with some thirty-odd crates. Sure enough, two of
these crates held a SPARCstation and monitor and the rest held IBM PCs
monitors. None of these crates held null modem cables. None of these
power bars. And none of these crates held the video multiplexers
show the contest to the audience.
Fortunately, I brought my own null modem and cables. Hugh went out
a couple of power bars.
The SPARCstation that Sun delivered was perfectly configured. There
need for "boot -s" or "localhost 127.0.0.1" or even "mv hostname.le0
hostname.xx0". Twenty minutes later Joe the Janitor and I were on
Hugh Loebner got FRED, Robby Garner's program, running and announced
had a contest. Even if no other contestants or confederates showed up,
had two computer programs that could compete against each other.
I was not going to win by default.
I spent the rest of the day fiddling with Joe. Tweaking this and
that; uncertain whether I was improving his performance or introducing
bugs. But I was too nervous to leave him alone.
On Saturday, Joe the Janitor would face Joseph Weintraub's program,
PC-Therapist, which had won the first three Loebner Competitions. Though
and I had both won Loebner medals in previous years, we had never
the same competition.
The courier did not arrive with the promised cables. Hugh went out
more power bars. He had some new null modem cables custom made.
That evening, two other competitors, Philip Maymin and Joseph
arrived at the club. We ate a Christmas dinner, and then played some pool. Hugh
a game called "cowboy" ("Howdy, Stranger!"). I won our game. At least I
that I won something in New York.
We ended the evening making sure that the other programs
Now we had a four-way contest. As well, the courier had finally delivered
cables and null modems, so we would have a contest that an audience
The next morning, bright and early (8:15 AM), we started setting up
for the competition.
Unfortunately, the competition was held in the same
the Christmas dinner, so there was no way to set up the computers before
of the contest. Hugh ("I depend on the kindness of strangers"), Philip,
superintendent, and I rolled up our sleeves and began uncrating the IBM
carrying them up the stairs. I dearly wish elevators had been invented a
and fifty years ago when the Salmagundi Club was founded.
To be honest, I rather enjoyed helping set up the computers. It gave
something more productive to do than to sit and
how Joe would perform.
In three hours Philip, Hugh, and I managed to set up a dozen PCs, one
and twenty monitors, install the communications software everywhere,
right machines together with the null modems, and install curtains to
the judges from the confederates and the audience.
When the confederates arrived, Hugh led them to their terminals and gave them their
instructions. The judges arrived and were introduced. The competition was begun.
Judges typed questions for
minutes on each terminal and programs and confederates responded. The
watched. Philip and I watched. Joseph Weintraub spent most of his time
club lounge, cool and confident.
In the second round, the judges were given an additional five minutes
query any terminal that they were uncertain about. None of the judges
trying Joe a second time. I knew that was a bad sign.
Finally, the judges were asked to rank-order each terminal from most
The results were tallied and Hugh announced the winner: "Joseph
Actually I came in second, but losing to Joseph Weintraub was still
Robby Garner from Robitron came in third. He was at a clear
because his program, FRED, ran from DOS so his screen looked different
other seven screens.
Philip Maymin's strategy was to minimize the judges' opportunity to
with his program. It produced very long output at a painfully slow
Many judges only had an opportunity to ask a single question and we only
about three different answers during the whole contest. Cute idea. The
were not impressed.
After the competition, we talked to the judges. They were mostly from
media and unanimously agreed that they enjoyed being judges. They were
hurry to leave and the journalists among them took the time to interview
everyone in sight.
I was disappointed that Weintraub won again, but the rules were clear
There are lessons for me to learn. Several of my hypotheses were
Or at least cast into strong doubt.
First, I had hypothesized that the number of topics that would arise
open conversation would be limited. If you look at Dale Carnegie, an
making small talk to strangers, he states that strangers talk about (in
- their names
- where they live
- where they used to live
- people that they know in common
- the weather
- books, television, movies and music
I believe that he is correct and I programmed Joe to have some
common questions on each of these topics.
My error was that the judges, under Loebner's rules, did not treat
competitors as though they were strangers. Rather, they specifically
program with unusual questions like, "What did you have for dinner last
or "What was Lincoln's first name." These are questions that no one
ask a stranger in the first fifteen minutes of a conversation.
Robby Garner's program, FRED, encountered the same problem for about
reason. It was prepared to answer questions about various aspects of his
personal life, but the judges never asked any questions which produced
Second, I hypothesized that, once the judge knew that he was talking
computer, he would let the computer suggest a topic. I do not believe
existing computer program can seriously pretend to be a human being for
than a half-dozen interactions, so I consider the human confederates to be a red
herring. I believe that the real issue is whether my program appears
than the other programs.
Thus, my program tried to interest the judges in Joe's employment
soon as possible. This would lead most quickly to the richest
because this was the part of the program that had been most highly
I was surprised to see how persistent some judges were in refusing to
discuss Joe's job. It seemed that the judges would rather see the
"I don't know," twenty times in a row to various strange questions than
reasonable responses to questions about why he is worried about losing
I guess they really wanted to hammer home the point that Joe is not a
Third, I hypothesized that the judges would be more tolerant of the
saying, "I don't know," than of a non sequitur. Thus, rather than having
program make a bunch of irrelevant statements when it could not
questions, I simply had it rotate through four statements that were
with "I don't know."
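
The fallback behaviour described here can be sketched in a few lines. This is an
illustration under stated assumptions, not the TIPS implementation: the class name
Responder, the exact-match lookup, and two of the four fallback lines are invented for
the example (the text does not list the statements Joe actually rotated through).

```python
from itertools import cycle


class Responder:
    """Answer known questions; otherwise rotate through fallback lines."""

    def __init__(self, known, fallbacks):
        self._known = {self._normalize(q): a for q, a in known.items()}
        self._fallbacks = cycle(fallbacks)  # endless round-robin

    @staticmethod
    def _normalize(text):
        # Lower-case and drop surrounding spaces and end punctuation.
        return text.strip().lower().rstrip("?!. ")

    def reply(self, question):
        answer = self._known.get(self._normalize(question))
        return answer if answer is not None else next(self._fallbacks)


# Hypothetical data: the last two fallback lines are made up for illustration.
FALLBACKS = ["I don't know.", "What?", "Beats me.", "I couldn't tell you."]
joe = Responder({"What is your name?": "Joe."}, FALLBACKS)
```

With this sketch, joe.reply("what is your name") returns "Joe.", while a run of
unfamiliar questions walks through the four fallback lines in order and then starts
over, which is the behaviour a determined judge can expose very quickly.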
Weintraub's program, however, was a master of the non sequitur. It
continually reply with some wildly irrelevant statement, but throw in a
qualifying clause or sentence that used a noun or verb phrase from the
question in order to try to establish a thin veneer of relevance.
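
To make the trick concrete, here is a rough sketch of the general idea. It is not
Weintraub's code, which is not described in any detail here. The stop-word list, the
canned statements, and the function names are all hypothetical, and picking the longest
non-stop-word is only a crude stand-in for extracting a noun or verb phrase.

```python
import re

# Hypothetical word list and canned lines, chosen only for this illustration.
STOPWORDS = {
    "a", "an", "the", "what", "who", "why", "how", "did", "do", "does",
    "was", "is", "are", "you", "your", "i", "for", "of", "to", "in",
    "have", "has", "last", "first",
}
CANNED = [
    "My brother always said the city never sleeps.",
    "You can't trust the weather these days.",
]


def borrowed_word(question):
    """Pick the longest word of the question that is not a stop word."""
    words = re.findall(r"[a-z]+", question.lower())
    candidates = [w for w in words if w not in STOPWORDS]
    return max(candidates, key=len) if candidates else None


def non_sequitur(question, turn):
    """Reply with an unrelated statement plus a clause echoing the question."""
    statement = CANNED[turn % len(CANNED)]
    word = borrowed_word(question)
    if word is None:
        return statement
    return f"{statement} But why do you ask about {word}?"
```

Asked "What did you have for dinner last night?", this sketch ignores the question
entirely and tacks "But why do you ask about dinner?" onto an irrelevant statement:
the thin veneer of relevance, with no understanding behind it.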
I am amazed at how cheerfully the judges tolerated that kind of
can only conclude that people do not require that their conversational
be consistent or even reasonable.
But I am not ready to draw any conclusion about whether this is a
problem with the Turing test. Remember that we are talking about
partners that are fairly quickly recognizable as computer programs. To
completely human, I would expect (hope) that the program would have to
more responsive to the questions that were asked.
Fourth, I hypothesized that a critical component of "humanness" was
personality. I felt that it was important that my program have a
I think I was successful in this. In discussion with the judges after
contest, when I confessed to being Joe's creator, one of the judges said
thought Joe had the best defined personality. I then asked if he rated
the most human computer and he said, "No." I probed and prodded for a
minutes, but he could not explain why he thought that humanness was
than having a human personality.
I am still puzzled about that.
The failure of all four of my hypotheses leaves me in a quandary.
I believe that I could modify Joe to beat Weintraub simply by
"I don't know" part of the program with a little Weintraub/ELIZA style
I estimate that it would take about two weeks of effort to produce a
that would be adequate for my purposes, though much less sophisticated
Weintraub's. Then my program would still answer all of the questions
already does, but, when it encountered an unfamiliar question, it would
say "What?" or "I don't know." Rather, it would just introduce a new
as smoothly as Weintraub's program, but smoothly enough.
But I don't know if I want to do that. Making that modification would
purpose whatsoever, apart from winning the next competition. The primary
my TIPS software, like Robby's FRED, is to create useful information
such as computer help systems, not to win Loebner's competition. I do
believe that Weintraub's approach, which follows in the footsteps of
Weizenbaum's ELIZA, will ever lead to a useful way to deliver factual
Lying awake in the middle of the night in my hotel room after losing
competition, I wrote out a list of eight major enhancements to my software
make it a more powerful information delivery system. I would much rather
my time implementing these enhancements than re-implementing an enhanced
that would not be good for anything.
As well, I am philosophically enamoured with the idea of writing a
which models a real human being rather than a program which simply tries
field random questions. And I am philosophically opposed to writing a
that performs syntactic tricks without any interesting semantics (or
specifically, semantic-based pragmatics). I would rather keep trying to
human being that will do better than Weintraub's program than to beat him at his
own game. If I have to resort to ELIZA's tricks, then I will be
fundamental flaw in my own approach.
The next contest will be held in April '96. I enjoyed entering the
competition immensely in the last two years. I would encourage everyone
in natural language to think about entering the next one. If for no
than to show that ELIZA-style programs are not the epitome of
For myself, I do not know what the future will bring. Except that I
that I will keep developing my software. And I will start thinking
about what it means to judge a conversational partner as "human."
A Retrospective Note from 2013: I always intended to enter the Loebner Prize Competition again, but my
research took me in other directions and I never returned to natural language
programming. It's a pity that life doesn't give us time to do everything we want because
I would have liked to have tried the competition again.