You, Robot

The Boston Phoenix
December 11 - 18, 1997

The high-tech world is littered with failed attempts to make computers that seem like people. What makes a linguist think she can succeed where the techies haven't?

by Tom Scocca
On a dark, nondescript street, seen from two angles through two operating-system windows on a jumbo computer monitor, a sort-of-personal moment is about to take place. Two legless figures resembling shirt mannequins, one bronze-colored and one blue, are drifting toward one another, steered by Hannes Vilhjálmsson, a third-year graduate student at the MIT Media Lab. Under the indigo light fixtures of the workroom, Vilhjálmsson works the arrow keys, and the figures float closer. Closer still, and then, just before they meet, the mannequins lift their bulbous cartoony heads, flick one dull sidelong glance at each other, and look away again. They pass, silently.
It is a disconcertingly human thing to see computer-generated images do. The figures, created by a program called BodyChat, are rough -- legless, with Tinkertoy arms and empty space where their necks should be -- but their movements seem profoundly natural. They are following the bluntest of directions: each figure is governed by a toggle switch labeled AVAILABLE/UNAVAILABLE, and both switches are set to UNAVAILABLE. But the instructions are being carried out with a rare sophistication. If you were to meet one of the figures on the street and get that glance, you would not misunderstand it; you would get the message, UNAVAILABLE, as clear as you please. This is what Professor Justine Cassell and her Gesture and Narrative Language Group have in mind.
It's a project at which plenty of other people have failed. The personal-computing era has left, strewn in its wake, a vast array of devices and programs that were supposed to make machines seem human: bleating speech synthesizers, annoying on-screen "helpers," near-useless home robots. But Cassell's work may not meet with the same fate. She is not an engineer or programmer by trade; her degrees are in comparative literature, linguistics, and developmental psychology. By applying her knowledge of the less-than-obvious patterns of human behavior, she hopes to make computers deal with people on genuinely human terms. And computers -- and people -- seem to be taking to it.

The computers that most of us use are the products of a very different philosophy. The windows, the desktop, the little icons representing folders and documents were born from the idea that computers should blend into the work environment, not reach out to users. Mark Weiser, the chief technologist at Xerox's legendarily innovative Palo Alto Research Center, believes computers should be neither seen nor heard; to him, the computing technology of the future will be ubiquitous and invisible. The ideal, Weiser says, is for computers to be so straightforward to use that you won't think about them, any more than you think about the hammer when you're driving nails.
But some researchers are finding that the hammer is a misleading metaphor. People may think they prefer the idea of impersonal machines, but MIT professor Youngme Moon says the interaction between humans and machines is already a social one.
"Every time it communicates with you, you have a social response," says Moon, who runs MIT's Social Intelligence Research project, and who has collaborated with Cassell.
In one experiment, Moon had people perform a series of learning exercises on a computer, then answer a survey evaluating the computer's performance. She found that when the computer asked users to critique its work, they would soft-pedal their responses, the same way people tend to temper face-to-face criticism of other people. When they were surveyed by a different computer, or with pencil and paper, their answers were markedly more negative.
Given the way people actually relate to their machines, then, making the machines more humanlike seems inevitable. Indeed, the idea of building artificial people dates back thousands of years; MIT theologian Anne Foerst calls it "a very old dream of humankind." It runs through creation myths, the Pygmalion story, medieval Jewish myths of golems, and tales like Pinocchio and Frankenstein. And just as people 15 years ago were captivated by the idea of a home robot, even if it was just a remote-controlled-car chassis with a mechanical arm, people exploring the world inside their computers are inclined to look for man-made beings there.
The uses of such technology could be legion. One application would likely be the building of "animated interface agents," i.e., walking, talking embodiments of computer programs. In some jobs, on-screen synthetic people could replace real people -- as clerks, reference librarians, and the like. Phone companies, Cassell reports, are for some reason smitten with that idea, hoping to staff their retail communications stores with technologically impressive fake salespeople.
The other obvious application is to the online world; many people who interact online are looking for a way to make avatars, their graphical stand-ins in cyberspace, seem more human. "Conversations are definitely better if you have bodies," Cassell says. But currently, even the chat groups that offer graphical avatars have figures that you simply drag around the screen; some of the fancier ones may execute a series of arbitrary gestures.
As it stands, both animated agents and avatars run to the pointless-seeming or the creepy. Cassell recalls an unsuccessful desktop agent, or on-screen helper, with a grin so relentless that "people didn't want to go near their computers." And one type of avatar, she says, looks at its watch at random moments -- even if the person it's chatting with is relating something like news of a death in the family.

Xerox's Weiser sees such failures -- particularly that of the much-touted "friendly" desktop agent Microsoft Bob -- as evidence that people don't want human interaction with their computers. But Cassell's view is that the attempts reflect an "intuition that bodies and faces are important." The problem, she says, is that designers have had no idea how human gestural communication really works.
They're not alone in this. The actual way people use gesture is a subject that remains little understood and much overlooked. When Cassell decided to make a study of it, at the University of Chicago, she says she found that it was "the poor stepchild" -- too nonverbal for linguists, too communicative for psychologists.
Most people didn't even recognize that there was anything much to study. For a long time, Cassell says, people believed that visual cues simply echoed what was being said aloud.
In fact, she says, gesture is a communicative channel of its own, one that interacts with spoken language to convey additional information. By gesturing, a speaker can describe scenes spatially, reinforce relationships with listeners, or add physical and metaphoric detail to a message.
English speakers, for instance, augment our language's underexpressive verbs with pantomime -- as we say we "went" somewhere, Cassell explains, our hands make walking or driving gestures. We shape out the positions of objects we discuss. We explain our point of view.
The key thing is that we don't know we're doing it. Thumbs-ups and bird-flippings aside, the gestures we make aren't conscious or voluntary. But they happen nonetheless -- and they convey information. Cassell ran an experiment in which an actor recounted the plot of a Sylvester-and-Tweety cartoon short while using hand gestures that added new information to, or in some cases contradicted, the spoken story. In one instance, the narrator marked the positions of Tweety and Sylvester with his left and right hands, respectively, then said Sylvester lunged at Tweety -- but moved the Tweety hand quickly toward the Sylvester hand. When asked to retell the story, the observers mingled the information from the gestures and the spoken version so that, among other things, Tweety took a turn going after Sylvester.
Because people aren't consciously aware of such communication, computer avatars that let people dictate their movements fail. "[Gestures] are automated in us," Cassell says. "You don't know you're doing them." Cassell herself knows, of course; she has trained herself to remember and re-create gestures and is constantly aware of how she and the people around her are moving, the way one might pay attention to a foreign language. ("I'm not a native," she says.)
For the vast majority that takes gesture for granted, deliberate gesturing comes as a distraction. "Dukakis was a good example," Cassell says. The erstwhile presidential candidate's handlers told him his hand movements were too busy and ethnic-looking, she says, so he worked hard on changing them. Cassell breaks into a brief, uncanny imitation of the ex-governor's stump manner, chopping at the air with tight little strokes. "What people noticed was that he wasn't trustworthy," she says. His behavior didn't fit his words.

To put people at ease, then, a computer that uses gesture should move seamlessly and meaningfully, as little like Michael Dukakis as possible. The gesture group's project to that end is called Gandalf, a "multimedia interactive humanoid agent." A sort of very poor man's Max Headroom, Gandalf is a screen display of a crudely rendered cartoon head, fat-cheeked and Viking-helmeted, with a floating hand alongside.
With the user hooked up to a complicated harness and headset to track gaze and posture (future versions will use cameras instead), Gandalf engages him or her in a discussion about the solar system, looking from the human to a second screen showing the planets and back again. In a video of a session, the conversation is stilted -- Gandalf is designed to converse about the solar system, not to say particularly interesting things about it -- but continuous.
Clunky as it is, Gandalf taught the group one of its most important lessons: emotion doesn't seem to matter. Gandalf's conversational routine can be divided into two sets of gestures: emotional ones (smiling, frowning, knitting its brow in puzzlement) and communicative ones (nodding, pointing, and turning to face different directions). In one test, the lab alternately disabled each set of behaviors, so that Gandalf was using only emotional gestures, or only communicative ones. The emotionally deficient version of Gandalf, they discovered, could carry on a conversation just fine; without the ability to nod and point, however, its conversation quickly derailed. "The emotional stuff was not what made this agent intelligent or easy to use," Vilhjálmsson says.
Many of today's designers, he says, don't get that point. "The emphasis is to create emotionally rich avatars. But that totally jumps over a whole level, which is the communicative layer." In everyday life, he explains, we rarely know the emotional condition of the people we encounter, yet we manage to interact with them anyway. When the supermarket cashier rings up your purchases, Vilhjálmsson notes, "you have no idea what emotional state is being portrayed" -- yet you have no trouble getting your change.

In Cassell's estimation, the purpose of these electronic companions is not to share their feelings, but to be responsive and accessible. Socially competent computers matter because they offer the prospect of new kinds of interactions. Rather than computers disappearing into the woodwork, the way Weiser imagines things, Cassell sees them taking on new relevance. She is particularly interested in getting children to use computers for storytelling and self-guided activity; the next table over from the BodyChat computer holds a jumble of hardware-riddled stuffed animals employed in such projects.
Cassell herself "was a very building kind of kid," she says -- one who made dollhouse furniture despite not having any dolls. "I didn't have standard toys," she recalls. "I wanted technology."
Her office now, which she shares with a lean-faced and phlegmatic dog named Esme, partly makes up for that, with playthings spread out all over the shelves. Prominent among them is a cluster of Barbies (and Sindy Dis-moi tout, a French Barbie knockoff), a kind of toy that Cassell says she never really looked at till she visited Mattel a few years ago. "I got interested in what image this sends to girls," she says. "I think it's probably very satisfying to have a role-consonant toy . . . a toy that fits with what lots of kids are telling you it means to be a girl."
She presses a button on the back of an "interactive" talking Barbie, and it pipes up to ask, in a sentence audibly cobbled from randomly picked phrases, if Barbie and Cassell could "go dancing/with Ken/after the game."
With that, the appeal of socially intelligent machines comes into clearer focus. "Half the population has been less included in the technological world," she says. Video games, which are most kids' point of entry into the computer world, are generally "boy-friendly," emphasizing the gender distinction. For Cassell, part of the appeal of storytelling software or toys that can interact intelligently is that they draw girls into technology, while also encouraging boys to focus on exploring social relationships.

This is not, however, an idea that most computer people have time for. Until a generation weaned on such technology grows up, the business will remain in the hands of the existing boys, who've traditionally conceived of the pinnacle of computing power as what Anne Foerst refers to as "a disembodied male." Advancement is identified with building shitkicker processors, refining and redoubling the computer's ability to match one particular set of human capabilities. These are the people whose inventions beat Garry Kasparov in chess.
With that comes the self-confidence peculiar to engineers and hard scientists, the suspicion that other fields of inquiry don't have much to offer. "With any technical person," Cassell says, "if you have a background in the social sciences, you have to show you have something that you need to know."
She finds there's further resistance yet to hearing such things from someone who, like her, is small and female. At conferences, she says, "everyone assumes I'm 12. Being female and young-looking means that when a conversation starts in a professional situation, I have to assert my status."
But the traditional indifference to Cassell's field of interest also creates opportunity. Though nobody has done much to bring social intelligence into computing, the computational power to do it is sitting around waiting to be used.
The limitations of the group's work are not, by and large, technological ones. Once a set of rules for the avatars' conversational behavior had been worked out, Vilhjálmsson says, "it was trivial to create the system."
And those glance-trading mannequins on their deserted street have become something of a hot ticket. The market is itching to see BodyChat turned into product. Vilhjálmsson has been invited to international computer conferences to discuss his research, and is avidly courted by industry. Clearly, the work is filling a void.
"The technology's already here to create the things we should be doing," Moon says. "What you really need in this field is people who understand people."
"I humanize the interface to allow people to reflect on their own humanity," Cassell says. "Some people are in this to make computers more like humans. I want to enable humans to remain human."
Tom Scocca can be reached at tscocca[a]phx.com