TUTORIAL July 9, 2001
Talking Heads: Facial Animation in The Getaway
by Gavin Moore
In this article, I'm going to describe Talking Heads, our facial animation system which uses parsed speech and a skeletal animation system to reduce the workload involved in creating facial animation on large scale game projects. SCEE's Team Soho is based in the heart of London, surrounded by a plethora of postproduction houses. We have always found it difficult to find and keep talented animators, especially with so many appealing film projects being created on our doorstep here in Soho.
The Getaway is one of SCEE's groundbreaking in-house projects. It is being designed by Team Soho, the studio that brought you Porsche Challenge, Total NBA, and This Is Football. It integrates the dark, gritty atmosphere of films like Lock, Stock, and Two Smoking Barrels and The Long Good Friday with a living, breathing, digital rendition of London. The player will journey through an action adventure in the shoes of a professional criminal and an embittered police detective, seeing the story unfold through the eyes of two completely different characters, each with their own agenda.
The Getaway takes place in possibly the largest environment ever seen in a video game; we have painstakingly re-created over 50 square kilometers of the heart of London in blistering photorealistic detail. The player will be able to drive across the capital from Kensington Palace to the Tower of London. But the game involves much more than just racing: the player must leave their vehicle and enter buildings on foot to commit crimes ranging from bank robberies to gang hits.
So, with a huge project such as The Getaway in development and unable to find enough talented people, the decision was made to create Talking Heads, a system that would severely cut down on the number of man-hours spent on tedious lip-synching.
Breaking It Down
The first decision to be made was whether to use a typical blend-shape animation process or to use a skeleton-based system. When you add up the number of phonemes and emotions required to create a believable talking head, you soon realize that blend shapes become impractical. One character might have a minimum of six emotions, 16 phonemes, and a bunch of facial movements such as blinking, breathing, and raising an eyebrow. Blend shapes require huge amounts of modeling, and also huge amounts of data storage on your chosen gaming platform.
The skeleton-based system would also present certain problems. Each joint created in the skeleton hierarchy has to mimic a specific muscle group in the face.
"If you want to know exactly which muscle performs a certain action, then you won't find an answer in Gray's Anatomy. The experts still haven't defined the subject of facial expression. Though psychologists have been busy updating our knowledge of the face, anatomists have not." -- Gary Faigin, The Artist's Complete Guide to Facial Expression
Most information on the Internet is either too vague or far too specialized. I found no one who could tell me what actually makes us smile. The only way forward was to work with a mirror close at hand, studying my own emotions and expressions. I also studied the emotions of friends, family, work colleagues, and people in everyday life. I have studied many books on facial animation and over the years attended many seminars. I strongly recommend a book by Gary Faigin, The Artist's Complete Guide to Facial Expression. If you can, try to catch Richard Williams in one of his three-day master classes; his insight into animation comes from working with the guys who created some of the best Disney classics.
Building Your Head
Only part of the face is used during most expressions. The area around the eyes and brows and the area around the mouth contain the greatest numbers of muscle groups, and they change the most when we create an expression. We look at these two areas first and gather most of our information from them. Although other areas of the face do move (the cheeks in a smile, for example), 80 percent of an emotion is portrayed through these two areas.
Neutral positions. We can detect changes in a human face because we understand when a face is in repose. We understand the positions of the brow and the mouth, and how wide the eyes are. These elements are constant from face to face. This is true whether or not we are familiar with a person's face at rest (see Figure 1).
This changed the way we built our models, adding greater detail around the eyes and the mouth. Simulating the muscle rings seen in anatomy books allowed for greater movement in the face at these points.
The proportions of the face are the key to building a good head. Get this right and you are well on the way to creating realistic facial animation. Asymmetry is another goal to strive for when modeling your heads. Do not create half a head and flip it across to create the other half. The human head is not perfectly symmetrical.
Study of facial proportions by Leonardo da Vinci.
The width of the mouth is the same as the distance between the centers of the pupils.
The angle between the top lip and the bottom lip is 7.5 degrees.
The bottom of the cheekbones is the same height as the end of the nose.
There are many rules concerning facial proportions. The overall shape of the head is governed by a simple rule: The height of the skull and the depth of the skull are nearly the same. The average skull is only two-thirds as wide as it is tall. The human head can be divided into thirds: forehead to brow; brow to base of nose; and base of nose to chin. The most consistent rule is that the halfway point of the head falls in the middle of the eyes. Exceptions to this are rare. A few other general rules:
- The width of the nose at the base is the same as the width of an eye.
- The distance between the brow and the bottom of the nose governs the height of the ear.
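The proportion rules above reduce to simple arithmetic, which can be handy as a sanity check on a model's guideline positions. The sketch below is only an illustration: the function name, its two parameters (overall head height, and forehead-to-chin face height), and the dictionary keys are all assumptions made for this example, not part of our tools.

```python
def head_guidelines(head_height, face_height):
    """Guideline measurements derived from the proportion rules above.

    head_height: top of skull to chin (assumed input).
    face_height: forehead to chin (assumed input).
    All names here are hypothetical, chosen for illustration only.
    """
    third = face_height / 3.0  # forehead-brow, brow-nose, nose-chin
    return {
        "skull_depth": head_height,              # depth is nearly equal to height
        "skull_width": head_height * 2.0 / 3.0,  # two-thirds as wide as it is tall
        "eye_line_from_top": head_height / 2.0,  # eyes sit at the halfway point
        "brow_from_chin": 2.0 * third,
        "nose_base_from_chin": third,
        "ear_height": third,                     # brow-to-nose distance
    }


guides = head_guidelines(head_height=24.0, face_height=21.0)
```

Real heads deviate from these figures, so the values are best treated as a starting point for modeling rather than as hard constraints.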
The heads for The Getaway all stem from one model. This head contains the correct polygon count, animation system and weighting. We scan actors using a system created by a company called Eyetronics, a very powerful and cost-effective scanning process. A grid is projected onto the person's face whom you wish to scan and photographs are taken. These photographs are passed through the software and converted into 3D meshes. Each mesh is sewn together by the software, and you end up with a perfect 3D model of the person you scanned. At the same time it creates a texture map and applies this to the model.
Then the original head model, the one that contains the correct polygon count and animation, is morphed into the shape of the scanned head. Alan Dann, an artist here at SCEE, wrote proprietary in-house technology to morph the heads inside Maya. The joints in the skeleton hierarchy are proportionally moved to compensate for the changes in the head. We are left with a model that has the stipulated in-game requirements but looks like the actor we wish to see in the game.
1,500-polygon model used for high-res in-game and medium resolution cutscenes.
The Getaway heads are designed with an incredible level of detail. We use a 4,000-polygon model for extreme close-ups in the real-time cut scenes. The highest-resolution in-game model is 1,500 polygons, which includes tongue, teeth, eyelashes, and hair.
The skeleton hierarchy also contains level of detail; we remove joints as the characters move further away from the camera. Eventually only three joints remain, enough to rotate the head and open the mouth using the jaw.
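Joint level of detail of this kind can be sketched as a lookup from camera distance to the set of joints that are still animated. The distance thresholds, the joint group names, and the choice of neck, head, and jaw as the final three joints are all assumptions made for this example; the actual values used in the game are not given here.

```python
# Hypothetical LOD table: (maximum camera distance, joints still animated).
# Thresholds and joint group names are illustrative assumptions.
LOD_LEVELS = [
    (5.0, ["neck", "head", "jaw", "lips", "eyes", "eyelids",
           "brows", "cheeks", "nose", "tongue"]),
    (15.0, ["neck", "head", "jaw", "lips", "eyes", "eyelids", "brows"]),
    (40.0, ["neck", "head", "jaw", "lips"]),
]

# Assumed minimal set: enough to rotate the head and open the mouth.
MINIMAL_JOINTS = ["neck", "head", "jaw"]


def joints_for_distance(distance):
    """Return the joints to animate for a character at this camera distance."""
    for max_distance, joints in LOD_LEVELS:
        if distance <= max_distance:
            return joints
    return MINIMAL_JOINTS


close_up = joints_for_distance(2.0)
far_away = joints_for_distance(100.0)
```

Keeping the table ordered from nearest to farthest means the first matching row is always the most detailed set allowed at that distance.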
Creating the Skeleton
The skeleton hierarchy was created based on the above study. Two main joints are used as the controls, the neck and the head. The "neck" is the base, the joint that is constrained to the skeleton of the character model. This joint can either be driven by constraints or take motion capture data copied across from the character model. This gives us the point at which we have seamless interaction between the head and body. The "head" joint controls slight head movements: shaking and nodding, random head motions, and positions taken up in different expressions. The head leans forward during anger or downward when sad. This is the joint that all other joints spring from; it's used as the controlling joint. Wherever it goes, the rest of the joints go. Other joints which relate to specific muscle groups of the face are:
- Three joints control each eye, one in each eyelid and one for the eye itself.
- Two joints, one on either side of the nose.
- Two joints control each cheek.
- Two joints on either side of the jaw.
- Three joints in the tongue.
- Four joints control the lips.
- Six joints control the forehead and eyebrows.
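The "wherever it goes, the rest of the joints go" behavior of the head joint is simply what a parent-child hierarchy gives you. The sketch below shows this with translation only, which is a deliberate simplification (a real joint also carries rotation and scale). The Joint class, the joint names, and the offsets are hypothetical and are not taken from our Maya setup.

```python
class Joint:
    """A joint positioned relative to its parent (translation only)."""

    def __init__(self, name, local_offset, parent=None):
        self.name = name
        self.local_offset = local_offset  # (x, y, z) relative to the parent
        self.parent = parent

    def world_position(self):
        """Accumulate offsets up the hierarchy to get a world-space position."""
        x, y, z = self.local_offset
        if self.parent is None:
            return (x, y, z)
        px, py, pz = self.parent.world_position()
        return (px + x, py + y, pz + z)


# Illustrative offsets, not measurements from a real head.
neck = Joint("neck", (0.0, 150.0, 0.0))
head = Joint("head", (0.0, 10.0, 0.0), parent=neck)
jaw = Joint("jaw", (0.0, -4.0, 6.0), parent=head)

before = jaw.world_position()
head.local_offset = (0.0, 10.0, 2.0)  # lean the head forward slightly
after = jaw.world_position()          # the jaw follows without being touched
```

Only the head joint was edited, yet the jaw's world position changes with it, which is why a single controlling joint is enough to pose the whole face rig.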
Front and side views of the facial animation system, showing the skeleton hierarchy.
The idea behind this mass of joints is that they simulate certain muscle groups. The muscles of the face are attached to the skull at one end. The other end is attached straight to the flesh or to another muscle group. This is different from muscles in the body, which are always attached to a bone at both ends. As the muscles contract, it should be a simple case of just animating the scales of our joints to simulate these contractions. Unfortunately this is not the case, as there are actually hundreds of muscles which all interact together. To achieve realistic expression we had to rotate, scale, and translate the joints.
How do you go about assigning an arbitrary head model to this skeleton? The original skinning of the character took two whole days of meticulous weighting, using Maya and its paint weights tool to achieve this.
I didn't wish to do this for every head. Joe Kilner, a programmer here at SCEE who was writing the animation system with me, came up with a MEL (Maya Embedded Language) script that would copy weights from one model to another. The script basically saved out the weights of the vertices using two guidelines: the vertex's normal direction and UV coordinates. This enabled us to export weights from one head and import them onto another.
For this to work, we had to make sure that all of our head textures conformed to a particular fixed template. The added bonus of this is that we can then apply any texture to any head. The template also made it easier to create our face textures.
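To illustrate the idea of matching vertices by UV coordinates and normal direction, here is a minimal Python sketch. It is not Joe's MEL script: the nearest-match strategy, the cost function, the normal_weight parameter, and the data layout are all assumptions made for this example. One plausible reason to include the normal is that vertices on opposite sides of a fold, such as the upper and lower lips, can sit close together in UV space while facing different directions.

```python
import math


def match_cost(a, b, normal_weight=0.1):
    """Lower is better: UV distance plus a penalty for differing normals.

    Assumes unit-length normals; normal_weight is an arbitrary choice.
    """
    uv_dist = math.hypot(a["uv"][0] - b["uv"][0], a["uv"][1] - b["uv"][1])
    dot = sum(x * y for x, y in zip(a["normal"], b["normal"]))
    return uv_dist + normal_weight * (1.0 - dot)


def copy_weights(source_vertices, target_vertices):
    """Give each target vertex the weights of its best-matching source vertex."""
    copied = []
    for target in target_vertices:
        best = min(source_vertices, key=lambda src: match_cost(src, target))
        copied.append(dict(best["weights"]))
    return copied


# Tiny hand-made example; joint names and values are illustrative.
source = [
    {"uv": (0.25, 0.5), "normal": (0.0, 0.0, 1.0), "weights": {"jaw": 1.0}},
    {"uv": (0.75, 0.5), "normal": (0.0, 0.0, 1.0),
     "weights": {"head": 0.6, "left_cheek": 0.4}},
]
target = [
    {"uv": (0.7, 0.52), "normal": (0.0, 0.0, 1.0)},
    {"uv": (0.2, 0.5), "normal": (0.0, 0.0, 1.0)},
]
copied = copy_weights(source, target)
```

This brute-force search compares every target vertex with every source vertex, which is fine for heads of a few thousand polygons; a spatial index over UV space would be the obvious refinement for larger meshes.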
Emotions and the Face
Research has shown that people recognize six universal emotions: sadness, anger, joy, fear, disgust, and surprise. There are other expressions that we have that are more ambiguous. If you mix the above expressions together, people offer differing opinions on what they suggest. Also, physical states such as pain, sleepiness, passion, and physical exertion tend to be harder to recognize. So if you wish to make sure that the emotion you are trying to portray is recognized, you must rely on the overall attitude or animation of the character. Shyness, for example, is created with a slight smile and downcast eyes. But this could be misinterpreted as embarrassment or self-satisfaction.
Emotions are closely linked to each other. Worry is a less intense form of fear, disdain is a mild version of disgust, and sternness is a mild version of anger. Basically blending the six universal emotions or using lesser versions of the full emotions gives us all the nuances of the human face.
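Blending and scaling emotions in this way can be sketched as a weighted sum of offsets from the neutral pose. The example below assumes simple linear blending of scalar channels; the channel names, the offset values, and the 0.5 weight used for worry are invented for illustration, and only three of the six emotions are filled in to keep it short.

```python
# Hypothetical full-strength poses, expressed as offsets from neutral.
# Channel names and values are illustrative assumptions.
POSES = {
    "fear": {"brow_raise": 1.0, "eye_open": 0.8, "jaw_open": 0.4},
    "anger": {"brow_raise": -1.0, "lip_press": 0.7},
    "joy": {"lip_corner_up": 1.0, "cheek_raise": 0.6},
}


def blend(weights):
    """Combine emotions linearly: sum of (weight * offset) per channel."""
    result = {}
    for emotion, weight in weights.items():
        for channel, offset in POSES[emotion].items():
            result[channel] = result.get(channel, 0.0) + weight * offset
    return result


worry = blend({"fear": 0.5})                  # a less intense form of fear
mixed = blend({"fear": 0.5, "anger": 0.25})   # two emotions sharing a channel
```

Channels driven by more than one emotion simply add, so opposing offsets partly cancel, as the brow does in the mixed example.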
A typical face texture in The Getaway.
(Originally published in Game Developer March 2001.)