
REAL-TIME ANIMATION OF INTERACTIVE AGENTS: SPECIFICATION AND REALIZATION

Pages 510-529 | Published online: 07 Jul 2010

Abstract

Embodied agents are a powerful paradigm for current and future multimodal interfaces yet require high effort and expertise for their creation, assembly, and animation control. Therefore, open animation engines and high-level control languages are required to make embodied agents accessible to researchers and developers. We present EMBR, a new real-time character animation engine that offers a high degree of animation control via the EMBRScript language. We argue that a new layer of control, the animation layer, is necessary to keep the higher-level control layers (behavioral/functional) consistent and slim while allowing a unified and abstract access to the animation engine (e.g., for the procedural animation of nonverbal behavior). We also introduce new concepts for the high-level control of motion quality (spatial/temporal extent, power, fluidity). Finally, we describe the architecture of the EMBR engine, its integration into larger project contexts, and conclude with a concrete application.

INTRODUCTION

Turning virtual humans into believable, and thus acceptable, communication partners requires highly natural verbal and nonverbal behavior. This problem can be seen from two sides: creating intelligent behavior (planning context-dependent messages) and producing corresponding surface realizations (speech, gesture, facial expression, etc.). The former is usually considered an AI problem, whereas the latter can be considered a computer graphics problem (for nonverbal output). Although previous embodied agent systems created their own solutions for transitioning from behavior planning to graphical realization (Hartmann et al., Citation2005; Kopp and Wachsmuth, Citation2004; Neff et al., Citation2008), recent research has identified three fundamental layers of processing that facilitate the creation of generic software components (Kopp et al., Citation2006; Vilhjalmsson et al., Citation2007): intent planner, behavior planner, and surface realizer. This general architecture allows the implementation of various embodied agent realizers that can be used by the research community through unified interfaces.

In this article we present a new realizer called EMBRFootnote 1 (embodied agents behavior realizer) and its control language EMBRScript. An embodied agents realizer has particularly demanding requirements: It must run at interactive speed, animations must be believable while complying with high-level goals and be synchronized with respect to multiple modalities (speech, gesture, gaze, facial movements) as well as external events (triggered by the surrounding virtual environment or by the interaction partners), and it must be robust and reactive enough to cope with unexpected user input with human-like responses. The system should provide researchers with a consistent behavior specification language offering the best compromise between universality and simplicity. Finally, all components of such a system should be open and freely available, from the assets creation tools to the rendering engine. In the terminology of the SAIBA framework (Kopp et al., Citation2006; Vilhjalmsson et al., Citation2007), users work on the level of intent planning and behavior planning and then dispatch high-level behavior descriptions in the behavior markup language (BML) to the realizer, which transforms them into an animation. Because the behavior description is abstract, many characteristics of the output animation are left for the realizer to decide. There is little way to tune or modify the animations planned by existing realizers (Thiebaux et al., Citation2008). To increase animation control while keeping high-level behavior descriptions simple, we propose an intermediate layer between the two: the animation layer. The animation layer gives access to animation parameters that are close to the actual motion generation mechanisms, like spatiotemporal constraints. It thus gives direct access to the functionality of the realizer while abstracting away from implementation details.

On the one hand, the animation layer provides users with a language capable of describing fine-grained output animations without requiring a deep understanding of computer animation techniques. On the other hand, the concepts of this layer can be used as building blocks to formally describe behaviors on the next higher level (BML). We also show how to integrate the control of motion quality (spatial extent, temporal extent, power and fluidity) into our framework.

To sum up, the main contributions of this article are as follows:

Introducing a new, free behavior realizer for embodied agents

Presenting a modular architecture for real-time character animation that combines skeletal animation, morph targets, and shaders

Introducing a new layer of specification called the animation layer, implemented by the EMBRScript language, that is based on specifying partial key poses in absolute time

New formulations for realizing motion qualities (spatial/temporal extent, power, fluidity) on individual motions, based on the concept of nucleus, which subsumes stroke and independent hold of a gesture.Footnote 2

In the following, we first review related work and then describe the animation layer and EMBRScript. We then explain EMBR's modular architecture and conclude with a concrete application and future work.

RELATED WORK

In the terminology of the SAIBA framework, the nonverbal behavior generation problem can be decomposed into behavior planning and realization. At the interface between these two stages, BML describes human movement at the level of abstract behavior (Kopp et al., Citation2006), e.g., perform a pointing gesture or shrug.

The problem of behavior planning may be informed by the use of communicative function (De Carolis et al., Citation2004), linguistic analysis (Cassell et al., Citation2001), and archetype depiction (Ruttkay and Noot, Citation2005) or may be learned from real data (Stone et al., Citation2004; Neff et al., Citation2008). The problem of realization involves producing the final animation, which can be done either from a gesture representation language (Kopp and Wachsmuth, Citation2004; Hartmann et al., Citation2005) or from a set of active motion segments in the realizer at runtime (Thiebaux et al., Citation2008; Gillies et al., Citation2008).

Kopp and Wachsmuth (Citation2004) created an embodied agent realizer that provides the user with a fine-grained, constraint-based gesture description language (MURML) that lets the user precisely specify communicative gestures involving skeletal animation and morph target animation. This system allows the user to define synchronization points between channels but automatically handles the timing of the rest of the animation using motion functions extracted from the neurophysiological literature. Their control language can be regarded as being on the same level of abstraction as BML, although it is much more complex, with deeply nested XML structures. We argue that a number of these low-level concepts should be moved to what we call the animation layer.

The SmartBody open-source framework (Thiebaux et al., Citation2008) relates to our work as a freely available system that lets users build their own behavior realizer by specializing generic animation segments, called motion controllers, organized in a hierarchical manner. Motion controllers serve two functions: motion generation (controllers) and the management of blend policy, scheduling, and time warping (meta-controllers). As SmartBody uses BML (Kopp et al., Citation2006) as an input language, it must tackle both the behavior selection and the animation selection problem. Although extending the controllers to tailor animation generation is feasible, there is currently no easy way to modify the behavior selection as "each BML request is mapped to skeleton-driving motion controllers" (Thiebaux et al., Citation2008, p. 153). Moreover, even though SmartBody lets users import their own art assets, the only supported assets creation tool is Maya (commercial). As for rendering, the authors claim to have full or partial integration of multiple engines (e.g., Unreal 2.5 [full], Gamebryo, Ogre3D, and Half-Life 2's Source engine). The BML RealizerFootnote 3 (BMLR) is an open source project that uses the SmartBody system as an engine and Panda3D as a renderer, thereby removing SmartBody's dependence on commercial tools. Existing commercial products like NaturalMotionFootnote 4 or the Golaem Behavior PackFootnote 5 are focused on low-level motion generation and do not provide a language capable of specifying motions in an abstract way.

A more recent development is the PIAVCA framework (Gillies et al., Citation2008) for controlling responsive animation. PIAVCA provides a range of real-time motion editing filters that can be combined to achieve complex animation effects, as well as authoring of control flows that makes it possible to create both scripted and reactive responses to events. However, PIAVCA seems more focused on generating reactive behavior: For instance, it doesn't offer an animation specification language. Furthermore, although it gives access to basic animation primitives like morph targets or joints, PIAVCA doesn't provide elaborate procedural animation routines like inverse kinematics but rather relies extensively on motion capture.

ANIMATION LAYER: EMBRScript

It has been proposed that an abstract behavior specification language like BML should be used to communicate with the realizer. Such a language usually incorporates concepts like relative timing (e.g., let motions A and B start at the same time) and lexicalized behaviors (e.g., perform head nod), sometimes allowing parameters (e.g., point to object X). Although we acknowledge the importance of this layer of abstraction, we argue that another layer is needed that allows finer control of animations without requiring a programmer's expertise. We call this layer the animation layer. It can be regarded as a thin wrapper around the animation engine whose most important characteristics are:

Specifies key poses in time

Uses absolute time

Uses absolute space

Avoids deeply nested specification structures

We incorporate the functionality of this layer in a language called EMBRScript (see Fig. 1 for a sample script). EMBRScript's main principle is that every animation is described as a succession of key poses. A key pose describes the state of the character at a specific point in time (TIME_POINT), which can be held still for a period of time (HOLD). For animation, EMBR performs interpolation between neighboring poses. The user can select the interpolation method and apply temporal modifiers. A pose can be specified using one of four principal methods: skeleton configuration (e.g., reaching for a point in space, bending forward), morph targets (e.g., smiling and blinking with one eye), shaders (e.g., blushing or paling), or autonomous behaviors (e.g., breathing). Sections 3.1 to 3.3 describe each of these methods in detail. Because the animation layer is located between behavior planning (BML) and the realizer (animation engine), one can implement a BML player by translating BML to EMBRScript, as depicted in Fig. 1.Footnote 6 Note that translating BML temporal and spatial constraints into specific EMBRScript formulations is an open problem that we are working on at the moment.
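To make the key-pose timeline concrete, the following minimal Python sketch (not EMBR's actual implementation; class and field names are illustrative) shows how a pose with an absolute time stamp and a hold period could be represented, and how a single morph-target weight would be interpolated between two neighboring key poses:

```python
from dataclasses import dataclass, field

@dataclass
class KeyPose:
    """One key pose on the animation timeline (cf. TIME_POINT and HOLD in EMBRScript)."""
    time: float                                          # absolute time in seconds
    hold: float = 0.0                                    # how long the pose is held still
    morph_weights: dict = field(default_factory=dict)    # e.g., {"anger": 0.8}
    shader_inputs: dict = field(default_factory=dict)    # e.g., {"blushing": 0.5}

def blend_weight(t: float, p0: KeyPose, p1: KeyPose, key: str) -> float:
    """Linearly interpolate one morph-target weight between neighboring key poses,
    respecting the hold period of the earlier pose."""
    start = p0.time + p0.hold                            # interpolation starts after the hold
    if t <= start or p1.time <= start:
        return p0.morph_weights.get(key, 0.0)
    if t >= p1.time:
        return p1.morph_weights.get(key, 0.0)
    alpha = (t - start) / (p1.time - start)
    a = p0.morph_weights.get(key, 0.0)
    b = p1.morph_weights.get(key, 0.0)
    return a + alpha * (b - a)
```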

FIGURE 1 EMBRScript sample (bottom box). The script describes realizations of the behavior specified in the original BML script (top box).


Skeleton Configuration

Animating virtual humans using an underlying rigid skeleton is the most widely used method in computer animation. In EMBRScript a skeleton configuration can be described in two different ways: (1) using forward kinematics (FK), where all angles of all joints are specified, usually prefabricated with the help of a 3D animation tool; (2) using inverse kinematics (IK), where a set of constraints (e.g., location of the wrist joint, orientation of the shoulder joint) is passed to an IK solver to determine the pose in real time. In Fig. 1 the pose description labeled p2 defines a key pose using two kinematic constraints on the right arm: a position constraint and a partial orientation, both defined in Cartesian coordinates. In EMBR, kinematic constraints modify parts of the skeleton called BODY_GROUPS, which are defined in terms of skeleton joints. Pose description p3 refers to a stored pose in the engine's pose repository. The animation description Clip:headShake refers to a prefabricated animation clip (which is treated as a sequence of poses) also residing in the engine's repository.

Morph Targets and Shaders

The face is a highly important communication channel for embodied agents: Emotions can be displayed through frowning, smiling, and other facial expressions, and changes in skin tone (blushing, paling) can indicate nervousness, excitement, or fear. In EMBR, facial expressions are realized through morph target animation, and blushing and paling are achieved through fragment-shader-based animation. In EMBRScript, the MORPH_TARGET label can be used to define a morph target pose, and multiple morph targets can be combined using weights. As with skeletal animation, the in-between poses are computed by interpolation. In Fig. 1, pose p1 in the EMBRScript sample defines a weight for a morph target key pose that corresponds to the basic facial expression of anger in MPEG-4. For skin tone, one defines a SHADER_TARGET together with an intensity, like the blushing pose in p1.

Autonomous Behaviors

Autonomous behaviors are basic human behaviors that are beyond conscious control. Examples are breathing, eye blinking, the vestibulo-ocular reflex, eye saccades and smooth pursuit, balance control, and weight shifting. Such behaviors can be realized with automatic routines that are controlled with parameters like breathing frequency or blinking probability. In addition to a set of predefined autonomous behaviors, EMBR also lets users define and implement their own autonomous behaviors and associated control parameters. The pose description labeled p1 in the EMBRScript sample in Fig. 1 shows how a user can modify autonomous behavior parameters like breathing frequency and amplitude.

Temporal Variation and Interpolation Strategies

Human motion is rarely linear in time. Therefore, procedural animations derived from interpolation between poses must be enhanced with respect to temporal dynamics. EMBR supports time warp profiles that can be applied to any animation element and correspond to the curves depicted in Figure 2. Time warp profiles conveying ease in, ease out, and ease in and ease out can be specified in the EMBR language with two parameters: the function family (TAN or EXP) and the slope steepness (a real number). The first gesture described in the EMBRScript sample of Fig. 1 illustrates a possible usage of the TIME_WARP element.
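The exact TAN and EXP formulas are not spelled out here, so the following Python sketch uses plausible stand-ins (a power curve and an arctangent sigmoid; names and formulas are assumptions) only to illustrate how a normalized warp profile with a steepness parameter would remap time before interpolation:

```python
import math

def warp_exp(u: float, sigma: float) -> float:
    """Ease-in (sigma > 1) or ease-out (0 < sigma < 1) style profile via a power curve."""
    return u ** sigma

def warp_tan(u: float, sigma: float) -> float:
    """S-shaped ease-in-and-ease-out profile; sigma > 0 controls mid-section steepness."""
    return 0.5 + math.atan(sigma * (2.0 * u - 1.0)) / (2.0 * math.atan(sigma))

def apply_time_warp(t: float, t_start: float, t_end: float, warp) -> float:
    """Remap an absolute time inside an animation element through a warp profile."""
    u = (t - t_start) / (t_end - t_start)          # normalize to [0, 1]
    return t_start + warp(u) * (t_end - t_start)   # warped time fed to the interpolator
```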

FIGURE 2 Time warp profiles can be used to model ease in, ease out, and ease in and ease out. EMBRScript offers two spline-based function families, TAN and EXP, where parameter σ roughly models steepness at (0, 0). A profile may (intentionally) result in overshoot like in Hartmann et al. (Citation2005).


Immediate, High-Priority Execution

An agent may have to respond to an interruptive event (like dodging an incoming shoe). To specify behaviors that require immediate, high-priority execution, EMBRScript provides a special TIME_POINT label: asap. A behavior instance whose time stamp is asap is performed as soon as possible, overriding existing elements.

EMBR ARCHITECTURE

The EMBR engine reads an EMBRScript document and produces animations in real time. In practice, this means that EMBR must produce a skeleton pose for every time frame that passes. This process is managed by a three-component pipeline consisting of the motion factory, the scheduler, and the pose blender (Fig. 3). This processing is independent of the concrete rendering engine (cf. Section 6 for how rendering is managed).

FIGURE 3 EMBR architecture.


To give an overview, EMBR first parses the EMBRScript document, which results in a sequence of commands and constraints. The motion factory gathers and rearranges constraints according to their time stamp and type to create a set of time-stamped motion segments. Motion segments are sent to the scheduler, which at regular intervals selects the motion segments whose time stamps match the current time. Relevant poses are sent to the pose blender, which merges all input poses, resolving possible conflicts, and outputs a final pose.

Motion Factory

The motion factory produces the building blocks of the animation, called motion segments, from the key poses specified in EMBRScript. A motion segment represents an animation for part of the skeletonFootnote 7 over a period of time. For instance, a motion segment may describe a waving motion of the right arm or the blushing of the face. Each motion segment contains an instance of a specialized actuator that drives the animation. The actuator's type depends on the motion generation method and the relevant pose component (recorded animation playback, skeleton interpolation, morph target weight, and shader input interpolation). Motion segments are controlled in terms of absolute time, and the timing can be warped (see Section 3.4) to model, for example, ease in and ease out.
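A minimal sketch of the motion segment concept (illustrative names, not EMBR's actual data structures): a segment covers an absolute time span, drives one part of the character, and delegates pose computation to its actuator.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# An actuator maps an absolute time to the partial pose this segment controls
# (joint rotations, morph weights, or shader inputs), independently of how the
# values are generated (clip playback, skeleton/morph/shader interpolation).
Actuator = Callable[[float], Dict[str, float]]

@dataclass
class MotionSegment:
    body_group: str      # which part of the character it drives (e.g., "right_arm", "face")
    t_start: float       # absolute start time
    t_end: float         # absolute end time
    actuator: Actuator   # motion generation method chosen by the motion factory

    def active(self, t: float) -> bool:
        return self.t_start <= t <= self.t_end

    def sample(self, t: float) -> Dict[str, float]:
        return self.actuator(t)
```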

Scheduler and Pose Blender

The scheduler manages incoming motion segments, makes sure that active segments affect the computation of the final pose, and removes obsolete segments. For each time frame the scheduler collects all active segments and assigns a weight according to the following algorithm:

If segment is terminating (fade out), assign decreasing weight from 1 to 0 (overlapping segments are interpolated according to their respective weights)

Else if segment contains a kinematic constraint, it is tagged with priority (will override all other segments)

Else segment is assigned weight 1 (overlapping segments are interpolated according to their respective weights)

The pose components from the motion segments are merged in the pose blender according to their weights using linear interpolation (we plan to allow more blending policies in the future). For kinematic constraints it is often critical that the resulting pose is not changed (e.g., orientation and hand shape of a pointing gesture); therefore, the pose is tagged priority and overrides all others.
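Building on the MotionSegment sketch above, the following Python fragment illustrates the weighting and blending rules just described (the fade-out length, field names, and priority flag are assumptions, not EMBR's actual code):

```python
def assign_weight(segment, t: float, fade_out: float = 0.3):
    """Return (weight, is_priority) for an active segment, following the rules above."""
    if t > segment.t_end - fade_out:                   # terminating segment: fade out
        w = max(0.0, (segment.t_end - t) / fade_out)   # decreasing weight from 1 to 0
        return w, False
    if getattr(segment, "kinematic", False):           # kinematic constraint: priority
        return 1.0, True
    return 1.0, False                                  # default weight 1

def blend(segments, t: float):
    """Merge the partial poses of all active segments by weighted linear interpolation;
    priority (kinematic) poses override the blend on the channels they touch."""
    acc, total, priority = {}, {}, {}
    for seg in segments:
        if not seg.active(t):
            continue
        w, prio = assign_weight(seg, t)
        for channel, value in seg.sample(t).items():
            if prio:
                priority[channel] = value
            else:
                acc[channel] = acc.get(channel, 0.0) + w * value
                total[channel] = total.get(channel, 0.0) + w
    pose = {c: acc[c] / total[c] for c in acc if total[c] > 0}
    pose.update(priority)                              # priority overrides the weighted blend
    return pose
```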

EXPRESSIVITY PARAMETERS

Even a single human being displays an almost infinite variety in motion, depending on mood, emotion, style, and many other factors. For character animation it is desirable to achieve variety by applying a small set of high-level parameters to the same motion. Prior work has come up with various parameters. Hartmann et al. (Citation2005), from now on called HPM, introduced spatial extent, temporal extent, fluidity, power, and repetition, whereas EMOTE used the effort dimension of Laban's movement analysis system (Chi et al., Citation2000). Both models use a similar set of parameters (Table 1).

TABLE 1 Traditional Principles of Animation and Their Realization in EMOTE and HPM

We decided to select four HPM parameters but reformulated their implementation as described in this section. The parameters are as follows (all vary between −1 and 1):

Temporal extent: how fast/slow a movement is

Spatial extent: size of the amplitude of a motion

Fluidity: smoothness and continuity of the movement

Power: sense of force and engagement of the movement

We introduce the concept of nucleus to make these parameters applicable to a low-level language like EMBRScript. This is necessary for two reasons. First, we need to know where the most expressive part of a gesture is located in a sequence of poses (McNeill, Citation1992), because usually only this part is modified. Second, in EMBRScript a single gesture could be defined across multiple sequences (e.g., one for each arm), or a sequence could contain multiple gestures. The nucleus construct is therefore used to mark the expressive part and to bind together the potentially many parts of a single nucleus.

Because we apply the expressivity parameters to a nucleus and not, as HPM do, to a whole character, we can modulate the expressiveness of an agent over time, for instance, to convey different emotions or context-dependent changes of conduct (e.g., due to status differences). Figure 4 shows how a nucleus is defined and parametrized and then referenced from the particular pose(s) belonging to that nucleus.

FIGURE 4 Nucleus embeds a set of expressivity parameters. It is bound to a K_POSE by ID.


Parameter Implementation

Our formulation of expressiveness is based on the modification of three different aspects of the animation:

Shape of the key poses

Interpolation parameters between key poses

Timing between key poses

We control the fluidity of an animation by modifying the interpolation style. For this, we implemented Kochanek spline interpolation (Kochanek and Bartels, Citation1984) in quaternion space (details can be found in the Appendix).

Temporal Extent

The temporal extent parameter p_temp specifies how slow or fast a motion appears by varying the duration of the several gesture parts from key pose to key pose. A negative value reduces the duration, whereas a positive value increases the amount of time needed to reach the target point. This time offset is obtained using a (logarithmic) function of the original duration d: delay = log(d) · p_temp · 1.5. The logarithm avoids too big delays or too drastic cuts. The 1.5 scaling value guarantees that p_temp values are kept within an intuitive range ([−1, 1]). The temporal extent can cause a pause after finishing the modified gesture because the overall duration of the motion is shortened for negative values, or it may lead to an overlap with the subsequent gesture for positive values. Also note that synchronization points with other modalities (most notably speech) that are located after the nucleus can become out of sync.
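As a small worked example (assuming the duration d is given in seconds, which the text does not state), the offset formula can be written directly:

```python
import math

def temporal_extent_delay(d: float, p_temp: float) -> float:
    """Time offset added to a gesture part of original duration d (d > 0):
    delay = log(d) * p_temp * 1.5."""
    return math.log(d) * p_temp * 1.5

# e.g., for d = 1.2 and p_temp = -0.5: log(1.2) * -0.5 * 1.5 ≈ -0.14,
# i.e., this gesture part is shortened by roughly 0.14.
```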

Spatial Extent

Various methods have been proposed to systematically enlarge/shrink an existing gesture. The challenge is to preserve the overall shape of the gesture after the modification. For instance, if a gesture is simply scaled with regard to a fixed point without respecting the trajectory of the gesture, the overall shape may be distorted (e.g., a circle to an ellipsoid). One may instead adjust single segments between key points, which may cause problems at start and end points and may also distort the geometry.

Therefore, we decided to use a "gesture-centric" method: We compute the bounding box of the original key points k_i of both hands (for bihanded gestures). According to the value of the spatial extent parameter p_spatial, this box is scaled together with the contained key coordinates around its center, as depicted in Fig. 5. Again, the scalar Scale_morph guarantees that p_spatial values are kept within an intuitive range and depends on the agent's morphology. This method proved most useful for a number of different gestures, whereas the other approaches failed for specific geometric or symmetric gestures.
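A minimal Python sketch of this gesture-centric scaling follows; the mapping 1 + p_spatial · Scale_morph from the parameter to a scale factor is an assumption, since the exact formula is not given in the text.

```python
def scale_key_points(key_points, p_spatial: float, scale_morph: float = 1.0):
    """Scale gesture key points around the center of their common bounding box.
    key_points: list of (x, y, z) wrist positions of one or both hands; p_spatial in [-1, 1];
    scale_morph is a character-dependent scalar (the default here is a placeholder)."""
    xs, ys, zs = zip(*key_points)
    center = ((min(xs) + max(xs)) / 2.0,
              (min(ys) + max(ys)) / 2.0,
              (min(zs) + max(zs)) / 2.0)
    factor = 1.0 + p_spatial * scale_morph   # assumed mapping from p_spatial to a scale factor
    return [tuple(c + factor * (k - c) for k, c in zip(kp, center)) for kp in key_points]
```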

FIGURE 5 Applying deformation over a precomputed bounding box is well fitted to gestures depicting 3D trajectories.


Power

The power parameter p_pow varies the energy of the gesture: for positive values, a more forceful motion emphasizes the stroke. It adds a preparation key pose to generate a preparatory motion in the opposite direction of the motion, with a length of 0.1 · p_pow times the length of the original motion. A 300-ms hold at this position is added, and the remaining motion is sped up by a factor of 1.0 + 0.2 · p_pow. The tension parameter for interpolation is set equal to p_pow, leading to straight segments for high power and curves for low power. The bias parameter is used to create over- or undershoot at the end of strokes by also setting it equal to p_pow for all motion segments, as depicted in Fig. 6.
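The following Python sketch illustrates the geometric part of this modification on a simplified nucleus given as timed wrist positions; it is a reading of the rules above, not EMBR's implementation, and the interpolation tension/bias settings are only noted in the comments.

```python
def apply_power(poses, p_pow: float, hold_ms: float = 300.0):
    """poses: list of (time_ms, (x, y, z)) key poses of the nucleus; p_pow in (0, 1].
    Inserts a preparation pose opposite to the stroke direction, holds it for 300 ms,
    and speeds up the remaining motion by 1.0 + 0.2 * p_pow.
    (Interpolation tension and bias are additionally set to p_pow, not shown here.)"""
    (t0, x0), (_, x1) = poses[0], poses[-1]
    direction = tuple(b - a for a, b in zip(x0, x1))                      # stroke direction
    prep_pos = tuple(a - 0.1 * p_pow * d for a, d in zip(x0, direction))  # 0.1 * p_pow of the motion, opposite direction
    speedup = 1.0 + 0.2 * p_pow
    prep = [(t0, prep_pos), (t0 + hold_ms, prep_pos)]                     # preparation pose plus 300 ms hold
    rest = [(t0 + hold_ms + (t - t0) / speedup, x) for t, x in poses]     # sped-up original motion
    return prep + rest
```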

FIGURE 6 A positive value for power adds a preparation phase before the stroke and some overshoot toward its end.


Fluidity

This parameter p_fluid affects the continuity and holds of gestures. A high fluidity setting shortens or removes pauses by subtracting up to 300 · p_fluid ms from existing pauses and guarantees the continuity of the interpolation by setting the continuity parameter to zero if p_fluid is greater than or equal to zero. A negative value causes the target joint to rest at every key position for 300 · |p_fluid| ms and adds corners to the interpolation by setting continuity to −p_fluid.
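A compact sketch of how the two cases could translate into hold durations and Kochanek continuity values (a reading of the rules above; EMBR's actual parameter handling may differ):

```python
def apply_fluidity(hold_ms: float, p_fluid: float):
    """Return (new_hold_ms, continuity) for a key pose, following the rules above."""
    if p_fluid >= 0.0:
        return max(0.0, hold_ms - 300.0 * p_fluid), 0.0   # shorten pauses, keep the spline continuous
    return hold_ms + 300.0 * abs(p_fluid), -p_fluid       # add rests at key poses and corners in the path
```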

To conclude, we introduced formulations for our four quality parameters based on the concept of the nucleus. Although this needs to be validated by a user study, our impression is that our gesture-centric approach to expressivity parameter realization makes the resulting motion look more natural.

INTEGRATING EMBR INTO LARGER PROJECTS

An open character animation engine is only useful if it can be easily integrated into a larger project context and if it is possible to extend it, specifically by allowing new characters to be controlled. For EMBR, one of our goals was to provide a framework whose components are freely available. Therefore, we rely on the free 3D modeling tool Blender Footnote 8 for assets creation and on the free Panda3D engineFootnote 9 for 3D rendering. This section describes the complete pipeline from assets creation to runtime system. We sum up our goals as follows:

Components for assets creation and rendering are freely available.

Modifying existing characters is straightforward.

Creating a new agent from scratch is possible.

Use of alternative assets creation tools or renderers is possible.

The EMBR framework is depicted in Fig. 7: It can be characterized by an assets creation phase (top half of the figure), a runtime phase (bottom half), and data modules connecting the two (boxes in the middle).

FIGURE 7 EMBR framework: Assets creation and runtime system.


Assets Creation

When creating a new character, two mandatory steps are involved: creating 3D assets in a 3D modeling tool (Blender) and specifying the EMBR character configuration file. Optionally, shader programs can be designed. In the 3D modeling tool, one first creates static resources: the character's mesh, skeleton, mesh-skeleton rigging, and textures. For facial animation, one usually creates a set of morph targets. One then creates a repertoire of skeletal animations. Finally, the user may create a set of programmable shaders (e.g., for changing skin tone at runtime [blushing/paling] or for improved rendering). Shader programming is highly dependent on the rendering engine. For instance, only the CgFootnote 10 programming language is currently supported by Panda3D. Developers can expose parameters from the shader program and control them through EMBRScript. Shader input parameters must be declared in the EMBR character configuration file. Once the character is ready, export scripts package the data from Blender for later usage by the EMBR engine and the Panda3D renderer.

Character Configuration File

Because in the modeling stage the user is free to choose joint names and skeleton topology, the EMBR character configuration file must be created to inform EMBR about the character's inherent features. The character configuration file specifies the different properties and capabilities of the agent so that they can be exposed and accessed through the EMBRScript language. Currently, the character specification file supports the following elements for one agent:

A set of available prerecorded animation clips (Section 3.1)

A set of available morph targets (Section 3.2)

A tree element describing the joint hierarchy

Joint properties: name, boundaries, and kinematic gain

Joint groups: defined by a set of joint references

Kinematic structures (set of joint references) and their associated algorithms (Section 3.1)

Autonomous behavior: defined by their animation(s) and triggering mechanism: periodic, randomized, or event based (Section 3.3)

The character specification file not only lets a user describe the properties and capabilities of a new agent; it also provides a basic language for describing complex multimodal behaviors by combining basic animations using three relational predicates: reverse, sequential, and parallel.

For instance, the animation triggered by the breathing behavior is a compound of a skeletal animation and a morph target based animation, played forward then backward, as described in Fig. 8.
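The semantics of the three predicates can be sketched as follows. The actual configuration file is XML; this Python sketch with illustrative names and durations only mirrors how the breathing compound of Fig. 8 is built up.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Clip:
    name: str
    duration: float   # seconds

def reverse(clip: Clip) -> Clip:
    """Play an animation backward."""
    return Clip(f"reverse({clip.name})", clip.duration)

def sequential(clips: List[Clip]) -> Clip:
    """Play animations one after another."""
    return Clip(" -> ".join(c.name for c in clips), sum(c.duration for c in clips))

def parallel(clips: List[Clip]) -> Clip:
    """Play animations at the same time."""
    return Clip(" | ".join(c.name for c in clips), max(c.duration for c in clips))

# Breathing: skeletal and morph-target animation in parallel, played forward then backward.
inhale = parallel([Clip("chest_skeletal", 2.0), Clip("breath_morph", 2.0)])
breathing = sequential([inhale, reverse(inhale)])
```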

FIGURE 8 Breathing animation is a compound of a skeletal animation and a morph target based animation, played forward then backward.


The character configuration file follows an XML specification whose DTD is available online.Footnote 11

Runtime

At runtime, EMBR is initialized with the EMBR character data (Fig. 7). First, it dynamically populates the animation factory with motion segment producers corresponding to the character's attributes and capabilities defined in the character configuration file. Second, it configures the EMBRScript parser. EMBR uses the Panda3D rendering engine for interactive display of the character. Panda3D provides a Python scripting interface, which we use for synchronizing EMBR with Panda3D and for controlling the runtime system. During the lifespan of a character inside an EMBR session, two instances of the character exist: one stored in EMBR, representing the poses that result from processing EMBRScript, and another one stored in Panda3D, representing the display-optimized version of the character. Our Python synchronizer ensures that the Panda3D character is always in the same state as the EMBR character. Because gestures are defined using high-level constraints and because these gestures are resolved at the last moment using real-time motion generation methods, our system is particularly suited for interactive animation. However, users wanting to stick with preprocessed (handcrafted or captured) motion can also import and use their own assets.
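A minimal sketch of such a per-frame synchronizer using Panda3D's task manager; the `embr_character.current_pose()` call and the joint naming are hypothetical stand-ins for the EMBR-side state, and a real setup would also cover morph targets and shader inputs.

```python
from direct.showbase.ShowBase import ShowBase
from direct.task import Task

class EmbrSynchronizer(ShowBase):
    """Keeps a Panda3D Actor in sync with the pose computed on the EMBR side every frame."""

    def __init__(self, embr_character, actor):
        super().__init__()
        self.embr_character = embr_character   # hypothetical object exposing the EMBR-side pose
        self.actor = actor                     # a direct.actor.Actor with a skinned skeleton
        self.joints = {}                       # cache of controlled joint nodes
        self.taskMgr.add(self.sync_pose, "embr-sync")

    def _joint(self, name):
        if name not in self.joints:
            self.joints[name] = self.actor.controlJoint(None, "modelRoot", name)
        return self.joints[name]

    def sync_pose(self, task):
        # Query the EMBR character for the current pose and push it onto the Panda3D joints.
        pose = self.embr_character.current_pose(task.time)   # e.g., {joint_name: (h, p, r)}
        for joint_name, hpr in pose.items():
            self._joint(joint_name).setHpr(*hpr)
        return Task.cont                                      # run again next frame
```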

APPLICATION IN PROCEDURAL GESTURE SYNTHESIS

To demonstrate the capabilities of EMBR, we outline its application in a gesture synthesis system (Kipp et al., Citation2007; Neff et al., Citation2008). The system produces coverbal gestures for a given piece of text using statistical profiles of human speakers. The profiles are obtained from a corpus of annotated TV material (Kipp et al., Citation2006). The gesture annotations can be considered a low-dimensional representation of the high-dimensional original motion, if the latter is seen as a frame-wise specification of all joint angles. In gesture synthesis the low-dimensional representation facilitates planning of new motions. However, at the final stage such low-dimensional representations have to be translated back to a full motion. In this section we describe one example of such a translation: from gesture annotation to EMBRScript commands.

Gestures are annotated using the hierarchical three-layered decomposition of movement (Kendon, Citation2004; McNeill, Citation1992), in which a gestural excursion is transcribed in terms of phases, phrases, and units. Our coding scheme adds positional information at the beginning and end of strokes and independent holds. The transcription can be seen as a sequence of expressive phasesFootnote 12 s = ⟨p_0, …, p_{n−1}⟩, where each phase is a tuple p = (h, t_s, t_e, p_s, p_e) specifying handedness (LH, RH, 2H), start/end time, and start/end pose. This description can be used to re-create the original motion, which is useful for synthesizing new gestures or for validating how faithfully the coding scheme describes the form of the gesture.

For the translation to EMBRScript we separate the phase sequence s into two channels for LH and RH, obtaining two sequences s_LH and s_RH. Each sequence is then packaged into a single GESTURE tag (cf. Fig. 1). For each phase's start and end pose, a corresponding key pose is defined using positional constraints. Note that even two-handed (2H) gestures are decomposed into the described LH and RH channels. This is necessary to model the various possibilities that arise when 2H gestures are mixed with single-handed gestures in one g-unit. For instance, consider a sequence of ⟨2H, RH, 2H⟩ gestures. There are three possibilities for what the left hand does between the two 2H gestures: retracting to a rest pose, being held in mid-air, or slowly transitioning to the beginning of the third gesture. Packaging each gesture in a single GESTURE tag makes modeling these options awkward. Using two channels for RH and LH allows arbitrary intermediate poses to be inserted for a single hand. Although this solution makes the resulting EMBRScript harder to read, it seems to be a fair trade-off between expressivity and readability.
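A small Python sketch of this phase representation and of the channel split; the field names are illustrative, and the actual annotation format is defined by the ANVIL coding scheme.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ExpressivePhase:
    """One annotated expressive phase: handedness, start/end time, start/end pose label."""
    hand: str          # "LH", "RH", or "2H"
    t_start: float
    t_end: float
    pose_start: str
    pose_end: str

def split_channels(phases: List[ExpressivePhase]) -> Tuple[List[ExpressivePhase], List[ExpressivePhase]]:
    """Separate a phase sequence into LH and RH channels; two-handed (2H) phases
    contribute to both channels, as described above."""
    lh = [p for p in phases if p.hand in ("LH", "2H")]
    rh = [p for p in phases if p.hand in ("RH", "2H")]
    return lh, rh
```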

Using this straightforward method we can quickly "re-create" gestures that resemble those of a human speaker from only a few video annotations. We implemented a plugin for the ANVIL annotation tool (Kipp, Citation2001) that translates the annotation to EMBRScript and sends it to EMBR for immediate comparison between the original video and the EMBR animation. This translation can thus be used to refine both the coding scheme and the translation procedure. A coding scheme validated in this way is then an ideal candidate for gesture representation in procedural animation systems.

CONCLUSION

We presented a new real-time character animation engine called EMBR (embodied agents behavior realizer), describing its architecture, the EMBRScript control language, and how to integrate EMBR into larger projects. EMBR allows fine control over skeletal animations, morph target animations, shader effects like blushing and paling, and autonomous behaviors like breathing. The EMBRScript control language can be seen as a thin wrapper around the animation engine that we call the animation layer, a new layer between the behavior layer, represented by BML, and the realizer. Whereas the behavior layer uses behavior classes (a pointing gesture, a head nod) to specify form and time constraints to specify timing, the animation layer uses channels, spatial constraints, and absolute time to control the resulting animation. The latter is therefore much closer to the animation engine while abstracting away from implementation details. We showed how to use EMBR in conjunction with gesture coding to visually validate the coding scheme. Gestures thus encoded can then be used to populate procedural gesture synthesis systems. Although EMBRScript may sound like a scripting language, we see it as a general specification layer that abstracts away from concrete animation engine implementations. EMBRScript may therefore feed into a standardization discussion on how to control character animation in general. EMBR is now freely available to the research community.Footnote 13 Finally, for overall integration purposes we want to develop a consistent way of describing the features and capabilities of embodied agents, extending existing standards like H-Anim.


Acknowledgments

This research has been carried out within the framework of the Excellence Cluster Multimodal Computing and Interaction (MMCI), sponsored by the German Research Foundation (DFG). We thank Pascal Pohl for his valuable contribution to the expressivity model and for his help during the writing process. EMBR's shaders management has been greatly enhanced by Stefan John.

Notes

See also http://embots.dfki.de/EMBR

An independent hold is the hold phase of a strokeless gesture, which usually carries the “meaning” of the gesture (Kita et al., Citation1998).

http://cadia.ru.is/projects/bmlr

http://www.naturalmotion.com/

http://www.golaem.com/

BML examples are deliberately chosen to be similar to the ones used in Thiebaux et al. (Citation2008).

More precisely, for part of the pose that includes morph targets and shaders.

http://www.blender.org

http://panda3d.org

http://developer.nvidia.com/page/Cg_main.html

http://embots.dfki.de/EMBR/characterConfiguration.dtd

An expressive phase is either a stroke or an independent hold; every gesture phrase must by definition contain one and only one expressive phase (Kita et al., Citation1998).

REFERENCES

  • Cassell, J., H. Vilhjálmsson, and T. Bickmore. 2001. BEAT: The behavior expression animation toolkit. Proc. of SIGGRAPH 2001, 477–486.
  • Chi, D., M. Costa, L. Zhao, and N. Badler. 2000. The EMOTE model for effort and shape. Proc. of SIGGRAPH 2000, 173–182.
  • De Carolis, B., C. Pelachaud, I. Poggi, and M. Steedman. 2004. APML, a mark-up language for believable behavior generation. In: Life-Like Characters. Tools, Affective Functions and Applications, eds. H. Prendinger and M. Ishizuka, 65–84. Cognitive Technologies Series, Berlin: Springer.
  • Gillies, M., X. Pan, and M. Slater. 2008. PIAVCA: A framework for heterogeneous interactions with virtual characters. In: Proc. of Intelligent Virtual Agents, eds. H. Prendinger, J. Lester, and M. Ishizuka, 494–495. Tokyo, Japan.
  • Hartmann, B., M. Mancini, and C. Pelachaud. 2005. Implementing expressive gesture synthesis for embodied conversational agents. In: Proc. of Gesture in Human-Computer Interaction and Simulation, eds. S. Gibet, N. Courty, and J. F. Kamp, 188–199. Berder Island, France.
  • Kendon, A. 2004. Gesture—Visible Action as Utterance. Cambridge: Cambridge University Press.
  • Kipp, M. 2001. Anvil—A generic annotation tool for multimodal dialogue. Proc. of Eurospeech, 1367–1370.
  • Kipp, M., M. Neff, K. H. Kipp, and I. Albrecht. 2006. Toward natural gesture synthesis: Evaluating gesture units in a data-driven approach. In: Proc. of Intelligent Virtual Agents, eds. C. Pelachaud, J.-C. Martin, E. André, G. Chollet, K. Karpouzis, and D. Pelé, 15–28. Marina Del Rey, USA: Springer.
  • Kipp, M., M. Neff, and I. Albrecht. 2007. An annotation scheme for conversational gestures: How to economically capture timing and form. Journal on Language Resources and Evaluation, 41:325–339.
  • Kita, S., I. van Gijn, and H. van der Hulst. 1998. Movement phases in signs and co-speech gestures, and their transcription by human coders. In: Gesture in Human-Computer Interaction and Simulation, eds. I. Wachsmuth and M. Fröhlich, 23–35. Bielefeld, Germany.
  • Kochanek, D., and R. Bartels. 1984. Interpolating splines with local tension, continuity, and bias control. SIGGRAPH Comput. Graph., 33–41.
  • Kopp, S., B. Krenn, S. Marsella, A. N. Marshall, C. Pelachaud, H. Pirker, K. R. Thórisson, and H. Vilhjálmsson. 2006. Towards a common framework for multimodal generation: The behavior markup language. In: Proc. of Intelligent Virtual Agents, eds. J. G. Carbonell and J. Siekmann, 205–217. Marina Del Rey, USA.
  • Kopp, S., and I. Wachsmuth. 2004. Synthesizing multimodal utterances for conversational agents. Computer Animation and Virtual Worlds, 15:39–52.
  • McNeill, D. 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University of Chicago Press.
  • Neff, M., M. Kipp, I. Albrecht, and H.-P. Seidel. 2008. Gesture modeling and animation based on a probabilistic recreation of speaker style. ACM Transactions on Graphics, 27:1–24.
  • Ruttkay, Z., and H. Noot. 2005. Variations in gesturing and speech by GESTYLE. Int. Journal of Human-Computer Studies, 62:211–229.
  • Shoemake, K. 1987. Quaternion calculus and fast animation, computer animation: 3-D motion specification and control. SIGGRAPH 1987 Tutorial, 101–121.
  • Stone, M., D. DeCarlo, I. Oh, C. Rodriguez, A. Stere, A. Less, and C. Bregler. 2004. Speaking with hands: Creating animated conversational characters from recordings of human performance. In: ACM Transactions on Graphics, ed. John C. Hart, 23(3):506–513.
  • Thiebaux, M., S. Marsella, A. N. Marshall, and M. Kallmann. 2008. SmartBody: Behavior realization for embodied conversational agents. In: Proc. of the Intl. Joint Conf. on Autonomous Agents and Multiagent Systems, eds. L. Padgham, D. Parkes, J. Müller, and S. Parsons, 7(1):151–158. Estoril, Portugal.
  • Vilhjalmsson, H., N. Cantelmo, J. Cassell, N. E. Chafai, M. Kipp, S. Kopp, M. Mancini, S. Marsella, A. N. Marshall, C. Pelachaud, Z. Ruttkay, K. R. Thórisson, H. van Welbergen, and R. J. van der Werf. 2007. The behavior markup language: Recent developments and challenges. In: Proc. of Intelligent Virtual Agents, eds. C. Pelachaud, J.-C. Martin, E. André, G. Chollet, K. Karpouzis, and D. Pelé, 99–111. Marina Del Rey, USA.

APPENDIX

ADDING TENSION, BIAS AND CONTINUITY PARAMETERS TO QUATERNION INTERPOLATION

Linear interpolation does not create convincing paths for limb motion and orientation, whereas cubic interpolation leads to more natural curves. To influence these paths with our expressivity parameters, we transfer the parameters introduced in Kochanek and Bartels (Citation1984), tension, continuity, and bias, to the squad interpolation algorithm by Shoemake (Citation1987).

In contrast to the normal squad algorithm, incoming and outgoing tangents are different. We influence the computation of these tangents depending on the parameters tension t, bias b, and continuity c, all in [−1.0, 1.0].

With the modified tangents we compute the additional quaternions a and b for the squad interpolation.
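For orientation, the standard building blocks can be sketched as follows (a reconstruction from Kochanek and Bartels (Citation1984) and Shoemake (Citation1987); EMBR's modified-tangent formulation may differ in detail). With $\Delta_i = \log\!\big(q_i^{-1} q_{i+1}\big)$ denoting the quaternion-log difference between neighboring key orientations, the tension/continuity/bias-weighted incoming and outgoing tangents at $q_i$ are

\[
d_i^{\mathrm{in}} = \frac{(1-t)(1-c)(1+b)}{2}\,\Delta_{i-1} + \frac{(1-t)(1+c)(1-b)}{2}\,\Delta_i,
\qquad
d_i^{\mathrm{out}} = \frac{(1-t)(1+c)(1+b)}{2}\,\Delta_{i-1} + \frac{(1-t)(1-c)(1-b)}{2}\,\Delta_i ,
\]

and the unmodified squad construction that such tangents feed into is

\[
\mathrm{squad}(q_i, q_{i+1}, a_i, a_{i+1}; u) = \mathrm{slerp}\big(\mathrm{slerp}(q_i, q_{i+1}; u),\ \mathrm{slerp}(a_i, a_{i+1}; u);\ 2u(1-u)\big),
\qquad
a_i = q_i \exp\!\Big(-\tfrac{1}{4}\big(\log(q_i^{-1} q_{i+1}) + \log(q_i^{-1} q_{i-1})\big)\Big).
\]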

Below is a brief description of how the different parameters (tension, bias, and continuity) influence the interpolated path. A graphical representation is also depicted in Fig. 9.

FIGURE 9 Influence of the tension parameter −1, 0, 1 on the quaternionic interpolation spline visualized on the simplified SO(3) hypersphere.


Tension defines how sharply the curve bends between the key points. For 1.0 it reduces the length of the corresponding tangent vector to zero, leading to linear interpolation, and for −1.0 it doubles its length, resulting in a slacker, more rounded curve.

Continuity controls the angle between the incoming and outgoing tangents at key points. For 0.0 spline tangent continuity at the key points is preserved, whereas for −1.0 and 1.0 the curve has acute corners to the in- or outside of the path.

Bias specifies the direction of the path at key points; for −1.0 it is completely determined by the incoming tangent, whereas for 1.0 it is determined by the outgoing tangent.
