Xface Open Source Project and SMIL-Agent Scripting Language for Creating and Animating Embodied Conversational Agents

Koray Balcı∗, Elena Not, Massimo Zancanaro, Fabio Pianesi
FBK-irst, Via Sommarive 18, I-38050 Trento, Italy

∗Koray Balcı is also a PhD student in the Computer Engineering Department at Boğaziçi University, Istanbul, Turkey.

ABSTRACT
Xface is a set of open source tools for the creation of embodied conversational agents (ECAs) using MPEG4 and keyframe based rendering, driven by the SMIL-Agent scripting language. The Xface Toolkit, coupled with SMIL-Agent scripting, serves as a full 3D facial animation authoring package. The Xface project was initiated by the Cognitive and Communication Technologies (TCC) division of FBK-irst (formerly ITC-irst). The toolkit is written in ANSI C++, and is open source and platform independent.

Categories and Subject Descriptors
I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—virtual reality; H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems—artificial, augmented, and virtual realities

General Terms
Performance, Standardization, Design

Keywords
Embodied conversational agents, 3D talking heads, MPEG4 facial animation, scripting, open source

1. INTRODUCTION

An expressive ECA would be of help in various applications, such as human-computer dialogues aimed at problem solving, tutoring, advising, adaptive multimodal presentations, or also simple, canned information presentations linked to HTML pages (possibly manually written by a human author). We believe that the lack of free and open source tools for the creation and animation of faces limits further research in these areas. Xface, our open source toolkit, together with the SMIL-Agent scripting language, aims to solve this problem.

2. XFACE TOOLKIT

In late 2003, with the above motivations, we initiated the project and released the early version of the toolkit in 2004 [3]. Over the years the toolkit has evolved with new features and gained acceptance from the community, giving us a dedicated user base and testing group that reports bugs and new feature requests on a regular basis. The toolkit currently incorporates four pieces of software. The core Xface library is for developers who would like to embed 3D facial animation in their applications. The XfaceEd editor provides an easy to use interface for generating MPEG4 ready meshes from static 3D models. XfacePlayer is a sample application that demonstrates the toolkit in action, and XfaceClient is used as a script editor and communication controller that talks to the player over the network. Some key features of the Xface Toolkit are:

• Accepts MPEG4 FAP files and SMIL-Agent scripts as input.
• Supports muscle based deformation (for MPEG4) and keyframe interpolation based animation using morph targets.
• Muscle deformation methods/rules can be extended easily.
• Blending of visemes (visual phonemes), emotions, and expressions.
• Head and eye movements (random and controlled).
• Can use various TTS engines (MS SAPI, Festival [7]). We have experimented with English and Italian, and users have reported successfully using it in Spanish and Dutch.
• Control over TCP/IP. XfacePlayer can be controlled from any programming language through our messaging system (see the client sketch after this list).
• Save animation as video.
• Platform independent code base. (We distribute only the Windows version; however, people have reported compiling it under Linux successfully.)

The toolkit handles facial animation in two modes: MPEG4 facial animation (FA) and keyframe interpolation. Next we will discuss these modes in more detail.
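As a taste of the remote control feature mentioned above, the following is a minimal client sketch, not part of the toolkit itself. The port number and the message payload are purely illustrative assumptions; the actual message schema is defined by the Xface messaging system. POSIX sockets are used for brevity.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <iostream>
    #include <string>

    int main() {
        // Connect to a running XfacePlayer instance on this machine.
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(50011);  // hypothetical port
        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
        if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
            std::cerr << "cannot reach XfacePlayer\n";
            return 1;
        }
        // Hypothetical task message asking the player to run a script.
        const std::string msg =
            "<task name=\"playScript\"><script>greeting.smil</script></task>";
        send(fd, msg.c_str(), msg.size(), 0);
        close(fd);
        return 0;
    }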

2.1 MPEG4 Facial Animation

In 1999, the Moving Picture Experts Group released MPEG4 as an ISO standard [1, 2, 9]. According to the MPEG4 FA standard, there are 84 feature points (FPs) on the head. For each FP to be animated, the corresponding vertex on the model and the indices of the vertices in the zone of influence of this FP must be set. Then, 68 facial animation parameters (FAPs) drive the animation at those feature points. With XfaceEd's editing tools, one can set these feature points and their zones of influence and define muscle models for each zone. Muscle deformations under different FAP values can then be previewed and the parameters fine-tuned. Once all the information is in place, a face configuration file in XML syntax is produced. This file contains various pieces of information, such as the 3D models used for the face (one can have separate models for head, hair, teeth, etc.), textures, FPs, zones of influence, muscle models, and weight factors. This configuration file is then used by XfacePlayer to generate the animation.
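To make the mechanism concrete, here is a minimal sketch of how a single FAP can drive the vertices in a feature point's zone of influence. This illustrates the general technique, not Xface's actual implementation: each affected vertex is displaced along the FAP's direction, scaled by the FAP value, the normalizing FAP unit (FAPU), and a per-vertex weight that typically decays with distance from the feature point.

    #include <cstddef>
    #include <vector>

    struct Vec3 { float x, y, z; };

    // One feature point's zone of influence: affected vertex indices plus
    // per-vertex weights (e.g., 1 at the FP, falling off toward the border).
    struct ZoneOfInfluence {
        std::vector<int>   vertices;
        std::vector<float> weights;
    };

    // Displace the zone's vertices for one FAP. 'fapValue' comes from the
    // FAP stream, 'fapu' is the corresponding FAP unit, and 'direction' is
    // the unit vector along which this FAP moves its feature point.
    void applyFap(std::vector<Vec3>& mesh, const ZoneOfInfluence& zone,
                  float fapValue, float fapu, Vec3 direction) {
        const float amplitude = fapValue * fapu;
        for (std::size_t i = 0; i < zone.vertices.size(); ++i) {
            Vec3& v = mesh[zone.vertices[i]];
            const float w = amplitude * zone.weights[i];
            v.x += w * direction.x;
            v.y += w * direction.y;
            v.z += w * direction.z;
        }
    }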

2.2 Keyframe Interpolation

As an alternative, the Xface Toolkit also implements a keyframe interpolation based animation framework. In this mode, a set of keyframes for the different emotions and visemes, prepared externally, must be available. In XfaceEd, these keyframes are inserted into the configuration and saved. Using SMIL-Agent scripts, the animation can be tested within the editor. Visual speech (visemes) and emotions are defined as separate channels, which are blended [6, 4, 5, 10] and interpolated over time. Figure 1 shows some of the emotion keyframes used for the Alice model. In both the MPEG4 and keyframe based modes, all the algorithms are implemented in the core library mentioned previously. This enables identical animation behavior in XfaceEd and XfacePlayer, while also letting application developers implement their own players, if needed, by linking only the core library. Figures 3 and 4 show various screenshots from XfacePlayer and XfaceEd.
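The sketch below illustrates the two operations this mode rests on; it is a simplified stand-in for the core library, with the weights and data layout chosen for illustration. Each channel is first interpolated between its surrounding keyframes, and the channels are then combined as weighted offsets from a neutral face.

    #include <cstddef>
    #include <vector>

    struct Vtx { float x, y, z; };
    using Mesh = std::vector<Vtx>;

    // Linear interpolation between two keyframe meshes, t in [0, 1].
    Mesh lerp(const Mesh& a, const Mesh& b, float t) {
        Mesh out(a.size());
        for (std::size_t i = 0; i < a.size(); ++i) {
            out[i] = { a[i].x + t * (b[i].x - a[i].x),
                       a[i].y + t * (b[i].y - a[i].y),
                       a[i].z + t * (b[i].z - a[i].z) };
        }
        return out;
    }

    // Blend a viseme channel and an emotion channel as weighted offsets
    // from the neutral mesh (wViseme and wEmotion are example weights).
    Mesh blend(const Mesh& neutral, const Mesh& viseme, float wViseme,
               const Mesh& emotion, float wEmotion) {
        Mesh out(neutral.size());
        for (std::size_t i = 0; i < neutral.size(); ++i) {
            out[i] = {
                neutral[i].x + wViseme * (viseme[i].x - neutral[i].x)
                             + wEmotion * (emotion[i].x - neutral[i].x),
                neutral[i].y + wViseme * (viseme[i].y - neutral[i].y)
                             + wEmotion * (emotion[i].y - neutral[i].y),
                neutral[i].z + wViseme * (viseme[i].z - neutral[i].z)
                             + wEmotion * (emotion[i].z - neutral[i].z) };
        }
        return out;
    }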

3. SMIL-AGENT

Synthetic characters are often integrated in multimodal interfaces to convey messages to the user, provide visual feedback, and engage them in the dialogue, also through emotional involvement. This is accomplished through a suitable synchronization of voice, lip movements, facial expressions, gestures, etc. In the end, synthetic characters should not be considered as a single modality but as stemming from the synergic contribution of different communication channels

Figure 1: Emotion keyframes for Alice face.

that, properly synchronized, generate an overall communication performance. With respect to other existing scripting languages, SMIL-Agent [8] pushes further the idea of having a separate representation for the various communication modalities of a synthetic character (e.g., voice, speech animation, sign animation, facial expressions, gestures) and their explicit interleaving in the presentation performance. Furthermore, SMIL-Agent explicitly abstracts away from the specific character and player used to render the performance.
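For illustration, a script fragment for an excerpt that survives here, in which the character utters a diagnosis while glancing left, might look as follows. The element and attribute names are indicative only; the actual SMIL-Agent syntax is specified in [8].

    <smil-agent>
      <par>
        <!-- Spoken output on the speech channel. -->
        <speech channel="face" id="diagnosis">
          You have been diagnosed as suffering from angina pectoris,
          which appears to be mild.
        </speech>
        <!-- Head movement, synchronized with the start of the utterance. -->
        <action channel="face" type="LookLeft" begin="diagnosis.begin"/>
      </par>
    </smil-agent>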


Figure 2 presents the flow of animation generation. In the Xface Toolkit, there is a separate library for interpreting SMIL-Agent scripts and creating animation definitions. As shown in the figure, this task is not trivial: it requires communication with speech synthesizers (TTS engines), and the various modalities, such as speech, emotions, and gestures, must be extracted, processed, blended, and synchronized before the final animation is produced.
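A minimal sketch of the kind of data flow this implies is given below; the types and the single stage shown are assumptions for illustration, not the library's actual API. Timed phonemes returned by a TTS engine are mapped to viseme targets and merged with an expression channel into a single timed track.

    #include <string>
    #include <vector>

    // A timed phoneme as a TTS engine might report it (hypothetical type).
    struct Phoneme { std::string name; float startSec, endSec; };

    // One entry of the resulting animation track.
    struct Key { std::string target; float timeSec; };

    // Map each phoneme to a viseme morph target and add a constant
    // emotion channel; a real implementation would blend the channels
    // frame by frame rather than simply appending keys.
    std::vector<Key> buildTrack(const std::vector<Phoneme>& phonemes,
                                const std::string& emotion) {
        std::vector<Key> track;
        track.push_back({emotion, 0.0f});  // expression channel
        for (const Phoneme& p : phonemes) {
            track.push_back({"viseme_" + p.name, p.startSec});
        }
        return track;
    }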

Figure 2: SMIL-Agent script processing.

We are currently working on a new user friendly, visual SMIL-Agent editing tool, which we plan to release this summer¹.


4. CONCLUSION

With the Xface project, we aim to develop a set of tools that are easy to use and extend, and open to researchers and software developers. All the pieces in the toolkit are operating system independent and can be compiled with any ANSI C++ standard compliant compiler. For animation, the toolkit relies on OpenGL and is optimized enough to achieve satisfactory frame rates (a minimum of 25 frames per second) with high polygon counts (12000+ polygons) on modest hardware. We distribute Xface under the Mozilla Public License Version 1.1 on our Subversion server², and it can be freely downloaded. The release archives are also available at SourceForge³. We have used our toolkit in various projects such as:

• Generation of relational reports: starting from the acoustic and visual scene analysis of meetings, SMIL-Agent scripts are generated automatically, and the animations produced by the player (in the form of AVI files) are integrated into multimodal SMIL presentations (EU FP6 IST project CHIL⁴).
• Persuasion studies for education (EU FP6 NoE project HUMAINE⁵).
• Healthcare support system.

Statistics from our project page show that the latest binary setup of the Xface Toolkit (v0.94, released December 2006) has been downloaded 480 times, and that we have an overall download count of 2238 since development started in May 2004. We also receive daily support requests and feedback, mostly from graduate students all over the world. In the future, we plan to use the Xface Toolkit in other projects, increase its feature set, and enlarge the user base. For further information and material, visit our web pages:

Xface: http://xface.itc.it
SMIL-Agent: http://tcc.itc.it/i3p/research/smil-agent

5. REFERENCES

[1] ISO/IEC JTC1/WG11 N1901, Text for CD 14496-1 Systems. Fribourg Meeting, November 1997.
[2] ISO/IEC JTC1/WG11 N1902, Text for CD 14496-2 Visual. Fribourg Meeting, November 1997.
[3] K. Balcı. Xface: MPEG-4 based open source toolkit for 3D facial animation. In Proc. Advanced Visual Interfaces, Italy, May 2004.
[4] T. Bui, D. Heylen, and A. Nijholt. Combination of facial movements on a 3D talking head. In Proc. of Computer Graphics International 2004 (CGI 2004), Crete, Greece, June 2004. IEEE Computer Society.
[5] Y. Cao, W. C. Tien, P. Faloutsos, and F. Pighin. Expressive speech-driven facial animation. ACM Transactions on Graphics, October 2005.
[6] J. Edge and S. Maddock. Expressive visual speech using geometric muscle functions. In Proc. of the 19th Eurographics UK Chapter Annual Conference (UCL), pages 11–18, April 2001.
[7] The Centre for Speech Technology Research. The Festival Speech Synthesis System. University of Edinburgh, 2002. http://www.cstr.ed.ac.uk/projects/festival/.
[8] E. Not, K. Balcı, F. Pianesi, and M. Zancanaro. Synthetic characters as multichannel interfaces. In Proc. ICMI05, Italy, October 2005.
[9] I. Pandzic and R. Forchheimer. MPEG-4 Facial Animation: The Standard, Implementation and Applications. Wiley, New York, 2002.
[10] H. Pyun, W. Chae, Y. Kim, H. Kang, and S. Y. Shin. An example-based approach to text-driven speech animation with emotional expressions. Technical Report 200, KAIST, July 2004.

¹ See http://xface.itc.it/projectxq/ for an early version.
² http://xfacesvn.itc.it/svn/trunk
³ http://sourceforge.net/projects/xface/
⁴ http://chil.server.de/servlet/is/101/
⁵ http://emotion-research.net/

Figure 3: XfacePlayer rendering animation with various emotions.

Figure 4: XfaceEd: Setting FAPU and FP, preview FAPs, and test with SMIL-Agent.