Doctoral research

Automating the Conversion of Natural Language Fiction to Multi-Modal 3D Animated Virtual Environments

Download the dissertation (PDF, 240Mb)Publications

Abstract:

Popular fiction books describe rich visual environments that contain characters, objects, and behaviour. This research develops automated processes for converting text sourced from fiction books into animated virtual environments and multi-modal films. This involves the analysis of unrestricted natural language fiction to identify appropriate visual descriptions, and the interpretation of the identified descriptions for constructing animated 3D virtual environments.

The goal of the text analysis stage is the creation of annotated fiction text, which identifies visual descriptions in a structured manner. A hierarchical rule-based learning system is created that induces patterns from example annotations provided by a human, and uses these for the creation of additional annotations. Patterns are expressed as tree structures that abstract the input text on different levels according to structural (token, sentence) and syntactic (parts-of-speech, syntactic function) categories. Patterns are generalized using pair-wise merging, where dissimilar sub-trees are replaced with wild-cards. The result is a small set of generalized patterns that are able to create correct annotations. A set of generalized patterns represents a model of an annotator’s mental process regarding a particular annotation category.

Annotated text is interpreted automatically for constructing detailed scene descriptions. This includes identifying which scenes to visualize, and identifying the contents and behaviour in each scene. Entity behaviour in a 3D virtual environment is formulated using time-based constraints that are automatically derived from annotations. Constraints are expressed as non-linear symbolic functions that restrict the trajectories of a pair of entities over a continuous interval of time. Solutions to these constraints specify precise behaviour. We create an innovative quantified constraint optimizer for locating sound solutions, which uses interval arithmetic for treating time and space as contiguous quantities. This optimization method uses a technique of constraint relaxation and tightening that allows solution approximations to be located where constraint systems are inconsistent (an ability not previously explored in interval-based quantified constraint solving).

3D virtual environments are populated by automatically selecting geometric models or procedural geometry-creation methods from a library. 3D models are animated according to trajectories derived from constraint solutions. The final animated film is sequenced using a range of modalities including animated 3D graphics, textual subtitles, audio narrations, and foleys.

Hierarchical rule-based learning is evaluated over a range of annotation categories. Models are induced for different categories of annotation without modifying the core learning algorithms, and these models are shown to be applicable to different types of books. Models are induced automatically with accuracies ranging between 51.4% and 90.4%, depending on the category. We show that models are refined if further examples are provided, and this supports a boot-strapping process for training the learning mechanism.

The task of interpreting annotated fiction text and populating 3D virtual environments is successfully automated using our described techniques. Detailed scene descriptions are created accurately, where between 83% and 96% of the automatically generated descriptions require no manual modification (depending on the type of description). The interval-based quantified constraint optimizer fully automates the behaviour specification process. Sample animated multi-modal 3D films are created using extracts from fiction books that are unrestricted in terms of complexity or subject matter (unlike existing text-to-graphics systems). These examples demonstrate that: behaviour is visualized that corresponds to the descriptions in the original text; appropriate geometry is selected (or created) for visualizing entities in each scene; sequences of scenes are created for a film-like presentation of the story; and that multiple modalities are combined to create a coherent multi-modal representation of the fiction text.

This research demonstrates that visual descriptions in fiction text can be automatically identified, and that these descriptions can be converted into corresponding animated virtual environments. Unlike existing text-to-graphics systems, we describe techniques that function over unrestricted natural language text and perform the conversion process without the need for manually constructed repositories of world knowledge. This enables the rapid production of animated 3D virtual environments, allowing the human designer to focus on creative aspects.

 

 

Publications:
Journal articles:
GLASS, K. AND BANGAY, S. 2009. A method for automatically creating 3D animated scenes from annotated fiction text.
IADIS International Journal on Computer Science and Information Systems, 2009. PDF
GLASS, K. AND BANGAY, S. 2007. Evaluating and improving morpho-syntactic classification over multiple corpora using pre-trained, “off-the-shelf”, parts-of-speech tagging tools.
South African Computer Journal, 40:4-10, 2008. PDF
Conference Papers (peer reviewed):
GLASS, K. AND BANGAY, S. 2008c. Automating the creation of 3D animation from annotated fiction text.
In IADIS-CGV ’08: Proceedings of the IADIS International conference on Computer Graphics and Visualization (Amsterdam, The Netherlands). International Association for Development of the Information Society, 3-10. PDF
GLASS, K., ALCOCK, B. AND BANGAY, S. 2007. Mechanisms for multimodality: taking fiction to another dimension.
In AFRIGRAPH ’07: Proceedings of the 5th international conference on Computer graphics, virtual reality, visualisation and interaction in Africa (Grahamstown, South Africa). Association for Computing Machinery, ACM Press, New York, United States of America, 135-144. PDF
GLASS, K. AND BANGAY, S. 2007b. A naïve salience-based method for speaker identification in fiction books.
In PRASA ’07: Proceedings of the 18th Annual Symposium of the Pattern Recognition Association of South Africa (Pietermaritzburg, South Africa). 1-6. PDF
GLASS, K. AND BANGAY, S. 2007a. Constraint-based conversion of fiction text to a time-based graphical representation.
In SAICSIT ’07: Proceedings of the 2007 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries (Fish River, South Africa). Association for Computing Machinery, ACM Press, New York, United States of America, 19-28. (Awarded best paper) PDF
GLASS, K. , MORKEL, C., AND BANGAY, S. 2006. Duplicating Road Patterns in South African Informal Settlements Using Procedural Techniques.
In AFRIGRAPH ’06: Proceedings of the 5th international conference on Computer graphics, virtual reality, visualisation and interaction in Africa (Cape Town, South Africa). Association for Computing Machinery, ACM Press, New York, United States of America, 161-170. PDF
GLASS, K. AND BANGAY, S. 2006. Hierarchical rule generalisation for speaker identification in fiction books.
In SAICSIT ’06: Proceedings of the 2006 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries (Cape Town, South Africa). Association for Computing Machinery, ACM Press, New York, United States of America, 31-40. PDF
GLASS, K. AND BANGAY, S. 2005. Evaluating parts-of-speech taggers for use in a text-to-scene conversion system.
In SAICSIT ’05: Proceedings of the 2005 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries (White River, South Africa). Association for Computing Machinery, ACM Press, New York, United States of America, 20-28. PDF
Posters:
GLASS, K. AND BANGAY, S. 2005. Building 3D Virtual Environments Using Natural Language Descriptions.
In SAICSIT ’05: Proceedings of the 2005 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries (White River, South Africa). PNG