Composing a multimedia presentation may require the creation or generation of suitable images and video segments, as well as animation, sound, or special effects. Obtaining images or video sequences can be prohibitively expensive when the costs of travel to location, equipment, staff, etc., are considered. These problems can be alleviated with the use of pictorial and video digital libraries. Such libraries require methods for comprehensive indexing and annotation of stored items, and efficient retrieval tools. We propose a system based on user oriented perceptions as they influence query formation in image and video retrieval. We present a method based on user dependent conceptual structures for creating and maintaining indexes to images and video sequences.
In this discussion paper we propose an intelligent system that provides sufficient flexibility for the user to improve the retrieval and querying ability as the user's experience and knowledge increase. Given that even in well structured areas such as book cataloguing in modern libraries, different cataloguers create vastly different indices (only 30-40% overlap [Sormunen 1987]), it seems an impossible goal to create a computerised system to do a task that we humans cannot do well ourselves, and do not know how it should be done. Despite this problem, we have been sharing information in various forms, by word of mouth, in print or through pictures, sketches and movies, without feeling restricted in any way. A useful information retrieval system must offer functionality similar to that of our own memory. What we remember, and hence recall, is our personal perception of remembered information. The best real life example of this concept is our varied choice of mnemonics. We all have our own methods for memorising names, passwords, poetry, etc. If we want to build a useful indexing and retrieval system, it must allow for a different representation for every user. Instead of attempting to build a common sense knowledge representation and use reasoning to correlate different expressions with the same meaning [Lenat & Guha 1990], we propose a system that builds noetic maps of sought (and found) information according to each individual user's perception.
Our aim is to develop tools and techniques for indexing, browsing and retrieval of information in a media independent manner. The system will consist of a knowledge base of chunks of knowledge in various forms at the lowest level and a set of conceptual maps, as constructed by individuals, at the higher level. These maps may or may not have shared components, depending on the users' choices. We term this underlying structure and the associated tools the Unified Mental Annotation and Retrieval Tool (UMART).
The conceptual structure consists of two main sections: the shared, low level section and the high level, user dependent section.
Currently, we are investigating the use of a unified indexing scheme for text, image and video data based on Schank's Universal Indexing Frame (UIF) and Conceptual Dependencies [Burke & Kass 1994]. Whilst the requirements for text, image and video are slightly different, the media types have many aspects in common. This similarity is exploited in the choice of the basic data structure, which is augmented to cater for the specific needs of each media type. Individual chunks of knowledge are stored in low level common concept (llcc) frames.
The second part of the conceptual structure, the high level personalised concept (hlpc) frames, consists of nodes that enable the user to link llccs and combinations of llccs and hlpcs. This results in a network of nodes which we term a noetic map.
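The two-level structure can be illustrated with a minimal sketch. The class names, fields and example identifiers below are hypothetical and are not taken from the UMART design; the sketch only assumes that llcc frames form a shared layer and that each user owns a set of hlpc nodes whose typed links point at llccs or at other hlpcs.

```python
from dataclasses import dataclass, field


@dataclass
class LLCC:
    """Low level common concept frame: one shared chunk of knowledge."""
    ident: str
    medium: str        # e.g. "text", "image", "video"
    content_ref: str   # hypothetical pointer to the stored item


@dataclass
class HLPC:
    """High level personalised concept: a user-owned node in the map."""
    name: str
    owner: str
    # Each link is a (link_type, target) pair; a target is an llcc or hlpc key.
    links: list = field(default_factory=list)


@dataclass
class NoeticMap:
    """One user's network of hlpc nodes over the shared llcc frames."""
    owner: str
    llccs: dict = field(default_factory=dict)   # shared layer, keyed by ident
    hlpcs: dict = field(default_factory=dict)   # personal layer, keyed by name

    def add_hlpc(self, name):
        node = HLPC(name=name, owner=self.owner)
        self.hlpcs[name] = node
        return node

    def link(self, source, link_type, target):
        # Only allow links to items that exist in one of the two layers.
        if target not in self.llccs and target not in self.hlpcs:
            raise KeyError(f"unknown target: {target}")
        self.hlpcs[source].links.append((link_type, target))

    def neighbours(self, name):
        return [target for _, target in self.hlpcs[name].links]


shared = {"clip-17": LLCC("clip-17", "video", "archive/clip-17")}
umap = NoeticMap(owner="alice", llccs=shared)
umap.add_hlpc("patents")
umap.add_hlpc("radio patent")
umap.link("radio patent", "instance_of", "patents")
umap.link("radio patent", "leads_to", "clip-17")
print(umap.neighbours("radio patent"))
```

Keeping the llcc dictionary separate from the hlpc dictionary means that several users' maps could point at the same shared frames while keeping their own personal nodes.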
A set of tools for traversal of the map, as well as search facilities, is being developed. The simplest tool allows the user to view and examine all defined concepts, as well as to explore the surrounding map. A querying facility is provided to find individual concepts, individual groups of concepts, or paths through the map according to some restrictions on the links between structures.
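A path query with restrictions on links could look like the following sketch. It assumes the map is available as a plain adjacency dictionary and uses breadth-first search; the function name, the data layout and the example concepts are illustrative assumptions, not part of the proposed system.

```python
from collections import deque


def find_path(edges, start, goal, allowed_links):
    """Breadth-first search that only follows links of the allowed types.

    edges maps a concept name to a list of (link_type, target) pairs.
    Returns the concept names on a shortest permitted path, or None.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for link_type, target in edges.get(node, []):
            if link_type in allowed_links and target not in seen:
                seen.add(target)
                queue.append(path + [target])
    return None


# Hypothetical fragment of one user's map.
edges = {
    "weather pattern": [("aspect_of", "growing roses"), ("cause_of", "frost")],
    "frost": [("leads_to", "plant damage")],
}

causal = find_path(edges, "weather pattern", "plant damage", {"cause_of", "leads_to"})
aspect_only = find_path(edges, "weather pattern", "plant damage", {"aspect_of"})
print(causal)
print(aspect_only)
```

The same routine answers both "is there any path" and "is there a path using only causal links" by changing the set of permitted link types.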
We impose no restrictions on what type of knowledge may be represented in the map, as we can never attempt to build a complete or sound representation. The usefulness of the final design can only be evaluated through experimentation.
Firstly, we want to depart from conventional ways of perceiving information and knowledge. Current representational techniques for text provide the user with fairly rigid ways of searching and querying [Cutting et al. 1993]. Several cognitive approaches have also been taken to study the similarities within classes of users in order to provide useful interfaces. One of the problems with this approach is choosing which user perspectives are "significant". We choose to retain personal differences and build on them so that each user can view the basic index with a perspective that is suitable for the task at hand and accommodates personal (and personalised) choices.
The novelty of our approach lies in combining abstract notions of knowledge representation with search and image processing techniques to form a tool for knowledge indexing and retrieval. A great deal of work has been done on efficient search methods and keyword indexing for text. Recently there has been growing interest in applying image processing techniques to video segmentation and basic analysis at a mechanical level. Before we can use the information, in any medium, we must have tools for selecting relevant information and presenting it to the user in a meaningful way, other than a flat display of found text. Our proposed system concentrates on building such tools. Since the ability to find relevant information is intimately related to indexing the information in an appropriate way, we propose to combine these two processes and create a single indexing and retrieval system.
There are several significant differences between our proposed system and other existing methods of information indexing and retrieval such as WAIS and the World Wide Web (WWW). First, none of these systems has a conceptually based index; they are based on keyword type classification. Further, there is no mechanism by which the structure of web pages created by one user can be shared (incorporated) or conceptually compared with those created by other users. In fact, one of the biggest problems with the WWW is that of finding information and identifying similar types of information. The WWW can be compared to a flexible interface, whereas we propose the underlying system. The WWW could be used to present answers to queries, although a typical interface would need to be expanded to allow for dialogue with the user and presentation of choices contained in an answer to a query.
In the first stage of the project we investigate the development of the low level common concept frames to index video, text and image data. Several approaches have been used in AI for the representation of structured knowledge, among them associative graphs [Quinlan 1968] and frames [Minsky 1975]. Each of these methods provides a framework for structuring the knowledge, but offers no guidelines on what should be represented within the structure or how. The resulting structures suffer from the same problems as any simple cataloguing method, making sharing of representations difficult if not impossible.
Another method of representing structured knowledge is conceptual dependency [Schank 1972, Schank and Riesbeck 1981]. This method was originally used to understand the meaning of natural language text. Stereotypical situations are stored in memory as a script for a play. The data structure called conceptual dependency is used to represent common everyday experiences whose understanding is required to comprehend natural language speech or text. By analogy with scripts in a play, a script consists of slots for actors, actions, props, and setting. All actions that actors can perform have to fall into one of the predefined categories. In an extension to this work Schank proposed the Universal Indexing Frame. It was developed by observing story remindings in context and developing frameworks in which remindings could be explained. This data structure was proposed as a means of explicitly retaining personal user perspectives.
Independently of data type (medium), a given chunk of information can be classified in three fundamental ways: (i) what can be understood from the chunk on its own, independently of the context; (ii) what can be deduced from the chunk considered in context; (iii) what is known about the creation of the chunk, e.g. for a film that would be film production information such as camera angle, type of shot, etc.; for a book the information could be about the author. We propose to investigate and develop the low level indexing frame based on a variation of conceptual dependencies and the Universal Indexing Frame to allow for this type of fundamental classification.
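As an illustration of this three-way classification, a low level frame could carry one group of slots per kind of information. The sketch below is only an assumption about how such a frame might be laid out; the class name, slot names and example values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkIndex:
    """Hypothetical low level frame holding the three kinds of information."""
    chunk_id: str
    medium: str
    intrinsic: dict = field(default_factory=dict)    # (i) understood from the chunk alone
    contextual: dict = field(default_factory=dict)   # (ii) deduced from the chunk in context
    production: dict = field(default_factory=dict)   # (iii) known about the chunk's creation

    def facets_filled(self):
        """Names of the facets that currently hold any annotation."""
        names = ("intrinsic", "contextual", "production")
        return [name for name in names if getattr(self, name)]


shot = ChunkIndex(
    chunk_id="shot-042",
    medium="video",
    intrinsic={"actors": ["woman"], "action": "walks", "setting": "street"},
    production={"camera_angle": "low", "shot_type": "close-up"},
)
book = ChunkIndex(chunk_id="book-7", medium="text", production={"author": "unknown"})

print(shot.facets_filled())
print(book.facets_filled())
```

Because the three facets are the same for every medium, only the slot names inside each facet would need to vary between text, image and video.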
The second stage of the project involves the design of the structure of individual mental concepts and relations between them for the UMART. A class structure for high level personalised concepts (hlpcs) will be developed. We propose a class of concept nodes and link types for the construction of hlpcs, and this class will form the foundation for our investigation. Similarly to Schank's notion, we propose an initial set of concept nodes:
Category | Sub-category |
Animate (agent) | human, animal, plant, geological, manufactured |
Inanimate action | natural, manufactured, fictional |
Natural process | nuclear, chemical, evolutionary, geological, living |
Location | global, continental, country, state, locality, extra-terrestrial, map |
Abstract notion | thought, hypothesis, theory, concept, paradigm |
Both the usefulness and feasibility of this classification will be explored. Preliminary design for the hlpc structure is discussed in [Venkatesh & Kieronska 1995].
One of the unique features of the noetic map will be the inclusion of map concepts for some of the more complex structures. In our preliminary work [Kieronska & Venkatesh 1994] we have shown how a spatial map can be used as a complex linking structure for a set of spatially and temporally related concepts. We believe that the same approach can be used for thematically related concepts. A meta-concept will have an explicit representation for a collection of related concepts, using the analogy of maps.
General concept representation does not share the logically grounded properties of spatial relations. Therefore we cannot prove completeness of the representation. Similarly to Schank's original supposition that all actions can be classified as one of 12 categories, which were later extended to 13, we propose a possible classification of conceptual relations. Simplicity is a vital aspect, as anything too complex will be difficult to use and will immediately result in ambiguous indices. Underspecified concepts will result in bigger sets of possible answers to queries (see stage 3), which can then be narrowed further manually.
The types of links, however, must be categorised, just as spatial relations are well defined. The following taxonomy of links is being explored. At a coarse level, two concepts C1 and C2 may be related to each other in the following ways:
Link type | Relation |
subsumption (inv(generalisation)) | C1 ⊂ C2 |
instance_of | E ∈ C1 |
analogy | C1 ≅ C2 |
antonym | C1 ≅ ~C2 |
aspect_of | C1 ∩ C2 ≠ ∅ |
cause_of | C1 => C2 |
leads_to | C1 -> C2 |
question_posed | C1 -> QA |
answer_offered | AQ <- C1 |
The aspect_of relationship differs from subsumption in that it allows for the definition of a concept that is somewhat related to another one, yet is not fully relevant (i.e. an overlap relation). For example, weather pattern is an aspect of growing roses, yet comprehensive knowledge about weather is not required. The inverse is not necessarily true: the body of knowledge about weather does not have to include the influence of roses on the matter.
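The link taxonomy can be sketched as an enumeration with a small store that records directed links. Treating analogy and antonym as symmetric is an assumption made here for illustration only; the class names and the telegraph example are hypothetical as well.

```python
from enum import Enum


class LinkType(Enum):
    SUBSUMPTION = "subsumption"
    INSTANCE_OF = "instance_of"
    ANALOGY = "analogy"
    ANTONYM = "antonym"
    ASPECT_OF = "aspect_of"
    CAUSE_OF = "cause_of"
    LEADS_TO = "leads_to"
    QUESTION_POSED = "question_posed"
    ANSWER_OFFERED = "answer_offered"


# Assumption: only these two link types are stored in both directions.
SYMMETRIC = {LinkType.ANALOGY, LinkType.ANTONYM}


class LinkStore:
    """Set of directed (source, link type, target) triples."""

    def __init__(self):
        self._links = set()

    def add(self, source, link_type, target):
        self._links.add((source, link_type, target))
        if link_type in SYMMETRIC:
            self._links.add((target, link_type, source))

    def holds(self, source, link_type, target):
        return (source, link_type, target) in self._links


store = LinkStore()
store.add("weather pattern", LinkType.ASPECT_OF, "growing roses")
store.add("radio", LinkType.ANALOGY, "telegraph")

print(store.holds("weather pattern", LinkType.ASPECT_OF, "growing roses"))
print(store.holds("growing roses", LinkType.ASPECT_OF, "weather pattern"))
print(store.holds("telegraph", LinkType.ANALOGY, "radio"))
```

Storing aspect_of in one direction only mirrors the observation that weather is an aspect of growing roses while the inverse need not hold.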
Following the known story analysis techniques [Ferguson et al. 1992, Osgood 1994], a piece of knowledge may provide one or more answers, and pose one or more questions that require further clarification. For a given concept an unmatched link may be established with a reference to a partially defined concept QA. A match (and hence a full link) will be established when the remainder of the QA is found (possibly manually). In our example the hlpc patents may contain an explanation of what a patent is, and include a reference to an instance of a radio patent. It may contain a question on the court case related to the granting of the patent. Limited forms of reasoning may be performed on the hlpcs and the links between them.
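The unmatched question link can be sketched as a record that holds the known part of QA and is completed once a candidate concept agrees with it. The matching rule used here (every known slot must be equal) is an assumption chosen for simplicity, and all names and slot values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpenQuestion:
    source: str                         # concept that poses the question
    known: dict                         # the partially defined concept QA
    answered_by: Optional[str] = None   # filled in once a match is found


def try_match(question, candidate_name, candidate_slots):
    """Complete the link if the candidate agrees with every known slot."""
    if all(candidate_slots.get(slot) == value
           for slot, value in question.known.items()):
        question.answered_by = candidate_name
        return True
    return False


question = OpenQuestion(
    source="patents",
    known={"kind": "court case", "topic": "radio patent"},
)
first = try_match(question, "patent office history", {"kind": "history"})
second = try_match(
    question,
    "radio patent dispute",
    {"kind": "court case", "topic": "radio patent", "medium": "text"},
)
print(first, second, question.answered_by)
```

A manual match, as mentioned above, would simply set the answering concept directly instead of going through the slot comparison.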
Classification based on the above link structure is neither unique nor unambiguous. However, links created by a particular person would reflect that person's perception and hence would be easier to remember. Creation of every concept node and every link involves filling out a template and augmenting standard information with personalised annotation. Given time, this structure can grow to a phenomenal size and become as difficult to use as the information in its original form. Efficient search methods, automatic detection of duplicates and merging of similar concepts are necessary to make UMART feasible. With search and simple reasoning as the basis, an intelligent querying system will be constructed. The sort of queries that we envisage are:
Tools for manipulation of retrieved concepts will be incorporated at this level. On the basis of repeated choices in terms of items accessed and searched, shortcuts can be automatically created for individual users.
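One way to derive shortcuts from repeated choices is to count how often a user moves from one concept to another and to promote frequent pairs. The sketch below assumes a fixed count threshold; the class name, the threshold value and the example concepts are hypothetical.

```python
from collections import Counter


class ShortcutTracker:
    """Counts (user, start, end) accesses and proposes frequent ones as shortcuts."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # assumed cut-off for creating a shortcut
        self.counts = Counter()

    def record(self, user, start, end):
        self.counts[(user, start, end)] += 1

    def shortcuts(self, user):
        return sorted(
            (start, end)
            for (owner, start, end), seen in self.counts.items()
            if owner == user and seen >= self.threshold
        )


tracker = ShortcutTracker(threshold=3)
for _ in range(3):
    tracker.record("alice", "patents", "radio patent")
tracker.record("alice", "patents", "court case")
tracker.record("bob", "patents", "radio patent")

print(tracker.shortcuts("alice"))
print(tracker.shortcuts("bob"))
```

Keeping the user in the counter key means that shortcuts stay personal: the same pair of concepts can be a shortcut for one user and not for another.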
Cutting, D. R., Karger, D. R. and Pedersen, J. O. (1993). Constant interaction time scatter/gather browsing of very large document collections. Proceedings of 16th Annual International SIGIR, Pittsburgh, PA, USA.
Davis, M. (1994). Knowledge representation for video. Proceedings of Indexing and Re-use in Multimedia Systems, AAAI-94 Workshop, 19-28.
Ferguson, W., Bareiss, R., Birnbaum, L. and Osgood, R. (1992). ASK Systems: An approach to the realisation of story-based teachers. Journal of the Learning Sciences, 2, 95-134.
Genesereth, M. R. and Ketchpel, S. P. (1994). Software agents. Communications of the ACM, 37(7), 48-53.
Harman, D., (1992). User friendly systems instead of user friendly front ends. Journal of the American Society for Information Science, 43(2), 164-174.
Kieronska, D. H. and Venkatesh, S. (1994). Indexing of video data. Proceedings Workshop on Spatial Knowledge Representation and Reasoning, Singapore.
Laurel, B. (Ed) (1990). The Art of Human-Computer Interface Design. Reading, MA: Addison-Wesley.
Lenat, D. B. and Guha, R. V. (1990). Building Large Knowledge-based Systems: Representation and Inference in the Cyc Project. Reading, MA: Addison-Wesley.
Mayhew, D. L. (1992). Principles and guidelines in software user interface design. Englewood Cliffs, NJ: Prentice-Hall.
Mayoh, B. (1987). Are machines as good as people in drawing conclusions from knowledge represented in catalogues, data bases and expert systems? In Irene Wormell (Ed), Knowledge Engineering: Expert systems and information retrieval, 53-58.
Minsky, M. (1975). A framework for representing knowledge. In P. Winston (Ed), The Psychology of Computer Vision. McGraw Hill.
Osgood, R. E. (1994). Question-based conceptual indexing of conversational multimedia. Proc. of Indexing and Re-use in Multimedia Systems, AAAI-94 Workshop, 141-150.
Quinlan, M. (1968). Semantic memory. In M. Minsky (Ed), Semantic Information Processing. Cambridge: MIT Press.
Schank, R. C. (1972). Conceptual dependency: A theory for natural language understanding, Cognitive Psychology, 3.
Schank, R. C. and Riesbeck, C. K. (1981). Inside Computer Understanding: Five Programs plus Miniatures. Lawrence Erlbaum Associates.
Sormunen, E. (1987). A knowledge based intermediary system for information retrieval. In Irene Wormell (Ed), Knowledge Engineering: Expert systems and information retrieval, 59-73.
Venkatesh, S. and Kieronska, D. H. (1995). Conceptual representation as a basis for media-independent indexing. To appear in Proceedings First International Conference, Indexers - Partners in Publishing, Melbourne.
Authors: Dorota Kieronska and Svetha Venkatesh, Department of Computer Science, Curtin University of Technology, PO Box U1987, Perth, WA 6001, Australia. Email: dorota@cs.curtin.edu.au, svetha@cs.curtin.edu.au
Please cite as: Kieronska, D. and Venkatesh, S. (1996). Media independent knowledge indexing and retrieval. In C. McBeath and R. Atkinson (Eds), Proceedings of the Third International Interactive Multimedia Symposium, 192-196. Perth, Western Australia, 21-25 January. Promaco Conventions. http://www.aset.org.au/confs/iims/1996/ek/kieronska.html