IIMS 96 contents
[ IIMS 96 contents ]

The perspective for content addressable multimedia databases

Douglas G Myers
Curtin University of Technology
The need to efficiently manage data in an information system was recognised over two decades ago. The databases that evolved, though, and which are now widely used, are homogeneous and involve only small insertions of highly structured data that is forced to be unique. Consequently, to a very large extent data is synonymous with information in such systems. Multimedia, by its very nature, is based on heterogeneous date. For many potential applications of multimedia information systems, data insertions will involve massive volumes about whose contents very little is known. For example, video data. Further, what constitutes information in this data can be quite dynamic and strongly application dependent That is to say, applications can be envisaged where what can be classified as information may well depend on some learning process within the fabric of the query system. Multimedia calls for radically new forms of databases. This paper presents an argument that the most appropriate management paradigm for multimedia date revolves about retrieval by content. Four common abstract models of this paradigm are examined which have Increasing level of flexibility. Three are realistic targets for development and application within distributed multimedia systems in the short to medium term. The fourth remains conceptually by far the most attractive, but formidable research problems make it unlikely it will be seen even in experimental form for some years.


Introduction

An information system stores, communicates and processes information. Storage centres about a database and the access mechanisms provided by the database management system. Communication relays information to where the user is located where that user is any entity with which the information system is interacting. Increasingly, communication is being provided by a computer network such as the Internet. That sets bandwidth limitations and introduces issues of latency. Processing is usually involved with the interface between the user and the information system, and often concerned with transformations such as visualisation. It is essentially local and so dependent on the limitations and strengths of the resources available to the user at the access point.

A multimedia information system is distinguished in that whatever quantum of information is defined has multiple media components. At the heart of such a system is a multimedia database. As the various media components of the data are usually added in differing time scales, variations in the capabilities of the insertion process often lead to this being called an image or video database. Multimedia databases m conceptually far more complex than any currently in existence and design theories still lie very much in the research domain. The problem is that in existing databases, the concepts of data and information are quite tightly coupled while in multimedia databases there is only a loose association. Images illustrate the point. Many are a two dimensional representation of a three dimensional scene. Their information content consequently relates to that scene, not the array of pixels in the data structure which forms them. Thus that information content can and usually is vast, depending on the context in which the image is examined. For multimedia databases, then, a radically different approach to database management is required.

The potential of multimedia information systems

For what problems does a multimedia information system offer a solution? The following illustrate part of the spectrum of applications:

The concept of an architecture and an organisation

Any 'black box' may be described in terms of its function, organisation and implementation. Function refers to input-output behaviour which is the potential user's perception of the system. In computing, this description is increasingly called an architecture, although for databases it is usually called a scheme (Korth & Silbershatz, 1991). Two of its key components are a set of abstract data elements and a query language to access those elements.

In general terms, the organisation of a system is a mathematical model which logically describes how outputs are generated from inputs through some sequence of primitive actions. As a logical description it may not describe how the system is actually constructed, hence the need for an implementation description. For a database, the organisation defines the semantics of the query language and gives a more detailed description of the abstract data elements of the architecture and how they are manipulated.

An image is a data structure typically around one Mbyte in size. A single one hour video sequence is the equivalent of around 100,000 such images. Consequently, a multimedia database can be expected to be some terabytes in size; the equivalent of more than 1000 common hard disk drives. That suggests a distributed implementation which raises various practical problems and organisational design issues. This extreme size also makes data compression almost mandatory and that greatly complicates storage and access issues. Compression is a process of reducing data redundancy. Apart from the usual forms of coded compression such as MPEG, multimedia databases are likely to utilise the following:

Navigation also presents numerous organisational issues. In its most general sense, database navigation can be created as one of. The former is basically the mode of existing databases, especially relational, and is highly developed. The latter is not as well researched and has no preferred approaches. It is, though, a mode which seems well suited to object oriented databases and these in turn seem well suited to multimedia information systems.

Some architectural issues

The design of a multimedia database will be strongly influenced by the use of a multimedia information system. Two points on this issue: Turning to the multimedia databases directly, three general comments can be made: Now: The focus of any general discussion on the architecture of a multimedia database needs to be on the abstract data model of that database and the query language. The data model is simply the classes of the data structures within the database and their properties. A key concern in selecting a model is its ability to accurately organise the information of interest within its data structures. Particularly for multimedia databases, this issue of representation is complex. A new discipline termed knowledge engineering, part of artificial intelligence, is addressing the problem. That work suggests the most appropriate models are object models or entity relationship ( ER ) models. Object models are the abstract foundation of object oriented systems and seem well suited to multimedia applications. Further, the paradigm flows easily into the organisation. ER models are quite general and seem well suited to situations where concepts of information are ill defined. The considerable interest in ER models in recent years has produced a significant research literature. Gyssens et al (1994), Tsuda et al.(1991), and Staube & Ozsu (1995) are examples of current thinking on the representation problem.

The retrieval function tends to figure prominently in query languages. Its design is remarkably complex given the many issues which must be addressed. How is a query to be framed when the information to be accessed is in multimedia form? ( Yoshitaka et al, 1994 ) Should it be in one medium only such as text, or should it consist of multiple media? If the latter, then how to frame the query in some consistent manner? Should the search be such that components of the query can only he used to examine like components in the database? For example, audio may only be used to search audio. If, however, a query is seen as general, then how to combine the different elements of the query given there may be inconsistency between them, what constraints should exist on seeking a combination of media and how to make comparisons?

Another non-trivial question is what is the outcome of a retrieval? Consider the operation of a database in abstract form. As an architecture. a database is simply a collection or set of entities where each is a structure of some form built around atomic elements. Therefore, retrieval may be viewed as a navigation process where at each point a two stage process of:

For a homogeneous database: Contrast this with heterogeneous databases: There may be inconsistency. How are multiple responses due to this to be resolved? There may also be redundancies where a query can be satisfied across several different media. Then should all components satisfying the query be returned and if so in what form?

The implication is that only part of a multimedia entity accessed is retrieved and that requires the query to include constraints which define that part.

If the information in a self organising database is unknown, then neither can effective retrieval mechanisms be known. That suggests learning might need to be an integral part of such a database and so that in some sense it becomes a deductive database. A deductive database has data, plus facts and rules. The facts and rules not only guide retrieval, but assist in learning new facts and rules that may be used to improve future performance. An apt analogy suggested in the recent literature is that such retrieval mechanisms, especially with regard to actions on very information rich sources such as images, may be best viewed as information mining mechanisms.

A taxonomy for multimedia databases

An abstract taxonomy for multimedia databases needs to be based on retrieval mechanisms where these in turn are based on the information content of entities. Given that the entities in a database are actually structures of atomic elements, then that suggests in the Most general terms and following Berra et al (1993) retrieval can be classified according to whether that accessing action is dependent on: More explicitly, this suggests a taxonomy as follows: While these levels describe an increasing level of complexity in the search process, the last is a very significant increase indeed. Three problems associated with it can be highlighted: There is the question of interpreting the meaning of a query. This has two aspects, an interpretation of the complete query and of the individual query terms, which are best illustrated by examples: Each of these requires additional interpretation and knowledge, and probably some interpretation of past retrievals to gauge the general desires of the user. That is to say, it requires some context in which to interpret user requests. Framing that is a difficult problem. Semantic analysis fall very strongly within the domain of artificial intelligence and while it has progressed significantly over the last decade or so it still has difficulty answering questions such as these with authority.

This taxonomy relates to architecture. With respect to organisations:

While level 0 and level 1 may be easy to create, they lack the flexibility needed for many important multimedia applications (IEEE Computer Special Issue, 1995). The task, therefore, is to determine how to make level 2 and 3 databases practical.

Retrieval by similarity

Similarity measures can be broadly divided into comparing a reference against derived: The first is very similar to level 1 retrieval. However, in multimedia the number of features can be extremely large, possibly even infinite, and so the issue becomes one of whether the reference and image features are sufficiently close.

Features can be interpreted quite broadly. For images, for example, they can be interpreted as any of the following:

Derived features are the most widely used as they are easily determined albeit at some high computational cost. Being the outcome of a mathematical operation and not an interpretation, though, means they often relate poorly to content. It is a reflection of the degree of difficulty in deriving content that feature sets are still considered.

Structural similarity requires expressing the data model entity in the form of a connected graph, tree, pyramid or some similar form showing how the elements within that entity link to one another. Similarity measures based on structure tend to perform quite well. This is to be expected as they more directly relate to content. The problem is actually forming them, although in the case of images deriving graphs can be relatively easy. The most successful approach at present is to apply templates of likely image objects and correlate then or pass them through a recognition structure such as a neural network to identify them.

The strongest interest in similarity measures is with respect to graphs. The problems of recognising human faces illustrates why ( Pentland et al, 1993 ). As the significant difference between faces relate to the lips, eyes, nose, ears and chin and the distances between them, a mathematical measure is needed to include those. If that measure proves inadequate, then it is necessary to turn to secondary factors such as hair. The formulation of a similarity measure is quite complex, but a graph which links the components with the key comparisons first can greatly simplify the process while still offering a sophisticated analysis.

Semantic retrieval

Understanding information in any form is difficult. A fruitful way of examining this problem is to assume information is expressed in some abstract language. That is to say, it consists of tokens of some form organised into sentences. This may actually involve a hierarchy. In speech, for example, words are formed from phonemes according to some given rules. To understand this information requires several steps. The first is to interpret the tokens themselves. This may be taken as some general form of dictionary reference. Next is the syntactic problem; the problem of determining the relationship between the tokens and the significance of these. Viewed generally, it is the problem of using the grammar of the language to parse the sentence. Finally, there is the semantic problem; assigning a meaning to the sentence. Semantics are necessary as a sentence can he grammatically correct, yet have no meaning. A more difficult problem is that in most forms of information there are ambiguities and so there can be several meanings. That meaning may be resolved by looking at past information, but often it is only resolved by referring to some context; a body of knowledge assumed in the communication but never stated. To illustrate, there is no need in speech to outline what will happen at a birthday party because that is part of common experience.

Understanding information is vital to multimedia database management in order to be able to translate actions between the different media. It seems an immensely complex problem. It certainly may be if the general case is tackled. However, limited problems can be tackled and quite successfully as some computer games, inventory ordering systems and other examples show. This suggests building separate semantic modules, each directed at a particular task. This is in part the underlying principle of intelligent agents suggested for graphical user interfaces, network searching and similar tasks.

Creating an information understanding system involves capturing the knowledge needed to perform the semantic analysis and then forming that knowledge into a structure for use within a computing system. Knowledge capture is difficult, but far from impossible. However, representing it can present problems. There are two common approaches:

Declarative captures facts and assertions and relies on statements in a logic or some relational expression such as graph or semantic network. Procedural information captures actions or consequences and is usually based on a formal grammar. As the former is ideal for dictionary structures and the latter for semantic analysis, usually both are needed and used.

Semantic retrieval is complex, but with some determination useful structures can be created. The real problem is that the time taken to interpret information, especially complex information such as images, involves considerable computation. Current computing structures are poorly organised towards such problems. Ideally, symbolic computers should be used, but few are available and there is much work to be done before they can be remotely considered mainstream.

Research problems in multimedia databases

Multimedia databases are a highly complex technology. Their design still falls within the research domain and it will be some years before they offer the robustness and features expected of commercial systems. This is illustrated by some of the key problems which still need to be addressed:

References

Berra, P. B., Golshani, E, Mehrotra, R.& Sheng, O. R. L. (1993). Guest Editor's Introduction: Multimedia Information Systems. IEEE Transactions on Knowledge and Data Engineering, 5(4), 545-550.

Gyssens, M., Pareadaens, L, Van Den Bussche, J. & Van Gucht, D.(1994). A graph-oriented object database mode. IEEE Transactions on Knowledge and Data Engineering, 6(4), 572-586.

IEEE Computer, Special Issue ( 1995). Finding the Right Image: Content-Based Image Retrieval Systems. IEEE Computer Society, 28(9).

Korth, H. F.& Silbershatz, A. (1991). Database System Concepts. New York, McGraw Hill.

Pentland, A., Ricand, R. W. & Sclaroff, S. (1993). Photobook: Tools for content-based manipulation of image databases. MIT Media Laboratory Perceptual Computing Technical Report 285.

Staube, D. D.& Ozsu, M. T. ( 1995 ). Query optimisation and execution plan generation in object oriented data management systems. IEEE Transactions on Knowledge and Data Engineering, 7(2), 210-227.

Tsuda, K., Yamamoto, K., Hirakawa, M., Tanaka, M. & Ichikawa, M. (1991). MORE: An object oriented data model with a facility for changing object structures. IEEE Transactions on Knowledge and Data Engineering, 3(4), 444-465.

Weigart, T. J. & Tsai, J. ( 1994 ). A computationally tractable non-monotonic logic. IEEE Transactions on Knowledge and Data Engineering, 6(1), 57-63.

Author: Douglas G Myers
School of Electrical and Computer Engineering
Curtin University of Technology
GPO Box U1987
Perth, Western Australia 6001

Please cite as: Myers, D. G. (1996). The perspective for content addressable multimedia databases. In C. McBeath and R. Atkinson (Eds), The Learning Superhighway: New world? New worries? Proceedings of the Third International Interactive Multimedia Symposium, 280-286. Perth, Western Australia, 21-25 January. Promaco Conventions. http://www.aset.org.au/confs/iims/1996/lp/myers1.html


[ IIMS 96 contents ] [ IIMS Main ] [ ASET home ]
This URL: http://www.aset.org.au/confs/iims/1996/lp/myers1.html
© 1996 Promaco Conventions. Reproduced by permission. Last revision: 15 Jan 2004. Editor: Roger Atkinson
Previous URL 17 Feb 2001 to 30 Sep 2002: http://cleo.murdoch.edu.au/gen/aset/confs/iims/96/lp/myers1.html