Significant new developments in the storage capacity and processing capability of the personal computer have enabled the implementation of digital processing techniques for full motion video in Interactive Media Integration applications. Important research into algorithms for compressing image data, and the development of worldwide standards for image compression, have allowed some major computer companies to produce hardware and software which can digitally store and replay image and video information using a computer hard disc or compact disc. This has the potential to revolutionise Interactive Media Integration applications, since the user has easy access to low cost storage and image replay facilities and can readily edit sequences without costly specialist equipment. In addition, some of the compression techniques which permit storage and replay are being applied to the transmission of image and video information around computer networks. This capability raises the potential of multiple access to Interactive Media applications, providing for advances in applications such as the desktop office.
This paper reviews the technological situation which has produced some of these advances in image handling capability. The development of image compression standards and their implementation is discussed. The implications of new storage techniques for the multimedia developer and user are commented upon. The potential impact of network applications is discussed, and the technological developments towards new applications in multimedia are reviewed.
Most multimedia applications using all these features centre on the combination of computer derived text, graphics and special effects with video from an analogue storage device such as a video cassette recorder or videodisc. Whilst this approach produces a perfectly acceptable multimedia platform, it requires the addition to the computer of a peripheral video player and an appropriate signal processor. Videodisc players, using laser reading technology, produce high quality full motion video in conventional analogue signal form, and a device is then needed to combine and lock the video signal from the player with the computer monitor signal. The computer derived text or graphics are then overlaid on the video image, producing the final visual effect. Since the search speed of the videodisc can be quite slow (several hundred milliseconds), the capability of an application to manipulate video in a creative manner is severely limited. In addition, it is difficult to modify video images during replay, reducing the special effects capability of the platform.
Clearly it is more desirable to store and process the images on the computer itself, to take advantage of the processing capability of modern machines. This requires image information to be stored in a digital format to provide accurate and rapid retrieval. For image information to be processed in a manner which allows applications to replay, review and edit it, a number of important techniques and advances in computer technology were required.
Fortunately, the advent of parallel processors and faster computers has now brought video processing capability to the desktop computer. Techniques in data compression have also reduced storage requirements to more acceptable levels.
The storage of images on a computer is becoming increasingly complex as the requirement arises to store higher resolution images for monitors with better colour capability. Images are rarely stored as raw digitised data, for two reasons. Firstly, the size of the image file would be prohibitively large and, secondly, a raw block of image data is meaningless without some information describing the properties of the image. This information would include, for example, the size of the image, colour information and the amount of data used to describe each pixel. Several image coding and storage formats are available to cater for a wide range of image data types.
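By way of illustration, the following sketch shows the kind of descriptive information that must accompany raw pixel data before an image file becomes meaningful. It is not based on any particular file format, and all field names are hypothetical.

```c
/* Minimal sketch of the header information a stored image file must
 * carry alongside its raw pixel data.  The layout and field names are
 * illustrative only and do not follow any particular standard format. */
#include <stdint.h>

struct image_header {
    uint32_t width;           /* image width in pixels                  */
    uint32_t height;          /* image height in pixels                 */
    uint16_t bits_per_pixel;  /* amount of data used to describe a pixel */
    uint16_t colour_planes;   /* e.g. 1 for greyscale, 3 for RGB        */
    uint32_t palette_entries; /* 0 if the image is not palettised       */
    uint32_t data_offset;     /* byte offset of the pixel data          */
};
```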
Image and video coding rely on the inherent redundancy of information within an image or video sequence, and on the characteristics of the human visual system (HVS), to achieve compression and decompression of image data without causing intolerable degradation of image quality. The amount of degradation tolerated is subjective and, even though models of the HVS are commonly included in coding schemes, some visual artefacts of compression remain noticeable to the viewer, some more so than others. Even though a compression technique may produce good theoretical results, the inadequacies of, and individual variations in, HVS models require that compression techniques also produce good practical performance.
The coding technique must recognise the inherent redundancy in the image and remove it, so that less storage capacity is required, then replace the information in the decompression process without causing unacceptable image degradation. This redundancy is usually found in the inter-element correlation in both the spatial and temporal domains of the image. Spatial domain redundancy usually appears as high colour correlation between neighbouring pixels, whilst temporal domain redundancy can be derived from similarities between successive video frames. Compression techniques exploit this redundancy to provide high compression ratios, with the final ratio being dependent on the amount of redundancy present in the image and the compression technique used. Image coding therefore comprises a sequence of processes, performed in a predetermined order and applied to the source image, to produce a digitally encoded and compressed version of the source image data. Decompression is performed by inverting each of the encoder operations and executing them in reverse order, converting the compressed data stream back into a reconstructed image.
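As a minimal illustration of how inter-element correlation is exploited, and how each encoder operation has an exact inverse, the following sketch (illustrative only) replaces neighbouring pixels within a frame, and co-located pixels in successive frames, by their differences. Real coders follow such difference stages with quantisation and entropy coding.

```c
/* Sketch of exploiting spatial and temporal correlation by difference
 * coding.  Where neighbouring values are highly correlated, the stored
 * differences are small and compress well; the decoder inverts each
 * step exactly, so no information is lost at this stage. */
#include <stddef.h>

/* Spatial redundancy: store each pixel as its difference from the
 * previous pixel (modulo-256 arithmetic makes this exactly invertible). */
void delta_encode(unsigned char *buf, size_t n)
{
    unsigned char prev = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char cur = buf[i];
        buf[i] = (unsigned char)(cur - prev);
        prev = cur;
    }
}

/* Exact inverse of delta_encode, executed by the decoder. */
void delta_decode(unsigned char *buf, size_t n)
{
    unsigned char prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev = (unsigned char)(buf[i] + prev);
        buf[i] = prev;
    }
}

/* Temporal redundancy: store a frame as its difference from the
 * previous frame; similar successive frames yield mostly zero values. */
void frame_diff(const unsigned char *prev_frame, unsigned char *cur_frame, size_t n)
{
    for (size_t i = 0; i < n; i++)
        cur_frame[i] = (unsigned char)(cur_frame[i] - prev_frame[i]);
}
```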
A wide range of image coding procedures have been developed to enable compression techniques to be applied to image information. These techniques range from simple Run Length Encoding (RLE) to the application of Discrete Cosine Transform (DCT) and entropy coding algorithms. Many variations of these techniques are available, each claiming attributes which suit differing image data requirements. Most encoding procedures utilise the Discrete Cosine Transform because of its reversibility and its capability to produce symmetrical compression/decompression, but they also include at least one other encoding process, such as quantisation or motion compensation/estimation, to produce a viable compression algorithm [1].
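For reference, a direct (unoptimised) implementation of the 8 x 8 forward DCT used, in one variant or another, by the major standards might look like the following sketch. Practical codecs use fast factorised forms of the transform and pair it with quantisation and entropy coding, none of which is shown here.

```c
/* Sketch of the 8 x 8 forward Discrete Cosine Transform (DCT-II).
 * The inverse DCT reverses the operation, which is what makes the
 * transform attractive for symmetrical compression/decompression. */
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define BLOCK 8

void dct8x8(const double in[BLOCK][BLOCK], double out[BLOCK][BLOCK])
{
    for (int u = 0; u < BLOCK; u++) {
        for (int v = 0; v < BLOCK; v++) {
            double cu = (u == 0) ? 1.0 / sqrt(2.0) : 1.0;
            double cv = (v == 0) ? 1.0 / sqrt(2.0) : 1.0;
            double sum = 0.0;
            for (int x = 0; x < BLOCK; x++)
                for (int y = 0; y < BLOCK; y++)
                    sum += in[x][y]
                         * cos((2.0 * x + 1.0) * u * M_PI / (2.0 * BLOCK))
                         * cos((2.0 * y + 1.0) * v * M_PI / (2.0 * BLOCK));
            /* 2/N normalisation with N = 8 gives the factor 0.25 */
            out[u][v] = 0.25 * cu * cv * sum;
        }
    }
}
```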
The limited bandwidth of computer networks, and the size of the image data files to be transmitted, also require that some image processing techniques be applied for image transmission. These may include image compression, but other techniques, such as Progressive Image Transmission, are gaining ground as more appropriate image handling methods. Progressive Image Transmission (PIT) is a technique which allows fast approximations of the image to be transmitted quickly to the display device. This permits the user to interpret the image before the lengthier transmission of the final high quality image is complete. A simple implementation of PIT displays a very low resolution version of the image first, then progressively updates the image until the final high quality resolution is reached. The aim of these PIT techniques is to complete the reconstruction of a recognisable image in the same time, or less, than it would take using more conventional transmission procedures. Often, only 10% of the image data is required to recognise a progressively displayed image, whilst it may take 50% or more of the data to recognise an image transmitted using Run Length Encoding techniques. In addition, data at the very bottom of an RLE image will not become visible until the end of image transmission, whilst a PIT image is reconstructed in full frame low resolution form, rendering the whole image visible simultaneously, as illustrated in the sketch below.
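A very simple form of the low resolution first pass described above can be produced by block averaging, as in the following sketch. This is illustrative only; practical PIT schemes usually refine transform coefficients rather than raw pixels.

```c
/* Sketch of a simple progressive transmission scheme: the sender first
 * transmits a coarse block-averaged approximation of the whole frame,
 * then successively finer versions, so the receiver sees a complete
 * (if blurry) image almost immediately. */
#include <stddef.h>

/* Replace each factor x factor block of a greyscale image (width w,
 * height h, both assumed divisible by factor) with its average value. */
void block_average(const unsigned char *src, unsigned char *dst,
                   size_t w, size_t h, size_t factor)
{
    for (size_t by = 0; by < h; by += factor) {
        for (size_t bx = 0; bx < w; bx += factor) {
            unsigned long sum = 0;
            for (size_t y = 0; y < factor; y++)
                for (size_t x = 0; x < factor; x++)
                    sum += src[(by + y) * w + (bx + x)];
            unsigned char avg = (unsigned char)(sum / (factor * factor));
            for (size_t y = 0; y < factor; y++)
                for (size_t x = 0; x < factor; x++)
                    dst[(by + y) * w + (bx + x)] = avg;
        }
    }
}

/* The transmitter would call block_average with factor = 8, 4, 2, 1 in
 * turn, sending only the new detail of each pass to the receiver. */
```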
Table 1: Image and video compression standards

|                        | JPEG              | H.261            | MPEG                        |
| Medium                 | Image compression | Teleconferencing | Video and audio compression |
| Coding algorithm       | DCT based         | DCT based        | DCT based                   |
| Quality                | High              | Low              | High                        |
| Compression ratio      | 8-100:1           | 100-300:1        | 80-275:1                    |
| Implementation cost    | Low               | Low              | High                        |
| Standards organisation | CCITT/ISO         | CCITT            | CCITT/ISO                   |
The three main standards shown in Table 1 have substantial parts of their encoding process in common, even though they are directed at different image media. They are all based on the DCT algorithm, and H.261 and MPEG both use similar techniques for temporal domain compression. A decoder capable of decoding any of the three compressed data streams can therefore be designed with only a little extra complexity compared to a single-standard decoder.
In addition to these three main standards a number of companies have developed proprietary algorithms performing full motion video compression. History has shown that standards are sometimes superseded by better non-standard efforts but both JPEG and H.261 already enjoy the backing of a number of large companies and are therefore more likely to be adopted as worldwide standards. Little information is available for proprietary algorithms due to the commercial nature of their applications.
DVI, UVC and Apple solutions are designed to be used with a computer platform, with the image data stored on and retrieved from a hard disc or CD, whilst CD-I addresses a different market and provides a complete, integrated, stand alone system.
C-Cube Microsystems were the first company to produce a JPEG compliant solution. Their 10 MHz version of the CL550A image compression processor was released as an evaluation board for the Apple Macintosh in May 1990. This board allows near real time compression of still images of varying size, at a theoretical compression range of 8-200:1. At more practical compression ratios of 10-25:1, reconstructed images are nearly indistinguishable from the original. It is understood that the upgraded 25 MHz version has just been released. Other companies are concentrating their efforts on supplying VLSI chips which comply with JPEG. Software solutions complying with JPEG are also available; for example, Picture Press offer a compression/decompression package with a board for the Apple which can achieve compression ratios of 100:1.
The introduction of the Integrated Services Digital Network (ISDN) has permitted the birth of the H.261 standard. Since 1989 a number of companies have been offering codecs to implement H.261 for video teleconferencing on the digital network. These have been quite expensive, but companies such as LSI Corporation are producing chip sets which are capable of performing H.261 compression in real time using a plug-in board for the personal computer.
Compact Disc Interactive (CD-I) is a specification for a computer system built around a CD-ROM drive. It is an entirely different concept from all the other products available, in that it offers a multimedia solution providing an interactive CD-ROM based environment. The CD-I products are oriented towards the consumer market, which includes home entertainment, games and education. CD-I allows the integration of video, still pictures, audio, graphics and text, providing the user with a complete multimedia CD-ROM player. A number of companies are producing CDs compatible with CD-I technology.
Algorithms for computer network image transmission are at a very early developmental stage and it is important to ensure that these standards are developed to include computer network requirements.
Authors: Dr. T. H. Edgar, Mr. C. V. Steffen, Mr. D. A. Newman
IMAGE Technology Research Group, School of Electrical and Computer Engineering, Curtin University of Technology, Perth, Western Australia

Please cite as: Edgar, T. H., Steffen, C. V. and Newman, D. A. (1992). Digital storage of image and video sequences for interactive media integration applications: A technical review. In Promaco Conventions (Ed.), Proceedings of the International Interactive Multimedia Symposium, 279-284. Perth, Western Australia, 27-31 January. Promaco Conventions. http://www.aset.org.au/confs/iims/1992/edgar1.html