Significant new developments in the storage capacity and processing capability of the personal computer have enabled the implementation of digital processing techniques for full motion video in Interactive Media Integration applications. Important research into algorithms for compressing image data, and the development of worldwide standards for image compression, have allowed some major computer companies to produce hardware and software which can digitally store and replay image and video information using a computer hard disc or compact disc. This has the potential to revolutionise Interactive Media Integration applications, since the user has easy access to low cost storage and image replay facilities and can readily edit sequences without costly specialist equipment. In addition, some of the compression techniques which permit storage and replay are being applied to the transmission of image and video information around computer networks. This capability raises the potential of multiple access to Interactive Media applications, providing for advances in applications such as the desktop office.
This paper reviews the technological situation which has produced some of these advances in image handling capability. The development of image compression standards and their implementation is discussed. The implications of new storage techniques for the multimedia developer and user are commented upon. The potential impact of network applications is discussed, and the technological developments towards new applications in multimedia are reviewed.
Most multimedia applications using all these features centre on the combination of computer derived text, graphics and special effects with video from an analogue storage device such as a video cassette recorder or videodisc. Whilst this approach produces a perfectly acceptable multimedia platform, it requires the addition to the computer of a peripheral video player and an appropriate signal processor. Videodisc players, using laser reading technology, produce high quality full motion video in conventional analogue signal form, and a device is then needed to combine and lock the video signal from the player with the computer monitor signal. The computer derived text or graphics are then overlaid on the video image, producing the final visual effect. Since the search speed of the videodisc can be quite slow (several hundred milliseconds), the capability of an application to manipulate video in a creative manner is severely limited. In addition, it is difficult to modify video images during replay, reducing the special effects capability of the platform.
Clearly it is more desirable to store and process the images on the computer itself, to take advantage of the processing capability of modern machines. This requires image information to be stored in a digital format to provide accurate and rapid retrieval. For image information to be processed in a manner which allows applications to replay, review and edit it, a number of important techniques and advances in computer technology were required.
Fortunately, the advent of parallel processors and faster computers has now brought video processing capability to the desktop computer. Techniques in data compression have also reduced storage requirements to more acceptable levels.
The storage of images on a computer is becoming increasingly complex as the requirement arises to store higher resolution images for monitors with better colour capability. Images are rarely stored as raw digitised data, for two reasons. Firstly, the size of the image file would be prohibitively large and, secondly, a raw block of image data is meaningless without some information describing the properties of the image. This information would include, for example, the size of the image, colour information and the amount of data used to describe each pixel. Several image coding and storage formats are available to cater for a wide range of image data types.
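By way of illustration, the following sketch shows the kind of descriptive information that must accompany raw pixel data before an image file becomes meaningful. It is not based on any particular file format, and all field names are hypothetical.

```c
/* Minimal sketch of the header information a stored image file must
 * carry alongside its raw pixel data.  The layout and field names are
 * illustrative only and do not follow any particular standard format. */
#include <stdint.h>

struct image_header {
    uint32_t width;           /* image width in pixels                  */
    uint32_t height;          /* image height in pixels                 */
    uint16_t bits_per_pixel;  /* amount of data used to describe a pixel */
    uint16_t colour_planes;   /* e.g. 1 for greyscale, 3 for RGB        */
    uint32_t palette_entries; /* 0 if the image is not palettised       */
    uint32_t data_offset;     /* byte offset of the pixel data          */
};
```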
Image and video coding rely on the inherent redundancy of information within an image or video sequence, and on the characteristics of the human visual system (HVS), to achieve compression and decompression of image data without causing intolerable degradation of image quality. The amount of degradation tolerated is subjective and, even though models of the HVS are commonly included in coding schemes, some visual artefacts of compression remain noticeable to the viewer, some more so than others. Even though a compression technique may produce good theoretical results, the inadequacies of, and individual variations in, HVS models require that compression techniques also produce good practical performance.
The coding technique must recognise the inherent redundancy in the image and remove it, so that less storage capacity is required, then replace the information in the decompression process without causing unacceptable image degradation. This redundancy is usually found in the inter-element correlation in both the spatial and temporal domains of the image. Spatial domain redundancy usually appears as high colour correlation between neighbouring pixels, whilst temporal domain redundancy can be derived from similarities between successive video frames. Compression techniques exploit this redundancy to provide high compression ratios, with the final ratio being dependent on the amount of redundancy present in the image and the compression technique used. Image coding therefore comprises a sequence of processes, performed in a predetermined order and applied to the source image, to produce a digitally encoded and compressed version of the source image data. Decompression is performed by inverting each of the encoder operations and executing them in reverse order, converting the compressed data stream back into a reconstructed image.
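As a minimal illustration of how inter-element correlation is exploited, and how each encoder operation has an exact inverse, the following sketch (illustrative only) replaces neighbouring pixels within a frame, and co-located pixels in successive frames, by their differences. Real coders follow such difference stages with quantisation and entropy coding.

```c
/* Sketch of exploiting spatial and temporal correlation by difference
 * coding.  Where neighbouring values are highly correlated, the stored
 * differences are small and compress well; the decoder inverts each
 * step exactly, so no information is lost at this stage. */
#include <stddef.h>

/* Spatial redundancy: store each pixel as its difference from the
 * previous pixel (modulo-256 arithmetic makes this exactly invertible). */
void delta_encode(unsigned char *buf, size_t n)
{
    unsigned char prev = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char cur = buf[i];
        buf[i] = (unsigned char)(cur - prev);
        prev = cur;
    }
}

/* Exact inverse of delta_encode, executed by the decoder. */
void delta_decode(unsigned char *buf, size_t n)
{
    unsigned char prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev = (unsigned char)(buf[i] + prev);
        buf[i] = prev;
    }
}

/* Temporal redundancy: store a frame as its difference from the
 * previous frame; similar successive frames yield mostly zero values. */
void frame_diff(const unsigned char *prev_frame, unsigned char *cur_frame, size_t n)
{
    for (size_t i = 0; i < n; i++)
        cur_frame[i] = (unsigned char)(cur_frame[i] - prev_frame[i]);
}
```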
A wide range of image coding procedures have been developed to enable compression techniques to be applied to image information. These techniques range from simple Run Length Encoding (RLE) to the application of Discrete Cosine Transform (DCT) and entropy coding algorithms. Many variations of these techniques are available, each claiming attributes which suit differing image data requirements. Most encoding procedures utilise the Discrete Cosine Transform because of its reversibility and its capability to produce symmetrical compression/decompression, but they also include at least one other encoding process, such as quantisation or motion compensation/estimation, to produce a viable compression algorithm [1].
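For reference, a direct (unoptimised) implementation of the 8 x 8 forward DCT used, in one variant or another, by the major standards might look like the following sketch. Practical codecs use fast factorised forms of the transform and pair it with quantisation and entropy coding, none of which is shown here.

```c
/* Sketch of the 8 x 8 forward Discrete Cosine Transform (DCT-II).
 * The inverse DCT reverses the operation, which is what makes the
 * transform attractive for symmetrical compression/decompression. */
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define BLOCK 8

void dct8x8(const double in[BLOCK][BLOCK], double out[BLOCK][BLOCK])
{
    for (int u = 0; u < BLOCK; u++) {
        for (int v = 0; v < BLOCK; v++) {
            double cu = (u == 0) ? 1.0 / sqrt(2.0) : 1.0;
            double cv = (v == 0) ? 1.0 / sqrt(2.0) : 1.0;
            double sum = 0.0;
            for (int x = 0; x < BLOCK; x++)
                for (int y = 0; y < BLOCK; y++)
                    sum += in[x][y]
                         * cos((2.0 * x + 1.0) * u * M_PI / (2.0 * BLOCK))
                         * cos((2.0 * y + 1.0) * v * M_PI / (2.0 * BLOCK));
            /* 2/N normalisation with N = 8 gives the factor 0.25 */
            out[u][v] = 0.25 * cu * cv * sum;
        }
    }
}
```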
The limited bandwidth of computer networks, and the size of the image data files to be transmitted, also require that some image processing techniques be applied for image transmission. These may include image compression, but other techniques, such as Progressive Image Transmission, are gaining ground as more appropriate image handling methods. Progressive Image Transmission (PIT) is a technique which allows fast approximations of the image to be transmitted quickly to the display device. This permits the user to interpret the image before the lengthier transmission of the final high quality image is complete. A simple implementation of PIT displays a very low resolution version of the image first, then progressively updates the image until the final high quality resolution is reached. The aim of these PIT techniques is to complete the reconstruction of a recognisable image in the same time, or less, than it would take using more conventional transmission procedures. Often, only 10% of the image data is required to recognise a progressively displayed image, whilst it may take 50% or more of the data to recognise an image transmitted using Run Length Encoding techniques. In addition, data at the very bottom of an RLE image will not become visible until the end of image transmission, whilst a PIT image is reconstructed in full frame low resolution form, rendering the whole image visible simultaneously, as illustrated in the sketch below.
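A very simple form of the low resolution first pass described above can be produced by block averaging, as in the following sketch. This is illustrative only; practical PIT schemes usually refine transform coefficients rather than raw pixels.

```c
/* Sketch of a simple progressive transmission scheme: the sender first
 * transmits a coarse block-averaged approximation of the whole frame,
 * then successively finer versions, so the receiver sees a complete
 * (if blurry) image almost immediately. */
#include <stddef.h>

/* Replace each factor x factor block of a greyscale image (width w,
 * height h, both assumed divisible by factor) with its average value. */
void block_average(const unsigned char *src, unsigned char *dst,
                   size_t w, size_t h, size_t factor)
{
    for (size_t by = 0; by < h; by += factor) {
        for (size_t bx = 0; bx < w; bx += factor) {
            unsigned long sum = 0;
            for (size_t y = 0; y < factor; y++)
                for (size_t x = 0; x < factor; x++)
                    sum += src[(by + y) * w + (bx + x)];
            unsigned char avg = (unsigned char)(sum / (factor * factor));
            for (size_t y = 0; y < factor; y++)
                for (size_t x = 0; x < factor; x++)
                    dst[(by + y) * w + (bx + x)] = avg;
        }
    }
}

/* The transmitter would call block_average with factor = 8, 4, 2, 1 in
 * turn, sending only the new detail of each pass to the receiver. */
```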
Table 1: Image and video compression standards

|                        | JPEG              | H.261            | MPEG                        |
| Medium                 | Image compression | Teleconferencing | Video and audio compression |
| Coding algorithm       | DCT based         | DCT based        | DCT based                   |
| Quality                | High              | Low              | High                        |
| Compression ratio      | 8-100:1           | 100-300:1        | 80-275:1                    |
| Implementation cost    | Low               | Low              | High                        |
| Standards organisation | CCITT/ISO         | CCITT            | CCITT/ISO                   |
The three main standards shown in Table 1 have substantial parts of their encoding process in common, even though they are directed at different image media. They are all based on the DCT algorithm, and H.261 and MPEG both use similar techniques for temporal domain compression. A decoder capable of decoding any of the three compressed data streams can therefore be designed with only a little extra complexity compared to a single-standard decoder.
In addition to these three main standards a number of companies have developed proprietary algorithms performing full motion video compression. History has shown that standards are sometimes superseded by better non-standard efforts but both JPEG and H.261 already enjoy the backing of a number of large companies and are therefore more likely to be adopted as worldwide standards. Little information is available for proprietary algorithms due to the commercial nature of their applications.
DVI, UVC and Apple solutions are designed to be used with a computer platform, with the image data stored on and retrieved from a hard disc or CD, whilst CD-I addresses a different market and provides a complete, integrated, stand alone system.
C-Cube Microsystems were the first company to produce a JPEG compliant solution. Their 10 MHz version of the CL550A image compression processor was released as an evaluation board for the Apple Macintosh in May 1990. This board allows near real time compression of still images of varying size, at a theoretical compression range of 8-200:1. At more practical compression ratios of 10-25:1, reconstructed images are nearly indistinguishable from the original. It is understood that the upgraded 25 MHz version has just been released. Other companies are concentrating their efforts on supplying VLSI chips which comply with JPEG. Software solutions complying with JPEG are also available; for example, Picture Press offer a compression/decompression package with a board for the Apple which can achieve compression ratios of 100:1.
The introduction of the Integrated Services Digital Network (ISDN) has permitted the birth of the H.261 standard. Since 1989 a number of companies have been offering codecs to implement H.261 for video teleconferencing on the digital network. These have been quite expensive, but companies such as LSI Corporation are producing chip sets which are capable of performing H.261 compression in real time using a plug-in board for the personal computer.
Compact Disc Interactive (CD-I) is a specification for a computer system built around a CD-ROM drive. It is an entirely different concept from all the other products available, in that it offers a multimedia solution providing an interactive CD-ROM based environment. The CD-I products are oriented towards the consumer market, which includes home entertainment, games and education. CD-I allows the integration of video, still pictures, audio, graphics and text, providing the user with a complete multimedia CD-ROM player. A number of companies are producing CDs compatible with CD-I technology.
Algorithms for computer network image transmission are at a very early developmental stage and it is important to ensure that these standards are developed to include computer network requirements.
Authors: Dr. T. H. Edgar, Mr. C. V. Steffen, Mr. D. A. Newman
IMAGE Technology Research Group, School of Electrical and Computer Engineering, Curtin University of Technology, Perth, Western Australia

Please cite as: Edgar, T. H., Steffen, C. V. and Newman, D. A. (1992). Digital storage of image and video sequences for interactive media integration applications: A technical review. In Promaco Conventions (Ed.), Proceedings of the International Interactive Multimedia Symposium, 279-284. Perth, Western Australia, 27-31 January. Promaco Conventions. http://www.aset.org.au/confs/iims/1992/edgar1.html