This paper presents an integrated search strategy for image retrieval. It combines best match searching of image descriptions with a pattern recognition technique used to recognise objects in 2D images. The best match search ranks the retrieved images in descending order of similarity to the query, where the similarity for each image is determined by the number of terms the image description has in common with the query. With text alone, retrieval performance depends on the accuracy of the description associated with each image. To alleviate this problem, objects recognised in the 2D images are used to improve retrieval performance. The integrated search strategy is implemented using a collection of sports pictures, and the results obtained show that the new search strategy improves retrieval performance considerably.
The picture archival system (PAS) used in this experiment is based on a free text database. Free text based systems provide more flexibility and allow the processing of the unstructured text accompanying the image [1]. The text can be processed automatically and every word in it can be used to search the database. The free text capability is maintained by an inverted list of index terms (keywords), and this architecture is very flexible: for each keyword, the inverted list records the references of the documents to which that keyword has been assigned. In a free text search system, the inverted list contains the text words from the documents and the references to all documents containing each given word, so the documents to be retrieved in response to a given query are identified by obtaining from the inverted list the document reference lists corresponding to each query term.
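For illustration, the inverted list structure described above can be sketched in Python as follows; the toy data and function names are our own, not part of the PAS implementation:

    from collections import defaultdict

    def build_inverted_list(documents):
        # documents: {doc_ref: free_text_description}
        # Map every text word to the references of documents containing it.
        inverted = defaultdict(set)
        for doc_ref, text in documents.items():
            for word in text.lower().split():
                inverted[word].add(doc_ref)
        return inverted

    def retrieve(inverted, query):
        # Obtain the document reference lists for each query term.
        hits = set()
        for term in query.lower().split():
            hits |= inverted.get(term, set())
        return hits

    docs = {1: "horse jumping over a fence", 2: "pommel horse routine"}
    index = build_inverted_list(docs)
    print(retrieve(index, "horse jumping"))   # -> {1, 2}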
Since the database consists of colour images, colour may also be used as an object feature. During recognition, features are extracted from the image and compared to the features of each object in the library. If there is a match (within a predefined error bound), the object is deemed to have been found within the image. Rotational and scale invariance are critical. An important case that must be handled is that of occluded objects, that is, objects partially obscured by other objects. For example, in a sports database a basketball may be partially obscured by the hand of a player.
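The matching step can be sketched as follows, assuming for illustration that features are numeric vectors compared with a Euclidean distance; the actual features (for instance shape and colour measures) and the error bound are application specific:

    import math

    def match_objects(image_features, library, error_bound):
        # library: {object_name: feature_vector}
        # An object is deemed found when the distance between its features
        # and those extracted from the image falls within the error bound.
        matches = []
        for name, obj_features in library.items():
            dist = math.dist(image_features, obj_features)
            if dist <= error_bound:
                matches.append((name, dist))
        return sorted(matches, key=lambda m: m[1])   # closest match first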
An algorithm for the recognition of occluded objects has been implemented by Grimson [5,6,7]. This algorithm divides each curve in each object into linear segments and uses relationships between the line segments (such as separation and angular displacement) to match library objects with the image. The search is structured as a constrained depth first search using an interpretation tree. The algorithm is computationally expensive, but this is acceptable since recognition is done off line. Another possibility is dynamic programming techniques. Gorman et al. [4] use local features obtained by splitting a contour into segments which are described using Fourier descriptors. When two contours are compared, the distances between their respective segment sets are recorded in an intersegment distance table. A dynamic programming formulation is used for this problem, along with a criterion for comparing the results of many matching attempts. The segment matching technique accommodates arbitrary rotation and translation of the object in the image and does not require precise scale information. Reasonable recognition accuracy is obtained.
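The constrained depth first search over an interpretation tree can be sketched roughly as below. For brevity, the sketch collapses the pairwise relationships (separation, angular displacement) into a single scalar relation per segment pair, and omits the null assignments Grimson's algorithm allows for segments hidden by occlusion:

    def consistent(pairing, img_new, mdl_new, img_rel, mdl_rel, tol):
        # The new assignment must satisfy the pairwise constraint against
        # every assignment already made on this branch of the tree.
        return all(abs(img_rel[i][img_new] - mdl_rel[m][mdl_new]) <= tol
                   for i, m in pairing)

    def interpret(img_segs, mdl_segs, img_rel, mdl_rel, tol, pairing=()):
        # Depth first: pair the next image segment with some model segment,
        # pruning any branch that violates a pairwise constraint.
        if len(pairing) == len(img_segs):
            return pairing                    # complete interpretation
        img_seg = img_segs[len(pairing)]
        for mdl_seg in mdl_segs:
            if consistent(pairing, img_seg, mdl_seg, img_rel, mdl_rel, tol):
                found = interpret(img_segs, mdl_segs, img_rel, mdl_rel, tol,
                                  pairing + ((img_seg, mdl_seg),))
                if found:
                    return found
        return None                           # dead branch: backtrack

Here img_rel and mdl_rel are assumed to be tables (dicts of dicts) giving the scalar relation between every pair of segments in the image and in the model respectively.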
Davis [3] uses relaxation techniques to perform shape matching. In his approach, a piece of shape is recognised as an approximate match to part of a larger shape, for the class of shapes described by simple closed curves. His algorithms provide matches that are invariant to orientation and to a range of scale changes. Ayache and Faugeras [2] use a local and compact description of object boundaries and a fast recognition method involving the generation and recursive evaluation of hypotheses. Hypothesis generation and verification is coupled with a recursive estimation of the model to scene transformation. The method is robust to noise and can deal with scale changes.
Therefore, given the image and the library objects, we automatically generate a list of the objects associated with each image in our database. Objects of similar shape are grouped together into one class and assigned a class number and an object code. Once an image is associated with an object code, the code can be used later for image retrieval. In the visual search, when an image is selected by the user, the text description and the object codes associated with that image are used to retrieve and rank the images.
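A minimal sketch of the code assignment, assuming the extracted objects have already been grouped into classes by shape similarity; the catalogue layout is our own illustration:

    def assign_codes(classes):
        # classes: a list of classes, each a list of object labels already
        # grouped by shape similarity. The object code is simply the
        # object's position within its class.
        catalogue = {}
        for class_no, members in enumerate(classes, start=1):
            for code, obj in enumerate(members, start=1):
                catalogue[obj] = (class_no, code)
        return catalogue

Each image is then indexed by the (class number, object code) pairs of the objects extracted from it, alongside its text description.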
These problems can be alleviated if the number of inverted lists to be inspected and the number of documents to be evaluated are minimised. To do so, an upper bound algorithm is used [11]. The algorithm maintains two sets: S contains all images examined so far, and R contains the images whose calculated similarity values are, at any given time, greater than the upper bound similarity value. The size of the set R is determined by the user before the search takes place; it is simply the number of images the user would like to see. Query terms are arranged in decreasing order of the weight assigned to each term. The algorithm processes the query terms starting with those which have a high weight and a short corresponding inverted list, leaving those with long inverted lists to the end (including the extracted objects when images are used as queries). After processing the first list, we apply two conditions. First, if the ranking set is not full, the image is added to it; otherwise, if the image's weight is higher than the lowest weight in the ranking set, it replaces the lowest-weighted image. The second condition checks whether, after processing i-1 query terms, we still need to continue. Here we keep track of the image with the lowest weight in the ranking set R and the image with the highest weight in the image list S. If, after calculating the upper bound, we find that its value is less than or equal to the weight of the lowest-weighted image in the ranking set R, then no further image can enter R and the search can be terminated.
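The pruning idea can be sketched in Python as follows, assuming each query term carries a precomputed weight; the sketch follows the description above rather than the exact formulation of [11]:

    def upper_bound_search(query_terms, inverted, r_size):
        # query_terms: [(term, weight)]; inverted: {term: set(image_refs)}
        # Process high weight terms first, as described above.
        terms = sorted(query_terms, key=lambda t: -t[1])
        scores = {}                      # set S: images examined so far
        ranked = {}                      # set R: current top r_size images
        for i, (term, weight) in enumerate(terms):
            for img in inverted.get(term, ()):
                scores[img] = scores.get(img, 0) + weight
                if img in ranked or len(ranked) < r_size:
                    ranked[img] = scores[img]
                elif scores[img] > min(ranked.values()):
                    # replace the lowest weighted image in the ranking set
                    del ranked[min(ranked, key=ranked.get)]
                    ranked[img] = scores[img]
            # Upper bound on any image outside R: its best partial score
            # plus the weight still obtainable from unprocessed terms.
            remaining = sum(w for _, w in terms[i + 1:])
            best_outside = max((s for im, s in scores.items()
                                if im not in ranked), default=0)
            if (len(ranked) == r_size
                    and best_outside + remaining <= min(ranked.values())):
                break                    # no image can still enter R
        return sorted(ranked.items(), key=lambda kv: -kv[1])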
For ranking purposes, the image weight resulting from matching the keywords between the image description and the query is incremented by the weight of the matching objects. This results in higher weights for images which contain similar objects. Object weights are assigned by the system based on the position of the object in a given class. As described earlier, each extracted object is assigned an object code and a class number. The object code indicates the relative similarity of the object to the rest of the objects in its class: the code determines the similarity distance between objects within the class. The shorter the distance between two objects, the greater the similarity, and on this basis the system assigns higher weights to similar objects.
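A minimal sketch of the combined weighting; the linear fall-off of the object weight with code distance is our own assumption for illustration:

    def object_weight(query_code, image_code, max_weight=1.0):
        # The shorter the code distance within a class, the greater the
        # similarity and hence the higher the assigned weight.
        return max(max_weight - 0.1 * abs(query_code - image_code), 0.0)

    def combined_weight(text_weight, query_objects, image_objects):
        # Objects are (class_number, object_code) pairs; only objects
        # belonging to the same class are compared.
        total = text_weight
        for q_class, q_code in query_objects:
            for i_class, i_code in image_objects:
                if q_class == i_class:
                    total += object_weight(q_code, i_code)
        return total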
Figure 1: Results of the query "Horse Jumping"
Figure 1 shows the results of the query "Horse Jumping". The top 12 pictures from the returned list were chosen for inspection. Images in the returned list are ranked in descending order of their similarity to the query. Inspecting the results, 8 of the 12 pictures selected are about horse jumping, while the other 4 are retrieved because the image description contains the words horse and jumping. The word horse in this case could mean the animal horse or the pommel horse. Such a problem is common in free text retrieval systems, in which a word with the same spelling but a different meaning results in false hits, and retrieval performance suffers as a result. One way of overcoming this problem is the use of a controlled vocabulary to distinguish between the animal horse and the pommel horse. In image retrieval, the content of the image can help solve the problem by distinguishing the animal horse from the pommel horse.
Figure 2: Edge detection and object extraction
Using the extracted objects, such as the various shapes of the animal horse (Figure 2), we can separate the pictures which contain the animal horse from those containing the pommel horse. Figure 3 shows the results obtained from the integrated search strategy. Comparing the results in Figure 1 with those in Figure 3, we can see that all the pictures of the pommel horse are removed from the top 12 images. We should note here that the use of the extracted features in the query retrieves all the associated images, which might not contain the query terms Horse and Jumping. This is an advantage over the text based approach, in which images can only be retrieved if the image description contains at least one of the query terms.
Figure 3: Sample results from the combined search
Authors: S. Al-Hawamdeh and S. Narayanaswamy, Institute of Systems Science, National University of Singapore, Kent Ridge, Singapore 0511
Please cite as: Al-Hawamdeh, S. and Narayanaswamy, S. (1992). An integrated search strategy for image retrieval. In Promaco Conventions (Ed.), Proceedings of the International Interactive Multimedia Symposium, 349-359. Perth, Western Australia, 27-31 January. Promaco Conventions. http://www.aset.org.au/confs/iims/1992/al-hawamdeh.html