This paper presents an integrated search strategy for image retrieval. It combines best match searching of image descriptions with a pattern recognition technique used to recognise objects in 2D images. The best match search ranks the retrieved images in descending order of similarity to the query, where the similarity for each image is determined by the number of terms the image description has in common with the query. With text alone, retrieval performance depends on the accuracy of the description associated with each image. To alleviate this problem, objects recognised in the 2D images are used to improve retrieval performance. The integrated search strategy is implemented using a collection of sports pictures, and the results obtained show that the new search strategy improves retrieval performance considerably.
The picture archival system (PAS) used in this experiment is based on a free text database. Free text based systems provide more flexibility and allow the processing of the unstructured text accompanying the image [1]. The text can be processed automatically and every word in it can be used to search the database. The free text capability is maintained by an inverted list of index terms (keywords), and this architecture is very flexible: for each keyword, the inverted list records the references of the documents to which that keyword has been assigned. In a free text search system, the inverted list contains the text words from the documents and the references to all documents containing each given word, so the documents to be retrieved in response to a given query are identified by obtaining from the inverted list the document reference lists corresponding to each query term.
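For illustration, the inverted list structure described above can be sketched in Python as follows; the toy data and function names are our own, not part of the PAS implementation:

    from collections import defaultdict

    def build_inverted_list(documents):
        # documents: {doc_ref: free_text_description}
        # Map every text word to the references of documents containing it.
        inverted = defaultdict(set)
        for doc_ref, text in documents.items():
            for word in text.lower().split():
                inverted[word].add(doc_ref)
        return inverted

    def retrieve(inverted, query):
        # Obtain the document reference lists for each query term.
        hits = set()
        for term in query.lower().split():
            hits |= inverted.get(term, set())
        return hits

    docs = {1: "horse jumping over a fence", 2: "pommel horse routine"}
    index = build_inverted_list(docs)
    print(retrieve(index, "horse jumping"))   # -> {1, 2}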
Since the database consists of colour images, colour may also be used as an object feature. During recognition, features are extracted from the image and compared to the features of each object in the library. If there is a match (within a predefined error bound), the object is deemed to have been found within the image. Rotational and scale invariance are critical. An important case that must be handled is that of occluded objects, that is, objects partially obscured by other objects. For example, in a sports database a basketball may be partially obscured by the hand of a player.
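The matching step can be sketched as follows, assuming for illustration that features are numeric vectors compared with a Euclidean distance; the actual features (for instance shape and colour measures) and the error bound are application specific:

    import math

    def match_objects(image_features, library, error_bound):
        # library: {object_name: feature_vector}
        # An object is deemed found when the distance between its features
        # and those extracted from the image falls within the error bound.
        matches = []
        for name, obj_features in library.items():
            dist = math.dist(image_features, obj_features)
            if dist <= error_bound:
                matches.append((name, dist))
        return sorted(matches, key=lambda m: m[1])   # closest match first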
An algorithm for the recognition of occluded objects has been implemented by Grimson [5,6,7]. This algorithm divides each curve in each object into linear segments and uses relationships between the line segments (such as separation and angular displacement) to match library objects with the image. The search is structured as a constrained depth first search using an interpretation tree. The algorithm is computationally expensive, but this is acceptable since recognition is done off line. Another possibility is dynamic programming techniques. Gorman et al. [4] use local features obtained by splitting a contour into segments which are described using Fourier descriptors. When two contours are compared, the distances between their respective segment sets are recorded in an intersegment distance table. A dynamic programming formulation is used for this problem, along with a criterion for comparing the results of many matching attempts. The segment matching technique accommodates arbitrary rotation and translation of the object in the image and does not require precise scale information. Reasonable recognition accuracy is obtained.
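The constrained depth first search over an interpretation tree can be sketched roughly as below. For brevity, the sketch collapses the pairwise relationships (separation, angular displacement) into a single scalar relation per segment pair, and omits the null assignments Grimson's algorithm allows for segments hidden by occlusion:

    def consistent(pairing, img_new, mdl_new, img_rel, mdl_rel, tol):
        # The new assignment must satisfy the pairwise constraint against
        # every assignment already made on this branch of the tree.
        return all(abs(img_rel[i][img_new] - mdl_rel[m][mdl_new]) <= tol
                   for i, m in pairing)

    def interpret(img_segs, mdl_segs, img_rel, mdl_rel, tol, pairing=()):
        # Depth first: pair the next image segment with some model segment,
        # pruning any branch that violates a pairwise constraint.
        if len(pairing) == len(img_segs):
            return pairing                    # complete interpretation
        img_seg = img_segs[len(pairing)]
        for mdl_seg in mdl_segs:
            if consistent(pairing, img_seg, mdl_seg, img_rel, mdl_rel, tol):
                found = interpret(img_segs, mdl_segs, img_rel, mdl_rel, tol,
                                  pairing + ((img_seg, mdl_seg),))
                if found:
                    return found
        return None                           # dead branch: backtrack

Here img_rel and mdl_rel are assumed to be tables (dicts of dicts) giving the scalar relation between every pair of segments in the image and in the model respectively.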
Davis [3] uses relaxation techniques to perform shape matching. In his approach, a piece of shape is recognised as an approximate match to part of a larger shape, for the class of shapes described by simple closed curves. His algorithms provide matches that are invariant to orientation and to a range of scale changes. Ayache and Faugeras [2] use a local and compact description of object boundaries and a fast recognition method involving the generation and recursive evaluation of hypotheses. Hypothesis generation and verification is coupled with a recursive estimation of the model to scene transformation. The method is robust to noise and can deal with scale changes.
Therefore, given the image and the library objects, we automatically generate a list of the objects associated with each image in our database. Objects of similar shape are grouped together into one class and assigned a class number and an object code. Once an image is associated with an object code, the code can be used later for image retrieval. In the visual search, when an image is selected by the user, the text description and the object codes associated with that image are used to retrieve and rank the images.
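A minimal sketch of the code assignment, assuming the extracted objects have already been grouped into classes by shape similarity; the catalogue layout is our own illustration:

    def assign_codes(classes):
        # classes: a list of classes, each a list of object labels already
        # grouped by shape similarity. The object code is simply the
        # object's position within its class.
        catalogue = {}
        for class_no, members in enumerate(classes, start=1):
            for code, obj in enumerate(members, start=1):
                catalogue[obj] = (class_no, code)
        return catalogue

Each image is then indexed by the (class number, object code) pairs of the objects extracted from it, alongside its text description.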
These problems can be alleviated if the number of inverted lists to be inspected and the number of documents to be evaluated are minimised. To do so, an upper bound algorithm is used [11]. The algorithm maintains two sets: S contains all images examined so far, and R contains the images whose calculated similarity values are, at any given time, greater than the upper bound similarity value. The size of the set R is determined by the user before the search takes place; it is simply the number of images the user would like to see. Query terms are arranged in decreasing order of the weight assigned to each term. The algorithm processes the query terms starting with those which have a high weight and a short corresponding inverted list, leaving those with long inverted lists to the end (including the extracted objects when images are used as queries). After processing the first list, we apply two conditions. First, if the ranking set is not full, the image is added to it; otherwise, if the image's weight is higher than the lowest weight in the ranking set, it replaces the lowest-weighted image. The second condition checks whether, after processing i-1 query terms, we still need to continue. Here we keep track of the image with the lowest weight in the ranking set R and the image with the highest weight in the image list S. If, after calculating the upper bound, we find that its value is less than or equal to the weight of the lowest-weighted image in the ranking set R, then no further image can enter R and the search can be terminated.
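The pruning idea can be sketched in Python as follows, assuming each query term carries a precomputed weight; the sketch follows the description above rather than the exact formulation of [11]:

    def upper_bound_search(query_terms, inverted, r_size):
        # query_terms: [(term, weight)]; inverted: {term: set(image_refs)}
        # Process high weight terms first, as described above.
        terms = sorted(query_terms, key=lambda t: -t[1])
        scores = {}                      # set S: images examined so far
        ranked = {}                      # set R: current top r_size images
        for i, (term, weight) in enumerate(terms):
            for img in inverted.get(term, ()):
                scores[img] = scores.get(img, 0) + weight
                if img in ranked or len(ranked) < r_size:
                    ranked[img] = scores[img]
                elif scores[img] > min(ranked.values()):
                    # replace the lowest weighted image in the ranking set
                    del ranked[min(ranked, key=ranked.get)]
                    ranked[img] = scores[img]
            # Upper bound on any image outside R: its best partial score
            # plus the weight still obtainable from unprocessed terms.
            remaining = sum(w for _, w in terms[i + 1:])
            best_outside = max((s for im, s in scores.items()
                                if im not in ranked), default=0)
            if (len(ranked) == r_size
                    and best_outside + remaining <= min(ranked.values())):
                break                    # no image can still enter R
        return sorted(ranked.items(), key=lambda kv: -kv[1])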
For ranking purposes, the image weight resulting from matching the keywords between the image description and the query is incremented by the weight of the matching objects. This results in higher weights for images which contain similar objects. Object weights are assigned by the system based on the position of the object in a given class. As described earlier, each extracted object is assigned an object code and a class number. The object code indicates the relative similarity of the object to the rest of the objects in its class: the code determines the similarity distance between objects within the class. The shorter the distance between two objects, the greater the similarity, and on this basis the system assigns higher weights to similar objects.
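A minimal sketch of the combined weighting; the linear fall-off of the object weight with code distance is our own assumption for illustration:

    def object_weight(query_code, image_code, max_weight=1.0):
        # The shorter the code distance within a class, the greater the
        # similarity and hence the higher the assigned weight.
        return max(max_weight - 0.1 * abs(query_code - image_code), 0.0)

    def combined_weight(text_weight, query_objects, image_objects):
        # Objects are (class_number, object_code) pairs; only objects
        # belonging to the same class are compared.
        total = text_weight
        for q_class, q_code in query_objects:
            for i_class, i_code in image_objects:
                if q_class == i_class:
                    total += object_weight(q_code, i_code)
        return total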
Figure 1: Results of the query "Horse Jumping"
Figure 1 shows the results of the query "Horse Jumping". The top 12 pictures from the returned list were chosen for inspection. Images in the returned list are ranked in descending order of their similarity to the query. Inspecting the results, 8 of the 12 pictures selected are about horse jumping, while the other 4 are retrieved because the image description contains the words horse and jumping. The word horse in this case could mean the animal horse or the pommel horse. Such a problem is common in free text retrieval systems, in which a word with the same spelling but a different meaning results in false hits, and retrieval performance suffers as a result. One way of overcoming this problem is the use of a controlled vocabulary to distinguish between the animal horse and the pommel horse. In image retrieval, the content of the image can help solve the problem by distinguishing the animal horse from the pommel horse.
Figure 2: Edge detection and object extraction
Using the extracted objects, such as the various shapes of the animal horse (Figure 2), we can separate the pictures which contain the animal horse from those containing the pommel horse. Figure 3 shows the results obtained from the integrated search strategy. Comparing the results in Figure 1 with those in Figure 3, we can see that all the pictures of the pommel horse are removed from the top 12 images. We should note here that the use of the extracted features in the query retrieves all the associated images, which might not contain the query terms Horse and Jumping. This is an advantage over the text based approach, in which images can only be retrieved if the image description contains at least one of the query terms.
Figure 3: Sample results from the combined search
Authors: S. Al-Hawamdeh and S. Narayanaswamy, Institute of Systems Science, National University of Singapore, Kent Ridge, Singapore 0511
Please cite as: Al-Hawamdeh, S. and Narayanaswamy, S. (1992). An integrated search strategy for image retrieval. In Promaco Conventions (Ed.), Proceedings of the International Interactive Multimedia Symposium, 349-359. Perth, Western Australia, 27-31 January. Promaco Conventions. http://www.aset.org.au/confs/iims/1992/al-hawamdeh.html