Semantic Scene Modeling and Retrieval
Julia Vogel
PhD Thesis, October 2004. [pdf]
Hartung-Gorre Verlag, Konstanz.
ISBN: 3-89649-967-2
Abstract: Semantics-based image retrieval has gained increasing
interest in recent years. As an area in linguistics, semantics deals with the
sense and the meaning of language. In the context of content-based image
retrieval, the research goal is to access the meaning of images by naming or
describing the most important image regions and their relationships.
The topic of this dissertation is the semantic description, understanding, and
modeling of natural scenes. The primary objective is to develop a computational
image representation that reduces the semantic gap between the image
understanding of humans and the computer. For humans, the most intuitive means
of communication about images is image description. Image semantics and image
description are thus closely interconnected.
We propose a semantic modeling of natural scenes that is based on the
classification of local semantic concepts. Image regions are extracted on a
regular 10x10 grid. The resulting patches are classified into nine concept
classes that subsume the main semantic content of the database images. Images
are represented through the frequency of occurrence of the semantic concepts.
This semantic modeling constitutes a compact, semantic image representation that
makes it possible to describe or search for specific image content or, on a
higher level, to model the semantic content of natural scene categories.
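As an illustration of the representation described above, the following is a minimal sketch, not the implementation used in the thesis. It assumes that a patch classifier has already assigned one of nine concept ids to each cell of the 10x10 grid; the integer ids 0 to 8 and the function name `concept_occurrence_vector` are placeholders introduced here.

```python
from collections import Counter

GRID_SIZE = 10      # regular 10x10 grid, as stated in the abstract
NUM_CONCEPTS = 9    # nine concept classes; the integer ids 0..8 are placeholders


def concept_occurrence_vector(patch_labels):
    """Return the relative frequency of each concept over all grid patches.

    patch_labels is a 10x10 nested list of concept ids, assumed to come
    from an already trained patch classifier (not shown here).
    """
    flat = [label for row in patch_labels for label in row]
    if len(flat) != GRID_SIZE * GRID_SIZE:
        raise ValueError("expected a 10x10 grid of patch labels")
    counts = Counter(flat)
    return [counts.get(c, 0) / len(flat) for c in range(NUM_CONCEPTS)]


# Hypothetical image: the top 4 rows are labelled concept 0, the rest concept 1.
labels = [[0] * GRID_SIZE for _ in range(4)] + [[1] * GRID_SIZE for _ in range(6)]
vector = concept_occurrence_vector(labels)
```

The resulting nine-element vector sums to one and can be compared across images, which is what makes it usable for both retrieval and scene categorization.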
The semantic modeling has been intensively studied for categorization and
retrieval of natural scenes. Depending on the classification method and on the
quality of the concept detectors, good to very good categorization and retrieval
performance has been obtained. In particular, it is shown that the semantic
modeling leads to considerably better categorization and retrieval performance
compared to directly employing low-level features. Nevertheless, the analysis of
the mis-categorized scenes reveals that the regular semantic ambiguity of the
database images calls for a typicality ranking of images rather than for
hard-decision categorization.
This hypothesis is supported in two psychophysical experiments. Humans are able
to consistently categorize images, but the employed database consists largely
of images that can be assigned to several scene categories. However, the
human participants were very consistent in ranking the database images according
to their semantic typicality.
It is shown visually and quantitatively that the proposed semantic modeling is
also well-suited for semantic ranking of images. In particular, the typicality
transition between two scene categories can be modeled. In addition, we propose
a perceptually plausible distance measure that represents the most discriminant
semantic concepts of each scene category. The typicality ranking obtained with
this distance measure correlates highly with the human rankings.
Finally, this thesis discusses the problem of performance evaluation in
content-based image retrieval systems. When searching for specific local
semantic content, the retrieval results can be modeled statistically. We develop
closed-form expressions for the prediction of precision and recall in our
vocabulary-supported retrieval system. In addition, these expressions make it
possible to optimize precision and recall by up to 60%.
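For reference, the two evaluation measures named above can be computed empirically as in the sketch below. It uses the standard set-based definitions only; it does not reproduce the closed-form predictions developed in the thesis, and the function name and image ids are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Standard set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Precision: fraction of retrieved images that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant images that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: four images returned, three relevant in the database.
p, r = precision_recall(["img1", "img2", "img3", "img4"], ["img2", "img4", "img7"])
```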
BibTeX Record
@book{JuliaVogel_SemanticSceneModelingandRetrieval,
author = {Julia Vogel},
title = {Semantic Scene Modeling and Retrieval},
publisher = {Hartung-Gorre Verlag Konstanz},
year = {2004},
series = {Selected Readings in Vision and Graphics},
number = {33},
editor = {Luc Van Gool and Gabor Szekely and Markus Gross and Bernt Schiele},
}