PCCV - Object Categorization - The ETH-80 Database
Object Categorization

The ETH-80 Image Set

Existing publicly available image databases, like the COIL [Murase95], have been very influential. One cornerstone of the COGVIS project is therefore the construction of a common object database, which serves as the basis for both psychophysical and computational studies concerning object recognition and categorization. In this section, we present the ETH-80 image set, a first subset of the COGVIS database, aimed specifically at the task of object categorization. The ETH-80 database contains 80 objects from 8 carefully chosen categories, with high-resolution color images and a segmentation mask for every image.

Motivation

It is important to emphasize that the notion and the abstraction level of object classes are far from being uniquely and clearly defined. Notably, the question of how humans organize knowledge at different levels has received much attention in Cognitive Psychology [Brown58]. Taking an example from Brown's work, a dog can not only be thought of as a dog, but also as a boxer, a quadruped, or in general an animate being [Brown58]. Yet, dog is the term that comes to mind most easily, which is by no means accidental. Experiments show that there is a basic level in human categorization at which most knowledge is organized [Rosch76]. According to Rosch et al. [Rosch76, Lakoff87], this basic level is also:

  • the highest level at which category members have similar perceived shape.
  • the highest level at which a single mental image can reflect the entire category.
  • the highest level at which a person uses similar motor actions for interacting with category members.
  • the level at which human subjects are usually fastest at identifying category members.
  • the first level named and understood by children.

These points are the motivation for us to address multi-level object categorization rather than the less clearly defined problem of object classification. Basic-level categorization is easiest for humans. Below the basic level lie subordinate categories and the exemplar level used in object identification. The next higher level, superordinate categories, requires a higher degree of abstraction and world knowledge. It is thus useful to begin the generic object recognition task with basic-level categories.

The current version of the database is restricted to basic-level categories. In a first step, we explicitly do not want to model functional categories (e.g. "things you can sit on") and ad-hoc categories (e.g. "things you can find in an office environment") [Barsalou83]. Even though those categories are important, they exist only at a higher level of abstraction and require extensive world knowledge and real-world experience.

The Database

Objects in the ETH-80 Database
The figure shows the current status of our database. We include both biological and artificial (human-made) objects in 8 basic-level categories from the following superordinate areas:

  • "fruits & vegetables": apples, pears, tomatoes
  • "animals": cows, dogs, horses
  • "human-made, small (graspable)": cups
  • "human-made, big" (e.g. vehicles): cars

Objects from these areas have different affordances, that is, different ways of interacting with the environment, and thus different characteristics. For each category, we provide 10 objects that span large within-class variation while still clearly belonging to the category.

Subdivision of an octahedron
Each object is represented by 41 images from viewpoints spaced equally over the upper viewing hemisphere (at angular distances of 22.5-26°). The viewing positions were obtained by subdividing the faces of an octahedron to the third recursion level. For collecting the views, we employed an automated robot setup and a blue chroma-keying background for easier segmentation. All images were taken with a Sony DFW-X700 progressive scan digital camera at 1024×768 pixel resolution with a Tamron 6-12mm varifocal lens (F1.4).
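The octahedron subdivision described above can be illustrated with a minimal sketch. It is not the original acquisition code: the function names are hypothetical, and it assumes that the "third recursion level" counts the base octahedron as the first level, i.e. two rounds of splitting each triangle into four, with new vertices projected onto the unit sphere. Under that assumption, the vertices on or above the equator number exactly 41, and the smallest angular spacing between them is 22.5°.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length so it lies on the viewing sphere."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def subdivide_octahedron(steps):
    """Return unit-sphere vertices after `steps` rounds of 1-to-4 face splitting."""
    verts = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
             (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
    faces = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
             (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]
    for _ in range(steps):
        cache = {}  # edge (i, j) -> index of its midpoint vertex

        def midpoint(i, j):
            # Shared edges must yield the same new vertex, hence the cache.
            key = (min(i, j), max(i, j))
            if key not in cache:
                mid = tuple((a + b) / 2.0 for a, b in zip(verts[i], verts[j]))
                verts.append(normalize(mid))
                cache[key] = len(verts) - 1
            return cache[key]

        new_faces = []
        for a, b, c in faces:
            ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
            new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
        faces = new_faces
    return verts

def angle_deg(u, v):
    """Angle in degrees between two unit vectors."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return math.degrees(math.acos(dot))

def upper_hemisphere_views(steps=2):
    """Keep the vertices on or above the equator (z >= 0)."""
    return [v for v in subdivide_octahedron(steps) if v[2] >= -1e-9]

views = upper_hemisphere_views()
print(len(views))  # 41
```

With 80 objects and 41 views each, this viewing scheme corresponds to 80 × 41 = 3280 images in total.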

For every image, we provide a high-quality segmentation mask, so that shape- and contour-based methods can be easily applied. An example segmentation mask and the extracted contour can be seen in the figure below.

Example database image with segmentation mask and extracted contour (panels: original image, segmentation mask, extracted contour)
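As an illustration of how a contour can be derived from a binary segmentation mask, the following sketch marks every foreground pixel that touches the background (or the image border) in its 4-neighborhood. This is one simple, common approach and an assumption on our part; the page does not state how the contours shown in the figure were extracted. The function name and the toy mask are hypothetical.

```python
def boundary_pixels(mask):
    """Return the set of (row, col) foreground pixels on the object boundary.

    `mask` is a list of rows containing 0 (background) or 1 (foreground).
    A foreground pixel counts as boundary if any of its four direct
    neighbors is background or lies outside the image.
    """
    height, width = len(mask), len(mask[0])
    contour = set()
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                outside = not (0 <= rr < height and 0 <= cc < width)
                if outside or not mask[rr][cc]:
                    contour.add((r, c))
                    break
    return contour

# Toy 5x5 mask with a 3x3 foreground square in the middle.
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
contour = boundary_pixels(toy_mask)
print(len(contour))  # 8: every square pixel except the center
```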

The intended test mode is leave-one-object-out cross-validation. This means we train with 79 objects and test with the one unknown object. Recognition is considered successful if the correct category label is assigned. The results are averaged over all 80 possible test objects. We use the database for a best-case analysis: categorization of unknown objects under the same viewing conditions, with a near-perfect figure-ground segmentation, and known scale. In a practical application, such perfect information is seldom available. But if an algorithm does not work under these ideal conditions, it is likely to fail in practice.
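The leave-one-object-out protocol can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code used for the published experiments: the sample layout `(object_id, category, feature)`, the per-view scoring within each held-out object, the nearest-neighbor classifier, and the toy data are all hypothetical choices made for the example.

```python
from collections import defaultdict

def leave_one_object_out(samples, classify):
    """Average categorization rate over all held-out objects.

    samples:  list of (object_id, category, feature) tuples, one per image.
    classify: callable(train_samples, feature) -> predicted category.
    """
    views_by_object = defaultdict(list)
    for sample in samples:
        views_by_object[sample[0]].append(sample)

    rates = []
    for held_out, test_views in views_by_object.items():
        # Train on every image that does not show the held-out object.
        train = [s for s in samples if s[0] != held_out]
        correct = sum(classify(train, feat) == cat for _, cat, feat in test_views)
        rates.append(correct / len(test_views))
    return sum(rates) / len(rates)

def nearest_neighbor(train, feature):
    """Toy classifier: category of the closest training feature (squared L2)."""
    best = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[2], feature)))
    return best[1]

# Toy data: two categories, two objects each, two views per object.
toy = [
    ("apple1", "apple", (0.0, 0.1)), ("apple1", "apple", (0.1, 0.0)),
    ("apple2", "apple", (0.2, 0.1)), ("apple2", "apple", (0.1, 0.2)),
    ("cup1", "cup", (5.0, 5.1)), ("cup1", "cup", (5.1, 5.0)),
    ("cup2", "cup", (5.2, 5.1)), ("cup2", "cup", (5.1, 5.2)),
]
accuracy = leave_one_object_out(toy, nearest_neighbor)
print(accuracy)  # 1.0 on this well-separated toy data
```

Holding out all views of an object at once matters: if other views of the test object stayed in the training set, the task would reduce to identifying a known object rather than categorizing an unknown one.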

We have used this database to compare different methods for object categorization. In particular, we want to address the question of what roles color, texture, and shape play in this task. For this reason, we have analyzed the performance of several state-of-the-art appearance- and contour-based recognition methods on the database categories. A detailed description of the experiments can be found here.


References:

[Brown58] R. Brown, "How Shall a Thing be Called?". Psychological Review, 65:14-21, 1958.
[Rosch76] E. Rosch, C. Mervis, W. Gray, D. Johnson, and P. Boyes-Braem, "Basic Objects in Natural Categories". Cognitive Psychology, 8:382-439, 1976.
[Barsalou83] L.W. Barsalou, "Ad-hoc Categories". Memory and Cognition, 11:211-227, 1983.
[Lakoff87] G. Lakoff, Women, Fire and Dangerous Things - What Categories Reveal about the Mind. Univ. of Chicago Press, 1987.
[Murase95] H. Murase and S.K. Nayar, "Visual Learning and Recognition of 3D Objects from Appearance". International Journal of Computer Vision, 14:5-24, 1995.

Contact:

Bastian Leibe (leibeATinformatik.tu-darmstadt.de)
Bernt Schiele (schieleATinformatik.tu-darmstadt.de)

Last update: June 16, 2003 by Bastian Leibe

by webmfritz last modified 2006-01-16 16:34