
Context-Aware Meeting Recorder

Imagine you could create an audio-visual record of your entire life. Surprisingly, this would require only about 500 TB of data (assuming 100 years of recording, 24 hours a day, at 10 MB per minute). With current improvements in storage technology, this capacity will be available to the average user in the foreseeable future.
However, retrieving such data is not trivial. Humans do not retrieve information by date and time, but rather by associating items of information with each other. This project addresses this issue by recording not only audio and video but also contextual information, such as the user's activity and the flow of discussion in a meeting. This makes it possible to distinguish different phases of a meeting, such as discussion, presentation, or breaks, and to find specific comments by particular meeting participants.
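
The storage figure quoted above can be checked with a few lines of arithmetic. The sketch below assumes 365-day years and decimal units (1 TB = 1,000,000 MB); both are assumptions made here, not statements from the project.

```python
# Back-of-the-envelope check of the lifetime recording estimate.
YEARS = 100
MINUTES_PER_YEAR = 365 * 24 * 60   # assumption: leap days are ignored
MB_PER_MINUTE = 10

total_mb = YEARS * MINUTES_PER_YEAR * MB_PER_MINUTE
total_tb = total_mb / 1_000_000    # assumption: decimal units, 1 TB = 10^6 MB

print(f"{total_tb:.1f} TB")        # prints "525.6 TB"
```

The result is slightly above the rounded 500 TB figure, which is consistent with it being an order-of-magnitude estimate.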

The context is recorded using two different sensor systems: a network of body-worn acceleration sensors and a microphone. The acceleration sensor network is depicted on the left. It is used to acquire information about the user's activity, such as walking, sitting, or standing. This information helps to tell different phases of a meeting apart: during a break or a presentation the user is most likely standing, while during the rest of the meeting the user is probably sitting. These are important cues for finding information in an associative way.
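
To illustrate how such activity cues could be turned into searchable annotations, the following minimal sketch collapses a stream of per-frame activity labels into segments and attaches a phase hint to each. It assumes the activity labels have already been produced by some classifier. The function name, the label strings, and the `PHASE_HINT` table are hypothetical, and the table only encodes the sitting/standing heuristic described above.

```python
from itertools import groupby

# Hypothetical cue table derived from the heuristic in the text:
# standing suggests a break or presentation, sitting the rest of the meeting.
PHASE_HINT = {
    "sit": "seated part of meeting",
    "stand": "break or presentation",
}


def segment_activities(labels, seconds_per_label=1.0):
    """Collapse per-frame activity labels into (start, end, activity, hint) tuples."""
    segments = []
    position = 0
    for activity, run in groupby(labels):
        length = len(list(run))
        start = position * seconds_per_label
        end = (position + length) * seconds_per_label
        # Activities without a known cue (e.g. "walk") get the hint "unknown".
        segments.append((start, end, activity, PHASE_HINT.get(activity, "unknown")))
        position += length
    return segments


if __name__ == "__main__":
    stream = ["sit"] * 3 + ["stand"] * 2 + ["walk"]
    for segment in segment_activities(stream):
        print(segment)
```

A retrieval tool could then index these segments instead of raw timestamps, so that a query such as "the part where I was standing" maps to a time range.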

The second sensor is a microphone. Apart from making the actual recording, it provides two kinds of additional information that are computed on the audio stream. First, it can be determined whether the user was speaking or not. This makes it possible to find stretches of the meeting in which the user was actively participating rather than only listening. Second, a speaker identification algorithm makes it possible to find statements by particular speakers during a meeting. It also makes it possible to distinguish a presentation (mainly one speaker) from a discussion (several speakers who change often).
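
The presentation-versus-discussion distinction can be sketched as a simple rule on the output of speaker identification. The example below is an illustration under stated assumptions, not the project's method: it assumes one speaker label per audio frame (with `None` for silence), and the function name and the 0.8 dominance threshold are hypothetical.

```python
from collections import Counter


def classify_segment(speaker_labels, dominance_threshold=0.8):
    """Return (kind, dominant_share, speaker_changes) for one segment.

    kind is "presentation" if one speaker accounts for at least
    `dominance_threshold` of the voiced frames, "discussion" otherwise,
    and "silence" if no frame carries a speaker label.
    """
    voiced = [s for s in speaker_labels if s is not None]
    if not voiced:
        return ("silence", 0.0, 0)
    _, top_count = Counter(voiced).most_common(1)[0]
    share = top_count / len(voiced)
    # Count how often the speaker changes between consecutive voiced frames.
    changes = sum(1 for a, b in zip(voiced, voiced[1:]) if a != b)
    kind = "presentation" if share >= dominance_threshold else "discussion"
    return (kind, share, changes)


if __name__ == "__main__":
    print(classify_segment(["A"] * 9 + ["B"]))
    print(classify_segment(["A", "B", "A", "C", "B", "A"]))
```

The number of speaker changes is returned alongside the label because frequent changes are the second cue the text mentions for a discussion; a real system might combine both.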

These additional annotations are combined in a common retrieval tool. The picture on the right shows an example screenshot. The tool allows the user to easily find and select the parts of the recording that are of interest. A special algorithm supports browsing large audio recordings by trading the time precision of the speaker identification against its error rate: over a long stretch (e.g. 1 hour), fine time precision matters little, because the user will want to do a finer search on a shorter stretch anyway. Over a short stretch, however, the time precision must be fine, and a higher error rate is acceptable.
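
The page does not describe the algorithm itself, so the following is only one simple way to illustrate the trade-off between time precision and error rate: majority voting over windows of frame-level speaker labels. The function name `coarsen` and the window sizes are hypothetical. Larger windows give coarser time resolution but let isolated misclassified frames be outvoted.

```python
from collections import Counter


def coarsen(frame_labels, window):
    """Majority-vote frame-level speaker labels into blocks of `window` frames."""
    if window < 1:
        raise ValueError("window must be at least 1")
    blocks = []
    for start in range(0, len(frame_labels), window):
        chunk = frame_labels[start:start + window]
        # The most frequent label in the window represents the whole block.
        blocks.append(Counter(chunk).most_common(1)[0][0])
    return blocks


if __name__ == "__main__":
    # Speaker A talks for five frames, then speaker B; in this made-up
    # example, frames 2 and 8 are treated as misclassified.
    frames = ["A", "A", "B", "A", "A", "B", "B", "B", "A", "B"]
    print(coarsen(frames, 1))  # fine time precision, errors remain visible
    print(coarsen(frames, 5))  # coarse time precision, errors are outvoted
```

When browsing a long stretch, a large window would be used; zooming into a short stretch corresponds to a small window, where boundaries are located more precisely at the cost of more label errors.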

We have shown how these personalized annotations can be generated automatically and used for retrieval. We believe that this kind of personal annotation is a very interesting application for wearable computing technology and for retrieval applications.


Publications:


Video: 

Demonstration of Body-Worn Acceleration Sensor Signals (AVI, 9.5 MB)


Contact:

Nicky Kern (kern@inf.ethz.ch), Bernt Schiele
Holger Junker (junker@ife.ee.ethz.ch), Paul Lukowicz (lukowicz@ife.ee.ethz.ch), Gerhard Tröster (troester@ife.ee.ethz.ch)
by webmfritz last modified 2005-10-06 12:15