Context-Aware Meeting Recorder
Imagine you could create an audio-visual record of your entire
life. Surprisingly, this would require only about 500 TB of data
(assuming 100 years of recording, 24 hours a day, at 10 MB per
minute). With current improvements in storage technology, this
capacity will be available to the average user in the foreseeable
future.
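
The storage figure can be checked with a few lines of arithmetic. The sketch below assumes 365-day years and decimal units (1 TB = 1,000,000 MB); the function and constant names are illustrative only.

```python
# Back-of-the-envelope check of the lifetime storage estimate.
YEARS = 100
DAYS_PER_YEAR = 365            # assumption: leap days ignored
MINUTES_PER_DAY = 24 * 60
MB_PER_MINUTE = 10             # recording rate quoted in the text


def lifetime_storage_tb(years=YEARS, mb_per_minute=MB_PER_MINUTE):
    """Return the total recording size in terabytes (1 TB = 10**6 MB)."""
    minutes = years * DAYS_PER_YEAR * MINUTES_PER_DAY
    return minutes * mb_per_minute / 1_000_000


print(f"{lifetime_storage_tb():.1f} TB")   # prints "525.6 TB"
```

The result, roughly 526 TB, agrees with the rounded figure of 500 TB.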
However, retrieving such data is not trivial. Humans do not
retrieve information by date and time, but rather associate items of
information with each other. This project addresses this issue
by recording not only audio and video, but also contextual
information, such as the user's activity and the flow of discussion
in a meeting. This makes it possible to distinguish different phases
of a meeting, such as discussion, presentation, or breaks, and to
find specific comments by particular meeting participants.
The context is recorded using two different sensor systems: a network
of body-worn acceleration sensors and a microphone. The acceleration
sensor network is depicted on the left. It is used to acquire
information about the activity of the user, such as walking, sitting,
or standing. This information helps to tell apart different phases of
a meeting: in a break or during a presentation the user is most
likely to stand, while during the rest of the meeting he will
probably sit. These are important cues for finding information in an
associative way.
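
This page does not describe how the activity is actually derived from the acceleration signals. Purely as an illustration, the sketch below labels one window of samples from a single hypothetical sensor on the upper leg: strong variation suggests walking, and for a static window the orientation of the thigh relative to gravity separates standing from sitting. The sensor placement, the function name, and both thresholds are assumptions, not values from the project.

```python
from statistics import mean, pvariance


def classify_window(upper_leg_axis, walk_variance=0.05, upright_mean=0.7):
    """Label one window of upper-leg accelerometer samples (in g).

    upper_leg_axis holds readings along the long axis of the thigh.
    Upright, this axis is aligned with gravity (about 1 g); seated,
    the thigh is roughly horizontal (about 0 g).
    Both thresholds are illustrative guesses.
    """
    if pvariance(upper_leg_axis) > walk_variance:
        return "walking"       # large variation: the leg is moving
    if mean(upper_leg_axis) > upright_mean:
        return "standing"      # static, thigh roughly vertical
    return "sitting"           # static, thigh roughly horizontal


print(classify_window([0.98, 1.01, 0.99, 1.00]))   # standing
print(classify_window([0.05, 0.02, 0.04, 0.03]))   # sitting
print(classify_window([0.40, 1.60, 0.50, 1.50]))   # walking
```

A real system would combine several sensors and a trained classifier; the point here is only that posture and motion leave distinct signatures in the signal.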
The second sensor is a microphone. Apart from making the actual
recording, it provides the audio stream from which two kinds of
additional information are computed. First, the system detects
whether the user was speaking or not. This makes it possible to find
stretches of the meeting in which he was actively participating
rather than only listening. Second, a speaker identification
algorithm makes it possible to find statements by particular speakers
during a meeting. It also helps to distinguish a presentation (mainly
one speaker) from a discussion (several speakers who change often).
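
The distinction between a presentation and a discussion can be sketched as a simple rule over a sequence of speaker labels. The example assumes that some speaker identification step has already produced one label per audio segment; the function name and both thresholds are hypothetical and are not taken from the project.

```python
from collections import Counter


def classify_phase(speaker_labels, dominance=0.8, max_change_rate=0.1):
    """Guess whether a stretch of a meeting is a presentation or a discussion.

    speaker_labels: per-segment speaker IDs in time order (assumed input).
    dominance: minimum share of segments held by the main speaker.
    max_change_rate: maximum speaker changes per segment for a presentation.
    """
    if not speaker_labels:
        raise ValueError("speaker_labels must not be empty")
    top_count = Counter(speaker_labels).most_common(1)[0][1]
    top_share = top_count / len(speaker_labels)
    # Count how often the speaker differs between consecutive segments.
    changes = sum(a != b for a, b in zip(speaker_labels, speaker_labels[1:]))
    change_rate = changes / len(speaker_labels)
    if top_share >= dominance and change_rate <= max_change_rate:
        return "presentation"
    return "discussion"


print(classify_phase(["A"] * 18 + ["B"] * 2))           # presentation
print(classify_phase(["A", "B", "A", "C", "B", "A"]))   # discussion
```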
These additional annotations are combined in a common retrieval tool.
The picture on the right shows an example screenshot. The tool allows
the user to easily find and select the parts of the recording that he
is looking for. A special algorithm supports browsing large audio
recordings by trading the time precision of the speaker
identification against its error rate: over a long stretch (e.g.
1 hour), fine time precision matters little, because the user will
want to do a finer search on a shorter stretch anyway. Over a short
stretch, however, the time precision must be fine, and a higher error
rate is acceptable.
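
The algorithm itself is not described on this page. One simple way to obtain such a trade-off, shown here only as an assumed illustration, is to take a majority vote over per-frame speaker labels in windows of adjustable length: long windows suppress isolated misclassifications but blur the time boundaries, while short windows keep the boundaries sharp and let more errors through. The function name and the input format are hypothetical.

```python
from collections import Counter


def smooth_labels(frame_labels, window):
    """Majority-vote per-frame speaker labels in non-overlapping windows.

    Returns a list of (start_frame, end_frame, label) tuples, where
    end_frame is exclusive. A larger window gives coarser time
    precision but fewer spurious labels.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    segments = []
    for start in range(0, len(frame_labels), window):
        chunk = frame_labels[start:start + window]
        label = Counter(chunk).most_common(1)[0][0]
        segments.append((start, start + len(chunk), label))
    return segments


# Twelve frames with two isolated misclassifications (frames 2 and 8).
frames = ["A", "A", "B", "A", "A", "A", "B", "B", "A", "B", "B", "B"]
print(smooth_labels(frames, window=6))   # [(0, 6, 'A'), (6, 12, 'B')]
print(len(smooth_labels(frames, window=1)))   # 12
```

A browsing tool could choose the window length from the length of the stretch currently on screen, using long windows for an hour-long overview and short ones after zooming in.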
We have shown how these personalized annotations can be generated
automatically and used for retrieval. We believe that this kind of
personal annotation is a very interesting application for wearable
computing technology and for retrieval applications.
Publications:
- Wearable Sensing to Annotate Meeting Recordings. Nicky Kern,
  Bernt Schiele, Holger Junker, Paul Lukowicz, and Gerhard Tröster.
  To appear in: The 6th International Symposium on Wearable
  Computers, ISWC 2002, Seattle, Washington, USA, October 2002.
- Wearable Sensing to Annotate Meeting Recordings. Nicky Kern,
  Bernt Schiele, Holger Junker, Paul Lukowicz, and Gerhard Tröster.
  In: Personal and Ubiquitous Computing: Selected papers from the
  ISWC2002 Conference, 2003.
Video:
Demonstration of Body-Worn Acceleration Sensor
Signals (AVI, 9.5 MB)
Contact:
Nicky Kern (kern@inf.ethz.ch),
Bernt Schiele,
Holger Junker (junker@ife.ee.ethz.ch),
Paul Lukowicz (lukowicz@ife.ee.ethz.ch),
Gerhard Tröster (troester@ife.ee.ethz.ch)