MPEG-7-based Audio Annotation for the Archival of Digital Video
|Project Manager ||Prof. Dr.-Ing. Thomas Sikora|
|Funded by ||BMWA (German Federal Ministry of Economics and Labour)|
|Project Period ||11/2002 - 03/2005|
MPEG-7 is a standardisation initiative of the Moving Picture Experts Group (MPEG) that, instead of focusing on coding like MPEG-1, MPEG-2 and MPEG-4, standardises the way multimedia content is described (see also: MPEG-7 Link list).
This project is part of a larger one, called MPEG-7-based Archival of Digital Video. Its objective is a complete audio-visual database management platform that can segment, index and retrieve audio-visual data, based on MPEG-7 "descriptors" and tools.
Two other partners are involved:
- Heinrich-Hertz-Institut (HHI), which addresses the analysis of visual information
(MPEG-7-based Analysis and Visualisation Modules for the Archival of Digital Video)
- Canto Software, which addresses the general structure of the archival system
(MPEG-7-based Metadata Indexing Methods for the Archival of Digital Video)
Our part of the project concerns the segmentation, indexing and retrieval of audio information.
We focus on three main tasks:
- Audio Segmentation
Audio recordings are segmented and classified into coarse sound classes (voice, music, environmental sounds and silence) based on MPEG-7 Low Level Descriptors (LLDs).
- Sound Recognition and Classification
The MPEG-7 sound recognition tools provide a unified interface for searching media by automatically indexing audio with trained sound classes in a pattern recognition framework. We develop sound recognition systems that use (1) reduced-dimension features based on Independent Component Analysis (ICA) and (2) Hidden Markov Model (HMM) classifiers.
- Spoken Content Indexing and Retrieval
The MPEG-7 Spoken Content Description Tools allow detailed description of words and/or phones spoken within an audio stream. The Spoken Content Descriptor is a compact representation of the output of an Automatic Speech Recognition (ASR) system.
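To give a rough feel for the Audio Segmentation task listed above, the following Python sketch splits a signal into "silence" and "sound" runs using a per-frame power value, loosely in the spirit of a low-level power descriptor. It is only an illustration, not the project's method: the function names, the frame length and the threshold are assumptions made for this example, and a real system would use the full set of MPEG-7 LLDs and trained classifiers to separate voice, music and environmental sounds.

```python
import math

def frame_power(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (illustrative feature)."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers

def segment_by_power(samples, frame_len=4, threshold=1e-3):
    """Group consecutive frames with the same label into (label, first, last) runs.

    frame_len and threshold are assumed values chosen for the toy signal below.
    """
    labels = ["silence" if p < threshold else "sound"
              for p in frame_power(samples, frame_len)]
    segments = []
    for i, label in enumerate(labels):
        if segments and segments[-1][0] == label:
            # Extend the current run to include this frame.
            segments[-1] = (label, segments[-1][1], i)
        else:
            # Start a new run at this frame.
            segments.append((label, i, i))
    return segments

# Toy signal: 8 silent samples, 8 samples of a sine tone, 4 silent samples.
signal = ([0.0] * 8
          + [math.sin(2 * math.pi * n / 8) for n in range(8)]
          + [0.0] * 4)
print(segment_by_power(signal))
```

Each returned tuple gives a label and the indices of the first and last frame in that run. Replacing the threshold test with a trained classifier over richer features would be the natural next step towards the coarse sound classes described above.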