MPEG-7-based Audio Annotation for the Archival of Digital Video
| Project Manager | Prof. Dr.-Ing. Thomas Sikora |
|---|---|
| Funded by | BMWA (German Federal Ministry of Economics and Labour) |
| Project Period | 11/2002 - 03/2005 |
Abstract
MPEG-7 is a standardisation initiative of the Moving Picture Experts Group (MPEG) that, instead of focusing on coding like MPEG-1, MPEG-2 and MPEG-4, aims to standardise the way multimedia content is described (see also: MPEG-7 Link list).
This project is part of a larger one, called MPEG-7-based Archival of Digital Video. The objective of the larger project is a complete audio-visual database management platform that can segment, index and retrieve audio-visual data using MPEG-7 "descriptors" and tools.
Two other partners are involved:
- Heinrich-Hertz-Institut (HHI), which addresses the analysis of visual information
  (MPEG-7-based Analyse and Visualisation Modules for the Archival of Digital Video)
- Canto Software, which addresses the general structure of the archival system
  (MPEG-7-based Metadata Indexing Methods for the Archival of Digital Video)
Our part of the project concerns the segmentation, indexing and retrieval of audio information.
We focus on three main tasks:
- Audio Segmentation
  Audio recordings are segmented and classified into coarse sound classes (voice, music, environmental sounds and silence) based on MPEG-7 Low Level Descriptors (LLDs).
- Sound Recognition and Classification
  The MPEG-7 sound recognition tools provide a unified interface for searching media by automatically indexing audio with trained sound classes in a pattern recognition framework. We develop sound recognition systems that use (1) reduced-dimension features based on Independent Component Analysis (ICA) and (2) Hidden Markov Model (HMM) classifiers.
- Spoken Content Indexing and Retrieval
  The MPEG-7 Spoken Content Description Tools allow detailed description of words and/or phones spoken within an audio stream. The Spoken Content Descriptor is a compact representation of the output of an Automatic Speech Recognition (ASR) system.
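
To illustrate the audio segmentation task listed above, the following is a minimal sketch of segmenting a signal with one low-level feature. It is not the project's implementation: it separates only silence from non-silence, it uses a plain per-frame mean squared amplitude as a stand-in for a power-style descriptor, and the function names (`frame_power`, `segment_by_power`), the frame length and the threshold are all assumptions chosen for the example.

```python
import math

def frame_power(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (a power-style feature)."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers

def segment_by_power(powers, threshold):
    """Label each frame, then merge runs of equal labels into
    (label, first_frame, last_frame) segments."""
    segments = []
    for i, p in enumerate(powers):
        label = "silence" if p < threshold else "sound"
        if segments and segments[-1][0] == label:
            # Extend the current segment to include this frame.
            segments[-1] = (label, segments[-1][1], i)
        else:
            segments.append((label, i, i))
    return segments

# Synthetic test signal (hypothetical data): 4 silent frames,
# 4 frames of a sine tone, then 2 silent frames.
frame_len = 100
tone = [0.5 * math.sin(2 * math.pi * 5 * n / frame_len) for n in range(4 * frame_len)]
signal = [0.0] * (4 * frame_len) + tone + [0.0] * (2 * frame_len)

powers = frame_power(signal, frame_len)
segments = segment_by_power(powers, threshold=1e-4)
print(segments)
# [('silence', 0, 3), ('sound', 4, 7), ('silence', 8, 9)]
```

Distinguishing voice, music and environmental sounds would need further descriptors and a trained classifier; the thresholding step here only shows how frame-level features turn into time segments.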
| Name |
|---|
| Prof. Dr.-Ing. Thomas Sikora |
| Dr.-Ing. Hyoung-Gook Kim |
| Dr. Nicolas Moreau |
| Dipl.-Ing. Samour Amjad |
| Name |
|---|
| Edgar Berdahl |
| Juan José Burred |
| Andreas Cobet |
| Yuanfeng Cui |
| Daniel Ertelt |
| Martin Haller |
| Shan Jin |
| Steffen Roeber |
| Eric Yimnga |
