MPEG-7-based Audio Annotation for the Archival of Digital Video

Project Data
Project Manager: Prof. Dr.-Ing. Thomas Sikora
Funded by: BMWA (German Federal Ministry of Economics and Labour)
Project Period: 11/2002 - 03/2005


MPEG-7 is a standardisation initiative of the Moving Picture Experts Group (MPEG). Unlike MPEG-1, MPEG-2 and MPEG-4, it does not focus on coding; instead, it standardises the way multimedia content is described (see also: MPEG-7 Link list).
This project is part of a larger one, MPEG-7-based Archival of Digital Video, whose objective is a complete audio-visual database management platform that can segment, index and retrieve audio-visual data using MPEG-7 "descriptors" and tools.
Two other partners are involved:

  • Heinrich-Hertz-Institut (HHI), which addresses the analysis of visual information
    (MPEG-7-based Analyse and Visualisation Modules for the Archival of Digital Video).
  • Canto Software, which addresses the general structure of the archival system
    (MPEG-7-based Metadata Indexing Methods for the Archival of Digital Video).

Our part of the project concerns the segmentation, indexing and retrieval of audio information.
We focus on three main tasks:

  • Audio Segmentation
    Audio recordings are segmented and classified into coarse sound classes (voice, music, environmental sounds and silence) based on MPEG-7 Low Level Descriptors (LLDs).
  • Sound Recognition and Classification
    The MPEG-7 sound recognition tools provide a unified interface for searching media: audio is indexed automatically using trained sound classes in a pattern recognition framework. We develop sound recognition systems that use (1) reduced-dimension features based on Independent Component Analysis (ICA) and (2) Hidden Markov Model (HMM) classifiers.
  • Spoken Content Indexing and Retrieval
    The MPEG-7 Spoken Content Description Tools allow detailed description of words and/or phones spoken within an audio stream. The Spoken Content Descriptor is a compact representation of the output of an Automatic Speech Recognition (ASR) system.
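
The audio segmentation task above can be illustrated with a minimal sketch. It is not the project's implementation: it computes only a frame-wise power value (similar in spirit to the MPEG-7 AudioPower low level descriptor) and separates silence from non-silence, whereas the project distinguishes four classes. The function names, the frame length, the threshold and the synthetic test signal are all assumptions chosen for illustration.

```python
import math

def frame_power(samples, frame_len, hop):
    """Mean squared amplitude per frame (an assumed stand-in for a power-type LLD)."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers

def segment_by_power(powers, threshold):
    """Merge consecutive frames with the same label into (label, first, last) segments."""
    segments = []
    for i, p in enumerate(powers):
        label = "silence" if p < threshold else "sound"
        if segments and segments[-1][0] == label:
            # Extend the current segment to include this frame.
            segments[-1] = (label, segments[-1][1], i)
        else:
            segments.append((label, i, i))
    return segments

# Synthetic signal (assumption): 0.5 s silence, 0.5 s of a 440 Hz tone, 0.5 s silence at 8 kHz.
rate = 8000
silence = [0.0] * (rate // 2)
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 2)]
signal = silence + tone + silence

powers = frame_power(signal, frame_len=400, hop=400)   # 50 ms non-overlapping frames
segments = segment_by_power(powers, threshold=1e-4)    # threshold is an arbitrary choice
print(segments)
```

A real system would compute several descriptors per frame and pass them to a trained classifier rather than apply a single fixed threshold.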

Research Fields


  • Low Level Descriptors
  • Sound Recognition
  • Spoken Content
  • Speech Processing
Prof. Dr.-Ing. Thomas Sikora

Dr.-Ing. Hyoung-Gook Kim

Dr. Nicolas Moreau

Dipl.-Ing. Samour Amjad

Dipl.-Ing. Juan José Burred

Martin Haller

Steffen Roeber

Shan Jin

Daniel Ertelt

Eric Yimnga

Andreas Cobet

Yuanfeng Cui

Former Students
Edgar Berdahl
