TU Berlin

Communication Systems Group: Multimedia Analysis and Processing

Multimedia Analysis and Processing

The Communication Systems Group of the TU Berlin, led by Prof. Dr. Thomas Sikora, has a strong track record in multimedia analysis and processing, with more than 50 publications in this field, and has been involved in many nationally and internationally funded projects related to multimedia analysis and processing. The following list gives an overview of the research areas we are involved in:

  • Video surveillance, people safety and privacy protection
  • Multi-object tracking
  • Optical flow estimation
  • Crowd analysis and people counting
  • Lost luggage detection
  • Violent behaviour detection
  • Genre and commercial detection in TV broadcast
  • Geo-tagging
  • Face detection and recognition
  • Image and video segmentation
  • MPEG-7
  • Analysis and classification of speech-, audio- and video data
  • etc.

Research Activities

IOU Tracker

Tracking-by-detection is a common approach to multi-object tracking. With the ever-increasing performance of object detectors, the basis for a tracker becomes much more reliable. Combined with the commonly higher frame rates, this shifts the challenges a successful tracker faces. We propose a very simple tracking algorithm that can compete with more sophisticated approaches at a fraction of the computational cost. Thorough experiments with a wide range of object detectors show its potential. The proposed method can easily run at thousands of frames per second (fps) while outperforming the state of the art on the DETRAC vehicle tracking dataset, and it achieves competitive results on the MOT17 benchmark.
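The core idea is to associate detections across frames purely by their intersection-over-union (IoU) overlap, with no motion model or appearance features. The following is a minimal sketch of that association step; the function names and the greedy matching details are our own simplification, not the released implementation:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def iou_track(frames, sigma_iou=0.5):
    """Greedy frame-to-frame association by IoU only.

    `frames` is a list of per-frame detection lists; a track is simply
    the list of boxes assigned to one object over time.
    """
    active, finished = [], []
    for detections in frames:
        dets = list(detections)
        still_active = []
        for track in active:
            best = max(dets, key=lambda d: iou(track[-1], d), default=None)
            if best is not None and iou(track[-1], best) >= sigma_iou:
                track.append(best)       # extend the track with the best match
                dets.remove(best)
                still_active.append(track)
            else:
                finished.append(track)   # no sufficient overlap: terminate
        # unmatched detections start new tracks
        active = still_active + [[d] for d in dets]
    return finished + active
```

Because every step is a handful of overlap computations per track, the method needs no image data at all at tracking time, which is what makes thousands of fps feasible.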

Robust Local Optical Flow Estimation

The Robust Local Optical Flow (RLOF) is a sparse optical flow and feature tracking method. Its main objective is to provide a fast and accurate motion estimation solution. The main advantage of the RLOF approach is its adjustable runtime and computational complexity, which, in contrast to most common optical flow methods, depends linearly on the number of motion vectors (features) to be estimated. RLOF is thus a local optical flow method, most closely related to the pyramidal Lucas-Kanade (PLK) method, better known as the KLT tracker, and hence to the famous Lucas-Kanade method. A sparse-to-dense interpolation scheme allows for fast computation of dense optical flow fields.
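RLOF builds on the local Lucas-Kanade formulation, replacing its least-squares estimator with a robust one. To make the local model concrete, here is a minimal, non-robust Lucas-Kanade step for a single point in plain NumPy (a sketch of the underlying principle only, not the RLOF algorithm):

```python
import numpy as np

def lucas_kanade_point(prev, curr, x, y, win=7):
    """Estimate the flow (u, v) at one point by solving the normal
    equations of the local brightness-constancy model
    Ix*u + Iy*v + It = 0 over a small window (least squares)."""
    h = win // 2
    p0 = prev[y - h:y + h + 1, x - h:x + h + 1].astype(float)
    p1 = curr[y - h:y + h + 1, x - h:x + h + 1].astype(float)
    # spatial gradients of the first patch, temporal difference between patches
    Ix = np.gradient(p0, axis=1)
    Iy = np.gradient(p0, axis=0)
    It = p1 - p0
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    # least-squares solution; RLOF would use a robust norm here instead
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```

Because each feature is solved from its own small window, the cost of the whole estimation grows linearly with the number of features, which is the property the paragraph above refers to.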

Lagrangian-based Video Analytics

We aim for innovative ways to process and use dynamic patterns in video motion to quantify salient motion features and thus improve computer vision performance for tasks such as identification, segmentation, and classification. The proposed methodology provides a powerful set of data-driven descriptors for continuous and integral motion analysis on variable temporal scales (i.e., for short-term as well as long-term motion features).
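Lagrangian measures are computed by integrating motion along trajectories rather than from a single flow field. As an illustration of this integral view (a simplified sketch under our own assumptions, not the group's descriptors), the following advects a grid of points through a sequence of dense flow fields and returns the accumulated displacement per start point:

```python
import numpy as np

def integrate_pathlines(flows, n_steps):
    """Advect a regular grid of points through `n_steps` dense flow
    fields of shape (h, w, 2) and return the magnitude of the total
    displacement per start point (a simple long-term motion feature)."""
    h, w = flows[0].shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    px, py = xs.copy(), ys.copy()
    for t in range(n_steps):
        # nearest-neighbour flow lookup keeps the sketch short;
        # a real implementation would interpolate bilinearly
        ix = np.clip(np.round(px).astype(int), 0, w - 1)
        iy = np.clip(np.round(py).astype(int), 0, h - 1)
        px += flows[t][iy, ix, 0]
        py += flows[t][iy, ix, 1]
    return np.hypot(px - xs, py - ys)
```

Varying `n_steps` gives the variable temporal scale mentioned above: small values capture short-term motion, large values integral long-term motion.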

Probability Hypothesis Density (PHD) Multi-Object-Tracking Filter

The Probability Hypothesis Density (PHD) filter is a multi-object Bayes filter that has recently attracted a lot of interest in the tracking community, mainly for its linear complexity and its ability to deal with high clutter, especially in radar/sonar scenarios. In the computer vision community, however, the underlying constraints differ from those of radar scenarios and have to be taken into account when using the PHD filter.
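In the common Gaussian-mixture implementation, a PHD measurement update reduces to reweighting the predicted mixture components, which is where the linear complexity comes from. A minimal one-dimensional sketch of that update (variable names, the scalar state, and the constant clutter intensity are illustrative assumptions):

```python
import math

def gm_phd_update(weights, means, variances, z, p_d=0.9, r=1.0, kappa=0.1):
    """One GM-PHD update for a single measurement z in 1-D.

    Each predicted component is kept once as a missed detection and once
    reweighted against z; the clutter intensity kappa enters only the
    normaliser, so the cost is linear in the number of components.
    """
    # missed-detection terms: component survives with weight (1 - p_d) * w
    new_w = [(1.0 - p_d) * w for w in weights]
    new_m = list(means)
    new_v = list(variances)
    # detection terms: Gaussian likelihood of z under each component
    q = [math.exp(-0.5 * (z - m) ** 2 / (v + r)) / math.sqrt(2 * math.pi * (v + r))
         for m, v in zip(means, variances)]
    denom = kappa + p_d * sum(w * qi for w, qi in zip(weights, q))
    for w, m, v, qi in zip(weights, means, variances, q):
        k = v / (v + r)                        # scalar Kalman gain
        new_w.append(p_d * w * qi / denom)
        new_m.append(m + k * (z - m))
        new_v.append((1.0 - k) * v)
    return new_w, new_m, new_v
```

Note there is no explicit data association: all components share the measurement, and the weights encode how plausible each hypothesis is.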

Hyper-Parameter Optimization for Convolutional Neural Networks Committees based on Evolutionary Algorithms

We propose an evolutionary algorithm-based framework to automatically optimize the CNN structure by means of its hyper-parameters. Further, we extend our framework towards a joint optimization of a committee of CNNs to leverage specialization and cooperation among the individual networks. Experimental results show a significant improvement over the state of the art on the well-established MNIST dataset for handwritten digit recognition.
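The actual framework evaluates fitness by training CNNs; to illustrate just the evolutionary loop, here is a toy version that evolves two continuous "hyper-parameters" against a known fitness function (the fitness, operators, and all names are stand-ins, not the paper's setup):

```python
import random

def evolve(fitness, bounds, pop_size=20, generations=30, mut_sigma=0.3, seed=0):
    """Simple elitist evolutionary search: binary tournament selection,
    Gaussian mutation clipped to the bounds, survival of the best."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            a, b = rng.sample(pop, 2)                  # binary tournament
            parent = a if fitness(a) >= fitness(b) else b
            child = [min(hi, max(lo, g + rng.gauss(0.0, mut_sigma)))
                     for g, (lo, hi) in zip(parent, bounds)]
            offspring.append(child)
        # elitist survival: keep the best pop_size of parents + offspring
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)

# toy fitness with its peak at (0.3, 0.7), standing in for validation accuracy
def toy_fitness(h):
    return -((h[0] - 0.3) ** 2 + (h[1] - 0.7) ** 2)
```

In the paper's setting, each genome would encode CNN hyper-parameters and the (expensive) fitness would be the trained network's validation performance; the committee extension jointly scores groups of genomes.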

Background Subtraction / Foreground Detection / Static-Object Detection

Gaussian mixture models have been extensively used and enhanced in the surveillance domain because of their ability to adaptively describe multimodal distributions in real time with low memory requirements. Nevertheless, they still often converge to poor solutions if the main mode stretches and thus over-dominates weaker distributions. We propose complementary background models for background modelling and for detecting static and moving objects in crowded video sequences.
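A single-Gaussian per-pixel model already shows the adaptive mechanism that the mixture generalizes: pixels far from the model are foreground, and the model is updated only where the scene looks like background. This is a deliberately reduced sketch, not the proposed SGMM-SOD method:

```python
import numpy as np

class AdaptiveGaussianBackground:
    """Per-pixel running Gaussian background model.

    A pixel is foreground if it deviates from the model mean by more
    than k standard deviations; mean and variance are updated only at
    background pixels (selective update)."""
    def __init__(self, alpha=0.05, k=2.5, init_var=15.0 ** 2):
        self.alpha, self.k, self.init_var = alpha, k, init_var
        self.mean = None

    def apply(self, frame):
        f = frame.astype(np.float64)
        if self.mean is None:               # first frame initializes the model
            self.mean = f.copy()
            self.var = np.full_like(f, self.init_var)
            return np.zeros(frame.shape, dtype=bool)
        d2 = (f - self.mean) ** 2
        fg = d2 > (self.k ** 2) * self.var
        bg = ~fg
        a = self.alpha
        self.mean[bg] += a * (f - self.mean)[bg]
        self.var[bg] += a * (d2 - self.var)[bg]
        return fg
```

The failure mode described above is visible here too: if the variance is allowed to grow without bound, the threshold stretches and weaker modes (in the mixture case) are absorbed, which is what the complementary models counteract.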

People Carrying Object Detection and Classification

Detecting and classifying objects carried by people is a problem known from surveillance scenarios. It can be used as a first step to monitor interactions between people and objects, such as depositing or removing an object. Research focuses on new machine learning approaches for pedestrian detection, new ways of feature representation, behavior analysis, and machine learning techniques for classification.

Multimodal Geo-Tagging

We present a hierarchical, multi-modal approach for placing Flickr videos on the map. Our approach makes use of external resources to identify toponyms in the metadata, and of visual and textual features to identify similar content. First, the geographical boundary extraction method identifies the country and its dimension. We use a database of more than 3.6 million Flickr images to group them into geographical regions and to build a hierarchical model. A fusion of visual and textual methods is used to classify a video's location into possible regions. Next, the visually nearest neighbour method finds correspondences with the training images within the pre-classified regions. The video sequences are represented using low-level feature vectors from multiple key frames. Each Flickr video is tagged with the geo-information of the visually most similar training item within the regions selected by the pre-classification step. The results show that we are able to tag one third of our videos correctly within an error of 1 km.
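The final matching step can be sketched as a nearest-neighbour lookup restricted to the pre-classified region; feature vectors, region labels, and coordinates below are toy placeholders for the real visual features and region model:

```python
import numpy as np

def geo_tag(query_feat, query_region, train_feats, train_regions, train_coords):
    """Assign the query the geo-tag of the visually nearest training
    item, searching only within the pre-classified region."""
    idx = np.where(train_regions == query_region)[0]
    if idx.size == 0:                       # no training items in the region:
        idx = np.arange(len(train_feats))   # fall back to a global search
    d = np.linalg.norm(train_feats[idx] - query_feat, axis=1)
    return train_coords[idx[np.argmin(d)]]
```

The region pre-filter is what makes the hierarchy pay off: it both prunes the search space and prevents visually similar but geographically implausible matches.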

Consistent Two-Level Metric

Since the commonly used benchmarks for abandoned object detection (AOD) contain only few abandoned objects and lack a standardized evaluation procedure, an objective performance comparison between different methods is hard. Therefore, we propose a new evaluation metric focused on an end-user application case and an evaluation protocol that eliminates uncertainties in previous performance assessments.

Motion-based Object Segmentation

We present an unsupervised motion-based object segmentation algorithm for video sequences with a moving camera, employing bidirectional inter-frame change detection. For every frame, two error frames are generated using motion compensation; they are combined, and a thresholding-based segmentation algorithm is applied. We employ a simple and effective error fusion scheme and consider spatial error localization in the thresholding step. We find the optimal weights for the weighted-mean thresholding algorithm that enable unsupervised, robust moving object segmentation.
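Given motion-compensated predictions from the previous and next frame, the fusion and thresholding steps can be sketched as follows; the fixed weight and the global statistics-based threshold are illustrative simplifications (the paper derives the optimal weights and uses spatial error localization):

```python
import numpy as np

def segment_moving(frame, pred_from_prev, pred_from_next, w=0.5, k=2.0):
    """Fuse the two motion-compensation error frames by a weighted mean
    and threshold relative to the global error statistics."""
    e1 = np.abs(frame.astype(float) - pred_from_prev)   # backward error frame
    e2 = np.abs(frame.astype(float) - pred_from_next)   # forward error frame
    err = w * e1 + (1.0 - w) * e2
    thr = err.mean() + k * err.std()
    return err > thr
```

Using both temporal directions makes the error map symmetric around occlusions: a region uncovered in one direction is still well predicted in the other, which stabilizes the segmentation.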

Short-term Motion-based Object Segmentation

Motion-based segmentation approaches either rely on long-term motion information or suffer from a lack of accuracy and robustness. We present an automatic motion-based object segmentation algorithm for video sequences with a moving camera that employs short-term motion information only. For every frame, two error frames are generated using motion compensation; they are combined, and a thresholding segmentation algorithm is applied. Recent advances in the field of global motion estimation enable outlier elimination in the background area, and thus a more precise definition of the foreground is achieved. We propose a simple and effective error frame generation scheme and consider spatial error localization. We thereby achieve improved performance compared with a previously proposed short-term motion-based method and provide subjective as well as objective evaluation.

Datasets

TUB CrowdFlow Dataset

An optical flow dataset and benchmark for visual crowd analysis. This new optical flow dataset exploits the possibilities of a recent video engine to generate sequences with ground-truth optical flow for large crowds in different scenarios. We break with the last decade's trend of introducing ever larger displacements to pose new difficulties. Instead, we focus on real-world surveillance scenarios where numerous small, partly independent, non-rigidly moving objects observed over a long temporal range pose the challenge.

Multi-Object and Multi-Camera Tracking Dataset (MOCAT)

The TU Berlin Multi-Object and Multi-Camera Tracking Dataset (MOCAT) is a synthetic dataset to train and test tracking and detection systems in a virtual world. One of the key advantages of this dataset is its complete and accurate ground truth, including pixel-accurate object masks. All sequences are rendered three times, each with different illumination settings, which allows the influence of the illumination on the algorithm under test to be measured directly. For each sequence, 8 to 10 different camera views (including camera calibration information) with partly overlapping fields of view are available. The ground truth contains the world position of each object, so multi-camera tracking performance can be evaluated as well. All sequences contain vehicles, animals, and pedestrians as objects to detect and track.

Software

IOU Tracker @ GitHub

Tracking-by-detection is a common approach to multi-object tracking. The IOU Tracker is a very simple tracking algorithm that can compete with more sophisticated approaches at a fraction of the computational cost. This repository provides the Python implementation of the IOU Tracker.

Robust Local Optical Flow Library (RLOF) @ OpenCV contrib (4.1)

The Robust Local Optical Flow (RLOF) is a sparse optical flow and feature tracking method. We are delighted that it is now part of the OpenCV contrib library (4.1.0). The RLOF methods are motivated by the problem of local motion estimation via robust regression with linear models. The main objective is to provide a real-time capable, accurate, and scalable motion estimation solution. The software implements several versions of the RLOF algorithms for sparse and dense optical flow estimation.

Background Subtraction Library (SGMM-SOD)

We provide binaries of the SGMM-SOD library to help other researchers compare their results or use our work as a module in their research. The files contain a binary package for the Windows operating system and a minimal example of how to use the library. We have tried to keep the interface as simple as possible.

Evaluation framework for abandoned object detection

Since the commonly used benchmarks for abandoned object detection (AOD) contain only few abandoned objects and lack a standardized evaluation procedure, an objective performance comparison between different methods is hard. Therefore, we propose a new evaluation metric focused on an end-user application case and an evaluation protocol that eliminates uncertainties in previous performance assessments.

Awards

We were awarded the Lumière Award 2018 for the Best Paper

We are delighted to announce that we were awarded the Lumière Award for the Best Paper at the International Conference on 3D Immersion/Stereopsia in Brussels in December 2018! On behalf of the Advanced Imaging Society based in Hollywood, CA, Stereopsia organizes the competitions for the "Lumière Awards" for the territory consisting of Europe, the Middle East, and Africa (EMEA).

Runner Up Best Full Paper Award @ ICIDS 2018

We are delighted to announce that our paper "Director's Cut - Analysis of Aspects of Interactive Storytelling for VR Films" won the Runner Up Best Full Paper Award at the International Conference on Interactive Digital Storytelling, 05.12.2018 - 08.12.2018.

Challenge Winner IWOT4S @ AVSS 2018

We are delighted to announce that our IOU tracker won the IWOT4S Challenge for multi-object tracking for the second year in a row, at the International Workshop on Traffic and Street Surveillance for Safety and Security at IEEE AVSS 2018, Auckland, New Zealand, 27.11.2018.

We won the VisDrone 2018 Challenge @ ECCV!

We are delighted to announce that our V-IOU tracker won the VisDrone 2018 Challenge for multi-object tracking at the ECCV 2018 workshop "Vision Meets Drone: A Challenge" (or VisDrone2018, for short) on September 8, 2018, in Munich, Germany.

Challenge Winner IWOT4S @ AVSS 2017

We are delighted to announce that our IOU tracker won the IWOT4S Challenge for multi-object tracking at the International Workshop on Traffic and Street Surveillance for Safety and Security at IEEE AVSS 2017, Lecce, Italy, 29.08.2017.

Best Student Paper Award @ IEEE ICME 2017

We are delighted to announce that our paper "Steered mixture-of-experts for light field coding, depth estimation, and processing" won the Best Student Paper Award at the IEEE International Conference on Multimedia and Expo, 10.07.2017 - 14.07.2017. Congratulations to Ruben Verhack and the co-authors.

Prof. Thomas Sikora of Technical University Berlin receives prestigious 2016 Google Faculty Research Award in Machine Perception

Congratulations to Prof. Sikora and his Communication Systems Lab members at TU Berlin, Lieven Lange and Rolf Jongebloed, and to the colleagues from the Ghent University/iMinds lab: Ruben Verhack (joint PhD between TU Berlin and Ghent University/iMinds), Prof. Peter Lambert, and Dr. Glenn Van Wallendael. The award was given for his work on Video Compression with Steered-Mixture-of-Experts Networks.

Best Paper Award @ IET ICDP 2015

We are delighted to announce that our paper "A Local Feature based on Lagrangian Measures for Violent Video Classification" won the Best Paper Award at the IET International Conference on Imaging for Crime Detection and Prevention, 15.07.2015 - 17.07.2015. Congratulations to Tobias Senst and the co-authors.

Highly Recommended Paper Award @ PCS 2015

We are delighted to announce that our paper "Lossless Image Compression based on Kernel Least Mean Squares" won the Highly Recommended Paper Award at the IEEE Picture Coding Symposium, 31.05.2015 - 03.06.2015. Congratulations to Ruben Verhack and the co-authors.

Top 10% Paper Award @ ICIP 2014

We are delighted to announce that our paper "Lossy Image Coding in the Pixel Domain Using a Sparse Steering Kernel Synthesis Approach" won the Top 10% Paper Award at the IEEE International Conference on Image Processing, 27.10.2014 - 30.10.2014. Congratulations to Ruben Verhack and the co-authors.

Scott Helt Memorial Award 2012

We are delighted to announce that Dr. Sebastian Knorr won the Scott Helt Memorial Award 2012 for the best journal paper published in the IEEE Transactions on Broadcasting in 2011. The title of the paper is "3D-TV Content Creation: Automatic 2D-to-3D Video Conversion". Congratulations to Dr. Knorr and the co-authors.

