TU Berlin

Communication Systems Group

Scientific Publications

Cross-Modal Categorisation of User-Generated Video Sequences
Citation key 1356Schmiedeke2012
Author Sebastian Schmiedeke, Pascal Kelm, and Thomas Sikora
Title of Book Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
Pages 251–258
Year 2012
DOI 10.1145/2324796.2324828
Address New York, NY, USA
Month June
Note ISBN: 978-1-4503-1329-2; article no.: 25; number of pages: 8; location: Hong Kong, China
Editor ACM
Abstract This paper describes the possibilities of cross-modal classification of multimedia documents on social media platforms. Our framework predicts the user-chosen category of consumer-produced video sequences based on their textual and visual features. The text resources (metadata and automatic speech recognition transcripts) are represented as bags of words, and the video content is represented as a bag of clustered local visual features. We investigate the contribution of the different modalities and how they should be combined when sequences lack certain resources. To this end, several classification methods are evaluated with varying resources. The paper presents an approach that achieves a mean average precision of 0.3977 using user-contributed metadata in combination with clustered SURF features.
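The abstract reports its result as mean average precision (MAP). As a reading aid, the following is a minimal sketch of how MAP is commonly computed over per-category rankings. It is not taken from the paper: the function names and the toy rankings are hypothetical, and the paper's exact evaluation protocol is not stated in this record.

```python
def average_precision(ranked_relevance):
    """Average precision for one category.

    ranked_relevance: list of booleans ordered by descending classifier
    score; True means the video at that rank belongs to the category.
    Assumes the ranking covers all relevant videos (an assumption of
    this sketch, not a statement about the paper's protocol).
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-category average precisions."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical toy rankings for two categories.
toy_rankings = [
    [True, False, True],   # relevant videos at ranks 1 and 3
    [False, True],         # relevant video at rank 2
]
map_score = mean_average_precision(toy_rankings)
```

For the toy data, the first category has an average precision of (1/1 + 2/3) / 2 and the second of 1/2, so `map_score` is their mean.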