COGNIMUSE Database

COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization

The "COGNIMUSE database" is a new multimodal video dataset annotated with sensory and semantic saliency, events, cross-media semantics, and emotion. It can be used for training and evaluation of event detection and summarization algorithms, for classification and recognition of audio-visual and cross-media events, as well as for emotion tracking. It contains annotations of 7 Hollywood movies clips (ca. 30 min./each), a full movie (ca. 100 min) and 5 travel documentaries (ca. 20 min./each).

The database was developed by the authors of the paper "COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization": A. Zlatintsi, P. Koutras, G. Evangelopoulos, N. Malandrakis, N. Efthymiou, K. Pastra, A. Potamianos and P. Maragos.

Abstract: Research related to computational modeling for machine-based understanding requires ground truth data for training, content analysis, and evaluation. In this paper, we present a multimodal video database, namely COGNIMUSE, annotated with sensory and semantic saliency, events, cross-media semantics, and emotion. The purpose of this database is manifold; it can be used for training and evaluation of event detection and summarization algorithms, for classification and recognition of audio-visual and cross-media events, as well as for emotion tracking. In order to enable comparisons with other computational models, we propose state-of-the-art algorithms, specifically a unified energy-based audio-visual framework and a method for text saliency computation, for the detection of perceptually salient events from videos. Additionally, a movie summarization system for the automatic production of summaries is presented. Two kinds of evaluation were performed: an objective evaluation based on the saliency annotation of the database, and an extensive qualitative human evaluation of the automatically produced summaries, in which we investigated what constitutes a high-quality movie summary. Both evaluations verified the appropriateness of the proposed methods.

To download the full content of the COGNIMUSE database, please follow the link below:

Download COGNIMUSE Database (.rar, ca. 16.5 MB), including saliency annotations, expert summaries for evaluation, audio-visual event annotations, emotion annotations, COSMOROE annotations, as well as subtitles/transcripts and other related information, as described in the article. All updates made after the first upload can be found in the README file in the main folder.

For examples of summaries created automatically with the computational system described in the COGNIMUSE article, please follow the link below:

Examples of video summaries

Acknowledgements

We acknowledge the contributions of all Computer Vision, Speech Communication and Signal Processing Group (CVSP) collaborators who participated in the annotation procedure, and the students of NTUA who took part in the subjective evaluation and provided valuable comments on the produced summaries.

Additionally, our very special thanks go to Elias Iosif for his contribution to affective text analysis and to Tolis Apostolidis for providing the expert movie summaries.

This research work was supported by the project “COGNIMUSE” which was implemented under the “ARISTEIA” Action of the Operational Program Education and Lifelong Learning and was co-funded by the European Social Fund and Greek National Resources.

COGNIMUSE was a research project investigating multisensory and sensory-semantic information modeling, integrating all three modalities to detect salient events.

Reference:

If you use the database, please cite:

A. Zlatintsi, P. Koutras, G. Evangelopoulos, N. Malandrakis, N. Efthymiou, K. Pastra, A. Potamianos and P. Maragos, "COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization", EURASIP Journal on Image and Video Processing, 2017:54, 2017, DOI 10.1186/s13640-017-0194-1.

Communication:

For more information or details regarding the database, contact:

Nancy Zlatintsi, PhD

Postdoctoral Researcher
Computer Vision, Speech Communication & Signal Processing Group (CVSP)
Intelligent Robotics & Automation Lab (IRAL)
National Technical University of Athens
URL: http://cvsp.cs.ntua.gr/nancy
e-mail: nzlat@cs.ntua.gr / nancy.zlatintsi@gmail.com