The project

WP 1: Features, Saliency, Fusion, Spatiotemporal Multisensory Integration for Perceptual Events

The basic problems here are to extract robust multicue features from the audio, visual and text streams, compute their unimodal instantaneous saliencies, fuse them into an estimate of multimodal saliency, incorporate spatiotemporal synchrony constraints, and group instantaneously salient keyframes into perceptual events. WP1 includes the following tasks:

  • Task 1.1: Audio-Visual Signal Processing and Robust Feature/Event Detection Algorithms
  • Task 1.2: Audio-Visual-Text Monomodal Saliency Computation based on multiple cues
  • Task 1.3: Multimodal Saliency from Fusing Monomodal Saliencies
  • Task 1.4: Multisensory Time Perception and Spatio-Temporal Coherence
  • Task 1.5: Grouping and Discovery of Perceptual Events
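As a toy illustration of the fusion step in Task 1.3, the snippet below combines per-frame unimodal saliency curves into a single multimodal curve by a convex (weighted linear) combination. The weights and the linear fusion rule are illustrative assumptions, not the project's actual fusion scheme, and the inputs are assumed to be normalized to [0, 1].

```python
import numpy as np

def fuse_saliencies(audio, visual, text, weights=(0.4, 0.4, 0.2)):
    """Linearly fuse per-frame unimodal saliency curves into one
    multimodal curve. Each input is a 1-D array over frames, assumed
    normalized to [0, 1]; the weights are hypothetical."""
    curves = np.stack([audio, visual, text])   # shape (3, n_frames)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                            # convex combination
    return w @ curves                          # weighted sum per frame

# toy example: three 4-frame saliency curves
audio = np.array([0.2, 0.9, 0.4, 0.1])
visual = np.array([0.1, 0.8, 0.6, 0.2])
text = np.array([0.0, 1.0, 0.5, 0.0])
fused = fuse_saliencies(audio, visual, text)
```

Because the weights sum to one and the inputs lie in [0, 1], the fused curve stays in [0, 1] and can be thresholded to pick salient keyframes.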

WP 2: Information Extraction from Language-Text and Cross-media Semantics

In this WP, we extract selected snippets of semantic information, such as concepts and actions, from the audio, visual and text modalities. We then identify the correspondence between actions in the visual and the audio/text streams, i.e., perform cross-modal labeling, using the COSMOROE framework and statistical modeling. Finally, we integrate these actions into events over time and compute their relative significance (semantic saliency) using event segmentation and machine learning algorithms.

  • Task 2.1: Semantic-Spotting in the Audio-Visual Streams (Action recognition from video and extraction of semantic relationships)
  • Task 2.2: Cross-modal Semantic Labelling
  • Task 2.3: Semantic Integration over Time
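The temporal integration in Task 2.3 can be sketched as merging time-stamped action detections into events whenever the gap between them is small. This is a minimal stand-in for the project's event segmentation algorithms; the detection format and the gap threshold are assumptions for illustration.

```python
def group_into_events(detections, max_gap=1.0):
    """Merge time-stamped action detections (label, start, end) into
    events when consecutive detections are at most `max_gap` seconds
    apart. Returns a list of [labels, start, end] events."""
    events = []
    for label, start, end in sorted(detections, key=lambda d: d[1]):
        if events and start - events[-1][2] <= max_gap:
            events[-1][0].add(label)                    # same event
            events[-1][2] = max(events[-1][2], end)     # extend its span
        else:
            events.append([{label}, start, end])        # new event
    return events

# toy example: "run" and "jump" are close enough to form one event
detections = [("run", 0.0, 2.0), ("jump", 2.5, 4.0), ("talk", 10.0, 12.0)]
events = group_into_events(detections)
```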

WP 3: Integrated Sensory-Semantic Events and Control-theoretic Modeling of Attentional Processes

Sensory-semantic integration requires fusing two continuous modalities (audio and vision) with discrete language symbols and semantics extracted from text. Audiovisual fusion yields perceptual micro-events (on the time scale of a few keyframes); the goal is then to group them, together with the discrete linguistic and semantic saliencies, into stable meso-events (on the time scale of video shots) via an attention process operating over a longer time window. This can be viewed as a framework of heterogeneous hierarchical control. The research in this workpackage is divided into the following tasks.

  • Task 3.1: Modeling of Integration of Sensory-Semantic Events via Lattice State-Space Representations
  • Task 3.2: Study of Control-theoretic properties of the Lattice Dynamical Model
  • Task 3.3: Feedback and Attention Control
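To make the control-theoretic viewpoint of Tasks 3.2-3.3 concrete, here is a generic (not lattice-specific) sketch: a linear state-space model x_{t+1} = A x_t + B u_t with state feedback u_t = -K x_t. The matrices A, B and the gain K are hypothetical; K was chosen by hand to place the closed-loop poles at 0.2 and 0.3, so feedback stabilizes an otherwise unstable open loop.

```python
import numpy as np

# Illustrative matrices, not the project's actual model.
A = np.array([[1.1, 0.2],
              [0.0, 0.9]])     # open loop is unstable (eigenvalue 1.1)
B = np.array([[0.0],
              [1.0]])
K = np.array([[3.6, 1.5]])     # places closed-loop poles at 0.2 and 0.3

def simulate(x0, steps=50):
    """Evolve x_{t+1} = (A - B K) x_t and return the final state norm."""
    x = np.asarray(x0, dtype=float).reshape(2, 1)
    closed = A - B @ K
    for _ in range(steps):
        x = closed @ x
    return float(np.linalg.norm(x))
```

In this reading, the "attention control" of Task 3.3 plays the role of the feedback term: it steers the evolving event state back toward a stable regime.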

WP 4: Showcase Applications: Video Summarization and Attention Tracking in Movies and TV Documentaries

WP4 will provide the test-beds and showcases for the computational, perceptual and cognitive ideas explored in WPs 1, 2 and 3, integrating the multiple levels of information processing. The selected showcases share the need for cross-integration of multiple modalities and for anthropocentric applications. Today the vast majority of multimedia content comes without ratings or semantic annotation. Despite standardization efforts such as MPEG-7, users are expected in the near future both to consume large amounts of multimedia content and to produce/gather huge volumes of video with their digital cameras, without the time to structure and label the data. A challenging and exciting application of multimodal analysis is automatic video summarization: a summary provides the user with a short version of the video that ideally contains the most important information for understanding its content. At the application level, COGNIMUSE targets the development of integrated computational-cognitive saliency and adaptive attention models for exploiting structured audio-visual-text content in two video domains: (i) movies and (ii) TV documentaries or news.

  • Task 4.1: Annotation of Saliency and Semantic Events Database for Multimodal Videos (SEDM)
  • Task 4.2: Multiple Levels Saliency (Perceptual + Cognitive Levels of Saliency)
  • Task 4.3: Video Summarization
  • Task 4.4: Evaluation
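A saliency-based summarizer of the kind targeted in Task 4.3 can be sketched very simply: keep the most salient shots under a length budget, then restore temporal order so the summary plays coherently. The per-shot scores and the greedy top-k selection are illustrative assumptions; any fused perceptual-plus-semantic score could be plugged in.

```python
def summarize(shot_saliency, budget):
    """Greedy summarization sketch: keep the `budget` most salient
    shots, then restore temporal order. `shot_saliency` is a list of
    per-shot saliency scores (hypothetical fused scores)."""
    ranked = sorted(range(len(shot_saliency)),
                    key=lambda i: shot_saliency[i], reverse=True)
    return sorted(ranked[:budget])

# toy example: 6 shots, keep the 3 most salient
scores = [0.1, 0.7, 0.3, 0.9, 0.2, 0.6]
summary = summarize(scores, budget=3)   # -> [1, 3, 5]
```

Evaluation (Task 4.4) would then compare such automatic selections against the annotated events of the SEDM database.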

WP 5: Project Management and Dissemination

  • Task 5.1: Coordination - Monitoring of physical and financial objectives
  • Task 5.2: Dissemination of Results