Movie Summarization

Saliency curves, multimodal fusion, and manual versus automatic segment selection for movie summarization (800 frames, scene from the movie "300"). Key frames (top) correspond to the indicated saliency peaks.

We present a dynamic summarization algorithm that selects the most salient audio and video subclips to produce a coherent and informative summary. Clips are selected according to their capacity to attract attention, as estimated by the computed multimodal audio-visual-text (AVT) saliency.
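As a rough illustration of saliency-driven clip selection, the sketch below fuses three per-frame saliency curves and keeps the highest-scoring frames as contiguous segments. It is not the published algorithm: the equal-weight linear fusion, the min-max normalization, the top-fraction selection rule, and all function and parameter names (`normalize`, `fuse`, `select_segments`, `keep_ratio`, `min_len`) are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def normalize(curve: Sequence[float]) -> List[float]:
    """Min-max rescale a saliency curve to [0, 1]; a flat curve maps to zeros."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0] * len(curve)
    return [(v - lo) / (hi - lo) for v in curve]


def fuse(
    audio: Sequence[float],
    visual: Sequence[float],
    text: Sequence[float],
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> List[float]:
    """Weighted linear fusion of per-frame curves (an assumed fusion scheme)."""
    a, v, t = normalize(audio), normalize(visual), normalize(text)
    wa, wv, wt = weights
    return [wa * x + wv * y + wt * z for x, y, z in zip(a, v, t)]


def select_segments(
    saliency: Sequence[float], keep_ratio: float = 0.2, min_len: int = 2
) -> List[Tuple[int, int]]:
    """Keep the top `keep_ratio` of frames and group them into (start, end) runs.

    `end` is exclusive; runs shorter than `min_len` frames are dropped.
    """
    n = len(saliency)
    k = max(1, round(keep_ratio * n))
    ranked = sorted(range(n), key=lambda i: saliency[i], reverse=True)
    top = set(ranked[:k])

    segments: List[Tuple[int, int]] = []
    start = None
    for i in range(n + 1):  # one extra step closes a run that reaches the end
        if i < n and i in top:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments


# Toy curves for 10 frames (made-up values, not data from the movie).
audio = [0, 0, 5, 5, 5, 0, 0, 0, 0, 0]
visual = [0, 0, 4, 4, 4, 0, 0, 0, 1, 0]
text = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]

avt = fuse(audio, visual, text)
print(select_segments(avt, keep_ratio=0.3))  # prints [(2, 5)]
```

Grouping selected frames into runs and discarding very short ones is one simple way to favor coherent subclips over isolated frames; a real system would likely also smooth the curve and align segment boundaries with shot or speech boundaries.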

For more information on the movie summarization algorithm, please see Evangelopoulos et al.
