We present a dynamic summarization algorithm that selects the most salient audio and video subclips to produce a coherent and informative summary. Clips are selected according to their capacity to attract attention, as measured by a computed multimodal audio-visual-text (AVT) saliency.
For more information on the movie summarization algorithm, see Evangelopoulos et al.
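
As a rough illustration of saliency-based clip selection, the sketch below assumes the AVT saliency has already been computed as one score per frame. It is not the procedure of Evangelopoulos et al.: the function name `select_salient_clips`, the fixed clip length, the mean-saliency scoring, and the `summary_ratio` budget are all hypothetical simplifications chosen for this example.

```python
from typing import List, Tuple


def select_salient_clips(
    saliency: List[float],
    clip_len: int,
    summary_ratio: float,
) -> List[Tuple[int, int]]:
    """Pick the highest-scoring fixed-length clips until the budget is filled.

    Returns (start, end) frame index pairs, end exclusive, in temporal order.
    All parameters are illustrative assumptions, not part of the original method.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    if not 0.0 < summary_ratio <= 1.0:
        raise ValueError("summary_ratio must be in (0, 1]")

    # Split the frame-level saliency curve into consecutive clips.
    clips = [
        (start, min(start + clip_len, len(saliency)))
        for start in range(0, len(saliency), clip_len)
    ]

    # Score each clip by its mean saliency (an assumed, simple pooling choice).
    def score(clip: Tuple[int, int]) -> float:
        start, end = clip
        return sum(saliency[start:end]) / (end - start)

    # Frame budget for the summary, as a fraction of the full length.
    budget = int(round(summary_ratio * len(saliency)))

    selected = []
    used = 0
    # Greedily take the most salient clips that still fit in the budget.
    for clip in sorted(clips, key=score, reverse=True):
        length = clip[1] - clip[0]
        if used + length > budget:
            continue
        selected.append(clip)
        used += length

    # Restore temporal order so the summary plays back coherently.
    return sorted(selected)
```

For example, `select_salient_clips(scores, clip_len=2, summary_ratio=0.5)` keeps roughly half of the frames, drawn from the clips with the highest mean saliency. A real system would likely also smooth clip boundaries and respect speech or shot boundaries, which this sketch ignores.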