FACTMS: Enhancing Multimodal Factual Consistency in Multimodal Summarization
Multimodal summarization (MS) generates text summaries from multimedia articles with textual and visual content. Therefore, MS can suffer from the multimodal factual inconsistency problem, where the generated summaries may distort or deviate from both the textual and visual content in the original multimodal input. Existing MS approaches mainly focus