Description
Automatic learning from multimedia has drawn attention from companies
and governments for a significant number of applications in automated
recommendation, classification, and human brain understanding. In
recent years, an increasing amount of research has explored using
deep neural networks for multimedia-related tasks.
Government security and surveillance applications include the
automated detection of illegal and violent behavior, child
pornography, and traffic infractions. Companies worldwide are looking
for content-based recommendation systems that can personalize clients'
consumption and interactions by understanding the human perception of
memorability, interestingness, attractiveness, and aesthetics. Fields
like event detection and multimedia affect and perceptual analysis are
therefore turning towards artificial neural networks. In this talk, I
will present the theory behind multi-modal fusion using deep learning,
along with some open challenges and the state of the art on each.
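As a concrete illustration of the kind of fusion the talk covers, here is a minimal sketch (in PyTorch) of late fusion by concatenation, where each modality is encoded separately and a shared head operates on the joined embeddings. The feature dimensions, encoder structure, and class count are illustrative assumptions, not specifics from the talk.

```python
import torch
import torch.nn as nn


class LateFusionClassifier(nn.Module):
    """Minimal multi-modal late-fusion sketch: encode each modality
    separately, concatenate the embeddings, classify jointly."""

    def __init__(self, visual_dim=2048, audio_dim=128,
                 hidden_dim=256, num_classes=10):
        super().__init__()
        # Per-modality encoders (stand-ins for real pretrained backbones).
        self.visual_encoder = nn.Sequential(
            nn.Linear(visual_dim, hidden_dim), nn.ReLU())
        self.audio_encoder = nn.Sequential(
            nn.Linear(audio_dim, hidden_dim), nn.ReLU())
        # Shared head over the concatenated modality embeddings.
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, visual_feats, audio_feats):
        v = self.visual_encoder(visual_feats)
        a = self.audio_encoder(audio_feats)
        fused = torch.cat([v, a], dim=-1)  # late fusion by concatenation
        return self.classifier(fused)


# Example usage with random tensors standing in for real feature extractors.
model = LateFusionClassifier()
visual = torch.randn(4, 2048)   # e.g. CNN image features
audio = torch.randn(4, 128)     # e.g. spectrogram embeddings
logits = model(visual, audio)   # shape: (4, 10)
```

Concatenation is only the simplest fusion strategy; the same structure accommodates attention-based or gated fusion by replacing the `torch.cat` step.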
Multi-modal sources of information are the next big step for AI. I will also demonstrate the use of deep learning techniques for automated multi-modal applications and discuss some open benchmarks.