Visual Dialog

Problem Statement

Visual Dialog: teaching machines to hold natural language conversations with humans about an image.
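To make the task concrete, here is a minimal sketch of what one conversation about an image could look like as a data structure. The class name `VisualDialogExample`, its fields, and the sample image id, caption, questions and answers are all hypothetical illustrations, not the format of any particular dataset or paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VisualDialogExample:
    """One conversation about a single image (hypothetical container)."""
    image_id: str   # identifier of the image being discussed (made-up format)
    caption: str    # short description that opens the dialog
    # (question, answer) pairs exchanged so far
    rounds: List[Tuple[str, str]] = field(default_factory=list)

    def history(self) -> List[str]:
        """Flatten the caption and earlier rounds into context for the next question."""
        context = [self.caption]
        for question, answer in self.rounds:
            context.append(f"Q: {question} A: {answer}")
        return context


# Illustrative data only; not taken from a real dataset.
example = VisualDialogExample(
    image_id="example_0001",
    caption="A man riding a bicycle down a street.",
)
example.rounds.append(("Is it daytime?", "Yes."))
example.rounds.append(("Is he wearing a helmet?", "No."))
```

Under this sketch, a model answering a follow-up question would receive the image together with `example.history()`, so that references such as "he" can be resolved against earlier rounds.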

Dataset

Image Dataset: MS-COCO; QA (Dialog) Dataset: Visual Dialog

Reference Papers for Visual Dialog

Das, Abhishek, Satwik Kottur, José MF Moura, Stefan Lee, and Dhruv Batra. "Learning cooperative visual dialog agents with deep reinforcement learning." arXiv preprint arXiv:1703.06585 (2017).
Das, Abhishek, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José MF Moura, Devi Parikh, and Dhruv Batra. "Visual dialog." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2. 2017.
Jain, Unnat, Svetlana Lazebnik, and Alexander Schwing. "Two can play this Game: Visual Dialog with Discriminative Question Generation and Answering." arXiv preprint arXiv:1803.11186 (2018).
Strub, Florian, Harm De Vries, Jeremie Mary, Bilal Piot, Aaron Courville, and Olivier Pietquin. "End-to-end optimization of goal-driven and visually grounded dialogue systems." arXiv preprint arXiv:1703.05423 (2017).
Lu, Jiasen, Anitha Kannan, Jianwei Yang, Devi Parikh, and Dhruv Batra. "Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model." In Advances in Neural Information Processing Systems, pp. 313-323. 2017.
Kodra, Lorena, and Elinda Kajo Meçe. "Multimodal Attention Agents in Visual Conversation." In International Conference on Emerging Internetworking, Data & Web Technologies, pp. 584-596. Springer, Cham, 2018.
Chattopadhyay, Prithvijit, Deshraj Yadav, Viraj Prabhu, Arjun Chandrasekaran, Abhishek Das, Stefan Lee, Dhruv Batra, and Devi Parikh. "Evaluating visual conversational agents via cooperative human-AI games." arXiv preprint arXiv:1708.05122 (2017).
Seo, Paul Hongsuck, Andreas Lehrmann, Bohyung Han, and Leonid Sigal. "Visual reference resolution using attention memory for visual dialog." In Advances in Neural Information Processing Systems, pp. 3722-3732. 2017.
Massiceti, Daniela, N. Siddharth, Puneet Kumar Dokania, and Philip HS Torr. "FlipDial: A generative model for two-way visual dialogue." arXiv preprint arXiv:1802.03803 (2018).
Zhang, Jiaping, Tiancheng Zhao, and Zhou Yu. "Multimodal Hierarchical Reinforcement Learning Policy for Task-Oriented Visual Dialog." arXiv preprint arXiv:1805.03257 (2018).
Zhang, Haichao, Haonan Yu, and Wei Xu. "Listen, Interact and Talk: Learning to Speak via Interaction." arXiv preprint arXiv:1705.09906 (2017).
Zhang, Junjie, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu, and Anton van den Hengel. "Asking the Difficult Questions: Goal-Oriented Visual Question Generation via Intermediate Rewards." arXiv preprint arXiv:1711.07614 (2017).
Huber, Bernd, Daniel McDuff, Chris Brockett, Michel Galley, and Bill Dolan. "Emotional Dialogue Generation using Image-Grounded Language Models." In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, p. 277. ACM, 2018.
Shekhar, Ravi, Tim Baumgartner, Aashish Venkatesh, Elia Bruni, Raffaella Bernardi, and Raquel Fernandez. "Ask No More: Deciding when to guess in referential visual dialogue." arXiv preprint arXiv:1805.06960 (2018).