What is Visual Dialog?
Visual Dialog is a novel task that requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the agent has to answer the question.
VisDial dataset:
- 140k images from COCO
- 1 dialog per image
- 10 rounds of question-answer pairs per dialog
- 1.4M dialog question-answer pairs in total
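To make the task setup concrete, here is a minimal Python sketch of what one query to the agent might look like: an image, the dialog history so far, and a follow-up question. The class name `VisualDialogInput`, its fields, the helper `rounds_as_queries`, and the sample questions are all hypothetical illustrations, not the VisDial file format or any released API. The last lines check that the dataset figures above are consistent with one another.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

ROUNDS_PER_DIALOG = 10  # from the dataset description above


@dataclass
class VisualDialogInput:
    """Hypothetical container for one query; not the VisDial file format."""
    image_id: str                                  # which image is being discussed
    history: List[Tuple[str, str]] = field(default_factory=list)  # earlier (question, answer) rounds
    question: str = ""                             # follow-up question to answer


def rounds_as_queries(image_id: str, qa_pairs: List[Tuple[str, str]]) -> List[VisualDialogInput]:
    """Expand one dialog into one query per round.

    The query for round t sees only the question-answer pairs of rounds before t.
    """
    queries = []
    for t, (question, _answer) in enumerate(qa_pairs):
        queries.append(VisualDialogInput(image_id, list(qa_pairs[:t]), question))
    return queries


# Made-up dialog for illustration only (two rounds instead of ten).
example_dialog = [
    ("How many people are in the image?", "Two."),
    ("Are they standing?", "No, they are sitting."),
]
queries = rounds_as_queries("example_image", example_dialog)

# Consistency check on the dataset figures quoted above.
num_images = 140_000
dialogs_per_image = 1
total_qa_pairs = num_images * dialogs_per_image * ROUNDS_PER_DIALOG
```

With these numbers, `total_qa_pairs` comes out to 1,400,000, which matches the 1.4M total listed above. The second query in `queries` carries the first round as its history, which is the sense in which the agent answers a follow-up question given the dialog so far.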
Mar 2017: The VisDial v0.9 dataset and the code for the real-time chat interface used to collect data on AMT are now available!
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
* equal contribution
We thank Harsh Agrawal and Jiasen Lu for help with the AMT data collection interface; Xiao Lin, Ramprasaath Selvaraju and Latha Pemula for model discussions; and Marco Baroni, Antoine Bordes, Mike Lewis, and Marc'Aurelio Ranzato for helpful discussions. Finally, we are grateful to the developers of Torch for building an excellent framework.