What is Visual Dialog?
Visual Dialog is a novel task that requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the agent has to answer the question.
VisDial dataset:
- >200k images from COCO
- 1 dialog / image
- 10 rounds of question-answers / dialog
- Total >2M dialog question-answers
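To make the task definition and the dataset figures concrete, here is a minimal sketch of how one dialog and the agent's input at a given round could be represented. The class name `VisualDialog`, its fields, and the helper `total_question_answers` are hypothetical and chosen for illustration; they are not the official VisDial file format or API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

ROUNDS_PER_DIALOG = 10  # from the dataset summary: 10 question-answer rounds per dialog


@dataclass
class VisualDialog:
    """One dialog about one image (hypothetical layout, not the official schema)."""
    image_id: str
    # Each round is a (question, answer) pair, in the order it was asked.
    rounds: List[Tuple[str, str]] = field(default_factory=list)

    def agent_input(self, t: int) -> Tuple[str, List[Tuple[str, str]], str]:
        """Return (image, dialog history, follow-up question) for round t (0-indexed)."""
        history = self.rounds[:t]      # all earlier question-answer pairs
        question = self.rounds[t][0]   # the question the agent must answer now
        return self.image_id, history, question


def total_question_answers(num_images: int,
                           dialogs_per_image: int = 1,
                           rounds: int = ROUNDS_PER_DIALOG) -> int:
    """Number of question-answer pairs implied by the dataset summary."""
    return num_images * dialogs_per_image * rounds


# Illustrative dialog; the image id and text are made up for this example.
dialog = VisualDialog(
    image_id="example_image",
    rounds=[("What color is the cat?", "Black"), ("Is it indoors?", "Yes")],
)
image, history, question = dialog.agent_input(1)
```

With one dialog per image and 10 rounds per dialog, `total_question_answers(200_000)` gives 2,000,000, which matches the ">2M" total for ">200k" images.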
Later versions of the dataset, code, pretrained models, and a Visual Chatbot on CloudCV are coming soon!
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
* equal contribution
We thank Harsh Agrawal and Jiasen Lu for help with the AMT data collection interface; Xiao Lin, Ramprasaath Selvaraju, and Latha Pemula for model discussions; and Marco Baroni, Antoine Bordes, Mike Lewis, and Marc'Aurelio Ranzato for helpful discussions. Finally, we are grateful to the developers of Torch for building an excellent framework.