What is Visual Dialog?

Visual Dialog is a novel task that requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the agent has to answer the question.

VisDial dataset:
  • 140k images from COCO
  • 1 dialog per image
  • 10 rounds of question-answer pairs per dialog
  • 1.4M dialog question-answer pairs in total
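
The task inputs and the dataset totals above can be sketched in a few lines of Python. This is only an illustrative sketch: the class names, field names, the image identifier, and the sample question and answer are all hypothetical, and none of it reflects the actual VisDial file format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DialogRound:
    """One question-answer exchange about the image."""
    question: str
    answer: str


@dataclass
class VisualDialogExample:
    """Hypothetical container for one image and its dialog so far."""
    image_id: str  # placeholder identifier, not a real COCO id
    history: List[DialogRound] = field(default_factory=list)

    def agent_input(self, follow_up: str) -> Tuple[str, List[DialogRound], str]:
        # The agent sees the image, the dialog history, and the new question.
        return (self.image_id, list(self.history), follow_up)

    def add_round(self, question: str, answer: str) -> None:
        # Once answered, the exchange becomes part of the history.
        self.history.append(DialogRound(question, answer))


def total_question_answers(images: int, dialogs_per_image: int, rounds_per_dialog: int) -> int:
    """Total question-answer pairs implied by the dataset figures."""
    return images * dialogs_per_image * rounds_per_dialog


# Figures taken from the list above: 140k images, 1 dialog each, 10 rounds each.
TOTAL_QA = total_question_answers(140_000, 1, 10)

example = VisualDialogExample(image_id="example-image")
example.add_round("Is anyone in the picture?", "Yes, one person.")  # made-up round
image_id, history, question = example.agent_input("What are they doing?")
```

With the figures from the list, `TOTAL_QA` works out to 1,400,000, which matches the stated 1.4M total.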




Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

Abhishek Das*, Satwik Kottur*, José M.F. Moura, Stefan Lee and Dhruv Batra
* equal contribution
ArXiv 2017

Visual Dialog

Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh and Dhruv Batra
CVPR 2017 (Spotlight)


Acknowledgements

We thank Harsh Agrawal and Jiasen Lu for help with the AMT data collection interface; Xiao Lin, Ramprasaath Selvaraju, and Latha Pemula for model discussions; and Marco Baroni, Antoine Bordes, Mike Lewis, and Marc'Aurelio Ranzato for helpful discussions. Finally, we are grateful to the developers of Torch for building an excellent framework.