VisDial Dataset

Code for the real-time chat interface used to collect the VisDial dataset on Amazon Mechanical Turk

v1.0

Training set
Dialogs (354M)
123,287 images x 10 rounds
(v0.9 train + v0.9 val)

Validation set
Dialogs (17M), Images (331M)
2,064 images x 10 rounds
Dense answer annotations (new)

Testing set
Dialogs (10M), Images (1.2G)
8,000 images x 1 round

Readme

Images for the training set are from COCO train2014 and val2014, available here.
The VisDial evaluation server is hosted on EvalAI.
Numbers (in papers, etc.) should be reported on v1.0 test-std, and NOT on v0.9.
The Visual Dialog Challenge is conducted on v1.0 test-challenge.
For both test-std and test-challenge, predictions must be submitted on the full test set.
[NEW] Relevance scores from dense answer annotations on v1.0 val can be used to compute NDCG. Read more here.
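
As a rough illustration of the last point, here is a minimal sketch of NDCG for a single dialog round. This page does not spell out the formula, so the sketch assumes a common form: linear gain, a log2 rank discount, and a cutoff `k` equal to the number of candidates with nonzero relevance. The function name `ndcg` and its parameters are hypothetical; consult the linked documentation for the definition used by the evaluation server.

```python
import math


def ndcg(scores, relevance, k=None):
    """NDCG for one round (a sketch, not the official evaluation code).

    scores:    model score per candidate answer (higher means ranked earlier).
    relevance: relevance per candidate, in the same order as `scores`.
    k:         rank cutoff. Assumption: defaults to the number of candidates
               with nonzero relevance.
    """
    if len(scores) != len(relevance):
        raise ValueError("scores and relevance must have the same length")
    if k is None:
        k = sum(1 for r in relevance if r > 0)
    if k == 0:
        return 0.0

    def dcg(rels):
        # Position i (0-based) is discounted by log2(i + 2), i.e. log2(rank + 1).
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    # Candidate indices ordered by descending model score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    predicted = [relevance[i] for i in order]
    ideal = sorted(relevance, reverse=True)

    ideal_dcg = dcg(ideal)
    return dcg(predicted) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A ranking that orders candidates exactly by relevance scores 1.0 under this definition; any other ordering scores lower.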

Format

{
  'data': {
    'questions': [
      'does it have a doorknob',
      'do you see a fence around the bear',
      ...
    ],
    'answers': [
      'no, there is just green field in foreground',
      'countryside house',
      ...
    ],
    'dialogs': [
      {
        'image_id': <image id>,
        'caption': <image caption>,
        'dialog': [
          {
            'question': <index of question in `data.questions` list>,
            'answer': <index of answer in `data.answers` list>,
            'answer_options': <100 candidate answer indices from `data.answers`>,
            'gt_index': <index of `answer` in `answer_options`>
          },
          ... (10 rounds of dialog)
        ]
      },
      ...
    ]
  },
  'split': <VisDial split>,
  'version': '1.0'
}
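
Because each round stores indices rather than text, reading a dialog means looking those indices up in `data.questions` and `data.answers`. The sketch below shows one way to do that on a tiny in-memory example shaped like the format above. The helper name `resolve_dialog`, the image id, and the caption are made up for illustration, and the use of `.get` for `answer` is an assumption for rounds that might not carry one.

```python
def resolve_dialog(visdial, dialog_index):
    """Return (image_id, caption, [(question_text, answer_text), ...])."""
    data = visdial["data"]
    entry = data["dialogs"][dialog_index]
    rounds = []
    for turn in entry["dialog"]:
        question = data["questions"][turn["question"]]
        # Assumption: some rounds may have no 'answer' key, so fall back to None.
        answer_idx = turn.get("answer")
        answer = data["answers"][answer_idx] if answer_idx is not None else None
        rounds.append((question, answer))
    return entry["image_id"], entry["caption"], rounds


# Toy data in the v1.0 layout; image id and caption are invented.
toy = {
    "data": {
        "questions": ["does it have a doorknob"],
        "answers": [
            "no, there is just green field in foreground",
            "countryside house",
        ],
        "dialogs": [
            {
                "image_id": 1,
                "caption": "a house in a field",
                "dialog": [
                    {
                        "question": 0,
                        "answer": 0,
                        "answer_options": [1, 0],
                        "gt_index": 1,
                    }
                ],
            }
        ],
    },
    "split": "toy",
    "version": "1.0",
}

image_id, caption, rounds = resolve_dialog(toy, 0)
```

For a real file, the same function would apply to the object returned by `json.load` on the downloaded dialogs file.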

v0.9 (deprecated)

Training set (235M)
82,783 images

Validation set (108M)
40,504 images

Readme

The v0.9 training set is drawn from the COCO training set, and the v0.9 validation set from the COCO validation set.
Numbers (in papers, etc.) should be reported on v1.0 test-std, not on v0.9 val.

Format

{
  'data': {
    'questions': [
      'does it have a doorknob',
      'do you see a fence around the bear',
      ...
    ],
    'answers': [
      'no, there is just green field in foreground',
      'countryside house',
      ...
    ],
    'dialogs': [
      {
        'image_id': <COCO image id>,
        'caption': <image caption from COCO>,
        'dialog': [
          {
            'question': <index of question in `data.questions` list>,
            'answer': <index of answer in `data.answers` list>,
            'answer_options': <100 candidate answer indices from `data.answers`>,
            'gt_index': <index of `answer` in `answer_options`>
          },
          ... (10 rounds of dialog)
        ]
      },
      ...
    ]
  },
  'split': <COCO split>,
  'version': '0.9'
}
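
The format implies an invariant: in every round, `answer_options[gt_index]` should equal `answer`. A minimal sketch of a sanity check for that invariant follows; the function name `find_inconsistent_rounds` and the toy data are hypothetical, not part of any released tooling.

```python
def find_inconsistent_rounds(visdial):
    """Return (dialog_index, round_index) pairs that violate the invariant
    answer_options[gt_index] == answer."""
    bad = []
    for d, entry in enumerate(visdial["data"]["dialogs"]):
        for r, turn in enumerate(entry["dialog"]):
            options = turn["answer_options"]
            gt = turn["gt_index"]
            in_range = 0 <= gt < len(options)
            if not in_range or options[gt] != turn["answer"]:
                bad.append((d, r))
    return bad


# Toy data: the first round is consistent, the second is deliberately broken.
toy_v09 = {
    "data": {
        "questions": ["does it have a doorknob"],
        "answers": ["countryside house", "no, there is just green field in foreground"],
        "dialogs": [
            {
                "image_id": 1,  # invented id
                "caption": "a house in a field",  # invented caption
                "dialog": [
                    {"question": 0, "answer": 1, "answer_options": [0, 1], "gt_index": 1},
                    {"question": 0, "answer": 1, "answer_options": [0, 1], "gt_index": 0},
                ],
            }
        ],
    },
    "version": "0.9",
}

problems = find_inconsistent_rounds(toy_v09)
```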

v0.5 (deprecated)

Training set (1.2G)
50,729 images

Validation set (168M)
7,663 images

Testing set (215M)
9,628 images

Readme

The v0.5 training and validation sets are drawn from the COCO training set, and the v0.5 testing set from the COCO validation set.

Format

[
  {
    'image_id': <COCO image id>,
    'split': <COCO split>,
    'caption': <image caption from COCO>,
    'dialog': [
      {
        'question': '...',
        'answer': '...',
        'options': <100 candidate answers>,
        'gt_index': <index of `answer` in `options`>
      },
      ... (10 rounds of dialog)
    ]
  },
  ...
]
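
Unlike later versions, v0.5 stores question and answer text inline in every round. To compare the two layouts, the sketch below re-indexes v0.5-style records into shared `questions` and `answers` lists, in the style of the v0.9 format. It is an illustration only, not an official conversion tool; `index_v05` and the toy record are hypothetical.

```python
def index_v05(records):
    """Re-index inline v0.5-style records into shared question/answer lists."""
    questions, answers = [], []
    q_index, a_index = {}, {}

    def intern(text, table, index):
        # Store each distinct string once and return its position in the table.
        if text not in index:
            index[text] = len(table)
            table.append(text)
        return index[text]

    dialogs = []
    for rec in records:
        rounds = []
        for turn in rec["dialog"]:
            rounds.append({
                "question": intern(turn["question"], questions, q_index),
                "answer": intern(turn["answer"], answers, a_index),
                "answer_options": [
                    intern(opt, answers, a_index) for opt in turn["options"]
                ],
                # Option order is preserved, so gt_index carries over unchanged.
                "gt_index": turn["gt_index"],
            })
        dialogs.append({
            "image_id": rec["image_id"],
            "caption": rec["caption"],
            "dialog": rounds,
        })
    return {"data": {"questions": questions, "answers": answers, "dialogs": dialogs}}


# Toy v0.5-style record; all values are invented.
toy_v05 = [
    {
        "image_id": 1,
        "split": "toy",
        "caption": "a house in a field",
        "dialog": [
            {
                "question": "is it sunny",
                "answer": "yes",
                "options": ["no", "yes"],
                "gt_index": 1,
            }
        ],
    }
]

indexed = index_v05(toy_v05)
```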