v1.0

Training set
Dialogs (354M)
123,287 images x 10 rounds
(v0.9 train + v0.9 val)
Validation set
Dialogs (17M), Images (331M)
2,064 images x 10 rounds
Testing set
Dialogs (10M), Images (1.2G)
8,000 images x 1 round
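The v1.0 training count follows directly from the image counts listed in the v0.9 section (82,783 training and 40,504 validation images). Each training image carries 10 dialog rounds, so the arithmetic works out as:

```latex
\begin{aligned}
82{,}783 + 40{,}504 &= 123{,}287 \text{ images} \\
123{,}287 \times 10 &= 1{,}232{,}870 \text{ dialog rounds}
\end{aligned}
```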

Readme

  • The VisDial evaluation server is hosted on EvalAI
  • Numbers (in papers, etc.) should be reported on v1.0 test-std, and NOT on v0.9
  • The Visual Dialog Challenge 2018 is being conducted on v1.0 test-challenge
  • For both test-std and test-challenge, predictions must be submitted on the full test set

Format

{
  'data': {
    'questions': [
      'does it have a doorknob',
      'do you see a fence around the bear',
      ...
    ],
    'answers': [
      'no, there is just green field in foreground',
      'countryside house',
      ...
    ],
    'dialogs': [
      {
        'image_id': <image id>,
        'caption': <image caption>,
        'dialog': [
          {
            'question': <index of question in `data.questions` list>,
            'answer': <index of answer in `data.answers` list>,
            'answer_options': <100 candidate answer indices from `data.answers`>,
            'gt_index': <index of `answer` in `answer_options`>
          },
          ... (10 rounds of dialog)
        ]
      },
      ...
    ]
  },
  'split': <VisDial split>,
  'version': '1.0'
}
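Each round stores integer indices rather than strings, so reading a dialog means looking those indices up in the shared `data.questions` and `data.answers` lists. The Python sketch below shows one way to do this. It assumes the split file is standard JSON that `json.load` can parse. The helper names `load_visdial` and `iter_rounds` are hypothetical and not part of any official VisDial tooling. A round with no `answer` key (for instance, if answers are withheld for server-side evaluation) yields `None` for the answer; that handling is an assumption, not documented behavior.

```python
import json
from typing import Any, Dict, Iterator, List, Optional, Tuple

# (image_id, round_number, question, answer, answer option strings)
Round = Tuple[int, int, str, Optional[str], List[str]]


def load_visdial(path: str) -> Dict[str, Any]:
    """Hypothetical loader: assumes the split file is standard JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_rounds(visdial: Dict[str, Any]) -> Iterator[Round]:
    """Yield one tuple per round, resolving the stored indices against
    data.questions and data.answers."""
    data = visdial["data"]
    questions: List[str] = data["questions"]
    answers: List[str] = data["answers"]
    for dialog in data["dialogs"]:
        for round_number, rnd in enumerate(dialog["dialog"], start=1):
            question = questions[rnd["question"]]
            # Assumption: a round without an "answer" key yields None.
            answer = answers[rnd["answer"]] if "answer" in rnd else None
            options = [answers[i] for i in rnd["answer_options"]]
            yield dialog["image_id"], round_number, question, answer, options
```

For example, `for image_id, r, q, a, opts in iter_rounds(load_visdial(path))` walks every round of a split, where `path` is wherever the downloaded dialogs file was saved.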

v0.9 (deprecated)

Training set (235M)
82,783 images
Validation set (108M)
40,504 images

Readme

  • The v0.9 training set is drawn from COCO training images, and the v0.9 validation set from COCO validation images
  • Numbers (in papers, etc.) should be reported on v0.9 val

Format

{
  'data': {
    'questions': [
      'does it have a doorknob',
      'do you see a fence around the bear',
      ...
    ],
    'answers': [
      'no, there is just green field in foreground',
      'countryside house',
      ...
    ],
    'dialogs': [
      {
        'image_id': <COCO image id>,
        'caption': <image caption from COCO>,
        'dialog': [
          {
            'question': <index of question in `data.questions` list>,
            'answer': <index of answer in `data.answers` list>,
            'answer_options': <100 candidate answer indices from `data.answers`>,
            'gt_index': <index of `answer` in `answer_options`>
          },
          ... (10 rounds of dialog)
        ]
      },
      ...
    ]
  },
  'split': <COCO split>,
  'version': '0.9'
}
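The schema defines `gt_index` as the position of `answer` within `answer_options`. Both fields are indices into `data.answers`, so `answer_options[gt_index] == answer` should hold for every round. The Python sketch below reports rounds that break this relationship or that carry a candidate count other than the stated 100. `check_gt_index` is a hypothetical helper, and it assumes the split has already been parsed into a dict with the layout above. The v1.0 layout uses the same field names, so the same check applies there.

```python
from typing import Any, Dict, List, Tuple


def check_gt_index(visdial: Dict[str, Any], num_options: int = 100) -> List[Tuple[int, int]]:
    """Return (image_id, round_number) pairs whose gt_index does not point
    at the round's answer, or whose option list has the wrong length."""
    bad: List[Tuple[int, int]] = []
    for dialog in visdial["data"]["dialogs"]:
        for round_number, rnd in enumerate(dialog["dialog"], start=1):
            options = rnd["answer_options"]
            gt = rnd["gt_index"]
            if (
                len(options) != num_options
                or not 0 <= gt < len(options)
                or options[gt] != rnd["answer"]
            ):
                bad.append((dialog["image_id"], round_number))
    return bad
```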

v0.5 (deprecated)

Training set (1.2G)
50,729 images
Validation set (168M)
7,663 images
Testing set (215M)
9,628 images

Readme

  • The v0.5 training and validation sets are drawn from COCO training images, and the v0.5 testing set from COCO validation images

Format

[
  {
    'image_id': <COCO image id>,
    'split': <COCO split>,
    'caption': <image caption from COCO>,
    'dialog': [
      {
        'question': '...',
        'answer': '...',
        'options': <100 candidate answers>,
        'gt_index': <index of `answer` in `options`>
      },
      ... (10 rounds of dialog)
    ]
  },
  ...
]
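Unlike v0.9 and v1.0, v0.5 stores the question, answer, and option strings inline in every round. To reuse code written for the indexed layout, one option is to intern each string into shared `questions` and `answers` lists. The Python sketch below does this. `v05_to_indexed` is a hypothetical helper, not an official conversion tool. It drops the per-entry `split` field and does not set a top-level `split` or `version`, because the result is not an official release of any version. Options are mapped position by position, so each round's `gt_index` stays valid without changes.

```python
from typing import Any, Dict, List


def v05_to_indexed(entries: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Convert a v0.5-style list of entries (inline strings) into the
    indexed layout with shared data.questions / data.answers lists."""
    questions: List[str] = []
    answers: List[str] = []
    q_lookup: Dict[str, int] = {}
    a_lookup: Dict[str, int] = {}

    def index_of(text: str, lookup: Dict[str, int], pool: List[str]) -> int:
        # Reuse the index of a string seen before; otherwise append it.
        if text not in lookup:
            lookup[text] = len(pool)
            pool.append(text)
        return lookup[text]

    dialogs = []
    for entry in entries:
        rounds = []
        for rnd in entry["dialog"]:
            rounds.append({
                "question": index_of(rnd["question"], q_lookup, questions),
                "answer": index_of(rnd["answer"], a_lookup, answers),
                "answer_options": [index_of(o, a_lookup, answers) for o in rnd["options"]],
                "gt_index": rnd["gt_index"],  # positions are preserved
            })
        dialogs.append({
            "image_id": entry["image_id"],
            "caption": entry["caption"],
            "dialog": rounds,
        })
    return {"data": {"questions": questions, "answers": answers, "dialogs": dialogs}}
```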