VisDial v0.9

Training set (235M): 82,783 images
Validation set (108M): 40,504 images

Readme

  • The v0.9 training set is drawn from COCO train, and the v0.9 validation set is drawn from COCO val.
  • Results (in papers, etc.) should be reported on the v0.9 validation set.

Format

[
  {
    'data': {
      'questions': [
        'does it have a doorknob',
        'do you see a fence around the bear',
        ...
      ],
      'answers': [
        'no, there is just green field in foreground',
        'countryside house',
        ...
      ],
      'dialogs': [
        {
          'image_id': <COCO image id>,
          'caption': <image caption from COCO>,
          'dialog': [
            {
              'question': <index of question in `data.questions` list>,
              'answer': <index of answer in `data.answers` list>,
              'answer_options': <100 candidate answer indices from `data.answers`>,
              'gt_index': <index of `answer` in `answer_options`>
            },
            ... (10 rounds of dialog)
          ]
        },
        ...
      ]
    },
    'split': <COCO split>,
    'version': '0.9'
  }
]
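As a minimal sketch, the index-based v0.9 records can be turned back into strings by looking up each index in `data.questions` and `data.answers`. This assumes two things: the downloaded file is standard JSON, even though the schema above uses Python-style quotes, and it follows the list layout shown above. A single top-level object is also accepted defensively. `iter_v09_rounds` and `decode_v09` are hypothetical helper names, not part of any released VisDial tooling.

```python
import json
from typing import Any, Dict, Iterator, List


def decode_v09(entry: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one decoded round per dialog turn from a single v0.9 entry."""
    data = entry["data"]
    questions: List[str] = data["questions"]
    answers: List[str] = data["answers"]
    for dialog in data["dialogs"]:
        for round_idx, rnd in enumerate(dialog["dialog"]):
            # answer_options holds indices into data.answers; resolve all of them.
            options = [answers[i] for i in rnd["answer_options"]]
            yield {
                "image_id": dialog["image_id"],
                "caption": dialog["caption"],
                "round": round_idx,
                "question": questions[rnd["question"]],
                "answer": answers[rnd["answer"]],
                "options": options,
                # gt_index points into answer_options, so options[gt_index] is the answer.
                "gt_index": rnd["gt_index"],
            }


def iter_v09_rounds(path: str) -> Iterator[Dict[str, Any]]:
    """Stream decoded rounds from a v0.9 file (assumed to be standard JSON)."""
    with open(path) as f:
        payload = json.load(f)
    # Assumption: the file is a list of entries as documented; also tolerate a bare object.
    entries = payload if isinstance(payload, list) else [payload]
    for entry in entries:
        yield from decode_v09(entry)
```

Each decoded round has roughly the same fields as a v0.5 dialog round, plus the image id, caption, and round number.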

VisDial v0.5

Training set (1.2G): 50,729 images
Validation set (168M): 7,663 images
Testing set (215M): 9,628 images

Readme

  • The v0.5 training and validation sets are both drawn from COCO train; the v0.5 testing set is drawn from COCO val.

Format

[
  {
    'image_id': <COCO image id>,
    'split': <COCO split>,
    'caption': <image caption from COCO>,
    'dialog': [
      {
        'question': '...',
        'answer': '...',
        'options': <100 candidate answers>,
        'gt_index': <index of `answer` in `options`>
      },
      ... (10 rounds of dialog)
    ]
  },
  ...
]
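A quick consistency check for v0.5 files can be sketched as follows. It assumes the file is standard JSON holding a list of records in the shape above. The check verifies the invariants the format describes: 10 rounds per dialog, 100 candidate answers per round, and `options[gt_index] == answer`. `load_v05` and `check_v05` are hypothetical helpers.

```python
import json
from typing import Any, Dict, List


def load_v05(path: str) -> List[Dict[str, Any]]:
    """Load a v0.5 file, assumed to be a standard JSON list of records."""
    with open(path) as f:
        return json.load(f)


def check_v05(records: List[Dict[str, Any]], expected_rounds: int = 10,
              expected_options: int = 100) -> List[str]:
    """Return human-readable problems found in v0.5-style records.

    The defaults mirror the counts stated in the format description; they are
    parameters so the check can be relaxed for partial or toy data.
    """
    problems: List[str] = []
    for rec in records:
        image_id = rec["image_id"]
        dialog = rec["dialog"]
        if len(dialog) != expected_rounds:
            problems.append(f"{image_id}: {len(dialog)} rounds")
        for r, rnd in enumerate(dialog):
            options = rnd["options"]
            if len(options) != expected_options:
                problems.append(f"{image_id} round {r}: {len(options)} options")
            gt = rnd["gt_index"]
            # gt_index must be in range and must point at the ground-truth answer.
            if not (0 <= gt < len(options)) or options[gt] != rnd["answer"]:
                problems.append(f"{image_id} round {r}: gt_index mismatch")
    return problems
```

For example, `check_v05(load_v05(path))` returns an empty list when every record satisfies the stated invariants. Any other result lists the offending image ids and rounds.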