v1.0
Readme
- Images for the training set are from COCO train2014 and val2014.
- The VisDial evaluation server is hosted on EvalAI.
- Numbers (in papers, etc.) should be reported on v1.0 test-std, and NOT on v0.9.
- The Visual Dialog Challenge is conducted on v1.0 test-challenge.
- For both test-std and test-challenge, predictions must be submitted on the full test set.
- Relevance scores from dense answer annotations on v1.0 val can be used to compute NDCG.
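NDCG for a single dialog round can be sketched as follows. This is a minimal illustration, not the official evaluation code. It assumes that options are ranked by descending model score, that each option's gain is its dense relevance score discounted by log2(rank + 1) for a 1-based rank, and that the cutoff k is the number of options with non-zero relevance. Check these choices against the official evaluation before reporting numbers. The function name `ndcg` is hypothetical.

```python
import math
from typing import Sequence


def ndcg(scores: Sequence[float], relevance: Sequence[float]) -> float:
    """NDCG for one round (hypothetical helper, not the official evaluator).

    scores:    model score for each answer option (higher = ranked earlier).
    relevance: dense relevance score for each option (0 = not relevant).
    Assumption: the cutoff k is the number of options with non-zero relevance.
    """
    if len(scores) != len(relevance):
        raise ValueError("scores and relevance must have the same length")
    k = sum(1 for r in relevance if r != 0)
    if k == 0:
        return 0.0  # assumed convention when no option is relevant

    # Predicted ranking: option indices sorted by descending model score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Ideal ranking: relevance scores sorted in descending order.
    ideal = sorted(relevance, reverse=True)

    def dcg(gains: Sequence[float]) -> float:
        # 0-based position p is discounted by log2(p + 2), i.e. log2(rank + 1).
        return sum(g / math.log2(p + 2) for p, g in enumerate(gains[:k]))

    return dcg([relevance[i] for i in ranked]) / dcg(ideal)


if __name__ == "__main__":
    # Toy round with 4 options instead of 100.
    print(ndcg([0.9, 0.8, 0.1, 0.2], [1.0, 0.0, 0.5, 0.0]))
```

Truncating at k means only the top-k predicted options are scored, so ranking an irrelevant option above a relevant one is penalized even when the relevant option appears later in the list.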
 
Format
{
  'data': {
    'questions': [
      'does it have a doorknob',
      'do you see a fence around the bear',
      ...
    ],
    'answers': [
      'no, there is just green field in foreground',
      'countryside house',
      ...
    ],
    'dialogs': [
      {
        'image_id': <image id>,
        'caption': <image caption>,
        'dialog': [
          {
            'question': <index of question in `data.questions` list>,
            'answer': <index of answer in `data.answers` list>,
            'answer_options': <100 candidate answer indices from `data.answers`>,
            'gt_index': <index of `answer` in `answer_options`>
          },
          ... (10 rounds of dialog)
        ]
      },
      ...
    ]
  },
  'split': <VisDial split>,
  'version': '1.0'
}
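Each round stores indices rather than strings, so reading a dialog means looking those indices up in `data.questions` and `data.answers`. Below is a minimal sketch, assuming the file has already been parsed (for example with `json.load`) and that its top-level `'data'` object is passed in. The function `decode_dialog` and the toy data are hypothetical. The sketch also assumes a round may lack an `'answer'` field on splits whose ground truth is hidden, so it only decodes that field when present.

```python
from typing import Any, Dict, List


def decode_dialog(data: Dict[str, Any], dialog_idx: int) -> List[Dict[str, Any]]:
    """Resolve the index-based rounds of one dialog into strings.

    `data` is the top-level 'data' object of a parsed VisDial file.
    """
    questions = data["questions"]
    answers = data["answers"]
    rounds: List[Dict[str, Any]] = []
    for rnd in data["dialogs"][dialog_idx]["dialog"]:
        decoded: Dict[str, Any] = {
            "question": questions[rnd["question"]],
            # Candidate answers, kept in their stored order.
            "answer_options": [answers[i] for i in rnd.get("answer_options", [])],
        }
        # Assumed: 'answer' may be absent where the ground truth is hidden.
        if "answer" in rnd:
            decoded["answer"] = answers[rnd["answer"]]
        rounds.append(decoded)
    return rounds


if __name__ == "__main__":
    # Toy 'data' object (hypothetical values, 2 options instead of 100).
    toy = {
        "questions": ["does it have a doorknob"],
        "answers": ["no", "yes"],
        "dialogs": [{
            "image_id": 1,
            "caption": "a wooden door",
            "dialog": [{"question": 0, "answer": 1,
                        "answer_options": [0, 1], "gt_index": 1}],
        }],
    }
    print(decode_dialog(toy, 0))
```

Storing each distinct string once keeps the files compact, because the same short answers recur across many rounds and option lists.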
v0.9
Readme
- The v0.9 training set is from COCO training images, and the v0.9 validation set is from COCO validation images.
- Numbers (in papers, etc.) should be reported on v1.0 test-std, not on v0.9 val.
Format
{
  'data': {
    'questions': [
      'does it have a doorknob',
      'do you see a fence around the bear',
      ...
    ],
    'answers': [
      'no, there is just green field in foreground',
      'countryside house',
      ...
    ],
    'dialogs': [
      {
        'image_id': <COCO image id>,
        'caption': <image caption from COCO>,
        'dialog': [
          {
            'question': <index of question in `data.questions` list>,
            'answer': <index of answer in `data.answers` list>,
            'answer_options': <100 candidate answer indices from `data.answers`>,
            'gt_index': <index of `answer` in `answer_options`>
          },
          ... (10 rounds of dialog)
        ]
      },
      ...
    ]
  },
  'split': <COCO split>,
  'version': '0.9'
}
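The v0.9 layout uses the same index-based structure, so its fields can be cross-checked against each other. `answer_options[gt_index]` should equal `answer`, every index should fall inside its list, and each dialog should have 10 rounds with 100 options per round. Below is a small validation sketch. `check_rounds` is a hypothetical helper, not part of any VisDial tooling, and the counts are parameters so they can be relaxed for toy data.

```python
from typing import Any, Dict, List


def check_rounds(data: Dict[str, Any], num_options: int = 100,
                 num_rounds: int = 10) -> List[str]:
    """Return the problems found in the index-based dialogs of `data`.

    `data` is the top-level 'data' object of a parsed v0.9 file.
    """
    problems: List[str] = []
    n_questions = len(data["questions"])
    n_answers = len(data["answers"])
    for d, entry in enumerate(data["dialogs"]):
        rounds = entry["dialog"]
        if len(rounds) != num_rounds:
            problems.append(f"dialog {d}: {len(rounds)} rounds, expected {num_rounds}")
        for r, rnd in enumerate(rounds):
            where = f"dialog {d}, round {r}"
            opts = rnd["answer_options"]
            if len(opts) != num_options:
                problems.append(f"{where}: {len(opts)} options, expected {num_options}")
            if not 0 <= rnd["question"] < n_questions:
                problems.append(f"{where}: question index out of range")
            if not all(0 <= i < n_answers for i in opts):
                problems.append(f"{where}: answer option index out of range")
            # gt_index must point at the ground-truth answer inside answer_options.
            gt = rnd["gt_index"]
            if not (0 <= gt < len(opts) and opts[gt] == rnd["answer"]):
                problems.append(f"{where}: answer_options[gt_index] != answer")
    return problems
```

An empty list means every round is internally consistent. Each message names the dialog and round, so problems can be traced back to the file.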
v0.5 (deprecated)
- Training set (1.2 GB): 50,729 images
- Validation set (168 MB): 7,663 images
- Testing set (215 MB): 9,628 images
Readme
- The v0.5 training and validation sets are from COCO training images, and the v0.5 testing set is from COCO validation images.
 
Format
[
  {
    'image_id': <COCO image id>,
    'split': <COCO split>,
    'caption': <image caption from COCO>,
    'dialog': [
      {
        'question': '...',
        'answer': '...',
        'options': <100 candidate answers>,
        'gt_index': <index of `answer` in `options`>
      },
      ... (10 rounds of dialog)
    ]
  },
  ...
]
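v0.5 stores question, answer, and option strings inline in every round, whereas v0.9 and v1.0 store each distinct string once and refer to it by index. The sketch below converts a v0.5-style list into an index-based object shaped like the `'data'` field of the newer formats by interning strings. The function name `to_indexed` is hypothetical. The output omits the top-level `'split'` and `'version'` keys because v0.5 records the split per image rather than per file.

```python
from typing import Any, Dict, List


def to_indexed(entries: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Convert v0.5-style entries (inline strings) into an index-based 'data' object."""
    questions: List[str] = []
    answers: List[str] = []
    question_ids: Dict[str, int] = {}
    answer_ids: Dict[str, int] = {}

    def intern(text: str, pool: List[str], ids: Dict[str, int]) -> int:
        # Store each distinct string once and return its position in `pool`.
        if text not in ids:
            ids[text] = len(pool)
            pool.append(text)
        return ids[text]

    dialogs: List[Dict[str, Any]] = []
    for entry in entries:
        rounds = []
        for rnd in entry["dialog"]:
            rounds.append({
                "question": intern(rnd["question"], questions, question_ids),
                "answer": intern(rnd["answer"], answers, answer_ids),
                # Options keep their order, so gt_index stays valid unchanged.
                "answer_options": [intern(o, answers, answer_ids) for o in rnd["options"]],
                "gt_index": rnd["gt_index"],
            })
        dialogs.append({"image_id": entry["image_id"],
                        "caption": entry["caption"],
                        "dialog": rounds})
    return {"questions": questions, "answers": answers, "dialogs": dialogs}
```

Answers and options share one pool, matching the newer formats, where `answer_options` index into the same `data.answers` list as `answer`.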