The Flickr30K dataset contains 31,014 images sourced from online photo-sharing websites (Young et al., 2014). Each image is paired with five English descriptions collected through Amazon Mechanical Turk. The dataset is split into 145,000 training, 5,070 development, and 5,000 test descriptions. The Multi30K dataset extends Flickr30K with two kinds of German sentences: translations of the English descriptions and independently written German descriptions. French translations were produced by the LIUM laboratory.
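Because every image carries five descriptions, the description counts per split can be turned back into image counts. The sketch below does this arithmetic. It assumes that every image in every split has exactly five descriptions. The names `DESCRIPTIONS`, `CAPTIONS_PER_IMAGE`, and `images_per_split` are hypothetical and were written only for this illustration. Under that assumption, the splits come to 29,000, 1,014, and 1,000 images, and together they account for all 31,014 images.

```python
# Description counts per split, as stated in the text above.
DESCRIPTIONS = {"train": 145_000, "dev": 5_070, "test": 5_000}

# Assumption: each image contributes exactly five descriptions in every split.
CAPTIONS_PER_IMAGE = 5


def images_per_split(descriptions, captions_per_image=CAPTIONS_PER_IMAGE):
    """Derive image counts per split (hypothetical helper for illustration)."""
    images = {}
    for split, n in descriptions.items():
        # A count not divisible by five would contradict the five-per-image assumption.
        if n % captions_per_image:
            raise ValueError(f"{split}: {n} is not divisible by {captions_per_image}")
        images[split] = n // captions_per_image
    return images


images = images_per_split(DESCRIPTIONS)
print(images)                # {'train': 29000, 'dev': 1014, 'test': 1000}
print(sum(images.values()))  # 31014
```

The total matches the stated image count, which shows that the three splits partition the dataset with no image left over.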