An encoder-decoder network for image captioning, built with VGG16 and LSTMs. Training data: 8,000 Flickr images and 24k image captions.
Stars: 0
Rank: 12133043
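
The description mentions an LSTM decoder trained on image captions. One common way to prepare captions for such a decoder is to expand each caption into (prefix, next word) pairs, so the network learns to predict one word at a time. The sketch below illustrates that idea only. It is not taken from the repository: the helper names `build_vocab` and `caption_to_pairs`, the `<start>`/`<end>` tokens, and the left-padding scheme are all assumptions made for illustration.

```python
# Hypothetical helpers for illustration; not code from the repository.
START, END = "<start>", "<end>"  # assumed boundary tokens


def build_vocab(captions):
    """Map each token to an integer id; id 0 is reserved for padding."""
    tokens = sorted({w for c in captions for w in c.lower().split()})
    return {w: i for i, w in enumerate([START, END] + tokens, start=1)}


def caption_to_pairs(caption, vocab, max_len):
    """Expand one caption into (left-padded prefix, next-token id) pairs."""
    ids = [vocab[START]] + [vocab[w] for w in caption.lower().split()] + [vocab[END]]
    pairs = []
    for i in range(1, len(ids)):
        prefix = ids[:i][-max_len:]                      # keep at most max_len tokens
        padded = [0] * (max_len - len(prefix)) + prefix  # left-pad with zeros
        pairs.append((padded, ids[i]))                   # target is the next token
    return pairs


captions = ["a dog runs", "a cat sits"]  # toy data, not from the dataset
vocab = build_vocab(captions)
pairs = caption_to_pairs("a dog runs", vocab, max_len=5)
for prefix, target in pairs:
    print(prefix, "->", target)
```

In a full model, each prefix would typically be paired with the VGG16 feature vector of the corresponding image, and the decoder would be trained to predict the target token from both inputs.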