How it works...

The task of Visual Question Answering (VQA) is tackled by combining different deep neural networks. A pre-trained VGG16 is used to extract features from images, and a sequence of LSTM layers is used to extract features from the questions, whose words are first mapped into an embedding space. VGG16 is a CNN used here for image feature extraction, while an LSTM is a type of RNN that extracts temporal features from the word sequence. Combining a CNN image encoder with an RNN question encoder in this way was, at the time of writing, a state-of-the-art approach for this type of problem. A multilayer perceptron with dropout is then added on top of the combined features to form the final deep network.
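The top of this architecture can be sketched in a few lines of NumPy. This is a minimal illustration of the classification head only, and it assumes the VGG16 image features and LSTM question features have already been computed. The dimensions (4096-d image features, 512-d question features, 1000 candidate answers), the use of concatenation to combine the two feature vectors, and the names `vqa_head` and `dropout` are all hypothetical choices made for illustration. None of them comes from the recipe's actual code. The weights are random, so the outputs only become meaningful after training.

```python
import numpy as np

# Hypothetical sizes, for illustration only: 4096-d image features
# (e.g. a VGG16 fully connected layer), 512-d question features
# (e.g. the last LSTM hidden state), and 1000 candidate answers.
IMG_DIM, Q_DIM, HIDDEN, N_ANSWERS = 4096, 512, 512, 1000

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    # Subtract the row max for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def dropout(x, rate, training, rng):
    # Inverted dropout: during training, zero each unit with probability
    # `rate` and rescale survivors so the expected activation is unchanged.
    # At inference time the input passes through untouched.
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)


# Randomly initialised (untrained) MLP weights; only the shapes matter here.
W1 = rng.normal(0.0, 0.01, (IMG_DIM + Q_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.01, (HIDDEN, N_ANSWERS))
b2 = np.zeros(N_ANSWERS)


def vqa_head(img_feats, q_feats, training=False, rate=0.5):
    """Combine image and question features, then score every candidate answer."""
    # Assumed fusion step: plain concatenation -> (batch, IMG_DIM + Q_DIM).
    fused = np.concatenate([img_feats, q_feats], axis=-1)
    # Hidden layer with ReLU, followed by dropout.
    h = dropout(relu(fused @ W1 + b1), rate, training, rng)
    # Output layer: a probability distribution over answers, (batch, N_ANSWERS).
    return softmax(h @ W2 + b2)


# Usage: a batch of two (image, question) feature pairs.
img = np.random.default_rng(1).normal(size=(2, IMG_DIM))
q = np.random.default_rng(2).normal(size=(2, Q_DIM))
probs = vqa_head(img, q)
predicted_answer_ids = probs.argmax(axis=1)
```

Here `probs` has shape `(2, 1000)` and each row sums to 1. Passing `training=True` enables dropout, which should be active only while training. Concatenation is the simplest way to combine the two feature vectors. Element-wise multiplication of projected features is a common alternative.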
