Text-to-image synthesis using the GAN architecture

Let's look at image generation from a textual description using a GAN. The following image shows the full architecture of such a GAN:

This is a type of conditional GAN. The generator network takes the input text together with a noise vector and generates an image, so the generated image is conditioned on the input text. The image description is converted into a dense vector using the embedding layer φ(t). That vector is compressed using a fully connected layer and then concatenated with the noise vector. The discriminator network is a CNN, and the generator network is built from deconvolution layers that use the same kind of filters as those in CNNs. Deconvolution is essentially a transposed convolution; we'll discuss that later.
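The conditioning step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the book's implementation: the dimensions (a 1024-value embedding compressed to 128 values, a 100-value noise vector), the 4x4 starting feature map, and the kernel/stride/padding settings are hypothetical choices, and the helper names `dense`, `generator_input`, and `transposed_conv_size` are made up for this example. Random numbers stand in for a real text embedding φ(t) and for trained weights.

```python
import random

def dense(x, weights, bias):
    """Fully connected layer: one output per weight row (no activation)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def generator_input(text_embedding, weights, bias, noise):
    """Compress phi(t) with a dense layer, then append the noise vector z."""
    compressed = dense(text_embedding, weights, bias)
    return compressed + noise  # list concatenation: [compressed text | noise]

def transposed_conv_size(size, kernel=4, stride=2, padding=1):
    """Output width of a transposed convolution (no output padding, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel

rng = random.Random(0)

# Assumed sizes, chosen only for illustration.
EMBED_DIM, COMPRESSED_DIM, NOISE_DIM = 1024, 128, 100

# Stand-ins for a real text embedding and for trained layer parameters.
phi_t = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]
weights = [[rng.gauss(0, 0.02) for _ in range(EMBED_DIM)]
           for _ in range(COMPRESSED_DIM)]
bias = [0.0] * COMPRESSED_DIM
z = [rng.gauss(0, 1) for _ in range(NOISE_DIM)]

g_in = generator_input(phi_t, weights, bias, z)
print(len(g_in))  # 228 = 128 compressed text values + 100 noise values

# Each transposed convolution with these settings doubles the spatial size.
sizes = [4]
for _ in range(4):
    sizes.append(transposed_conv_size(sizes[-1]))
print(sizes)  # [4, 8, 16, 32, 64]
```

With these assumed settings, four transposed convolution layers would grow a 4x4 feature map into a 64x64 image, while the text information enters only through the first 128 values of the generator's input vector.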
