
Saturday, April 2, 2022

Vanilla GAN

We have covered quite a bit of ground in understanding the basics of GANs. In this section, we will apply that understanding and build a GAN from scratch. This generative model will consist of a repeating block architecture, similar to the one presented in the original paper. We will try to replicate the task of generating MNIST digits using our network.

The overall GAN setup can be seen in Figure 6.8. The figure outlines a generator model with noise vector z as input and repeating blocks that transform and scale up the vector to the required dimensions. Each block consists of a dense layer followed by Leaky ReLU activation and a batch-normalization layer. We simply reshape the output from the final block to transform it into the required output image size.

The discriminator, on the other hand, is a simple feedforward network. This model takes an image as input (a real image or the fake output from the generator) and classifies it as real or fake. This simple setup of two competing models helps us to train the overall GAN.

We will be relying on TensorFlow 2 and using the high-level Keras API wherever possible. The first step is to define the discriminator model. In this implementation, we will use a very basic multi-layer perceptron (MLP) as the discriminator model:

from tensorflow.keras.layers import Dense, Flatten, Input, LeakyReLU
from tensorflow.keras.models import Sequential


def build_discriminator(input_shape=(28, 28), verbose=True):
    """
    Utility method to build an MLP discriminator

    Parameters:
        input_shape:
            type: tuple. Shape of input image for classification.
                Default shape is (28, 28) -> MNIST
        verbose:
            type: boolean. Print model summary if set to True.
    Returns:
        tensorflow.keras.Model object
    """
    model = Sequential()
    model.add(Input(shape=input_shape))
    # Flatten the 2-D image into a vector for the dense layers
    model.add(Flatten())
    model.add(Dense(512))
    model.add(LeakyReLU(alpha=0.2))
    # Single sigmoid unit: probability that the input image is real
    model.add(Dense(1, activation='sigmoid'))

    if verbose:
        model.summary()
    return model

We will use the sequential API to prepare this simple model, with just four layers, the last of which is an output layer with sigmoid activation. Since we have a binary classification task, we have only one unit in the final layer. We will use binary cross-entropy loss to train the discriminator model.
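To make the loss concrete, here is a minimal pure-Python sketch of binary cross-entropy for a batch of scalar predictions. The function name binary_cross_entropy and the eps clipping constant are assumptions made for illustration; Keras computes the same quantity internally when loss='binary_crossentropy' is used.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a batch of scalar predictions."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        # Clip predictions away from 0 and 1 so that log() stays finite
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


# Labels: 1 = real, 0 = fake
confident = binary_cross_entropy([1, 0], [0.9, 0.1])
undecided = binary_cross_entropy([1, 0], [0.5, 0.5])
print(confident < undecided)  # True: confident, correct predictions cost less
```

A discriminator that outputs 0.5 for everything pays log(2) per sample, which is the loss level expected once it can no longer tell real from fake.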

The generator model is also a multi-layer perceptron with multiple layers scaling up the noise vector z to the desired size. Since our task is to generate MNIST-like output samples, the final reshape layer will convert the flat vector into a 28x28 output shape. Note that we will make use of batch normalization to stabilize model training. The following snippet shows a utility method for building the generator model:

import numpy as np
from tensorflow.keras.layers import (BatchNormalization, Dense, Input,
                                     LeakyReLU, Reshape)
from tensorflow.keras.models import Sequential


def build_generator(z_dim=100, output_shape=(28, 28), verbose=True):
    """
    Utility method to build an MLP generator

    Parameters:
        z_dim:
            type: int (positive). Size of input noise vector to be used
                as model input. Default value is 100
        output_shape:
            type: tuple. Shape of output image.
                Default shape is (28, 28) -> MNIST
        verbose:
            type: boolean. Print model summary if set to True.
    Returns:
        tensorflow.keras.Model object
    """
    model = Sequential()
    model.add(Input(shape=(z_dim,)))
    # Block 1: dense -> LeakyReLU -> batch normalization
    model.add(Dense(256))
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization(momentum=0.8))
    # Block 2: same pattern, scaled up
    model.add(Dense(512))
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization(momentum=0.8))
    # tanh keeps outputs in [-1, 1], matching the scaled pixel range
    model.add(Dense(int(np.prod(output_shape)), activation='tanh'))
    model.add(Reshape(output_shape))

    if verbose:
        model.summary()
    return model

We simply use these utility methods to create generator and discriminator model objects. The following snippet uses these two model objects to create the GAN object as well:

from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

discriminator = build_discriminator()
discriminator.compile(loss='binary_crossentropy',
                      optimizer=Adam(0.0002, 0.5),
                      metrics=['accuracy'])

z_dim = 100  # size of the noise vector; must match the generator's input
generator = build_generator(z_dim=z_dim)

z = Input(shape=(z_dim,))
img = generator(z)

# For the combined model we will only train the generator
discriminator.trainable = False

# The discriminator takes generated images as input
# and determines validity
validity = discriminator(img)

# The combined model (stacked generator and discriminator)
# trains the generator to fool the discriminator
gan_model = Model(z, validity)
gan_model.compile(loss='binary_crossentropy', optimizer=Adam(0.0002, 0.5))

The final piece of the puzzle is defining the training loop. As described in the previous section, we will train both models (discriminator and generator) alternately. Doing so is straightforward with high-level Keras APIs. The following code snippet first loads the MNIST dataset and scales the pixel values between -1 and +1:
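As a rough illustration of the scaling step only, the sketch below assumes the images arrive as unsigned 8-bit arrays with values in [0, 255]. The helper name scale_to_unit_range is hypothetical, and a tiny stand-in array is used in place of the real dataset.

```python
import numpy as np


def scale_to_unit_range(images):
    """Map uint8 pixel values in [0, 255] to float32 values in [-1, 1]."""
    return (images.astype("float32") - 127.5) / 127.5


# Stand-in batch. With TensorFlow installed, the real digits could be loaded
# with tensorflow.keras.datasets.mnist.load_data() instead.
sample_batch = np.array([[0, 127, 255]], dtype="uint8")
scaled = scale_to_unit_range(sample_batch)
print(scaled.min(), scaled.max())  # -1.0 1.0
```

Scaling to [-1, 1] matches the tanh activation on the generator's output layer, so real and generated images share the same value range when they reach the discriminator.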




Friday, March 25, 2022

Maximum likelihood game

The minimax game can be transformed into a maximum likelihood game where the aim is to maximize the likelihood of the generator probability density. This is done to ensure that the generator probability density is similar to the real/training data probability density. In other words, the game can be transformed into minimizing the divergence between Pg and Pdata. To do so, we make use of Kullback-Leibler divergence (KL divergence) to measure how far apart the two distributions of interest are. The overall value function can be denoted as:

The cost function for the generator transforms to:

One important point to note is that KL divergence is not a symmetric measure, that is, KL(Pdata||Pg) != KL(Pg||Pdata). The model typically uses KL(Pg||Pdata) to achieve better results.

The three different cost functions discussed so far have slightly different trajectories and thus lead to different properties at different stages of training. These three functions can be visualized as shown in Figure 6.7:
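The asymmetry is easy to check numerically. The sketch below uses two made-up discrete distributions (the values of p_data and p_g are arbitrary and chosen only for illustration) and computes the divergence in both directions.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical distributions over three outcomes
p_data = [0.8, 0.1, 0.1]
p_g = [0.4, 0.3, 0.3]

forward = kl_divergence(p_data, p_g)  # KL(Pdata || Pg)
reverse = kl_divergence(p_g, p_data)  # KL(Pg || Pdata)
print(forward, reverse)
```

The two printed values differ, which is why the direction of the divergence is a real modelling choice rather than a notational detail.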


Non-saturating generator cost

In practice, we do not train the generator to minimize log(1-D(G(z))) as this function does not provide sufficient gradients for learning. During the initial learning phases, where G is poor, the discriminator is able to classify the fake from the real with high confidence. This leads to the saturation of log(1-D(G(z))), which hinders improvements in the generator model. We thus tweak the generator to maximize log(D(G(z))) instead:

This provides stronger gradients for the generator to learn. This is shown in Figure 6.6. The x-axis denotes D(G(z)). The top line shows the objective, which is minimizing the likelihood of the discriminator being correct. The bottom line (updated objective) works by maximizing the likelihood of the discriminator being wrong.

Figure 6.6 illustrates how a slight change helps achieve better gradients during the initial phases of training.
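The effect can be sketched numerically. The code below assumes the discriminator ends in a sigmoid, so that D(G(z)) = sigmoid(a) for some logit a (the symbol a and the function names are assumptions for illustration). It compares the gradient of each generator cost with respect to that logit.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def saturating_grad(a):
    """d/da of log(1 - sigmoid(a)), the original cost the generator minimizes."""
    return -sigmoid(a)


def non_saturating_grad(a):
    """d/da of -log(sigmoid(a)), the tweaked cost the generator minimizes."""
    return -(1.0 - sigmoid(a))


for a in (-6.0, -2.0, 0.0, 2.0):
    print(f"D(G(z))={sigmoid(a):.4f}  "
          f"saturating={saturating_grad(a):+.4f}  "
          f"non-saturating={non_saturating_grad(a):+.4f}")
```

When the discriminator confidently rejects a fake (D(G(z)) near 0), the original cost has a gradient near zero while the tweaked cost has a gradient of magnitude near 1, which is exactly the regime where the generator most needs a learning signal.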

Training GANs

Training a GAN is like playing a game of two adversaries. The generator is learning to generate good enough fake samples, while the discriminator is working hard to discriminate between real and fake. More formally, this is termed the minimax game, where the value function V(G,D) is described as follows:
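In the standard formulation from Goodfellow et al., written here in LaTeX with p_z assumed to denote the prior over the noise vector z, the value function reads:

```latex
\min_G \max_D V(G, D) =
  \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator maximizes both terms, while the generator can only influence the second term and tries to minimize it.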

This is also called the zero-sum game, which has an equilibrium that is the same as the Nash equilibrium. We can better understand the value function V(G,D) by separating out the objective function for each of the players. The following equations describe individual objective functions:
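One common way of writing them, following the convention in Goodfellow's tutorial (the factor of 1/2 and the superscript notation are assumptions here and only rescale the value function), is:

```latex
J^{(D)} = -\frac{1}{2}\,\mathbb{E}_{x \sim p_{data}}\left[\log D(x)\right]
          -\frac{1}{2}\,\mathbb{E}_{z}\left[\log\left(1 - D(G(z))\right)\right]

J^{(G)} = -J^{(D)}
```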

where Jd is the discriminator objective function in the classical sense, Jg is the generator objective equal to the negative of the discriminator objective, and Pdata is the distribution of the training data. The rest of the terms have their usual meaning. This is one of the simplest ways of defining the game or corresponding objective functions. Over the years, different ways have been studied, some of which we will cover in this chapter.

The objective functions help us to understand the aim of each of the players. If we assume both probability densities are non-zero everywhere, we can get the optimal value of D(x) as:
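The standard result is D*(x) = Pdata(x) / (Pdata(x) + Pmodel(x)). The sketch below checks this numerically at a single point x by a grid search over candidate discriminator outputs. The density values 0.6 and 0.2 are made up for illustration, and the function names are hypothetical.

```python
import math


def pointwise_value(d, p_data, p_model):
    """Contribution of one point x to V(G, D): p_data*log(D) + p_model*log(1 - D)."""
    return p_data * math.log(d) + p_model * math.log(1.0 - d)


def optimal_discriminator(p_data, p_model):
    """Closed-form optimum D*(x) = p_data / (p_data + p_model)."""
    return p_data / (p_data + p_model)


p_data, p_model = 0.6, 0.2  # hypothetical density values at one x
grid = [i / 1000 for i in range(1, 1000)]  # candidate outputs in (0, 1)
best_d = max(grid, key=lambda d: pointwise_value(d, p_data, p_model))

print(best_d, optimal_discriminator(p_data, p_model))
```

When the two densities are equal, the closed form gives 1/2, meaning the best possible discriminator can do no better than guess.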

We will revisit this equation in the latter part of the chapter. For now, the next step is to present a training algorithm wherein the discriminator and generator models train towards their respective objectives. The simplest yet widely used way of training a GAN (and by far the most successful one) is as follows.

Repeat the following steps N times. N is the number of total iterations:

1. Repeat the following steps k times:

* Sample a minibatch of size m from the generator: {z1, z2, ..., zm} ~ Pmodel(z)

* Sample a minibatch of size m from the actual data: {x1, x2, ..., xm} ~ Pdata(x)

* Update the discriminator loss, Jd

2. Set the discriminator as non-trainable

3. Sample a minibatch of size m from the generator: {z1, z2, ..., zm} ~ Pmodel(z)

4. Update the generator loss, Jg

In their original paper, Goodfellow et al. used k=1, that is, they trained discriminator and generator models alternately. There are some variants and hacks where it is observed that training the discriminator more often than the generator helps with better convergence.
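The control flow of this algorithm can be sketched independently of any particular framework. In the sketch below, train_gan and its four callback parameters are hypothetical names; the callbacks stand in for the real sampling and update logic, and freezing the discriminator (step 2) is assumed to happen inside update_generator.

```python
def train_gan(n_iterations, k, sample_noise, sample_data,
              update_discriminator, update_generator):
    """Run the alternating schedule: k discriminator updates, then one generator update."""
    for _ in range(n_iterations):
        # Step 1: update the discriminator k times on fresh minibatches
        for _ in range(k):
            z_batch = sample_noise()
            x_batch = sample_data()
            update_discriminator(x_batch, z_batch)
        # Steps 2-4: with the discriminator frozen, draw fresh noise
        # and update the generator once
        z_batch = sample_noise()
        update_generator(z_batch)


# Usage with counting stubs in place of real models
calls = {"discriminator": 0, "generator": 0}


def count_discriminator_update(x_batch, z_batch):
    calls["discriminator"] += 1


def count_generator_update(z_batch):
    calls["generator"] += 1


train_gan(n_iterations=5, k=2,
          sample_noise=lambda: [0.0] * 4,
          sample_data=lambda: [1.0] * 4,
          update_discriminator=count_discriminator_update,
          update_generator=count_generator_update)
print(calls)  # {'discriminator': 10, 'generator': 5}
```

Setting k=1 recovers the strictly alternating schedule, while larger values of k correspond to the variants that train the discriminator more often.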


The following figure (Figure 6.5) showcases the training phases of the generator and discriminator models. The smaller dotted line is the discriminator model, the solid line is the generator model, and the larger dotted line is the actual training data. The vertical lines at the bottom demonstrate the sampling of data points from the distribution of z, that is, x=pmodel(z). The lines point to the fact that the generator contracts in the regions of high density and expands in the regions of low density. Part (a) shows the initial stages of the training phases where the discriminator (D) is a partially correct classifier. Parts (b) and (c) show how improvements in D guide changes in the generator, G. Finally, in part (d) you can see where pmodel=pdata and the discriminator is no longer able to differentiate between fake and real samples, that is, D(x)=1/2.



Thursday, March 24, 2022

The generator model

This is the primary model of interest in the whole game. This model generates samples that are intended to resemble the samples from our training set. The model takes random unstructured noise as input (typically denoted as z) and tries to create a varied set of outputs. The generator model is usually a differentiable function; it is often represented by a deep neural network but is not restricted to that.

We denote the generator as G and its output as G(z). We typically use a lower-dimensional z as compared to the dimension of the original data, x, that is, Zdim <= Xdim. This is done as a way of compressing or encoding real-world information into lower-dimensional space.

In simple words, the generator trains to generate samples good enough to fool the discriminator, while the discriminator trains to properly classify real (training samples) versus fake (output from the generator). Thus, this game of adversaries uses a generator model, G, which tries to make D(G(z)) as close to 1 as possible. The discriminator is incentivized to make D(G(z)) close to 0, where 1 denotes real and 0 denotes fake samples. The GAN model achieves equilibrium when the generator starts to easily fool the discriminator, that is, the discriminator reaches its saddle point. While, in theory, GANs have several advantages over other methods in the family tree described previously, they pose their own set of problems. We will discuss some of them in the upcoming sections.


Tuesday, March 22, 2022

The discriminator model

This model represents a differentiable function that tries to maximize a probability of 1 for samples drawn from the training distribution. This can be any classification model, but we usually prefer a deep neural network for this. This is the throw-away model (similar to the decoder part of autoencoders).

The discriminator is also used to classify whether the output from the generator is real or fake. The main utility of this model is to help develop a robust generator. We denote the discriminator model as D and its output as D(x). When it is used to classify output from the generator model, the discriminator model is denoted as D(G(z)), where G(z) is the output from the generator model.

Generative adversarial networks

GANs have a pretty interesting origin story. It all began as a discussion/argument in a bar with Ian Goodfellow and friends discussing work related to generating data using neural networks. The argument ended with everyone downplaying each other's methods. Goodfellow went back home and coded the first version of what we now call a GAN. To his amazement, the code worked on the first try. A more verbose description of the chain of events was shared by Goodfellow himself in an interview with Wired magazine.

As mentioned, GANs are implicit density functions that sample directly from the underlying distribution. They do this by defining a two-player game of adversaries. The adversaries compete against each other under well-defined reward functions and each player tries to maximize its rewards. Without going into the details of game theory, the framework can be explained as follows.