
Saturday, February 12, 2022

Generating images

A challenge to generating images such as the Portrait of Edmond Belamy with the approach used for the MNIST dataset is that, frequently, images have no labels (such as a digit); rather, we want to map the space of random numbers into a set of artificial images using a latent vector, Z, as I described earlier in the chapter.


A further constraint is that we want to promote diversity in these images. If we input numbers within a certain range, we would like to know that they generate different outputs, and be able to tune the resulting image features. For this purpose, VAEs were developed to generate diverse and photorealistic images (Figure 1.5).


In the context of image classification tasks, being able to generate new images can help us increase the number of examples in an existing dataset, or reduce the bias if our existing dataset is heavily skewed toward a particular kind of photograph.

Applications could include generating alternative poses (angles, shades, or perspective shots) for product photographs on a fashion e-commerce website (Figure 1.6):






Building a better digit classifier

A classic problem used to benchmark algorithms in machine learning and computer vision is the task of classifying which handwritten digit from 0-9 is represented in a pixelated image from the MNIST dataset. A large breakthrough on this problem occurred in 2006, when researchers at the University of Toronto and the National University of Singapore discovered a way to train deep neural networks to perform this task.


One of their critical observations was that instead of training a network to directly predict the most likely digit (Y) given an image (X), it was more effective to first train a network that could generate images, and then classify them as a second step.

In Chapter 4, Teaching Networks to Generate Digits, I will describe how this model improved upon past attempts, and how to create your own restricted Boltzmann machine and deep Boltzmann machine models that can generate new MNIST digit images.



The promise of deep learning

As noted already, many of the models we will survey in this book are deep, multilevel neural networks. The last 15 years have seen a renaissance in the development of deep learning models for image classification, natural language processing and understanding, and reinforcement learning. These advances were enabled by breakthroughs on traditional challenges in tuning and optimizing very complex models, combined with access to larger datasets, distributed computational power in the cloud, and frameworks such as TensorFlow that make it easier to prototype and reproduce research.


Friday, February 11, 2022

Why use generative models?

 Now that we have reviewed what generative models are and defined them more formally in the language of probability, why would we have a need for such models in the first place? What value do they provide in practical applications? To answer this question, let's take a brief tour of the topics that we will cover in more detail in the rest of this book.

Discriminative and generative modeling and Bayes' theorem

Now let's consider how these rules of conditional and joint probability relate to the kinds of predictive models that we build for various machine learning applications. In most cases, such as predicting whether an email is fraudulent or the dollar amount of the future lifetime value of a customer, we are interested in the conditional probability, P(Y|X=x), where Y is the set of outcomes we are trying to model, X represents the input features, and x is a particular value of the input features. As discussed, this approach is known as discriminative modeling.

Discriminative modeling attempts to learn a direct mapping between the data, X, and the outcomes, Y.

Another way to understand discriminative modeling is in the context of Bayes' theorem, which relates the conditional and joint probabilities of a dataset:

P(Y|X) = P(X|Y)P(Y) / P(X) = P(X,Y) / P(X)

In Bayes' formula, the expression P(X|Y) / P(X) is known as the likelihood, or the supporting evidence that the observation X gives to the likelihood of observing Y.

P(Y) is the prior, or the plausibility of the outcome, and P(Y|X) is the posterior, or the probability of the outcome given all the independent data we have observed related to the outcome thus far. Conceptually, Bayes' theorem states that the probability of an outcome is the product of its baseline probability and the probability of the input data conditional on this outcome.
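
To make this concrete, here is a minimal numeric sketch (the numbers are purely hypothetical, not taken from the book) applying Bayes' theorem to the fraudulent-email example mentioned above:

    # Hypothetical values for the fraudulent-email example; only the arithmetic
    # of Bayes' theorem is the point here.
    p_fraud = 0.01            # prior P(Y): baseline rate of fraudulent emails
    p_word_given_fraud = 0.9  # likelihood P(X|Y): a suspicious phrase appears in fraud emails
    p_word = 0.05             # evidence P(X): the phrase appears in any email

    # Posterior P(Y|X) = P(X|Y) * P(Y) / P(X)
    p_fraud_given_word = p_word_given_fraud * p_fraud / p_word
    print(f"P(fraud | phrase) = {p_fraud_given_word:.2f}")  # 0.18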


The theorem was published two years after the author's death, and in a foreword Richard Price described it as a mathematical argument for the existence of God, which was perhaps appropriate given that Thomas Bayes served as a reverend during his life.


In the context of discriminative learning, we can thus see that a discriminative model directly computes the posterior; we could have a model of the likelihood or prior, but it is not required in this approach. Even though you may not have realized it, most of the models you have probably used in the machine learning toolkit are discriminative, such as the following:

- Linear regression

- Logistic regression

- Random forests

- Gradient-boosted decision trees (GBDT)

- Support vector machines (SVM)


The first two (linear and logistic regression) model the outcome, Y, conditional on the data, X, using a normal or Gaussian (linear regression) or sigmoidal (logistic regression) probability function. In contrast, the last three have no formal probability model; they compute a function (an ensemble of trees for random forests or GBDT, or an inner product distribution for SVM) that maps X to Y, using a loss or error function to tune those estimates. Given this nonparametric nature, some authors have argued that these constitute a separate class of non-model discriminative algorithms.
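
As a quick illustration of this distinction, the sketch below (my own example using scikit-learn, which is not a library covered in this book; the toy data is invented) contrasts logistic regression, which returns an explicit conditional probability P(Y|X), with an SVM, which only returns a decision-function score:

    # Two discriminative models on toy data: one with a probability model, one without.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)     # invented labels

    logreg = LogisticRegression(max_iter=1000).fit(X, y)
    svm = SVC().fit(X, y)

    x_new = np.array([[0.5, -0.1]])
    print(logreg.predict_proba(x_new))   # conditional probabilities P(Y|X=x), via a sigmoid
    print(svm.decision_function(x_new))  # a raw margin score, not a probability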


In contrast, a generative model attempts to learn the joint distribution P(Y,X) of the labels and the input data. Recall that using the definition of joint probability:

P(X,Y) = P(X|Y)P(Y)

We can rewrite Bayes' theorem as follows:

P(Y|X) = P(X,Y) / P(X)

Instead of learning a direct mapping of X to Y using P(Y|X), as in the discriminative case, our goal is to model the joint probabilities of X and Y using P(X,Y). While we can use the resulting joint distribution of X and Y to compute the posterior, P(Y|X), and learn a targeted model, we can also use this distribution to sample new instances of the data by either jointly sampling new tuples (x, y), or sampling new data inputs using a target label, Y, with the following expression:

P(X|Y=y) = P(X,Y) / P(Y)
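
The following sketch (my own illustration with simulated data, not an example from the book) shows both uses of a simple generative model: per-class Gaussians estimate P(X|Y) and P(Y), which can be combined into the posterior P(Y|X) or used to sample new data points for a chosen label:

    import numpy as np

    rng = np.random.default_rng(1)
    X = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])  # simulated data
    y = np.concatenate([np.zeros(100), np.ones(100)]).astype(int)

    priors = np.bincount(y) / len(y)             # P(Y)
    means = [X[y == k].mean() for k in (0, 1)]   # parameters of P(X|Y)
    stds = [X[y == k].std() for k in (0, 1)]

    def likelihood(x, k):
        # Gaussian density of x under class k, i.e. P(X=x|Y=k)
        return np.exp(-0.5 * ((x - means[k]) / stds[k]) ** 2) / (stds[k] * np.sqrt(2 * np.pi))

    # Posterior P(Y=1|X=x) via Bayes' theorem
    x = 1.5
    joint = [priors[k] * likelihood(x, k) for k in (0, 1)]   # P(X=x, Y=k)
    print("P(Y=1|x) =", joint[1] / sum(joint))

    # Sampling new inputs from P(X|Y=1)
    print("new samples:", rng.normal(means[1], stds[1], size=3))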

Examples of generative models include the following:

- Naive Bayes classifiers

- Gaussian mixture models

- Latent Dirichlet Allocation (LDA)

- Hidden Markov models

- Deep Boltzmann machines

- VAEs

- GANs

Naive Bayes classifiers, though named as a discriminative model, utilize Bayes' theorem to learn the joint distribution of X and Y under the assumption that the X variables are independent. Similarly, Gaussian mixture models describe the likelihood of a data point belonging to one of a group of normal distributions using the joint probability of the label and these distributions.


LDA represents a document as the joint probability of a word and a set of underlying keyword lists (topics) that are used in a document. Hidden Markov models express the joint probability of a state and the next state of the data, such as the weather on successive days of the week. As you will see in Chapter 4, Teaching Networks to Generate Digits, deep Boltzmann machines learn the joint probability of a label and the data vector it is associated with. The VAE and GAN models we will cover in Chapters 5, 6, 7, and 11 also utilize joint distributions to map between complex data types. This mapping allows us to generate data from random vectors or transform one kind of data into another.


As already mentioned, another view of generative models is that they allow us to generate samples of X if we know an outcome, Y. In the first four models in the previous list, this conditional probability is just a component of the model formula, with the posterior estimates still being the ultimate objective. However, in the last three examples, which are all deep neural network models, learning the conditional distribution of X dependent upon a hidden, or latent, variable, Z, is actually the main objective, in order to generate new data samples. Using the rich structure allowed by multilayered neural networks, these models can approximate the distribution of complex data types such as images, natural language, and sound. Also, instead of being a target value, Z is often a random number in these applications, serving merely as an input from which to generate a large space of hypothetical data points. To the extent we have a label (such as whether a generated image should be of a dog or a dolphin, or the genre of a generated song), the model is P(X|Y=y, Z=z), where the label Y controls the generation of data that is otherwise unrestricted by the random nature of Z.
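
A minimal sketch of this idea is shown below (the layer sizes, the flattened 28x28 output, and the variable names are my own assumptions for illustration, not an architecture from the book): a small network maps a random latent vector z together with a one-hot label y to a data sample x, playing the role of P(X|Y=y, Z=z):

    import tensorflow as tf

    latent_dim, num_classes = 64, 10

    z_in = tf.keras.Input(shape=(latent_dim,))     # random latent vector Z
    y_in = tf.keras.Input(shape=(num_classes,))    # one-hot label Y
    h = tf.keras.layers.Concatenate()([z_in, y_in])
    h = tf.keras.layers.Dense(128, activation="relu")(h)
    x_out = tf.keras.layers.Dense(28 * 28, activation="sigmoid")(h)  # flattened image-sized output
    generator = tf.keras.Model([z_in, y_in], x_out)

    # Fix the label and vary z: each random vector yields a different sample of that class
    z = tf.random.normal((5, latent_dim))
    y = tf.one_hot([3] * 5, num_classes)
    samples = generator([z, y])
    print(samples.shape)  # (5, 784)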




Saturday, February 5, 2022

The rules of probability

At the simplest level, a model, be it for machine learning or a more classical method such as linear regression, is a mathematical description of how various kinds of data relate to one another.

In the task of modeling, we usually think about separating the variables of our dataset into two broad classes:

1. Independent data, which primarily means the inputs to a model, are denoted by X. These could be categorical features (such as a "0" or "1" in six columns indicating which of six schools a student attends), continuous (such as the heights or test scores of the same students), or ordinal (the rank of a student in the class).

2. Dependent data, conversely, are the outputs of our models, and are denoted by Y. (Note that in some cases Y is a label that can be used to condition a generative output, such as in a conditional GAN.) As with the independent variables, these can be continuous, categorical, or ordinal, and they can be an individual element or a multidimensional matrix (tensor) for each element of the dataset (see the short sketch after this list).
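
A tiny sketch of this X/Y split is shown below (the column names and values are invented purely for illustration):

    import pandas as pd

    # Independent data X: one-hot categorical, continuous, and ordinal columns
    X = pd.DataFrame({
        "school_A": [1, 0, 0],               # categorical indicator columns
        "school_B": [0, 1, 0],
        "school_C": [0, 0, 1],
        "height_cm": [162.0, 171.5, 158.3],  # continuous
        "class_rank": [2, 1, 3],             # ordinal
    })

    # Dependent data Y: the outcome we model
    Y = pd.Series([88.0, 93.5, 79.0], name="test_score")
    print(X.shape, Y.shape)  # (3, 5) (3,)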

So how can we describe the data in our model using statistics? In other words, how can we quantitatively describe what values we are likely to see, how frequently, and which values are more likely to appear together? One way is by asking the likelihood of observing a particular value in the data, or the probability of that value.

For example, if we were to ask what the probability is of observing a roll of 4 on a six-sided die, the answer is that, on average, we would observe a 4 once every six rolls. We write this as follows:

P(X=4) = 1/6 = 16.67%

where P denotes probability of.
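
A quick simulation (my own check, not from the book) confirms this value empirically:

    import numpy as np

    # Simulate 100,000 rolls of a fair six-sided die and estimate P(X=4)
    rolls = np.random.default_rng(42).integers(1, 7, size=100_000)
    print((rolls == 4).mean())  # approximately 0.1667, i.e. 1/6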

What defines the allowed probability values for a particular dataset? If we imagine the set of all possible values of a dataset, such as all values of a die, then a probability maps each value to a number between 0 and 1. The minimum is 0 because we can't have a negative chance of seeing a result; the most unlikely result is that we would never see a particular value, or 0% probability, such as rolling a 7 on a six-sided die. Similarly, we can't have a greater than 100% probability of observing a result, represented by the value 1; an outcome with probability 1 is absolutely certain. The set of probability values associated with a dataset can belong to either discrete classes (such as the faces of a die) or an infinite set of potential values (such as variations in height or weight). In either case, however, these values have to follow certain rules, the Probability Axioms described by the mathematician Andrey Kolmogorov in 1933:

1. The probability of an observation (a die roll, a particular height, and so on) is a non-negative, finite number between 0 and 1.

2. The probability of at least one of the observations in the space of all possible observations occurring is 1.

3. The joint probability of distinct, mutually exclusive events is the sum of the probabilities of the individual events.

While these rules might seem abstract, you will see in Chapter 3, Building Blocks of Deep Neural Networks, that they have direct relevance to developing neural network models. For example, an application of rule 1 is to generate a probability between 0 and 1 for a particular outcome in a softmax function for predicting target classes.

Rule 3 is used to normalize these outcomes into the range 0-1, under the guarantee that they are mutually distinct predictions of a deep neural network (in other words, a real-world image logically can't be classified as both a dog and a cat, but rather a dog or a cat, with the probabilities of these two outcomes being additive). Finally, the second rule provides the theoretical guarantee that we can generate data at all using these models.
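
A short sketch (my own illustration; the logit values are arbitrary) shows how a softmax over raw network scores respects these rules, with each output between 0 and 1 and the mutually exclusive class probabilities summing to 1:

    import numpy as np

    logits = np.array([2.0, -1.0, 0.5])            # raw scores, e.g. for "dog", "cat", "bird"
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax
    print(probs)        # each value lies between 0 and 1 (rule 1)
    print(probs.sum())  # 1.0 (rule 3: mutually exclusive outcomes are additive)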

However, in the context of machine learning and modeling, we are not usually interested in just the probability of observing a piece of input data, X; we instead want to know the conditional probability of an outcome, Y, given the data, X. In other words, we want to know how likely a label is for a set of data, based on that data. We write this as the probability of Y given X, or the probability of Y conditional on X:

P(Y|X)

Another question we could ask about Y and X is how likely they are to occur together, or their joint probability, which can be expressed using the preceding conditional probability expression as follows:

P(X,Y) = P(Y|X)P(X) = P(X|Y)P(Y)

This formula expresses the probability of X and Y occurring together. In the case of X and Y being completely independent of one another, this is simply their product:

P(X|Y)P(Y) = P(Y|X)P(X) = P(X)P(Y)
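
For example (a check of my own, not from the book), consider two independent fair dice: the joint probability of a particular pair of faces is just the product of the marginal probabilities:

    import math

    p_x = 1 / 6       # P(first die shows 4)
    p_y = 1 / 6       # P(second die shows 2)
    p_joint = 1 / 36  # P(first shows 4 AND second shows 2)
    print(math.isclose(p_joint, p_x * p_y))  # True: P(X,Y) = P(X)P(Y) under independence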

You will see that these expressions become important in our discussion of complementary priors in Chapter 4, Teaching Networks to Generate Digits, and the ability of restricted Boltzmann machines to simulate independent data samples.

They are also important as building blocks of Bayes' theorem, which we will discuss next.






Friday, February 4, 2022

Implementing generative models

 While generative models could theoretically be implemented using a wide variety of machine learning algorithms, in practice, they are usually built with deep neural networks, which are well suited to capturing complex variations in data such as images or language.

In this book, we will focus on implementing these deep generative models for many different applications using TensorFlow 2.0. TensorFlow is a C++ framework, with APIs in the Python programming language, used to develop and productionize deep learning models. It was open sourced by Google in 2015, and has become one of the most popular libraries for the research and deployment of neural network models.

With the 2.0 release, much of the boilerplate code that characterized development in earlier versions of the library was cleaned up with high-level abstractions, allowing us to focus on the model rather than the plumbing of the computations. 

The latest version also introduced the concept of eager execution, allowing network computations to be run on demand, which will be an important benefit of implementing some of our models.
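
The snippet below is a minimal sketch of what eager execution looks like in TensorFlow 2.x (a generic example of mine, not code from a later chapter): operations run immediately and return concrete values, with no session or graph boilerplate:

    import tensorflow as tf

    x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    y = tf.matmul(x, x)            # computed on demand, not deferred to a session
    print(y.numpy())               # the result is available immediately as a NumPy array
    print(tf.executing_eagerly())  # True by default in TensorFlow 2.x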

In upcoming chapters, you will learn not only the underlying theory behind these models, but also the practical skills needed to implement them in popular programming frameworks. In Chapter 2, Setting up a TensorFlow Lab, you will learn how to set up a cloud environment that will allow you to run TensorFlow in a distributed fashion, using the Kubeflow framework to catalog your experiments.


Indeed, as I will describe in more detail in Chapter 3, Building Blocks of Deep Neural Networks, since 2006 an explosion of research into deep learning using large neural network models has produced a wide variety of generative modeling applications.

The first of these was the restricted Boltzmann machine, which is stacked in multiple layers to create a deep belief network. I will describe both of these models in Chapter 4, Teaching Networks to Generate Digits. Later innovations included Variational Autoencoders (VAEs), which can efficiently generate complex data samples from random numbers, using techniques that I will describe in Chapter 5, Painting Pictures with Neural Networks Using VAEs.

We will also cover the algorithm used to create The Portrait of Edmond Belamy, the GAN, in more detail in Chapter 6, Image Generation with GANs, of this book.

Conceptually, the GAN model creates a competition between two neural networks. One (termed the generator) produces realistic (or, in the case of the experiments by Obvious, artistic) images starting from a set of random numbers and applying a mathematical transformation. In a sense, the generator is like an art student, producing new paintings from brushstrokes and creative inspiration.


The second network, known as the discriminator, attempts to classify whether a picture comes from a set of real-world images, or whether it was created by the generator. Thus, the discriminator acts like a teacher, grading whether the student has produced work comparable to the paintings they are attempting to mimic. As the generator becomes better at fooling the discriminator, its output becomes closer and closer to the historical examples it is designed to copy.
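
As a rough structural sketch (the layer sizes and the flattened 28x28 image shape are arbitrary choices of mine, not the architecture behind the Belamy portrait, and the training loop is omitted), the two networks can be written in Keras as follows:

    import tensorflow as tf

    latent_dim = 100

    # Generator: random numbers in, a flattened "image" out
    generator = tf.keras.Sequential([
        tf.keras.Input(shape=(latent_dim,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(28 * 28, activation="tanh"),
    ])

    # Discriminator: an image in, a real-vs-generated probability out
    discriminator = tf.keras.Sequential([
        tf.keras.Input(shape=(28 * 28,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

    fake_images = generator(tf.random.normal((16, latent_dim)))
    print(discriminator(fake_images).shape)  # (16, 1): one "realness" score per image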

There are many classes of GAN models, with additional variants covered in Chapter 7, Style Transfer with GANs, and Chapter 11, Composing Music with Generative Models, in our discussion of advanced models. Another key innovation in generative models is in the domain of natural language data. By representing the complex interrelationships between words in a sentence in a computationally scalable way, the Transformer network and the Bidirectional Encoder Representations from Transformers (BERT) model built on top of it present powerful building blocks to generate textual data in applications such as chatbots, which we'll cover in more detail in Chapter 9, The Rise of Methods for Text Generation, and Chapter 10, NLP 2.0: Using Transformers to Generate Text.


In Chapter 12, Play Video Games with Generative AI: GAIL, you will also see how models such as GANs and VAEs can be used to generate not just images or text, but sets of rules that allow game-playing networks developed with reinforcement learning algorithms to process and navigate their environment more efficiently; in essence, learning to learn. Generative models are a huge field of research that is constantly growing, so unfortunately, we can't cover every topic in this book. For the interested reader, references to further topics will be provided in Chapter 13, Emerging Applications in Generative AI.

To get started with some background information, let's discuss the rules of probability.