
Sunday, February 13, 2022

Fake news and chatbots

 Humans have always wanted to talk to machines; the first chatbot, ELIZA, was written at MIT in the 1960s and used a simple program to transform a user's input and generate a response, in the mode of a therapist who frequently responds in the form of a question.
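
As a rough illustration of this pattern-matching approach, here is a toy, ELIZA-inspired sketch (not the original program; the rules and reflections are made up) that transforms a user's statement into a question:

```python
import re

# A toy, ELIZA-inspired rule set: each pattern captures part of the user's
# statement and reflects it back as a question.
RULES = [
    (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"i am (.*)", re.I),   "How long have you been {0}?"),
    (re.compile(r"my (.*)", re.I),     "Tell me more about your {0}."),
]

# Simple pronoun reflection so "my boss" becomes "your boss" in the reply.
REFLECTIONS = {"my": "your", "i": "you", "me": "you", "am": "are"}

def reflect(fragment: str) -> str:
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())

def respond(user_input: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(user_input)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please tell me more."   # default prompt when no rule matches

print(respond("I feel anxious about my job"))
# Why do you feel anxious about your job?
```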


More sophisticated models can generate entirely novel text, such as Google's BERT and GPT-2, which use a unit called a transformer. A transformer module in a neural network allows the network to propose a new word in the context of the preceding words in a piece of text, emphasizing those that are more relevant, and combines stacks of transformer units into a powerful multi-dimensional encoding of natural language patterns and contextual significance. This approach can be used in document creation for natural language processing (NLP) tasks, or for chatbot dialogue systems (Figure 1.3).
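
To make the "emphasize the more relevant preceding words" idea concrete, here is a minimal NumPy sketch of the scaled dot-product attention operation inside a transformer unit. The shapes and random inputs are purely illustrative, and a real model would use learned projections of the token embeddings:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weight each position's value vector by how relevant the other
    positions' keys are to its query (softmax of scaled dot products)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # (seq_len, seq_len) relevance scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                        # context-weighted combination

# Toy example: 4 tokens with 8-dimensional embeddings (random for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
# In a real transformer, Q, K, and V come from learned linear projections of x;
# here we reuse x directly to keep the sketch short.
context = scaled_dot_product_attention(x, x, x)
print(context.shape)   # (4, 8): each token now encodes weighted context
```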

Style transfer and image transformation

 In addition to mapping artificial images to a space of random numbers, we can also use generative models to learn a mapping between one kind of image and a second.

This kind of model can, for example, be used to convert an image of a horse into that of a zebra (Figure 1.7), create deepfake videos in which one actor's face has been replaced with another's, or transform a photo into a painting (Figures 1.2 and 1.4).


Another fascinating example of applying generative modeling is a study in which lost masterpieces of the artist Pablo Picasso were discovered to have been painted over with other images. After X-ray imaging of The Old Guitarist and The Crouching Beggar indicated that earlier images of a woman and a landscape lay underneath (Figure 1.8), researchers trained a neural style transfer model that transforms black-and-white images (the X-ray radiographs of the overlying paintings) to the coloration of the original artwork. Applying this transfer model to the hidden images then allowed them to reconstruct colored-in versions of the lost paintings.


All of these models use the previously mentioned GANs, a type of deep learning model proposed in 2014. In addition to changing the contents of an image (such as dogs and humans with similar facial features, as in Figure 1.9), these models can also generate textual descriptions from images (Figure 1.10).
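
As a preview of the adversarial setup behind these models (covered properly in Chapter 6, Image Generation with GANs), here is a minimal TensorFlow sketch of one GAN training step on toy vectors. The layer sizes, learning rates, and data are illustrative assumptions, not the book's implementation:

```python
import tensorflow as tf

latent_dim, data_dim = 16, 64   # illustrative sizes, not from the book

# Generator: maps a random latent vector z to a fake data vector.
generator = tf.keras.Sequential([
    tf.keras.Input(shape=(latent_dim,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(data_dim),
])
# Discriminator: scores how "real" a data vector looks (logit output).
discriminator = tf.keras.Sequential([
    tf.keras.Input(shape=(data_dim,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

def train_step(real_batch):
    z = tf.random.normal([tf.shape(real_batch)[0], latent_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_batch = generator(z, training=True)
        real_logits = discriminator(real_batch, training=True)
        fake_logits = discriminator(fake_batch, training=True)
        # Discriminator: push real -> 1 and fake -> 0; generator: fool it into 1.
        d_loss = bce(tf.ones_like(real_logits), real_logits) + \
                 bce(tf.zeros_like(fake_logits), fake_logits)
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss

# One illustrative step on random "real" data.
print(train_step(tf.random.normal((8, data_dim))))
```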


We could also condition the properties of the generated images on some auxiliary information such as labels, an approach used in the GANGogh algorithm, which synthesizes images in the style of different artists by supplying the desired artist as an input to the generative model (Figure 1.4). I will describe these applications in Chapter 6, Image Generation with GANs, and Chapter 7, Style Transfer with GANs.
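
The conditioning itself can be as simple as feeding the desired label into the generator alongside the random vector. The following sketch shows one common way to do this, concatenating a one-hot label with the latent vector; the sizes and layer choices are illustrative assumptions and not taken from GANGogh:

```python
import tensorflow as tf

latent_dim, n_artists, data_dim = 16, 5, 64   # illustrative sizes

# Conditional generator: the desired label (e.g., an artist) is one-hot encoded
# and concatenated with the random latent vector before generation.
z_in = tf.keras.Input(shape=(latent_dim,))
label_in = tf.keras.Input(shape=(n_artists,))
h = tf.keras.layers.Concatenate()([z_in, label_in])
h = tf.keras.layers.Dense(32, activation="relu")(h)
out = tf.keras.layers.Dense(data_dim)(h)
cond_generator = tf.keras.Model([z_in, label_in], out)

# Ask for a sample "in the style of" the artist at index 2.
z = tf.random.normal((1, latent_dim))
label = tf.one_hot([2], depth=n_artists)
sample = cond_generator([z, label])
print(sample.shape)   # (1, 64)
```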

Saturday, February 12, 2022

Generating images

 A challenge to generating images such as the Portrait of Edmond Belamy with the approach used for the MNIST dataset is that frequently, images have no labels (such as a digit); rather, we want to map the space of random numbers into a set of artificial images using a latent vector, Z, as I described earlier in the chapter.


 A further constraint is that we want to promote the diversity of these images. If we input numbers within a certain range, we would like to know that they generate different outputs, and be able to tune the resulting image features. For this purpose, VAEs were developed to generate diverse and photorealistic images (Figure 1.5).
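
The key mechanism VAEs use for this is sampling the latent vector Z from a learned distribution rather than using a fixed code. The sketch below shows just that sampling step (the reparameterization trick) with a stand-in encoder; the numbers are toy values, and the full VAE treatment comes later in the book:

```python
import numpy as np

latent_dim = 8   # illustrative size of the latent vector Z

def encode(x):
    """Stand-in encoder: in a real VAE these would be neural-network outputs.
    Returns the mean and log-variance of the latent distribution for x."""
    mu = np.tanh(x[:latent_dim])                       # toy mean
    log_var = -np.abs(x[latent_dim:2 * latent_dim])    # toy log-variance
    return mu, log_var

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, which keeps sampling differentiable
    with respect to mu and log_var during training."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
x = rng.normal(size=(2 * latent_dim,))   # a fake "image" vector for illustration

mu, log_var = encode(x)
# Drawing several z values for the same input yields diverse but related latent
# codes, which a decoder would map back to varied images.
for _ in range(3):
    print(reparameterize(mu, log_var, rng)[:4])
```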


In the context of image classification tasks, being able to generate new images can help us increase the number of examples in an existing dataset, or reduce the bias if our existing dataset is heavily skewed toward a particular kind of photograph.

Applications could include generating alternative poses (angles, shades, or perspective shots) for product photographs on a fashion e-commerce website (Figure 1.6).






Building a better digit classifier

 A classic problem used to benchmark algorithms in machine learning and computer vision is the task of classifying which handwritten digit from 0-9 is represented in a pixelated image from the MNIST dataset. A major breakthrough on this problem occurred in 2006, when researchers at the University of Toronto and the National University of Singapore discovered a way to train deep neural networks to perform this task.


 One of their critical observations was that instead of training a network to directly predict the most likely digit (Y) given an image (X), it was more effective to first train a network that could generate images, and then classify them as a second step.

In Chapter 4, Teaching Networks to Generate Digits, I will describe how this model improved upon past attempts, and how to create your own restricted Boltzmann machine and deep Boltzmann machine models that can generate new MNIST digit images.
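
As a taste of how such a model produces images, the following NumPy sketch runs the alternating (Gibbs) sampling step of a restricted Boltzmann machine with randomly initialized, untrained weights. The hidden-layer size is an arbitrary choice, and trained weights (as built in Chapter 4) would be needed for the samples to resemble digits:

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 784, 64     # 28x28 image pixels; hidden size is illustrative

# Untrained, randomly initialized parameters; a real RBM would learn these
# (for example, with contrastive divergence) from the MNIST images.
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
b_visible = np.zeros(n_visible)
b_hidden = np.zeros(n_hidden)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v):
    """One round of block Gibbs sampling: sample the hidden units given the
    visible units, then sample the visible units given those hidden units."""
    p_h = sigmoid(v @ W + b_hidden)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + b_visible)
    return (rng.random(p_v.shape) < p_v).astype(float)

# Start from random pixels and run a few alternating sampling steps.
v = (rng.random(n_visible) < 0.5).astype(float)
for _ in range(5):
    v = gibbs_step(v)
print(v[:10])   # a (noisy) sampled visible vector; trained weights would give digits
```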



The promise of deep learning

 As noted already, many of the models we will survey in the book are deep, multilevel neural networks. The last 15 years have seen a renaissance in the development of deep learning models for image classification, natural language processing and understanding, and reinforcement learning. These advances were enabled by breakthroughs in overcoming traditional challenges in tuning and optimizing very complex models, combined with access to larger datasets, distributed computational power in the cloud, and frameworks such as TensorFlow that make it easier to prototype and reproduce research.


Friday, February 11, 2022

Why use generative models?

 Now that we have reviewed what generative models are and defined them more formally in the language of probability, why would we have a need for such models in the first place? What value do they provide in practical applications? To answer this question, let's take a brief tour of the topics that we will cover in more detail in the rest of this book.

Discriminative and generative modeling and Bayes' theorem

 Now let's consider how these rules of conditional and joint probability relate to the kinds of predictive models that we build for various machine learning applications. In most cases, such as predicting whether an email is fraudulent or the dollar amount of the future lifetime value of a customer, we are interested in the conditional probability P(Y|X=x), where Y is the set of outcomes we are trying to model, X represents the input features, and x is a particular value of the input features. As discussed, this approach is known as discriminative modeling.

Discriminative modeling attempts to learn a direct mapping between the data, X, and the outcomes, Y.

Another way to understand discriminative modeling is in the context of Bayes' theorem, which relates the conditional and joint probabilities of a dataset:

P(Y|X) = P(X|Y)P(Y) / P(X) = P(X,Y) / P(X)

In Bayes' formula, the expression P(X|Y)/P(X) is known as the likelihood, or the supporting evidence that the observation X gives to the likelihood of observing Y.

P(Y) is the prior, or the plausibility of the outcome, and P(Y|X) is the posterior, or the probability of the outcome given all the independent data we have observed related to the outcome thus far. Conceptually, Bayes' theorem states that the probability of an outcome is the product of its baseline probability and the probability of the input data conditional on this outcome.
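
A small worked example with made-up numbers shows the update in action, using the fraudulent-email setting mentioned earlier; the probabilities below are invented purely for illustration:

```python
# Made-up numbers for illustration only: a crude "fraudulent email" example.
p_fraud = 0.02                      # prior P(Y): baseline rate of fraud
p_word_given_fraud = 0.60           # P(X|Y): a suspicious phrase appears in fraud
p_word_given_legit = 0.05           # P(X|not Y): it appears in legitimate email

# Evidence P(X), marginalized over both outcomes.
p_word = p_word_given_fraud * p_fraud + p_word_given_legit * (1 - p_fraud)

# Posterior P(Y|X) = P(X|Y) * P(Y) / P(X).
p_fraud_given_word = p_word_given_fraud * p_fraud / p_word
print(round(p_fraud_given_word, 3))   # ~0.197: the 0.02 prior is revised upward
```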


The theorem was published two years after the author's death, and in a foreword Richard Price described it as a mathematical argument for the existence of God, which was perhaps appropriate given that Thomas Bayes served as a reverend during his life.


In the context of discriminative learning, we can thus see that a discriminative model directly computes the posterior; we could have a model of the likelihood or prior, but it is not required in this approach. Even though you may not have realized it, most of the models you have probably used in the machine learning toolkit are discriminative, such as the following:

- Linear regression

- Logistic regression

- Random forests

- Gradient-boosted decision trees (GBDT)

- Support vector machines (SVM)


The first two (linear and logistic regression) model the outcome, Y, conditional on the data, X, using a normal or Gaussian (linear regression) or sigmoidal (logistic regression) probability function. In contrast, the last three have no formal probability model; they compute a function (an ensemble of trees for random forests or GBDT, or an inner product distribution for SVM) that maps X to Y, using a loss or error function to tune those estimates. Given this nonparametric nature, some authors have argued that these constitute a separate class of non-model discriminative algorithms.
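
To see the discriminative view in code, here is a minimal scikit-learn sketch on synthetic data (assuming scikit-learn is available; it is not a library the text has introduced here): logistic regression estimates the posterior P(Y|X) directly and never builds a model of P(X).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: two features drawn from a different Gaussian for each class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               rng.normal(2, 1, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

# A discriminative model: it estimates P(Y|X) directly and does not model P(X).
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[1.0, 1.0]]))   # posterior probabilities for each class
```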


In contrast, a generative model attempts to learn the joint distribution P(Y,X) of the labels and the input data. Recall that using the definition of joint probability:

P(X,Y) = P(X|Y)P(Y)

We can rewrite Bayes' theorem as follows:

P(Y|X) = P(X,Y) / P(X)

Instead of learning a direct mapping of X to Y using P(Y|X), as in the discriminative case, our goal is to model the joint probabilities of X and Y using P(X,Y). While we can use the resulting joint distribution of X and Y to compute the posterior, P(Y|X), and learn a targeted model, we can also use this distribution to sample new instances of the data by either jointly sampling new tuples (x, y), or sampling new data inputs using a target label, Y, with the following expression:

P(X|Y=y) = P(X,Y) / P(Y)
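
The following NumPy sketch makes this concrete with a hand-specified toy model, a prior P(Y) plus a Gaussian P(X|Y) per class (the numbers are invented, not fit to any data): together they define the joint distribution, from which we can sample new (x, y) pairs or new inputs for a chosen label.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny hand-specified generative model: P(Y) plus a Gaussian P(X|Y) per class.
# In practice these quantities would be estimated from data.
p_y = np.array([0.7, 0.3])                        # prior over two labels
class_means = np.array([[0.0, 0.0], [3.0, 3.0]])  # mean of X for each label
class_std = 1.0

def sample_joint(n):
    """Sample (x, y) pairs from P(X, Y) = P(X|Y) P(Y)."""
    ys = rng.choice(len(p_y), size=n, p=p_y)
    xs = rng.normal(loc=class_means[ys], scale=class_std)
    return xs, ys

def sample_x_given_y(y, n):
    """Sample new inputs from P(X|Y=y) for a chosen target label."""
    return rng.normal(loc=class_means[y], scale=class_std, size=(n, 2))

xs, ys = sample_joint(5)
print(ys, xs.round(2))
print(sample_x_given_y(1, 3).round(2))   # three new inputs generated for class 1
```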

Examples of generative models include the following:

- Naive Bayes classifiers

- Gaussian mixture models

- Latent Dirichlet Allocation (LDA)

- Hidden Markov models

- Deep Boltzmann machines

- VAEs

- GANs

Naive Bayes classifiers, though named as if they were discriminative models, utilize Bayes' theorem to learn the joint distribution of X and Y under the assumption that the X variables are conditionally independent given Y. Similarly, Gaussian mixture models describe the likelihood of a data point belonging to one of a group of normal distributions using the joint probability of the label and these distributions.


LDA represents a document as the joint probability of a word and a set of underlying keyword lists (topics) that are used in a document. Hidden Markov models express the joint probability of a state and the next state of the data, such as the weather on successive days of the week. As you will see in Chapter 4, Teaching Networks to Generate Digits, deep Boltzmann machines learn the joint probability of a label and the data vector it is associated with. The VAE and GAN models we will cover in Chapters 5, 6, 7, and 11 also utilize joint distributions to map between complex data types. This mapping allows us to generate data from random vectors or transform one kind of data into another.


 As already mentioned, another view of generative models is that they allow us to generate samples of X if we know an outcome, Y. In the first four models in the previous list, this conditional probability is just a component of the model formula, with the posterior estimates still being the ultimate objective. However, in the last three examples, which are all deep neural network models, learning the conditional of X dependent upon a hidden, or latent, variable, Z, is actually the main objective, in order to generate new data samples. Using the rich structure allowed by multilayered neural networks, these models can approximate the distribution of complex data types such as images, natural language, and sound. Also, instead of being a target value, Z is often a random number in these applications, serving merely as an input from which to generate a large space of hypothetical data points. To the extent we have a label (such as whether a generated image should be of a dog or a dolphin, or the genre of a generated song), the model is P(X|Y=y, Z=z), where the label Y controls the generation of data that is otherwise unrestricted by the random nature of Z.