
Sunday, February 13, 2022

Summary

 In this chapter, we discussed what generative modeling is and how it fits into the landscape of more familiar machine learning methods. I used probability theory and Bayes' theorem to describe how these models approach prediction in an opposite manner to discriminative learning.
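The contrast between the two approaches can be made concrete with Bayes' theorem. A hypothetical sketch: a generative model learns the prior p(y) and the likelihood p(x|y), from which the discriminative quantity p(y|x) can be recovered (the numbers below are made up for illustration):

```python
import numpy as np

# Hypothetical two-class problem with a discrete feature x ∈ {0, 1, 2}.
# A generative model learns the class prior p(y) and likelihood p(x|y);
# Bayes' theorem then recovers the discriminative quantity p(y|x).
p_y = np.array([0.6, 0.4])                    # p(y) for classes 0 and 1
p_x_given_y = np.array([[0.7, 0.2, 0.1],      # p(x | y=0)
                        [0.1, 0.3, 0.6]])     # p(x | y=1)

def posterior(x):
    """p(y|x) = p(x|y) p(y) / p(x)."""
    joint = p_x_given_y[:, x] * p_y           # p(x, y) for each class
    return joint / joint.sum()                # normalize by p(x)

print(posterior(2))  # class 1 is far more likely when x = 2
```

A discriminative model would estimate p(y|x) directly; the generative route also captures p(x), which is what lets these models synthesize new data.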


We reviewed use cases for generative learning, both for specific kinds of data and general prediction tasks. Finally, we examined some of the specialized challenges that arise from building these models.


In the next chapter, we will begin our practical implementation of these models by exploring how to set up a development environment for TensorFlow 2.0 using Docker and Kubeflow.

Unique challenges of generative models

 Given the powerful applications that generative models have, what are the major challenges in implementing them? As described, most of these models utilize complex data, requiring us to fit large models that capture all the nuances of their features and distribution. This has implications both for the number of examples we must collect to adequately represent the kind of data we are trying to generate, and for the computational resources needed to build the model. We will discuss techniques for managing these requirements in Chapter 2, using frameworks and graphics processing units (GPUs).


A more subtle problem that comes from having complex data, and from the fact that we are trying to generate data rather than a numerical label or value, is that our notion of model accuracy is much more complicated: we cannot simply calculate the distance to a single label or score.
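One common proxy for "distance to a label" in generative modeling is to compare the statistics of real and generated samples in some feature space; the Fréchet Inception Distance (FID) used for GAN evaluation works this way. A minimal sketch, using a simplified diagonal-covariance version of the Fréchet distance on made-up feature vectors (full FID uses Inception features and a matrix square root):

```python
import numpy as np

def frechet_distance(feats_real, feats_fake):
    """Simplified Fréchet distance between two feature sets, treating each
    as a Gaussian with diagonal covariance (full FID uses the matrix sqrt
    of the covariance product; this diagonal version keeps the sketch short)."""
    mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
    var_r, var_f = feats_real.var(0), feats_fake.var(0)
    mean_term = np.sum((mu_r - mu_f) ** 2)
    cov_term = np.sum(var_r + var_f - 2 * np.sqrt(var_r * var_f))
    return mean_term + cov_term

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(1000, 8))
close = rng.normal(0.1, 1.0, size=(1000, 8))   # similar distribution
far = rng.normal(3.0, 1.0, size=(1000, 8))     # very different distribution
print(frechet_distance(real, close) < frechet_distance(real, far))  # True
```

The score compares whole distributions rather than individual outputs, which is exactly the shift in mindset this paragraph describes.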


We will discuss in Chapter 5, Painting Pictures with Neural Networks Using VAEs, and Chapter 6, Image Generation with GANs, how deep generative models such as VAEs and GANs take different approaches to determining whether a generated image is comparable to a real-world image. Finally, as mentioned, our models need to allow us to generate both large and diverse samples, and the various methods we will discuss take different approaches to controlling the diversity of the data.


The rules of the game

 The preceding applications concern data types we can see, hear, or read. However, generative models also have applications to generating rules. This is useful in a popular application of deep learning: using algorithms to play board games or Atari video games.

While these applications have traditionally used reinforcement learning (RL) techniques to train networks to employ the optimal strategy in these games, new research has suggested using GANs to propose novel rules as part of the training process, or to generate synthetic data to prime the overall learning process. We will examine both applications in Chapter 12, Play Video Games with Generative AI: GAIL.

Sound composition

 Sound, like images or text, is a complex, high-dimensional kind of data. Music in particular has many complexities: it can involve one or several musicians, has a temporal structure, and can be divided into thematically related segments. All of these components are incorporated into models such as MuseGAN, mentioned earlier, which uses GANs to generate these various components and synthesize them into realistic, yet synthetic, musical tracks. I will describe the implementation of MuseGAN and its variants in Chapter 11, Composing Music with Generative Models.

Fake news and chatbots

 Humans have always wanted to talk to machines; the first chatbot, ELIZA, was written at MIT in the 1960s and used a simple program to transform a user's input and generate a response, in the mode of a therapist who frequently responds in the form of a question.


More sophisticated models can generate entirely novel text, such as Google's BERT and GPT-2, which use a unit called a transformer. A transformer module in a neural network allows a network to propose a new word in the context of preceding words in a piece of text, emphasizing those that are most relevant; stacking transformer units yields a powerful multi-dimensional encoding of natural language patterns and contextual significance. This approach can be used in document creation for natural language processing (NLP) tasks, or for chatbot dialogue systems (Figure 1.3).
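The core operation inside a transformer unit is attention: each position weights the preceding words by their relevance before mixing them together. A minimal numpy sketch of scaled dot-product attention, with hand-picked toy embeddings (the vectors are invented for illustration, not taken from any real model):

```python
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention: weight each context word's value
    by how relevant its key is to the current query."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)          # relevance of each word
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax -> attention weights
    return weights @ values, weights

# Toy 4-word context with 3-dimensional embeddings (illustrative numbers).
keys = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0],
                 [0.5, 0.5, 0.0]])
values = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0],
                   [7.0, 8.0, 9.0],
                   [1.0, 1.0, 1.0]])
query = np.array([0.0, 0.0, 2.0])   # "current word" most similar to word 2
context, weights = attention(query, keys, values)
print(int(np.argmax(weights)))  # 2 — word 2 dominates the mixture
```

Real transformers learn the query/key/value projections and run many such heads in parallel, but the "emphasize the relevant words" mechanism is exactly this weighted sum.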

Style transfer and image transformation

 In addition to mapping artificial images to a space of random numbers, we can also use generative models to learn a mapping between one kind of image and a second.

This kind of model can, for example, be used to convert an image of a horse into that of a zebra (Figure 1.7), create deepfake videos in which one actor's face has been replaced with another's, or transform a photo into a painting (Figures 1.2 and 1.4):


Another fascinating example of applying generative modeling is a study in which lost masterpieces of the artist Pablo Picasso were discovered to have been painted over with another image. After X-ray imaging of The Old Guitarist and The Crouching Beggar indicated that earlier images of a woman and a landscape lay underneath (Figure 1.8), researchers trained a neural style transfer model that transforms black-and-white images (the X-ray radiographs of the overlying paintings) into the coloration of the original artwork. Then, applying this transfer model to the hidden images allowed them to reconstruct colored-in versions of the lost paintings:


All of these models use the previously mentioned GANs, a type of deep learning model proposed in 2014. In addition to changing the contents of an image, GANs can be used to map between image domains (such as dogs and humans with similar facial features, as in Figure 1.9) or to generate textual descriptions from images (Figure 1.10):


We could also condition the properties of the generated images on some auxiliary information such as labels, an approach used in the GANGogh algorithm, which synthesizes images in the style of different artists by supplying the desired artist as an input to the generative model (Figure 1.4). I will describe these applications in Chapter 6, Image Generation with GANs, and Chapter 7, Style Transfer with GANs.

Saturday, February 12, 2022

Generating images

 A challenge in generating images such as the Portrait of Edmond Belamy with the approach used for the MNIST dataset is that, frequently, images have no labels (such as a digit); rather, we want to map the space of random numbers into a set of artificial images using a latent vector, Z, as I described earlier in the chapter.
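The "map random numbers to images" idea can be sketched in a few lines: a generator is just a function from the latent space Z to pixel space. The toy network below uses fixed random weights purely as a stand-in for the learned mapping a trained GAN or VAE decoder would provide (the dimensions and names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
LATENT_DIM, IMG_SIDE = 16, 8

# A toy "generator": fixed random weights standing in for a trained network
# that maps the latent space Z to pixel space (a real GAN/VAE learns W).
W = rng.normal(scale=0.3, size=(LATENT_DIM, IMG_SIDE * IMG_SIDE))

def generate(z):
    """Map a latent vector z to an 8x8 'image' with pixel values in (0, 1)."""
    pixels = 1.0 / (1.0 + np.exp(-(z @ W)))   # sigmoid squashes to (0, 1)
    return pixels.reshape(IMG_SIDE, IMG_SIDE)

z = rng.normal(size=LATENT_DIM)               # sample a random point in Z
img = generate(z)
print(img.shape)  # (8, 8)
```

Sampling a different z yields a different image, which is exactly why training focuses on shaping this mapping rather than matching labels.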


A further constraint is that we want to promote diversity in these images. If we input numbers within a certain range, we would like to know that they generate different outputs, and to be able to tune the resulting image features. For this purpose, VAEs were developed to generate diverse and photorealistic images (Figure 1.5).
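One standard way to probe this kind of controllability is latent interpolation: walking in a straight line between two latent codes and decoding each intermediate point, which in a well-trained VAE produces a smooth sweep of image features. A minimal sketch of the interpolation step itself (the latent dimension of 16 is an arbitrary choice for illustration):

```python
import numpy as np

def interpolate(z_a, z_b, steps=5):
    """Linear interpolation between two latent vectors; feeding each
    intermediate z to a decoder yields a gradual blend of image features."""
    alphas = np.linspace(0.0, 1.0, steps)
    return np.array([(1 - a) * z_a + a * z_b for a in alphas])

rng = np.random.default_rng(0)
z_a, z_b = rng.normal(size=16), rng.normal(size=16)
path = interpolate(z_a, z_b)
print(path.shape)  # (5, 16): the two endpoints plus three intermediate codes
```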


In the context of image classification tasks, being able to generate new images can help us increase the number of examples in an existing dataset, or reduce the bias if our existing dataset is heavily skewed toward a particular kind of photograph.

Applications could include generating alternative poses (angles, shades, or perspective shots) for product photographs on a fashion e-commerce website (Figure 1.6):