Sound, like images or text, is a complex, high-dimensional kind of data. Music in particular has many complexities: it could involve on or serveral musicians, has a temporal structure, and can be divided into thematically related segments, All of these components are incorporated into models such as MuseGAN, as mentioned earlier, which uses GANs to generate these various components and synthesize them into realistic, yet synthetic, musical tracks. I will describe the implementation of MuseGAN and its variants in Chapter 11, Composing Music with Generative Models.
2022년 2월 13일 일요일
Fake news and chatbots
Humans have always wanted to talk to machines; the first chatbot, ELIZA, was written at MIT in the 1960s and used a simple program to transform a user's input and generate a response, in the mode of a therapist who frequently responds in the form of a question.
More sophisticated models can generate entirely novel text, such as Google's BERT and GPT-2, which use a unit called a transformer, A transformer module in a neural network allow a network to propose a new word in the context of preceding words in a piece of text, emphasizing, those that are more relevant in transformer units into a powerful multi-dimensional encoding of natural language patterns and contextual significance. This approach can be used in document creation for natural language processing(NLP) tasks, or for chatbot dialogue systems(Figure 1.3).
Style transfer and image transformation
In addition to mapping artificial images to a space of random numbers, we can also use generative models to learn a mapping between one kind of image and a secound.
This kind of model can, for example, be used to convert an image of a horse into that of a zebra(Figure 1.7), create deep fake videos in which one actor's face has been replaced with another's, or transform a photo into a painting(Figures 1.2 and 1.4):
Another fascinating, example of applyingg generative modeling is a study in which lost masterpieces of the artist Pablo Picasso were discovered to have been painted over with another image. After X-ray imaging of The Old Guitarist and The Crouching Beggar indicated that earlier images of a woman and a landscape lay underneath(Figure 1.8), to train a neural style transfer model that transforms black-and-white images (the X-ray radiographs of the overlying paintings) to the coloration of the original artwork. Then, applying this transfer model to the hidden images allowed them to reconstruct colored-in versions of the lost paintings:
All of these models use the previously mentioned GANs, a type of deep learning model proposed in 2014 In addition to changing the contents of an images (such as dogs and humans with similar facial features, as in Figure 1.9), or generate textual descriptions from images(Figure 1.10):
We could also condition the properties of the generated images on some auxiliary information such as labels, an approach used in the GANCogh algorithm, which synthesizes images in the style of different artists by supplying the desired artist as an input to the generative model(Figure 1.4). I will describe these application in Chapter 6, Image Generation with GANs, and Chapter 7, Style Transfer with GANs.
2022년 2월 12일 토요일
Generating images
A challenge to generating images such as the Portraint of Edmond Belamy with the approach used for the MNIST dataset is that frequently, images have no labels (such as a digit); rather, we want to map the space of random numbers into a set of artificial images using a latent vector, Z, as I described earlier in the chapter.
A further constraint is that we want to promote diversity of these images. If we input numbers within a certain range, we would like to know that they generate different outputs, and be able to tune the resulting image features. For this purpose, VAEs were developed to generate diverse and photorealistic image(Figure 1.5).
In the context of image classification tasks, being able to generate new images can help us increase the number of examples in an existing dataset, or reduce the bias if our existing dataset is heavily skewed toward a particular kind of photograph.
Applications could include generating alternative poses(angles, shades, or prespective shots) for product photographs on a fashion e-commerce website(Figure 1.6):
Building a better digit classifier
A classic problem used to benchmark algorithms in machine learning and computer vision is the task of classifying which handwritten digit from 0-9 is represented in a pixelated image from the MNIST dataset. A large breakthrough on this problem occured in 2006, when researches at the University of Toronto and the National University of Singapore discovered a way to train deep neural networks to perform this task.
One of their ciritical observaltions was that instead of training a network to directly predict the most likely digit(Y) given an image(X), it was more effective to first train a network that could generate images, and then classify them as a secound step.
In Chapter 4, Teaching Networks to Generate Digits, I will describe how this model improved upon past attempts, and how to create your own restricted Boltzmann machine and deep Boltzmann machine models that can generate new MNIST digit images.
Generating images
A challenge to generating images such as the Portraint of Edmond Belamy with the approach used for the MNIST dataset is that frequently, images have no labels (such as a digit); rather, we want to map
The promise of deep learning
As noted already, many of the models we will survey in the book are deep, multilevel neural networks. The last 15 years have seen a renaissance in the development of deep learning models for image classification, natural language processing and understanding, and reinforcement learning. These advances were enabled by breakthroughs in traditional challenges in tuning and optimizing very complex models, combined with access to larger datasets, distributed computational power in the cloud, and frameworks such as TensorFlow make it easier to prototype and reproduce research.
2022년 2월 11일 금요일
Why use generative models?
Now that we have reviewed what generative models are and defined them more formally in the language of probability, why would we have a need for such models in the first place? What value do they provide in practical applications? To answer this question, let's take a brief tour of the topics that we will cover in more detail in the rest of this book.