
Friday, February 18, 2022

Deep neural network development and TensorFlow

 As we will see in Chapter 3, Building Blocks of Deep Neural Networks, a deep neural network in essence consists of matrix operations (addition, subtraction, multiplication), nonlinear transformations, and gradient-based updates computed using the derivatives of these components.
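
The three ingredients named above can be illustrated with a minimal sketch. It is not taken from the book: the layer size, the sigmoid nonlinearity, the squared-error loss, the learning rate, and all function names are assumptions chosen to keep the example small.

```python
import math

def matvec(W, x):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sigmoid(z):
    """Nonlinear transformation applied element by element."""
    return 1.0 / (1.0 + math.exp(-z))

def train_step(W, b, x, y, lr=0.5):
    """One gradient-based update for a single sigmoid layer (squared-error loss)."""
    z = [zi + bi for zi, bi in zip(matvec(W, x), b)]   # matrix multiplication + addition
    a = [sigmoid(zi) for zi in z]                       # nonlinear transformation
    loss = 0.5 * sum((ai - yi) ** 2 for ai, yi in zip(a, y))
    # Chain rule: dL/dz = (a - y) * sigmoid'(z), where sigmoid'(z) = a * (1 - a)
    delta = [(ai - yi) * ai * (1.0 - ai) for ai, yi in zip(a, y)]
    # dL/dW[i][j] = delta[i] * x[j] and dL/db[i] = delta[i]
    W = [[wij - lr * di * xj for wij, xj in zip(row, x)] for row, di in zip(W, delta)]
    b = [bi - lr * di for bi, di in zip(b, delta)]
    return W, b, loss

# Made-up weights, input, and target for illustration only.
W = [[0.1, -0.2], [0.3, 0.4]]
b = [0.0, 0.0]
x, y = [1.0, 2.0], [1.0, 0.0]

losses = []
for _ in range(50):
    W, b, loss = train_step(W, b, x, y)
    losses.append(loss)
```

Repeating the update shrinks the loss on this single example, which is all that "gradient-based updates" means at this scale.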

In the world of academia, researchers have historically often used efficient prototyping tools such as MATLAB to run models and prepare analyses. While this approach allows for rapid experimentation, it lacks elements of industrial software development, such as object-oriented (OO) development, that allow for reproducibility and clean software abstractions and that make it possible for tools to be adopted by large organizations. These tools also had difficulty scaling to large datasets and could carry heavy licensing fees for such industrial use cases. However, prior to 2006, this type of computational tooling was largely sufficient for most use cases.

However, as the datasets being tackled with deep neural network algorithms grew, groundbreaking results were achieved such as:

- Image classification on the ImageNet dataset

- Large-scale unsupervised discovery of image patterns in YouTube videos

- The creation of artificial agents capable of playing Atari video games and the Asian board game Go with human-like skill

- State-of-the-art language translation via the BERT model developed by Google


The models developed in these studies exploded in complexity along with the size of the datasets they were applied to (see Table 2.2 to get a sense of the immense scale of some of these models). As industrial use cases required robust and scalable frameworks to develop and deploy new neural networks, several academic groups and large technology companies invested in the development of generic toolkits for the implementation of deep learning models. These software libraries codified common patterns into reusable abstractions, allowing even complex models to often be embodied in relatively simple experimental scripts.

Model Name     Year    #Parameters
AlexNet        2012    61M
YouTube CNN    2012    1B
Inception      2014    5M
VGG-16         2014    138M
BERT           2018    340M
GPT-3          2020    175B


Some early examples of these frameworks include Theano, a Python package developed at the University of Montreal; Torch, a library written in the Lua language that was later ported to Python by researchers at Facebook; and TensorFlow, a C++ runtime with Python bindings developed by Google.


In this book, we will primarily use TensorFlow 2.0, due to its widespread adoption and its convenient high-level interface, Keras, which abstracts much of the repetitive plumbing of defining routine layers and model architectures.


TensorFlow is an open-source version of an internal tool developed at Google called DistBelief. The DistBelief framework consisted of distributed workers (independent computational processes running on a cluster of machines) that would compute forward and backward gradient descent passes on a network (a common way to train neural networks that we will discuss in Chapter 3, Building Blocks of Deep Neural Networks) and send the results to a Parameter Server that aggregated the updates. The neural networks in the DistBelief framework were represented as a Directed Acyclic Graph (DAG), terminating in a loss function that yielded a scalar (numerical value) comparing the network predictions with the observed target (such as an image class, or the probability distribution over a vocabulary representing the most probable next word in a sentence in a translation model).
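
The worker and Parameter Server arrangement can be sketched in a few lines. This is a deliberately simplified illustration and not DistBelief's actual code or API: the `ParameterServer` class, the `worker_gradient` function, the linear model, and the synchronous averaging of gradients are all assumptions made for the example.

```python
class ParameterServer:
    """Holds the shared parameters and applies the averaged worker gradients."""

    def __init__(self, params, lr=0.05):
        self.params = list(params)
        self.lr = lr

    def apply(self, worker_grads):
        n = len(worker_grads)
        for i in range(len(self.params)):
            avg = sum(g[i] for g in worker_grads) / n   # aggregate the updates
            self.params[i] -= self.lr * avg

def worker_gradient(params, shard):
    """Forward and backward pass for y = w*x + b (mean squared error) on one data shard."""
    w, b = params
    gw = gb = 0.0
    for x, y in shard:
        err = (w * x + b) - y        # forward pass and error
        gw += 2 * err * x            # backward pass: dL/dw
        gb += 2 * err                # backward pass: dL/db
    return [gw / len(shard), gb / len(shard)]

# Synthetic data generated from y = 2x + 1, split across two hypothetical workers.
data = [(x, 2.0 * x + 1.0) for x in [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]]
shards = [data[0::2], data[1::2]]

server = ParameterServer([0.0, 0.0])
for _ in range(500):
    grads = [worker_gradient(server.params, shard) for shard in shards]
    server.apply(grads)

w, b = server.params
```

Each worker sees only its own shard, yet the shared parameters approach the values used to generate the data because the server combines every worker's gradient.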


A DAG is a software data structure consisting of nodes (operations) and edges (data), where information flows in only a single direction along the edges (thus directed) and where there are no loops (hence acyclic).
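
As a rough sketch of this data structure, the example below stores a tiny graph as a dictionary and evaluates it in dependency order with the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later). The node names, the constant inputs, and the squared-error loss are made up for illustration; this is not how DistBelief or TensorFlow store their graphs internally.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each node maps to (operation, names of the nodes whose outputs it consumes).
# The edges are implied by those input names and carry the data between nodes.
graph = {
    "x": (lambda: 3.0, []),
    "w": (lambda: 2.0, []),
    "b": (lambda: 1.0, []),
    "target": (lambda: 10.0, []),
    "mul": (lambda a, c: a * c, ["w", "x"]),
    "pred": (lambda a, c: a + c, ["mul", "b"]),
    "loss": (lambda p, t: (p - t) ** 2, ["pred", "target"]),  # scalar at the end of the graph
}

def evaluate(graph):
    """Run every node after all of its inputs; a cycle would raise graphlib.CycleError."""
    dependencies = {name: deps for name, (_, deps) in graph.items()}
    values = {}
    for name in TopologicalSorter(dependencies).static_order():
        op, deps = graph[name]
        values[name] = op(*(values[d] for d in deps))
    return values

values = evaluate(graph)
```

Because the graph is directed and acyclic, a valid execution order always exists, which is what lets a runtime schedule the operations without the author spelling out that order.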


While DistBelief allowed Google to productionize several large models, it had limitations:

- First, the Python scripting interface was developed with a set of pre-defined layers corresponding to underlying implementations in C++; adding novel layer types required coding in C++, which represented a barrier to productivity.

- Secondly, while the system was well adapted for training feed-forward networks using basic Stochastic Gradient Descent (SGD) (an algorithm we will describe in more detail in Chapter 3, Building Blocks of Deep Neural Networks) on large-scale data, it lacked flexibility for accommodating recurrent, reinforcement learning, or adversarial learning paradigms, the last of which is crucial to many of the algorithms we will implement in this book.

- Finally, this system was difficult to scale down (to run the same job, for example, on a desktop with GPUs as well as in a distributed environment with multiple cores per machine), and deployment also required a different technical stack.

Jointly, these considerations prompted the development of TensorFlow as a generic deep learning computational framework: one that could allow scientists to flexibly experiment with new layer architectures or cutting-edge training paradigms, while also allowing this experimentation to be run with the same tools on both a laptop (for early-stage work) and a computing cluster (to scale up more mature models), and easing the transition between research and development code by providing a common runtime for both.


Though both libraries share the concept of the computation graph (networks represented as a graph of operations (nodes) and data (edges)) and a dataflow programming model (where matrices pass along the directed edges of a graph and have operations applied to them), TensorFlow, unlike DistBelief, was designed with the edges of the graph being tensors (n-dimensional matrices) and the nodes of the graph being atomic operations (addition, subtraction, nonlinear operations). This allows for much greater flexibility in defining new computations, and even allows for mutation and stateful updates (these being simply additional nodes in the graph).


The dataflow graph in essence serves as a "placeholder" where data is slotted into defined variables and can be executed on single or multiple machines. TensorFlow optimizes the constructed dataflow graph in the C++ runtime upon execution, allowing optimization, for example, in issuing commands to the GPU. The different computations of the graph can also be executed across multiple machines and hardware, including CPUs, GPUs, and TPUs (custom tensor processing chips developed by Google and available in the Google Cloud computing environment), as the same computations described at a high level in TensorFlow are implemented to execute on multiple backend systems.


Because the dataflow graph allows mutable state, in essence, there is also no longer a centralized parameter server as was the case for DistBelief (though TensorFlow can also be run in a distributed manner with a parameter server configuration), since different nodes that hold state can execute the same operations as any other worker nodes. Further, control flow operations such as loops allow for training on variable-length inputs, such as in recurrent networks (see Chapter 3, Building Blocks of Deep Neural Networks). In the context of training neural networks, the gradients of each layer are simply represented as additional operations in the graph, allowing optimizations such as velocity (as in the RMSProp or Adam optimizers, described in Chapter 3, Building Blocks of Deep Neural Networks) to be included using the same framework rather than modifying the parameter server logic. In the context of distributed training, TensorFlow also has several checkpointing and redundancy mechanisms ("backup" workers in case of a single task failure) that make it suited to robust training in distributed environments.
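
To make the idea of optimizer state concrete, here is a minimal sketch of an Adam-style update written as plain arithmetic on a single parameter. It is an illustration only, not TensorFlow's implementation: the function name `adam_minimize`, the quadratic objective, and the hyperparameter values are assumptions. The running averages `m` and `v` are the kind of extra state that, in a dataflow graph, simply becomes additional stateful nodes.

```python
import math

def adam_minimize(grad_fn, x0, steps=300, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a one-dimensional function given its gradient, using Adam-style updates."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)                          # gradient operation
        m = beta1 * m + (1 - beta1) * g         # running average of gradients
        v = beta2 * v + (1 - beta2) * g * g     # running average of squared gradients
        m_hat = m / (1 - beta1 ** t)            # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Hypothetical objective f(x) = x^2, whose gradient is 2x and whose minimum is at x = 0.
final_x = adam_minimize(lambda x: 2.0 * x, x0=5.0)
```

Nothing here needs a special server: the state lives next to the parameter and is updated by ordinary operations, which is the flexibility the paragraph above describes.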




Sunday, February 13, 2022

2. Setting Up a TensorFlow Lab

 Now that you have seen all the amazing applications of generative models in Chapter 1, An Introduction to Generative AI: "Drawing" Data from Models, you might be wondering how to get started with implementing projects that use these kinds of algorithms. In this chapter, we will walk through a number of tools that we will use throughout the rest of the book to implement the deep neural networks that are used in various generative AI models. Our primary tool is TensorFlow 2.0; however, we will also use a number of additional resources to make the implementation process easier (summarized in Table 2.1).


We can broadly categorize these tools:

- Resources for replicable dependency management (Docker, Anaconda)

- Exploratory tools for data munging and algorithm hacking (Jupyter)

- Utilities to deploy these resources to the cloud and manage their lifecycle (Kubernetes, Kubeflow, Terraform)


Tool          Project site             Use
Docker        www.docker.com           Application runtime dependency encapsulation
Anaconda      www.anaconda.com         Python language package management
Jupyter       jupyter.org              Interactive Python runtime and plotting/data exploration tool
Kubernetes    kubernetes.io            Docker container orchestration and resource management
Kubeflow      www.kubeflow.org         Machine learning workflow engine developed on Kubernetes
Terraform     www.terraform.io         Infrastructure scripting language for configurable and consistent deployments of Kubeflow and Kubernetes
VSCode        code.visualstudio.com    Integrated development environment (IDE)


On our journey to bring our code from our laptops to the cloud in this chapter, we will first describe some background on how TensorFlow works when running locally. We will then describe a wide array of software tools that will make it easier to run an end-to-end TensorFlow lab locally or in the cloud, such as notebooks, containers, and cluster managers. Finally, we will walk through a simple practical example of setting up a reproducible research environment, running local and distributed training, and recording our results. We will also examine how we might parallelize TensorFlow across multiple CPU/GPU units within a machine (vertical scaling) and multiple machines in the cloud (horizontal scaling) to accelerate training. By the end of this chapter, we will be all ready to extend this laboratory framework to tackle implementing projects using various generative AI models.


First, let's start by diving more into the details of TensorFlow, the library we will use to develop models throughout the rest of this book. What problem does TensorFlow solve for neural network model development? What approaches does it use? How has it evolved over the years? To answer these questions, let us review some of the history behind deep neural network libraries that led to the development of TensorFlow.


Summary

 In this chapter, we discussed what generative modeling is, and how it fits into the landscape of more familiar machine learning methods. We used probability theory and Bayes' theorem to describe how these models approach prediction in an opposite manner to discriminative learning.


We reviewed use cases for generative learning, both for specific kinds of data and general prediction tasks. Finally, we examined some of the specialized challenges that arise from building these models.


In the next chapter, we will begin our practical implementation of these models by exploring how to set up a development environment for TensorFlow 2.0 using Docker and Kubeflow.

Unique challenges of generative models

 Given the powerful applications that generative models have, what are the major challenges in implementing them? As described, most of these models utilize complex data, requiring us to fit large models to capture all the nuances of their features and distribution. This has implications both for the number of examples that we must collect to adequately represent the kind of data we are trying to generate, and for the computational resources needed to build the model. We will discuss techniques in Chapter 2, Setting Up a TensorFlow Lab, for parallelizing the training of these models using cloud computing frameworks and graphics processing units (GPUs).


A more subtle problem that comes from having complex data, and the fact that we are trying to generate data rather than a numerical label or value, is that our notion of model accuracy is much more complicated: we cannot simply calculate the distance to a single label or score.


We will discuss in Chapter 5, Painting Pictures with Neural Networks Using VAEs, and Chapter 6, Image Generation with GANs, how deep generative models such as VAE and GAN algorithms take different approaches to determine whether a generated image is comparable to a real-world image. Finally, as mentioned, our models need to allow us to generate both large and diverse samples, and the various methods we will discuss take different approaches to control the diversity of data.


The rules of the game

 The preceding applications concern data types we can see, hear, or read. However, generative models also have applications to generate rules. This is useful in a popular application of deep learning: using algorithms to play board games or Atari video games.

While these applications have traditionally used reinforcement learning (RL) techniques to train networks to employ the optimal strategy in these games, new research has suggested using GANs to propose novel rules as part of the training process, or to generate synthetic data to prime the overall learning process. We will examine both applications in Chapter 12, Play Video Games with Generative AI: GAIL.

Sound composition

 Sound, like images or text, is a complex, high-dimensional kind of data. Music in particular has many complexities: it could involve one or several musicians, has a temporal structure, and can be divided into thematically related segments. All of these components are incorporated into models such as MuseGAN, as mentioned earlier, which uses GANs to generate these various components and synthesize them into realistic, yet synthetic, musical tracks. We will describe the implementation of MuseGAN and its variants in Chapter 11, Composing Music with Generative Models.

Fake news and chatbots

 Humans have always wanted to talk to machines; the first chatbot, ELIZA, was written at MIT in the 1960s and used a simple program to transform a user's input and generate a response, in the mode of a therapist who frequently responds in the form of a question.
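
The flavor of such a program can be conveyed with a short sketch. The rules, word substitutions, and function names below are hypothetical and written for this illustration; they are not ELIZA's original script, only an example of transforming a user's input into a question.

```python
import re

# Hypothetical (pattern, response template) rules; not taken from ELIZA itself.
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (.+)", re.IGNORECASE), "Can you tell me more about your {0}?"),
]

# Swap first-person words for second-person ones in the echoed fragment.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}

def reflect(fragment):
    """Strip trailing punctuation and flip the speaker's point of view."""
    words = fragment.rstrip(".!?").split()
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in words)

def respond(text):
    """Return the response for the first matching rule, or a generic prompt."""
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(reflect(match.group(1)))
    return "Can you tell me more?"

reply = respond("I feel anxious about my exams.")
# reply == "Why do you feel anxious about your exams?"
```

No model of language is involved: the program only matches patterns and echoes fragments back, which is why the responses so often take the form of a question.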


More sophisticated models can generate entirely novel text, such as Google's BERT and OpenAI's GPT-2, which use a unit called a transformer. A transformer module in a neural network allows a network to propose a new word in the context of preceding words in a piece of text, emphasizing those that are more relevant in the current context; models such as BERT combine transformer units into a powerful multi-dimensional encoding of natural language patterns and contextual significance. This approach can be used in document creation for natural language processing (NLP) tasks, or for chatbot dialogue systems (Figure 1.3).
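
The notion of emphasizing the more relevant preceding words can be sketched with scaled dot-product attention, the weighting mechanism commonly used inside transformer units. The code below is a toy illustration and not BERT or GPT-2: the two-dimensional vectors are made-up numbers, and the function names are assumptions.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]   # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Weight each preceding word by how well its key matches the query."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The context vector is the weighted average of the value vectors.
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, context

# Toy vectors standing in for three preceding words (made-up numbers).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]   # stands in for the position whose next word is being proposed

weights, context = attend(query, keys, values)
```

Here the first and third words receive equal, larger weights because their keys align with the query, while the second word is down-weighted; real models learn the queries, keys, and values rather than fixing them by hand.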