Limit(0)

Saturday, February 19, 2022

Important Docker commands and syntax

To understand how Docker works, it is useful to walk through the template used for all Docker containers, a Dockerfile. As an example, we will use the TensorFlow notebook container example from the Kubeflow project (https://github.com/kubeflow/kubeflow/blob/master/components/example-notebook-servers/jupyter-tensorflow-full/cpu.Dockerfile).

This file is a set of instructions for how Docker should take a base operating environment, add dependencies, and execute a piece of software once it is packaged:

FROM public.ecr.aws/jlr09q0g6/notebook-servers/jupyter-tensorflow:master-abf9ec48

# install - requirements.txt
COPY --chown=jovyan:users requirements.txt /tmp/requirements.txt
RUN python3 -m pip install -r /tmp/requirements.txt --quiet --no-cache-dir \
 && rm -f /tmp/requirements.txt

While the exact commands will differ between containers, this gives a flavor of how we can use containers to manage an application: in this case, running a Jupyter notebook for interactive machine learning experimentation using a consistent set of libraries. Once we have installed the Docker runtime for our particular operating system, we would build an image from such a file by running:

docker build -f <Dockerfile name> -t <image name:tag> <build context path>

When we do this, a number of things happen. First, we retrieve the base filesystem, or image, from a remote repository, which is not unlike the way we collect JAR files from Artifactory when using Java build tools such as Gradle or Maven, or Python's pip installer. With this filesystem or image, we then set required variables for the Docker build command such as the username and TensorFlow version, and runtime environment variables for the container. We determine what shell program will be used to run the command, then we install dependencies we will need to run TensorFlow and the notebook application, and we specify the command that is run when the Docker container is started. Then we save this snapshot with an identifier composed of a base image name and one or more tags (such as version numbers, or, in many cases, simply a timestamp to uniquely identify this image). Finally, to actually start the notebook server running this container, we would issue the command:

docker run <image name:tag>

By default, Docker will run the executable command specified in the Dockerfile; in our present example, that is the command to start the notebook server. However, this does not have to be the case; we could have a Dockerfile that simply builds an execution environment for an application, and issue a command to run within that environment. In that case, the command would look like:

docker run <image name:tag> <command>

To share the image, we publish it to a registry with the push command:

docker push <image name:tag>
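When these steps are scripted, for example from a notebook, the build, run, and push commands above can be assembled as argument lists for Python's subprocess module. The following is a minimal sketch: the helper names and the example image name are hypothetical, and nothing is executed unless the lists are passed to subprocess.run on a machine with Docker installed.

```python
import shlex


def build_cmd(dockerfile, image, context="."):
    """Argument list for `docker build`; `context` is the build directory."""
    return ["docker", "build", "-f", dockerfile, "-t", image, context]


def run_cmd(image, command=None):
    """Argument list for `docker run`, optionally overriding the default command."""
    extra = shlex.split(command) if command else []
    return ["docker", "run", image] + extra


def push_cmd(image):
    """Argument list for `docker push`."""
    return ["docker", "push", image]


if __name__ == "__main__":
    image = "my-notebook:0.1"  # hypothetical image name and tag
    for cmd in (build_cmd("cpu.Dockerfile", image),
                run_cmd(image, "python3 --version"),
                push_cmd(image)):
        print(" ".join(cmd))
```

Keeping the commands as lists rather than one shell string avoids quoting problems when an image name or command contains spaces.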

Note that the image name can contain a reference to a particular registry, such as a local registry or one hosted by one of the major cloud providers, such as Elastic Container Registry (ECR) on AWS, Azure Container Registry (ACR), or Google Container Registry. Publishing to a remote registry allows developers to share images and lets us make containers available to deploy in the cloud.
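To make the structure of an image name concrete, here is a small sketch that splits a reference into its registry, repository, and tag. It is a simplification of the rule Docker applies (the first path component counts as a registry only if it looks like a host name), it ignores digests, and the function name is hypothetical.

```python
def parse_image_ref(ref):
    """Split an image reference into (registry, repository, tag).

    Simplified: digests (`@sha256:...`) are not handled.
    """
    registry = None
    first, sep, rest = ref.partition("/")
    # The first path component is treated as a registry host only if it
    # looks like one: it contains a dot or a port, or is "localhost".
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        remainder = ref
    # The tag follows the colon in the final path component.
    head, slash, last = remainder.rpartition("/")
    name, colon, tag = last.partition(":")
    repository = head + slash + name
    # When no tag is given, Docker falls back to "latest".
    return registry, repository, tag if colon else "latest"


print(parse_image_ref("localhost:5000/notebook:0.1"))
print(parse_image_ref("hello-world"))
```

A reference without a registry part, such as hello-world, is resolved against the default registry configured for the Docker client.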

Docker: A lightweight virtualization solution

A consistent challenge in developing robust software applications is to make them run the same on a machine different than the one on which they are developed. These differences in environments could encompass a number of variables: operating systems, programming language library versions, and hardware such as CPU models.

Traditionally, one approach to dealing with this heterogeneity has been to use Virtual Machines (VMs). While VMs are useful for running applications on diverse hardware and operating systems, they are also limited by being resource-intensive (Figure 2.3): each VM running on a host requires the overhead resources to run a completely separate operating system, along with all the applications or dependencies within the guest system.

However, in some cases this is an unnecessary level of overhead; we do not necessarily need to run an entirely separate operating system, but just a consistent environment, including libraries and dependencies, within a single operating system. This need for a lightweight framework to specify runtime environments prompted the creation of the Docker project for containerization in 2013. In essence, a container is an environment for running an application, including all dependencies and libraries, allowing reproducible deployment of web applications and other programs, such as a database or the computations in a machine learning pipeline. For our use case, we will use it to provide a reproducible Python execution environment (Python language version and libraries) to run the steps in our generative machine learning pipelines.
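As a rough illustration of what "Python language version and libraries" means in practice, the sketch below records the interpreter version and the installed package versions from inside a running environment. The function name is hypothetical, and a container image captures more than this (for example, operating-system libraries), so treat it as a partial view only.

```python
import sys
from importlib import metadata


def environment_snapshot():
    """Return the Python version and a sorted list of `name==version` pins."""
    pins = sorted(
        (
            f"{dist.metadata.get('Name')}=={dist.version}"
            for dist in metadata.distributions()
            if dist.metadata.get("Name")  # skip entries without a name
        ),
        key=str.lower,
    )
    python_version = ".".join(str(part) for part in sys.version_info[:3])
    return {"python": python_version, "packages": pins}


snapshot = environment_snapshot()
print("Python", snapshot["python"], "with", len(snapshot["packages"]), "packages")
```

Two environments that produce the same snapshot agree on the Python-level dependencies, which is the part of reproducibility a requirements.txt file pins down.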

We will need to have Docker installed for many of the examples that will appear in the rest of this chapter and the projects in this book. For instructions on how to install Docker for your particular operating system, please refer to the directions at https://docs.docker.com/install. To verify that you have installed the application successfully, you should be able to run the following command on your terminal, which will print a short confirmation message:

docker run hello-world

VSCode

Visual Studio Code (VSCode) is an open-source code editor developed by Microsoft Corporation which can be used with many programming languages, including Python. It allows debugging and is integrated with version control tools such as Git; we can even run Jupyter notebooks (which we will describe later in this chapter) within VSCode. Instructions for installation vary by whether you are using a Linux, macOS, or Windows operating system: please see the individual instructions at https://code.visualstudio.com for your system. Once installed, we need to clone a copy of the source code for the projects in this book using Git, with the command:

git clone git@github.com:PacktPublishing/Hands-On-Generative-AI-with-Python-and-TensorFlow-2.git

This command will copy the source code for the projects in this book to our laptop, allowing us to run and modify the code locally. Once you have the code copied, open the GitHub repository for this book using VSCode (Figure 2.1). We are now ready to start installing some of the tools we will need; open the file install.sh.

One feature that will be of particular use to us is the fact that VSCode has an integrated terminal (Figure 2.2) where we can run commands: you can access this by selecting View, then Terminal from the drop-down list, which will open a command-line prompt.

Select the TERMINAL tab, and bash for the interpreter; you should now be able to enter normal commands. Change the directory to Chapter_2, where we will run our installation script, which you can open in VSCode.

The installation script we will run will download and install the various components we will need in our end-to-end TensorFlow lab; the overarching framework we will use for these experiments will be the Kubeflow library, which handles the various data and training pipelines that we will utilize for our projects in the later chapters of this volume. In the rest of this chapter, we will describe how Kubeflow is built on Docker and Kubernetes, and how to set up Kubeflow on several popular cloud providers.

Kubernetes, the technology on which Kubeflow is based, is fundamentally a way to manage containerized applications created using Docker, which allows reproducible, lightweight execution environments to be created and persisted for a variety of applications. While we will make use of Docker for creating reproducible experimental runtimes, to understand its place in the overall landscape of virtualization solutions (and why it has become so important to modern application development), let us take a detour to describe the background of Docker in more detail.

Friday, February 18, 2022

TensorFlow 2.0

While representing operations in the dataflow graph as primitives allows flexibility in defining new layers within the Python client API, it can also result in a lot of "boilerplate" code and repetitive syntax. For this reason, the high-level API Keras was developed to provide a high-level abstraction; layers are represented using Python classes, while a particular runtime environment (such as TensorFlow or Theano) acts as a backend that executes the layer, just as TensorFlow operators can have different underlying implementations on CPUs, GPUs, or TPUs.

While developed as a framework-agnostic library, Keras has been included as part of TensorFlow's main release in version 2.0. For the purposes of readability, we will implement most of our models in this book in Keras, while reverting to the underlying TensorFlow 2.0 code where it is necessary to implement particular operations or highlight the underlying logic. Please see Table 2.3 for a comparison of how various neural network algorithm concepts are implemented at a low (TensorFlow) or high (Keras) level in these libraries.

Object                 TensorFlow implementation   Keras implementation
Neural network layer   Tensor computation          Python layer classes
Gradient calculation   Graph runtime operator      Python optimizer class
Loss function          Tensor computation          Python loss function
Neural network model   Graph runtime session       Python model class instance
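The Keras side of this comparison rests on one idea: a layer is an object that owns its parameters and knows how to apply them, and a model is an object that chains layers. The following pure-Python sketch illustrates that idea only; it is not how Keras is implemented, and the class names merely mirror Keras for readability.

```python
import math
import random


class Dense:
    """A fully connected layer: owns its weights and knows how to apply them."""

    def __init__(self, n_in, n_out, activation=None, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(n_out)]
                        for _ in range(n_in)]
        self.bias = [0.0] * n_out
        self.activation = activation

    def __call__(self, x):
        out = [
            sum(x[i] * self.weights[i][j] for i in range(len(x))) + self.bias[j]
            for j in range(len(self.bias))
        ]
        return self.activation(out) if self.activation else out


def softmax(values):
    """Normalize a list of scores into probabilities."""
    shifted = [math.exp(v - max(values)) for v in values]
    total = sum(shifted)
    return [v / total for v in shifted]


class Sequential:
    """Chains layers so that the output of one feeds the next."""

    def __init__(self, layers):
        self.layers = layers

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


model = Sequential([Dense(4, 3), Dense(3, 2, activation=softmax)])
probs = model([1.0, 0.5, -0.5, 2.0])
print(probs)
```

The caller never touches the weight matrices directly, which is the readability gain the table summarizes.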

To show you the difference between the abstraction that Keras makes versus TensorFlow 1.0 in implementing basic neural network models, let's look at an example of writing a multilayer perceptron (see Chapter 3, Building Blocks of Deep Neural Networks) using both of these frameworks. In the first case, in TensorFlow 1.0, you can see that a lot of the code involves explicitly specifying variables, functions, and matrix operations, along with the gradient function and runtime session to compute the updates to the network.

This is a multilayer perceptron in TensorFlow 1.0:

# Requires TensorFlow 1.x; x_train, y_train, and x_test are assumed to be
# MNIST arrays loaded beforehand.
import numpy as np
import pandas as pd
import tensorflow as tf

X = tf.placeholder(dtype=tf.float64)
Y = tf.placeholder(dtype=tf.float64)
num_hidden = 128

# Build a hidden layer
w_hidden = tf.Variable(np.random.randn(784, num_hidden))
b_hidden = tf.Variable(np.random.randn(num_hidden))
p_hidden = tf.nn.sigmoid(tf.add(tf.matmul(X, w_hidden), b_hidden))

# Build another hidden layer
w_hidden2 = tf.Variable(np.random.randn(num_hidden, num_hidden))
b_hidden2 = tf.Variable(np.random.randn(num_hidden))
p_hidden2 = tf.nn.sigmoid(tf.add(tf.matmul(p_hidden, w_hidden2), b_hidden2))

# Build the output layer
w_output = tf.Variable(np.random.randn(num_hidden, 10))
b_output = tf.Variable(np.random.randn(10))
p_output = tf.nn.softmax(tf.add(tf.matmul(p_hidden2, w_output), b_output))

loss = tf.reduce_mean(tf.losses.mean_squared_error(labels=Y, predictions=p_output))
# The optimizer step is not defined in the listing as transcribed; Adam with
# this learning rate is an assumption.
minimization_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)
accuracy = 1 - tf.sqrt(loss)

feed_dict = {
    X: x_train.reshape(-1, 784),
    Y: pd.get_dummies(y_train)
}

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    for step in range(10000):
        J_value = session.run(loss, feed_dict)
        acc = session.run(accuracy, feed_dict)
        if step % 100 == 0:
            print("Step:", step, " Loss:", J_value, " Accuracy:", acc)
        session.run(minimization_op, feed_dict)
    pred00 = session.run([p_output], feed_dict={X: x_test.reshape(-1, 784)})

In contrast, the implementation of the same network in Keras is vastly simplified through the use of abstract concepts embodied in Python classes, such as layers, models, and optimizers. Underlying details of the computation are encapsulated in these classes, making the logic of the code more readable.

Note also that in TensorFlow 2.0 the notion of running sessions (lazy execution, in which the network is only computed if explicitly compiled and called) has been dropped in favor of eager execution, in which the session and graph are called dynamically when network functions such as call and compile are executed, with the network behaving like any other Python class without explicitly creating a session scope. The notion of a global namespace in which variables are declared with tf.Variable() has also been replaced with a default garbage collection mechanism.

This is the multilayer perceptron in Keras:

# x_train and y_train are assumed to be MNIST arrays loaded beforehand.
import pandas as pd
import tensorflow as tf

l = tf.keras.layers

model = tf.keras.Sequential([
    l.Flatten(input_shape=(784,)),
    l.Dense(128, activation='relu'),
    l.Dense(10, activation='softmax')
])

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.summary()

model.fit(x_train.reshape(-1, 784), pd.get_dummies(y_train),
          epochs=15, batch_size=128, verbose=1)

Now that we have covered some of the details of what the TensorFlow library is and why it is well-suited to the development of deep neural network models (including the generative models we will implement in this book), let's get started building up our research environment. While we could simply use a Python package manager such as pip to install TensorFlow on our laptop, we want to make sure our process is as robust and reproducible as possible - this will make it easier to package our code to run on different machines, or keep our computations consistent by specifying the exact versions of each Python library we use in an experiment. We will start by installing an Integrated Development Environment (IDE) that will make our research easier - VSCode.

Deep neural network development and TensorFlow

As we will see in Chapter 3, Building Blocks of Deep Neural Networks, a deep neural network in essence consists of matrix operations (addition, subtraction, multiplication), nonlinear transformations, and gradient-based updates computed by using the derivatives of these components.
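These three ingredients can be seen in miniature in a single sigmoid neuron trained by gradient descent. The sketch below uses plain Python rather than TensorFlow; the toy input, target, and learning rate are arbitrary choices made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, x):
    """Matrix-style operation (a dot product) followed by a nonlinearity."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_step(w, b, x, y, lr=0.5):
    """One gradient-based update for the squared error (p - y) ** 2."""
    p = forward(w, b, x)
    # Chain rule: d(loss)/dz = 2 * (p - y) * sigmoid'(z), and sigmoid'(z) = p * (1 - p).
    dz = 2.0 * (p - y) * p * (1.0 - p)
    new_w = [wi - lr * dz * xi for wi, xi in zip(w, x)]
    new_b = b - lr * dz
    return new_w, new_b, (p - y) ** 2


w, b = [0.0, 0.0], 0.0
x, y = [1.0, 2.0], 1.0
losses = []
for _ in range(50):
    w, b, loss = train_step(w, b, x, y)
    losses.append(loss)
print("first loss:", losses[0], "last loss:", losses[-1])
```

A deep network repeats the same pattern across many layers, which is why frameworks that automate the derivative bookkeeping became so valuable.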

In the world of academia, researchers have historically often used efficient prototyping tools such as MATLAB to run models and prepare analyses. While this approach allows for rapid experimentation, it lacks elements of industrial software development, such as object-oriented (OO) development, that allow for reproducibility and clean software abstractions and that let tools be adopted by large organizations. These tools also had difficulty scaling to large datasets and could carry heavy licensing fees for such industrial use cases. However, prior to 2006, this type of computational tooling was largely sufficient for most use cases.

However, as the datasets being tackled with deep neural network algorithms grew, groundbreaking results were achieved such as:

- Image classification on the ImageNet dataset

- Large-scale unsupervised discovery of image patterns in YouTube videos

- The creation of artificial agents capable of playing Atari video games and the Asian board game Go with human-like skill

- State-of-the-art language understanding via the BERT model developed by Google

The models developed in these studies exploded in complexity along with the size of the datasets they were applied to (see Table 2.2 to get a sense of the immense scale of some of these models). As industrial use cases required robust and scalable frameworks to develop and deploy new neural networks, several academic groups and large technology companies invested in the development of generic toolkits for the implementation of deep learning models. These software libraries codified common patterns into reusable abstractions, often allowing even complex models to be embodied in relatively simple experimental scripts.

Model Name    Year   # Parameters
AlexNet       2012   61M
YouTube CNN   2012   1B
Inception     2014   5M
VGG-16        2014   138M
BERT          2018   340M
GPT-3         2020   175B

Some early examples of these frameworks include Theano, a Python package developed at the University of Montreal; Torch, a library written in the Lua language that was later ported to Python by researchers at Facebook; and TensorFlow, a C++ runtime with Python bindings developed by Google.

In this book, we will primarily use TensorFlow 2.0, due to its widespread adoption and its convenient high-level interface, Keras, which abstracts much of the repetitive plumbing of defining runtime layers and model architecture.

TensorFlow is an open-source version of an internal tool developed at Google called DistBelief. The DistBelief framework consisted of distributed workers (independent computational processes running on a cluster of machines) that would compute forward and backward gradient descent passes on a network (a common way to train neural networks, which we will discuss in Chapter 3, Building Blocks of Deep Neural Networks) and send the results to a Parameter Server that aggregated the updates. The neural networks in the DistBelief framework were represented as a Directed Acyclic Graph (DAG), terminating in a loss function that yielded a scalar (numerical value) comparing the network predictions with the observed target (such as an image class, or the probability distribution over a vocabulary representing the most probable next word in a sentence in a translation model).
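The worker and parameter server roles can be sketched in a few lines of plain Python. This is a toy simulation, not DistBelief itself: the workers run one after another in a single process, the server averages their gradients synchronously, and the model is a one-variable linear regression chosen so the result is easy to check. All names are hypothetical.

```python
class ParameterServer:
    """Holds the shared weights and applies averaged worker gradients."""

    def __init__(self, weights, lr=0.1):
        self.weights = list(weights)
        self.lr = lr

    def apply(self, gradients):
        # Average the gradients reported by all workers, then take one step.
        n = len(gradients)
        for i in range(len(self.weights)):
            mean_grad = sum(g[i] for g in gradients) / n
            self.weights[i] -= self.lr * mean_grad


def worker_gradient(weights, shard):
    """Gradient of the mean squared error of y = w0 * x + w1 on one data shard."""
    g0 = g1 = 0.0
    for x, y in shard:
        err = weights[0] * x + weights[1] - y
        g0 += 2 * err * x / len(shard)
        g1 += 2 * err / len(shard)
    return [g0, g1]


# Toy data from y = 2x + 1, split into two shards, one per simulated worker.
data = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0, 3.0)]
shards = [data[0::2], data[1::2]]
server = ParameterServer([0.0, 0.0])
for _ in range(500):
    server.apply([worker_gradient(server.weights, shard) for shard in shards])
print(server.weights)
```

Each worker only ever sees its own shard, yet the shared weights converge toward the slope and intercept that generated the full dataset.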

A DAG is a software data structure consisting of nodes (operations) and edges (data), where information flows in only a single direction along the edges (thus directed) and where there are no loops (hence acyclic).
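Both properties can be checked with Python's standard graphlib module (available from Python 3.9). In this sketch the node names are made up to resemble a small network; each node maps to the set of nodes whose output it consumes.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a node; its value is the set of nodes whose output it consumes.
graph = {
    "hidden": {"input", "weights"},
    "prediction": {"hidden"},
    "loss": {"prediction", "target"},
}

# A valid execution order: every node appears after the nodes it depends on.
order = list(TopologicalSorter(graph).static_order())
print(order)


def is_acyclic(dependencies):
    """Return True if the dependency mapping contains no loops."""
    try:
        TopologicalSorter(dependencies).prepare()
    except CycleError:
        return False
    return True


print(is_acyclic(graph))
print(is_acyclic({"a": {"b"}, "b": {"a"}}))
```

Because the graph is directed and acyclic, such an order always exists, which is what lets a runtime schedule the operations and finish at the loss node.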

While DistBelief allowed Google to productionize several large models, it had limitations:

- First, the Python scripting interface was developed with a set of pre-defined layers corresponding to underlying implementations in C++; adding novel layer types required coding in C++, which represented a barrier to productivity.

- Secondly, while the system was well adapted for training feed-forward networks using basic Stochastic Gradient Descent (SGD) (an algorithm we will describe in more detail in Chapter 3, Building Blocks of Deep Neural Networks) on large-scale data, it lacked flexibility for accommodating recurrent, reinforcement learning, or adversarial learning paradigms - the latter of which is crucial to many of the algorithms we will implement in this book.

- Finally, this system was difficult to scale down - to run the same job, for example, on a desktop with GPUs as well as a distributed environment with multiple cores per machine - and deployment also required a different technical stack.

Jointly, these considerations prompted the development of TensorFlow as a generic deep learning computational framework: one that could allow scientists to flexibly experiment with new layer architectures or cutting-edge training paradigms, while also allowing this experimentation to be run with the same tools on both a laptop (for early-stage work) and a computing cluster (to scale up more mature models), and easing the transition between research and development code by providing a common runtime for both.

Though both libraries share the concept of the computation graph (networks represented as a graph of operations (nodes) and data (edges)) and a dataflow programming model (where matrices pass through the directed edges of a graph and have operations applied to them), TensorFlow, unlike DistBelief, was designed with the edges of the graph being tensors (n-dimensional matrices) and the nodes of the graph being atomic operations (addition, subtraction, nonlinear operations). This allows for much greater flexibility in defining new computations, and even allows for mutation and stateful updates (these being simply additional nodes in the graph).

The dataflow graph in essence serves as a "placeholder" where data is slotted into defined variables and can be executed on single or multiple machines. TensorFlow optimizes the constructed dataflow graph in the C++ runtime upon execution, allowing optimization, for example, in issuing commands to the GPU. The different computations of the graph can also be executed across multiple machines and hardware, including CPUs, GPUs, and TPUs (custom tensor processing chips developed by Google and available in the Google Cloud computing environment), as the same computations described at a high level in TensorFlow are implemented to execute on multiple backend systems.
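The idea of a graph as a template that is defined first and fed with data later can be sketched without TensorFlow. In the toy example below, the class and function names are hypothetical; it omits everything the real runtime adds, such as graph optimization and device placement, and only shows that building the graph computes nothing until values are supplied.

```python
class Placeholder:
    """A named slot in the graph; its value is supplied at run time."""

    def __init__(self, name):
        self.name = name


class Op:
    """A node that applies `fn` to the values flowing in from `inputs`."""

    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs


def run(node, feed):
    """Evaluate `node` by recursively evaluating the nodes it depends on."""
    if isinstance(node, Placeholder):
        return feed[node.name]
    if isinstance(node, Op):
        return node.fn(*(run(i, feed) for i in node.inputs))
    return node  # a plain Python constant


# Building the graph computes nothing yet.
x = Placeholder("x")
w = Placeholder("w")
y = Op(lambda a, b: a + b, Op(lambda a, b: a * b, x, w), 1.0)

# Only now is data slotted in and the graph executed: 3.0 * 2.0 + 1.0.
result = run(y, {"x": 3.0, "w": 2.0})
print(result)
```

The same graph object can be run again with a different feed, which mirrors how a TensorFlow 1.0 session reuses one graph across many batches.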

Because the dataflow graph allows mutable state, in essence, there is also no longer a centralized parameter server as was the case for DistBelief (though TensorFlow can also be run in a distributed manner with a parameter server configuration), since different nodes that hold state can execute the same operations as any other worker nodes. Further, control flow operations such as loops allow for training on variable-length inputs, such as in recurrent networks (see Chapter 3, Building Blocks of Deep Neural Networks). In the context of training neural networks, the gradients of each layer are simply represented as additional operations in the graph, allowing optimizations such as velocity (as in the RMSProp or ADAM optimizers, described in Chapter 3, Building Blocks of Deep Neural Networks) to be included using the same framework rather than modifying the parameter server logic. In the context of distributed training, TensorFlow also has several checkpointing and redundancy mechanisms ("backup" workers in case of a single task failure) that make it suited to robust training in distributed environments.
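To show what a velocity term adds to a plain gradient update, here is a sketch of classical momentum on a one-dimensional quadratic. Note that this is the simpler momentum rule, used here as a stand-in; it is not RMSProp or ADAM, and the learning rate and decay factor are arbitrary illustrative values.

```python
def momentum_step(w, velocity, grad, lr=0.1, beta=0.9):
    """Classical momentum: the velocity is extra state carried between steps."""
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity


w, v = 5.0, 0.0
for _ in range(200):
    grad = 2.0 * w  # gradient of f(w) = w ** 2
    w, v = momentum_step(w, v, grad)
print(w)
```

The velocity is just one more piece of mutable state updated alongside the weight, which is why a framework with stateful graph nodes can express such optimizers without special-purpose server logic.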

Sunday, February 13, 2022

2. Setting Up a TensorFlow Lab

Now that you have seen all the amazing applications of generative models in Chapter 1, An Introduction to Generative AI: "Drawing" Data from Models, you might be wondering how to get started with implementing projects that use these kinds of algorithms. In this chapter, we will walk through a number of tools that we will use throughout the rest of the book to implement the deep neural networks that are used in various generative AI models. Our primary tool is the TensorFlow 2.0 framework; we will also use a number of additional resources to make the implementation process easier (summarized in Table 2.1).

We can broadly categorize these tools:

- Resources for replicable dependency management (Docker, Anaconda)

- Exploratory tools for data munging and algorithm hacking (Jupyter)

- Utilities to deploy these resources to the cloud and manage their lifecycle (Kubernetes, Kubeflow, Terraform)

Tool         Project site            Use
Docker       www.docker.com          Application runtime dependency encapsulation
Anaconda     www.anaconda.com        Python language package management
Jupyter      jupyter.org             Interactive Python runtime and plotting/data exploration tool
Kubernetes   kubernetes.io           Docker container orchestration and resource management
Kubeflow     www.kubeflow.org        Machine learning workflow engine developed on Kubernetes
Terraform    www.terraform.io        Infrastructure scripting language for configurable and consistent deployments of Kubeflow and Kubernetes
VSCode       code.visualstudio.com   Integrated development environment (IDE)

On our journey to bring our code from our laptops to the cloud in this chapter, we will first describe some background on how TensorFlow works when running locally. We will then describe a wide array of software tools that will make it easier to run an end-to-end TensorFlow lab locally or in the cloud, such as notebooks, containers, and cluster managers. Finally, we will walk through a simple practical example of setting up a reproducible research environment, running local and distributed training, and recording our results. We will also examine how we might parallelize TensorFlow across multiple CPU/GPU units within a machine (vertical scaling) and multiple machines in the cloud (horizontal scaling) to accelerate training. By the end of this chapter, we will be ready to extend this laboratory framework to tackle implementing projects using various generative AI models.

First, let's start by diving more into the details of TensorFlow, the library we will use to develop models throughout the rest of this book. What problem does TensorFlow solve for neural network model development? What approaches does it use? How has it evolved over the years? To answer these questions, let us review some of the history behind deep neural network libraries that led to the development of TensorFlow.