
Saturday, February 26, 2022

3. Building Blocks of Deep Neural Networks

The many generative AI models that we will implement in this book are all built on the foundation of advances over the last decade in deep learning and neural networks. While in practice we could implement these projects without reference to historical developments, retracing their underlying components will give you a richer understanding of how and why these models work. In this chapter, we will dive into this background, showing you how generative AI models are built from the ground up, how smaller units are assembled into complex architectures, how the loss functions in these models are optimized, and some current theories as to why these models are so effective. Armed with this background knowledge, you should be able to understand in greater depth the reasoning behind the more advanced models and topics that start in Chapter 4, Teaching Networks to Generate Digits, of this book. Generally speaking, we can group the building blocks of neural network models into a number of choices regarding how the model is constructed and trained, which we will cover in this chapter:


Which neural network architecture to use:

- Perceptron

- Multilayer perceptron (MLP)/feedforward network

- Convolutional Neural Networks (CNNs)

- Recurrent Neural Networks (RNNs)

- Long Short-Term Memory Networks (LSTMs)

- Gated Recurrent Units (GRUs)
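To make the first item in this list concrete, here is a minimal sketch of a perceptron in plain Python: a weighted sum passed through a step function, trained with the classic perceptron learning rule. The function names, the learning rate of 1.0 (which keeps the arithmetic exact for this toy case), and the AND dataset are our own illustrative choices, not code from later chapters.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Perceptron learning rule: nudge the weights toward misclassified examples."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # -1, 0, or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Logical AND is linearly separable, so a single perceptron can learn it.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1]
```

A single perceptron can only separate classes with a straight line (a hyperplane); stacking layers of such units, as in the MLP, is what lets networks represent non-separable problems such as XOR.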


Which activation functions to use in the network:

- Linear

- Sigmoid

- Tanh

- ReLU

- PReLU
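The activation functions in this list are simple scalar functions. The sketch below writes each one in plain Python so their shapes can be compared side by side; in a real network the PReLU slope `alpha` is learned during training, and the fixed value 0.25 used here is purely an illustrative assumption.

```python
import math


def linear(x):
    return x


def sigmoid(x):
    # Numerically stable form: avoids overflow in exp() for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh(x):
    return math.tanh(x)


def relu(x):
    return max(0.0, x)


def prelu(x, alpha=0.25):
    # In a real PReLU layer, alpha is a trainable parameter.
    return x if x > 0 else alpha * x


for f in (linear, sigmoid, tanh, relu, prelu):
    print(f.__name__, [round(f(v), 3) for v in (-2.0, 0.0, 2.0)])
```

Sigmoid squashes inputs into (0, 1) and tanh into (-1, 1), so both saturate for large inputs; ReLU and PReLU stay linear for positive inputs, and PReLU additionally keeps a small, learnable gradient for negative ones.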


What optimization algorithm to use to tune the parameters of the network:

- Stochastic Gradient Descent (SGD)

- RMSProp

- AdaGrad

- ADAM

- AdaDelta

- Hessian-free optimization
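The optimizers in this list differ mainly in how they turn a gradient into a parameter update. As a rough sketch, the code below minimizes the toy function f(x) = (x - 3)^2 with plain gradient descent (the same update SGD applies, here with exact rather than mini-batch gradients) and with Adam's moment-based update as described by Kingma and Ba. The toy objective, learning rates, and step counts are our assumptions; `beta1`, `beta2`, and `eps` use the Adam paper's default values.

```python
import math


def grad(x):
    """Gradient of the toy objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


def sgd(x0, lr=0.1, steps=100):
    """Plain gradient descent: step against the gradient, scaled by lr."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def adam(x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam: gradient step rescaled by running estimates of its mean and variance."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # first moment (momentum)
        v = beta2 * v + (1 - beta2) * g * g  # second moment (per-parameter scale)
        m_hat = m / (1 - beta1 ** t)         # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


print(f"SGD:  {sgd(0.0):.4f}")
print(f"Adam: {adam(0.0):.4f}")
```

RMSProp, AdaGrad, and AdaDelta sit between these two: like Adam, they divide the step by a running measure of gradient magnitude, but they differ in how that measure is accumulated and in whether momentum is used.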


How to initialize the parameters of the network:

- Random

- Xavier initialization

- He initialization
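Xavier (Glorot) and He initialization scale random weights by a layer's fan-in (number of inputs) and fan-out (number of outputs) so that activations neither vanish nor explode as they pass through many layers. Below is a minimal sketch using Python's `random` module, assuming a dense layer stored as a nested list of shape `fan_in x fan_out`. Xavier uses the uniform variant with limit sqrt(6 / (fan_in + fan_out)), and He uses a plain normal distribution with standard deviation sqrt(2 / fan_in); some libraries use a truncated normal for He initialization instead.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng=random):
    """Glorot/Xavier uniform: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]


def he_normal(fan_in, fan_out, rng=random):
    """He initialization for ReLU layers: N(0, std^2), std = sqrt(2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]


rng = random.Random(42)  # seeded for reproducibility
W_xavier = xavier_uniform(256, 128, rng)
W_he = he_normal(256, 128, rng)
```

The extra factor of 2 in He initialization compensates for ReLU zeroing out roughly half of its inputs, which is why it is usually paired with ReLU-family activations while Xavier suits sigmoid and tanh.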

As you can appreciate, the combinations of these decisions can lead to a huge number of potential neural network variants, and one of the challenges of developing these models is determining the right search space within each of these choices. In the course of describing the history of neural networks, we will discuss the implications of each of these model parameters in more detail. Our overview of this field begins with the origin of the discipline: the humble perceptron model.
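To put a rough number on "a huge number of potential neural network variants", the sketch below simply counts the combinations of the discrete options listed above using `itertools.product`. It ignores the number and size of layers, learning rates, and every other continuous hyperparameter, each of which multiplies the space further.

```python
from itertools import product

architectures = ["Perceptron", "MLP", "CNN", "RNN", "LSTM", "GRU"]
activations = ["Linear", "Sigmoid", "Tanh", "ReLU", "PReLU"]
optimizers = ["SGD", "RMSProp", "AdaGrad", "ADAM", "AdaDelta", "Hessian-free"]
initializations = ["Random", "Xavier", "He"]

# Every way of picking one option from each list.
variants = list(product(architectures, activations, optimizers, initializations))
print(len(variants))  # 6 * 5 * 6 * 3 = 540
```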


Summary

In this chapter, we have given an overview of what TensorFlow is and how it serves as an improvement over earlier frameworks for deep learning research.

We also explored setting up an IDE, VSCode, and the foundation of reproducible applications, Docker containers. To orchestrate and deploy Docker containers, we discussed the Kubernetes framework and how we can scale groups of containers using its API. Finally, I described Kubeflow, a machine learning framework built on Kubernetes that allows us to run end-to-end pipelines, distributed training, and parameter search, and to serve trained models. We then set up a Kubeflow deployment using Terraform, an Infrastructure as Code (IaC) tool.

Before jumping into specific projects, we will next cover the basics of neural network theory and the TensorFlow and Keras commands that you will need to write basic training jobs on Kubeflow.


Using Kubeflow Katib to optimize model hyperparameters

Katib is a framework for running multiple instances of the same job with differing inputs, such as in neural architecture search (for determining the right number and size of layers in a neural network) and hyperparameter search (for finding the right learning rate for an algorithm, for example). Like the other Kustomize templates we have seen, this Experiment wraps a generic TensorFlow job, with placeholders for the parameters:


    apiVersion: "kubeflow.org/v1alpha3"
    kind: Experiment
    metadata:
        namespace: kubeflow
        name: tfjob-example
    spec:
        parallelTrialCount: 3
        maxTrialCount: 12
        maxFailedTrialCount: 3
        objective:
            type: maximize
            goal: 0.99
            objectiveMetricName: accuracy_1
        algorithm:
            algorithmName: random
        metricsCollectorSpec:
            source:
                fileSystemPath:
                    path: /train
                    kind: Directory
            collector:
                kind: TensorFlowEvent
        # Search space: one entry per hyperparameter passed to the training script
        parameters:
            - name: --learning_rate
              parameterType: double
              feasibleSpace:
                  min: "0.01"
                  max: "0.05"
            - name: --batch_size
              parameterType: int
              feasibleSpace:
                  min: "100"
                  max: "200"
        # Template for each trial; Katib fills in .Trial, .NameSpace, and .HyperParameters
        trialTemplate:
            goTemplate:
                rawTemplate: |-
                    apiVersion: "kubeflow.org/v1"
                    kind: TFJob
                    metadata:
                        name: {{.Trial}}
                        namespace: {{.NameSpace}}
                    spec:
                        tfReplicaSpecs:
                            Worker:
                                replicas: 1
                                restartPolicy: OnFailure
                                template:
                                    spec:
                                        containers:
                                            - name: tensorflow
                                              image: gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0
                                              imagePullPolicy: Always
                                              command:
                                                  - "python"
                                                  - "/var/tf_mnist/mnist_with_summaries.py"
                                                  - "--log_dir=/train/metrics"
                                                  {{- with .HyperParameters}}
                                                  {{- range .}}
                                                  - "{{.Name}}={{.Value}}"
                                                  {{- end}}
                                                  {{- end}}

We can run this experiment using the familiar kubectl syntax:

    kubectl apply -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha3/tfjob-example.yaml


or through the UI, where you can see a visualization of the outcome of these multi-parameter experiments or view the results as a table.
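To make the Experiment above less opaque, the following sketch mimics what the `random` algorithm does with this configuration: it samples each parameter uniformly from its feasible space, renders the trial's command line with one `--name=value` argument per hyperparameter, and stops once a trial reaches the 0.99 goal or `maxTrialCount` trials have run. It is a plain-Python illustration, not Katib's implementation: trials run one at a time rather than three in parallel (`parallelTrialCount`), failed trials are not modeled, and `fake_accuracy` is a hypothetical stand-in for the `accuracy_1` metric the real training job would report.

```python
import random

# Feasible spaces copied from the Experiment's `parameters` section.
SEARCH_SPACE = {
    "--learning_rate": ("double", 0.01, 0.05),
    "--batch_size": ("int", 100, 200),
}

# Fixed part of the trial container's command.
BASE_COMMAND = [
    "python",
    "/var/tf_mnist/mnist_with_summaries.py",
    "--log_dir=/train/metrics",
]


def sample_trial(rng):
    """Random search: draw each hyperparameter uniformly from its feasible space."""
    params = {}
    for name, (kind, low, high) in SEARCH_SPACE.items():
        if kind == "double":
            params[name] = rng.uniform(low, high)
        else:
            params[name] = rng.randint(low, high)  # inclusive of both bounds
    return params


def render_command(params):
    """Append one --name=value argument per hyperparameter to the base command."""
    return BASE_COMMAND + [f"{name}={value}" for name, value in params.items()]


def run_experiment(objective, max_trials=12, goal=0.99, seed=0):
    """Run trials sequentially, track the best one, and stop early at the goal."""
    rng = random.Random(seed)
    best_score, best_params, history = float("-inf"), None, []
    for _ in range(max_trials):
        params = sample_trial(rng)
        command = render_command(params)  # what the trial's container would run
        score = objective(params)
        history.append((command, score))
        if score > best_score:
            best_score, best_params = score, params
        if score >= goal:
            break
    return best_score, best_params, history


def fake_accuracy(params):
    """Hypothetical stand-in for accuracy_1: peaks at learning_rate = 0.03."""
    return 1.0 - abs(params["--learning_rate"] - 0.03)


best_score, best_params, history = run_experiment(fake_accuracy)
print(f"{len(history)} trials, best accuracy {best_score:.4f} with {best_params}")
```

Random search is the simplest of Katib's algorithms; swapping in a smarter one (such as Bayesian optimization) changes only how `sample_trial` picks the next point, not the surrounding trial loop.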



Kubeflow pipelines

For notebook servers, we gave an example of a single-container application (the notebook instance). Kubeflow also gives us the ability to run multi-container application workflows (such as input data, training, and deployment) using the pipelines functionality. Pipelines are Python functions that follow a Domain-Specific Language (DSL) to specify components that will be compiled into containers.

If we click Pipelines in the UI, we are brought to a dashboard.

Selecting one of these pipelines, we can see a visual overview of its component containers.


After creating a new run, we can specify parameters for a particular instance of this pipeline.

Once the pipeline is created, we can use the user interface to visualize the results.

Under the hood, the Python code to generate this pipeline is compiled using the Pipelines SDK. We could specify the components to come either from a container with Python code:


    import kfp

    @kfp.dsl.component
    def my_component(my_param):
        ...
        return kfp.dsl.ContainerOp(
            name='My component name',
            image='gcr.io/path/to/container/image'
        )

or a function written in Python itself:

    @kfp.dsl.python_component(
        name='My awesome component',
        description='Come and play',
    )
    def my_python_func(a: str, b: str) -> str:
        ...


For a pure Python function, we could turn this into an operation with the compiler:

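    from kfp import compiler

    # OUTPUT_DIR (a GCS staging path) and TARGET_IMAGE (the image to build) are placeholders.
    my_op = compiler.build_python_component(
        component_func=my_python_func,
        staging_gcs_path=OUTPUT_DIR,
        target_image=TARGET_IMAGE)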


We then use the dsl.pipeline decorator to add this operation to a pipeline:

    from kfp.dsl import PipelineParam

    @kfp.dsl.pipeline(
        name='My pipeline',
        description='My machine learning pipeline'
    )
    def my_pipeline(param_1: PipelineParam, param_2: PipelineParam):
        my_step = my_op(a='a', b='b')


We compile it using the following code:

    kfp.compiler.Compiler().compile(my_pipeline, 'my-pipeline.zip')

and run it with this code:

    client = kfp.Client()
    my_experiment = client.create_experiment(name='demo')
    my_run = client.run_pipeline(my_experiment.id, 'my-pipeline', 'my-pipeline.zip')

We can also upload this ZIP file to the Pipelines UI, where Kubeflow uses the YAML generated during compilation to instantiate the job.

Now that you have seen the process for generating results for a single pipeline, our next problem is how to find the optimal parameters for such a pipeline. As you will see in Chapter 3, Building Blocks of Deep Neural Networks, neural network models typically have a number of parameters, such as their network architecture (number of layers, layer size, and connectivity) and training paradigm (such as learning rate and optimizer algorithm). Kubeflow has a built-in utility for optimizing models over such parameter grids, called Katib.

Kubeflow notebook servers

We can use Kubeflow to start a Jupyter notebook server in a namespace, where we can run experimental code; we can start the notebook by clicking the Notebook Server tab in the user interface and selecting NEW SERVER.

We can then specify parameters, such as which container to run (which could include the TensorFlow container we examined earlier in our discussion of Docker) and how many resources to allocate.


You can also specify a Persistent Volume (PV) to store data that remains even if the notebook server is turned off, and request special resources such as GPUs.

Once the server has started, if you have specified a container with TensorFlow resources, you can begin running models in the notebook server.

A brief tour of Kubeflow's components

Now that we have installed Kubeflow locally or in the cloud, let us take a look again at the Kubeflow dashboard.

Let's walk through what is available in this toolkit. First, notice that in the upper panel we have a dropdown with the name anonymous specified; this is the namespace for Kubernetes referred to earlier. While our default is anonymous, we could create several namespaces on our Kubeflow instance to accommodate different users or projects. This can be done at login, where we set up a profile.

Alternatively, as with other operations in Kubernetes, we can create a namespace by applying a Profile YAML file:

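    apiVersion: kubeflow.org/v1beta1
    kind: Profile
    metadata:
        name: profileName   # replace with your profile name; this becomes the user's namespace
    spec:
        owner:
            kind: User
            name: userid@email.com   # replace with the owner's e-mail address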

We then create the profile using the kubectl command:

    kubectl create -f profile.yaml
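If you need namespaces for several users or projects, writing one Profile file per owner by hand gets repetitive. Here is a small sketch in Python, assuming you are happy to generate the manifests as JSON (which `kubectl create -f` accepts as well as YAML). `make_profile` and `write_profiles` are hypothetical helpers, and the profile name and e-mail address are placeholders.

```python
import json
from pathlib import Path


def make_profile(name, owner_email):
    """Build a Profile manifest with the same fields as the YAML above."""
    return {
        "apiVersion": "kubeflow.org/v1beta1",
        "kind": "Profile",
        "metadata": {"name": name},  # becomes the user's namespace
        "spec": {"owner": {"kind": "User", "name": owner_email}},
    }


def write_profiles(owners, out_dir="."):
    """Write one <name>.json file per profile and return the file paths."""
    paths = []
    for name, email in owners.items():
        path = Path(out_dir) / f"{name}.json"
        path.write_text(json.dumps(make_profile(name, email), indent=2))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Placeholder name and address; profile names must be lowercase DNS-style names.
    print(json.dumps(make_profile("team-a", "alice@example.com"), indent=2))
```

Each generated file can then be created with, for example, `kubectl create -f team-a.json`.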

What can we do once we have a namespace? Let us look through the available tools.

Installing Kubeflow using Terraform

For each of these cloud providers, you'll probably notice that we have a common set of commands: creating a Kubernetes cluster, installing Kubeflow, and starting the application. While we can use scripts to automate this process, it would be desirable, as with our code, to have a way to version control and persist different infrastructure configurations, allowing a reproducible recipe for creating the set of resources we need to run Kubeflow. It would also help us potentially move between cloud providers without completely rewriting our installation logic.

Terraform (https://www.terraform.io/) was created by HashiCorp as an Infrastructure as Code (IaC) tool. In the same way that Kubernetes has an API to update resources on a cluster, Terraform allows us to abstract interactions with different underlying cloud providers through a template language, a command-line utility, and core components written in Go (Figure 2.7). Terraform can be extended using user-written plugins.

Figure 2.7: Terraform architecture. Terraform Core communicates over RPC with plugins (providers and provisioners), which use a client library to call upstream APIs.

Let's look at one example of installing Kubeflow using the Terraform instructions for AWS located at https://github.com/aws-samples/amazon-eks-machine-learning-with-terraform-and-kubeflow. Once you have established the required AWS resources and installed Terraform on an EC2 instance, the aws-eks-cluster-and-nodegroup.tf Terraform file is used to create the Kubeflow cluster using the command:

    terraform apply

This file contains a few key components. One is a set of variables that specify aspects of the deployment:

    variable "efs_throughput_mode" {
        description = "EFS throughput mode"
        default     = "bursting"
        type        = string
    }

Another is the specification of which cloud provider we are using:

    provider "aws" {
        region                  = var.region
        shared_credentials_file = var.credentials
        profile                 = var.profile
    }

A third is the set of resources to create, such as the EKS cluster:

    resource "aws_eks_cluster" "eks_cluster" {
        name     = var.cluster_name
        role_arn = aws_iam_role.cluster_role.arn
        version  = var.k8s_version

        vpc_config {
            security_group_ids = [aws_security_group.cluster_sg.id]
            subnet_ids         = flatten([aws_subnet.subnet.*.id])
        }

        depends_on = [
            aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy,
            aws_iam_role_policy_attachment.cluster_AmazonEKSServicePolicy,
        ]

        # After creation, point the local kubeconfig at the new cluster
        # (provisioners refer to their own resource through `self`).
        provisioner "local-exec" {
            command = "aws --region ${var.region} eks update-kubeconfig --name ${self.name}"
        }

        # On destroy, clear the kubectl context that pointed at this cluster.
        provisioner "local-exec" {
            when    = destroy
            command = "kubectl config unset current-context"
        }
    }

Every time we run the terraform apply command, it walks through this file to determine which resources to create, which underlying AWS services to call to create them, and with which configuration they should be provisioned. This provides a clean way to orchestrate complex installations such as Kubeflow in a versioned, extensible template language.
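Part of that walk is ordering: Terraform builds a dependency graph from explicit `depends_on` lists and from references such as `aws_iam_role.cluster_role.arn`, then creates independent resources in parallel and dependent ones afterwards. The sketch below reproduces only that ordering idea with Python's `graphlib` module (Python 3.9+). The graph is a simplified, hypothetical one modeled on the EKS resource above (for example, we assume each policy attachment depends only on the cluster role); it is not Terraform's own output.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified graph: resource -> resources it must wait for.
DEPENDENCIES = {
    "aws_iam_role.cluster_role": set(),
    "aws_security_group.cluster_sg": set(),
    "aws_subnet.subnet": set(),
    "aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy": {
        "aws_iam_role.cluster_role",
    },
    "aws_iam_role_policy_attachment.cluster_AmazonEKSServicePolicy": {
        "aws_iam_role.cluster_role",
    },
    "aws_eks_cluster.eks_cluster": {
        "aws_iam_role.cluster_role",      # role_arn reference
        "aws_security_group.cluster_sg",  # vpc_config reference
        "aws_subnet.subnet",              # vpc_config reference
        "aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy",  # depends_on
        "aws_iam_role_policy_attachment.cluster_AmazonEKSServicePolicy",  # depends_on
    },
}


def creation_waves(dependencies):
    """Group resources into waves; everything in one wave could be created in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for i, wave in enumerate(creation_waves(DEPENDENCIES), start=1):
    print(f"wave {i}: {wave}")
```

When the infrastructure is torn down with terraform destroy, Terraform walks the same kind of graph in reverse, removing the cluster before the role and network resources it depends on.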

Now that we have successfully installed Kubeflow either locally or on a managed Kubernetes control plane in the cloud, let us take a look at what tools are available on the platform.