
Saturday, February 26, 2022

From tissues to TLUs

 The recent popularity of AI algorithms might give the false impression that this field is new. Many recent models are based on discoveries made decades ago that have been reinvigorated by the massive computational resources available in the cloud and by customized hardware for parallel matrix computations, such as Graphics Processing Units (GPUs), Tensor Processing Units (TPUs), and Field Programmable Gate Arrays (FPGAs). If we consider research on neural networks to include their biological inspiration as well as computational theory, this field is over a hundred years old. Indeed, one of the first neural networks described appears in the detailed anatomical illustrations of the 19th-century scientist Santiago Ramón y Cajal, whose illustrations, based on experimental observation of layers of interconnected neuronal cells, inspired the Neuron Doctrine: the idea that the brain is composed of individual, physically distinct and specialized cells, rather than a single continuous network. The distinct layers of the retina observed by Cajal were also the inspiration for particular neural network architectures such as the CNN, which we will discuss later in this chapter.

This observation of simple neuronal cells interconnected in large networks led computational researchers to hypothesize how mental activity might be represented by simple, logical operations that, combined, yield complex mental phenomena. The original "automata theory" is usually traced to a 1943 article by Warren McCulloch and Walter Pitts, who both later worked at the Massachusetts Institute of Technology. They described a simple model known as the Threshold Logic Unit (TLU), in which binary inputs are translated into a binary output based on a threshold:

y = f(sum_i W_i * I_i)

f(x) = 1 if x >= T, else 0

where I is the vector of input values, W is the vector of weights, with values in the range (0, 1) or (-1, 1), and f is a threshold function that converts the weighted sum of the inputs into a binary output depending upon whether it reaches a threshold T.

Visually and conceptually, there is some similarity between the McCulloch and Pitts model and the biological neuron that inspired it. Their model integrates inputs into an output signal, just as the natural dendrites (the short, input "arms" of the neuron that receive signals from other cells) of a neuron synthesize inputs into a single output via the axon (the long "tail" of the cell, which passes signals received from the dendrites along to other neurons). We might imagine that, just as neuronal cells are composed into networks to yield complex biological circuits, these simple units might be connected to simulate sophisticated decision processes.

Indeed, using this simple model, we can already start to represent several logical operations. If we consider a simple case of a neuron with one input, we can see that a TLU can solve an identity or negation function.

For an identity operation that simply returns the input as output, the weight matrix would have 1s on the diagonal (or be simply the scalar 1 for a single numerical input, as illustrated in Table 1):


Similarly, for a negation operation, the weight matrix could be a negative identity matrix, with a threshold at 0 flipping the sign of the output from the input:


Given two inputs, a TLU could also represent operations such as AND and OR.

Here, a threshold could be set such that the combined input values have to reach either 2 (to yield an output of 1 only when both inputs are 1) for an AND operation, or 1 (to yield an output of 1 if either of the two inputs is 1) for an OR operation.
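The one- and two-input cases above can be sketched in a few lines of Python. This is a minimal illustration rather than the original McCulloch and Pitts formulation: the function name tlu, the threshold of 1 for the identity unit, and the convention that the unit fires when the weighted sum is greater than or equal to T are assumptions, chosen so that the thresholds quoted in the text (0 for negation, 2 for AND, 1 for OR) behave as described.

```python
from typing import Sequence


def tlu(inputs: Sequence[float], weights: Sequence[float], threshold: float) -> int:
    """Threshold logic unit: output 1 when the weighted sum reaches the threshold."""
    weighted_sum = sum(w * i for w, i in zip(weights, inputs))
    # Assumption: ">=" is used so that T=0 (negation) and T=2 (AND) work as in the text.
    return 1 if weighted_sum >= threshold else 0


# One-input units, evaluated on the inputs 0 and 1
identity = [tlu([x], weights=[1], threshold=1) for x in (0, 1)]
negation = [tlu([x], weights=[-1], threshold=0) for x in (0, 1)]

# Two-input units with both weights set to 1
pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_out = [tlu(p, weights=[1, 1], threshold=2) for p in pairs]
or_out = [tlu(p, weights=[1, 1], threshold=1) for p in pairs]

print("identity:", identity)  # [0, 1]
print("negation:", negation)  # [1, 0]
print("AND:", and_out)        # [0, 0, 0, 1]
print("OR:", or_out)          # [0, 1, 1, 1]
```

Only the weights and the threshold change between the four operations; the unit itself stays the same.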

However, a TLU cannot capture patterns such as Exclusive OR (XOR), which emits 1 if and only if exactly one of the two inputs is 1.


To see why this is true, consider a TLU with two inputs and positive weights of 1 for each unit. If the threshold value T is 1, then inputs of (0,0), (1,0), and (0,1) will yield the correct value. What happens with (1,1) though? Because the threshold function returns 1 for any inputs summing to 1 or more, it cannot represent XOR (Table 3.5), which would require a second threshold to compute a different output once a different, higher value is exceeded. Changing one or both of the weights to negative values won't help either; the problem is that the decision threshold operates only in one direction and can't be reversed for larger inputs.

Similarly, the TLU can't represent the negation of XOR, the Exclusive NOR (XNOR). As with the XOR operation, the impossibility of the XNOR operation being represented by a TLU can be illustrated by considering a weight matrix of two 1s; for the inputs (1,0), (0,1), and (1,1), we obtain the correct value if we set a threshold of 2 for outputting 1. However, we run into a problem with an input of (0,0), as we can't set a second threshold to output 1 at a sum of 0.
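The argument above can also be checked numerically. The following sketch searches a small, hypothetical grid of weights and thresholds (values from -2 to 2 in steps of 0.5, an arbitrary choice for illustration) for a single two-input TLU that reproduces each truth table. It again assumes the unit fires when the weighted sum is greater than or equal to T. A grid search is not a proof, but it agrees with the reasoning in the text: settings exist for AND and OR, while none are found for XOR or XNOR.

```python
from itertools import product

PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]
TARGETS = {
    "AND": [0, 0, 0, 1],
    "OR": [0, 1, 1, 1],
    "XOR": [0, 1, 1, 0],
    "XNOR": [1, 0, 0, 1],
}

# Hypothetical search grid: -2.0, -1.5, ..., 1.5, 2.0
GRID = [step / 2 for step in range(-4, 5)]


def tlu(inputs, weights, threshold):
    """Output 1 when the weighted sum reaches the threshold (assumed '>=')."""
    return 1 if sum(w * i for w, i in zip(weights, inputs)) >= threshold else 0


def find_tlu(target):
    """Return the first (w1, w2, T) on the grid that reproduces target, or None."""
    for w1, w2, t in product(GRID, repeat=3):
        if [tlu(p, (w1, w2), t) for p in PAIRS] == target:
            return (w1, w2, t)
    return None


solutions = {name: find_tlu(target) for name, target in TARGETS.items()}
for name, solution in solutions.items():
    print(name, "->", solution)
```

For XOR and XNOR the search returns None for every combination on the grid, reflecting the single, one-directional threshold described above.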






Perceptrons - a brain in a function

 The simplest neural network architecture - the perceptron - was inspired by biological research to understand the basis of mental processing in an attempt to represent the function of the brain with mathematical formulae. In this section we will cover some of this early research and how it inspired what is now the field of deep learning and generative AI.


3. Building Blocks of Deep Neural Networks

 The wide range of generative AI models that we will implement in this book are all built on the foundation of advances over the last decade in deep learning and neural networks. While in practice we could implement these projects without reference to historical developments, retracing their underlying components will give you a richer understanding of how and why these models work. In this chapter, we will dive into this background, showing you how generative AI models are built from the ground up, how smaller units are assembled into complex architectures, how the loss functions in these models are optimized, and some current theories as to why these models are so effective. Armed with this background knowledge, you should be able to understand in greater depth the reasoning behind the more advanced models and topics that start in Chapter 4, Teaching Networks to Generate Digits, of this book. Generally speaking, we can group the building blocks of neural network models into a number of choices regarding how the model is constructed and trained, which we will cover in this chapter:


Which neural network architecture to use:

- Perceptron

- Multilayer perceptron (MLP)/feedforward

- Convolutional Neural Networks (CNNs)

- Recurrent Neural Networks (RNNs)

- Long Short-Term Memory Networks (LSTMs)

- Gated Recurrent Units (GRUs)


Which activation functions to use in the network:

- Linear

- Sigmoid

- Tanh

- ReLU

- PReLU


What optimization algorithm to use to tune the parameters of the network:

- Stochastic Gradient Descent (SGD)

- RMSProp

- AdaGrad

- Adam

- AdaDelta

- Hessian-free optimization


How to initialize the parameters of the network:

- Random

- Xavier initialization

- He initialization

As you can appreciate, the products of these decisions can lead to a huge number of potential neural network variants, and one of the challenges of developing these models is determining the right search space within each of these choices. In the course of describing the history of neural networks we will discuss the implications of each of these model parameters in more detail. Our overview of this field begins with the origin of the discipline: the humble perceptron model.
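To make two of these choices more concrete, here is a minimal sketch of the listed activation functions and of the scale factors commonly associated with Xavier and He initialization, written in plain Python. The function names are illustrative, the PReLU slope alpha is fixed at 0.25 here even though it is normally a learned parameter, and the initialization formulas shown are the commonly cited uniform (Xavier) and normal (He) variants, which is an assumption about which variants are meant.

```python
import math


def linear(x: float) -> float:
    return x


def sigmoid(x: float) -> float:
    # Split by sign so that math.exp never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh(x: float) -> float:
    return math.tanh(x)


def relu(x: float) -> float:
    return max(0.0, x)


def prelu(x: float, alpha: float = 0.25) -> float:
    # alpha is learned during training in practice; it is a fixed assumption here.
    return x if x > 0 else alpha * x


def xavier_uniform_limit(fan_in: int, fan_out: int) -> float:
    """Weights are drawn uniformly from [-limit, limit]."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def he_normal_std(fan_in: int) -> float:
    """Weights are drawn from a normal distribution with this standard deviation."""
    return math.sqrt(2.0 / fan_in)


for name, fn in [("linear", linear), ("sigmoid", sigmoid), ("tanh", tanh),
                 ("relu", relu), ("prelu", prelu)]:
    print(name, [round(fn(x), 3) for x in (-2.0, 0.0, 2.0)])
```

In a framework such as Keras these would normally be selected by name rather than written by hand; the sketch only shows what each option computes.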


Summary

 In this chapter, we have covered an overview of what TensorFlow is and how it serves as an improvement over earlier frameworks for deep learning research.

We also explored setting up an IDE, VSCode, and the foundation of reproducible applications, Docker containers. To orchestrate and deploy Docker containers, we discussed the Kubernetes framework, and how we can scale groups of containers using its API. Finally, we described Kubeflow, a machine learning framework built on Kubernetes which allows us to run end-to-end pipelines, distributed training, and parameter search, and to serve trained models. We then set up a Kubeflow deployment using Terraform, an infrastructure-as-code (IaC) technology.

Before jumping into specific projects, we will next cover the basics of neural network theory and the TensorFlow and Keras commands that you will need to write basic training jobs on Kubeflow.


Using Kubeflow Katib to optimize model hyperparameters

 Katib is a framework for running multiple instances of the same job with differing inputs, such as in neural architecture search (for determining the right number and size of layers in a neural network) and hyperparameter search (finding the right learning rate, for example, for an algorithm). Like the other Kustomize templates we have seen, the TensorFlow job specifies a generic TensorFlow job, with placeholders for the parameters:


    apiVersion: "kubeflow.org/v1alpha3"
    kind: Experiment
    metadata:
        namespace: kubeflow
        name: tfjob-example
    spec:
        # Run up to 12 trials, 3 at a time, and give up after 3 failures
        parallelTrialCount: 3
        maxTrialCount: 12
        maxFailedTrialCount: 3
        objective:
            type: maximize
            goal: 0.99
            objectiveMetricName: accuracy_1
        algorithm:
            algorithmName: random
        # Read the objective metric from TensorFlow event files under /train
        metricsCollectorSpec:
            source:
                fileSystemPath:
                    path: /train
                    kind: Directory
            collector:
                kind: TensorFlowEvent
        # Search space: each trial receives one value per parameter
        parameters:
            - name: --learning_rate
              parameterType: double
              feasibleSpace:
                  min: "0.01"
                  max: "0.05"
            - name: --batch_size
              parameterType: int
              feasibleSpace:
                  min: "100"
                  max: "200"
        # Template for the TFJob launched by each trial
        trialTemplate:
            goTemplate:
                rawTemplate: |-
                    apiVersion: "kubeflow.org/v1"
                    kind: TFJob
                    metadata:
                        name: {{.Trial}}
                        namespace: {{.NameSpace}}
                    spec:
                        tfReplicaSpecs:
                            Worker:
                                replicas: 1
                                restartPolicy: OnFailure
                                template:
                                    spec:
                                        containers:
                                            - name: tensorflow
                                              image: gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0
                                              imagePullPolicy: Always
                                              command:
                                                  - "python"
                                                  - "/var/tf_mnist/mnist_with_summaries.py"
                                                  - "--log_dir=/train/metrics"
                                                  {{- with .HyperParameters}}
                                                  {{- range .}}
                                                  - "{{.Name}}={{.Value}}"
                                                  {{- end}}
                                                  {{- end}}

We can run this experiment using the familiar kubectl syntax:

    kubectl apply -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha3/tfjob-example.yaml


or through the UI, where you can see either a visualization of the outcome of these multi-parameter experiments or a table.
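Conceptually, an experiment like the one above with the random algorithm draws each trial's parameters independently from the feasible ranges and keeps the best result. The sketch below imitates that loop locally in plain Python. It is not how Katib is implemented: it runs trials sequentially (ignoring the parallel and failed trial counts), and toy_accuracy is a made-up stand-in for a real training job, included only so the loop has something to optimize.

```python
import random

MAX_TRIAL_COUNT = 12  # mirrors maxTrialCount in the experiment
GOAL = 0.99           # mirrors the objective goal


def sample_parameters(rng: random.Random) -> dict:
    """Draw one random point from the two feasible spaces in the experiment."""
    return {
        "--learning_rate": rng.uniform(0.01, 0.05),  # parameterType: double
        "--batch_size": rng.randint(100, 200),       # parameterType: int
    }


def toy_accuracy(params: dict) -> float:
    """Hypothetical stand-in for a training job; NOT a real MNIST result."""
    lr_penalty = 10 * abs(params["--learning_rate"] - 0.03)
    batch_penalty = abs(params["--batch_size"] - 150) / 1000
    return 1.0 - lr_penalty - batch_penalty


def random_search(seed: int = 0) -> tuple:
    """Run trials one by one until the goal or the trial budget is reached."""
    rng = random.Random(seed)
    trials = []
    for _ in range(MAX_TRIAL_COUNT):
        params = sample_parameters(rng)
        trials.append((toy_accuracy(params), params))
        if trials[-1][0] >= GOAL:
            break
    best_accuracy, best_params = max(trials, key=lambda trial: trial[0])
    return best_accuracy, best_params, trials


best_accuracy, best_params, trials = random_search()
print(f"best accuracy {best_accuracy:.3f} after {len(trials)} trials: {best_params}")
```

In the real experiment, each sampled parameter set is substituted into the trial template and launched as its own job, and the accuracy is read back by the metrics collector.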



Kubeflow pipelines

 For notebook servers, we gave an example of a single container (the notebook instance) application. Kubeflow also gives us the ability to run multi-container application workflows (such as input data, training, and deployment) using the pipelines functionality. Pipelines are Python functions that follow a Domain Specific Language (DSL) to specify components that will be compiled into containers.

If we click Pipelines in the UI, we are brought to a dashboard.

Selecting one of these pipelines, we can see a visual overview of the component containers.


After creating a new run, we can specify parameters for a particular instance of this pipeline.

Once the pipeline is created, we can use the user interface to visualize the results.

Under the hood, the Python code to generate this pipeline is compiled using the pipelines SDK. We could specify the components to come either from a container with Python code:


    import kfp  # Kubeflow Pipelines SDK

    @kfp.dsl.component
    def my_component(my_param):
        ...
        # Wrap an existing container image as a pipeline step
        return kfp.dsl.ContainerOp(
            name='My component name',
            image='gcr.io/path/to/container/image'
        )

or a function written in Python itself:

    @kfp.dsl.python_component(
        name='My awesome component',
        description='Come and play',
    )
    def my_python_func(a: str, b: str) -> str:
        ...

For a pure Python function, we could turn this into an operation with the compiler:

    # OUTPUT_DIR and TARGET_IMAGE are assumed to be defined elsewhere
    my_op = compiler.build_python_component(
        component_func=my_python_func,
        staging_gcs_path=OUTPUT_DIR,
        target_image=TARGET_IMAGE)


We then use the dsl.pipeline decorator to add this operation to a pipeline:

    @kfp.dsl.pipeline(
        name='My pipeline',
        description='My machine learning pipeline'
    )
    def my_pipeline(param_1: PipelineParam, param_2: PipelineParam):
        my_step = my_op(a='a', b='b')


We compile it using the following code:

    kfp.compiler.Compiler().compile(my_pipeline, 'my-pipeline.zip')

and run it with this code:

    client = kfp.Client()
    my_experiment = client.create_experiment(name='demo')
    my_run = client.run_pipeline(my_experiment.id, 'my-pipeline', 'my-pipeline.zip')

We can also upload this ZIP file to the pipelines UI, where Kubeflow can use the YAML generated by compilation to instantiate the job.

Now that you have seen the process for generating results for a single pipeline, our next problem is how to generate the optimal parameters for such a pipeline. As you will see in Chapter 3, Building Blocks of Deep Neural Networks, neural network models typically have a number of configuration choices governing both their architecture (such as number of layers, layer size, and connectivity) and training paradigm (such as learning rate and optimizer algorithm). Kubeflow has a built-in utility for optimizing models over such parameter grids, called Katib.

Kubeflow notebook servers

 We can use Kubeflow to start a Jupyter notebook server in a namespace, where we can run experimental code; we can start the notebook by clicking the Notebook Server tab in the user interface and selecting NEW SERVER.

We can then specify parameters, such as which container to run (which could include the TensorFlow container we examined earlier in our discussion of Docker), and how many resources to allocate.


You can also specify a Persistent Volume (PV) to store data that remains even if the notebook server is turned off, and special resources such as GPUs.

Once started, if you have specified a container with TensorFlow resources, you can begin running models in the notebook server.