The simplest neural network architecture, the perceptron, was inspired by biological research into the basis of mental processing, which attempted to represent the function of the brain with mathematical formulae. In this section we will cover some of this early research and how it inspired what is now the field of deep learning and generative AI.
Saturday, February 26, 2022
3. Building Blocks of Deep Neural Networks
The wide range of generative AI models that we will implement in this book are all built on the foundation of advances over the last decade in deep learning and neural networks. While in practice we could implement these projects without reference to historical developments, retracing their underlying components will give you a richer understanding of how and why these models work. In this chapter, we will dive into this background, showing you how generative AI models are built from the ground up, how smaller units are assembled into complex architectures, how the loss functions in these models are optimized, and some current theories as to why these models are so effective. Armed with this background knowledge, you should be able to understand in greater depth the reasoning behind the more advanced models and topics that start in Chapter 4, Teaching Networks to Generate Digits, of this book. Generally speaking, we can group the building blocks of neural network models into a number of choices regarding how the model is constructed and trained, which we will cover in this chapter:
Which neural network architecture to use:
- Perceptron
- Multilayer perceptron (MLP)/feedforward network
- Convolutional Neural Networks (CNNs)
- Recurrent Neural Networks (RNNs)
- Long Short-Term Memory Networks (LSTMs)
- Gated Recurrent Units (GRUs)
Which activation functions to use in the network:
- Linear
- Sigmoid
- Tanh
- ReLU
- PReLU
What optimization algorithm to use to tune the parameters of the network:
- Stochastic Gradient Descent (SGD)
- RMSProp
- AdaGrad
- Adam
- AdaDelta
- Hessian-free optimization
How to initialize the parameters of the network:
- Random
- Xavier initialization
- He initialization
As you can appreciate, the products of these decisions can lead to a huge number of potential neural network variants, and one of the challenges of developing these models is determining the right search space within each of these choices. In the course of describing the history of neural networks, we will discuss the implications of each of these model parameters in more detail. Our overview of this field begins with the origin of the discipline: the humble perceptron model.
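To get a feel for how quickly these choices multiply, the following is a minimal sketch that enumerates one option per category using only the Python standard library. The option names simply mirror the lists above; the dictionary layout and the function name enumerate_variants are illustrative assumptions, and the count ignores everything else that varies in practice (layer counts, layer sizes, learning rates, and so on).

```python
from itertools import product

# Option lists mirror the bullets in the overview above.
design_choices = {
    "architecture": ["Perceptron", "MLP", "CNN", "RNN", "LSTM", "GRU"],
    "activation": ["Linear", "Sigmoid", "Tanh", "ReLU", "PReLU"],
    "optimizer": ["SGD", "RMSProp", "AdaGrad", "Adam", "AdaDelta", "Hessian-free"],
    "initializer": ["Random", "Xavier", "He"],
}


def enumerate_variants(choices):
    """Return every combination of one option per design choice."""
    names = list(choices)
    return [dict(zip(names, combo)) for combo in product(*choices.values())]


variants = enumerate_variants(design_choices)
print(len(variants))  # 6 * 5 * 6 * 3 = 540
print(variants[0])
```

Even this coarse grid already yields 540 combinations before any numeric hyperparameter is considered, which is why narrowing the search space matters.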
Summary
In this chapter, we have covered an overview of what TensorFlow is and how it serves as an improvement over earlier frameworks for deep learning research.
We also explored setting up an IDE, VSCode, and the foundation of reproducible applications, Docker containers. To orchestrate and deploy Docker containers, we discussed the Kubernetes framework, and how we can scale groups of containers using its API. Finally, we described Kubeflow, a machine learning framework built on Kubernetes that allows us to run end-to-end pipelines, distributed training, and parameter search, and to serve trained models. We then set up a Kubeflow deployment using Terraform, an infrastructure-as-code tool.
Before jumping into specific projects, we will next cover the basics of neural network theory and the TensorFlow and Keras commands that you will need to write basic training jobs on Kubeflow.
Using Kubeflow Katib to optimize model hyperparameters
Katib is a framework for running multiple instances of the same job with differing inputs, such as in neural architecture search (for determining the right number and size of layers in a neural network) and hyperparameter search (finding the right learning rate, for example, for an algorithm). Like the other Kustomize templates we have seen, the experiment manifest wraps a generic TensorFlow job, with placeholders for the parameters:
apiVersion: "kubeflow.org/v1alpha3"
kind: Experiment
metadata:
  namespace: kubeflow
  name: tfjob-example
spec:
  # Run up to 12 trials, 3 at a time; give up after 3 failed trials
  parallelTrialCount: 3
  maxTrialCount: 12
  maxFailedTrialCount: 3
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: accuracy_1
  algorithm:
    algorithmName: random
  metricsCollectorSpec:
    source:
      fileSystemPath:
        path: /train
        kind: Directory
    collector:
      kind: TensorFlowEvent
  # Hyperparameters to search, with their allowed ranges
  parameters:
    - name: --learning_rate
      parameterType: double
      feasibleSpace:
        min: "0.01"
        max: "0.05"
    - name: --batch_size
      parameterType: int
      feasibleSpace:
        min: "100"
        max: "200"
  trialTemplate:
    goTemplate:
      rawTemplate: |-
        apiVersion: "kubeflow.org/v1"
        kind: TFJob
        metadata:
          name: {{.Trial}}
          namespace: {{.NameSpace}}
        spec:
          # TFJob v1 groups replica settings by role under tfReplicaSpecs
          tfReplicaSpecs:
            Worker:
              replicas: 1
              restartPolicy: OnFailure
              template:
                spec:
                  containers:
                    - name: tensorflow
                      image: gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0
                      imagePullPolicy: Always
                      command:
                        - "python"
                        - "/var/tf_mnist/mnist_with_summaries.py"
                        - "--log_dir=/train/metrics"
                        # Each sampled hyperparameter becomes a --name=value flag
                        {{- with .HyperParameters}}
                        {{- range .}}
                        - "{{.Name}}={{.Value}}"
                        {{- end}}
                        {{- end}}
We can run this experiment using the familiar kubectl syntax:
kubectl apply -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha3/tfjob-example.yaml
Alternatively, we can launch the experiment through the UI, where you can see the outcome of these multi-parameter experiments as a visualization or as a table.
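To make the idea of a random search concrete, here is a minimal pure-Python sketch of the loop that the experiment manifest describes: sample each parameter from its feasible range, score the trial, and keep the best result. This is not how Katib is implemented. The function toy_accuracy is a hypothetical stand-in for a real training job, the names are illustrative, and trials run one after another here, whereas the manifest asks Katib to run three in parallel.

```python
import random

# Search space and limits copied from the Experiment manifest.
LEARNING_RATE_RANGE = (0.01, 0.05)
BATCH_SIZE_RANGE = (100, 200)
MAX_TRIAL_COUNT = 12
GOAL = 0.99


def toy_accuracy(learning_rate, batch_size):
    """Hypothetical stand-in for a training job; NOT a real model."""
    # Score peaks at learning_rate=0.03 and batch_size=150, purely for illustration.
    lr_penalty = abs(learning_rate - 0.03) / 0.02
    bs_penalty = abs(batch_size - 150) / 50
    return 1.0 - 0.1 * lr_penalty - 0.1 * bs_penalty


def random_search(objective, seed=0):
    """Sample parameters at random, stopping early if the goal is reached."""
    rng = random.Random(seed)
    trials = []
    for _ in range(MAX_TRIAL_COUNT):
        params = {
            "learning_rate": rng.uniform(*LEARNING_RATE_RANGE),
            "batch_size": rng.randint(*BATCH_SIZE_RANGE),
        }
        score = objective(**params)
        trials.append((score, params))
        if score >= GOAL:
            break
    best = max(trials, key=lambda trial: trial[0])
    return best, trials


best, trials = random_search(toy_accuracy)
print(f"ran {len(trials)} trials; best score {best[0]:.3f} with {best[1]}")
```

Random search is the simplest of the available strategies; the same loop structure applies to others, which differ only in how the next set of parameters is chosen.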
Kubeflow pipelines
For notebook servers, we gave an example of a single-container application (the notebook instance). Kubeflow also gives us the ability to run multi-container application workflows (such as input data, training, and deployment) using the pipelines functionality. Pipelines are Python functions that follow a Domain-Specific Language (DSL) to specify components that will be compiled into containers.
If we click Pipelines in the UI, we are brought to a dashboard.
Selecting one of these pipelines, we can see a visual overview of the component containers.
After creating a new run, we can specify parameters for a particular instance of this pipeline.
Once the pipeline is created, we can use the user interface to visualize the results.
Under the hood, the Python code to generate this pipeline is compiled using the pipelines SDK. We could specify the components to come either from a container with Python code:
import kfp

@kfp.dsl.component
def my_component(my_param):
    ...
    # Wrap an existing container image as a pipeline step
    return kfp.dsl.ContainerOp(
        name='My component name',
        image='gcr.io/path/to/container/image'
    )
or a function written in Python itself:
@kfp.dsl.python_component(
    name='My awesome component',
    description='Come and play',
)
def my_python_func(a: str, b: str) -> str:
    ...
For a pure Python function, we could turn this into an operation with the compiler:
my_op = kfp.compiler.build_python_component(
    component_func=my_python_func,
    staging_gcs_path=OUTPUT_DIR,
    target_image=TARGET_IMAGE)
We then use the dsl.pipeline decorator to add this operation to a pipeline:
@kfp.dsl.pipeline(
    name='My pipeline',
    description='My machine learning pipeline'
)
def my_pipeline(param_1: PipelineParam, param_2: PipelineParam):
    my_step = my_op(a='a', b='b')
We compile it using the following code:
kfp.compiler.Compiler().compile(my_pipeline, 'my-pipeline.zip')
and run it with this code:
client = kfp.Client()
my_experiment = client.create_experiment(name='demo')
my_run = client.run_pipeline(my_experiment.id, 'my-pipeline', 'my-pipeline.zip')
We can also upload this ZIP file to the pipelines UI, where Kubeflow can use the YAML generated during compilation to instantiate the job.
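If you want to look inside a compiled package before uploading it, a short standard-library sketch such as the following lists the files it contains. To keep it runnable without the pipelines SDK installed, it builds a stand-in archive in memory; the member name pipeline.yaml and its one-line contents are assumptions for illustration, not real compiler output, and list_package_files is a hypothetical helper name.

```python
import io
import zipfile


def list_package_files(package):
    """Return the file names stored in a compiled pipeline archive."""
    with zipfile.ZipFile(package) as archive:
        return archive.namelist()


# Stand-in archive built in memory so the sketch runs without kfp installed.
# The member name and contents are placeholders, not real compiler output.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("pipeline.yaml", "kind: Workflow\n")
buffer.seek(0)

print(list_package_files(buffer))
```

With a real package, you would pass the path to the compiled file (for example 'my-pipeline.zip') instead of the in-memory buffer.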
Now that you have seen the process for generating results for a single pipeline, our next problem is how to generate the optimal parameters for such a pipeline. As you will see in Chapter 3, Building Blocks of Deep Neural Networks, neural network models typically have a number of configurations, known as hyperparameters, which govern their architecture (such as number of layers, layer size, and connectivity) and training paradigm (such as learning rate and optimizer algorithm). Kubeflow has a built-in utility for optimizing models over such parameter grids, called Katib.
Kubeflow notebook servers
We can use Kubeflow to start a Jupyter notebook server in a namespace, where we can run experimental code; we can start the notebook by clicking the Notebook Server tab in the user interface and selecting NEW SERVER.
We can then specify parameters, such as which container to run (which could include the TensorFlow container we examined earlier in our discussion of Docker), and how many resources to allocate.
You can also specify a Persistent Volume (PV) to store data that remains even if the notebook server is turned off, and special resources such as GPUs.
Once started, if you have specified a container with TensorFlow resources, you can begin running models in the notebook server.
A brief tour of Kubeflow's components
Now that we have installed Kubeflow locally or in the cloud, let us take a look again at the Kubeflow dashboard.
Let's walk through what is available in this toolkit. First, notice in the upper panel we have a dropdown with the name anonymous specified; this is the namespace for Kubernetes referred to earlier. While our default is anonymous, we could create several namespaces on our Kubeflow instance to accommodate different users or projects. This can be done at login, where we set up a profile.
Alternatively, as with other operations in Kubernetes, we can apply a namespace using a YAML file:
apiVersion: kubeflow.org/v1beta1
kind: Profile
metadata:
  name: profileName
spec:
  owner:
    kind: User
    name: userid@email.com
Using the kubectl command:
kubectl create -f profile.yaml
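When several users or projects each need their own namespace, the manifests can be generated rather than written by hand. The following is a minimal sketch using only the standard library; it emits JSON, which kubectl accepts in place of YAML. The helper name make_profile, the profile names, and the email addresses are all hypothetical and exist only for illustration.

```python
import json


def make_profile(profile_name, owner_email):
    """Build a Profile manifest with the same fields as the YAML example."""
    return {
        "apiVersion": "kubeflow.org/v1beta1",
        "kind": "Profile",
        "metadata": {"name": profile_name},
        "spec": {"owner": {"kind": "User", "name": owner_email}},
    }


# Hypothetical projects and owners, for illustration only.
owners = {
    "team-vision": "alice@example.com",
    "team-nlp": "bob@example.com",
}

manifests = [make_profile(name, email) for name, email in owners.items()]
for manifest in manifests:
    print(json.dumps(manifest, indent=2))
```

Each printed document could be saved to its own file and applied with the same kubectl create -f command.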
What can we do once we have a namespace? Let us look through the available tools.