
Wednesday, June 1, 2022

2.2 Optimization Method

Now let's summarize the preceding solution: we need to find the optimal parameters w and b so that the input and output satisfy a linear relationship $y^{(i)} = wx^{(i)} + b$, $i \in [1, n]$. However, due to the existence of observation errors $\epsilon$, it is necessary to sample a data set $D = \{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), (x^{(3)}, y^{(3)}), \ldots, (x^{(n)}, y^{(n)})\}$ composed of a sufficient number of data samples and to find an optimal set of parameters w and b that minimizes the mean squared error $L = \frac{1}{n}\sum_{i=1}^{n}\left(wx^{(i)} + b - y^{(i)}\right)^2$.

For a single-input neuron model, only two samples are needed to obtain the exact solution of the equations by the elimination method. An exact solution derived from a closed-form formula like this is called an analytical solution. However, in the case of multiple data points ($n \gg 2$), there is probably no analytical solution, and we can only use numerical optimization methods to obtain an approximate numerical solution. Why is it called optimization? Because computers calculate very quickly, we can use their computing power to "search" and "try" many times, thereby reducing the error L step by step. The simplest optimization method is brute-force search or random experiment. For example, to find the most suitable w and b, we can randomly sample w and b from the real number space and calculate the error value L of the corresponding model. We then pick out the smallest error L from all the experiments {L}; its corresponding w and b are the optimal parameters we are looking for.
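The random experiment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "true" parameters (1.5 and 0.1), the search range, the number of trials, and the function names are hypothetical choices, not values from the text.

```python
import random

def mse(w, b, data):
    """Mean squared error of the line y = w*x + b over (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def random_search(data, trials=20000, low=-5.0, high=5.0, seed=0):
    """Try random (w, b) pairs and keep the pair with the smallest error."""
    rng = random.Random(seed)
    best_w, best_b, best_loss = None, None, float("inf")
    for _ in range(trials):
        w = rng.uniform(low, high)
        b = rng.uniform(low, high)
        loss = mse(w, b, data)
        if loss < best_loss:
            best_w, best_b, best_loss = w, b, loss
    return best_w, best_b, best_loss

# Hypothetical noise-free samples from y = 1.5 * x + 0.1
data = [(x, 1.5 * x + 0.1) for x in range(10)]
w, b, loss = random_search(data)
print(w, b, loss)
```

Even with tens of thousands of trials, the result is only close to the true parameters, and the number of trials needed grows quickly with the number of parameters.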

This brute-force algorithm is simple and straightforward, but it is extremely inefficient for large-scale, high-dimensional optimization problems. Gradient descent is the most commonly used optimization algorithm in neural network training. Combined with the parallel acceleration capability of powerful graphics processing unit (GPU) chips, it is well suited to optimizing neural network models with massive data.

Naturally, it is also suitable for optimizing our simple linear neuron model. Since the gradient descent algorithm is the core algorithm of deep learning, we will first apply it to solve simple neuron models and then detail its application in neural networks in Chapter 7.

With the concept of the derivative, if we want to find the maximum and minimum values of a function, we can simply set the derivative to 0, find the corresponding values of the independent variable, that is, the stationary points, and then check the type of each stationary point. Taking the function $f(x) = x^2 \cdot \sin(x)$ as an example, we can plot the function and its derivative in the interval $x \in [-10, 10]$, where the blue solid line is $f(x)$ and the yellow dashed line is $\frac{df(x)}{dx}$, as shown in Figure 2-5. It can be seen that the points where the derivative (dashed line) is 0 are the stationary points, and the local maximum and minimum values of f(x) appear at stationary points.
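As a rough numerical companion to the plot, the following sketch looks for the stationary points of $f(x) = x^2 \cdot \sin(x)$ on $[-10, 10]$ by scanning for sign changes of the derivative and refining each one by bisection. The helper names, grid size, and tolerance are illustrative assumptions.

```python
import math

def f(x):
    return x ** 2 * math.sin(x)

def df(x):
    # Product rule: d/dx [x^2 * sin(x)] = 2x*sin(x) + x^2*cos(x)
    return 2 * x * math.sin(x) + x ** 2 * math.cos(x)

def bisect_root(g, lo, hi, tol=1e-10):
    """Shrink [lo, hi] around a sign change of g until it is narrower than tol."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def stationary_points(lo=-10.0, hi=10.0, steps=2000):
    """Scan a grid for sign changes of df and refine each one by bisection."""
    points = []
    width = (hi - lo) / steps
    for k in range(steps):
        a, b = lo + k * width, lo + (k + 1) * width
        if df(a) * df(b) < 0:
            points.append(bisect_root(df, a, b))
    return points

points = stationary_points()
for p in points:
    print(round(p, 4), round(f(p), 4))
```

Note that x = 0 is also a stationary point, but the derivative only touches zero there without changing sign, so this scan does not report it. It is neither a maximum nor a minimum, which is why the type of each stationary point has to be checked.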

The gradient of a function is defined as the vector of partial derivatives of the function with respect to each independent variable. Considering a three-dimensional function $z = f(x, y)$, the partial derivative of the function with respect to the independent variable x is $\frac{\partial z}{\partial x}$, the partial derivative of the function with respect to the independent variable y is $\frac{\partial z}{\partial y}$, and the gradient $\nabla f$ is the vector $\left(\frac{\partial z}{\partial x}, \frac{\partial z}{\partial y}\right)$. Let's look at a specific function $f(x, y) = -(\cos^2 x + \cos^2 y)^2$. As shown in Figure 2-6, the length of each red arrow in the plane represents the modulus of the gradient vector, and the direction of the arrow represents the direction of the gradient vector. It can be seen that the arrows always point in the direction in which the function value increases. The steeper the function surface, the longer the arrow and the larger the modulus of the gradient.
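To make the definition concrete, here is a small sketch that computes the gradient of $f(x, y) = -(\cos^2 x + \cos^2 y)^2$ analytically and checks it against a central-difference approximation. The evaluation point and the step sizes are arbitrary assumptions chosen for illustration.

```python
import math

def f(x, y):
    return -(math.cos(x) ** 2 + math.cos(y) ** 2) ** 2

def grad_f(x, y):
    """Analytic gradient obtained with the chain rule."""
    s = math.cos(x) ** 2 + math.cos(y) ** 2
    # d/dx cos^2(x) = -sin(2x), so df/dx = -2*s*(-sin(2x)) = 2*s*sin(2x)
    return 2 * s * math.sin(2 * x), 2 * s * math.sin(2 * y)

def numeric_grad(func, x, y, h=1e-6):
    """Central-difference approximation of the two partial derivatives."""
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return dfdx, dfdy

x0, y0 = 0.5, -1.2          # arbitrary evaluation point
gx, gy = grad_f(x0, y0)
nx, ny = numeric_grad(f, x0, y0)
print((gx, gy), (nx, ny))

# A small step along the gradient should increase the function value
t = 1e-3
print(f(x0 + t * gx, y0 + t * gy) > f(x0, y0))
```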

Through the preceding example, we can intuitively see that the gradient of a function always points in the direction in which the function value increases. The opposite direction of the gradient therefore points in the direction in which the function value decreases:

$$x' = x - \eta \cdot \nabla f \tag{2.1}$$

To take advantage of this property, we just need to follow the preceding equation to iteratively update x. Then we get smaller and smaller function values. $\eta$ scales the gradient vector; it is known as the learning rate and is generally set to a small value, such as 0.01 or 0.001. In particular, for one-dimensional functions, the preceding vector form can be written in scalar form:

$$x' = x - \eta \cdot \frac{dy}{dx}$$

After x is updated with the preceding formula, the function value y' at x' is likely to be smaller than the function value y at x. Iterating the update several times therefore drives the function value down step by step.
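The scalar update rule can be sketched as a short loop. The example function $y = (x - 3)^2$, the starting point, and the step count are hypothetical choices made only to show the iteration converging.

```python
def gradient_descent_1d(dy_dx, x0, lr=0.01, steps=1000):
    """Repeatedly apply x' = x - lr * dy/dx, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - lr * dy_dx(x)
    return x

# Hypothetical example: y = (x - 3)^2, whose derivative is 2 * (x - 3)
x_min = gradient_descent_1d(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```

With a learning rate of 0.01, each step shrinks the distance to the minimum at x = 3 by a factor of 0.98, so many iterations are needed. A much larger learning rate would converge faster at first but can overshoot and diverge.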

The method of optimizing parameters with formula (2.1) is called the gradient descent algorithm. It calculates the gradient $\nabla f$ of the function f and iteratively updates the parameters to obtain a numerical solution for the parameters at which f reaches its minimum value. It should be noted that the model input in deep learning is generally represented as x, and the parameters to be optimized are generally represented by $\theta$, w, and b.

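Now we will apply the gradient descent algorithm to calculate the optimal parameters $w^*$ and $b^*$ introduced at the beginning of this section. Here the mean squared error function is minimized:

$$L = \frac{1}{n}\sum_{i=1}^{n}\left(wx^{(i)} + b - y^{(i)}\right)^2$$

The model parameters that need to be optimized are w and b, so we update them iteratively using the following equations:

$$w' = w - \eta \frac{\partial L}{\partial w}$$

$$b' = b - \eta \frac{\partial L}{\partial b}$$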
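The following is a minimal sketch of gradient descent applied to the single-input linear model with the mean squared error. The partial derivatives used are $\frac{\partial L}{\partial w} = \frac{2}{n}\sum_i (wx^{(i)} + b - y^{(i)})x^{(i)}$ and $\frac{\partial L}{\partial b} = \frac{2}{n}\sum_i (wx^{(i)} + b - y^{(i)})$. The synthetic data (true parameters 1.5 and 0.1, noise standard deviation 0.1), the learning rate, and the step count are assumptions for illustration only.

```python
import random

def gradients(w, b, data):
    """Partial derivatives of the mean squared error with respect to w and b."""
    n = len(data)
    dw = 2 / n * sum((w * x + b - y) * x for x, y in data)
    db = 2 / n * sum((w * x + b - y) for x, y in data)
    return dw, db

def fit(data, lr=0.01, steps=5000):
    """Start from w = b = 0 and apply the gradient descent updates."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = gradients(w, b, data)
        w, b = w - lr * dw, b - lr * db
    return w, b

# Hypothetical noisy samples from y = 1.5 * x + 0.1
rng = random.Random(0)
data = []
for _ in range(100):
    x = rng.uniform(-10, 10)
    y = 1.5 * x + 0.1 + rng.gauss(0, 0.1)
    data.append((x, y))

w, b = fit(data)
print(w, b)
```

Because of the noise, the fitted values only approximate the parameters used to generate the data.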

Saturday, May 28, 2022

2.1 Neuron Model

An adult brain contains about 100 billion neurons. Each neuron obtains input signals through dendrites and transmits output signals through axons. The neurons are interconnected to form a huge neural network, thus forming the human brain, the basis of perception and consciousness. Figure 2-1 is a typical biological neuron structure. In 1943, the psychologist Warren McCulloch and mathematical logician Walter Pitts proposed a mathematical model of artificial neural networks to simulate the mechanism of biological neurons. This research was further developed by the American neurologist Frank Rosenblatt into the perceptron model, which is also the cornerstone of modern deep learning.

Starting from the structure of biological neurons, we will revisit the exploration of scientific pioneers and gradually unveil the mystery of automatic learning machines.

First, we can abstract the neuron model into the mathematical structure shown in Figure 2-2. The neuron input vector $x = [x_1, x_2, x_3, \ldots, x_n]^T$ maps to y through the function $f_\theta: x \to y$, where $\theta$ represents the parameters of the function f. Consider a simplified case, such as a linear transformation: $f(x) = w^Tx + b$. The expanded form is

$$f(x) = w_1x_1 + w_2x_2 + \cdots + w_nx_n + b$$

The preceding calculation logic can be intuitively shown in Figure 2-2.
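In code, the expanded form is just a weighted sum plus a bias. The sketch below uses hypothetical input and parameter values purely for illustration.

```python
def neuron(x, w, b):
    """Linear neuron: weighted sum of the inputs plus a bias."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical input and parameter values, for illustration only
y = neuron(x=[1.0, 2.0, 3.0], w=[0.5, -1.0, 2.0], b=0.1)
print(y)  # 0.5*1 - 1*2 + 2*3 + 0.1, approximately 4.6
```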

The parameters θ = {w1, w2, w3,...,wn,b} determine the state of the neuron, and the processing logic of this neuron can be determined by fixing those parameters. When the number of input nodes n = 1 (single input), the neuron model can be further simplified as 

$$y = wx + b$$

 Then we can plot the change of y as a function of x as shown in Figure 2-3. As the input signal x increases, the output also increases linearly. Here parameter w can be understood as the slope of the straight line, and b is the bias of the straight line.

For a certain neuron, the mapping relationship f between x and y is unknown but fixed. Two points determine a straight line. In order to estimate the values of w and b, we only need to sample any two data points $(x^{(1)}, y^{(1)})$ and $(x^{(2)}, y^{(2)})$ from the straight line in Figure 2-3, where the superscript indicates the data point number:

$$y^{(1)} = wx^{(1)} + b$$

$$y^{(2)} = wx^{(2)} + b$$

If $(x^{(1)}, y^{(1)}) \neq (x^{(2)}, y^{(2)})$, we can solve the preceding equations to get the values of w and b. Let's consider a specific example: $x^{(1)} = 1$, $y^{(1)} = 1.567$, $x^{(2)} = 2$, $y^{(2)} = 3.043$. Substituting the numbers into the preceding formulas gives

$$1.567 = w \cdot 1 + b$$

$$3.043 = w \cdot 2 + b$$

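This is a system of two linear equations in two unknowns, which we learned in junior high or high school. The analytical solution can be easily calculated using the elimination method: subtracting the first equation from the second gives w = 3.043 - 1.567 = 1.476, and substituting back gives b = 1.567 - 1.476 = 0.091.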
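The elimination step can be written as a tiny helper, sketched here under the assumption that the two points have different x values. The function name is illustrative.

```python
def solve_line(p1, p2):
    """Solve y = w*x + b for w and b from two points by elimination."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points need different x values")
    w = (y2 - y1) / (x2 - x1)  # subtracting the equations eliminates b
    b = y1 - w * x1            # back-substitute into the first equation
    return w, b

w, b = solve_line((1, 1.567), (2, 3.043))
print(round(w, 3), round(b, 3))  # prints: 1.476 0.091
```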

You can see that we only need two different data points to perfectly solve the parameters of a single-input linear neuron model. For linear neuron models with N inputs, we only need to sample N + 1 different data points. It seems that linear neuron models can be perfectly solved. So what's wrong with the preceding method? Considering that there may be observation errors for any sampling point, we assume that the observation error variable $\epsilon$ follows a normal distribution $\mathcal{N}(\mu, \sigma^2)$ with $\mu$ as mean and $\sigma^2$ as variance. Then the samples follow:

$$y = wx + b + \epsilon, \quad \epsilon \sim \mathcal{N}(\mu, \sigma^2)$$
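Sampling from this noisy model could look like the sketch below. The true parameters (1.5 and 0.1), the noise level, the input range, and the function name are all assumptions made for illustration.

```python
import random

def sample_points(n, w=1.5, b=0.1, mu=0.0, sigma=0.1, seed=0):
    """Draw n points from y = w*x + b + e, with e ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x = rng.uniform(-10.0, 10.0)
        e = rng.gauss(mu, sigma)  # observation error
        points.append((x, w * x + b + e))
    return points

points = sample_points(1000)
# The residuals y - (w*x + b) recover the noise, so their mean should be near mu
residual_mean = sum(y - (1.5 * x + 0.1) for x, y in points) / len(points)
print(residual_mean)
```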

Once the observation error is introduced, even for a model as simple as a linear one, sampling only two data points may bring a large estimation bias. As shown in Figure 2-4, the data points all have observation errors. If the estimation is based on the two blue rectangular data points, the estimated blue dotted line would have a large deviation from the true orange straight line. In order to reduce the estimation bias introduced by observation errors, we can sample multiple data points $D = \{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), (x^{(3)}, y^{(3)}), \ldots, (x^{(n)}, y^{(n)})\}$ and then find a "best" straight line that minimizes the sum of errors between all sampling points and the straight line.

Due to the existence of observation errors, there may not be a straight line that passes perfectly through all the sampling points D. Therefore, we hope to find a "good" straight line close to all sampling points. How do we measure "good" and "bad"? A natural idea is to use the mean squared error (MSE) between the predicted value $wx^{(i)} + b$ and the true value $y^{(i)}$ at all sampling points as the total error, that is

$$L = \frac{1}{n}\sum_{i=1}^{n}\left(wx^{(i)} + b - y^{(i)}\right)^2$$

Then search for a set of parameters $w^*$ and $b^*$ that minimizes the total error L. The straight line corresponding to the minimal total error is the optimal straight line we are looking for, that is

$$w^*, b^* = \arg\min_{w,b} \frac{1}{n}\sum_{i=1}^{n}\left(wx^{(i)} + b - y^{(i)}\right)^2$$

Here n represents the number of sampling points.
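As a quick illustration of how the total error ranks candidate lines, the sketch below evaluates the MSE of two lines on three hypothetical sample points. The data and the candidate parameters are made up for this example.

```python
def mse(w, b, points):
    """Mean squared error between w*x + b and y over all sample points."""
    return sum((w * x + b - y) ** 2 for x, y in points) / len(points)

# Hypothetical sample points, for illustration only
points = [(1.0, 1.6), (2.0, 3.1), (3.0, 4.4)]

good = mse(1.5, 0.0, points)  # residuals -0.1, -0.1, 0.1
bad = mse(1.0, 0.0, points)   # residuals -0.6, -1.1, -1.4
print(good, bad)
```

The line with the smaller total error is the better fit, and the optimal line is the one whose error cannot be lowered by any other choice of w and b.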


Tuesday, May 24, 2022

CHAPTER 2 Regression

Some people worry that artificial intelligence will make us feel inferior, but then, anybody in his right mind should have an inferiority complex every time he looks at a flower. -Alan Kay

1.6.4 Common Editor Installation

There are many ways to write programs in Python. You can use IPython or Jupyter Notebook to write code interactively. You can also use Sublime Text, PyCharm, or VS Code to develop medium and large projects. This book recommends using PyCharm to write and debug code and using VS Code for interactive project development. Both of them are free, and users can download and install them by themselves.

Next, let's start the deep learning journey!

1.6.3 TensorFlow Installation

TensorFlow, like other Python libraries, can be installed using the "pip install" command of the Python package management tool. When installing TensorFlow, you need to decide whether to install the more powerful GPU version or the general-performance CPU version, based on whether your computer has an NVIDIA GPU graphics card.

# Install numpy

pip install numpy

With the preceding command, you should be able to automatically download and install the numpy library. Now let's install the latest GPU version of TensorFlow. The command is as follows:

# Install TensorFlow GPU version

pip install -U tensorflow

The preceding command should automatically download and install the TensorFlow GPU version, which is currently the official version of TensorFlow 2.x. The "-U" parameter specifies that if the package is already installed, it is upgraded.

Now let's test whether the GPU version of TensorFlow is successfully installed. Enter "ipython" on the "cmd" command line to enter the ipython interactive terminal, and then enter the "import tensorflow as tf" command. If no errors occur, continue to enter "tf.test.is_gpu_available()" to test whether the GPU is available. This command prints a series of messages. The messages beginning with "I" (Information) contain information about the available GPU graphics devices, and the command returns "True" or "False" at the end, indicating whether the GPU device is available, as shown in Figure 1-35. If True, the TensorFlow GPU version is successfully installed; if False, the installation has failed.

You may need to check the steps of CUDA, cuDNN, and environment variable configuration again or copy the error and seek help from the search engine.
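As an alternative check, recent TensorFlow 2.x releases (2.1 and later) provide "tf.config.list_physical_devices", which lists the devices TensorFlow can see. The sketch below wraps it in a hypothetical helper that also handles the case where TensorFlow is not installed; the helper name is an assumption, not part of TensorFlow.

```python
def count_visible_gpus():
    """Return the number of GPUs TensorFlow can see, or None if it is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return len(tf.config.list_physical_devices("GPU"))

gpu_count = count_visible_gpus()
if gpu_count is None:
    print("TensorFlow is not installed in this environment")
elif gpu_count == 0:
    print("TensorFlow is installed, but no GPU is visible")
else:
    print("Visible GPUs:", gpu_count)
```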

If you don't have a GPU, you can install the CPU version. The CPU version cannot use the GPU to accelerate calculations, so its computational speed is relatively slow. However, because the models introduced for learning purposes in this book are generally not computationally expensive, the CPU version can also be used. It is also possible to add an NVIDIA GPU device later, after gaining a better understanding of deep learning. If the installation of the TensorFlow GPU version fails, we can also use the CPU version directly. The command to install the CPU version is

# Install TensorFlow CPU version

pip install -U tensorflow-cpu

After installation, enter the "import tensorflow as tf" command in the ipython terminal to verify that the CPU version is successfully installed. After TensorFlow is installed, you can view the version number through "tf.__version__". Figure 1-36 shows an example. Note that the code works for all TensorFlow 2.x versions.

The preceding manual process of installing CUDA and cuDNN, configuring the Path environment variable, and installing TensorFlow is the standard installation method. Although the steps are tedious, they are of great help in understanding the functional role of each library. In fact, novices can complete the preceding steps with two commands, as follows:

# Create virtual environment tf2 with tensorflow-gpu setup required

# to automatically install CUDA, cuDNN, and TensorFlow GPU

conda create -n tf2 tensorflow-gpu

# Activate tf2 environment

conda activate tf2

This quick installation method is called the minimal installation method. This is also the convenience of using the Anaconda distribution.

TensorFlow installed through the minimal installation method requires activation of the corresponding virtual environment before use, which distinguishes it from the standard installation. The standard version is installed in Anaconda's default environment, base, and generally does not require manual activation of the base environment.

Common Python libraries can also be installed by default. The command is as follows:

# Install common python libraries

pip install -U ipython numpy matplotlib pillow pandas

When TensorFlow is running, it consumes all GPU memory by default, which is very unfriendly to other workloads, especially when the computer has multiple users or programs using GPU resources at the same time. Occupying all GPU resources makes other programs unable to run. Therefore, it is generally recommended to set the GPU memory usage of TensorFlow to growth mode, that is, to request GPU memory based on the actual model size. The code implementation is as follows:

# Set GPU resource usage method
import tensorflow as tf

# Get GPU device list
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Set GPU usage to growth mode
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Memory growth must be set before the GPUs are initialized
        print(e)

Monday, May 23, 2022

1.6.2 CUDA Installation

Most of the current deep learning frameworks rely on NVIDIA GPU graphics cards for accelerated calculations, so you need to install the GPU acceleration library CUDA provided by NVIDIA. Before installing CUDA, make sure your computer has an NVIDIA graphics device that supports CUDA. If your computer does not have an NVIDIA graphics card, for example, if the graphics card is made by AMD or Intel, CUDA won't work, and you can skip this step and directly install the TensorFlow CPU version.

The installation of CUDA is divided into three steps: CUDA software installation, cuDNN deep neural network acceleration library installation, and environment variable configuration. The installation process is a bit tedious. We will go through them step by step using the Windows 10 system as an example.

CUDA Software Installation. Open the official download website of the CUDA program: https://developer.nvidia.com/cuda-10.0-download-archive. Here we use CUDA 10.0: select the Windows platform, x86_64 architecture, version 10, and the exe (local) installation package, and then select "Download" to download the CUDA installation software. After the download is complete, open the software. As shown in Figure 1-25, select the "Custom" option and click the "NEXT" button to enter the installation program selection list, as shown in Figure 1-26. Here you can select the components that need to be installed and unselect those that do not. Under the "CUDA" category, unselect the "Visual Studio Integration" item. Under the "Driver components" category, compare the version numbers "Current Version" and "New Version." If "Current Version" is greater than "New Version," uncheck "Display Driver." If "Current Version" is less than or equal to "New Version," leave "Display Driver" checked, as shown in Figure 1-27. After the setup is complete, click "NEXT" and follow the instructions to install.

After the installation is complete, let's test whether the CUDA software is successfully installed. Open the "cmd" terminal and enter "nvcc -V" to print the current CUDA version information, as shown in Figure 1-28. If the command is not recognized, the installation has failed. We can find the "nvcc.exe" program in the CUDA installation path "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin", as shown in Figure 1-29.
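The same check can also be scripted. The sketch below uses only the Python standard library to look for "nvcc" on the PATH and, if it is found, to run "nvcc -V". The helper name is an illustrative assumption.

```python
import shutil
import subprocess

def cuda_compiler_version():
    """Return the text printed by `nvcc -V`, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "-V"], capture_output=True, text=True)
    return result.stdout

version_text = cuda_compiler_version()
if version_text is None:
    print("nvcc was not found on PATH")
else:
    print(version_text)
```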

cuDNN Neural Network Acceleration Library Installation. CUDA is not a GPU acceleration library specific to neural networks; it is designed for a variety of applications that require parallel computing. If you want to accelerate neural network applications, you need to install the additional cuDNN library. It should be noted that the cuDNN library is not an executable program. You only need to download and decompress the cuDNN file and configure the Path environment variable.

Open the website https://developer.nvidia.com/cudnn and select "Download cuDNN." Due to NVIDIA regulations, users need to log in or create a new account to continue downloading. After logging in, enter the cuDNN download interface and check "I Agree To the Terms of the cuDNN Software License Agreement," and the cuDNN version download options will appear. Select the cuDNN version that matches CUDA 10.0, and click the "cuDNN Library for Windows 10" link to download the cuDNN file, as shown in Figure 1-30. It should be noted that cuDNN itself has a version number, and it also needs to match the CUDA version number.

After downloading the cuDNN file, unzip it and rename the folder "cuda" to "cudnn765". Then copy the "cudnn765" folder to the CUDA installation path "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0". A dialog box that requires administrator rights may pop up here. Select continue to paste.

Environment Variable Configuration. We have completed the installation of cuDNN, but in order for the system to be aware of the location of the cuDNN file, we need to configure the Path environment variable as follows. Open the file browser, right-click "My Computer," select "Properties," select "Advanced system settings," and select "Environment Variables," as shown in Figure 1-32. Select the "Path" environment variable in the "System variables" column and select "Edit," as shown in Figure 1-33. Select "New," enter the cuDNN installation path "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\cudnn765\bin", and use the "Move up" button to move this item to the top.

After the CUDA installation is complete, the environment variables should include "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin," "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\libnvvp", and "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\cudnn765\bin". The preceding paths may differ slightly depending on the actual installation path, as shown in Figure 1-34. After confirmation, click "OK" to close all dialog boxes.
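One way to double-check the configuration is to compare the expected directories against the PATH variable from Python. The sketch below is a hypothetical helper using only the standard library; the comparison ignores case and trailing slashes, which is an assumption suited to Windows paths, and the directory list should be adjusted to the actual installation path.

```python
import os

def missing_path_entries(expected, path_value=None, sep=os.pathsep):
    """Return the expected directories that do not appear in the PATH value."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Normalize entries: ignore surrounding spaces, trailing slashes, and case
    present = {
        entry.strip().rstrip("\\/").lower()
        for entry in path_value.split(sep)
        if entry.strip()
    }
    return [p for p in expected if p.rstrip("\\/").lower() not in present]

cuda_root = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0"
expected_dirs = [
    cuda_root + r"\bin",
    cuda_root + r"\libnvvp",
    cuda_root + r"\cudnn765\bin",
]
print(missing_path_entries(expected_dirs))
```

An empty list means every expected directory was found in the PATH of the current process. A terminal opened before the environment variables were edited still sees the old PATH.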


1.6.1 Anaconda Installation

The Python interpreter is the bridge that allows code written in Python to be executed by the CPU and is the core software of the Python language. Users can download the appropriate version (Python 3.7 is used here) of the interpreter from www.python.org/. After the installation is completed, you can call the python.exe program to execute source code files written in Python (.py files).

Here we choose to install the Anaconda software, which integrates a series of auxiliary functions such as the Python interpreter, package management, and virtual environments. We can download Anaconda from www.anaconda.com/distribution/#download-section and select the latest version of Python to download and install. As shown in Figure 1-22, check the "Add Anaconda to my PATH environment variable" option, so that you can call the Anaconda program through the command line. As shown in Figure 1-23, the installer asks whether to install the VS Code software together. Select Skip. The entire installation process lasts about 5 minutes, and the specific time depends on the computer performance.

After the installation is complete, how can we verify that Anaconda was successfully installed? Pressing the Windows+R key combination on the keyboard brings up the Run dialog box; enter "cmd" and press Enter to open the command-line program "cmd.exe" that comes with Windows. Alternatively, click the Start menu and enter "cmd" to find the "cmd.exe" program and open it. Enter the "conda list" command to view the installed libraries in the Python environment. If it is a newly installed Python environment, the listed libraries are all libraries that come with Anaconda, as shown in Figure 1-24. If "conda list" prints the list of libraries normally, the Anaconda software installation is successful. Otherwise, the installation failed, and you need to reinstall.