
Monday, May 16, 2022

1.3.3 Network Scale

Early perceptron models and multilayer neural networks had only one to four layers, and the number of network parameters was only around tens of thousands. With the development of deep learning and the improvement of computing capabilities, models such as AlexNet (8 layers), VGG16 (16 layers), GoogLeNet (22 layers), ResNet50 (50 layers), and DenseNet121 (121 layers) have been proposed successively, while the size of input pictures has gradually increased from 28 × 28 to 224 × 224, to 299 × 299, and even larger. These changes bring the total number of network parameters to the level of tens of millions, as shown in Figure 1-13.
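For a rough illustration of these parameter counts, here is a minimal sketch (assuming TensorFlow 2.x with tf.keras available) that instantiates a few of the models mentioned above and prints their total parameter counts:

import tensorflow as tf

models = {
    "VGG16": tf.keras.applications.VGG16(weights=None),
    "ResNet50": tf.keras.applications.ResNet50(weights=None),
    "DenseNet121": tf.keras.applications.DenseNet121(weights=None),
}
for name, model in models.items():
    # count_params() sums the sizes of all weight tensors in the model
    print(f"{name}: {model.count_params():,} parameters")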

The increase in network scale enhances the capacity of neural networks correspondingly, so that the networks can learn more complex data modalities and model performance improves accordingly. On the other hand, the increase in network scale also means that more training data and computing power are needed to avoid overfitting.

1.3.2 Computing Power

The increase in computing power is an important factor in the third artificial intelligence renaissance. In fact, the basic theory of modern deep learning was proposed in the 1980s, but the real potential of deep learning was not realized until the release of AlexNet, trained on two GTX 580 GPUs, in 2012. Traditional machine learning algorithms do not have stringent requirements on data volume and computing power the way deep learning does; usually, serial training on a CPU can produce satisfactory results. But deep learning relies heavily on parallel acceleration computing devices. Most current neural networks use parallel acceleration chips such as NVIDIA GPUs and Google TPUs to train model parameters. For example, the AlphaGo Zero program needed to be trained from scratch on 64 GPUs for 40 days before surpassing all previous AlphaGo versions, and an automatic network structure search algorithm used 800 GPUs to optimize for a better network structure.
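As a side note, here is a minimal sketch (assuming TensorFlow 2.x) for checking which acceleration devices are visible on a machine:

import tensorflow as tf

# List the GPUs that TensorFlow can see; an empty list means training
# will fall back to the (much slower) CPU.
gpus = tf.config.list_physical_devices('GPU')
print("Visible GPUs:", gpus)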

At present, the deep learning acceleration hardware devices that ordinary consumers can use are mainly NVIDIA GPUs. Comparing the floating-point computing power of x86 CPUs and NVIDIA GPUs from 2008 to 2017, it can be seen that the x86 CPU curve changes relatively slowly, while the floating-point computing power of NVIDIA GPUs grows exponentially, driven mainly by the growing business of gaming and deep learning computing.

1.3.1 Data Volume

Early machine learning algorithms were relatively simple and fast to train, and the size of the required dataset was relatively small, such as the Iris flower dataset collected by the British statistician Ronald Fisher in 1936, which contains only three categories of flowers, with 50 samples in each category. With the development of computer technology, the designed algorithms became more and more complex, and the demand for data volume also increased. The MNIST handwritten digit picture dataset collected by Yann LeCun in 1998 contains ten categories of digits from 0 to 9, with up to 7,000 pictures in each category. With the rise of neural networks, especially deep learning networks, the number of network layers is generally large, and the number of model parameters can reach one million, ten million, or even one billion. To prevent overfitting, the size of the training dataset is usually huge. The popularity of modern social media also makes it possible to collect huge amounts of data. For example, the ImageNet dataset released in 2010 includes a total of 14,197,122 pictures, and the compressed file size of the entire dataset is 154 GB. Figures 1-10 and 1-11 list the number of samples and the size of datasets over time.
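For a concrete sense of these dataset sizes, here is a minimal sketch (assuming TensorFlow 2.x and internet access to download MNIST) that loads MNIST and inspects its shape:

import tensorflow as tf

# MNIST ships as 60,000 training and 10,000 test images of 28 x 28 pixels,
# covering the ten digit categories 0-9 described above.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
print(x_train.shape, x_test.shape)      # (60000, 28, 28) (10000, 28, 28)
print(sorted(set(y_train.tolist())))    # the ten digit classes 0-9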

Although deep learning has a high demand for large datasets, collecting data, especially labeled data, is often very expensive. Building a dataset usually requires manual collection, crawling of raw data, and cleaning out invalid samples, and then annotating the samples with human intelligence, so subjective bias and random errors are inevitably introduced. Therefore, algorithms that require only small amounts of data are a very hot research topic.


1.3 Deep Learning Characteristics

 Compared with traditional machine learning algorithms and shallow neural networks, modern deep learning algorithms usually have the following characteristics.

1.2.2 Deep Learning

In 2006, Geoffrey Hinton et al. found that multilayer neural networks can be better trained through layer-by-layer pre-training and achieved a better error rate than SVM on the MNIST handwritten digit picture dataset, starting the third artificial intelligence revival. In that paper, Geoffrey Hinton first proposed the concept of deep learning. In 2011, Xavier Glorot proposed the Rectified Linear Unit (ReLU) activation function, which is one of the most widely used activation functions now. In 2012, Alex Krizhevsky proposed an eight-layer deep neural network, AlexNet, which used the ReLU activation function and Dropout technology to prevent overfitting. At the same time, it abandoned the layer-by-layer pre-training method and directly trained the network on two NVIDIA GTX 580 GPUs. AlexNet won first place in the ILSVRC-2012 picture recognition competition, with a stunning 10.9% reduction in the top-5 error rate compared with second place.
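To make the ReLU and Dropout ideas concrete, here is a minimal sketch (assuming TensorFlow 2.x; a tiny toy model, not the original AlexNet architecture) showing how both appear in a Keras network:

import tensorflow as tf

model = tf.keras.Sequential([
    # ReLU activation: outputs max(0, x), which eases gradient flow in deep networks
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    # Dropout randomly zeroes half of the activations during training to reduce overfitting
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.summary()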

Since the AlexNet model was developed, various models have been published successively, including the VGG, GoogLeNet, ResNet, and DenseNet series. Among them, the ResNet series increases the number of network layers to hundreds or even thousands while maintaining the same or even better performance, and is one of the most representative models of deep learning.

In addition to the amazing results in supervised learning, huge achievements have also been made in unsupervised learning and reinforcement learning. In 2014, Ian Goodfellow proposed generative adversarial networks (GANs), which learn the true distribution of samples through adversarial training in order to generate samples with higher fidelity. Since then, a large number of GAN models have been proposed, and the latest image generation models can produce images whose fidelity is hard to distinguish with the naked eye. In 2016, DeepMind applied deep neural networks to the field of reinforcement learning and proposed the DQN algorithm, which achieved a level comparable to or even higher than that of humans in 49 games on the Atari game platform. In the field of Go, the AlphaGo and AlphaGo Zero programs from DeepMind successively defeated top human Go players Lee Sedol, Ke Jie, and others. In the multi-agent collaborative game Dota 2, the OpenAI Five program developed by OpenAI defeated the TI8 champion team OG in a restricted game environment, showing a large number of professional high-level intelligent operations. Figure 1-9 lists the major milestones of AI development between 2006 and 2019.

Sunday, May 15, 2022

1.2.1 Shallow Neural Networks

In 1943, psychologist Warren McCulloch and logician Walter Pitts proposed the earliest mathematical model of neurons based on the structure of biological neurons, called the MP neuron model after their last-name initials. The model is f(x) = h(g(x)), where g(x) = Σᵢ xᵢ with xᵢ ∈ {0, 1}, and the output is predicted from the value of g(x), as shown in Figure 1-4: if g(x) ≥ 0, the output is 1; if g(x) < 0, the output is 0. The MP neuron model has no learning ability and can only complete fixed logic judgments.

In 1958, American psychologist Frank Rosenblatt proposed the first neuron model that can automatically learn weights, called the perceptron. As shown in Figure 1-5, the error between the output value o and the true value y is used to adjust the weights of the neuron {w1, w2, ..., wn}. Frank Rosenblatt then implemented the perceptron model on the "Mark 1 perceptron" hardware. As shown in Figures 1-6 and 1-7, the input is an image sensor with 400 pixels, and the output has eight nodes; it could successfully identify some English letters. It is generally believed that 1943-1969 was the first prosperous period of artificial intelligence development.
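As an illustration of this learning rule, here is a minimal sketch in plain NumPy (an assumption of this note, not the original Mark 1 hardware) in which the error between the output o and the true value y adjusts the weights and bias:

import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    w = np.zeros(X.shape[1])   # weights w1..wn
    b = 0.0                    # bias
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            o = 1 if np.dot(w, xi) + b >= 0 else 0   # current output o
            w += lr * (yi - o) * xi                  # adjust weights by the error (y - o)
            b += lr * (yi - o)
    return w, b

# Example: learn the linearly separable logical AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
print(train_perceptron(X, y))

Running the same routine with the XOR labels y = [0, 1, 1, 0] never converges to a correct separator, which is exactly the limitation pointed out next.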

In 1969, the American scientist Marvin Minsky and others pointed out the main flaw of linear models such as the perceptron in the book Perceptrons: perceptrons cannot handle simple linearly inseparable problems such as XOR. This directly led to a trough in perceptron-related neural network research. It is generally considered that 1969-1982 was the first winter of artificial intelligence.

Although it was a trough period for AI, many significant studies were still published one after another. The most important one is the backpropagation (BP) algorithm, which is still the core foundation of modern deep learning algorithms. In fact, the mathematical idea of the BP algorithm had been derived as early as the 1960s, but it had not yet been applied to neural networks. In 1974, the American scientist Paul Werbos first proposed in his doctoral dissertation that the BP algorithm could be applied to neural networks. Unfortunately, this result did not receive enough attention. In 1986, David Rumelhart et al. published a paper in Nature using the BP algorithm for feature learning, and the BP algorithm then started gaining widespread attention.

In 1982, with the introduction of John Hopfield's cyclically connected Hopfield network, the second wave of artificial intelligence renaissance began, lasting from 1982 to 1995. During this period, convolutional neural networks, recurrent neural networks, and the backpropagation algorithm were developed one after another. In 1986, David Rumelhart, Geoffrey Hinton, and others applied the BP algorithm to multilayer perceptrons. In 1989, Yann LeCun and others applied the BP algorithm to handwritten digit image recognition and achieved great success, which is known as LeNet. The LeNet system was successfully commercialized in zip code recognition, bank check recognition, and many other systems. In 1997, one of the most widely used recurrent neural network variants, Long Short-Term Memory (LSTM), was proposed by Jürgen Schmidhuber. In the same year, the bidirectional recurrent neural network was also proposed.

Unfortunately, the study of neural networks gradually entered a trough with the rise of traditional machine learning algorithms represented by support vector machines (SVMs), which is known as the second winter of artificial intelligence. Support vector machines have a rigorous theoretical foundation, require a small number of training samples, and also have good generalization capabilities. In contrast, neural networks lacked a theoretical foundation and were hard to interpret; deep networks were difficult to train, and their performance was ordinary. Figure 1-8 shows the significant time points of AI development between 1943 and 2006.



Saturday, May 14, 2022

1.2 The History of Neural Networks

We divide the development of neural networks into the shallow neural network stage and the deep learning stage, with 2006 as the dividing point. Before 2006, deep learning developed under the name of neural networks and experienced two ups and two downs. In 2006, Geoffrey Hinton first gave the name deep learning to deep neural networks, which started its third revival.