
Saturday, May 21, 2022

CHAPTER 1. Introduction to Deep Learning

 LEARNING OBJECTIVES

After reading through this chapter, the reader will understand the following:

- The need for Deep Learning

- Why the transition from Machine Learning to Deep Learning is needed

- The tools and languages available for Deep Learning

- Further reading


Tuesday, May 17, 2022

1.5 Deep Learning Framework

A workman who wants to do good work must first sharpen his tools. Having covered the basics of deep learning, let's pick the tools used to implement deep learning algorithms.

1.4.3 Reinforcement Learning

Virtual Games. Compared with the real environment, virtual game platforms can be used to both train and test reinforcement learning algorithms, avoiding interference from irrelevant factors while minimizing the cost of experiments. Currently, commonly used virtual game platforms include OpenAI Gym, OpenAI Universe, OpenAI Roboschool, DeepMind OpenSpiel, and MuJoCo, and commonly used reinforcement learning algorithms include DQN, A3C, A2C, and PPO. In the field of Go, the DeepMind AlphaGo program has surpassed human Go experts. In Dota 2 and StarCraft, the intelligent programs developed by OpenAI and DeepMind have also defeated professional teams under restricted rules.

Robotics. In the real environment, the control of robots has also made some progress. For example, UC Berkeley Lab has made a lot of progress in imitation learning, meta learning, and few-shot learning for robotics. Boston Dynamics has made gratifying achievements in robot applications. The robots it manufactures perform well on tasks such as walking over complex terrain and multi-agent collaboration (Figure 1-19).

Autonomous driving is considered a near-term application of reinforcement learning. Many companies, such as Baidu, Uber, and Google, have invested a lot of resources in autonomous driving. Apollo from Baidu has begun trial operations in Beijing, Xiong'an, Wuhan, and other places. Figure 1-20 shows Baidu's self-driving car Apollo.

1.4.2 Natural Language Processing

Machine Translation. In the past, machine translation algorithms were usually based on statistical machine translation models, which were also the technology used by Google's translation system before 2016. In November 2016, Google launched the Google Neural Machine Translation (GNMT) system based on the Seq2Seq model. For the first time, direct translation from the source language to the target language was realized, with a 50-90% improvement on multiple tasks. Commonly used machine translation models are Seq2Seq, BERT, GPT, and GPT-2. Among them, the GPT-2 model proposed by OpenAI has about 1.5 billion parameters. At the beginning, OpenAI refused to open-source the GPT-2 model, citing technical security reasons.

Chatbots are also a mainstream task of natural language processing. Machines automatically learn to talk to humans, provide satisfactory automatic responses to simple human requests, and improve customer service efficiency and service quality. Chatbots are often used in consulting systems, entertainment systems, and smart homes.

1.4.1 Computer Vision

Image classification is a common classification problem. The input of the neural network is an image, and the output is the probability that the current sample belongs to each category. Generally, the category with the highest probability is selected as the predicted category of the sample.
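The last step described above, turning the network's outputs into per-category probabilities and picking the most probable one, can be sketched in a few lines. This is only an illustration under stated assumptions: the softmax function is a common way to obtain probabilities but is not named in the text, and the class names and scores are hypothetical stand-ins for what a trained network would produce.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, class_names):
    """Return the most probable category and the full probability list."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs


# Hypothetical raw scores for a single image (assumption, not from a real model).
class_names = ["cat", "dog", "bird"]
logits = [2.0, 0.5, -1.0]

label, probs = predict(logits, class_names)
print(label, [round(p, 3) for p in probs])
```

The category with the largest score also has the largest probability, so the softmax step matters mainly when the probabilities themselves are needed, for example to report a confidence.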

Image recognition is one of the earliest successful applications of deep learning. Classic neural network models include VGG series, Inception series, and ResNet series.

Object detection refers to the automatic detection of the approximate location of common objects in a picture by an algorithm. Each location is usually represented by a bounding box, and the algorithm also classifies the object inside the bounding box, as shown in Figure 1-15. Common object detection algorithms are RCNN, Fast RCNN, Faster RCNN, Mask RCNN, SSD, and the YOLO series.
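To make the output of a detector concrete, here is a minimal sketch of how one detection (a bounding box plus its category) might be stored and filtered. The `Detection` class, the corner-coordinate convention, the confidence score, and the 0.5 threshold are all hypothetical choices for illustration, not the format of any particular detector listed above.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    # Box corners in pixel coordinates (assumed convention:
    # top-left corner and bottom-right corner).
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str    # predicted category of the object in the box
    score: float  # confidence in [0, 1]

    def area(self):
        """Area of the box in square pixels (0 for degenerate boxes)."""
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


def keep_confident(detections, threshold=0.5):
    """Keep only detections whose score reaches the threshold."""
    return [d for d in detections if d.score >= threshold]


# Hypothetical detector output for one picture.
detections = [
    Detection(10, 20, 110, 220, "person", 0.92),
    Detection(150, 40, 300, 180, "dog", 0.35),
]

kept = keep_confident(detections)
for d in kept:
    print(d.label, d.score, d.area())
```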

Semantic segmentation is an algorithm to automatically segment and identify the content in a picture. We can understand semantic segmentation as per-pixel classification: the model predicts the category of each pixel, as shown in Figure 1-16. Common semantic segmentation models include FCN, U-Net, SegNet, and the DeepLab series.
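The idea of classifying every pixel can be sketched as follows. The tiny 2x2 "image", its per-pixel scores, and the class names are hypothetical; a real model such as those listed above would produce one score per category for every pixel of a full-size picture.

```python
def segment(scores):
    """Pick the highest-scoring category for every pixel.

    scores[row][col] is a list with one score per category.
    Returns a label map with the same height and width.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# Hypothetical per-pixel category scores for a 2x2 picture.
class_names = ["background", "road", "car"]
scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]],
    [[0.1, 0.8, 0.1], [0.1, 0.2, 0.7]],
]

mask = segment(scores)
for row in mask:
    print(" ".join(class_names[i] for i in row))
```

This is the same "pick the most probable category" step used in image classification, applied independently at each pixel location.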

Video Understanding. As deep learning achieves better results on 2D picture-related tasks, 3D video understanding tasks with temporal dimension information (the third dimension is the sequence of frames) are receiving more and more attention. Common video understanding tasks include video classification, behavior detection, and video subject extraction. Common models are C3D, TSN, DOVF, and TS_LSTM.

Image generation learns the distribution of real pictures and samples from the learned distribution to obtain highly realistic generated pictures. At present, common image generation models include the VAE series and the GAN series. Among them, the GAN series of algorithms has made great progress in recent years. The pictures produced by the latest GAN models have reached a level where it is difficult to distinguish real from generated with the naked eye, as shown in Figure 1-17.

In addition to the preceding applications, deep learning has also achieved significant results in other areas, such as artistic style transfer (Figure 1-18), super-resolution, picture denoising/dehazing, grayscale picture coloring, and many others.


1.4 Deep Learning Applications

Deep learning algorithms have been widely used in our daily life, such as voice assistants in mobile phones, intelligent assisted driving in cars, and face payments. We will introduce some mainstream applications of deep learning, starting with computer vision, natural language processing, and reinforcement learning.

1.3.4 General Intelligence

In the past, in order to improve the performance of an algorithm on a certain task, it was often necessary to use prior knowledge to manually design corresponding features to help the algorithm better converge to the optimal solution. This type of feature extraction method is often strongly tied to the specific task. Once the scenario changes, these manually designed features and prior settings cannot adapt to the new scenario, and people often need to redesign the algorithms.

Designing a universal intelligent mechanism that can automatically learn and self-adjust like the human brain has always been the common vision of human beings. Deep learning is one of the algorithms closest to general intelligence. In the computer vision field, previous methods that needed to design features for specific tasks and add a priori assumptions have been abandoned in favor of deep learning algorithms. At present, almost all algorithms in image recognition, object detection, and semantic segmentation are based on end-to-end deep learning models, which show good performance and strong adaptability. On the Atari game platform, the DQN algorithm designed by DeepMind can reach a human-equivalent level in 49 games under the same algorithm, model structure, and hyperparameter settings, showing a certain degree of general intelligence. Figure 1-14 shows the network structure of the DQN algorithm. It is not designed for a certain game but can control 49 games on the Atari game platform.