
Sunday, September 8, 2019

Recurrent Neural Networks

-> Google Translate relies heavily on recurrent neural networks
-> we can use recurrent neural networks for time series analysis (e.g. stock analysis)

Turing test: a computer passes the Turing test if a human is unable to distinguish the computer from a human in a blind test

~ recurrent neural networks are able to pass this test: a well-trained recurrent network is able to "understand" English, for example

LEARN LANGUAGE MODELS!!

We would like to make sure that the network is able to learn connections in the data even when they are far away from each other

"I am from Hungary. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ..."

Recurrent neural networks are able to deal with relationships far away from each other

~ it is able to guess the last word: Hungarian


Combining convolutional neural networks with recurrent neural networks is quite powerful

~ we can generate image descriptions with this hybrid approach
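For instance, here is a hedged tf.keras sketch of such a hybrid (every layer size, the 224x224 input, and the wiring are illustrative assumptions, not a specific published model): a CNN encodes the image into a feature vector, which is used as the initial state of an LSTM that predicts the caption word by word.

import tensorflow as tf
from tensorflow.keras import layers

vocab_size, max_len = 5000, 20                       # assumed vocabulary size / caption length

# CNN encoder: image -> feature vector
image_in = tf.keras.Input(shape=(224, 224, 3))
x = layers.Conv2D(32, 3, activation="relu")(image_in)
x = layers.MaxPooling2D()(x)
x = layers.GlobalAveragePooling2D()(x)
img_features = layers.Dense(256, activation="relu")(x)

# RNN decoder: the image features initialise the LSTM state,
# and the previously generated words are fed in as a sequence
caption_in = tf.keras.Input(shape=(max_len,))
emb = layers.Embedding(vocab_size, 256)(caption_in)
h = layers.LSTM(256)(emb, initial_state=[img_features, img_features])
out = layers.Dense(vocab_size, activation="softmax")(h)

model = tf.keras.Model([image_in, caption_in], out)
model.summary()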


With Multilayer Neural Networks (or deep networks) we make predictions independently of each other
p(t) is not correlated with p(t-1) or p(t-2)...

-> training examples are independent of each other
               Tigers, elephants, cats ..  nothing to do with each other

THESE PREDICTIONS ARE INDEPENDENT !!!!

With Recurrent Neural Networks we can predict the next word in a given sentence:
     it is important in natural language processing ~ or when we want to predict tomorrow's stock prices

p(t) depends on p(t-1), p(t-2)....

TRAINING EXAMPLES ARE CORRELATED!!!
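
A minimal sketch of what correlated training examples look like in practice (the toy price series and the window length are made-up): to predict p(t) we feed the network a window of the previous time-steps, so consecutive training examples overlap heavily.

import numpy as np

# Toy "stock price" series; in practice this would be real data
series = np.array([101.0, 102.5, 101.8, 103.2, 104.0, 103.5, 105.1])

window = 3  # assumed number of past time-steps used to predict the next one
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]  # p(t) depends on p(t-1), p(t-2), p(t-3)

# Consecutive rows of X share most of their values -> the examples are correlated
print(X)
print(y)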


x: input
h: hidden state, i.e. the activation obtained after applying the activation function

How to train a recurrent neural network?
~ we can unroll it in time in order to end up with a standard feedforward neural network:
we know how to deal with it

As you can see, several parameters are shared across every single layer!!!

for a feed-forward network these weights are different
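
A minimal NumPy sketch of the unrolling idea (the sizes and weight names W_xh, W_hh, b are illustrative assumptions): the same weight matrices are applied at every time-step of the unrolled network, which is exactly the parameter sharing mentioned above.

import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, T = 4, 8, 5          # illustrative sizes

# One set of weights, shared across all time-steps (unlike a feed-forward net)
W_xh = rng.normal(size=(hidden_dim, input_dim)) * 0.1
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
b = np.zeros(hidden_dim)

xs = rng.normal(size=(T, input_dim))        # a dummy input sequence
h = np.zeros(hidden_dim)                    # initial hidden state

for t in range(T):                          # "unrolled in time"
    # h(t) = tanh(W_xh x(t) + W_hh h(t-1) + b) -- the same W_xh, W_hh at every step
    h = np.tanh(W_xh @ xs[t] + W_hh @ h + b)

print(h)  # final hidden activation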

Vanishing/exploding gradients problem
When dealing with backpropagation we have to calculate the gradients

~ we just have to apply the chain rule several times
We multiply by the weights several times: if you multiply by a factor x < 1 several times, the result gets smaller and smaller

VANISHING GRADIENT PROBLEM

Backpropagation Through Time(BPTT): the same as backpropagation but these gradients/error signals will also flow backward from future time-steps to current time-steps

We multiply by the weights several times: if you multiply by a factor x > 1 several times, the result gets bigger and bigger

It is usually a problem when dealing with Recurrent Neural Networks
~ because these networks are usually deep!!!

-> why is the vanishing gradient a problem?
Because the gradients become too small: it is difficult to model long-range dependencies
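
A tiny numeric illustration of both effects (the factors 0.9 and 1.1 are arbitrary): repeatedly multiplying by a factor smaller than 1 drives the gradient toward zero, while a factor larger than 1 makes it blow up.

g_small, g_big = 1.0, 1.0
for t in range(100):          # 100 time-steps of applying the chain rule
    g_small *= 0.9            # |factor| < 1  -> vanishing
    g_big   *= 1.1            # |factor| > 1  -> exploding

print(g_small)  # ~2.7e-05: the gradient has effectively vanished
print(g_big)    # ~1.4e+04: the gradient has exploded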

-> for recurrent neural networks, local optima are a much more significant problem than with feed-forward neural networks
~ error function surface is quite complex

These complex surfaces have several local optima and we want to find the global one: we can use meta-heuristic approaches as well

EXPLODING GRADIENTS PROBLEM
-> truncated BPTT algorithm: we use simple backpropagation but
we only backpropagate through k time-steps
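
A hedged PyTorch sketch of the truncation idea (the toy model, the dummy sequence, and k = 20 are assumptions): the sequence is processed in chunks of k time-steps and the hidden state is detached between chunks, so gradients never flow back further than k steps.

import torch
import torch.nn as nn

k = 20                                # assumed truncation length
rnn = nn.RNN(input_size=10, hidden_size=32, batch_first=True)
head = nn.Linear(32, 1)
opt = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.01)

x = torch.randn(1, 200, 10)           # dummy sequence of 200 time-steps
targets = torch.randn(1, 200, 1)

h = torch.zeros(1, 1, 32)             # initial hidden state
for start in range(0, x.size(1), k):
    chunk = x[:, start:start + k]
    out, h = rnn(chunk, h)
    loss = nn.functional.mse_loss(head(out), targets[:, start:start + k])

    opt.zero_grad()
    loss.backward()                   # gradients flow back through at most k steps
    opt.step()

    h = h.detach()                    # cut the graph: no backprop into earlier chunks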

-> adjust the learning rate with RMSProp (an adaptive algorithm)
We normalize the gradients using a moving average of the root mean squared gradients
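
A minimal sketch of the RMSProp update rule itself (the decay rate, learning rate, and epsilon are typical default-style values, not taken from the notes): keep a moving average of the squared gradients and divide each gradient by its root.

import numpy as np

def rmsprop_step(w, grad, avg_sq, lr=0.001, decay=0.9, eps=1e-8):
    """One RMSProp update: normalize the gradient by a moving RMS of past gradients."""
    avg_sq = decay * avg_sq + (1 - decay) * grad ** 2   # moving average of squared grads
    w = w - lr * grad / (np.sqrt(avg_sq) + eps)         # normalized update
    return w, avg_sq

w = np.array([1.0, -2.0])
avg_sq = np.zeros_like(w)
grad = np.array([0.5, -0.1])            # dummy gradient
w, avg_sq = rmsprop_step(w, grad, avg_sq)
print(w)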

VANISHING GRADIENTS PROBLEM
-> initialize the weights properly (Xavier initialization)
-> use proper activation functions such as the ReLU function
-> use other architectures: LSTMs or GRUs
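
A hedged tf.keras sketch combining these remedies (vocabulary size, sequence length, and layer widths are illustrative assumptions): Glorot/Xavier initialization, a ReLU layer, and an LSTM instead of a plain recurrent layer.

import tensorflow as tf
from tensorflow.keras import layers

vocab_size, seq_len = 10000, 50                    # assumed vocabulary / sequence length

model = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, 64),
    # LSTM (or layers.GRU) instead of SimpleRNN to mitigate vanishing gradients
    layers.LSTM(128, kernel_initializer="glorot_uniform"),   # Xavier/Glorot initialization
    layers.Dense(64, activation="relu"),                      # ReLU activation
    layers.Dense(vocab_size, activation="softmax"),
])
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")
model.summary()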