-> Google Translate has relied heavily on recurrent neural networks
-> we can use recurrent neural networks for time series analysis (e.g. stock analysis)
Turing test: a computer passes the Turing test if a human is unable to distinguish the computer from a human in a blind test
~ recurrent neural networks are a step toward passing this test: a well-trained recurrent network is able to "understand" English, for example
LEARN LANGUAGE MODELS!!
We would like to make sure that the network is able to learn connections in the data even when they are far away from each other
"I am from Hungary. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt..."
Recurrent neural networks are able to capture relationships between words that are far away from each other
~ it is able to guess the last word: Hungarian
Combining convolutional neural networks with recurrent neural networks is quite powerful
~ we can generate image descriptions with this hybrid approach
With multilayer neural networks (or deep feed-forward networks) we make predictions independently of each other
p(t) is not correlated with p(t-1) or p(t-2)...
-> training examples are independent of each other
Tigers, elephants, cats... these examples have nothing to do with each other
THESE PREDICTIONS ARE INDEPENDENT !!!!
With recurrent neural networks we can predict the next word in a given sentence:
this is important in natural language processing ~ or when we want to predict tomorrow's stock prices
p(t) depends on p(t-1), p(t-2)....
TRAINING EXAMPLES ARE CORRELATED!!!
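To make "p(t) depends on p(t-1), p(t-2)..." concrete, here is a minimal sketch (assuming NumPy and a hypothetical helper `make_windows`) that turns a time series into (past values, next value) training pairs. Neighbouring examples share most of their values, which is exactly why they are correlated. The prices are toy numbers, not real data.

```python
import numpy as np

def make_windows(series, window):
    """Hypothetical helper: turn a 1-D series into (past window, next value) pairs.

    Each target series[t] is paired with the `window` values right before it,
    so consecutive training examples overlap and are therefore correlated.
    """
    series = np.asarray(series, dtype=float)
    inputs = np.stack([series[t - window:t] for t in range(window, len(series))])
    targets = series[window:]
    return inputs, targets

prices = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]  # toy numbers, not real data
X, y = make_windows(prices, window=3)
print(X.shape, y.shape)  # (3, 3) (3,)
```

The first input window is [10.0, 10.5, 10.2] with target 10.8; the second window shifts by one step and reuses two of those values.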
x: input
h: activation (hidden state), i.e. the output after applying the activation function
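With x and h as above, one step of a simple recurrent network is commonly written as follows. The weight matrices W_xh, W_hh, W_hy, the biases b_h, b_y, the output y_t and the activation f (typically tanh) are assumed notation for illustration, not defined elsewhere in these notes:

```latex
h_t = f\big(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\big), \qquad y_t = W_{hy}\, h_t + b_y
```

The previous hidden state h_{t-1} feeds back into step t, which is how earlier inputs can influence later predictions.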
How to train a recurrent neural network?
~ we can unroll it in time in order to end up with a standard feedforward neural network: we know how to deal with it
As you can see, the same parameters (weights) are shared across every single layer (time-step) of the unrolled network!!!
in a feed-forward network these weights are different for every layer
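A minimal NumPy sketch of the unrolled forward pass, assuming a tanh activation and hypothetical names (`rnn_forward`, `W_xh`, `W_hh`, `b_h`). Each loop iteration is one layer of the unrolled network, and every iteration reuses the same weights:

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h, h0):
    """Unrolled forward pass of a simple RNN (tanh activation assumed).

    The SAME W_xh, W_hh and b_h are reused at every time-step; in a
    feed-forward network each layer would have its own weight matrices.
    """
    h = h0
    hs = []
    for x_t in xs:  # one "layer" of the unrolled network per time-step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
input_dim, hidden_dim, steps = 3, 4, 5
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)
xs = rng.normal(size=(steps, input_dim))
hs = rnn_forward(xs, W_xh, W_hh, b_h, np.zeros(hidden_dim))
print(hs.shape)  # (5, 4)
```

However many time-steps the sequence has, the number of parameters stays the same; only the depth of the unrolled network grows.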
Vanishing/exploding gradients problem
When dealing with backpropagation we have to calculate the gradients
~ we just have to apply the chain rule several times
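Assuming the simple recurrence h_t = f(W_xh x_t + W_hh h_{t-1} + b_h) (assumed notation, with L_t the loss at time-step t), applying the chain rule from step t back to an earlier hidden state h_k gives a product of t - k factors:

```latex
\frac{\partial L_t}{\partial h_k}
  = \frac{\partial L_t}{\partial h_t}\,
    \frac{\partial h_t}{\partial h_{t-1}}\,
    \frac{\partial h_{t-1}}{\partial h_{t-2}} \cdots
    \frac{\partial h_{k+1}}{\partial h_k},
\qquad
\frac{\partial h_i}{\partial h_{i-1}} = \operatorname{diag}\big(f'(a_i)\big)\, W_{hh},
\quad a_i = W_{xh}\, x_i + W_{hh}\, h_{i-1} + b_h
```

In the scalar case each factor is f'(a_i) w_hh, so the product behaves roughly like a power of that factor: it shrinks toward zero when the factors are below 1 and blows up when they are above 1.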
Backpropagation Through Time (BPTT): the same as backpropagation, but the gradients/error signals also flow backward from future time-steps to current time-steps
We multiply the weights several times: if you multiply x < 1 several times, the result gets smaller and smaller
VANISHING GRADIENT PROBLEM
We multiply the weights several times: if you multiply x > 1 several times, the result gets bigger and bigger
EXPLODING GRADIENT PROBLEM
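The effect of repeated multiplication can be checked numerically. A minimal sketch (the helper name `repeated_product` is hypothetical) that treats the gradient as a single scalar factor applied once per time-step, ignoring the activation derivatives of a real RNN:

```python
def repeated_product(factor, steps):
    """Multiply `factor` by itself `steps` times, like an error signal passing
    backward through `steps` time-steps with the same scalar recurrent weight."""
    result = 1.0
    for _ in range(steps):
        result *= factor
    return result

for factor in (0.5, 0.9, 1.1, 1.5):
    print(factor, repeated_product(factor, 50))
```

Even factors close to 1 diverge quickly: 0.9 multiplied 50 times is already about 0.005, while 1.1 multiplied 50 times is about 117.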
It is usually a problem when dealing with recurrent neural networks
~ because these networks are usually deep (unrolled in time, there is one layer per time-step)!!!
-> why is the vanishing gradient a problem?
Because the gradients become too small: it is difficult to model long-range dependencies
-> for recurrent neural networks, local optima are a much more significant problem than with feed-forward neural networks
~ the error function surface is quite complex
These complex surfaces have several local optima and we want to find the global one: we can use meta-heuristic approaches as well
EXPLODING GRADIENTS PROBLEM
-> truncated BPTT algorithm: we use simple backpropagation, but
we only backpropagate through k time-steps
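A minimal sketch of one simple form of truncation, assuming a scalar RNN h_t = tanh(w*h_{t-1} + u*x_t) and a squared-error loss on the last hidden state (the model, the loss and the names `forward`/`grad_w` are illustrative assumptions). With k=None the backward loop runs over every time-step (full BPTT); with a finite k it stops after k steps:

```python
import numpy as np

def forward(xs, w, u, h0=0.0):
    """Scalar RNN h_t = tanh(w*h_{t-1} + u*x_t); returns [h_0, h_1, ..., h_T]."""
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(w * hs[-1] + u * x))
    return hs

def grad_w(xs, w, u, target, k=None):
    """dL/dw for L = 0.5*(h_T - target)^2, backpropagating through at most k steps.

    k=None means full BPTT; a finite k is truncated BPTT.
    """
    hs = forward(xs, w, u)
    T = len(xs)
    delta = hs[T] - target              # dL/dh_T
    grad = 0.0
    steps = T if k is None else min(k, T)
    for t in range(T, T - steps, -1):
        da = delta * (1.0 - hs[t] ** 2)  # back through tanh: dL/da_t
        grad += da * hs[t - 1]           # a_t = w*h_{t-1} + u*x_t, so da_t/dw = h_{t-1}
        delta = da * w                   # pass the error signal on to h_{t-1}
    return grad

xs = [0.5, -0.3, 0.8, 0.1, -0.6, 0.4]
print(grad_w(xs, 0.7, 1.2, target=0.3))       # full BPTT
print(grad_w(xs, 0.7, 1.2, target=0.3, k=2))  # truncated: only the last 2 time-steps
```

The truncated gradient is cheaper and cannot explode over more than k steps, at the cost of ignoring dependencies longer than k.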
-> adjust the learning rate with RMSProp (an adaptive learning-rate algorithm)
We normalize the gradients: we divide each gradient by the square root of a moving average of the squared gradients
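A minimal NumPy sketch of a single RMSProp update (the function name `rmsprop_step` and the default hyperparameters are assumptions for illustration). `cache` is the moving average of squared gradients; dividing by its square root normalizes the step size per parameter, so a huge gradient does not produce a huge step:

```python
import numpy as np

def rmsprop_step(params, grads, cache, lr=0.001, decay=0.9, eps=1e-8):
    """One RMSProp update (hypothetical helper).

    cache: moving average of squared gradients, one entry per parameter.
    eps avoids division by zero.
    """
    cache = decay * cache + (1.0 - decay) * grads ** 2
    params = params - lr * grads / (np.sqrt(cache) + eps)
    return params, cache

# Usage: minimize sum(params**2), whose gradient is 2*params.
params = np.array([5.0, -3.0])
cache = np.zeros_like(params)
for _ in range(2000):
    grads = 2.0 * params
    params, cache = rmsprop_step(params, grads, cache, lr=0.01)
print(params)  # both entries should end up close to 0
```

Because the step is roughly lr times the sign of the gradient once the cache has adapted, the update stays bounded even when the raw gradient is very large.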
VANISHING GRADIENTS PROBLEM
-> initialize the weights properly (Xavier initialization)
-> use proper activation functions such as the ReLU function
-> use other architectures: LSTMs or GRUs
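A minimal NumPy sketch of the first two remedies (the helper names `xavier_uniform`, `relu` and `relu_grad` are hypothetical; LSTMs and GRUs are not shown). The uniform variant of Xavier/Glorot initialization samples from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)), giving Var(W) = 2 / (fan_in + fan_out). The ReLU derivative is exactly 1 for positive inputs, whereas the sigmoid derivative is at most 0.25, so ReLU shrinks the backward signal less:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=None):
    """Xavier/Glorot uniform initialization: U(-limit, limit) with
    limit = sqrt(6 / (fan_in + fan_out)), so Var(W) = 2 / (fan_in + fan_out)."""
    rng = np.random.default_rng() if rng is None else rng
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    # Derivative is 1 for positive inputs and 0 otherwise.
    return (z > 0).astype(float)

W_hh = xavier_uniform(64, 64, rng=np.random.default_rng(0))
print(W_hh.shape)  # (64, 64)
```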