
Saturday, February 26, 2022

From TLUs to tuning perceptrons

Besides these limitations in representing the XOR and XNOR operations, additional simplifications cap the representational power of the TLU model: the weights are fixed, and the output can only be binary (0 or 1). Clearly, for a system such as a neuron to "learn," it needs to respond to its environment and determine the relevance of different inputs based on feedback from prior experiences. This idea was captured in the 1949 book The Organization of Behavior by Canadian psychologist Donald Hebb, who proposed that the activity of nearby neuronal cells would tend to synchronize over time, sometimes paraphrased as Hebb's Law: neurons that fire together wire together. Building on Hebb's proposal that weights change over time, researcher Frank Rosenblatt of the Cornell Aeronautical Laboratory proposed the perceptron model in the 1950s. He replaced the fixed weights of the TLU model with adaptive weights and added a bias term, giving a new function:
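A standard statement of this function, written with the adaptive weights w_i, bias b, and inputs X_i described below, is:

```latex
\hat{y} =
\begin{cases}
1 & \text{if } \sum_{i=1}^{N} w_i X_i + b > 0 \\
0 & \text{otherwise}
\end{cases}
```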

We note that the inputs, previously denoted I, are now denoted X to underscore that they can take any value, not just binary 0 or 1. Combining Hebb's observations with the TLU model, the weights of the perceptron are updated according to a simple learning rule:

1. Start with a set of J samples x(1), ..., x(J). These samples all have a label y which is 0 or 1, giving labeled data (y, x)(1), ..., (y, x)(J). These samples could have either a single value, in which case the perceptron has a single input, or be a vector of length N with indices i for multi-value input.

2. Initialize all weights w to a small random value or 0.

3. Compute the estimated value, yhat, for all the examples x using the perceptron function.

4. Update the weights using a learning rate r to more closely match the input to the desired output for each step t in training:

w_i(t+1) = w_i(t) + r(y_j - yhat_j) x_j,i, for all J samples and N features.

Conceptually, note that if the estimate yhat is 0 and the target y is 1, we want to increase the value of the weight by some increment r; likewise, if the target is 0 and the estimate is 1, we want to decrease the weight so the inputs do not exceed the threshold.

5. Repeat steps 3 and 4 until the difference between the predicted and actual outputs, yhat and y, falls below some desired threshold. In the case of a non-zero bias term, b, an update can be computed as well using a similar formula.
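The five steps above can be sketched in a short NumPy implementation. This is a minimal illustration, not the text's own code; the function name and the AND-gate example are assumptions made for demonstration:

```python
import numpy as np

def perceptron_train(X, y, lr=0.1, epochs=100):
    """Train a perceptron with the update rule described above.

    X: (J, N) array of samples (step 1); y: (J,) array of 0/1 labels.
    """
    J, N = X.shape
    w = np.zeros(N)  # step 2: initialize weights to 0
    b = 0.0          # bias term, updated with a similar formula
    for _ in range(epochs):
        errors = 0
        for j in range(J):
            # step 3: estimate yhat with the perceptron function
            y_hat = 1 if X[j] @ w + b > 0 else 0
            # step 4: move weights toward the desired output
            update = lr * (y[j] - y_hat)
            w += update * X[j]
            b += update
            errors += int(y_hat != y[j])
        if errors == 0:  # step 5: stop once predictions match the labels
            break
    return w, b

# Example: learning the (linearly separable) AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b = perceptron_train(X, y)
preds = [1 if x @ w + b > 0 else 0 for x in X]
print(preds)  # [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop terminates; for XOR it would cycle forever, as discussed next.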


While simple, you can appreciate that many patterns could be learned by such a classifier, though still not the XOR function. However, by combining several perceptrons into multiple layers, these units could represent any simple Boolean function, and indeed McCulloch and Pitts had previously speculated on combining such simple units into a universal computation engine, or Turing machine, that could represent any operation in a standard programming language. The preceding learning algorithm, however, operates on each unit independently, meaning it could not be easily extended to networks composed of many layers of perceptrons.
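As an illustration of how stacking such units recovers XOR, the threshold units below use hand-chosen weights (one possible choice, not learned and not from the text): a hidden layer computing OR and NAND, and an output unit taking their AND.

```python
def tlu(x, w, b):
    # Threshold logic unit: fires when the weighted sum exceeds 0.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def xor(x1, x2):
    # Hidden layer: OR and NAND of the inputs.
    h_or = tlu((x1, x2), (1, 1), -0.5)      # x1 OR x2
    h_nand = tlu((x1, x2), (-1, -1), 1.5)   # NOT (x1 AND x2)
    # Output layer: AND of the two hidden units.
    return tlu((h_or, h_nand), (1, 1), -1.5)

print([xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```

No single unit here computes XOR; only the two-layer composition does, which is why a single-unit learning rule is not enough.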


However, the 1969 book Perceptrons, by MIT computer scientists Marvin Minsky and Seymour Papert, demonstrated that a three-layer feed-forward network required complete (non-zero weight) connections between at least one of these units (in the first layer) and all inputs to compute all possible logical outputs. This meant that instead of having a very sparse structure, like biological neurons, which are connected to only a few of their neighbors, these computational models required very dense connections.

While connective sparsity has been incorporated into later architectures, such as CNNs, dense connections remain a feature of many models too, particularly in the fully connected layers that often form the second-to-last hidden layers in models. In addition to these models being computationally unwieldy on the hardware of the day, the observation that sparse models could not compute all logical operations was interpreted more broadly by the research community as "perceptrons cannot compute XOR." While erroneous, this message led to a drought in funding for AI in subsequent years, a period sometimes referred to as the AI Winter.

The next revolution in neural network research would require a more efficient way to compute the required parameter updates in complex models, a technique that would become known as backpropagation.


