The recent popularity of AI algorithms might give the false impression that this field is new. Many recent models are based on discoveries made decades ago that have been reinvigorated by the massive computational resources available in the cloud and by customized hardware for parallel matrix computations, such as Graphics Processing Units (GPUs), Tensor Processing Units (TPUs), and Field Programmable Gate Arrays (FPGAs). If we consider research on neural networks to include their biological inspiration as well as computational theory, this field is over a hundred years old. Indeed, one of the first neural networks described appears in the detailed anatomical illustrations of the 19th-century scientist Santiago Ramón y Cajal, whose illustrations, based on experimental observation of layers of interconnected neuronal cells, inspired the Neuron Doctrine: the idea that the brain is composed of individual, physically distinct, and specialized cells rather than a single continuous network. The distinct layers of the retina observed by Cajal were also the inspiration for particular neural network architectures such as the Convolutional Neural Network (CNN), which we will discuss later in this chapter.
This observation of simple neuronal cells interconnected in large networks led computational researchers to hypothesize how mental activity might be represented by simple, logical operations that, combined, yield complex mental phenomena. The original "automata theory" is usually traced to a 1943 article by Warren McCulloch and Walter Pitts, who both later worked at the Massachusetts Institute of Technology. They described a simple model known as the Threshold Logic Unit (TLU), in which binary inputs are translated into a binary output based on a threshold:
y = f(Σ_i W_i I_i)
where I is the vector of input values, W is the vector of weights, each in the range (0,1) or (-1,1), and f is a threshold function that converts the weighted sum of the inputs into a binary output depending on whether it exceeds a threshold T:
f(x) = 1 if x > T, else 0
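The threshold rule above can be sketched in a few lines of Python. This is a minimal illustration, not McCulloch and Pitts's original formulation: the function name tlu, the example weights of 0.6, and the threshold of 0.5 are assumptions chosen only to show the mechanics.

```python
from typing import Sequence


def tlu(inputs: Sequence[float], weights: Sequence[float], threshold: float) -> int:
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * i for w, i in zip(weights, inputs))
    # Strict comparison, matching f(x) = 1 if x > T, else 0
    return 1 if weighted_sum > threshold else 0


# Two binary inputs with illustrative (assumed) weights and threshold
print(tlu([1, 0], [0.6, 0.6], 0.5))  # weighted sum 0.6 exceeds 0.5, prints 1
print(tlu([0, 0], [0.6, 0.6], 0.5))  # weighted sum 0.0 does not, prints 0
```

Note that the comparison is strict, so a weighted sum exactly equal to T produces 0.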
Visually and conceptually, there is some similarity between McCulloch and Pitts's model and the biological neuron that inspired it. Their model integrates inputs into an output signal, just as the natural dendrites (short, input "arms" of the neuron that receive signals from other cells) of a neuron synthesize inputs into a single output via the axon (the long "tail" of the cell, which passes signals received from the dendrites along to other neurons). We might imagine that, just as neuronal cells are composed into networks to yield complex biological circuits, these simple units might be connected to simulate sophisticated decision processes.
Indeed, using this simple model, we can already start to represent several logical operations. If we consider the simple case of a neuron with one input, we can see that a TLU can represent an identity or negation function.
For an identity operation that simply returns the input as output, the weight matrix would have 1s on the diagonal (or simply be the scalar 1 for a single numerical input, as illustrated in Table 1).
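Similarly, for a negation operation, the weight matrix could be a negative identity matrix (the scalar -1 for a single input), with a threshold set just below 0, such as -0.5, so that an input of 0 yields an output of 1 and an input of 1 yields an output of 0. Because the threshold function uses a strict inequality, a threshold of exactly 0 would map both inputs to 0.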
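The single-input cases can be checked directly. The sketch below assumes the strict rule f(x) = 1 if x > T; the thresholds 0.5 and -0.5 are illustrative choices, not values given in the text. With the strict comparison, a threshold of exactly 0 would map an input of 0 to 0 under a weight of -1, so the negation threshold is placed slightly below 0.

```python
def tlu(x: float, weight: float, threshold: float) -> int:
    """Single-input threshold unit: 1 if weight * x exceeds the threshold, else 0."""
    return 1 if weight * x > threshold else 0


# Identity: weight 1, threshold between 0 and 1 (0.5 is an assumed value)
identity = {x: tlu(x, weight=1, threshold=0.5) for x in (0, 1)}

# Negation: weight -1, threshold between -1 and 0 (-0.5 is an assumed value)
negation = {x: tlu(x, weight=-1, threshold=-0.5) for x in (0, 1)}

print(identity)  # {0: 0, 1: 1}
print(negation)  # {0: 1, 1: 0}
```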
Given two inputs, a TLU could also represent operations such as AND and OR.
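Here, with weights of 1 for both inputs, a threshold could be set such that the combined input values have to reach 2 to yield an output of 1 for an AND operation (for example, a threshold of 1.5), or have to reach only 1, so that the output is 1 if either of the two inputs is 1, for an OR operation (for example, a threshold of 0.5).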
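As a quick check of the two-input case, the following sketch enumerates all four input pairs with unit weights. The thresholds 1.5 (AND) and 0.5 (OR) are assumed values; any threshold in the ranges [1, 2) and [0, 1), respectively, behaves the same way under the strict rule f(x) = 1 if x > T.

```python
from itertools import product


def tlu(inputs, weights, threshold):
    """Return 1 if the weighted input sum exceeds the threshold, else 0."""
    return 1 if sum(w * i for w, i in zip(weights, inputs)) > threshold else 0


WEIGHTS = (1, 1)          # unit weight on each input
PAIRS = list(product((0, 1), repeat=2))  # (0,0), (0,1), (1,0), (1,1)

and_outputs = [tlu(p, WEIGHTS, threshold=1.5) for p in PAIRS]
or_outputs = [tlu(p, WEIGHTS, threshold=0.5) for p in PAIRS]

for pair, a, o in zip(PAIRS, and_outputs, or_outputs):
    print(pair, "AND:", a, "OR:", o)
```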
However, a TLU cannot capture patterns such as Exclusive OR (XOR), which emits 1 if and only if exactly one of the two inputs is 1.
To see why this is true, consider a TLU with two inputs and positive weights of 1 for each unit. If the threshold value T is set between 0 and 1 (for example, 0.5), then inputs of (0,0), (1,0), and (0,1) will yield the correct value. What happens with (1,1), though? Because the threshold function returns 1 for any inputs whose sum exceeds T, it also returns 1 for (1,1) and so cannot represent XOR (Table 3.5), which would require a second threshold to compute a different output once a different, higher value is exceeded. Changing one or both of the weights to negative values won't help either; the problem is that the decision threshold operates only in one direction and can't be reversed for larger inputs.
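One way to build intuition for this limitation is a brute-force search over weights and thresholds. The sketch below is an illustration rather than a proof: the helper find_tlu and the grid of candidate values (from -2 to 2 in steps of 0.25) are assumptions made for the example. It finds a setting for OR but none for XOR, including settings with negative weights.

```python
from itertools import product


def tlu(inputs, weights, threshold):
    """Return 1 if the weighted input sum exceeds the threshold, else 0."""
    return 1 if sum(w * i for w, i in zip(weights, inputs)) > threshold else 0


def find_tlu(truth_table, grid):
    """Return the first (w1, w2, T) on the grid that reproduces truth_table, or None."""
    for w1, w2, t in product(grid, repeat=3):
        if all(tlu(x, (w1, w2), t) == y for x, y in truth_table.items()):
            return (w1, w2, t)
    return None


GRID = [k / 4 for k in range(-8, 9)]  # candidate values -2.0, -1.75, ..., 2.0
OR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

print("OR :", find_tlu(OR_TABLE, GRID))   # some working (w1, w2, T)
print("XOR:", find_tlu(XOR_TABLE, GRID))  # None: nothing on the grid works
```

A finite grid cannot rule out every real-valued setting, but the result is consistent with the argument in the text: no single threshold separates the XOR outputs.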
Similarly, the TLU can't represent the negation of XOR, the Exclusive NOR (XNOR). As with the XOR operation, the impossibility can be illustrated by considering a weight matrix of two 1s: for the inputs (1,0) and (0,1), we obtain the correct value of 0 if the summed inputs must reach 2 to output 1 (for example, with a threshold of 1.5), and (1,1) then correctly yields 1. However, we run into a problem with an input of (0,0), as we can't set a second threshold to output 1 at a sum of 0.
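The same conclusion holds for any choice of weights, not just two 1s. As a short worked argument, assuming the strict rule f(x) = 1 if x > T and writing W_1 and W_2 for the two weights (notation introduced here for illustration), the four rows of the XNOR truth table impose the following conditions:

```latex
\begin{align*}
(0,0) \mapsto 1 &: \quad 0 > T \\
(1,0) \mapsto 0 &: \quad W_1 \le T \\
(0,1) \mapsto 0 &: \quad W_2 \le T \\
(1,1) \mapsto 1 &: \quad W_1 + W_2 > T
\end{align*}
% Adding the second and third conditions gives W_1 + W_2 \le 2T.
% Since T < 0 by the first condition, 2T < T, so W_1 + W_2 < T,
% which contradicts the fourth condition.
```

Adding the two middle conditions gives W_1 + W_2 ≤ 2T, and because T must be negative, 2T is smaller than T, which contradicts the last condition. No single threshold can therefore satisfy all four rows.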