It is apparent that a neural network derives its computing power through, first, its massively parallel distributed structure and, second, its ability to learn and therefore generalize. Generalization refers to the neural network producing reasonable outputs for inputs not encountered during training (learning). These two information-processing capabilities make it possible for neural networks to solve complex (large-scale) problems that are currently intractable. In practice, however, neural networks cannot provide the solution by working individually. Rather, they need to be integrated into a consistent system engineering approach. Specifically, a complex problem of interest is decomposed into a number of relatively simple tasks, and neural networks are assigned a subset of the tasks that match their inherent capabilities. It is important to recognize, however, that we have a long way to go (if ever) before we can build a computer architecture that mimics a human brain.
The use of neural networks offers the following useful properties and capabilities:
1. Nonlinearity. An artificial neuron can be linear or nonlinear. A neural network, made up of an interconnection of nonlinear neurons, is itself nonlinear. Moreover, the nonlinearity is of a special kind in the sense that it is distributed throughout the network. Nonlinearity is a highly important property, particularly if the underlying physical mechanism responsible for generation of the input signal (e.g., a speech signal) is inherently nonlinear.
2. Input-Output Mapping. A popular paradigm of learning called learning with a teacher or supervised learning involves modification of the synaptic weights of a neural network by applying a set of labeled training samples or task examples. Each example consists of a unique input signal and a corresponding desired response. The network is presented with an example picked at random from the set, and the synaptic weights (free parameters) of the network are modified to minimize the difference between the desired response and the actual response of the network produced by the input signal in accordance with an appropriate statistical criterion. The training of the network is repeated for many examples in the set until the network reaches a steady state where there are no further significant changes in the synaptic weights. The previously applied training examples may be reapplied during the training session, but in a different order.
Thus the network learns from the examples by constructing an input-output mapping for the problem at hand. Such an approach brings to mind the study of nonparametric statistical inference, which is a branch of statistics dealing with model-free estimation, or, from a biological viewpoint, tabula rasa learning (Geman et al., 1992); the term "nonparametric" is used here to signify the fact that no prior assumptions are made on a statistical model for the input data. Consider, for example, a pattern classification task, where the requirement is to assign an input signal representing a physical object or event to one of several prespecified categories (classes). In a nonparametric approach to this problem, the requirement is to "estimate" arbitrary decision boundaries in the input signal space for the pattern-classification task using a set of examples, and to do so without invoking a probabilistic distribution model. A similar point of view is implicit in the supervised learning paradigm, which suggests a close analogy between the input-output mapping performed by a neural network and nonparametric statistical inference.
3. Adaptivity. Neural networks have a built-in capability to adapt their synaptic weights to changes in the surrounding environment. In particular, a neural network trained to operate in a specific environment can be easily retrained to deal with minor changes in the operating environmental conditions. Moreover, when it is operating in a nonstationary environment (i.e., one where statistics change with time), a neural network can be designed to change its synaptic weights in real time. The natural architecture of a neural network for pattern classification, signal processing, and control applications, coupled with the adaptive capability of the network, makes it a useful tool in adaptive pattern classification, adaptive signal processing, and adaptive control. As a general rule, it may be said that the more adaptive we make a system, all the time ensuring that the system remains stable, the more robust its performance will likely be when the system is required to operate in a nonstationary environment. It should be emphasized, however, that adaptivity does not always lead to robustness; indeed, it may do the very opposite. For example, an adaptive system with short time constants may change rapidly and therefore tend to respond to spurious disturbances, causing a drastic degradation in system performance. To realize the full benefits of adaptivity, the principal time constants of the system should be long enough for the system to ignore spurious disturbances and yet short enough to respond to meaningful changes in the environment; the problem described here is referred to as the stability-plasticity dilemma (Grossberg, 1988b).
4. Evidential Response. In the context of pattern classification, a neural network can be designed to provide information not only about which particular pattern to select, but also about the confidence in the decision made. This latter information may be used to reject ambiguous patterns, should they arise, and thereby improve the classification performance of the network.
5. Contextual Information. Knowledge is represented by the very structure and activation state of a neural network. Every neuron in the network is potentially affected by the global activity of all other neurons in the network. Consequently, contextual information is dealt with naturally by a neural network.
6. Fault Tolerance. A neural network, implemented in hardware form, has the potential to be inherently fault tolerant, or capable of robust computation, in the sense that its performance degrades gracefully under adverse operating conditions. For example, if a neuron or its connecting links are damaged, recall of a stored pattern is impaired in quality. However, due to the distributed nature of information stored in the network, the damage has to be extensive before the overall response of the network is degraded seriously. Thus, in principle, a neural network exhibits a graceful degradation in performance rather than catastrophic failure. There is some empirical evidence for robust computation, but usually it is uncontrolled. In order to be assured that the neural network is in fact fault tolerant, it may be necessary to take corrective measures in designing the algorithm used to train the network (Kerlirzin and Vallet, 1993).
7. VLSI Implementability. The massively parallel nature of a neural network makes it potentially fast for the computation of certain tasks. This same feature makes a neural network well suited for implementation using very-large-scale-integrated (VLSI) technology. One particularly beneficial virtue of VLSI is that it provides a means of capturing truly complex behavior in a highly hierarchical fashion (Mead, 1989).
8. Uniformity of Analysis and Design. Basically, neural networks enjoy universality as information processors. We say this in the sense that the same notation is used in all domains involving the application of neural networks. This feature manifests itself in different ways:
- Neurons, in one form or another, represent an ingredient common to all neural networks.
- This commonality makes it possible to share theories and learning algorithms in different applications of neural networks.
- Modular networks can be built through a seamless integration of modules.
9. Neurobiological Analogy. The design of a neural network is motivated by analogy with the brain, which is living proof that fault-tolerant parallel processing is not only physically possible but also fast and powerful. Neurobiologists look to (artificial) neural networks as a research tool for the interpretation of neurobiological phenomena. On the other hand, engineers look to neurobiology for new ideas to solve problems more complex than those based on conventional hard-wired design techniques. These two viewpoints are illustrated by the following two respective examples:
- In Anastasio (1993), linear system models of the vestibulo-ocular reflex are compared to neural network models based on recurrent networks that are described in Section 1.6 and discussed in detail in Chapter 15. The vestibulo-ocular reflex (VOR) is part of the oculomotor system. The function of the VOR is to maintain visual (i.e., retinal) image stability by making eye rotations that are opposite to head rotations. The VOR is mediated by premotor neurons in the vestibular nuclei that receive and process head rotation signals from vestibular sensory neurons and send the results to the eye muscle motor neurons. The VOR is well suited for modeling because its input (head rotation) and its output (eye rotation) can be precisely specified. It is also a relatively simple reflex, and the neurophysiological properties of its constituent neurons have been well described.
Among the three neuronal types, the premotor neurons (reflex interneurons) in the vestibular nuclei are the most complex and therefore the most interesting. The VOR has previously been modeled using lumped, linear system descriptors and control theory. These models were useful in explaining some of the overall properties of the VOR, but gave little insight into the properties of its constituent neurons. This situation has been greatly improved through neural network modeling. Recurrent network models of the VOR (programmed using an algorithm called real-time recurrent learning that is described in Chapter 15) can reproduce and help explain many of the static, dynamic, nonlinear, and distributed aspects of signal processing by the neurons that mediate the VOR, especially the vestibular nuclei neurons (Anastasio, 1993).
- The retina, more than any other part of the brain, is where we begin to put together the relationships between the outside world represented by a visual sense, its physical image projected onto an array of receptors, and the first neural images. The retina is a thin sheet of neural tissue that lines the posterior hemisphere of the eyeball. The retina's task is to convert an optical image into a neural image for transmission down the optic nerve to a multitude of centers for further analysis. This is a complex task, as evidenced by the synaptic organization of the retina. In all vertebrate retinas the transformation from optical to neural image involves three stages (Sterling, 1990):
(i) Phototransduction by a layer of receptor neurons.
(ii) Transmission of the resulting signals (produced in response to light) by chemical synapses to a layer of bipolar cells.
(iii) Transmission of these signals, also by chemical synapses, to output neurons that are called ganglion cells.
At both synaptic stages (i.e., from receptor to bipolar cells, and from bipolar to ganglion cells), there are specialized laterally connected neurons called horizontal cells and amacrine cells, respectively. The task of these neurons is to modify the transmission across the synaptic layers. There are also centrifugal elements called inter-plexiform cells; their task is to convey signals from the inner synaptic layer back to the outer one. A few researchers have built electronic chips that mimic the structure of the retina (Mahowald and Mead, 1989; Boahen and Andreou, 1992; Boahen, 1996). These electronic chips are called neuromorphic integrated circuits, a term coined by Mead (1989). A neuromorphic imaging sensor consists of an array of photoreceptors combined with analog circuitry at each picture element (pixel). It emulates the retina in that it can adapt locally to changes in brightness, detect edges, and detect motion. The neurobiological analogy, exemplified by neuromorphic integrated circuits, is useful in another important way: it provides a hope and belief, and to a certain extent an existence proof, that physical understanding of neurobiological structures could have a productive influence on the art of electronics and VLSI technology.
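Returning to properties 2 and 4 above, supervised input-output mapping and evidential response can be illustrated together with a minimal sketch. It assumes a single logistic neuron, a squared-error criterion, and a two-class toy problem whose classes are separated by the line x1 + x2 = 0; the helper names (`make_data`, `train_neuron`, `classify`), the learning rate, and the 0.9 rejection threshold are hypothetical choices for illustration, not taken from the text. Training follows the procedure described under property 2: examples are presented in random order, the weights are adjusted to reduce the difference between desired and actual response, and training stops once a full pass produces no significant weight change. The neuron's output is then read as a confidence, and patterns too close to the learned boundary are rejected, in the spirit of property 4.

```python
import math
import random


def sigmoid(v):
    """Logistic activation, written to avoid overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def make_data(n_pairs=50, margin=0.2, seed=1):
    """Hypothetical toy examples: desired response 1 if x1 + x2 > 0, else 0.

    Points within `margin` of the boundary are skipped, and each point is
    paired with its mirror image so that the two classes are balanced.
    """
    rng = random.Random(seed)
    data = []
    while len(data) < 2 * n_pairs:
        x1, x2 = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if abs(x1 + x2) < margin:
            continue
        d = 1 if x1 + x2 > 0 else 0
        data.append(((x1, x2), d))
        data.append(((-x1, -x2), 1 - d))
    return data


def train_neuron(samples, eta=0.5, max_epochs=200, tol=1e-3, seed=0):
    """Supervised training of one logistic neuron; returns [bias, w1, w2]."""
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]
    order = list(samples)
    for _ in range(max_epochs):
        before = list(w)
        rng.shuffle(order)  # re-present the examples in a new random order
        for (x1, x2), d in order:
            x = (1.0, x1, x2)  # fixed input of 1.0 carries the bias
            y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            # Gradient step on the squared error 0.5 * (d - y)**2.
            delta = (d - y) * y * (1.0 - y)
            w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
        # Steady state: no significant change in any weight over a full pass.
        if max(abs(a - b) for a, b in zip(w, before)) < tol:
            break
    return w


def classify(w, x1, x2, min_confidence=0.9):
    """Return (label, confidence); label is None when the pattern is rejected."""
    y = sigmoid(w[0] + w[1] * x1 + w[2] * x2)
    confidence = max(y, 1.0 - y)
    if confidence < min_confidence:
        return None, confidence  # ambiguous pattern: reject rather than guess
    return (1 if y >= 0.5 else 0), confidence


if __name__ == "__main__":
    data = make_data()
    w = train_neuron(data)
    print("weights:", [round(v, 2) for v in w])
    print("(0.8, 0.8) ->", classify(w, 0.8, 0.8))
    print("(0.0, 0.0) ->", classify(w, 0.0, 0.0))
```

With these settings a point well inside one class, such as (0.8, 0.8), is expected to be accepted with high confidence, whereas a point lying on the learned boundary, such as the origin, is expected to be rejected as ambiguous. The squared-error criterion was chosen only to mirror the wording of property 2; other statistical criteria, such as cross-entropy, would serve as well.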
With inspiration from neurobiology in mind, it seems appropriate that we take a brief look at the human brain and its structural levels of organization.
1.2 HUMAN BRAIN
The human nervous system may be viewed as a three-stage system, as depicted in the block diagram of Fig. 1.1 (Arbib, 1987). Central to the system is the brain, represented by the neural (nerve) net, which continually receives information, perceives it, and makes appropriate decisions. Two sets of arrows are shown in the figure. Those pointing from left to right indicate the forward transmission of information-bearing signals through the system. The arrows pointing from right to left signify the presence of feedback in the system. The receptors convert stimuli from the human body or the external environment into electrical impulses that convey information to the neural net (brain). The effectors convert electrical impulses generated by the neural net into discernible responses as system outputs.
The struggle to understand the brain has been made easier because of the pioneering work of Ramon y Cajal (1911), who introduced the idea of neurons as structural constituents of the brain. Typically, neurons are five to six orders of magnitude slower than silicon logic gates; events in a silicon chip happen in the nanosecond (10⁻⁹ s) range, whereas neural events happen in the millisecond (10⁻³ s) range. However, the brain makes up for the relatively slow rate of operation of a neuron by having a truly staggering number of neurons (nerve cells) with massive interconnections between them. It is estimated that there are approximately 10 billion neurons in the human cortex, and 60 trillion synapses or connections (Shepherd and Koch, 1990). The net result is that the brain is an enormously efficient structure. Specifically, the energetic efficiency of the brain is approximately 10⁻¹⁶ joules (J) per operation per second, whereas the corresponding value for the best computers in use today is about 10⁻⁶ joules per operation per second (Faggin, 1991).
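As a quick check on the speed and connectivity figures quoted in the preceding paragraph, the sketch below computes the gap between the two nominal time scales and the average number of synapses per cortical neuron implied by the estimates of Shepherd and Koch (1990). The constant and function names are illustrative, and the per-neuron average is a derived ratio rather than a figure stated in the text; actual counts vary widely from neuron to neuron.

```python
import math

# Order-of-magnitude figures quoted in the text.
SILICON_EVENT_S = 1e-9            # silicon logic: nanosecond range
NEURAL_EVENT_S = 1e-3             # neural events: millisecond range
CORTICAL_NEURONS = 10 * 10**9     # about 10 billion neurons (Shepherd and Koch, 1990)
CORTICAL_SYNAPSES = 60 * 10**12   # about 60 trillion synapses (Shepherd and Koch, 1990)


def orders_of_magnitude(slow, fast):
    """Number of powers of ten separating two time scales."""
    return math.log10(slow / fast)


speed_gap = orders_of_magnitude(NEURAL_EVENT_S, SILICON_EVENT_S)
# Integer division is exact here because both counts are integers.
synapses_per_neuron = CORTICAL_SYNAPSES // CORTICAL_NEURONS

print(f"speed gap: {speed_gap:.1f} orders of magnitude")
print(f"average synapses per neuron: {synapses_per_neuron}")
```

The nominal ranges give a gap of six orders of magnitude, the upper end of the "five to six" quoted in the text, and an average of 6,000 synapses per neuron, which is of the same order as the 10,000 or more contacts a single pyramidal cell can receive.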
Synapses are elementary structural and functional units that mediate the interactions between neurons. The most common kind of synapse is a chemical synapse, which operates as follows. A presynaptic process liberates a transmitter substance that diffuses across the synaptic junction between neurons and then acts on a postsynaptic process.
Thus a synapse converts a presynaptic electrical signal into a chemical signal and then back into a postsynaptic electrical signal (Shepherd and Koch, 1990). In electrical terminology, such an element is said to be a nonreciprocal two-port device. In traditional descriptions of neural organization, it is assumed that a synapse is a simple connection that can impose excitation or inhibition, but not both, on the receptive neuron.
Earlier we mentioned that plasticity permits the developing nervous system to adapt to its surrounding environment (Eggermont, 1990; Churchland and Sejnowski, 1992). In an adult brain, plasticity may be accounted for by two mechanisms: the creation of new synaptic connections between neurons, and the modification of existing synapses. Axons, the transmission lines, and dendrites, the receptive zones, constitute two types of cell filaments that are distinguished on morphological grounds; an axon has a smoother surface, fewer branches, and greater length, whereas a dendrite (so called because of its resemblance to a tree) has an irregular surface and more branches (Freeman, 1975). Neurons come in a wide variety of shapes and sizes in different parts of the brain. Figure 1.2 illustrates the shape of a pyramidal cell, which is one of the most common types of cortical neurons. Like many other types of neurons, it receives most of its inputs through dendritic spines; see the segment of dendrite in the insert in Fig. 1.2 for detail. The pyramidal cell can receive 10,000 or more synaptic contacts, and it can project onto thousands of target cells.
The majority of neurons encode their outputs as a series of brief voltage pulses. These pulses, commonly known as action potentials or spikes, originate at or close to the cell body of neurons and then propagate across the individual neurons at constant velocity and amplitude. The reasons for the use of action potentials for communication among neurons are based on the physics of axons. The axon of a neuron is very long and thin and is characterized by high electrical resistance and very large capacitance.
Both of these elements are distributed across the axon. The axon may therefore be modeled as an RC transmission line, hence the common use of "cable equation" as the terminology for describing signal propagation along an axon. Analysis of this propagation mechanism reveals that when a voltage is applied at one end of the axon it decays exponentially with distance, dropping to an insignificant level by the time it reaches the other end. The action potentials provide a way to circumvent this transmission problem (Anderson, 1995).
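The exponential decay described above can be sketched numerically. This sketch assumes the steady-state solution of the passive cable equation for a semi-infinite cable, V(x) = V₀ exp(−x/λ), where the length constant λ is a parameter not given in the text; the 1 mm length constant, the 100 mV applied voltage, and the function name `passive_voltage` are hypothetical, chosen only for illustration.

```python
import math


def passive_voltage(v0, x, length_constant):
    """Steady-state voltage at distance x (m) along a passive, semi-infinite cable.

    Assumes V(x) = v0 * exp(-x / length_constant); no active regeneration.
    """
    return v0 * math.exp(-x / length_constant)


V0_MV = 100.0       # hypothetical voltage applied at one end, in mV
LAMBDA_M = 1.0e-3   # hypothetical length constant of 1 mm

for x in (0.0, 1e-3, 5e-3, 1e-2, 1.0):  # distances along the axon, in metres
    print(f"x = {x:g} m  ->  V = {passive_voltage(V0_MV, x, LAMBDA_M):.3g} mV")
```

Under these assumed values the voltage falls to about 37% of its initial value after one length constant and is effectively zero at the end of a 1 m axon, which is the transmission problem that regenerated action potentials avoid.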
In the brain there are both small-scale and large-scale anatomical organizations, and different functions take place at lower and higher levels. Figure 1.3 shows a hierarchy of interwoven levels of organization that has emerged from the extensive work done on the analysis of local regions in the brain (Shepherd and Koch, 1990; Churchland and Sejnowski, 1992). The synapses represent the most fundamental level, depending on molecules and ions for their action. A neural microcircuit refers to an assembly of synapses organized into patterns of connectivity to produce a functional operation of interest. A neural microcircuit may be likened to a silicon chip made up of an assembly of transistors. The smallest size of microcircuits is measured in micrometers (μm), and their fastest speed of operation is measured in milliseconds. The neural microcircuits are grouped to form dendritic subunits within the dendritic trees of individual neurons. The whole neuron, about 100 μm in size, contains several dendritic subunits. At the next level of complexity we have local circuits (about 1 mm in size) made up of neurons with similar or different properties; these neural assemblies perform operations characteristic of a localized region in the brain. This is followed by interregional circuits made up of pathways, columns, and topographic maps, which involve multiple regions located in different parts of the brain.
Topographic maps are organized to respond to incoming sensory information. These maps are often arranged in sheets, as in the superior colliculus, where the visual, auditory, and somatosensory maps are stacked in adjacent layers in such a way that stimuli from corresponding points in space lie above or below each other. Figure 1.4 presents a cytoarchitectural map of the cerebral cortex as worked out by Brodmann (Brodal, 1981). This figure shows clearly that different sensory inputs (motor, somatosensory, visual, auditory, etc.) are mapped onto corresponding areas of the cerebral cortex in an orderly fashion. At the final level of complexity, the topographic maps and other interregional circuits mediate specific types of behavior in the central nervous system.
It is important to recognize that the structural levels of organization described herein are a unique characteristic of the brain. They are nowhere to be found in a digital computer, and we are nowhere close to re-creating them with artificial neural networks. Nevertheless, we are inching our way toward a hierarchy of computational levels similar to that described in Fig. 1.3. The artificial neurons we use to build our neural networks are truly primitive in comparison to those found in the brain. The neural networks we are presently able to design are just as primitive compared to the local circuits and the interregional circuits in the brain. What is really satisfying, however, is the remarkable progress that we have made on so many fronts during the past two decades. With neurobiological analogy as the source of inspiration, and the wealth of theoretical and technological tools that we are bringing together, it is certain that in another decade our understanding of artificial neural networks will be much more sophisticated than it is today.
Our primary interest in this book is confined to the study of artificial neural networks from an engineering perspective. We begin the study by describing the models of (artificial) neurons that form the basis of the neural networks considered in subsequent chapters of the book.