
Saturday, February 5, 2022

The rules of probability

At the simplest level, a model, whether for machine learning or a more classical method such as linear regression, is a mathematical description of how various kinds of data relate to one another.

In the task of modeling, we usually think about separating the variables of our dataset into two broad classes:

1. Independent data, which primarily means the inputs to a model, are denoted by X. These could be categorical features (such as a "0" or "1" in six columns indicating which of six schools a student attends), continuous (such as the heights or test scores of the same students), or ordinal (the rank of a student in the class).

2. Dependent data, conversely, are the outputs of our models, and are denoted by Y. (Note that in some cases Y is a label that can be used to condition a generative output, such as in a conditional GAN.) As with the independent variables, these can be continuous, categorical, or ordinal, and they can be an individual element or a multidimensional matrix (tensor) for each element of the dataset.

So how can we describe the data in our model using statistics? In other words, how can we quantitatively describe which values we are likely to see, how frequently, and which values are more likely to appear together? One way is by asking the likelihood of observing a particular value in the data, or the probability of that value.

For example, if we were to ask what the probability is of observing a roll of 4 on a six-sided die, the answer is that, on average, we would observe a 4 once every six rolls. We write this as follows:

P(X=4) = 1/6 = 16.67%

where P denotes "the probability of."
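
As a quick illustration, here is a minimal NumPy sketch (an added example, not part of the original text) that simulates many die rolls and shows the empirical frequency of rolling a 4 approaching 1/6:

import numpy as np

rng = np.random.default_rng(42)
rolls = rng.integers(1, 7, size=100_000)   # 100,000 simulated six-sided die rolls
print((rolls == 4).mean())                  # approximately 0.1667, i.e., 1/6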

What defines the allowed probability values for a particular dataset? If we imagine the set of all possible values of a dataset, such as all faces of a die, then a probability maps each value to a number between 0 and 1. The minimum is 0 because we can't have a negative chance of seeing a result; the most unlikely result is that we would never see a particular value, or 0% probability, such as rolling a 7 on a six-sided die. Similarly, we can't have greater than 100% probability of observing a result, represented by the value 1; an outcome with probability 1 is absolutely certain. The set of values a dataset can take on may consist of discrete classes (such as the faces of a die) or an infinite set of potential values (such as variations in height or weight). In either case, however, the associated probabilities have to follow certain rules, the Probability Axioms described by the mathematician Andrey Kolmogorov in 1933:

1. The probability of an observation (a die roll, a particular height, and so on) is a non-negative, finite number between 0 and 1.

2. The probability of at least one of the observations in the space of all possible observations occurring is 1.

3. The probability of any of a set of distinct, mutually exclusive events occurring is the sum of the probabilities of the individual events.

While these rules might seem abstract, you will see in Chapter 3, Building Blocks of Deep Neural Networks, that they have direct relevance to developing neural network models. For example, an application of rule 1 is the generation of a probability between 0 and 1 for a particular outcome by the softmax function used for predicting target classes.

Rule 3 is used to normalize these outcomes into the range 0-1, under the guarantee that they are mutually exclusive predictions of a deep neural network (in other words, a real-world image logically can't be classified as both a dog and a cat, but rather a dog or a cat, with the probabilities of these two outcomes being additive). Finally, the second rule provides the theoretical guarantee that we can generate data at all using these models.
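
To make this concrete, here is a minimal NumPy sketch (an illustration added here, not code from the book) showing that the softmax function produces outputs satisfying rules 1 and 3: every value lies between 0 and 1, and the values for the mutually exclusive classes sum to 1:

import numpy as np

def softmax(logits):
    # subtract the max for numerical stability before exponentiating
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

logits = np.array([2.0, 1.0, -0.5])   # raw network scores for, say, "dog", "cat", "bird"
probs = softmax(logits)
print(probs)                           # each value lies in (0, 1)  -- rule 1
print(probs.sum())                     # the values sum to 1.0      -- rule 3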

However, in the context of machine learning and modeling, we are not usually interested in just the probability of observing a piece of input data, X; we instead want to know the conditional probability of an outcome, Y, given the data, X. In other words, we want to know how likely a label is for a set of data, based on that data. We write this as the probability of Y given X, or the probability of Y conditional on X:

P(Y|X)

Another question we could ask about Y and X is how likely they are to occur together or their joint probability, which can be expressed using the preceding conditional probability expression as follows:

P(X,Y) = P(Y|X)P(X) = P(X|Y)P(Y)

This formula expresses the probability of X and Y occurring together. In the case of X and Y being completely independent of one another, this is simply their product:

P(X|Y)P(Y) = P(Y|X)P(X) = P(X)P(Y)
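
The following short sketch (again an added illustration with made-up numbers, not from the text) checks these identities on a small joint probability table for two binary variables:

import numpy as np

# joint probability table P(X, Y) for binary X (rows) and Y (columns)
joint = np.array([[0.10, 0.30],
                  [0.20, 0.40]])

p_x = joint.sum(axis=1)               # marginal P(X)
p_y = joint.sum(axis=0)               # marginal P(Y)
p_y_given_x = joint / p_x[:, None]    # conditional P(Y|X)

# P(X, Y) = P(Y|X) P(X) holds for every cell of the table
print(np.allclose(joint, p_y_given_x * p_x[:, None]))   # True

# these particular X and Y are not independent, so P(X)P(Y) differs from P(X, Y)
print(np.allclose(joint, np.outer(p_x, p_y)))            # False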

You will see that these expressions become important in our discussion of complementary priors in Chapter 4, Teaching Networks to Generate Digits, and the ability of restricted Boltzmann machines to simulate independent data samples.

They are also important as building blocks of Bayes' theorem, which we will discuss next.






Friday, February 4, 2022

Implementing generative models

 While generative models could theoretically be implemented using a wide variety of machine learning algorithms, in practice, they are usually built with deep neural networks, which are well suited to capturing complex variations in data such as images or language.

In this book, we will focus on implementing these deep generative models for many different applications using TensorFlow 2.0. TensorFlow is a C++ framework, with APIs in the Python programming language, used to develop and productionize deep learning models. It was open sourced by Google in 2015, and has become one of the most popular libraries for the research and deployment of neural network models.

With the 2.0 release, much of the boilerplate code that characterized development in earlier versions of the library was cleaned up with high-level abstractions, allowing us to focus on the model rather than the plumbing of the computations. 

The latest version also makes eager execution the default mode, allowing network computations to be run on demand, which will be an important benefit when implementing some of our models.
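
As a small added illustration of eager execution (a sketch, not an example from the book), a TensorFlow 2 operation runs immediately and returns concrete values, and gradients can likewise be computed on demand:

import tensorflow as tf

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
w = tf.Variable(tf.random.normal((2, 1)))

with tf.GradientTape() as tape:
    y = tf.matmul(x, w)              # executes immediately; y holds concrete numbers
    loss = tf.reduce_sum(y ** 2)

grad = tape.gradient(loss, w)        # gradients are also computed on demand
print(loss.numpy(), grad.numpy())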

In upcoming chapters, you will learn not only the underlying theory behind these models, but also the practical skills needed to implement them in popular programming frameworks. In Chapter 2, Setting up a TensorFlow Lab, you will learn how to set up a cloud environment that will allow you to run TensorFlow in a distributed fashion, using the Kubeflow framework to catalog your experiments.


Indeed, as I will describe in more detail in Chapter 3, Building Blocks of Deep Neural Networks, since 2006 an explosion of research into deep learning using large neural network models has produced a wide variety of generative modeling applications.

The first of these was the restricted Boltzmann machine, which is stacked in multiple layers to create a deep belief network. I will describe both of these models in Chapter 4, Teaching Networks to Generate Digits. Later innovations included Variational Autoencoders (VAEs), which can efficiently generate complex data samples from random numbers, using techniques that I will describe in Chapter 5, Painting Pictures with Neural Networks Using VAEs.

We will also cover the algorithm used to create The Portrait of Edmond Belamy, the GAN, in more detail in Chapter 6, Image Generation with GANs, of this book.

Conceptually, the GAN model creates a competition between two neural networks. One (termed the generator) produces realistic (or, in the case of the experiments by Obvious, artistic) images starting from a set of random numbers and applying a mathematical transformation. In a sense, the generator is like an art student, producing new paintings from brushstrokes and creative inspiration.


The second network, known as the discriminator, attempts to classify whether a picture comes from a set of real-world images, or whether it was created by the generator. Thus, the discriminator acts like a teacher, grading whether the student has produced work comparable to the paintings they are attempting to mimic. As the generator becomes better at fooling the discriminator, its output becomes closer and closer to the historical examples it is designed to copy.
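
To give a flavor of this setup, here is a minimal tf.keras sketch of the two networks (the layer sizes and the 28x28 image shape are illustrative assumptions added here, not the architecture from Chapter 6): the generator maps random noise to an image, and the discriminator scores how real that image looks:

import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 64  # size of the random input vector (an assumption for this sketch)

# the "art student": transforms random numbers into a synthetic image
generator = tf.keras.Sequential([
    layers.Dense(128, activation="relu", input_shape=(latent_dim,)),
    layers.Dense(28 * 28, activation="sigmoid"),
    layers.Reshape((28, 28)),
])

# the "teacher": estimates the probability that an image is real rather than generated
discriminator = tf.keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

noise = tf.random.normal((16, latent_dim))
fake_images = generator(noise)            # 16 synthetic 28x28 images
realism_scores = discriminator(fake_images)
print(realism_scores.shape)               # (16, 1)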

There are many classes of GAN models, with additional variants covered in Chapter 7, Style Transfer with GANs, and Chapter 11, Composing Music with Generative Models, in our discussion of advanced models. Another key innovation in generative models is in the domain of natural language data. By representing the complex interrelationships between words in a sentence in a computationally scalable way, the Transformer network and the Bidirectional Encoder Representations from Transformers (BERT) model built on top of it present powerful building blocks for generating textual data in applications such as chatbots, which we'll cover in more detail in Chapter 9, The Rise of Methods for Text Generation, and Chapter 10, NLP 2.0: Using Transformers to Generate Text.


In Chapter 12, Play Video Games with Generative AI: GAIL, you will also see how models such as GANs and VAEs can be used to generate not just images or text, but sets of rules that allow game-playing networks developed with reinforcement learning algorithms to process and navigate their environment more efficiently; in essence, learning to learn. Generative models are a huge field of research that is constantly growing, so unfortunately, we can't cover every topic in this book. For the interested reader, references to further topics will be provided in Chapter 13, Emerging Applications in Generative AI.

To get started with some background information, let's discuss the rules of probability.




Discriminative and generative models

These other examples of AI differ in an important way from the model that generated The Portrait of Edmond Belamy. In all of these other applications, the model is presented with a set of inputs, data such as English text, images from X-rays, or the positions on a game board, that is paired with a target output, such as the next word in a translated sentence, the diagnostic classification of an X-ray, or the next move in a game. Indeed, this is probably the kind of AI model you are most familiar with from prior experiences of predictive modeling; these are broadly known as discriminative models, whose purpose is to create a mapping between a set of input variables and a target output. The target output could be a set of discrete classes (such as which word in the English language appears next in a translation), or a continuous outcome (such as the expected amount of money a customer will spend in an online store over the next 12 months).

It should be noted that this kind of model, in which data is labeled or scored, represents only half the capabilities of modern machine learning. Another class of algorithms, such as the one that generated the artificial portrait sold at Christie's, don't compute a score or label from input variables, but rather generate new data. Unlike discriminative models, their input variables are often vectors of numbers that aren't related to real-world values at all, and are often even randomly generated.

This kind of model, known as a generative model, can produce complex outputs such as text, music, or images from random noise, and is the topic of this book.

Even if you didn't know it at the time, you have probably seen other instances of generative models in the news alongside the discriminative example given earlier.

A prominent example is deep fakes, which are videos in which one person's face has been systematically replaced with another's by using a neural network to remap the pixels.

Maybe you have also seen stories about AI models that generate fake news, which scientists at the firm OpenAI were initially terrified to release to the public due to concerns they could be used to create propaganda and misinformation online.

In these and other applications, such as Google's voice assistant Duplex, which can make a restaurant reservation by dynamically creating a conversation with a human in real time, or software that can generate original musical compositions, we are surrounded by the outputs of generative AI algorithms.

These models are able to handle complex information in a variety of domains: creating photorealistic images or stylistic filters on pictures (Figure 1.4), synthetic sound, conversational text, and even rules for optimally playing video games. You might ask, where did these models come from? How can I implement them myself?

We will discuss more on that in the next section.





Applications of AI

In New York City in October 2018, the international auction house Christie's sold the Portrait of Edmond Belamy (Figure 1.1) during the show Prints & Multiples for $432,500.00. This sale was remarkable both because the sale price was 45 times higher than the initial estimates for the piece, and due to the unusual origin of this portrait. Unlike the majority of other artworks sold by Christie's since the 18th century, the Portrait of Edmond Belamy is not painted using oil or watercolors, nor is its creator even human; rather, it is an entirely digital image produced by a sophisticated machine learning algorithm. The creators, a Paris-based collective named Obvious, used a collection of 15,000 portraits created between the 14th and 20th centuries to tune an artificial neural network model capable of generating aesthetically similar, albeit synthetic, images.

Portraiture is far from the only area in which machine learning has demonstrated astonishing results. Indeed, if you have paid attention to the news in the last few years, you have likely seen many stories about the ground-breaking results of modern AI systems applied to diverse problems, from the hard sciences to digital art.

Deep neural network models, such as the one created by Obvious, can now classify X-ray images of human anatomy on the level of trained physicians, beat human masters at both classic board games such as Go (an Asian game similar to chess) and multiplayer computer games, and translate French into English with amazing sensitivity to grammatical nuances.




1. An Introduction to Generative AI: "Drawing" Data from Models

 In this chapter, we will dive into the various applications of generative models.

Before that, we will take a step back and examine how exactly generative models are different from other types of machine learning. The difference lies with the basic units of any machine learning algorithm: probability and the various ways we use mathematics to quantify the shape and distribution of data we encounter in the world.


In the rest of this chapter, we will cover:

- Applications of AI

- Discriminative and generative models

- Implementing generative models

- The rules of probability

- Why use generative models?

- Unique challenges of generative models



Tuesday, February 1, 2022

1.3 MODELS OF A NEURON

A neuron is an information-processing unit that is fundamental to the operation of a neural network. The block diagram of Fig. 1.5 shows the model of a neuron, which forms the basis for designing (artificial) neural networks. Here we identify three basic elements of the neuronal model:

1. A set of synapses or connecting links, each of which is characterized by a weight or strength of its own. Specifically, a signal xj at the input of synapse j connected to neuron k is multiplied by the synaptic weight wkj. It is important to make a note of the manner in which the subscripts of the synaptic weight wkj are written: the first subscript refers to the neuron in question and the second subscript refers to the input end of the synapse to which the weight refers. Unlike a synapse in the brain, the synaptic weight of an artificial neuron may lie in a range that includes negative as well as positive values.
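
As a small numeric sketch of this notation (an added illustration, not from the text), the signal xj at synapse j of neuron k is multiplied by wkj and the products are summed; the bias and activation function included below complete the standard neuron model and are assumptions of this sketch rather than part of the element described above:

import numpy as np

x = np.array([0.5, -1.2, 3.0])        # input signals x_1, x_2, x_3
w_k = np.array([0.8, -0.4, 0.1])      # synaptic weights w_k1, w_k2, w_k3 of neuron k
b_k = 0.2                              # bias of neuron k (illustrative value)

v_k = np.dot(w_k, x) + b_k             # weighted sum of inputs plus bias
y_k = 1.0 / (1.0 + np.exp(-v_k))       # logistic activation function (one common choice)
print(v_k, y_k)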




Benefits of Neural Networks

It is apparent that a neural network derives its computing power through, first, its massively parallel distributed structure and, second, its ability to learn and therefore generalize. Generalization refers to the neural network producing reasonable outputs for inputs not encountered during training (learning). These two information-processing capabilities make it possible for neural networks to solve complex (large-scale) problems that are currently intractable. In practice, however, neural networks cannot provide the solution by working individually. Rather, they need to be integrated into a consistent system engineering approach. Specifically, a complex problem of interest is decomposed into a number of relatively simple tasks, and neural networks are assigned a subset of the tasks that match their inherent capabilities. It is important to recognize, however, that we have a long way to go (if ever) before we can build a computer architecture that mimics a human brain.


The use of neural networks offers the following useful properties and capabilities:


1. Nonlinearity. An artificial neuron can be linear or nonlinear. A neural network, made up of an interconnection of nonlinear neurons, is itself nonlinear. Moreover, the nonlinearity is of a special kind in the sense that it is distributed throughout the network. Nonlinearity is a highly important property, particularly if the underlying physical mechanism responsible for generation of the input signal (e.g., a speech signal) is inherently nonlinear.

2. Input-Output Mapping. A popular paradigm of learning called learning with a teacher or supervised learning involves modification of the synaptic weights of a neural network by applying a set of labeled training samples or task examples. Each example consists of a unique input signal and a corresponding desired response. The network is presented with an example picked at random from the set, and the synaptic weights (free parameters) of the network are modified to minimize the difference between the desired response and the actual response of the network produced by the input signal in accordance with an appropriate statistical criterion. The training of the network is repeated for many examples in the set until the network reaches a steady state where there are no further significant changes in the synaptic weights. The previously applied training examples may be reapplied during the training session but in a different order.

Thus the network learns from the examples by constructing an input-output mapping for the problem at hand. Such an approach brings to mind the study of nonparametric statistical inference, which is a branch of statistics dealing with model-free estimation, or, from a biological viewpoint, tabula rasa learning (Geman et al., 1992); the term "nonparametric" is used here to signify the fact that no prior assumptions are made on a statistical model for the input data. Consider, for example, a pattern classification task, where the requirement is to assign an input signal representing a physical object or event to one of several prespecified categories (classes). In a nonparametric approach to this problem, the requirement is to "estimate" arbitrary decision boundaries in the input signal space for the pattern-classification task using a set of examples, and to do so without invoking a probabilistic distribution model. A similar point of view is implicit in the supervised learning paradigm, which suggests a close analogy between the input-output mapping performed by a neural network and nonparametric statistical inference.


3. Adaptivity. Neural networks have a built-in capability to adapt their synaptic weights to changes in the surrounding environment. In particular, a neural network trained to operate in a specific environment can be easily retrained to deal with minor changes in the operating environmental conditions. Moreover, when it is operating in a nonstationary environment (i.e., one where statistics change with time), a neural network can be designed to change its synaptic weights in real time. The natural architecture of a neural network for pattern classification, signal processing, and control applications, coupled with the adaptive capability of the network, makes it a useful tool in adaptive pattern classification, adaptive signal processing, and adaptive control. As a general rule, it may be said that the more adaptive we make a system, all the time ensuring that the system remains stable, the more robust its performance will likely be when the system is required to operate in a nonstationary environment. It should be emphasized, however, that adaptivity does not always lead to robustness; indeed, it may do the very opposite. For example, an adaptive system with short time constants may change rapidly and therefore tend to respond to spurious disturbances, causing a drastic degradation in system performance. To realize the full benefits of adaptivity, the principal time constants of the system should be long enough for the system to ignore spurious disturbances and yet short enough to respond to meaningful changes in the environment; the problem described here is referred to as the stability-plasticity dilemma (Grossberg, 1988b).


4. Evidential Response. In the context of pattern classification, a neural network can be designed to provide information not only about which particular pattern to select, but also about the confidence in the decision made. This latter information may be used to reject ambiguous patterns, should they arise, and thereby improve the classification performance of the network.


5. Contextual Information. Knowledge is represented by the very structure and activation state of a neural network. Every neuron in the network is potentially affected by the global activity of all other neurons in the network. Consequently, contextual information is dealt with naturally by a neural network.

6. Fault Tolerance. A neural network, implemented in hardware form, has the potential to be inherently fault tolerant, or capable of robust computation, in the sense that its performance degrades gracefully under adverse operating conditions. For example, if a neuron or its connecting links are damaged, recall of a stored pattern is impaired in quality. However, due to the distributed nature of information stored in the network, the damage has to be extensive before the overall response of the network is degraded seriously. Thus, in principle, a neural network exhibits a graceful degradation in performance rather than catastrophic failure. There is some empirical evidence for robust computation, but usually it is uncontrolled. In order to be assured that the neural network is in fact fault tolerant, it may be necessary to take corrective measures in designing the algorithm used to train the network (Kerlirzin and Vallet, 1993).


7. VLSI Implementability. The massively parallel nature of a neural network makes it potentially fast for the computation of certain tasks. This same feature makes a neural network well suited for implementation using very-large-scale-integration (VLSI) technology. One particularly beneficial virtue of VLSI is that it provides a means of capturing truly complex behavior in a highly hierarchical fashion (Mead, 1989).


8. Uniformity of Analysis and Design. Basically, neural networks enjoy universality as information processors. We say this in the sense that the same notation is used in all domains involving the application of neural networks. This feature manifests itself in different ways:

- Neurons, in one form or another, represent an ingredient common to all neural networks.

- This commonality makes it possible to share theories and learning algorithms in different applications of neural networks.

- Modular networks can be built through a seamless integration of modules.


9. Neurobiological Analogy. The design of a neural network is motivated by analogy with the brain, which is living proof that fault tolerant parallel processing is not only physically possible but also fast and powerful. Neurobiologists look to (artificial) neural networks as a research tool for the interpretation of neurobiological phenomena. On the other hand, engineers look to neurobiology for new ideas to solve problems more complex than those based on conventional hard-wired design techniques. These two viewpoints are illustrated by the following two respective examples:

- In Anastasio (1993), linear system models of the vestibulo-ocular reflex (VOR) are compared to neural network models based on recurrent networks that are described in Section 1.6 and discussed in detail in Chapter 15. The vestibulo-ocular reflex maintains visual (i.e., retinal) image stability by making eye rotations that are opposite to head rotations. The VOR is mediated by premotor neurons in the vestibular nuclei that receive and process head rotation signals from vestibular sensory neurons and send the results to the eye muscle motor neurons. The VOR is well suited for modeling because its input (head rotation) and its output (eye rotation) can be precisely specified. It is also a relatively simple reflex and the neurophysiological properties of its constituent neurons have been well described.

Among the three neural types, the premotor neurons (reflex interneurons) in the vestibular nuclei are the most complex and therefore the most interesting. The VOR has previously been modeled using lumped, linear system descriptors and control theory. These models were useful in explaining some of the overall properties of the VOR, but gave little insight into the properties of its constituent neurons. This situation has been greatly improved through neural network modeling. Recurrent network models of the VOR (programmed using an algorithm called real-time recurrent learning that is described in Chapter 15) can reproduce and help explain many of the static, dynamic, nonlinear, and distributed aspects of signal processing by the neurons that mediate the VOR, especially the vestibular nuclei neurons (Anastasio, 1993).


- The retina, more than any other part of the brain, is where we begin to put together the relationships between the outside world represented by a visual sense, its physical image projected onto an array of receptors, and the first neural images. The retina is a thin sheet of neural tissue that lines the posterior hemisphere of the eyeball. The retina's task is to convert an optical image into a neural image for transmission down the optic nerve to a multitude of centers for further analysis. This is a complex task, as evidenced by the synaptic organization of the retina. In all vertebrate retinas the transformation from optical to neural image involves three stages (Sterling, 1990):

(i) Phototransduction by a layer of receptor neurons.

(ii) Transmission of the resulting signals (produced in response to light) by chemical synapses to a layer of bipolar cells.

(iii) Transmission of these signals, also by chemical synapses, to output neurons that are called ganglion cells.


At both synaptic stages (i.e., from receptor to bipolar cells, and from bipolar to ganglion cells), there are specialized laterally connected neurons called horizontal cells and amacrine cells, respectively. The task of these neurons is to modify the transmission across the synaptic layers. There are also centrifugal elements called interplexiform cells; their task is to convey signals from the inner synaptic layer back to the outer one. A few researchers have built electronic chips that mimic the structure of the retina (Mahowald and Mead, 1989; Boahen and Andreou, 1992; Boahen, 1996). These electronic chips are called neuromorphic integrated circuits, a term coined by Mead (1989). A neuromorphic imaging sensor consists of an array of photoreceptors combined with analog circuitry at each picture element (pixel). It emulates the retina in that it can adapt locally to changes in brightness, detect edges, and detect motion. The neurobiological analogy, exemplified by neuromorphic integrated circuits, is useful in another important way: it provides a hope and belief, and to a certain extent an existence proof, that a physical understanding of neurobiological structures could have a productive influence on the art of electronics and VLSI technology.


With inspiration from neurobiology in mind, it seems appropriate that we take a brief look at the human brain and its structural levels of organization.


1.2 HUMAN BRAIN

The human nervous system may be viewed as a three-stage system, as depicted in the block diagram of Fig. 1.1 (Arbib, 1987). Central to the system is the brain, represented by the neural (nerve) net, which continually receives information, perceives it, and makes appropriate decisions. Two sets of arrows are shown in the figure. Those pointing from left to right indicate the forward transmission of information-bearing signals through the system. The arrows pointing from right to left signify the presence of feedback in the system. The receptors convert stimuli from the human body or the external environment into electrical impulses that convey information to the neural net, while the effectors convert electrical impulses generated by the neural net into discernible responses as system outputs.


The struggle to understand the brain has been made easier because of the pioneering work of Ramon y Cajal (1911), who introduced the idea of neurons as structural constituents of the brain. Typically, neurons are five to six orders of magnitude slower than silicon logic gates; events in a silicon chip happen in the nanosecond (10^-9 s) range, whereas neural events happen in the millisecond (10^-3 s) range. However, the brain makes up for the relatively slow rate of operation of a neuron by having a truly staggering number of neurons (nerve cells) with massive interconnections between them. It is estimated that there are approximately 10 billion neurons in the human cortex, and 60 trillion synapses or connections (Shepherd and Koch, 1990). The net result is that the brain is an enormously efficient structure. Specifically, the energetic efficiency of the brain is approximately 10^-16 joules (J) per operation per second, whereas the corresponding value for the best computers is many orders of magnitude larger (Faggin, 1991).

Synapses are elementary structural and functional units that mediate the interactions between neurons. The most common kind of synapse is a chemical synapse, which operates as follows. A presynaptic process liberates a transmitter substance that diffuses across the synaptic junction between neurons and then acts on a postsynaptic process.

Thus a synapse converts a presynaptic electrical signal into a chemical signal and then back into a postsynaptic electrical signal (Shepherd and Koch, 1990). In electrical terminology, such an element is said to be a nonreciprocal two-port device. In traditional descriptions of neural organization, it is assumed that a synapse is a simple connection that can impose excitation or inhibition, but not both, on the receptive neuron.

Earlier we mentioned that plasticity permits the developing nervous system to adapt to its surrounding environment (Eggermont, 1990; Churchland and Sejnowski, 1992). In an adult brain, plasticity may be accounted for by two mechanisms: the creation of new synaptic connections between neurons, and the modification of existing synapses. Axons, the transmission lines, and dendrites, the receptive zones, constitute two types of cell filaments that are distinguished on morphological grounds; an axon has a smoother surface, fewer branches, and greater length, whereas a dendrite (so called because of its resemblance to a tree) has an irregular surface and more branches (Freeman, 1975). Neurons come in a wide variety of shapes and sizes in different parts of the brain. Figure 1.2 illustrates the shape of a pyramidal cell, which is one of the most common types of cortical neurons. Like many other types of neurons, it receives most of its inputs through dendritic spines; see the segment of dendrite in the insert in Fig. 1.2 for detail. The pyramidal cell can receive 10,000 or more synaptic contacts and it can project onto thousands of target cells.

The majority of neurons encode their outputs as a series of brief voltage pulses. These pulses, commonly known as action potentials or spikes, originate at or close to the cell body of neurons and then propagate across the individual neurons at constant velocity and amplitude. The reasons for the use of action potentials for communication among neurons are based on the physics of axons. The axon of a neuron is very long and thin and is characterized by high electrical resistance and very large capacitance.

Both of these elements are distributed across the axon. The axon may therefore be modeled as an RC transmission line, hence the common use of "cable equation" as the terminology for describing signal propagation along an axon. Analysis of this propagation mechanism reveals that when a voltage is applied at one end of the axon it decays exponentially with distance, dropping to an insignificant level by the time it reaches the other end. The action potentials provide a way to circumvent this transmission problem (Anderson, 1995).
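
As a rough numeric illustration of this passive decay (an added sketch; the space constant value is an arbitrary assumption, not a figure from the text), the steady-state solution of the passive cable equation is V(x) = V(0) exp(-x/lambda), so a voltage applied at one end of a long axon falls to a negligible fraction of its original value well before the other end:

import numpy as np

lam = 1.0                               # space constant lambda, in mm (illustrative value)
x = np.array([0.0, 1.0, 5.0, 10.0])     # distance along the axon, in mm
v0 = 1.0                                # voltage applied at x = 0, normalized
v = v0 * np.exp(-x / lam)               # steady-state solution of the passive cable equation
print(v)                                 # approximately [1.0, 0.37, 0.0067, 0.000045]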

In the brain there are both small-scale and large-scale anatomical organizations, and different functions take place at lower and higher levels. Figure 1.3 shows a hierarchy of interwoven levels of organization that has emerged from the extensive work done on the analysis of local regions in the brain (Shepherd and Koch, 1990; Churchland and Sejnowski, 1992). The synapses represent the most fundamental level, depending on molecules and ions for their action. A neural microcircuit refers to an assembly of synapses organized into patterns of connectivity to produce a functional operation of interest. A neural microcircuit may be likened to a silicon chip made up of an assembly of transistors. The smallest size of microcircuits is measured in micrometers (um), and their fastest speed of operation is measured in milliseconds. The neural microcircuits are grouped to form dendritic subunits within the dendritic trees of individual neurons. The whole neuron, about 100 um in size, contains several dendritic subunits. At the next level of complexity we have local circuits (about 1 mm in size) made up of neurons with similar or different properties; these neural assemblies perform operations characteristic of a localized region in the brain. This is followed by interregional circuits made up of pathways, columns, and topographic maps, which involve multiple regions located in different parts of the brain.

Topographic maps are organized to respond to incoming sensory information. These maps are often arranged in sheets, as in the superior colliculus, where the visual, auditory, and somatosensory maps are stacked in adjacent layers in such a way that stimuli from corresponding points in space lie above or below each other. Figure 1.4 presents a cytoarchitectural map of the cerebral cortex as worked out by Brodmann (Brodal, 1981). This figure shows clearly that different sensory inputs (motor, somatosensory, visual, auditory, etc.) are mapped onto corresponding areas of the cerebral cortex in an orderly fashion. At the final level of complexity, the topographic maps and other interregional circuits mediate specific types of behavior in the central nervous system.


It is important to recognize that the structural levels of organization described herein are a unique characteristic of the brain. They are nowhere to be found in a digital computer, and we are nowhere close to re-creating them with artificial neural networks. Nevertheless, we are inching our way toward a hierarchy of computational levels similar to that described in Fig. 1.3. The artificial neurons we use to build our neural networks are truly primitive in comparison to those found in the brain. The neural networks we are presently able to design are just as primitive compared to the local circuits and the interregional circuits in the brain. What is really satisfying, however, is the remarkable progress that we have made on so many fronts during the past two decades. With neurobiological analogy as the source of inspiration, and the wealth of theoretical and technological tools that we are bringing together, it is certain that in another decade our understanding of artificial neural networks will be much more sophisticated than it is today.

Our primary interest in this book is confined to the study of artificial neural networks from an engineering perspective. We begin the study by describing the models of (artificial) neurons that form the basis of the neural networks considered in subsequent chapters of the book.