What other kinds of distributions might we be interested in? While useful from a theoretical perspective, one shortcoming of the Hopfield network is that it cannot incorporate the kinds of uncertainty seen in actual physical or biological systems. Rather than deterministically turning on or off, real-world systems often involve an element of chance: a magnet might flip polarity, or a neuron might fire at random.
This uncertainty, or stochasticity, is reflected in the Boltzmann machine, a variant of the Hopfield network in which half the neurons (the "visible" units) receive information from the environment, while the other half (the "hidden" units) only receive information from the visible units.
The Boltzmann machine turns each neuron on (1) or off (0) by random sampling, and over many iterations converges to a stable state represented by the minima of the energy function. This is shown schematically in Figure 4.6, in which the white nodes of the network are "off" and the blue ones are "on"; if we were to simulate the activations in the network, these values would fluctuate over time.
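To make the sampling step concrete, here is a minimal NumPy sketch of how a single binary unit might be updated stochastically. The logistic (sigmoid) form of the activation probability is the standard choice for binary stochastic units, but the function name sample_unit and its interface are illustrative assumptions, not details taken from the text.

```python
import numpy as np

def sample_unit(weights, states, bias, rng):
    """Stochastically turn one unit on (1) or off (0).

    The unit's probability of turning on is a logistic function of the
    total input it receives from the other units, so the update is a
    random draw rather than a deterministic threshold.
    """
    total_input = np.dot(weights, states) + bias
    p_on = 1.0 / (1.0 + np.exp(-total_input))  # activation probability
    return 1 if rng.random() < p_on else 0

# Example: update one unit connected to three others (illustrative values).
rng = np.random.default_rng(0)
print(sample_unit(np.array([0.5, -1.0, 2.0]), np.array([1, 0, 1]), -0.1, rng))
```

Running this repeatedly over all units is one sweep of the sampling process; many such sweeps are what let the network settle toward low-energy states.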
In theory, a model like this could be used, for example, to model the distribution of images such as the MNIST data, using the hidden nodes as a "barcode" that represents an underlying probability model for activating each pixel in the image. In practice, though, there are problems with this approach. Firstly, as the number of units in the Boltzmann network increases, the number of connections grows quadratically, and the number of potential configurations that must be accounted for in the Gibbs measure's normalization constant grows exponentially, as does the time needed to sample the network to an equilibrium state. Secondly, weights for units with intermediate activation probabilities (not strongly 0 or 1) tend to fluctuate in a random walk pattern (the probabilities increase or decrease randomly but never stabilize to an equilibrium value) until the neurons converge, which also prolongs training.
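To see why the normalization constant becomes intractable, it helps to write out the energy function and the Gibbs measure it defines. The notation below (weights w_ij, biases b_i, binary states s_i) is the conventional one for Boltzmann machines, not a formula quoted from this section.

```latex
E(\mathbf{s}) = -\sum_{i < j} w_{ij}\, s_i s_j - \sum_i b_i s_i,
\qquad
p(\mathbf{s}) = \frac{e^{-E(\mathbf{s})}}{Z},
\qquad
Z = \sum_{\mathbf{s}' \in \{0,1\}^N} e^{-E(\mathbf{s}')}
```

The sum defining Z runs over all 2^N configurations of the N units, which is why it explodes as the network grows.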
A practical modification is to remove some of the connections in the Boltzmann machine, namely those between visible units and those between hidden units, leaving only connections between the two types of neurons. This modification is known as the restricted Boltzmann machine (RBM), shown in Figure 4.7.
Imagine, as described earlier, that the visible units are input pixels from the MNIST dataset and the hidden units are an encoded representation of that image. By sampling back and forth to convergence, we could create a generative model for images. We would just need a learning rule telling us how to update the weights so that the energy function converges to its minimum; this algorithm is contrastive divergence (CD). To understand why we need a special algorithm for RBMs, it helps to revisit the energy equation and how we might sample to reach equilibrium for the network.
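As a rough illustration of what such a learning rule looks like in code, the following NumPy sketch performs a single CD-1 step: sample the hidden units from the data, reconstruct the visible units, resample the hidden probabilities, and nudge the weights toward the data statistics and away from the reconstruction statistics. The function name cd1_update, the array shapes, and the choice to binarize only the first hidden sample are assumptions of this sketch, not details taken from the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, a, b, v0, lr, rng):
    """One contrastive divergence (CD-1) weight update for an RBM.

    W: (n_visible, n_hidden) weight matrix; a, b: visible/hidden biases.
    v0: a batch of binary visible vectors, shape (batch, n_visible).
    """
    # Positive phase: sample hidden units given the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # Negative phase: reconstruct the visibles, then recompute hidden probabilities.
    pv1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)

    # Approximate gradient: data statistics minus reconstruction statistics.
    batch = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / batch
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b
```

In a training loop, one would call cd1_update repeatedly on mini-batches of binarized MNIST pixel vectors until the reconstructions stabilize; the key point is that CD replaces the full equilibrium sampling with just one (or a few) back-and-forth steps.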