To allow us to backpropagate through our autoencoder, we need to express the stochastic sampling of z as a deterministic, differentiable transformation. We can do this by reparameterizing z as a function of a noise variable ε:
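In the common Gaussian case (stated here for concreteness; the notation may differ slightly from the book's own equation), the encoder outputs a mean μ(x) and standard deviation σ(x), and we write

z = g(x, ε) = μ(x) + σ(x) ⊙ ε,  where ε ~ N(0, I).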
Once we have sampled ε, the randomness in z no longer depends on the parameters of the variational distribution Q (the encoder), and we can backpropagate end to end. Our network now looks like Figure 5.7, and we can optimize our objective using random samples of ε (for example, from a standard normal distribution).
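As a minimal sketch of that sampling step (assuming a PyTorch-style Gaussian encoder that outputs a mean and log-variance; the function and variable names here are illustrative, not taken from the book):

```python
import torch

def reparameterize(mu, log_var):
    # Sample eps from a standard normal; this randomness does not depend on Q's parameters.
    eps = torch.randn_like(mu)
    # Deterministic, differentiable transform: z = mu + sigma * eps.
    sigma = torch.exp(0.5 * log_var)
    return mu + sigma * eps

# mu and log_var would come from the encoder Q; here they are placeholders.
mu = torch.zeros(4, 8, requires_grad=True)
log_var = torch.zeros(4, 8, requires_grad=True)
z = reparameterize(mu, log_var)
z.sum().backward()  # gradients flow back to mu and log_var despite the sampling step
```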
This reparameterization moves the "random" node out of the encoder/decoder framework so we can backpropagate through the whole system, but it also has a subtler advantage: it reduces the variance of the gradients. Note that in the un-reparameterized network, the distribution of z depends on the parameters of the encoder distribution Q; thus, as we change the parameters of Q, we also change the distribution of z, and we would potentially need a large number of samples to get a decent gradient estimate.
By reparameterizing, z now depends only on our simpler function g, with randomness introduced by sampling ε from a standard normal (which doesn't depend on Q). Hence, we've removed a somewhat circular dependency and made the gradients we are estimating more stable.
Now that you have seen how the VAE network is constructed, let's discuss a further refinement of this algorithm that allows VAEs to sample from complex distributions: Inverse Autoregressive Flow (IAF).