In practice, we do not train the generator to minimize log(1-D(G(z))), because this function does not provide sufficient gradients for learning. During the initial phases of training, when G is poor, the discriminator can tell fake samples from real ones with high confidence. This causes log(1-D(G(z))) to saturate, which hinders improvement of the generator. We therefore change the generator's objective to maximize log(D(G(z))) instead.
This provides stronger gradients for the generator to learn from, as shown in Figure 6.6, where the x-axis denotes D(G(z)). The top line shows the original objective, which minimizes the likelihood of the discriminator being correct. The bottom line (the updated objective) maximizes the likelihood of the discriminator being wrong.
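The saturation argument can be checked numerically. The sketch below assumes the discriminator output is a sigmoid of a raw score (a logit) `a`, so that D(G(z)) = sigmoid(a); that assumption and the function names are illustrative and not taken from the text. Under it, the derivative of log(1-D(G(z))) with respect to `a` is -D(G(z)), which vanishes when the discriminator confidently rejects a fake, while the derivative of log(D(G(z))) is 1-D(G(z)), which stays close to 1.

```python
import math


def sigmoid(a):
    """Assumed discriminator output for a raw score (logit) a."""
    return 1.0 / (1.0 + math.exp(-a))


def grad_original(a):
    """d/da of log(1 - sigmoid(a)): the original generator objective."""
    return -sigmoid(a)


def grad_updated(a):
    """d/da of log(sigmoid(a)): the updated generator objective."""
    return 1.0 - sigmoid(a)


if __name__ == "__main__":
    # Very negative logits correspond to a confident "fake" verdict,
    # which is the situation early in training.
    for a in (-6.0, -3.0, 0.0, 3.0):
        print(
            f"D(G(z))={sigmoid(a):.4f}  "
            f"original={grad_original(a):+.4f}  "
            f"updated={grad_updated(a):+.4f}"
        )
```

For strongly negative logits the original objective passes almost no signal back to the generator, whereas the updated one still does.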
Figure 6.6 illustrates how a slight change helps achieve better gradients during the initial phases of training.
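For readers without the figure at hand, the two curves can be tabulated directly as functions of D(G(z)). This is only a rough sketch: the function names and the sample points are chosen for illustration, and the slopes are estimated with a simple central difference rather than computed analytically.

```python
import math


def original_objective(d):
    """log(1 - D(G(z))), which the generator would minimize."""
    return math.log(1.0 - d)


def updated_objective(d):
    """log(D(G(z))), which the generator maximizes instead."""
    return math.log(d)


def slope(f, d, eps=1e-6):
    """Central-difference estimate of df/dd at the point d."""
    return (f(d + eps) - f(d - eps)) / (2.0 * eps)


if __name__ == "__main__":
    # Small values of D(G(z)) correspond to the early phase of training.
    for d in (0.01, 0.1, 0.5, 0.9):
        print(
            f"D(G(z))={d:.2f}  "
            f"log(1-D)={original_objective(d):+.3f} (slope {slope(original_objective, d):+.2f})  "
            f"log(D)={updated_objective(d):+.3f} (slope {slope(updated_objective, d):+.2f})"
        )
```

Near D(G(z)) = 0 the original curve is almost flat, while the updated curve is steep, which is the difference the figure is meant to convey.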