Now that we've discussed the underlying theory of VAE algorithms, let's start building a practical example using a real-world dataset. As noted in the introduction, the experiments in this chapter use the Canadian Institute for Advanced Research (CIFAR) 10 dataset. The images in this dataset come from the larger 80 Million Tiny Images dataset, most of which lacks the class labels that CIFAR-10 provides; the CIFAR-10 labels were initially created by student volunteers, and the larger Tiny Images dataset allows researchers to submit labels for parts of the data.
Like the MNIST dataset, CIFAR-10 can be downloaded using the TensorFlow Datasets API:
import tensorflow.compat.v2 as tf
import tensorflow_datasets as tfds
cifar10_builder = tfds.builder("cifar10")
cifar10_builder.download_and_prepare()
This will download the dataset to disk and make it available for our experiments. CIFAR-10 comes with predefined training and test splits, which we can load using the following commands:
cifar10_train = cifar10_builder.as_dataset(split="train")
cifar10_test = cifar10_builder.as_dataset(split="test")
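The training split contains 50,000 images and the test split 10,000. As a quick check (a small addition here, using the metadata exposed by the builder's info object), we can confirm these counts:
print(cifar10_builder.info.splits["train"].num_examples) # 50000
print(cifar10_builder.info.splits["test"].num_examples) # 10000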
Let's inspect one of the images to see what format it is in:
cifar10_train.take(1)
The output tells us that each element of the dataset is a dictionary of the form <DatasetV1Adapter shapes: {image: (32, 32, 3), label: ()}, types: {image: tf.uint8, label: tf.int64}>. Unlike the MNIST dataset we used in Chapter 4, Teaching Networks to Generate Digits, the CIFAR-10 images have three color channels of 32 x 32 pixels each, while the label is an integer from 0 to 9 (representing one of the 10 classes).
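We can also confirm this format programmatically (a quick check, assuming a version of TensorFlow recent enough that datasets expose the element_spec property):
print(cifar10_train.element_spec)
We can also plot the images to inspect them visually: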
import numpy as np
import matplotlib.pyplot as plt
# flatten_image (defined in the next code block) reshapes each image
# into a (1, 3072) vector and scales its pixel values to [0, 1)
for sample in cifar10_train.map(lambda x: flatten_image(x, label=True)).take(1):
    plt.imshow(sample[0].numpy().reshape(32, 32, 3))
    print("Label: %d" % sample[1].numpy())
This displays the first image in the training set along with its printed label.
Like the RBM model, the VAE model we'll build in this example produces outputs scaled between 0 and 1 and accepts flattened versions of the images, so we'll need to turn each image into a vector and scale its pixel values to a maximum of 1:
def flatten_image(x, label=False):
    # Reshape the 32 x 32 x 3 uint8 image into a single row vector
    # and scale the pixel values into the range [0, 1)
    if label:
        return (tf.divide(tf.dtypes.cast(tf.reshape(x["image"], (1, 32*32*3)), tf.float32), 256.0), x["label"])
    else:
        return tf.divide(tf.dtypes.cast(tf.reshape(x["image"], (1, 32*32*3)), tf.float32), 256.0)
This results in each image being a vector of length 3,072 (32 * 32 * 3), which we can reshape back into a 32 x 32 x 3 array once we've run the model in order to examine the generated images.
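To apply this transformation across the whole dataset, one approach (a minimal sketch; the batch size of 64 is an arbitrary choice for illustration) is to map flatten_image over the training split and batch the results:
# Flatten every image and group the resulting vectors into batches
cifar10_train_flat = cifar10_train.map(flatten_image).batch(64)
for batch in cifar10_train_flat.take(1):
    print(batch.shape) # (64, 1, 3072): a batch of flattened image vectors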