
Thursday, March 17, 2022

Importing CIFAR

Now that we've discussed the underlying theory of VAE algorithms, let's start building a practical example using a real-world dataset. As we discussed in the introduction, for the experiments in this chapter we'll be working with the Canadian Institute for Advanced Research (CIFAR)-10 dataset. The images in this dataset are part of the larger 80 million Tiny Images dataset, most of which, unlike CIFAR-10, do not have class labels. The CIFAR-10 labels were initially created by student volunteers, and the larger Tiny Images dataset allows researchers to submit labels for parts of the data.

Like the MNIST dataset, CIFAR-10 can be downloaded using the TensorFlow Datasets API:

import tensorflow.compat.v2 as tf

import tensorflow_datasets as tfds

cifar10_builder = tfds.builder("cifar10")

cifar10_builder.download_and_prepare()
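This will download the dataset to disk and make it available for our experiments. By default, TFDS stores the prepared files under ~/tensorflow_datasets; if you would rather keep them elsewhere, the builder accepts a data_dir argument. A minimal sketch, where the path is just an illustrative example:

# data_dir (example path) overrides the default ~/tensorflow_datasets location
cifar10_builder = tfds.builder("cifar10", data_dir="/data/tfds")
cifar10_builder.download_and_prepare()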

With the dataset downloaded and prepared, we can split it into training and test sets using the following commands:

cifar10_train = cifar10_builder.as_dataset(split="train")

cifar10_test = cifar10_builder.as_dataset(split="test")
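We can sanity-check these splits against the dataset's published sizes (50,000 training and 10,000 test images) using the builder's metadata; a minimal sketch:

# The builder's info object records the number of examples in each split
print(cifar10_builder.info.splits["train"].num_examples)  # 50000
print(cifar10_builder.info.splits["test"].num_examples)   # 10000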

Let's inspect one of the images to see what format it is in:

cifar10_train.take(1)
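Note that take(1) returns another dataset object rather than a single tensor. In an interactive session its repr is displayed automatically; in a standalone script you would print it explicitly, as in this small sketch:

# Printing the dataset shows its element shapes and dtypes
print(cifar10_train.take(1))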

The output tells us that each element of the dataset is of the format <DatasetV1Adapter shapes: {image: (32, 32, 3), label: ()}, types: {image: tf.uint8, label: tf.int64}>. Unlike the MNIST dataset we used in Chapter 4, Teaching Networks to Generate Digits, the CIFAR images have three color channels of 32 x 32 pixels each, while the label is an integer from 0 to 9 (representing one of the 10 classes). We can also plot the images to inspect them visually:

from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

# flatten_image (defined in the next listing) scales and flattens each sample
for sample in cifar10_train.map(lambda x: flatten_image(x, label=True)).take(1):
    plt.imshow(sample[0].numpy().reshape(32, 32, 3).astype(np.float32), cmap=plt.get_cmap("gray"))
    print("Label: %d" % sample[1].numpy())

Running this displays the sample image and prints its label.

Like the RBM model, the VAE model we'll build in this example has outputs scaled between 0 and 1 and accepts flattened versions of the images, so we'll need to turn each image into a vector and scale it to a maximum of 1:

def flatten_image(x, label=False):
    # Cast each image to float32, flatten it to a (1, 32*32*3) row vector,
    # and scale pixel values to the [0, 1) range
    if label:
        return (tf.divide(tf.dtypes.cast(tf.reshape(x["image"], (1, 32*32*3)), tf.float32), 256.0), x["label"])
    else:
        return tf.divide(tf.dtypes.cast(tf.reshape(x["image"], (1, 32*32*3)), tf.float32), 256.0)

This results in each image being a vector of length 3,072 (32 x 32 x 3), which we can reshape once we've run the model to examine the generated images.
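To feed the model, we can map this function over the whole dataset; the sketch below (take(1) is just for illustration) shows the flattened shape and how to reshape a vector back into an image:

# Apply the flattening transform lazily across the training set
flat_train = cifar10_train.map(flatten_image)

for vec in flat_train.take(1):
    print(vec.shape)                      # (1, 3072)
    img = vec.numpy().reshape(32, 32, 3)  # recover the 32 x 32 RGB layout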
