The wide range of generative AI models that we will implement in this book are all built on advances in deep learning and neural networks over the last decade. While in practice we could implement these projects without reference to historical developments, retracing their underlying components will give you a richer understanding of how and why these models work. In this chapter, we will dive into this background, showing you how generative AI models are built from the ground up, how smaller units are assembled into complex architectures, how the loss functions in these models are optimized, and some current theories as to why these models are so effective. Armed with this background knowledge, you should be able to understand in greater depth the reasoning behind the more advanced models and topics that start in Chapter 4, Teaching Networks to Generate Digits, of this book. Generally speaking, we can group the building blocks of neural network models into a number of choices regarding how the model is constructed and trained, which we will cover in this chapter:
Which neural network architecture to use:
- Perceptron
- Multilayer perceptron (MLP)/feedforward
- Convolutional Neural Networks (CNNs)
- Recurrent Neural Networks (RNNs)
- Long Short-Term Memory Networks (LSTMs)
- Gated Recurrent Units (GRUs)
Which activation functions to use in the network:
- Linear
- Sigmoid
- Tanh
- ReLU
- PReLU
Which optimization algorithm to use to tune the parameters of the network:
- Stochastic Gradient Descent (SGD)
- RMSProp
- AdaGrad
- Adam
- AdaDelta
- Hessian-free optimization
How to initialize the parameters of the network:
- Random
- Xavier initialization
- He initialization
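To get a rough sense of how these four groups of choices multiply, the following minimal sketch enumerates every combination of one option per group. The option names are copied from the lists above; the `choices` dictionary and `variants` list are illustrative names only, and the count is a deliberate simplification: it ignores further hyperparameters such as layer counts or learning rates, and not every combination is a sensible model.

```python
from itertools import product

# The four groups of design choices, with option names taken from the lists above.
choices = {
    "architecture": ["Perceptron", "MLP", "CNN", "RNN", "LSTM", "GRU"],
    "activation": ["Linear", "Sigmoid", "Tanh", "ReLU", "PReLU"],
    "optimizer": ["SGD", "RMSProp", "AdaGrad", "Adam", "AdaDelta", "Hessian-free"],
    "initialization": ["Random", "Xavier", "He"],
}

# Build one dictionary per combination, picking a single option from each group.
variants = [dict(zip(choices, combo)) for combo in product(*choices.values())]

print(len(variants))  # 6 * 5 * 6 * 3 = 540 combinations
print(variants[0])    # the first combination: the first option of each group
```

Even this coarse enumeration yields several hundred candidates before any numeric hyperparameter is considered.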
As you can appreciate, combining these decisions can lead to a huge number of potential neural network variants, and one of the challenges of developing these models is determining the right search space within each of these choices. In the course of describing the history of neural networks, we will discuss the implications of each of these design choices in more detail. Our overview of this field begins with the origin of the discipline: the humble perceptron model.