Lecture 13: Generative Models
Unsupervised Learning
- Just data, no labels, which means training data is cheap.
- Learn some underlying hidden structure of the data.
Generative Model
Given training data, generate new samples from the same distribution.
Taxonomy of Generative Models
PixelRNN and PixelCNN
Explicit density model, which uses the chain rule to decompose the likelihood of an image x into a product of 1-d distributions:

\(p(x) = \prod_{i=1}^{n} p(x_i \mid x_1, \ldots, x_{i-1})\)

- On the left is the likelihood of image x; on the right is the probability of the i-th pixel value given all previous pixels.
- What we want to do is maximize the likelihood of the training data.
- But how do we express this complex distribution over pixel values? Express it using a neural network.
- Also, we need to define an ordering of "previous pixels".
PixelRNN
- Generate image pixels starting from corner.
- Dependency on previous pixels modeled using an RNN (LSTM); see the sampling sketch after this list.
- Drawback: Sequential generation is slow.
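A minimal sketch of that sequential sampling (assuming PyTorch; `TinyPixelRNN` and all layer sizes are illustrative placeholders, not the paper's diagonal-LSTM architecture): pixels are generated in raster order, each conditioned on all previous pixels through the LSTM state. The double loop is exactly why generation is slow.

```python
import torch
import torch.nn as nn

class TinyPixelRNN(nn.Module):
    def __init__(self, hidden=128, levels=256):
        super().__init__()
        self.hidden = hidden
        self.embed = nn.Embedding(levels, hidden)   # embed the previous pixel value
        self.lstm = nn.LSTMCell(hidden, hidden)     # state summarizes all previous pixels
        self.out = nn.Linear(hidden, levels)        # logits over the next pixel value

    @torch.no_grad()
    def sample(self, height=28, width=28):
        h = torch.zeros(1, self.hidden)
        c = torch.zeros(1, self.hidden)
        prev = torch.zeros(1, dtype=torch.long)     # dummy "start" pixel
        img = torch.zeros(height, width, dtype=torch.long)
        for i in range(height):                     # raster order, starting from the corner
            for j in range(width):                  # one LSTM step per pixel => slow
                h, c = self.lstm(self.embed(prev), (h, c))
                probs = torch.softmax(self.out(h), dim=-1)
                prev = torch.multinomial(probs, 1).squeeze(1)
                img[i, j] = prev
        return img

img = TinyPixelRNN().sample()                       # a 28x28 image, generated pixel by pixel
```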
PixelCNN
- Still generate image pixels starting from corner.
- Dependency on previous pixels now modeled using a CNN over a context region (a masked-convolution sketch follows this list).
- Training: Maximize likelihood of training images.
- Training is faster than PixelRNN (can parallelize convolutions since context region values known from training images).
- Generation must still proceed sequentially => still slow.
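A minimal sketch of the masked-convolution idea (assuming PyTorch; this is a simplified single-channel stack, not the full gated PixelCNN): the kernel is zeroed at and after the current position, so each output only sees "previous" pixels, and training reduces to a per-pixel cross-entropy over the 256 pixel values, computed in parallel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Conv2d whose kernel is zeroed at/after the current pixel (raster order),
    so each output position only depends on previously generated pixels."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        _, _, kh, kw = self.weight.shape
        mask = torch.ones(kh, kw)
        mask[kh // 2, kw // 2 + (mask_type == 'B'):] = 0   # type 'A' also hides the center pixel
        mask[kh // 2 + 1:, :] = 0                           # hide every row below the center
        self.register_buffer('mask', mask)

    def forward(self, x):
        self.weight.data *= self.mask                       # enforce the causal context region
        return super().forward(x)

model = nn.Sequential(
    MaskedConv2d('A', 1, 64, kernel_size=7, padding=3), nn.ReLU(),
    MaskedConv2d('B', 64, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 256, kernel_size=1),                      # logits over 256 pixel values
)

# Training: maximize likelihood of the training images. All positions are
# computed in parallel because the ground-truth context is already known.
x = torch.randint(0, 256, (8, 1, 28, 28))                   # stand-in for a training batch
logits = model(x.float() / 255.0)
loss = F.cross_entropy(logits, x.squeeze(1))                # negative log-likelihood per pixel
```

Generation with such a model would still fill in one pixel at a time, re-running the network after each pixel, which is why sampling remains slow.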
Pros and Cons
Variational Autoencoder (VAE)
PixelCNNs define a tractable density function and optimize the likelihood of the training data:

\(p_\theta(x) = \prod_{i=1}^{n} p_\theta(x_i \mid x_1, \ldots, x_{i-1})\)

VAEs define an intractable density function with a latent variable \(z\):

\(p_\theta(x) = \int p_\theta(z)\, p_\theta(x \mid z)\, dz\)

The latent \(z\) can be thought of as a set of extracted features. This density cannot be optimized directly; instead we derive and optimize a lower bound on the likelihood, which is discussed later.
Autoencoders
Unsupervised approach for learning a lower-dimensional feature representation from unlabeled training data.
We use something like a CNN to extract features z that capture the meaningful factors of variation in the data.
Through training, we want the features z to be able to reconstruct the original image, using an L2 loss: \(\lVert x - \hat{x} \rVert^2\).
After training, we throw away the decoder and keep only the encoder, using it to extract the image features z.
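A minimal sketch of this plain autoencoder (assuming PyTorch; the architecture and sizes are illustrative): an encoder maps x to a lower-dimensional z, a decoder reconstructs x from z, and training minimizes the L2 reconstruction loss.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(64, 1, 28, 28)            # stand-in for a batch of training images
z = encoder(x)                           # lower-dimensional feature z
x_hat = decoder(z).view_as(x)            # reconstruction of the input
loss = ((x_hat - x) ** 2).mean()         # L2 reconstruction loss
opt.zero_grad(); loss.backward(); opt.step()

# After training, the decoder is discarded and the encoder is kept as a
# feature extractor (e.g. to initialize a downstream supervised model).
```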
Variational Autoencoders
The encoder outputs a distribution over z (a mean and a variance); z is sampled from this distribution, then the decoder outputs the mean and variance of the reconstructed image, from which the reconstruction is sampled.
The expression for \(\log p_\theta(x^{(i)})\) can be derived and decomposed as

\(\log p_\theta(x^{(i)}) = \mathbf{E}_z\!\left[\log p_\theta(x^{(i)} \mid z)\right] - D_{KL}\!\left(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z)\right) + D_{KL}\!\left(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z \mid x^{(i)})\right)\)

The first two terms form a tractable lower bound, which we can take the gradient of and optimize. The third term is intractable but is \(\geq 0\), so during training we optimize only the first two terms, which amounts to maximizing a lower bound on the likelihood of the training data.
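A minimal sketch of the VAE forward pass and loss (assuming PyTorch; sizes are illustrative, and for simplicity the decoder outputs Bernoulli logits rather than a mean and variance): the encoder produces the mean and log-variance of q(z|x), z is sampled with the reparameterization trick, and the loss is the negative of the lower bound above, i.e. a reconstruction term plus the KL divergence between q(z|x) and the prior.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, d_in=784, d_hidden=256, d_z=20):
        super().__init__()
        self.enc = nn.Linear(d_in, d_hidden)
        self.mu = nn.Linear(d_hidden, d_z)       # mean of q(z|x)
        self.logvar = nn.Linear(d_hidden, d_z)   # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(d_z, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_in))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
        return self.dec(z), mu, logvar

def neg_lower_bound(x, x_hat_logits, mu, logvar):
    recon = F.binary_cross_entropy_with_logits(x_hat_logits, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu ** 2 - logvar.exp())    # KL(q(z|x) || N(0, I))
    return recon + kl

vae = VAE()
x = torch.rand(64, 784)                               # stand-in for flattened images in [0, 1]
x_hat_logits, mu, logvar = vae(x)
loss = neg_lower_bound(x, x_hat_logits, mu, logvar)   # minimizing this maximizes the bound
```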
Pros and Cons
Generative Adversarial Networks (GAN)
What if we give up on explicitly modeling the density, and just want the ability to sample?
GANs: don't work with any explicit density function; instead, take a game-theoretic approach: learn to generate from the training distribution through a 2-player game.
We train two networks, a generator and a discriminator, simultaneously, on the minimax objective

\(\min_{\theta_g} \max_{\theta_d} \; \mathbf{E}_{x \sim p_{data}}\!\left[\log D_{\theta_d}(x)\right] + \mathbf{E}_{z \sim p(z)}\!\left[\log\!\left(1 - D_{\theta_d}(G_{\theta_g}(z))\right)\right]\)

The discriminator wants to maximize this objective, outputting 1 when the input is real data and 0 when the input is generated data. The generator wants to minimize it, i.e., to successfully fool the discriminator.
GAN training algorithm
After training, we only use the generator network to generate new images.
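A minimal sketch of the alternating two-player training loop (assuming PyTorch; `G`, `D`, the layer sizes, and `data_loader`, a placeholder iterator over batches of flattened real images, are all illustrative). The generator step uses the common non-saturating trick of maximizing log D(G(z)) rather than literally minimizing log(1 − D(G(z))).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())   # generator
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))        # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

for real in data_loader:                          # assumed: batches of flattened real images
    ones = torch.ones(real.size(0), 1)
    zeros = torch.zeros(real.size(0), 1)

    # 1) Discriminator step: push D(real) toward 1 and D(G(z)) toward 0.
    fake = G(torch.randn(real.size(0), 100)).detach()
    d_loss = (F.binary_cross_entropy_with_logits(D(real), ones)
              + F.binary_cross_entropy_with_logits(D(fake), zeros))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Generator step: push D(G(z)) toward 1, i.e. fool the discriminator.
    g_loss = F.binary_cross_entropy_with_logits(D(G(torch.randn(real.size(0), 100))), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# After training, only the generator is kept: G(torch.randn(n, 100)) produces new samples.
```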
Results of generation
We can see that the generator does not simply reproduce an image from the training set verbatim; it generates new images based on what it has learned from the training set.
Generative Adversarial Nets: Convolutional Architectures