
Lecture 13: Generative Models

Unsupervised Learning

  • Just data, no labels, which means training data is cheap.
  • Learn some underlying hidden structure of the data.

Generative Model

Given training data, generate new samples from same distribution.


Taxonomy of Generative Models


PixelRNN and PixelCNN

Explicit density model, which uses the chain rule to decompose the likelihood of an image x into a product of 1-d distributions:

\[ p(x) = \prod_{i=1}^{n}p(x_i|x_1, ..., x_{i-1}) \]
  • On the left is the likelihood of image x; on the right is the product of probabilities of the i-th pixel value given all previous pixels.

  • What we want to do is to maximize the likelihood of the training data.

  • But how do we model the complex distribution over pixel values? Express it using a neural network (a minimal sketch follows this list).

  • Also, we need to define an ordering of the "previous pixels".
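
To make the factorization concrete, here is a minimal PyTorch sketch (names and shapes are illustrative, not taken from the lecture): `model` is assumed to output, for every pixel position, logits over the 256 possible values conditioned only on earlier pixels, so the log-likelihood of an image is just the sum of per-pixel conditional log-probabilities.

```python
import torch

def log_likelihood(x, model):
    """x: (batch, n_pixels) integer pixel values in [0, 255].
    model(x) is assumed to return logits of shape (batch, n_pixels, 256),
    where position i depends only on pixels before i in the chosen ordering."""
    log_probs = torch.log_softmax(model(x), dim=-1)              # log p(x_i | x_<i) for all i
    picked = log_probs.gather(-1, x.unsqueeze(-1)).squeeze(-1)   # pick the observed pixel values
    return picked.sum(dim=-1)                                    # log p(x) = sum_i log p(x_i | x_<i)

# Training maximizes this, i.e. minimizes the negative log-likelihood:
# loss = -log_likelihood(x_batch, model).mean()
```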

PixelRNN


  • Generate image pixels starting from corner.
  • Dependency on previous pixels modeled using an RNN (LSTM).
  • Drawback: sequential generation is slow (a sampling-loop sketch follows this list).
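
A minimal sketch of why generation is slow, assuming a trained autoregressive `model` with the same interface as in the sketch above (an illustrative assumption, not the lecture's exact architecture): each pixel can only be sampled after all earlier pixels exist, so one network evaluation is needed per pixel.

```python
import torch

@torch.no_grad()
def sample_image(model, height=28, width=28):
    n = height * width
    x = torch.zeros(1, n, dtype=torch.long)            # start from an empty image
    for i in range(n):                                  # one forward pass per pixel
        logits = model(x)[:, i]                         # p(x_i | x_<i); only earlier pixels matter
        probs = torch.softmax(logits, dim=-1)
        x[:, i] = torch.multinomial(probs, 1).squeeze(-1)
    return x.view(1, height, width)
```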

PixelCNN


  • Still generate image pixels starting from corner.
  • Dependency on previous pixels now modeled using a CNN over a context region (see the masked-convolution sketch after this list).
  • Training: Maximize likelihood of training images.
  • Training is faster than PixelRNN (can parallelize convolutions since context region values known from training images).
  • Generation must still proceed sequentially => still slow.
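
Below is a hedged sketch of the masking idea behind PixelCNN, following the common masked-convolution implementation (the mask types 'A'/'B' are the usual convention, not a detail from these notes). During training the whole image is available, so all positions can be evaluated in parallel; generation still uses the sequential loop above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Convolution whose kernel is zeroed to the right of and below the centre,
    so each output position only sees pixels earlier in raster-scan order.
    Mask type 'A' (first layer) also hides the centre pixel; type 'B' keeps it."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kH, kW = self.kernel_size
        mask = torch.ones(1, 1, kH, kW)
        mask[:, :, kH // 2, kW // 2 + (mask_type == 'B'):] = 0   # centre row, right of centre
        mask[:, :, kH // 2 + 1:, :] = 0                          # all rows below the centre
        self.register_buffer('mask', mask)

    def forward(self, x):
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding)

# Stacking such layers gives a network whose output at pixel i depends only on x_<i,
# so the per-pixel likelihoods of a training image can be computed in a single pass.
```
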
Pros and Cons


Variational Autoencoder (VAE)

PixelCNNs define a tractable density function and optimize the likelihood of the training data:

\[ p(x) = \prod_{i=1}^{n}p(x_i|x_1, ..., x_{i-1}) \]

VAEs define an intractable density function with latent variable z:

\[ p_\theta(x) = \int p_\theta(z)\,p_\theta(x|z)\,dz \]

\(z\) can be thought of as some extracted features.

We cannot optimize this directly; instead we derive and optimize a lower bound on the likelihood, which will be discussed later.
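
To see why the lower bound is needed, here is a hedged sketch of the naive alternative: estimate the integral by Monte Carlo, sampling z from the prior (the `decoder` name and the Bernoulli pixel likelihood are illustrative assumptions). Almost every z drawn from the prior explains a given x poorly, so the estimate needs an enormous number of samples; VAEs instead learn q(z|x) and optimize the bound.

```python
import math
import torch

def naive_log_px(x, decoder, n_samples=10_000, z_dim=32):
    """Monte Carlo estimate of log p(x) = log E_{z~p(z)}[p(x|z)] by sampling the prior.
    decoder(z) is assumed to return per-pixel Bernoulli probabilities for x."""
    z = torch.randn(n_samples, z_dim)                           # z ~ p(z) = N(0, I)
    probs = decoder(z)                                          # (n_samples, D)
    log_px_given_z = (x * probs.log() + (1 - x) * (1 - probs).log()).sum(dim=-1)
    # High variance in practice: most samples contribute log p(x|z) close to -inf.
    return torch.logsumexp(log_px_given_z, dim=0) - math.log(n_samples)
```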

Autoencoders

Unsupervised approach for learning a lower-dimensional feature representation from unlabeled training data.


We use something like a CNN to extract features z that capture the meaningful factors of variation in the images.

Through training, we want the features z to be able to reconstruct the original image, using an L2 loss:

\[ ||x - \hat{x}||_2^2 \]

After training, we throw away the decoder and keep only the encoder, which is used to extract image features z.
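
A minimal sketch of this plain autoencoder (architecture, dimensions, and optimizer are illustrative choices, not taken from the lecture): encode to a low-dimensional z, decode back, and train with the L2 reconstruction loss; afterwards only the encoder is kept.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 32))
decoder = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 784))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def train_step(x):                          # x: (batch, 1, 28, 28), e.g. MNIST-sized images
    z = encoder(x)                          # low-dimensional features z
    x_hat = decoder(z).view_as(x)           # reconstruction
    loss = ((x - x_hat) ** 2).sum(dim=(1, 2, 3)).mean()   # L2 loss ||x - x_hat||^2
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# After training: discard `decoder`, and use `encoder(x)` as the feature extractor.
```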

Variational Autoencoders


The encoder produces a distribution over z (a mean and a variance); we sample z from it, pass z through the decoder to get the mean and variance of the reconstructed image, and sample the reconstruction from that distribution.

The derivation of \(\log p_\theta(x^{(i)})\) ends with the decomposition:

\[ \log p_\theta(x^{(i)}) = \mathbb{E}_{z\sim q_\phi(z|x^{(i)})}\!\left[\log p_\theta(x^{(i)}|z)\right] - D_{KL}\!\left(q_\phi(z|x^{(i)})\,\|\,p_\theta(z)\right) + D_{KL}\!\left(q_\phi(z|x^{(i)})\,\|\,p_\theta(z|x^{(i)})\right) \]

The first two terms form a tractable lower bound, which we can take gradients of and optimize. The third term is intractable, but it is \(\geq 0\), so during training we only optimize the first two terms, which amounts to maximizing a lower bound on the likelihood of the training data.
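
Putting the pieces together, here is a hedged sketch of the VAE training loss: the encoder outputs the mean and log-variance of q(z|x), z is drawn with the reparameterization trick, and the loss is the negative of the tractable lower bound (reconstruction term plus the KL divergence between q(z|x) and the N(0, I) prior). For brevity the decoder here outputs Bernoulli pixel probabilities rather than a mean and variance; layer sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=256, z_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, 2 * z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim))

    def loss(self, x):                                            # x: (batch, x_dim), values in [0, 1]
        mu, logvar = self.enc(x).chunk(2, dim=-1)                 # parameters of q(z|x)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
        x_logits = self.dec(z)                                    # parameters of p(x|z)
        recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction='none').sum(-1)
        kl = 0.5 * (mu ** 2 + logvar.exp() - 1 - logvar).sum(-1)  # KL(q(z|x) || N(0, I))
        return (recon + kl).mean()                                # negative lower bound, minimized
```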


Example
Pros and Cons


Generative Adversarial Networks (GAN)

What if we give up on explicitly modeling the density, and just want the ability to sample?

GANs: don't work with any explicit density function; instead, take a game-theoretic approach: learn to generate from the training distribution through a 2-player game.


We need to train two networks, a generator and a discriminator, and the two are trained jointly.

The two networks play the following minimax game:

\[ \min_{\theta_g}\max_{\theta_d} \left[ \mathbb{E}_{x\sim p_{data}} \log D_{\theta_d}(x) + \mathbb{E}_{z\sim p(z)} \log\left(1 - D_{\theta_d}(G_{\theta_g}(z))\right) \right] \]

The discriminator wants to maximize this objective, i.e. output 1 when the input is real data and 0 when the input is generated data. The generator wants to minimize it, i.e. successfully fool the discriminator.


GAN training algorithm


After training, we only use the generator network to generate new images.
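
A self-contained sketch of the two-player training described above (network shapes, optimizers, and the number of discriminator steps per generator step are illustrative assumptions; the generator loss uses the common non-saturating variant, maximizing log D(G(z)) instead of minimizing log(1 - D(G(z)))):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

z_dim = 64
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())   # generator
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))          # discriminator (logits)
g_opt = torch.optim.Adam(G.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(x_real, k=1):                              # x_real: (batch, 784)
    for _ in range(k):                                    # k discriminator updates
        z = torch.randn(x_real.size(0), z_dim)
        real_logits, fake_logits = D(x_real), D(G(z).detach())
        d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) +
                  F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
        d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    z = torch.randn(x_real.size(0), z_dim)                # then one generator update
    fake_logits = D(G(z))
    g_loss = F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

# After training, only G is needed: new_images = G(torch.randn(n, z_dim))
```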

Results of generation

We can see that the generator does not simply reproduce an image from the training set; instead it generates new images based on what it has learned from the training set.

Generative Adversarial Nets: Convolutional Architectures
