Notations
Throughout this report, we use the following notation:
- Let \(\mathrm{KL}\,(P \| Q) = \int \log \frac{dP}{dQ} \, dP\) denote the Kullback-Leibler divergence of \(P\) from \(Q\).
- Let \(\mathbf{Z}= (Z_1, \dots, Z_n)\) and \(\mathbf{Y}= (Y_1, \dots, Y_n)\) be i.i.d. samples.
- Let \(p(X; \theta)\) denote the Radon-Nikodym derivative \(\frac{dP_\theta^X}{d\mu}\).
- Let \(\ell (\theta; X) = \log p(X; \theta)\) denote the log-likelihood.
- \(\mathbb{E}_\theta [ f(Y,Z) ] = \mathbb{E}_{(Y,Z) \sim P_\theta} [ f(Y,Z) ]\)
- \(\mathbb{E}_\psi [ f(Y,Z) | Y ] = \mathbb{E}_{Z | Y \sim Q_\psi} [ f(Y,Z) | Y ]\)
- \(\mathbb{E}_\psi [ f(Z) ] = \mathbb{E}_{Z \sim Q_\psi} [ f(Z) ]\)
- To keep expressions compact, we adopt wherever possible the convention that non-linear functions (mainly \(\log\), \(\exp\), the factorial \(!\) and the square \(^2\)) applied to a vector are applied coefficient-wise.
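
As an illustration of the coefficient-wise convention, consider a generic vector \(\mathbf{x} = (x_1, \dots, x_n)\). The symbol \(\mathbf{x}\) and the numerical values below are assumptions chosen only for this example and do not appear elsewhere in the report. Under the convention,
\[
\log \mathbf{x} = (\log x_1, \dots, \log x_n), \quad
\exp \mathbf{x} = (e^{x_1}, \dots, e^{x_n}), \quad
\mathbf{x}! = (x_1!, \dots, x_n!), \quad
\mathbf{x}^2 = (x_1^2, \dots, x_n^2).
\]
For instance, with \(\mathbf{x} = (1, 2, 3)\) this gives \(\log \mathbf{x} = (0, \log 2, \log 3)\), \(\mathbf{x}! = (1, 2, 6)\) and \(\mathbf{x}^2 = (1, 4, 9)\). In particular, \(\mathbf{x}^2\) is read as the vector of squared coefficients, not as an inner product.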