There are only so many AI-generated cat videos you can watch before you start wondering how this technology works and, more importantly, why it even exists. As someone familiar with the basics of supervised and unsupervised machine learning, I found this curiosity to be a natural segue into learning how image generation works. While understanding the inner workings of these models is challenging, I assure you that once the pieces of the puzzle start to fit and you see the beauty behind the math, it feels almost like watching the Mona Lisa being painted. Let me take you through my journey of understanding image generation models, focusing primarily on variational autoencoders.
High entropy → unpredictable distribution
Low entropy → predictable distribution
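To make the two arrows above concrete, here is a minimal sketch that computes Shannon entropy for two coins. The function name `entropy` and the example probabilities are my own choices for illustration, and I am assuming discrete distributions measured in bits.

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution given as a list of probabilities."""
    # Terms with zero probability contribute nothing, so they are skipped.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

fair_coin = [0.5, 0.5]      # hardest to predict, so entropy is at its maximum
biased_coin = [0.99, 0.01]  # almost always lands the same way, so entropy is low

print(entropy(fair_coin))    # 1.0 bit
print(entropy(biased_coin))  # roughly 0.08 bits
```

The fair coin needs a full bit per flip to describe, while the heavily biased coin needs far less because its outcome rarely surprises you.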
So, finally, KL divergence measures how inefficient it is to encode samples from P using a code optimized for Q. Put simply, the difference between the probability distributions P and Q is:
Cross entropy - Entropy, or H(P, Q) - H(P)
which is the surprise of the believed vs observed system minus the surprise of just the observed system. Here P is the observed (true) distribution and Q is the believed one.
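The "cross entropy minus entropy" recipe can be checked numerically. This is a rough sketch under my own assumptions: the distributions are discrete, measured in bits, and Q is non-zero wherever P is non-zero. The helper names and the example values of `p` and `q` are made up for illustration.

```python
import math

def entropy(p):
    """Average surprise, in bits, when the data come from p and you also believe p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Average surprise, in bits, when the data come from p but you believe q."""
    # Assumes q[i] > 0 wherever p[i] > 0; otherwise the divergence is infinite.
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(P || Q) computed as cross entropy minus entropy."""
    return cross_entropy(p, q) - entropy(p)

p = [0.5, 0.5]  # observed distribution (illustrative)
q = [0.9, 0.1]  # believed distribution (illustrative)

print(kl_divergence(p, q))  # extra bits paid for believing q when p is true
print(kl_divergence(q, p))  # a different number: KL divergence is not symmetric
print(kl_divergence(p, p))  # 0.0, no penalty when the belief matches reality
```

The second print is worth noticing: swapping P and Q changes the answer, which is why KL divergence is called a divergence and not a distance.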
Intuitively Understanding the KL Divergence
The Key Equation Behind Probability
Resources:
https://youtu.be/hZ4a4NgM3u0?si=iuSxElY8t3tEHF0m
https://youtu.be/qiUEgSCyY5o?si=JrW1ed8WihDaw_FS
https://youtu.be/qJeaCHQ1k2w?si=ApWkgTLz2ew8UEl_
ELBO = Reconstruction term - Regularisation term
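Here is a small sketch of how those two terms could be computed for a single data point. It rests on assumptions that are mine and not stated above: the encoder outputs a diagonal Gaussian (a mean `mu` and a log-variance `log_var` per latent dimension), the prior is a standard normal, and the decoder is a unit-variance Gaussian, so the reconstruction term reduces to a negative squared error up to a constant. All function names and input values are hypothetical.

```python
import math

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in nats: the regularisation term."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def elbo(x, x_hat, mu, log_var):
    """ELBO = reconstruction term - regularisation term, for one data point."""
    # Assumed unit-variance Gaussian decoder: log-likelihood up to an additive constant.
    reconstruction = -0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))
    regularisation = gaussian_kl(mu, log_var)
    return reconstruction - regularisation

x = [0.2, 0.8]        # original pixel values (illustrative)
x_hat = [0.25, 0.7]   # decoder output (illustrative)
mu = [0.3, -0.1]      # encoder mean (illustrative)
log_var = [0.0, -0.5] # encoder log-variance (illustrative)

print(elbo(x, x_hat, mu, log_var))
```

Training maximises this quantity: the first term rewards faithful reconstructions, while the second pulls the encoder's Gaussian towards the standard normal prior. In a real model the reconstruction term is usually estimated by sampling from the encoder, which this sketch skips.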
To conclude, the whole process took me around 10 days of dedicated effort, and the one thing that stood out was the importance of the Gaussian curve in image generation. This is a beautiful piece of scientific achievement. For computers to be able to “understand” and generate images so smoothly is an extraordinary feat. And as rewarding as the process was, I still don’t understand the need for this technology to be easily accessible to the general public. The risks of deepfakes and non-consensual image generation far outweigh the benefits of having image generation software for designing quick and easy prototypes. At the very least, it solves Bad Bunny's need to have more photos or, in Spanish, “Debí tirar más fotos”. Hope you had a good read!


