Thursday, March 12, 2026

Debí tirar más fotos: A guide to learning Image Generation using Variational Autoencoders

There are only so many AI-generated cat videos you can watch before you start wondering how this technology works and, more importantly, why it even exists. As someone familiar with the basics of supervised and unsupervised machine learning, I found this curiosity to be a natural segue into learning how image generation works. While understanding the inner workings of these models is challenging, I assure you that once the pieces of the puzzle start to fit and you see the beauty behind the math, it feels almost like watching the Mona Lisa being painted. Let me take you through my journey of understanding image generation models, focusing primarily on variational autoencoders.


The first step is always to look up a video on how image generation works, but to be honest I could comprehend very little by jumping straight to (what I now know is) the fourth step of understanding VAEs. So after conferring with Claude, here is how I approached it:

1. Refresh the Math
A solid foundation comes from understanding the basics first. A few math concepts are used heavily in these models: Bayes' theorem (best revisited via the famous 3Blue1Brown YouTube channel), along with refreshers on Markov chains, the chain rule of probability, and the properties of logarithms. Once these are fresh in your mind, you can jump to the more esoteric KL divergence. This is how I understood it:

Entropy is the measure of surprise:

  • High entropy → unpredictable distribution

  • Low entropy → predictable distribution

For example, if you have a biased coin where the probability of getting heads is 0.9, you would be significantly more surprised to get tails.

Therefore:

H(P) = −∑ P(x) log P(x)
Cross Entropy: Cross entropy measures how well a believed distribution Q approximates the true distribution P. Suppose you think the coin is biased, with distribution Q, but in reality it is a normal coin with distribution P; then you can measure the cross entropy as:

H(P, Q) = −∑ P(x) log Q(x)
So finally, KL divergence measures how inefficient it is to encode samples from P using a code optimized for Q. Put simply, the difference between the probability distributions P and Q is:

D_KL(P ‖ Q) = Cross entropy − Entropy = H(P, Q) − H(P)

which is the surprise of viewing the observed system through the believed one, minus the surprise of just the observed system.
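To make this concrete, here is a tiny plain-Python sketch (my own illustration, reusing the biased-coin example above) that computes all three quantities:

    import math

    def entropy(p):
        # H(P) = -sum P(x) log P(x): surprise of the true distribution
        return -sum(px * math.log2(px) for px in p if px > 0)

    def cross_entropy(p, q):
        # H(P, Q) = -sum P(x) log Q(x): surprise when you believe Q but reality is P
        return -sum(px * math.log2(qx) for px, qx in zip(p, q))

    def kl_divergence(p, q):
        # D_KL(P || Q) = H(P, Q) - H(P)
        return cross_entropy(p, q) - entropy(p)

    P = [0.5, 0.5]   # reality: a normal (fair) coin
    Q = [0.9, 0.1]   # belief: a coin biased towards heads

    print(entropy(P))           # 1.0 bit: a fair coin is maximally surprising
    print(cross_entropy(P, Q))  # ~1.74 bits
    print(kl_divergence(P, Q))  # ~0.74 bits wasted by the wrong belief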




For more information please check out these resources:

Intuitively Understanding the KL Divergence

The key Equation behind Probability

KL Divergence blog




2. Understand the principles behind Autoencoders
Autoencoders are essentially a compression–decompression mechanism.

x -> Encoder -> z -> Decoder -> x'

Where z is a bottleneck and a point in latent space.
Say you take an image of a cat; this image has many pixels. You train a neural network to encode this image as a vector, i.e. a point in latent space. Let's say this cat is encoded as [0.13, 0.13] (though in practice the latent vector usually has dozens or hundreds of dimensions).

That [0.13, 0.13] is the z. The bottleneck.

And then the decoder works in the exact opposite way: it takes z and generates a similar cat, x'.
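Here is a minimal sketch of that idea in PyTorch, assuming 28x28 grayscale images flattened to 784 pixels; the layer sizes are illustrative choices, not anything canonical:

    import torch
    import torch.nn as nn

    class Autoencoder(nn.Module):
        def __init__(self, latent_dim=2):
            super().__init__()
            # Encoder: 784 pixels squeezed down to a tiny latent point z
            self.encoder = nn.Sequential(
                nn.Linear(784, 128), nn.ReLU(),
                nn.Linear(128, latent_dim),
            )
            # Decoder: reconstruct the image from z
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 128), nn.ReLU(),
                nn.Linear(128, 784), nn.Sigmoid(),
            )

        def forward(self, x):
            z = self.encoder(x)        # the bottleneck, e.g. [0.13, 0.13]
            x_hat = self.decoder(z)    # the reconstruction x'
            return x_hat, z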


Resources: 

https://youtu.be/hZ4a4NgM3u0?si=iuSxElY8t3tEHF0m

https://youtu.be/qiUEgSCyY5o?si=JrW1ed8WihDaw_FS


3. The issue with Autoencoders and the need for Variational Autoencoders

Continuing the above example of our cat: say you give the network a second input x2, an image of sunglasses, which might be encoded as [0.15, 0.14].

Now, if I pick the point [0.14, 0.14] in this latent space and decode it, what do I get? Nothing meaningful, probably. Or frogs, maybe.

But ideally it should be an image of a cat in sunglasses. So we do not encode z as a point but instead as a distribution with a mean and standard deviation.

Standard autoencoders do not enforce any structure on the latent space. As a result, sampling or interpolating between points may produce meaningless outputs. Variational autoencoders solve this by learning a probability distribution over the latent space (typically Gaussian), which ensures smooth and meaningful generation.
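In code, the change from step 2 is small: the encoder now ends in two heads, a mean and a log-variance, and z is sampled from that distribution with the reparameterization trick. A sketch, reusing the illustrative sizes from my step 2 snippet:

    import torch
    import torch.nn as nn

    latent_dim = 2  # same illustrative size as the sketch in step 2

    # Encoder head: two outputs instead of one point
    fc_mu = nn.Linear(128, latent_dim)       # mean of q(z|x)
    fc_logvar = nn.Linear(128, latent_dim)   # log-variance of q(z|x)

    def reparameterize(mu, logvar):
        # z = mu + sigma * epsilon, with epsilon ~ N(0, I).
        # Pushing the randomness into epsilon keeps the path from
        # mu and logvar to z differentiable, so backprop still works.
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std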

Resources: 

https://youtu.be/qJeaCHQ1k2w?si=ApWkgTLz2ew8UEl_


4. Understanding ELBO
This is the core mathematics of variational autoencoders, and even of other probabilistic models such as diffusion models.
It starts from finding p(x), the probability of the image, also called the evidence. But we don't know what z is, because it is a latent (hidden) variable. The brute-force way would be to marginalise it out by integrating over all possible z:

p(x) = ∫ p(x|z) p(z) dz

But this is intractable :( so we do something else: we find the ELBO, i.e. the Evidence Lower BOund.

ELBO = Reconstruction term − Regularisation term

OR

ELBO = E_q(z|x)[log p(x|z)] − KL(q(z|x) ‖ p(z))

i.e. how well the decoder reconstructs x from z, minus how close the encoder's distribution q(z|x) is to the standard normal prior.
The training objective of a VAE is to maximise the ELBO, a lower bound on the evidence, which pushes the evidence log p(x) up and the KL gap down.
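In PyTorch, maximising the ELBO is usually written as minimising its negative. A sketch of the standard loss, assuming the encoder outputs mu and logvar as in the snippet above (the KL term against a standard normal has a closed form):

    import torch
    import torch.nn.functional as F

    def vae_loss(x, x_hat, mu, logvar):
        # Reconstruction term: how well the decoder rebuilt x from z
        recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
        # Regularisation term: KL(q(z|x) || N(0, I)) in closed form
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        # Minimising (recon + kl) is the same as maximising the ELBO
        return recon + kl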


5. Code it!

Now you are finally ready to get your hands on the keyboard. I recommend starting by building a VAE on the basic MNIST dataset with PyTorch. Aladdin Persson has a great and very easy-to-follow video on this topic. You can check out my GitHub for more comments and explanations.
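Putting the pieces together, here is a compact sketch of an MNIST VAE in PyTorch. The architecture and hyperparameters are my own illustrative choices, not a canonical recipe:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    class VAE(nn.Module):
        def __init__(self, latent_dim=20):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(784, 400), nn.ReLU())
            self.fc_mu = nn.Linear(400, latent_dim)
            self.fc_logvar = nn.Linear(400, latent_dim)
            self.dec = nn.Sequential(
                nn.Linear(latent_dim, 400), nn.ReLU(),
                nn.Linear(400, 784), nn.Sigmoid(),
            )

        def forward(self, x):
            h = self.enc(x)
            mu, logvar = self.fc_mu(h), self.fc_logvar(h)
            std = torch.exp(0.5 * logvar)
            z = mu + std * torch.randn_like(std)   # reparameterization trick
            return self.dec(z), mu, logvar

    loader = DataLoader(
        datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor()),
        batch_size=128, shuffle=True,
    )
    model = VAE()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(10):
        for x, _ in loader:
            x = x.view(-1, 784)                    # flatten 28x28 images
            x_hat, mu, logvar = model(x)
            recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
            kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
            loss = recon + kl                      # negative ELBO
            opt.zero_grad()
            loss.backward()
            opt.step()
        print(f"epoch {epoch}: loss {loss.item():.1f}")

Once it is trained, decoding a random vector, e.g. model.dec(torch.randn(1, 20)), produces a brand-new digit; that is the "generation" part.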


6. Explore
Now you can explore other models like DDPM, DDIM, and so on.

To conclude, the whole process took me around 10 days of dedicated effort, and the one thing that stood out was the importance of the Gaussian curve in image generation. This is a beautiful piece of scientific achievement. For computers to be able to "understand" and generate images so smoothly is an extraordinary feat. And as rewarding as the process is, I still don't understand the need for this technology to be easily accessible to the general public. The risks of deepfakes and non-consensual image generation far outweigh the benefit of having image generation software to design easy and quick prototypes. At the very least it solves Bad Bunny's need to have more photos, or, in Spanish, "Debí tirar más fotos" ("I should have taken more photos"). Hope you had a good read!




Tuesday, November 18, 2025

Correlation and Causation

I have a pink top that I consider unlucky, because every time I wore it, I had a terrible day. As Michael Scott would say, "I am not superstitious, but I am a little stitious". So I stopped wearing it once I realized this. Many times in life, my friends and I have behaved in a similar way: we found two entities that correlate and implied that there was causation. Well, what do those two terms mean?

Correlation is when two variables change together, either positively or negatively. There is an association between them: when one changes, so does the other.

Causation is when two variables have a cause-and-effect relationship: one directly affects the other.

However, correlation does NOT imply causation. Just because two things seem related does not mean that they necessarily are. Let's look at a fun example. Below is a graph of the per capita consumption of cheese in the USA and the number of people who died by becoming tangled in their bedsheets. It clearly shows a correlation between the two; if we didn't know any better, one would say that eating cheese is causing people to strangle themselves in their sleep!


Source: Per capita consumption of cheese (US) correlates with Number of people who died by becoming tangled in their bedsheets (tylervigen.com) 
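To see how easily spurious correlation arises, here is a small sketch: two completely independent random walks will often show a strong Pearson correlation purely by chance (the variable names are just a nod to the graph above; your numbers will vary with the seeds):

    import random

    def random_walk(n, seed):
        # An independent random walk: each step adds Gaussian noise
        random.seed(seed)
        x, walk = 0.0, []
        for _ in range(n):
            x += random.gauss(0, 1)
            walk.append(x)
        return walk

    def pearson(a, b):
        # Plain Pearson correlation coefficient
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        va = sum((x - ma) ** 2 for x in a) ** 0.5
        vb = sum((y - mb) ** 2 for y in b) ** 0.5
        return cov / (va * vb)

    # Two completely unrelated series, yet often strongly "correlated"
    cheese = random_walk(50, seed=1)
    bedsheets = random_walk(50, seed=2)
    print(pearson(cheese, bedsheets))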


Saturday, February 19, 2022

Karma is a bitch, proved by data

Recently, I have had a lot going on in my mind and it all seems to boil down to whether or not I consider myself to be a Good Person.



Now you see, there's no calculative way to find out whether one is a Good or a Bad person; you just have to rely on what your brain reasons you to be. But calculations make everything so much easier. And that is when I watched Michael Schur's "The Good Place".

And I thought to myself, why not give this an objective try? So here I've designed myself an experiment.

Our aim is to find out:

I) If I am a Good or a Bad person?

II) Is karma real?


Here's how we will find out: first, for a week, I will log every action I take in the Actions spreadsheet and every reward I feel in a separate Rewards spreadsheet, and then we apply data analysis to it!


So let’s see how this goes,


Note:

Inherent rewards like my privileges by birth are not included, and neither are long-term good/bad actions.

The time mentioned is the time the action was logged in most cases.

Some actions are redacted due to privacy reasons.

To see the working, please check sheets 1, 2 and 3 in order, and then the final sheet and the observation sheet.

In case of any confusion, a peer reviewed the action, and I have tried my best to keep bias to a minimum by not logging points immediately.

Link to see working on spreadsheets:

Actions Sheet - Google Sheets
Rewards Sheet- Google Sheets



Procedure:

1. Log any and every significant action (like getting on a train to go to college) and every positive/negative reward (like eating Lays) with the date and time, for a week.

2. Add an extra Denomination column to rank good things (1), neutral things (0), and bad things (-1).

3. Sort the sheets to compare and give ranks to all actions.

4. Give points to each in increments of 10 so that totals can be calculated.

5. In the final sheet combine the two sheets to make one table. (Check Final Sheet in Actions sheet)



6. Calculate the actions total and rewards total and compare.

7. Plot the table on a graph to analyze the working of karma (whether the bad things I did corresponded to bad things that happened to me, and vice versa); a rough Python version of this analysis is sketched after this list.
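For anyone who prefers code to spreadsheets, here is a rough sketch of the same analysis in Python with pandas; the file and column names (Date, Points) are my assumptions about how the sheets are laid out:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Assumed layout: each sheet exported as CSV with Date, Description, Points
    actions = pd.read_csv("actions.csv", parse_dates=["Date"])
    rewards = pd.read_csv("rewards.csv", parse_dates=["Date"])

    print("Actions total:", actions["Points"].sum())
    print("Rewards total:", rewards["Points"].sum())

    # Daily totals, plotted together to eyeball the karma correlation
    daily = pd.DataFrame({
        "actions": actions.groupby(actions["Date"].dt.date)["Points"].sum(),
        "rewards": rewards.groupby(rewards["Date"].dt.date)["Points"].sum(),
    })
    daily.plot(kind="bar", color=["blue", "red"])
    plt.axhline(0, color="black")
    plt.show()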


Result:

1. Graph of Actions (blue) vs. Rewards (red)

Above the x-axis: blue = good action, red = good reward.
Below the x-axis: blue = bad action, red = bad reward.


One of the most shocking discoveries was how effective karma has been in my day-to-day life, especially looking at the entry of 10th February. The graph clearly shows that the good things I did were matched by good rewards, and the bad things by bad rewards, with almost equal intensities.



2. Sum total of Actions = 1130
Sum total of Rewards = 550

So overall, I logged doing significantly more good things than bad, with the worst action being lying to my father, marked at -70, and the best being cleaning my mom's room for her, at 100 points.
Equivalently, more good things happened to me than bad, like being gifted a great pop socket at 100 points, with the worst being feeling insecure about my body at -50.

3. Eating good food and talking to my friends before bed was mostly followed by a "nice night" reward, implying that I'm in a good mood if I talk to certain people at night.


Conclusion:

KARMA IS SO REAL and I'm an okay enough person.

Even though this experiment did not really solve my problem of whether I'm a morally good person or not, it surely made me consciously do better. However, a very wise friend did tell me,

"what good we do, we can never be sure that it accounts to plus points because we never know the consequences. However if the thought comes to your mind that you might not be a good person, solidifies you as a great person capable enough to question your own morality." So that has brought me some comfort as I continue to try to be better.


Saturday, January 15, 2022

The Normal Distribution in Life

 


The normal distribution graph always fascinates me. In the video above you'll see a Galton Board, one where multiple balls always fall into a bell curve. No matter how many times the board is flipped, the balls land in the same shape, even though each individual bounce is completely random. When I explored this a little more, I found this pattern in many more datasets.
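You can simulate a Galton board in a few lines of Python: each ball makes a series of random left/right bounces, and the count of right bounces decides its bin, so the bins fill up into a bell curve (the sizes here are arbitrary):

    import random
    from collections import Counter

    def galton(num_balls=10000, num_rows=12):
        # Each ball bounces left (0) or right (1) at every row;
        # its final bin is just the number of right bounces.
        return Counter(
            sum(random.randint(0, 1) for _ in range(num_rows))
            for _ in range(num_balls)
        )

    # Print a crude ASCII histogram of where the balls landed
    for bin_index, count in sorted(galton().items()):
        print(f"{bin_index:2d} | {'#' * (count // 50)}")

Run it and the printed histogram bulges in the middle, exactly like the board.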


To investigate, I took a look at the JEE Main test marks. Here is a graph of the number of people (population by rank) vs. the marks.

Data Source: JEE Main 2021 Marks vs Rank - Predict your Rank [Rank vs Marks Analysis] (byjus.com)


Which on further inspection looks like... you guessed it, a normal distribution curve.



Link to dataset to see working: JEE scores 2020

The median is approximately 32. So the graph does show that the majority of the population scored... just AVERAGE.


These curves, I believe, are trying to tell a story about our general society. You see, in these curves the population is symmetrically distributed around the middle. There are as many people doing extraordinarily well as there are people flunking miserably, there are just as few saints as there are criminals (*claim not supported by data), and just as many great ideas as terrible ones, suggesting that the nature of everything is to balance out.

Most of us are here, stuck in the middle, being just averagely good enough. Thus mediocrity is something that should be celebrated, because that is where everyone is and where everyone is supposed to be; any extremity has to be counterbalanced to keep this curve of ours in balance. And for a data point to shift away from the median would require a tremendous amount of effort. So, the next time someone calls you basic or average, take that as a compliment.


