Maths behind Principal Component Analysis
- Yash Sakhuja
- Jun 8
- 5 min read

Can humans visualise everything? You might think that with the help of modern visualisation tools and technology, we’d be able to see and understand any data. But that’s not quite the case. Even today, our ability to visualise data takes a hit as the number of dimensions increases. That’s exactly where Principal Component Analysis (PCA) proves valuable: it helps simplify complex, high-dimensional data into something we can see. Technically speaking,
Principal Component Analysis (PCA) is a statistical method used to reduce the dimensionality of data by transforming it into a new coordinate system where the axes, called principal components, capture the most significant variance in the data.
To put it simply, if I gave you two variables, like Weight and Height, you could easily create a 2D scatter plot to see how the points are spread across the X and Y axes. But what if I added two more variables and asked you to visualise them all together using X, Y, Z, and J axes? That quickly becomes difficult to imagine or interpret. This is where the idea of reducing dimensions comes in: bringing those four variables down to just two, like X and Y, so we can visualise them more clearly. That process is called dimensionality reduction, and that’s exactly what PCA (Principal Component Analysis) helps us do.
The Dataset
Now that we’ve covered what PCA is and why it’s needed, let’s put it into action and break down the maths behind it using a real-world dataset. For this, we’ll use the classic Iris dataset, one of the most well-known datasets in machine learning. It contains three classes of 50 samples each, where each class represents a different species of iris plant. You can find this dataset in the UCI Machine Learning Repository.
For our purpose, what you need to know is that the dataset has four features:
SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm
And of course, there's one more column in the dataset, the species, which tells us which type of iris each sample belongs to. Now, with four numerical features, visualising the data becomes tricky. After all, plotting in four dimensions is beyond what we can easily grasp. So the question is—can we reduce these four dimensions into something we can visualise, like two or three dimensions, while still preserving the structure of the data?
Let’s use PCA to find out.
Mighty Mathematics
Before we dive deeper into the linear algebra, let's make sure we are all on the same scale; in short, let's standardise the data. Each feature (column) is centred around its mean and, as here, scaled to unit variance:

z = (x - μ) / σ
Where:
μ: mean of the feature
σ: standard deviation
This ensures that features with larger scales don’t dominate.
Once we’ve standardised all four features, we can combine them into a single matrix; let’s call it W. This matrix holds all our feature columns (also referred to as vectors) together in one place, giving W a shape of 150 × 4 (150 samples, 4 features).
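If you want to follow along in code, here is a minimal sketch of the standardisation step in Python. I'm using scikit-learn's load_iris purely as a convenient stand-in for downloading the CSV from the UCI repository, and the variable names (mu, sigma, W) are just for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris   # stand-in for the UCI CSV download

# Load the four numeric features (150 samples x 4 features)
X = load_iris().data

# Standardise each column: subtract its mean, divide by its standard deviation
mu = X.mean(axis=0)
sigma = X.std(axis=0, ddof=1)             # sample standard deviation
W = (X - mu) / sigma                      # the standardised matrix W, shape (150, 4)

print(W.shape)                            # (150, 4)
print(W.mean(axis=0).round(6))            # each column now has mean ~0
print(W.std(axis=0, ddof=1).round(6))     # ...and standard deviation 1
```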
Next, we’ll use this matrix W to compute the covariance matrix, also known as the variance-covariance matrix. This symmetric matrix helps us understand how the features vary together. The diagonal elements represent the variance of each feature, while the off-diagonal elements show the covariance between different features.
To calculate it, we use a bit of linear algebra: multiply the transpose of W by W (dividing by n - 1, where n = 150 is the number of samples), and call the result Q:

Q = (Wᵀ W) / (n - 1)

The covariance matrix Q is both square and symmetric, as it results from the dot product of Wᵀ (4×150) and W (150×4), giving us a 4×4 matrix. The eigenvectors of this matrix are what we are really after: each eigenvector represents a principal component, capturing a certain amount of the data's variance. The first principal component (the eigenvector with the largest eigenvalue) captures the most variance, the second captures slightly less, and so on down the line.
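Here is a rough sketch of that covariance step, continuing with the same standardised W as above; numpy's built-in np.cov is used only as a sanity check on the hand-rolled formula.

```python
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data
W = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardised features, shape (150, 4)
n = W.shape[0]                                      # 150 samples

# Covariance matrix: Q = (W^T W) / (n - 1), a symmetric 4x4 matrix
Q = (W.T @ W) / (n - 1)

# Sanity check against numpy's own covariance (rowvar=False -> columns are the variables)
assert np.allclose(Q, np.cov(W, rowvar=False))
print(Q.round(3))   # diagonal is ~1 because every column was scaled to unit variance
```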
Eigenvectors of the Covariance Matrix are the Principal Components of the Original Matrix W
Since we can visualise data much more easily in 2D using tools like Excel or other statistical software, we’ll focus on just the top two eigenvectors of Q, stacked side by side into a 4×2 matrix we’ll call Q_reduced. This reduced matrix contains the two eigenvectors that represent the directions capturing the most variance in the data. Now, if we take the dot product of the original matrix W (150×4) with Q_reduced, we’ll get T (150×2), whose two columns, PCA1 and PCA2, are the first two principal components. Plotting these on the X and Y axes allows us to visualise the data in two dimensions, often revealing clear patterns or groupings, much like the graph shown at the top.
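Here is a hedged sketch of that projection, using numpy's eigh for the eigendecomposition (the "software" doing the heavy lifting) and matplotlib for the scatter plot; Q_reduced and T follow the names used above, and the plotting details are just one way to draw the chart.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
W = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardised, shape (150, 4)
Q = (W.T @ W) / (W.shape[0] - 1)                   # 4x4 covariance matrix

# Eigendecomposition of the symmetric matrix Q (eigh returns eigenvalues in ascending order)
eigvals, eigvecs = np.linalg.eigh(Q)
order = np.argsort(eigvals)[::-1]                  # re-sort so the largest eigenvalue comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the top two eigenvectors as columns: Q_reduced has shape (4, 2)
Q_reduced = eigvecs[:, :2]

# Project the data: T has shape (150, 2) and its columns are PCA1 and PCA2
T = W @ Q_reduced

# 2D scatter plot, coloured by species
for label, name in enumerate(iris.target_names):
    plt.scatter(T[y == label, 0], T[y == label, 1], label=name)
plt.xlabel("PCA1")
plt.ylabel("PCA2")
plt.legend()
plt.show()
```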

One challenge with PCA is that, in the original 4D space, each feature or dimension had a clear, interpretable meaning. But when PCA reduces the data to 2D, the new axes, PCA1 and PCA2, don’t have a direct physical meaning. Instead, each is a linear combination of the original features. While they may not be immediately interpretable, the weights in those combinations (the entries of the eigenvectors) tell us how much each of the original dimensions contributes to the variation the components capture.
And there we have it. We’ve broken down PCA using just one key statistical equation and two matrix operations. Not too complicated, right? If I were to share one more formula, this time from the world of linear algebra, it would be the one used to calculate eigenvalues (λ) for the corresponding eigenvectors:
det(Q-λI) = 0
Where,
Q: the covariance matrix
λ: the eigenvalue
I: the Identity matrix (of the same size as Q)
Solving this equation (called the characteristic equation) gives us the eigenvalues, which indicate how much variance each eigenvector (or principal component) captures from the original data. In PCA, these eigenvalues help us identify the most significant principal components—the ones that explain the most variation. This is essential for deciding how many components to retain when reducing the dimensionality of the dataset. Now, I’m not about to solve this by hand for a 4×4 matrix—while it’s definitely possible, that’s exactly what computers are for, my friend! I’ve already used software to calculate the eigenvalues for us.
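As a rough sketch of what that software is doing, numpy can build the coefficients of the characteristic polynomial det(Q - λI) = 0 and solve for its roots, which are the eigenvalues. Treat the printed values as illustrative: the exact numbers depend on whether the data was standardised first.

```python
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data
W = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
Q = (W.T @ W) / (W.shape[0] - 1)                 # 4x4 covariance matrix

# Coefficients of the characteristic polynomial det(Q - lambda*I) = 0 ...
coeffs = np.poly(Q)
# ...and its roots, which are exactly the eigenvalues
lambdas = np.sort(np.roots(coeffs).real)[::-1]

print(lambdas.round(3))                          # eigenvalues, largest first
print((lambdas / lambdas.sum()).round(3))        # share of variance captured by each component
print(lambdas[:2].sum() / lambdas.sum())         # cumulative share of the first two components
```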

We can see that the first two principal components capture 97% of the total variation, allowing the chart above to clearly distinguish between the three iris species in just two dimensions. However, this level of clarity isn’t always guaranteed—sometimes, more components are needed to represent the key variance.
Here’s a fun fact — you can actually use Excel’s Goal Seek feature to find the eigenvalues. The trick is to set the determinant to zero and adjust the lambda value until it fits. Try experimenting with different starting values for lambda to find the roots. Want to see it in action? Download the Excel file below and give it a go!
Not a fan of Excel? No worries — just fire up a Python kernel. With numpy and pandas, it’s a breeze.
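Something like this minimal sketch should do the trick; np.linalg.eigh works directly on the symmetric covariance matrix, and the column names simply mirror the feature names listed earlier.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

df = pd.DataFrame(load_iris().data,
                  columns=["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"])

W = (df - df.mean()) / df.std()          # pandas .std() already uses ddof=1
Q = W.cov()                              # 4x4 covariance matrix as a DataFrame

# No Goal Seek needed: eigh handles the symmetric covariance matrix directly
eigvals, _ = np.linalg.eigh(Q.values)
print(np.sort(eigvals)[::-1])            # eigenvalues, largest first
```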
I hope you found this helpful; I certainly did when I was trying to get my concepts clear about PCA. I would totally recommend the chapter on PCA from Anil Ananthaswamy's book Why Machines Learn, which was the inspiration for this blog!
Signing Off,
Yash Sakhuja