Exploring Singular Value Decomposition (SVD) from scratch in Python

Vikraant Koushika Pai
3 min read · Feb 9, 2022



Introduction

This article is inspired by the excellent explanation in Gilbert Strang’s lecture on SVD. Here’s the link if you want to watch the lecture. For a geometric intuition of what SVD is, the article by Hussein Abdulrahman is highly recommended. Intuitively, SVD decomposes a matrix along orthogonal axes. Every m x n matrix A can be factored as A = U Σ V^T, where U (m x m) and V (n x n) are orthogonal matrices and Σ is an m x n diagonal matrix holding the singular values. The benefit of SVD over eigenvalue decomposition is that it applies to every matrix, not just square ones. It also gives an X-ray view of the data before diving deeper into it.

Singular Value Decomposition of matrix A

One of the major use cases for SVD is dimensionality reduction. We will mainly explore the following:

  1. SVD with scipy using linalg module
  2. SVD on gene expression dataset

SVD with scipy using linalg module

Let us define an 11 x 5 matrix A using np.random.random.
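The original code for this step is not shown here, so the following is only a minimal sketch of how such a matrix could be created. The fixed seed is an assumption added for reproducibility.

```python
import numpy as np

# Seed chosen arbitrarily so the example is reproducible (an assumption).
np.random.seed(0)

# 11 rows, 5 columns, entries drawn uniformly from [0, 1).
A = np.random.random((11, 5))
print(A.shape)
```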

Let’s reduce the number of columns in A using SVD.
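A possible way to compute the decomposition is scipy.linalg.svd. The sketch below recreates the matrix with an assumed seed. Note that the third return value is V already transposed, which is why it is named VT here.

```python
import numpy as np
from scipy.linalg import svd

np.random.seed(0)  # assumed seed, for reproducibility only
A = np.random.random((11, 5))

# Full SVD: U is 11 x 11, sigma is a 1-D array of 5 singular values,
# and VT is the 5 x 5 matrix V transposed.
U, sigma, VT = svd(A)
print(U.shape, sigma.shape, VT.shape)
```

The singular values come back sorted from largest to smallest, which is what makes truncating the trailing ones meaningful.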

sigma holds the singular values as a one-dimensional array. It has to be converted into an m x n diagonal matrix before it can be used for data reduction and reconstruction.
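One way to do this conversion, assuming SciPy is used, is the helper scipy.linalg.diagsvd, which pads the singular values into an m x n matrix. The sketch also checks that the three factors multiply back to A.

```python
import numpy as np
from scipy.linalg import svd, diagsvd

np.random.seed(0)  # assumed seed, for reproducibility only
A = np.random.random((11, 5))
U, sigma, VT = svd(A)

# Place the 5 singular values on the diagonal of an 11 x 5 matrix;
# the remaining rows are zero.
Sigma = diagsvd(sigma, *A.shape)

# The full decomposition reproduces A up to floating-point error.
print(np.allclose(U @ Sigma @ VT, A))
```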

Now we have U, V and Σ. Let’s reconstruct the matrix A using only the first n-1 singular values.
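A minimal sketch of the truncated reconstruction, assuming n = 5 so that k = n - 1 = 4 singular values are kept. The variable names are illustrative and not taken from the original code.

```python
import numpy as np
from scipy.linalg import svd

np.random.seed(0)  # assumed seed, for reproducibility only
A = np.random.random((11, 5))
U, sigma, VT = svd(A)

k = A.shape[1] - 1  # keep the first n-1 = 4 singular values

# Rank-k approximation: first k columns of U, first k singular values,
# first k rows of VT.
A_k = U[:, :k] @ np.diag(sigma[:k]) @ VT[:k, :]

# The Frobenius norm of the error equals the one discarded singular value.
error = np.linalg.norm(A - A_k)
print(error, sigma[k])
```

Because only the smallest singular value is dropped, the reconstruction error is exactly that value, which is why the result stays close to A.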

This is pretty close to the original matrix. We can also project the original matrix onto the reduced set of dimensions for further use.
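The transformation can be sketched as a projection of A onto the first k right singular vectors. This is one common reading of "transform", not necessarily the exact code used in the original article.

```python
import numpy as np
from scipy.linalg import svd

np.random.seed(0)  # assumed seed, for reproducibility only
A = np.random.random((11, 5))
U, sigma, VT = svd(A)

k = 4  # number of dimensions to keep (assumed, matching n-1 above)

# Project the 5-column data onto the first k right singular vectors.
T = A @ VT[:k, :].T

# Equivalent form: the first k columns of U scaled by the singular values.
T_alt = U[:, :k] * sigma[:k]
print(T.shape, np.allclose(T, T_alt))
```

The reduced matrix T has the same number of rows as A but only k columns, so it can replace A in downstream steps.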

SVD on gene expression dataset

The gene expression data set has 801 samples, each with expression values for 20531 genes as columns.

Let’s apply SVD to the dataset.
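Since the data loading code is not shown here, the sketch below uses a small synthetic stand-in (80 x 500, low rank plus noise) instead of the real 801 x 20531 matrix. The sizes, the seed and the variable names are all assumptions. For a wide matrix like the real one, full_matrices=False is worth considering, because the full V would otherwise be 20531 x 20531.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed

# Synthetic stand-in for the gene expression matrix (NOT the real data):
# 80 "samples", 500 "genes", rank-5 signal plus small noise.
n_samples, n_genes, rank = 80, 500, 5
X = rng.normal(size=(n_samples, rank)) @ rng.normal(size=(rank, n_genes))
X += 0.01 * rng.normal(size=(n_samples, n_genes))

# Thin SVD: U is 80 x 80, s has 80 entries, VT is 80 x 500.
U, s, VT = np.linalg.svd(X, full_matrices=False)
print(U.shape, s.shape, VT.shape)
```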

When we plot the square of each singular value divided by the sum of squares of all the singular values, we see that the contributions beyond the first few values are small, so most of the information can be retained by the first few dimensions.
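The quantity being plotted can be computed as below. The sketch again uses a synthetic stand-in rather than the real dataset, so the numbers it prints say nothing about the gene expression data; it only illustrates the calculation.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed

# Synthetic stand-in (NOT the real data): rank-5 signal plus small noise.
rank = 5
X = rng.normal(size=(80, rank)) @ rng.normal(size=(rank, 500))
X += 0.01 * rng.normal(size=(80, 500))

s = np.linalg.svd(X, compute_uv=False)  # singular values only

# Share of the total squared singular values carried by each dimension.
ratio = s**2 / np.sum(s**2)

# Running total: how much is retained by the first k dimensions.
cumulative = np.cumsum(ratio)
print(cumulative[:rank])
```

Plotting ratio (or cumulative) against the dimension index gives the curve described above.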

Let’s try to reconstruct the original matrix from the first 28 singular values and compute the residual.

The median absolute deviation of the residual is pretty low considering the reduction in the number of dimensions.
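The reconstruction and residual check might look like the sketch below. It keeps k = 28 as in the text but runs on a synthetic stand-in, so the printed value is not the article's result. The median absolute deviation is computed by hand with NumPy as the median of absolute deviations from the median; whether the original used this exact definition is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed

# Synthetic stand-in (NOT the real data): rank-5 signal plus small noise.
X = rng.normal(size=(80, 5)) @ rng.normal(size=(5, 500))
X += 0.01 * rng.normal(size=(80, 500))

U, s, VT = np.linalg.svd(X, full_matrices=False)

k = 28  # number of singular values kept, as in the text

# Rank-k reconstruction; scaling the columns of U avoids building a diagonal matrix.
X_k = (U[:, :k] * s[:k]) @ VT[:k, :]

residual = X - X_k

# Median absolute deviation of the residual entries.
mad = np.median(np.abs(residual - np.median(residual)))
print(mad)
```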

Conclusion

It is always useful to go back to basics and explore some of the linear algebra fundamentals. SVD helps with dimensionality reduction without placing a constraint on the shape of the array and without requiring the data to be mean centered. Exploring SVD from scratch using numpy and scipy can be a good starting point on the journey to algorithms such as latent semantic analysis, image compression, pattern extraction and so on.

References

  1. https://towardsdatascience.com/svd-8c2f72e264f
  2. https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html
  3. https://machinelearningmastery.com/singular-value-decomposition-for-machine-learning/
  4. https://www.youtube.com/watch?v=mBcLRGuAFUk
  5. https://towardsdatascience.com/understanding-singular-value-decomposition-and-its-application-in-data-science-388a54be95d
  6. https://link.springer.com/chapter/10.1007/978-0-8176-4558-8_32
  7. https://www.koreascience.or.kr/article/JAKO202121055598989.pdf
  8. https://archive.ics.uci.edu/ml/datasets/gene+expression+cancer+RNA-Seq

Written by Vikraant Koushika Pai

I am a Data Scientist and currently working on recommendations at Embibe.com
