
The Singular Value Decomposition: Structure of Arbitrary Matrices

Every real m × n matrix factors as A = UΣV^T, where U and V are orthogonal and Σ is diagonal — this is the singular value decomposition. We prove existence, show how the SVD yields optimal low-rank approximations (Eckart–Young theorem), and construct the Moore–Penrose pseudoinverse for least-squares solutions.

Folio Official
March 24, 2026

1. Beyond Square Matrices

The spectral theorem gave us a powerful decomposition for symmetric matrices: every real symmetric matrix A can be written as A = QΛQ^T, where Q is orthogonal and Λ is diagonal. But this result applies only to square matrices, and indeed only to the symmetric ones among those. In applications, we routinely encounter matrices that are not square — a data matrix with more observations than variables, or a linear map between spaces of different dimensions. Eigenvalues are not even defined for such matrices. We need a more general tool.

The key observation is this: even when A itself is rectangular, the products A^T A and AA^T are always symmetric and square. If A ∈ M_{m×n}(R), then A^T A is n×n and AA^T is m×m, and both are symmetric. The spectral theorem applies to each of them. The singular value decomposition (SVD) exploits this fact to decompose A itself, revealing the fundamental geometric action of any linear map: a rotation, followed by a coordinate-wise scaling, followed by another rotation.

2. Singular Values

We begin by examining the matrix A^T A more carefully.

Lemma 1 (Positive semidefiniteness of A^T A).
For any A ∈ M_{m×n}(R), the matrix A^T A is positive semidefinite. In particular, all eigenvalues of A^T A are nonnegative.
Proof.
For any x ∈ R^n, we have
x^T (A^T A) x = (Ax)^T (Ax) = ‖Ax‖^2 ≥ 0.
Since A^T A is symmetric, the spectral theorem guarantees that all its eigenvalues are real. If λ is an eigenvalue with eigenvector v ≠ 0, then λ‖v‖^2 = v^T (A^T A) v ≥ 0, so λ ≥ 0. □
Definition 1 (Singular values).
Let A ∈ M_{m×n}(R). Let λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_n ≥ 0 be the eigenvalues of A^T A, listed in decreasing order and counted with multiplicity. The singular values of A are the numbers
σ_i = √λ_i,  i = 1, …, n.
We write σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_n ≥ 0. The number of nonzero singular values equals rank(A).
Lemma 3.
Let A ∈ M_{m×n}(R) with singular values σ_1 ≥ ⋯ ≥ σ_n ≥ 0. Then rank(A) = rank(A^T A), which equals the number of nonzero singular values.
Proof.
It suffices to show ker(A) = ker(A^T A). If Ax = 0, then A^T Ax = 0. Conversely, if A^T Ax = 0, then ‖Ax‖^2 = x^T A^T Ax = 0, so Ax = 0. Since the two null spaces coincide, the rank–nullity theorem gives rank(A) = rank(A^T A). The rank of A^T A is the number of its nonzero eigenvalues, which is the number of nonzero singular values. □
Example 2.
Let A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}. Then
A^T A = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.
The eigenvalues are λ = 3 and λ = 1, so the singular values are σ_1 = √3 and σ_2 = 1.
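These numbers are easy to confirm on a computer. The following NumPy sketch (an illustration, not part of the formal development) computes the singular values of the matrix in Example 2 both from Definition 1 and with a library SVD:

```python
import numpy as np

# The matrix from Example 2.
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

# Singular values via the definition: square roots of the eigenvalues of A^T A.
eigvals = np.linalg.eigvalsh(A.T @ A)     # returned in ascending order
sigma_from_def = np.sqrt(eigvals)[::-1]   # reorder so sigma_1 >= sigma_2

# Singular values computed directly by the library.
sigma_direct = np.linalg.svd(A, compute_uv=False)

print(sigma_from_def)                             # approx [1.732, 1.0] = [sqrt(3), 1]
print(np.allclose(sigma_from_def, sigma_direct))  # True
```

The two routes agree, as Lemma 1 and Definition 1 predict.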
Remark 3.
One can equally well define singular values via the eigenvalues of AA^T. The nonzero eigenvalues of A^T A and AA^T are the same: if A^T Av = λv with λ ≠ 0, then AA^T(Av) = A(A^T Av) = λ(Av), and Av ≠ 0 because A^T Av = λv ≠ 0, so λ is also an eigenvalue of AA^T.

3. The SVD Theorem

We now state and prove the central result of this chapter.

Theorem 6 (Singular Value Decomposition).
Let A ∈ M_{m×n}(R) with rank(A) = r. Then there exist an orthogonal matrix U ∈ M_{m×m}(R), an orthogonal matrix V ∈ M_{n×n}(R), and a matrix Σ ∈ M_{m×n}(R) such that
A = UΣV^T,
where Σ has the form
Σ = \begin{pmatrix} σ_1 & & & \\ & ⋱ & & \\ & & σ_r & \\ & & & 0 \end{pmatrix},
with σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_r > 0 on the diagonal and all other entries zero.
Proof.
Since A^T A is a real symmetric n×n matrix, the spectral theorem provides an orthonormal basis {v_1, …, v_n} of R^n consisting of eigenvectors:
A^T Av_i = λ_i v_i,  λ_1 ≥ ⋯ ≥ λ_r > 0 = λ_{r+1} = ⋯ = λ_n.
Set σ_i = √λ_i for i = 1, …, r, and let V = (v_1 ⋯ v_n).

Step 1: Constructing the left singular vectors. For i = 1, …, r, define
u_i = (1/σ_i) Av_i.
We claim that {u_1, …, u_r} is an orthonormal set in R^m. Indeed, for 1 ≤ i, j ≤ r,
u_i^T u_j = (1/(σ_i σ_j)) (Av_i)^T (Av_j) = (1/(σ_i σ_j)) v_i^T A^T Av_j = (λ_j/(σ_i σ_j)) v_i^T v_j.
Since the v_i are orthonormal, v_i^T v_j = δ_{ij}, and when i = j this gives λ_i/σ_i^2 = 1. So u_i^T u_j = δ_{ij}.

Step 2: Extending to an orthonormal basis. The vectors u_1, …, u_r form an orthonormal set of r vectors in R^m. We extend this to an orthonormal basis {u_1, …, u_m} of R^m (by Gram–Schmidt on any completion, for instance). Let U = (u_1 ⋯ u_m).

Step 3: Verifying the decomposition. We must show A = UΣV^T, or equivalently AV = UΣ. The i-th column of AV is Av_i, and the i-th column of UΣ is σ_i u_i when i ≤ r and 0 when i > r. For i ≤ r,
Av_i = σ_i u_i,
which holds by the definition of u_i. For i > r: since v_i ∈ ker(A^T A) = ker(A), we have Av_i = 0, and the i-th column of UΣ is also 0. Therefore AV = UΣ, and since V is orthogonal, A = UΣV^T. □
Remark 4.
The columns v_1, …, v_n of V are called the right singular vectors of A, and the columns u_1, …, u_m of U are the left singular vectors. The right singular vectors are eigenvectors of A^T A, and the left singular vectors are eigenvectors of AA^T. One can verify the latter directly: for i ≤ r,
AA^T u_i = (1/σ_i) AA^T Av_i = (1/σ_i) A(λ_i v_i) = σ_i Av_i = σ_i^2 u_i.

The SVD can also be written in outer product form. Since A = UΣV^T, expanding columnwise gives

A = ∑_{i=1}^{r} σ_i u_i v_i^T.

Each term σ_i u_i v_i^T is a rank-one matrix, and A is expressed as a sum of r rank-one matrices weighted by the singular values. This form is the starting point for low-rank approximation.
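The outer product form can be checked numerically. This NumPy sketch (illustrative only) rebuilds the matrix of Example 2 as a sum of rank-one terms:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

U, s, Vt = np.linalg.svd(A)   # s holds sigma_1 >= sigma_2
r = int(np.sum(s > 1e-12))    # numerical rank: count of nonzero singular values

# Rebuild A as the sum of r rank-one matrices sigma_i * u_i * v_i^T.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))

print(r)                           # 2
print(np.allclose(A, A_rebuilt))   # True
```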

4. Geometric Interpretation

The SVD reveals the geometry of any linear map x ↦ Ax with crystalline clarity. Writing A = UΣV^T, the map decomposes into three stages:

  1. V^T: rotate (and possibly reflect) the input — this is an orthogonal transformation of R^n.

  2. Σ: scale each coordinate axis independently by the corresponding singular value, mapping R^n into R^m.

  3. U: rotate (and possibly reflect) the output — this is an orthogonal transformation of R^m.

The geometric meaning becomes vivid when we consider the image of the unit sphere S^{n−1} = {x ∈ R^n : ‖x‖ = 1} under A.

Theorem 8 (Image of the unit sphere).
Let A ∈ M_{m×n}(R) have singular values σ_1 ≥ ⋯ ≥ σ_r > 0 and SVD A = UΣV^T. The image A(S^{n−1}) is an ellipsoid in R^m (possibly degenerate if r < n) whose semi-axes have lengths σ_1, …, σ_r and point in the directions u_1, …, u_r.
Proof.
Let x ∈ S^{n−1} and write y = V^T x. Since V^T is orthogonal, ‖y‖ = ‖x‖ = 1, so y ranges over all of S^{n−1} as x does. The point Σy has coordinates (σ_1 y_1, …, σ_r y_r, 0, …, 0)^T in R^m. As y ranges over S^{n−1}, the constraint y_1^2 + ⋯ + y_n^2 = 1 ensures that the image under Σ satisfies
z_1^2/σ_1^2 + ⋯ + z_r^2/σ_r^2 ≤ 1,
with equality on the surface. This is an ellipsoid in the subspace spanned by the first r coordinate vectors, with semi-axes σ_1, …, σ_r. The final rotation U maps the i-th standard basis vector to u_i, so the semi-axes of the image ellipsoid point in the directions u_1, …, u_r. □
Example 5.
A 2×2 matrix with singular values 3 and 1 maps the unit circle to an ellipse with semi-major axis 3 and semi-minor axis 1. The directions of these axes are determined by the left singular vectors.
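The stretching of the unit circle can be observed numerically. In the sketch below (illustrative only; the rotation angle 0.7 is an arbitrary choice, not from the text), we build a 2×2 matrix with singular values 3 and 1 and measure the image of sampled circle points:

```python
import numpy as np

# A 2x2 matrix with singular values 3 and 1, built as A = R @ diag(3, 1),
# i.e. an SVD with U = R (a rotation) and V = I.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A = R @ np.diag([3.0, 1.0])

# Sample the unit circle and map it through A.
t = np.linspace(0, 2 * np.pi, 1000)
circle = np.vstack([np.cos(t), np.sin(t)])   # shape (2, 1000)
image = A @ circle

lengths = np.linalg.norm(image, axis=0)
print(lengths.max())   # approx 3 (semi-major axis = sigma_1)
print(lengths.min())   # approx 1 (semi-minor axis = sigma_2)
```

The extreme distances from the origin to the image ellipse recover the two singular values, as Theorem 8 asserts.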

5. Full SVD Computation for a 3×2 Matrix

Let us compute the full SVD of the matrix

A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}.

Step 1: Compute A^T A and its eigenvalues.

A^T A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.
The characteristic polynomial is det(A^T A − λI) = (2 − λ)^2 − 1 = λ^2 − 4λ + 3 = (λ − 3)(λ − 1). The eigenvalues are λ_1 = 3 and λ_2 = 1, giving singular values σ_1 = √3 and σ_2 = 1.

Step 2: Find the right singular vectors. For λ_1 = 3: solving (A^T A − 3I)v = 0 gives v_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}. For λ_2 = 1: solving (A^T A − I)v = 0 gives v_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}. Thus
V = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.

Step 3: Find the left singular vectors. We compute u_i = (1/σ_i) Av_i:
u_1 = \frac{1}{\sqrt{3}} Av_1 = \frac{1}{\sqrt{3}} \cdot \frac{1}{\sqrt{2}} \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{6}} \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix},
u_2 = \frac{1}{1} Av_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix}.
We extend to an orthonormal basis of R^3 by finding u_3 ⊥ u_1, u_2. One verifies that u_3 = \frac{1}{\sqrt{3}} \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix} is orthogonal to both u_1 and u_2 and has unit norm. Thus
U = \begin{pmatrix} 2/\sqrt{6} & 0 & -1/\sqrt{3} \\ 1/\sqrt{6} & -1/\sqrt{2} & 1/\sqrt{3} \\ 1/\sqrt{6} & 1/\sqrt{2} & 1/\sqrt{3} \end{pmatrix}, \quad Σ = \begin{pmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}.

Step 4: Verify. One can check directly that UΣV^T = A. For instance, the first column of UΣV^T is
\frac{1}{\sqrt{2}} (\sqrt{3}\, u_1 + u_2) = \frac{1}{\sqrt{2}} \left( \frac{1}{\sqrt{2}} \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} + \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix} \right) = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix},
which is indeed the first column of A. The second column works out the same way.
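The hand computation can also be confirmed with NumPy. The sketch below (illustrative only) enters the factors found above and checks orthogonality and the product:

```python
import numpy as np

# The factors computed by hand in Section 5.
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
U = np.array([[2/np.sqrt(6),  0.0,          -1/np.sqrt(3)],
              [1/np.sqrt(6), -1/np.sqrt(2),  1/np.sqrt(3)],
              [1/np.sqrt(6),  1/np.sqrt(2),  1/np.sqrt(3)]])
Sigma = np.array([[np.sqrt(3), 0.0],
                  [0.0,        1.0],
                  [0.0,        0.0]])
V = np.array([[1/np.sqrt(2),  1/np.sqrt(2)],
              [1/np.sqrt(2), -1/np.sqrt(2)]])

print(np.allclose(U @ U.T, np.eye(3)))    # True: U is orthogonal
print(np.allclose(V @ V.T, np.eye(2)))    # True: V is orthogonal
print(np.allclose(U @ Sigma @ V.T, A))    # True: A = U Sigma V^T
```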

6. Low-Rank Approximation

The outer product form A = ∑_{i=1}^{r} σ_i u_i v_i^T suggests a natural way to approximate A: keep only the terms with the largest singular values. For k ≤ r, define the truncated SVD

A_k = ∑_{i=1}^{k} σ_i u_i v_i^T.

This is a rank-k matrix (since it is a sum of k rank-one matrices with orthogonal column and row spaces). The question is: how good is this approximation?

Theorem 10 (Eckart–Young–Mirsky).
Let A ∈ M_{m×n}(R) have singular values σ_1 ≥ ⋯ ≥ σ_r > 0, and let A_k be its rank-k truncated SVD. Then for any matrix B of rank at most k,
‖A − A_k‖_2 ≤ ‖A − B‖_2  and  ‖A − A_k‖_F ≤ ‖A − B‖_F,
where ‖·‖_2 is the operator norm and ‖·‖_F is the Frobenius norm. In particular,
‖A − A_k‖_2 = σ_{k+1}  and  ‖A − A_k‖_F = √(σ_{k+1}^2 + ⋯ + σ_r^2).

The Eckart–Young–Mirsky theorem tells us that the truncated SVD solves the best rank-k approximation problem in a very strong sense: it is simultaneously optimal in both the operator norm and the Frobenius norm. The geometric intuition is clear: A_k retains the k most important "stretching directions" of the linear map, discarding those along which the map acts most weakly. The approximation error is controlled by the discarded singular values.

Remark 6.
This theorem is the mathematical foundation of principal component analysis (PCA) in statistics and of many data compression algorithms. If the singular values decay rapidly — that is, if σ_{k+1} is much smaller than σ_1 — then a low-rank approximation captures most of the structure of A.
Example 7.
Returning to our matrix A with σ_1 = √3 and σ_2 = 1, the best rank-one approximation is
A_1 = \sqrt{3}\, u_1 v_1^T = \sqrt{3} \cdot \frac{1}{\sqrt{6}} \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} \cdot \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 2 & 2 \\ 1 & 1 \\ 1 & 1 \end{pmatrix}.
The error is ‖A − A_1‖_2 = σ_2 = 1.
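A short NumPy check of Example 7 (illustrative only): truncating the SVD after one term reproduces A_1, and the operator-norm error is the first discarded singular value, σ_2 = 1:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
U, s, Vt = np.linalg.svd(A)

# Best rank-one approximation: keep only the leading singular triple.
# (The sign ambiguity in u_1 and v_1 cancels in the product.)
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])

print(A1)                          # approx [[1, 1], [0.5, 0.5], [0.5, 0.5]]
print(np.linalg.norm(A - A1, 2))   # approx 1.0 = sigma_2
```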

7. The Pseudoinverse

The SVD provides a clean way to define an inverse-like object for any matrix, even one that is rectangular or singular.

Definition 8 (Pseudoinverse).
Let A ∈ M_{m×n}(R) have SVD A = UΣV^T with r nonzero singular values. Define Σ^+ ∈ M_{n×m}(R) by
(Σ^+)_{ij} = 1/σ_i if i = j ≤ r, and 0 otherwise.
The Moore–Penrose pseudoinverse of A is
A^+ = VΣ^+ U^T.

When A is square and invertible, A^+ = A^{-1}, since Σ^+ = Σ^{-1} and A^+ = VΣ^{-1}U^T = (UΣV^T)^{-1}. In general, A^+ is the unique matrix satisfying the four Moore–Penrose conditions: (i) AA^+A = A, (ii) A^+AA^+ = A^+, (iii) (AA^+)^T = AA^+, (iv) (A^+A)^T = A^+A.
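NumPy's np.linalg.pinv computes the Moore–Penrose pseudoinverse via the SVD, so the four conditions can be checked directly for our running example (an illustrative sketch):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

# Pseudoinverse computed by the library (internally via the SVD).
Ap = np.linalg.pinv(A)

# Verify the four Moore-Penrose conditions.
print(np.allclose(A @ Ap @ A, A))        # (i)   A A+ A = A
print(np.allclose(Ap @ A @ Ap, Ap))      # (ii)  A+ A A+ = A+
print(np.allclose((A @ Ap).T, A @ Ap))   # (iii) A A+ is symmetric
print(np.allclose((Ap @ A).T, Ap @ A))   # (iv)  A+ A is symmetric
```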

Theorem 14 (Pseudoinverse and least squares).
Let A ∈ M_{m×n}(R) and b ∈ R^m. The vector x^+ = A^+ b is the minimum-norm least-squares solution to Ax = b. That is, among all x minimizing ‖Ax − b‖, the vector x^+ has the smallest norm ‖x‖.
Proof.
Write A = UΣV^T and let c = U^T b, y = V^T x. Since U is orthogonal,
‖Ax − b‖^2 = ‖UΣV^T x − b‖^2 = ‖Σy − c‖^2 = ∑_{i=1}^{r} (σ_i y_i − c_i)^2 + ∑_{i=r+1}^{m} c_i^2.
The second sum is independent of x, and the first sum is minimized by choosing y_i = c_i/σ_i for i = 1, …, r. The components y_{r+1}, …, y_n do not appear in ‖Ax − b‖^2, so any choice of these components gives a least-squares solution. To minimize ‖x‖^2 = ‖y‖^2 = ∑_{i=1}^{n} y_i^2, we set y_i = 0 for i > r. The resulting y is exactly Σ^+ c = Σ^+ U^T b, so x^+ = Vy = VΣ^+ U^T b = A^+ b. □
Example 9.
For our running example, A^+ = VΣ^+ U^T where
Σ^+ = \begin{pmatrix} 1/\sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.
If b = (1, 1, 1)^T, then the minimum-norm least-squares solution is x^+ = A^+ b. One can verify that A^+ b = \frac{1}{3} \begin{pmatrix} 2 \\ 2 \end{pmatrix}, and indeed Ax^+ = \frac{1}{3} \begin{pmatrix} 4 \\ 2 \\ 2 \end{pmatrix} is the orthogonal projection of b onto the column space of A.
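Example 9 can be confirmed numerically (an illustrative sketch; np.linalg.lstsq solves the same least-squares problem and agrees):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
b = np.array([1.0, 1.0, 1.0])

# Minimum-norm least-squares solution via the pseudoinverse.
x_plus = np.linalg.pinv(A) @ b
print(x_plus)                          # approx [2/3, 2/3]

# lstsq solves min ||Ax - b|| and returns the same solution here.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_plus, x_lstsq))    # True

# A x^+ is the orthogonal projection of b onto the column space of A.
print(A @ x_plus)                      # approx [4/3, 2/3, 2/3]
```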

8. Matrix Norms and the Condition Number

The singular values provide natural measures of the "size" of a matrix.

Definition 10 (Operator norm).
For A ∈ M_{m×n}(R), the operator norm (or spectral norm, or 2-norm) is
‖A‖_2 = max_{x ≠ 0} ‖Ax‖/‖x‖ = max_{‖x‖ = 1} ‖Ax‖.
Theorem 17.
‖A‖_2 = σ_1, the largest singular value of A.
Proof.
By Theorem 8, the image of the unit sphere under A is an ellipsoid with semi-axes of lengths σ_1 ≥ ⋯ ≥ σ_r. The maximum distance from the origin to a point on this ellipsoid is σ_1, attained in the direction u_1. Therefore max_{‖x‖ = 1} ‖Ax‖ = σ_1.

More explicitly, write x = ∑_{i=1}^{n} c_i v_i with ∑ c_i^2 = 1. Then Ax = ∑_{i=1}^{r} c_i σ_i u_i, so ‖Ax‖^2 = ∑_{i=1}^{r} c_i^2 σ_i^2 ≤ σ_1^2 ∑_{i=1}^{r} c_i^2 ≤ σ_1^2. Equality holds when c_1 = 1 and all other c_i = 0, i.e., x = v_1. □
Definition 11 (Frobenius norm).
The Frobenius norm of A = (a_{ij}) is ‖A‖_F = √(∑_{i,j} a_{ij}^2).
Theorem 19.
‖A‖_F = √(σ_1^2 + ⋯ + σ_r^2).
Proof.
Since U and V are orthogonal, ‖A‖_F = ‖UΣV^T‖_F = ‖Σ‖_F. The entries of Σ are zero except for σ_1, …, σ_r on the diagonal, so ‖Σ‖_F^2 = σ_1^2 + ⋯ + σ_r^2. (The fact that orthogonal transformations preserve the Frobenius norm follows from ‖UAV‖_F^2 = tr((UAV)^T UAV) = tr(V^T A^T U^T UAV) = tr(V^T A^T AV) = tr(A^T A) = ‖A‖_F^2, using the cyclic property of the trace in the last step.) □
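Both norm identities are easy to confirm for the running example (an illustrative NumPy sketch; here ‖A‖_F = √(1+1+1+1) = 2 = √(3+1)):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
s = np.linalg.svd(A, compute_uv=False)   # [sqrt(3), 1]

# Operator norm equals the largest singular value (Theorem 17).
print(np.isclose(np.linalg.norm(A, 2), s[0]))                      # True

# Frobenius norm equals sqrt of the sum of squared singular values (Theorem 19).
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2))))  # True
```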
Definition 12 (Condition number).
Let A ∈ M_{m×n}(R) with rank(A) = r. The condition number of A is
κ(A) = σ_1/σ_r,
the ratio of the largest to the smallest nonzero singular value.

The condition number measures how close A is to being rank-deficient. If κ(A) is large, then A nearly collapses some direction — it stretches the input space very unevenly. In numerical computation, a large condition number means that solving Ax = b is sensitive to perturbations in b or in A itself. When A is square and invertible, we have the precise estimate

‖δx‖/‖x‖ ≤ κ(A) · ‖δb‖/‖b‖

for the relative error in the solution caused by a perturbation δb in the right-hand side.

Example 13.
For our matrix A with σ_1 = √3 and σ_2 = 1, the condition number is κ(A) = √3 ≈ 1.73. This is a well-conditioned matrix. By contrast, the matrix
B = \begin{pmatrix} 1 & 1 \\ 1 & 1 + 10^{-10} \end{pmatrix}
has singular values approximately 2 and 5 × 10^{-11}, giving κ(B) ≈ 4 × 10^{10}. Solving Bx = b is an ill-conditioned problem.
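Both condition numbers can be checked with np.linalg.cond, which by default returns the ratio of the largest to the smallest singular value (an illustrative sketch):

```python
import numpy as np

# The well-conditioned matrix from the running example.
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
print(np.linalg.cond(A))    # approx 1.732 = sqrt(3)

# The nearly rank-deficient matrix from Example 13.
B = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-10]])
print(np.linalg.cond(B))    # approx 4e10
```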
Remark 14.
The singular value decomposition unifies many threads from earlier in this text. Eigenvalues reveal the structure of a square matrix acting on its own space; singular values reveal the structure of any linear map between two spaces. The spectral theorem is the special case of the SVD when A is symmetric: the left and right singular vectors coincide with the eigenvectors, and the singular values are the absolute values of the eigenvalues. In this sense, the SVD is the natural culmination of the diagonalization program we have pursued throughout this book.
