The Singular Value Decomposition: Structure of Arbitrary Matrices
Every real m x n matrix factors as A = U Sigma V^T, where U and V are orthogonal and Sigma is diagonal — this is the singular value decomposition. We prove existence, show how the SVD yields optimal low-rank approximations (Eckart–Young theorem), and construct the Moore–Penrose pseudoinverse for least-squares solutions.
The spectral theorem gave us a powerful decomposition for symmetric matrices: every real symmetric matrix A can be written as A=QΛQT, where Q is orthogonal and Λ is diagonal. But this result applies only to square matrices, and indeed only to the symmetric ones among those. In applications, we routinely encounter matrices that are not square — a data matrix with more observations than variables, or a linear map between spaces of different dimensions. Eigenvalues are not even defined for such matrices. We need a more general tool.
The key observation is this: even when A itself is rectangular, the products ATA and AAT are always symmetric and square. If A∈Mm×n(R), then ATA is n×n and AAT is m×m, and both are symmetric. The spectral theorem applies to each of them. The singular value decomposition (SVD) exploits this fact to decompose A itself, revealing the fundamental geometric action of any linear map: a rotation, followed by a coordinate-wise scaling, followed by another rotation.
2. Singular Values
We begin by examining the matrix ATA more carefully.
Lemma 1 (Positive semidefiniteness of ATA).
For any A∈Mm×n(R), the matrix ATA is positive semidefinite. In particular, all eigenvalues of ATA are nonnegative.
Proof.
For any x∈Rn, we have
xT(ATA)x=(Ax)T(Ax)=∥Ax∥2≥0.
Since ATA is symmetric, the spectral theorem guarantees that all its eigenvalues are real. If λ is an eigenvalue with eigenvector v≠0, then λ∥v∥2=vT(ATA)v≥0, so λ≥0.□
Definition 1 (Singular values).
Let A∈Mm×n(R). Let λ1≥λ2≥⋯≥λn≥0 be the eigenvalues of ATA, listed in decreasing order and counted with multiplicity. The singular values of A are the numbers
σi=√λi, i=1,…,n.
We write σ1≥σ2≥⋯≥σn≥0. The number of nonzero singular values equals rank(A).
Lemma 3.
Let A∈Mm×n(R) with singular values σ1≥⋯≥σn≥0. Then rank(A)=rank(ATA), which equals the number of nonzero singular values.
Proof.
It suffices to show ker(A)=ker(ATA). If Ax=0, then ATAx=0. Conversely, if ATAx=0, then ∥Ax∥2=xTATAx=0, so Ax=0. Since the two null spaces coincide, the rank-nullity theorem gives rank(A)=rank(ATA). The rank of ATA is the number of its nonzero eigenvalues, which is the number of nonzero singular values.□
Example 2.
Let A = [1 1; 0 1; 1 0] (rows separated by semicolons). Then
ATA = [1 0 1; 1 1 0][1 1; 0 1; 1 0] = [2 1; 1 2].
The eigenvalues are λ=3 and λ=1, so the singular values are σ1=√3 and σ2=1.
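As a numerical sanity check, one can confirm that the square roots of the eigenvalues of ATA match the singular values returned by a library SVD. This is a minimal sketch using NumPy, which the text does not otherwise assume:

```python
import numpy as np

# The 3x2 matrix of Example 2: rows (1,1), (0,1), (1,0).
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

# Eigenvalues of A^T A in decreasing order, then their square roots.
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
sigma_from_eigs = np.sqrt(eigvals)

# Singular values computed directly (already sorted in decreasing order).
sigma_direct = np.linalg.svd(A, compute_uv=False)

print(sigma_from_eigs)                              # ≈ [1.732, 1.0], i.e. [√3, 1]
print(np.allclose(sigma_from_eigs, sigma_direct))   # True
```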
Remark 3.
One can equally well define singular values via the eigenvalues of AAT. The nonzero eigenvalues of ATA and AAT are the same: if ATAv=λv with λ≠0, then AAT(Av)=A(ATAv)=λ(Av), and Av≠0 (since ATAv=λv≠0), so λ is also an eigenvalue of AAT.
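Remark 3 is easy to test numerically. The sketch below (NumPy, with a random test matrix as an assumption) checks that the nonzero eigenvalues of ATA and AAT coincide:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 3))   # a generic 5x3 matrix (rank 3 almost surely)

# Eigenvalues of the 3x3 and 5x5 Gram matrices, in decreasing order.
e_small = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
e_big = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]

print(np.allclose(e_small, e_big[:3]))           # True: nonzero eigenvalues agree
print(np.allclose(e_big[3:], 0.0, atol=1e-10))   # True: the rest are numerically zero
```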
3. The SVD Theorem
We now state and prove the central result of this chapter.
Theorem 6 (Singular Value Decomposition).
Let A∈Mm×n(R) with rank(A)=r. Then there exist an orthogonal matrix U∈Mm×m(R), an orthogonal matrix V∈Mn×n(R), and a matrix Σ∈Mm×n(R) such that
A=UΣVT,
where Σ has the form
Σ = [D 0; 0 0] in block form, with D = diag(σ1,…,σr),
with σ1≥σ2≥⋯≥σr>0 on the diagonal and all other entries zero.
Proof.
Since ATA is a real symmetric n×n matrix, the spectral theorem provides an orthonormal basis {v1,…,vn} of Rn consisting of eigenvectors:
ATAvi=λivi, where λ1≥⋯≥λr>0 and λr+1=⋯=λn=0.
Set σi=√λi for i=1,…,r, and let V=(v1 ⋯ vn).
Step 1: Constructing the left singular vectors. For i=1,…,r, define
ui = (1/σi)Avi.
We claim that {u1,…,ur} is an orthonormal set in Rm. Indeed, for 1≤i,j≤r,
uiTuj = (1/(σiσj))(Avi)T(Avj) = (1/(σiσj)) viT(ATAvj) = (λj/(σiσj)) viTvj.
Since the vi are orthonormal, viTvj=δij; the expression vanishes when i≠j, and when i=j it equals λi/σi2=1. So uiTuj=δij.
Step 2: Extending to an orthonormal basis. The vectors u1,…,ur form an orthonormal set of r vectors in Rm. We extend this to an orthonormal basis {u1,…,um} of Rm (by Gram–Schmidt on any completion, for instance). Let U=(u1 ⋯ um).
Step 3: Verifying the decomposition. We must show A=UΣVT, or equivalently AV=UΣ. The i-th column of AV is Avi, and the i-th column of UΣ is σiui when i≤r and 0 when i>r. For i≤r:
Avi=σiui,
which holds by the definition of ui. For i>r: since vi∈ker(ATA)=ker(A), we have Avi=0, and the i-th column of UΣ is also 0. Therefore AV=UΣ, and since V is orthogonal, A=UΣVT.□
Remark 4.
The columns v1,…,vn of V are called the right singular vectors of A, and the columns u1,…,um of U are the left singular vectors. The right singular vectors are eigenvectors of ATA, and the left singular vectors are eigenvectors of AAT. One can verify the latter directly: for i≤r, AATui = (1/σi)AATAvi = (1/σi)A(λivi) = σiAvi = σi2ui.
The SVD can also be written in outer product form. Since A=UΣVT, expanding columnwise gives
A = ∑_{i=1}^{r} σiuiviT.
Each term σiuiviT is a rank-one matrix, and A is expressed as a sum of r rank-one matrices weighted by the singular values. This form is the starting point for low-rank approximation.
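The outer product form is easy to verify in code. A minimal sketch (NumPy, with a random test matrix as an assumption) rebuilds A term by term:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Thin SVD: U is 5x3, s holds the singular values, Vt is 3x3.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12))   # numerical rank

# Rebuild A as a sum of r rank-one matrices sigma_i * u_i v_i^T.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))
print(np.allclose(A, A_rebuilt))   # True
```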
4. Geometric Interpretation
The SVD reveals the geometry of any linear map x↦Ax with crystalline clarity. Writing A=UΣVT, the map decomposes into three stages:
VT: rotate (and possibly reflect) the input — this is an orthogonal transformation of Rn.
Σ: scale each coordinate axis independently by the corresponding singular value, mapping Rn into Rm.
U: rotate (and possibly reflect) the output — this is an orthogonal transformation of Rm.
The geometric meaning becomes vivid when we consider the image of the unit sphere Sn−1={x∈Rn:∥x∥=1} under A.
Theorem 8 (Image of the unit sphere).
Let A∈Mm×n(R) have singular values σ1≥⋯≥σr>0 and SVD A=UΣVT. The image A(Sn−1) is an ellipsoid in Rm (possibly degenerate if r<n) whose semi-axes have lengths σ1,…,σr and point in the directions u1,…,ur.
Proof.
Let x∈Sn−1 and write y=VTx. Since VT is orthogonal, ∥y∥=∥x∥=1, so y ranges over all of Sn−1 as x does. The point Σy has coordinates (σ1y1,…,σryr,0,…,0)T in Rm. As y ranges over Sn−1, the constraint y12+⋯+yn2=1 ensures that the image under Σ satisfies
(z1/σ1)2+⋯+(zr/σr)2≤1, where zi=σiyi,
with equality on the surface. This is an ellipsoid in the subspace spanned by the first r coordinate vectors, with semi-axes σ1,…,σr. The final rotation U maps the i-th standard basis vector to ui, so the semi-axes of the image ellipsoid point in the directions u1,…,ur.□
Example 5.
A 2×2 matrix with singular values 3 and 1 maps the unit circle to an ellipse with semi-major axis 3 and semi-minor axis 1. The directions of these axes are determined by the left singular vectors.
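Theorem 8 can be checked numerically by sampling the unit circle. The sketch below (NumPy, with a random 2×2 matrix as an assumption) confirms that the longest and shortest vectors in the image have lengths σ1 and σ2:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2))
s = np.linalg.svd(A, compute_uv=False)   # s[0] >= s[1]

# Sample points on the unit circle and push them through A.
theta = np.linspace(0.0, 2.0 * np.pi, 4001)
circle = np.vstack([np.cos(theta), np.sin(theta)])
radii = np.linalg.norm(A @ circle, axis=0)

# Semi-axes of the image ellipse match the singular values.
print(np.isclose(radii.max(), s[0], atol=1e-4))   # True
print(np.isclose(radii.min(), s[1], atol=1e-4))   # True
```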
5. Full SVD Computation for a 3×2 Matrix
Let us compute the full SVD of the matrix
A = [1 1; 0 1; 1 0].
Step 1: Compute ATA and its eigenvalues.
ATA = [1 0 1; 1 1 0][1 1; 0 1; 1 0] = [2 1; 1 2].
The characteristic polynomial is det(ATA−λI)=(2−λ)2−1=λ2−4λ+3=(λ−3)(λ−1). The eigenvalues are λ1=3 and λ2=1, giving singular values σ1=√3 and σ2=1.
Step 2: Find the right singular vectors. For λ1=3: solving (ATA−3I)v=0 gives v1=(1/√2)(1,1)T. For λ2=1: solving (ATA−I)v=0 gives v2=(1/√2)(1,−1)T. Thus
V = (1/√2)[1 1; 1 −1].
Step 3: Find the left singular vectors. We compute ui=(1/σi)Avi:
u1 = (1/√3)Av1 = (1/√3)·(1/√2)(2,1,1)T = (1/√6)(2,1,1)T,
u2 = (1/1)Av2 = (1/√2)(0,−1,1)T.
We extend to an orthonormal basis of R3 by finding u3⊥u1,u2. One verifies that u3=(1/√3)(−1,1,1)T is orthogonal to both u1 and u2 and has unit norm. Thus
U = [2/√6 0 −1/√3; 1/√6 −1/√2 1/√3; 1/√6 1/√2 1/√3], Σ = [√3 0; 0 1; 0 0].
Step 4: Verify. One can check directly that UΣVT=A. For instance, the first column of UΣVT is (1/√2)(√3·u1+1·u2) = (1/√2)[(1/√2)(2,1,1)T+(1/√2)(0,−1,1)T] = (1/2)(2,0,2)T = (1,0,1)T, which is indeed the first column of A.
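The hand computation above can also be verified mechanically. A short NumPy sketch, re-entering the factors computed in this section:

```python
import numpy as np

s2, s3, s6 = np.sqrt(2.0), np.sqrt(3.0), np.sqrt(6.0)

# The factors computed by hand in Section 5.
U = np.array([[2/s6,  0.0,  -1/s3],
              [1/s6, -1/s2,  1/s3],
              [1/s6,  1/s2,  1/s3]])
Sigma = np.array([[s3, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])
V = (1/s2) * np.array([[1.0,  1.0],
                       [1.0, -1.0]])

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

print(np.allclose(U @ Sigma @ V.T, A))   # True: the factorization reproduces A
print(np.allclose(U.T @ U, np.eye(3)))   # True: U is orthogonal
print(np.allclose(V.T @ V, np.eye(2)))   # True: V is orthogonal
```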
6. Low-Rank Approximation
The outer product form A = ∑_{i=1}^{r} σiuiviT suggests a natural way to approximate A: keep only the terms with the largest singular values. For k≤r, define the truncated SVD
Ak = ∑_{i=1}^{k} σiuiviT.
This is a rank-k matrix (since it is a sum of k rank-one matrices with orthogonal column and row spaces). The question is: how good is this approximation?
Theorem 10 (Eckart–Young–Mirsky).
Let A∈Mm×n(R) have singular values σ1≥⋯≥σr>0, and let Ak be its rank-k truncated SVD. Then for any matrix B of rank at most k,
∥A−Ak∥2 ≤ ∥A−B∥2 and ∥A−Ak∥F ≤ ∥A−B∥F,
where ∥⋅∥2 is the operator norm and ∥⋅∥F is the Frobenius norm. In particular,
∥A−Ak∥2 = σk+1 and ∥A−Ak∥F = √(σk+12+⋯+σr2).
The Eckart–Young–Mirsky theorem tells us that the truncated SVD solves the best rank-k approximation problem in a very strong sense: it is simultaneously optimal in both the operator norm and the Frobenius norm. The geometric intuition is clear: Ak retains the k most important “stretching directions” of the linear map, discarding those along which the map acts most weakly. The approximation error is controlled by the discarded singular values.
Remark 6.
This theorem is the mathematical foundation of principal component analysis (PCA) in statistics and of many data compression algorithms. If the singular values decay rapidly — that is, if σk+1 is much smaller than σ1— then a low-rank approximation captures most of the structure of A.
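Both optimality claims are easy to observe numerically. A minimal sketch (NumPy, with a random matrix and k=2 as arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k truncated SVD

# Errors predicted by Eckart-Young-Mirsky.
print(np.isclose(np.linalg.norm(A - Ak, 2), s[k]))   # True: operator norm = sigma_{k+1}
print(np.isclose(np.linalg.norm(A - Ak, 'fro'),
                 np.sqrt(np.sum(s[k:] ** 2))))       # True: Frobenius norm of the tail
```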
Example 7.
Returning to our matrix A with σ1=√3 and σ2=1, the best rank-one approximation is
A1 = σ1u1v1T = √3 · (1/√6)(2,1,1)T · (1/√2)(1 1) = (1/2)[2 2; 1 1; 1 1].
By the theorem, the approximation error is ∥A−A1∥2 = σ2 = 1.
7. The Pseudoinverse
The SVD provides a clean way to define an inverse-like object for any matrix, even one that is rectangular or singular.
Definition 8 (Pseudoinverse).
Let A∈Mm×n(R) have SVD A=UΣVT with r nonzero singular values. Define Σ+∈Mn×m(R) by
(Σ+)ij = 1/σi if i=j≤r, and 0 otherwise.
The Moore–Penrose pseudoinverse of A is
A+=VΣ+UT.
When A is square and invertible, A+=A−1, since Σ+=Σ−1 and A+=VΣ−1UT=(UΣVT)−1. In general, A+ is the unique matrix satisfying the four Moore–Penrose conditions: (i) AA+A=A, (ii) A+AA+=A+, (iii) (AA+)T=AA+, (iv) (A+A)T=A+A.
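The definition and the four conditions can be exercised in code. The sketch below builds A+ from the SVD as in Definition 8 (NumPy, with a random full-column-rank matrix as an assumption) and checks it against the library pseudoinverse:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))   # full column rank almost surely, so r = 3

U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma_plus = np.zeros((3, 5))
Sigma_plus[:3, :3] = np.diag(1.0 / s)   # 1/sigma_i on the diagonal
A_plus = Vt.T @ Sigma_plus @ U.T        # A+ = V Sigma+ U^T

# The four Moore-Penrose conditions.
print(np.allclose(A @ A_plus @ A, A))              # (i)
print(np.allclose(A_plus @ A @ A_plus, A_plus))    # (ii)
print(np.allclose((A @ A_plus).T, A @ A_plus))     # (iii)
print(np.allclose((A_plus @ A).T, A_plus @ A))     # (iv)
print(np.allclose(A_plus, np.linalg.pinv(A)))      # agrees with NumPy's pinv
```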
Theorem 14 (Pseudoinverse and least squares).
Let A∈Mm×n(R) and b∈Rm. The vector x+=A+b is the minimum-norm least-squares solution to Ax=b. That is, among all x minimizing ∥Ax−b∥, the vector x+ has the smallest norm ∥x∥.
Proof.
Write A=UΣVT and let c=UTb, y=VTx. Since U and V are orthogonal,
∥Ax−b∥2 = ∥ΣVTx−UTb∥2 = ∥Σy−c∥2 = ∑_{i=1}^{r}(σiyi−ci)2 + ∑_{i=r+1}^{m}ci2.
The second sum is independent of x, and the first sum is minimized by choosing yi=ci/σi for i=1,…,r. The components yr+1,…,yn do not appear in ∥Ax−b∥2, so any choice of these components gives a least-squares solution. To minimize ∥x∥2=∥y∥2=∑_{i=1}^{n}yi2, we set yi=0 for i>r. The resulting y is exactly Σ+c=Σ+UTb, so x+=Vy=VΣ+UTb=A+b.□
Example 9.
For our running example, A+=VΣ+UT where
Σ+ = [1/√3 0 0; 0 1 0].
If b=(1,1,1)T, then the minimum-norm least-squares solution is x+=A+b. One can verify that A+b=(1/3)(2,2)T, and indeed Ax+=(1/3)(4,2,2)T is the orthogonal projection of b onto the column space of A.
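Example 9 can be reproduced with a few library calls; a quick check (NumPy):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
b = np.array([1.0, 1.0, 1.0])

x_plus = np.linalg.pinv(A) @ b
print(x_plus)                            # ≈ [2/3, 2/3]

# Agrees with the standard least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_plus, x_lstsq))      # True

print(A @ x_plus)                        # ≈ [4/3, 2/3, 2/3], the projection of b
```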
8. Matrix Norms and the Condition Number
The singular values provide natural measures of the “size” of a matrix.
Definition 10 (Operator norm).
For A∈Mm×n(R), the operator norm (or spectral norm, or 2-norm) is
∥A∥2 = max_{x≠0} ∥Ax∥/∥x∥ = max_{∥x∥=1} ∥Ax∥.
Theorem 17.
∥A∥2=σ1, the largest singular value of A.
Proof.
By Theorem 8, the image of the unit sphere under A is an ellipsoid with semi-axes of lengths σ1≥⋯≥σr. The maximum distance from the origin to a point on this ellipsoid is σ1, attained in the direction u1. Therefore max_{∥x∥=1}∥Ax∥=σ1.
More explicitly, write x=∑_{i=1}^{n}civi with ∑ci2=1. Then Ax=∑_{i=1}^{r}ciσiui, so ∥Ax∥2 = ∑_{i=1}^{r}ci2σi2 ≤ σ12∑_{i=1}^{r}ci2 ≤ σ12. Equality holds when c1=1 and all other ci=0, i.e., x=v1.□
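Theorem 17 can be confirmed directly (NumPy, random matrix as an assumption; `np.linalg.norm(A, 2)` computes the operator norm):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A)
v1 = Vt[0, :]   # top right singular vector

print(np.isclose(np.linalg.norm(A, 2), s[0]))     # True: operator norm = sigma_1
print(np.isclose(np.linalg.norm(A @ v1), s[0]))   # True: the maximum is attained at v1
```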
Definition 11 (Frobenius norm).
The Frobenius norm of A=(aij) is ∥A∥F = √(∑i,j aij2).
Theorem 19.
∥A∥F = √(σ12+⋯+σr2).
Proof.
Since U and V are orthogonal, ∥A∥F=∥UΣVT∥F=∥Σ∥F. The entries of Σ are zero except for σ1,…,σr on the diagonal, so ∥Σ∥F2=σ12+⋯+σr2. (The fact that orthogonal transformations preserve the Frobenius norm follows from ∥UAV∥F2=tr((UAV)TUAV)=tr(VTATUTUAV)=tr(VTATAV)=tr(ATA)=∥A∥F2, using the cyclic property of the trace in the last step.)□
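Theorem 19 likewise checks out numerically (NumPy, random matrix as an assumption):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3))
s = np.linalg.svd(A, compute_uv=False)

# Frobenius norm equals the root-sum-of-squares of the singular values.
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s ** 2))))   # True
```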
Definition 12 (Condition number).
Let A∈Mm×n(R) with rank(A)=r. The condition number of A is
κ(A) = σ1/σr,
the ratio of the largest to the smallest nonzero singular value.
The condition number measures how close A is to being rank-deficient. If κ(A) is large, then A nearly collapses some direction — it stretches the input space very unevenly. In numerical computation, a large condition number means that solving Ax=b is sensitive to perturbations in b or in A itself. When A is square and invertible, we have the precise estimate
∥δx∥/∥x∥ ≤ κ(A) · ∥δb∥/∥b∥
for the relative error in the solution caused by a perturbation δb in the right-hand side.
Example 13.
For our matrix A with σ1=√3 and σ2=1, the condition number is κ(A)=√3≈1.73. This is a well-conditioned matrix. By contrast, the matrix B = [1 1; 1 1+10⁻¹⁰] has singular values approximately 2 and 5×10⁻¹¹, giving κ(B)≈4×10¹⁰. Solving Bx=b is an ill-conditioned problem.
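The ill-conditioning of B is easy to exhibit. The sketch below (NumPy) computes κ(B) and shows a tiny perturbation of b producing a disproportionately large change in the solution:

```python
import numpy as np

B = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-10]])
s = np.linalg.svd(B, compute_uv=False)
kappa = s[0] / s[-1]
print(kappa)   # ≈ 4e10

b = np.array([2.0, 2.0])
x = np.linalg.solve(B, b)
x_pert = np.linalg.solve(B, b + np.array([0.0, 1e-12]))

# A relative perturbation of b of roughly 3.5e-13 moves x by a relative ~7e-3,
# an amplification on the order of kappa(B), consistent with the bound above.
print(np.linalg.norm(x_pert - x) / np.linalg.norm(x))
```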
Remark 14.
The singular value decomposition unifies many threads from earlier in this text. Eigenvalues reveal the structure of a square matrix acting on its own space; singular values reveal the structure of any linear map between two spaces. The spectral theorem is the special case of the SVD when A is symmetric: the left and right singular vectors coincide with the eigenvectors, and the singular values are the absolute values of the eigenvalues. In this sense, the SVD is the natural culmination of the diagonalization program we have pursued throughout this book.