
Linear Algebra: A Complete Summary of Definitions, Theorems, and Proofs

A single-page survey of undergraduate linear algebra. From the axioms of a vector space to Jordan normal form, this article systematically organizes the definitions, theorems, and proofs that form the core of a first course. Includes a topic dependency diagram.

Folio Official
March 1, 2026

1. Introduction: How to Use This Article

This article provides a single-page panoramic view of undergraduate linear algebra—roughly one textbook's worth of material. For each topic, we present the key definitions, major theorems, and proof sketches in concise form.

How to use this page:

  • Newcomers: read the sections in order to build a coherent picture of the subject.

  • Review or exam preparation: jump to any section from the table of contents.

  • Quick theorem lookup: Section 13, “Theorem Directory,” collects all the major results in one list.

The dependency diagram below shows how the main topics are interrelated.

graph TD
    A["Axioms of a Vector Space"] --> B["Linear Independence, Bases, and Dimension"]
    B --> C["Linear Maps"]
    C --> D["Matrices and Representation"]
    D --> E["Systems of Equations and Row Reduction"]
    D --> F["Determinants"]
    C --> G["Kernel, Image, and the Rank-Nullity Theorem"]
    F --> H["Eigenvalues and Eigenvectors"]
    D --> H
    H --> I["Diagonalization"]
    A --> J["Inner Product Spaces"]
    B --> J
    J --> K["Orthonormal Bases and Gram-Schmidt"]
    K --> L["Orthogonal Projection and Least Squares"]
    H --> M["Jordan Normal Form"]
    I --> M

    style A fill:#f5f5f5,stroke:#333,color:#000
    style H fill:#f5f5f5,stroke:#333,color:#000
    style I fill:#f5f5f5,stroke:#333,color:#000
    style M fill:#f5f5f5,stroke:#333,color:#000
Remark 1.
This article is designed to accompany the “Linear Algebra Textbook” series. Links at the end of each section point to the corresponding full-length chapter for detailed exposition and complete proofs.

2. Vector Spaces

Definition 2 (Vector space).
A vector space over a field K is a set V equipped with an addition +:V×V→V and a scalar multiplication ⋅:K×V→V satisfying the following eight axioms:
  1. u+v=v+u (commutativity of addition).

  2. (u+v)+w=u+(v+w) (associativity of addition).

  3. There exists a zero vector 0 such that v+0=v.

  4. For every v there exists −v with v+(−v)=0.

  5. a(bv)=(ab)v (compatibility of scalar multiplication).

  6. 1⋅v=v.

  7. a(u+v)=au+av (distributivity over vector addition).

  8. (a+b)v=av+bv (distributivity over field addition).

Theorem 3 (Uniqueness of the zero vector).
The zero vector of a vector space V is unique.
Proof.
If 0 and 0′ are both zero vectors, then 0′=0′+0=0. □
Theorem 4 (Basic properties of scalar multiplication).
(1) 0⋅v=0; (2) a⋅0=0; (3) (−1)v=−v; (4) av=0 ⇒ a=0 or v=0.

Fundamental examples of vector spaces:

  • Rn: the space of n-tuples of real numbers. The prototypical vector space.

  • K[x]≤n​: polynomials of degree at most n. dim=n+1.

  • Mm×n​(K): all m×n matrices over K. dim=mn.

  • C[a,b]: the space of continuous functions on [a,b]. Infinite-dimensional.

Definition 5 (Subspace).
A subset W⊆V is a subspace of V if and only if (i) W≠∅, (ii) u,v∈W ⇒ u+v∈W, and (iii) a∈K, v∈W ⇒ av∈W.
Definition 6 (Sum and direct sum).
W1​+W2​={w1​+w2​∣wi​∈Wi​}. When the decomposition is unique, we write W1​⊕W2​ and call it a direct sum. W1​+W2​ is a direct sum if and only if W1​∩W2​={0}.
Theorem 7 (Dimension formula for subspaces).
dim(W1​+W2​)=dimW1​+dimW2​−dim(W1​∩W2​). In particular, for a direct sum, dim(W1​⊕W2​)=dimW1​+dimW2​.

For more details:

https://interconnectd.app/articles/MhAgcx5EoAtrvudzv8s1

3. Linear Independence, Bases, and Dimension

Definition 8 (Linear combination).
An expression a1​v1​+a2​v2​+⋯+an​vn​ (with ai​∈K) is called a linear combination of v1​,…,vn​. The set of all such linear combinations is denoted span{v1​,…,vn​}.
Definition 9 (Linear independence).
Vectors v1​,…,vn​ are linearly independent if the only solution to a1​v1​+⋯+an​vn​=0 is a1​=⋯=an​=0. Otherwise they are linearly dependent.
Theorem 10.
v1​,…,vn​ are linearly dependent if and only if some vk​ can be written as a linear combination of the remaining vectors.
Definition 11 (Basis and dimension).
A linearly independent set that spans V is called a basis for V. The representation v=∑ai​vi​ with respect to a basis is unique; the tuple (a1​,…,an​) is called the coordinate vector of v.
Theorem 12 (Steinitz exchange lemma).
If {v1​,…,vm​} spans V and {w1​,…,wn​} is linearly independent, then n≤m. Moreover, n of the vi​ can be replaced by w1​,…,wn​ so that the resulting set still spans V.
Proof.
Since w1 = ∑ c_j v_j with some c_j ≠ 0 (we may assume c_1 ≠ 0 after reindexing), replacing v_1 by w_1 preserves the span of V. Repeat for w_2, w_3, … in succession. If n>m, then after m steps the v_i are exhausted and w_{m+1} lies in the span of {w_1,…,w_m}, contradicting linear independence. □
Theorem 13 (Invariance of basis cardinality).
Any two bases of V have the same number of elements. This common value is called the dimension of V, written dimV.
Theorem 14.
When dimV=n: (1) any n linearly independent vectors form a basis; (2) any n vectors that span V form a basis; (3) every linearly independent set contains at most n vectors.
Example 15.
dimRn=n, dimK[x]≤n​=n+1, dimMm×n​(K)=mn.
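Linear independence can be tested mechanically: by Theorems 10 and 14, vectors are independent exactly when the matrix having them as rows has rank equal to their number. Below is a minimal sketch in Python; the helper names `rank` and `independent` are hypothetical, and exact rational arithmetic is used to avoid floating-point pivoting issues.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    for col in range(len(m[0]) if m else 0):
        # find a row at or below position r with a nonzero entry in this column
        piv = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        # eliminate the column below the pivot
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def independent(vectors):
    """Vectors are linearly independent iff the rank equals their count."""
    return rank(vectors) == len(vectors)

print(independent([[1, 0, 0], [0, 1, 0], [1, 1, 0]]))  # False: v3 = v1 + v2
print(independent([[1, 1, 0], [1, 0, 1]]))             # True
```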

For more details:

https://interconnectd.app/articles/TJTXXNk2vKMrfFo6rFAw

4. Linear Maps

Definition 16 (Linear map).
A map T:V→W is a linear map (linear transformation) if T(au+bv)=aT(u)+bT(v) for all a,b∈K and all u,v∈V.
Theorem 17 (Determination by values on a basis).
Let {v1​,…,vn​} be a basis for V. For any choice of vectors w1​,…,wn​∈W, there is a unique linear map T:V→W satisfying T(vi​)=wi​.
Definition 18 (Kernel and image).
ker T = {v∈V ∣ T(v)=0} (kernel, or null space)
Im T = {T(v) ∣ v∈V} (image, or range)
kerT is a subspace of V and ImT is a subspace of W.
Theorem 19.
T is injective if and only if kerT={0}.
Proof.
(⇒) If v∈kerT, then T(v)=0=T(0). Injectivity gives v=0. (⇐) If T(u)=T(v), then T(u−v)=0, so u−v∈kerT={0}, whence u=v. □
Theorem 20 (Rank–nullity theorem).
If V is finite-dimensional and T:V→W is a linear map, then
dimV=dimkerT+dimImT
Proof.
Extend a basis {u1​,…,ur​} of kerT to a basis {u1​,…,ur​,v1​,…,vs​} of V (where r+s=dimV). One then shows that {T(v1​),…,T(vs​)} is a basis for ImT. □
Example 21.
Let T:R3→R2 be defined by T(x,y,z)=(x+y,y+z). Then kerT={(t,−t,t)} (dim=1). By the rank–nullity theorem, dimImT=3−1=2, so T is surjective.
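Example 21 can be checked directly in Python. The sketch below defines `T` from the formula, confirms that the claimed kernel vectors map to zero, and exhibits an explicit preimage for every (a,b) (the choice (a, 0, b) is one convenient witness of surjectivity, consistent with dim Im T = 3 − 1 = 2).

```python
def T(x, y, z):
    """The map of Example 21: T(x, y, z) = (x + y, y + z)."""
    return (x + y, y + z)

# Kernel: every (t, -t, t) maps to zero, so dim ker T >= 1.
for t in range(-3, 4):
    assert T(t, -t, t) == (0, 0)

# Surjectivity: (a, b) has the explicit preimage (a, 0, b),
# matching dim Im T = 3 - 1 = 2 from rank-nullity.
for a in range(-2, 3):
    for b in range(-2, 3):
        assert T(a, 0, b) == (a, b)
print("rank-nullity check passed")
```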
Definition 22 (Isomorphism).
A linear map T that is bijective is called an isomorphism. For finite-dimensional spaces, V≅W if and only if dimV=dimW.

For more details:

https://interconnectd.app/articles/HvkZC8lvgxbM5wcf5cAJ

5. Matrices and Representation

Definition 23 (Representation matrix).
Given a basis B={v1​,…,vn​} for V and a basis C={w1​,…,wm​} for W, the representation matrix (or matrix of T with respect to B and C) is the matrix A=(aij​) defined by T(vj​)=∑i=1m​aij​wi​. We write [T]BC​=A. The j-th column of A is the coordinate vector of T(vj​) with respect to C.
Theorem 24 (Composition corresponds to matrix multiplication).
If T:V→W and S:W→U have representation matrices A and B respectively, then [S∘T]=BA. That is, composition of linear maps corresponds to multiplication of matrices.
Definition 25 (Change-of-basis matrix).
The matrix P effecting a change from basis B to basis B′ is defined by vj′​=∑pij​vi​.
Theorem 26 (Change-of-basis formula).
If A is the matrix of T:V→V with respect to basis B, and A′ is the matrix with respect to basis B′, then
A′=P−1AP
where P is the change-of-basis matrix from B to B′.
Proof.
Let x and x′ denote the coordinate vectors of v with respect to B and B′, respectively. Then x=Px′, and the B′-coordinates of Ax are P−1APx′. □
Definition 27 (Similarity).
If A′=P−1AP for some invertible matrix P, then A and A′ are said to be similar. Similar matrices share the same eigenvalues, determinant, trace, and rank.
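The invariance of trace and determinant under similarity can be verified on a small example. A minimal Python sketch, assuming hypothetical helpers `matmul` and `inv2` for 2×2 matrices:

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    a, b = M[0]; c, d = M[1]
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[4, -2], [1, 1]]
P = [[1, 2], [1, 1]]
Ap = matmul(matmul(inv2(P), A), P)  # A' = P^{-1} A P; for this P, A' = diag(2, 3)

trace = lambda M: M[0][0] + M[1][1]
det2 = lambda M: M[0][0] * M[1][1] - M[0][1] * M[1][0]
assert trace(Ap) == trace(A) and det2(Ap) == det2(A)
print(Ap)
```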
Theorem 28 (Rank of a matrix).
The column rank (dimension of the column space) equals the row rank (dimension of the row space). This common value is denoted rankA.
Theorem 29 (Equivalent conditions for invertibility).
For an n×n matrix A, the following are equivalent:
  1. A is invertible.

  2. rankA=n.

  3. detA=0.

  4. The only solution to Ax=0 is x=0.

  5. The columns of A are linearly independent.

  6. The reduced row echelon form (RREF) of A is In​.

Theorem 30 (Properties of the inverse).
If A and B are invertible, then (A^{−1})^{−1} = A,  (AB)^{−1} = B^{−1}A^{−1},  (A^T)^{−1} = (A^{−1})^T.

For more details:

https://interconnectd.app/articles/3jZ26AxT53YgunAiwg60

6. Systems of Linear Equations and Row Reduction

A system of m equations in n unknowns may be written as Ax=b (where A∈Mm×n​(K)).

Definition 31 (Elementary row operations).
(1) Interchange two rows: Ri↔Rj. (2) Scale a row: Ri→cRi (c≠0). (3) Add a multiple of one row to another: Ri→Ri+cRj. None of these operations changes the solution set.
Definition 32 (Reduced row echelon form (RREF)).
A matrix is in RREF if: (1) all zero rows are at the bottom; (2) the leading entry (pivot) in each nonzero row is 1; (3) each pivot lies to the right of the pivot in the row above; (4) each pivot is the only nonzero entry in its column. Every matrix can be brought to a unique RREF by elementary row operations.
Theorem 33 (Solution space of a homogeneous system).
The solution set of Ax=0 is a subspace kerA of Kn, with dimkerA=n−rankA.
Theorem 34 (Structure of solutions to a nonhomogeneous system).
If Ax=b has a particular solution x0​, then the general solution is
x=x0​+kerA={x0​+h∣Ah=0}
(a particular solution plus the general solution of the associated homogeneous system).
Proof.
If Ax=b and Ax0​=b, then A(x−x0​)=0, so x−x0​∈kerA. The converse is immediate. □
Theorem 35 (Existence and uniqueness of solutions).
For the system Ax=b with n unknowns:
  1. A solution exists if and only if rankA=rank(A∣b).

  2. When a solution exists, it is unique if and only if rankA=n (no free variables).

  3. When a solution exists, the number of free variables is n−rankA.

Example 36.
The system (1 1 1; 1 2 3)x = (3, 6)^T has augmented matrix with RREF (1 0 −1 ∣ 0; 0 1 2 ∣ 3). The free variable is x_3 = t. The general solution is x = (t, 3−2t, t)^T = (0,3,0)^T + t(1,−2,1)^T.
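The row-reduction procedure itself is easy to mechanize. Below is a minimal Gauss–Jordan sketch in Python (the function name `rref` is an assumption of this sketch, not a library call), applied to the augmented matrix of Example 36; exact fractions avoid rounding.

```python
from fractions import Fraction

def rref(M):
    """Reduced row echelon form via Gauss-Jordan elimination (exact arithmetic)."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        M[r] = [x / M[r][c] for x in M[r]]       # scale the pivot to 1
        for i in range(len(M)):
            if i != r and M[i][c] != 0:          # clear the rest of the column
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
    return M

# Augmented matrix (A | b) for Example 36; result rows: [1 0 -1 | 0], [0 1 2 | 3]
R = rref([[1, 1, 1, 3], [1, 2, 3, 6]])
print(R)
```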

For more details:

https://interconnectd.app/articles/l1ijcNk6xu71svk232tF

7. The Determinant

Definition 37 (Determinant).
The determinant of an n×n matrix A=(aij​) is
det A = ∑_{σ∈S_n} sgn(σ) ∏_{i=1}^{n} a_{i,σ(i)}
where Sn​ is the symmetric group on n letters and sgn(σ) denotes the sign of the permutation σ.
Example 38.
For n=2: det(a b; c d) = ad − bc. Geometrically, this is the signed area of the parallelogram spanned by the column vectors.
Theorem 39 (Fundamental properties of the determinant).
  1. Multilinearity: the determinant is linear in each row.

  2. Alternating property: interchanging two rows reverses the sign.

  3. Normalization: detIn​=1.

  4. If two rows are equal, then detA=0.

  5. Adding a scalar multiple of one row to another does not change detA.

  6. detAT=detA.

Definition 40 (Cofactor).
The (i,j) cofactor is ã_{ij} = (−1)^{i+j} M_{ij}, where M_{ij} is the minor obtained by deleting row i and column j.
Theorem 41 (Cofactor expansion).
Expansion along row i: det A = ∑_{j=1}^{n} a_{ij} ã_{ij}. Expansion along column j: det A = ∑_{i=1}^{n} a_{ij} ã_{ij}.
Example 42.
A = (2 1 3; 0 −1 2; 1 0 1). Expanding along the first row: det A = 2(−1) − 1(−2) + 3(1) = 3.
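Cofactor expansion translates directly into a short recursive routine. A sketch (the helper name `det` is an assumption; the recursion is exponential-time, so this is only for small matrices) reproducing the computation of Example 42:

```python
def det(M):
    """Determinant via cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]  # delete row 0, column j
        total += (-1) ** j * M[0][j] * det(minor)
    return total

A = [[2, 1, 3], [0, -1, 2], [1, 0, 1]]
print(det(A))  # 3
```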
Theorem 43 (Multiplicativity of the determinant).
det(AB)=detA⋅detB.
Proof.
If det A = 0, then rank(AB) ≤ rank A < n, so det(AB) = 0 = det A ⋅ det B. If det A ≠ 0, write A as a product of elementary matrices and verify the identity step by step using the fundamental properties. □
Theorem 44.
A is invertible if and only if det A ≠ 0. When A is invertible, det(A^{−1}) = (det A)^{−1}.
Theorem 45 (Cramer's rule).
If A is an invertible n×n matrix, the solution of Ax=b is given by xi​=detAi​/detA, where Ai​ is A with its i-th column replaced by b.
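Cramer's rule is a few lines once a determinant routine is available. A minimal sketch (the helper names `det` and `cramer` are assumptions; practical solvers use elimination instead, since this approach is far more expensive):

```python
from fractions import Fraction

def det(M):
    """Determinant via cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j + 1:] for r in M[1:]])
               for j in range(len(M)))

def cramer(A, b):
    """Solve Ax = b by Cramer's rule: x_i = det(A_i) / det(A)."""
    d = Fraction(det(A))
    xs = []
    for i in range(len(A)):
        # A_i: replace column i of A with b
        Ai = [row[:i] + [bi] + row[i + 1:] for row, bi in zip(A, b)]
        xs.append(det(Ai) / d)
    return xs

x = cramer([[2, 1], [1, 3]], [5, 10])
# solution (1, 3): check 2*1 + 1*3 = 5 and 1*1 + 3*3 = 10
print(x)
```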
Theorem 46 (Adjugate formula for the inverse).
A^{−1} = (1/det A) Ã^T, where Ã^T is the transpose of the cofactor matrix (the classical adjugate).

For more details:

https://interconnectd.app/articles/5849O3iEwBX0eWcI54mw

8. Eigenvalues and Eigenvectors

Definition 47 (Eigenvalue and eigenvector).
A scalar λ∈K is an eigenvalue of A if there exists a nonzero vector v satisfying Av=λv. Such a vector v is called an eigenvector corresponding to λ.
Definition 48 (Eigenspace).
The eigenspace of λ is Vλ​=ker(A−λI)={v∣Av=λv}. It is a subspace of V.
Theorem 49.
λ is an eigenvalue of A if and only if det(A−λI)=0.
Definition 50 (Characteristic polynomial).
p_A(λ) = det(A−λI) = (−1)^n λ^n + (−1)^{n−1}(tr A) λ^{n−1} + ⋯ + det A
Theorem 51.
Similar matrices have the same characteristic polynomial.
Proof.
det(P−1AP−λI)=det(P−1(A−λI)P)=det(A−λI). □
Definition 52 (Algebraic and geometric multiplicity).
The algebraic multiplicity of λ0​ is the multiplicity of (λ−λ0​) as a factor of pA​(λ). The geometric multiplicity of λ0​ is dimVλ0​​. The inequality 1≤geometric multiplicity≤algebraic multiplicity always holds.
Example 53.
A = (2 1; 0 2): the eigenvalue λ=2 has algebraic multiplicity 2 and V_2 = span{(1,0)^T}, so the geometric multiplicity is 1.
Theorem 54 (Linear independence of eigenvectors for distinct eigenvalues).
If λ1​,…,λk​ are distinct eigenvalues and v1​,…,vk​ are corresponding eigenvectors, then v1​,…,vk​ are linearly independent.
Proof.
Proceed by induction on k. From ∑ c_i v_i = 0, apply A and subtract λ_k times the original relation to obtain ∑_{i=1}^{k−1} c_i(λ_i − λ_k) v_i = 0. By the inductive hypothesis and λ_i ≠ λ_k, each c_i = 0 for i < k; then c_k v_k = 0 forces c_k = 0. □
Theorem 55 (Trace and determinant via eigenvalues).
If λ1​,…,λn​ are the eigenvalues of A (counted with multiplicity), then trA=∑λi​ and detA=∏λi​.
Theorem 56 (Cayley–Hamilton theorem).
If A is an n×n matrix with characteristic polynomial pA​(λ), then pA​(A)=O.
Example 57.
A = (1 2; 3 4): p_A(λ) = λ^2 − 5λ − 2. One can verify by direct computation that A^2 − 5A − 2I = O.
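Example 57 can be confirmed in a couple of lines; the sketch below (plain Python, hard-coded to the 2×2 case) evaluates p_A(A) = A² − 5A − 2I entrywise.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
A2 = matmul(A, A)
# p_A(A) = A^2 - 5A - 2I, computed entry by entry
pA = [[A2[i][j] - 5 * A[i][j] - 2 * (i == j) for j in range(2)] for i in range(2)]
print(pA)  # [[0, 0], [0, 0]]
```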

For more details:

https://interconnectd.app/articles/y42I8nHkIhSuCFllFTM0

9. Diagonalization

Definition 58 (Diagonalizability).
A matrix A is diagonalizable if there exists an invertible matrix P such that P−1AP=diag(λ1​,…,λn​) is a diagonal matrix.
Theorem 59 (Necessary and sufficient conditions for diagonalizability).
The following are equivalent:
  1. A is diagonalizable.

  2. A possesses n linearly independent eigenvectors.

  3. For every eigenvalue λi​, the geometric multiplicity equals the algebraic multiplicity.

Proof.
(1⇔2): P−1AP=D if and only if AP=PD, which holds if and only if the columns of P are eigenvectors. P is invertible if and only if these columns are linearly independent. □
Remark 60.
If A has n distinct eigenvalues, then A is diagonalizable (since eigenvectors for distinct eigenvalues are linearly independent). The converse does not hold.

Diagonalization procedure:

  1. Solve pA​(λ)=det(A−λI)=0 to find the eigenvalues.

  2. For each eigenvalue λi​, find a basis for ker(A−λi​I) (the eigenvectors).

  3. Check that the geometric multiplicity equals the algebraic multiplicity for every eigenvalue (if not, A is not diagonalizable).

  4. Form the matrix P whose columns are the eigenvectors; then D=P−1AP.

Example 61.
A = (4 −2; 1 1): p_A = (λ−2)(λ−3). Eigenvectors: v_1 = (1,1)^T for λ=2, v_2 = (2,1)^T for λ=3. Setting P = (1 2; 1 1) gives P^{−1}AP = (2 0; 0 3).

Computing A^n: if A = PDP^{−1}, then A^n = PD^nP^{−1} = P diag(λ_1^n, …, λ_n^n) P^{−1}.
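This formula for A^n is easy to sanity-check against repeated multiplication. A Python sketch using the diagonalization of Example 61 (the helper `matmul` and the explicit 2×2 inverse are conveniences of this sketch):

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[4, -2], [1, 1]]                 # the matrix of Example 61
P = [[1, 2], [1, 1]]                  # columns are eigenvectors for 2 and 3
detP = P[0][0] * P[1][1] - P[0][1] * P[1][0]
Pinv = [[Fraction(P[1][1], detP), Fraction(-P[0][1], detP)],
        [Fraction(-P[1][0], detP), Fraction(P[0][0], detP)]]

n = 5
Dn = [[2 ** n, 0], [0, 3 ** n]]       # D^n = diag(2^n, 3^n)
An_diag = matmul(matmul(P, Dn), Pinv)

# compare with n-fold multiplication starting from the identity
An_direct = [[1, 0], [0, 1]]
for _ in range(n):
    An_direct = matmul(An_direct, A)

assert An_diag == An_direct
print(An_diag)
```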

Theorem 62 (Schur triangularization theorem).
Over K=C, every square matrix is unitarily similar to an upper triangular matrix. In particular, triangularization is always possible even when diagonalization is not.

For more details:

https://interconnectd.app/articles/GmSReTHwXXaJxx5v0Og7

10. Inner Product Spaces and Orthonormal Bases

Definition 63 (Inner product).
An inner product on a real vector space V is a map ⟨⋅,⋅⟩:V×V→R satisfying: (1) symmetry: ⟨u,v⟩=⟨v,u⟩; (2) linearity in the first argument; (3) positive definiteness: ⟨v,v⟩≥0, with equality if and only if v=0.

Examples of inner products:

  • Standard inner product on R^n: ⟨x,y⟩ = ∑ x_i y_i = x^T y.

  • Function spaces: ⟨f,g⟩ = ∫_a^b f(x)g(x) dx.

  • Matrix spaces: ⟨A,B⟩ = tr(A^T B).

Definition 64 (Norm and orthogonality).
∥v∥ = √⟨v,v⟩ is the norm of v. When ⟨u,v⟩=0, the vectors u and v are orthogonal, written u⊥v.
Theorem 65 (Cauchy–Schwarz inequality).
∣⟨u,v⟩∣≤∥u∥⋅∥v∥. Equality holds if and only if u and v are linearly dependent.
Proof.
For v=0, the inequality 0≤∥u−tv∥2=∥u∥2−2t⟨u,v⟩+t2∥v∥2 holds for all t. Substituting t=⟨u,v⟩/∥v∥2 and rearranging yields the result. □
Theorem 66 (Triangle inequality).
∥u+v∥≤∥u∥+∥v∥.
Definition 67 (Orthonormal basis).
A basis {e1​,…,en​} satisfying ⟨ei​,ej​⟩=δij​ is called an orthonormal basis (ONB). With respect to an ONB, v=∑⟨v,ei​⟩ei​ (i.e., coordinates are computed by taking inner products).
Theorem 68 (Gram–Schmidt orthonormalization).
Given a linearly independent set {v1​,…,vn​}, one can construct an orthonormal set {e1​,…,en​} by the procedure:
u_k = v_k − ∑_{j=1}^{k−1} ⟨v_k, e_j⟩ e_j,   e_k = u_k / ∥u_k∥
At each step, span{e1​,…,ek​}=span{v1​,…,vk​}.
Example 69.
Let v_1 = (1,1,0)^T and v_2 = (1,0,1)^T. Then e_1 = (1/√2)(1,1,0)^T. Next, u_2 = v_2 − ⟨v_2, e_1⟩e_1 = (1/2, −1/2, 1)^T, and e_2 = u_2/∥u_2∥.
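The procedure of Theorem 68 in executable form, run on the vectors of Example 69 (the function name `gram_schmidt` is an assumption; floating point is used, so orthonormality is checked up to tolerance):

```python
import math

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors (Theorem 68)."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    es = []
    for v in vectors:
        u = list(v)
        for e in es:                      # subtract projections onto earlier e_j
            c = dot(v, e)
            u = [a - c * b for a, b in zip(u, e)]
        norm = math.sqrt(dot(u, u))
        es.append([a / norm for a in u])
    return es

e1, e2 = gram_schmidt([[1, 1, 0], [1, 0, 1]])
dot = lambda u, v: sum(a * b for a, b in zip(u, v))
assert abs(dot(e1, e2)) < 1e-12          # orthogonal
assert abs(dot(e1, e1) - 1) < 1e-12      # unit length
print(e1, e2)
```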
Definition 70 (Orthogonal complement).
W⊥={v∈V∣⟨v,w⟩=0 for all w∈W}.
Theorem 71 (Orthogonal decomposition).
For a subspace W of a finite-dimensional inner product space V: V=W⊕W⊥, dimW⊥=dimV−dimW, and (W⊥)⊥=W.
Definition 72 (Orthogonal projection).
Writing v = w + w⊥ with w∈W and w⊥∈W⊥, the vector w = proj_W(v) is called the orthogonal projection of v onto W. If {e_1,…,e_k} is an ONB for W, then proj_W(v) = ∑_{i=1}^{k} ⟨v, e_i⟩ e_i.
Theorem 73 (Best approximation theorem).
projW​(v) is the closest point in W to v: ∥v−projW​(v)∥≤∥v−w∥ for all w∈W.
Proof.
Write v̂ = proj_W(v). Then ∥v−w∥^2 = ∥v−v̂∥^2 + ∥v̂−w∥^2 (since v−v̂ ⊥ v̂−w). Because ∥v̂−w∥^2 ≥ 0, it follows that ∥v−w∥ ≥ ∥v−v̂∥. □
Theorem 74 (Normal equations (least squares)).
When Ax=b has no solution, the least-squares solution x̂ (the minimizer of ∥Ax−b∥) satisfies A^T A x̂ = A^T b.
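As an illustration of the normal equations, here is a least-squares line fit y ≈ c0 + c1·x through three points, solving the 2×2 system AᵀAc = Aᵀy exactly by hand (the data points are invented for this sketch):

```python
from fractions import Fraction

# Fit y ≈ c0 + c1*x to three points by least squares; the columns of A are [1, x].
xs, ys = [0, 1, 2], [1, 2, 2]
A = [[1, x] for x in xs]

# Normal equations: (A^T A) c = A^T y, a 2x2 system solved here by Cramer's rule.
AtA = [[sum(A[i][r] * A[i][c] for i in range(3)) for c in range(2)] for r in range(2)]
Atb = [sum(A[i][r] * ys[i] for i in range(3)) for r in range(2)]
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
c0 = Fraction(Atb[0] * AtA[1][1] - Atb[1] * AtA[0][1], det)
c1 = Fraction(AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0], det)
# best fit: y = 7/6 + x/2
print(c0, c1)
```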

For more details:

https://interconnectd.app/articles/IRLofHq7X10eobXlzRcr

11. Jordan Normal Form

Definition 75 (Jordan block).
J_k(λ) ∈ M_k(K) is the k×k matrix whose diagonal entries are λ, whose superdiagonal entries are 1, and whose other entries are all 0; for example, J_3(λ) = (λ 1 0; 0 λ 1; 0 0 λ). In particular, J_1(λ) = (λ) reduces to a single diagonal entry.
Definition 76 (Generalized eigenspace).
For an eigenvalue λ of algebraic multiplicity m, the generalized eigenspace is W_λ = ker(A−λI)^m (dim W_λ = m). The ordinary eigenspace V_λ = ker(A−λI) is a subspace of W_λ.
Theorem 77 (Direct-sum decomposition via generalized eigenspaces).
If λ1​,…,λs​ are the distinct eigenvalues of A, then Kn=Wλ1​​⊕⋯⊕Wλs​​.
Theorem 78 (Existence and uniqueness of the Jordan normal form).
Over K=C, for every n×n matrix A there exists an invertible matrix P such that
P−1AP=diag(Jk1​​(λ1​),Jk2​​(λ2​),…,Jkr​​(λr​))
The Jordan normal form is unique up to the ordering of the Jordan blocks.
Remark 79.
A is diagonalizable if and only if every Jordan block is 1×1.
Example 80.
Suppose pA​(λ)=(λ−2)3(λ−5) with dimV2​=2 and dimV5​=1. Then the Jordan blocks for λ=2 are J2​(2) and J1​(2), and for λ=5 we have J1​(5):
J = diag(J_2(2), J_1(2), J_1(5)) = (2 1 0 0; 0 2 0 0; 0 0 2 0; 0 0 0 5)
Definition 81 (Minimal polynomial).
The minimal polynomial m_A(λ) is the monic polynomial of least degree satisfying m_A(A)=O. It divides the characteristic polynomial and shares the same roots.
Theorem 82.
A is diagonalizable if and only if m_A(λ) has no repeated roots. In general, m_A(λ) = ∏_i (λ−λ_i)^{k_i}, where k_i is the size of the largest Jordan block associated with λ_i.
Theorem 83 (Matrix exponential).
e^{tJ_k(λ)} = e^{λt} M, where M is the upper triangular k×k matrix with (i,j) entry t^{j−i}/(j−i)! for j ≥ i (diagonal 1, superdiagonal t, then t^2/2!, …, up to t^{k−1}/(k−1)! in the corner).
Writing J_k(λ) = λI + N (where N^k = O), we have e^{tJ_k(λ)} = e^{λt} e^{tN} (a finite sum).
Example 84.
A = (2 1; 0 2) = J_2(2). Then e^{tA} = e^{2t}(1 t; 0 1). For the system of differential equations x′ = Ax with x(0) = (1,1)^T, the solution is x(t) = e^{2t}(1+t, 1)^T.
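The closed form for e^{tJ_2(2)} can be checked against the defining series Σ (tA)^k / k!. A Python sketch (the helper names and the truncation length 60, ample for t = 1, are choices of this sketch):

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm_series(M, terms=60):
    """Truncated exponential series sum_k M^k / k! for a 2x2 matrix."""
    result = [[1.0, 0.0], [0.0, 1.0]]   # k = 0 term: identity
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]  # now M^k / k!
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

t = 1.0
S = expm_series([[2 * t, t], [0.0, 2 * t]])          # e^{tA} for A = J_2(2)
closed = [[math.exp(2 * t), t * math.exp(2 * t)],    # e^{2t} (1 t; 0 1)
          [0.0, math.exp(2 * t)]]
assert all(abs(S[i][j] - closed[i][j]) < 1e-9 for i in range(2) for j in range(2))
print("series matches closed form")
```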

For more details:

https://interconnectd.app/articles/8ddUxygIQW8xqOeSooOn

12. The Big Picture of Linear Algebra

graph LR
    A["Vector Spaces<br/>8 Axioms"] --> B["Bases and Dimension<br/>Steinitz"]
    B --> C["Linear Maps<br/>Rank-Nullity Theorem"]
    C --> D["Matrices<br/>Representation"]
    D --> E["Determinants<br/>Multiplicativity"]
    E --> F["Eigenvalues<br/>Characteristic Polynomial"]
    F --> G["Diagonalization<br/>A = PDP⁻¹"]
    G --> H["Jordan Normal Form<br/>Generalized Eigenspaces"]
    B --> I["Inner Product Spaces<br/>Gram-Schmidt"]
    I --> J["Projection and Least Squares"]

    style A fill:#f5f5f5,stroke:#333,color:#000
    style F fill:#f5f5f5,stroke:#333,color:#000
    style H fill:#f5f5f5,stroke:#333,color:#000

13. Theorem Directory

A one-line summary of the major theorems of undergraduate linear algebra.

  1. Uniqueness of the zero vector: the zero vector 0 is unique.

  2. Steinitz exchange lemma: the cardinality of a linearly independent set ≤ the cardinality of a spanning set.

  3. Invariance of basis cardinality: any two bases have the same number of elements.

  4. Dimension formula: dim(W1​+W2​)=dimW1​+dimW2​−dim(W1​∩W2​).

  5. Rank–nullity theorem: dimV=dimkerT+dimImT.

  6. Row rank equals column rank: rank is well-defined.

  7. Multiplicativity of the determinant: det(AB)=detA⋅detB.

  8. Cofactor expansion: det A = ∑_j a_{ij} ã_{ij}.

  9. Cramer's rule: xi​=detAi​/detA.

  10. Linear independence of eigenvectors for distinct eigenvalues.

  11. Cayley–Hamilton theorem: pA​(A)=O.

  12. Diagonalizability criterion: geometric multiplicity = algebraic multiplicity for every eigenvalue.

  13. Cauchy–Schwarz inequality: ∣⟨u,v⟩∣≤∥u∥∥v∥.

  14. Gram–Schmidt orthonormalization.

  15. Best approximation theorem: the orthogonal projection is the nearest point.

  16. Normal equations: A^T A x̂ = A^T b.

Additionally, the following structural results:

  • Schur triangularization theorem: every matrix is unitarily similar to an upper triangular matrix (over C).

  • Existence and uniqueness of the Jordan normal form: every matrix is similar to a Jordan form (over C).

  • Minimal polynomial and diagonalizability: mA​ has no repeated roots if and only if A is diagonalizable.

14. Appendix: Notation and Terminology

Symbol Name Meaning
K Field Usually R or C
V,W Vector spaces Sets satisfying the eight axioms
dimV Dimension of V Number of elements in a basis
span{v1​,…,vn​} Span Set of all linear combinations
W1​⊕W2​ Direct sum Sum with unique decomposition
T:V→W Linear map Map preserving addition and scalar multiplication
kerT Kernel {v∣T(v)=0}
ImT Image {T(v)∣v∈V}
[T]BC​ Representation matrix Matrix of T with respect to bases B and C
rankA Rank Dimension of the column (or row) space
detA Determinant Defined via signed permutations
trA Trace Sum of diagonal entries =∑λi​
AT Transpose (AT)ij​=Aji​
A−1 Inverse AA−1=A−1A=I
pA​(λ) Characteristic polynomial det(A−λI)
Vλ​ Eigenspace ker(A−λI)
Wλ​ Generalized eigenspace ker(A−λI)^m
⟨u,v⟩ Inner product Symmetric, linear, positive-definite pairing
∥v∥ Norm √⟨v,v⟩
u⊥v Orthogonality ⟨u,v⟩=0
W⊥ Orthogonal complement All vectors orthogonal to W
projW​(v) Orthogonal projection Nearest point to v in W
Jk​(λ) Jordan block λI+N (N nilpotent)
mA​(λ) Minimal polynomial Monic polynomial of least degree annihilating A
e^A Matrix exponential ∑ A^k/k!