Linear Algebra: A Complete Summary of Definitions, Theorems, and Proofs
A single-page survey of undergraduate linear algebra. From the axioms of a vector space to Jordan normal form, this article systematically organizes the definitions, theorems, and proofs that form the core of a first course. Includes a topic dependency diagram.
This article provides a single-page panoramic view of undergraduate linear algebra—roughly one textbook's worth of material. For each topic, we present the key definitions, major theorems, and proof sketches in concise form.
How to use this page:
Newcomers: read the sections in order to build a coherent picture of the subject.
Review or exam preparation: jump to any section from the table of contents.
Quick theorem lookup: Section 13, “Theorem Directory,” collects all the major results in one list.
The dependency diagram below shows how the main topics are interrelated.
graph TD
A["Axioms of a Vector Space"] --> B["Linear Independence, Bases, and Dimension"]
B --> C["Linear Maps"]
C --> D["Matrices and Representation"]
D --> E["Systems of Equations and Row Reduction"]
D --> F["Determinants"]
C --> G["Kernel, Image, and the Rank-Nullity Theorem"]
F --> H["Eigenvalues and Eigenvectors"]
D --> H
H --> I["Diagonalization"]
A --> J["Inner Product Spaces"]
B --> J
J --> K["Orthonormal Bases and Gram-Schmidt"]
K --> L["Orthogonal Projection and Least Squares"]
H --> M["Jordan Normal Form"]
I --> M
style A fill:#f5f5f5,stroke:#333,color:#000
style H fill:#f5f5f5,stroke:#333,color:#000
style I fill:#f5f5f5,stroke:#333,color:#000
style M fill:#f5f5f5,stroke:#333,color:#000
Remark 1.
This article is designed to accompany the “Linear Algebra Textbook” series. Links at the end of each section point to the corresponding full-length chapter for detailed exposition and complete proofs.
2. Vector Spaces
Definition 2 (Vector space).
A vector space over a field K is a set V equipped with an addition +:V×V→V and a scalar multiplication ⋅:K×V→V satisfying the following eight axioms:
u+v=v+u (commutativity of addition).
(u+v)+w=u+(v+w) (associativity of addition).
There exists a zero vector 0 such that v+0=v.
For every v there exists −v with v+(−v)=0.
a(bv)=(ab)v (compatibility of scalar multiplication).
1⋅v=v.
a(u+v)=au+av (distributivity over vector addition).
(a+b)v=av+bv (distributivity over field addition).
Theorem 3 (Uniqueness of the zero vector).
The zero vector of a vector space V is unique.
Proof.
If 0 and 0′ are both zero vectors, then 0′=0′+0=0+0′=0, using first the zero-vector property of 0 and then that of 0′.□
Theorem 4 (Basic properties of scalar multiplication).
(1) 0⋅v=0 (2) a⋅0=0 (3) (−1)v=−v (4) av=0⇒a=0 or v=0
Fundamental examples of vector spaces:
Rn: the space of n-tuples of real numbers. The prototypical vector space.
K[x]≤n: polynomials of degree at most n. dim=n+1.
Mm×n(K): all m×n matrices over K. dim=mn.
C[a,b]: the space of continuous functions on [a,b]. Infinite-dimensional.
Definition 5 (Subspace).
A subset W⊆V is a subspace of V if and only if (i) W≠∅, (ii) u,v∈W⇒u+v∈W, and (iii) a∈K, v∈W⇒av∈W.
Definition 6 (Sum and direct sum).
W1+W2={w1+w2∣wi∈Wi}. When the decomposition is unique, we write W1⊕W2 and call it a direct sum. W1+W2 is a direct sum if and only if W1∩W2={0}.
Theorem 7 (Dimension formula for subspaces).
dim(W1+W2)=dimW1+dimW2−dim(W1∩W2). In particular, for a direct sum, dim(W1⊕W2)=dimW1+dimW2.
Definition 8 (Linear combination and span).
An expression a1v1+a2v2+⋯+anvn (with ai∈K) is called a linear combination of v1,…,vn. The set of all such linear combinations is denoted span{v1,…,vn}.
Definition 9 (Linear independence).
Vectors v1,…,vn are linearly independent if the only solution to a1v1+⋯+anvn=0 is a1=⋯=an=0. Otherwise they are linearly dependent.
Theorem 10.
v1,…,vn are linearly dependent if and only if some vk can be written as a linear combination of the remaining vectors.
Definition 11 (Basis and dimension).
A linearly independent set that spans V is called a basis for V. The representation v=∑aivi with respect to a basis is unique; the tuple (a1,…,an) is called the coordinate vector of v.
Theorem 12 (Steinitz exchange lemma).
If {v1,…,vm} spans V and {w1,…,wn} is linearly independent, then n≤m. Moreover, n of the vi can be replaced by w1,…,wn so that the resulting set still spans V.
Proof.
Since w1=∑cjvj with some cj≠0 (after reindexing, we may assume c1≠0), replacing v1 by w1 preserves the span of V. Repeat for w2,w3,… in succession. If n>m, then after m steps the vi are exhausted and wm+1 lies in the span of {w1,…,wm}, contradicting linear independence.□
Theorem 13 (Invariance of basis cardinality).
Any two bases of V have the same number of elements. This common value is called the dimension of V, written dimV.
Theorem 14.
When dimV=n: (1) any n linearly independent vectors form a basis; (2) any n vectors that span V form a basis; (3) every linearly independent set contains at most n vectors.
Given a basis B={v1,…,vn} for V and a basis C={w1,…,wm} for W, the representation matrix (or matrix of T with respect to B and C) is the matrix A=(aij) defined by T(vj)=∑i=1maijwi. We write [T]BC=A. The j-th column of A is the coordinate vector of T(vj) with respect to C.
Theorem 24 (Composition corresponds to matrix multiplication).
If T:V→W and S:W→U have representation matrices A and B respectively, then [S∘T]=BA. That is, composition of linear maps corresponds to multiplication of matrices.
Definition 25 (Change-of-basis matrix).
The matrix P effecting a change from basis B to basis B′ is defined by v′j=∑i pij vi; that is, the j-th column of P holds the B-coordinates of the new basis vector v′j.
Theorem 26 (Change-of-basis formula).
If A is the matrix of T:V→V with respect to basis B, and A′ is the matrix with respect to basis B′, then
A′=P−1AP
where P is the change-of-basis matrix from B to B′.
Proof.
Let x and x′ denote the coordinate vectors of v with respect to B and B′, respectively. Then x=Px′, and the B′-coordinates of Ax are P−1APx′.□
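The change-of-basis formula can be checked numerically. The sketch below uses numpy, reusing the matrices of Example 61 purely as an illustration:

```python
import numpy as np

# Matrix of T with respect to the old basis B (the matrix of Example 61).
A = np.array([[4.0, -2.0],
              [1.0,  1.0]])

# Change-of-basis matrix: columns are the new basis vectors
# expressed in the old basis.
P = np.array([[1.0, 2.0],
              [1.0, 1.0]])

# Matrix of the same map with respect to the new basis B'.
A_prime = np.linalg.inv(P) @ A @ P

# Coordinates transform as x = P x', so A x and P (A' x') must agree.
x_prime = np.array([3.0, -1.0])
x = P @ x_prime
assert np.allclose(A @ x, P @ (A_prime @ x_prime))
```

For this particular A and P, A′ comes out diagonal with entries 2 and 3, as Example 61 predicts.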
Definition 27 (Similarity).
If A′=P−1AP for some invertible matrix P, then A and A′ are said to be similar. Similar matrices share the same eigenvalues, determinant, trace, and rank.
Theorem 28 (Rank of a matrix).
The column rank (dimension of the column space) equals the row rank (dimension of the row space). This common value is denoted rankA.
Theorem 29 (Equivalent conditions for invertibility).
For an n×n matrix A, the following are equivalent:
A is invertible.
rankA=n.
detA≠0.
The only solution to Ax=0 is x=0.
The columns of A are linearly independent.
The reduced row echelon form (RREF) of A is In.
Theorem 30 (Properties of the inverse).
If A and B are invertible, then (A−1)−1=A, (AB)−1=B−1A−1, (AT)−1=(A−1)T.
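These identities are easy to confirm numerically. A quick numpy check on random matrices (illustrative only; random Gaussian matrices are invertible with probability 1):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# (A^-1)^-1 = A
assert np.allclose(np.linalg.inv(np.linalg.inv(A)), A)
# (AB)^-1 = B^-1 A^-1  (note the reversed order)
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))
# (A^T)^-1 = (A^-1)^T
assert np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T)
```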
A system of m equations in n unknowns may be written as Ax=b (where A∈Mm×n(K)).
Definition 31 (Elementary row operations).
(1) Interchange two rows: Ri↔Rj. (2) Scale a row: Ri→cRi (c≠0). (3) Add a multiple of one row to another: Ri→Ri+cRj. None of these operations changes the solution set.
Definition 32 (Reduced row echelon form (RREF)).
A matrix is in RREF if: (1) all zero rows are at the bottom; (2) the leading entry (pivot) in each nonzero row is 1; (3) each pivot lies to the right of the pivot in the row above; (4) each pivot is the only nonzero entry in its column. Every matrix can be brought to a unique RREF by elementary row operations.
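The reduction to RREF can be sketched in a few lines of Python. The `rref` function below is an illustrative implementation using the three elementary row operations (with partial pivoting for numerical stability), not a library routine:

```python
import numpy as np

def rref(M, tol=1e-12):
    """Reduce M to reduced row echelon form by elementary row operations."""
    R = np.array(M, dtype=float)
    rows, cols = R.shape
    pivot_row = 0
    for col in range(cols):
        if pivot_row >= rows:
            break
        # Partial pivoting: pick the largest entry in this column.
        k = pivot_row + np.argmax(np.abs(R[pivot_row:, col]))
        if abs(R[k, col]) < tol:
            continue                              # no pivot in this column
        R[[pivot_row, k]] = R[[k, pivot_row]]     # (1) interchange rows
        R[pivot_row] /= R[pivot_row, col]         # (2) scale the pivot to 1
        for r in range(rows):                     # (3) clear the pivot column
            if r != pivot_row:
                R[r] -= R[r, col] * R[pivot_row]
        pivot_row += 1
    return R
```

For example, `rref([[1, 1, 1, 3], [1, 2, 3, 6]])` returns the matrix with rows (1, 0, −1, 0) and (0, 1, 2, 3).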
Theorem 33 (Solution space of a homogeneous system).
The solution set of Ax=0 is a subspace kerA of Kn, with dimkerA=n−rankA.
Theorem 34 (Structure of solutions to a nonhomogeneous system).
If Ax=b has a particular solution x0, then the general solution is
x=x0+kerA={x0+h∣Ah=0}
(a particular solution plus the general solution of the associated homogeneous system).
Proof.
If Ax=b and Ax0=b, then A(x−x0)=0, so x−x0∈kerA. The converse is immediate.□
Theorem 35 (Existence and uniqueness of solutions).
For the system Ax=b with n unknowns:
A solution exists if and only if rankA=rank(A∣b).
When a solution exists, it is unique if and only if rankA=n (no free variables).
When a solution exists, the number of free variables is n−rankA.
Example 36.
A = [1 1 1; 1 2 3], b = (3,6)T: the augmented matrix (A∣b) has RREF [1 0 −1 ∣ 0; 0 1 2 ∣ 3]. The free variable is x3=t. The general solution is x=(t,3−2t,t)T=(0,3,0)T+t(1,−2,1)T.
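The solution structure of Example 36 (the system with coefficient rows (1, 1, 1) and (1, 2, 3) and right-hand side (3, 6)) can be verified numerically; a small numpy check, for illustration:

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 3.0]])
b = np.array([3.0, 6.0])

x0 = np.array([0.0, 3.0, 0.0])   # particular solution
h = np.array([1.0, -2.0, 1.0])   # basis vector of ker A

# h solves the homogeneous system, and every x0 + t*h solves Ax = b.
assert np.allclose(A @ h, 0)
for t in (-2.0, 0.0, 1.5):
    assert np.allclose(A @ (x0 + t * h), b)
```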
Definition 37 (Determinant).
For A=(aij)∈Mn(K), detA=∑σ∈Sn sgn(σ) a1σ(1)a2σ(2)⋯anσ(n), where Sn is the symmetric group on n letters and sgn(σ) denotes the sign of the permutation σ.
Example 38.
For n=2: det[a b; c d]=ad−bc. Geometrically, this is the signed area of the parallelogram spanned by the column vectors.
Theorem 39 (Fundamental properties of the determinant).
Multilinearity: the determinant is linear in each row.
Alternating property: interchanging two rows reverses the sign.
Normalization: detIn=1.
If two rows are equal, then detA=0.
Adding a scalar multiple of one row to another does not change detA.
detAT=detA.
Definition 40 (Cofactor).
The (i,j) cofactor is a~ij=(−1)^(i+j) Mij, where Mij is the minor obtained by deleting row i and column j.
Theorem 41 (Cofactor expansion).
Expansion along row i: detA=∑_{j=1}^{n} aij a~ij. Expansion along column j: detA=∑_{i=1}^{n} aij a~ij.
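Cofactor expansion translates directly into a recursive determinant routine. The Python sketch below is purely pedagogical (it does O(n!) work; use np.linalg.det in practice):

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row.

    Exponential-time; for illustration of Theorem 41 only.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # Minor M_1j: delete row 0 and column j.
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        # Cofactor sign (-1)^(1+j) in 0-based indexing is (-1)^j.
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total
```

On the matrix of Example 42 this returns 3, matching the hand computation.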
Example 42.
A = [2 1 3; 0 −1 2; 1 0 1]. Expanding along the first row: detA=2·(−1)−1·(−2)+3·(1)=3.
Theorem 43 (Multiplicativity of the determinant).
det(AB)=detA⋅detB.
Proof.
If detA=0, then rankA<n, so rank(AB)≤rankA<n and det(AB)=0. If detA≠0, write A as a product of elementary matrices and verify the identity step by step using the fundamental properties.□
Theorem 44.
A is invertible if and only if detA≠0. When A is invertible, det(A−1)=(detA)−1.
Theorem 45 (Cramer's rule).
If A is an invertible n×n matrix, the solution of Ax=b is given by xi=detAi/detA, where Ai is A with its i-th column replaced by b.
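Cramer's rule in code (a pedagogical numpy sketch; for real work, prefer np.linalg.solve):

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b by Cramer's rule: x_i = det(A_i) / det(A)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    if abs(d) < 1e-12:
        raise ValueError("matrix is singular")
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                  # replace the i-th column by b
        x[i] = np.linalg.det(Ai) / d
    return x
```

For instance, `cramer_solve([[2, 1], [1, 1]], [3, 2])` returns (1, 1), agreeing with np.linalg.solve.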
Theorem 46 (Adjugate formula for the inverse).
A−1=(1/detA)·A~T, where A~T is the transpose of the cofactor matrix (the classical adjugate).
Definition 47 (Eigenvalue and eigenvector).
A scalar λ∈K is an eigenvalue of A if there exists a nonzero vector v satisfying Av=λv. Such a vector v is called an eigenvector corresponding to λ.
Definition 48 (Eigenspace).
The eigenspace of λ is Vλ=ker(A−λI)={v∣Av=λv}. It is a subspace of V.
Theorem 49.
λ is an eigenvalue of A if and only if det(A−λI)=0.
Definition 50 (Characteristic polynomial).
pA(λ)=det(A−λI)=(−1)^n λ^n + (−1)^(n−1) (trA) λ^(n−1) + ⋯ + detA
Theorem 51.
Similar matrices have the same characteristic polynomial.
Proof.
det(P−1AP−λI)=det(P−1(A−λI)P)=det(A−λI).□
Definition 52 (Algebraic and geometric multiplicity).
The algebraic multiplicity of λ0 is the multiplicity of (λ−λ0) as a factor of pA(λ). The geometric multiplicity of λ0 is dimVλ0. The inequality 1≤geometric multiplicity≤algebraic multiplicity always holds.
Example 53.
A = [2 1; 0 2]: the eigenvalue λ=2 has algebraic multiplicity 2 and V2=span{(1,0)T}, so the geometric multiplicity is 1.
Theorem 54 (Linear independence of eigenvectors for distinct eigenvalues).
If λ1,…,λk are distinct eigenvalues and v1,…,vk are corresponding eigenvectors, then v1,…,vk are linearly independent.
Proof.
Proceed by induction on k. From ∑civi=0, apply A and subtract λk times the original relation to obtain ∑_{i=1}^{k−1} ci(λi−λk)vi=0. By the inductive hypothesis and λi≠λk, each ci=0 for i<k; the original relation then forces ck=0 as well.□
Theorem 55 (Trace and determinant via eigenvalues).
If λ1,…,λn are the eigenvalues of A (counted with multiplicity), then trA=∑λi and detA=∏λi.
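A quick numerical confirmation with numpy (random matrix, illustrative; the eigenvalues are complex in general, but their sum and product are real for a real matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
eig = np.linalg.eigvals(A)          # eigenvalues with multiplicity

# tr A = sum of eigenvalues, det A = product of eigenvalues.
assert np.isclose(np.trace(A), eig.sum().real)
assert np.isclose(np.linalg.det(A), np.prod(eig).real)
```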
Theorem 56 (Cayley–Hamilton theorem).
If A is an n×n matrix with characteristic polynomial pA(λ), then pA(A)=O.
Example 57.
A = [1 2; 3 4]: pA(λ)=λ²−5λ−2. One can verify by direct computation that A²−5A−2I=O.
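The computation in Example 57 can be checked directly with numpy:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
# p_A(λ) = λ² − (tr A)λ + det A = λ² − 5λ − 2
I = np.eye(2)

# Cayley–Hamilton: substituting A into its own characteristic
# polynomial gives the zero matrix.
assert np.allclose(A @ A - 5 * A - 2 * I, 0)
```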
Definition 58 (Diagonalizability).
A matrix A is diagonalizable if there exists an invertible matrix P such that P−1AP=diag(λ1,…,λn) is a diagonal matrix.
Theorem 59 (Necessary and sufficient conditions for diagonalizability).
The following are equivalent:
A is diagonalizable.
A possesses n linearly independent eigenvectors.
For every eigenvalue λi, the geometric multiplicity equals the algebraic multiplicity.
Proof.
(1⇔2): P−1AP=D if and only if AP=PD, which holds if and only if the columns of P are eigenvectors. P is invertible if and only if these columns are linearly independent.□
Remark 60.
If A has n distinct eigenvalues, then A is diagonalizable (since eigenvectors for distinct eigenvalues are linearly independent). The converse does not hold.
Diagonalization procedure:
Solve pA(λ)=det(A−λI)=0 to find the eigenvalues.
For each eigenvalue λi, find a basis for ker(A−λiI) (the eigenvectors).
Check that the geometric multiplicity equals the algebraic multiplicity for every eigenvalue (if not, A is not diagonalizable).
Form the matrix P whose columns are the eigenvectors; then D=P−1AP.
Example 61.
A = [4 −2; 1 1]: pA=(λ−2)(λ−3). Eigenvectors: v1=(1,1)T for λ=2, v2=(2,1)T for λ=3. Setting P = [1 2; 1 1] gives P−1AP = [2 0; 0 3].
Computing A^k: if A=PDP−1, then A^k=PD^kP−1=P·diag(λ1^k,…,λn^k)·P−1.
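A short numpy sketch of this recipe, using the matrix of Example 61 (illustrative; np.linalg.eig may order the eigenvalues arbitrarily, which does not affect the product):

```python
import numpy as np

A = np.array([[4.0, -2.0],
              [1.0,  1.0]])          # Example 61: eigenvalues 2 and 3

w, P = np.linalg.eig(A)              # columns of P are eigenvectors
D = np.diag(w)
assert np.allclose(P @ D @ np.linalg.inv(P), A)

# A^10 via the diagonalization, compared with repeated multiplication.
A10 = P @ np.diag(w ** 10) @ np.linalg.inv(P)
assert np.allclose(A10, np.linalg.matrix_power(A, 10))
```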
Theorem 62 (Schur triangularization theorem).
Over K=C, every square matrix is unitarily similar to an upper triangular matrix. In particular, triangularization is always possible even when diagonalization is not.
Definition 63 (Inner product).
An inner product on a real vector space V is a map ⟨⋅,⋅⟩:V×V→R satisfying: (1) symmetry: ⟨u,v⟩=⟨v,u⟩; (2) linearity in the first argument; (3) positive definiteness: ⟨v,v⟩≥0, with equality if and only if v=0.
Examples of inner products:
Standard inner product on Rn: ⟨x,y⟩=∑xiyi=xTy.
Function spaces: ⟨f,g⟩=∫abf(x)g(x)dx.
Matrix spaces: ⟨A,B⟩=tr(ATB).
Definition 64 (Norm and orthogonality).
∥v∥=√⟨v,v⟩ is the norm of v. When ⟨u,v⟩=0, the vectors u and v are orthogonal, written u⊥v.
Theorem 65 (Cauchy–Schwarz inequality).
∣⟨u,v⟩∣≤∥u∥⋅∥v∥. Equality holds if and only if u and v are linearly dependent.
Proof.
For v≠0, the inequality 0≤∥u−tv∥²=∥u∥²−2t⟨u,v⟩+t²∥v∥² holds for all t. Substituting t=⟨u,v⟩/∥v∥² and rearranging yields the result. For v=0 both sides vanish.□
Theorem 66 (Triangle inequality).
∥u+v∥≤∥u∥+∥v∥.
Definition 67 (Orthonormal basis).
A basis {e1,…,en} satisfying ⟨ei,ej⟩=δij is called an orthonormal basis (ONB). With respect to an ONB, v=∑⟨v,ei⟩ei (i.e., coordinates are computed by taking inner products).
Theorem 68 (Gram–Schmidt orthonormalization).
Given a linearly independent set {v1,…,vn}, one can construct an orthonormal set {e1,…,en} by the procedure:
uk = vk − ∑_{j=1}^{k−1} ⟨vk, ej⟩ ej,  ek = uk/∥uk∥
At each step, span{e1,…,ek}=span{v1,…,vk}.
Example 69.
Let v1=(1,1,0)T and v2=(1,0,1)T. Then e1=(1/√2)(1,1,0)T. Next, u2=v2−⟨v2,e1⟩e1=(1/2,−1/2,1)T, and e2=u2/∥u2∥.
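The Gram–Schmidt procedure translates directly into code; an illustrative Python version of the formula in Theorem 68:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors."""
    basis = []
    for v in vectors:
        u = np.array(v, dtype=float)
        for e in basis:
            u -= np.dot(v, e) * e    # subtract projection onto each earlier e_j
        basis.append(u / np.linalg.norm(u))
    return basis
```

Applied to v1=(1,1,0), v2=(1,0,1) as in Example 69, the result is an orthonormal pair whose first vector is (1,1,0)/√2.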
Definition 70 (Orthogonal complement).
W⊥={v∈V∣⟨v,w⟩=0 for all w∈W}.
Theorem 71 (Orthogonal decomposition).
For a subspace W of a finite-dimensional inner product space V: V=W⊕W⊥, dimW⊥=dimV−dimW, and (W⊥)⊥=W.
Definition 72 (Orthogonal projection).
Writing v=w+w⊥ with w∈W and w⊥∈W⊥, the vector w=projW(v) is called the orthogonal projection of v onto W. If {e1,…,ek} is an ONB for W, then projW(v)=∑i=1k⟨v,ei⟩ei.
Theorem 73 (Best approximation theorem).
projW(v) is the closest point in W to v: ∥v−projW(v)∥≤∥v−w∥ for all w∈W.
Proof.
Write v^=projW(v). Then ∥v−w∥2=∥v−v^∥2+∥v^−w∥2 (since v−v^⊥v^−w). Because ∥v^−w∥2≥0, it follows that ∥v−w∥≥∥v−v^∥.□
Theorem 74 (Normal equations (least squares)).
When Ax=b has no solution, the least-squares solution satisfies ATAx^=ATb.
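A minimal numpy sketch of least squares via the normal equations, compared against the library routine (the data points here are made up for illustration):

```python
import numpy as np

# Overdetermined system: fit a line y = c0 + c1·t through 4 points.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 2.0, 4.0])
A = np.column_stack([np.ones_like(t), t])   # design matrix (4×2)

# Normal equations: AᵀA x̂ = Aᵀ y.
x_normal = np.linalg.solve(A.T @ A, A.T @ y)

# The library least-squares solver gives the same answer.
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x_normal, x_lstsq)
```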
Definition 75 (Jordan block).
The k×k Jordan block Jk(λ) is the matrix whose diagonal entries are λ, superdiagonal entries are 1, and all other entries are 0. In particular, J1(λ)=(λ) reduces to a diagonal entry.
Definition 76 (Generalized eigenspace).
For an eigenvalue λ of algebraic multiplicity m, the generalized eigenspace is Wλ=ker(A−λI)m (dimWλ=m). The ordinary eigenspace Vλ=ker(A−λI) is a subspace of Wλ.
Theorem 77 (Direct-sum decomposition via generalized eigenspaces).
If λ1,…,λs are the distinct eigenvalues of A, then Kn=Wλ1⊕⋯⊕Wλs.
Theorem 78 (Existence and uniqueness of the Jordan normal form).
Over K=C, for every n×n matrix A there exists an invertible matrix P such that
P−1AP=diag(Jk1(λ1),Jk2(λ2),…,Jkr(λr))
The Jordan normal form is unique up to the ordering of the Jordan blocks.
Remark 79.
A is diagonalizable if and only if every Jordan block is 1×1.
Example 80.
Suppose pA(λ)=(λ−2)3(λ−5) with dimV2=2 and dimV5=1. Then the Jordan blocks for λ=2 are J2(2) and J1(2), and for λ=5 we have J1(5):
J = [2 1 0 0; 0 2 0 0; 0 0 2 0; 0 0 0 5]
Definition 81 (Minimal polynomial).
The minimal polynomial mA(λ) is the monic polynomial of least degree satisfying mA(A)=O. It divides the characteristic polynomial and has the same roots.
Theorem 82.
A is diagonalizable if and only if mA(λ) has no repeated roots. In general, mA(λ)=∏i(λ−λi)ki, where ki is the size of the largest Jordan block associated with λi.
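For the matrix of Example 53, the minimal-polynomial criterion can be checked directly (numpy sketch, illustrative):

```python
import numpy as np

# A = [[2,1],[0,2]] has characteristic polynomial (λ−2)².
A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
N = A - 2 * np.eye(2)

# (A − 2I) ≠ O but (A − 2I)² = O, so m_A(λ) = (λ − 2)²:
# a repeated root, hence A is not diagonalizable, and the
# largest Jordan block for λ = 2 has size 2.
assert not np.allclose(N, 0)
assert np.allclose(N @ N, 0)
```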
graph LR
A["Vector Spaces<br/>8 Axioms"] --> B["Bases and Dimension<br/>Steinitz"]
B --> C["Linear Maps<br/>Rank-Nullity Theorem"]
C --> D["Matrices<br/>Representation"]
D --> E["Determinants<br/>Multiplicativity"]
E --> F["Eigenvalues<br/>Characteristic Polynomial"]
F --> G["Diagonalization<br/>A = PDP⁻¹"]
G --> H["Jordan Normal Form<br/>Generalized Eigenspaces"]
B --> I["Inner Product Spaces<br/>Gram-Schmidt"]
I --> J["Projection and Least Squares"]
style A fill:#f5f5f5,stroke:#333,color:#000
style F fill:#f5f5f5,stroke:#333,color:#000
style H fill:#f5f5f5,stroke:#333,color:#000
13. Theorem Directory
A one-line summary of the major theorems of undergraduate linear algebra.
Uniqueness of the zero vector: the zero vector 0 is unique.
Steinitz exchange lemma: the cardinality of a linearly independent set ≤ the cardinality of a spanning set.
Invariance of basis cardinality: any two bases have the same number of elements.