The Spectral Theorem: Orthogonal Diagonalization of Symmetric Matrices
Every real symmetric matrix can be orthogonally diagonalized; we prove this spectral theorem and its complex generalization for normal operators. We derive the spectral decomposition A = λ1P1 + ⋯ + λsPs into orthogonal projections and apply it to classify quadratic forms via Sylvester's law of inertia.
1. Motivation: From Diagonalization to Orthogonal Diagonalization
In earlier chapters we established when a linear operator can be diagonalized: the algebraic and geometric multiplicities of every eigenvalue must agree. When this holds, there exists an invertible matrix P such that P⁻¹AP is diagonal. But the matrix P is not unique, and in general the columns of P need not be orthogonal.
On an inner product space, we can ask for something stronger. We say that a real n×n matrix A is orthogonally diagonalizable if there exists an orthogonal matrix Q (that is, QᵀQ = I) such that QᵀAQ is diagonal. Equivalently, A possesses an orthonormal basis of eigenvectors.
Why does this matter? Orthogonal diagonalization preserves lengths and angles, so the eigenvector decomposition respects the geometry of the inner product. This is essential in applications ranging from principal component analysis and the classification of conic sections to the study of vibrations in mechanical systems.
The central question of this chapter is: which matrices are orthogonally diagonalizable? The answer, given by the spectral theorem, is remarkably clean: the real symmetric matrices are precisely the orthogonally diagonalizable matrices.
2. The Adjoint of a Linear Map
Definition 1 (Adjoint operator).
Let V be a finite-dimensional inner product space over R (or C), and let T:V→V be a linear operator. The adjoint of T is the unique linear operator T∗:V→V satisfying
⟨Tv, w⟩ = ⟨v, T∗w⟩   for all v, w ∈ V.
Theorem 2 (Existence and uniqueness of the adjoint).
Let V be a finite-dimensional inner product space and T:V→V a linear operator. Then T∗ exists and is unique.
Proof.
Fix w ∈ V and consider the map φw : V → R defined by φw(v) = ⟨Tv, w⟩. This is a linear functional on V. By the Riesz representation theorem (which follows from the theory of orthogonal complements developed in Chapter 10), there exists a unique vector w∗ ∈ V such that φw(v) = ⟨v, w∗⟩ for all v ∈ V. We define T∗(w) = w∗.

To show that T∗ is linear, let w1, w2 ∈ V and a, b ∈ R. For every v ∈ V,

⟨v, T∗(aw1 + bw2)⟩ = ⟨Tv, aw1 + bw2⟩ = a⟨Tv, w1⟩ + b⟨Tv, w2⟩ = a⟨v, T∗w1⟩ + b⟨v, T∗w2⟩ = ⟨v, aT∗w1 + bT∗w2⟩.

Since this holds for every v, the positive definiteness of the inner product gives T∗(aw1 + bw2) = aT∗w1 + bT∗w2.

Uniqueness follows immediately: if S also satisfies ⟨Tv, w⟩ = ⟨v, Sw⟩ for all v, w, then ⟨v, T∗w − Sw⟩ = 0 for all v, which forces T∗w = Sw for every w. □
Example 3.
Let V = Rⁿ with the standard inner product, and let T be the linear operator with matrix A (so Tv = Av). Then ⟨Av, w⟩ = (Av)ᵀw = vᵀAᵀw = ⟨v, Aᵀw⟩. Hence the adjoint of T has matrix Aᵀ. In the complex case with the standard Hermitian inner product ⟨v, w⟩ = w∗v, the adjoint has matrix A∗ = Āᵀ (the conjugate transpose).
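These identities are easy to sanity-check numerically. The following sketch (using NumPy; the random matrices are purely illustrative) verifies ⟨Av, w⟩ = ⟨v, Aᵀw⟩ in the real case and the conjugate-transpose version in the complex case:

```python
import numpy as np

rng = np.random.default_rng(0)

# Real case: the adjoint of v -> Av is v -> A^T v.
A = rng.standard_normal((4, 4))
v, w = rng.standard_normal(4), rng.standard_normal(4)
lhs = np.dot(A @ v, w)        # <Av, w>
rhs = np.dot(v, A.T @ w)      # <v, A^T w>
assert np.isclose(lhs, rhs)

# Complex case: with <x, y> = y* x, the adjoint is the conjugate transpose.
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)
inner = lambda p, q: np.vdot(q, p)   # np.vdot conjugates its first argument, so this is q* p
assert np.isclose(inner(B @ x, y), inner(x, B.conj().T @ y))
```

The same check works for any matrix; the inner product convention (conjugate-linear in the second slot) matches the one used in this chapter.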
Theorem 4 (Properties of the adjoint).
Let S,T:V→V be linear operators and c a scalar. Then:
(S + T)∗ = S∗ + T∗.
(cT)∗ = c̄T∗.
(T∗)∗ = T.
(ST)∗ = T∗S∗.
ker T∗ = (im T)⊥.
Proof.
We verify each identity by checking the defining property of the adjoint.

(1) For all v, w: ⟨(S + T)v, w⟩ = ⟨Sv, w⟩ + ⟨Tv, w⟩ = ⟨v, S∗w⟩ + ⟨v, T∗w⟩ = ⟨v, (S∗ + T∗)w⟩.

(2) ⟨(cT)v, w⟩ = c⟨Tv, w⟩ = c⟨v, T∗w⟩ = ⟨v, c̄T∗w⟩ (using conjugate-linearity in the second argument in the complex case; in the real case c̄ = c).

(3) For all v, w we have ⟨v, (T∗)∗w⟩ = ⟨T∗v, w⟩, and by conjugate symmetry ⟨T∗v, w⟩ is the conjugate of ⟨w, T∗v⟩ = ⟨Tw, v⟩, whose conjugate is ⟨v, Tw⟩. Hence (T∗)∗ = T (in the real case the conjugations are vacuous).

(4) ⟨(ST)v, w⟩ = ⟨S(Tv), w⟩ = ⟨Tv, S∗w⟩ = ⟨v, T∗(S∗w)⟩ = ⟨v, (T∗S∗)w⟩.

(5) We have w ∈ ker T∗ if and only if T∗w = 0, which holds if and only if ⟨v, T∗w⟩ = 0 for all v ∈ V, i.e., ⟨Tv, w⟩ = 0 for all v. This is precisely the condition w ∈ (im T)⊥. □
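Properties (4) and (5) can also be checked numerically. A small NumPy sketch (random matrices, illustrative only) verifies (ST)ᵀ = TᵀSᵀ and compares ker Aᵀ with (im A)⊥ for a rank-deficient A, reading both subspaces off the singular value decomposition:

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((4, 4))
T = rng.standard_normal((4, 4))

# Property (4): (ST)* = T* S* reads (ST)^T = T^T S^T for real matrices.
assert np.allclose((S @ T).T, T.T @ S.T)

# Property (5): ker A^T = (im A)^perp for a rank-2 map.
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 4))   # rank 2
U, sing, Vt = np.linalg.svd(A)
null_T = U[:, 2:]                    # left-singular vectors for the zero singular values
assert np.allclose(A.T @ null_T, 0, atol=1e-10)   # these vectors lie in ker A^T
assert np.allclose(null_T.T @ A, 0, atol=1e-10)   # and annihilate im A (columns of A)
```

The last two assertions together say that ker Aᵀ is exactly the orthogonal complement of the column space of A, as item (5) states.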
3. Self-Adjoint (Symmetric) Operators
Definition 5 (Self-adjoint operator).
A linear operator T on an inner product space V is called self-adjoint (or symmetric) if T∗=T, that is,
⟨Tv, w⟩ = ⟨v, Tw⟩   for all v, w ∈ V.
In terms of matrices with respect to the standard inner product, T is self-adjoint if and only if its matrix satisfies A = Aᵀ (real case) or A = A∗ (complex case, where such operators are called Hermitian).
Example 6.
The matrix A = (2 −1; −1 3) (rows separated by semicolons) is real symmetric, hence the corresponding operator on R² is self-adjoint.
Theorem 7 (Eigenvalues of self-adjoint operators are real).
Let T be a self-adjoint operator on a real or complex inner product space. Then every eigenvalue of T is real.
Proof.
Let λ be an eigenvalue with eigenvector v ≠ 0. Then
λ⟨v, v⟩ = ⟨λv, v⟩ = ⟨Tv, v⟩ = ⟨v, Tv⟩ = ⟨v, λv⟩ = λ̄⟨v, v⟩.
Since ⟨v, v⟩ > 0, we conclude λ = λ̄, so λ ∈ R. □
Theorem 8 (Orthogonality of eigenvectors).
Let T be a self-adjoint operator. If v1 and v2 are eigenvectors corresponding to distinct eigenvalues λ1 ≠ λ2, then v1 ⊥ v2.
Proof.
We compute:
λ1⟨v1,v2⟩=⟨Tv1,v2⟩=⟨v1,Tv2⟩=λ2⟨v1,v2⟩,
where the last equality uses the fact that λ2 is real (by the previous theorem). Hence (λ1 − λ2)⟨v1, v2⟩ = 0. Since λ1 ≠ λ2, we conclude ⟨v1, v2⟩ = 0. □
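Both theorems are visible numerically even with a generic eigensolver that does not assume symmetry. A NumPy sketch using the matrix of Example 6 (illustrative):

```python
import numpy as np

A = np.array([[2., -1.],
              [-1., 3.]])          # the symmetric matrix of Example 6
lam, V = np.linalg.eig(A)          # generic solver; symmetry is not assumed

assert np.allclose(np.asarray(lam).imag, 0)    # eigenvalues are real
assert not np.isclose(lam[0], lam[1])          # they are distinct here, so...
assert np.isclose(abs(V[:, 0] @ V[:, 1]), 0)   # ...the eigenvectors are orthogonal
```

The eigenvalues here are (5 ± √5)/2; since they are distinct, the computed eigenvectors come out orthogonal, exactly as Theorem 8 predicts.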
Lemma 9 (Invariant subspaces and their complements).
Let T be a self-adjoint operator on an inner product space V, and let W be a T-invariant subspace (i.e., T(W)⊆W). Then W⊥ is also T-invariant.
Proof.
Let u∈W⊥. We must show Tu∈W⊥, i.e., ⟨Tu,w⟩=0 for all w∈W. Since T is self-adjoint and W is T-invariant, we have Tw∈W, and therefore ⟨Tu,w⟩=⟨u,Tw⟩=0 because u∈W⊥.□
4. The Real Spectral Theorem
We now arrive at the main result: every real symmetric matrix is orthogonally diagonalizable. We state this both in the language of operators and matrices.
Theorem 10 (Real spectral theorem).
Let V be a finite-dimensional real inner product space and T:V→V a self-adjoint operator. Then V has an orthonormal basis consisting of eigenvectors of T. Equivalently, every real symmetric matrix A is orthogonally diagonalizable: there exists an orthogonal matrix Q such that QᵀAQ = diag(λ1, …, λn).
Proof.
We proceed by induction on n = dim V.

Base case. If n = 1, every operator is multiplication by a scalar, and any unit vector is an orthonormal eigenbasis.

Inductive step. Assume the theorem holds for all real inner product spaces of dimension less than n. Let T be self-adjoint on the n-dimensional real inner product space V.

First, we must show that T has at least one real eigenvalue. Let A be the matrix of T with respect to any orthonormal basis. Then A is real symmetric. The characteristic polynomial p(λ) = det(A − λI) has degree n with real coefficients. Viewing A as a complex matrix, we have A∗ = Āᵀ = Aᵀ = A, so A is Hermitian, and its eigenvalues are all real by Theorem 7. In particular, p(λ) has a real root λ1.

Let v1 be a unit eigenvector for λ1, and set W = span{v1}. Since T(v1) = λ1v1 ∈ W, the subspace W is T-invariant. By Lemma 9, W⊥ is also T-invariant.

Now W⊥ is an (n − 1)-dimensional inner product space (with the inner product inherited from V), and the restriction T∣W⊥ : W⊥ → W⊥ is again self-adjoint: for u, w ∈ W⊥,
⟨T∣W⊥u,w⟩=⟨Tu,w⟩=⟨u,Tw⟩=⟨u,T∣W⊥w⟩.
By the inductive hypothesis, W⊥ has an orthonormal basis {v2, …, vn} of eigenvectors of T∣W⊥ (hence of T). Since v1 ⊥ W⊥, the set {v1, v2, …, vn} is an orthonormal basis of V consisting of eigenvectors of T. □
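In computational practice, the orthogonal diagonalization promised by the theorem is produced by symmetric eigensolvers. A NumPy sketch (random symmetric matrix, illustrative) using `np.linalg.eigh`:

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2                  # a random real symmetric matrix

lam, Q = np.linalg.eigh(A)         # eigh is the symmetric/Hermitian eigensolver
assert np.allclose(Q.T @ Q, np.eye(6))          # Q is orthogonal
assert np.allclose(Q.T @ A @ Q, np.diag(lam))   # Q^T A Q is diagonal
assert np.allclose(A, Q @ np.diag(lam) @ Q.T)   # A = Q D Q^T
```

The columns of Q are the orthonormal eigenbasis whose existence the theorem guarantees; `eigh` returns the eigenvalues in ascending order.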
Remark 11.
The converse also holds: if a real matrix A is orthogonally diagonalizable, then A is symmetric. Indeed, if QᵀAQ = D with Q orthogonal and D diagonal, then A = QDQᵀ, so Aᵀ = (QDQᵀ)ᵀ = QDᵀQᵀ = QDQᵀ = A.
5. Normal Operators and the Complex Spectral Theorem
Over C, the appropriate generalization of self-adjoint operators is the class of normal operators.
Definition 12 (Normal operator).
A linear operator T on an inner product space V is normal if it commutes with its adjoint:
T∗T=TT∗.
A complex matrix A is normal if A∗A=AA∗.
Example 13.
Self-adjoint operators are normal (since T∗ = T implies T∗T = T² = TT∗). Unitary operators (T∗T = I) are also normal. The rotation matrix (cos θ −sin θ; sin θ cos θ) is a real normal matrix that is not symmetric (for θ ∉ {0, π}), but as a complex matrix it is unitarily diagonalizable.
Lemma 14.
Let T be a normal operator. Then ∥Tv∥=∥T∗v∥ for every v∈V.
Proof.
Using T∗T = TT∗ and the defining property of the adjoint, ∥Tv∥² = ⟨Tv, Tv⟩ = ⟨v, T∗Tv⟩ = ⟨v, TT∗v⟩ = ⟨T∗v, T∗v⟩ = ∥T∗v∥². □

Lemma 15.
Let T be a normal operator and suppose Tv = λv. Then T∗v = λ̄v.

Proof.
The operator T − λI is also normal (one checks that (T − λI)∗ = T∗ − λ̄I and that these commute when T is normal). By Lemma 14 applied to T − λI, we have ∥(T − λI)v∥ = ∥(T∗ − λ̄I)v∥. The left side is 0 since Tv = λv, so T∗v = λ̄v. □
Theorem 16 (Complex spectral theorem).
Let V be a finite-dimensional complex inner product space and T:V→V a normal operator. Then V has an orthonormal basis of eigenvectors of T. Equivalently, every normal matrix A∈Mn(C) is unitarily diagonalizable: there exists a unitary matrix U (with U∗U=I) such that U∗AU=diag(λ1,…,λn).
Proof.
We again use induction on n = dim V.

Base case. For n = 1 the result is trivial.

Inductive step. Since C is algebraically closed, T has at least one eigenvalue λ1. Let v1 be a unit eigenvector, and set W = span{v1}.

We claim W⊥ is T-invariant. Let u ∈ W⊥. For any w ∈ W, write w = cv1. By Lemma 15, T∗v1 = λ̄1v1, so

⟨Tu, w⟩ = ⟨Tu, cv1⟩ = c̄⟨Tu, v1⟩ = c̄⟨u, T∗v1⟩ = c̄λ1⟨u, v1⟩ = 0,

since u ∈ W⊥. Hence Tu ∈ W⊥.

The restriction T∣W⊥ is again normal on the (n − 1)-dimensional inner product space W⊥. By the inductive hypothesis, W⊥ has an orthonormal eigenbasis {v2, …, vn}, and {v1, v2, …, vn} is the desired orthonormal eigenbasis of V. □
Remark 17.
Over R, a normal matrix need not be orthogonally diagonalizable (its eigenvalues may be non-real). For example, the rotation matrix above has eigenvalues e^{±iθ} ∉ R when θ ∉ {0, π}. The real spectral theorem handles the real case by restricting to symmetric (self-adjoint) operators, which guarantees real eigenvalues.
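The rotation example can be checked directly: the matrix is normal, its eigenvalues are non-real, and its eigenvector matrix is unitary. A NumPy sketch (θ = 0.7 is an arbitrary illustrative angle):

```python
import numpy as np

theta = 0.7                                    # arbitrary angle, not 0 or pi
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

assert np.allclose(R @ R.T, R.T @ R)           # R is normal (indeed orthogonal)

lam, U = np.linalg.eig(R)                      # eigenvalues e^{+-i theta}
assert not np.allclose(lam.imag, 0)            # non-real: no real orthogonal
                                               # diagonalization is possible
assert np.allclose(U.conj().T @ U, np.eye(2))          # eigenvectors are orthonormal
assert np.allclose(U.conj().T @ R @ U, np.diag(lam))   # unitary diagonalization
```

This illustrates Remark 17 numerically: over C the matrix is unitarily diagonalizable, while over R no orthogonal diagonalization exists.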
6. The Spectral Decomposition
The spectral theorem allows us to express any self-adjoint operator as a weighted sum of orthogonal projections onto its eigenspaces.
Definition 18 (Orthogonal projection onto an eigenspace).
Let λ be an eigenvalue of a self-adjoint operator T, and let Eλ = ker(T − λI) be the corresponding eigenspace. The orthogonal projection Pλ : V → V is the linear map that projects every vector onto Eλ along Eλ⊥. In coordinates, if {u1, …, uk} is an orthonormal basis of Eλ, then
Pλ(v) = ∑_{j=1}^{k} ⟨v, uj⟩ uj.
Theorem 19 (Spectral decomposition).
Let T be a self-adjoint operator on a finite-dimensional real inner product space V with distinct eigenvalues λ1,…,λs. Let Pi=Pλi be the orthogonal projection onto the eigenspace Eλi. Then:
Pi² = Pi and Pi∗ = Pi for each i (each Pi is an orthogonal projection).
PiPj = 0 for i ≠ j (the projections are mutually orthogonal).
P1 + P2 + ⋯ + Ps = I.
T = λ1P1 + λ2P2 + ⋯ + λsPs.
Proof.
By the spectral theorem, V = Eλ1 ⊕ Eλ2 ⊕ ⋯ ⊕ Eλs, where the summands are mutually orthogonal. Every vector v ∈ V can be written uniquely as v = v1 + v2 + ⋯ + vs with vi ∈ Eλi.

(1) Pi sends v to vi, so Pi²(v) = Pi(vi) = vi = Pi(v). For self-adjointness, ⟨Piv, w⟩ = ⟨vi, w⟩ = ⟨vi, wi⟩ (since vi ⊥ wj for j ≠ i), and ⟨v, Piw⟩ = ⟨v, wi⟩ = ⟨vi, wi⟩ by the same reasoning. Hence Pi∗ = Pi.

(2) PiPj(v) = Pi(vj). Since vj ∈ Eλj and j ≠ i, the component of vj in Eλi is 0, so Pi(vj) = 0.

(3) (P1 + ⋯ + Ps)(v) = v1 + ⋯ + vs = v.

(4) Tv = T(v1 + ⋯ + vs) = λ1v1 + ⋯ + λsvs = λ1P1v + ⋯ + λsPsv. □
Example 20.
Let A = (1 0; 0 −1). The eigenvalues are λ1 = 1 and λ2 = −1 with eigenspaces spanned by e1 and e2, respectively. The orthogonal projections are
P1 = (1 0; 0 0),   P2 = (0 0; 0 1),
and indeed A=1⋅P1+(−1)⋅P2=P1−P2.
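More generally, the projections can be built from an orthonormal eigenbasis as outer products uj ujᵀ. A NumPy sketch using the symmetric matrix of Example 6 (illustrative) checks all four properties of the spectral decomposition theorem above:

```python
import numpy as np

A = np.array([[2., -1.],
              [-1., 3.]])                      # symmetric, distinct eigenvalues
lam, Q = np.linalg.eigh(A)

# Rank-one orthogonal projections onto the two one-dimensional eigenspaces.
P = [np.outer(Q[:, i], Q[:, i]) for i in range(2)]

for Pi in P:
    assert np.allclose(Pi @ Pi, Pi)            # idempotent
    assert np.allclose(Pi, Pi.T)               # self-adjoint
assert np.allclose(P[0] @ P[1], 0)                         # mutually orthogonal
assert np.allclose(P[0] + P[1], np.eye(2))                 # sum to the identity
assert np.allclose(lam[0] * P[0] + lam[1] * P[1], A)       # A = sum lambda_i P_i
```

For a repeated eigenvalue, the projection onto its eigenspace is the sum of the outer products of the corresponding columns of Q.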
Remark 21.
The spectral decomposition T = ∑ λiPi makes it easy to compute functions of T. For any polynomial (or more generally any function defined on the spectrum), f(T) = ∑ f(λi)Pi. For instance, if all eigenvalues are positive, the positive square root is T^{1/2} = ∑ √λi Pi.
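This functional calculus is easy to demonstrate. A NumPy sketch (illustrative 2×2 matrix) computes the positive square root by applying f(λ) = √λ to the spectrum:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])                       # eigenvalues 3 and 1, both positive
lam, Q = np.linalg.eigh(A)

# f(A) = sum f(lambda_i) P_i; with f(t) = sqrt(t) this is the positive square root.
sqrtA = Q @ np.diag(np.sqrt(lam)) @ Q.T
assert np.allclose(sqrtA @ sqrtA, A)           # (A^{1/2})^2 = A
assert np.allclose(sqrtA, sqrtA.T)             # A^{1/2} is symmetric
assert np.all(np.linalg.eigvalsh(sqrtA) > 0)   # and positive definite
```

Replacing `np.sqrt` with any other function defined on the eigenvalues (exp, log for positive definite A, etc.) gives the corresponding function of the operator.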
7. Worked Example: A 3×3 Symmetric Matrix
We carry out the complete orthogonal diagonalization of the symmetric matrix
A = (2 1 1; 1 2 1; 1 1 2).
Step 1: Find the eigenvalues. The characteristic polynomial is
det(A − λI) = det(2−λ 1 1; 1 2−λ 1; 1 1 2−λ).
Adding all three rows to the first row gives a first row of (4−λ)(1 1 1). Factoring out (4−λ) and subtracting the first row from each of the others, we obtain
det(A − λI) = (4 − λ)(1 − λ)².
The eigenvalues are λ1=4 (multiplicity 1) and λ2=1 (multiplicity 2).
Step 2: Find eigenvectors. For λ1 = 4: solving (A − 4I)v = 0 gives v1 = (1/√3)(1, 1, 1)ᵀ.
For λ2 = 1: solving (A − I)v = 0 yields the single equation x1 + x2 + x3 = 0, a two-dimensional eigenspace. Two linearly independent solutions are w1 = (1, −1, 0)ᵀ and w2 = (1, 0, −1)ᵀ.
Step 3: Orthonormalize. The vector v1 is already a unit vector and is automatically orthogonal to the λ2-eigenspace (by Theorem 8). We apply Gram–Schmidt to {w1, w2}: first u2 = w1/∥w1∥ = (1/√2)(1, −1, 0)ᵀ; then w2 − ⟨w2, u2⟩u2 = (1/2, 1/2, −1)ᵀ, which normalizes to u3 = (1/√6)(1, 1, −2)ᵀ. The orthogonal matrix Q with columns v1, u2, u3 satisfies QᵀAQ = diag(4, 1, 1).
Step 4: Spectral decomposition. The projections are P1 = v1v1ᵀ = (1/3)(1 1 1; 1 1 1; 1 1 1) and P2 = I − P1. One can verify:
4P1 + P2 = (1/3)(4+2 4−1 4−1; 4−1 4+2 4−1; 4−1 4−1 4+2) = (2 1 1; 1 2 1; 1 1 2) = A.
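The whole worked example can be confirmed numerically. A NumPy sketch (illustrative):

```python
import numpy as np

A = np.array([[2., 1., 1.],
              [1., 2., 1.],
              [1., 1., 2.]])

lam, Q = np.linalg.eigh(A)
assert np.allclose(np.sort(lam), [1., 1., 4.])   # eigenvalues 1, 1, 4
assert np.allclose(Q.T @ Q, np.eye(3))           # the eigenbasis is orthonormal

# Spectral projections: P1 onto span{(1,1,1)}, P2 onto the plane x1+x2+x3 = 0.
P1 = np.full((3, 3), 1.0 / 3.0)                  # v1 v1^T with v1 = (1,1,1)/sqrt(3)
P2 = np.eye(3) - P1
assert np.allclose(4 * P1 + 1 * P2, A)           # A = 4 P1 + 1 P2
```

Note that for the repeated eigenvalue 1 the solver may return any orthonormal basis of the plane x1 + x2 + x3 = 0; the projection P2 is nevertheless unique.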
8. Application to Quadratic Forms
A quadratic form on Rn is a function q:Rn→R of the form
q(x) = xᵀAx = ∑_{i,j} aij xi xj,
where A is a symmetric matrix (we may always assume symmetry by replacing A with (1/2)(A + Aᵀ)). The matrix A is called the matrix of the quadratic form.
Example 22.
The quadratic form q(x1, x2) = 2x1² + 6x1x2 + 5x2² has matrix A = (2 3; 3 5). To classify this form, we orthogonally diagonalize A. The eigenvalues are λ = (7 ± 3√5)/2, both positive, so q is positive definite.
Definition 23 (Classification of quadratic forms).
A quadratic form q(x)=xTAx (equivalently, its symmetric matrix A) is called:
positive definite if q(x) > 0 for all x ≠ 0;
positive semidefinite if q(x)≥0 for all x;
negative definite if q(x) < 0 for all x ≠ 0;
negative semidefinite if q(x)≤0 for all x;
indefinite if q takes both positive and negative values.
Theorem 24.
A real symmetric matrix A is positive definite if and only if all its eigenvalues are positive.
Proof.
By the spectral theorem, write A = Q diag(λ1, …, λn) Qᵀ with Q orthogonal. Setting y = Qᵀx, we have q(x) = yᵀ diag(λ1, …, λn) y = ∑_{i=1}^{n} λi yi².

If all λi > 0, then q(x) = ∑ λi yi² > 0 whenever y ≠ 0 (equivalently, x ≠ 0). Conversely, if some λk ≤ 0, choose x = Qek (so y = ek), giving q(x) = λk ≤ 0. □
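The criterion is straightforward to apply in code. A NumPy sketch using the matrix of the quadratic-form example above (illustrative):

```python
import numpy as np

A = np.array([[2., 3.],
              [3., 5.]])                 # matrix of q(x) = 2x1^2 + 6x1x2 + 5x2^2
lam = np.linalg.eigvalsh(A)
assert np.all(lam > 0)                   # both eigenvalues positive

# Cross-check against the definition on random nonzero vectors.
rng = np.random.default_rng(4)
for _ in range(100):
    x = rng.standard_normal(2)
    assert x @ A @ x > 0                 # q(x) > 0 for x != 0
```

The eigenvalue test is a finite certificate: checking q(x) > 0 pointwise can never exhaust all x, but positivity of the spectrum settles the question.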
Theorem 25 (Sylvester's law of inertia).
Let A be a real symmetric n×n matrix with p positive eigenvalues, z zero eigenvalues, and q negative eigenvalues (so p+z+q=n). If B=MTAM for some invertible matrix M (not necessarily orthogonal), then B has the same numbers p, z, and q of positive, zero, and negative eigenvalues as A. The triple (p,z,q) is called the signature (or inertia) of A.
Proof.
By the spectral theorem, there exists an orthogonal Q such that QᵀAQ = D = diag(λ1, …, λn), where λ1, …, λp > 0, λ_{p+1} = ⋯ = λ_{p+z} = 0, and λ_{p+z+1}, …, λn < 0.

Suppose for contradiction that B = MᵀAM has p′ positive eigenvalues with p′ > p (the case p′ < p is symmetric). Let W₊ be the span of eigenvectors of B corresponding to positive eigenvalues, so dim W₊ = p′. Let W₀ = span{Qe_{p+1}, …, Qe_n}, the span of eigenvectors of A corresponding to non-positive eigenvalues; dim W₀ = n − p.

Consider the map x ↦ Mx from W₊ into Rⁿ. Since p′ + (n − p) > n, the subspaces M(W₊) and W₀ must have nontrivial intersection. Choose a nonzero v ∈ W₊ with Mv ∈ W₀. Then vᵀBv > 0 (since v lies in the positive-definite subspace of B), but vᵀBv = (Mv)ᵀA(Mv) ≤ 0 (since Mv ∈ W₀, where A is negative semidefinite). This is a contradiction. □
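Sylvester's law can be observed numerically: congruence by a random invertible M changes the eigenvalues but not their sign pattern. A NumPy sketch (the test matrix, seed, and tolerance are illustrative):

```python
import numpy as np

def inertia(S, tol=1e-7):
    """Counts (positive, zero, negative) eigenvalues of a symmetric matrix S."""
    lam = np.linalg.eigvalsh(S)
    return (int(np.sum(lam > tol)),
            int(np.sum(np.abs(lam) <= tol)),
            int(np.sum(lam < -tol)))

A = np.diag([3., 1., 0., -2.])           # signature (p, z, q) = (2, 1, 1)
rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))          # generically invertible
assert abs(np.linalg.det(M)) > 1e-8

B = M.T @ A @ M                          # congruent (not similar) to A
assert inertia(B) == inertia(A) == (2, 1, 1)
```

Note that B generally has different eigenvalues from A; only the counts of positive, zero, and negative eigenvalues are preserved, exactly as the theorem asserts.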
Remark 26.
Sylvester's law tells us that the signature is a complete invariant for quadratic forms under change of basis. Two real quadratic forms are equivalent (related by an invertible linear substitution) if and only if they have the same signature.
Remark 27.
The spectral theorem stands at the crossroads of algebra and analysis. In quantum mechanics, self-adjoint operators represent physical observables, and the spectral theorem guarantees that every measurement yields a real eigenvalue. In statistics, the spectral decomposition of covariance matrices underlies principal component analysis. The infinite-dimensional generalization of the spectral theorem, due to von Neumann and Stone, is a cornerstone of functional analysis.