
The geometry that inner products unlock: orthogonality, projection, and least squares

A vector space, by itself, has no concept of length or angle. Inner products supply both — and with them come orthogonal projections, the Gram–Schmidt process, least squares, and the bridge to Fourier analysis.

Folio Official
March 1, 2026

Up to this point in a linear algebra course, the only operations available in a vector space are addition and scalar multiplication. There is no notion of "length," no notion of "angle," and no way to say whether two vectors are "perpendicular." All of that changes with a single additional structure: the inner product.

1 The definition

Definition 1.
An inner product on a real vector space V is a function ⟨⋅,⋅⟩ : V × V → ℝ satisfying:
  1. ⟨u,v⟩=⟨v,u⟩ (symmetry)

  2. ⟨au+bv, w⟩ = a⟨u,w⟩ + b⟨v,w⟩ (linearity in the first argument)

  3. ⟨v,v⟩≥0, with equality iff v=0 (positive definiteness)

The standard inner product on ℝⁿ, ⟨u,v⟩ = u₁v₁ + ⋯ + uₙvₙ, is the most familiar — but it is far from the only one.

Example 2 (Inner product on function spaces).
On C[0,1] (continuous functions on [0,1]), define
⟨f,g⟩ = ∫₀¹ f(x) g(x) dx.
This satisfies all three axioms. It gives us a notion of "length" for functions: ∥f∥ = √(∫₀¹ f(x)² dx).
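This inner product is easy to approximate numerically. A minimal sketch in plain NumPy using the trapezoidal rule (the helper names are my own, not part of any library):

```python
import numpy as np

def inner(f, g, a=0.0, b=1.0, n=100_001):
    """Approximate <f, g> = integral of f(x) g(x) over [a, b] (trapezoidal rule)."""
    x = np.linspace(a, b, n)
    y = f(x) * g(x)
    h = (b - a) / (n - 1)
    return h * (y.sum() - 0.5 * (y[0] + y[-1]))

def norm(f):
    """||f|| = sqrt(<f, f>)."""
    return inner(f, f) ** 0.5
```

For instance, inner(lambda x: np.ones_like(x), lambda x: x) comes out near 1/2 and norm(lambda x: x) near 1/√3, matching the integrals done by hand.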

2 Length and angle

An inner product immediately yields:

∥v∥ = √⟨v,v⟩  (length / norm),
cos θ = ⟨u,v⟩ / (∥u∥ · ∥v∥)  (angle).

When ⟨u,v⟩=0, we have θ=90°, and the vectors are orthogonal.

Theorem 3 (Cauchy–Schwarz inequality).
|⟨u,v⟩| ≤ ∥u∥ · ∥v∥.

This inequality is what makes the definition of cos θ legitimate: it guarantees |cos θ| ≤ 1.
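The inequality is easy to stress-test numerically. A sketch with NumPy (random vectors of my own choosing, not an example from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
u, v = rng.normal(size=(2, 1000, 4))            # 1000 random pairs in R^4
lhs = np.abs(np.einsum('ij,ij->i', u, v))       # |<u_i, v_i>| for each pair
rhs = np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1)
# Cauchy-Schwarz: never violated (tiny slack for floating-point rounding)
print(np.all(lhs <= rhs + 1e-12))               # -> True
```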

3 Gram–Schmidt orthonormalization

Given any basis, the Gram–Schmidt process produces an orthonormal one — a basis whose vectors are mutually perpendicular and each of unit length.

Example 4.
Orthonormalize the basis {v₁, v₂, v₃} = {(1, 1, 0)ᵀ, (1, 0, 1)ᵀ, (0, 1, 1)ᵀ} of ℝ³.

Step 1. Set u₁ = v₁ = (1, 1, 0)ᵀ and normalize: e₁ = (1/√2)(1, 1, 0)ᵀ.

Step 2. Subtract from v₂ its component along e₁:
u₂ = v₂ − ⟨v₂, e₁⟩e₁ = (1, 0, 1)ᵀ − ½(1, 1, 0)ᵀ = (1/2, −1/2, 1)ᵀ,
then normalize to get e₂.

Step 3. Subtract from v₃ its components along e₁ and e₂:
u₃ = v₃ − ⟨v₃, e₁⟩e₁ − ⟨v₃, e₂⟩e₂,
then normalize to get e₃.

At each step, the idea is the same: strip away the components along directions already chosen. What remains is necessarily orthogonal to all of them.
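The loop is mechanical enough to code directly. A minimal sketch in NumPy (the function name is my own), verified on the basis from Example 4:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        u = np.asarray(v, dtype=float)
        for e in basis:
            u = u - (u @ e) * e          # strip the component along e
        basis.append(u / np.linalg.norm(u))
    return np.array(basis)

E = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
# Rows of E are orthonormal: E @ E.T is the identity (up to rounding),
# and E[0] is (1/sqrt(2), 1/sqrt(2), 0), matching Step 1.
```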

4 Orthogonal projection: finding the closest point

One of the most powerful applications of inner products is orthogonal projection.

Given a subspace W and a vector v, the orthogonal projection v̂ of v onto W is the point in W closest to v.

Theorem 5.
If {e₁, …, eₖ} is an orthonormal basis for W, then
v̂ = ∑ᵢ₌₁ᵏ ⟨v, eᵢ⟩ eᵢ,
and the error v − v̂ is orthogonal to every vector in W.
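Theorem 5 translates directly into code. A sketch with NumPy; the plane W below is a hypothetical example of my own, not one from the text:

```python
import numpy as np

# Orthonormal basis for the plane W = span{(1,1,0)/sqrt(2), (0,0,1)}
E = np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
E /= np.linalg.norm(E, axis=1, keepdims=True)

v = np.array([1.0, 2.0, 3.0])
v_hat = sum((v @ e) * e for e in E)     # projection: sum of <v, e_i> e_i
# v_hat = (1.5, 1.5, 3), and the error v - v_hat = (-0.5, 0.5, 0)
# is orthogonal to both basis vectors, as Theorem 5 promises.
```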

5 Least squares

Suppose you have experimental data (x₁, y₁), …, (xₘ, yₘ) and want to fit a line y = a + bx. When m > 2, there is generally no line passing through all points — the system Ax = b is overdetermined.

The least-squares solution is the one that minimizes ∥Ax−b∥2. Geometrically, it projects b onto the column space of A and then solves the consistent system.

Theorem 6 (Normal equations).
The least-squares solution satisfies
AᵀA x̂ = Aᵀb.

Example 7.
Fit y=a+bx to the points (0,1), (1,2), (2,4).
Here A has rows (1, 0), (1, 1), (1, 2) and b = (1, 2, 4)ᵀ.
Solving AᵀA x̂ = Aᵀb gives â = 5/6 and b̂ = 3/2. The best-fit line is y = 5/6 + (3/2)x.
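The arithmetic can be reproduced in a few lines of NumPy, solving the normal equations directly (a sketch; lstsq would give the same answer):

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])   # columns: intercept, x
b = np.array([1.0, 2.0, 4.0])

# Normal equations: A^T A x = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
# x_hat is approximately [5/6, 3/2], i.e. the line y = 5/6 + (3/2) x
```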

6 The bridge to Fourier analysis

Equip C[−π, π] with the inner product ⟨f,g⟩ = ∫_{−π}^{π} f(x) g(x) dx. The functions {1, cos x, sin x, cos 2x, sin 2x, …}, suitably normalized, form an orthonormal system.

Projecting a function f onto these basis functions gives the Fourier expansion:

f(x) ≈ a₀/2 + ∑ₙ₌₁ᴺ (aₙ cos nx + bₙ sin nx),
where
aₙ = (1/π) ∫_{−π}^{π} f(x) cos nx dx = ⟨f, cos nx⟩ / ∥cos nx∥².

Fourier series is orthogonal projection in an infinite-dimensional inner product space. The same principle that finds the best-fit line through three data points also decomposes a sound wave into its constituent frequencies.
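As a concrete sketch, projecting the test function f(x) = x (my own choice, not from the text) onto the sine modes recovers its known Fourier coefficients bₙ = 2(−1)ⁿ⁺¹/n; the aₙ vanish because f is odd:

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 200_001)
h = x[1] - x[0]

def integral(y):
    """Trapezoidal rule on [-pi, pi]."""
    return h * (y.sum() - 0.5 * (y[0] + y[-1]))

f = x                                  # test function f(x) = x
b = [integral(f * np.sin(n * x)) / np.pi for n in (1, 2, 3)]
# Known closed form b_n = 2(-1)^(n+1)/n gives [2, -1, 2/3],
# and the numerical projection matches it to high accuracy.
```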

7 The takeaway

An inner product gives a vector space geometry: length, angle, and orthogonality. With these come the Gram–Schmidt process, orthogonal projection, least squares, and Fourier analysis — all manifestations of a single idea. Whether you are fitting a regression line to data or decomposing a function into harmonics, you are projecting onto a subspace in an inner product space. One structure, one principle, and an extraordinary range of applications.
