Why matrix multiplication works that way: when linear maps become matrices

The definition of matrix multiplication looks arbitrary — until you realize it encodes composition of linear maps. We compute a concrete composition by hand and watch the "row-times-column" rule emerge naturally.

Folio Official

March 1, 2026

Most students, on first encountering the definition of matrix multiplication, have the same reaction: why on earth is it defined this way?

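Given $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ and $B = \begin{pmatrix} e & f \\ g & h \end{pmatrix}$, the product is

$$A B = \begin{pmatrix} a e + b g & a f + b h \\ c e + d g & c f + d h \end{pmatrix}.$$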

Multiplying entry-by-entry would be much simpler. But the "row-times-column" rule is not a convention chosen for aesthetic reasons. It is forced on us by the fact that matrices represent linear maps, and matrix multiplication represents composition.

1 Matrices as linear maps

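A linear map $T : R^{2} \to R^{3}$ is completely determined by what it does to a basis. If we know $T (e_{1})$ and $T (e_{2})$ for the standard basis vectors $e_{1} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $e_{2} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, we know everything.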

Example 1.

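Let $T (e_{1}) = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}$ and $T (e_{2}) = \begin{pmatrix} 3 \\ -1 \\ 4 \end{pmatrix}$. Then for any vector $\begin{pmatrix} x \\ y \end{pmatrix}$,

$$T \begin{pmatrix} x \\ y \end{pmatrix} = x T (e_{1}) + y T (e_{2}) = \begin{pmatrix} x + 3 y \\ 2 x - y \\ 4 y \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 2 & -1 \\ 0 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.$$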

The recipe is simple: place the images of the basis vectors as columns. The resulting matrix is the representation of $T$ .
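The recipe can be checked numerically. The following is a minimal Python sketch using plain nested lists; the helper names `matrix_from_images` and `apply` are illustrative choices, not part of any library.

```python
def matrix_from_images(images):
    """Build a matrix (list of rows) whose j-th column is images[j]."""
    n_rows = len(images[0])
    return [[img[i] for img in images] for i in range(n_rows)]


def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


# Images of the standard basis vectors from Example 1.
T_e1 = [1, 2, 0]
T_e2 = [3, -1, 4]

M = matrix_from_images([T_e1, T_e2])
print(M)  # [[1, 3], [2, -1], [0, 4]]

# Linearity says T(x, y) = x*T(e1) + y*T(e2); this should agree with M applied to (x, y).
x, y = 5, -2
by_linearity = [x * a + y * b for a, b in zip(T_e1, T_e2)]
print(apply(M, [x, y]) == by_linearity)  # True
```

Storing the matrix as a list of rows while building it from columns mirrors the text: the columns carry the meaning, the rows are only how the table is written down.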

2 Composition forces the rule

Here is the crux. Take two linear maps and compose them — apply one after the other — and compute the representation matrix of the composition by hand.

Example 2.

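Let $T : R^{2} \to R^{3}$ have matrix $A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix}$ and $S : R^{3} \to R^{2}$ have matrix $B = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \end{pmatrix}$. The composition $S \circ T : R^{2} \to R^{2}$ acts on the basis vectors as follows.

For $e_{1}$:

$$S (T (e_{1})) = S \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} = 1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + 0 \begin{pmatrix} 0 \\ 1 \end{pmatrix} + 1 \begin{pmatrix} -1 \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ -1 \end{pmatrix}.$$

For $e_{2}$:

$$S (T (e_{2})) = S \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} = 0 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + 1 \begin{pmatrix} 0 \\ 1 \end{pmatrix} + 1 \begin{pmatrix} -1 \\ -1 \end{pmatrix} = \begin{pmatrix} -1 \\ 0 \end{pmatrix}.$$

So the matrix of $S \circ T$ is

$$\begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}.$$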

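Now look carefully at what happened. The $j$-th column of the result is $B$ applied to the $j$-th column of $A$, so its $i$-th entry is $\sum_{k} b_{i k} a_{k j}$. In other words, to get the $(i, j)$ entry of the result, we took the dot product of the $i$-th row of $B$ with the $j$-th column of $A$. That is precisely the definition of matrix multiplication: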

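$$B A = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}.$$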

A perfect match.
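The same comparison can be run in code. The sketch below, in plain Python with illustrative helper names (`apply`, `compose_columnwise`, `matmul`), builds the matrix of $S \circ T$ one column at a time by pushing each column of $A$ through $B$, then computes the row-times-column product separately and prints both.

```python
def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def compose_columnwise(outer, inner):
    """Matrix of the composed map: column j is `outer` applied to column j of `inner`."""
    n_cols = len(inner[0])
    columns = [apply(outer, [row[j] for row in inner]) for j in range(n_cols)]
    # Transpose the list of columns back into a list of rows.
    return [list(row) for row in zip(*columns)]


def matmul(X, Y):
    """Row-times-column rule: entry (i, j) is row i of X dotted with column j of Y."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


A = [[1, 0], [0, 1], [1, 1]]    # matrix of T : R^2 -> R^3
B = [[1, 0, -1], [0, 1, -1]]    # matrix of S : R^3 -> R^2

print(compose_columnwise(B, A))  # [[0, -1], [-1, 0]]
print(matmul(B, A))              # [[0, -1], [-1, 0]]
```

The two functions are written independently on purpose: one knows only about applying a map to a vector, the other only about dot products of rows and columns.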

3 Size constraints become obvious

An $m \times n$ matrix $A$ represents a map $R^{n} \to R^{m}$, and a $p \times q$ matrix $B$ represents a map $R^{q} \to R^{p}$. The composition $A \circ B$ makes sense only when the output of $B$ can be fed into $A$, that is, when $p = n$. The result is a map $R^{q} \to R^{m}$, hence the product $A B$ is an $m \times q$ matrix.
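As a small illustration, the size rule can be written as a check on shapes alone. This is a sketch in plain Python; `shape` and `product_shape` are hypothetical helper names introduced here.

```python
def shape(M):
    """Return (rows, columns) of a matrix stored as a list of rows."""
    return len(M), len(M[0])


def product_shape(A, B):
    """Shape of the product AB, or ValueError if the composition is undefined."""
    m, n = shape(A)  # A maps R^n -> R^m
    p, q = shape(B)  # B maps R^q -> R^p
    if p != n:
        raise ValueError(f"cannot compose: B outputs R^{p} but A expects R^{n}")
    return m, q      # AB maps R^q -> R^m


A = [[1, 0], [0, 1], [1, 1]]    # 3 x 2, a map R^2 -> R^3
B = [[1, 0, -1], [0, 1, -1]]    # 2 x 3, a map R^3 -> R^2

print(product_shape(A, B))  # (3, 3)
print(product_shape(B, A))  # (2, 2)

try:
    product_shape(A, A)  # A outputs vectors in R^3 but expects inputs from R^2
except ValueError as err:
    print("undefined:", err)
```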

4 Associativity is just composition

The associative law $(A B) C = A (B C)$ for matrices is a reflection of the associativity of function composition: $(f \circ g) \circ h = f \circ (g \circ h)$. When applying three maps in succession, it does not matter which pair you compose first; the final result is the same. Since matrix products represent compositions, there is nothing further to prove about matrices specifically: associativity is inherited from function composition itself.
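A numerical spot check is not a proof, but it makes the statement concrete. The sketch below uses plain Python and three small matrices chosen here for illustration (they are assumptions of this example, with sizes picked so that both groupings are defined).

```python
def matmul(X, Y):
    """Row-times-column product of two matrices stored as lists of rows."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


A = [[1, 0, -1], [0, 1, -1]]    # 2 x 3
B = [[1, 0], [0, 1], [1, 1]]    # 3 x 2
C = [[0, -1], [1, 0]]           # 2 x 2

left = matmul(matmul(A, B), C)   # (AB)C
right = matmul(A, matmul(B, C))  # A(BC)

print(left)           # [[-1, 0], [0, 1]]
print(left == right)  # True
```

The intermediate matrices differ ($A B$ is $2 \times 2$ while $B C$ is $3 \times 2$), yet both groupings describe the same chain of three maps and so give the same final matrix.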

5 Why $A B \neq B A$

The non-commutativity of matrix multiplication is also transparent from the map perspective.

Example 3.

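Let $R = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ (rotation by $90^{\circ}$) and $S = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}$ (horizontal stretch by factor $2$). Then

$$S R = \begin{pmatrix} 0 & -2 \\ 1 & 0 \end{pmatrix}, \quad R S = \begin{pmatrix} 0 & -1 \\ 2 & 0 \end{pmatrix}.$$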

In $R S$ the stretch acts first and the rotation second; in $S R$ the order is reversed. Stretching and then rotating gives a different result from rotating and then stretching. The matrices do not commute because the geometric operations do not commute.
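To see the difference on an actual vector, the sketch below (plain Python, illustrative helper names) computes both products from Example 3 and follows $e_{1}$ through each ordering.

```python
def matmul(X, Y):
    """Row-times-column product of two matrices stored as lists of rows."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


R = [[0, -1], [1, 0]]  # rotation by 90 degrees
S = [[2, 0], [0, 1]]   # horizontal stretch by factor 2

SR = matmul(S, R)      # rotate first, then stretch
RS = matmul(R, S)      # stretch first, then rotate

print(SR)  # [[0, -2], [1, 0]]
print(RS)  # [[0, -1], [2, 0]]

e1 = [1, 0]
print(apply(SR, e1))  # [0, 1]: e1 rotates onto the vertical axis, where the stretch does nothing
print(apply(RS, e1))  # [0, 2]: e1 is stretched to length 2, then rotated
```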

6 Change of basis: the same map in a different outfit

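The same linear map can look very different depending on the choice of basis. Suppose $A$ represents a map $R^{n} \to R^{n}$ in the basis $\mathcal{B}$, and $P$ is the change-of-basis matrix from $\mathcal{B}$ to $\mathcal{B}^{'}$, meaning its columns are the vectors of $\mathcal{B}^{'}$ written in $\mathcal{B}$-coordinates. Then the representation matrix transforms as

$$A^{'} = P^{-1} A P.$$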

Remark 4.

Diagonalization is the search for a basis in which $A$ becomes a diagonal matrix, that is, a basis in which the map is nothing more than scaling along each axis. In other words, it is the search for the simplest possible description of a linear map.
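A small worked case makes the formula tangible. The sketch below is plain Python with exact arithmetic from the standard `fractions` module. The matrix $A$ and the basis in the columns of $P$ are assumptions chosen for this illustration: the vectors $(1, 1)$ and $(1, -1)$ are scaled by this $A$ by factors $3$ and $1$ respectively. `inverse_2x2` is a hypothetical helper that handles only the $2 \times 2$ case.

```python
from fractions import Fraction


def matmul(X, Y):
    """Row-times-column product of two matrices stored as lists of rows."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def inverse_2x2(M):
    """Exact inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2, 1], [1, 2]]    # the map, written in the standard basis
P = [[1, 1], [1, -1]]   # columns: the new basis vectors (1, 1) and (1, -1)

A_prime = matmul(inverse_2x2(P), matmul(A, P))  # P^{-1} A P

print(A_prime == [[3, 0], [0, 1]])  # True
```

In the new basis the map is pure scaling: by $3$ along $(1, 1)$ and by $1$ along $(1, -1)$.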

7 The takeaway

The "row-times-column" rule for matrix multiplication is not an arbitrary convention. It is the unique rule that makes the product of two matrices equal the matrix of the composed maps. Size constraints, associativity, non-commutativity — they all follow from thinking of matrices not as tables of numbers but as proxies for linear maps.
