Matrices
Grids of numbers that move space
Learning objectives
- Read matrix shapes and index entries with (row, column)
- Compute matrix–vector and matrix–matrix products by hand
- Track dimensions through a chain of products
- Recognize transpose, identity, diagonal, and symmetric matrices
Why matrices run machine learning
Every forward pass of every neural network is, underneath, a sequence of matrix multiplications. A dense layer is $y = Wx + b$. A batch of inputs through that layer is one big matrix product $XW^\top$. An attention block computes scores as $QK^\top$. Stack more layers and you are simply chaining more matmuls. If the vector was the noun of linear algebra, the matrix is the verb: it is the object that acts on vectors.
Because of this, the single most valuable skill in this chapter is not arithmetic — NumPy does the arithmetic — it is shape reasoning. Experienced ML engineers debug models by tracking shapes through a computation the way you track types through a function signature. A shape mismatch is the linear-algebra equivalent of a type error, and learning to see it before you run the code is a genuine superpower. We will emphasize it relentlessly.
We meet the matrix two ways at once:
- As a grid: a rectangular table of numbers with rows and columns.
- As an operator: a machine that eats an $n$-vector and produces an $m$-vector.
Fluency means switching between "table of numbers" and "thing that transforms vectors" without friction.
Intuition: a grid that transforms space
Picture a matrix as a rule for moving every point in the plane. The matrix $A = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$ stretches everything horizontally by a factor of 2 and leaves the vertical direction alone. Feed it a point $(x, y)$ and it returns $(2x, y)$. The columns of $A$ tell you exactly where the two basis arrows land: the first column is where $e_1 = (1, 0)$ goes, the second column is where $e_2 = (0, 1)$ goes. That is the whole secret of the operator view: the columns of a matrix are the images of the basis vectors, and everything else follows by linear combination.
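The claim about columns can be checked in a few lines of NumPy. This is a minimal sketch using the horizontal-stretch matrix described above; the test point `(3, 4)` is an arbitrary choice made for this example.

```python
import numpy as np

# Horizontal stretch by 2, vertical direction unchanged.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])

e1 = np.array([1.0, 0.0])  # first basis vector
e2 = np.array([0.0, 1.0])  # second basis vector

# Applying A to a basis vector picks out the matching column.
image_e1 = A @ e1
image_e2 = A @ e2
assert np.array_equal(image_e1, A[:, 0])
assert np.array_equal(image_e2, A[:, 1])

# Any other point is a combination of e1 and e2, so its image is the
# same combination of the columns. The point (3, 4) is arbitrary.
p = np.array([3.0, 4.0])
image_p = A @ p
assert np.array_equal(image_p, 3.0 * A[:, 0] + 4.0 * A[:, 1])
print(image_p)  # [6. 4.]
```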
The first lab lets you build a product entry by entry and watch which row and which column each output cell comes from. Play with it before we write the formal rule.
Notice the pattern: the entry in output row $i$, column $j$ is formed from row $i$ of the left matrix and column $j$ of the right matrix. Hold onto that; it is the definition.
Formal definitions
The row-first convention is worth burning in: `A[i][j]` is row $i$, column $j$,
and NumPy's `A.shape` reports `(rows, cols)` in that order. Swapping the two is
among the most common indexing bugs in ML code.
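A short sketch of the row-first convention in NumPy. The matrix values are arbitrary and chosen only so that each entry reveals its own row and column.

```python
import numpy as np

# A 2x3 matrix: 2 rows, 3 columns. The values are arbitrary.
A = np.array([[10, 11, 12],
              [20, 21, 22]])

rows, cols = A.shape        # (2, 3): rows first, then columns
entry = A[1][2]             # row 1, column 2 (0-based) -> 22
same_entry = A[1, 2]        # equivalent, and the more idiomatic NumPy form

row_1 = A[1]                # whole row 1, shape (3,)
col_2 = A[:, 2]             # whole column 2, shape (2,)

print(rows, cols, entry)    # 2 3 22
```

Writing `A[2][1]` here would raise an `IndexError`, because there is no row 2; on a square matrix the same swap would silently return the wrong entry.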
| Symbol | Meaning | Type | Shape | Role |
|---|---|---|---|---|
| $A$ | A matrix | matrix | m×n | variable |
| $a_{ij}$ | Entry in row i, column j (a number) | scalar | 1 | variable |
| $A^\top$ | Transpose (rows become columns) | matrix | n×m | variable |
| $I_n$ | n×n identity matrix | matrix | n×n | fixed |
| $\operatorname{tr}(A)$ | Trace: sum of diagonal entries | scalar | 1 | operation |
| $Ax$ | Matrix–vector product | vector | m×1 | operation |
| $AB$ | Matrix–matrix product | matrix | m×p | operation |
Addition and scalar multiplication
Two matrices of the same shape add entry-by-entry, and any matrix scales by a number entry-by-entry:
$$(A + B)_{ij} = a_{ij} + b_{ij}, \qquad (cA)_{ij} = c\,a_{ij}.$$
These are exactly the vector rules applied to a grid. Like vector addition, matrix addition demands identical shapes: a $2 \times 3$ and a $3 \times 2$ matrix, for example, cannot be added.
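In NumPy both rules are the ordinary `+` and `*` operators. The sketch below uses small matrices invented for illustration and shows the failure when shapes differ.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
B = np.array([[10, 20, 30],
              [40, 50, 60]])     # shape (2, 3)

total = A + B                    # entry-by-entry sum
scaled = 3 * A                   # every entry multiplied by 3

C = np.ones((3, 2))              # shape (3, 2): does not match A
try:
    A + C
    mismatch_raised = False
except ValueError:
    mismatch_raised = True       # NumPy refuses to add (2, 3) and (3, 2)

print(total)
print(scaled)
print(mismatch_raised)           # True
```

One caveat: NumPy's broadcasting lets some unequal shapes combine, for example `(2, 3)` with `(3,)`. That is a NumPy convenience, not matrix addition, so it is worth checking shapes explicitly when you mean the mathematical operation.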
The matrix–vector product
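For an $m \times n$ matrix $A$ and an $n$-vector $x$, the product $y = Ax$ is the $m$-vector with entries
$$y_i = \sum_{j=1}^{n} a_{ij}\, x_j, \qquad i = 1, \dots, m.$$
There are two equally important ways to read this formula, and switching between them is the heart of the chapter.
- Row view (dot products). Entry $i$ of $Ax$ is the dot product of row $i$ of $A$ with $x$. So $Ax$ stacks $m$ dot products.
- Column view (linear combination). $Ax$ is a weighted sum of the columns of $A$, with the entries of $x$ as weights. We derive this below.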
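As a small worked instance of both readings, take a matrix and a vector chosen here purely for illustration:

$$A = \begin{bmatrix} 1 & 0 & 2 \\ 3 & 1 & 0 \end{bmatrix}, \qquad x = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}.$$

Row view, one dot product per output entry:

$$Ax = \begin{bmatrix} 1\cdot 2 + 0\cdot 1 + 2\cdot 1 \\ 3\cdot 2 + 1\cdot 1 + 0\cdot 1 \end{bmatrix} = \begin{bmatrix} 4 \\ 7 \end{bmatrix}.$$

Column view, the columns of $A$ weighted by the entries of $x$:

$$Ax = 2\begin{bmatrix} 1 \\ 3 \end{bmatrix} + 1\begin{bmatrix} 0 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 4 \\ 7 \end{bmatrix}.$$

Both routes give the same 2-vector, as they must: a $2 \times 3$ matrix takes a 3-vector to a 2-vector.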
The matrix–matrix product
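For an $m \times n$ matrix $A$ and an $n \times p$ matrix $B$, the product $AB$ is the $m \times p$ matrix with entries
$$(AB)_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj}.$$
The inner dimensions must match, and the outer dimensions give the shape of the result: $(m \times n)(n \times p) \to m \times p$.
This inner/outer rule is the one you will use hundreds of times. Read a chain left to right and cross out each matching inner pair: $(m \times n)(n \times p)(p \times q) \to m \times q$. If at any junction the inner numbers disagree, the product does not exist, and no arithmetic is required to know it fails.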
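The cancelling procedure is mechanical enough to write as a function. The sketch below uses a hypothetical helper, `chain_shape`, invented for this example (it is not part of NumPy), and compares its prediction with what NumPy actually produces.

```python
import numpy as np

def chain_shape(shapes):
    """Shape of a left-to-right product of matrices with the given
    (rows, cols) shapes. Raises ValueError at the first junction whose
    inner dimensions disagree. Hypothetical helper for illustration."""
    rows, inner = shapes[0]
    for position, (next_rows, next_cols) in enumerate(shapes[1:], start=1):
        if inner != next_rows:
            raise ValueError(
                f"junction {position}: inner dimensions {inner} and {next_rows} disagree"
            )
        inner = next_cols
    return (rows, inner)

# (2 x 5)(5 x 3)(3 x 4) -> 2 x 4, predicted without any arithmetic.
predicted = chain_shape([(2, 5), (5, 3), (3, 4)])

# Confirm against NumPy on arrays of those shapes.
A, B, C = np.ones((2, 5)), np.ones((5, 3)), np.ones((3, 4))
actual = (A @ B @ C).shape
print(predicted, actual)  # (2, 4) (2, 4)

# Swapping the first pair breaks the rule: (5 x 3)(2 x 5) has inner 3 vs 2.
try:
    chain_shape([(5, 3), (2, 5)])
    failed = False
except ValueError:
    failed = True
```

The helper only ever looks at shapes, which is the point: whether a product exists is decided before any number is multiplied.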
Transpose, identity, diagonal, symmetric, trace
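The transpose $A^\top$ swaps rows and columns, so $(A^\top)_{ij} = a_{ji}$ and an $m \times n$ matrix becomes $n \times m$. A few named square matrices earn their own vocabulary:
- The identity $I$ has 1s on the diagonal and 0s elsewhere; it is the "do nothing" operator: $Ix = x$ and $IA = A$.
- A diagonal matrix is zero off the diagonal; multiplying by it just scales each coordinate independently.
- A symmetric matrix satisfies $A = A^\top$ (so $a_{ij} = a_{ji}$). Covariance matrices and Gram matrices are always symmetric.
- The trace $\operatorname{tr}(A) = \sum_i a_{ii}$ sums the diagonal. It is only defined for square matrices.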
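Each of these has a direct NumPy counterpart. The following is a minimal sketch with arbitrary example values; `np.eye`, `np.diag`, `.T`, and `np.trace` are standard NumPy, while the matrices themselves are made up for illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])          # shape (2, 3), arbitrary values

# Transpose flips the shape: (2, 3) -> (3, 2).
At = A.T

# Identity: multiplying by it changes nothing (its size must fit the side it is on).
I2, I3 = np.eye(2), np.eye(3)
assert np.array_equal(I2 @ A, A)
assert np.array_equal(A @ I3, A)

# Diagonal matrix: scales each coordinate independently.
D = np.diag([2.0, -1.0, 0.5])
x = np.array([1.0, 1.0, 4.0])
scaled = D @ x                           # [2., -1., 2.]

# Symmetric matrix: equal to its own transpose.
S = np.array([[2.0, 7.0],
              [7.0, 5.0]])
is_symmetric = np.array_equal(S, S.T)    # True

# Trace: sum of the diagonal entries of a square matrix.
trace_S = np.trace(S)                    # 2 + 5 = 7
print(At.shape, is_symmetric, trace_S)
```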
A 2×2 product, worked by hand
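As an illustration, take two matrices chosen here for convenience:

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \qquad B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}.$$

Each output entry pairs one row of $A$ with one column of $B$:

$$AB = \begin{bmatrix} 1\cdot 5 + 2\cdot 7 & 1\cdot 6 + 2\cdot 8 \\ 3\cdot 5 + 4\cdot 7 & 3\cdot 6 + 4\cdot 8 \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}.$$

Reversing the order pairs rows of $B$ with columns of $A$ and gives a different matrix:

$$BA = \begin{bmatrix} 5\cdot 1 + 6\cdot 3 & 5\cdot 2 + 6\cdot 4 \\ 7\cdot 1 + 8\cdot 3 & 7\cdot 2 + 8\cdot 4 \end{bmatrix} = \begin{bmatrix} 23 & 34 \\ 31 & 46 \end{bmatrix}.$$

So even when both products exist and have the same shape, $AB$ and $BA$ generally differ.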
Derivation: Ax is a linear combination of the columns
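The column view is the one that makes neural networks click, so let us derive it. Write $A$ by its columns $a_1, \dots, a_n$ (each an $m$-vector), and $x = (x_1, \dots, x_n)$. Entry $i$ of column $a_j$ is $a_{ij}$, so entry $i$ of the combination $x_1 a_1 + \dots + x_n a_n$ is $\sum_{j} a_{ij}\, x_j$. That is the dot product of row $i$ of $A$ with $x$, which is entry $i$ of $Ax$. Since this holds for every $i$,
$$Ax = x_1 a_1 + x_2 a_2 + \dots + x_n a_n.$$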
Two readings of the same product, then: rows dotted with $x$ (compute each output number) versus columns combined by $x$ (understand what the matrix does to space). Keep both.
ML use case: layers, batches, and attention are all shapes
Every one of these is the same operation. Master the shape rule once and you read model code like prose.
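To make the shapes concrete, here is a minimal sketch of the three computations this lesson names: a dense layer on one example, the same layer on a batch, and attention scores. All sizes are illustrative assumptions, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the lesson).
n_in, n_out, batch = 8, 16, 32
seq_len, d_head = 5, 4

W = rng.normal(size=(n_out, n_in))     # dense-layer weights: (outputs, inputs)
b = rng.normal(size=n_out)             # bias, shape (16,)

# One example: (16 x 8) times an 8-vector -> a 16-vector.
x = rng.normal(size=n_in)
y = W @ x + b

# A batch stored one example per row: (32 x 8)(8 x 16) -> (32 x 16).
X = rng.normal(size=(batch, n_in))
Y = X @ W.T + b                        # b broadcasts across the 32 rows

# Attention scores: (5 x 4)(4 x 5) -> (5 x 5).
Q = rng.normal(size=(seq_len, d_head))
K = rng.normal(size=(seq_len, d_head))
scores = Q @ K.T

print(y.shape, Y.shape, scores.shape)  # (16,) (32, 16) (5, 5)
```

The transpose on `W` is forced by the shape rule alone: with examples stored as rows, `X @ W` would put 8 next to 16 at the junction, and `W @ X` would put 8 next to 32.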
The transformation view
The second lab shows the operator picture directly: a matrix bends the plane, and you watch a shape and the basis arrows move as you edit the four entries. Try making a pure rotation, a shear, and a reflection, and watch where the columns (the basis-vector images) land.
Set the matrix to the identity and nothing moves: the "do nothing" operator. Make it diagonal and each axis scales on its own. This is the same product $Ax$ from the grid view, seen as an action.
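The same experiment can be run without the lab. The sketch below writes down one rotation, one shear, and one reflection (the specific angle and entries are choices made for this example) and prints where each sends the two basis vectors.

```python
import numpy as np

theta = np.pi / 2                      # 90-degree rotation, chosen for easy checking
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
shear = np.array([[1.0, 1.0],          # slides points sideways in proportion to height
                  [0.0, 1.0]])
reflection = np.array([[1.0,  0.0],    # flips the vertical axis
                       [0.0, -1.0]])

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

for name, M in [("rotation", rotation), ("shear", shear), ("reflection", reflection)]:
    # The images of the basis vectors are the columns of M.
    print(name, "e1 ->", np.round(M @ e1, 3), "e2 ->", np.round(M @ e2, 3))
```

Reading the printed images column by column reproduces each matrix, which is the operator view in one line.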
NumPy: three ways to the same product
In NumPy a matrix is a 2-D array; `A.shape` is `(rows, cols)`. Let us compute a
matrix product with an explicit triple loop (what the entry formula says
literally), then with the `@` operator, and confirm they agree. Run it:
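A minimal sketch of that comparison follows. The text names the loop and `@`; using `np.matmul` as the third route is an assumption made here, as are the function name `matmul_loop`, the seed, and the matrix sizes.

```python
import numpy as np

def matmul_loop(A, B):
    """Matrix product via the entry formula C[i, j] = sum_k A[i, k] * B[k, j]."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError(f"inner dimensions disagree: {n} vs {n_b}")
    C = np.zeros((m, p))
    for i in range(m):          # output row
        for j in range(p):      # output column
            for k in range(n):  # walk along row i of A and column j of B
                C[i, j] += A[i, k] * B[k, j]
    return C

rng = np.random.default_rng(0)      # fixed seed so the run is repeatable
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 2))

C_loop = matmul_loop(A, B)          # 1. explicit triple loop
C_at = A @ B                        # 2. the @ operator
C_fn = np.matmul(A, B)              # 3. the function form of @

assert C_loop.shape == (A.shape[0], B.shape[1])
assert np.allclose(C_loop, C_at)
assert np.allclose(C_at, C_fn)
print("all three agree:", C_loop.shape)   # all three agree: (3, 2)
```

The comparison uses `np.allclose` rather than exact equality because the loop and the compiled routine may add the same terms in a different order, which can change the last few bits of a floating-point result.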
The triple loop is $m \cdot n \cdot p$ scalar multiply-adds, and it is only for seeing the
formula. In real code you always write `A @ B`: for floating-point arrays NumPy
typically dispatches to a tuned BLAS routine that runs the same arithmetic in
compiled, cache-aware code, often orders of magnitude faster than a Python loop.
Prefer `@`; reach for loops only to explain.
Summary
- A matrix is both a grid of numbers (entry $a_{ij}$ = `A[i][j]`, row first) and an operator taking $n$-vectors to $m$-vectors.
- Same-shape matrices add and scale entry-by-entry.
- $Ax$ reads two ways: as dot products of rows of $A$ with $x$, and as a linear combination of the columns of $A$ weighted by $x$.
- $AB$ needs matching inner dimensions, $(m \times n)(n \times p) \to m \times p$; track shapes through a chain by cancelling adjacent inner pairs. It is not commutative.
- Transpose flips shape ($m \times n \to n \times m$) and reverses products; the identity does nothing, diagonal matrices scale axes, symmetric means $A = A^\top$, and the trace sums the diagonal.
- In ML a layer is $y = Wx + b$, a batch is $XW^\top$, and attention scores are $QK^\top$, all read off by the shape rule. Use `A @ B` and keep `(n,)` distinct from `(n, 1)`.
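The last point, keeping `(n,)` distinct from `(n, 1)`, is easiest to see by example. This is a small sketch with placeholder arrays of ones; the sizes are arbitrary.

```python
import numpy as np

A = np.ones((3, 4))

x_flat = np.ones(4)            # shape (4,): a 1-D array
x_col = np.ones((4, 1))        # shape (4, 1): a matrix with one column

y_flat = A @ x_flat            # shape (3,)
y_col = A @ x_col              # shape (3, 1)

# The two hold the same numbers but combine differently:
# (3,) + (3, 1) broadcasts to a (3, 3) grid instead of adding entrywise.
surprise = y_flat + y_col
print(y_flat.shape, y_col.shape, surprise.shape)  # (3,) (3, 1) (3, 3)
```

No error is raised in the last step, which is why this mix-up tends to surface later as a wrong result rather than as a shape mismatch.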
Active recall
Answer from memory before checking the lesson:
- `A` has shape `(4, 7)` and `B` has shape `(7, 3)`. What is the shape of `A @ B`? What about `B @ A`?
- Give the two readings of $Ax$: one in terms of rows, one in terms of columns.
- Why does a batched dense layer use $XW^\top$ rather than $WX$? Reason purely from shapes.
- What is $\operatorname{tr}(A)$, and for which matrices is it defined?
- In NumPy, what does `A[i][j]` mean: row $i$ column $j$, or column $i$ row $j$?
Exercises
Level A: Recall & basic calculation
Shape of a matrix product
has shape and has shape . What is the shape of ? Enter as (rows, cols).
Matrix–vector product by hand
Let and . Compute . Enter as y1, y2.
Indexing an entry
For , what is A[1][2] using 0-based indexing (as in NumPy: row first, column second)?
Shape after transpose
has shape . What is the shape of ? Enter as (rows, cols).
Matrix addition
Compute for and . Enter the entries row by row: a11, a12, a21, a22.
Trace of a matrix
Compute for .
Level B: Conceptual understanding
Tracking shapes through a chain
With of shape , of shape , and of shape , what is the shape of ? Enter as (rows, cols).
Why AB is not BA
Let have shape and have shape . Which statement is correct?
Ax as a combination of columns
Using the column view, compute for and : form . Enter as y1, y2.
The identity does nothing
For of shape and the identity, what is ?
The batch layer shape
A dense layer has weights $W$ of shape $20 \times 10$ (20 outputs, 10 inputs). A batch of data $X$ has shape $64 \times 10$ (64 examples, each with 10 features). What is the shape of $XW^\top$? Enter as (rows, cols).
Level C: Derivation & implementation
Implement matmul: loop vs @
Write `matmul_loop(A, B)` using an explicit triple loop over $i, j, k$ (the entry formula $(AB)_{ij} = \sum_k a_{ik}\, b_{kj}$), and confirm it agrees with `A @ B` on random matrices with a fixed seed. Assert the output shape is `(A.shape[0], B.shape[1])`, then print `ok`.
Matrix–vector product, both views
Implement matvec_rows(A, x) (stack of row dot products) and matvec_cols(A, x) (weighted sum of columns), and confirm both equal A @ x. Use a fixed seed with of shape and of length . Print ok.
Transpose reverses a product
Verify numerically that $(AB)^\top = B^\top A^\top$ (and that the naive $A^\top B^\top$ generally is not equal, and may even be a shape error). Use a fixed seed with of shape and of shape . Print ok.
A Gram matrix is symmetric
For any data matrix of shape , show numerically that the Gram matrix is square, has shape , and is symmetric (). Use a fixed seed with , , and print ok.
Level D: Research-thinking challenge
Attention, purely by shape
Self-attention computes $\operatorname{softmax}\!\left(QK^\top / \sqrt{d}\right) V$, where $Q, K, V$ each have shape $n \times d$ (here $n$ = sequence length, $d$ = head dimension). Softmax is applied row-wise and does not change shape. Working purely from the inner/outer rule, (a) give the shape of the score matrix $QK^\top$, (b) give the shape of the final output, and (c) explain in one sentence why the score matrix is $n \times n$ regardless of $d$.
Debug a stacked-layer shape chain
An engineer stacks two dense layers on a batch. The data is of shape ; layer 1 has weights of shape ; layer 2 has weights of shape . They write Z = X @ W1 @ W2 and get a shape error on the first product. Explain why it fails, write the corrected expression using transposes, and give the shape after each matmul.