Vector Spaces and Linear Transformations
Span, independence, basis, rank, and the four subspaces
Prerequisites
Learning objectives
- Define span, linear independence, basis, and dimension
- Read a matrix as a linear transformation of space
- Reason about column space, null space, and rank
- Connect rank to information preserved by a layer
Why "where the basis lands" is the whole game
By now a matrix is, for you, a grid of numbers you can multiply against a vector. That view is correct but inert. The view that makes matrices click — the one that turns linear algebra from bookkeeping into a spatial intuition you can reason with — is this: a matrix is a linear transformation of space. It picks up every point, stretches and rotates and shears the whole grid, and sets it back down. And a transformation that acts on infinitely many points is fully described by a tiny amount of data: where the basis vectors land.
That single reframing answers questions that matter in ML. When a weight matrix maps a 4096-dimensional hidden state into a 4096-dimensional output, how much information can it actually carry? When we replace a big matrix with the product of two skinny ones — the trick behind LoRA adapters and embedding tables — what exactly are we giving up? The vocabulary for all of this is span, basis, rank, and null space, and every one of them is a statement about which directions a transformation keeps and which it destroys.
Intuition: a transformation is pinned down by where the basis lands
Start in the plane with the two standard basis vectors $e_1 = (1, 0)$ and $e_2 = (0, 1)$. Every other vector is a linear combination of them: $v = (x, y) = x e_1 + y e_2$. The defining property of a linear transformation $T$ is that it commutes with addition and scaling:

$$T(\alpha u + \beta w) = \alpha\, T(u) + \beta\, T(w)$$

Feed $v$ through it and the property lets you pull the transformation apart:

$$T(v) = T(x e_1 + y e_2) = x\, T(e_1) + y\, T(e_2)$$

Read that again. To know what $T$ does to any vector, you only need to know $T(e_1)$ and $T(e_2)$; the coordinates $x$ and $y$ of the same vector follow along for the ride. So if I tell you $e_1$ lands on $(a, c)$ and $e_2$ lands on $(b, d)$, you can transform anything.

And where do we store those two landing spots? As the columns of a matrix:

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

That is the entire secret of matrix–vector multiplication: the columns of $A$ are the images of the basis vectors, and $Av$ just recombines those columns using $v$'s coordinates as the weights. A matrix is a table of "where the basis lands."
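To see the column picture in action, here is a minimal NumPy sketch. The landing spots `T_e1` and `T_e2` and the test vector `v` are hypothetical values chosen for illustration; the point is only that the ordinary matrix–vector product and the "recombine the columns" recipe agree.

```python
import numpy as np

# Hypothetical landing spots for the standard basis vectors (illustrative values).
T_e1 = np.array([2.0, 1.0])    # where e1 = (1, 0) lands
T_e2 = np.array([-1.0, 3.0])   # where e2 = (0, 1) lands

# Store the landing spots as the columns of A.
A = np.column_stack([T_e1, T_e2])

v = np.array([3.0, -2.0])      # coordinates x = 3, y = -2
x, y = v

by_matmul = A @ v                   # ordinary matrix-vector product
by_columns = x * T_e1 + y * T_e2    # recombine the columns, weighted by v's coordinates

print(by_matmul)                    # [ 8. -3.]
print(np.allclose(by_matmul, by_columns))                 # True
print(np.allclose(A @ np.array([1.0, 0.0]), T_e1))        # True: A e1 is column 1
```

Changing `T_e1` or `T_e2` changes the whole transformation, which is the sense in which two columns pin down what happens to every point of the plane.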
Picture dragging those two columns and watching the whole grid deform with them. If you make the two columns point along the same line, the grid collapses from a 2-D sheet onto a 1-D line, and a whole dimension of space is crushed to nothing. Hold that picture; it is exactly what rank deficiency means, and we are about to name it.
Formal definitions
| Symbol | Meaning | Type | Shape | Role |
|---|---|---|---|---|
| $\operatorname{span}(v_1, \dots, v_k)$ | All linear combinations of the vectors $v_1, \dots, v_k$ | set | subspace | derived |
| $A$ | A matrix / linear transformation | matrix | m×n | variable |
| $C(A)$ | Column space (span of columns; reachable outputs) | set | ⊆ ℝ^m | derived |
| $N(A)$ | Null space (inputs mapped to 0) | set | ⊆ ℝ^n | derived |
| $\operatorname{rank}(A)$ | Dimension of the column space | integer | 1 | derived |
| $\dim V$ | Dimension (size of any basis of V) | integer | 1 | derived |
These quantities are tied together by one clean accounting identity, the rank–nullity theorem, for an $m \times n$ matrix $A$:

$$\operatorname{rank}(A) + \dim N(A) = n$$

Read it as conservation of dimensions. You start with $n$ input dimensions. The transformation routes some of them into genuinely distinct outputs (that count is the rank) and collapses the rest to zero (that count is the nullity, the dimension of the null space $N(A)$). Nothing appears or disappears; the input dimensions are split between "kept" and "crushed."
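The accounting can be checked numerically. The sketch below uses an illustrative 3×4 matrix (the entries are assumptions, picked so the third row is the sum of the first two) and reads a null-space basis off the SVD, which is one common way to do it rather than the only one.

```python
import numpy as np

# Illustrative 3x4 matrix (values assumed): row 3 = row 1 + row 2, so the rank is 2.
A = np.array([
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 2.0],
    [1.0, 3.0, 1.0, 3.0],
])
m, n = A.shape

rank = int(np.linalg.matrix_rank(A))

# Null-space basis from the SVD: the rows of Vt beyond the first `rank`
# pair with (numerically) zero singular values.
_, _, Vt = np.linalg.svd(A)    # full_matrices=True by default, so Vt is n x n
null_basis = Vt[rank:].T       # shape (n, n - rank)
nullity = null_basis.shape[1]

print(rank, nullity, n)                     # 2 2 4
print(rank + nullity == n)                  # True: dimensions are conserved
print(np.allclose(A @ null_basis, 0.0))     # True: each basis vector is sent to 0
```

Four input dimensions go in; two survive as distinct outputs and two are crushed.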
Numerical example: are these vectors independent, and what do they span?
Take and . Ask the definition's question: is there a nontrivial with ? Notice , so with — nontrivial. The pair is linearly dependent. Their span is not the plane but a single line, the direction : every combination is just a rescaling of one direction.
Now nudge the second vector: . Suppose . The two component equations are and . From the first, ; substitute into the second: , hence too. Only the trivial solution — the vectors are independent, they span all of , and together they form a basis of the plane. Stacked as columns, the first matrix has rank 1 and the second has rank 2. Independence and rank are the same question asked two ways.
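The same two-step reasoning can be replayed in NumPy. The vectors below are hypothetical, chosen only for illustration: one pair where the second vector is an exact multiple of the first, and one where it has been nudged off that line. The helper `n_independent` is likewise a name introduced here.

```python
import numpy as np

# Hypothetical vectors, chosen only for illustration.
v1 = np.array([1.0, 2.0])
v2_dependent = np.array([2.0, 4.0])   # exactly 2 * v1
v2_nudged = np.array([2.0, 5.0])      # no longer a multiple of v1

def n_independent(*vectors):
    """Stack the vectors as columns and return the rank of that matrix."""
    return int(np.linalg.matrix_rank(np.column_stack(vectors)))

# Dependent pair: 2*v1 - 1*v2 = 0 is a nontrivial combination giving zero.
print(np.allclose(2 * v1 - v2_dependent, 0.0))   # True
print(n_independent(v1, v2_dependent))           # 1: the span is a line
print(n_independent(v1, v2_nudged))              # 2: the span is the whole plane
```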
Why the columns are the images of the basis vectors
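A short derivation of the claim in the heading, for a $2 \times 2$ matrix. The entry names $a_{ij}$ and the column names $a_1, a_2$ are notation introduced here for illustration.

```latex
A e_1
  = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
    \begin{bmatrix} 1 \\ 0 \end{bmatrix}
  = \begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix}
  = a_1,
\qquad
A e_2
  = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
    \begin{bmatrix} 0 \\ 1 \end{bmatrix}
  = \begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix}
  = a_2

A v = A (x e_1 + y e_2) = x\, A e_1 + y\, A e_2 = x\, a_1 + y\, a_2
```

Multiplying by $e_i$ picks out the $i$-th column, and linearity does the rest: every output $Av$ is a combination of the columns, so it lies in their span.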
This also explains the collapse you saw earlier. If two columns are dependent (one is a multiple of the other), they span only a line, so $\operatorname{rank}(A) = 1$: no matter what input you feed, the output is stuck on that line. The other input direction, the combination that the dependent columns cancel, is sent to $0$ and lives in the null space. Rank-deficient means some direction of input never comes out the far side.
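A small sketch of that collapse, using an illustrative matrix whose second column is twice the first (the entries are assumptions, not values from the lesson):

```python
import numpy as np

# Illustrative matrix with dependent columns: column 2 = 2 * column 1.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

rng = np.random.default_rng(0)
inputs = rng.normal(size=(2, 5))   # five random input vectors, stored as columns
outputs = A @ inputs

# Every output is a multiple of the column (1, 2): second component = 2 * first.
print(np.allclose(outputs[1], 2 * outputs[0]))   # True

# The direction the dependent columns cancel: 2*col1 - 1*col2 = 0.
null_vec = np.array([2.0, -1.0])
print(A @ null_vec)                              # [0. 0.]
```

However the inputs vary, the outputs never leave one line, and the input direction `(2, -1)` is erased entirely.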
ML use case: rank bounds capacity, and low-rank is a feature
A dense layer computes $y = Wx$ (plus a bias and a nonlinearity). Everything the linear part can express is confined to the column space $C(W)$, whose dimension is $\operatorname{rank}(W)$. So the rank is a hard ceiling on representational capacity: a layer taking inputs in $\mathbb{R}^{1000}$ but with $\operatorname{rank}(W) = 10$ can only ever produce outputs in a 10-dimensional sheet, no matter how the inputs vary. It has 990 directions of null space, which is 990 input combinations it silently erases. Low rank means lost information.
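The numbers in that paragraph can be reproduced with a random matrix. This sketch assumes a square 1000×1000 layer (the output width is an assumption made here) and builds the rank-10 weight as a product of two random factors, which has rank 10 with probability one.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 1000, 10   # 1000 input dimensions, rank 10 (square layer assumed)

# Build a rank-10 weight matrix as the product of a tall and a wide factor.
W = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))

rank = int(np.linalg.matrix_rank(W))
nullity = W.shape[1] - rank   # rank-nullity theorem

print(rank, nullity)          # 10 990

# Push 50 random inputs through the layer: the outputs span at most `rank` directions.
X = rng.normal(size=(n, 50))
out_rank = int(np.linalg.matrix_rank(W @ X))
print(out_rank)               # 10
```

Fifty varied inputs go in, but the outputs only ever fill ten directions.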
That loss is sometimes exactly what you want. Two ideas across modern ML lean on it deliberately:
- Embeddings as a low-dimensional basis. A vocabulary of 50,000 tokens does not need 50,000 independent directions; meaning lives on a much lower-dimensional manifold. An embedding table maps each token to, say, a 256-dimensional vector — choosing a small basis in which "similar" tokens sit close together. The whole premise is that the useful information is low-rank.
- Low-rank factorization (the LoRA intuition). A full weight update $\Delta W$ of shape $d \times d$ has $d^2$ parameters. But if the useful update is low-rank, we can write $\Delta W = BA$ where $B$ is $d \times r$ and $A$ is $r \times d$ with $r \ll d$. Since $\operatorname{rank}(BA) \le \min(\operatorname{rank}(B), \operatorname{rank}(A)) \le r$, this factorization cannot exceed rank $r$, and that is the point: it captures the low-rank part of the adaptation with $2dr$ parameters instead of $d^2$. When $d = 4096$ and $r = 8$, that is a 256× reduction. The bet, borne out in practice, is that fine-tuning lives in a low-rank subspace, so the discarded directions were not carrying much.
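The parameter arithmetic and the rank ceiling for the factorization idea can both be checked in a few lines. The sizes `d = 4096` and `r = 8` are assumed here as one pair consistent with a 256× reduction, and the rank check runs on a smaller illustrative instance so it finishes quickly.

```python
import numpy as np

d, r = 4096, 8   # assumed sizes; d / (2 r) = 256

full_params = d * d             # dense d x d update
lora_params = d * r + r * d     # B is d x r, A is r x d
print(full_params, lora_params, full_params // lora_params)   # 16777216 65536 256

# Rank check on a smaller instance (sizes are illustrative).
rng = np.random.default_rng(0)
d_small, r_small = 256, 4
B = rng.normal(size=(d_small, r_small))
A = rng.normal(size=(r_small, d_small))
delta_W = B @ A                 # a d_small x d_small update built from skinny factors

update_rank = int(np.linalg.matrix_rank(delta_W))
print(update_rank <= r_small)   # True: the product cannot exceed rank r
```

The product `B @ A` has the full `d x d` shape, yet it can never reach past rank `r`.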
The through-line: rank is the currency of information a linear map moves. Bound it low to save memory and it costs you expressiveness; that trade is favorable exactly when the signal was low-rank to begin with.
NumPy: measuring rank and testing independence
NumPy computes rank directly with np.linalg.matrix_rank, which counts singular
values above a numerical tolerance — the robust way to ask "how many independent
directions?" without hand-solving systems. Below we build a rank-deficient matrix
on purpose (a third column that is a combination of the first two), confirm its
rank, and use rank as an independence test. Run it:
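A minimal sketch of that experiment. The specific column values are assumptions chosen for illustration; what matters is that the third column is built from the first two.

```python
import numpy as np

# Two independent columns (illustrative values) ...
c1 = np.array([1.0, 0.0, 2.0])
c2 = np.array([0.0, 1.0, 1.0])
# ... and a third that is a combination of them, so it adds no new direction.
c3 = 2 * c1 + 3 * c2

M = np.column_stack([c1, c2, c3])
rank = int(np.linalg.matrix_rank(M))
print("rank:", rank)                                  # rank: 2
print("columns independent?", rank == M.shape[1])     # columns independent? False

# Dropping the redundant column restores full column rank.
M2 = np.column_stack([c1, c2])
rank2 = int(np.linalg.matrix_rank(M2))
print("first two independent?", rank2 == M2.shape[1]) # first two independent? True
```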
The pattern matrix_rank(M) == M.shape[1] is the practical linear-independence
test for a set of column vectors: full column rank means no column is redundant.
Prefer matrix_rank over checking the determinant — the determinant only works
for square matrices and is numerically fragile, whereas rank is defined for any
shape and thresholds singular values sensibly.
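One way to see the determinant's fragility is a matrix that is perfectly healthy yet has a vanishingly small determinant. The example below is an illustrative sketch with assumed values, not a benchmark.

```python
import numpy as np

# 0.1 * I is well-conditioned and has full rank, yet its determinant is
# 0.1**50, which looks like "zero" to any naive threshold.
M = 0.1 * np.eye(50)
det_M = np.linalg.det(M)
rank_M = int(np.linalg.matrix_rank(M))
print(det_M)     # about 1e-50
print(rank_M)    # 50

# A non-square matrix has no determinant at all, but rank is still defined.
T = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
rank_T = int(np.linalg.matrix_rank(T))
print(rank_T == T.shape[1])   # True: full column rank
```

A determinant test would call the first matrix singular; the rank test correctly reports all 50 directions.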
Summary
- A matrix is a linear transformation: it moves all of space, and it is fully determined by where the basis vectors land. Those landing spots are its columns: $A e_i$ is the $i$-th column.
- The span of a set is all its linear combinations. A set is linearly independent when no vector is redundant (the only combination giving $0$ is trivial). A basis is an independent spanning set; its size is the dimension.
- The column space is the span of the columns, the reachable outputs. The null space is the inputs crushed to $0$. Rank is the dimension of the column space = number of independent columns.
- Rank–nullity: $\operatorname{rank}(A) + \dim N(A) = n$. Input dimensions are conserved, split between "kept" and "crushed."
- In ML, rank bounds capacity: a low-rank layer collapses dimensions and loses information. Low-rank factorization (LoRA, embeddings) turns that collapse into a deliberate compression when the signal is genuinely low-rank.
- In NumPy, `np.linalg.matrix_rank(A)` measures rank; `rank == n_cols` tests independence. Independence ≠ orthogonality, and least squares needs full column rank for a unique fit.
Active recall
Answer from memory before checking the lesson:
- Where do the standard basis vectors "land" under the matrix $A$, and how does that relate to the columns of $A$?
- State the difference between the column space and the null space of $A$, including which ambient space each lives in.
- A matrix has rank 3. What is the dimension of its null space, and why? Which theorem did you use?
- Give two vectors that are linearly independent but not orthogonal.
- In one sentence, why does replacing a weight matrix with a rank-$r$ factorization ($r \ll d$) save parameters, and what is the risk?
Exercises
Level A: Recall & basic calculation
Are these two vectors independent?
Are and linearly independent? Enter 1 for independent or 0 for dependent.
Read the rank off the columns
The matrix has how many linearly independent columns, i.e. what is ?
Where does e_2 land?
For , compute where . Enter as x, y.
Rank–nullity arithmetic
A matrix has 5 columns and . What is ?
Dimension of a span
What is the dimension of in ?
Definition of the null space
The null space of $A$ is the set of vectors $x$ satisfying which equation?
Level B: Conceptual understanding
Column space vs. null space — which space?
Let be a matrix. The column space is a subspace of which space, and the null space is a subspace of which space?
Maximum possible rank
What is the largest value can take for a matrix ?
Independent but not orthogonal
True or false: 'If two vectors are linearly independent, they must be orthogonal.' Enter 1 for true or 0 for false, and be ready to justify.
Why low rank loses information
A linear layer has but . In one or two sentences, explain what this implies about the outputs the layer can produce and about information lost from the input.
When does Ax = b have a solution?
For a fixed matrix $A$, the system $Ax = b$ has at least one solution exactly when:
Level C: Derivation & implementation
Compute rank and test independence in NumPy
Build the matrix with columns , , and using np.column_stack. Print its rank with np.linalg.matrix_rank, decide whether the three columns are independent (via rank == n_cols), and print ok.
Find a basis for a column space
The matrix has columns , , . Identify a basis for and state .
Find a null-space vector by reasoning
For , find a nonzero vector with , and explain why the null space must be nonzero here.
Level D: Research-thinking challenge
Why low-rank adapters work (LoRA)
A LoRA adapter replaces a weight update with a product where , , and . (a) Prove . (b) Count the parameters saved for . (c) State the empirical hypothesis that makes this a good trade, and one situation where it would fail.
Rank collapse through stacked layers
Consider a purely linear network with each . (a) If , what can you say about ? (b) What does this imply about information flowing through the network, and (c) why do real networks insert nonlinearities between layers?