Systems of Linear Equations
Ax = b, Gaussian elimination, and geometric meaning
Prerequisites
Learning objectives
- Write a linear system in matrix form Ax = b
- Solve by Gaussian elimination and back-substitution
- Classify a system as unique, infinite, or no solution
- Connect solvability to the normal equations of regression
Why solving linear systems is a core ML primitive
The moment you fit a model, you are almost always solving a system of linear equations under the hood. Ordinary least-squares regression? Its optimum is the solution of a linear system. The closed-form step of many classical methods — ridge regression, Gaussian processes, the Newton step of an optimizer — reduces to "solve ." Even when a modern network is trained by gradient descent rather than a direct solve, the local pictures it moves through are linear systems.
So the object of this chapter is the humble system of linear equations, and the one algorithm that solves it in full generality: Gaussian elimination. By the end you should be able to write a system in matrix form, run elimination by hand, read off which of the three possible outcomes you are in, and connect all of it to the normal equations that define linear regression.
Intuition: each equation is a constraint, the solution is where they meet
A single linear equation in two unknowns, say , is not one point — it is a whole line of pairs that satisfy it. A second equation, , is another line. Solving the system
means finding the points that satisfy both at once — the intersection of the two lines. Two lines in a plane usually cross at exactly one point, and indeed here is the unique meeting point.
That geometric picture is the whole story, and it already predicts the three things that can happen. Two lines can cross once (one solution), be the same line (infinitely many solutions), or be parallel but distinct (no solution). In three unknowns each equation is a plane, and we intersect planes instead of lines — but the trichotomy is identical.
Drag the equations above and watch the intersection move. Try to make the two lines parallel: the solution point shoots off to infinity and then vanishes — that is the "no solution" case appearing geometrically.
Formal definitions
| Symbol | Meaning | Type | Shape | Role |
|---|---|---|---|---|
| Coefficient matrix | matrix | m×n | fixed | |
| Vector of unknowns (what we solve for) | vector | n×1 | variable | |
| Right-hand side (the constants) | vector | m×1 | fixed | |
| Augmented matrix (A with b appended) | matrix | m×(n+1) | derived | |
| Number of independent rows/pivots | integer | 1 | derived |
The three row operations are the only moves allowed, and each one produces an equivalent system — same solution set as before:
The goal of Gaussian elimination is to use operation 3 (with occasional swaps) to drive entries below each pivot to zero — forward elimination — until the matrix is upper-triangular (row echelon form), then solve from the bottom row upward by back-substitution. If you keep going and also clear entries above each pivot and scale every pivot to , you reach the unique reduced row echelon form (RREF), which reads off the solution directly.
Worked example: solve a 2×2 system by elimination
Continuing to full RREF would scale both pivots to and clear the entry above the second pivot, ending at whose right column is the solution .
The three outcomes, and how rank decides
Every linear system lands in exactly one of three cases. Which one is governed by rank — the number of pivots (independent rows) — of compared with the rank of the augmented matrix and the number of unknowns .
Geometrically, in two unknowns:
| Outcome | Rank picture | Two lines look like | | --- | --- | --- | | Unique | | crossing at one point | | Infinite | , consistent | the same line (coincident) | | None | inconsistent | parallel but distinct |
A square matrix () gives a unique solution for every exactly when — equivalently when is invertible, equivalently when . When the matrix is singular: depending on you get either no solution or infinitely many, never a unique one.
ML use case: the normal equations of least squares
Here is where this pays off. In linear regression we have data ( examples, features) and targets , and we want weights minimizing the squared error . When this system is overdetermined — more equations than unknowns, typically no exact solution. Instead of an exact hit we ask for the best fit, and the minimizer satisfies the normal equations:
This is once again a square linear system with (an matrix) and — and it is solved by exactly the Gaussian elimination of this chapter. The rank story carries over too: is invertible precisely when has full column rank (independent features). If two features are perfectly collinear, is singular, elimination stalls, and the weights are not uniquely determined — the linear-algebra fingerprint of multicollinearity. We derive equation 9.1 properly in the regression chapter; for now, notice that "fit a line to data" is "solve a linear system."
NumPy: solving systems, and what goes wrong
The workhorse is np.linalg.solve(A, b), which runs a pivoted elimination (LU
factorization) internally — never form the inverse by hand. It solves square,
non-singular systems and raises LinAlgError on a singular matrix. Run this:
The first block confirms the promised trichotomy from code: matrix_rank(A)
equals the number of unknowns, so the solution is unique, and A @ x reproduces
b to floating-point tolerance. The second block shows the failure mode you must
handle in practice — a singular makes solve raise, because there is no unique
answer to return.
Summary
- A system of linear equations is written ; each row is a constraint, and a solution is a point satisfying all of them at once.
- Gaussian elimination uses three row operations on — swap, scale, add-a-multiple — to reach row echelon form (forward elimination), then solves bottom-up (back-substitution). Pushing to RREF reads the solution off directly.
- Exactly three outcomes are possible — unique, infinitely many, or no solution — corresponding geometrically to lines/planes that cross once, coincide, or are parallel.
- Rank decides it: unique iff ; inconsistent iff the two ranks differ; infinite iff they agree but are (giving free variables).
- Least-squares regression solves the normal equations — a linear system — so "fitting a model" is "solving a system."
- In NumPy use
np.linalg.solve(A, b); it raisesLinAlgErroron singular , and watchnp.linalg.cond(A)for the near-singular / ill-conditioned trap.
Active recall
Answer from memory before checking the lesson:
- Write the system in matrix form . What are and ?
- During elimination a row becomes . What does this tell you about the number of solutions, and why?
- A system in unknowns has and is consistent. How many free variables are there, and how many solutions?
- Which linear system does linear regression solve, and when is its coefficient matrix singular?
Exercises
Level ARecall & basic calculation
Solve a 2×2 system by elimination
Solve the system
Enter the solution as x, y.
Another 2×2 solve
Solve
Enter as x, y.
Write a system in matrix form
The system is written as . What is the right-hand side vector ? Enter as b1, b2.
Count the free variables
A consistent system in unknowns has . How many free variables does its solution set have?
Reading an inconsistent row
After elimination, a row of the augmented matrix becomes . How many solutions does the system have?
Determinant and uniqueness
For a square system , a unique solution for every is guaranteed exactly when is which value?
Level BConceptual understanding
Classify: parallel lines
The two equations of a 2×2 system plot as parallel but distinct lines. How many solutions does the system have?
Rank decides the outcome
A system has unknowns with . Which outcome holds?
Shapes in the normal equations
In least squares the design matrix is with examples and features. In the normal equations , what is the shape of the coefficient matrix ?
When is XᵀX singular?
Explain, in one or two sentences, why becomes singular when two feature columns of are exactly proportional (perfectly collinear), and what this means for the regression weights.
Level CDerivation & implementation
Solve a 3×3 system in NumPy
Use np.linalg.solve to solve
Verify the solution with np.allclose(A @ x, b), then print ok.
Elimination to RREF by hand
Solve
by Gaussian elimination on the augmented matrix, showing the forward elimination step and back-substitution. Give the final solution.
Detect a singular system in NumPy
Build the singular system . Attempt np.linalg.solve inside a try/except np.linalg.LinAlgError, report the rank of versus the number of unknowns, and print ok.
Level DResearch-thinking challenge
Why not just invert XᵀX?
Textbooks write the least-squares weights as , yet mature libraries never form that inverse — they call a solver like np.linalg.lstsq. Explain (a) why explicitly inverting is numerically risky, referencing the condition number, and (b) what regularization (ridge, ) does to solvability and conditioning.