Part 3 · Core Linear AlgebraChapter 975 min

Systems of Linear Equations

Ax = b, Gaussian elimination, and geometric meaning

Prerequisites

Learning objectives

  • Write a linear system in matrix form Ax = b
  • Solve by Gaussian elimination and back-substitution
  • Classify a system as unique, infinite, or no solution
  • Connect solvability to the normal equations of regression

Why solving linear systems is a core ML primitive

The moment you fit a model, you are almost always solving a system of linear equations under the hood. Ordinary least-squares regression? Its optimum is the solution of a linear system. The closed-form step of many classical methods — ridge regression, Gaussian processes, the Newton step of an optimizer — reduces to "solve Ax=bA\mathbf{x} = \mathbf{b}." Even when a modern network is trained by gradient descent rather than a direct solve, the local pictures it moves through are linear systems.

So the object of this chapter is the humble system of linear equations, and the one algorithm that solves it in full generality: Gaussian elimination. By the end you should be able to write a system in matrix form, run elimination by hand, read off which of the three possible outcomes you are in, and connect all of it to the normal equations that define linear regression.

Intuition: each equation is a constraint, the solution is where they meet

A single linear equation in two unknowns, say 2x+y=52x + y = 5, is not one point — it is a whole line of (x,y)(x, y) pairs that satisfy it. A second equation, xy=1x - y = 1, is another line. Solving the system

2x+y=5xy=1\begin{aligned} 2x + y &= 5 \\ x - y &= 1 \end{aligned}

means finding the points that satisfy both at once — the intersection of the two lines. Two lines in a plane usually cross at exactly one point, and indeed here x=2, y=1x = 2,\ y = 1 is the unique meeting point.

That geometric picture is the whole story, and it already predicts the three things that can happen. Two lines can cross once (one solution), be the same line (infinitely many solutions), or be parallel but distinct (no solution). In three unknowns each equation is a plane, and we intersect planes instead of lines — but the trichotomy is identical.

Interactive LabLinear-System Solver
Loading interactive lab…

Drag the equations above and watch the intersection move. Try to make the two lines parallel: the solution point shoots off to infinity and then vanishes — that is the "no solution" case appearing geometrically.

Formal definitions

The three row operations are the only moves allowed, and each one produces an equivalent system — same solution set as before:

The goal of Gaussian elimination is to use operation 3 (with occasional swaps) to drive entries below each pivot to zero — forward elimination — until the matrix is upper-triangular (row echelon form), then solve from the bottom row upward by back-substitution. If you keep going and also clear entries above each pivot and scale every pivot to 11, you reach the unique reduced row echelon form (RREF), which reads off the solution directly.

Worked example: solve a 2×2 system by elimination

Continuing to full RREF would scale both pivots to 11 and clear the entry above the second pivot, ending at [102011],\left[\begin{array}{cc|c} 1 & 0 & 2 \\ 0 & 1 & 1 \end{array}\right], whose right column is the solution (2,1)(2, 1).

The three outcomes, and how rank decides

Every linear system lands in exactly one of three cases. Which one is governed by rank — the number of pivots (independent rows) — of AA compared with the rank of the augmented matrix [Ab][A \mid \mathbf{b}] and the number of unknowns nn.

Geometrically, in two unknowns:

| Outcome | Rank picture | Two lines look like | | --- | --- | --- | | Unique | rank(A)=n=2\operatorname{rank}(A) = n = 2 | crossing at one point | | Infinite | rank(A)=1<2\operatorname{rank}(A) = 1 < 2, consistent | the same line (coincident) | | None | inconsistent | parallel but distinct |

A square matrix AA (m=nm = n) gives a unique solution for every b\mathbf{b} exactly when rank(A)=n\operatorname{rank}(A) = n — equivalently when AA is invertible, equivalently when detA0\det A \neq 0. When detA=0\det A = 0 the matrix is singular: depending on b\mathbf{b} you get either no solution or infinitely many, never a unique one.

ML use case: the normal equations of least squares

Here is where this pays off. In linear regression we have data XRm×nX \in \mathbb{R}^{m \times n} (mm examples, nn features) and targets yRm\mathbf{y} \in \mathbb{R}^m, and we want weights θ\boldsymbol{\theta} minimizing the squared error Xθy2\lVert X\boldsymbol{\theta} - \mathbf{y}\rVert^2. When m>nm > n this system Xθ=yX\boldsymbol{\theta} = \mathbf{y} is overdetermined — more equations than unknowns, typically no exact solution. Instead of an exact hit we ask for the best fit, and the minimizer satisfies the normal equations:

This is once again a square linear system Aθ=bA\boldsymbol{\theta} = \mathbf{b} with A=XXA = X^\top X (an n×nn \times n matrix) and b=Xy\mathbf{b} = X^\top \mathbf{y} — and it is solved by exactly the Gaussian elimination of this chapter. The rank story carries over too: XXX^\top X is invertible precisely when XX has full column rank (independent features). If two features are perfectly collinear, XXX^\top X is singular, elimination stalls, and the weights are not uniquely determined — the linear-algebra fingerprint of multicollinearity. We derive equation 9.1 properly in the regression chapter; for now, notice that "fit a line to data" is "solve a linear system."

NumPy: solving systems, and what goes wrong

The workhorse is np.linalg.solve(A, b), which runs a pivoted elimination (LU factorization) internally — never form the inverse by hand. It solves square, non-singular systems and raises LinAlgError on a singular matrix. Run this:

solve_systems.py

The first block confirms the promised trichotomy from code: matrix_rank(A) equals the number of unknowns, so the solution is unique, and A @ x reproduces b to floating-point tolerance. The second block shows the failure mode you must handle in practice — a singular AA makes solve raise, because there is no unique answer to return.

Summary

  • A system of linear equations is written Ax=bA\mathbf{x} = \mathbf{b}; each row is a constraint, and a solution is a point satisfying all of them at once.
  • Gaussian elimination uses three row operations on [Ab][A \mid \mathbf{b}] — swap, scale, add-a-multiple — to reach row echelon form (forward elimination), then solves bottom-up (back-substitution). Pushing to RREF reads the solution off directly.
  • Exactly three outcomes are possible — unique, infinitely many, or no solution — corresponding geometrically to lines/planes that cross once, coincide, or are parallel.
  • Rank decides it: unique iff rank(A)=rank([Ab])=n\operatorname{rank}(A) = \operatorname{rank}([A \mid \mathbf{b}]) = n; inconsistent iff the two ranks differ; infinite iff they agree but are <n< n (giving free variables).
  • Least-squares regression solves the normal equations XXθ=XyX^\top X \boldsymbol{\theta} = X^\top \mathbf{y} — a linear system — so "fitting a model" is "solving a system."
  • In NumPy use np.linalg.solve(A, b); it raises LinAlgError on singular AA, and watch np.linalg.cond(A) for the near-singular / ill-conditioned trap.

Active recall

Answer from memory before checking the lesson:

  1. Write the system 3x+2y=12, xy=13x + 2y = 12,\ x - y = 1 in matrix form Ax=bA\mathbf{x} = \mathbf{b}. What are AA and b\mathbf{b}?
  2. During elimination a row becomes [0 0 04][\,0\ 0\ 0 \mid 4\,]. What does this tell you about the number of solutions, and why?
  3. A system in n=3n = 3 unknowns has rank(A)=2\operatorname{rank}(A) = 2 and is consistent. How many free variables are there, and how many solutions?
  4. Which linear system does linear regression solve, and when is its coefficient matrix XXX^\top X singular?

Exercises

Level ARecall & basic calculation

Level BConceptual understanding

Level CDerivation & implementation

Level DResearch-thinking challenge