Part 1 · Algebra and Mathematical Notation · Chapter 2 · 55 min

Equations and Inequalities

Solving, rearranging, and reasoning about relationships

Learning objectives

  • Solve linear equations and rearrange formulas for any variable
  • Reason about inequalities and the direction-flip rule
  • Interpret constraints of the form the ML literature uses
  • Translate a word problem into an equation and back

Why solving for a variable is the whole game

Almost every quantity you will chase in machine learning is defined implicitly. A loss function tells you how wrong a model is; training asks for the weight that makes the loss as small as possible. A convergence test says "stop when the update is tiny"; you have to turn that sentence into a number you can compare against. A probability must lie between $0$ and $1$; a learning rate must be strictly positive. None of these are handed to you as "$x = \text{answer}$." They arrive as an equation or an inequality, and your job is to isolate the thing you care about.

That single skill — rearranging a relationship until the unknown stands alone — is what this chapter drills. It is the algebraic reflex underneath the normal equations of linear regression, underneath every "solve for the parameter" step, and underneath the boolean masks you will write in NumPy to enforce a constraint. We will treat three closely related objects:

  • Linear equations, which pin a variable to one exact value.
  • Rearranged formulas, where the same relationship is re-solved for a different variable.
  • Inequalities and systems, which describe regions of allowed values — constraints — rather than single points.

Intuition: an equation is a balanced scale

Think of = as a balance beam. The left side and the right side weigh exactly the same. You are allowed to do anything to the equation as long as you do it to both sides — add the same weight to each pan, or halve the contents of both pans — and the beam stays level. Solving is nothing more than a sequence of these balanced moves, chosen to strip everything away from the variable until it sits alone.

An inequality is the same scale, but now one side is genuinely heavier. Written $a < b$, it says the left pan is lighter. Most balanced moves preserve the tilt: add the same weight to both pans and the heavier side stays heavier. There is exactly one move that reverses the tilt, multiplying or dividing both sides by a negative number, and forgetting that reversal is the single most common algebra bug in ML code (it silently flips a constraint). We will make it precise below.

Formal definitions

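The two operations we lean on are the balanced moves, and they are the entire toolkit. For real numbers $a$, $b$, and $c$:

  • Additive move: if $a = b$, then $a + c = b + c$.
  • Multiplicative move: if $a = b$, then $ac = bc$. Dividing by $c$ is the same move with $1/c$, so it requires $c \neq 0$.

For inequalities the additive move is identical, but the multiplicative move carries a caveat encoded by the sign of $c$:

  • If $a < b$ and $c > 0$, then $ac < bc$ (the relation is preserved).
  • If $a < b$ and $c < 0$, then $ac > bc$ (the relation flips).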

A numerical example
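As a concrete instance of the balanced moves, take the equation $2x + 3 = 11$ (the numbers are chosen here purely for illustration). Each line applies one move to both sides:

```latex
\begin{align*}
2x + 3 &= 11 \\
2x &= 8 && \text{subtract } 3 \text{ from both sides} \\
x &= 4 && \text{divide both sides by } 2
\end{align*}
```

Substituting back gives $2 \cdot 4 + 3 = 11$, so the two sides agree and the solution checks out.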

Now the inequality version, to expose the sign-flip. Solve $-3x + 1 \ge 7$: subtract $1$ from both sides to get $-3x \ge 6$, then divide both sides by $-3$. Because we divided by a negative number, the relation flips from $\ge$ to $\le$, giving $x \le -2$, the interval $(-\infty, -2]$. Had we kept $\ge$ we would have reported exactly the wrong half of the line.
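The sign-flip can be checked numerically. The sketch below evaluates the inequality $-3x + 1 \ge 7$ on a handful of sample points (chosen for illustration, straddling the boundary $x = -2$) and compares it with the flipped and unflipped candidate solutions:

```python
import numpy as np

# Sample points on both sides of the boundary x = -2 (illustrative values).
x = np.array([-4.0, -3.0, -2.0, -1.0, 0.0])

original = -3 * x + 1 >= 7   # the inequality exactly as given
flipped = x <= -2            # after dividing by -3 and flipping the relation
unflipped = x >= -2          # the buggy version that forgets to flip

# The flipped form agrees with the original at every sample point.
assert np.array_equal(original, flipped)
# The unflipped form does not.
assert not np.array_equal(original, unflipped)

print(original)
```

A check on five points is evidence rather than proof, but it catches a forgotten flip immediately.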

Rearranging a formula: isolate any target

The deeper skill is re-solving a known formula for a different variable. The relationship does not change; which symbol stands alone does. The recipe is always the same: peel operations off the target in reverse order, undoing each with its inverse (add/subtract, multiply/divide) applied to both sides.
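As an illustration of the recipe, take the line $y = mx + b$ and re-solve it for $x$: undo the addition of $b$ first, then the multiplication by $m$, which gives $x = (y - b)/m$ and excludes $m = 0$. The function name `solve_for_x` and the numbers below are hypothetical, chosen only for this sketch:

```python
import numpy as np

def solve_for_x(y, m, b):
    """Rearranged form of y = m*x + b, i.e. x = (y - b) / m (requires m != 0)."""
    if m == 0:
        raise ValueError("m = 0 is excluded: the rearrangement divides by m")
    return (y - b) / m

# Round trip: compute y from a known x, then recover x from the rearranged formula.
m, b = 2.5, -1.0
x_true = 4.0
y = m * x_true + b
x_back = solve_for_x(y, m, b)

assert np.isclose(x_back, x_true)
print(x_back)
```

The round trip is the useful habit here: the relationship has not changed, so going forward and then back must return the starting value.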

ML use case: solving for a parameter

This rearranging reflex is exactly what turns a learning objective into a formula you can compute. Take the simplest case: fit a single number $\mu$ (a mean) to data $x_1, \ldots, x_n$ by minimizing the squared-error loss $L(\mu) = \sum_{i=1}^{n} (x_i - \mu)^2$. Calculus (next part) hands us the condition for the minimum: the derivative is zero. That condition is itself an equation to be solved for the parameter:
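One way to write out that solve, assuming the standard derivative of a sum of squares (the calculus itself is deferred to the next part):

```latex
\begin{align*}
\frac{dL}{d\mu} = -2 \sum_{i=1}^{n} (x_i - \mu) &= 0 \\
\sum_{i=1}^{n} (x_i - \mu) &= 0 && \text{divide both sides by } -2 \\
\sum_{i=1}^{n} x_i - n\mu &= 0 && \text{split the sum} \\
n\mu &= \sum_{i=1}^{n} x_i && \text{add } n\mu \text{ to both sides} \\
\mu &= \frac{1}{n} \sum_{i=1}^{n} x_i && \text{divide both sides by } n
\end{align*}
```

Every step after the first line is an additive or multiplicative move applied to both sides; dividing by $n$ is safe because $n \ge 1$.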

The optimal $\mu$ is just the sample mean, but notice how we got there: we set an expression to zero and isolated $\mu$ with the same balanced moves as the toy example. The full linear-regression normal equations are this exact move done with matrices, and constraints join the story as inequalities: a learning rate is required to satisfy $\eta > 0$, and a probability output must satisfy $0 \le p \le 1$. A convergence test is an inequality too: "stop when the change is below tolerance," or $\lvert \Delta \rvert < \varepsilon$, which we make concrete next.
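The claim that the sample mean solves the zero-derivative equation can be sanity-checked numerically. This is a minimal sketch on a small made-up data set; the array `x`, the helper `loss`, and the grid of nearby candidates are assumptions for illustration only:

```python
import numpy as np

x = np.array([2.0, 3.0, 7.0, 8.0])  # illustrative data

def loss(mu):
    """Squared-error loss L(mu) = sum_i (x_i - mu)^2."""
    return np.sum((x - mu) ** 2)

mu_star = x.mean()  # the closed-form solution: the sample mean

# The derivative -2 * sum(x_i - mu) should vanish at the solution.
grad_at_star = -2.0 * np.sum(x - mu_star)
assert np.isclose(grad_at_star, 0.0)

# No nearby candidate should achieve a lower loss (small slack for rounding).
candidates = mu_star + np.linspace(-1.0, 1.0, 21)
assert all(loss(mu_star) <= loss(c) + 1e-12 for c in candidates)

print(mu_star, loss(mu_star))
```

The first assertion checks the equation that was solved; the second checks that solving it actually landed on a minimum.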

NumPy: verify a solution, and turn an inequality into a mask

Two habits worth building now. First, never trust a hand-solved equation you have not checked numerically — substitute the answer back and assert the two sides agree. Second, an inequality in NumPy does not return a single truth value; applied to an array it returns a boolean mask, one True/False per element, which is how constraints get enforced in vectorized code. Run it:

solve_and_mask.py
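A minimal sketch of what such a script could contain, covering both habits. The equation, the tolerance `eps`, and the update vector `delta` are hypothetical values picked for illustration:

```python
import numpy as np

# Habit 1: substitute a hand-solved answer back into the equation.
# Illustrative equation: 4x - 10 = 0, solved by hand as x = -b/a = 2.5.
a, b = 4.0, -10.0
x = -b / a
assert np.isclose(a * x + b, 0.0), "both sides should agree"

# Habit 2: an inequality applied to an array gives one boolean per element.
eps = 1e-3                                     # hypothetical tolerance
delta = np.array([2e-4, -5e-4, 3e-2, -1e-5])   # hypothetical per-parameter updates
small = np.abs(delta) < eps                    # elementwise |delta| < eps

print(small)                                   # a boolean mask, not a single bool
print("updates below tolerance:", np.count_nonzero(small))
```

Using `np.isclose` rather than `==` matters because floating-point arithmetic rarely reproduces a hand-computed value bit for bit.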

The masking idea scales: arr[arr > 0] selects the positive entries, and np.clip(p, 0.0, 1.0) enforces the probability constraint by projecting stray values back into $[0, 1]$. You are writing inequalities as code.
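Both idioms can be tried on small arrays. The values below are made up for illustration; `p` stands in for hypothetical raw model outputs that stray outside the valid range:

```python
import numpy as np

arr = np.array([-1.5, 0.0, 2.0, -0.3, 4.1])
positive = arr[arr > 0]              # keep only entries where the mask is True

p = np.array([-0.2, 0.0, 0.4, 1.0, 1.3])   # hypothetical raw outputs
p_clipped = np.clip(p, 0.0, 1.0)           # values below 0 become 0, above 1 become 1

# After clipping, the constraint 0 <= p <= 1 holds for every element.
assert np.all((0.0 <= p_clipped) & (p_clipped <= 1.0))

print(positive)
print(p_clipped)
```

Note that the strict inequality `arr > 0` drops the entry equal to zero, while clipping leaves values already inside the interval untouched.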

Summary

  • Solving an equation is a chain of balanced moves: whatever you do to one side, do to the other. Isolate the unknown by undoing operations in reverse.
  • A linear equation $ax + b = 0$ ($a \neq 0$) has the single solution $x = -b/a$; always verify by substituting back.
  • Rearranging a formula for a different variable is the same peeling process aimed at a new target; note any excluded values (no dividing by zero).
  • Inequalities describe sets of values. All balanced moves preserve the relation except multiplying or dividing by a negative, which flips it.
  • Setting a loss's derivative to zero gives an equation you solve for the parameter; constraints like $\eta > 0$ and $0 \le p \le 1$, and convergence tests $\lvert\Delta\rvert < \varepsilon$, are inequalities.
  • In NumPy, verify equations with assert np.isclose(...); an inequality on an array yields a boolean mask used to select or clip values.

Active recall

Answer from memory before checking the lesson:

  1. State the two balanced moves you may apply to any equation, and say which one behaves differently for an inequality.
  2. Solve $-4x + 2 \le 10$ for $x$. Which direction does the relation end up, and why?
  3. Rearrange $y = mx + b$ to isolate $b$. What (if any) value must be excluded?
  4. A training loop stops when $\lvert\Delta\rvert < \varepsilon$. If delta is a NumPy array of per-parameter updates, what does np.abs(delta) < eps return, and how would you ask "have all parameters converged?"

Exercises

Level A · Recall & basic calculation

Level B · Conceptual understanding

Level C · Derivation & implementation

Level D · Research-thinking challenge