Equations and Inequalities
Solving, rearranging, and reasoning about relationships
Prerequisites
Learning objectives
- Solve linear equations and rearrange formulas for any variable
- Reason about inequalities and the direction-flip rule
- Interpret constraints of the form the ML literature uses
- Translate a word problem into an equation and back
Why solving for a variable is the whole game
Almost every quantity you will chase in machine learning is defined implicitly. A loss function tells you how wrong a model is; training asks for the weight that makes the loss as small as possible. A convergence test says "stop when the update is tiny"; you have to turn that sentence into a number you can compare against. A probability must lie between $0$ and $1$; a learning rate must be strictly positive. None of these are handed to you as "$x = \ldots$". They arrive as an equation or an inequality, and your job is to isolate the thing you care about.
That single skill — rearranging a relationship until the unknown stands alone — is what this chapter drills. It is the algebraic reflex underneath the normal equations of linear regression, underneath every "solve for the parameter" step, and underneath the boolean masks you will write in NumPy to enforce a constraint. We will treat three closely related objects:
- Linear equations, which pin a variable to one exact value.
- Rearranged formulas, where the same relationship is re-solved for a different variable.
- Inequalities and systems, which describe regions of allowed values — constraints — rather than single points.
Intuition: an equation is a balanced scale
Think of = as a balance beam. The left side and the right side weigh exactly the
same. You are allowed to do anything to the equation as long as you do it to both
sides — add the same weight to each pan, or halve the contents of both pans — and
the beam stays level. Solving is nothing more than a sequence of these balanced
moves, chosen to strip everything away from the variable until it sits alone.
An inequality is the same scale, but now one side is genuinely heavier. Written $a < b$, it says the left pan is lighter. Most balanced moves preserve the tilt: add the same weight to both pans and the heavier side stays heavier. Exactly one move reverses the tilt: multiplying or dividing both sides by a negative number. Forgetting that reversal is one of the most common algebra bugs in ML code, because it silently flips a constraint. We will make it precise below.
Formal definitions
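The two operations we lean on are the balanced moves, and they are the entire toolkit. For any equation $a = b$:
$$a = b \;\Longrightarrow\; a + c = b + c \ \text{ for any } c, \qquad a = b \;\Longrightarrow\; ac = bc \ \text{ and } \ \frac{a}{c} = \frac{b}{c} \ \text{ for any } c \neq 0.$$
For inequalities the additive move is identical, but the multiplicative move carries a caveat encoded by the sign of $c$:
$$a < b \;\Longrightarrow\; \begin{cases} ac < bc & \text{if } c > 0,\\ ac > bc & \text{if } c < 0. \end{cases}$$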
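To make the sign caveat concrete, here is a small sanity check in plain Python. The numbers $a = 2$, $b = 5$, $c = \pm 3$ are arbitrary choices for illustration:

```python
# Arbitrary illustrative numbers: a = 2, b = 5, so a < b holds.
a, b = 2, 5
assert a < b

c = 3
assert a + c < b + c      # adding the same amount to both sides keeps the relation
assert a * c < b * c      # multiplying by a positive c keeps it: 6 < 15

c = -3
assert not (a * c < b * c)  # -6 < -15 is false ...
assert a * c > b * c        # ... because multiplying by a negative flips it: -6 > -15
print("all balanced-move checks passed")
```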
| Symbol | Meaning | Type | Shape | Role |
|---|---|---|---|---|
| $x$ | The unknown we solve for | scalar | 1 | variable |
| $a, b, c$ | Known constants (coefficients) | scalar | 1 | fixed |
| $=$ | Equality: both sides identical | relation | — | fixed |
| $<, >, \le, \ge$ | Inequality relations (region of values) | relation | — | fixed |
| $\varepsilon$ | A small positive threshold (tolerance) | scalar | 1 | fixed |
| $(3, \infty)$ | Open interval: all x greater than 3 | set | — | variable |
A numerical example
Now an inequality, to expose the sign flip. Solve $7 - 2x < 1$: subtract $7$ from both sides to get $-2x < -6$, then divide both sides by $-2$. Because we divided by a negative number, the relation flips from $<$ to $>$, giving $x > 3$, the interval $(3, \infty)$. Had we kept $<$, we would have reported $x < 3$, exactly the wrong half of the line.
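A numerical check catches a forgotten flip immediately. This sketch uses the illustrative inequality $7 - 2x < 1$ (solution $x > 3$). It compares where the original inequality actually holds on a grid of points against both the flipped and the unflipped answers:

```python
import numpy as np

# Illustrative inequality chosen for this sketch: 7 - 2x < 1, solved as x > 3.
xs = np.linspace(-10, 10, 2001)        # a grid of candidate x values
holds = (7 - 2 * xs) < 1               # where the ORIGINAL inequality is true
claimed = xs > 3                       # the answer after flipping < to >
wrong = xs < 3                         # the answer if you forget to flip

assert np.array_equal(holds, claimed)  # the flipped answer matches at every grid point
assert not np.any(holds & wrong)       # the unflipped answer never agrees
print("x > 3 agrees with the original inequality on every grid point")
```

Testing on a grid is not a proof. Still, a disagreement anywhere is a sure sign the algebra went wrong.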
Rearranging a formula: isolate any target
The deeper skill is re-solving a known formula for a different variable. The relationship does not change; which symbol stands alone does. The recipe is always the same: peel operations off the target in reverse order, undoing each with its inverse (add/subtract, multiply/divide) applied to both sides.
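A quick way to trust a rearrangement is a round trip. Compute outputs with the original formula, feed them to the rearranged one, and check that you get the inputs back. The sketch below uses the line $y = mx + b$ solved for $x$. The function names line_y and line_x are illustrative:

```python
import numpy as np

def line_y(x, m, b):
    """Original formula: y = m*x + b."""
    return m * x + b

def line_x(y, m, b):
    """Same relationship solved for x: subtract b from both sides, then divide by m."""
    if m == 0:
        # Dividing both sides by zero is not a balanced move, so m = 0 is excluded.
        raise ValueError("m must be nonzero")
    return (y - b) / m

x = np.array([-2.0, 0.0, 1.5, 4.0])
m, b = 0.5, 3.0
y = line_y(x, m, b)
assert np.allclose(line_x(y, m, b), x)   # the round trip recovers the original x
print("round trip ok")
```

The explicit check for m == 0 is the code version of "note any excluded values" when you rearrange.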
ML use case: solving for a parameter
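This rearranging reflex is exactly what turns a learning objective into a formula you can compute. Take the simplest case: fit a single number $\theta$ (a mean) to data $x_1, \dots, x_n$ by minimizing the squared-error loss
$$L(\theta) = \sum_{i=1}^{n} (x_i - \theta)^2.$$
Calculus (next part) hands us the condition for the minimum: the derivative is zero. That condition is itself an equation to be solved for the parameter:
$$\frac{dL}{d\theta} = -2\sum_{i=1}^{n} (x_i - \theta) = 0 \;\Longrightarrow\; \sum_{i=1}^{n} x_i - n\theta = 0 \;\Longrightarrow\; \theta^* = \frac{1}{n}\sum_{i=1}^{n} x_i.$$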
The optimal $\theta^*$ is just the sample mean. Notice how we got there: we set an expression to zero and isolated $\theta$ with the same balanced moves as the toy example. The full linear-regression normal equations are this exact move done with matrices. Constraints join the story as inequalities: a learning rate $\eta$ is required to satisfy $\eta > 0$, and a probability output $p$ must satisfy $0 \le p \le 1$. A convergence test is an inequality too ("stop when the change is below tolerance", $|\delta| < \varepsilon$), which we make concrete next.
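The calculus result can be sanity-checked numerically. The sketch below assumes synthetic data drawn with a fixed seed, not any real dataset. It checks that the derivative vanishes at the sample mean and that no point on a fine grid has a lower loss:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=50)   # synthetic data, for illustration only

def loss(theta):
    """Squared-error loss L(theta) = sum_i (x_i - theta)^2."""
    return np.sum((data - theta) ** 2)

def dloss(theta):
    """Derivative dL/dtheta = -2 * sum_i (x_i - theta)."""
    return -2.0 * np.sum(data - theta)

theta_star = data.mean()                          # closed-form solution of dL/dtheta = 0
assert np.isclose(dloss(theta_star), 0.0, atol=1e-9)

# Brute-force check: no point on a fine grid beats the closed-form answer.
grid = np.linspace(data.min(), data.max(), 10_001)
losses = np.array([loss(t) for t in grid])
assert loss(theta_star) <= losses.min() + 1e-9
print(theta_star, grid[losses.argmin()])          # the two values should nearly coincide
```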
NumPy: verify a solution, and turn an inequality into a mask
Two habits worth building now. First, never trust a hand-solved equation you
have not checked numerically — substitute the answer back and assert the two
sides agree. Second, an inequality in NumPy does not return a single truth value;
applied to an array it returns a boolean mask, one True/False per element,
which is how constraints get enforced in vectorized code.
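Here is a minimal sketch of both habits. The equation $3x + 4 = 19$ (solved by hand as $x = 5$), the update array delta, and the tolerance eps are all made-up values for illustration:

```python
import numpy as np

# Hypothetical hand-solved equation: 3x + 4 = 19  ->  subtract 4, divide by 3  ->  x = 5
a, b, c = 3.0, 4.0, 19.0
x = (c - b) / a
assert np.isclose(a * x + b, c)   # substitute back: both sides must agree

# An inequality applied to an array gives one True/False per element.
delta = np.array([0.5, 1e-4, -2e-5, 0.03])   # hypothetical per-parameter updates
eps = 1e-3                                   # hypothetical tolerance
mask = np.abs(delta) < eps
print(mask)         # [False  True  True False]
print(mask.all())   # False: not every parameter has converged yet
```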
The masking idea scales: arr[arr > 0] selects the positive entries, and
np.clip(p, 0.0, 1.0) enforces the probability constraint by projecting stray
values back into $[0, 1]$. You are writing inequalities as code.
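Here is a short sketch of both idioms. The small arrays are made-up values for illustration:

```python
import numpy as np

arr = np.array([-1.5, 0.0, 2.0, 3.5])
positives = arr[arr > 0]           # the mask arr > 0 selects entries satisfying the inequality
print(positives)                   # [2.  3.5]

p = np.array([-0.2, 0.3, 0.9, 1.4])    # hypothetical raw "probabilities", some out of range
p_clipped = np.clip(p, 0.0, 1.0)       # values below 0 become 0, values above 1 become 1
assert np.all((p_clipped >= 0.0) & (p_clipped <= 1.0))   # constraint 0 <= p <= 1 now holds
print(p_clipped)
```

Note that entries already inside $[0, 1]$ pass through np.clip unchanged. Only the stray values are moved, each to the nearest boundary.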
Summary
- Solving an equation is a chain of balanced moves: whatever you do to one side, do to the other. Isolate the unknown by undoing operations in reverse.
- A linear equation ($ax + b = 0$ with $a \neq 0$) has the single solution $x = -b/a$; always verify by substituting back.
- Rearranging a formula for a different variable is the same peeling process aimed at a new target; note any excluded values (no dividing by zero).
- Inequalities describe sets of values. All balanced moves preserve the relation except multiplying or dividing by a negative, which flips it.
- Setting a loss's derivative to zero gives an equation you solve for the parameter. Constraints like $\eta > 0$ and $0 \le p \le 1$, and convergence tests like $|\delta| < \varepsilon$, are inequalities.
- In NumPy, verify equations with assert np.isclose(...); an inequality on an array yields a boolean mask used to select or clip values.
Active recall
Answer from memory before checking the lesson:
- State the two balanced moves you may apply to any equation, and say which one behaves differently for an inequality.
- Solve for . Which direction does the relation end up, and why?
- Rearrange to isolate . What (if any) value must be excluded?
- A training loop stops when $|\delta| < \varepsilon$. If delta is a NumPy array of per-parameter updates, what does np.abs(delta) < eps return, and how would you ask "have all parameters converged?"
Exercises
Level A: Recall & basic calculation
Solve a one-step linear equation
Solve for .
Variables on both sides
Solve for .
Solve a linear inequality
Solve and give the threshold value of (the boundary of the solution interval).
The sign-flip rule
Solve and give the boundary value of . (Remember what happens to the relation when you divide by a negative.)
Rearrange the line for the intercept
The line is $y = mx + b$. Solve for the intercept $b$ in terms of $y$, $m$, and $x$.
Formula from ax + b = 0
For the general linear equation $ax + b = 0$ with $a \neq 0$, what is the unique solution $x$?
Level B: Conceptual understanding
When does the relation flip?
Which single operation, applied to both sides of an inequality, reverses its direction (e.g. turns $<$ into $>$)?
Constraints as inequalities
In machine learning, a learning rate $\eta$ must satisfy $\eta > 0$ and a predicted probability $p$ must satisfy $0 \le p \le 1$. In one or two sentences, explain why these are written as inequalities rather than equations, and what would go wrong if a value violated its constraint.
Inequality becomes a boolean mask
In NumPy, p is a 1-D array of shape (n,). What does the expression p >= 0.5 evaluate to?
Translate a word problem
A batch has examples. Each epoch processes the whole batch once, and you want to run enough epochs that the model sees at least examples total. Write an inequality for the number of epochs in terms of , then solve it for .
Level C: Derivation & implementation
Verify a hand solution in NumPy
You solved by hand and got . Write NumPy code that substitutes into both sides, asserts they agree with np.isclose, and prints ok.
Rearrange and derive: solve for the parameter
The standardization (z-score) formula is $z = \dfrac{x - \mu}{\sigma}$ with $\sigma > 0$. Derive the formula that recovers the original value $x$ from a given $z$, showing each balanced move.
Enforce a probability constraint with a mask
Generate 8 values with a fixed seed via rng = np.random.default_rng(0) and raw = rng.standard_normal(8) (these can fall outside $[0, 1]$). Build a boolean mask of which entries already satisfy $0 \le p \le 1$, then use np.clip to force all entries into $[0, 1]$. Assert every clipped value satisfies the constraint and print ok.
Level D: Research-thinking challenge
A system of constraints defines a feasible region
A tiny training budget imposes two simultaneous constraints on the number of training steps : memory allows , and you need enough steps to converge, . (1) Describe the set of valid as an interval. (2) Now suppose someone also requires . Explain what happens to the feasible region and what that means practically. (3) Relate this to why an optimization problem can be infeasible.