Equations and Inequalities

Why solving for a variable is the whole game

Almost every quantity you will chase in machine learning is defined implicitly. A loss function tells you how wrong a model is; training asks for the weight that makes the loss as small as possible. A convergence test says "stop when the update is tiny"; you have to turn that sentence into a number you can compare against. A probability must lie between $0$ and $1$; a learning rate must be strictly positive. None of these are handed to you as "$x = \text{answer}$." They arrive as an equation or an inequality, and your job is to isolate the thing you care about.

That single skill — rearranging a relationship until the unknown stands alone — is what this chapter drills. It is the algebraic reflex underneath the normal equations of linear regression, underneath every "solve for the parameter" step, and underneath the boolean masks you will write in NumPy to enforce a constraint. We will treat three closely related objects:

Linear equations, which pin a variable to one exact value.
Rearranged formulas, where the same relationship is re-solved for a different variable.
Inequalities and systems, which describe regions of allowed values — constraints — rather than single points.

Intuition: an equation is a balanced scale

Think of = as a balance beam. The left side and the right side weigh exactly the same. You are allowed to do anything to the equation as long as you do it to both sides — add the same weight to each pan, or halve the contents of both pans — and the beam stays level. Solving is nothing more than a sequence of these balanced moves, chosen to strip everything away from the variable until it sits alone.

An inequality is the same scale, but now one side is genuinely heavier. Written $a < b$, it says the left pan is lighter. Most balanced moves preserve the tilt: add the same weight to both pans and the heavier side stays heavier. Exactly one kind of move reverses the tilt: multiplying or dividing both sides by a negative number. Forgetting that reversal is an easy algebra bug to introduce in ML code, because it silently flips a constraint without raising any error. We will make it precise below.

Formal definitions

The two operations we lean on are the balanced moves, and they are the entire toolkit:

\text{if } L = R \text{ then } L + c = R + c \quad\text{and}\quad cL = cR \ \ (c \neq 0)

(2.1)

For inequalities the additive move is identical, but the multiplicative move carries a caveat encoded by the sign of $c$ :

a < b \iff a + c < b + c, \qquad a < b \iff \begin{cases} ca < cb & c > 0 \\ ca > cb & c < 0 \end{cases}

(2.2)
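The rules in (2.1) and (2.2) can be spot-checked numerically. The snippet below is a minimal sketch, not a proof: the values of `a`, `b`, `c_pos`, and `c_neg` and the random test pairs are arbitrary choices made for illustration.

```python
import numpy as np

a, b = 2.0, 5.0                      # a < b holds
assert a < b

# Additive move: adding the same c to both sides preserves the relation,
# whatever the sign of c.
for c in (-7.0, 0.0, 3.5):
    assert a + c < b + c

# Multiplicative move: the sign of c decides the direction.
c_pos, c_neg = 4.0, -4.0
assert c_pos * a < c_pos * b         # c > 0 keeps  <
assert c_neg * a > c_neg * b         # c < 0 flips  <  into  >

# The same check over many random pairs with lo < hi elementwise.
rng = np.random.default_rng(0)
lo = rng.standard_normal(1000)
hi = lo + rng.random(1000) + 1e-3    # strictly larger than lo
keep_ok = bool(np.all(c_pos * lo < c_pos * hi))
flip_ok = bool(np.all(c_neg * lo > c_neg * hi))
print("positive c keeps the relation:", keep_ok)
print("negative c flips the relation:", flip_ok)
```

Passing checks on sampled values do not establish the rule in general, but a failing check is a quick signal that a sign was mishandled.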

Symbol	Meaning	Type	Shape	Role
$x$	The unknown we solve for	scalar	1	variable
$a, b, c$	Known constants (coefficients)	scalar	1	fixed
$=$	Equality: both sides identical	relation	—	fixed
$<,\ \le,\ >,\ \ge$	Inequality relations (region of values)	relation	—	fixed
$\varepsilon$	A small positive threshold (tolerance)	scalar	1	fixed
$(3, \infty)$	Open interval: all $x$ with $x > 3$	set	—	variable

A numerical example

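Start with an equation. Solve $5x - 3 = 2x + 9$: subtract $2x$ from both sides to get $3x - 3 = 9$, add $3$ to both sides to get $3x = 12$, then divide both sides by $3$ to get $x = 4$. Substituting back confirms it: both sides equal $17$.

Now the inequality version, to expose the sign-flip. Solve $-3x + 1 \ge 7$: subtract $1$ from both sides to get $-3x \ge 6$, then divide both sides by $-3$. Because we divided by a negative number, the relation flips from $\ge$ to $\le$, giving $x \le -2$, the interval $(-\infty, -2]$. Had we kept $\ge$ we would have reported exactly the wrong half of the line.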
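One way to catch a missed sign-flip is to compare the original inequality with the hand-derived solution on a grid of test points. This is a sketch under an assumption: the grid `x` (multiples of 0.25 from -10 to 10) is an arbitrary choice, picked so every value is exact in floating point.

```python
import numpy as np

# Hypothetical test grid: -10.0, -9.75, ..., 10.0 (all exactly representable).
x = np.arange(-40, 41) / 4.0

original = (-3 * x + 1) >= 7     # the inequality as stated
solved = x <= -2                 # the hand-derived solution, relation flipped
wrong = x >= -2                  # what you get if the flip is forgotten

# The correct solution matches the original inequality at every grid point.
agree = bool(np.array_equal(original, solved))

# The un-flipped answer overlaps the true solution only at the boundary x = -2.
overlap_count = int(np.sum(original & wrong))

print("solved form matches original:", agree)
print("points shared with the un-flipped answer:", overlap_count)
```

The two masks agreeing on a grid does not prove the algebra, but a disagreement pinpoints the values where the rearrangement went wrong.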

Rearranging a formula: isolate any target

The deeper skill is re-solving a known formula for a different variable. The relationship does not change; which symbol stands alone does. The recipe is always the same: peel operations off the target in reverse order, undoing each with its inverse (add/subtract, multiply/divide) applied to both sides.

Solve the line equation for the slope, then for $x$

Start from the familiar line $y = mx + b$ , and suppose we now want the slope $m$ in terms of everything else. The target $m$ is currently multiplied by $x$ and has $b$ added. Undo in reverse: first strip the $+b$ , then strip the $\times x$ . $y = mx + b \;\xrightarrow{-b}\; y - b = mx \;\xrightarrow{\div x}\; m = \frac{y - b}{x} \quad (x \neq 0).$ The proviso $x \neq 0$ is not decoration: dividing by a variable is only legal when that variable is nonzero, and flagging the excluded case is part of a correct rearrangement.

Solving the same equation for $x$ instead follows the identical peeling logic — subtract $b$ , then divide by $m$ — and yields $x = \frac{y - b}{m} \quad (m \neq 0).$ One relationship, three different "subjects," each obtained by the same mechanical undoing.
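The two rearrangements translate directly into code, with each proviso becoming an explicit guard. The function names `slope_from_point` and `x_from_point` and the sample values are hypothetical, chosen only for this illustration.

```python
import numpy as np

def slope_from_point(y, x, b):
    """Return m = (y - b) / x. Only valid when x != 0."""
    if x == 0:
        raise ValueError("x must be nonzero to solve y = m*x + b for m")
    return (y - b) / x

def x_from_point(y, m, b):
    """Return x = (y - b) / m. Only valid when m != 0."""
    if m == 0:
        raise ValueError("m must be nonzero to solve y = m*x + b for x")
    return (y - b) / m

# Build y from known values, then recover each "subject" in turn.
m_true, b_true, x_true = 2.5, -1.0, 4.0
y = m_true * x_true + b_true                 # 9.0

m_rec = slope_from_point(y, x_true, b_true)
x_rec = x_from_point(y, m_true, b_true)
assert np.isclose(m_rec, m_true)
assert np.isclose(x_rec, x_true)
print("recovered m =", m_rec, " recovered x =", x_rec)
```

Raising an error on the excluded value is one design choice among several; returning `np.nan` or handling the degenerate case separately would also respect the proviso.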

ML use case: solving for a parameter

This rearranging reflex is exactly what turns a learning objective into a formula you can compute. Take the simplest case: fit a single number $\mu$ (a mean) to data $x_1, \ldots, x_n$ by minimizing the squared-error loss $L(\mu) = \sum_{i=1}^{n} (x_i - \mu)^2.$ Calculus (next part) hands us the condition for the minimum: the derivative is zero. That condition is itself an equation to be solved for the parameter:

\frac{dL}{d\mu} = -2\sum_{i=1}^{n} (x_i - \mu) = 0 \;\Longrightarrow\; \sum_{i=1}^{n} x_i - n\mu = 0 \;\Longrightarrow\; \mu = \frac{1}{n}\sum_{i=1}^{n} x_i

(2.3)

The optimal $\mu$ is just the sample mean — but notice how we got there: we set an expression to zero and isolated $\mu$ with the same balanced moves as the toy example. The full linear-regression normal equations are this exact move done with matrices, and constraints join the story as inequalities: a learning rate is required to satisfy $\eta > 0$ , and a probability output must satisfy $0 \le p \le 1$ . A convergence test is an inequality too — "stop when the change is below tolerance," $\lvert \Delta \rvert < \varepsilon$ — which we make concrete next.
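The result in (2.3) can be checked on data. The sketch below assumes a synthetic sample `x` (50 draws from a normal distribution, an arbitrary choice) and confirms that the sample mean zeroes the derivative and gives a smaller loss than nearby values of $\mu$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=50)   # hypothetical data

def loss(mu):
    """Squared-error loss L(mu) = sum_i (x_i - mu)^2."""
    return np.sum((x - mu) ** 2)

# Solution of dL/dmu = 0 from (2.3): the sample mean.
mu_hat = x.sum() / x.size

# The derivative -2 * sum(x_i - mu) vanishes at mu_hat (up to rounding).
grad_at_mu_hat = -2.0 * np.sum(x - mu_hat)
assert np.isclose(grad_at_mu_hat, 0.0, atol=1e-9)

# Moving mu in either direction increases the loss.
assert loss(mu_hat) < loss(mu_hat + 0.1)
assert loss(mu_hat) < loss(mu_hat - 0.1)

# The hand-derived formula agrees with NumPy's built-in mean.
assert np.isclose(mu_hat, x.mean())
print("mu_hat =", mu_hat)
```

Comparing against two neighbouring values is only a local sanity check; the derivation in (2.3) is what guarantees the minimum.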

NumPy: verify a solution, and turn an inequality into a mask

Two habits worth building now. First, never trust a hand-solved equation you have not checked numerically — substitute the answer back and assert the two sides agree. Second, an inequality in NumPy does not return a single truth value; applied to an array it returns a boolean mask, one True/False per element, which is how constraints get enforced in vectorized code. Run it:

solve_and_mask.py

import numpy as np

rng = np.random.default_rng(0)

# 1) Verify the hand-solved equation 5x - 3 = 2x + 9  ->  x = 4
x = 4.0
lhs = 5 * x - 3
rhs = 2 * x + 9
print("lhs =", lhs, " rhs =", rhs)      # 17.0  17.0
assert np.isclose(lhs, rhs), "x = 4 must satisfy the equation"

# 2) An inequality applied to an array is a boolean MASK, not one bool.
p = rng.random(6)                       # six values, each in [0, 1)
valid = (p >= 0.0) & (p <= 1.0)         # elementwise constraint check
print("mask dtype:", valid.dtype)       # bool
print("all valid probs:", bool(valid.all()))   # True

# 3) A convergence test is an inequality against a tolerance epsilon.
eps = 1e-8
delta = 3e-9
converged = abs(delta) < eps            # a single boolean here
print("converged:", converged)          # True
assert converged
print("ok")

The masking idea scales: arr[arr > 0] selects the positive entries, and np.clip(p, 0.0, 1.0) enforces the probability constraint by projecting stray values back into $[0, 1]$ . You are writing inequalities as code.
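A short illustration of both idioms, using small hand-picked arrays (the values in `arr` and `p` are made up for the example):

```python
import numpy as np

# Boolean-mask selection: keep only entries that satisfy arr > 0.
arr = np.array([-1.5, 0.0, 2.0, -0.25, 3.5])
positive = arr[arr > 0]
print("positive entries:", positive)          # [2.  3.5]

# Constraint check followed by projection into [0, 1].
p = np.array([-0.2, 0.0, 0.4, 1.0, 1.3])
in_range = (p >= 0.0) & (p <= 1.0)            # which entries already comply
clipped = np.clip(p, 0.0, 1.0)                # stray values moved to the nearest bound
print("in range:", in_range)                  # [False  True  True  True False]
print("clipped: ", clipped)                   # [0.  0.  0.4 1.  1. ]

# After clipping, every entry satisfies the constraint.
assert bool(((clipped >= 0.0) & (clipped <= 1.0)).all())
```

Note that the mask only reports violations, while `np.clip` changes the data; which one is appropriate depends on whether an out-of-range value signals a bug or is expected.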

Summary

Solving an equation is a chain of balanced moves: whatever you do to one side, do to the other. Isolate the unknown by undoing operations in reverse.
A linear equation $ax + b = 0$ ($a \neq 0$) has the single solution $x = -b/a$; always verify by substituting back.
Rearranging a formula for a different variable is the same peeling process aimed at a new target; note any excluded values (no dividing by zero).
Inequalities describe sets of values. All balanced moves preserve the relation except multiplying or dividing by a negative, which flips it.
Setting a loss's derivative to zero gives an equation you solve for the parameter; constraints like $\eta > 0$ and $0 \le p \le 1$ , and convergence tests $\lvert\Delta\rvert < \varepsilon$ , are inequalities.
In NumPy, verify equations with assert np.isclose(...); an inequality on an array yields a boolean mask used to select or clip values.

Active recall

Answer from memory before checking the lesson:

State the two balanced moves you may apply to any equation, and say which one behaves differently for an inequality.
Solve $-4x + 2 \le 10$ for $x$ . Which direction does the relation end up, and why?
Rearrange $y = mx + b$ to isolate $b$ . What (if any) value must be excluded?
A training loop stops when $\lvert\Delta\rvert < \varepsilon$ . If delta is a NumPy array of per-parameter updates, what does np.abs(delta) < eps return, and how would you ask "have all parameters converged?"

Exercises

Level A: Recall & basic calculation

Level A · Hand calculation · ch02-A1

Solve a two-step linear equation

Solve $3x + 5 = 20$ for $x$ .

Level A · Hand calculation · ch02-A2

Variables on both sides

Solve $5x - 3 = 2x + 9$ for $x$ .

Level A · Hand calculation · ch02-A3

Solve a linear inequality

Solve $2x - 6 > 0$ and give the threshold value of $x$ (the boundary of the solution interval).

Level A · Hand calculation · ch02-A4

The sign-flip rule

Solve $-2x > 6$ and give the boundary value of $x$ . (Remember what happens to the relation when you divide by a negative.)

Level A · Equation interpretation · ch02-A5

Rearrange the line for the intercept

The line is $y = mx + b$ . Solve for the intercept $b$ in terms of $y$ , $m$ , and $x$ .

Level A · Equation interpretation · ch02-A6

Formula from $ax + b = 0$

For the general linear equation $ax + b = 0$ with $a \neq 0$ , what is the unique solution $x$ ?

Level B: Conceptual understanding

Level B · Equation interpretation · ch02-B1

When does the relation flip?

Which single operation, applied to both sides of an inequality, reverses its direction (e.g. turns $<$ into $>$ )?

Level B · ML application · ch02-B2

Constraints as inequalities

In machine learning, a learning rate must satisfy $\eta > 0$ and a predicted probability must satisfy $0 \le p \le 1$ . In one or two sentences, explain why these are written as inequalities rather than equations, and what would go wrong if a value violated its constraint.

Level B · Shape reasoning · ch02-B3

Inequality becomes a boolean mask

In NumPy, p is a 1-D array of shape $(6,)$ . What does the expression p >= 0.5 evaluate to?

Level B · ML application · ch02-B4

Translate a word problem

A training set has $n$ examples. Each epoch processes the whole training set once, and you want to run enough epochs that the model sees at least $10{,}000$ examples in total. Write an inequality for the number of epochs $E$ in terms of $n$, then solve it for $E$.

Level C: Derivation & implementation

Level C · NumPy implementation · ch02-C1

Verify a hand solution in NumPy

You solved $7x - 4 = 3x + 12$ by hand and got $x = 4$ . Write NumPy code that substitutes $x = 4$ into both sides, asserts they agree with np.isclose, and prints ok.

Level C · Derivation · ch02-C2

Rearrange and derive: solve for the parameter

The standardization (z-score) formula is $z = \dfrac{x - \mu}{\sigma}$ with $\sigma > 0$ . Derive the formula that recovers the original value $x$ from a given $z$ , showing each balanced move.

Level C · NumPy implementation · ch02-C3

Enforce a probability constraint with a mask

Generate 8 values with a fixed seed via rng = np.random.default_rng(0) and raw = rng.standard_normal(8) (these can fall outside $[0,1]$ ). Build a boolean mask of which entries already satisfy $0 \le \text{raw} \le 1$ , then use np.clip to force all entries into $[0,1]$ . Assert every clipped value satisfies the constraint and print ok.

Level D: Research-thinking challenge

Level D · ML application · ch02-D1

A system of constraints defines a feasible region

A tiny training budget imposes two simultaneous constraints on the number of training steps $s$ : memory allows $s \le 1000$ , and you need enough steps to converge, $s \ge 200$ . (1) Describe the set of valid $s$ as an interval. (2) Now suppose someone also requires $s \ge 1200$ . Explain what happens to the feasible region and what that means practically. (3) Relate this to why an optimization problem can be infeasible.