Numbers, Variables, and Expressions
The alphabet of mathematics, from ℕ to ℝ
Prerequisites
None — a good place to start.
Learning objectives
- Name the number systems (ℕ, ℤ, ℚ, ℝ) and where each appears in ML
- Read variables, constants, coefficients, and subscripted/indexed symbols
- Evaluate expressions using correct operator precedence
- Manipulate fractions, signs, and powers without slips
Why numbers deserve a first chapter
You have spent years treating numbers as a solved problem. int, float,
0.1 + 0.2, done. So it is fair to ask why a machine-learning course opens with
number systems and expressions.
The answer is that ML lives on a specific stack of number sets, and the whole
field is built by moving between them deliberately. A dataset is a grid of
real numbers. The row you are looking at is picked out by an integer
index. A learning rate is a rational you chose by hand. The weights the model
learns are reals that only ever exist as floats in memory — with all the
rounding that implies. When a training run silently produces `nan`, or an
accuracy metric comes out as 0 when you expected 0.75, the cause is often a
confusion between two of these worlds: an integer where a float was meant, or
a division that truncated.
So our first object of study is not a vector or a matrix but the humble number, seen on two levels at once:
- Mathematically, as an element of a set ($\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, or $\mathbb{R}$) with a role in a formula.
- Computationally, as a value of a concrete dtype (`int64` or `float64`) that occupies fixed bits and obeys machine arithmetic.
Fluency means never confusing the two. That is the goal of this chapter.
Intuition: four nested worlds of number
Start with counting: the natural numbers $\mathbb{N}$. They answer "how many?" and index things: which example, which dimension. There is no such thing as a negative or a fractional example number.
Allow subtraction to go past zero and you get the integers $\mathbb{Z}$: $\dots, -2, -1, 0, 1, 2, \dots$. Now differences and offsets have a home.
Allow division and you get the rationals $\mathbb{Q}$: every ratio of integers $p/q$ with $q \neq 0$, like $1/2$ or $-3/4$. A learning rate of $0.01$ is really the rational $1/100$.
Finally, fill in the gaps between the rationals (numbers like $\sqrt{2}$ and $\pi$ that no ratio of integers can express) and you get the real numbers $\mathbb{R}$, the continuous number line. Each set sits inside the next:
$$\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R}$$
The picture to keep: integers label, reals measure. Indices, counts, and class IDs come from $\mathbb{Z}$; features, weights, and losses come from $\mathbb{R}$. Almost every dtype decision in ML is a decision about which of these two worlds a number belongs to.
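The nesting can be made tangible in a few lines of Python. The sketch below is illustrative only: the variable names and sample values (a learning rate of 0.01, a small 3×2 grid) are assumptions chosen for this example. It uses the standard-library `fractions.Fraction` as a stand-in for exact rationals and shows that NumPy accepts an integer, but not a float, as an index.

```python
from fractions import Fraction
import math

import numpy as np

# Q: a learning rate written as an exact ratio of integers.
lr_exact = Fraction(1, 100)
print(lr_exact == Fraction("0.01"))      # the decimal string 0.01 is exactly 1/100

# R as stored in memory: the float 0.01 is only the nearest binary fraction.
lr_float = 0.01
print(Fraction(lr_float) == lr_exact)    # the stored float is not exactly 1/100

# sqrt(2) is irrational; its float is a rational approximation.
root2 = math.sqrt(2)
print(root2 * root2 == 2.0)              # squaring the approximation misses 2

# Integers label: an integer picks out a row, a float cannot.
X = np.array([[0.5, 1.5], [2.5, 3.5], [4.5, 5.5]])   # hypothetical data grid
row = X[1]                               # integer index: accepted
try:
    X[1.0]                               # float index: rejected by NumPy
    float_index_ok = True
except IndexError:
    float_index_ok = False
print(row, float_index_ok)
```

The float index fails even though `1.0` and `1` are equal as numbers, which is the label-versus-measure distinction enforced by the library.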
Formal definitions
A quantity in a formula is either a constant (a fixed known value, like $\pi$ or the number $2$), a variable (a symbol standing for a value that can change, like $x$), or a coefficient (a constant that multiplies a variable, like the $3$ in $3x$). In the linear expression $3x + 5$, the symbol $x$ is the variable, $3$ is a coefficient, and $5$ is a constant term.
| Symbol | Meaning | Type | Shape | Role |
|---|---|---|---|---|
| $\mathbb{N}$ | Natural numbers (counts, indices) | set | n/a | fixed |
| $\mathbb{Z}$ | Integers (labels, offsets) | set | n/a | fixed |
| $\mathbb{Q}$ | Rationals (ratios $p/q$) | set | n/a | fixed |
| $\mathbb{R}$ | Real numbers (measurements, weights) | set | n/a | fixed |
| $x_i$ | The $i$-th component of $x$ (a scalar) | scalar | 1 | variable |
| $\eta$ | A constant, e.g. a learning rate | scalar | 1 | fixed |
Subscripts and indices
ML notation is dense with subscripted symbols, and the subscript is almost always an integer index. Read $x_i$ as "the $i$-th component of the vector $x$"; read $x^{(i)}$ as "the $i$-th training example." A doubly-indexed symbol $X_{ij}$ is the entry in row $i$, column $j$ of a matrix. The symbol carries a real value; the index carries an integer position. Keeping those apart, value versus position, is the single habit that makes summation notation readable:
$$\sum_{i=1}^{n} w_i x_i = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$
Here $i$ ranges over $\mathbb{N}$ (the positions $1$ through $n$), while each $w_i$ and $x_i$ is a real number (a value).
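A summation over an index translates directly into a loop, which makes the value-versus-position split visible. The following sketch assumes two short hypothetical vectors `w` and `x`; the names and numbers are made up for illustration.

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])   # hypothetical weights (real values)
x = np.array([4.0, 3.0, 1.0])    # hypothetical features (real values)
n = len(x)                       # a count: an integer

# Loop form: i is an integer position, w[i] and x[i] are float values.
total = 0.0
for i in range(n):               # Python positions run 0..n-1, the math 1..n
    total += w[i] * x[i]

# Vectorized form of the same sum.
total_np = float(np.dot(w, x))

print(total, total_np)
print(type(n).__name__, w.dtype)
```

The loop variable never leaves the integers, and the accumulator never leaves the floats; `np.dot` hides the index but performs the same sum.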
Operator precedence
An expression is a combination of numbers, variables, and operators. To evaluate it to a single value you apply operators in a fixed order of precedence, highest first:
- Parentheses
- Exponents
- Multiplication and division (left to right)
- Addition and subtraction (left to right)
This is the same PEMDAS rule you know from code, and it is worth stating precisely because a misread precedence is a silent bug: $-3^2 = -9$, not $9$, because exponentiation binds tighter than the leading minus.
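Python applies the same ordering, so the rule can be checked directly. The numbers below are arbitrary illustrations, and the variable names are hypothetical.

```python
# Exponentiation binds tighter than a leading minus.
unary_then_power = -3**2        # parsed as -(3**2)
power_of_negative = (-3)**2     # parentheses force the minus first

# Multiplication binds tighter than addition.
mul_first = 2 + 3 * 4           # parsed as 2 + (3 * 4)
add_first = (2 + 3) * 4         # parentheses force the addition first

print(unary_then_power, power_of_negative)
print(mul_first, add_first)
```

When in doubt, adding parentheses costs nothing and removes the ambiguity for the next reader.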
A small expression, evaluated by hand
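As an illustration of the four rules, here is one possible worked evaluation. The expression $2 + 3 \times 4^2 - (6 - 1)$ is an assumption chosen for this sketch, not necessarily the one the lesson originally used.

$$
\begin{aligned}
2 + 3 \times 4^2 - (6 - 1) &= 2 + 3 \times 4^2 - 5 && \text{parentheses} \\
&= 2 + 3 \times 16 - 5 && \text{exponent} \\
&= 2 + 48 - 5 && \text{multiplication} \\
&= 45 && \text{addition and subtraction, left to right}
\end{aligned}
$$

Evaluating strictly left to right would instead begin with $2 + 3 = 5$ and arrive at a different, wrong value.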
ML use case: which set does each number come from?
Take one line of a training loop, a gradient-descent update on a weight:
$$w \leftarrow w - \eta\, g$$
Every symbol here has both a set and a dtype:
- $w$ is a weight: a measured, learned quantity. It must be a float; the whole point of learning is to nudge it by tiny fractional amounts.
- $\eta$ is the learning rate, a small positive real like $0.01$ that you fix by hand. Also a float.
- $g$ is the gradient, a real number the computation produces. Float again.
Now contrast that with the machinery around the update: the epoch counter, the batch index, the row you slice out of the data matrix, the class label you compare against. Those are integers, elements of $\mathbb{Z}$ (or $\mathbb{N}$), because they count and label rather than measure. A feature vector $x$ holds real measurements, but the index that reaches into it, $i$, is a natural number. Integers index; floats hold the weights. Get this mapping right and dtype bugs mostly disappear before they start.
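The mapping can be checked in a few lines of NumPy. This is a minimal sketch under assumed values: the names `w`, `eta`, `g`, `epoch`, and `label`, and every number below, are hypothetical and stand in for whatever a real training loop would compute.

```python
import numpy as np

# Measured quantities: floats.
w = np.float64(0.8)        # hypothetical weight
eta = 0.01                 # hypothetical learning rate
g = np.float64(2.5)        # hypothetical gradient value

# Counting and labeling quantities: integers.
epoch = 3                  # hypothetical epoch counter
label = np.int64(1)        # hypothetical class label

# One gradient-descent step: nudge the weight against the gradient.
w_new = w - eta * g

print(w_new, w_new.dtype)            # still a float after the update
print(type(epoch).__name__, label.dtype)

# The same step on an integer weight array loses the fractional nudge
# when the result is stored back into the integer array.
w_int = np.zeros(3, dtype=np.int64)
w_int[:] = w_int - eta * np.array([2.5, -1.0, 0.5])
print(w_int)                         # fractional updates were cast back to integers
```

The last three lines show why the weight must be a float: the update is computed correctly, but storing it in an integer array discards the fractional part.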
NumPy implementation
NumPy makes the set-versus-dtype distinction concrete. Every array has a
dtype, and int64 and float64 behave differently in ways that matter. Let us
evaluate the expression from above by hand and in NumPy, confirm they agree,
and then meet the integer-division pitfall head-on. Run it:
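A minimal sketch of such a check follows. It assumes the illustrative expression `2 + 3 * 4**2 - (6 - 1)` (chosen here, not taken from the lesson) and hypothetical variable names. The integer dtype NumPy reports is `int64` on most 64-bit platforms, so the code prints it rather than assuming it.

```python
import numpy as np

# By hand: 4**2 = 16, then 3 * 16 = 48, then 2 + 48 - 5 = 45.
by_hand = 45

# The same expression on NumPy values built from integer literals.
a, b, c = np.array(2), np.array(3), np.array(4)
value_int = a + b * c**2 - (6 - 1)
print(value_int, value_int.dtype)        # integer dtype (platform default)

# Float literals give a float result with the same numerical value.
af, bf, cf = np.array(2.0), np.array(3.0), np.array(4.0)
value_float = af + bf * cf**2 - (6.0 - 1.0)
print(value_float, value_float.dtype)

print(value_int == by_hand, value_float == by_hand)

# True division versus floor division on integer values.
correct = np.array(3)
total = np.array(4)
print(correct / total)     # true division: a float
print(correct // total)    # floor division: an integer

# Floor division rounds toward negative infinity, not toward zero.
print(-7 // 2, int(-7 / 2))
```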
Two lessons are baked into that output. First, NumPy picks a dtype from the
literals you write: `2` gives `int64` (the default integer type on most 64-bit
platforms), `2.0` gives `float64`. Second, `/` and `//` are different
operators: true division promotes to float, floor division rounds down toward
negative infinity. In pure Python 3 the plain expression `3/4` is `0.75`, but
the moment you write `//`, or feed two integer arrays to an operation that
truncates, you can silently compute `0` where you meant `0.75`.
The mistake that costs an afternoon
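One common shape of this mistake, sketched with made-up data: the arrays `preds` and `labels` and the variable names below are hypothetical.

```python
import numpy as np

preds = np.array([1, 0, 1, 1])    # hypothetical predicted class IDs
labels = np.array([1, 0, 0, 1])   # hypothetical true class IDs

correct = int((preds == labels).sum())   # number of matches: an integer count
total = len(labels)                      # number of examples: an integer count

acc_buggy = correct // total   # floor division of two counts
acc_fixed = correct / total    # true division gives the intended ratio
acc_mean = float(np.mean(preds == labels))  # equivalent: mean of a boolean array

print(acc_buggy, acc_fixed, acc_mean)
```

Nothing raises an error in the buggy line, which is what makes it expensive: the metric is a perfectly valid integer, just not the ratio that was intended.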
Summary
- ML runs on four nested number sets, $\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R}$. Integers label and count; reals measure.
- A symbol in a formula is a constant, a variable, or a coefficient; a subscript is an integer index (position), while the symbol itself carries a real value.
- Expressions evaluate by fixed precedence (PEMDAS): parentheses, exponents, then $\times$ and $\div$, then $+$ and $-$, each left to right. $-3^2 = -9$, not $9$.
- In a gradient update $w \leftarrow w - \eta\, g$, the weight, learning rate, and gradient are all reals (floats); the epoch, index, and label around them are integers.
- In NumPy the `dtype` (`int64` vs `float64`) is inferred from your literals and determines behavior. `/` is true division (float); `//` floors to an integer, the source of the classic "accuracy comes out 0" bug.
Active recall
Answer from memory before checking the lesson:
- Name the four number sets in order of inclusion, and say which two are used for indexing versus measuring in ML.
- Evaluate $-3^2$ by hand. What is the value, and why is it not $9$?
- In NumPy, what is the `dtype` of `np.array([1, 2, 3])`? What one-character change to the literals would make it `float64`?
- What does `7 // 2` evaluate to, and how does it differ from `7 / 2`? Why does this distinction matter when computing an accuracy?
Exercises
Level A: Recall & basic calculation
Evaluate with precedence
Evaluate the expression by hand, applying operator precedence.
The unary minus trap
Evaluate . (This is the classic precedence trap — the exponent binds tighter than the leading minus.)
Subtract two fractions
Compute and give the result as a decimal.
Smallest set containing a number
What is the smallest of the sets that contains the number ?
Infer the NumPy dtype
In NumPy, what is the dtype of np.array([1, 2, 3])? Answer with the exact dtype name (e.g. int64 or float64).
Floor division
What does the floor-division expression 17 // 5 evaluate to in Python/NumPy?
Level B: Conceptual understanding
Which quantity must be a float?
In a training loop, which of the following quantities should be stored as a float rather than an integer?
Integers cannot hold weights
Explain, in one or two sentences, why initializing a weight array with dtype=int and then applying a fractional gradient update leaves the weights unchanged.
Default dtype of zeros
What is the dtype of the array produced by np.zeros(3) (with no dtype argument)?
Reading a subscripted sum
In the expression $\sum_{i=1}^{n} w_i x_i$, the index $i$ and the symbols $w_i$, $x_i$ come from different number worlds. Which set does the index range over, and which set do the values belong to?
Level C: Derivation & implementation
Fix the accuracy bug
A metric function reports accuracy 0 when it should report 0.75. The buggy line is acc = correct // total with correct = 3, total = 4. Write a corrected accuracy(correct, total) that returns a float, verify it gives 0.75 for these inputs, and print ok.
Hand-evaluate, then check in NumPy
By hand, evaluate . Then confirm the value in NumPy, print the dtype you observe for the result, and print ok.
When 0.1 + 0.2 is not 0.3
Reals in $\mathbb{R}$ are exact, but float64 is not. Write NumPy code showing that 0.1 + 0.2 == 0.3 is False, then show that np.isclose(0.1 + 0.2, 0.3) is True, and print ok. In a comment, explain in one line why exact equality fails.
Level D: Research-thinking challenge
Why dtype choice is a modeling decision
Modern deep learning increasingly trains in float16 / bfloat16 (16-bit) rather than float64. Give one concrete benefit of the lower-precision float, one concrete risk it introduces, and explain why an integer dtype is nonetheless still the right choice for token IDs and class labels. Tie each point back to the $\mathbb{Z}$-labels-versus-$\mathbb{R}$-measures distinction.