Numbers, Variables, and Expressions

Why numbers deserve a first chapter

You have spent years treating numbers as a solved problem. int, float, 0.1 + 0.2, done. So it is fair to ask why a machine-learning course opens with number systems and expressions.

The answer is that ML lives on a specific stack of number sets, and the whole field is built by moving between them deliberately. A dataset is a grid of real numbers. The row you are looking at is picked out by an integer index. A learning rate is a rational you chose by hand. The weights the model learns are reals that only ever exist as floats in memory, with all the rounding that implies. When a training run silently produces nan, or an accuracy metric comes out as 0 when you expected 0.75, the cause is often a confusion between two of these worlds: an integer where a float was meant, or a division that discarded its fractional part.

So our first object of study is not a vector or a matrix but the humble number, seen on two levels at once:

Mathematically, as an element of a set — $\mathbb{N}$ , $\mathbb{Z}$ , $\mathbb{Q}$ , or $\mathbb{R}$ — with a role in a formula.
Computationally, as a value of a concrete dtype — int64 or float64 — that occupies fixed bits and obeys machine arithmetic.

Fluency means never confusing the two. That is the goal of this chapter.

Intuition: four nested worlds of number

Start with counting: $1, 2, 3, \ldots$ — the natural numbers $\mathbb{N}$ . They answer "how many?" and index things: example number $7$ , dimension number $3$ . There is no such thing as example number $-2$ or example number $1.5$ .

Allow subtraction to go past zero and you get the integers $\mathbb{Z}$ : $\ldots, -2, -1, 0, 1, 2, \ldots$ . Now differences and offsets have a home.

Allow division and you get the rationals $\mathbb{Q}$ : every ratio $p/q$ of integers, like $\tfrac{3}{4}$ or $-\tfrac{7}{2}$ . A learning rate of $0.01$ is really the rational $\tfrac{1}{100}$ .

Finally, fill in the gaps between the rationals — numbers like $\sqrt{2}$ and $\pi$ that no ratio of integers can express — and you get the real numbers $\mathbb{R}$ , the continuous number line. Each set sits inside the next:

$$
\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R} \tag{1.1}
$$

The picture to keep: integers label, reals measure. Indices, counts, and class IDs come from $\mathbb{Z}$ ; features, weights, and losses come from $\mathbb{R}$ . Almost every dtype decision in ML is a decision about which of these two worlds a number belongs to.
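One way to make the four sets tangible is to map each onto a Python type. The sketch below is an illustration, not part of the chapter's own listing: it assumes the standard-library `fractions.Fraction` as a stand-in for $\mathbb{Q}$ and a plain float as the machine approximation of $\mathbb{R}$; the variable names are made up for the example.

```python
from fractions import Fraction
import math

count = 7                      # a natural number: Python int
offset = -2                    # an integer: Python int
rate = Fraction(1, 100)        # a rational, stored exactly as p/q
root2 = math.sqrt(2)           # a real, stored only as a float approximation

# Rationals stay exact under arithmetic.
half = Fraction(1, 3) + Fraction(1, 6)

# The float for sqrt(2) is close to, but not exactly, the real number.
squared = root2 * root2

print(type(count).__name__, type(rate).__name__, type(root2).__name__)
print(half)                                         # exact rational result
print(squared == 2.0, math.isclose(squared, 2.0))   # exact equality vs tolerance
```

The last line is the point of the exercise: squaring the stored value of $\sqrt{2}$ does not give back exactly $2$, so comparisons on floats should use a tolerance.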

Formal definitions

A quantity in a formula is either a constant (a fixed known value, like $\pi$ or the number $2$ ), a variable (a symbol standing for a value that can change, like $x$ ), or a coefficient (a constant that multiplies a variable, like the $3$ in $3x$ ). In the linear expression $y = 3x + 2$ , the symbol $x$ is the variable, $3$ is a coefficient, and $2$ is a constant term.

Symbol	Meaning	Type	Shape	Role
$\mathbb{N}$	Natural numbers (counts, indices)	set	—	fixed
$\mathbb{Z}$	Integers (labels, offsets)	set	—	fixed
$\mathbb{Q}$	Rationals (ratios p/q)	set	—	fixed
$\mathbb{R}$	Real numbers (measurements, weights)	set	—	fixed
$x_i$	The i-th component of x (a scalar)	scalar	1	variable
$\alpha$	A constant, e.g. a learning rate	scalar	1	fixed

Subscripts and indices

ML notation is dense with subscripted symbols, and the subscript is almost always an integer index. Read $x_i$ as "the $i$ -th component of the vector $\mathbf{x}$ "; read $x^{(j)}$ as "the $j$ -th training example." A doubly-indexed symbol $X_{ij}$ is the entry in row $i$ , column $j$ of a matrix. The symbol carries a real value; the index carries an integer position. Keeping those apart — value versus position — is the single habit that makes summation notation readable:

$$
\sum_{i=1}^{n} w_i x_i \;=\; w_1 x_1 + w_2 x_2 + \cdots + w_n x_n \tag{1.2}
$$

Here $i$ ranges over $\mathbb{N}$ (the positions $1$ through $n$ ), while each $w_i$ and $x_i$ is a real number (a value).
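The value-versus-position split in (1.2) can be seen directly in code. This is a minimal sketch with made-up values for `w` and `x`; note that the formula indexes from $1$ while Python indexes from $0$.

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])   # values: floats
x = np.array([4.0, 3.0, 1.0])    # values: floats
n = len(w)                       # a count: int

# Explicit loop: i is an integer position, w[i] * x[i] is a real value.
total = 0.0
for i in range(n):
    total += w[i] * x[i]

# Vectorized form of the same sum.
dot = np.dot(w, x)

print(total, dot)
```

The loop variable `i` never takes part in the arithmetic; it only selects which values are multiplied.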

Operator precedence

An expression is a combination of numbers, variables, and operators. To evaluate it to a single value you apply operators in a fixed order of precedence, highest first:

Parentheses $(\;)$
Exponents $x^2$
Multiplication and division (left to right)
Addition and subtraction (left to right)

This is the same PEMDAS rule you know from code, and it is worth stating precisely because a misread precedence is a silent bug: $-3^2 = -(3^2) = -9$ , not $(-3)^2 = 9$ , because exponentiation binds tighter than the leading minus.
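Python follows the same ordering, so the precedence rules can be checked directly in an interpreter. The variable names below are illustrative only.

```python
neg_sq = -3 ** 2            # parsed as -(3 ** 2)
sq_neg = (-3) ** 2          # parentheses force the minus first
mul_first = 1 + 2 * 3       # multiplication before addition
grouped = (1 + 2) * 3       # parentheses override
left_to_right = 8 / 4 / 2   # (8 / 4) / 2
tower = 2 ** 3 ** 2         # Python groups ** right to left: 2 ** (3 ** 2)

print(neg_sq, sq_neg, mul_first, grouped, left_to_right, tower)
# -9 9 7 9 1.0 512
```

The one place where "left to right" does not apply is a stacked exponent: Python, like written mathematics, reads `2 ** 3 ** 2` as $2^{(3^2)}$.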

A small expression, evaluated by hand
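The NumPy listing later in this chapter evaluates $E = 2 + 3 \times 4^2 - \tfrac{10}{2}$. A hand evaluation of that expression, applying the precedence order one level at a time, might go as follows:

```latex
\begin{aligned}
E &= 2 + 3 \times 4^2 - \frac{10}{2} \\
  &= 2 + 3 \times 16 - \frac{10}{2} && \text{(exponent first)} \\
  &= 2 + 48 - 5                     && \text{(multiplication and division)} \\
  &= 45                             && \text{(addition and subtraction, left to right)}
\end{aligned}
```

Every intermediate value here happens to be an integer, but the division $\tfrac{10}{2}$ is the step that turns the machine result into a float.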

ML use case: which set does each number come from?

Take one line of a training loop, a gradient-descent update on a weight:

$$
w \;\leftarrow\; w - \alpha \, \frac{\partial L}{\partial w} \tag{1.3}
$$

Every symbol here has both a set and a dtype:

$w \in \mathbb{R}$ is a weight — a measured, learned quantity. It must be a float; the whole point of learning is to nudge it by tiny fractional amounts.
$\alpha \in \mathbb{R}$ is the learning rate, a small positive real like $0.01$ that you fix by hand. Also a float.
$\partial L / \partial w \in \mathbb{R}$ is the gradient, a real number the computation produces. Float again.

Now contrast that with the machinery around the update: the epoch counter, the batch index, the row you slice out of the data matrix, the class label you compare against. Those are integers — elements of $\mathbb{Z}$ (or $\mathbb{N}$ ) — because they count and label, they do not measure. A feature vector $\mathbf{x} \in \mathbb{R}^n$ holds $n$ real measurements, but the index $i$ that reaches into it, $x_i$ , is a natural number. Integers index; floats hold the weights. Get this mapping right and dtype bugs mostly disappear before they start.
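To see the mapping in running code, here is a small sketch of update (1.3) on a hypothetical one-parameter problem. The loss $L(w) = (w - 3)^2$, the step count, and the learning rate are assumptions chosen for illustration; they do not come from the chapter.

```python
import numpy as np

# Hypothetical toy loss: L(w) = (w - 3)^2, so dL/dw = 2 * (w - 3).
w = np.float64(0.0)      # weight: a real, stored as float64
alpha = 0.1              # learning rate: a float
num_steps = 50           # how many updates to run: an int

for step in range(num_steps):      # step is an integer counter
    grad = 2.0 * (w - 3.0)         # gradient: a float
    w = w - alpha * grad           # the update from (1.3)

print(type(step).__name__, w.dtype, w)
```

The counter `step` only counts; the quantities that get nudged by fractional amounts (`w`, `grad`) are floats throughout, and `w` ends up close to the minimizer $3$.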

NumPy implementation

NumPy makes the set-versus-dtype distinction concrete. Every array has a dtype, and integer and floating-point dtypes differ in ways that matter: int64 cannot store a fractional part, and // on integers discards it. Let us evaluate $E = 2 + 3 \times 4^2 - 10/2$ in code, confirm it matches the hand value of $45$ , and then meet the integer-division pitfall head-on. Run it:

numbers_and_dtypes.py

import numpy as np

rng = np.random.default_rng(0)  # seeded generator for reproducibility (not used in this script)

# 1) dtype is inferred from what you write.
i = np.array([2, 3, 4])          # shape (3,), dtype int64 on most platforms (no decimal points)
f = np.array([2.0, 3.0, 4.0])    # shape (3,), dtype float64 (decimal points)
print("int dtype  :", i.dtype)   # int64
print("float dtype:", f.dtype)   # float64

# 2) Evaluate E = 2 + 3 * 4**2 - 10/2 the way Python/NumPy parse it (PEMDAS).
E = 2 + 3 * 4**2 - 10 / 2
print("E =", E)                  # 45.0  (note: the /2 made it a float)
assert E == 45.0

# 3) The integer-vs-float division pitfall.
#    True division (/) always yields a float; floor division (//) rounds down to a whole number.
true_div  = 7 / 2                # 3.5
floor_div = 7 // 2               # 3   (integer floor, the classic accuracy bug)
print("7 / 2  =", true_div)      # 3.5
print("7 // 2 =", floor_div)     # 3

# A realistic trap: "accuracy" as correct // total silently rounds to 0.
correct, total = 3, 4
bad  = correct // total          # 0    -- wrong!
good = correct / total           # 0.75 -- right
assert bad == 0 and np.isclose(good, 0.75)

print("ok")

Two lessons are baked into that output. First, NumPy picks a dtype from the literals you write: 2 gives an integer dtype (int64 on most platforms), 2.0 gives float64. Second, / and // are different operators: true division promotes to float, while floor division rounds toward negative infinity, which for non-negative operands means dropping the fractional part. In pure Python 3 the plain expression 3/4 is 0.75, but the moment you write //, or feed two integer arrays to an operation that floors, you can silently compute 0 where you meant 0.75.
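The difference between flooring and truncating only shows up for negative operands, and the / versus // split applies to integer arrays just as it does to scalars. A short sketch, with illustrative array contents:

```python
import numpy as np

floor_pos, floor_neg = 7 // 2, -7 // 2            # floor: rounds toward negative infinity
trunc_pos, trunc_neg = int(7 / 2), int(-7 / 2)    # int(): truncates toward zero
print(floor_pos, floor_neg)    # 3 -4
print(trunc_pos, trunc_neg)    # 3 -3

correct = np.array([3, 1])     # integer counts
total = np.array([4, 4])       # integer counts
floor_acc = correct // total   # stays an integer dtype, fractions lost
true_acc = correct / total     # promoted to float64
print(floor_acc, floor_acc.dtype)
print(true_acc, true_acc.dtype)
```

For counts and totals, which are never negative, flooring and truncating agree, which is why the accuracy bug is usually described as "truncation".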

The mistake that costs an afternoon

Integers do not hold weights

The most common numeric bug in ML code is doing real-valued math in an integer dtype. Two shapes it takes:

Integer division truncation. correct // total or NumPy integer arrays divided with // throw away the fractional part, so an accuracy of $0.75$ reads as $0$ . Use / for true division, or cast with .astype(float) first.
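Integer weights that cannot move. If you initialize an array as np.zeros(n, dtype=int) and then write a fractional update back into it (for example w[:] = w - alpha * grad), NumPy casts the result to an integer on assignment, truncating toward zero. Starting from zeros, any update smaller than 1 in magnitude is truncated away, so the weights never actually change. The in-place form w -= alpha * grad does not truncate silently; NumPy refuses the float-to-int cast and raises an error.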

Rule of thumb: if a number measures (a weight, a loss, a rate), it is a float; if it counts or labels (an index, an epoch, a class id), it is an int. When in doubt, print the .dtype and check that x really is float64 before you train on it.
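The stuck-weights failure is easy to reproduce. The sketch below uses a made-up gradient and learning rate, and compares an integer weight array with a float one under the same update. It also shows that in-place arithmetic on the integer array is rejected rather than silently truncated.

```python
import numpy as np

grad = np.array([0.3, -0.2, 0.5])   # hypothetical gradient values
alpha = 0.1                         # hypothetical learning rate

w_int = np.zeros(3, dtype=int)
w_int[:] = w_int - alpha * grad     # float result is cast back to int on assignment

w_float = np.zeros(3)               # float64 by default
w_float[:] = w_float - alpha * grad

# In-place arithmetic on the int array is refused, not truncated.
try:
    w_int -= alpha * grad
    inplace_error = None
except TypeError as exc:            # NumPy's casting error is a TypeError subclass
    inplace_error = exc

print(w_int, w_int.dtype)           # still all zeros
print(w_float, w_float.dtype)       # small nonzero steps
print("in-place update raised:", inplace_error is not None)
```

Printing the dtype before training, as the rule of thumb suggests, would have flagged `w_int` immediately.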

Summary

ML runs on four nested number sets, $\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R}$ . Integers label and count; reals measure.
A symbol in a formula is a constant, a variable, or a coefficient; a subscript is an integer index (position), while the symbol itself carries a real value.
Expressions evaluate by fixed precedence (PEMDAS): parentheses, exponents, then $\times\,/$ , then $+\,-$ , with ties among $\times\,/$ or among $+\,-$ resolved left to right. $-3^2 = -9$ , not $9$ .
In a gradient update $w \leftarrow w - \alpha\,\partial L/\partial w$ , the weight, learning rate, and gradient are all reals (floats); the epoch, index, and label around them are integers.
In NumPy the dtype (int64 vs float64) is inferred from your literals and determines behavior. / is true division (float); // is floor division, which discards the fractional part of a non-negative quotient and is the source of the classic "accuracy comes out $0$ " bug.

Active recall

Answer from memory before checking the lesson:

Name the four number sets in order of inclusion, and say which two are used for indexing versus measuring in ML.
Evaluate $10 - 2 \times 3^2$ by hand. What is the value, and why is it not $72$ ?
In NumPy, what is the dtype of np.array([1, 2, 3])? What one-character change to the literals would make it float64?
What does 7 // 2 evaluate to, and how does it differ from 7 / 2? Why does this distinction matter when computing an accuracy?

Exercises

Level A: Recall & basic calculation

Level A · Hand calculation · ch01-A1

Evaluate with precedence

Evaluate the expression $10 - 2 \times 3^2$ by hand, applying operator precedence.

Level A · Hand calculation · ch01-A2

The unary minus trap

Evaluate $-3^2$ . (This is the classic precedence trap — the exponent binds tighter than the leading minus.)

Level A · Hand calculation · ch01-A3

Subtract two fractions

Compute $\dfrac{2}{3} - \dfrac{1}{6}$ and give the result as a decimal.

Level A · Equation interpretation · ch01-A4

Smallest set containing a number

What is the smallest of the sets $\mathbb{N}, \mathbb{Z}, \mathbb{Q}, \mathbb{R}$ that contains the number $-\dfrac{7}{2}$ ?

Level A · Shape reasoning · ch01-A5

Infer the NumPy dtype

In NumPy, what is the dtype of np.array([1, 2, 3])? Answer with the exact dtype name (e.g. int64 or float64).

Level A · Hand calculation · ch01-A6

Floor division

What does the floor-division expression 17 // 5 evaluate to in Python/NumPy?

Level B: Conceptual understanding

Level B · ML application · ch01-B1

Which quantity must be a float?

In a training loop, which of the following quantities should be stored as a float rather than an integer?

Level B · ML application · ch01-B2

Integers cannot hold weights

Explain, in one or two sentences, why writing a small fractional gradient update back into a weight array created with dtype=int (for example via w[:] = w - alpha * grad) leaves the weights unchanged.

Level B · Shape reasoning · ch01-B3

Default dtype of zeros

What is the dtype of the array produced by np.zeros(3) (with no dtype argument)?

Level B · Equation interpretation · ch01-B4

Reading a subscripted sum

In the expression $\displaystyle\sum_{i=1}^{n} w_i x_i$ , the index $i$ and the symbols $w_i, x_i$ come from different number worlds. Which set does the index $i$ range over, and which set do the values $w_i, x_i$ belong to?

Level C: Derivation & implementation

Level C · NumPy implementation · ch01-C1

Fix the accuracy bug

A metric function reports $0$ accuracy when it should report $0.75$ . The buggy line is acc = correct // total with correct = 3, total = 4. Write a corrected accuracy(correct, total) that returns a float, verify it gives 0.75 for these inputs, and print ok.

Level C · NumPy implementation · ch01-C2

Hand-evaluate, then check in NumPy

By hand, evaluate $E = 2 + 3 \times 4^2 - \dfrac{10}{2}$ . Then confirm the value in NumPy, print the dtype you observe for the result, and print ok.

Level C · Numerical experiment · ch01-C3

When 0.1 + 0.2 is not 0.3

Reals in $\mathbb{R}$ are exact, but float64 is not. Write NumPy code showing that 0.1 + 0.2 == 0.3 is False, then show that np.isclose(0.1 + 0.2, 0.3) is True, and print ok. In a comment, explain in one line why exact equality fails.

Level D: Research-thinking challenge

Level D · Paper-reading practice · ch01-D1

Why dtype choice is a modeling decision

Modern deep learning increasingly trains in float16 / bfloat16 (16-bit) rather than float64. Give one concrete benefit of the lower-precision float, one concrete risk it introduces, and explain why an integer dtype is nonetheless still the right choice for token IDs and class labels. Tie each point back to the $\mathbb{Z}$ -labels-versus- $\mathbb{R}$ -measures distinction.