Part 1 · Algebra and Mathematical Notation · Chapter 1 · 55 min

Numbers, Variables, and Expressions

The alphabet of mathematics, from ℕ to ℝ

Prerequisites

None — a good place to start.

Learning objectives

  • Name the number systems (ℕ, ℤ, ℚ, ℝ) and where each appears in ML
  • Read variables, constants, coefficients, and subscripted/indexed symbols
  • Evaluate expressions using correct operator precedence
  • Manipulate fractions, signs, and powers without slips

Why numbers deserve a first chapter

You have spent years treating numbers as a solved problem. int, float, 0.1 + 0.2, done. So it is fair to ask why a machine-learning course opens with number systems and expressions.

The answer is that ML lives on a specific stack of number sets, and the whole field is built by moving between them deliberately. A dataset is a grid of real numbers. The row you are looking at is picked out by an integer index. A learning rate is a rational you chose by hand. The weights the model learns are reals that only ever exist as floats in memory — with all the rounding that implies. When a training run silently produces nan, or an accuracy metric comes out as 0 when you expected 0.75, the cause is almost always a confusion between two of these worlds: an integer where a float was meant, or a division that truncated.

So our first object of study is not a vector or a matrix but the humble number, seen on two levels at once:

  • Mathematically, as an element of a set ($\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, or $\mathbb{R}$) with a role in a formula.
  • Computationally, as a value of a concrete dtype (int64 or float64) that occupies fixed bits and obeys machine arithmetic.

Fluency means never confusing the two. That is the goal of this chapter.
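A minimal sketch of the two levels side by side, assuming NumPy is installed. The names `index`, `weight`, and `total` are illustrative only.

```python
import numpy as np

index = np.int64(7)        # an integer position, stored in 64 bits
weight = np.float64(0.1)   # a real value, stored as the nearest float64

print(index.dtype, index.itemsize * 8)    # dtype name and number of bits
print(weight.dtype, weight.itemsize * 8)

# On paper 0.1 + 0.2 = 0.3 exactly; in float64 each term is rounded first.
total = np.float64(0.1) + np.float64(0.2)
print(total == 0.3)             # False
print(np.isclose(total, 0.3))   # True: compare floats with a tolerance
```

The mathematical statement and the machine result disagree only in the last bits, which is why float comparisons usually go through a tolerance check rather than `==`.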

Intuition: four nested worlds of number

Start with counting: $1, 2, 3, \ldots$, the natural numbers $\mathbb{N}$. They answer "how many?" and index things: example number $7$, dimension number $3$. There is no such thing as example number $-2$ or example number $1.5$.

Allow subtraction to go past zero and you get the integers $\mathbb{Z}$: $\ldots, -2, -1, 0, 1, 2, \ldots$. Now differences and offsets have a home.

Allow division and you get the rationals $\mathbb{Q}$: every ratio $p/q$ of integers with $q \neq 0$, like $\tfrac{3}{4}$ or $-\tfrac{7}{2}$. A learning rate of $0.01$ is really the rational $\tfrac{1}{100}$.

Finally, fill in the gaps between the rationals (numbers like $\sqrt{2}$ and $\pi$ that no ratio of integers can express) and you get the real numbers $\mathbb{R}$, the continuous number line. Each set sits inside the next:

$$\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R}$$

The picture to keep: integers label, reals measure. Indices, counts, and class IDs come from $\mathbb{Z}$; features, weights, and losses come from $\mathbb{R}$. Almost every dtype decision in ML is a decision about which of these two worlds a number belongs to.
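The four worlds can be sketched with Python's standard library, using `fractions.Fraction` as a stand-in for exact rationals. The variable names are illustrative, and the float `r` is only an approximation of a real number.

```python
from fractions import Fraction
import math

n = 7                  # natural number: a count or an index
z = -2                 # integer: an offset may be negative
q = Fraction(-7, 2)    # rational: an exact ratio of integers
r = math.sqrt(2)       # real: held only as a float approximation

# Each set sits inside the next: an integer is a rational with denominator 1.
print(Fraction(n).denominator)                 # 1

# The learning rate 0.01 is the rational 1/100 ...
print(Fraction("0.01") == Fraction(1, 100))    # True
# ... but the float 0.01 is only the nearest binary fraction to it.
print(Fraction(0.01) == Fraction(1, 100))      # False

# sqrt(2) is irrational, so squaring its float approximation misses 2 slightly.
print(r * r == 2.0)                            # False
```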

Formal definitions

A quantity in a formula is either a constant (a fixed known value, like $\pi$ or the number $2$), a variable (a symbol standing for a value that can change, like $x$), or a coefficient (a constant that multiplies a variable, like the $3$ in $3x$). In the linear expression $y = 3x + 2$, the symbol $x$ is the variable, $3$ is a coefficient, and $2$ is a constant term.
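The three roles in $y = 3x + 2$ map directly onto code. In this small sketch the names `COEFFICIENT`, `CONSTANT`, and `y` are chosen for illustration.

```python
COEFFICIENT = 3   # fixed value that multiplies the variable
CONSTANT = 2      # fixed constant term


def y(x):
    """Evaluate y = 3x + 2 for a given value of the variable x."""
    return COEFFICIENT * x + CONSTANT


print(y(0))    # 2
print(y(1))    # 5
print(y(-2))   # -4
```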

Subscripts and indices

ML notation is dense with subscripted symbols, and the subscript is almost always an integer index. Read $x_i$ as "the $i$-th component of the vector $\mathbf{x}$"; read $x^{(j)}$ as "the $j$-th training example." A doubly-indexed symbol $X_{ij}$ is the entry in row $i$, column $j$ of a matrix. The symbol carries a real value; the index carries an integer position. Keeping those apart, value versus position, is the single habit that makes summation notation readable:

$$\sum_{i=1}^{n} w_i x_i = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$

Here $i$ ranges over $\mathbb{N}$ (the positions $1$ through $n$), while each $w_i$ and $x_i$ is a real number (a value).
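A sum over indexed products can be sketched as a loop in which the index is an integer and the terms are floats. The arrays `w` and `x` below hold made-up values, and NumPy is assumed to be available.

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])   # values w_1..w_n (float64)
x = np.array([4.0, 3.0, 1.0])    # values x_1..x_n (float64)
n = len(w)                       # number of terms: an integer

total = 0.0
# The formula indexes from 1 to n; Python indexes from 0 to n - 1.
for i in range(n):
    total += w[i] * x[i]

print(total)          # 1.0
print(np.dot(w, x))   # 1.0, the same sum computed by NumPy
```

Note the off-by-one convention: position $i$ in the formula corresponds to `w[i - 1]` in the array.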

Operator precedence

An expression is a combination of numbers, variables, and operators. To evaluate it to a single value you apply operators in a fixed order of precedence, highest first:

  1. Parentheses $(\;)$
  2. Exponents $x^2$
  3. Multiplication and division (left to right)
  4. Addition and subtraction (left to right)

This is the same PEMDAS rule you know from code, and it is worth stating precisely because a misread precedence is a silent bug: $-3^2 = -(3^2) = -9$, not $(-3)^2 = 9$, because exponentiation binds tighter than the leading minus.
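Python follows the same ordering, so the rule can be checked directly. A short sketch, with variable names chosen for illustration:

```python
neg_square = -3 ** 2       # exponent binds first: -(3 ** 2)
square_neg = (-3) ** 2     # parentheses force the negation first
mul_first = 2 + 3 * 4      # multiplication before addition
add_first = (2 + 3) * 4    # parentheses override
left_to_right = 8 / 4 / 2  # (8 / 4) / 2

print(neg_square)      # -9
print(square_neg)      # 9
print(mul_first)       # 14
print(add_first)       # 20
print(left_to_right)   # 1.0
```

One Python-specific caveat: chained exponents group right to left, so `2 ** 3 ** 2` is `2 ** 9`, not `8 ** 2`.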

A small expression, evaluated by hand
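As an illustrative sketch, take the expression $2 + 3 \times (4 - 1)^2 / 9$ (chosen here purely as an example) and apply the four precedence levels in order, one operation per line:

```latex
\begin{align*}
2 + 3 \times (4 - 1)^2 / 9
  &= 2 + 3 \times 3^2 / 9 && \text{parentheses} \\
  &= 2 + 3 \times 9 / 9   && \text{exponent} \\
  &= 2 + 27 / 9           && \text{multiplication (leftmost of } \times, / \text{)} \\
  &= 2 + 3                && \text{division} \\
  &= 5                    && \text{addition}
\end{align*}
```

Each line performs exactly one operation, which makes a slip easy to locate if the final value disagrees with a computed one.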

ML use case: which set does each number come from?

Take one line of a training loop, a gradient-descent update on a weight:

$$w \leftarrow w - \alpha \, \frac{\partial L}{\partial w}$$

Every symbol here has both a set and a dtype:

  • $w \in \mathbb{R}$ is a weight: a measured, learned quantity. It must be a float; the whole point of learning is to nudge it by tiny fractional amounts.
  • $\alpha \in \mathbb{R}$ is the learning rate, a small positive real like $0.01$ that you fix by hand. Also a float.
  • $\partial L / \partial w \in \mathbb{R}$ is the gradient, a real number the computation produces. Float again.

Now contrast that with the machinery around the update: the epoch counter, the batch index, the row you slice out of the data matrix, the class label you compare against. Those are integers, elements of $\mathbb{Z}$ (or $\mathbb{N}$), because they count and label; they do not measure. A feature vector $\mathbf{x} \in \mathbb{R}^n$ holds $n$ real measurements, but the index $i$ that reaches into it, $x_i$, is a natural number. Integers index; floats hold the weights. Get this mapping right and dtype bugs mostly disappear before they start.
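The split between integer bookkeeping and float quantities shows up even in a toy training loop. The sketch below assumes a hypothetical one-parameter loss $L(w) = (w - 3)^2$, picked only because its gradient $2(w - 3)$ is easy to write down; the step size and epoch count are likewise arbitrary.

```python
import numpy as np


def grad(w):
    """Gradient of the toy loss L(w) = (w - 3)^2 (illustrative only)."""
    return 2.0 * (w - 3.0)


w = np.float64(0.0)       # weight: a real, stored as float64
alpha = np.float64(0.1)   # learning rate: a small positive real
num_epochs = 50           # epoch count: an integer

for epoch in range(num_epochs):   # epoch is an int that counts steps
    w = w - alpha * grad(w)       # the update w <- w - alpha * dL/dw

print(type(epoch).__name__)   # int
print(w.dtype)                # float64
print(abs(w - 3.0) < 1e-3)    # True: w has moved close to the minimum at 3
```

If `w` were stored in an integer array instead, each fractional step would be lost when the result is written back, and the weight would not move as intended.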

NumPy implementation

NumPy makes the set-versus-dtype distinction concrete. Every array has a dtype, and int64 and float64 behave differently in ways that matter. Let us evaluate the expression from above by hand and in NumPy, confirm they agree, and then meet the integer-division pitfall head-on. Run it:

numbers_and_dtypes.py
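A minimal sketch of such a script, assuming a 64-bit platform where NumPy's default integer dtype is int64. The array names and the sample expression $2 + 3 \times (4 - 1)^2 / 9$ are illustrative choices.

```python
import numpy as np

# dtype is inferred from the literals.
a = np.array([2, 3, 4])          # integer literals -> an integer dtype
b = np.array([2.0, 3.0, 4.0])    # float literals   -> float64
print(a.dtype, b.dtype)

# By hand: 2 + 3 * (4 - 1)**2 / 9 = 2 + 3 * 9 / 9 = 2 + 27 / 9 = 2 + 3 = 5
by_hand = 5.0
two, three, four, one, nine = (np.int64(v) for v in (2, 3, 4, 1, 9))
in_numpy = two + three * (four - one) ** 2 / nine
print(in_numpy, in_numpy == by_hand)   # true division makes the result a float

# The integer-division pitfall: / versus // on integer arrays.
correct = np.array([3])
total = np.array([4])
print(correct / total)    # [0.75]  true division
print(correct // total)   # [0]     floor division

# Floor division rounds down, which matters for negative numbers.
print(-7 // 2, -7 / 2)    # -4 -3.5
```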

Two lessons are baked into that output. First, NumPy picks a dtype from the literals you write: 2 gives int64 (the default integer on most 64-bit platforms), 2.0 gives float64. Second, / and // are different operators: true division promotes to float, while floor division rounds down toward negative infinity. In pure Python 3 the plain expression 3/4 is 0.75, but the moment you write //, or feed two integer arrays to an operation that floors, you can silently compute 0 where you meant 0.75.

The mistake that costs an afternoon
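One plausible version of this mistake, sketched with made-up labels: an accuracy computed from two integer counts with `//` instead of `/`. The arrays `y_true` and `y_pred` are hypothetical data.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1])   # integer class labels
y_pred = np.array([1, 0, 1, 0])   # integer predictions, 3 of 4 correct

num_correct = np.sum(y_true == y_pred)   # integer count: 3
num_total = y_true.size                  # integer count: 4

buggy = num_correct // num_total   # floor division on two integers
fixed = num_correct / num_total    # true division promotes to float

print(buggy)   # 0
print(fixed)   # 0.75
print(np.mean(y_true == y_pred))   # 0.75, without dividing by hand
```

Nothing fails and no warning appears; the metric is simply 0 on every run short of a perfect score, which is what makes the bug slow to find.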

Summary

  • ML runs on four nested number sets, $\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R}$. Integers label and count; reals measure.
  • A symbol in a formula is a constant, a variable, or a coefficient; a subscript is an integer index (position), while the symbol itself carries a real value.
  • Expressions evaluate by fixed precedence (PEMDAS): parentheses, exponents, then $\times$ and $/$, then $+$ and $-$, each left to right. $-3^2 = -9$, not $9$.
  • In a gradient update $w \leftarrow w - \alpha\,\partial L/\partial w$, the weight, learning rate, and gradient are all reals (floats); the epoch, index, and label around them are integers.
  • In NumPy the dtype (int64 vs float64) is inferred from your literals and determines behavior. / is true division (float); // is floor division, which rounds down and is the source of the classic "accuracy comes out $0$" bug.

Active recall

Answer from memory before checking the lesson:

  1. Name the four number sets in order of inclusion, and say which two are used for indexing versus measuring in ML.
  2. Evaluate $10 - 2 \times 3^2$ by hand. What is the value, and why is it not $72$?
  3. In NumPy, what is the dtype of np.array([1, 2, 3])? What one-character change to the literals would make it float64?
  4. What does 7 // 2 evaluate to, and how does it differ from 7 / 2? Why does this distinction matter when computing an accuracy?

Exercises

Level A · Recall & basic calculation

Level B · Conceptual understanding

Level C · Derivation & implementation

Level D · Research-thinking challenge