Functions
Mappings, domain, range, composition, and inverses
Learning objectives
- State the formal definition of a function and read function notation
- Determine domain and range, and interpret graphs
- Compose functions and reason about invertibility
- See a model f(x; θ) and a loss L(θ) as functions
Why functions are the whole game
Strip away the jargon and a machine-learning model is one thing: a function. It takes an input — an image, a sentence, a row of features — and returns an output — a label, a probability, a next word. Training is the search for a good function inside a huge family of candidates. Even the training objective, the loss, is itself a function: it takes the candidate's parameters and returns a single number saying how badly it does.
So before we can talk about learning anything, we need to be fluent with functions — not the vague "plug in a number" version from school, but the precise object mathematicians use: a rule that assigns to every input exactly one output. That precision is what lets us compose functions into deep networks, invert them when we need to undo a transform, and reason about which inputs are even allowed. This chapter builds that fluency, seen three ways at once:
- Formally, as a mapping from a domain to a codomain.
- Graphically, as a curve you can read off an axis.
- Computationally, as a Python function you can evaluate on a grid.
Intuition: a function is a reliable machine
Picture a machine with an input slot and an output slot. Feed it a value $x$ and it hands back a value $f(x)$. The one rule that makes it a function rather than just a "process" is determinism: the same input always yields the same output. Put in $x$ and get $f(x)$; put in the same $x$ again and you must get the same $f(x)$ again, forever. A machine that sometimes returns one value and sometimes another for the same input is not a function.
That single rule has teeth. It forbids one input mapping to two outputs, which is exactly the "vertical line test" you may remember: no vertical line may cross the graph twice. It says nothing, though, about two inputs sharing one output — that is allowed, and whether it happens is precisely the question of invertibility we return to later.
The graph is just a record of every (input, output) pair the machine can produce: the height of the curve at horizontal position $x$ is the value $f(x)$.
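The determinism rule can be checked mechanically on a finite table of (input, output) pairs. The sketch below is illustrative only: `is_function` and the two sample relations are hypothetical names and made-up data, not part of the lesson.

```python
def is_function(pairs):
    """Return True if no input appears with two different outputs."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False  # same input, two outputs: fails the vertical line test
        seen[x] = y
    return True


# Hypothetical sample data for illustration.
square_pairs = [(-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4)]  # two inputs share an output: allowed
circle_pairs = [(0, 1), (0, -1), (1, 0)]                   # input 0 has two outputs: not allowed

print(is_function(square_pairs))  # True
print(is_function(circle_pairs))  # False
```

Note that the first table passes even though two inputs share an output; only one input with two outputs breaks the rule.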
Formal definitions
A function $f: X \to Y$ is a rule that assigns to each element of $X$ exactly one element of $Y$. Two clauses do all the work. "Each element of $X$" means the function must be defined on every input in the domain, with no gaps. "Exactly one element of $Y$" means it must be single-valued: no input produces two outputs. Together they are the determinism from the intuition, stated set-theoretically.
| Symbol | Meaning | Type | Shape | Role |
|---|---|---|---|---|
| $f: X \to Y$ | A function from domain $X$ to codomain $Y$ | mapping | - | definition |
| $x$ | An input (element of the domain) | scalar | 1 | variable |
| $f(x)$ | The output at $x$ (the value) | scalar | 1 | variable |
| $X$ | Domain: set of allowed inputs | set | - | fixed |
| $Y$ | Codomain: set of possible outputs | set | - | fixed |
| $\operatorname{range}(f)$ | Range: set of outputs actually attained | set | - | derived |
The domain is not decoration; it is part of the function's identity. The rule $x \mapsto 1/x$ on $\mathbb{R}$ is not a function (it is undefined at $x = 0$); the same rule on $\mathbb{R} \setminus \{0\}$ is one. Changing the domain changes the object.
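One way to make the domain part of a function's identity in code is to check it explicitly. This is a minimal sketch using the reciprocal rule 1/x as the example; the name `reciprocal` and the choice to raise `ValueError` are illustrative assumptions, not something the lesson prescribes.

```python
def reciprocal(x):
    """The rule 1/x on the domain of nonzero real numbers."""
    if x == 0:
        # 0 is not an allowed input, so refuse it instead of returning something.
        raise ValueError("0 is outside the domain of 1/x")
    return 1.0 / x


print(reciprocal(4))  # 0.25
try:
    reciprocal(0)
except ValueError as err:
    print("rejected:", err)
```

Stating the domain in the function body means an out-of-domain input fails loudly rather than producing a meaningless value.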
Composition
Feeding one machine's output into another's input composes them.
The right-to-left reading trips everyone up once. In $(g \circ f)(x) = g(f(x))$ the function written last, $f$, runs first, because it is the one sitting next to the input $x$. Order matters: in general $g \circ f \neq f \circ g$.
Composition is the reason a deep network is "deep": a two-layer network is $f_2 \circ f_1$, an $n$-layer network is $f_n \circ \cdots \circ f_2 \circ f_1$, and the chain rule we meet later is exactly the rule for differentiating such a stack.
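The right-to-left convention can be made concrete with a small helper that stacks any number of functions. This is a sketch under assumptions: `compose` and the three toy "layers" are hypothetical names chosen for illustration, and real network layers would act on arrays rather than single numbers.

```python
from functools import reduce


def compose(*funcs):
    """Compose right-to-left: compose(h, g, f)(x) == h(g(f(x)))."""
    def composed(x):
        # The last function listed runs first, so walk the list in reverse.
        return reduce(lambda value, fn: fn(value), reversed(funcs), x)
    return composed


# Hypothetical toy layers chosen only for illustration.
def double(x):
    return 2 * x


def add_one(x):
    return x + 1


def square(x):
    return x ** 2


stack = compose(square, add_one, double)  # square(add_one(double(x)))
print(stack(3))                           # (2*3 + 1)**2 = 49
print(compose(double, add_one)(3))        # 2*(3 + 1) = 8
print(compose(add_one, double)(3))        # 2*3 + 1 = 7
```

Swapping the order of `double` and `add_one` changes the answer from 8 to 7, which is the order-dependence the text describes.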
Inverse functions
An inverse $f^{-1}$ undoes $f$: $f^{-1}(f(x)) = x$ for every $x$ in the domain. One-to-one is the crux. If two different inputs $x_1 \neq x_2$ share an output $y$, then $f^{-1}(y)$ cannot decide between them, so no inverse can exist. Graphically this is the horizontal line test: $f$ is invertible iff no horizontal line meets the graph more than once. Note that $f^{-1}$ means the inverse function, not the reciprocal $1/f$, a genuinely unfortunate collision of notation.
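A rough numerical version of the horizontal line test is to sample distinct inputs and look for repeated outputs. The helper below is a hypothetical sketch, not a proof technique: a repeated output shows the function is not one-to-one, but passing on a finite sample does not guarantee it is. The example functions are assumptions chosen for illustration.

```python
import numpy as np


def is_one_to_one_on(func, xs, decimals=9):
    """Check that no two sampled inputs share an output (finite-sample test only)."""
    ys = np.round(func(np.asarray(xs, dtype=float)), decimals)
    return len(np.unique(ys)) == len(ys)


xs = np.linspace(-3, 3, 7)  # the distinct inputs -3, -2, ..., 3

print(is_one_to_one_on(lambda x: 3 * x - 2, xs))        # True: a line with nonzero slope
print(is_one_to_one_on(lambda x: x ** 2, xs))           # False: -3 and 3 both map to 9
print(is_one_to_one_on(lambda x: x ** 2, xs[xs >= 0]))  # True once the domain is restricted
```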
A numerical example
Let and , both on domain . Evaluate each composition at .
Worked composition: the two orders as formulas
Rather than plug in one point, compose symbolically to see why the orders differ everywhere, not just at .
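We can also invert $f$. Write the line as $y = ax + b$ with slope $a \neq 0$ and solve for $x$: subtract $b$, divide by $a$, giving $f^{-1}(y) = (y - b)/a$. Check: $f^{-1}(f(x)) = ((ax + b) - b)/a = x$. Because $f$ is a line with nonzero slope it is one-to-one, so the inverse exists. By contrast $g(x) = x^2$ on all of $\mathbb{R}$ is not invertible: $g(-x) = g(x)$, so $g^{-1}$ is ambiguous at every positive output. Restrict the domain to $[0, \infty)$ and it becomes one-to-one, with inverse $g^{-1}(y) = \sqrt{y}$; the domain restriction is what makes the square root a function at all.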
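The "subtract, then divide" recipe and the restricted square root can both be checked numerically with a round trip. In this sketch the line $2x + 1$ is a hypothetical choice; its coefficients are assumptions and need not match the lesson's own example.

```python
import numpy as np


# Hypothetical line; the coefficients are assumptions, not the lesson's values.
def f(x):
    return 2 * x + 1


def f_inv(y):
    # Undo f: subtract the intercept, then divide by the slope.
    return (y - 1) / 2


xs = np.linspace(-3, 3, 7)
print(np.allclose(f_inv(f(xs)), xs))  # True: f_inv undoes f
print(np.allclose(f(f_inv(xs)), xs))  # True: f undoes f_inv

# Squaring is only invertible on a restricted domain.
nonneg = np.linspace(0, 3, 7)
print(np.allclose(np.sqrt(nonneg ** 2), nonneg))  # True on [0, inf)
print(np.sqrt((-2.0) ** 2))                       # 2.0, not -2.0
```

The last line shows the ambiguity directly: squaring -2 and taking the square root returns 2, so the original input cannot be recovered without the domain restriction.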
ML use case: a model and a loss are just functions
Two functions sit at the heart of every supervised learner, and keeping their inputs straight is half the battle.
The model is a function of the input, with the parameters held fixed:

$$x \mapsto f(x; \theta)$$

Here $x$ is the data point and $\theta$ (the weights and biases) is a knob-setting you carry along. The semicolon is doing real work: it separates the input $x$ from the parameters $\theta$. At prediction time $\theta$ is frozen and $x$ varies, so the model is a function of $x$.
The loss flips which argument varies. It measures how wrong the predictions are over a fixed dataset, as a function of the parameters:

$$L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \ell\big(f(x_i; \theta),\, y_i\big)$$

Here $(x_i, y_i)$ are the $N$ training examples and $\ell$ is the per-example loss. Now the data is frozen and $\theta$ varies, so the loss is a function of $\theta$. Training means finding the $\theta$ that minimizes $L(\theta)$. This input-swap is the single most important reframing in the course: the same expression $f(x; \theta)$ is read as a function of $x$ when predicting and as a function of $\theta$ when learning.
And $L$ is a composition: apply the model $f$, then the per-example loss $\ell$, then average. That layered structure is exactly what the chain rule will let us differentiate, so that we can compute $\nabla_\theta L(\theta)$ and descend. Every idea in this chapter (mapping, domain, composition, invertibility) resurfaces the moment we start training.
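The input-swap is easy to see in code. The sketch below assumes a one-feature linear model and squared error as the per-example loss; both choices, along with the names `model`, `per_example_loss`, `make_loss` and the tiny dataset, are hypothetical and stand in for whatever model and loss a real task would use.

```python
import numpy as np


def model(x, theta):
    """Hypothetical linear model f(x; theta) = theta[0] + theta[1] * x."""
    return theta[0] + theta[1] * x


def per_example_loss(prediction, target):
    """Squared error, one possible choice of per-example loss."""
    return (prediction - target) ** 2


def make_loss(xs, ys):
    """Freeze the dataset and return the loss as a function of theta alone."""
    def loss(theta):
        # Composition: model, then per-example loss, then average.
        return float(np.mean(per_example_loss(model(xs, theta), ys)))
    return loss


# Made-up data lying exactly on y = 1 + 2x.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.0, 3.0, 5.0, 7.0])
loss = make_loss(xs, ys)

theta = np.array([1.0, 2.0])
print(model(1.5, theta))           # prediction time: theta fixed, x varies -> 4.0
print(loss(np.array([1.0, 2.0])))  # training time: data fixed, theta varies -> 0.0
print(loss(np.array([0.0, 2.0])))  # every prediction is off by 1 -> 1.0
```

Wrapping the dataset in a closure makes the signature `loss(theta)` match the notation: the data is baked in, and only the parameters remain as an argument.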
NumPy implementation
Let us make composition concrete. We implement $f$ and $g$ as ordinary Python functions, evaluate both compositions on a grid built with np.linspace, and confirm numerically that $g \circ f \neq f \circ g$ in general.
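A minimal sketch of that grid experiment follows. The definitions of `f` and `g` here are assumptions, chosen to use only the array-friendly operations `*`, `+` and `**`; the lesson's own functions may differ.

```python
import numpy as np


# Hypothetical definitions; the lesson's own f and g may differ.
def f(x):
    return 2 * x + 1


def g(x):
    return x ** 2


xs = np.linspace(-3, 3, 7)  # grid: -3, -2, ..., 3

g_of_f = g(f(xs))  # g after f: (2x + 1)**2
f_of_g = f(g(xs))  # f after g: 2*x**2 + 1

print(g_of_f)
print(f_of_g)
print(np.allclose(g_of_f, f_of_g))  # False: the two orders disagree on the grid
```

With these particular choices the two compositions happen to agree at $x = -2$ and $x = 0$ and differ at every other grid point, which is why the comparison is made over the whole grid rather than at a single input.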
The grid pattern — build inputs with np.linspace, push them through a function,
read off the outputs — is how we will visualize every function from here on,
including loss curves. Because f and g are written with array-friendly
operations (*, +, **), they evaluate on the whole grid at once with no loop:
the same vectorized thinking from the previous chapter.
Summary
- A function $f: X \to Y$ assigns to each input in the domain $X$ exactly one output; the range is the set of outputs actually produced, a subset of the codomain $Y$.
- The domain is part of the function's identity: $1/x$ is undefined at $x = 0$, and restricting a domain can turn a non-function or a non-invertible map into a valid, invertible one.
- Composition chains functions and is read right-to-left; in general $g \circ f \neq f \circ g$, so order changes the result.
- A function is invertible exactly when it is one-to-one; then $f^{-1}$ undoes it. $x^2$ on $\mathbb{R}$ fails (it maps $-a$ and $a$ to the same value $a^2$) until the domain is restricted.
- In ML, the model $f(x; \theta)$ is a function of the input $x$; the loss $L(\theta)$ is a function of the parameters $\theta$. The loss is a composition, which previews the chain rule and layered networks.
Active recall
Answer from memory before checking the lesson:
- State the two conditions a rule must satisfy to be a function $f: X \to Y$.
- What is the difference between the codomain and the range?
- Evaluate and for and . Are they equal?
- Why is $x^2$ on all of $\mathbb{R}$ not invertible, and how do you fix it?
- In $f(x; \theta)$ versus $L(\theta)$, which argument varies at prediction time and which varies at training time?
Exercises
Level A: Recall & basic calculation
Evaluate a function
Let . Compute .
Compose at a point
Let and . Compute .
Order matters
For and , compute . (Compare with A2, where .)
Find an inverse value
The function has inverse . Compute .
Domain of a reciprocal
For over the real numbers, which single value of must be excluded from the domain?
Range of a square
For with domain all real numbers, what is the smallest value in the range?
Level B: Conceptual understanding
Is it a function?
A relation is graphed in the plane. Which single test decides whether it defines $y$ as a function of $x$?
Codomain versus range
Explain in one or two sentences the difference between the codomain and the range of a function , using with as an example.
Which function is invertible?
Each function below has domain all of $\mathbb{R}$. Which one is invertible (one-to-one)?
Inputs of a model and a loss
A model is written $f(x; \theta)$ and its loss $L(\theta)$. In one or two sentences, say which quantity is held fixed and which varies (a) at prediction time and (b) at training time, and why $L$ is written as a function of $\theta$ alone.
Level C: Derivation & implementation
Compose two functions symbolically
Let and . Derive closed-form expressions for and , and confirm they are different functions.
Derive an inverse
The function is one-to-one on . Derive its inverse and verify .
Composition on a grid in NumPy
Implement and as Python functions. Using np.linspace(-3, 3, 7), evaluate and on the grid, confirm with np.allclose that they are not equal everywhere, and print ok.
Level D: Research-thinking challenge
Why deep networks are compositions
A network layer is a function $f_k$, and an $n$-layer network is the composition $f_n \circ \cdots \circ f_2 \circ f_1$. (a) Explain why stacking only linear layers gains no expressive power over a single linear layer. (b) Explain what a nonlinear activation inserted between layers changes. (c) Connect the composition structure to why the chain rule is central to training.