Limits and Continuity
What happens as we get arbitrarily close
Prerequisites
Learning objectives
- State an operational definition of a limit
- Evaluate limits, including indeterminate 0/0 forms
- Define continuity and spot discontinuities
- See why differentiability needs a well-behaved limit
Why limits sit under every gradient
Everything you will do to train a model comes down to one question: if I nudge a parameter a tiny bit, how does the loss respond? That "response to a tiny nudge" is a derivative, and a derivative is defined as a limit — the value a ratio approaches as the nudge shrinks toward zero. Backpropagation is nothing but the chain rule applied to millions of these limits. So before we can differentiate anything (next chapter) or descend any gradient (the chapter after), we need to be fluent in the one idea underneath them all: what it means for a function to approach a value.
Limits also explain a class of bugs you will actually hit. Why does 0/0 show up
in a correctly-derived formula and still give the right answer once simplified? Why
does a finite-difference gradient check get worse when you make the step too
small? Why is ReLU — a function every network uses — differentiable everywhere
except one point, and what do frameworks do at that point? All three are limit
questions, and this chapter answers them.
Intuition: approaching, not arriving
Here is the whole idea in one sentence: a limit describes where a function is headed, not where it lands. Those can differ, and keeping them separate is the entire game.
Take $f(x) = \frac{x^2 - 1}{x - 1}$. At exactly $x = 1$ this is $0/0$: undefined, a hole in the graph. The function has no value there. And yet, as $x$ creeps toward $1$ from either side, the outputs march steadily toward $2$. Plug in $x = 0.9$ and you get $1.9$; plug in $x = 0.99$ and you get $1.99$; plug in $x = 0.999$ and you get $1.999$. The function never reaches $2$, but it is unmistakably aiming at $2$. That destination, $2$, is the limit.
This is why "just plug in the point" is not what a limit means. Sometimes plugging in works (for nice functions it does), but the definition is about the approach. The value at the point is a separate question, and continuity, later in this chapter, is exactly the statement that the two happen to agree.
Formal definitions
Because $x$ may approach $a$ from two directions, we split the idea into one-sided limits. The left-hand limit uses only $x < a$ and the right-hand limit uses only $x > a$:

$$\lim_{x \to a^-} f(x) \qquad \text{and} \qquad \lim_{x \to a^+} f(x)$$

The two-sided limit $\lim_{x \to a} f(x)$ exists only when both one-sided limits exist and are equal. If the function approaches one value from the left and a different value from the right, there is no single destination, and $\lim_{x \to a} f(x)$ does not exist. This is the exact mechanism behind a "jump" in a step function.
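A function $f$ is continuous at $a$ when three conditions hold: $f(a)$ exists, $\lim_{x \to a} f(x)$ exists, and the two are equal. The three conditions name the three ways continuity can fail, i.e. the three kinds of discontinuity: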
| Symbol | Meaning | Type | Shape | Role |
|---|---|---|---|---|
| Removable | Limit exists but ≠ f(a), or f(a) undefined (a hole) | discontinuity | n/a | fixable |
| Jump | Left and right limits exist but disagree (a step) | discontinuity | n/a | structural |
| Infinite | f(x) blows up to ±∞ near a (a vertical asymptote) | discontinuity | n/a | structural |
| $L$ | The value f(x) approaches as x → a | scalar | 1 | target |
| $f(a)$ | The actual value at a (may differ from L, or not exist) | scalar | 1 | value |
A removable discontinuity is the hole we just saw: patch a single point and the function becomes continuous. A jump cannot be patched — the one-sided limits genuinely disagree. An infinite discontinuity means the limit is not a finite number at all.
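The three categories can be told apart numerically, at least roughly. The sketch below is a heuristic illustration, not a rigorous test: `classify_at`, its step `h`, and the thresholds `tol` and `big` are all hypothetical choices made for this example, and the four sample functions are assumptions picked to trigger each branch.

```python
import math

def classify_at(f, a, h=1e-6, tol=1e-3, big=1e6):
    """Heuristically label the behaviour of f at a from samples at a-h and a+h."""
    # Samples just left and right of a stand in for the one-sided limits.
    left, right = f(a - h), f(a + h)
    if abs(left) > big or abs(right) > big:
        return "infinite"            # values blow up near a
    if abs(left - right) > tol:
        return "jump"                # one-sided estimates disagree
    try:
        value = f(a)
    except ZeroDivisionError:
        return "removable"           # limit exists but f(a) is undefined
    if math.isnan(value) or abs(value - left) > tol:
        return "removable"           # limit exists but differs from f(a)
    return "continuous"

# Illustrative functions (assumptions, chosen to hit each case).
hole = lambda x: (x * x - 1) / (x - 1)   # 0/0 at x = 1
step = lambda x: 0.0 if x < 0 else 1.0   # jumps at x = 0
pole = lambda x: 1 / (x * x)             # vertical asymptote at x = 0
smooth = lambda x: x * x                 # continuous everywhere

labels = [
    classify_at(hole, 1.0),
    classify_at(step, 0.0),
    classify_at(pole, 0.0),
    classify_at(smooth, 2.0),
]
print(labels)
```

A single fixed `h` can be fooled (for example by a very steep but continuous function), so this is best read as a picture of the definitions rather than a reliable detector.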
Evaluating a 0/0 limit by factoring
When direct substitution gives a definite number, that number is the limit: polynomials and other "nice" functions cooperate. The interesting case is the indeterminate form $0/0$: both numerator and denominator vanish at $x = a$, so substitution tells you nothing. The standard move is to factor and cancel the term that is causing both to be zero.
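As a worked instance of factor-and-cancel, assume the running example is $f(x) = \frac{x^2 - 1}{x - 1}$ with the hole at $x = 1$ (an assumption, chosen because it matches outputs such as 1.9, 1.99 and 2.001 near the hole):

```latex
\lim_{x \to 1} \frac{x^2 - 1}{x - 1}
  = \lim_{x \to 1} \frac{(x - 1)(x + 1)}{x - 1}
  = \lim_{x \to 1} (x + 1)
  = 2
```

The cancellation is legal because the limit only ever looks at $x \neq 1$, where $x - 1$ is nonzero.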
Watch the same limit numerically. The table below evaluates the original unfactored expression as $x$ closes in on $1$ from both sides:
| x | f(x) | Side | Direction | Role |
|---|---|---|---|---|
| 0.9 | 1.9 | left | → | approach |
| 0.99 | 1.99 | left | → | approach |
| 0.999 | 1.999 | left | → | approach |
| 1 | 0/0, undefined (the hole) | point | ✗ | gap |
| 1.001 | 2.001 | right | ← | approach |
| 1.01 | 2.01 | right | ← | approach |
| 1.1 | 2.1 | right | ← | approach |
Both sides converge on $2$ even though the middle row does not exist. The limit is a statement about the rows around the gap, never the gap itself.
Derivation: the difference quotient is a limit
Now the payoff. The slope of a straight line is rise over run, $\frac{\Delta y}{\Delta x}$. A curve has no single slope, but it has a slope at each point, and we get it by a limit. Fix a point $a$ and a small step $h$. The line through the two points $(a, f(a))$ and $(a + h, f(a + h))$, a secant, has slope

$$\frac{f(a + h) - f(a)}{h}$$

This ratio is the difference quotient. Note that at $h = 0$ it is exactly $0/0$, the same indeterminate form as before, and for the same reason. We never evaluate it at $h = 0$; we take the limit as $h$ approaches $0$. As $h$ shrinks, the secant line pivots and settles onto the tangent line, and its slope settles onto the instantaneous rate of change. That limiting value is the derivative:

$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}$$

A function is differentiable at $a$ exactly when this limit exists, which, being a limit, requires the left and right difference quotients to agree. A well-behaved, two-sided limit is not a technicality; it is the whole requirement.
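To see the cancellation at work, here is the difference quotient carried through for $f(x) = x^2$, a function chosen purely for illustration:

```latex
f'(a) = \lim_{h \to 0} \frac{(a + h)^2 - a^2}{h}
      = \lim_{h \to 0} \frac{2ah + h^2}{h}
      = \lim_{h \to 0} (2a + h)
      = 2a
```

The factor $h$ that made both numerator and denominator vanish cancels for every $h \neq 0$, and what remains has an ordinary limit.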
ML use case: gradients are limits, and ReLU has a kink
Two facts from this chapter run straight through modern deep learning.
Gradients are difference-quotient limits. The gradient of a loss $\mathcal{L}$ with respect to a weight $w$ is $\frac{\partial \mathcal{L}}{\partial w}$, and each such partial is exactly the limit in eq. 13.3 taken along that one coordinate. When you cannot get the derivative in closed form, you approximate the limit by stopping at a small finite value $h$, which is the finite-difference check used to validate a hand-written backprop:

$$\frac{\partial \mathcal{L}}{\partial w} \approx \frac{\mathcal{L}(w + h) - \mathcal{L}(w)}{h}$$

This is a limit you deliberately do not finish taking. It is trustworthy only in the window where $h$ is small enough to be accurate but not so small that floating-point error swamps it, a tension we make concrete below.
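A finite-difference check of this kind might look like the sketch below. Everything in it is an assumption made for illustration: the one-weight `loss`, its hand-derived `analytic_grad`, the helper name `forward_difference`, and the step `h=1e-6`. It uses the forward difference, i.e. the derivative limit stopped at a finite step.

```python
def loss(w):
    """Hypothetical one-weight loss: squared error of prediction 3*w against target 6."""
    return (3.0 * w - 6.0) ** 2

def analytic_grad(w):
    """Hand-derived gradient by the chain rule: dL/dw = 2 * (3w - 6) * 3."""
    return 6.0 * (3.0 * w - 6.0)

def forward_difference(f, w, h=1e-6):
    """Difference quotient at a small but finite step h (the limit, left unfinished)."""
    return (f(w + h) - f(w)) / h

w = 1.0
numeric = forward_difference(loss, w)
exact = analytic_grad(w)

# Relative error is the usual yardstick: it is insensitive to the gradient's scale.
rel_error = abs(numeric - exact) / max(abs(numeric), abs(exact))
print(f"numeric={numeric:.8f}  exact={exact:.8f}  rel_error={rel_error:.2e}")
```

If the hand-written gradient had a bug, say a missing factor of 3, the relative error would be of order one instead of tiny, which is what makes the check useful.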
ReLU is continuous everywhere but not differentiable at $x = 0$. The rectifier $\mathrm{ReLU}(x) = \max(0, x)$ is continuous at $x = 0$: the left limit, the right limit, and the value all equal $0$, so it passes the continuity test. But the derivative limit fails there. Approaching from the right the difference quotient is $1$ (the graph is the line $y = x$); approaching from the left it is $0$ (the graph is flat). The two one-sided slopes disagree, so $\lim_{h \to 0}$ of the difference quotient does not exist: a kink. Frameworks handle this by picking a value from the valid range $[0, 1]$ (a subgradient); PyTorch and TensorFlow return $0$ at exactly $x = 0$ by convention. It works in practice because a single input landing exactly on $0$ is a measure-zero event, and the choice within $[0, 1]$ rarely changes the direction of a step. Continuity buys you "no jumps"; differentiability is the stronger promise ReLU cannot keep at one point.
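The disagreement between the two one-sided slopes is easy to observe directly. The following is a minimal sketch in plain Python; `relu` and `diff_quotient` are helper names introduced here for illustration, not framework APIs.

```python
def relu(x):
    """The rectifier: max(0, x)."""
    return max(0.0, x)

def diff_quotient(f, a, h):
    """Difference quotient of f at a for a nonzero step h (h may be negative)."""
    return (f(a + h) - f(a)) / h

steps = [10.0 ** -k for k in range(1, 8)]   # 0.1, 0.01, ..., 1e-7

# h > 0: approach 0 from the right, where the graph is the line y = x.
right_slopes = [diff_quotient(relu, 0.0, h) for h in steps]
# h < 0: approach 0 from the left, where the graph is flat.
left_slopes = [diff_quotient(relu, 0.0, -h) for h in steps]

print("right:", right_slopes)
print("left: ", left_slopes)
```

However small the step, the right-hand quotients stay at one and the left-hand quotients stay at zero, so there is no single value for the two-sided limit to settle on.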
NumPy: watch a limit converge — then break it
Let us numerically approach a derivative at a fixed point, where the true value is known, by shrinking $h$. First we see the difference quotient converge; then we push $h$ too small and watch floating-point catastrophic cancellation destroy the answer. Run it:
The table tells the whole story: the estimate marches toward the true value as $h$ first shrinks, then reverses and degrades as $h$ keeps shrinking. The limit is mathematically exact as $h \to 0$, but the floating-point evaluation has a floor. The algebraic cancellation sidesteps the whole problem, which is exactly why we cancel on paper before ever touching a computer.
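The converge-then-break pattern can be reproduced with a sketch like the one below. The function $f(x) = x^3$ and the point $a = 2$ (exact derivative $12$) are stand-ins assumed for this example, and plain Python floats are used, which are the same IEEE double precision as NumPy's default `float64`.

```python
def f(x):
    """Assumed stand-in function for the experiment: f(x) = x^3."""
    return x * x * x

a = 2.0
exact = 3.0 * a * a          # analytic derivative 3a^2 = 12

errors = {}                  # maps k to the absolute error at h = 10^-k
for k in range(1, 16):
    h = 10.0 ** -k
    estimate = (f(a + h) - f(a)) / h      # forward difference quotient
    errors[k] = abs(estimate - exact)
    print(f"h=1e-{k:02d}  estimate={estimate:.10f}  error={errors[k]:.3e}")

# The step with the smallest error sits between the two failure regimes.
best_k = min(errors, key=errors.get)
print("smallest error at h = 1e-%02d" % best_k)
```

For large $h$ the error is dominated by truncation (the limit is not finished); for tiny $h$ the subtraction $f(a + h) - f(a)$ loses almost all of its significant digits and dividing by $h$ magnifies what is left. In double precision the sweet spot for a forward difference typically lands near $h \approx 10^{-8}$, though the exact location depends on the function and the point.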
Summary
- A limit says $f(x)$ approaches $L$ as $x$ nears $a$; it is about the approach, not the value $f(a)$, which may differ or not exist.
- The two-sided limit exists iff the left- and right-hand limits both exist and agree. Disagreement is a jump; agreement-but-mismatch-with-$f(a)$ is a removable hole.
- Continuity at $a$ means $f(a)$ exists, the limit exists, and they are equal. The three failures are removable, jump, and infinite discontinuities.
- For an indeterminate $0/0$, substitution is uninformative; factor and cancel the vanishing term, then take the limit of what remains.
- The derivative is the limit of the difference quotient, itself a $0/0$ resolved by cancelling $h$. Differentiability requires this two-sided limit to exist.
- In ML, gradients are these limits; ReLU is continuous but non-differentiable at $x = 0$ (frameworks use a subgradient), and finite-difference checks fail if $h$ is pushed too small (catastrophic cancellation).
Active recall
Answer from memory before checking the lesson:
- State the three conditions for $f$ to be continuous at $a$. Which one fails for a removable discontinuity?
- Evaluate by factoring. Why is cancelling legal even though it is zero at ?
- Write the definition of $f'(a)$ as a limit. What indeterminate form does the difference quotient take at $h = 0$, and how is it resolved?
- ReLU is continuous at $x = 0$ but not differentiable there. Explain both halves in terms of one-sided limits.
- Why does making $h$ smaller eventually make a finite-difference gradient estimate worse rather than better?
Exercises
Level A: Recall & basic calculation
Limit by direct substitution
Evaluate . (The function is a polynomial, so it is continuous everywhere.)
A 0/0 limit by factoring
Evaluate . (Substitution gives ; factor first.)
One-sided limits and existence
For a piecewise function, and , but . What is ?
Difference quotient of a linear function
For , the difference quotient simplifies to a constant. What is (the limit as )?
Classify the discontinuity
A step function has and . Which type of discontinuity is at ?
Continuity of ReLU at zero
Is $\mathrm{ReLU}(x) = \max(0, x)$ continuous at $x = 0$? Enter 1 for yes, 0 for no.
Level B: Conceptual understanding
Limit vs. value
Which statement best captures why $\lim_{x \to a} f(x)$ can exist even when $f(a)$ is undefined?
Why ReLU is not differentiable at 0
Explain, in terms of one-sided limits of the difference quotient, why $\mathrm{ReLU}(x) = \max(0, x)$ is not differentiable at $x = 0$ even though it is continuous there. What do frameworks do at that point?
The indeterminate form in the derivative
At $h = 0$, the difference quotient $\frac{f(a + h) - f(a)}{h}$ has which form, and how is a finite derivative recovered from it?
Why the finite-difference step can't be too small
A colleague validates a gradient with and reasons: 'smaller is always closer to the true limit, so I'll use .' Explain why this makes the numerical estimate worse, not better.
Level C: Derivation & implementation
Derive the derivative of a cubic
Using the limit definition , derive for . Show the cancellation explicitly.
Numerically approach a limit
Write numeric_derivative(f, a, h) returning the forward difference quotient . For at , print the estimate for , assert the estimate is within of the exact value , then print ok.
Detect a discontinuity numerically
For the step function if else , estimate the left and right limits at by evaluating at and for a shrinking sequence of . Print both estimates, assert they differ (confirming a jump discontinuity), and print ok.
Level D: Research-thinking challenge
Subgradients and where non-differentiability bites
Deep networks are trained with gradient descent, yet ReLU, max-pooling, and the L1 penalty are all non-differentiable at isolated points. Explain why training still works (invoke measure-zero and subgradients), then give one concrete situation where a non-differentiable point does cause real trouble and how practitioners mitigate it.