Part 4 · Calculus · Chapter 13 · 60 min

Limits and Continuity

What happens as we get arbitrarily close

Prerequisites

Learning objectives

  • State an operational definition of a limit
  • Evaluate limits, including indeterminate 0/0 forms
  • Define continuity and spot discontinuities
  • See why differentiability needs a well-behaved limit

Why limits sit under every gradient

Everything you will do to train a model comes down to one question: if I nudge a parameter a tiny bit, how does the loss respond? That "response to a tiny nudge" is a derivative, and a derivative is defined as a limit — the value a ratio approaches as the nudge shrinks toward zero. Backpropagation is nothing but the chain rule applied to millions of these limits. So before we can differentiate anything (next chapter) or descend any gradient (the chapter after), we need to be fluent in the one idea underneath them all: what it means for a function to approach a value.

Limits also explain a class of bugs you will actually hit. Why does 0/0 show up in a correctly-derived formula and still give the right answer once simplified? Why does a finite-difference gradient check get worse when you make the step too small? Why is ReLU, an activation used throughout deep learning, differentiable everywhere except one point, and what do frameworks do at that point? All three are limit questions, and this chapter answers them.

Intuition: approaching, not arriving

Here is the whole idea in one sentence: a limit describes where a function is headed, not where it lands. Those can differ, and keeping them separate is the entire game.

Take $f(x) = \dfrac{x^2 - 1}{x - 1}$. At exactly $x = 1$ this is $\frac{0}{0}$: undefined, a hole in the graph. The function has no value there. And yet, as $x$ creeps toward $1$ from either side, the outputs march steadily toward $2$. Plug in $0.9$ and you get $1.9$; plug in $0.99$ and you get $1.99$; plug in $1.001$ and you get $2.001$. The function is never evaluated at $x = 1$, but it is unmistakably aiming at $2$. That destination, $2$, is the limit.

This is why "just plug in the point" is not what a limit means. Sometimes plugging in works (for nice functions it does), but the definition is about the approach. The value at the point is a separate question, and continuity, later in this chapter, is exactly the statement that the two happen to agree.

Formal definitions

Because $x$ may approach from two directions, we split the idea into one-sided limits. The left-hand limit uses only $x < a$ and the right-hand limit uses only $x > a$:

$$\lim_{x\to a^-} f(x) \qquad \text{and} \qquad \lim_{x\to a^+} f(x).$$

The two-sided limit exists only when both one-sided limits exist and are equal. If the function approaches $3$ from the left and $5$ from the right, there is no single destination, and $\lim_{x\to a} f(x)$ does not exist. This is the exact mechanism behind a "jump" in a step function.
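A small numerical sketch can make the mismatch concrete. The `step` function and the sample distances below are hypothetical choices for illustration (they mirror the 3-versus-5 example above); sampling a few points only suggests a limit, it does not prove one.

```python
def step(x):
    """Hypothetical step function: 3 to the left of 0, 5 at and to the right of 0."""
    return 3.0 if x < 0 else 5.0


def one_sided_values(f, a, side, distances=(1e-1, 1e-3, 1e-6)):
    """Sample f at points approaching a from one side ('left' or 'right')."""
    sign = -1.0 if side == "left" else 1.0
    return [f(a + sign * d) for d in distances]


left = one_sided_values(step, 0.0, "left")
right = one_sided_values(step, 0.0, "right")

print("from the left: ", left)
print("from the right:", right)
# The closest samples disagree, so no single two-sided destination is plausible.
print("one-sided samples agree:", abs(left[-1] - right[-1]) < 1e-6)
```

The left samples stay at 3 and the right samples stay at 5 no matter how close the inputs get, which is what a non-existent two-sided limit looks like numerically.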

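A function $f$ is continuous at $a$ when three conditions hold: $f(a)$ is defined, $\lim_{x\to a} f(x)$ exists, and the two are equal. These three conditions name the three ways continuity can fail, that is, the three kinds of discontinuity: removable, jump, and infinite.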

A removable discontinuity is the hole we just saw: patch a single point and the function becomes continuous. A jump cannot be patched — the one-sided limits genuinely disagree. An infinite discontinuity means the limit is not a finite number at all.
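To see "patch a single point" in action, here is a minimal sketch. The names `f_patched` and `reciprocal` are hypothetical helpers introduced for this illustration; the patched value 2 is the limit of the running example at $x = 1$.

```python
def f(x):
    """Running example: undefined at x = 1, where it is 0/0."""
    return (x**2 - 1) / (x - 1)


def f_patched(x):
    """Same function with the hole filled by the limiting value 2."""
    return 2.0 if x == 1 else f(x)


def reciprocal(x):
    """1/x has an infinite discontinuity at 0: no finite value can patch it."""
    return 1.0 / x


for d in (1e-2, 1e-4, 1e-6):
    # Near x = 1 the patched function hugs its value at 1 (removable case),
    # while 1/x keeps growing as x shrinks toward 0 (infinite case).
    print(f"d={d:.0e}  f_patched(1-d)={f_patched(1 - d):.6f}  "
          f"f_patched(1)={f_patched(1.0):.6f}  "
          f"f_patched(1+d)={f_patched(1 + d):.6f}  1/d={reciprocal(d):.0e}")
```

The patched function now passes all three continuity conditions at $x = 1$, whereas no choice of value at $0$ would make $1/x$ continuous there.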

Evaluating a 0/0 limit by factoring

When direct substitution gives a definite number, that number is the limit: polynomials and other "nice" functions cooperate. The interesting case is the indeterminate form $\frac{0}{0}$: both numerator and denominator vanish at $a$, so substitution tells you nothing. The standard move is to factor and cancel the term that makes both of them zero.
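Applied to the running example $f(x) = \frac{x^2 - 1}{x - 1}$ from earlier in the chapter, the factor-and-cancel move can be written out as follows. This is a worked sketch of the technique, not necessarily the exact derivation the lesson displayed:

$$\lim_{x\to 1}\frac{x^2 - 1}{x - 1} = \lim_{x\to 1}\frac{(x - 1)(x + 1)}{x - 1} = \lim_{x\to 1}\,(x + 1) = 2.$$

Cancelling $(x - 1)$ is legitimate because the limit only ever looks at $x \neq 1$, where that factor is nonzero.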

Watch the same limit numerically. The table below evaluates the original unfactored expression as $x$ closes in on $1$ from both sides:

Both columns converge on 22 even though the middle row does not exist. The limit is a statement about the rows around the gap, never the gap itself.
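A table of this kind can be generated with a few lines of Python. The distances in `offsets` and the column layout are assumptions made for this sketch; they are not necessarily the values the lesson's own table used.

```python
def f(x):
    """The unfactored expression; Python raises ZeroDivisionError at x = 1."""
    return (x**2 - 1) / (x - 1)


offsets = [1e-1, 1e-2, 1e-3, 1e-4]  # assumed distances from x = 1

# Each row pairs a point left of 1 with its mirror image right of 1.
rows = [(1 - d, f(1 - d), 1 + d, f(1 + d)) for d in offsets]

print(f"{'x (left)':>10} {'f(x)':>10} {'x (right)':>10} {'f(x)':>10}")
for x_left, f_left, x_right, f_right in rows:
    print(f"{x_left:>10.4f} {f_left:>10.4f} {x_right:>10.4f} {f_right:>10.4f}")

# The gap itself: evaluating exactly at x = 1 is 0.0 / 0.0.
try:
    f(1.0)
except ZeroDivisionError:
    print("x = 1: undefined (0/0)")
```

Both `f(x)` columns close in on 2 while the evaluation at exactly 1 fails, matching the point that the limit is about the neighbours of the gap.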

Derivation: the difference quotient is a limit

Now the payoff. The slope of a straight line is rise over run, $\frac{\Delta y}{\Delta x}$. A curve has no single slope, but it has a slope at each point, and we get it by a limit. Fix a point $a$ and a small step $h \neq 0$. The line through the two points $(a, f(a))$ and $(a + h, f(a + h))$, a secant, has slope

$$\frac{f(a + h) - f(a)}{h}.$$

This ratio is the difference quotient. Note that at $h = 0$ it is exactly $\frac{0}{0}$, the same indeterminate form as before, and for the same reason. We never evaluate it at $h = 0$; we take the limit as $h$ approaches $0$. As $h$ shrinks, the secant line pivots and settles onto the tangent line, and its slope settles onto the instantaneous rate of change. That limiting value is the derivative:

$$f'(a) = \lim_{h\to 0}\frac{f(a + h) - f(a)}{h}.$$

A function is differentiable at $a$ exactly when this limit exists, which, being a limit, requires the left and right difference quotients to agree. A well-behaved, two-sided limit is not a technicality; it is the whole requirement.
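As a worked instance, take $f(x) = x^2$, the function used in the numerical experiment later in this chapter. The difference quotient is $\frac{0}{0}$ at $h = 0$, but factoring out $h$ resolves it:

$$\frac{(a + h)^2 - a^2}{h} = \frac{2ah + h^2}{h} = \frac{h(2a + h)}{h} = 2a + h \;\longrightarrow\; 2a \quad \text{as } h \to 0.$$

So $f'(a) = 2a$, and at $a = 3$ the derivative is $6$. The left and right quotients agree because $2a + h$ approaches $2a$ whether $h$ is positive or negative.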

ML use case: gradients are limits, and ReLU has a kink

Two facts from this chapter run straight through modern deep learning.

Gradients are difference-quotient limits. The gradient of a loss with respect to a weight $w_i$ is $\partial L / \partial w_i$, and each such partial is exactly the limit in eq. 13.3 taken along that one coordinate. When you cannot get the derivative in closed form, you approximate the limit by stopping $h$ at a small finite value. This is the finite-difference check used to validate a hand-written backprop:

$$f'(a) \approx \frac{f(a + h) - f(a - h)}{2h} \quad (\text{small } h).$$

Here the quotient is written in its symmetric (central) form, which has the same limit as $h \to 0$ wherever $f$ is differentiable. This is a limit you deliberately do not finish taking. It is trustworthy only in the window where $h$ is small enough to be accurate but not so small that floating-point error swamps it, a tension we make concrete below.
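A gradient check along these lines might look like the sketch below. The toy `loss`, the weight values, and the step `h=1e-5` are all assumptions chosen for illustration; a real check would compare against the gradient your backprop code produces.

```python
def loss(w):
    """Hypothetical toy loss: sum of squares, so the exact gradient is 2 * w."""
    return sum(wi * wi for wi in w)


def analytic_grad(w):
    """Stand-in for a hand-written backprop result."""
    return [2.0 * wi for wi in w]


def numeric_grad(f, w, h=1e-5):
    """Central-difference estimate of each partial derivative of f at w."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += h   # nudge only coordinate i forward
        w_minus[i] -= h  # and backward
        grad.append((f(w_plus) - f(w_minus)) / (2.0 * h))
    return grad


w = [0.5, -1.5, 2.0]
errors = [abs(n - a) for n, a in zip(numeric_grad(loss, w), analytic_grad(w))]
max_err = max(errors)
print("max abs error:", max_err)
```

A small `max_err` suggests the analytic gradient is right; a large one points to a bug in the derivative or to a poorly chosen `h`.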

ReLU is continuous everywhere but not differentiable at $0$. The rectifier $\mathrm{ReLU}(x) = \max(0, x)$ is continuous at $0$: the left limit, the right limit, and the value all equal $0$, so it passes the continuity test. But the derivative limit fails there. Approaching from the right the difference quotient is $1$ (the graph is the line $y = x$); approaching from the left it is $0$ (the graph is flat). The two one-sided slopes disagree, so $\lim_{h\to 0}$ of the difference quotient does not exist: a kink. Frameworks handle this by picking a value from the valid range $[0, 1]$ (a subgradient); PyTorch and TensorFlow return $0$ at exactly $x = 0$ by convention. It works in practice because a single input landing exactly on $0$ is a measure-zero event, and the choice within $[0, 1]$ rarely changes the direction of a step. Continuity buys you "no jumps"; differentiability is the stronger promise ReLU cannot keep at one point.
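The two one-sided slopes can be checked directly. This is a plain-Python sketch with hypothetical helper names (`relu`, `right_slopes`, `left_slopes`); it does not call any framework.

```python
def relu(x):
    """Rectifier: max(0, x)."""
    return max(0.0, x)


steps = (1e-1, 1e-3, 1e-6)

# Right-hand difference quotient at 0: (f(0 + h) - f(0)) / h with h > 0.
right_slopes = [(relu(0.0 + h) - relu(0.0)) / h for h in steps]

# Left-hand difference quotient at 0: (f(0) - f(0 - h)) / h with h > 0.
left_slopes = [(relu(0.0) - relu(0.0 - h)) / h for h in steps]

print("right-hand slopes:", right_slopes)
print("left-hand slopes: ", left_slopes)
```

The right-hand quotients are all 1 and the left-hand quotients are all 0, however small the step, so there is no single value for the two-sided limit to settle on.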

NumPy: watch a limit converge — then break it

Let us numerically approach the derivative of $f(x) = x^2$ at $a = 3$ (true value $2a = 6$) by shrinking $h$. First we see the difference quotient converge; then we push $h$ too small and watch floating-point catastrophic cancellation destroy the answer. Run it:

approach_a_limit.py
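The original listing is not reproduced here. The following is a minimal sketch of what such a script might look like, under stated assumptions: it uses plain Python floats (IEEE 754 double precision, the same format as NumPy's `float64`) so it runs without dependencies, it uses the forward difference quotient, and the range of step sizes from $10^{-1}$ to $10^{-16}$ is a guess.

```python
def f(x):
    return x * x


a = 3.0
true_value = 2.0 * a  # exact derivative of x^2 at a

# Assumed step sizes: 1e-1, 1e-2, ..., 1e-16.
hs = [10.0 ** -k for k in range(1, 17)]

# Forward difference quotient, evaluated in double precision.
estimates = [(f(a + h) - f(a)) / h for h in hs]
errors = [abs(est - true_value) for est in estimates]

print(f"{'h':>8} {'estimate':>22} {'abs error':>12}")
for h, est, err in zip(hs, estimates, errors):
    print(f"{h:8.0e} {est:22.15f} {err:12.3e}")

# Report which step size happened to give the smallest error.
best_index = errors.index(min(errors))
print("smallest error at h =", hs[best_index])
```

For large `h` the error is dominated by the leftover `h` in $2a + h$; for tiny `h` the subtraction `f(a + h) - f(a)` loses most of its significant digits, and at `h = 1e-16` the sum `a + h` rounds back to `a`, so the estimate collapses to zero.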

The table tells the whole story: the estimate marches toward $6$ as $h$ falls from $10^{-1}$ to about $10^{-8}$, then reverses and degrades as $h$ keeps shrinking. The limit is mathematically exact as $h \to 0$, but the floating-point evaluation has a floor. The algebraic cancellation $\frac{h(2a+h)}{h} = 2a + h$ sidesteps the whole problem, which is exactly why we cancel on paper before ever touching a computer.

Summary

  • A limit $\lim_{x\to a} f(x) = L$ says $f(x)$ approaches $L$ as $x$ nears $a$; it is about the approach, not the value $f(a)$, which may differ or not exist.
  • The two-sided limit exists iff the left- and right-hand limits both exist and agree. Disagreement is a jump; agreement with each other but mismatch with $f(a)$ is a removable hole.
  • Continuity at $a$ means $f(a)$ exists, the limit exists, and they are equal. The three failures are removable, jump, and infinite discontinuities.
  • For an indeterminate $\frac{0}{0}$, substitution is uninformative; factor and cancel the vanishing term, then take the limit of what remains.
  • The derivative is the limit of the difference quotient, $f'(a) = \lim_{h\to 0}\frac{f(a+h)-f(a)}{h}$, itself a $\frac{0}{0}$ resolved by cancelling $h$. Differentiability requires this two-sided limit to exist.
  • In ML, gradients are these limits; ReLU is continuous but non-differentiable at $0$ (frameworks use a subgradient), and finite-difference checks fail if $h$ is pushed too small (catastrophic cancellation).

Active recall

Answer from memory before checking the lesson:

  1. State the three conditions for $f$ to be continuous at $a$. Which one fails for a removable discontinuity?
  2. Evaluate $\displaystyle\lim_{x\to 2}\frac{x^2 - 4}{x - 2}$ by factoring. Why is cancelling $(x-2)$ legal even though it is zero at $x = 2$?
  3. Write the definition of $f'(a)$ as a limit. What indeterminate form does the difference quotient take at $h = 0$, and how is it resolved?
  4. ReLU is continuous at $0$ but not differentiable there. Explain both halves in terms of one-sided limits.
  5. Why does making $h$ smaller eventually make a finite-difference gradient estimate worse rather than better?

Exercises

Level A: Recall & basic calculation

Level B: Conceptual understanding

Level C: Derivation & implementation

Level D: Research-thinking challenge
