Powers, Roots, Exponentials, and Logarithms

Why exponentials and logs run underneath ML

Two functions show up on almost every page of a machine-learning derivation, and they are inverses of each other. The exponential $e^x$ turns sums into products and appears the instant you write a softmax, a Gaussian, or a growth process. The logarithm $\ln x$ runs the machine in reverse: it turns products into sums, and that single algebraic fact is why we train models by maximizing log-likelihood instead of likelihood, why loss functions are cross-entropies, and why a pipeline that multiplies ten thousand probabilities does not silently collapse to zero.

Before any of that, though, both functions are just the top of a short ladder you already half-know from a scientific calculator:

Powers: repeated multiplication, $b^n$.
Roots: powers with fractional exponents, $\sqrt{x} = x^{1/2}$.
Exponentials: a fixed base raised to a variable, $b^x$, with $e^x$ the natural choice.
Logarithms: the inverse question, "$b$ to what power gives $x$?"

This chapter climbs that ladder and stops at the one identity you will reuse more than any other: $\log(ab) = \log a + \log b$ .

Intuition: logs turn "times" into "plus"

Here is the whole idea in one sentence. Multiplying is expensive and, on a computer, dangerous — multiply enough small numbers and the result rounds to exactly zero. Adding is cheap and safe. A logarithm is the dictionary that translates every multiplication into an addition:

$\underbrace{a \times b \times c}_{\text{a product}} \;\xrightarrow{\;\log\;}\; \underbrace{\log a + \log b + \log c}_{\text{a sum}}.$

The exponential is the same dictionary read backwards: it translates addition back into multiplication, $e^{p+q} = e^p\, e^q$ . That is all an exponential and a logarithm do — they are a matched pair of translators between the additive world and the multiplicative world. Everything else in this chapter is a consequence of that one relationship.
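The round trip through the dictionary can be sketched in a few lines of standard-library Python. The numbers in `factors` and the values of `p` and `q` are arbitrary illustrations, not taken from the lesson.

```python
import math

factors = [2.0, 5.0, 10.0]

# Forward translation: each factor becomes its natural log, so "times" becomes "plus".
log_sum = sum(math.log(f) for f in factors)

# Backward translation: exponentiating the sum recovers the product.
product_via_logs = math.exp(log_sum)
direct_product = math.prod(factors)

# The exponential's own rule: e^(p+q) equals e^p * e^q.
p, q = 1.5, -0.25
exp_of_sum = math.exp(p + q)
product_of_exps = math.exp(p) * math.exp(q)

print(direct_product, product_via_logs)
print(exp_of_sum, product_of_exps)
```

The two values on each printed line should agree up to floating-point rounding, which is why the comparison is best done with `math.isclose` rather than `==`.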

The base $e \approx 2.71828$ is the "natural" choice for a reason we will make precise: it is the base whose exponential is its own rate of change, so calculus (and therefore gradient-based learning) stays clean. Before the formalities, get a feel for the two curves by dragging on them.

[Interactive lab: Function Explorer]

Select exponential and notice it never dips to zero or below — $e^x > 0$ for every real $x$ — and that it explodes for positive $x$ . Now select logarithm: it is the mirror image across the line $y = x$ , defined only for $x > 0$ , crossing the axis at $\ln 1 = 0$ and diving to $-\infty$ as $x \to 0^+$ . That last fact is the reason "the log of a tiny probability is a large negative number," which is exactly how a loss function should behave.

Formal definitions

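For a positive integer $n$, the power $b^{n}$ is the base $b$ multiplied by itself $n$ times: $b^{n} = \underbrace{b \cdot b \cdots b}_{n \text{ factors}}$. From that single definition, three laws of exponents follow by just counting factors: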

$$b^{m} \, b^{n} = b^{m+n}, \qquad \frac{b^{m}}{b^{n}} = b^{m-n}, \qquad \left(b^{m}\right)^{n} = b^{mn} \tag{3.1}$$

Fractional exponents are defined to make the third law keep working. If we want $\left(b^{1/n}\right)^{n} = b^{1} = b$ , then $b^{1/n}$ must be the number that gives $b$ when raised to the $n$ -th power — that is, the $n$ -th root.
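A quick numerical check of the exponent laws and of the root-as-fractional-power definition might look like the following. The values of `b`, `m`, `n`, and the choice of 27 are illustrative assumptions.

```python
import math

b, m, n = 2.0, 5, 3

# The three exponent laws from eq. 3.1, checked on concrete numbers.
product_law = math.isclose(b**m * b**n, b**(m + n))    # 32 * 8 vs 2^8
quotient_law = math.isclose(b**m / b**n, b**(m - n))   # 32 / 8 vs 2^2
power_law = math.isclose((b**m)**n, b**(m * n))        # 32^3 vs 2^15

# A fractional exponent is a root: raising b^(1/n) to the n-th power returns b.
cube_root_of_27 = 27.0 ** (1 / 3)
round_trip = cube_root_of_27 ** 3

print(product_law, quotient_law, power_law)
print(cube_root_of_27, round_trip)
```

Because `1 / 3` is not exactly representable in binary floating point, the cube root may come back as a value very close to 3 rather than exactly 3, so the comparison again uses a tolerance.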

Symbol	Meaning	Type	Shape	Role
$b^{n}$	Base b raised to power n	scalar	1	variable
$b^{1/n}$	n-th root of b (fractional power)	scalar	1	variable
$e$	Euler's number, approx 2.71828 (the natural base)	scalar	1	fixed
$\exp(x)=e^{x}$	The natural exponential	function	1	fixed
$\ln x = \log_{e} x$	Natural logarithm (inverse of exp)	function	1	fixed
$\log_{b} x$	Logarithm base b: the power giving x	function	1	fixed

Why $e$ is the "natural" base

Among all bases $b$ , exactly one makes the exponential its own derivative: $\frac{d}{dx} e^{x} = e^{x}$ . For any other base, $\frac{d}{dx} b^{x} = b^{x}\ln b$ picks up a stray constant factor $\ln b$ . Choosing $b = e$ sets that factor to $\ln e = 1$ , so growth rate equals value. Equivalently, $e$ is the limit of compounding infinitely often, $e = \lim_{n \to \infty}\left(1 + \tfrac{1}{n}\right)^{n},$ which is why continuously-compounded interest, radioactive decay, and exponential learning-rate schedules are all written with $e$ . When derivatives stay clean, so do gradients — and gradients are how models learn.
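Both claims can be checked numerically. The sketch below uses a central finite difference as a stand-in for the derivative; the helper name `central_difference`, the step size `h`, and the test point `x` are assumptions made for illustration.

```python
import math

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.3

# For base e, the slope of e^x equals its value e^x.
slope_e = central_difference(math.exp, x)
value_e = math.exp(x)

# For base 2, the slope is 2^x times the extra factor ln 2.
slope_2 = central_difference(lambda t: 2.0**t, x)
value_2_scaled = 2.0**x * math.log(2.0)

# The compounding limit (1 + 1/n)^n creeps up toward e as n grows.
approximations = [(1 + 1 / n) ** n for n in (10, 1_000, 100_000)]

print(slope_e, value_e)
print(slope_2, value_2_scaled)
print(approximations, math.e)
```

The finite difference is only an approximation, so the pairs match to several digits rather than exactly, and the compounding sequence approaches $e$ from below.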

The log rules

Because $\ln$ is the inverse of a function that turns sums into products, it turns products back into sums. The three rules you must know cold:

$$\ln(xy) = \ln x + \ln y, \qquad \ln\!\left(\frac{x}{y}\right) = \ln x - \ln y, \qquad \ln\!\left(x^{p}\right) = p\,\ln x \tag{3.2}$$

To move between bases (e.g. from $\ln$ to $\log_{2}$ ), use change of base:

$$\log_{b} x = \frac{\ln x}{\ln b} \tag{3.3}$$

These hold for any valid base, not just $e$ ; the product, quotient, and power rules are the same shape whether you write $\log_{2}$ , $\log_{10}$ , or $\ln$ .
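As a small sanity check, change of base can be compared against the dedicated base-2 and base-10 functions in Python's `math` module. The inputs `x`, `u`, and `v` are arbitrary example values.

```python
import math

x = 50.0

# Change of base (eq. 3.3): any logarithm is a ratio of natural logs.
log2_via_ln = math.log(x) / math.log(2.0)
log10_via_ln = math.log(x) / math.log(10.0)

# The standard library's dedicated functions should agree.
log2_builtin = math.log2(x)
log10_builtin = math.log10(x)

# The product rule has the same shape in base 2 as in base e.
u, v = 8.0, 32.0
lhs = math.log2(u * v)              # log2 of the product
rhs = math.log2(u) + math.log2(v)   # sum of the base-2 logs

print(log2_via_ln, log2_builtin)
print(log10_via_ln, log10_builtin)
print(lhs, rhs)
```

In practice the dedicated functions are usually preferable to the ratio, since they avoid one extra rounding step.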

A worked numerical example

Worked Example — reading and applying the log rules

Evaluate $\log_{2} 8$ , then simplify $\ln\!\left(\dfrac{a^{3}}{b}\right)$ .

First, $\log_{2} 8$ asks "$2$ to what power is $8$?" Since $2^{3} = 8$, the answer is $\log_{2} 8 = 3$. (Check with change of base: $\log_{2} 8 = \ln 8 / \ln 2 \approx 2.0794 / 0.6931 \approx 3$.)

Second, apply the quotient rule, then the power rule: $\ln\!\left(\frac{a^{3}}{b}\right) = \ln(a^{3}) - \ln b = 3\ln a - \ln b.$ A single logarithm of a messy fraction becomes a tidy weighted sum — the same move that converts a product of probabilities into a sum of log-probabilities below.

Deriving a log rule from an exponent law

The log rules are not new facts; each is an exponent law from eq. 3.1 seen through the inverse. Here is the product rule, derived from $b^{m}b^{n} = b^{m+n}$ .

Derivation: the product rule for logarithms

Let $x, y > 0$ and write them in exponential form with base $b$. Define

$$m = \log_{b} x \quad\Longleftrightarrow\quad x = b^{m}, \qquad n = \log_{b} y \quad\Longleftrightarrow\quad y = b^{n}.$$

Multiply the two right-hand equations and apply the exponent law $b^{m}b^{n} = b^{m+n}$:

$$xy = b^{m}\, b^{n} = b^{m+n}.$$

Now read that back through the logarithm: take $\log_{b}$ of both sides, using that $\log_{b}$ and $b^{(\cdot)}$ are inverses:

$$\log_{b}(xy) = m + n = \log_{b} x + \log_{b} y. \qquad \blacksquare$$

The quotient rule falls out the same way from $b^{m}/b^{n} = b^{m-n}$, and the power rule from $\left(b^{m}\right)^{p} = b^{mp}$. Every log rule is an exponent law seen in a mirror.
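The same argument can be written out for the power rule mentioned above. This is a sketch under the assumptions $x > 0$, $p$ real, and a valid base ($b > 0$, $b \neq 1$); it needs only one substitution because only one quantity is being raised to a power.

```latex
\begin{aligned}
m = \log_{b} x \quad &\Longleftrightarrow \quad x = b^{m} \\
x^{p} &= \left(b^{m}\right)^{p} = b^{mp} \\
\log_{b}\!\left(x^{p}\right) &= mp = p\,\log_{b} x
\end{aligned}
```

The middle line is the third exponent law from eq. 3.1, and the last line reads the exponent back off with $\log_{b}$.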

ML use case: log-likelihood and avoiding underflow

Suppose a model assigns probability $p_i$ to the correct label of the $i$ -th example, and the $N$ examples are independent. The probability of getting the whole dataset right — the likelihood — is the product

$$L = \prod_{i=1}^{N} p_{i} \tag{3.4}$$

Each $p_i$ lies in $(0, 1)$, so this product shrinks fast. With $N = 10{,}000$ examples each around $p_i = 0.1$, we get $L = 10^{-10000}$, far below the smallest positive normal 64-bit float (about $2.2 \times 10^{-308}$) and even below the smallest subnormal (about $4.9 \times 10^{-324}$). The computed likelihood underflows to exactly $0.0$, and $\ln 0 = -\infty$ destroys the gradient. Multiplication is the enemy.
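The float limits and the collapse described here can be inspected directly. This is a minimal sketch that mirrors the numbers in the paragraph ($N = 10{,}000$, $p_i = 0.1$); the variable names are illustrative.

```python
import sys

import numpy as np

# Smallest positive normal double, and the smallest subnormal below it.
smallest_normal = sys.float_info.min
smallest_subnormal = np.nextafter(0.0, 1.0)

# 10,000 independent examples, each assigned probability 0.1.
p = np.full(10_000, 0.1)
likelihood = np.prod(p)   # the true value 1e-10000 is not representable

# Taking the log of the underflowed product gives -inf, not a usable number.
with np.errstate(divide="ignore"):
    log_of_underflow = np.log(likelihood)

print(smallest_normal, smallest_subnormal)
print(likelihood, log_of_underflow)
```

The `np.errstate` context only silences NumPy's divide-by-zero warning for the demonstration; it does not change the result.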

Take the logarithm and the product rule (eq. 3.2) turns it into a sum — the log-likelihood:

$$\ln L = \ln \prod_{i=1}^{N} p_{i} = \sum_{i=1}^{N} \ln p_{i} \tag{3.5}$$

Now each term $\ln p_i$ is a modest negative number (for $p_i = 0.1$ , $\ln p_i \approx -2.3$ ), and their sum $\approx -23{,}000$ is perfectly representable. This is why training objectives are written in log-space:

Cross-entropy loss is exactly the negative log-likelihood, $-\sum_i \ln p_i$ (averaged), so minimizing it maximizes the probability of the data.
Softmax produces those $p_i$ as $e^{z_j}/\sum_k e^{z_k}$; its denominator is a sum of exponentials, so $\ln$ of a softmax output is $z_j - \ln\sum_k e^{z_k}$. The second term is the log-sum-exp operation, computed with a numerically stable trick: subtract $\max_k z_k$ from every $z_k$ first so that no $e^{z_k}$ overflows, then add it back outside the logarithm.
Learning-rate decay often uses the exponential $\eta_t = \eta_0\, e^{-\lambda t}$ , a smooth multiplicative shrink that is a straight line in log-space.
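The first two items above can be combined into one small sketch: a log-softmax that subtracts the maximum, and a cross-entropy built on top of it. The function names `log_softmax` and `cross_entropy`, the logits, and the labels are hypothetical choices for illustration, not an API from any particular library.

```python
import numpy as np

def log_softmax(z):
    """Return ln softmax(z) along the last axis, shifting by the max for stability."""
    shifted = z - np.max(z, axis=-1, keepdims=True)   # largest entry becomes 0
    log_norm = np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))
    return shifted - log_norm                         # z_j - ln sum_k e^{z_k}

def cross_entropy(logits, labels):
    """Average negative log-probability of the correct class."""
    log_probs = log_softmax(logits)
    picked = log_probs[np.arange(len(labels)), labels]  # ln p_i for each example
    return -np.mean(picked)

# Hypothetical logits for 3 examples and 3 classes.
# Calling np.exp directly on the last row would overflow.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3],
                   [1000.0, 1001.0, 1002.0]])
labels = np.array([0, 1, 2])

loss = cross_entropy(logits, labels)
print(loss)
```

Working with log-probabilities throughout means the probabilities themselves are never formed, so neither the large logits nor very small probabilities cause trouble.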

The through-line: products of probabilities become sums of log-probabilities, and that conversion is both algebraically convenient (sums differentiate term by term) and numerically essential (sums do not underflow).

NumPy: products vs sums in log-space

Let us watch the underflow happen and then fix it with the product rule. Run this:

log_space.py

import numpy as np

rng = np.random.default_rng(0)

# 1) log(a*b) == log(a) + log(b): the product rule, numerically.
a, b = 7.0, 3.0
lhs = np.log(a * b)             # log of the product
rhs = np.log(a) + np.log(b)    # sum of the logs
assert np.isclose(lhs, rhs), "product rule must hold"

# exp and log are inverses: exp(log(x)) == x
assert np.isclose(np.exp(np.log(5.0)), 5.0)

# 2) Multiplying many small probabilities underflows to 0.0.
p = rng.uniform(0.001, 0.01, size=5000)   # 5000 tiny probabilities
naive = np.prod(p)                         # the raw product
print("naive product :", naive)            # -> 0.0 (underflow!)

# 3) Summing the logs stays finite and correct.
log_like = np.sum(np.log(p))               # sum of log-probabilities
print("sum of logs   :", round(float(log_like), 3))

# The naive product underflowed; the log-space sum did not.
assert naive == 0.0
assert np.isfinite(log_like)
print("ok")

The np.prod path collapses to 0.0 because the true value is astronomically small; np.sum(np.log(p)) computes the same quantity in log-space, where it is a finite negative number you can actually optimize. In real code you would go one step further and use np.logaddexp / a log-sum-exp helper whenever you must combine log-space terms that came from a sum rather than a product.
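A minimal sketch of that last step, assuming two log-probabilities so negative that their underlying probabilities cannot be represented as floats. The specific values are illustrative.

```python
import numpy as np

# Two log-probabilities whose underlying probabilities are far below float range.
log_a, log_b = -1000.0, -1001.0

# Naive route: leave log-space, add, and come back. Both exponentials underflow to 0.0.
with np.errstate(divide="ignore"):
    naive = np.log(np.exp(log_a) + np.exp(log_b))

# Stable route: np.logaddexp computes ln(e^a + e^b) without leaving log-space.
stable = np.logaddexp(log_a, log_b)

# The same operation folded over a whole array of log-space terms.
log_terms = np.array([-1000.0, -1001.0, -1002.0])
total = np.logaddexp.reduce(log_terms)

print(naive, stable, total)
```

The naive route returns negative infinity, while the stable route returns a finite value slightly above the larger of its inputs, as expected when adding a smaller probability to a larger one.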

Summary

Powers are repeated multiplication; the three exponent laws ( $b^{m}b^{n}=b^{m+n}$ , $b^{m}/b^{n}=b^{m-n}$ , $(b^{m})^{n}=b^{mn}$ ) follow by counting factors, and roots are just fractional powers, $\sqrt[n]{b}=b^{1/n}$ .
The exponential $e^{x}$ and natural log $\ln x$ are inverses; $e$ is "natural" because $\frac{d}{dx}e^{x}=e^{x}$ , keeping gradients clean.
The log rules turn products into sums, quotients into differences, and powers into multipliers; change of base is $\log_{b}x = \ln x / \ln b$ . Each rule is an exponent law seen through the inverse.
Exponential $\leftrightarrow$ log form: $y=b^{x} \iff x=\log_{b}y$ .
In ML, likelihood is a product of probabilities that underflows; taking $\ln$ turns it into a sum of log-probabilities (cross-entropy / log-likelihood), which is both differentiable term-by-term and numerically safe. Softmax denominators use log-sum-exp; learning-rate decay uses $e^{-\lambda t}$ .

Active recall

Answer from memory before checking the lesson:

State the three laws of exponents and rewrite $\sqrt[3]{x^{2}}$ as a single power of $x$ .
What is $\log_{2} 32$ ? What is $\ln 1$ , and what does $\ln x$ approach as $x \to 0^{+}$ ?
Derive $\ln(xy) = \ln x + \ln y$ starting from an exponent law.
A pipeline multiplies $50{,}000$ probabilities and the result prints as 0.0. What went wrong, and what one-line change fixes it?
Convert $\log_{5} 20$ into an expression using only $\ln$ .

Exercises

Level A: Recall & basic calculation

Level A · Hand calculation · ch03-A1

Evaluate a logarithm

Evaluate $\log_{2} 8$. (Read it as: '$2$ to what power gives $8$?')

Level A · Hand calculation · ch03-A2

Product law of exponents

Use $b^{m} b^{n} = b^{m+n}$ to evaluate $2^{3} \cdot 2^{4}$ as a single integer.

Level A · Hand calculation · ch03-A3

Root as a fractional power

Write $\sqrt[3]{x^{2}}$ as a single power $x^{p}$. Enter the exponent $p$ as a decimal.

Level A · Hand calculation · ch03-A4

A base-10 logarithm

Evaluate $\log_{10} 1000$.

Level A · Hand calculation · ch03-A5

exp and ln are inverses

Evaluate $\ln\!\left(e^{5}\right)$.

Level A · Hand calculation · ch03-A6

Negative exponent

Evaluate $2^{-3}$ as a decimal.

Level B: Conceptual understanding

Level B · Equation interpretation · ch03-B1

Exponential to log form

The statement $2^{5} = 32$ is written in exponential form. Which is its correct logarithmic form?

Level B · Equation interpretation · ch03-B2

Spot the invalid log identity

Which of the following is not a valid logarithm identity?

Level B · Hand calculation · ch03-B3

Change of base, numerically

Using change of base $\log_{b} x = \dfrac{\ln x}{\ln b}$ with $\ln 10 \approx 2.302585$ and $\ln 2 \approx 0.693147$, evaluate $\log_{2} 10$. Give three decimals.

Level B · ML application · ch03-B4

Why maximize log-likelihood?

Training maximizes the log-likelihood $\sum_i \ln p_i$ rather than the raw likelihood $\prod_i p_i$. Give two distinct reasons this substitution is safe (same optimum) and better.

Level B · Equation interpretation · ch03-B5

Reading the log-likelihood equation

A model reports a total log-likelihood of $-23000$ over a dataset. Which statement is the best interpretation?

Level C: Derivation & implementation

Level C · NumPy implementation · ch03-C1

Implement a stable log-likelihood

Write log_likelihood(p) that returns $\sum_i \ln p_i$ for a 1-D array of probabilities. Show that for $5000$ tiny probabilities the naive product np.prod(p) underflows to 0.0 while your log-space sum stays finite, then print ok.

Level C · Derivation · ch03-C2

Derive the quotient rule

Derive the quotient rule $\log_{b}\!\left(\tfrac{x}{y}\right) = \log_{b} x - \log_{b} y$ starting from an exponent law, for $x, y > 0$.

Level C · NumPy implementation · ch03-C3

Numerically stable log-sum-exp

Implement logsumexp(z) computing $\ln\sum_k e^{z_k}$ with the max-subtraction trick, and show it agrees with the naive formula on safe inputs but does not overflow on large logits like z = [1000, 1001, 1002]. Print ok.

Level D: Research-thinking challenge

Level D · Paper-reading practice · ch03-D1

Why subtract the max in log-sum-exp?

The log-sum-exp identity $\ln\sum_k e^{z_k} = m + \ln\sum_k e^{z_k - m}$ holds for any constant $m$. Prove the identity algebraically, then explain why the specific choice $m = \max_k z_k$ is the numerically safe one, addressing both overflow and underflow.

Level D · Paper-reading practice · ch03-D2

Cross-entropy as negative log-likelihood

For a classifier that outputs probability $p_i$ on the correct class of example $i$, the average cross-entropy loss is $\mathcal{L} = -\tfrac{1}{N}\sum_i \ln p_i$. Explain why minimizing $\mathcal{L}$ is the same as maximizing the likelihood $\prod_i p_i$, and interpret the per-example loss $-\ln p_i$ as an amount of 'surprise', including what happens as $p_i \to 0$ and as $p_i \to 1$.
