Greek Letters

How each Greek letter is conventionally used in machine learning.

Lowercase letters

The letters you meet most in optimization, statistics, and model definitions.

Letter	Name	Typical ML use
$\alpha$	alpha	Learning rate / step size; also a mixing or momentum coefficient.
$\beta$	beta	Regression coefficients $\beta$; Adam decay rates $\beta_1,\beta_2$.
$\gamma$	gamma	Discount factor in RL; learned scale in BatchNorm; kernel coefficient in the RBF kernel.
$\delta$	delta	Small change or error term; the TD-error $\delta$ in RL.
$\epsilon$	epsilon	Tiny constant for numerical stability ($x/(\sigma+\epsilon)$); noise; exploration rate.
$\eta$	eta	Learning rate (alternative to $\alpha$).
$\theta$	theta	Model parameters/weights; the thing gradient descent optimizes.
$\lambda$	lambda	Regularization strength ($\lambda \lVert w \rVert_2^2$); eigenvalue; Lagrange multiplier.
$\mu$	mu	Mean of a distribution or a running average (BatchNorm mean).
$\rho$	rho	Correlation coefficient; spectral radius; RMSProp decay.
$\sigma$	sigma	Standard deviation; the sigmoid function $\sigma(z)$; a generic activation nonlinearity.
$\phi$	phi	Feature map $\phi(x)$; also a set of parameters (e.g. an encoder's).
$\psi$	psi	A secondary parameter set or basis function.
$\omega$	omega	Angular frequency; sometimes weights in older notation.
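
To show several of these letters working together, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a reference implementation: the scalar model $y = \theta x$, the function names `ridge_gd_step` and `standardize`, and the default values are all hypothetical choices made for this example.

```python
def ridge_gd_step(theta, xs, ys, eta=0.1, lam=0.01):
    """One gradient-descent step for the scalar model y = theta * x.

    Objective (an assumption for this sketch):
        (1/n) * sum((theta*x - y)^2) + lam * theta^2
    theta = parameter, eta = learning rate, lam = regularization strength.
    """
    n = len(xs)
    # Gradient of the mean squared error with respect to theta.
    grad_loss = (2.0 / n) * sum((theta * x - y) * x for x, y in zip(xs, ys))
    # Gradient of the L2 penalty lam * theta^2.
    grad_penalty = 2.0 * lam * theta
    # delta is the update: a step against the gradient, scaled by eta.
    delta = -eta * (grad_loss + grad_penalty)
    return theta + delta


def standardize(xs, epsilon=1e-8):
    """Shift by the mean mu and scale by sigma + epsilon."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = (sum((x - mu) ** 2 for x in xs) / n) ** 0.5
    # epsilon keeps the division finite when sigma is zero.
    return [(x - mu) / (sigma + epsilon) for x in xs]


if __name__ == "__main__":
    theta = 0.0
    for _ in range(50):
        theta = ridge_gd_step(theta, [1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
    print(theta)  # slightly below 2 because the penalty shrinks theta
    print(standardize([5.0, 5.0, 5.0]))
```

Swapping `eta` for `alpha` changes nothing but the name, which is the point of the table: both letters denote the same step size.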

Uppercase & special forms

Letter	Name	Typical ML use
$\Sigma$	capital sigma	Covariance matrix; also the summation operator $\sum$.
$\Pi$	capital pi	The product operator $\prod$.
$\Delta$	capital delta	A finite change/update, e.g. $\Delta w$ (weight update).
$\Theta$	capital theta	Full parameter set; also a tight asymptotic bound $\Theta(n)$.
$\Phi$	capital phi	Matrix of stacked feature maps; the standard-normal CDF.
$\Omega$	capital omega	A regularization penalty $\Omega(\theta)$; the sample space in probability.
$\nabla$	nabla	Gradient operator, e.g. $\nabla_\theta L$, the vector of partial derivatives of $L$ with respect to $\theta$ (an operator symbol, not a Greek letter).
$\pi$	pi	The constant $3.14159\ldots$; a policy $\pi(a\mid s)$ in RL.
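
As a rough illustration of three entries in this table, the sketch below computes the standard-normal CDF $\Phi$, approximates a gradient $\nabla_\theta L$ by central differences, and builds a policy $\pi(a\mid s)$ from action preferences. The function names, the finite-difference step `h`, and the softmax parameterization of the policy are assumptions made for this example; a policy can be defined in many other ways.

```python
import math


def std_normal_cdf(z):
    """Phi(z): the standard-normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def numerical_gradient(loss, theta, h=1e-6):
    """Approximate nabla_theta L: one central difference per coordinate."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += h
        down[i] -= h
        grad.append((loss(up) - loss(down)) / (2.0 * h))
    return grad


def softmax_policy(preferences):
    """pi(a | s) as a softmax over per-action preferences (an assumption)."""
    m = max(preferences)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in preferences]
    total = sum(exps)  # the summation that normalizes the probabilities
    return [e / total for e in exps]


if __name__ == "__main__":
    print(std_normal_cdf(0.0))
    print(numerical_gradient(lambda t: t[0] ** 2 + 3.0 * t[1], [1.0, 2.0]))
    print(softmax_policy([1.0, 2.0, 3.0]))
```

The central difference is only a numerical check; in practice gradients usually come from automatic differentiation.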