Greek Letters
How each Greek letter is conventionally used in machine learning.
Lowercase letters
The letters you meet most in optimization, statistics, and model definitions.
| Letter | Name | Typical ML use |
|---|---|---|
| $\alpha$ | alpha | Learning rate / step size; also a mixing or momentum coefficient. |
| $\beta$ | beta | Regression coefficients; Adam decay rates $\beta_1, \beta_2$. |
| $\gamma$ | gamma | Discount factor in RL; scale parameter in batch/kernel methods. |
| $\delta$ | delta | Small change or error term; the TD-error in RL. |
| $\epsilon$ | epsilon | Tiny constant for numerical stability; noise; exploration rate. |
| $\eta$ | eta | Learning rate (alternative to $\alpha$). |
| $\theta$ | theta | Model parameters/weights; the thing gradient descent optimizes. |
| $\lambda$ | lambda | Regularization strength; eigenvalue; Lagrange multiplier. |
| $\mu$ | mu | Mean of a distribution or a running average (BatchNorm mean). |
| $\rho$ | rho | Correlation coefficient; spectral radius; RMSProp decay. |
| $\sigma$ | sigma | Standard deviation; the sigmoid function $\sigma(x)$; activation nonlinearity. |
| $\phi$ | phi | Feature map $\phi(x)$; also a set of parameters (e.g. an encoder). |
| $\psi$ | psi | A secondary parameter set or basis function. |
| $\omega$ | omega | Angular frequency; sometimes weights in older notation. |
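
Several of the lowercase letters above meet in a single optimizer step. The following is a minimal sketch of a scalar Adam update, written in plain Python rather than taken from any library. The function name `adam_minimize`, the default values, and the quadratic objective are assumptions chosen for illustration; only the roles of `alpha` (step size), `beta1`/`beta2` (decay rates), `epsilon` (stability constant), and `theta` (parameter) come from the table.

```python
import math


def adam_minimize(grad, theta, alpha=0.1, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, steps=500):
    """Run `steps` scalar Adam updates on `theta` and return the result.

    grad    : function returning the gradient at theta
    alpha   : learning rate / step size
    beta1   : decay rate for the running mean of gradients
    beta2   : decay rate for the running mean of squared gradients
    epsilon : small constant that keeps the division numerically stable
    """
    m = 0.0  # first-moment estimate (running mean of gradients)
    v = 0.0  # second-moment estimate (running mean of squared gradients)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
        theta -= alpha * m_hat / (math.sqrt(v_hat) + epsilon)
    return theta


# Hypothetical objective f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta_star = adam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0)
print(theta_star)  # should land close to the minimizer at 3
```

Here `theta` is the quantity being optimized, while `alpha`, `beta1`, `beta2`, and `epsilon` are hyperparameters that stay fixed during the run.
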
Uppercase & special forms
| Letter | Name | Typical ML use |
|---|---|---|
| $\Sigma$ | capital sigma | Covariance matrix; also the summation operator $\sum$. |
| $\Pi$ | capital pi | The product operator $\prod$. |
| $\Delta$ | capital delta | A finite change/update, e.g. $\Delta w$ (weight update). |
| $\Theta$ | capital theta | Full parameter set; also asymptotic order $\Theta(\cdot)$. |
| $\Phi$ | capital phi | Matrix of stacked feature maps; the standard-normal CDF. |
| $\Omega$ | capital omega | A regularization penalty $\Omega(\theta)$; sample space in probability. |
| $\nabla$ | nabla | Gradient operator (a vector of partials). |
| $\pi$ | pi | The constant $\pi \approx 3.14159$; a policy in RL. |
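
To make the nabla and capital delta rows concrete, here is a small sketch, assuming a toy loss and a plain gradient-descent step. It estimates the gradient as a vector of partial derivatives by central differences and then forms the finite update as the negative step size times that gradient. The helper `numerical_gradient`, the loss `x**2 + 3*y`, and the step size are hypothetical choices for illustration, not part of any particular library.

```python
def numerical_gradient(f, theta, h=1e-6):
    """Central-difference estimate of the gradient of f at theta (a list of floats)."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += h
        minus[i] -= h
        # Partial derivative with respect to the i-th coordinate.
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad


def loss(theta):
    """Hypothetical loss L(x, y) = x^2 + 3y, with exact gradient (2x, 3)."""
    x, y = theta
    return x ** 2 + 3 * y


theta = [1.0, 2.0]
alpha = 0.1  # step size

nabla = numerical_gradient(loss, theta)                   # approximates [2.0, 3.0]
delta_theta = [-alpha * g for g in nabla]                 # finite update: -alpha * gradient
theta_new = [t + d for t, d in zip(theta, delta_theta)]   # theta + delta
```

In practice an automatic-differentiation library would supply the gradient; the finite-difference version is used here only because it shows each partial derivative explicitly.
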