Vectors

Why vectors are everywhere in ML

Open almost any machine-learning paper and within the first page you will meet a vector. A word becomes a vector (an embedding). An image becomes a vector of pixel intensities. A user, a molecule, a sentence — all become vectors. The reason is simple: once a thing is a list of numbers, we can measure it, compare it, transform it, and optimize over it with the machinery of linear algebra.

So our first real object of study is the vector, seen three ways at once:

Geometrically, as an arrow with a length and a direction.
Algebraically, as an ordered list of numbers.
Computationally, as a 1-D NumPy array of a fixed shape.

Fluency means switching between these views without friction. That is the goal of this chapter.

Intuition: an arrow and a list are the same thing

Picture the point $(3, 1)$ in the plane. Draw an arrow from the origin to it. That arrow is the vector $\mathbf{a} = (3, 1)$. The two numbers are the instructions "go 3 right, then 1 up." Every arrow from the origin corresponds to exactly one list of numbers, and vice versa. In $n$ dimensions we lose the ability to draw the arrow, but the correspondence still holds: a vector is $n$ instructions, one per axis.

Interactive Lab: Vector Playground

Drag the tips above. Notice that the coordinates (the list) and the arrow (the geometry) always agree — moving one moves the other.

Formal definitions

By convention a vector is a column unless stated otherwise, so $\mathbf{x} \in \mathbb{R}^n$ is really an $n \times 1$ matrix. Its transpose $\mathbf{x}^\top$ is the corresponding $1 \times n$ row vector. This column default matters the moment we multiply by matrices in the next chapter.

Symbol	Meaning	Type	Shape	Role
$\mathbf{x}$	A vector	vector	n×1	variable
$x_i$	The i-th component (a number)	scalar	1	variable
$n$	Dimension (number of components)	integer	1	fixed
$\mathbf{x}^\top$	Transpose (row form)	vector	1×n	variable
$\mathbb{R}^n$	The set of all real n-vectors	set	—	fixed
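The shapes in the table can be checked directly in NumPy. The sketch below uses made-up values and assumes only standard NumPy calls; it shows that a 1-D array, an explicit column, and a row are three different shapes, and that mixing them broadcasts silently rather than raising an error.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # 1-D array: NumPy's usual "vector"
col = x.reshape(-1, 1)          # explicit column vector, shape (n, 1)
row = col.T                     # its transpose, a row vector, shape (1, n)

print(x.shape, col.shape, row.shape)   # (3,) (3, 1) (1, 3)

# Transposing a 1-D array changes nothing: there is only one axis.
print(x.T.shape)                        # (3,)

# Adding a 1-D array to a column broadcasts to a full matrix.
print((x + col).shape)                  # (3, 3)
```

The last line is the practical reason to keep track of which form you hold: the result is an $n \times n$ array, not a vector, and no warning is issued.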

The three basic operations

Two vectors of the same dimension can be added component-by-component, and any vector can be scaled by a number:

$$\mathbf{a} + \mathbf{b} = (a_1 + b_1,\ \ldots,\ a_n + b_n), \qquad c\,\mathbf{a} = (c\,a_1,\ \ldots,\ c\,a_n) \tag{7.1}$$

A linear combination applies both at once: given scalars $c_1, \ldots, c_k$ and vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$,

$$c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k \tag{7.2}$$

Linear combinations are the single most important operation in linear algebra — a neural network layer, a weighted average, and a regression prediction are all linear combinations.
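As a small illustration of eqs. 7.1 and 7.2, the sketch below adds, scales, and combines two vectors. The coefficients `c1` and `c2` are hypothetical values chosen to sum to 1, which makes the linear combination a weighted average.

```python
import numpy as np

a = np.array([3.0, 1.0])
b = np.array([1.0, 3.0])

total = a + b        # component-wise addition (eq. 7.1, left)
scaled = 2.0 * a     # scalar multiplication (eq. 7.1, right)

# Linear combination (eq. 7.2) with illustrative coefficients.
c1, c2 = 0.25, 0.75
combo = c1 * a + c2 * b

print(total)    # [4. 4.]
print(scaled)   # [6. 2.]
print(combo)    # [1.5 2.5]
```

Because the coefficients are non-negative and sum to 1, `combo` lies on the segment between the tips of `a` and `b`, closer to `b`.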

The dot product, three ways

The dot product takes two vectors of the same dimension and returns a single number:

$$\mathbf{a} \cdot \mathbf{b} \;=\; \mathbf{a}^\top \mathbf{b} \;=\; \sum_{i=1}^{n} a_i\, b_i \tag{7.3}$$

That is the algebraic view. There are two more, and holding all three together is what makes the dot product intuitive.

The geometric form

The dot product also equals $\mathbf{a}\cdot\mathbf{b} = \lVert \mathbf{a}\rVert\,\lVert \mathbf{b}\rVert \cos\theta$, where $\lVert\cdot\rVert$ is the Euclidean length and $\theta$ is the angle between the vectors. Sketch of why: place $\mathbf{a}$ and $\mathbf{b}$ tail-to-tail and apply the law of cosines to the triangle they form with $\mathbf{a}-\mathbf{b}$. Expand $\lVert\mathbf{a}-\mathbf{b}\rVert^2$ two ways: as a dot product it is $\lVert\mathbf{a}\rVert^2 + \lVert\mathbf{b}\rVert^2 - 2\,\mathbf{a}\cdot\mathbf{b}$, and by the law of cosines it is $\lVert\mathbf{a}\rVert^2 + \lVert\mathbf{b}\rVert^2 - 2\,\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert\cos\theta$. Cancelling the common terms gives the identity. The consequence is the useful one: for nonzero vectors, the sign of $\mathbf{a}\cdot\mathbf{b}$ tells you whether the vectors point in broadly the same direction ($>0$), are orthogonal ($=0$), or point apart ($<0$).
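The geometric form can be checked numerically. The sketch below reuses the example vector $(3, 1)$ with a second made-up vector $(1, 3)$, confirms the dot-product expansion of $\lVert\mathbf{a}-\mathbf{b}\rVert^2$, and recovers the angle from $\cos\theta$. The `np.clip` call is a defensive assumption to keep rounding error from pushing the cosine outside $[-1, 1]$.

```python
import numpy as np

a = np.array([3.0, 1.0])
b = np.array([1.0, 3.0])

dot = a @ b
norm_a = np.linalg.norm(a)
norm_b = np.linalg.norm(b)

# Dot-product expansion: ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
lhs = np.linalg.norm(a - b) ** 2
rhs = norm_a**2 + norm_b**2 - 2.0 * dot
assert np.isclose(lhs, rhs)

# Solve the geometric form for the angle.
cos_theta = dot / (norm_a * norm_b)
theta_deg = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

print(cos_theta)   # approximately 0.6
print(theta_deg)   # approximately 53.13
```

The cosine is positive, so the two vectors point in broadly the same direction, which matches the sign rule stated above.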

The third view is projection: $\mathbf{a}\cdot\mathbf{b}$ is the signed length of the shadow that $\mathbf{a}$ casts along $\mathbf{b}$, multiplied by $\lVert\mathbf{b}\rVert$. In other words, it measures how much of $\mathbf{a}$ lies along $\mathbf{b}$. Toggle "Show projection" in the lab above to see it.
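A minimal sketch of the projection reading, using the same illustrative vectors as before: dividing $\mathbf{b}$ by its length gives a unit vector, and the dot product with that unit vector is the signed length of $\mathbf{a}$ along $\mathbf{b}$. The names `b_hat` and `along_b` are hypothetical.

```python
import numpy as np

a = np.array([3.0, 1.0])
b = np.array([1.0, 3.0])

b_hat = b / np.linalg.norm(b)   # unit vector pointing along b
along_b = a @ b_hat             # signed length of a measured along b

# Rescaling by the length of b recovers the plain dot product.
assert np.isclose(along_b * np.linalg.norm(b), a @ b)

print(along_b)   # approximately 1.897
```

A negative value of `along_b` would mean that `a` points against `b`, which is the projection view of a negative dot product.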

ML use case: a neuron is a dot product

A single artificial neuron computes exactly

$$z = \mathbf{w}^\top \mathbf{x} + b = \sum_{i=1}^{n} w_i x_i + b \tag{7.4}$$

where $\mathbf{x} \in \mathbb{R}^n$ is the input (a feature vector), $\mathbf{w} \in \mathbb{R}^n$ are the learned weights, and $b \in \mathbb{R}$ is a bias. The dot product $\mathbf{w}^\top\mathbf{x}$ is a weighted sum of the features — the weights say how much each feature matters. Every dense layer of every neural network is a stack of these dot products. Understand the dot product and you understand the arithmetic core of deep learning.
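Eq. 7.4 translates almost symbol for symbol into NumPy. The sketch below uses made-up features, weights, and biases; the list `weights` and the two-neuron layer are illustrative assumptions, and the activation function that usually follows is left out.

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0])   # feature vector (made-up values)
w = np.array([2.0, 0.0, -0.5])   # weights (made-up values)
b = 0.1                          # bias

z = w @ x + b                    # eq. 7.4: weighted sum plus bias
print(z)                         # approximately 0.1

# A "layer" of two neurons: one weight vector and one bias per neuron.
weights = [np.array([2.0, 0.0, -0.5]), np.array([1.0, 1.0, 1.0])]
biases = [0.1, -0.5]
layer_out = np.array([w_k @ x + b_k for w_k, b_k in zip(weights, biases)])
print(layer_out)                 # approximately [0.1, 1.0]
```

The second weight in `w` is zero, so the second feature has no effect on `z` however large it is. In practice the list comprehension is replaced by a single matrix-vector product.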

Likewise, the similarity between two embeddings ("how alike are these two words / images / users?") is almost always a (normalized) dot product. We make that precise in the next chapter with cosine similarity.

NumPy implementation

In NumPy a vector is a 1-D array. Its shape is a one-element tuple (n,). Let us implement the dot product two ways — an explicit loop and the vectorized call — and confirm they agree. Run it:

dot_product.py

import numpy as np

a = np.array([3.0, 1.0])
b = np.array([1.0, 3.0])

# Shapes: a vector is 1-D. Its shape is (n,), NOT (n, 1).
print("a shape:", a.shape)          # (2,)

# 1) Manual dot product with a loop (what the sum in eq. 7.3 says literally)
manual = 0.0
for i in range(a.shape[0]):
    manual += a[i] * b[i]

# 2) Vectorized: the @ operator (preferred) or np.dot
vectorized = a @ b

print("manual     =", manual)       # 6.0
print("vectorized =", vectorized)   # 6.0
assert np.isclose(manual, vectorized), "the two must agree"
print("agree OK")

The vectorized version is not just shorter — for large n it is dramatically faster, because NumPy runs the multiply-and-add loop in optimized C over contiguous memory instead of in the Python interpreter. Prefer vectorized operations; reach for an explicit loop only to explain what an operation means.
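One way to see the speed difference is to time both versions. The sketch below is a rough measurement, not a benchmark: the vector length, the seed, the repeat count, and the helper name `dot_loop` are all arbitrary choices, and the absolute timings depend on the machine.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
n = 200_000
a = rng.standard_normal(n)
b = rng.standard_normal(n)


def dot_loop(u, v):
    """Dot product as an explicit Python loop (slow, for illustration)."""
    total = 0.0
    for i in range(u.shape[0]):
        total += u[i] * v[i]
    return total


# The two versions must agree up to floating-point rounding.
assert np.isclose(dot_loop(a, b), a @ b)

# Average seconds per call over a few repetitions.
t_loop = timeit.timeit(lambda: dot_loop(a, b), number=3) / 3
t_vec = timeit.timeit(lambda: a @ b, number=3) / 3

print(f"loop:       {t_loop:.6f} s per call")
print(f"vectorized: {t_vec:.6f} s per call")
```

The two results are compared with `np.isclose` rather than `==` because the loop and the vectorized call may add the terms in a different order, which changes the rounding slightly.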

Interactive experiment

Return to the Vector Playground and build intuition for these facts by dragging:

Make $\mathbf{a}\cdot\mathbf{b}$ as large as possible with fixed lengths: you will find the vectors must point the same way ($\theta = 0$).
Make the dot product zero — the vectors become perpendicular.
Make it negative — they point more than 90° apart.

These three regimes are exactly the sign behavior predicted by the geometric form.
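The same experiment can be run without the playground. The sketch below fixes both lengths (the values 2 and 1.5 are arbitrary), rotates $\mathbf{b}$ through a few angles, and records the dot product at each one.

```python
import numpy as np

a = np.array([2.0, 0.0])   # fixed vector of length 2 along the x-axis
length_b = 1.5             # fixed length for b

dots = {}
for deg in (0, 45, 90, 135, 180):
    theta = np.radians(deg)
    b = length_b * np.array([np.cos(theta), np.sin(theta)])
    dots[deg] = a @ b
    print(deg, round(dots[deg], 3))

# Largest at 0 degrees, zero (up to rounding) at 90, negative beyond 90.
assert max(dots, key=dots.get) == 0
assert abs(dots[90]) < 1e-12
assert dots[135] < 0 and dots[180] < 0
```

With these lengths the values follow $3\cos\theta$, so the printed column falls from 3 at $0°$ to $-3$ at $180°$.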

Summary

A vector in $\mathbb{R}^n$ is an ordered list of $n$ numbers, equivalently an arrow with length and direction. Column form is the default.
Vectors of equal dimension add component-wise and scale by a number; combining both gives linear combinations, the workhorse operation of the field.
The dot product $\mathbf{a}\cdot\mathbf{b} = \sum_i a_i b_i$ returns a scalar and has three readings: sum of products, $\lVert \mathbf{a}\rVert\,\lVert \mathbf{b}\rVert\cos\theta$, and a projection. Its sign encodes directional agreement.
A neuron $z = \mathbf{w}^\top\mathbf{x} + b$ is a dot product plus a bias — the arithmetic core of neural networks.
In NumPy a vector is a 1-D array of shape (n,); use a @ b for the dot product and keep (n,) distinct from (n, 1).

Active recall

Answer from memory before checking the lesson:

State the dot product of $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$ as a sum, and say what shape the result has.
Two nonzero vectors have a dot product of $0$. What is the angle between them?
Why is a @ b preferred over a Python for loop for the dot product?
What is the shape of a NumPy vector np.array([1, 2, 3]) — is it (3,) or (3, 1)? Why does the distinction matter?

Exercises

Level A: Recall & basic calculation

Level A | Hand calculation | ch07-A1

Compute a dot product

Let $\mathbf{a} = (2, -1, 3)$ and $\mathbf{b} = (4, 5, -2)$. Compute $\mathbf{a} \cdot \mathbf{b}$.

Level A | Hand calculation | ch07-A2

Vector addition

Compute $\mathbf{a} + \mathbf{b}$ for $\mathbf{a} = (1, 2)$ and $\mathbf{b} = (3, -5)$. Enter the result as x, y.

Level A | Hand calculation | ch07-A3

Scalar multiplication

Compute $-2\,\mathbf{v}$ for $\mathbf{v} = (3, -1, 0)$. Enter as x, y, z.

Level A | Hand calculation | ch07-A4

Length of a vector

Compute the Euclidean length $\lVert \mathbf{a} \rVert = \sqrt{\mathbf{a}\cdot\mathbf{a}}$ for $\mathbf{a} = (3, 4)$.

Level A | Hand calculation | ch07-A5

A linear combination

With $\mathbf{v}_1 = (1, 0)$ and $\mathbf{v}_2 = (0, 1)$, compute $3\mathbf{v}_1 + (-2)\mathbf{v}_2$. Enter as x, y.

Level A | Equation interpretation | ch07-A6

Orthogonality check

Two vectors are orthogonal when their dot product is which value?

Level B: Conceptual understanding

Level B | Equation interpretation | ch07-B1

Sign of the dot product

Two nonzero vectors have a negative dot product. Which is true about the angle $\theta$ between them?

Level B | ML application | ch07-B2

Why the weighted sum?

In a neuron $z = \mathbf{w}^\top\mathbf{x} + b$, explain in one or two sentences what a large positive weight $w_i$ means about feature $x_i$'s influence on $z$, and what a weight near zero means.

Level B | Shape reasoning | ch07-B3

Shape of a dot product

If $\mathbf{a}, \mathbf{b} \in \mathbb{R}^{768}$ (say, two word embeddings), what is the shape of $\mathbf{a}\cdot\mathbf{b}$?

Level B | Proof-style reasoning | ch07-B4

Commutativity of the dot product

Show that the dot product is commutative: $\mathbf{a}\cdot\mathbf{b} = \mathbf{b}\cdot\mathbf{a}$ for all $\mathbf{a},\mathbf{b}\in\mathbb{R}^n$.

Level C: Derivation & implementation

Level C | NumPy implementation | ch07-C1

Implement cosine similarity

Implement cosine_similarity(a, b) for two 1-D NumPy arrays, returning $\dfrac{\mathbf{a}\cdot\mathbf{b}}{\lVert \mathbf{a}\rVert\,\lVert \mathbf{b}\rVert}$. Verify it returns approximately 1.0 for two parallel vectors, then print ok.

Level C | NumPy implementation | ch07-C2

Loop vs vectorized dot product

Write dot_loop(a, b) (an explicit Python loop) and dot_vec(a, b) (using @). Generate a, b of length 100000 with a fixed seed, confirm the results agree with np.isclose, and print match. In a comment, state which is faster and why.

Level C | Derivation | ch07-C3

Derive the projection formula

The projection of $\mathbf{a}$ onto $\mathbf{b}$ is the vector $\mathbf{p} = t\,\mathbf{b}$ such that the error $\mathbf{a} - \mathbf{p}$ is orthogonal to $\mathbf{b}$. Derive $t$.

Level D: Research-thinking challenge

Level D | Paper-reading practice | ch07-D1

Why cosine, not dot product, for similarity?

Retrieval systems and embedding papers usually rank by cosine similarity rather than the raw dot product. Give a concrete example of vectors where the raw dot product is misleading but cosine is not, explain what property cosine adds, then name one situation where the raw dot product is the right choice.