Vectors
Arrows, lists, feature vectors, and embeddings
Prerequisites
Learning objectives
- Hold the geometric and the list-of-numbers views simultaneously
- Compute sums, scalar multiples, and linear combinations
- Compute the dot product and interpret it three ways
- See feature vectors and embeddings as points in ℝⁿ
Why vectors are everywhere in ML
Open almost any machine-learning paper and within the first page you will meet a vector. A word becomes a vector (an embedding). An image becomes a vector of pixel intensities. A user, a molecule, a sentence — all become vectors. The reason is simple: once a thing is a list of numbers, we can measure it, compare it, transform it, and optimize over it with the machinery of linear algebra.
So our first real object of study is the vector, seen three ways at once:
- Geometrically, as an arrow with a length and a direction.
- Algebraically, as an ordered list of numbers.
- Computationally, as a 1-D NumPy array of a fixed shape.
Fluency means switching between these views without friction. That is the goal of this chapter.
Intuition: an arrow and a list are the same thing
Picture the point $(3, 1)$ in the plane. Draw an arrow from the origin to it. That arrow is the vector $\mathbf{v} = (3, 1)$. The two numbers are the instructions "go 3 right, then 1 up." Every arrow from the origin corresponds to exactly one list of numbers, and vice versa. In $n$ dimensions we lose the ability to draw the arrow, but the correspondence still holds: a vector is a list of instructions, one per axis.
Drag the tips above. Notice that the coordinates (the list) and the arrow (the geometry) always agree — moving one moves the other.
Formal definitions
By convention a vector is a column unless stated otherwise, so $\mathbf{v} \in \mathbb{R}^n$ is really an $n \times 1$ matrix. Its transpose $\mathbf{v}^\top$ is the corresponding $1 \times n$ row vector. This column default matters the moment we multiply by matrices in the next chapter.
| Symbol | Meaning | Type | Shape | Role |
|---|---|---|---|---|
| $\mathbf{v}$ | A vector | vector | n×1 | variable |
| $v_i$ | The i-th component (a number) | scalar | 1 | variable |
| $n$ | Dimension (number of components) | integer | 1 | fixed |
| $\mathbf{v}^\top$ | Transpose (row form) | vector | 1×n | variable |
| $\mathbb{R}^n$ | The set of all real n-vectors | set | n/a | fixed |
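The column convention has a practical counterpart in NumPy, where a 1-D array of shape `(n,)` and an explicit column of shape `(n, 1)` are different objects. The sketch below is illustrative only; the variable names `v`, `col`, and `row` are assumptions chosen for this example.

```python
import numpy as np

v = np.array([3.0, 1.0])      # 1-D array, shape (2,): NumPy's usual "vector"
col = v.reshape(-1, 1)        # explicit column vector, shape (2, 1)
row = col.T                   # its transpose, a row vector of shape (1, 2)

print(v.shape, col.shape, row.shape)   # (2,) (2, 1) (1, 2)

# Transposing a 1-D array changes nothing: there is no second axis to swap.
print(v.T.shape)                        # (2,)

# Row times column is a 1x1 matrix holding 3*3 + 1*1 = 10.
print(row @ col)                        # [[10.]]
```

Most NumPy code keeps vectors 1-D and reshapes to a column only when a true $n \times 1$ matrix is required.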
The three basic operations
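Two vectors of the same dimension can be added component-by-component, and any vector can be scaled by a number:

$$\mathbf{u} + \mathbf{v} = (u_1 + v_1, \dots, u_n + v_n), \qquad c\,\mathbf{v} = (c\,v_1, \dots, c\,v_n).$$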
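A linear combination applies both at once: given scalars $c_1, \dots, c_k$ and vectors $\mathbf{v}_1, \dots, \mathbf{v}_k$ of the same dimension,

$$c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \dots + c_k \mathbf{v}_k.$$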
Linear combinations are arguably the most important operation in linear algebra: a weighted average, a linear-regression prediction, and the pre-activation output of a neural network layer are all linear combinations.
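As a small illustration, the three operations map directly onto NumPy arithmetic. The vectors and coefficients below are made-up values, and stacking the vectors as matrix columns is just one convenient way to write the same combination.

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])

total = u + v            # component-wise sum:  [4., 1.]
scaled = 2.0 * v         # scalar multiple:     [6., -2.]

# Linear combination c1*u + c2*v with c1 = 2 and c2 = -1.
c = np.array([2.0, -1.0])
combo = c[0] * u + c[1] * v          # [2 - 3, 4 + 1] = [-1., 5.]

# The same combination as a matrix-vector product:
# u and v become the columns of V, and c holds the coefficients.
V = np.column_stack([u, v])          # shape (2, 2)
combo_matrix = V @ c

print(total, scaled, combo, combo_matrix)
```

Addition requires equal shapes here; mixing a `(2,)` array with a `(2, 1)` array would broadcast to a `(2, 2)` result instead of raising an error.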
The dot product, three ways
The dot product takes two vectors of the same dimension and returns a single number:

$$\mathbf{a} \cdot \mathbf{b} = \mathbf{a}^\top \mathbf{b} = \sum_{i=1}^{n} a_i b_i.$$
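That is the algebraic view. There are two more, and holding all three together is what makes the dot product intuitive.
The second view is geometric: $\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\|\,\|\mathbf{b}\| \cos\theta$, where $\theta$ is the angle between the two vectors.
The third view is projection: $\mathbf{a} \cdot \mathbf{b}$ measures how much of $\mathbf{a}$ lies along $\mathbf{b}$ (scaled by $\|\mathbf{b}\|$). Toggle "Show projection" in the lab above to see it.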
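The three readings can be checked numerically on a small case. The sketch below assumes two hand-picked 2-D vectors, with `b` placed along the x-axis so that the part of `a` lying along `b` can be read off directly as the x-component of `a`.

```python
import numpy as np

a = np.array([3.0, 1.0])
b = np.array([2.0, 0.0])     # lies along the x-axis

# View 1 (algebraic): sum of component-wise products.
algebraic = np.sum(a * b)                          # 3*2 + 1*0 = 6

# View 2 (geometric): |a| |b| cos(theta), with theta measured
# independently from each arrow's direction.
theta = np.arctan2(a[1], a[0]) - np.arctan2(b[1], b[0])
geometric = np.linalg.norm(a) * np.linalg.norm(b) * np.cos(theta)

# View 3 (projection): the length of a's shadow on b, times |b|.
# Because b points along the x-axis, the shadow is a's x-component.
shadow = a[0]
projection = shadow * np.linalg.norm(b)            # 3 * 2 = 6

print(algebraic, geometric, projection)
```

All three values agree up to floating-point rounding.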
ML use case: a neuron is a dot product
A single artificial neuron computes exactly

$$z = \mathbf{w} \cdot \mathbf{x} + b = \sum_{i=1}^{n} w_i x_i + b,$$

where $\mathbf{x}$ is the input (a feature vector), $\mathbf{w}$ are the learned weights, and $b$ is a bias. The dot product is a weighted sum of the features: the weights say how much each feature matters. Every dense layer of every neural network is a stack of these dot products. Understand the dot product and you understand the arithmetic core of deep learning.
Likewise, the similarity between two embeddings ("how alike are these two words / images / users?") is almost always a (normalized) dot product. We make that precise in the next chapter with cosine similarity.
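A minimal sketch of this arithmetic, with made-up numbers: one neuron is a dot product plus a bias, and a dense layer stacks several weight vectors as the rows of a matrix. The names `W` and `biases` and all values are illustrative assumptions, and no activation function is applied.

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0])     # feature vector (illustrative values)
w = np.array([0.8, 0.1, -0.3])     # weights of one neuron
b = 0.25                           # bias of that neuron

# One neuron: weighted sum of the features plus the bias.
z = w @ x + b                      # 0.4 - 0.1 - 0.6 + 0.25 = -0.05

# A dense layer with two neurons: each row of W is one neuron's weights.
W = np.array([[0.8, 0.1, -0.3],
              [0.0, 1.0,  0.5]])
biases = np.array([0.25, -1.0])
layer_out = W @ x + biases         # one dot product per row

print(z, layer_out)
```

The first entry of `layer_out` equals `z`, since the first row of `W` is the same weight vector `w`.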
NumPy implementation
In NumPy a vector is a 1-D array. Its shape is a one-element tuple (n,). Let
us implement the dot product two ways — an explicit loop and the vectorized call —
and confirm they agree. Run it:
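A minimal sketch of that comparison follows. The helper name `dot_loop` and the sample vectors are illustrative assumptions.

```python
import numpy as np

def dot_loop(a, b):
    """Dot product as an explicit sum of products (for explanation only)."""
    if a.shape != b.shape:
        raise ValueError("vectors must have the same shape")
    total = 0.0
    for a_i, b_i in zip(a, b):
        total += a_i * b_i
    return total

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

print(a.shape)                              # (3,)
print(dot_loop(a, b))                       # 1*4 + 2*5 + 3*6 = 32.0
print(a @ b)                                # 32.0
print(np.isclose(dot_loop(a, b), a @ b))    # True
```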
The vectorized version is not just shorter — for large n it is dramatically
faster, because NumPy runs the multiply-and-add loop in optimized C over
contiguous memory instead of in the Python interpreter. Prefer vectorized
operations; reach for an explicit loop only to explain what an operation means.
Interactive experiment
Return to the Vector Playground and build intuition for these facts by dragging:
- Make $\mathbf{a} \cdot \mathbf{b}$ as large as possible with fixed lengths: you will find the vectors must point the same way ($\theta = 0$).
- Make the dot product zero — the vectors become perpendicular.
- Make it negative — they point more than 90° apart.
These three regimes are exactly the sign behavior predicted by the geometric form.
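For readers without the playground at hand, the same experiment can be sketched in code. This example assumes a fixed vector `a` along the x-axis and a second vector `b` of length 3 rotated to a chosen angle; the angles sampled are arbitrary.

```python
import numpy as np

a = np.array([2.0, 0.0])     # fixed vector along the x-axis, length 2

results = {}
for theta_deg in (0, 60, 90, 135, 180):
    theta = np.radians(theta_deg)
    # b has length 3 and makes an angle theta with a.
    b = 3.0 * np.array([np.cos(theta), np.sin(theta)])
    results[theta_deg] = float(a @ b)
    print(f"theta = {theta_deg:3d} deg   a . b = {results[theta_deg]: .4f}")
```

The printed values follow $\|\mathbf{a}\|\,\|\mathbf{b}\| \cos\theta = 6\cos\theta$: largest at 0°, zero (up to rounding) at 90°, and negative beyond 90°.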
Summary
- A vector in $\mathbb{R}^n$ is an ordered list of $n$ numbers, equivalently an arrow with length and direction. Column form is the default.
- Vectors of equal dimension add component-wise and scale by a number; combining both gives linear combinations, the workhorse operation of the field.
- The dot product returns a scalar and has three readings: sum of products, $\|\mathbf{a}\|\,\|\mathbf{b}\| \cos\theta$, and a projection. Its sign encodes directional agreement.
- A neuron is a dot product plus a bias — the arithmetic core of neural networks.
- In NumPy a vector is a 1-D array of shape `(n,)`; use `a @ b` for the dot product and keep `(n,)` distinct from `(n, 1)`.
Active recall
Answer from memory before checking the lesson:
- State the dot product of $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$ as a sum, and say what shape the result has.
- Two nonzero vectors have a dot product of $0$. What is the angle between them?
- Why is `a @ b` preferred over a Python `for` loop for the dot product?
- What is the shape of a NumPy vector `np.array([1, 2, 3])`: is it `(3,)` or `(3, 1)`? Why does the distinction matter?
Exercises
Level A: Recall & basic calculation
Compute a dot product
Let and . Compute .
Vector addition
Compute for and . Enter the result as x, y.
Scalar multiplication
Compute for . Enter as x, y, z.
Length of a vector
Compute the Euclidean length for .
A linear combination
With and , compute . Enter as x, y.
Orthogonality check
Two vectors are orthogonal when their dot product is which value?
Level B: Conceptual understanding
Sign of the dot product
Two nonzero vectors have a negative dot product. Which is true about the angle between them?
Why the weighted sum?
In a neuron $z = \mathbf{w} \cdot \mathbf{x} + b$, explain in one or two sentences what a large positive weight $w_i$ means about feature $x_i$'s influence on $z$, and what a weight near zero means.
Shape of a dot product
If $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$ (say, two word embeddings), what is the shape of $\mathbf{a} \cdot \mathbf{b}$?
Commutativity of the dot product
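Show that the dot product is commutative: $\mathbf{a} \cdot \mathbf{b} = \mathbf{b} \cdot \mathbf{a}$ for all $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$.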
Level C: Derivation & implementation
Implement cosine similarity
Implement `cosine_similarity(a, b)` for two 1-D NumPy arrays, returning $\dfrac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\|\,\|\mathbf{b}\|}$. Verify it returns approximately 1.0 for two parallel vectors, then print `ok`.
Loop vs vectorized dot product
Write `dot_loop(a, b)` (an explicit Python loop) and `dot_vec(a, b)` (using `@`). Generate `a`, `b` of length 100000 with a fixed seed, confirm the results agree with `np.isclose`, and print `match`. In a comment, state which is faster and why.
Derive the projection formula
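The projection of $\mathbf{a}$ onto $\mathbf{b}$ is the vector $\mathbf{p} = c\,\mathbf{b}$ such that the error $\mathbf{a} - \mathbf{p}$ is orthogonal to $\mathbf{b}$. Derive $\mathbf{p} = \dfrac{\mathbf{a} \cdot \mathbf{b}}{\mathbf{b} \cdot \mathbf{b}}\,\mathbf{b}$.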
Level D: Research-thinking challenge
Why cosine, not dot product, for similarity?
Retrieval systems and embedding papers usually rank by cosine similarity rather than the raw dot product. Give a concrete example of vectors where the raw dot product is misleading but cosine is not, explain what property cosine adds, then name one situation where the raw dot product is the right choice.