Week 8 Checkpoint

Critical points and the second-derivative test, the gradient-descent update and learning-rate regimes, MSE, the linear-regression gradient $\frac{\partial L}{\partial w}=\frac{2}{n}X^\top(\hat{y}-y)$, gradient checking, and the closed-form normal equations.

  1. For $f(x)=x^3-3x$, find the critical points and classify $x=1$ with the second-derivative test.

  2. When minimizing $f(w)=w^2$ by gradient descent from $w=1$ with learning rate $\eta=0.1$, the update $w\leftarrow w-\eta f'(w)$ gives:

  3. During gradient descent you observe the loss oscillating and increasing over iterations. The most likely cause is:

  4. For predictions $\hat{y}=(3,5)$ and targets $y=(2,5)$, the mean squared error $\frac{1}{n}\sum_i(\hat{y}_i-y_i)^2$ is:

  5. For linear regression with $\hat{y}=Xw$ and $L=\frac{1}{n}\lVert \hat{y}-y\rVert^2$ (with $X\in\mathbb{R}^{n\times d}$), the gradient with respect to $w$ is:

  6. Gradient checking compares an analytic gradient to a central-difference estimate. Which outcome indicates your analytic gradient is correct?

  7. The closed-form (normal-equations) solution minimizing the linear-regression MSE is:
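As a companion to the gradient-checking question, the sketch below compares the gradient from the summary, $\frac{2}{n}X^\top(\hat{y}-y)$, against a central-difference estimate of the same MSE loss. It is a minimal illustration, not a reference implementation: the data `X`, `y`, the point `w`, the step `eps`, and all function names are assumptions made up for this example, and plain Python lists stand in for matrices.

```python
def residuals(X, y, w):
    """Return Xw - y, with X given as a list of rows."""
    return [sum(x_ij * w_j for x_ij, w_j in zip(row, w)) - y_i
            for row, y_i in zip(X, y)]


def mse_loss(X, y, w):
    """L(w) = (1/n) * ||Xw - y||^2."""
    r = residuals(X, y, w)
    return sum(r_i * r_i for r_i in r) / len(X)


def analytic_grad(X, y, w):
    """(2/n) * X^T (Xw - y), the gradient quoted in the summary."""
    r = residuals(X, y, w)
    n = len(X)
    return [2.0 / n * sum(row[j] * r_i for row, r_i in zip(X, r))
            for j in range(len(w))]


def numeric_grad(X, y, w, eps=1e-5):
    """Central difference per coordinate: (L(w+eps*e_j) - L(w-eps*e_j)) / (2*eps)."""
    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[j] += eps
        w_minus[j] -= eps
        grad.append((mse_loss(X, y, w_plus) - mse_loss(X, y, w_minus)) / (2 * eps))
    return grad


def relative_error(a, b):
    """||a - b|| / (||a|| + ||b||), a scale-free measure of disagreement."""
    diff = sum((a_i - b_i) ** 2 for a_i, b_i in zip(a, b)) ** 0.5
    scale = sum(a_i ** 2 for a_i in a) ** 0.5 + sum(b_i ** 2 for b_i in b) ** 0.5
    return diff / scale if scale > 0 else 0.0


# Hypothetical toy data (n = 3 samples, d = 2 features), chosen only for illustration.
X = [[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]]
y = [1.0, 2.0, 3.0]
w = [0.3, -0.7]

g_a = analytic_grad(X, y, w)
g_n = numeric_grad(X, y, w)
err = relative_error(g_a, g_n)
print("analytic:", g_a)
print("numeric: ", g_n)
print("relative error:", err)
```

Using a relative rather than absolute error keeps the comparison meaningful when the gradient entries are very large or very small. Introducing a deliberate bug (for example, dropping the factor of 2 in `analytic_grad`) and rerunning shows how the check reacts to an incorrect gradient.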