Description
Short Answer and "True or False" Conceptual Questions

These questions should be answerable without referring to external materials. Briefly justify your answers with a few words.


[2 points] What is the bias-variance tradeoff?



[2 points] What typically happens to bias and variance when the model complexity increases/decreases?



[1 point] True or False: A learning algorithm will always generalize better if we use fewer features to represent our data.



[2 points] True or False: Hyperparameters should be tuned on the test set. Explain your choice and detail a procedure for hyperparameter tuning.



[1 point] True or False: The training error of a function on the training set provides an overestimate of the true error of that function.

What to Submit:

Parts c-e: True or False

Parts a-e: True or False with a brief (2-3 sentence) explanation
Maximum Likelihood Estimation (MLE)

You’re the Reign FC manager, and the team is five games into its 2021 season. The number of goals scored by the team in each game so far is given below:
[2, 4, 6, 0, 1]
Let’s call these scores x_{1}, …, x_{5}. Based on your (assumed i.i.d.) data, you’d like to build a model to understand how many goals the Reign are likely to score in their next game. You decide to model the number of goals scored per game using a Poisson distribution. Recall that the Poisson distribution with parameter λ assigns every nonnegative integer x = 0, 1, 2, … a probability given by
Poi(x | λ) = e^{−λ} λ^{x} / x!
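As a quick sanity check on the PMF above, it can be evaluated directly (a minimal sketch; the helper name `poisson_pmf` is ours, not part of the assignment code):

```python
import math

def poisson_pmf(x, lam):
    """Probability that a Poisson(lam) random variable equals the nonnegative integer x."""
    return math.exp(-lam) * lam**x / math.factorial(x)

# Sanity check: the probabilities over a long prefix of the support should sum to ~1.
total = sum(poisson_pmf(x, 2.6) for x in range(100))
```

Summing the PMF over a long prefix of the support is a cheap way to catch sign or factorial mistakes before using the formula in part b.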
a. [5 points] Derive an expression for the maximum-likelihood estimate of the parameter λ governing the Poisson distribution, in terms of the goal counts for the first n games: x_{1}, …, x_{n}. (Hint: remember that the log of the likelihood has the same maximizer as the likelihood function itself.)

b. [2 points] Give a numerical estimate of λ after the first five games. Given this λ, what is the probability that the Reign score 6 goals in their next game?
What to Submit:

Part a: An expression for the MLE of λ after n games and the relevant derivation

Part b: A numerical estimate for λ and the probability that the Reign score 6 goals in their next game.
Polynomial Regression
Relevant Files

• polyreg.py
• test_polyreg_learningCurve.py
• test_polyreg_univariate.py
• linreg_closedform.py
• data/polydata.dat


[10 points] Recall that polynomial regression learns a function h_θ(x) = θ_{0} + θ_{1}x + θ_{2}x^{2} + … + θ_{d}x^{d}, where d represents the polynomial’s highest degree. We can equivalently write this in the form of a linear model with d features

h_θ(x) = θ_{0} + θ_{1}φ_{1}(x) + θ_{2}φ_{2}(x) + … + θ_{d}φ_{d}(x) ,    (1)

using the basis expansion φ_{j}(x) = x^{j}. Notice that, with this basis expansion, we obtain a linear model where the features are various powers of the single univariate x. We’re still solving a linear regression problem, but are fitting a polynomial function of the input.
Implement regularized polynomial regression in polyreg.py. You may implement it however you like, using gradient descent or a closed-form solution. However, I would recommend the closed-form solution since the data sets are small; for this reason, we’ve included an example closed-form implementation of linear regression in linreg_closedform.py. Please follow the API below. Note that all matrices are actually 2D numpy arrays in the implementation.

__init__(degree=1, regLambda=1E-8): constructor with arguments of d and λ

fit(X,Y): method to train the polynomial regression model

predict(X): method to use the trained polynomial regression model for prediction

polyfeatures(X, degree): expands the given n × 1 matrix X into an n × d matrix of polynomial features of degree d. Note that the returned matrix will not include the zeroth power.
Note that the polyfeatures(X, degree) function maps the original univariate data into its higher-order powers. Specifically, X will be an n × 1 matrix (X ∈ R^{n×1}), and this function will return the polynomial expansion of this data, an n × d matrix. Note that this function will not add in the zeroth-order feature (i.e., x_{0} = 1). You should add the x_{0} feature separately, outside of this function, before training the model.
By not including the x_{0} column in polyfeatures(), the function stays more general, so it could be applied to multivariate data as well. (If it did add the x_{0} feature, we’d end up with multiple columns of 1’s for multivariate data.)
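One way the expansion described above could look (a sketch following the stated API, not the required solution):

```python
import numpy as np

def polyfeatures(X, degree):
    """Expand an n x 1 array X into an n x degree matrix [X, X^2, ..., X^degree].
    The zeroth-order (all-ones) column is intentionally omitted, per the spec above."""
    X = np.asarray(X).reshape(-1, 1)
    return np.hstack([X**d for d in range(1, degree + 1)])

# Example: two data points expanded to degree 3.
expanded = polyfeatures(np.array([[2.0], [3.0]]), 3)
```

Here `expanded` has one column per power 1 through 3, with no bias column, which is exactly what lets the caller add the column of ones separately.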
Also, notice that the resulting features will be badly scaled if we use them in raw form. For example, with a polynomial of degree d = 8 and x = 20, the basis expansion yields x^{1} = 20 while x^{8} = 2.56 × 10^{10}, an absolutely huge difference in range. Consequently, we will need to standardize the data before solving linear regression. In fit(), after you perform the polynomial feature expansion, you should standardize the values within each degree by subtracting their mean and dividing the result by their standard deviation. You’ll need to apply the same standardization transformation in predict() before applying the model to new data. After standardization, be sure to add the bias term before the column with the lowest degree (i.e., add a column of ones as the leftmost column of the resulting matrix). You can achieve this with the pad() function in numpy.

[Figure 1: Fit of polynomial regression with λ = 0 and d = 8]
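The standardize-then-pad step might be sketched as follows (the helper name `standardize_and_add_bias` is our own illustration; in the assignment, the stored mean and standard deviation would live in fit() and be reused in predict()):

```python
import numpy as np

def standardize_and_add_bias(Phi, mean, std):
    """Standardize each polynomial-feature column, then prepend a bias column of ones."""
    Z = (Phi - mean) / std
    # np.pad with constant value 1 prepends the bias column on the left:
    # pad 0 rows top/bottom, and 1 column on the left / 0 on the right.
    return np.pad(Z, ((0, 0), (1, 0)), mode="constant", constant_values=1)

# Toy feature matrix with badly scaled columns (e.g. x and x^2).
Phi = np.array([[20.0, 400.0], [10.0, 100.0]])
mean, std = Phi.mean(axis=0), Phi.std(axis=0)
Xb = standardize_and_add_bias(Phi, mean, std)
```

After this step, every feature column has mean 0 and unit scale, and the leftmost column is all ones, matching the layout the closed-form solver expects.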
Run test_polyreg_univariate.py to test your implementation, which will plot the learned function. In this case, the script fits a polynomial of degree d = 8 with no regularization (λ = 0). From the plot, we see that the function fits the data well, but will not generalize well to new data points. Try increasing the amount of regularization, and in 1-2 sentences, describe the resulting effect on the function (you may also provide an additional plot to support your analysis).

Writeup: A description of the resulting effect of increasing regularization.

Code: Implement all functions that raise NotImplementedError in polyreg.py and submit on Gradescope.
Ridge Regression on MNIST

In this problem we will implement a regularized least squares classifier for the MNIST data set. The task is to classify handwritten images of digits from 0 to 9.
You are NOT allowed to use any of the pre-built classifiers in sklearn. Feel free to use any method from numpy or scipy. Remember: if you are inverting a matrix in your code, you are probably doing something wrong (Hint: look at scipy.linalg.solve).
Each example has features x_{i} ∈ R^{d} (with d = 28 × 28 = 784) and label z_{i} ∈ {0, …, 9}. You can visualize a single example x_{i} with imshow after reshaping it to its original 28 × 28 image shape (and noting that the label z_{i} is accurate). We wish to learn a predictor f̂ that takes as input a vector in R^{d} and outputs an index in {0, …, 9}. We define our training and testing classification error on a predictor f as





ε̂_{train}(f) = (1 / N_{train}) Σ_{(x,z) ∈ Training Set} 1{f(x) ≠ z}

ε̂_{test}(f) = (1 / N_{test}) Σ_{(x,z) ∈ Test Set} 1{f(x) ≠ z}




We will use one-hot encoding of the labels: for each observation (x, z), the original label z ∈ {0, …, 9} is mapped to the standard basis vector e_{z+1}, where e_{i} is a vector of size k containing all zeros except for a 1 in the i-th position (positions in these vectors are indexed starting at one, hence the z + 1 offset for the digit labels). We adopt the notation where we have n data points in our training objective with features x_{i} ∈ R^{d} and labels one-hot encoded as y_{i} ∈ {0, 1}^{k}. Here, k = 10 since there are 10 digits.
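For a linear predictor that outputs the argmax over the k columns of W^{T}x, the classification error defined above could be computed roughly as follows (a sketch; the names W, X, z follow the notation in this problem, with X holding one example per row):

```python
import numpy as np

def classification_error(W, X, z):
    """Fraction of rows of X whose predicted digit (argmax of X @ W) differs from z."""
    preds = np.argmax(X @ W, axis=1)  # predicted digit index for each example
    return np.mean(preds != z)

# Tiny 2-feature, 2-class example: the identity W predicts the index of the
# largest feature, so exactly one of the three labels below is missed.
err = classification_error(np.eye(2),
                           np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]),
                           np.array([0, 1, 1]))
```

The same function computes both ε̂_{train} and ε̂_{test}; only the data passed in changes.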
a. [10 points] In this problem we will choose a linear classifier to minimize the regularized least squares objective:





Ŵ = argmin_{W ∈ R^{d×k}} Σ_{i=1}^{n} ‖W^{T}x_{i} − y_{i}‖_{2}^{2} + λ‖W‖_{F}^{2}

with Ŵ ∈ R^{d×k}.
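A minimal sketch of the closed-form minimizer of this objective, which satisfies the ridge normal equations (X^{T}X + λI)W = X^{T}Y (np.linalg.solve is used here; scipy.linalg.solve works the same way, per the hint above):

```python
import numpy as np

def train_ridge(X, Y, lam):
    """Solve (X^T X + lam * I) W = X^T Y for W.
    X is the n x d data matrix, Y the n x k one-hot label matrix, lam >= 0."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    # Solving the linear system avoids forming an explicit matrix inverse.
    return np.linalg.solve(A, X.T @ Y)

# With lam = 0 and noiseless labels, the true weight matrix is recovered exactly.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
W_true = np.array([[1.0, 2.0], [3.0, 4.0]])
W_hat = train_ridge(X, X @ W_true, 0.0)
```

Solving the system directly is both faster and numerically safer than computing (X^{T}X + λI)^{−1} explicitly, which is the point of the matrix-inversion hint.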
• Implement a function one_hot that takes as input Y ∈ {0, …, k−1}^{n} and returns Y ∈ {0, 1}^{n×k}.
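One possible one_hot implementation (a sketch, assuming integer labels and defaulting to k = 10 as in this problem):

```python
import numpy as np

def one_hot(Y, k=10):
    """Map integer labels in {0, ..., k-1} to one-hot rows of shape (n, k)."""
    Y = np.asarray(Y)
    out = np.zeros((Y.shape[0], k))
    out[np.arange(Y.shape[0]), Y] = 1.0  # set the label position in each row
    return out

# Example with k = 3: labels 0 and 2 become the corresponding basis rows.
encoded = one_hot(np.array([0, 2]), k=3)
```

Fancy indexing with a row-index array and the label array sets all n positions in one vectorized assignment, with no Python loop over examples.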