Description
Short Answer and \True or False” Conceptual questions
-
The answers to these questions should be answerable without referring to external materials. Brie y justify your answers with a few words.
-
-
[2 points] What is bias-variance tradeo ?
-
-
-
[2 points] What typically happens to bias and variance when the model complexity increases/decreases?
-
-
-
[1 point] True or False: A learning algorithm will always generalize better if we use fewer features to represent our data.
-
-
-
[2 points] True or False: Hyperparameters should be tuned on the test set. Explain your choice and detail a procedure for hyperparameter tuning.
-
-
-
[1 point] True or False: The training error of a function on the training set provides an overestimate of the true error of that function.
-
What to Submit:
-
Parts c-e: True or False
-
Parts a-e: True or False with brief (2-3 sentence) explanation
Maximum Likelihood Estimation (MLE)
-
You’re the Reign FC manager, and the team is ve games into its 2021 season. The number of goals scored by the team in each game so far are given below:
[2; 4; 6; 0; 1]:
Let’s call these scores x1; : : : ; x5. Based on your (assumed iid) data, you’d like to build a model to understand how many goals the Reign are likely to score in their next game. You decide to model the number of goals scored per game using a Poisson distribution. Recall that the Poisson distribution with parameter assigns every non-negative integer x = 0; 1; 2; : : : a probability given by
Poi(xj ) = e x :
x!
a. [5 points] Derive an expression for the maximum-likelihood estimate of the parameter governing the Poisson distribution in terms of goal counts for the rst n games: x1; : : : ; xn. (Hint: remember that the log of the likelihood has the same maximizer as the likelihood function itself.)
-
[2 points] Give a numerical estimate of after the rst ve games. Given this , what is the probability that the Reign score 6 goals in their next game?
What to Submit:
-
Part a: An expression for the MLE of after n games and relevant derivation
-
Parts b: A numerical estimate for and the probability that the Reign score 6 next game.
Polynomial Regression
Relevant Files1
-
•
polyreg.py
•
test
polyreg
learningCurve.py
•
linreg
closedform.py
•
data/polydata.dat
-
-
test polyreg univariate.py
-
-
[10 points] Recall that polynomial regression learns a function h (x) = 0 + 1x + 2x2 + : : : + dxd, where d represents the polynomial’s highest degree. We can equivalently write this in the form of a linear model with d features
-
-
-
-
-
h (x) = 0 + 1 1(x) + 2 2(x) + : : : + d d(x) ;
(1)
-
-
-
-
using the basis expansion that j(x) = xj. Notice that, with this basis expansion, we obtain a linear model where the features are various powers of the single univariate x. We’re still solving a linear regression problem, but are tting a polynomial function of the input.
Implement regularized polynomial regression in polyreg.py. You may implement it however you like, using gradient descent or a closed-form solution. However, I would recommend the closed-form solution since the data sets are small; for this reason, we’ve included an example closed-form implementation of linear regression in linreg closedform.py. Please follow the API below. Note that all matrices are actually 2D numpy arrays in the implementation.
-
init (degree=1, regLambda=1E-8) : constructor with arguments of d and
-
fit(X,Y): method to train the polynomial regression model
-
predict(X): method to use the trained polynomial regression model for prediction
-
polyfeatures(X, degree): expands the given n 1 matrix X into an n d matrix of polynomial features of degree d. Note that the returned matrix will not include the zero-th power.
Note that the polyfeatures(X, degree) function maps the original univariate data into its higher order powers. Speci cally, X will be an n 1 matrix (X 2 Rn 1) and this function will return the polynomial expansion of this data, a n d matrix. Note that this function will not add in the zero-th order feature (i.e., x0 = 1). You should add the x0 feature separately, outside of this function, before training the model.
By not including the x0 column in the matrix polyfeatures(), this allows the polyfeatures function to be more general, so it could be applied to multi-variate data as well. (If it did add the x0 feature, we’d end up with multiple columns of 1’s for multivariate data.)
Also, notice that the resulting features will be badly scaled if we |
|
use them in raw form. For example, with a polynomial of de- |
|
gree d = 8 and x = 20, the basis expansion yields x1 = 20 while |
|
x8 = 2:56 1010 { an absolutely huge di erence in range. Con- |
|
sequently, we will need to standardize the data before solving |
|
linear regression. In fit(), after you perform the polynomial |
|
feature expansion, you should standardize the values with the |
|
same degrees by subtracting their mean and divide the result |
Figure 1: Fit of polynomial regression with = |
with their standard deviation. You’ll need to apply the same |
0 and d = 8 |
standardization transformation in predict() before you apply it to new data. After standardization, be sure to add bias term before the column with lowest degree (i.e. add a column of ones to the left most of resulting matrix). You can achieve this with pad() function in numpy.
Run test polyreg univariate.py to test your implementation, which will plot the learned function. In this case, the script ts a polynomial of degree d = 8 with no regularization = 0. From the plot, we see that the function ts the data well, but will not generalize well to new data points. Try increasing the amount of regularization, and in 1-2 sentences, describe the resulting e ect on the function (you may also provide an additional plot to support your analysis).
-
Write-up: The description of resulting e ect of increasing regularization.
-
Code: Implement all functions with NotImplementedError in polreg.py and submit on on Gradescope
Ridge Regression on MNIST
-
In this problem we will implement a regularized least squares classi er for the MNIST data set. The task is to classify handwritten images of numbers between 0 to 9.
You are NOT allowed to use any of the pre-built classi ers in sklearn. Feel free to use any method from numpy or scipy. Remember: if you are inverting a matrix in your code, you are probably doing something wrong (Hint: look at scipy.linalg.solve).
Each example has features xi 2 Rd (with d = 28 28 = 784) and label zj 2 f0; : : : ; 9g. You can visualize a single example xi with imshow after reshaping it to its original 28 28 image shape (and noting that the label zj is accurate). We wish to learn a predictor fbthat takes as input a vector in Rd and outputs an index in f0; : : : ; 9g. We de ne our training and testing classi cation error on a predictor f as
-
-
-
-
-
train(f) =
1
X
1ff(x) 6= zg
Ntrain
b
1
(x;z)2Training Set
test(f) =
X
1ff(x) 6= zg
Ntest (x;z)2Test Set
b
-
-
-
-
We will use one-hot encoding of the labels: for each observation (x; z), the original label z 2 f0; : : : ; 9g is mapped to the standard basis vector ez+1 where ei is a vector of size k containing all zeros except for a 1 in the ith position (positions in these vectors are indexed starting at one, hence the z + 1 o set for the digit labels). We adopt the notation where we have n data points in our training objective with features xi 2 Rd and label one-hot encoded as yi 2 f0; 1gk. Here, k = 10 since there are 10 digits.
a. [10 points] In this problem we will choose a linear classi er to minimize the regularized least squares objective:
-
-
-
-
-
c
n
Xi
yik22 + kW kF2
W = argminW 2Rd k
kW T xi
=1
-
-
-
-
Wc 2 Rd k.
• Implement a function one_hot that takes as input Y 2 f0; :::; k 1gn, and returns Y 2 f0; 1gn k.
5