Description

1. Assume that you are given the following sample. Estimate the weight of people whose heights are 150, 155, 165, and 190 cm, using KNN with k = 3:
$$\hat{y}_{KNN} = \frac{y_1 + y_2 + \cdots + y_k}{k}$$
where $y_1, y_2, \ldots, y_k$ are the labels of the $k$ nearest neighbors of your test instance. (10 pts)





Person   Height (cm)   Weight (kg)
1        171           80
2        168           78
3        191           100
4        182           80
5        150           65
6        178           83
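The simple-average estimate can be sketched in a few lines of Python; this assumes the distance between two heights is just their absolute difference (Euclidean distance in one dimension):

```python
import numpy as np

# the sample from the table above
heights = np.array([171, 168, 191, 182, 150, 178])
weights = np.array([80, 78, 100, 80, 65, 83])

def knn_predict(x, k=3):
    d = np.abs(heights - x)          # 1-D distance = absolute height difference
    idx = np.argsort(d)[:k]          # indices of the k nearest neighbors
    return weights[idx].mean()       # simple average of their labels

for q in (150, 155, 165, 190):
    print(f"height {q} cm -> estimated weight {knn_predict(q):.2f} kg")
```

For example, the query height 165 cm has nearest training heights 168, 171, and 178, so the estimate is the average of 78, 80, and 83 kg.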





2. Repeat 1, but instead of using the simple average of the labels of the $k$ nearest neighbors, use the following weighted average:
$$\hat{y}_{KNN} = \frac{w_1 y_1 + w_2 y_2 + \cdots + w_k y_k}{w_1 + w_2 + \cdots + w_k}$$
where the weight $w_i$ for the label $y_i$ of instance $i$ is determined as $1/d_i$, where $d_i$ is the distance between instance $i$ and the test instance. (10 pts)
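A minimal sketch of the weighted variant, again assuming 1-D absolute distance; note the query height 150 coincides with a training point, whose weight 1/d is infinite, so that case is handled explicitly:

```python
import numpy as np

heights = np.array([171, 168, 191, 182, 150, 178])
weights_kg = np.array([80, 78, 100, 80, 65, 83])

def weighted_knn_predict(x, k=3):
    d = np.abs(heights - x).astype(float)
    idx = np.argsort(d)[:k]
    d_k, y_k = d[idx], weights_kg[idx]
    if np.any(d_k == 0):
        # query coincides with a training point: its weight 1/d_i is
        # infinite, so the prediction is that point's label
        return y_k[d_k == 0].mean()
    w = 1.0 / d_k                     # w_i = 1 / d_i
    return np.sum(w * y_k) / np.sum(w)

for q in (150, 155, 165, 190):
    print(f"height {q} cm -> estimated weight {weighted_knn_predict(q):.2f} kg")
```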
3. Assume that $J(x) = x^T Q x + d^T x + c$, where $Q = Q^T \in \mathbb{R}^{n \times n}$, $x, d \in \mathbb{R}^n$, and $c \in \mathbb{R}$. Show that $\nabla_x J(x) = 2Qx + d$ and $H = \frac{\partial^2 J}{\partial x \partial x^T} = 2Q$, where $H_{ij} = \frac{\partial^2 J}{\partial x_i \partial x_j}$ and $H$ is called the Hessian matrix of $J$. (10 pts)
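The two identities can be sanity-checked numerically (this is not a proof, just a finite-difference check on a random symmetric $Q$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
Q = (A + A.T) / 2                     # enforce Q = Q^T
d = rng.standard_normal(n)
c = 0.7
x0 = rng.standard_normal(n)

def J(x):
    return x @ Q @ x + d @ x + c

# the claimed gradient and Hessian
grad = 2 * Q @ x0 + d
H = 2 * Q

# central finite differences of J
eps = 1e-5
E = np.eye(n)
num_grad = np.array([(J(x0 + eps * E[i]) - J(x0 - eps * E[i])) / (2 * eps)
                     for i in range(n)])
num_H = np.array([[(J(x0 + eps * (E[i] + E[j])) - J(x0 + eps * (E[i] - E[j]))
                    - J(x0 + eps * (E[j] - E[i])) + J(x0 - eps * (E[i] + E[j])))
                   / (4 * eps ** 2)
                   for j in range(n)] for i in range(n)])

print(np.allclose(grad, num_grad, atol=1e-6), np.allclose(H, num_H, atol=1e-3))
```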

4. Write down the prediction $\hat{y}$ for a test row vector $x'_{1 \times p}$ made by a linear regression model, in terms of $y$, the vector of labels of the training set, and $X_{n \times (p+1)}$, the (augmented) feature matrix, and explain why $\hat{y}$ can be viewed as a special case of KNN regression. (10 pts)
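The key observation can be illustrated numerically with synthetic data (a sketch, not the solution): the linear-regression prediction is a fixed linear combination of all training labels, much like KNN averages the labels of selected neighbors.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 3
X = np.hstack([np.ones((n, 1)), rng.standard_normal((n, p))])  # augmented n x (p+1)
y = rng.standard_normal(n)
x_new = np.hstack([1.0, rng.standard_normal(p)])               # augmented test row

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)                   # least-squares fit
y_hat = x_new @ beta_hat

# the same prediction written as a weighted sum of ALL training labels
w = x_new @ np.linalg.solve(X.T @ X, X.T)                      # length-n weight vector
print(np.isclose(y_hat, w @ y))
```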
5. Show that for $y \in \mathbb{R}^n$, $\hat{y} = X(X^T X)^{-1} X^T y$ is a member of the column space of $X \in \mathbb{R}^{n \times (p+1)}$, i.e. is a linear combination of the columns of $X$. (10 pts)

6. Show that in linear regression, if $\hat{\beta}$ minimizes $RSS(\beta)$, then $y - \hat{y}$ is orthogonal to the column space of $X$. (10 pts)
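Both properties can be checked numerically on random data (a sanity check, not a proof): $\hat{y} = X\hat{\beta}$ is by construction a combination of the columns of $X$, and the residual is orthogonal to each column.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p1 = 25, 4
X = rng.standard_normal((n, p1))      # stands in for the augmented n x (p+1) matrix
y = rng.standard_normal(n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat                  # = X (X^T X)^{-1} X^T y

# problem 5: y_hat = X @ beta_hat is literally a linear combination of X's columns
# problem 6: the residual y - y_hat is orthogonal to every column of X
print(np.allclose(X.T @ (y - y_hat), 0))
```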
Programming Part: Vertebral Column Data Set
This biomedical data set was built by Dr. Henrique da Mota during a medical residence period in Lyon, France. Each patient is represented in the data set by six biomechanical attributes derived from the shape and orientation of the pelvis and lumbar spine (in this order): pelvic incidence, pelvic tilt, lumbar lordosis angle, sacral slope, pelvic radius, and grade of spondylolisthesis. The following convention is used for the class labels: DH (Disk Hernia), SL (Spondylolisthesis), NO (Normal), and AB (Abnormal). In this exercise, we only focus on a binary classification task with NO = 0 and AB = 1.^{1}

Homework 1 EE 559, Instructor: Mohammad Reza Rajati

Download the Vertebral Column Data Set from: https://archive.ics.uci.edu/ml/datasets/Vertebral+Column.

Pre-Processing and Exploratory Data Analysis: (10 pts)


Make scatterplots of the independent variables in the dataset. Use color to show Classes 0 and 1.



Make boxplots for each of the independent variables. Use color to show Classes 0 and 1 (see ISLR p. 129).



Select the first 70 rows of Class 0 and the first 140 rows of Class 1 as the training set and the rest of the data as the test set.
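The split above can be sketched as a small pandas helper; the `label` column name is an assumption (use whatever name your loaded data frame has, coded 0/1):

```python
import pandas as pd

def split_vertebral(df):
    """First 70 rows of Class 0 and first 140 rows of Class 1 form the training
    set; everything else is the test set. Assumes a 'label' column coded 0/1."""
    c0 = df[df['label'] == 0]
    c1 = df[df['label'] == 1]
    train = pd.concat([c0.iloc[:70], c1.iloc[:140]])
    test = pd.concat([c0.iloc[70:], c1.iloc[140:]])
    return train, test
```

On the full data set (100 NO rows and 210 AB rows) this yields 210 training rows and 100 test rows.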


Classification using KNN on Vertebral Column Data Set (20 pts)


Write code for k-nearest neighbors with Euclidean metric (or use a software package).



Test all the data in the test database with k nearest neighbors. Take decisions by majority polling. Plot train and test errors in terms of k for k ∈ {208, 205, ..., 7, 4, 1} (in reverse order). You are welcome to use smaller increments of k. Which k* is the most suitable k among those values? Calculate the confusion matrix, true positive rate, true negative rate, precision, and F1-score when k = k*.^{2}
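One way to structure this step with scikit-learn is sketched below; the random arrays are stand-ins for the real training/test split (replace them with your data from 7(b)iii):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, precision_score, f1_score

# synthetic stand-in for the vertebral split (6 features, labels 0/1)
rng = np.random.default_rng(3)
Xtr, ytr = rng.standard_normal((210, 6)), rng.integers(0, 2, 210)
Xte, yte = rng.standard_normal((100, 6)), rng.integers(0, 2, 100)

errs = {}
for k in range(208, 0, -3):                   # k = 208, 205, ..., 4, 1
    clf = KNeighborsClassifier(n_neighbors=k).fit(Xtr, ytr)
    errs[k] = 1 - clf.score(Xte, yte)

k_star = min(errs, key=errs.get)              # k with the lowest test error
pred = KNeighborsClassifier(n_neighbors=k_star).fit(Xtr, ytr).predict(Xte)
tn, fp, fn, tp = confusion_matrix(yte, pred).ravel()
tpr, tnr = tp / (tp + fn), tn / (tn + fp)
prec = precision_score(yte, pred, zero_division=0)
f1 = f1_score(yte, pred, zero_division=0)
```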



Since the computation time depends on the size of the training set, one may only use a subset of the training set. Plot the best test error rate,^{3} which is obtained by some value of k, against the size of the training set, when the size of the training set is N ∈ {10, 20, 30, ..., 210}.^{4} Note: for each N, select your training set by choosing the first ⌊N/3⌋ rows of Class 0 and the first N − ⌊N/3⌋ rows of Class 1 in the training set you created in 7(b)iii. Also, for each N, select the optimal k from a set starting from k = 1 and increasing by 5. For example, if N = 200, the optimal k is selected from {1, 6, 11, ..., 196}. This plot is called a Learning Curve.
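The nested loop for the learning curve might look like the following sketch; again the random arrays are stand-ins, and with real data you would take the first ⌊N/3⌋ rows of Class 0 and N − ⌊N/3⌋ rows of Class 1 instead of the first N rows:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
Xtr, ytr = rng.standard_normal((210, 6)), rng.integers(0, 2, 210)
Xte, yte = rng.standard_normal((100, 6)), rng.integers(0, 2, 100)

sizes = list(range(10, 211, 10))
best_err = []
for N in sizes:
    # stand-in for the class-wise row selection described above
    XN, yN = Xtr[:N], ytr[:N]
    errs = [1 - KNeighborsClassifier(n_neighbors=k).fit(XN, yN).score(Xte, yte)
            for k in range(1, N + 1, 5)]      # k = 1, 6, 11, ... up to N
    best_err.append(min(errs))
# plotting best_err against sizes gives the learning curve
```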

Let us further explore some variants of KNN.

Replace the Euclidean metric with the following metrics^{5} and test them. Summarize the test errors (i.e., when k = k*) in a table. Use all of your training data and select the best k from {1, 6, 11, ..., 196}. (10 pts)
^{1}Make sure that you convert labels to 0 and 1, otherwise you may not obtain correct answers.
^{2}We will learn in the lectures what these mean; for now, research how they are computed and compute them.

^{3}Obviously, use the test data you created in 7(b)iii.

^{4}For extra practice, you are welcome to choose smaller increments of N.
^{5}You can use sklearn.neighbors.DistanceMetric. Research what each distance means.


A. Minkowski Distance:
$$d_p(x, y) = \left( \sum_{i} |x_i - y_i|^p \right)^{1/p},$$
which becomes Manhattan Distance with p = 1.

B. Minkowski Distance with log10(p) ∈ {0.1, 0.2, 0.3, ..., 1}. In this case, use the k* you found for the Manhattan distance in 7(d)iA. What is the best log10(p)?

C. Minkowski Distance, which becomes Chebyshev Distance with p → ∞:
$$d_\infty(x, y) = \max_i |x_i - y_i|.$$




Mahalanobis Distance.^{6}
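All four metrics are available through scikit-learn's `KNeighborsClassifier`; a sketch with stand-in random data (the fixed `n_neighbors=6` is an arbitrary example, not the selected k*):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)
Xtr, ytr = rng.standard_normal((210, 6)), rng.integers(0, 2, 210)
Xte, yte = rng.standard_normal((100, 6)), rng.integers(0, 2, 100)

metrics = {
    'manhattan':   dict(metric='minkowski', p=1),
    'minkowski':   dict(metric='minkowski', p=10 ** 0.5),  # log10(p) = 0.5
    'chebyshev':   dict(metric='chebyshev'),
    # Mahalanobis needs the (pseudo-)inverse covariance and brute-force search
    'mahalanobis': dict(metric='mahalanobis', algorithm='brute',
                        metric_params={'VI': np.linalg.pinv(np.cov(Xtr.T))}),
}
results = {name: 1 - KNeighborsClassifier(n_neighbors=6, **kw)
                     .fit(Xtr, ytr).score(Xte, yte)
           for name, kw in metrics.items()}
```

Using `np.linalg.pinv` rather than `inv` for the Mahalanobis matrix follows the hint in footnote 6 about singular or ill-conditioned covariance matrices.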


The majority polling decision can be replaced by weighted decision, in which the weight of each point in voting is inversely proportional to its distance from the query/test data point. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away. Use weighted voting with Euclidean, Manhattan, and Chebyshev distances and report the best test errors when k ∈ {1, 6, 11, 16, ..., 196}. (10 pts)
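Distance-weighted voting is exactly what `weights='distance'` does in scikit-learn; a sketch of the sweep with stand-in random data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(6)
Xtr, ytr = rng.standard_normal((210, 6)), rng.integers(0, 2, 210)
Xte, yte = rng.standard_normal((100, 6)), rng.integers(0, 2, 100)

best = {}
for name, kw in {'euclidean': {},                        # Minkowski p=2 default
                 'manhattan': dict(metric='minkowski', p=1),
                 'chebyshev': dict(metric='chebyshev')}.items():
    errs = [1 - KNeighborsClassifier(n_neighbors=k, weights='distance', **kw)
                .fit(Xtr, ytr).score(Xte, yte)
            for k in range(1, 197, 5)]                   # k = 1, 6, 11, ..., 196
    best[name] = min(errs)
```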

What is the lowest training error rate you achieved in this homework? (5 pts)
^{6}Mahalanobis Distance requires inverting the covariance matrix of the data. When the covariance matrix is singular or ill-conditioned, the data live in a linear subspace of the feature space. In this case, the features have to be transformed into a reduced feature set in the linear subspace, which is equivalent to using a pseudo-inverse instead of an inverse.