Description
1. The pdf of a Γ(2, 1) random variable is p(z) = z exp(−z), z > 0, and the pmf of a Poisson random variable X is p_X(x) = λ^x e^{−λ}/x!, λ > 0, x = 0, 1, …. Assuming that X_1, X_2, …, X_n is an i.i.d. Poisson sample given λ, and that λ has a Γ(2, 1) prior distribution, find the MAP estimate of λ and prove that what you find is actually a value that maximizes the posterior. (10 pts)
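Once you have a closed-form answer, a minimal numerical sanity check is to maximize the log-posterior directly on synthetic data (the sample below is entirely hypothetical; only the Γ(2, 1)-prior-times-Poisson-likelihood structure comes from the problem statement):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical data, just to sanity-check the derivation numerically
rng = np.random.default_rng(0)
n, lam_true = 200, 3.0
x = rng.poisson(lam_true, size=n)

# Log-posterior up to a constant: Gamma(2, 1) prior times the Poisson likelihood
def neg_log_posterior(lam):
    return -((x.sum() + 1) * np.log(lam) - (n + 1) * lam)

lam_map = minimize_scalar(neg_log_posterior, bounds=(1e-6, 20.0), method="bounded").x
print(lam_map)  # compare against the stationary point of your closed-form answer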

2. Assume that you have an i.i.d. sample from a population with Poisson pmf, i.e. p_X(x) = λ^x e^{−λ}/x!, λ > 0, x = 0, 1, …. Calculate the MLE of λ and its asymptotic distribution by calculating the Fisher information, and compare the results with those of the Central Limit Theorem. (10 pts)
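A quick simulation can confirm the asymptotic variance you derive from the Fisher information (the parameter values below are arbitrary placeholders):

```python
import numpy as np

# Simulate the sampling distribution of the Poisson MLE (the sample mean)
rng = np.random.default_rng(1)
lam, n, reps = 4.0, 500, 2000
mles = rng.poisson(lam, size=(reps, n)).mean(axis=1)

# Fisher information per observation is I(lam) = 1/lam, so the
# asymptotic standard deviation of the MLE is sqrt(lam / n)
asymptotic_sd = np.sqrt(lam / n)
print(mles.std(), asymptotic_sd)  # the two should be close
```

The CLT applied to the sample mean gives the same N(λ, λ/n) limit, which is the comparison the problem asks for.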

3. Assume that Y = β_0 + β_1 X_1 + ⋯ + β_p X_p + ε, where ε ∼ N(0, σ^2). Show that the MLE and least squares estimates of the vector β are the same, which means the MLE is also BLUE according to Gauss–Markov. Remember that the likelihood function here is based on the conditional density p(Y | X_1, …, X_p). (10 pts)
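A small numerical check of the equivalence you are asked to prove (toy data; not part of the assignment):

```python
import numpy as np
from scipy.optimize import minimize

# Toy data for checking that the Gaussian MLE coincides with least squares
rng = np.random.default_rng(2)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept column
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n)

# Least squares via the normal equations
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Maximizing the Gaussian log-likelihood in beta is equivalent to
# minimizing the residual sum of squares
beta_mle = minimize(lambda b: np.sum((y - X @ b) ** 2), np.zeros(p + 1)).x
print(np.max(np.abs(beta_ls - beta_mle)))  # essentially zero
```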

4. Find the MAP estimate of β under the assumption that Y = β_0 + β_1 X_1 + ⋯ + β_p X_p + ε, where ε ∼ N(0, σ^2), and that the prior distribution of the (independent) β_i, i = 1, 2, …, p is N(0, σ^2/λ). Interpret your results. (15 pts)
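A sketch of the connection you should find, with synthetic data and the intercept omitted for simplicity: the negative log-posterior under these Gaussian priors is (up to constants) the ridge objective, so its minimizer matches the closed-form ridge solution.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data; intercept omitted for simplicity
rng = np.random.default_rng(3)
n, p, lam = 80, 4, 2.0
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(scale=0.5, size=n)

# Negative log-posterior ~ ridge objective: ||y - X b||^2 + lam * ||b||^2
beta_map = minimize(lambda b: np.sum((y - X @ b) ** 2) + lam * b @ b, np.zeros(p)).x
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.max(np.abs(beta_map - beta_ridge)))  # the two estimates agree
```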

5. Find the MAP estimate of β under the assumption that Y = β_0 + β_1 X_1 + ⋯ + β_p X_p + ε, where ε ∼ N(0, σ^2), and that the prior distribution of the (independent) β_i, i = 1, 2, …, p is Laplace(0, σ^2/λ). Interpret your results. (15 pts)
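The qualitative behavior you should recover here (a Laplace prior yields an L1 penalty, which zeroes out small coefficients) can be seen on a toy sparse problem; all data below are synthetic:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy sparse problem: only three truly nonzero coefficients
rng = np.random.default_rng(4)
n, p = 200, 10
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(scale=0.5, size=n)

# An L1 penalty (the MAP objective under a Laplace prior) sets
# small coefficients exactly to zero
lasso = Lasso(alpha=0.5).fit(X, y)
print(lasso.coef_)  # most coefficients are exactly zero
```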

6. In the least squares problem, assume that the singular value decomposition of X is UDV^T.


(a) Show that the vector of predicted values is: (10 pts)

    ŷ = X β̂_Ridge = Σ_{j=1}^{p} u_j (d_j^2 / (d_j^2 + λ)) u_j^T y

where u_j are the columns of U. Conclude that a greater amount of shrinkage is applied to basis vectors u_j that have smaller singular values d_j, for a fixed λ ≥ 0.
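The identity can be verified numerically before you prove it; the dimensions and λ below are arbitrary:

```python
import numpy as np

# Numerical check of the SVD form of the ridge fit (X = U D V^T)
rng = np.random.default_rng(5)
n, p, lam = 30, 5, 1.5
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

U, d, Vt = np.linalg.svd(X, full_matrices=False)

# Direct ridge fit: X (X^T X + lam I)^{-1} X^T y
y_ridge = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# SVD form: each basis vector u_j is shrunk by d_j^2 / (d_j^2 + lam)
y_svd = sum(U[:, j] * (d[j] ** 2 / (d[j] ** 2 + lam)) * (U[:, j] @ y)
            for j in range(p))
print(np.max(np.abs(y_ridge - y_svd)))  # agrees to numerical precision
```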
(b) Use the SVD to show that (10 pts)

    tr[X(X^T X + λI)^{−1} X^T] = Σ_{j=1}^{p} d_j^2 / (d_j^2 + λ)

This quantity is equal to the degrees of freedom p when λ = 0 and is called the effective degrees of freedom of the Ridge-regularized model.
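The same identity checked numerically (arbitrary dimensions and λ):

```python
import numpy as np

# Check the effective-degrees-of-freedom identity numerically
rng = np.random.default_rng(6)
n, p, lam = 40, 6, 3.0
X = rng.normal(size=(n, p))
d = np.linalg.svd(X, compute_uv=False)  # singular values of X

df_trace = np.trace(X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T))
df_svd = np.sum(d ** 2 / (d ** 2 + lam))
print(df_trace, df_svd)  # equal; both reduce to p as lam -> 0
```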

7. Time Series Classification Part 1: Feature Creation/Extraction
An interesting task in machine learning is classification of time series. In this problem, we will classify the activities of humans based on time series obtained by a Wireless Sensor Network.
Homework 4 EE 559, Instructor:

Download the AReM data from: https://archive.ics.uci.edu/ml/datasets/Activity+Recognition+system+based+on+Multisensor+data+fusion+%28AReM%29 . The dataset contains 7 folders that represent seven types of activities. In each folder, there are multiple files, each of which represents an instance of a human performing an activity.^{1} Each file contains 6 time series collected from activities of the same person, which are called avg_rss12, var_rss12, avg_rss13, var_rss13, avg_rss23, and var_rss23. There are 88 instances in the dataset, each of which contains 6 time series, and each time series has 480 consecutive values.

Keep datasets 1 and 2 in folders bending1 and bending2, as well as datasets 1, 2, and 3 in the other folders, as test data, and use the remaining datasets as train data.

Feature Extraction
Classification of time series usually requires extracting features from them. In this problem, we focus on time-domain features.


Research what types of time-domain features are usually used in time series classification and list them (examples are minimum, maximum, mean, etc.).



Extract the time-domain features minimum, maximum, mean, median, standard deviation, first quartile, and third quartile for all of the 6 time series in each instance. You are free to normalize/standardize features or use them directly.^{2} (20 pts)

Your new dataset will look like this:



Instance | min_1 | max_1 | mean_1 | median_1 | ... | 1st quart_6 | 3rd quart_6
---------|-------|-------|--------|----------|-----|-------------|------------
1        |       |       |        |          |     |             |
2        |       |       |        |          |     |             |
3        |       |       |        |          |     |             |
...      |       |       |        |          |     |             |
88       |       |       |        |          |     |             |


where, for example, 1st quart_6 means the first quartile of the sixth time series in each of the 88 instances.
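One row of this table can be built as follows. This is a minimal sketch: the DataFrame below is random stand-in data with the dataset's column names, whereas in practice you would load each file with pd.read_csv; the helper time_domain_features is a hypothetical name, not part of any library.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one AReM instance: 480 samples x 6 series
rng = np.random.default_rng(7)
df = pd.DataFrame(rng.normal(size=(480, 6)),
                  columns=["avg_rss12", "var_rss12", "avg_rss13",
                           "var_rss13", "avg_rss23", "var_rss23"])

def time_domain_features(series: pd.Series) -> dict:
    """The seven time-domain features requested above."""
    return {"min": series.min(), "max": series.max(), "mean": series.mean(),
            "median": series.median(), "std": series.std(),
            "q1": series.quantile(0.25), "q3": series.quantile(0.75)}

# One row of the feature table: 7 features x 6 series = 42 columns
row = {f"{name}_{i + 1}": value
       for i, col in enumerate(df.columns)
       for name, value in time_domain_features(df[col]).items()}
print(len(row))
```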

Estimate the standard deviation of each of the time-domain features you extracted from the data. Then use Python's bootstrapped package, or any other method, to build a 90% bootstrap confidence interval for the standard deviation of each feature. (10 pts)
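If you prefer not to use a package, a plain percentile bootstrap is only a few lines; the feature column below is synthetic stand-in data:

```python
import numpy as np

# Percentile bootstrap sketch for one feature column (synthetic stand-in data)
rng = np.random.default_rng(8)
feature = rng.normal(loc=10.0, scale=2.0, size=88)  # one feature over 88 instances

# Resample with replacement and recompute the statistic each time
boot_stds = np.array([
    rng.choice(feature, size=feature.size, replace=True).std(ddof=1)
    for _ in range(2000)
])
lo, hi = np.percentile(boot_stds, [5, 95])  # 90% bootstrap confidence interval
print(lo, hi)
```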

Use your judgment to select the three most important time-domain features (one option may be min, mean, and max).

Assume that you want to use the training set to classify bending from other activities, i.e. you have a binary classification problem. Depict scatter plots of the features you specified in 7(c)iv, extracted from time series 1, 2, and 6 of each instance, and use color to distinguish bending vs. other activities (see p. 129 of the ISLR textbook).^{3} (10 pts)
^{1}Some of the data files need very minor cleaning. You can do it with Excel or Python.
^{2}You are welcome to experiment to see if they make a difference.
^{3}You are welcome to repeat this experiment with other features, as well as with time series 3, 4, and 5 of each instance.

8. Time Series Classification Part 2: Binary and Multiclass Classification
Important Note: You will NOT submit this part with Homework 4. It will be the programming assignment of Homework 5. However, because it uses the features you extracted from time series data in Homework 4, and because some of you may want to start using your features to build models early, you are provided with the instructions for the next programming assignment now. You may want to submit the code for Homework 4 again with Homework 5, since Homework 5 might need the feature-creation code. Also, since this part involves building various models, you are strongly recommended to start as early as you can.


Binary Classification Using Logistic Regression^{4}




Break each time series in your training set into two (approximately) equal-length time series. Now, instead of 6 time series for each training instance, you have 12 time series for each training instance. Repeat the experiment in 7(c)v, i.e. depict scatter plots of the features extracted from both parts of time series 1, 2, and 12. Do you see any considerable difference between these results and those of 7(c)v?





Break each time series in your training set into l ∈ {1, 2, …, 20} time series of approximately equal length and use logistic regression^{5} to solve the binary classification problem, using time-domain features. Remember that breaking each of the time series does not change the number of instances; it only changes the number of features for each instance. Calculate the p-values for your logistic regression parameters in each model corresponding to each value of l, and refit a logistic regression model using your pruned set of features.^{6} Alternatively, you can use backward selection via sklearn.feature_selection or glm in R. Use 5-fold cross-validation to determine the best value of the pair (l, p), where p is the number of features used in recursive feature elimination. Explain what the right way and the wrong way are to perform cross-validation in this problem.^{7} Obviously, use the right way! Also, you may encounter the problem of class imbalance, which may leave some of your folds without any instances of the rare class. In such a case, you can use stratified cross-validation. Research what it means, and use it if needed.
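As a minimal illustration of why stratification helps with a rare class, the toy labels below (synthetic, roughly matching the 88-instance scale) show that every stratified fold still receives minority-class instances:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy imbalanced labels: stratified folds preserve the class ratio,
# so every fold sees at least one minority-class instance
rng = np.random.default_rng(9)
X = rng.normal(size=(88, 3))
y = np.array([0] * 80 + [1] * 8)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for _, test_idx in skf.split(X, y):
    print(np.bincount(y[test_idx], minlength=2))  # class counts per fold
```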


In the following, you can see an example of applying Python’s Recursive Feature Elimination, which is a backward selection algorithm, to logistic regression.
^{4}Some logistic regression packages have a built-in L_2 regularization. To remove the effect of L_2 regularization, set λ = 0, or set the budget C → ∞ (i.e. a very large value).
^{5}If you encounter instability of the logistic regression problem because of linearly separable classes, modify the MaxIter parameter in logistic regression to stop the algorithm prematurely and prevent its instability.
^{6}R calculates the p-values for logistic regression automatically. One way of calculating them in Python is to call R within Python. There are other ways to obtain the p-values as well.
^{7}This is an interesting problem in which the number of features changes depending on the value of the parameter l that is selected via cross-validation. Another example of such a problem is Principal Component Regression, where the number of principal components is selected via cross-validation.

# Recursive Feature Elimination
from sklearn import datasets
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# load the iris dataset
dataset = datasets.load_iris()
# create a base classifier used to evaluate a subset of attributes
model = LogisticRegression()
# create the RFE model and select 3 attributes
rfe = RFE(model, n_features_to_select=3)
rfe = rfe.fit(dataset.data, dataset.target)
# summarize the selection of the attributes
print(rfe.support_)
print(rfe.ranking_)




Report the confusion matrix and show the ROC and AUC for your classifier on the train data. Report the parameters β_i of your logistic regression, as well as the p-values associated with them.
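All three quantities are available in sklearn.metrics; a minimal sketch on synthetic two-class data standing in for your extracted features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

# Synthetic two-class data standing in for the extracted features
rng = np.random.default_rng(10)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(2.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)
scores = clf.predict_proba(X)[:, 1]  # probabilities for the positive class

print(confusion_matrix(y, clf.predict(X)))
fpr, tpr, _ = roc_curve(y, scores)  # points on the ROC curve, ready to plot
print(roc_auc_score(y, scores))     # area under that curve
```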



Test the classifier on the test set. Remember to break the time series in your test set into the same number of time series into which you broke your training set. Remember that the classifier has to be tested using the features extracted from the test set. Compare the accuracy on the test set with the cross-validation accuracy you obtained previously.



Do your classes seem to be well-separated enough to cause instability in calculating logistic regression parameters?



From the confusion matrices you obtained, do you see imbalanced classes? If yes, build a logistic regression model based on case-control sampling and adjust its parameters. Report the confusion matrix, ROC, and AUC of the model.


Binary Classification Using L_1-penalized Logistic Regression


Repeat 8(a)ii using L_1-penalized logistic regression,^{8} i.e. instead of using p-values for variable selection, use L_1 regularization. Note that in this problem, you have to cross-validate for both l, the number of time series into which you break each of your instances, and λ, the weight of the L_1 penalty in your logistic regression objective function (or C, the budget). Packages usually perform cross-validation for λ automatically.^{9}
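One such package is sklearn's LogisticRegressionCV, which cross-validates over C automatically; a minimal sketch on synthetic data (the sparse coefficient pattern below is a placeholder):

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# Synthetic data with only three informative features
rng = np.random.default_rng(11)
n, p = 200, 20
beta = np.zeros(p)
beta[:3] = [2.0, -2.0, 1.5]
X = rng.normal(size=(n, p))
y = (X @ beta + rng.normal(size=n) > 0).astype(int)

# Cross-validates over C (the inverse of the L1 weight) automatically;
# liblinear is one solver that supports the L1 penalty
clf = LogisticRegressionCV(Cs=10, cv=5, penalty="l1", solver="liblinear").fit(X, y)
print(clf.C_, clf.score(X, y))
```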



Compare the L_1-penalized model with variable selection using p-values. Which one performs better? Which one is easier to implement?


Multiclass Classification (The Realistic Case)


Find the best l in the same way as you found it in 8(b)i to build an L_1-penalized multinomial regression model to classify all activities in your training set.^{10} Report your test error. Research how confusion matrices and ROC curves are defined for multiclass classification, and show them for this problem if possible.^{11}

^{8}For L_1-penalized logistic regression, you may want to use normalized/standardized features.
^{9}Using the package Liblinear is strongly recommended.

Repeat 8(c)i using a Naïve Bayes classifier. Use both Gaussian and Multinomial pdfs and compare the results.
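Both variants are available in sklearn; one practical caveat worth knowing is that MultinomialNB requires nonnegative features, so a rescaling step (a MinMaxScaler is just one option) is needed. A minimal sketch on synthetic stand-in features:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in features for two classes
rng = np.random.default_rng(12)
X = np.vstack([rng.normal(0.0, 1.0, size=(60, 5)),
               rng.normal(1.5, 1.0, size=(60, 5))])
y = np.array([0] * 60 + [1] * 60)

gnb = GaussianNB().fit(X, y)

# MultinomialNB requires nonnegative inputs, so rescale to [0, 1] first
X_pos = MinMaxScaler().fit_transform(X)
mnb = MultinomialNB().fit(X_pos, y)
print(gnb.score(X, y), mnb.score(X_pos, y))
```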

Create p principal components from the features you extracted from the l time series. Cross-validate on the (l, p) pair to build a Naïve Bayes classifier based on the PCA features to classify all activities in your data set. Report your test error, and plot the scatterplot of the classes in your training data based on the first and second principal components you found from features extracted from l time series, where l is the value you found using cross-validation. Show confusion matrices and ROC curves.
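The cross-validation over p can be done cleanly with a Pipeline, which also addresses the right-way/wrong-way issue from 8(a)ii: fitting PCA inside the pipeline means the components are recomputed on each training fold only, never on held-out data. A sketch on synthetic data (the n_components grid is a placeholder):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

# Synthetic two-class data standing in for the extracted features
rng = np.random.default_rng(13)
X = np.vstack([rng.normal(0.0, 1.0, size=(60, 10)),
               rng.normal(1.0, 1.0, size=(60, 10))])
y = np.array([0] * 60 + [1] * 60)

# PCA is refit on each training fold, so held-out data never leaks into it
pipe = Pipeline([("pca", PCA()), ("nb", GaussianNB())])
grid = GridSearchCV(pipe, {"pca__n_components": [1, 2, 5, 8]}, cv=5).fit(X, y)
print(grid.best_params_, grid.best_score_)
```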

Which method is better for multiclass classification in this problem?
^{10}New versions of scikit-learn allow using the L_1 penalty for multinomial regression.
^{11}For example, the pROC package in R does the job.