Description

Assume that in a c-class classification problem, we have k features X_1, X_2, ..., X_k that are independent conditioned on the class label, and X_j | ω_i ~ Gamma(p_i, λ_j), i.e.

p_{X_j|ω_i}(x_j | ω_i) = (1/Γ(p_i)) λ_j^{p_i} x_j^{p_i − 1} e^{−λ_j x_j},   p_i, λ_j > 0. (30 pts)


Determine the Bayes-optimal classifier's decision rule under the general assumption that the prior probabilities of the classes are different.



When are the decision boundaries linear functions of x_1, x_2, ..., x_k?



Assuming that p_1 = 4, p_2 = 2, c = 2, k = 4, λ_1 = λ_3 = 1, λ_2 = λ_4 = 2, and that the prior probabilities of the classes are equal, classify x = (0.1, 0.2, 0.3, 4).
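Taking logs of the posterior gives the discriminant g_i(x) = ln P(ω_i) + Σ_j [p_i ln λ_j − ln Γ(p_i) + (p_i − 1) ln x_j − λ_j x_j]. A minimal sketch of this computation (function and variable names are my own; with equal priors the prior term cancels in the comparison):

```python
from math import lgamma, log

def gamma_log_discriminant(x, p, lam, prior):
    """g_i(x) for X_j | w_i ~ Gamma(p_i, lam_j) with conditionally independent features."""
    g = log(prior)
    for xj, lj in zip(x, lam):
        g += p * log(lj) - lgamma(p) + (p - 1) * log(xj) - lj * xj
    return g

x = (0.1, 0.2, 0.3, 4.0)
lam = (1.0, 2.0, 1.0, 2.0)   # lambda_1 .. lambda_4
g1 = gamma_log_discriminant(x, p=4, lam=lam, prior=0.5)
g2 = gamma_log_discriminant(x, p=2, lam=lam, prior=0.5)
print("decide w1" if g1 > g2 else "decide w2")
```

Since p_1 = p_2 here would make the densities identical, it is the difference (p_1 − p_2) Σ_j ln(λ_j x_j) that drives the decision.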



Assuming that p_1 = 3.2, p_2 = 8, c = 2, k = 1, λ_1 = 1, and that the prior probabilities of the classes are equal, find the decision boundary x = x*. Also, find the probabilities of type I and type II errors.
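With k = 1 and λ_1 = 1, setting the two class-conditional log-densities equal gives (p_2 − p_1) ln x* = ln Γ(p_2) − ln Γ(p_1). A sketch of the boundary computation (the error probabilities are then the tail areas of the two Gamma CDFs on either side of x*, e.g. via scipy.stats.gamma.cdf):

```python
from math import lgamma, exp

p1, p2 = 3.2, 8.0
# boundary x* where the two class-conditional log-densities are equal (lambda = 1)
x_star = exp((lgamma(p2) - lgamma(p1)) / (p2 - p1))
print(round(x_star, 3))
```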



Assuming that p_1 = p_2 = 4, c = 2, k = 2, λ_1 = 8, λ_2 = 0.3, and P(ω_1) = 1/4, P(ω_2) = 3/4, find the decision boundary f(x_1, x_2) = 0.


Assume that in a c-class classification problem, there are k conditionally independent features and X_i | ω_j ~ Lap(m_ij, λ_i), i.e. p_{X_i|ω_j}(x_i | ω_j) = (λ_i/2) e^{−λ_i |x_i − m_ij|}, λ_i > 0, i ∈ {1, 2, ..., k}, j ∈ {1, 2, ..., c}. Assuming that the prior class probabilities are equal, show that the minimum error rate classifier is also a minimum weighted Manhattan distance (or weighted L_1-distance) classifier. When does the minimum error rate classifier become the minimum Manhattan distance classifier? (15 pts)
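A sketch of the key step, in the problem's notation: taking the negative log of the joint class-conditional density,

```latex
-\ln p(\mathbf{x}\mid\omega_j)
  = -\sum_{i=1}^{k} \ln\frac{\lambda_i}{2}
    + \sum_{i=1}^{k} \lambda_i \,\lvert x_i - m_{ij}\rvert .
```

The first sum does not depend on j, so with equal priors, maximizing the posterior over j is equivalent to minimizing the weighted L_1 distance Σ_i λ_i |x_i − m_ij| to the class prototype (m_1j, ..., m_kj).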

The class-conditional density functions of a discrete random variable X for four pattern classes are shown below: (20 pts)





x | p(x|ω_1) | p(x|ω_2) | p(x|ω_3) | p(x|ω_4)
1 |   1/3    |   1/2    |   1/6    |   2/5
2 |   1/3    |   1/4    |   1/3    |   2/5
3 |   1/3    |   1/4    |   1/2    |   1/5




The loss function λ(α_i | ω_j) is summarized in the following table, where action α_i means decide pattern class ω_i:







      ω_1 | ω_2 | ω_3 | ω_4
α_1 |  0  |  2  |  3  |  4
α_2 |  1  |  0  |  1  |  8
α_3 |  3  |  2  |  0  |  2
α_4 |  5  |  3  |  1  |  0






Assume P(ω_1) = 1/10, P(ω_2) = 1/5, P(ω_3) = 1/2, P(ω_4) = 1/5.
(a) Compute the conditional risk for each action as:

R(α_i | x) = Σ_{j=1}^{4} λ(α_i | ω_j) P(ω_j | x)
Homework 3 
EE 559, Instructor: Mohammad Reza Rajati 

(b) Compute the overall risk R as:

R = Σ_{i=1}^{3} R(α(x_i) | x_i) p(x_i)

where α(x_i) is the decision rule minimizing the conditional risk for x_i.
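The two quantities above can be checked numerically. A minimal sketch using the tables in this problem (the array layout is my own choice: likelihood rows are x = 1, 2, 3; loss rows are actions α_1 .. α_4):

```python
# class-conditional pmf p(x | w_j); rows: x = 1, 2, 3; columns: w1 .. w4
likelihood = [
    [1/3, 1/2, 1/6, 2/5],
    [1/3, 1/4, 1/3, 2/5],
    [1/3, 1/4, 1/2, 1/5],
]
prior = [1/10, 1/5, 1/2, 1/5]
# loss[i][j] = lambda(alpha_i | w_j)
loss = [
    [0, 2, 3, 4],
    [1, 0, 1, 8],
    [3, 2, 0, 2],
    [5, 3, 1, 0],
]

overall_risk = 0.0
for row in likelihood:
    evidence = sum(l * p for l, p in zip(row, prior))           # p(x)
    posterior = [l * p / evidence for l, p in zip(row, prior)]  # P(w_j | x)
    risks = [sum(L * q for L, q in zip(loss_i, posterior)) for loss_i in loss]
    overall_risk += min(risks) * evidence   # Bayes rule: pick the min-risk action
print(round(overall_risk, 6))
```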
4. The following data set was collected to classify people who evade taxes:



Tax ID | Refund | Marital Status | Taxable Income | Evade
   1   |  Yes   |    Single      |     122 K      |  No
   2   |  No    |    Married     |      77 K      |  No
   3   |  No    |    Married     |     106 K      |  No
   4   |  No    |    Single      |      88 K      |  Yes
   5   |  Yes   |    Divorced    |     210 K      |  No
   6   |  No    |    Single      |      72 K      |  No
   7   |  Yes   |    Married     |     117 K      |  No
   8   |  No    |    Married     |      60 K      |  No
   9   |  No    |    Divorced    |      90 K      |  Yes
  10   |  No    |    Single      |      85 K      |  Yes


Considering relevant features in the table (only one feature is not relevant), assume that the features are conditionally independent. (25 pts)


Estimate prior class probabilities.



For the continuous feature(s), assume conditional Gaussianity and estimate the class-conditional pdfs p(x|ω_i). Use Maximum Likelihood Estimates.
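For a Gaussian, the ML estimates are the per-class sample mean and the biased (divide-by-n) variance. A sketch using the Taxable Income column from the table (values in units of K; variable names are mine):

```python
from math import sqrt

# Taxable Income grouped by the Evade label, from the table above
income = {"No": [122, 77, 106, 210, 72, 117, 60], "Yes": [88, 90, 85]}

def ml_gaussian(xs):
    """ML estimates: sample mean and biased (1/n) standard deviation."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n   # note: /n, not /(n-1)
    return mu, sqrt(var)

for cls, xs in income.items():
    mu, sigma = ml_gaussian(xs)
    print(cls, round(mu, 2), round(sigma, 2))
```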



For each discrete feature X, assume that the number of instances in class ω_i for which X = x_j is n_ji and the number of instances in class ω_i is n_i. Estimate the probability mass p_{X|ω_i}(x_j|ω_i) = P(X = x_j|ω_i) as n_ji/n_i for each discrete feature. Is this a valid estimate of the pmf?



There is an issue with using the estimate you calculated in 4c. Explain why the Laplace correction (n_ji + 1)/(n_i + l), where l is the number of levels X can assume,^1 solves the problem with the estimate given in 4c. Is this a valid estimate of the pmf?
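The corrected estimate still sums to one over the l levels, since Σ_j (n_ji + 1) = n_i + l, and no level ever gets probability zero. A quick sketch (the example counts come from the Refund column within class Evade = Yes in the table above):

```python
def laplace_estimate(counts):
    """Laplace-corrected pmf (n_ji + 1) / (n_i + l) from per-level counts in one class."""
    n_i, l = sum(counts), len(counts)
    return [(n + 1) / (n_i + l) for n in counts]

# Refund within Evade = Yes: "Yes" appears 0 times, "No" appears 3 times
est = laplace_estimate([0, 3])
print(est)
```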



Estimate the minimum error rate decision rule for classifying tax evasion using the Laplace correction.


Programming Part: Breast Cancer Prognosis
The goal of this assignment is to determine the prognosis of breast cancer patients using the features extracted from digital images of Fine Needle Aspirates (FNA) of a breast mass. You will work with the Wisconsin Prognostic Breast Cancer data set, WPBC. There are 34 attributes in the data set: the first attribute is a patient ID, the second is an outcome variable that shows whether the cancer recurred after two years or not (N for Nonrecurrent, R for Recurrent), and the third is also an outcome variable that shows the time to recurrence. The other 30 attributes are the features that you will work with to build a diagnosis tool for breast cancer.
^1 For example, if X ∈ {apple, orange, pear, peach, blueberry}, then l = 5.
Ten real-valued features are calculated for each nucleus in the digital image of the FNA of a breast mass.^2 They are:
radius (mean of distances from center to points on the perimeter)
texture (standard deviation of gray-scale values)
perimeter
area
smoothness (local variation in radius lengths)
compactness (perimeter^2 / area − 1.0)
concavity (severity of concave portions of the contour)
concave points (number of concave portions of the contour)
symmetry
fractal dimension ("coastline approximation" − 1)

The mean, standard deviation, and mean of the three largest values of each feature have been computed for each image, so that each image is represented by 3 × 10 features.
Additionally, the diameter of the excised tumor in centimeters and the number of positive axillary lymph nodes are also given in the data set.
Important Note: Time to recurrence (third attribute) should not be used for classification; otherwise, you will be able to classify perfectly!
There are 198 instances in the data set, 151 of which are nonrecurrent, and 47 are recurrent.

Download the WPBC data from: https://archive.ics.uci.edu/ml/datasets/ Breast+Cancer+Wisconsin+(Diagnostic).

Select the first 130 nonrecurrent cases and the first 37 recurrent cases as your training set. Add record #197 in the data set to your training set as well. (10 pts)

There are four instances in your training set that are missing the lymph node feature (denoted as ?). This is not a very severe issue, so replace the missing features with the median of the lymph node feature in your training set. (5 pts)
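Median imputation of the '?' entries takes only a few lines. A sketch assuming the lymph-node column has been read in as a list of strings (the function and variable names are my own):

```python
from statistics import median

def impute_median(values):
    """Replace '?' entries with the median of the observed numeric values."""
    observed = [float(v) for v in values if v != "?"]
    med = median(observed)
    return [med if v == "?" else float(v) for v in values]

print(impute_median(["3", "?", "0", "7", "?"]))
```

Compute the median from the training set only, then reuse it for any missing test values, so no test information leaks into training.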

Binary Classification Using Naïve Bayes' Classifiers


Solve the problem using a Naïve Bayes' classifier. Use Gaussian class-conditional distributions. Report the confusion matrix, ROC, precision, recall, F1 score, and AUC for both the training and test data sets. (10 pts)
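A minimal from-scratch Gaussian naïve Bayes, shown here only to make the model explicit; in practice scikit-learn's GaussianNB plus its metrics functions produce the requested reports. This is a sketch on synthetic data, not the WPBC pipeline:

```python
import numpy as np

class GaussianNB:
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.prior = np.array([(y == c).mean() for c in self.classes])
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        return self

    def predict(self, X):
        # log P(w_c) + sum_j log N(x_j; mu_cj, var_cj), maximized over classes c
        ll = np.log(self.prior) - 0.5 * (
            np.log(2 * np.pi * self.var).sum(axis=1)
            + (((X[:, None, :] - self.mu) ** 2) / self.var).sum(axis=2)
        )
        return self.classes[np.argmax(ll, axis=1)]

# tiny synthetic check: two well-separated Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
model = GaussianNB().fit(X, y)
acc = (model.predict(X) == y).mean()
print(acc)
```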


^2 For more details see: https://www.researchgate.net/publication/2512520_Nuclear_Feature_Extraction_For_Breast_Tumor_Diagnosis.


This data set is rather imbalanced. Balance your data set using SMOTE, by downsampling the common class in the training set to 90 instances and upsampling the uncommon class to 90 instances. Use k = 5 nearest neighbors in SMOTE. Remember not to change the balance of the test set. Report the confusion matrix, ROC, precision, recall, F1 score, and AUC for both the train and test data sets. Does SMOTE help? (10 pts)
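SMOTE creates synthetic minority samples by interpolating between a minority point and one of its k nearest minority neighbors. A bare-bones sketch of that interpolation step (the imblearn SMOTE implementation handles the full resampling; k = 5 as the problem specifies, and names here are mine):

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic samples from minority-class points X_min."""
    rng = rng or np.random.default_rng(0)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest neighbors of each point
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = nn[i, rng.integers(k)]
        t = rng.random()                         # interpolation factor in [0, 1)
        out.append(X_min[i] + t * (X_min[j] - X_min[i]))
    return np.array(out)

X_min = np.random.default_rng(1).normal(size=(10, 3))
synthetic = smote(X_min, n_new=20)
print(synthetic.shape)
```

Each synthetic point lies on a segment between two real minority points, which is why the test set must be left untouched.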


(Extra practice, will not be graded) Solve the regression problem of estimating time to recurrence (third attribute) using the next 32 attributes. You can use KNN regression. To do it in a principled way, select 20% of the data points of each class in your training set to choose the best k ∈ {1, 2, ..., 20}, and use the remaining 80% as the new training set. Report your MSE on the test set using the k you found and the whole training set (not only the new training set!). For simplicity, use Euclidean distance. Repeat this process when you apply SMOTE to your new training set to only upsample the rare class and make the data completely balanced. Does SMOTE help in reducing the MSE?
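A minimal KNN regressor with Euclidean distance for the extra-practice part (a sketch on toy data; function and variable names are mine):

```python
import numpy as np

def knn_regress(X_train, y_train, X_query, k):
    """Predict each query as the mean target of its k nearest training points."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    return y_train[nn].mean(axis=1)

# toy check on y = 2x: the prediction at x = 4 averages the targets at x = 3, 4, 5
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel()
pred = knn_regress(X, y, np.array([[4.0]]), k=3)
print(pred)
```

To pick k, evaluate this on the held-out 20% split for each candidate k and keep the one with the smallest validation MSE.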