Description
Problem 1 (25pt): [K-means] Implement the K-means algorithm. Note that you cannot directly call built-in k-means functions (a minimal sketch of the loop is given after the subquestions below).
Figure 1: Scatter plot of the dataset and the initialized centers of the 3 clusters
Given the matrix $X$ whose rows represent different data points, you are asked to perform a k-means clustering on this dataset using the Euclidean distance as the distance function. Here $k$ is chosen as 3. The Euclidean distance $d$ between vectors $x$ and $y$, both in $\mathbb{R}^p$, is defined as $d(x, y) = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2}$. All data in $X$ are plotted in Figure 1. The centers of the 3 clusters were
initialized as $\mu_1 = (6.2, 3.2)$ (red), $\mu_2 = (6.6, 3.7)$ (green), $\mu_3 = (6.5, 3.0)$ (blue).
\[
X = \begin{bmatrix}
5.9 & 3.2 \\
4.6 & 2.9 \\
6.2 & 2.8 \\
4.7 & 3.2 \\
5.5 & 4.2 \\
5.0 & 3.0 \\
4.9 & 3.1 \\
6.7 & 3.1 \\
5.1 & 3.8 \\
6.0 & 3.0
\end{bmatrix}
\]
(1) [10pt] What is the center of the first cluster (red) after one iteration? (Answer in the format $[x_1, x_2]$; round results to three decimal places. The same applies to parts (2) and (3).)
(2) [5pt] What is the center of the second cluster (green) after two iterations?
(3) [5pt] What is the center of the third cluster (blue) when the clustering converges?
(4) [5pt] How many iterations are required for the clusters to converge?
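For reference, a minimal NumPy sketch of the k-means loop described above (the function name, iteration cap, and stopping rule are illustrative choices, not requirements of the assignment):

\begin{verbatim}
import numpy as np

def kmeans(X, centers, max_iter=100):
    """Plain k-means with Euclidean distance.

    X: (n, p) data matrix; centers: (k, p) initial centers.
    Returns the final centers, labels, and number of iterations run.
    """
    centers = centers.astype(float)
    labels = None
    for it in range(1, max_iter + 1):
        # Assignment step: index of the nearest center for every point.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments unchanged, so the clustering has converged
        labels = new_labels
        # Update step: move each center to the mean of its assigned points
        # (assumes no cluster goes empty, which holds for this dataset).
        centers = np.array([X[labels == j].mean(axis=0)
                            for j in range(centers.shape[0])])
    return centers, labels, it

# Data and initial centers transcribed from the problem statement.
X = np.array([[5.9, 3.2], [4.6, 2.9], [6.2, 2.8], [4.7, 3.2], [5.5, 4.2],
              [5.0, 3.0], [4.9, 3.1], [6.7, 3.1], [5.1, 3.8], [6.0, 3.0]])
mu0 = np.array([[6.2, 3.2], [6.6, 3.7], [6.5, 3.0]])
\end{verbatim}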
Problem 2 (15pt): [Latent variable model and GMM] Consider the discrete latent variable model where the latent variable $z$ uses a 1-of-K representation. The distribution of the latent variable $z$ is defined as:
\[
p(z_k = 1) = \pi_k,
\]
where the $\{\pi_k\}$ satisfy $0 \le \pi_k \le 1$ and $\sum_{k=1}^{K} \pi_k = 1$. Suppose the conditional distribution of the observation $x$ given a particular value of $z$ is Gaussian:
\[
p(x \mid z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k).
\]
(1) [5pt] Write down the compact forms of $p(z)$ and $p(x \mid z)$.
(2) [5pt] Show that the marginal distribution $p(x)$ has the following form:
\[
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k).
\]
(3) [5pt] If we want to find the MLE solution for the parameters $\pi_k, \mu_k, \Sigma_k$ in such a model, what algorithm should we use? Briefly describe its major difference compared to the K-means algorithm.
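Hint for part (2): the marginal follows from the sum rule over the latent variable (a standard probability identity, not specific to this model):
\[
p(x) = \sum_{z} p(z)\, p(x \mid z),
\]
where the sum ranges over the $K$ one-hot values that $z$ can take, so exactly one component survives in each term.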
Problem 3 (20pt): [Bayesian Network] Suppose we are given 5 random variables, $A_1, A_2, B_1, B_2, B_3$. $A_1$ and $A_2$ are marginally independent, and $B_1, B_2, B_3$ depend on $A_1, A_2$ as follows:
$B_1$ depends on $A_2$; $B_2$ depends on $A_2$ and $A_1$; $B_3$ depends on $A_1$. All 5 random variables are binary, i.e., $A_i, B_j \in \{0, 1\}$, $i = 1, 2$, $j = 1, 2, 3$.
(1) [5pt] Draw the corresponding Bayesian network.
(2) [5pt] Based on the Bayesian network in (1), write down the joint distribution $p(A_1, A_2, B_1, B_2, B_3)$.
(3) [5pt] How many independent parameters are needed to fully specify the joint distribution in (2)?
(4) [5pt] Suppose we do not have any independence assumptions. Write down one possible factorization of $p(A_1, A_2, B_1, B_2, B_3)$, and state how many independent parameters are required to describe the joint distribution in this case.
Problem 4 (40pt): [Neural Networks] Build a neural network with one hidden layer to predict class labels for the Iris plant dataset (https://archive.ics.uci.edu/ml/datasets/iris).
(1) [30pt] Properly split the data into training and testing sets, and report the training and testing accuracy. You can use the sigmoid activation function and select any reasonable number of hidden units. Note that for this part, you need to implement the forward/backward functions yourself without using a deep learning package (see the sketch after part (2)). However, you can use a deep learning package, e.g., TensorFlow, PyTorch, MatConvNet, etc., to compare with your own results.
(2) [10pt] Try a different design of the neural network, compare with part (1), and report your findings. This is an open-ended question; you can change the previous model in several ways, e.g., (1) change the activation function to tanh, ReLU, etc., or (2) build a more complex neural network by introducing more layers, among many other options. Note that for this part, you are allowed to use deep learning packages.
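As a starting point for part (1), a minimal NumPy sketch of the forward/backward passes for a one-hidden-layer sigmoid network with a softmax output (the hidden size, learning rate, and initialization scale are arbitrary illustrative choices; loading and splitting the Iris data is omitted):

\begin{verbatim}
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OneHiddenLayerNet:
    """Minimal MLP: input -> sigmoid hidden layer -> softmax output."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, X):
        self.h = sigmoid(X @ self.W1 + self.b1)          # hidden activations
        scores = self.h @ self.W2 + self.b2
        e = np.exp(scores - scores.max(axis=1, keepdims=True))
        self.p = e / e.sum(axis=1, keepdims=True)        # class probabilities
        return self.p

    def backward(self, X, Y):
        """One gradient step on mean cross-entropy; Y is one-hot, (n, n_out)."""
        n = X.shape[0]
        d_scores = (self.p - Y) / n                      # softmax + CE gradient
        dW2 = self.h.T @ d_scores
        db2 = d_scores.sum(axis=0)
        d_h = (d_scores @ self.W2.T) * self.h * (1.0 - self.h)  # sigmoid'
        dW1 = X.T @ d_h
        db1 = d_h.sum(axis=0)
        for param, grad in ((self.W1, dW1), (self.b1, db1),
                            (self.W2, dW2), (self.b2, db2)):
            param -= self.lr * grad                      # in-place SGD update
\end{verbatim}

Training repeats forward followed by backward over the training split; accuracy is the fraction of rows where the argmax of the predicted probabilities matches the true class.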