Description

You are allowed to discuss the homework problems with your classmates, but you must complete your assignment individually.

You cannot use an existing machine learning / neural network / etc. library.

You must also turn in your computer code.

1. (30 pts) Design a two-layer neural network with the signum activation function (i.e., sgn(x) = 1 if x > 0, sgn(x) = -1 if x < 0, and sgn(0) = 0) such that the network implements the logic gate
f(x_{1}, x_{2}, x_{3}) = x_{1}x_{2}x_{3} + x_{1}x_{2}. Assume that an input of -1 is used to represent FALSE, and an input of 1 is used to represent TRUE. Show your work and draw the final network. Note that in class, we discussed examples where we instead used the step activation function and a 0 for FALSE.
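As a sanity check on a hand-designed network, a short script can enumerate all eight ±1 input patterns and evaluate a candidate two-layer signum network. The weights below are placeholders for illustration, not the solution to this problem:

```python
# Sketch: evaluate a hypothetical two-layer signum network over all +/-1 inputs.
# The weights below are placeholders, NOT the solution to Problem 1.
from itertools import product

def sgn(x):
    return 1 if x > 0 else (-1 if x < 0 else 0)

def two_layer(x, hidden, out):
    # hidden: list of (bias, w1, w2, w3) per hidden unit; out: (bias, v1, ..., vk)
    h = [sgn(b + sum(w * xi for w, xi in zip(ws, x))) for (b, *ws) in hidden]
    return sgn(out[0] + sum(v * hi for v, hi in zip(out[1:], h)))

# Example placeholder network: two hidden units, one output unit.
hidden = [(-2, 1, 1, 1),    # fires (+1) only when x1 = x2 = x3 = 1 (AND-like unit)
          (-1, -1, -1, 0)]  # fires (+1) only when x1 = x2 = -1
out = (1, 1, 1)             # OR-like combination of the hidden outputs

for x in product([-1, 1], repeat=3):
    print(x, two_layer(x, hidden, out))
```

Printing the full truth table this way makes it easy to confirm that your chosen weights implement the required gate on every input combination.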
2. (30 pts) Consider the network in Fig. 1. In the x-y plane, sketch the region where z = 1. Show your work. Make sure you correctly indicate which parts of the boundaries belong to the region z = 1. Recall that u(x) = 1 if x ≥ 0 and u(x) = 0 if x < 0.

[Figure 1, a diagram of a two-layer network of u(·) units with inputs x and y and output z, is not reproduced in this text.]

Figure 1: The neural network for Problem 2.
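A numerical check can complement the hand sketch: evaluate the network on a grid of (x, y) points and see where the output is 1. Since the figure is not reproduced here, the half-plane weights below are placeholders; substitute the actual values from Fig. 1:

```python
# Sketch: numerically check the z = 1 region for a network of step units
# feeding a step output, as in Fig. 1. All weights below are placeholders:
# substitute the actual values from the figure.
import numpy as np

def u(x):
    # Step function, working on scalars and NumPy arrays alike.
    return (x >= 0).astype(int) if isinstance(x, np.ndarray) else int(x >= 0)

def z(x, y, hidden, out):
    # hidden: list of (bias, wx, wy) per hidden unit; out: (bias, v1, ..., vk)
    h = [u(b + wx * x + wy * y) for (b, wx, wy) in hidden]
    return u(out[0] + sum(v * hi for v, hi in zip(out[1:], h)))

hidden = [(1, 1, 0), (1.5, -1, 0), (1, 0, 1)]  # placeholder half-planes
out = (-2.5, 1, 1, 1)                          # placeholder combination

xs, ys = np.meshgrid(np.linspace(-3, 3, 7), np.linspace(-3, 3, 7))
print(z(xs, ys, hidden, out))  # 1 where the region holds on the grid
```

Because u(x) = 1 exactly when x ≥ 0, this grid evaluation also reveals which boundary lines are included in the z = 1 region, matching what the problem asks you to indicate.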

3. (40 pts) Write a computer program that runs the perceptron training algorithm with the step activation function u(·). You must include the source files of your computer program together with the solution of the problem. Implement the following steps.

Do not be scared by the fact that the question has many steps. Most of the steps are very simple, and they are just there to make your life easier.

(b) Pick (your code should pick it) w_{0} uniformly at random on [-1/4, 1/4].

(c) Pick w_{1} uniformly at random on [-1, 1].

(d) Pick w_{2} uniformly at random on [-1, 1].

(e) Write in your report the numbers [w_{0}, w_{1}, w_{2}] you have picked.
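The random initialization above can be sketched as follows; the RNG seed is an assumption, used only so that the numbers in the report are reproducible:

```python
# Sketch: draw the "true" weights at random, per the ranges in the assignment.
# The seed is an assumption for reproducibility, not required by the problem.
import numpy as np

rng = np.random.default_rng(0)
w0 = rng.uniform(-0.25, 0.25)  # w_0 on [-1/4, 1/4]
w1 = rng.uniform(-1.0, 1.0)    # w_1 on [-1, 1]
w2 = rng.uniform(-1.0, 1.0)    # w_2 on [-1, 1]
print([w0, w1, w2])            # record these numbers in the report
```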

(f) Pick n = 100 vectors x_{1}, ..., x_{n} independently and uniformly at random on [-1, 1]^{2}, and call the collection of these vectors S.

(g) Let S_{1} ⊆ S denote the collection of all x = [x_{1} x_{2}] ∈ S satisfying [1 x_{1} x_{2}][w_{0} w_{1} w_{2}]^{T} ≥ 0.

(h) Let S_{0} ⊆ S denote the collection of all x = [x_{1} x_{2}] ∈ S satisfying [1 x_{1} x_{2}][w_{0} w_{1} w_{2}]^{T} < 0.
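The sampling and splitting steps above can be sketched as below; prepending a constant 1 to each sample turns the sign test into a single dot product:

```python
# Sketch: sample n points on [-1, 1]^2 and split them into S_1 and S_0
# using the true weights. Seed and weights here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
w = np.array([rng.uniform(-0.25, 0.25), rng.uniform(-1, 1), rng.uniform(-1, 1)])

n = 100
S = rng.uniform(-1, 1, size=(n, 2))      # the collection S
X = np.hstack([np.ones((n, 1)), S])      # each row is [1 x_1 x_2]
labels = (X @ w >= 0).astype(int)        # 1 -> S_1, 0 -> S_0
S1, S0 = S[labels == 1], S[labels == 0]
print(len(S1), len(S0))
```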

(i) In one plot, show the line w_{0} + w_{1}x_{1} + w_{2}x_{2} = 0, with x_{1} being the “x-axis” and x_{2} being the “y-axis.” In the same plot, show all the points in S_{1} and all the points in S_{0}. Use different symbols for S_{0} and S_{1}, and indicate which points belong to which class. An example figure may be as shown in Fig. 2 (my labels look bad, I expect you to do a better job!).
Figure 2: An example figure for Problem 3i.
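A plot of this kind can be produced along the following lines; the weights and the seed are placeholders, and solving w_{0} + w_{1}x_{1} + w_{2}x_{2} = 0 for x_{2} gives the line to draw (this assumes w_{2} ≠ 0):

```python
# Sketch: plot the separating line and the two classes with different symbols.
# Weights and seed are placeholders; reuse your values from the earlier steps.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

w0, w1, w2 = 0.1, 1.0, -0.5  # placeholder weights (w2 != 0 assumed)
rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(100, 2))
mask = w0 + w1 * S[:, 0] + w2 * S[:, 1] >= 0
S1, S0 = S[mask], S[~mask]

xs = np.linspace(-1, 1, 100)
plt.plot(xs, -(w0 + w1 * xs) / w2, "k-", label="w0 + w1*x1 + w2*x2 = 0")
plt.scatter(S1[:, 0], S1[:, 1], marker="o", label="S1")
plt.scatter(S0[:, 0], S0[:, 1], marker="x", label="S0")
plt.xlabel("x1")
plt.ylabel("x2")
plt.legend()
plt.savefig("problem3_scatter.png")
```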

(j) Use the perceptron training algorithm to find weights that can separate the two classes S_{0} and S_{1}. (Obviously you already know such weights, namely w_{0}, w_{1}, w_{2} above, but we will find the weights from scratch using only the training sets S_{0} and S_{1}.) In detail,
i. Use the training parameter η = 1.
ii. Pick w_{0}^{′}, w_{1}^{′}, w_{2}^{′} independently and uniformly at random on [-1, 1]. Write them in your report.

iii. Record the number of misclassifications if we use the weights [w_{0}^{′}, w_{1}^{′}, w_{2}^{′}].

iv. After one epoch of the perceptron training algorithm, you will find a new set of weights [w_{0}^{′′}, w_{1}^{′′}, w_{2}^{′′}].

v. Record the number of misclassifications if we use the weights [w_{0}^{′′}, w_{1}^{′′}, w_{2}^{′′}].

vi. Do another epoch of the perceptron training algorithm, find a new set of weights, record the number of misclassifications, and so on, until convergence.
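The epoch loop described above can be sketched as follows; the function name, the toy data, and the seed are illustrative, and the misclassification count is recorded at the start of each epoch so the first entry corresponds to the initial random weights:

```python
# Sketch: perceptron training with per-epoch misclassification counts.
# eta is the training parameter (1, 10, or 0.1 in the later steps).
import numpy as np

def train(X, labels, w_init, eta, max_epochs=1000):
    """Return the final weights and the misclassification count per epoch."""
    w = w_init.astype(float).copy()
    history = []
    for _ in range(max_epochs):
        errors = int(np.sum((X @ w >= 0).astype(int) != labels))
        history.append(errors)  # count under the current weights
        if errors == 0:
            break               # converged: all samples classified correctly
        for x, d in zip(X, labels):
            y = int(x @ w >= 0)
            w += eta * (d - y) * x  # perceptron update on each misclassification
    return w, history

# Toy usage with separable data; substitute S and labels from the earlier steps.
rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(100, 2))
X = np.hstack([np.ones((100, 1)), S])
labels = (X @ np.array([0.1, 1.0, -0.5]) >= 0).astype(int)
w_init = rng.uniform(-1, 1, size=3)
w_final, history = train(X, labels, w_init, eta=1.0)
print(history)  # epoch-by-epoch misclassification counts for the graph
```

The returned history is exactly the data needed for the epoch-versus-misclassifications graphs; rerunning train with a different eta but the same X, labels, and w_init matches the "do not change" requirement of the later steps.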

(k) Regarding the previous step, draw a graph that shows the epoch number versus the number of misclassifications.

(l) Repeat the same experiment with η = 10. Do not change w_{0}, w_{1}, w_{2}, S, w_{0}^{′}, w_{1}^{′}, w_{2}^{′}. As in the case η = 1, draw a graph that shows the epoch number versus the number of misclassifications.

(m) Repeat the same experiment with η = 0.1. Do not change w_{0}, w_{1}, w_{2}, S, w_{0}^{′}, w_{1}^{′}, w_{2}^{′}. As in the case η = 1, draw a graph that shows the epoch number versus the number of misclassifications.

(n) Comment on how changes in η affect the number of epochs needed until convergence.

(o) Comment on whether we would get the exact same results (in terms of the effects of η on training performance) if we had started with different w_{0}, w_{1}, w_{2}, S, w_{0}^{′}, w_{1}^{′}, w_{2}^{′}.

(p) Do the same experiments with n = 1000 samples. Comment on the differences compared to n = 100.