Description

This problem is to be done by hand. In the parts below, you will set up the primal Lagrangian and derive the dual Lagrangian for the support vector machine (SVM) classifier, for the case of data that is linearly separable in the expanded feature space.
For SVM learning, we stated the optimization problem as:
Minimize J(w) = (1/2) ‖w‖^{2}
s.t. z_{i} (w^{T} u_{i} + w_{0}) − 1 ≥ 0 ∀i
Assume our data is linearly separable in the expanded feature space. Also, nonaugmented notation is used throughout this problem.

If the above set of constraints (second line of equations above) is satisfied, will all the training data be correctly classified?

Write the Lagrangian function L(w, w_{0}, λ) for the minimization problem stated above. Use λ_{i}, i = 1, 2, …, N for the Lagrange multipliers. Also state the KKT conditions. (Hint: there are 3 KKT conditions.)

Derive the dual Lagrangian L_{D} by proceeding as follows:

(i) Minimize L w.r.t. the weights.

Hint: solve ∇_{w} L = 0 for the optimal weight w* (in terms of λ_{i} and other variables); and set ∂L/∂w_{0} = 0 and simplify.

(ii) Substitute your expressions from part (i) into L, and use your expression from ∂L/∂w_{0} = 0 as a new constraint, to derive L_{D} as:





L_{D}(λ) = ∑_{i=1}^{N} λ_{i} − (1/2) ∑_{i=1}^{N} ∑_{j=1}^{N} λ_{i} λ_{j} z_{i} z_{j} u_{i}^{T} u_{j}




subject to the (new) constraint ∑_{i=1}^{N} λ_{i} z_{i} = 0, which becomes a new KKT condition.
Also give the other two KKT conditions on λ_{i}, which carry over from the primal form.

In this problem you will do SVM learning on a small set of given data points, using the result from Problem 1 above. Problem 2 also uses nonaugmented notation.
Coding. This problem involves some work by hand and some solution by coding. As in previous homework assignments, in this problem you are required to write the code yourself; you may use Python built-in functions, NumPy, and matplotlib, and you may use pandas only for reading and/or writing csv files. For the plotting, we are not supplying plotting function(s) for you; you may use this as an opportunity to (learn, explore, and) use matplotlib functions as needed. Alternatively, you may do the plots by hand if you prefer.
Throughout this problem, let the dataset have N = 3 points. We will work entirely in the expanded feature space (u space).


Derive by hand a set of equations, with λ, µ as variables, and u_{i}, z_{i}, i = 1, 2, 3 as given constants, that when solved will give the SVM decision boundary and regions. To do this,


start the Lagrangian process to optimize L_{D} w.r.t. λ and µ, subject to the equality constraint ∑_{i=1}^{N} λ_{i} z_{i} = 0. Set all the derivatives equal to 0 to obtain the set of equations.
Hint: Start from
L_{D}′(λ, µ) = ∑_{i=1}^{N} λ_{i} − (1/2) ∑_{i=1}^{N} ∑_{j=1}^{N} λ_{i} λ_{j} z_{i} z_{j} u_{i}^{T} u_{j} + µ ∑_{i=1}^{N} z_{i} λ_{i},
in which the last term has been added to incorporate the equality constraint stated above.
Finally, formulate your set of equations as a matrix-vector equation of the form:
A ρ = b
in which ρ = [λ^{T} µ]^{T}. Give your expressions for A and b.
Tip: If you’d like a quick check to see if you’re on the right track, try plugging in (by hand) this simple dataset:
u_{1} = [1 0]^{T}, u_{2} = [0 1]^{T} ∈ S_{1} (z_{i} = +1); u_{3} = [1 1]^{T} ∈ S_{2} (z_{3} = −1).
The first row of A should be [1 0 −1 −1] and the first entry of b should be 1.
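If you also want a machine check of this tip, the sketch below builds A and b directly from the stationarity equations of L_{D}′ (∂/∂λ_{k}: ∑_{j} z_{k} z_{j} (u_{k}^{T} u_{j}) λ_{j} − z_{k} µ = 1; ∂/∂µ: ∑_{i} z_{i} λ_{i} = 0). The dataset values in the code are the ones assumed in the tip above; variable names are illustrative.

```python
import numpy as np

# Assumed tip dataset: u1 = [1, 0], u2 = [0, 1] in S1 (z = +1); u3 = [1, 1] in S2 (z = -1).
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
z = np.array([1.0, 1.0, -1.0])
N = len(z)

# Stationarity of L_D' w.r.t. lambda_k:  sum_j z_k z_j (u_k . u_j) lambda_j - z_k mu = 1
# Stationarity w.r.t. mu:                sum_i z_i lambda_i = 0
G = U @ U.T                       # Gram matrix of inner products u_i^T u_j
A = np.zeros((N + 1, N + 1))
A[:N, :N] = np.outer(z, z) * G    # entries z_i z_j u_i^T u_j
A[:N, N] = -z                     # coefficients of mu
A[N, :N] = z                      # equality-constraint row
b = np.concatenate([np.ones(N), [0.0]])

print(A[0])   # first row of A: [ 1.  0. -1. -1.]
print(b[0])   # first entry of b: 1.0
```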
In the parts below you will use a computer to solve this matrix equation for λ and µ, calculate then plot and interpret the results, for 3 different datasets.
For parts (b)-(e) below, use the following dataset:
u_{1} = [1 2]^{T}, u_{2} = [2 1]^{T} ∈ S_{1}; u_{3} = [0 0]^{T} ∈ S_{2}
For all parts below, consider S_{1} to be the positive class (z_{i} = +1).

Write code in Python to solve for λ and µ, calculate w* and w_{0}*, and check the KKT conditions.
Tip: Use variables u_{i}, z_{i}, i = 1, 2, 3 such that you can input their values for different datasets.
Specifically, the code should:

(i) Use NumPy to invert your matrix, and to calculate the resulting values for λ and µ.

(ii) Check that the resulting λ satisfies the KKT conditions involving λ (but not involving w) that you stated in Problem 1(c)(ii).

(iii) Calculate the optimal (nonaugmented) weight vector w* by using your result from Problem 1(c)(i). And, find the optimal bias term w_{0}* using one of the KKT conditions from Problem 1(b).

(iv) Check that the resulting w* and w_{0}* satisfy the KKT conditions on w and w_{0} of Pr. 1(c).

(v) Output the results of (i)-(iv) above.
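The steps above can be sketched as follows. This is a minimal, illustrative sketch only: the dataset values assume the part-(b) data as stated earlier, the bias is recovered from an arbitrary support vector, and variable names are not prescribed by the assignment.

```python
import numpy as np

# Assumed part-(b) dataset: u1 = [1, 2], u2 = [2, 1] in S1 (z = +1); u3 = [0, 0] in S2 (z = -1).
U = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [0.0, 0.0]])
z = np.array([1.0, 1.0, -1.0])
N = len(z)

# (i) Build A, b from the stationarity equations and invert to get lambda and mu.
G = U @ U.T
A = np.zeros((N + 1, N + 1))
A[:N, :N] = np.outer(z, z) * G
A[:N, N] = -z
A[N, :N] = z
b = np.concatenate([np.ones(N), [0.0]])
rho = np.linalg.inv(A) @ b
lam, mu = rho[:N], rho[N]

# (ii) KKT check on lambda: lambda_i >= 0 and sum_i lambda_i z_i = 0.
assert np.all(lam >= -1e-9)
assert abs(lam @ z) < 1e-9

# (iii) w* = sum_i lambda_i z_i u_i; w0 from any support vector (lambda_k > 0),
# using z_k (w^T u_k + w0) = 1, i.e. w0 = z_k - w^T u_k (since 1/z_k = z_k).
w = (lam * z) @ U
k = int(np.argmax(lam))           # an index with lambda_k > 0
w0 = z[k] - w @ U[k]

# (iv) Primal KKT check: z_i (w^T u_i + w0) - 1 >= 0 for all i,
# with equality wherever lambda_i > 0 (complementary slackness).
margins = z * (U @ w + w0) - 1.0
assert np.all(margins >= -1e-9)
assert np.allclose(lam * margins, 0.0)

# (v) Output the results.
print("lambda =", lam, " mu =", mu)
print("w* =", w, " w0* =", w0)
```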


Run your code on the given dataset; give the resulting values for λ and µ, and the results of the KKT-conditions check on λ.
Also give the resulting w* and w_{0}*, and the results of the KKT-conditions check on w* and w_{0}*.

Plot in 2D nonaugmented feature (u) space: the data points showing their class labels, the decision boundary defined by w* and w_{0}*, and an arrow showing which side of the boundary is class S_{1}. While you are encouraged to do (some or all of) the plot by computer, you can do some or all of it by hand if you prefer, as long as it is neat and reasonably accurate.

Looking at the plot, does the decision boundary correctly classify the training data?
And if yes, does it look like the maximum-margin boundary that will correctly classify the data? Briefly justify.
(f) (i)-(iii) Repeat parts (c)-(e) except for the following dataset:
u_{1} = [1 2]^{T}, u_{2} = [2 1]^{T} ∈ S_{1}; u_{3}′ = [1 1]^{T} ∈ S_{2}
in which the third data point has been changed.


Explain the change, or lack of change, in the decision boundary from (d) to (f).


(i) How do you think the boundary will change (relative to (f)) if we instead use the following data:
u_{1} = [1 2]^{T}, u_{2} = [2 1]^{T} ∈ S_{1}; u_{3}″ = [0 1.5]^{T} ∈ S_{2}
in which the third data point has (yet again) been changed?
(ii)-(iv) Try it by repeating (c)-(e) except with these data points.

Explain any change or lack of change in the decision boundary as compared with (d) and (f).

Tip: Note that the linear algebra approach we take to solve this problem has no way of enforcing the requirement λ_{i} ≥ 0 ∀i. After you check this requirement, if any data point u_{k} has λ_{k} < 0, then you can set λ_{k} = 0 as a given constant, and then re-optimize the other Lagrange multipliers. Then check the KKT conditions again to verify they are satisfied.
Why? With no nonnegativity condition on λ, the optimal solution effectively finds a point that is on the boundary of all inequality constraint regions (as in case (b) of Lecture 12 p. 10). If one of the constraints (say on data point u_{k}) is already satisfied (as in case (a) on p. 9), the solution proceeds as in case (b) to find a point on the boundary of that constraint region, which results in a negative Lagrange multiplier λ_{k} < 0. Resetting it to 0, then re-optimizing the other parameters, is a way of enforcing the λ_{k} ≥ 0 requirement.
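The clamp-and-re-solve procedure described in this tip can be sketched as follows; the function names are illustrative, and the solver simply drops the clamped multiplier from the linear system and re-solves the reduced system (the equality constraint then runs over the remaining points only).

```python
import numpy as np

def solve_dual(U, z):
    """Solve the stationarity system A rho = b for (lambda, mu); no sign constraint."""
    N = len(z)
    G = U @ U.T
    A = np.zeros((N + 1, N + 1))
    A[:N, :N] = np.outer(z, z) * G    # z_i z_j u_i^T u_j
    A[:N, N] = -z                     # coefficients of mu
    A[N, :N] = z                      # equality-constraint row
    b = np.concatenate([np.ones(N), [0.0]])
    rho = np.linalg.solve(A, b)
    return rho[:N], rho[N]

def solve_dual_clamped(U, z):
    """Fix any negative lambda_k to 0 and re-optimize the rest, as in the tip above."""
    N = len(z)
    active = np.arange(N)             # indices whose lambda is still a free variable
    lam, mu = solve_dual(U, z)
    while np.any(lam < -1e-12):
        k = int(np.argmin(lam))       # most negative multiplier in the active set
        active = np.delete(active, k) # clamp it to 0 by removing it from the system
        lam, mu = solve_dual(U[active], z[active])
    full = np.zeros(N)
    full[active] = lam                # clamped entries stay at 0
    return full, mu
```

For example, with the part-(g) dataset assumed above (u_{3}″ = [0 1.5]^{T}), the unconstrained solve produces one negative multiplier, and `solve_dual_clamped` returns a solution with all λ_{i} ≥ 0.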
Comment: this method of implementation (Pr. 2 using a matrix inverse) is useful for working with algebraic expressions and theory, and for seeing/verifying how things work. For larger datasets, other implementation methods are typically used, such as quadratic programming or variations on gradient descent designed for this type of optimization.