Description
All questions have multiple-choice answers ([a], [b], [c], …). You can collaborate with others, but do not discuss the selected or excluded choices in the answers. You can consult books and notes, but not other people’s solutions. Your solutions should be based on your own work. Definitions and notation follow the lectures.
Note about the homework
The goal of the homework is to facilitate a deeper understanding of the course material. The questions are not designed to be puzzles with catchy answers. They are meant to make you roll up your sleeves, face uncertainties, and approach the problem from different angles.
The problems range from easy to difficult, and from practical to theoretical. Some problems require running a full experiment to arrive at the answer.
The answer may not be obvious or numerically close to one of the choices, but one (and only one) choice will be correct if you follow the instructions precisely in each problem. You are encouraged to explore the problem further by experimenting with variations on these instructions, for the learning benefit.
You are also encouraged to take part in the forum http://book.caltech.edu/bookforum
where there are many threads about each homework set. We hope that you will contribute to the discussion as well. Please follow the forum guidelines for posting answers (see the “BEFORE posting answers” announcement at the top there).
© 2012–2015 Yaser Abu-Mostafa. All rights reserved. No redistribution in any format. No translation or derivative products without written permission.
Generalization Error
In Problems 1-3, we look at generalization bounds numerically. For N > d_{vc}, use the simple approximate bound N^{d_{vc}} for the growth function m_{H}(N).

1. For an H with d_{vc} = 10, if you want 95% confidence that your generalization error is at most 0.05, what is the closest numerical approximation of the sample size that the VC generalization bound predicts?


[a] 400,000
[b] 420,000
[c] 440,000
[d] 460,000
[e] 480,000
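One way to explore this numerically (a sketch, not required by the problem): with the N^{d_{vc}} approximation, the VC bound ε ≤ sqrt( (8/N) ln(4 m_{H}(2N)/δ) ) rearranges to the implicit condition N ≥ (8/ε²) ln(4 (2N)^{d_{vc}}/δ), which a simple fixed-point iteration solves.

```python
import math

# Sketch: with the approximation m_H(N) ~ N^dvc, the VC bound
#   epsilon <= sqrt( (8/N) * ln(4 * (2N)^dvc / delta) )
# rearranges to the implicit condition
#   N >= (8/epsilon^2) * ln(4 * (2N)^dvc / delta),
# which can be solved by iterating N -> right-hand side.
dvc, eps, delta = 10, 0.05, 0.05

N = 1000.0                      # any positive starting guess works here
for _ in range(1000):
    N = (8 / eps ** 2) * math.log(4 * (2 * N) ** dvc / delta)

print(round(N))                 # lands in the vicinity of 450,000
```

The iteration converges quickly because the right-hand side grows only logarithmically in N.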


2. There are a number of bounds on the generalization error ε, all holding with probability at least 1 − δ. Fix d_{vc} = 50 and δ = 0.05 and plot these bounds as a function of N. Which bound is the smallest for very large N, say N = 10,000? Note that [c] and [d] are implicit bounds in ε.

[a] Original VC bound: ε ≤ sqrt( (8/N) ln(4 m_{H}(2N)/δ) )
[b] Rademacher Penalty Bound: ε ≤ sqrt( (2 ln(2N m_{H}(N)))/N ) + sqrt( (2/N) ln(1/δ) ) + 1/N
[c] Parrondo and Van den Broek: ε ≤ sqrt( (1/N) (2ε + ln(6 m_{H}(2N)/δ)) )
[d] Devroye: ε ≤ sqrt( (1/(2N)) (4ε(1 + ε) + ln(4 m_{H}(N^{2})/δ)) )


[e] They are all equal.


3. For the same values of d_{vc} and δ of Problem 2, but for small N, say N = 5, which bound is the smallest?

[a] Original VC bound: ε ≤ sqrt( (8/N) ln(4 m_{H}(2N)/δ) )
[b] Rademacher Penalty Bound: ε ≤ sqrt( (2 ln(2N m_{H}(N)))/N ) + sqrt( (2/N) ln(1/δ) ) + 1/N
[c] Parrondo and Van den Broek: ε ≤ sqrt( (1/N) (2ε + ln(6 m_{H}(2N)/δ)) )
[d] Devroye: ε ≤ sqrt( (1/(2N)) (4ε(1 + ε) + ln(4 m_{H}(N^{2})/δ)) )

[e] They are all equal.
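The four bounds in Problems 2-3 can be compared numerically. Below is a Python sketch, using the N^{d_{vc}} approximation for m_{H} and working in logs to avoid overflow; the implicit bounds [c] and [d] are handled by fixed-point iteration in ε.

```python
import math

# Sketch of Problems 2-3: evaluate the four generalization bounds with
# d_vc = 50 and delta = 0.05, using m_H(N) ~ N^dvc.  Logs keep the
# growth-function terms from overflowing.  Parrondo/Van den Broek and
# Devroye are implicit in epsilon, so they are solved by iterating
# epsilon -> g(epsilon) to a fixed point.
dvc, delta = 50, 0.05

def ln_mH(n):                      # ln m_H(n) under the N^dvc approximation
    return dvc * math.log(n)

def fixed_point(g, eps=1.0, iters=1000):
    for _ in range(iters):
        eps = g(eps)
    return eps

def bounds(N):
    vc = math.sqrt(8 / N * (math.log(4) + ln_mH(2 * N) - math.log(delta)))
    rad = (math.sqrt(2 * (math.log(2 * N) + ln_mH(N)) / N)
           + math.sqrt(2 / N * math.log(1 / delta)) + 1 / N)
    par = fixed_point(lambda e: math.sqrt(
        (2 * e + math.log(6) + ln_mH(2 * N) - math.log(delta)) / N))
    dev = fixed_point(lambda e: math.sqrt(           # m_H(N^2) ~ N^(2*dvc)
        (4 * e * (1 + e) + math.log(4) + 2 * dvc * math.log(N)
         - math.log(delta)) / (2 * N)))
    return {"VC": vc, "Rademacher": rad, "Parrondo": par, "Devroye": dev}

for N in (10000, 5):
    print(N, {k: round(v, 3) for k, v in bounds(N).items()})
```

Plotting these over a range of N, as the problem suggests, shows how the ranking changes between the small-N and large-N regimes.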
Bias and Variance
Consider the case where the target function f : [−1, 1] → R is given by f(x) = sin(πx) and the input probability distribution is uniform on [−1, 1]. Assume that the training set has only two examples (picked independently), and that the learning algorithm produces the hypothesis that minimizes the mean squared error on the examples.

4. Assume the learning model consists of all hypotheses of the form h(x) = ax. What is the expected value, ḡ(x), of the hypothesis produced by the learning algorithm (expected value with respect to the data set)? Express your ḡ(x) as âx, round â to two decimal digits only, and then match exactly to one of the following answers.


[a] ḡ(x) = 0
[b] ḡ(x) = 0.79x
[c] ḡ(x) = 1.07x
[d] ḡ(x) = 1.58x
[e] None of the above


5. What is the closest value to the bias in this case?


[a] 0.1
[b] 0.3
[c] 0.5
[d] 0.7
[e] 1.0


6. What is the closest value to the variance in this case?


[a] 0.2
[b] 0.4
[c] 0.6
[d] 0.8
[e] 1.0
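The quantities in Problems 4-6 can be estimated by simulation. A minimal sketch follows; it uses the closed-form least-squares slope for h(x) = ax on two points, a = (x1·y1 + x2·y2)/(x1² + x2²), and Monte Carlo averages for ḡ, bias, and variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of Problems 4-6: Monte Carlo estimates of g_bar, bias, and
# variance for H = {h(x) = ax}, f(x) = sin(pi*x), two uniform examples
# per data set.  The least-squares slope on (x1,y1), (x2,y2) is
#   a = (x1*y1 + x2*y2) / (x1^2 + x2^2).
runs = 200_000
X = rng.uniform(-1, 1, size=(runs, 2))
Y = np.sin(np.pi * X)
a = (X * Y).sum(axis=1) / (X ** 2).sum(axis=1)

a_bar = a.mean()                       # slope of g_bar(x) = a_bar * x

x_test = rng.uniform(-1, 1, 100_000)
f = np.sin(np.pi * x_test)
bias = ((a_bar * x_test - f) ** 2).mean()
var = a.var() * (x_test ** 2).mean()   # E_x[ x^2 * Var_D(a) ]

print(f"a_bar={a_bar:.2f}  bias={bias:.2f}  var={var:.2f}")
```

Note that the variance of the hypothesis at a point x is x² Var_D(a), so averaging over x factors into Var_D(a) · E[x²].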


7. Now, let’s change H. Which of the following learning models has the least expected value of out-of-sample error?


[a] Hypotheses of the form h(x) = b
[b] Hypotheses of the form h(x) = ax
[c] Hypotheses of the form h(x) = ax + b
[d] Hypotheses of the form h(x) = ax^{2}
[e] Hypotheses of the form h(x) = ax^{2} + b
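This is one of the problems that calls for a full experiment. A sketch that estimates the expected out-of-sample error of each model via a generic least-squares fit over feature maps (the dictionary keys are informal labels, not course notation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of Problem 7: Monte Carlo estimate of the expected out-of-sample
# error for each candidate model, fit by least squares on two uniform
# examples with f(x) = sin(pi*x).  Each model is a feature map phi(x).
models = {
    "b":      lambda x: np.column_stack([np.ones_like(x)]),
    "ax":     lambda x: np.column_stack([x]),
    "ax+b":   lambda x: np.column_stack([x, np.ones_like(x)]),
    "ax^2":   lambda x: np.column_stack([x ** 2]),
    "ax^2+b": lambda x: np.column_stack([x ** 2, np.ones_like(x)]),
}

x_test = rng.uniform(-1, 1, 2000)
f_test = np.sin(np.pi * x_test)
runs = 5000

eout = {}
for name, phi in models.items():
    Phi_test = phi(x_test)
    errs = np.empty(runs)
    for i in range(runs):
        x = rng.uniform(-1, 1, 2)                  # one training set
        w, *_ = np.linalg.lstsq(phi(x), np.sin(np.pi * x), rcond=None)
        errs[i] = ((Phi_test @ w - f_test) ** 2).mean()
    eout[name] = errs.mean()
    print(name, round(eout[name], 2))
```

The models with two parameters interpolate the two training points exactly, which makes their variance estimates heavy-tailed; expect those numbers to fluctuate between runs.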


VC Dimension


8. Assume q ≥ 1 is an integer and let m_{H}(1) = 2. What is the VC dimension of a hypothesis set whose growth function satisfies m_{H}(N + 1) = 2 m_{H}(N) − C(N, q), where C(N, q) denotes the binomial coefficient ‘N choose q’? Recall that C(M, m) = 0 when m > M.
[a] q − 2
[b] q − 1
[c] q
[d] q + 1
[e] None of the above
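The recursion can be iterated directly. A sketch that, for sample values of q, finds the largest N with m_{H}(N) = 2^{N} (which is d_{vc} by definition):

```python
from math import comb

# Sketch for Problem 8: iterate m_H(N+1) = 2*m_H(N) - C(N, q) starting
# from m_H(1) = 2, then report the largest N with m_H(N) = 2^N.
# math.comb(n, q) already returns 0 when q > n, matching the hint.
def vc_dim(q, n_max=20):
    m = {1: 2}
    for n in range(1, n_max):
        m[n + 1] = 2 * m[n] - comb(n, q)
    return max(n for n in m if m[n] == 2 ** n)

for q in (1, 2, 3, 4):
    print(q, vc_dim(q))
```

Comparing the printed pairs against the answer choices suggests the general pattern.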
9. For hypothesis sets H_{1}, H_{2}, ..., H_{K} with finite, positive VC dimensions d_{vc}(H_{k}), some of the following bounds are correct and some are not. Which among the correct ones is the tightest bound (the smallest range of values) on the VC dimension of the intersection of the sets: d_{vc}(∩_{k=1}^{K} H_{k})? (The VC dimension of an empty set or a singleton set is taken as zero.)
[a] 0 ≤ d_{vc}(∩_{k=1}^{K} H_{k}) ≤ Σ_{k=1}^{K} d_{vc}(H_{k})
[b] 0 ≤ d_{vc}(∩_{k=1}^{K} H_{k}) ≤ min{d_{vc}(H_{k})}_{k=1}^{K}
[c] 0 ≤ d_{vc}(∩_{k=1}^{K} H_{k}) ≤ max{d_{vc}(H_{k})}_{k=1}^{K}
[d] min{d_{vc}(H_{k})}_{k=1}^{K} ≤ d_{vc}(∩_{k=1}^{K} H_{k}) ≤ max{d_{vc}(H_{k})}_{k=1}^{K}
[e] min{d_{vc}(H_{k})}_{k=1}^{K} ≤ d_{vc}(∩_{k=1}^{K} H_{k}) ≤ Σ_{k=1}^{K} d_{vc}(H_{k})
10. For hypothesis sets H_{1}, H_{2}, ..., H_{K} with finite, positive VC dimensions d_{vc}(H_{k}), some of the following bounds are correct and some are not. Which among the correct ones is the tightest bound (the smallest range of values) on the VC dimension of the union of the sets: d_{vc}(∪_{k=1}^{K} H_{k})?

[a] 0 ≤ d_{vc}(∪_{k=1}^{K} H_{k}) ≤ Σ_{k=1}^{K} d_{vc}(H_{k})
[b] 0 ≤ d_{vc}(∪_{k=1}^{K} H_{k}) ≤ K − 1 + Σ_{k=1}^{K} d_{vc}(H_{k})
[c] min{d_{vc}(H_{k})}_{k=1}^{K} ≤ d_{vc}(∪_{k=1}^{K} H_{k}) ≤ Σ_{k=1}^{K} d_{vc}(H_{k})
[d] max{d_{vc}(H_{k})}_{k=1}^{K} ≤ d_{vc}(∪_{k=1}^{K} H_{k}) ≤ Σ_{k=1}^{K} d_{vc}(H_{k})
[e] max{d_{vc}(H_{k})}_{k=1}^{K} ≤ d_{vc}(∪_{k=1}^{K} H_{k}) ≤ K − 1 + Σ_{k=1}^{K} d_{vc}(H_{k})
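A concrete sanity check worth trying for the union: positive rays h(x) = sign(x − t) and negative rays h(x) = −sign(x − t) each have d_{vc} = 1, yet their union shatters two points. A brute-force dichotomy count (the threshold grid is an illustrative choice, not part of the problem):

```python
# Sketch for Problem 10: count the dichotomies a finite sample of
# hypotheses generates on a fixed point set.  Positive and negative rays
# each realize only 3 of the 4 dichotomies on two points, but their
# union realizes all 4, i.e. it shatters both points.
def dichotomies(hyps, points):
    return {tuple(h(x) for x in points) for h in hyps}

thresholds = [i / 10 for i in range(-15, 16)]
pos = [lambda x, t=t: 1 if x > t else -1 for t in thresholds]   # positive rays
neg = [lambda x, t=t: -1 if x > t else 1 for t in thresholds]   # negative rays

pts = [-1.0, 1.0]
print(len(dichotomies(pos, pts)))        # 3: positive rays alone cannot shatter
print(len(dichotomies(pos + neg, pts)))  # 4 = 2^2: the union shatters the points
```

This shows the union’s VC dimension can exceed the maximum of the individual VC dimensions, which rules out some of the candidate lower bounds.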