Probability and Statistics
A.1 [2 points] (Bayes Rule, from Murphy exercise 2.4.) After your yearly checkup, the doctor has bad news and good news. The bad news is that you tested positive for a serious disease, and that the test is 99% accurate (i.e., the probability of testing positive given that you have the disease is 0.99, as is the probability of testing negative given that you don't have the disease). The good news is that this is a rare disease, striking only one in 10,000 people. What are the chances that you actually have the disease? (Show your calculations as well as giving the final result.)
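A hand calculation for this problem can be sanity-checked numerically. The sketch below applies Bayes' rule with the numbers quoted in the problem; the function name and its parameter names (`prior`, `sensitivity`, `specificity`) are illustrative choices, not part of the assignment.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """Return P(disease | positive test) using Bayes' rule."""
    # P(positive) = P(pos | disease) P(disease) + P(pos | healthy) P(healthy)
    false_positive_rate = 1.0 - specificity
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / evidence


# Numbers from the problem: 1 in 10,000 prevalence, 99% accuracy both ways.
p = posterior_given_positive(prior=1e-4, sensitivity=0.99, specificity=0.99)
print(f"P(disease | positive) = {p:.6f}")
```

Varying `prior` shows how strongly the answer depends on how rare the disease is.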
A.2 For any two random variables $X, Y$ the covariance is defined as $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$. You may assume $X$ and $Y$ take on discrete values if you find that is easier to work with.
a. [1 points] If $E[Y \mid X = x] = x$ show that $\mathrm{Cov}(X, Y) = E[(X - E[X])^2]$.
b. [1 points] If $X, Y$ are independent show that $\mathrm{Cov}(X, Y) = 0$.
A.3 Let $X$ and $Y$ be independent random variables with PDFs given by $f$ and $g$, respectively. Let $h$ be the PDF of the random variable $Z = X + Y$.
a. [2 points] Show that $h(z) = \int_{-\infty}^{\infty} f(x)\, g(z - x)\, dx$. (If you are more comfortable with discrete probabilities, you can instead derive an analogous expression for the discrete case, and then you should give a one sentence explanation as to why your expression is analogous to the continuous case.)
b. [1 points] If $X$ and $Y$ are both independent and uniformly distributed on $[0, 1]$ (i.e. $f(x) = g(x) = 1$ for $x \in [0, 1]$ and 0 otherwise) what is $h$, the PDF of $Z = X + Y$?
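The convolution integral can be approximated numerically to check a closed-form answer for the uniform case. This is a rough sketch using a midpoint Riemann sum; the helper names, the integration range `[-1, 2]`, and the step count are assumptions chosen for illustration.

```python
def uniform_pdf(x):
    """PDF of the uniform distribution on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def convolve_at(z, f, g, lo=-1.0, hi=2.0, steps=30000):
    """Approximate the integral of f(x) * g(z - x) over [lo, hi] by a midpoint sum."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx  # midpoint of the i-th subinterval
        total += f(x) * g(z - x)
    return total * dx


# Evaluate the approximate density of Z = X + Y at a few points.
for z in (0.25, 0.5, 1.0, 1.5, 1.75):
    print(z, round(convolve_at(z, uniform_pdf, uniform_pdf), 3))
```

Plotting these values over a fine grid of `z` gives a quick visual check of the shape of `h`.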
A.4 [1 points] A random variable $X \sim \mathcal{N}(\mu, \sigma^2)$ is Gaussian distributed with mean $\mu$ and variance $\sigma^2$. Given that for any $a, b \in \mathbb{R}$, we have that $Y = aX + b$ is also Gaussian, find $a, b$ such that $Y \sim \mathcal{N}(0, 1)$.
A.5 [2 points] For a random variable $Z$, its mean and variance are defined as $E[Z]$ and $E[(Z - E[Z])^2]$, respectively. Let $X_1, \dots, X_n$ be independent and identically distributed random variables, each with mean $\mu$ and variance $\sigma^2$. If we define $\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$, what is the mean and variance of $\sqrt{n}(\hat{\mu}_n - \mu)$?
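A short simulation can be used to check a derived mean and variance for the scaled error $\sqrt{n}(\hat{\mu}_n - \mu)$. The sketch below assumes Gaussian draws with $\mu = 2$ and $\sigma = 3$ purely for illustration; the problem itself allows any i.i.d. distribution with finite variance, and the function name is hypothetical.

```python
import random


def scaled_mean_error(n, mu, sigma, rng):
    """Draw n samples and return sqrt(n) * (sample mean - mu)."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    mu_hat = sum(xs) / n
    return (n ** 0.5) * (mu_hat - mu)


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 10000
samples = [scaled_mean_error(n=25, mu=2.0, sigma=3.0, rng=rng) for _ in range(trials)]

mean_est = sum(samples) / trials
var_est = sum((s - mean_est) ** 2 for s in samples) / trials
print(f"empirical mean = {mean_est:.3f}, empirical variance = {var_est:.3f}")
```

Repeating the run with different `n` is a useful check on whether the variance of the scaled quantity depends on `n`.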

A.6 If $f(x)$ is a PDF, the cumulative distribution function (CDF) is defined as $F(x) = \int_{-\infty}^{x} f(y)\, dy$. For any function $g : \mathbb{R} \to \mathbb{R}$ and random variable $X$ with PDF $f(x)$, recall that the expected value of $g(X)$ is defined as $E[g(X)] = \int_{-\infty}^{\infty} g(y) f(y)\, dy$. For a boolean event $A$, define $1\{A\}$ as 1 if $A$ is true, and 0 otherwise. Thus, $1\{x \le a\}$ is 1 whenever $x \le a$ and 0 whenever $x > a$. Note that $F(x) = E[1\{X \le x\}]$. Let $X_1, \dots, X_n$ be independent and identically distributed random variables with CDF $F(x)$. Define $\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} 1\{X_i \le x\}$. Note, for every $x$, that $\hat{F}_n(x)$ is an empirical estimate of $F(x)$. You may use your answers to the previous problem.
a. [1 points] For any $x$, what is $E[\hat{F}_n(x)]$?
b. [1 points] For any $x$, the variance of $\hat{F}_n(x)$ is $E[(\hat{F}_n(x) - F(x))^2]$. Show that $\mathrm{Variance}(\hat{F}_n(x)) = \frac{F(x)(1 - F(x))}{n}$.
c. Using your answer to b, show that for all $x \in \mathbb{R}$, we have $E[(\hat{F}_n(x) - F(x))^2] \le \frac{1}{4n}$.
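The empirical CDF and its mean squared error can be simulated directly. The sketch below assumes Uniform[0, 1] samples, so the true CDF is $F(x) = x$ on that interval; the sample size, evaluation point, and function name are illustrative choices only.

```python
import random


def empirical_cdf(samples, x):
    """Fraction of samples that are <= x, i.e. the empirical CDF at x."""
    return sum(1 for s in samples if s <= x) / len(samples)


rng = random.Random(1)
n, trials, x = 100, 5000, 0.3

# Each trial draws n uniform samples and evaluates the empirical CDF at x.
estimates = [empirical_cdf([rng.random() for _ in range(n)], x) for _ in range(trials)]

mean_est = sum(estimates) / trials
mse = sum((e - x) ** 2 for e in estimates) / trials
print(f"mean of estimates       = {mean_est:.4f} (true F(x) = {x})")
print(f"mean squared error      = {mse:.5f}")
print(f"F(x)(1 - F(x)) / n      = {x * (1 - x) / n:.5f}")
print(f"1 / (4n)                = {1 / (4 * n):.5f}")
```

Trying `x = 0.5` shows the case where the simulated error sits closest to the $1/(4n)$ value.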

B.1 [1 points] Let $X_1, \dots, X_n$ be $n$ independent and identically distributed random variables drawn uniformly at random from $[0, 1]$. If $Y = \max\{X_1, \dots, X_n\}$ then find $E[Y]$.
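A Monte Carlo estimate is one way to check a closed-form answer for the expected maximum. This is a minimal sketch; the function name, seed, and trial count are arbitrary choices.

```python
import random


def mean_of_max(n, trials, rng):
    """Estimate E[max(X_1, ..., X_n)] for i.i.d. Uniform[0, 1] draws."""
    total = 0.0
    for _ in range(trials):
        total += max(rng.random() for _ in range(n))
    return total / trials


rng = random.Random(2)
for n in (1, 2, 5, 10):
    print(n, round(mean_of_max(n, 20000, rng), 3))
```

Comparing the printed estimates with the derived expression for several values of `n` is a quick consistency check.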
Linear Algebra and Vector Calculus
A.7 (Rank) Let $A = \begin{bmatrix} 1 & 2 & 1 \\ 1 & 0 & 3 \\ 1 & 1 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 0 & 1 \\ 1 & 1 & 2 \end{bmatrix}$. For each matrix $A$ and $B$,
a. [2 points] what is its rank?
b. [2 points] what is a (minimal size) basis for its column span?
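Rank and a column-span basis can both be read off from exact row reduction: the pivot columns of the original matrix form a basis. The sketch below uses `fractions.Fraction` to avoid rounding; the matrix entries are those of the problem as read from the extracted text, and `pivot_columns` is a hypothetical helper name.

```python
from fractions import Fraction


def pivot_columns(matrix):
    """Row-reduce a copy of `matrix` exactly and return the pivot column indices."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot_row = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot_row is None:
            continue
        rows[r], rows[pivot_row] = rows[pivot_row], rows[r]
        # Eliminate column c from every row below the pivot row.
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * p for a, p in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    return pivots


A = [[1, 2, 1], [1, 0, 3], [1, 1, 2]]
B = [[1, 2, 3], [1, 0, 1], [1, 1, 2]]

for name, M in (("A", A), ("B", B)):
    cols = pivot_columns(M)
    basis = [[row[c] for row in M] for c in cols]  # original columns at pivot positions
    print(f"{name}: rank = {len(cols)}, basis columns = {basis}")
```

The number of pivots is the rank, and the corresponding columns of the original matrix (not the reduced one) span its column space.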
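A.8 (Linear equations) Let $A = \begin{bmatrix} 0 & 2 & 4 \\ 2 & 4 & 2 \\ 3 & 3 & 1 \end{bmatrix}$, $b = \begin{bmatrix} -2 & -2 & -4 \end{bmatrix}^T$, and $c = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}^T$.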
a. [1 points] What is Ac?
b. [2 points] What is the solution to the linear system Ax = b? (Show your work).
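A hand-worked elimination can be checked against NumPy. The sketch below assumes the entries $A = [[0, 2, 4], [2, 4, 2], [3, 3, 1]]$, $b = [-2, -2, -4]^T$, $c = [1, 1, 1]^T$ as read from the extracted problem text (the minus signs in $b$ are an assumption, since the extraction drops them).

```python
import numpy as np

A = np.array([[0, 2, 4],
              [2, 4, 2],
              [3, 3, 1]], dtype=float)
b = np.array([-2, -2, -4], dtype=float)  # signs assumed; see the note above
c = np.array([1, 1, 1], dtype=float)

Ac = A @ c                    # matrix-vector product
x = np.linalg.solve(A, b)     # solves A x = b without forming the inverse
residual = A @ x - b          # should be numerically zero

print("Ac =", Ac)
print("x  =", x)
print("max |Ax - b| =", np.abs(residual).max())
```

`np.linalg.solve` is generally preferred over multiplying by an explicit inverse when only the solution of the system is needed.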
A.9 (Hyperplanes) Assume $w$ is an $n$-dimensional vector and $b$ is a scalar. A hyperplane in $\mathbb{R}^n$ is the set $\{x : x \in \mathbb{R}^n, \text{ s.t. } w^T x + b = 0\}$.
a. [1 points] ($n = 2$ example) Draw the hyperplane for $w = [-1, 2]^T$, $b = 2$. Label your axes.
b. [1 points] ($n = 3$ example) Draw the hyperplane for $w = [1, 1, 1]^T$, $b = 0$. Label your axes.
c. [2 points] Given some $x_0 \in \mathbb{R}^n$, find the squared distance to the hyperplane defined by $w^T x + b = 0$. In other words, solve the following optimization problem:
$\min_x \|x_0 - x\|^2 \quad \text{s.t. } w^T x + b = 0$
(Hint: if $\tilde{x}_0$ is the minimizer of the above problem, note that $\|x_0 - \tilde{x}_0\| = \left| \frac{w^T (x_0 - \tilde{x}_0)}{\|w\|} \right|$. What is $w^T \tilde{x}_0$?)
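A derived answer to the squared-distance problem can be checked numerically by projecting a point onto the hyperplane and measuring the gap directly. The sketch below uses the standard projection formula and illustrative values ($w = [-1, 2]^T$, $b = 2$, $x_0$ at the origin); the function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def project_onto_hyperplane(x0, w, b):
    """Closest point to x0 on the hyperplane {x : w^T x + b = 0}."""
    scale = (dot(w, x0) + b) / dot(w, w)
    return [xi - scale * wi for xi, wi in zip(x0, w)]


def squared_distance_to_hyperplane(x0, w, b):
    """Squared Euclidean distance from x0 to {x : w^T x + b = 0}."""
    return (dot(w, x0) + b) ** 2 / dot(w, w)


w, b, x0 = [-1.0, 2.0], 2.0, [0.0, 0.0]
proj = project_onto_hyperplane(x0, w, b)
direct = sum((xi - pi) ** 2 for xi, pi in zip(x0, proj))

print("projection:", proj)
print("constraint residual w^T x + b:", dot(w, proj) + b)
print("squared distance (formula):", squared_distance_to_hyperplane(x0, w, b))
print("squared distance (direct): ", direct)
```

The projection moves `x0` only along `w`, which is why the constraint residual at the projected point comes out as zero up to rounding.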

A.10 For possibly non-symmetric $A, B \in \mathbb{R}^{n \times n}$ and $c \in \mathbb{R}$, let $f(x, y) = x^T A x + y^T B x + c$. Define $\nabla_z f(x, y) = \begin{bmatrix} \frac{\partial f(x,y)}{\partial z_1} & \frac{\partial f(x,y)}{\partial z_2} & \dots & \frac{\partial f(x,y)}{\partial z_n} \end{bmatrix}^T$.
a. [2 points] Explicitly write out the function $f(x, y)$ in terms of the components $A_{i,j}$ and $B_{i,j}$ using appropriate summations over the indices.
b. [2 points] What is $\nabla_x f(x, y)$ in terms of the summations over indices and vector notation?
c. [2 points] What is $\nabla_y f(x, y)$ in terms of the summations over indices and vector notation?
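Derived gradient expressions can be verified with central finite differences. The sketch below builds random non-symmetric matrices and computes numerical gradients of $f$ with respect to $x$ and $y$; the dimension, seed, and helper names are arbitrary, and the closed-form comparison is left to the reader.

```python
import numpy as np


def f(x, y, A, B, c):
    """f(x, y) = x^T A x + y^T B x + c."""
    return x @ A @ x + y @ B @ x + c


def numerical_gradient(func, z, eps=1e-6):
    """Central-difference approximation of the gradient of func at z."""
    grad = np.zeros_like(z)
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        grad[i] = (func(z + step) - func(z - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))  # not symmetric in general
B = rng.normal(size=(n, n))
x, y, c = rng.normal(size=n), rng.normal(size=n), 1.5

grad_x = numerical_gradient(lambda v: f(v, y, A, B, c), x)
grad_y = numerical_gradient(lambda v: f(x, v, A, B, c), y)
print("numerical grad_x:", grad_x)
print("numerical grad_y:", grad_y)
```

A candidate formula can be compared to these arrays with `np.allclose(candidate, grad_x, atol=1e-5)`. Because $f$ is quadratic, the central difference is accurate up to floating-point rounding.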
B.2 [1 points] The trace of a matrix is the sum of the diagonal entries; $\mathrm{Tr}(A) = \sum_i A_{ii}$. If $A \in \mathbb{R}^{n \times m}$ and $B \in \mathbb{R}^{m \times n}$, show that $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$.
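Before writing the proof, the identity can be spot-checked on small integer matrices, where the arithmetic is exact. This is only a sanity check on one random instance, not a proof; the helper names and sizes are illustrative.

```python
import random


def matmul(P, Q):
    """Product of matrices given as lists of rows."""
    inner = len(Q)
    return [[sum(P[i][k] * Q[k][j] for k in range(inner)) for j in range(len(Q[0]))]
            for i in range(len(P))]


def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


rng = random.Random(0)
n, m = 3, 5
A = [[rng.randint(-5, 5) for _ in range(m)] for _ in range(n)]  # n x m
B = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(m)]  # m x n

print("Tr(AB) =", trace(matmul(A, B)))  # AB is n x n
print("Tr(BA) =", trace(matmul(B, A)))  # BA is m x m
```

Note that `AB` and `BA` have different shapes here, which is a useful reminder that the proof has to work at the level of index sums.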
B.3 [1 points] Let $v_1, \dots, v_n$ be a set of nonzero vectors in $\mathbb{R}^d$. Let $V = [v_1, \dots, v_n]$ be the vectors concatenated.
a. What is the minimum and maximum rank of $\sum_{i=1}^{n} v_i v_i^T$?
b. What is the minimum and maximum rank of $V$?
c. Let $A \in \mathbb{R}^{D \times d}$ for $D > d$. What is the minimum and maximum rank of $\sum_{i=1}^{n} (A v_i)(A v_i)^T$?
d. What is the minimum and maximum rank of $AV$? What if $V$ is rank $d$?
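Numerical experiments can help build intuition for these rank questions. The sketch below assumes $V$ stacks the vectors as columns (a $d \times n$ matrix, which is one reading of "concatenated") and compares the sum of outer products with $V V^T$ for one random draw; the sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 3
V = rng.normal(size=(d, n))  # columns are v_1, ..., v_n (assumed layout)

# Sum of outer products v_i v_i^T, built term by term.
outer_sum = sum(np.outer(V[:, i], V[:, i]) for i in range(n))

print("outer_sum equals V V^T:", np.allclose(outer_sum, V @ V.T))
print("rank of outer_sum:", np.linalg.matrix_rank(outer_sum))
print("rank of V:        ", np.linalg.matrix_rank(V))
```

Replacing the random columns with repeated or collinear vectors shows how low the rank can go while every $v_i$ stays nonzero.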
A.11 For the $A, b, c$ as defined in Problem A.8, use NumPy to compute (take a screen shot of your answer):
a. [2 points] What is $A^{-1}$? $\begin{bmatrix} 0.125 & -0.625 & 0.75 \\ -0.25 & 0.75 & -0.5 \\ 0.375 & -0.375 & 0.25 \end{bmatrix}$
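b. [1 points] What is $A^{-1}b$? What is $Ac$?
$A^{-1}b = \begin{bmatrix} -2 & 1 & -1 \end{bmatrix}^T$; the second listed vector, $\begin{bmatrix} 0.25 & 0 & 0.25 \end{bmatrix}^T$, is $A^{-1}c$.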
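The quantities in this problem take only a few NumPy calls. The sketch below assumes the same entries for $A$, $b$, and $c$ as read from the extracted text of the linear-equations problem (the signs of $b$ are an assumption, since the extraction drops minus signs).

```python
import numpy as np

A = np.array([[0, 2, 4],
              [2, 4, 2],
              [3, 3, 1]], dtype=float)
b = np.array([-2, -2, -4], dtype=float)  # signs assumed
c = np.array([1, 1, 1], dtype=float)

A_inv = np.linalg.inv(A)

print("A^-1 =\n", A_inv)
print("A^-1 b =", A_inv @ b)
print("A c    =", A @ c)
print("A^-1 c =", A_inv @ c)
```

Checking `np.allclose(A @ A_inv, np.eye(3))` is a quick way to confirm the inverse before taking the screenshot.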
A.12 [4 points] Two random variables $X$ and $Y$ have equal distributions if their CDFs, $F_X$ and $F_Y$, respectively, are equal, i.e. for all $x$, $|F_X(x) - F_Y(x)| = 0$. The central limit theorem says that the sum of $k$ independent, zero-mean, variance-$1/k$ random variables converges to a (standard) Normal distribution as $k$ goes off to infinity. We will study this phenomenon empirically (you will use the Python packages NumPy and Matplotlib). Define $Y^{(k)} = \frac{1}{\sqrt{k}} \sum_{i=1}^{k} B_i$ where each $B_i$ is equal to $-1$ and $1$ with equal probability. From your solution to problem 5, we know that $\frac{1}{\sqrt{k}} B_i$ is zero-mean and has variance $1/k$.
a. For $i = 1, \dots, n$ let $Z_i \sim \mathcal{N}(0, 1)$. If $F(x)$ is the true CDF from which each $Z_i$ is drawn (i.e., Gaussian) and $\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} 1\{Z_i \le x\}$, use the answer to problem 1.5 above to choose $n$ large enough such that, for all $x \in \mathbb{R}$, $\sqrt{E[(\hat{F}_n(x) - F(x))^2]} \le 0.0025$, and plot $\hat{F}_n(x)$ from $-3$ to $3$.
(Hint: use Z=numpy.random.randn(n) to generate the random variables, and import matplotlib.pyplot as plt; plt.step(sorted(Z), np.arange(1,n+1)/float(n)) to plot).
b. For each $k \in \{1, 8, 64, 512\}$ generate $n$ independent copies $Y^{(k)}$ and plot their empirical CDF on the same plot as part a.
(Hint: np.sum(np.sign(np.random.randn(n, k))*np.sqrt(1./k), axis=1) generates n of the $Y^{(k)}$ random variables.)
Be sure to always label your axes. Your plot should look something like the following (Tip: check out seaborn for instantly better looking plots.)
full.png
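The two hints can be combined into a small script. The sketch below is one possible arrangement, not the reference solution: the function names, the axis labels, and the default `n = 40000` are assumptions (the latter follows from requiring $1/(4n) \le 0.0025^2$, which relies on reading the accuracy condition as a bound on the root mean squared error). Matplotlib is imported inside the plotting function so the sampling helpers can be used on their own.

```python
import numpy as np


def empirical_cdf_points(samples):
    """Return sorted samples and the empirical CDF heights 1/n, 2/n, ..., 1."""
    xs = np.sort(samples)
    heights = np.arange(1, xs.size + 1) / xs.size
    return xs, heights


def sample_Y(n, k, rng):
    """Draw n copies of Y^(k) = (1/sqrt(k)) * sum of k independent +/-1 signs."""
    signs = 2 * rng.integers(0, 2, size=(n, k), dtype=np.int8) - 1  # entries in {-1, +1}
    return signs.sum(axis=1) / np.sqrt(k)


def plot_cdfs(n=40000, ks=(1, 8, 64, 512), seed=0):
    """Plot the empirical CDF of N(0, 1) draws and of Y^(k) for each k."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    rng = np.random.default_rng(seed)
    xs, hs = empirical_cdf_points(rng.standard_normal(n))
    plt.step(xs, hs, label="Gaussian")
    for k in ks:
        xs, hs = empirical_cdf_points(sample_Y(n, k, rng))
        plt.step(xs, hs, label=f"k = {k}")
    plt.xlim(-3, 3)
    plt.xlabel("x")
    plt.ylabel("empirical CDF")
    plt.legend()
    plt.show()
```

Calling `plot_cdfs()` produces the combined figure. Storing the signs as `int8` keeps the `n` by `k` array small for `k = 512`.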