SML_Assignment1


    Task 1: Machine Learning Introduction

    1a) Model Fitting

    We can use a linear model (a linear classifier).

    The correct classification is

    striped circle triangle
    10 1
    black circle triangle
    11 1

    but we get

    striped circle triangle
    11 1
    black circle triangle
    10 1

    because a linear classifier cannot perfectly separate this training set: the data is not linearly separable, so any linear decision boundary misclassifies at least one training example.
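
    As an illustration, here is a minimal sketch with hypothetical XOR-style data (the shapes dataset itself is not reproduced here): a least-squares linear classifier cannot fit a training set that is not linearly separable.

    import numpy as np
    
    # XOR-style toy data: no straight line separates the two classes
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0], dtype=float)
    
    # Fit a linear model with a bias term by least squares
    Xb = np.hstack([X, np.ones((4, 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    pred = (Xb @ w > 0.5).astype(int)
    print("predicted:", pred, "true:", y.astype(int))  # at least one mismatch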

    Task 2: Linear Algebra Refresher

    2a) Matrix Properties

    1. Disproof of the commutative property of matrix multiplication
    Reference: https://proofwiki.org/wiki/Matrix_Multiplication_is_not_Commutative
    The proof is by induction. For the base case \(n = 2\), take for example \(\mathbf{A} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \mathbf{B} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\): then \(\mathbf{AB} = \mathbf{A} \neq \mathbf{0} = \mathbf{BA}\).
    For the induction step, for an order \(n\) square matrix \(\mathbf{D}\), let \(\mathbf{D'}\) be the square matrix of order \(n+1\) defined as:
    \(d'_{ij} = \begin{cases} d_{ij}, & i < n+1 \land j < n+1 \\ 0, & i = n+1 \lor j = n+1 \end{cases}\)
    Thus \(\mathbf{D'}\) is just \(\mathbf{D}\) with a zero row and zero column added at the ends.
    We have that \(\mathbf{D}\) is a submatrix of \(\mathbf{D'}\).
    Now:
    \((a'b')_{ij} = \begin{cases} \sum_{r=1}^{n+1} a'_{ir} b'_{rj}, & i < n+1 \land j < n+1 \\ 0, & i = n+1 \lor j = n+1 \end{cases}\)
    But:
    \(\sum_{r=1}^{n+1} a'_{ir} b'_{rj} = a'_{i(n+1)} b'_{(n+1)j} + \sum_{r=1}^{n} a'_{ir} b'_{rj} = \sum_{r=1}^{n} a_{ir} b_{rj}\)
    and so, writing \(\mathbf{M}(n+1; n+1)\) for the submatrix obtained by deleting row and column \(n+1\):
    \(\mathbf{A'B'}(n+1; n+1) = (\mathbf{AB})'(n+1; n+1) = \mathbf{AB} \neq \mathbf{BA} = (\mathbf{BA})'(n+1; n+1) = \mathbf{B'A'}(n+1; n+1)\)
    Thus it is seen that:
    \(\exists\, \mathbf{A'}, \mathbf{B'} \in \mathcal{M}_{(n+1) \times (n+1)}: \mathbf{A'B'} \neq \mathbf{B'A'}\)
    So \(P(k) \Rightarrow P(k+1)\) and the result follows by the Principle of Mathematical Induction.
    Therefore:
    \(\exists\, \mathbf{A}, \mathbf{B} \in \mathcal{M}_R(n): \mathbf{AB} \neq \mathbf{BA}\)
    2. Proof of the distributive property of matrix multiplication
    Reference: https://proofwiki.org/wiki/Matrix_Multiplication_Distributes_over_Matrix_Addition
    Let \(\mathbf{A} = [a]_{nn}, \mathbf{B} = [b]_{nn}, \mathbf{C} = [c]_{nn}\) be matrices over a ring \((R, +, \circ)\).
    Consider \(\mathbf{A}(\mathbf{B} + \mathbf{C})\).
    Let \(\mathbf{R} = [r]_{nn} = \mathbf{B} + \mathbf{C}\) and \(\mathbf{S} = [s]_{nn} = \mathbf{A}(\mathbf{B} + \mathbf{C})\).
    Let \(\mathbf{G} = [g]_{nn} = \mathbf{A}\mathbf{B}\) and \(\mathbf{H} = [h]_{nn} = \mathbf{A}\mathbf{C}\).
    Then, using distributivity in the ring \(R\):
    \(s_{ij} = \sum_{k=1}^{n} a_{ik} \circ r_{kj}, \quad r_{kj} = b_{kj} + c_{kj} \\ \Rightarrow s_{ij} = \sum_{k=1}^{n} a_{ik} \circ (b_{kj} + c_{kj}) = \sum_{k=1}^{n} a_{ik} \circ b_{kj} + \sum_{k=1}^{n} a_{ik} \circ c_{kj} = g_{ij} + h_{ij}\)
    Thus:
    \(\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{A}\mathbf{B} + \mathbf{A}\mathbf{C}\)
    A similar construction shows that:
    \((\mathbf{B} + \mathbf{C})\mathbf{A} = \mathbf{B}\mathbf{A} + \mathbf{C}\mathbf{A}\)
    3. Proof of the associative property of matrix multiplication
    Reference: https://proofwiki.org/wiki/Matrix_Multiplication_is_Associative
    Let \(\mathbf{A} = [a]_{nn}, \mathbf{B} = [b]_{nn}, \mathbf{C} = [c]_{nn}\) be matrices over a ring \((R, +, \circ)\).
    From inspection of the subscripts, we can see that both \((\mathbf{A}\mathbf{B})\mathbf{C}\) and \(\mathbf{A}(\mathbf{B}\mathbf{C})\) are defined.
    Consider \((\mathbf{A}\mathbf{B})\mathbf{C}\).
    Let \(\mathbf{R} = [r]_{nn} = \mathbf{A}\mathbf{B}\) and \(\mathbf{S} = [s]_{nn} = (\mathbf{A}\mathbf{B})\mathbf{C}\).
    Then:
    \(s_{ij} = \sum_{k=1}^{n} r_{ik} \circ c_{kj}, \quad r_{ik} = \sum_{l=1}^{n} a_{il} \circ b_{lk} \\ \Rightarrow s_{ij} = \sum_{k=1}^{n} \left(\sum_{l=1}^{n} a_{il} \circ b_{lk}\right) \circ c_{kj} = \sum_{k=1}^{n} \sum_{l=1}^{n} (a_{il} \circ b_{lk}) \circ c_{kj}\)
    Now consider \(\mathbf{A}(\mathbf{B}\mathbf{C})\).
    Let \(\mathbf{R'} = [r']_{nn} = \mathbf{B}\mathbf{C}\) and \(\mathbf{S'} = [s']_{nn} = \mathbf{A}(\mathbf{B}\mathbf{C})\).
    Then:
    \(s'_{ij} = \sum_{l=1}^{n} a_{il} \circ r'_{lj}, \quad r'_{lj} = \sum_{k=1}^{n} b_{lk} \circ c_{kj} \\ \Rightarrow s'_{ij} = \sum_{l=1}^{n} a_{il} \circ \left(\sum_{k=1}^{n} b_{lk} \circ c_{kj}\right) = \sum_{l=1}^{n} \sum_{k=1}^{n} a_{il} \circ (b_{lk} \circ c_{kj})\)
    Using Ring Axiom M1 (associativity of the ring product):
    \(s_{ij} = \sum_{k=1}^{n} \sum_{l=1}^{n} (a_{il} \circ b_{lk}) \circ c_{kj} = \sum_{l=1}^{n} \sum_{k=1}^{n} a_{il} \circ (b_{lk} \circ c_{kj}) = s'_{ij}\)
    It is concluded that:
    \((\mathbf{A}\mathbf{B})\mathbf{C} = \mathbf{A}(\mathbf{B}\mathbf{C})\)
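
    As a quick numerical sanity check of the three properties (random example matrices, not a proof):

    import numpy as np
    
    rng = np.random.default_rng(0)
    A, B, C = rng.standard_normal((3, 3, 3))  # three random 3x3 matrices
    
    print(np.allclose(A @ B, B @ A))                # False: not commutative in general
    print(np.allclose(A @ (B + C), A @ B + A @ C))  # True: distributive
    print(np.allclose((A @ B) @ C, A @ (B @ C)))    # True (up to float error): associative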

    2b) Matrix Inversion (7 Points)

    \(A^{-1} = \frac{1}{|A|} A^*, \qquad A^* = \begin{bmatrix} A_{11} & A_{21} & A_{31} \\ A_{12} & A_{22} & A_{32} \\ A_{13} & A_{23} & A_{33} \end{bmatrix}\)
    where \(A_{ij}\) is the cofactor (algebraic complement) of entry \(a_{ij}\). After calculating:
    \(A^* = \begin{bmatrix} c & -a & ad-bc \\ -1 & 1 & b-d \\ 0 & 0 & c-a \end{bmatrix}\)
    \(|A| = c + 0 + 0 - (0 + 0 + a) = c - a\)
    A is invertible only when \(c - a \neq 0\); b can take any value.
    So
    \(A^{-1} = \frac{1}{c-a} \begin{bmatrix} c & -a & ad-bc \\ -1 & 1 & b-d \\ 0 & 0 & c-a \end{bmatrix}\)
    If we change the matrix to
    \(A = \begin{bmatrix} 2 & 2 & 3 \\ 0 & 1 & 0 \\ 8 & 3 & 12 \end{bmatrix}\)
    \(|A| = 2 \times 1 \times 12 + 0 + 0 - 3 \times 1 \times 8 - 0 - 0 = 24 - 24 = 0\)
    When \(|A| = 0\), the matrix is not invertible.
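
    A quick check of this example with numpy (the determinant evaluates to 0 and the inversion fails):

    import numpy as np
    
    A = np.array([[2, 2, 3],
                  [0, 1, 0],
                  [8, 3, 12]], dtype=float)
    print(np.linalg.det(A))  # ~0.0
    try:
        np.linalg.inv(A)
    except np.linalg.LinAlgError as err:
        print("not invertible:", err)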

    2c) Matrix Pseudoinverse

    Left pseudo-inverse:
    \(\color{red}{\mathbf{J'}}\mathbf{J} = \color{red}{(\mathbf{J}^T \mathbf{J})^{-1} \mathbf{J}^T}\,\mathbf{J} = \mathbf{I}_m\)
    This works if \(\mathbf{J}\) has full column rank.
    Right pseudo-inverse:
    \(\mathbf{J}\,\color{red}{\mathbf{J'}} = \mathbf{J}\,\color{red}{\mathbf{J}^T (\mathbf{J} \mathbf{J}^T)^{-1}} = \mathbf{I}_n\)
    This works if \(\mathbf{J}\) has full row rank.
    Given \(A \in \mathbb{R}^{2 \times 3}\):
    First calculate the dimensionality of the left pseudo-inverse:
    \(\color{red}{(\mathbf{J}^T \mathbf{J})^{-1} \mathbf{J}^T} = (\mathbb{R}^{3 \times 2}\, \mathbb{R}^{2 \times 3})^{-1}\, \mathbb{R}^{3 \times 2} = \mathbb{R}^{3 \times 2}\)
    Second calculate the dimensionality of the right pseudo-inverse:
    \(\color{red}{\mathbf{J}^T (\mathbf{J} \mathbf{J}^T)^{-1}} = \mathbb{R}^{3 \times 2} (\mathbb{R}^{2 \times 3}\, \mathbb{R}^{3 \times 2})^{-1} = \mathbb{R}^{3 \times 2}\)
    Both candidates have shape \(3 \times 2\), but only the right pseudo-inverse can exist: full column rank would require \(\mathrm{rank}(A) = 3\), which is impossible since \(\mathrm{rank}(A) \le \min(2, 3) = 2\), so \(A^T A \in \mathbb{R}^{3 \times 3}\) is singular. The right pseudo-inverse exists whenever \(A\) has full row rank 2.
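
    A small numerical sketch with an assumed full-row-rank \(2 \times 3\) matrix (the assignment's concrete A is not reproduced here):

    import numpy as np
    
    J = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
    
    # J^T J is 3x3 but has rank 2, so the left pseudo-inverse does not exist
    print(np.linalg.matrix_rank(J.T @ J))       # 2 < 3
    
    # The right pseudo-inverse exists because J has full row rank
    J_right = J.T @ np.linalg.inv(J @ J.T)      # shape 3x2
    print(np.allclose(J @ J_right, np.eye(2)))  # True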

    2d) Basis Transformation

    (1) \(T = w v^{-1} = \begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}\)
    (2) With \(v = Y w\) and \(Y = \begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}\):
    \(w = Y^{-1} v = \begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}^{-1} \begin{bmatrix} 2 \\ 5 \end{bmatrix} = \begin{bmatrix} -4 & 3 \\ 3 & -2 \end{bmatrix} \begin{bmatrix} 2 \\ 5 \end{bmatrix} = \begin{bmatrix} 7 \\ -4 \end{bmatrix}\)
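
    A numerical check of (2) with numpy:

    import numpy as np
    
    Y = np.array([[2.0, 3.0],
                  [3.0, 4.0]])     # columns are the new basis vectors
    v = np.array([2.0, 5.0])
    print(np.linalg.solve(Y, v))   # [ 7. -4.]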

    Task 3: Statistics Refresher

    3a) Expectation and Variance

    (1)
    Expectation:
    \(\mathbb{E}_{x \sim p(x)}[f] = \mathbb{E}_x[f] = \mathbb{E}[f] = \begin{cases} \sum_x p(x) f(x), & \text{discrete case} \\ \int p(x) f(x)\, \mathrm{d}x, & \text{continuous case} \end{cases}\)
    Variance:
    \(\mathrm{var}[x] = \mathbb{E}[(x - \mathbb{E}[x])^2] = \mathbb{E}[x^2] - \mathbb{E}[x]^2\)
    Expectation is a linear operator: \(\mathbb{E}[ax + by] = a\,\mathbb{E}[x] + b\,\mathbb{E}[y]\). Variance, however, is not linear: for example \(\mathrm{var}[ax] = a^2\, \mathrm{var}[x]\), and in general \(\mathrm{var}[x + y] \neq \mathrm{var}[x] + \mathrm{var}[y]\).
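
    A quick numerical illustration with standard-normal samples:

    import numpy as np
    
    rng = np.random.default_rng(0)
    x = rng.standard_normal(100000)
    print(np.mean(2 * x + 1), 2 * np.mean(x) + 1)  # equal: expectation is linear
    print(np.var(2 * x), 2 * np.var(x))            # 4*var(x) vs 2*var(x): variance is not
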
    (2)
    Estimate the expectation (each die was rolled 18 times; the table entries are the face frequencies):
    \(\mathbb{E}(A) \approx \frac{1 \times 1 + 5 \times 2 + 6 \times 3 + 3 \times 4 + 2 \times 5 + 1 \times 6}{18} = \frac{57}{18} \approx 3.17 \\ \mathbb{E}(B) \approx \frac{6 \times 1 + 1 \times 2 + 1 \times 3 + 4 \times 4 + 1 \times 5 + 5 \times 6}{18} = \frac{62}{18} \approx 3.44 \\ \mathbb{E}(C) \approx \frac{3 \times 1 + 2 \times 2 + 3 \times 3 + 3 \times 4 + 4 \times 5 + 3 \times 6}{18} = \frac{66}{18} \approx 3.67\)
    Estimate the variance of the six face counts (their mean is \(\mu = 18/6 = 3\)), using the unbiased divisor \(n - 1 = 5\):
    \(\mathrm{var}(A) = \frac{(1-3)^2 + (5-3)^2 + (6-3)^2 + (3-3)^2 + (2-3)^2 + (1-3)^2}{5} = 4.4 \\ \mathrm{var}(B) = \frac{(6-3)^2 + (1-3)^2 + (1-3)^2 + (4-3)^2 + (1-3)^2 + (5-3)^2}{5} = 5.2 \\ \mathrm{var}(C) = \frac{(3-3)^2 + (2-3)^2 + (3-3)^2 + (3-3)^2 + (4-3)^2 + (3-3)^2}{5} = 0.4\)
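
    As a cross-check, assuming the table entries are face frequencies over 18 rolls per die (an assumption), the per-roll sample mean and unbiased sample variance can be computed with numpy. Note the variance here is taken over the 18 rolls, not over the six counts as above:

    import numpy as np
    
    faces = np.arange(1, 7)
    counts = {"A": [1, 5, 6, 3, 2, 1],
              "B": [6, 1, 1, 4, 1, 5],
              "C": [3, 2, 3, 3, 4, 3]}
    
    for die, c in counts.items():
        rolls = np.repeat(faces, c)                  # expand counts into 18 rolls
        print(die, rolls.mean(), rolls.var(ddof=1))  # sample mean and variance
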
    (3) don't know
    To do

    3b) It is a Cold World

    (1)
    p(back): probability that a person has back pain
    p(cold): probability that a person has a cold
    p(ncold): probability that a person does not have a cold
    (2) \(x \in \mathbb{N}\)
    (3)
    \(p(cold) = 4\% \\ p(back \mid cold) = 25\% \\ p(back \mid ncold) = 10\%\)
    (4)
    \(p(cold \mid back) = \frac{p(cold, back)}{p(back)} = \frac{p(back \mid cold)\, p(cold)}{p(back \mid cold)\, p(cold) + p(back \mid ncold)\, p(ncold)} = \frac{25\% \times 4\%}{25\% \times 4\% + 10\% \times 96\%} = \frac{0.01}{0.106} \approx 9.43\%\)
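
    A numerical check of (4):

    p_cold = 0.04
    p_back_cold = 0.25    # p(back | cold)
    p_back_ncold = 0.10   # p(back | ncold)
    
    # Marginalize: p(back) = p(back|cold)p(cold) + p(back|ncold)p(ncold)
    p_back = p_back_cold * p_cold + p_back_ncold * (1 - p_cold)
    print(p_back_cold * p_cold / p_back)  # ~0.0943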

    3c) Cure the Virus

    (1)
    \(mutated = 42\% + 2.6\% = 44.6\% \\ unmutated = 58\% - 2.6\% = 55.4\%\)
    \(\begin{array}{c|cc} \text{State} & m & \tilde{m} \\ \hline \tilde{m} & 44.6\% & 55.4\% \end{array}\)
    (2)

    import numpy as np
    import matplotlib.pyplot as plt
    
    # State space
    states = ["Mutated", "Unmutated"]
    
    # Possible transitions from each state
    transitionName = [["UM", "UU"], ["MU", "MM"]]
    
    # Transition probabilities used from either state: [switch state, stay]
    transitionProbs = [0.446, 0.554]
    
    def activity_forecast(days):
        # Choose the initial state
        activityToday = "Unmutated"
        print("Start state: " + activityToday)
        # Visited states and the running probability of the sampled sequence
        activityList = [activityToday]
        prob_list = []
        prob = 1
        for _ in range(days):
            if activityToday == "Unmutated":
                change = np.random.choice(transitionName[0], p=transitionProbs)
                if change == "UM":
                    prob *= 0.446
                    activityToday = "Mutated"
                else:  # "UU"
                    prob *= 0.554
            else:  # activityToday == "Mutated"
                change = np.random.choice(transitionName[1], p=transitionProbs)
                if change == "MU":
                    prob *= 0.446
                    activityToday = "Unmutated"
                else:  # "MM"
                    prob *= 0.554
            activityList.append(activityToday)
            prob_list.append(prob)
        print("Possible states: " + str(activityList))
        print("End state after " + str(days) + " days: " + activityToday)
        print("Probability of the possible sequence of states: " + str(prob))
    
        # Plot the running sequence probability over the days
        x = np.arange(len(prob_list))
        plt.plot(x, np.array(prob_list))
        plt.show()
    
    # Predict states after 18 days
    activity_forecast(18)
    
    

    The result is

    Start state: Unmutated
    Possible states: ['Unmutated', 'Unmutated', 'Mutated', 'Unmutated', 'Unmutated', 'Mutated', 'Mutated', 'Unmutated', 'Unmutated', 'Unmutated', 'Unmutated', 'Mutated', 'Unmutated', 'Mutated', 'Mutated', 'Mutated', 'Mutated', 'Unmutated', 'Mutated']
    End state after 18 days: Mutated
    Probability of the possible sequence of states: 3.432429382297346e-06
    

    Plot: the running probability of the sampled state sequence, decaying over the 18 days.

    (3) don't know
    To do

    Task 4: Information Theory

    (1)
    The information content of a single symbol is:
    \(h(p_i) = -\log_2 p_i\)
    So
    \(h(S_1) = -\log_2 0.04 = 4.64 \\ h(S_2) = -\log_2 0.22 = 2.18 \\ h(S_3) = -\log_2 0.67 = 0.58 \\ h(S_4) = -\log_2 0.07 = 3.83\)
    (2)
    The average information (entropy) is:
    \(H(p) = \mathbb{E}[h(\cdot)] = \sum_i p_i h(p_i) = -\sum_i p_i \log_2 p_i\)
    So the entropy is
    \(0.04 \times 4.64 + 0.22 \times 2.18 + 0.67 \times 0.58 + 0.07 \times 3.83 \approx 1.3219\)
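
    A quick check of both parts with numpy:

    import numpy as np
    
    p = np.array([0.04, 0.22, 0.67, 0.07])
    h = -np.log2(p)       # self-information of each symbol in bits
    print(h)              # ~[4.64, 2.18, 0.58, 3.84]
    print(np.sum(p * h))  # entropy, ~1.322 bits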

    Task 5: Bayesian Decision Theory

    5a) Optimal Boundary

    Bayesian decision theory describes the probability of an event based on prior knowledge of conditions that might be related to the event.
    Its goal is to minimize the misclassification rate.
    The Bayes-optimal classification is based on the quantities \(p(x \mid C_k)\, p(C_k)\).
    We compare \(p(x \mid C_1)\, p(C_1)\) with \(p(x \mid C_2)\, p(C_2)\) and choose the class with the larger value; for example, when \(p(x \mid C_1)\, p(C_1)\) is larger, we choose \(C_1\). The optimal decision boundary lies where the two quantities are equal.

    5b) Decision Boundaries

    \(p(x \mid C_1) = \frac{p(x, C_1)}{p(C_1)} = \frac{p(C_1 \mid x)\, p(x)}{p(C_1)}, \qquad p(x \mid C_2) = \frac{p(x, C_2)}{p(C_2)} = \frac{p(C_2 \mid x)\, p(x)}{p(C_2)}\)
    At the decision boundary, \(p(x \mid C_1)\, p(C_1) = p(x \mid C_2)\, p(C_2)\). Because \(p(C_1) = p(C_2)\), this reduces to \(p(x \mid C_1) = p(x \mid C_2)\).
    With Gaussian class conditionals and \(\sigma_1 = \sigma_2\), taking the logarithm of both densities leaves \((x^* - \mu_1)^2 = (x^* - \mu_2)^2\),
    so \(x^* = \frac{\mu_1 + \mu_2}{2}\), the midpoint between the two means.
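
    A numerical sketch with assumed example parameters (\(\mu_1 = 0, \mu_2 = 4, \sigma = 1\)) confirming the midpoint boundary:

    import numpy as np
    
    def gauss(x, mu, sigma):
        # Gaussian density N(x; mu, sigma^2)
        return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    
    mu1, mu2, sigma = 0.0, 4.0, 1.0   # assumed example values
    xs = np.linspace(-5.0, 9.0, 10001)
    diff = np.abs(gauss(xs, mu1, sigma) - gauss(xs, mu2, sigma))
    print(xs[np.argmin(diff)])        # ~2.0 = (mu1 + mu2) / 2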

    5c) Different Misclassification Costs

    See lecture_04_bayesian_decision_theory, page 17: with unequal misclassification costs \(\lambda_{kj}\) (the cost of deciding \(C_k\) when the true class is \(C_j\)), we minimize the expected loss \(R(C_k \mid x) = \sum_j \lambda_{kj}\, p(C_j \mid x)\) instead of the misclassification rate, so the costlier mistake is made less often.
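
    A minimal sketch of this decision rule (the loss values and posteriors below are assumed for illustration, not taken from the lecture):

    import numpy as np
    
    posterior = np.array([0.3, 0.7])   # p(C_1|x), p(C_2|x) (assumed)
    loss = np.array([[0.0, 1.0],       # loss[k, j] = cost of deciding class k+1
                     [10.0, 0.0]])     # when the true class is j+1
    expected_loss = loss @ posterior   # R(C_k|x) = sum_j loss[k, j] p(C_j|x)
    print("decide C" + str(np.argmin(expected_loss) + 1))  # C1, despite p(C_2|x) > p(C_1|x)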


    Author: Rest探路者
    Source: http://www.cnblogs.com/Java-Starter/
    The copyright of this article is shared by the author and cnblogs. Reposting is welcome, but without the author's consent please keep this statement and provide a clear link to the original article on the page.
    Github: https://github.com/cjy513203427
