SML_Assignment1


    Task 1: Machine Learning Introduction

    1a) Model Fitting

    We can use a linear model (a linear classifier).

    The correct classification is

    striped circle triangle
    10 1
    black circle triangle
    11 1

    but we get

    striped circle triangle
    11 1
    black circle triangle
    10 1

    because a linear classifier cannot perfectly separate this training set: the data is not linearly separable, so any linear decision boundary misclassifies at least one training example.
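
    As an illustration, here is a minimal sketch with hypothetical XOR-style data (the shapes dataset itself is not reproduced here): a least-squares linear classifier cannot fit a training set that is not linearly separable.

    import numpy as np
    
    # XOR-style toy data: no straight line separates the two classes
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0], dtype=float)
    
    # Fit a linear model with a bias term by least squares
    Xb = np.hstack([X, np.ones((4, 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    pred = (Xb @ w > 0.5).astype(int)
    print("predicted:", pred, "true:", y.astype(int))  # at least one mismatch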

    Task 2: Linear Algebra Refresher

    2a) Matrix Properties

    1. Disproof of the commutative property of matrix multiplication
    Reference: https://proofwiki.org/wiki/Matrix_Multiplication_is_not_Commutative
    The proof is by induction. For the base case \(n = 2\), take for example \(\mathbf{A} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \mathbf{B} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\): then \(\mathbf{AB} = \mathbf{A} \neq \mathbf{0} = \mathbf{BA}\).
    For the induction step, for an order \(n\) square matrix \(\mathbf{D}\), let \(\mathbf{D'}\) be the square matrix of order \(n+1\) defined as:
    \(d'_{ij} = \begin{cases} d_{ij}, & i < n+1 \land j < n+1 \\ 0, & i = n+1 \lor j = n+1 \end{cases}\)
    Thus \(\mathbf{D'}\) is just \(\mathbf{D}\) with a zero row and zero column added at the ends.
    We have that \(\mathbf{D}\) is a submatrix of \(\mathbf{D'}\).
    Now:
    \((a'b')_{ij} = \begin{cases} \sum_{r=1}^{n+1} a'_{ir} b'_{rj}, & i < n+1 \land j < n+1 \\ 0, & i = n+1 \lor j = n+1 \end{cases}\)
    But:
    \(\sum_{r=1}^{n+1} a'_{ir} b'_{rj} = a'_{i(n+1)} b'_{(n+1)j} + \sum_{r=1}^{n} a'_{ir} b'_{rj} = \sum_{r=1}^{n} a_{ir} b_{rj}\)
    and so, writing \(\mathbf{M}(n+1; n+1)\) for the submatrix obtained by deleting row and column \(n+1\):
    \(\mathbf{A'B'}(n+1; n+1) = (\mathbf{AB})'(n+1; n+1) = \mathbf{AB} \neq \mathbf{BA} = (\mathbf{BA})'(n+1; n+1) = \mathbf{B'A'}(n+1; n+1)\)
    Thus it is seen that:
    \(\exists\, \mathbf{A'}, \mathbf{B'} \in \mathcal{M}_{(n+1) \times (n+1)}: \mathbf{A'B'} \neq \mathbf{B'A'}\)
    So \(P(k) \Rightarrow P(k+1)\) and the result follows by the Principle of Mathematical Induction.
    Therefore:
    \(\exists\, \mathbf{A}, \mathbf{B} \in \mathcal{M}_R(n): \mathbf{AB} \neq \mathbf{BA}\)
    2. Proof of the distributive property of matrix multiplication
    Reference: https://proofwiki.org/wiki/Matrix_Multiplication_Distributes_over_Matrix_Addition
    Let \(\mathbf{A} = [a]_{nn}, \mathbf{B} = [b]_{nn}, \mathbf{C} = [c]_{nn}\) be matrices over a ring \((R, +, \circ)\).
    Consider \(\mathbf{A}(\mathbf{B} + \mathbf{C})\).
    Let \(\mathbf{R} = [r]_{nn} = \mathbf{B} + \mathbf{C}\) and \(\mathbf{S} = [s]_{nn} = \mathbf{A}(\mathbf{B} + \mathbf{C})\).
    Let \(\mathbf{G} = [g]_{nn} = \mathbf{A}\mathbf{B}\) and \(\mathbf{H} = [h]_{nn} = \mathbf{A}\mathbf{C}\).
    Then, using distributivity in the ring \(R\):
    \(s_{ij} = \sum_{k=1}^{n} a_{ik} \circ r_{kj}, \quad r_{kj} = b_{kj} + c_{kj} \\ \Rightarrow s_{ij} = \sum_{k=1}^{n} a_{ik} \circ (b_{kj} + c_{kj}) = \sum_{k=1}^{n} a_{ik} \circ b_{kj} + \sum_{k=1}^{n} a_{ik} \circ c_{kj} = g_{ij} + h_{ij}\)
    Thus:
    \(\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{A}\mathbf{B} + \mathbf{A}\mathbf{C}\)
    A similar construction shows that:
    \((\mathbf{B} + \mathbf{C})\mathbf{A} = \mathbf{B}\mathbf{A} + \mathbf{C}\mathbf{A}\)
    3. Proof of the associative property of matrix multiplication
    Reference: https://proofwiki.org/wiki/Matrix_Multiplication_is_Associative
    Let \(\mathbf{A} = [a]_{nn}, \mathbf{B} = [b]_{nn}, \mathbf{C} = [c]_{nn}\) be matrices over a ring \((R, +, \circ)\).
    From inspection of the subscripts, we can see that both \((\mathbf{A}\mathbf{B})\mathbf{C}\) and \(\mathbf{A}(\mathbf{B}\mathbf{C})\) are defined.
    Consider \((\mathbf{A}\mathbf{B})\mathbf{C}\).
    Let \(\mathbf{R} = [r]_{nn} = \mathbf{A}\mathbf{B}\) and \(\mathbf{S} = [s]_{nn} = (\mathbf{A}\mathbf{B})\mathbf{C}\).
    Then:
    \(s_{ij} = \sum_{k=1}^{n} r_{ik} \circ c_{kj}, \quad r_{ik} = \sum_{l=1}^{n} a_{il} \circ b_{lk} \\ \Rightarrow s_{ij} = \sum_{k=1}^{n} \left(\sum_{l=1}^{n} a_{il} \circ b_{lk}\right) \circ c_{kj} = \sum_{k=1}^{n} \sum_{l=1}^{n} (a_{il} \circ b_{lk}) \circ c_{kj}\)
    Now consider \(\mathbf{A}(\mathbf{B}\mathbf{C})\).
    Let \(\mathbf{R'} = [r']_{nn} = \mathbf{B}\mathbf{C}\) and \(\mathbf{S'} = [s']_{nn} = \mathbf{A}(\mathbf{B}\mathbf{C})\).
    Then:
    \(s'_{ij} = \sum_{l=1}^{n} a_{il} \circ r'_{lj}, \quad r'_{lj} = \sum_{k=1}^{n} b_{lk} \circ c_{kj} \\ \Rightarrow s'_{ij} = \sum_{l=1}^{n} a_{il} \circ \left(\sum_{k=1}^{n} b_{lk} \circ c_{kj}\right) = \sum_{l=1}^{n} \sum_{k=1}^{n} a_{il} \circ (b_{lk} \circ c_{kj})\)
    Using Ring Axiom M1 (associativity of the ring product):
    \(s_{ij} = \sum_{k=1}^{n} \sum_{l=1}^{n} (a_{il} \circ b_{lk}) \circ c_{kj} = \sum_{l=1}^{n} \sum_{k=1}^{n} a_{il} \circ (b_{lk} \circ c_{kj}) = s'_{ij}\)
    It is concluded that:
    \((\mathbf{A}\mathbf{B})\mathbf{C} = \mathbf{A}(\mathbf{B}\mathbf{C})\)
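
    As a quick numerical sanity check of the three properties (random example matrices, not a proof):

    import numpy as np
    
    rng = np.random.default_rng(0)
    A, B, C = rng.standard_normal((3, 3, 3))  # three random 3x3 matrices
    
    print(np.allclose(A @ B, B @ A))                # False: not commutative in general
    print(np.allclose(A @ (B + C), A @ B + A @ C))  # True: distributive
    print(np.allclose((A @ B) @ C, A @ (B @ C)))    # True (up to float error): associative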

    2b) Matrix Inversion (7 Points)

    \(A^{-1} = \frac{1}{|A|} A^*, \qquad A^* = \begin{bmatrix} A_{11} & A_{21} & A_{31} \\ A_{12} & A_{22} & A_{32} \\ A_{13} & A_{23} & A_{33} \end{bmatrix}\)
    where \(A_{ij}\) is the cofactor (algebraic complement) of entry \(a_{ij}\). After calculating:
    \(A^* = \begin{bmatrix} c & -a & ad-bc \\ -1 & 1 & b-d \\ 0 & 0 & c-a \end{bmatrix}\)
    \(|A| = c + 0 + 0 - (0 + 0 + a) = c - a\)
    A is invertible only when \(c - a \neq 0\); b can take any value.
    So
    \(A^{-1} = \frac{1}{c-a} \begin{bmatrix} c & -a & ad-bc \\ -1 & 1 & b-d \\ 0 & 0 & c-a \end{bmatrix}\)
    If we change the matrix to
    \(A = \begin{bmatrix} 2 & 2 & 3 \\ 0 & 1 & 0 \\ 8 & 3 & 12 \end{bmatrix}\)
    \(|A| = 2 \times 1 \times 12 + 0 + 0 - 3 \times 1 \times 8 - 0 - 0 = 24 - 24 = 0\)
    When \(|A| = 0\), the matrix is not invertible.
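
    A quick check of this example with numpy (the determinant evaluates to 0 and the inversion fails):

    import numpy as np
    
    A = np.array([[2, 2, 3],
                  [0, 1, 0],
                  [8, 3, 12]], dtype=float)
    print(np.linalg.det(A))  # ~0.0
    try:
        np.linalg.inv(A)
    except np.linalg.LinAlgError as err:
        print("not invertible:", err)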

    2c) Matrix Pseudoinverse

    Left pseudo-inverse:
    \(\color{red}{\mathbf{J'}}\mathbf{J} = \color{red}{(\mathbf{J}^T \mathbf{J})^{-1} \mathbf{J}^T}\,\mathbf{J} = \mathbf{I}_m\)
    This works if \(\mathbf{J}\) has full column rank.
    Right pseudo-inverse:
    \(\mathbf{J}\,\color{red}{\mathbf{J'}} = \mathbf{J}\,\color{red}{\mathbf{J}^T (\mathbf{J} \mathbf{J}^T)^{-1}} = \mathbf{I}_n\)
    This works if \(\mathbf{J}\) has full row rank.
    Given \(A \in \mathbb{R}^{2 \times 3}\):
    First calculate the dimensionality of the left pseudo-inverse:
    \(\color{red}{(\mathbf{J}^T \mathbf{J})^{-1} \mathbf{J}^T} = (\mathbb{R}^{3 \times 2}\, \mathbb{R}^{2 \times 3})^{-1}\, \mathbb{R}^{3 \times 2} = \mathbb{R}^{3 \times 2}\)
    Second calculate the dimensionality of the right pseudo-inverse:
    \(\color{red}{\mathbf{J}^T (\mathbf{J} \mathbf{J}^T)^{-1}} = \mathbb{R}^{3 \times 2} (\mathbb{R}^{2 \times 3}\, \mathbb{R}^{3 \times 2})^{-1} = \mathbb{R}^{3 \times 2}\)
    Both candidates have shape \(3 \times 2\), but only the right pseudo-inverse can exist: full column rank would require \(\mathrm{rank}(A) = 3\), which is impossible since \(\mathrm{rank}(A) \le \min(2, 3) = 2\), so \(A^T A \in \mathbb{R}^{3 \times 3}\) is singular. The right pseudo-inverse exists whenever \(A\) has full row rank 2.
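
    A small numerical sketch with an assumed full-row-rank \(2 \times 3\) matrix (the assignment's concrete A is not reproduced here):

    import numpy as np
    
    J = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
    
    # J^T J is 3x3 but has rank 2, so the left pseudo-inverse does not exist
    print(np.linalg.matrix_rank(J.T @ J))       # 2 < 3
    
    # The right pseudo-inverse exists because J has full row rank
    J_right = J.T @ np.linalg.inv(J @ J.T)      # shape 3x2
    print(np.allclose(J @ J_right, np.eye(2)))  # True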

    2d) Basis Transformation

    (1) \(T = w v^{-1} = \begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}\)
    (2) With \(v = Y w\) and \(Y = \begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}\):
    \(w = Y^{-1} v = \begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}^{-1} \begin{bmatrix} 2 \\ 5 \end{bmatrix} = \begin{bmatrix} -4 & 3 \\ 3 & -2 \end{bmatrix} \begin{bmatrix} 2 \\ 5 \end{bmatrix} = \begin{bmatrix} 7 \\ -4 \end{bmatrix}\)
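
    A numerical check of (2) with numpy:

    import numpy as np
    
    Y = np.array([[2.0, 3.0],
                  [3.0, 4.0]])     # columns are the new basis vectors
    v = np.array([2.0, 5.0])
    print(np.linalg.solve(Y, v))   # [ 7. -4.]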

    Task 3: Statistics Refresher

    3a) Expectation and Variance

    (1)
    Expectation:
    \(\mathbb{E}_{x \sim p(x)}[f] = \mathbb{E}_x[f] = \mathbb{E}[f] = \begin{cases} \sum_x p(x) f(x), & \text{discrete case} \\ \int p(x) f(x)\, \mathrm{d}x, & \text{continuous case} \end{cases}\)
    Variance:
    \(\mathrm{var}[x] = \mathbb{E}[(x - \mathbb{E}[x])^2] = \mathbb{E}[x^2] - \mathbb{E}[x]^2\)
    Expectation is a linear operator: \(\mathbb{E}[ax + by] = a\,\mathbb{E}[x] + b\,\mathbb{E}[y]\). Variance, however, is not linear: for example \(\mathrm{var}[ax] = a^2\, \mathrm{var}[x]\), and in general \(\mathrm{var}[x + y] \neq \mathrm{var}[x] + \mathrm{var}[y]\).
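
    A quick numerical illustration with standard-normal samples:

    import numpy as np
    
    rng = np.random.default_rng(0)
    x = rng.standard_normal(100000)
    print(np.mean(2 * x + 1), 2 * np.mean(x) + 1)  # equal: expectation is linear
    print(np.var(2 * x), 2 * np.var(x))            # 4*var(x) vs 2*var(x): variance is not
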
    (2)
    Estimate the expectation (each die was rolled 18 times; the table entries are the face frequencies):
    \(\mathbb{E}(A) \approx \frac{1 \times 1 + 5 \times 2 + 6 \times 3 + 3 \times 4 + 2 \times 5 + 1 \times 6}{18} = \frac{57}{18} \approx 3.17 \\ \mathbb{E}(B) \approx \frac{6 \times 1 + 1 \times 2 + 1 \times 3 + 4 \times 4 + 1 \times 5 + 5 \times 6}{18} = \frac{62}{18} \approx 3.44 \\ \mathbb{E}(C) \approx \frac{3 \times 1 + 2 \times 2 + 3 \times 3 + 3 \times 4 + 4 \times 5 + 3 \times 6}{18} = \frac{66}{18} \approx 3.67\)
    Estimate the variance of the six face counts (their mean is \(\mu = 18/6 = 3\)), using the unbiased divisor \(n - 1 = 5\):
    \(\mathrm{var}(A) = \frac{(1-3)^2 + (5-3)^2 + (6-3)^2 + (3-3)^2 + (2-3)^2 + (1-3)^2}{5} = 4.4 \\ \mathrm{var}(B) = \frac{(6-3)^2 + (1-3)^2 + (1-3)^2 + (4-3)^2 + (1-3)^2 + (5-3)^2}{5} = 5.2 \\ \mathrm{var}(C) = \frac{(3-3)^2 + (2-3)^2 + (3-3)^2 + (3-3)^2 + (4-3)^2 + (3-3)^2}{5} = 0.4\)
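
    As a cross-check, assuming the table entries are face frequencies over 18 rolls per die (an assumption), the per-roll sample mean and unbiased sample variance can be computed with numpy. Note the variance here is taken over the 18 rolls, not over the six counts as above:

    import numpy as np
    
    faces = np.arange(1, 7)
    counts = {"A": [1, 5, 6, 3, 2, 1],
              "B": [6, 1, 1, 4, 1, 5],
              "C": [3, 2, 3, 3, 4, 3]}
    
    for die, c in counts.items():
        rolls = np.repeat(faces, c)                  # expand counts into 18 rolls
        print(die, rolls.mean(), rolls.var(ddof=1))  # sample mean and variance
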
    (3) don't know
    To do

    3b) It is a Cold World

    (1)
    p(back): probability that a person has back pain
    p(cold): probability that a person has a cold
    p(ncold): probability that a person does not have a cold
    (2) \(x \in \mathbb{N}\)
    (3)
    \(p(cold) = 4\% \\ p(back \mid cold) = 25\% \\ p(back \mid ncold) = 10\%\)
    (4)
    \(p(cold \mid back) = \frac{p(cold, back)}{p(back)} = \frac{p(back \mid cold)\, p(cold)}{p(back \mid cold)\, p(cold) + p(back \mid ncold)\, p(ncold)} = \frac{25\% \times 4\%}{25\% \times 4\% + 10\% \times 96\%} = \frac{0.01}{0.106} \approx 9.43\%\)
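
    A numerical check of (4):

    p_cold = 0.04
    p_back_cold = 0.25    # p(back | cold)
    p_back_ncold = 0.10   # p(back | ncold)
    
    # Marginalize: p(back) = p(back|cold)p(cold) + p(back|ncold)p(ncold)
    p_back = p_back_cold * p_cold + p_back_ncold * (1 - p_cold)
    print(p_back_cold * p_cold / p_back)  # ~0.0943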

    3c) Cure the Virus

    (1)
    \(mutated = 42\% + 2.6\% = 44.6\% \\ unmutated = 58\% - 2.6\% = 55.4\%\)
    \(\begin{array}{c|cc} \text{State} & m & \tilde{m} \\ \hline \tilde{m} & 44.6\% & 55.4\% \end{array}\)
    (2)

    import numpy as np
    import matplotlib.pyplot as plt
    
    # State space
    states = ["Mutated", "Unmutated"]
    
    # Possible transitions from each state
    transitionName = [["UM", "UU"], ["MU", "MM"]]
    
    # Transition probabilities used from either state: [switch state, stay]
    transitionProbs = [0.446, 0.554]
    
    def activity_forecast(days):
        # Choose the initial state
        activityToday = "Unmutated"
        print("Start state: " + activityToday)
        # Visited states and the running probability of the sampled sequence
        activityList = [activityToday]
        prob_list = []
        prob = 1
        for _ in range(days):
            if activityToday == "Unmutated":
                change = np.random.choice(transitionName[0], p=transitionProbs)
                if change == "UM":
                    prob *= 0.446
                    activityToday = "Mutated"
                else:  # "UU"
                    prob *= 0.554
            else:  # activityToday == "Mutated"
                change = np.random.choice(transitionName[1], p=transitionProbs)
                if change == "MU":
                    prob *= 0.446
                    activityToday = "Unmutated"
                else:  # "MM"
                    prob *= 0.554
            activityList.append(activityToday)
            prob_list.append(prob)
        print("Possible states: " + str(activityList))
        print("End state after " + str(days) + " days: " + activityToday)
        print("Probability of the possible sequence of states: " + str(prob))
    
        # Plot the running sequence probability over the days
        x = np.arange(len(prob_list))
        plt.plot(x, np.array(prob_list))
        plt.show()
    
    # Predict states after 18 days
    activity_forecast(18)
    
    

    The result is

    Start state: Unmutated
    Possible states: ['Unmutated', 'Unmutated', 'Mutated', 'Unmutated', 'Unmutated', 'Mutated', 'Mutated', 'Unmutated', 'Unmutated', 'Unmutated', 'Unmutated', 'Mutated', 'Unmutated', 'Mutated', 'Mutated', 'Mutated', 'Mutated', 'Unmutated', 'Mutated']
    End state after 18 days: Mutated
    Probability of the possible sequence of states: 3.432429382297346e-06
    

    Plot: the running probability of the sampled state sequence, decaying over the 18 days.

    (3) don't know
    To do

    Task 4: Information Theory

    (1)
    The information content of a single symbol is:
    \(h(p_i) = -\log_2 p_i\)
    So
    \(h(S_1) = -\log_2 0.04 = 4.64 \\ h(S_2) = -\log_2 0.22 = 2.18 \\ h(S_3) = -\log_2 0.67 = 0.58 \\ h(S_4) = -\log_2 0.07 = 3.83\)
    (2)
    The average information (entropy) is:
    \(H(p) = \mathbb{E}[h(\cdot)] = \sum_i p_i h(p_i) = -\sum_i p_i \log_2 p_i\)
    So the entropy is
    \(0.04 \times 4.64 + 0.22 \times 2.18 + 0.67 \times 0.58 + 0.07 \times 3.83 \approx 1.3219\)
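
    A quick check of both parts with numpy:

    import numpy as np
    
    p = np.array([0.04, 0.22, 0.67, 0.07])
    h = -np.log2(p)       # self-information of each symbol in bits
    print(h)              # ~[4.64, 2.18, 0.58, 3.84]
    print(np.sum(p * h))  # entropy, ~1.322 bits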

    Task 5: Bayesian Decision Theory

    5a) Optimal Boundary

    Bayesian decision theory describes the probability of an event based on prior knowledge of conditions that might be related to the event.
    Its goal is to minimize the misclassification rate.
    The Bayes-optimal classification is based on the quantities \(p(x \mid C_k)\, p(C_k)\).
    We compare \(p(x \mid C_1)\, p(C_1)\) with \(p(x \mid C_2)\, p(C_2)\) and choose the class with the larger value; for example, when \(p(x \mid C_1)\, p(C_1)\) is larger, we choose \(C_1\). The optimal decision boundary lies where the two quantities are equal.

    5b) Decision Boundaries

    \(p(x \mid C_1) = \frac{p(x, C_1)}{p(C_1)} = \frac{p(C_1 \mid x)\, p(x)}{p(C_1)}, \qquad p(x \mid C_2) = \frac{p(x, C_2)}{p(C_2)} = \frac{p(C_2 \mid x)\, p(x)}{p(C_2)}\)
    At the decision boundary, \(p(x \mid C_1)\, p(C_1) = p(x \mid C_2)\, p(C_2)\). Because \(p(C_1) = p(C_2)\), this reduces to \(p(x \mid C_1) = p(x \mid C_2)\).
    With Gaussian class conditionals and \(\sigma_1 = \sigma_2\), taking the logarithm of both densities leaves \((x^* - \mu_1)^2 = (x^* - \mu_2)^2\),
    so \(x^* = \frac{\mu_1 + \mu_2}{2}\), the midpoint between the two means.
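
    A numerical sketch with assumed example parameters (\(\mu_1 = 0, \mu_2 = 4, \sigma = 1\)) confirming the midpoint boundary:

    import numpy as np
    
    def gauss(x, mu, sigma):
        # Gaussian density N(x; mu, sigma^2)
        return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    
    mu1, mu2, sigma = 0.0, 4.0, 1.0   # assumed example values
    xs = np.linspace(-5.0, 9.0, 10001)
    diff = np.abs(gauss(xs, mu1, sigma) - gauss(xs, mu2, sigma))
    print(xs[np.argmin(diff)])        # ~2.0 = (mu1 + mu2) / 2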

    5c) Different Misclassification Costs

    See lecture_04_bayesian_decision_theory, page 17: with unequal misclassification costs \(\lambda_{kj}\) (the cost of deciding \(C_k\) when the true class is \(C_j\)), we minimize the expected loss \(R(C_k \mid x) = \sum_j \lambda_{kj}\, p(C_j \mid x)\) instead of the misclassification rate, so the costlier mistake is made less often.
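
    A minimal sketch of this decision rule (the loss values and posteriors below are assumed for illustration, not taken from the lecture):

    import numpy as np
    
    posterior = np.array([0.3, 0.7])   # p(C_1|x), p(C_2|x) (assumed)
    loss = np.array([[0.0, 1.0],       # loss[k, j] = cost of deciding class k+1
                     [10.0, 0.0]])     # when the true class is j+1
    expected_loss = loss @ posterior   # R(C_k|x) = sum_j loss[k, j] p(C_j|x)
    print("decide C" + str(np.argmin(expected_loss) + 1))  # C1, despite p(C_2|x) > p(C_1|x)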


    Author: Rest探路者
    Source: http://www.cnblogs.com/Java-Starter/
    The copyright of this article is shared by the author and cnblogs. Reposting is welcome, but without the author's consent please keep this statement and provide a clear link to the original article on the page.
    Github: https://github.com/cjy513203427
