Machine Learning


Over the holidays I went through a few books on machine learning; here I have collected some of the more important core formulas.

Model Description

Under the feature-space assumption, we look for linear coefficients $\theta$ so that a linear function approximates the target vector.

How good the approximation is gets measured by a cost function; the MSE listed below is one example.

    Linear Regression
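
In standard notation, the linear model predicts $\hat{y} = \theta^{T} x$, and the MSE over $m$ training instances is

$$ \mathrm{MSE}(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta^{T} x^{(i)} - y^{(i)} \right)^{2} $$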

Gradient Descent

Batch gradient descent repeatedly moves $\theta$ against the gradient of the cost function, with learning rate $\eta$:

$$ \theta^{(\text{next step})} = \theta - \eta \, \nabla_{\theta} \mathrm{MSE}(\theta) $$

where

$$ \nabla_{\theta} \mathrm{MSE}(\theta) = \frac{2}{m} X^{T} \left( X \theta - y \right) $$

With a regularization term added to the cost:

• Ridge Regression: $J(\theta) = \mathrm{MSE}(\theta) + \alpha \frac{1}{2} \sum_{i=1}^{n} \theta_i^{2}$
• LASSO: $J(\theta) = \mathrm{MSE}(\theta) + \alpha \sum_{i=1}^{n} \lvert \theta_i \rvert$
• Elastic Net: $J(\theta) = \mathrm{MSE}(\theta) + r \alpha \sum_{i=1}^{n} \lvert \theta_i \rvert + \frac{1 - r}{2} \alpha \sum_{i=1}^{n} \theta_i^{2}$
sklearn: Linear Regression

import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.metrics import mean_squared_error

# toy data so the snippet is self-contained: y = 4 + 3x + noise
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X[:, 0] + np.random.randn(100)

lr = LinearRegression()
lr.fit(X, y)
lr.intercept_, lr.coef_               # learned bias and weights
mean_squared_error(y, lr.predict(X))  # training-set MSE

# stochastic gradient descent variant
sgd_reg = SGDRegressor()
sgd_reg.fit(X, y)

Logistic Regression

The model estimates $\hat{p} = \sigma(\theta^{T} x)$, where $\sigma(t)$ is the sigmoid function:

$$ \sigma(t) = \frac{1}{1 + e^{-t}} $$

Logistic Regression cost function (log loss):

$$ J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{p}^{(i)} + \left( 1 - y^{(i)} \right) \log \left( 1 - \hat{p}^{(i)} \right) \right] $$

Logistic cost function partial derivatives:

$$ \frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \sigma(\theta^{T} x^{(i)}) - y^{(i)} \right) x_j^{(i)} $$

sklearn: Logistic Regression

    from sklearn.linear_model import LogisticRegression

    log_reg = LogisticRegression()
log_reg.fit(X, y)  # X, y: feature matrix and binary labels
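
A quick usage sketch; X_new is hypothetical and must match the number of training features (one feature is assumed here):

import numpy as np

X_new = np.array([[1.7], [1.5]])  # hypothetical inputs
log_reg.predict_proba(X_new)      # class probabilities from the sigmoid
log_reg.predict(X_new)            # hard 0/1 predictions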

    Softmax Regression
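
In the standard multi-class formulation, each class $k$ gets a score $s_k(x) = \theta_k^{T} x$, and the softmax function converts the scores into probabilities:

$$ \hat{p}_k = \frac{\exp\left( s_k(x) \right)}{\sum_{j=1}^{K} \exp\left( s_j(x) \right)} $$

Training minimizes the cross-entropy cost $J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y_k^{(i)} \log \hat{p}_k^{(i)}$, which reduces to the log loss above when $K = 2$.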

Support Vector Machine

• Decision Functions and Predictions: the linear SVM classifier predicts

$$ \hat{y} = \begin{cases} 0 & \text{if } w^{T} x + b < 0 \\ 1 & \text{if } w^{T} x + b \ge 0 \end{cases} $$

• Hard Margin Classification

$$ \min_{w,\,b} \ \frac{1}{2} w^{T} w $$

subject to

$$ t^{(i)} \left( w^{T} x^{(i)} + b \right) \ge 1 \quad \text{for } i = 1, \dots, m $$

• Soft Margin Classification

$$ \min_{w,\,b,\,\zeta} \ \frac{1}{2} w^{T} w + C \sum_{i=1}^{m} \zeta^{(i)} $$

subject to

$$ t^{(i)} \left( w^{T} x^{(i)} + b \right) \ge 1 - \zeta^{(i)} \quad \text{and} \quad \zeta^{(i)} \ge 0 \quad \text{for } i = 1, \dots, m $$

• Dual Problem

$$ \min_{\alpha} \ \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha^{(i)} \alpha^{(j)} t^{(i)} t^{(j)} \, {x^{(i)}}^{T} x^{(j)} \ - \ \sum_{i=1}^{m} \alpha^{(i)} $$

subject to

$$ \alpha^{(i)} \ge 0 \quad \text{for } i = 1, \dots, m $$

    LinearSVC

    import numpy as np
    from sklearn import datasets
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
    X = iris["data"][:, (2, 3)] # petal length, petal width
    y = (iris["target"] == 2).astype(np.float64) # Iris-Virginica

svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge")),
])

    svm_clf.fit(X, y)

    Common kernels

• Linear: $K(a, b) = a^{T} b$
• Polynomial: $K(a, b) = \left( \gamma a^{T} b + r \right)^{d}$
• Gaussian RBF: $K(a, b) = \exp\left( -\gamma \lVert a - b \rVert^{2} \right)$
• Sigmoid: $K(a, b) = \tanh\left( \gamma a^{T} b + r \right)$

These map directly onto SVC's kernel parameter, as sketched below.
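
A minimal sketch of switching kernels with sklearn's SVC, reusing the iris X and y from the LinearSVC example; the hyperparameter values are illustrative:

from sklearn.svm import SVC

# polynomial kernel; coef0 plays the role of r in the formula above
poly_svc = SVC(kernel="poly", degree=3, coef0=1, C=5)
poly_svc.fit(X, y)

# Gaussian RBF kernel; gamma controls how far each instance's influence reaches
rbf_svc = SVC(kernel="rbf", gamma=5, C=0.001)
rbf_svc.fit(X, y)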

From a single tree to a forest.

Decision Trees

• Gini impurity: $G_i = 1 - \sum_{k=1}^{n} p_{i,k}^{2}$
• Entropy: $H_i = -\sum_{k=1,\, p_{i,k} \neq 0}^{n} p_{i,k} \log_2 p_{i,k}$
• CART cost function for regression:

$$ J(k, t_k) = \frac{m_{\text{left}}}{m} \mathrm{MSE}_{\text{left}} + \frac{m_{\text{right}}}{m} \mathrm{MSE}_{\text{right}} $$

where $\mathrm{MSE}_{\text{node}} = \sum_{i \in \text{node}} \left( \hat{y}_{\text{node}} - y^{(i)} \right)^{2}$ and $\hat{y}_{\text{node}} = \frac{1}{m_{\text{node}}} \sum_{i \in \text{node}} y^{(i)}$

DecisionTreeClassifier

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()
    X = iris.data[:, 2:] # petal length and width
    y = iris.target

    tree_clf = DecisionTreeClassifier(max_depth=2)
    tree_clf.fit(X, y)
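
For regression, the analogous estimator minimizes the CART regression cost above; a brief sketch, reusing the same iris features with a continuous target picked for illustration:

from sklearn.tree import DecisionTreeRegressor

tree_reg = DecisionTreeRegressor(max_depth=2)
tree_reg.fit(X, iris.data[:, 0])  # e.g. predict sepal length from petal size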

    Random Forests

To my mind, Random Forests are the classic representative of ensemble learning.

Take classifiers as an example: on the same data, different classifiers may reach different decisions.

    Logistic Regression classifier, Random Forest classifier, K-Nearest Neighbors classifier

Naturally, a voting strategy can be introduced to make the final decision.

Voting classifiers

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    log_clf = LogisticRegression()
    rnd_clf = RandomForestClassifier()
    svm_clf = SVC()

voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard')
voting_clf.fit(X_train, y_train)  # X_train, y_train: any training split
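
If every classifier can estimate class probabilities, soft voting (averaging the predicted probabilities) often scores a bit higher than hard voting. A variant sketch of the block above; note SVC needs probability=True to expose predict_proba:

svm_clf = SVC(probability=True)  # enables predict_proba, at extra training cost
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='soft')
voting_clf.fit(X_train, y_train)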

    Boosting

AdaBoost

AdaBoost trains predictors sequentially, each one paying more attention to the training instances its predecessor misclassified.

Gradient Boosting

Gradient boosting also adds predictors sequentially, but each new one is fit to the residual errors made by the ensemble so far.
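
A minimal sklearn sketch of both methods; the hyperparameters are illustrative, not from the original post:

from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

iris = load_iris()
X, y = iris.data, iris.target

# AdaBoost: reweights misclassified instances between rounds
ada_clf = AdaBoostClassifier(n_estimators=200, learning_rate=0.5)
ada_clf.fit(X, y)

# gradient boosting: each tree fits the previous ensemble's residuals
gb_clf = GradientBoostingClassifier(max_depth=2, n_estimators=100)
gb_clf.fit(X, y)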

Evaluation Metrics

Metrics determine the direction in which a model converges; there are several for both continuous and discrete models.

    Classification

Precision and recall are defined as $\text{precision} = \frac{TP}{TP + FP}$ and $\text{recall} = \frac{TP}{TP + FN}$; $F_1$ is their harmonic mean:

$$ F_1 = \frac{2}{\frac{1}{\text{precision}} + \frac{1}{\text{recall}}} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} $$

    precision_score and recall_score

from sklearn.metrics import precision_score, recall_score, f1_score
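
A minimal usage sketch; y_true and y_pred here are hypothetical binary label arrays:

import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = np.array([0, 1, 1, 0, 1, 1])  # hypothetical ground truth
y_pred = np.array([0, 1, 0, 0, 1, 0])  # hypothetical predictions

precision_score(y_true, y_pred)  # TP / (TP + FP) -> 1.0 here
recall_score(y_true, y_pred)     # TP / (TP + FN) -> 0.5 here
f1_score(y_true, y_pred)         # harmonic mean -> ~0.667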

    Regression

• MSE, as defined in the Linear Regression section above