Machine Learning

Over the holidays I went through a few books on machine learning and collected some of the more important core formulas here.

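Model Description

Under the feature-space assumption, we look for linear coefficients $\theta$ such that a linear function approximates the target vector.

How well the approximation fits is measured by a cost function; the MSE listed below is one example.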

Linear Regression
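
For reference, the linear model and its MSE cost are usually written as follows. The notation is an assumption, since the post does not fix one: $m$ is the number of samples, $\mathbf{x}^{(i)}$ is the $i$-th feature vector (with a leading 1 for the bias term), and $y^{(i)}$ is its target.

$$\hat{y} = \theta^T \mathbf{x}$$

$$\mathrm{MSE}(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta^T \mathbf{x}^{(i)} - y^{(i)} \right)^2$$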

Gradient Descent

where
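
A minimal NumPy sketch of batch gradient descent on the MSE cost may help here. It assumes the usual update $\theta \leftarrow \theta - \eta \nabla_\theta \mathrm{MSE}(\theta)$ with $\nabla_\theta \mathrm{MSE}(\theta) = \frac{2}{m} X^T (X\theta - y)$; the function name, the learning rate `eta`, and the synthetic data are illustrative choices, not taken from the post.

```python
import numpy as np

def batch_gradient_descent(X, y, eta=0.1, n_iterations=1000):
    """Minimize the MSE of a linear model; X must already contain a bias column."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iterations):
        # Gradient of the MSE with respect to theta: (2/m) * X^T (X theta - y)
        gradients = (2.0 / m) * X.T @ (X @ theta - y)
        # Step against the gradient, scaled by the learning rate
        theta = theta - eta * gradients
    return theta

# Synthetic data (assumption): y = 4 + 3x plus a little Gaussian noise
rng = np.random.default_rng(42)
x = 2 * rng.random(100)
y = 4 + 3 * x + rng.normal(scale=0.1, size=100)
X_b = np.c_[np.ones(100), x]  # prepend the bias column

theta = batch_gradient_descent(X_b, y)
print(theta)  # intercept and slope estimates
```

With this step size the iterates should settle on the least-squares solution; a learning rate that is too large would make them diverge instead.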

With Regularization Terms

Ridge Regression

LASSO

Elastic Net
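
The three regularized variants can be compared side by side in scikit-learn. This is only a sketch on synthetic data: Ridge adds an $\ell_2$ penalty on the coefficients, LASSO an $\ell_1$ penalty, and Elastic Net a mix of both controlled by `l1_ratio`. The `alpha` values and the data are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

# Synthetic data (assumption): only the first two of five features matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)                    # l2 penalty
lasso = Lasso(alpha=0.1).fit(X, y)                    # l1 penalty
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # mix of l1 and l2

for name, model in [("ridge", ridge), ("lasso", lasso), ("elastic net", enet)]:
    print(name, np.round(model.coef_, 3))
```

The $\ell_1$ penalty tends to drive the coefficients of irrelevant features exactly to zero, whereas Ridge only shrinks them.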

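sklearn: Linear Regression

from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.metrics import mean_squared_error

# X (shape (m, n)) and y (shape (m,)) are assumed to be defined already
lr = LinearRegression()
lr.fit(X, y)
lr.intercept_, lr.coef_

# MSE of the fitted model on the training data
mean_squared_error(y, lr.predict(X))

# sgd
sgd_reg = SGDRegressor()
sgd_reg.fit(X, y)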

Logistic Regression

$\sigma(t)$ is the sigmoid function.

Logistic Regression cost function (log loss)

Logistic cost function partial derivatives
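
The sigmoid, the log loss, and its partial derivatives can be sketched in NumPy as below. The forms assumed here are the standard ones: $\sigma(t) = 1 / (1 + e^{-t})$, $J(\theta) = -\frac{1}{m} \sum_i \left[ y^{(i)} \log \hat{p}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{p}^{(i)}) \right]$ with $\hat{p}^{(i)} = \sigma(\theta^T \mathbf{x}^{(i)})$, and $\nabla_\theta J = \frac{1}{m} X^T (\hat{p} - y)$. Function names and data are illustrative.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def log_loss(theta, X, y):
    """Mean log loss; a tiny eps keeps log() away from zero."""
    p = sigmoid(X @ theta)
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def log_loss_gradient(theta, X, y):
    """Vector of partial derivatives of the log loss with respect to theta."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Synthetic data (assumption): bias column plus two features
rng = np.random.default_rng(1)
X = np.c_[np.ones(50), rng.normal(size=(50, 2))]
y = (X[:, 1] + X[:, 2] > 0).astype(float)

theta = np.array([0.1, -0.2, 0.3])
grad = log_loss_gradient(theta, X, y)
print(log_loss(theta, X, y), grad)
```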

sklearn-Logistic Regression


from sklearn.linear_model import LogisticRegression

# X (features) and y (class labels) are assumed to be defined already
log_reg = LogisticRegression()
log_reg.fit(X, y)

Softmax Regression
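
Softmax regression generalizes logistic regression to several classes: each class gets a score, the scores are turned into probabilities with the softmax function, and training minimizes the cross-entropy. The sketch below assumes that standard formulation; the score matrix is made up for illustration.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def cross_entropy(probs, y):
    """Mean cross-entropy; y holds integer class labels."""
    m = len(y)
    return -np.mean(np.log(probs[np.arange(m), y] + 1e-12))

# Two samples, three classes (illustrative scores)
scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
probs = softmax(scores)
predicted = probs.argmax(axis=1)  # class with the highest probability
loss = cross_entropy(probs, np.array([0, 1]))
print(probs, predicted, loss)
```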

Support Vector Machines (SVM)

Support Vector Machine

Decision Functions and Predictions
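
For a linear SVM the prediction rule is commonly written as below, assuming a weight vector $\mathbf{w}$ and a bias term $b$ (notation not fixed by the post):

$$\hat{y} = \begin{cases} 0 & \text{if } \mathbf{w}^T \mathbf{x} + b < 0 \\ 1 & \text{if } \mathbf{w}^T \mathbf{x} + b \ge 0 \end{cases}$$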

Hard Margin Classification

subject to
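
A standard statement of the hard-margin problem, assuming labels $t^{(i)} \in \{-1, +1\}$ and $m$ training instances, is:

$$\min_{\mathbf{w}, b} \ \frac{1}{2} \mathbf{w}^T \mathbf{w} \quad \text{subject to} \quad t^{(i)} \left( \mathbf{w}^T \mathbf{x}^{(i)} + b \right) \ge 1, \quad i = 1, \dots, m$$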

Soft Margin Classification

subject to
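
The soft-margin version is usually obtained by adding slack variables $\zeta^{(i)} \ge 0$ and a hyperparameter $C$ that trades margin width against violations. Assuming labels $t^{(i)} \in \{-1, +1\}$, one common form is:

$$\min_{\mathbf{w}, b, \zeta} \ \frac{1}{2} \mathbf{w}^T \mathbf{w} + C \sum_{i=1}^{m} \zeta^{(i)} \quad \text{subject to} \quad t^{(i)} \left( \mathbf{w}^T \mathbf{x}^{(i)} + b \right) \ge 1 - \zeta^{(i)}, \quad \zeta^{(i)} \ge 0$$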

Dual Problem

subject to
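
One common way to write the dual of the soft-margin problem, with multipliers $\alpha^{(i)}$ and labels $t^{(i)} \in \{-1, +1\}$ (both assumptions about notation), is:

$$\min_{\alpha} \ \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha^{(i)} \alpha^{(j)} t^{(i)} t^{(j)} \, \mathbf{x}^{(i)T} \mathbf{x}^{(j)} - \sum_{i=1}^{m} \alpha^{(i)} \quad \text{subject to} \quad 0 \le \alpha^{(i)} \le C, \quad \sum_{i=1}^{m} \alpha^{(i)} t^{(i)} = 0$$

The hard-margin case corresponds to dropping the upper bound $C$. Because the data enter only through inner products $\mathbf{x}^{(i)T} \mathbf{x}^{(j)}$, those can be replaced by a kernel function.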

LinearSVC



import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # Iris-Virginica

svm_clf = Pipeline([
        ("scaler", StandardScaler()),
        ("linear_svc", LinearSVC(C=1, loss="hinge")),
    ])

svm_clf.fit(X, y)

Common kernels

Linear

Polynomial

Gaussian RBF

Sigmoid
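
The four kernels can be sketched as plain NumPy functions. The parameter names `gamma`, `r`, and `d` follow the usual convention ($K(\mathbf{a}, \mathbf{b}) = (\gamma \mathbf{a}^T \mathbf{b} + r)^d$ for the polynomial kernel, and so on); they are assumptions, as the post does not name them.

```python
import numpy as np

def linear_kernel(a, b):
    return a @ b

def polynomial_kernel(a, b, gamma=1.0, r=0.0, d=2):
    return (gamma * (a @ b) + r) ** d

def rbf_kernel(a, b, gamma=1.0):
    # Gaussian RBF: exp(-gamma * ||a - b||^2)
    return np.exp(-gamma * np.sum((a - b) ** 2))

def sigmoid_kernel(a, b, gamma=1.0, r=0.0):
    return np.tanh(gamma * (a @ b) + r)

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])
print(linear_kernel(a, b), polynomial_kernel(a, b),
      rbf_kernel(a, b), sigmoid_kernel(a, b))
```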

Trees

From trees to forests.

Decision Tree

Decision Trees

Gini impurity

Entropy
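
Both impurity measures depend only on the class proportions $p_k$ inside a node: Gini impurity is $1 - \sum_k p_k^2$ and entropy is $-\sum_k p_k \log_2 p_k$. A small sketch, with helper names chosen for illustration:

```python
import numpy as np

def class_proportions(labels):
    """Fraction of samples in each class that is present in the node."""
    _, counts = np.unique(labels, return_counts=True)
    return counts / counts.sum()

def gini(labels):
    p = class_proportions(labels)
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Only classes that occur are counted, so log2 never sees zero
    p = class_proportions(labels)
    return -np.sum(p * np.log2(p))

# A node holding 49 samples of one class and 5 of another
node = np.array([1] * 49 + [2] * 5)
print(gini(node), entropy(node))
```

A pure node has both measures equal to zero; an evenly split two-class node gives a Gini impurity of 0.5 and an entropy of 1.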

CART cost function for regression

where
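
A common statement of this cost for a split on feature $k$ at threshold $t_k$ is sketched below. The notation is an assumption: $m_{\text{left}}$ and $m_{\text{right}}$ count the instances in each child node, and each node predicts the mean target of its instances.

$$J(k, t_k) = \frac{m_{\text{left}}}{m} \mathrm{MSE}_{\text{left}} + \frac{m_{\text{right}}}{m} \mathrm{MSE}_{\text{right}}$$

$$\mathrm{MSE}_{\text{node}} = \frac{1}{m_{\text{node}}} \sum_{i \in \text{node}} \left( \hat{y}_{\text{node}} - y^{(i)} \right)^2, \qquad \hat{y}_{\text{node}} = \frac{1}{m_{\text{node}}} \sum_{i \in \text{node}} y^{(i)}$$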

DecisionTreeClassifier

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:] # petal length and width
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)

Random Forests

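In my view, RF is the classic representative of ensemble learning.

Taking classifiers as an example: given the same data, different classifiers may reach different decisions.

Logistic Regression classifier, Random Forest classifier, K-Nearest Neighbors classifier

Naturally, a voting strategy can be introduced to make the final decision.

Voting among classifiers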


from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC()

voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard')
# X_train and y_train are assumed to be a training split prepared beforehand
voting_clf.fit(X_train, y_train)

Boosting

Adaboost
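
AdaBoost trains weak learners in sequence, giving more weight to the instances the previous learners misclassified. A minimal scikit-learn sketch follows; the moons dataset and the hyperparameters are assumptions for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Toy dataset (assumption): two interleaving half circles with noise
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The default weak learner is a depth-1 decision tree (a "stump")
ada_clf = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=42)
ada_clf.fit(X_train, y_train)

accuracy = ada_clf.score(X_test, y_test)
print(accuracy)
```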

Gradient Boosting
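
Gradient boosting for squared error can be illustrated by hand: each new tree is fitted to the residual errors left by the trees before it, and the ensemble prediction is the sum of all trees. The sketch below uses three shallow regression trees on synthetic data; in practice `sklearn.ensemble.GradientBoostingRegressor` packages the same idea together with a learning rate.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumption): noisy quadratic
rng = np.random.default_rng(42)
X = rng.random((100, 1)) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * rng.normal(size=100)

# First tree fits the targets, each later tree fits the remaining residuals
tree1 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y)
residual1 = y - tree1.predict(X)
tree2 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, residual1)
residual2 = residual1 - tree2.predict(X)
tree3 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, residual2)

def ensemble_predict(X_new):
    """Sum the predictions of all three trees."""
    return sum(tree.predict(X_new) for tree in (tree1, tree2, tree3))

mse_single = np.mean((y - tree1.predict(X)) ** 2)
mse_ensemble = np.mean((y - ensemble_predict(X)) ** 2)
print(mse_single, mse_ensemble)
```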

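Performance Metrics

Metrics define the direction in which a model should converge; several exist for both continuous (regression) and discrete (classification) models.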

Classification

$F_1$ is the harmonic mean of precision and recall.

precision_score and recall_score

from sklearn.metrics import precision_score, recall_score
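
A short usage sketch on made-up labels, assuming the usual definitions $\text{precision} = \frac{TP}{TP + FP}$, $\text{recall} = \frac{TP}{TP + FN}$, and $F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Made-up binary labels: 3 true positives, 1 false positive, 1 false negative
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 4
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two
print(precision, recall, f1)
```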

Regression
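
The post does not say which regression metrics it had in mind; MSE, RMSE, MAE, and $R^2$ are common choices, so the sketch below computes those on a few made-up values.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Made-up targets and predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse = mean_squared_error(y_true, y_pred)   # mean of squared errors
rmse = np.sqrt(mse)                        # same units as the target
mae = mean_absolute_error(y_true, y_pred)  # mean of absolute errors
r2 = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot
print(mse, rmse, mae, r2)
```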

Original post: https://www.cnblogs.com/lijianming180/p/12037887.html