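Over the holidays I went through a few books on machine learning and collected some of the more important core formulas here.
Model Description
Given a feature space, we look for linear coefficients $\theta$ so that a linear function approximates the target vector.
How well the function approximates the target is measured by a cost function; the MSE listed below is one example.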
Linear Regression
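For reference, a common way to write the linear model and its MSE cost is sketched below. The notation is an assumption of this sketch: $m$ is the number of samples, $x^{(i)}$ the $i$-th feature vector (with a leading 1 for the bias), and $y^{(i)}$ its target.

$$
\hat{y} = h_\theta(x) = \theta^T x, \qquad
\mathrm{MSE}(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right)^2
$$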
Gradient descent
where
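A minimal sketch of batch gradient descent on the MSE cost, in plain Python. It assumes the usual update $\theta \leftarrow \theta - \eta \nabla_\theta \mathrm{MSE}(\theta)$ with $\nabla_\theta \mathrm{MSE}(\theta) = \frac{2}{m} X^T (X\theta - y)$. The function name `batch_gradient_descent`, the learning rate `eta`, and the toy data are illustrative choices.

```python
def batch_gradient_descent(X, y, eta=0.1, n_iter=1000):
    """Minimize MSE for a linear model; each row of X starts with 1 for the bias."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(n_iter):
        # residuals r_i = theta^T x_i - y_i
        residuals = [sum(t * xj for t, xj in zip(theta, row)) - yi
                     for row, yi in zip(X, y)]
        # gradient_j = (2/m) * sum_i r_i * x_ij
        grad = [2.0 / m * sum(r * row[j] for r, row in zip(residuals, X))
                for j in range(n)]
        theta = [t - eta * g for t, g in zip(theta, grad)]
    return theta


# Toy data generated from y = 1 + 2x (hypothetical example)
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta = batch_gradient_descent(X, y)  # approaches [1, 2]
```

A learning rate that is too large makes the iteration diverge, and one that is too small makes it slow, so `eta` usually needs tuning per dataset.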
With a regularization term
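One common choice, assuming an $\ell_2$ penalty (Ridge regression) is what is meant here, adds the squared size of the weights to the cost. The hyperparameter $\alpha$ and the convention of leaving the bias $\theta_0$ unpenalized are assumptions of this sketch; with an $\ell_1$ penalty (Lasso) the sum would run over $|\theta_j|$ instead.

$$
J(\theta) = \mathrm{MSE}(\theta) + \alpha \, \frac{1}{2} \sum_{j=1}^{n} \theta_j^2
$$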
sklearn: linear regression

```python
from sklearn.linear_model import LinearRegression

# X, y: training features and targets, assumed to be defined elsewhere
lr = LinearRegression()
lr.fit(X, y)
lr.intercept_, lr.coef_  # fitted bias and weights

from sklearn.metrics import mean_squared_error
from sklearn.linear_model import SGDRegressor
```
Logistic Regression
$\sigma(t)$ is the sigmoid function.
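The model behind this can be sketched as follows, assuming the standard formulation: the estimated probability is the sigmoid of a linear score, and the prediction thresholds it at 0.5. The symbols $\hat{p}$ and $\hat{y}$ are notation assumed here.

$$
\hat{p} = \sigma(\theta^T x), \qquad
\sigma(t) = \frac{1}{1 + e^{-t}}, \qquad
\hat{y} = \begin{cases} 0 & \text{if } \hat{p} < 0.5 \\ 1 & \text{if } \hat{p} \ge 0.5 \end{cases}
$$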
Logistic Regression cost function (log loss)
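In the usual notation (assumed here: $m$ samples, labels $y^{(i)} \in \{0, 1\}$, and $\hat{p}^{(i)} = \sigma(\theta^T x^{(i)})$), the log loss is commonly written as:

$$
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{p}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{p}^{(i)}\right) \right]
$$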
Logistic cost function partial derivatives
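As a hedged illustration, the sketch below evaluates the log loss and its partial derivatives in plain Python, assuming the standard result $\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_i \left( \sigma(\theta^T x^{(i)}) - y^{(i)} \right) x_j^{(i)}$. The helper names `sigmoid`, `log_loss`, and `log_loss_gradient` are hypothetical, and the data is a toy example.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def log_loss(theta, X, y):
    """Average negative log-likelihood of labels y in {0, 1}."""
    total = 0.0
    for row, yi in zip(X, y):
        p = sigmoid(sum(t * xj for t, xj in zip(theta, row)))
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return -total / len(X)


def log_loss_gradient(theta, X, y):
    """Partial derivatives: (1/m) * sum_i (sigmoid(theta^T x_i) - y_i) * x_ij."""
    m = len(X)
    errors = [sigmoid(sum(t * xj for t, xj in zip(theta, row))) - yi
              for row, yi in zip(X, y)]
    return [sum(e * row[j] for e, row in zip(errors, X)) / m
            for j in range(len(theta))]


# Toy data: the first column is the bias term (hypothetical example)
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
theta = [0.0, 0.0]
loss = log_loss(theta, X, y)           # equals log(2) when theta is all zeros
grad = log_loss_gradient(theta, X, y)  # negative slope on the feature weight
```

There is no closed-form minimizer for this cost, but it is convex, so feeding `log_loss_gradient` into gradient descent is one way to fit $\theta$.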
sklearn: Logistic Regression

```python
from sklearn.linear_model import LogisticRegression

# X, y: training features and class labels, assumed to be defined elsewhere
log_reg = LogisticRegression()
log_reg.fit(X, y)
```
Softmax Regression
Support Vector Machine
Decision Functions and Predictions
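For a linear SVM the decision function and prediction are usually written as below; $w$ (weights), $b$ (bias), and $\hat{y}$ are notation assumed for this sketch.

$$
\hat{y} = \begin{cases} 0 & \text{if } w^T x + b < 0 \\ 1 & \text{if } w^T x + b \ge 0 \end{cases}
$$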
Hard Margin Classification
subject to
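Written out in full under the standard formulation, the hard margin problem reads as follows. Here $t^{(i)} \in \{-1, +1\}$ encodes the class of sample $i$, a convention assumed for this sketch.

$$
\min_{w, b} \; \frac{1}{2} w^T w
\quad \text{subject to} \quad
t^{(i)} \left( w^T x^{(i)} + b \right) \ge 1, \quad i = 1, \dots, m
$$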
Soft Margin Classification
subject to
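Written out in full, again under the standard formulation, the soft margin problem introduces a slack variable $\zeta^{(i)} \ge 0$ per sample and a hyperparameter $C$ that trades margin width against margin violations. Both symbols are assumed notation.

$$
\min_{w, b, \zeta} \; \frac{1}{2} w^T w + C \sum_{i=1}^{m} \zeta^{(i)}
\quad \text{subject to} \quad
t^{(i)} \left( w^T x^{(i)} + b \right) \ge 1 - \zeta^{(i)}, \quad \zeta^{(i)} \ge 0
$$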
subject to
LinearSVC

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # 1.0 for Iris virginica, else 0.0

# Scale the features first, then fit a linear SVM with hinge loss
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge")),
])

svm_clf.fit(X, y)
```
Common kernels
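A minimal sketch of four commonly used kernels in plain Python, assuming these are the ones meant: linear $a^T b$, polynomial $(\gamma a^T b + r)^d$, Gaussian RBF $\exp(-\gamma \lVert a - b \rVert^2)$, and sigmoid $\tanh(\gamma a^T b + r)$. The function names and default hyperparameters are illustrative.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def linear_kernel(a, b):
    return dot(a, b)


def polynomial_kernel(a, b, gamma=1.0, r=1.0, d=2):
    return (gamma * dot(a, b) + r) ** d


def rbf_kernel(a, b, gamma=0.5):
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(a, b, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(a, b) + r)


a, b = [1.0, 2.0], [3.0, 4.0]
k_lin = linear_kernel(a, b)        # 1*3 + 2*4 = 11
k_poly = polynomial_kernel(a, b)   # (11 + 1)^2 = 144
k_rbf = rbf_kernel(a, a)           # identical points give exp(0) = 1
```

The RBF value decays toward 0 as the two points move apart, which is why it behaves like a similarity measure.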
Trees
From trees to forests.
Decision Tree
Decision Trees
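Classification trees are typically grown by minimizing node impurity. The sketch below assumes the two standard measures, Gini impurity $G = 1 - \sum_k p_k^2$ and entropy $H = -\sum_k p_k \log_2 p_k$, where $p_k$ is the share of class $k$ in a node. The function names are hypothetical.

```python
import math
from collections import Counter


def class_ratios(labels):
    counts = Counter(labels)
    return [c / len(labels) for c in counts.values()]


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 (0 for a pure node)."""
    return 1.0 - sum(p * p for p in class_ratios(labels))


def entropy(labels):
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    return -sum(p * math.log2(p) for p in class_ratios(labels))


node = ["a", "a", "b", "b"]
g = gini(node)       # 1 - (0.25 + 0.25) = 0.5
h = entropy(node)    # two equally likely classes: 1 bit
```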
CART cost function for regression
where
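Written out in full under the usual CART formulation (assumed here), a split on feature $k$ at threshold $t_k$ is scored by the size-weighted MSE of the two child nodes, where $m_{\text{node}}$ is the number of samples in a node and $\hat{y}_{\text{node}}$ their mean target:

$$
J(k, t_k) = \frac{m_{\text{left}}}{m} \mathrm{MSE}_{\text{left}} + \frac{m_{\text{right}}}{m} \mathrm{MSE}_{\text{right}},
\qquad
\mathrm{MSE}_{\text{node}} = \frac{1}{m_{\text{node}}} \sum_{i \in \text{node}} \left( \hat{y}_{\text{node}} - y^{(i)} \right)^2,
\qquad
\hat{y}_{\text{node}} = \frac{1}{m_{\text{node}}} \sum_{i \in \text{node}} y^{(i)}
$$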
DecisionTreeClassifier

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)
```
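Random Forests
To my mind, Random Forests (RF) are the classic example of Ensemble Learning.
Take classifiers as an example: given the same data, different classifiers may reach different decisions.
For instance: a Logistic Regression classifier, a Random Forest classifier, a K-Nearest Neighbors classifier.
Naturally, a voting strategy can be introduced to make the final decision.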
Voting of classifiers

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC()

# Hard voting: predict the class that most of the estimators choose
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard'
)
# X_train, y_train: training split, assumed to be defined elsewhere
voting_clf.fit(X_train, y_train)
```
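To make the idea of hard voting concrete, here is a minimal plain-Python sketch that takes the per-classifier predictions for each sample and returns the majority class. It illustrates the principle only and is not how `VotingClassifier` is implemented; `hard_vote` and the sample predictions are hypothetical.

```python
from collections import Counter


def hard_vote(predictions):
    """predictions: one list of predicted labels per classifier, all the same length.
    Returns the most common label for each sample."""
    voted = []
    for sample_preds in zip(*predictions):
        label, _count = Counter(sample_preds).most_common(1)[0]
        voted.append(label)
    return voted


# Hypothetical predictions from three classifiers on four samples
lr_pred = [0, 1, 1, 0]
rf_pred = [0, 1, 0, 0]
svc_pred = [1, 1, 0, 0]
final = hard_vote([lr_pred, rf_pred, svc_pred])
```

With an odd number of binary classifiers a tie cannot occur; in other settings a tie-breaking rule has to be chosen explicitly.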
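Boosting
AdaBoost
Gradient Boosting
Performance Metrics
Metrics determine the direction in which a model converges. Several exist for both continuous (regression) and discrete (classification) models.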
Classification
$F_1$ is the harmonic mean of the two, precision and recall.
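With $TP$, $FP$, and $FN$ denoting the counts of true positives, false positives, and false negatives (notation assumed here), the standard definitions are:

$$
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2}{\frac{1}{\text{precision}} + \frac{1}{\text{recall}}} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
$$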
precision_score and recall_score

```python
from sklearn.metrics import precision_score, recall_score
```
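As a rough illustration of what these scores compute for binary labels, the plain-Python sketch below counts true positives, false positives, and false negatives by hand. The function name `precision_recall_f1` and the toy labels are hypothetical, and the sketch assumes at least one true positive so that no division by zero occurs.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy labels: 3 true positives, 1 false positive, 1 false negative
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Because the harmonic mean is dominated by the smaller value, $F_1$ is only high when precision and recall are both high.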
Regression
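Two commonly used regression metrics, picked here as an assumption, are the root mean squared error and the mean absolute error. In this sketch $h$ is the fitted model and $m$ the number of samples.

$$
\mathrm{RMSE}(X, h) = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)^2}, \qquad
\mathrm{MAE}(X, h) = \frac{1}{m} \sum_{i=1}^{m} \left| h(x^{(i)}) - y^{(i)} \right|
$$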