Notes: <Hands-On ML with Sklearn & TF> Chapter 1

1. what is ML
  1. a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
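2. what problems to solve
  1. problems whose existing solutions require a lot of hand-tuning or long lists of rules
  2. complex problems with no good solution using a traditional approach
  3. fluctuating environments (the system can adapt to new data)
  4. getting insight into complex problems and large amounts of data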
3. types
  1. whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, reinforcement)
  2. whether or not they can learn incrementally on the fly (online, batch)
  3. whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model (instance-based, model-based)
4. (un)supervised learning
  1. supervised : the training data includes the desired solutions, called labels
    
    typical tasks and algorithms : classification, k-Nearest Neighbors, Linear Regression, Logistic Regression, SVM, Decision Trees, Random Forests, Neural Networks
  2. unsupervised : the training data has no labels
    
    Clustering : k-means, Hierarchical Cluster Analysis (HCA), Expectation Maximization
    
    Visualization and dimensionality reduction : PCA, Kernel PCA, LLE, t-SNE
    
    Association rule learning : Apriori, Eclat
  3. semisupervised : a lot of unlabeled data plus a little labeled data
    
    usually a combination : unsupervised learning --> supervised learning
  4. reinforcement : an agent acts in an environment
    
    observe the environment
    
    select and perform action
    
    get rewards in return
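
To make the labeled/unlabeled distinction in item 4 concrete, here is a minimal sketch in plain Python. The toy data, the 1-D setting, and the helper names `nearest_neighbor_label` and `two_means_1d` are all made up for illustration and do not come from the book.

```python
# Toy 1-D data: hypothetical values chosen only for illustration.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["small", "small", "small", "large", "large", "large"]  # used only by the supervised part


def nearest_neighbor_label(x, train_x, train_y):
    """Supervised: predict the label of the closest labeled training point (1-NN)."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def two_means_1d(xs, iterations=10):
    """Unsupervised: split xs into two clusters with a tiny k-means (k=2); no labels are used."""
    c0, c1 = min(xs), max(xs)  # simple initialization: the two extreme points
    for _ in range(iterations):
        group0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        group1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0 = sum(group0) / len(group0)
        c1 = sum(group1) / len(group1)
    return c0, c1


predicted = nearest_neighbor_label(4.5, points, labels)  # needs the labels
centers = two_means_1d(points)                           # ignores the labels
print(predicted)
print(centers)
```

The supervised function cannot run without `labels`, while the clustering function never sees them. In scikit-learn, `KNeighborsClassifier` and `KMeans` play these two roles.
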
5. batch/online learning
  1. batch : offline; to learn about new data, a new version of the system must be trained from scratch on the full dataset
  2. online : incremental learning from a stream of data; the main challenge is bad data, which gradually degrades performance
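
The incremental idea behind online learning can be sketched as a model that is nudged by one example at a time instead of being refit on the full dataset. The snippet below is an assumption-laden illustration: the linear model, the squared-error update, the learning rate of 0.05, and the synthetic stream are all chosen for the example, not taken from the book.

```python
def sgd_update(w, b, x, y, learning_rate=0.05):
    """One online step for the model y_hat = w * x + b under squared error."""
    error = (w * x + b) - y
    w -= learning_rate * error * x
    b -= learning_rate * error
    return w, b


# Hypothetical noise-free stream generated from y = 2x + 1, arriving one example at a time.
stream = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]

w, b = 0.0, 0.0
for _ in range(200):              # replaying the stream stands in for a long-running feed
    for x, y in stream:
        w, b = sgd_update(w, b, x, y)

print(round(w, 3), round(b, 3))
```

Each call to `sgd_update` only needs the current example, so old data can be discarded. The learning rate controls how quickly the model adapts, and therefore also how quickly bad data can pull it off course. scikit-learn exposes the same pattern through `partial_fit`, for example on `SGDRegressor`.
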
6. instance-based/model-based
  1. instance-based : the system learns the examples by heart, then generalizes to new cases using a similarity measure
  2. model-based : study the data; select a model; train it on the training data; apply the model to make predictions on new cases
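
The two approaches in item 6 can be compared side by side on the same data. This is a rough sketch in plain Python: the five training pairs are invented, and `knn_predict` and `fit_line` are hypothetical helper names, not functions from any library.

```python
# Made-up (x, y) pairs lying near the line y = 0.5 * x + 1; not real data.
train = [(1.0, 1.5), (2.0, 2.1), (3.0, 2.4), (4.0, 3.0), (5.0, 3.6)]


def knn_predict(x_new, data, k=3):
    """Instance-based: average the targets of the k stored examples closest to x_new."""
    nearest = sorted(data, key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


def fit_line(data):
    """Model-based: ordinary least squares for y = intercept + slope * x."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(train)
x_new = 3.4
knn_prediction = knn_predict(x_new, train)       # needs the stored examples at prediction time
line_prediction = intercept + slope * x_new      # needs only the two learned parameters
print(knn_prediction, line_prediction)
```

The instance-based predictor keeps the whole training set around and uses distance as its similarity measure, whereas the model-based one compresses the data into an intercept and a slope.
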
7. Challenge
  1. insufficient quantity of training data
  2. nonrepresentative training data
  3. poor-quality data
  4. irrelevant features : feature selection; feature extraction; creating new features by gathering new data
  5. overfitting : constrain the model with regularization; the amount of regularization is controlled by a hyperparameter
  6. underfitting : use a more powerful model; feed better features; reduce the constraints on the model
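
As a small illustration of how a regularization hyperparameter constrains a model, the sketch below fits a line with an L2 penalty on the slope (ridge regression in one dimension). The data, the `alpha` values, and the helper name `fit_ridge_line` are assumptions made for this example.

```python
def fit_ridge_line(data, alpha):
    """Least-squares line with an L2 penalty alpha * slope**2 (the intercept is not penalized)."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / (sxx + alpha)       # alpha = 0 gives the unregularized fit
    return mean_y - slope * mean_x, slope


# Made-up training points; the last one is a deliberate outlier.
train = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 8.0)]

slopes = {alpha: fit_ridge_line(train, alpha)[1] for alpha in (0.0, 1.0, 10.0)}
for alpha, slope in slopes.items():
    print(alpha, round(slope, 3))
```

A larger `alpha` forces a flatter line, which makes the model less sensitive to the outlier but can lead to underfitting if pushed too far. `alpha` is set before training rather than learned, which is what makes it a hyperparameter. scikit-learn's `Ridge` takes a parameter of the same name.
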
8. Testing and Validating
  1. 80% of the data for training, 20% for testing
  2. validating : a model and hyperparameters tuned to perform best on the test set are unlikely to perform as well on new data, so hold out a separate validation set
    
    train multiple models with various hyperparameters using the training set
    
    select the model and hyperparameters that perform best on the validation set, then run a single final test against the test set to estimate the generalization error
  3. cross-validating : the training set is split into complementary subsets, and each model is trained against a different combination of these subsets and validated against the remaining parts.
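
The splitting scheme in item 8 can be sketched with index lists alone. This is only an illustration in plain Python: the function names, the seed, the dataset size of 100, and the choice of 5 folds are assumptions, not values from the book.

```python
import random


def train_test_split_indices(n, test_ratio=0.2, seed=42):
    """Shuffle 0..n-1 and reserve the last test_ratio share as the test set."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = int(n * test_ratio)
    return indices[:n - n_test], indices[n - n_test:]


def k_fold(indices, k=5):
    """Yield (training_part, validation_part) pairs; each fold is the validation part once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


train_idx, test_idx = train_test_split_indices(100)
splits = list(k_fold(train_idx, k=5))
print(len(train_idx), len(test_idx))
print([len(validation) for _, validation in splits])
```

The test indices are set aside once and never appear in any fold, so the final generalization estimate stays untouched by model selection. In scikit-learn, `train_test_split` and `cross_val_score` from `sklearn.model_selection` cover the same ground.
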
Example 1-1:
```
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model

# Load the data
oecd_bli = pd.read_csv("datasets/lifesat/oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv("datasets/lifesat/gdp_per_capita.csv", thousands=',',
                             delimiter='\t', encoding='latin1', na_values='n/a')

# Prepare the data
def prepare_country_stats(oecd_bli, gdp_per_capita):
    # Build one DataFrame with GDP per capita and life satisfaction per country
    oecd_bli = oecd_bli[oecd_bli["INEQUALITY"] == "TOT"]
    oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
    # Work on a renamed, re-indexed copy instead of mutating the caller's DataFrame
    gdp_per_capita = gdp_per_capita.rename(columns={"2015": "GDP per capita"})
    gdp_per_capita = gdp_per_capita.set_index("Country")
    full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita,
                                  left_index=True, right_index=True)
    return full_country_stats[["GDP per capita", "Life satisfaction"]]

country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
# Not applied here: remove_indices = [0, 1, 6, 8, 33, 34, 35] (all rows are kept)
country_stats.to_csv('country_stats.csv', encoding='utf-8')
X = np.c_[country_stats["GDP per capita"]]
Y = np.c_[country_stats["Life satisfaction"]]

#Visualize the data
country_stats.plot(kind='scatter',x='GDP per capita',y='Life satisfaction')

#Select a linear model
lin_reg_model = sklearn.linear_model.LinearRegression()

#Train the model
lin_reg_model.fit(X, Y)

# Plot the regression line on top of the scatter plot
t0, t1 = lin_reg_model.intercept_[0], lin_reg_model.coef_[0][0]
X_line = np.linspace(0, 110000, 1000)  # separate name so the training X is not overwritten
plt.plot(X_line, t0 + t1 * X_line, "k")
plt.show()

# Make a prediction for Cyprus
X_new = [[22587]]  # Cyprus' GDP per capita
print(lin_reg_model.predict(X_new))
```
　　　　　　

The end-of-chapter exercises are quite good.
Original post: https://www.cnblogs.com/yaoz/p/6858417.html