• docker run sqlfow xgboost demo


    Run SQLFlow Using Docker

    SQLFlow releases several Docker images that contains the SQLFlow server, MySQL server, sample datasets, Jupyter Notebook server, and the SQLFlow plugin for Jupyter.

    You can use these Docker images for either local trying out or production deployment.

    Preparation

    1. Install Docker Community Edition on your PC/Macbook/Server.

    2. Pull the latest SQLFlow Docker images. Or you can also build the Docker image from source code following this guide.

      docker pull sqlflow/sqlflow
      docker pull sqlflow/sqlflow:mysql
      docker pull sqlflow/sqlflow:jupyter
      

    Try Out SQLFlow Using Notebook

    1. Type the below command to start three containers to start a MySQL server, SQLFlow server and a Jupyter notebook server.

      docker run --name=sqlflow-mysql -d -p 8888:8888 sqlflow/sqlflow:mysql
      docker run --net=container:sqlflow-mysql -d sqlflow/sqlflow:latest sqlflowserver
      docker run --net=container:sqlflow-mysql -d sqlflow/sqlflow:jupyter
      
    2. You can also use a specified version (e.g. v0.4.0) of the SQLFlow server by changing the second line above to docker run --net=container:sqlflow-mysql -d sqlflow/sqlflow:v0.4.0 sqlflowserver.

    3. Open a web browser, go to localhost:8888, open any tutorial notebook like iris-dnn.ipynb file, then you can follow the tutorial and run the SQL statements to run the training and prediction.

     

    XGBoost on SQLFlow Tutorial

    Open In PAI-DSW

    This is a tutorial on train/predict XGBoost model in SQLFLow, you can find more SQLFlow usage from the Language Guide, in this tutorial you will learn how to:

    • Train a XGBoost model to fit the boston housing dataset; and
    • Predict the housing price using the trained model;

    The Dataset

    This tutorial would use the Boston Housing as the demonstration dataset. The database contains 506 lines and 14 columns, the meaning of each column is as follows:

    ColumnExplain
    crim per capita crime rate by town.
    zn proportion of residential land zoned for lots over 25,000 sq.ft.
    indus proportion of non-retail business acres per town.
    chas Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
    nox nitrogen oxides concentration (parts per 10 million).
    rm average number of rooms per dwelling.
    age proportion of owner-occupied units built prior to 1940.
    dis weighted mean of distances to five Boston employment centres.
    rad index of accessibility to radial highways.
    tax full-value property-tax rate per $10,000.
    ptratio pupil-teacher ratio by town.
    black 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town.
    lstat lower status of the population (percent).
    medv median value of owner-occupied homes in $1000s.

    We separated the dataset into train/test dataset, which is used to train/predict our model. SQLFlow would automatically split the training dataset into train/validation dataset while training progress.

    In [1]:
     
     
     
     
     
    %%sqlflow
    describe boston.train;
     
     
    Out[1]:
     FieldTypeNullKeyDefaultExtra
    0 crim float YES   None  
    1 zn float YES   None  
    2 indus float YES   None  
    3 chas int(11) YES   None  
    4 nox float YES   None  
    5 rm float YES   None  
    6 age float YES   None  
    7 dis float YES   None  
    8 rad int(11) YES   None  
    9 tax int(11) YES   None  
    10 ptratio float YES   None  
    11 b float YES   None  
    12 lstat float YES   None  
    13 medv float YES   None  
     
    In [2]:
     
     
     
     
     
    %%sqlflow
    describe boston.test;
     
     
    Out[2]:
     FieldTypeNullKeyDefaultExtra
    0 crim float YES   None  
    1 zn float YES   None  
    2 indus float YES   None  
    3 chas int(11) YES   None  
    4 nox float YES   None  
    5 rm float YES   None  
    6 age float YES   None  
    7 dis float YES   None  
    8 rad int(11) YES   None  
    9 tax int(11) YES   None  
    10 ptratio float YES   None  
    11 b float YES   None  
    12 lstat float YES   None  
    13 medv float YES   None  
     

    Fit Boston Housing Dataset

    First, let's train an XGBoost regression model to fit the boston housing dataset, we prefer to train the model for 30 rounds, and using squarederror loss function that the SQLFLow extended SQL can be like:

    TO TRAIN xgboost.gbtree
    WITH
        train.num_boost_round=30,
        objective="reg:squarederror"

    xgboost.gbtree is the estimator name, gbtree is one of the XGBoost booster, you can find more information from here.

    We can specify the training data columns in COLUMN clause, and the label by LABEL keyword:

    COLUMN crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, b, lstat
    LABEL medv

    To save the trained model, we can use INTO clause to specify a model name:

    INTO sqlflow_models.my_xgb_regression_model

    Second, let's use a standard SQL to fetch the training data from table boston.train:

    SELECT * FROM boston.train

    Finally, the following is the SQLFlow Train statement of this regression task, you can run it in the cell:

    In [3]:
     
     
     
     
     
    %%sqlflow
    SELECT * FROM boston.train
    TO TRAIN xgboost.gbtree
    WITH
        objective="reg:squarederror",
        train.num_boost_round = 30
    COLUMN crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, b, lstat
    LABEL medv
    INTO sqlflow_models.my_xgb_regression_model;
     
     
     
    Start training XGBoost model...
    
    [08:08:18] 496x13 matrix with 6448 entries loaded from train.txt_0
    
    [0]	train-rmse:17.1167
    
    [1]	train-rmse:12.3003
    
    [2]	train-rmse:8.95505
    
    [3]	train-rmse:6.57081
    
    [4]	train-rmse:4.8891
    
    [5]	train-rmse:3.69686
    
    [6]	train-rmse:2.88164
    
    [7]	train-rmse:2.30785
    
    [8]	train-rmse:1.92918
    
    [9]	train-rmse:1.62107
    
    [10]	train-rmse:1.42948
    
    [11]	train-rmse:1.30048
    
    [12]	train-rmse:1.19441
    
    [13]	train-rmse:1.09655
    
    [14]	train-rmse:1.03565
    
    [15]	train-rmse:0.981582
    
    [16]	train-rmse:0.932085
    
    [17]	train-rmse:0.867206
    
    [18]	train-rmse:0.840945
    
    [19]	train-rmse:0.799894
    
    [20]	train-rmse:0.766308
    
    [21]	train-rmse:0.712008
    
    [22]	train-rmse:0.677878
    
    [23]	train-rmse:0.629921
    
    [24]	train-rmse:0.609242
    
    [25]	train-rmse:0.585225
    
    [26]	train-rmse:0.536009
    
    [27]	train-rmse:0.493594
    
    [28]	train-rmse:0.47246
    
    [29]	train-rmse:0.444385
    
    Evaluation result: {'train': {'rmse': [17.116695, 12.300258, 8.955046, 6.570806, 4.889098, 3.696858, 2.881636, 2.307854, 1.929176, 1.621072, 1.429485, 1.300483, 1.194412, 1.096546, 1.035647, 0.981582, 0.932085, 0.867206, 0.840945, 0.799894, 0.766308, 0.712008, 0.677878, 0.629921, 0.609242, 0.585225, 0.536009, 0.493594, 0.47246, 0.444385]}}
    
    
     

    Predict the Housing Price

    After training the regression model, let's predict the house price using the trained model.

    First, we can specify the trained model by USING clause:

    USING sqlflow_models.my_xgb_regression_model

    Than, we can specify the prediction result table by TO PREDICT clause:

    TO PREDICT boston.predict.medv

    And using a standard SQL to fetch the prediction data:

    SELECT * FROM boston.test

    Finally, the following is the SQLFLow Prediction statement:

    In [4]:
     
     
     
     
     
    %%sqlflow
    SELECT * FROM boston.test
    TO PREDICT boston.predict.medv
    USING sqlflow_models.my_xgb_regression_model;
     
     
     
    rank=0 nworkers=1
    
    2021-09-28 08:08:37.383197 Start predicting XGBoost model...
    
    [08:08:37] SparsePageSource::CreateRowPage Finished writing to predict.txt_0.cache
    
    [08:08:37] 10x13 matrix with 120 entries loaded from predict.txt_0#predict.txt_0.cache
    
    2021-09-28 08:08:37.573467 Done predicting. Predict table: boston.predict
    
    
     

    Let's have a glance at prediction results.

    In [5]:
     
     
     
     
     
    %%sqlflow
    SELECT * FROM boston.predict;
     
     
    Out[5]:
     crimzninduschasnoxrmagedisradtaxptratioblstatmedv
    0 0.28960 0.0 9.69 0 0.585 5.390 72.9 2.7986 6 391 19.2 396.90 21.14 20.6996
    1 0.26838 0.0 9.69 0 0.585 5.794 70.6 2.8927 6 391 19.2 396.90 14.10 21.9199
    2 0.23912 0.0 9.69 0 0.585 6.019 65.3 2.4091 6 391 19.2 396.90 12.92 21.5170
    3 0.17783 0.0 9.69 0 0.585 5.569 73.5 2.3999 6 391 19.2 395.77 15.10 21.1983
    4 0.22438 0.0 9.69 0 0.585 6.027 79.7 2.4982 6 391 19.2 396.90 14.33 20.9567
    5 0.06263 0.0 11.93 0 0.573 6.593 69.1 2.4786 1 273 21.0 391.99 9.67 27.7089
    6 0.04527 0.0 11.93 0 0.573 6.120 76.7 2.2875 1 273 21.0 396.90 9.08 25.9830
    7 0.06076 0.0 11.93 0 0.573 6.976 91.0 2.1675 1 273 21.0 396.90 5.64 30.3944
    8 0.10959 0.0 11.93 0 0.573 6.794 89.3 2.3889 1 273 21.0 393.45 6.48 28.3626
    9 0.04741 0.0 11.93 0 0.573 6.030 80.8 2.5050 1 273 21.0 396.90 7.88 25.4340
  • 相关阅读:
    POJ 2947:Widget Factory 求同余方程
    高斯消元几道入门题总结POJ1222&&POJ1681&&POJ1830&&POJ2065&&POJ3185
    POJ 1166:The Clocks
    神经网络 --学习之路,资料汇编
    机器学习 入门资料汇编
    无符号 coredump调试
    CentOS 6.3 升级软件 gcc等,并安装部署DNN环境 (未完成,不完整)
    OpenCL size_t error
    Nervanasys --> pycuda --> installation
    tmux.conf
  • 原文地址:https://www.cnblogs.com/bonelee/p/15348482.html
Copyright © 2020-2023  润新知