• Data Analysis of Teacher Recruitment Exam Results


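    Background: my girlfriend recently passed the teacher recruitment exam, so I got hold of the written-test and interview results and analyzed how the written test and the interview relate to being admitted.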

    Data source: teacher recruitment exam results for city xx, province xx

    1. Prepare the data:

    # Import the required packages
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelEncoder
    from sklearn import svm
    from sklearn import metrics
    import matplotlib.pyplot as plt
    import seaborn as sns
    from pandas import plotting

    sns.set_style("whitegrid")
    # Note: Matplotlib 3.6+ renamed this style to 'seaborn-v0_8'
    plt.style.use('seaborn')
    # Load the dataset (the path separators appear to have been lost in the original post)
    io = r'G:PythonLearnirisdataDataCalculate.xlsx'
    data = pd.read_excel(io, sheet_name='Sheet1')

    Inspect the columns and their types:

    data.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 39 entries, 0 to 38
    Data columns (total 7 columns):
     #   Column            Non-Null Count  Dtype  
    ---  ------            --------------  -----  
     0   ranking_written   39 non-null     int64  
     1   written           39 non-null     float64
     2   ranking_audition  39 non-null     int64  
     3   audition          39 non-null     float64
     4   total             39 non-null     float64
     5   ranking_total     39 non-null     int64  
     6   complete          39 non-null     object 

    Print the data:

    print(data)
        ranking_written  written  ranking_audition  ...  total  ranking_total  complete
    0                 1    84.75                 2  ...  87.30              1        ON
    1                 2    78.70                 3  ...  84.40              2        ON
    2                 7    75.15                 1  ...  83.58              3        ON
    3                12    72.70                 4  ...  81.88              4        ON
    4                 8    74.70                 8  ...  81.72              5        ON
    5                 4    75.70                15  ...  81.52              6        ON
    6                 3    76.15                21  ...  81.34              7        ON
    7                13    72.05                 6  ...  81.26              8        ON
    8                 6    75.20                19  ...  81.08              9        ON
    9                11    73.95                16  ...  80.82             10        ON
    10               15    70.70                 7  ...  80.60             11        ON
    11               10    73.95                22  ...  80.46             12        ON
    12               14    71.65                10  ...  80.38             13        ON
    13                9    74.15                29  ...  79.82             14       OFF
    14                5    75.55                33  ...  79.78             15       OFF
    15               29    65.10                 5  ...  78.72             16       OFF
    16               19    68.80                18  ...  78.64             17       OFF
    17               21    67.05                11  ...  78.30             18       OFF
    18               17    69.60                31  ...  77.76             19       OFF
    19               25    65.70                13  ...  77.64             20       OFF
    20               20    68.35                26  ...  77.62             21       OFF
    21               22    66.50                20  ...  77.60             22       OFF
    22               26    65.60                14  ...  77.60             23       OFF
    23               30    65.10                12  ...  77.52             24       OFF
    24               32    63.85                 9  ...  77.38             25       OFF
    25               16    70.20                35  ...  77.16             26       OFF
    26               24    65.75                23  ...  76.82             27       OFF
    27               27    65.55                25  ...  76.62             28       OFF
    28               31    64.95                24  ...  76.50             29       OFF
    29               18    69.10                38  ...  76.12             30       OFF
    30               28    65.45                32  ...  76.10             31       OFF
    31               23    65.85                34  ...  75.78             32       OFF
    32               38    59.30                17  ...  74.96             33       OFF
    33               34    60.65                27  ...  74.54             34       OFF
    34               36    60.00                28  ...  74.28             35       OFF
    35               33    62.35                37  ...  73.78             36       OFF
    36               39    59.25                30  ...  73.74             37       OFF
    37               35    60.20                36  ...  73.16             38       OFF
    38               37    59.90                39  ...  23.96             39       OFF
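A quick consistency check on the printed table can be sketched as follows. It rebuilds only the `ranking_total` and `complete` columns as they appear in the output above (the frame name `check` is introduced here purely for illustration) and tests whether admission follows the total ranking with a single cutoff.

```python
import pandas as pd

# ranking_total and complete for the 39 rows, copied from the printed table
check = pd.DataFrame({
    "ranking_total": list(range(1, 40)),
    "complete": ["ON"] * 13 + ["OFF"] * 26,
})

# Number of admitted (ON) and not admitted (OFF) candidates
counts = check["complete"].value_counts().to_dict()

# Largest rank number that was still admitted
cutoff = int(check.loc[check["complete"] == "ON", "ranking_total"].max())

# True if everyone ranked at or above the cutoff is ON and everyone below it is OFF
consistent = bool(
    ((check["ranking_total"] <= cutoff) == (check["complete"] == "ON")).all()
)

print(counts, cutoff, consistent)
```

If `consistent` is true, `complete` is fully determined by `ranking_total`, which is worth keeping in mind wherever `ranking_total` is used as a model feature.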

    1. Explore the relationships in the data:

     Use violinplot and pointplot to explore, through the distributions and the slopes, how the written test and the interview relate to being admitted.

    # Set the color theme
    antV = ['#1890FF', '#2FC25B', '#FACC14', '#223273', '#8543E0', '#13C2C2', '#3436c7', '#F04864']
    # Draw violin plots
    # Relationship between each feature and admission
    f, axes = plt.subplots(2, 2, figsize=(8, 8), sharex=True)
    sns.despine(left=True)
    sns.violinplot(x='complete', y='ranking_written', data=data, palette=antV, ax=axes[0, 0])
    sns.violinplot(x='complete', y='written', data=data, palette=antV, ax=axes[0, 1])
    sns.violinplot(x='complete', y='ranking_audition', data=data, palette=antV, ax=axes[1, 0])
    sns.violinplot(x='complete', y='audition', data=data, palette=antV, ax=axes[1, 1])

     

    # Draw point plots
    # Relationship between each feature and admission
    f, axes = plt.subplots(2, 2, figsize=(8, 8), sharex=True)
    sns.despine(left=True)
    sns.pointplot(x='complete', y='ranking_written', data=data, color=antV[0], ax=axes[0, 0])
    sns.pointplot(x='complete', y='written', data=data, color=antV[0], ax=axes[0, 1])
    sns.pointplot(x='complete', y='ranking_audition', data=data, color=antV[0], ax=axes[1, 0])
    sns.pointplot(x='complete', y='audition', data=data, color=antV[0], ax=axes[1, 1])

     

    Pairwise scatter-plot matrix of the features:

    sns.pairplot(data=data, palette=antV, hue='complete')

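    Andrews curves are well suited to validating the data: records whose curves deviate from the rest of their class stand out as possible anomalies.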
    plt.subplots(figsize=(10, 8))
    plotting.andrews_curves(data, 'complete', colormap='cool')

    Fit linear regressions of the interview score on the written score, and of the interview rank on the written rank:

    sns.lmplot(data=data, x='written', y='audition', palette=antV, hue='complete')

    sns.lmplot(data=data, x='ranking_written', y='ranking_audition', palette=antV, hue='complete')
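To put numbers on the rank-versus-rank plot, here is a minimal sketch that fits one pooled least-squares line with NumPy. It is an assumption-laden stand-in, not what `lmplot` does internally: with `hue='complete'` seaborn fits one line per group, whereas this pools all 39 candidates. The two arrays are copied from the printed table.

```python
import numpy as np

# Written-test rank and interview rank for the 39 candidates, copied from the printed table
ranking_written = np.array([1, 2, 7, 12, 8, 4, 3, 13, 6, 11, 15, 10, 14, 9, 5, 29, 19, 21, 17, 25,
                            20, 22, 26, 30, 32, 16, 24, 27, 31, 18, 28, 23, 38, 34, 36, 33, 39, 35, 37])
ranking_audition = np.array([2, 3, 1, 4, 8, 15, 21, 6, 19, 16, 7, 22, 10, 29, 33, 5, 18, 11, 31, 13,
                             26, 20, 14, 12, 9, 35, 23, 25, 24, 38, 32, 34, 17, 27, 28, 37, 30, 36, 39])

# Least-squares line: ranking_audition ~ slope * ranking_written + intercept
slope, intercept = np.polyfit(ranking_written, ranking_audition, deg=1)

# Pearson correlation computed on the rank columns (a rank correlation)
rho = np.corrcoef(ranking_written, ranking_audition)[0, 1]

print(f"slope={slope:.3f}, intercept={intercept:.3f}, rho={rho:.3f}")
```

Because both columns are permutations of 1..39 and therefore have the same variance, the fitted slope coincides with the correlation coefficient.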

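    Finally, use a heat map to find the correlations between the attributes; the sign of each value in the heat map shows whether the correlation is positive or negative: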
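The original post does not include the heat map code, so the following is only a sketch of one common way to draw it with `sns.heatmap`. The frame `sample` holds a handful of rows copied from the printed table (the `audition` score is elided there, so it is left out); the color map and figure size are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# A few rows copied from the printed table (hypothetical subset for illustration)
sample = pd.DataFrame({
    "ranking_written": [1, 2, 7, 9, 5, 29, 37],
    "written": [84.75, 78.70, 75.15, 74.15, 75.55, 65.10, 59.90],
    "ranking_audition": [2, 3, 1, 29, 33, 5, 39],
    "total": [87.30, 84.40, 83.58, 79.82, 79.78, 78.72, 23.96],
    "ranking_total": [1, 2, 3, 14, 15, 16, 39],
    "complete": ["ON", "ON", "ON", "OFF", "OFF", "OFF", "OFF"],
})

# Pearson correlation between the numeric columns only
corr = sample.select_dtypes("number").corr()

# Annotated heat map; a diverging color map makes the sign easy to read
fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
fig.tight_layout()
```

On the full dataset the same call would take `data.select_dtypes("number").corr()` instead of the small `sample` frame.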

    2. Machine learning

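    Use machine learning to predict admission from the written and interview scores, with the written and interview ranks as auxiliary features.

    Before training, split the dataset into a training set and a test set, and encode the admission label as 0/1.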

    # Load the features and the labels
    X = data[['ranking_written', 'written', 'ranking_audition', 'audition', 'total', 'ranking_total']]
    Y = data['complete']
    # Encode the labels
    encoder = LabelEncoder()
    y = encoder.fit_transform(Y)
    print(y)
    [1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     0 0]
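As a small illustration of what the encoder does with these labels, the sketch below uses a toy list with the same two values as the `complete` column. `LabelEncoder` sorts the classes, which is why `ON` ends up as 1 in the output above.

```python
from sklearn.preprocessing import LabelEncoder

# Toy labels using the same two values as the 'complete' column
labels = ["ON", "OFF", "OFF", "ON"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ is sorted, so 'OFF' maps to 0 and 'ON' maps to 1
mapping = {str(label): int(code) for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform turns numeric codes back into the original labels
decoded = encoder.inverse_transform([1, 0])
print(list(decoded))
```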

     Split the dataset 7:3 into training data and test data:

    # Learn from the ranks and scores of each stage to predict the final admission result
    train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=101)
    print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
    (27, 6) (27,) (12, 6) (12,)
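With only 13 admitted candidates out of 39, a purely random split can leave the test set with very few positives. One option, not used in the original post, is to pass `stratify=y` so both subsets keep the same class ratio. The sketch below uses the encoded labels printed above and a stand-in feature (the total rank 1..39).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Encoded labels as printed above: 13 admitted (1) followed by 26 not admitted (0)
y = np.array([1] * 13 + [0] * 26)
# Stand-in feature for illustration: the total rank 1..39 from the printed table
X = np.arange(1, 40).reshape(-1, 1)

# stratify=y keeps the 1:2 class ratio in both the training and the test subset
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=101, stratify=y
)

print(train_X.shape, test_X.shape, int(train_y.sum()), int(test_y.sum()))
```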

    Compare the accuracy of models trained on different feature sets:

    # Generic train / predict / score pattern used for every model below
    model = svm.SVC()
    model.fit(train_X, train_y)
    prediction = model.predict(test_X)
    # accuracy_score expects (y_true, y_pred)
    print('The accuracy of the SVM is: {0}'.format(metrics.accuracy_score(test_y, prediction)))
    The accuracy of the SVM is: 1.0
    # Written-test features vs. the final result
    written = data[['ranking_written', 'written', 'complete']]
    train_w, test_w = train_test_split(written, test_size=0.3, random_state=0)
    train_x_w = train_w[['ranking_written', 'written']]
    train_y_w = train_w.complete
    test_x_w = test_w[['ranking_written', 'written']]
    test_y_w = test_w.complete

    model = svm.SVC()
    model.fit(train_x_w, train_y_w)
    prediction = model.predict(test_x_w)
    print('The accuracy of the SVM using Written is: {0}'.format(metrics.accuracy_score(test_y_w, prediction)))
    # Interview features vs. the final result
    audition = data[['ranking_audition', 'audition', 'complete']]
    train_a, test_a = train_test_split(audition, test_size=0.3, random_state=0)
    train_x_a = train_a[['ranking_audition', 'audition']]
    train_y_a = train_a.complete
    test_x_a = test_a[['ranking_audition', 'audition']]
    test_y_a = test_a.complete

    model = svm.SVC()
    model.fit(train_x_a, train_y_a)
    prediction = model.predict(test_x_a)
    print('The accuracy of the SVM using audition is: {0}'.format(metrics.accuracy_score(test_y_a, prediction)))
    # Total-score features vs. the final result
    total = data[['ranking_total', 'total', 'complete']]
    train_t, test_t = train_test_split(total, test_size=0.3, random_state=0)
    train_x_t = train_t[['ranking_total', 'total']]
    train_y_t = train_t.complete
    test_x_t = test_t[['ranking_total', 'total']]
    test_y_t = test_t.complete

    model = svm.SVC()
    model.fit(train_x_t, train_y_t)
    prediction = model.predict(test_x_t)
    print('The accuracy of the SVM using total is: {0}'.format(metrics.accuracy_score(test_y_t, prediction)))
    The accuracy of the SVM is: 1.0
    The accuracy of the SVM using Written is: 0.9166666666666666
    The accuracy of the SVM using audition is: 0.8333333333333334
    The accuracy of the SVM using total is: 1.0
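These figures come from a single split with 12 test samples, so each misclassified candidate moves the accuracy by 1/12 (0.9167 is 11 of 12, 0.8333 is 10 of 12). Judging from the printed table, the label also follows `ranking_total` directly, so a perfect score for models that see the total is not surprising. A less split-dependent estimate can be sketched with stratified cross-validation. Everything below is an assumption rather than the original author's method: the feature scaling, the 5 folds, and the choice to use only the written-test columns (copied from the printed table).

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Written rank and written score for the 39 candidates, copied from the printed table
ranking_written = [1, 2, 7, 12, 8, 4, 3, 13, 6, 11, 15, 10, 14, 9, 5, 29, 19, 21, 17, 25,
                   20, 22, 26, 30, 32, 16, 24, 27, 31, 18, 28, 23, 38, 34, 36, 33, 39, 35, 37]
written = [84.75, 78.70, 75.15, 72.70, 74.70, 75.70, 76.15, 72.05, 75.20, 73.95,
           70.70, 73.95, 71.65, 74.15, 75.55, 65.10, 68.80, 67.05, 69.60, 65.70,
           68.35, 66.50, 65.60, 65.10, 63.85, 70.20, 65.75, 65.55, 64.95, 69.10,
           65.45, 65.85, 59.30, 60.65, 60.00, 62.35, 59.25, 60.20, 59.90]

X = np.column_stack([ranking_written, written])
# Encoded labels as printed earlier: 13 admitted (1), then 26 not admitted (0)
y = np.array([1] * 13 + [0] * 26)

# Scale the features, then fit an SVM with default settings
model = make_pipeline(StandardScaler(), SVC())

# 5 stratified folds: every fold keeps roughly the same ON/OFF ratio
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(scores, scores.mean())
```

The mean and the spread of `scores` give a better sense of how far the written test alone predicts admission than any one 27/12 split.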
  • Original article: https://www.cnblogs.com/ad-zhou/p/13716971.html