• Python for Data Science


    Chapter 6 - Other Popular Machine Learning Methods

    Segment 5 - Naive Bayes Classifiers

    Naive Bayes Classifiers

    Naive Bayes is a machine learning method you can use to predict the likelihood that an event will occur given evidence that's present in your data.

    Conditional Probability

    \[ P(B \mid A) = \frac{P(A \cap B)}{P(A)} \]
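
    As a small worked example of the formula, the sketch below computes P(B|A) from counts. The email counts and the helper name `conditional_probability` are made up for illustration and do not come from the course material.

```python
# Hypothetical counts for 100 emails (invented for illustration).
n_total = 100
n_a = 20          # emails containing the word "free"  (event A)
n_a_and_b = 15    # emails containing "free" that are also spam  (A and B)

p_a = n_a / n_total
p_a_and_b = n_a_and_b / n_total


def conditional_probability(p_joint, p_given):
    """Return P(B|A) = P(A and B) / P(A)."""
    if p_given == 0:
        raise ValueError("P(A) must be greater than zero")
    return p_joint / p_given


p_b_given_a = conditional_probability(p_a_and_b, p_a)
print(p_b_given_a)  # roughly 0.75: about three in four "free" emails are spam
```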

    Three Types of Naive Bayes Model

    • Multinomial
    • Bernoulli
    • Gaussian
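
    As a rough guide, each variant assumes a different distribution for the features: Multinomial suits counts (for example word counts), Bernoulli suits binary present/absent features, and Gaussian suits continuous features assumed to be roughly normal. The sketch below fits each variant on made-up toy data; every array and variable name here is illustrative, and the scores only show that the models run, not how they compare.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

rng = np.random.default_rng(0)
y_toy = np.array([0, 1] * 20)      # 40 made-up class labels
shift = y_toy[:, None]             # column vector used to separate the classes

# Count features: class 1 has a higher Poisson rate than class 0.
X_counts = rng.poisson(1.0 + 4.0 * shift * np.ones((1, 3)))
# Binary features: 1 if the count is above 2, else 0.
X_binary = (X_counts > 2).astype(int)
# Continuous features: unit-variance normal, mean shifted by class.
X_cont = rng.normal(size=(40, 3)) + 2.0 * shift

inputs = {
    "MultinomialNB": X_counts,
    "BernoulliNB": X_binary,
    "GaussianNB": X_cont,
}
fitted = {
    "MultinomialNB": MultinomialNB().fit(X_counts, y_toy),
    "BernoulliNB": BernoulliNB().fit(X_binary, y_toy),
    "GaussianNB": GaussianNB().fit(X_cont, y_toy),
}

for name, model in fitted.items():
    # score() reports accuracy on the training data, used here only as a smoke test
    print(name, model.score(inputs[name], y_toy))
```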

    Naive Bayes Use Cases

    • Spam Detection
    • Customer Classification
    • Credit Risk Prediction
    • Health Risk Prediction

    Naive Bayes Assumptions

    Predictors are independent of each other.
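
    The independence assumption is what makes the model "naive". Written out, and using notation that is not in the original notes (y for the class, x_1 ... x_n for the predictors), it lets the class posterior be factored into one term per predictor:

```latex
\[ P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y) \]
```

    Each factor P(x_i | y) can then be estimated separately from the training data, which keeps the model simple even with many predictors.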

    A priori assumption: the assumption that past conditions still hold true. When we make predictions from historical values, we will get incorrect results if present circumstances have changed.

    • All regression models rely on this a priori assumption as well.

    import numpy as np
    import pandas as pd
    import urllib
    import sklearn
    
    from sklearn.model_selection import train_test_split
    from sklearn import metrics
    from sklearn.metrics import accuracy_score
    
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.naive_bayes import GaussianNB
    from sklearn.naive_bayes import MultinomialNB
    

    Naive Bayes

    Using Naive Bayes to predict spam

    url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
    
    import urllib.request
    
    raw_data = urllib.request.urlopen(url)
    dataset = np.loadtxt(raw_data, delimiter=',')
    print(dataset[0])
    
    [  0.      0.64    0.64    0.      0.32    0.      0.      0.      0.
       0.      0.      0.64    0.      0.      0.      0.32    0.      1.29
       1.93    0.      0.96    0.      0.      0.      0.      0.      0.
       0.      0.      0.      0.      0.      0.      0.      0.      0.
       0.      0.      0.      0.      0.      0.      0.      0.      0.
       0.      0.      0.      0.      0.      0.      0.778   0.      0.
       3.756  61.    278.      1.   ]
    
    X = dataset[:,0:48]
    
    y = dataset[:,-1]
    
    X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=.2, random_state=17)
    
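    # binarize is a numeric threshold; True is treated as 1.0 here,
    # so only feature values above 1.0 are encoded as 1
    BernNB = BernoulliNB(binarize=True)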
    BernNB.fit(X_train, y_train)
    print(BernNB)
    
    y_expect = y_test
    y_pred = BernNB.predict(X_test)
    
    print(accuracy_score(y_expect, y_pred))
    
    BernoulliNB(binarize=True)
    0.8577633007600435
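
    BernoulliNB works on binary features, and its binarize parameter is the threshold used to turn numeric features into 0/1: values above the threshold become 1, everything else becomes 0. Because True equals 1 in Python, binarize=True appears to behave like a threshold of 1.0. The sketch below uses sklearn.preprocessing.binarize on a handful of values copied from the first row printed earlier to show how much a threshold of 1.0 discards compared with 0.1; the variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import binarize

# A few word-frequency values taken from the first row of the dataset.
sample = np.array([[0.0, 0.64, 0.32, 1.29, 1.93, 0.96]])

at_one = binarize(sample, threshold=1.0)     # only values > 1.0 become 1
at_tenth = binarize(sample, threshold=0.1)   # any value > 0.1 becomes 1

print(at_one)
print(at_tenth)
```

    With the threshold at 1.0, words that occur at a frequency of 0.64 or 0.96 are encoded the same as words that never occur, so much of the signal is lost.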
    
    MultiNB = MultinomialNB()
    MultiNB.fit(X_train, y_train)
    print(MultiNB)
    
    y_pred = MultiNB.predict(X_test)
    
    print(accuracy_score(y_expect, y_pred))
    
    MultinomialNB()
    0.8816503800217155
    
    GausNB = GaussianNB()
    GausNB.fit(X_train, y_train)
    print(GausNB)
    
    y_pred = GausNB.predict(X_test)
    
    print(accuracy_score(y_expect, y_pred))
    
    GaussianNB()
    0.8197611292073833
    
    BernNB = BernoulliNB(binarize=0.1)
    BernNB.fit(X_train, y_train)
    print(BernNB)
    
    y_expect = y_test
    y_pred = BernNB.predict(X_test)
    
    print(accuracy_score(y_expect, y_pred))
    
    BernoulliNB(binarize=0.1)
    0.9109663409337676
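
    Lowering the threshold from 1.0 to 0.1 raised accuracy from about 0.858 to about 0.911, which suggests the threshold is worth tuning. The sketch below shows one way to compare several thresholds. The helper `sweep_binarize` is hypothetical, and the data is a synthetic stand-in so the example runs without downloading anything; its scores say nothing about the spam dataset.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB


def sweep_binarize(X, y, thresholds, random_state=17):
    """Hypothetical helper: return {threshold: held-out accuracy} for BernoulliNB."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    scores = {}
    for t in thresholds:
        model = BernoulliNB(binarize=t)
        model.fit(X_train, y_train)
        scores[t] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in data: non-negative features that tend to be larger for class 1.
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=500)
X_demo = rng.exponential(size=(500, 10)) * (0.2 + 0.5 * y_demo[:, None])

scores = sweep_binarize(X_demo, y_demo, [0.0, 0.1, 0.5, 1.0])
for threshold, acc in scores.items():
    print(threshold, round(acc, 3))
```

    Picking the threshold by test-set accuracy, as done here for brevity, leaks information from the test set; a separate validation split or cross-validation would be the safer choice.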
  • Original article: https://www.cnblogs.com/keepmoving1113/p/14349367.html