    Classification (II) – Neural Network and SVM



    Support vector machines (SVM) and neural networks (NN) are powerful classification tools that have been applied in many different areas. Unlike the tree-based and probabilistic methods covered in the previous chapter, the way support vector machines and neural networks transform input into output is less clear and can be hard to interpret. As a result, both are often referred to as black box methods.


    The development of neural networks was inspired by the activity of the human brain; a neural network is a computational model that mimics the way the brain processes information. In contrast, a support vector machine first maps the input data into a high-dimensional feature space defined by a kernel function, and then finds the optimal hyperplane that separates the training data with the maximum margin. In short, we can think of a support vector machine as a linear algorithm in a high-dimensional space.


    Both methods have advantages and disadvantages in solving classification problems. For example, training a support vector machine is a convex optimization problem, so its solution is a global optimum, while neural network training may get stuck in one of several local optima. Thus, the choice between the two depends on the characteristics of the dataset. In this chapter, we will illustrate the following:


     How to train a support vector machine


     Observing how the choice of cost can affect the SVM classifier


     Visualizing the SVM fit


     Predicting the labels of a testing dataset based on the model trained by SVM


     Tuning the SVM


    In the neural network section, we will cover:


    How to train a neural network


     How to visualize a neural network model


     Predicting the labels of a testing dataset based on a model trained by neuralnet


     Finally, we will show how to train a neural network with nnet, and how to use it to predict the labels of a testing dataset


    Classifying data with a support vector machine


    The two best-known support vector machine tools are libsvm and SVMLight. For R users, libsvm is implemented in the e1071 package and an interface to SVMLight is provided by the klaR package, so you can use the functions in these two packages to train support vector machines. In this recipe, we will focus on using the svm function (the libsvm implementation) from the e1071 package to train a support vector machine on the training set of the telecom customer churn data.


    Getting ready

    In this recipe, we will continue to use the telecom churn dataset as the input data source to

    train the support vector machine. For those who have not prepared the dataset, please refer

    to Chapter 5, Classification (I) – Tree, Lazy, and Probabilistic, for details.


    How to do it...

    Perform the following steps to train the SVM:


    1. Load the e1071 package:


    > library(e1071)

    2. Train the support vector machine using the  svm function with  trainset as the input dataset, and use  churn as the classification category:


    > model = svm(churn~., data = trainset, kernel="radial", cost=1,

    gamma = 1/ncol(trainset))

    3. Finally, you can obtain overall information about the built model with summary:


    > summary(model)


    svm(formula = churn ~ ., data = trainset, kernel = "radial", cost = 1, gamma = 1/ncol(trainset))


    SVM-Type: C-classification

    SVM-Kernel: radial

    cost: 1

    gamma: 0.05882353

    Number of Support Vectors: 691

    ( 394 297 )

    Number of Classes: 2


    Levels:

    yes no

    How it works...

    A support vector machine constructs a hyperplane (or a set of hyperplanes) that maximizes the margin between two classes in a high-dimensional space. The cases that lie on the margin and thereby define the hyperplane are called support vectors, as shown in the following figure:


    Figure 1: Support Vector Machine

    The support vector machine starts by constructing a hyperplane that maximizes the margin width. It then extends this definition to problems that are not linearly separable by tolerating some margin violations. Lastly, it maps the data to a high-dimensional space where the classes can be separated more easily with a linear boundary.
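To make the mapping idea concrete, here is a minimal base R sketch on made-up one-dimensional data. The values, the feature map phi(x) = (x, x^2), and the threshold of 2 are illustrative assumptions and are not part of the recipe. No single threshold on x separates the two classes, but after adding x^2 as a second feature a straight line does:

```r
# Toy 1-D data (hypothetical): class "in" sits between the "out" points,
# so no single threshold on x can separate the two classes.
x <- c(-3, -2.5, -2, -0.5, 0, 0.5, 2, 2.5, 3)
label <- factor(ifelse(abs(x) < 1, "in", "out"))

# Explicit feature map phi(x) = (x, x^2) into a 2-D feature space.
phi <- cbind(x = x, x2 = x^2)

# In the mapped space, the linear rule x2 < 2 separates the classes.
predicted <- factor(ifelse(phi[, "x2"] < 2, "in", "out"),
                    levels = levels(label))
separable <- all(predicted == label)
print(separable)
```

A kernel SVM achieves the same effect without building phi explicitly, because the kernel function computes inner products in the mapped space directly.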


    The advantage of SVM is that it can build a highly accurate model by using a kernel chosen to suit the problem at hand. It also uses a regularization term to avoid over-fitting, and it does not suffer from local optima or multicollinearity. The main limitation of SVM is its cost in time and memory during training and testing, which makes it inefficient for constructing classification models on large datasets. In addition, an SVM is hard to interpret, and there is no simple rule for choosing the kernel or the amount of regularization, so both usually have to be tuned.


    In this recipe, we continue to use the telecom churn dataset as our example data source. We begin by training a support vector machine using the libsvm implementation provided in the e1071 package. Within the training function, svm, one can specify the kernel, cost, and gamma arguments. The kernel argument defaults to radial, and it can be set to linear, polynomial, radial (radial basis), or sigmoid. The gamma argument defaults to 1/(data dimension), and it controls the shape of the separating boundary: the larger the gamma, the more tightly the boundary bends around individual training cases. Increasing gamma usually increases the number of support vectors.

    As for the cost, the default value is 1. The cost is the constant that weights the penalty for margin violations in the regularized objective: the larger the value, the narrower the margin. We will discuss how the cost affects the SVM classifier in more detail in the next recipe. Once the support vector machine is built, the summary function can be used to obtain information such as the call, the parameters, the number of classes, and the class labels.
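The statement that increasing gamma usually increases the number of support vectors can be checked with a short experiment. The sketch below is only an illustration: it uses the built-in iris data so that it is self-contained (an assumption made here; substitute trainset and churn ~ . to try it on the recipe's data), and the gamma grid is arbitrary. It reads the support vector count from the tot.nSV component of the fitted model:

```r
library(e1071)

# iris is used only to keep the sketch self-contained (assumption);
# the gamma grid below is an arbitrary illustrative choice.
gammas <- c(0.01, 0.1, 1, 10)

# Fit one radial-kernel SVM per gamma and record the support vector count.
n_sv <- sapply(gammas, function(g) {
  fit <- svm(Species ~ ., data = iris, kernel = "radial", cost = 1, gamma = g)
  fit$tot.nSV
})

gamma_effect <- data.frame(gamma = gammas, support_vectors = n_sv)
print(gamma_effect)
```

The relationship is a tendency rather than a guarantee, so the counts are best inspected on your own data instead of being assumed.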


    See also

    Another popular support vector machine tool is SVMLight. Unlike the e1071 package, which provides a full implementation of libsvm, the klaR package provides only an interface to SVMLight. To use SVMLight, one can perform the following steps:


    1. Install the  klaR package:


    > install.packages("klaR")

    > library(klaR)

    2. Download the SVMLight source code and binary for your platform from http://svmlight.joachims.org/. For example, if your OS is 64-bit Windows, you should download the file from http://download.joachims.org/svm_light/current/svm_light_windows64.zip

    3. Then, unzip the file and put the executable binaries in the working directory; you can check your working directory by using the getwd function:


    > getwd()

    4. Train the support vector machine using the  svmlight function:


    > model.light = svmlight(churn~., data = trainset,

    kernel="radial", cost=1, gamma = 1/ncol(trainset))

    Choosing the cost of a support vector machine


    A support vector machine creates an optimal hyperplane that separates the training data with the maximum margin. However, sometimes we would like to allow some misclassifications while separating the categories. The SVM model has a cost parameter, which controls the trade-off between training errors and margin width. A small cost creates a wide margin (a soft margin) and allows more misclassifications. On the other hand, a large cost creates a narrow margin (approaching a hard margin) and permits fewer misclassifications. In this recipe, we will illustrate how large and small costs affect the SVM classifier.
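For reference, the role of the cost can be read off the standard soft-margin formulation of Cortes and Vapnik (1995). The symbols w, b, \xi_i, and C below are the usual textbook notation rather than anything defined in this recipe, with C playing the role of the cost argument:

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i,
\qquad \xi_i \ge 0, \quad i = 1, \dots, n
```

Here the margin width is 2/\lVert w \rVert, and \xi_i measures how far training case i violates the margin. A small C makes violations cheap, so the optimizer favors a wide margin; a large C makes them expensive, so it favors fewer violations at the price of a narrower margin.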


    Getting ready

    In this recipe, we will use the  iris dataset as our example data source.


    How to do it...

    Perform the following steps to generate two different classification examples with different costs:


    1. Subset the iris dataset to the columns Sepal.Length, Sepal.Width, and Species, keeping only the species setosa and virginica:


    > iris.subset = subset(iris, select=c("Sepal.Length", "Sepal.Width",

    "Species"), Species %in% c("setosa","virginica"))

    2. Then, you can generate a scatter plot with Sepal.Length as the x-axis and Sepal.Width as the y-axis:


    > plot(x=iris.subset$Sepal.Length,y=iris.subset$Sepal.Width,

    col=iris.subset$Species, pch=19)

    Figure 2: Scatter plot of Sepal.Length and Sepal.Width with subset of iris dataset

    3. Next, you can train an SVM on iris.subset with the cost equal to 1:

    > svm.model = svm(Species ~ ., data=iris.subset, kernel='linear',

    cost=1, scale=FALSE)

    4. Then, we can circle the support vectors with blue circles:


    > points(iris.subset[svm.model$index,c(1,2)],col="blue",cex=2)

    Figure 3: Circling support vectors with blue ring

    5. Lastly, we can add a separation line on the plot:


    > w = t(svm.model$coefs) %*% svm.model$SV

    > b = -svm.model$rho

    > abline(a=-b/w[1,2], b=-w[1,1]/w[1,2], col="red", lty=5)

    6. In addition to this, we create another SVM classifier where cost = 10,000:


    > plot(x=iris.subset$Sepal.Length,y=iris.subset$Sepal.Width,

    col=iris.subset$Species, pch=19)

    > svm.model = svm(Species ~ ., data=iris.subset,

    type='C-classification', kernel='linear', cost=10000, scale=FALSE)

    > points(iris.subset[svm.model$index,c(1,2)],col="blue",cex=2)

    > w = t(svm.model$coefs) %*% svm.model$SV

    > b = -svm.model$rho

    > abline(a=-b/w[1,2], b=-w[1,1]/w[1,2], col="red", lty=5)

    Figure 5: A classification example with large cost
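The intercept and slope passed to abline in the preceding steps follow from the equation of the linear decision boundary. With w computed from the coefficients and support vectors and b = -rho, as in the code, and writing x_1 for Sepal.Length and x_2 for Sepal.Width (notation introduced here only for illustration), the boundary is:

```latex
w_1 x_1 + w_2 x_2 + b = 0
\quad \Longrightarrow \quad
x_2 = -\frac{b}{w_2} - \frac{w_1}{w_2}\, x_1
```

So the intercept is -b/w_2, which is the a argument of abline, and the slope is -w_1/w_2, which is the b argument of abline (not to be confused with the SVM offset b).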

    How it works...

    In this recipe, we demonstrate how different costs can affect the SVM classifier. First, we create an iris subset with the columns Sepal.Length, Sepal.Width, and Species, containing only the species setosa and virginica. Then, in order to create a soft margin and allow some misclassification, we train the support vector machine with a small cost (where cost = 1). Next, we circle the support vectors with blue circles and add the separation line. As per Figure 4, one of the green points (virginica) ends up on the other side of the separation line and is misclassified as setosa due to the choice of the small cost.

    In addition to this, we would like to determine how a large cost affects the SVM classifier. Therefore, we choose a large cost (where cost = 10,000). From Figure 5, we can see that the margin created is narrow (a hard margin) and that no cases are misclassified. Together, the two examples show that the choice of cost affects both the margin created and the likelihood of misclassification.
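The visual comparison can also be made numerically. The following sketch refits the two linear models from this recipe and reports the margin width, computed as 2/||w||, along with the number of support vectors and training errors. The helper name margin_summary and the call to droplevels are additions made for this illustration and are not part of the original steps:

```r
library(e1071)

# Same two-species, two-feature subset as in the recipe; droplevels()
# removes the unused versicolor level (an addition for this sketch).
iris.subset <- droplevels(subset(iris,
  select = c("Sepal.Length", "Sepal.Width", "Species"),
  Species %in% c("setosa", "virginica")))

# margin_summary is a hypothetical helper: it fits a linear SVM for the
# given cost and summarizes the resulting margin and training errors.
margin_summary <- function(cost) {
  fit <- svm(Species ~ ., data = iris.subset, type = "C-classification",
             kernel = "linear", cost = cost, scale = FALSE)
  w <- t(fit$coefs) %*% fit$SV        # weight vector of the hyperplane
  pred <- predict(fit, iris.subset)   # predictions on the training data
  data.frame(cost = cost,
             margin_width = 2 / sqrt(sum(w^2)),
             support_vectors = fit$tot.nSV,
             training_errors = sum(pred != iris.subset$Species))
}

cost_effect <- rbind(margin_summary(1), margin_summary(10000))
print(cost_effect)
```

Because scale = FALSE is used, the margin width is expressed in the original units of the two sepal measurements, so it can be compared directly with the plots.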

    See also

    The idea of soft margin, which allows misclassified examples, was suggested by Corinna Cortes and Vladimir N. Vapnik in 1995 in the following paper: Cortes, C., and Vapnik, V. (1995). Support-vector networks. Machine learning, 20(3), 273-297.




