• Botnet detection with the CTU-13 dataset


    First, a sample of the CTU-13 dataset:

    StartTime,Dur,Proto,SrcAddr,Sport,Dir,DstAddr,Dport,State,sTos,dTos,TotPkts,TotBytes,SrcBytes,Label
    2011/08/10 09:46:59.607825,1.026539,tcp,94.44.127.113,1577,   ->,147.32.84.59,6881,S_RA,0,0,4,276,156,flow=Background-Established-cmpgw-CVUT
    2011/08/10 09:47:00.634364,1.009595,tcp,94.44.127.113,1577,   ->,147.32.84.59,6881,S_RA,0,0,4,276,156,flow=Background-Established-cmpgw-CVUT
    2011/08/10 09:47:48.185538,3.056586,tcp,147.32.86.89,4768,   ->,77.75.73.33,80,SR_A,0,0,3,182,122,flow=Background-TCP-Attempt
    2011/08/10 09:47:48.230897,3.111769,tcp,147.32.86.89,4788,   ->,77.75.73.33,80,SR_A,0,0,3,182,122,flow=Background-TCP-Attempt
    2011/08/10 09:47:48.963351,3.083411,tcp,147.32.86.89,4850,   ->,77.75.73.33,80,SR_A,0,0,3,182,122,flow=Background-TCP-Attempt
    2011/08/10 09:47:58.806814,3.097288,tcp,147.32.86.89,4866,   ->,77.75.73.33,80,SR_A,0,0,3,182,122,flow=Background-TCP-Attempt
    2011/08/10 09:51:34.450457,1.048908,tcp,213.200.244.217,47908,   ->,147.32.84.59,6881,S_RA,0,0,4,244,124,flow=Background-Established-cmpgw-CVUT
    2011/08/10 09:54:55.231320,4.373526,tcp,75.105.28.60,1419,   ->,147.32.84.59,6881,S_RA,0,0,4,252,132,flow=Background-Established-cmpgw-CVUT
    2011/08/10 09:57:13.352114,4.827912,tcp,75.105.28.60,1491,   ->,147.32.84.59,6881,S_RA,0,0,4,252,132,flow=Background-Established-cmpgw-CVUT
    2011/08/10 09:58:43.301515,0.049697,tcp,178.111.79.115,41752,   ->,147.32.84.229,13363,SR_SA,0,0,5,352,208,flow=Background-TCP-Established
    2011/08/10 09:54:09.710772,328.361664,tcp,147.32.84.59,49185,   ->,147.32.80.7,80,SRPA_SPA,0,0,7,760,520,flow=Background-Established-cmpgw-CVUT
    2011/08/10 10:00:34.864769,5.242459,tcp,75.105.28.60,1586,   ->,147.32.84.59,6881,S_RA,0,0,4,252,132,flow=Background-Established-cmpgw-CVUT
    2011/08/10 10:01:16.344485,0.972390,tcp,89.31.40.106,28451,   ->,147.32.84.59,6881,S_RA,0,0,4,244,124,flow=Background-Established-cmpgw-CVUT
    2011/08/10 10:06:19.661695,0.923098,tcp,89.31.40.106,13717,   ->,147.32.84.59,6881,S_RA,0,0,4,244,124,flow=Background-Established-cmpgw-CVUT
    2011/08/10 10:07:41.514293,1.009763,tcp,188.112.70.72,1817,   ->,147.32.84.59,6881,S_RA,0,0,4,244,124,flow=Background-Established-cmpgw-CVUT
    2011/08/10 10:08:18.464075,0.969967,tcp,85.248.56.40,42480,   ->,147.32.84.59,6881,S_RA,0,0,4,244,124,flow=Background-Established-cmpgw-CVUT
    2011/08/10 10:09:36.758829,2.853907,tcp,41.188.145.202,2285,   ->,147.32.84.229,13363,SR_SA,0,0,3,184,122,flow=Background-TCP-Established
    2011/08/10 10:09:38.376337,2.948305,tcp,41.188.145.202,2288,   ->,147.32.84.229,443,SR_SA,0,0,3,184,122,flow=Background-TCP-Established
    2011/08/10 10:09:39.984069,2.956415,tcp,41.188.145.202,2291,   ->,147.32.84.229,80,SR_SA,0,0,3,184,122,flow=Background-TCP-Established
    2011/08/10 10:09:44.359242,0.920374,tcp,89.31.40.106,39927,   ->,147.32.84.59,6881,S_RA,0,0,4,244,124,flow=Background-Established-cmpgw-CVUT
    2011/08/10 10:09:39.612736,6.031078,tcp,41.188.145.202,2285,   ->,147.32.84.229,13363,SR_SA,0,0,3,184,122,flow=Background-TCP-Established
    2011/08/10 10:09:41.324642,6.035653,tcp,41.188.145.202,2288,   ->,147.32.84.229,443,SR_SA,0,0,3,184,122,flow=Background-TCP-Established
    2011/08/10 10:09:42.940484,6.028175,tcp,41.188.145.202,2291,   ->,147.32.84.229,80,SR_SA,0,0,3,184,122,flow=Background-TCP-Established
    2011/08/10 10:09:51.394331,0.000000,tcp,147.32.84.59,6881,   ?>,188.112.70.72,1904,RA_,0,0,1,60,60,flow=Background-Attempt-cmpgw-CVUT
    2011/08/10 10:09:57.619871,1.383731,tcp,213.24.237.172,15007,   ->,147.32.84.109,4899,S_RA,0,0,4,244,124,flow=Background-TCP-Attempt
    2011/08/10 10:10:53.596635,0.939583,tcp,85.248.56.40,42572,   ->,147.32.84.59,6881,S_RA,0,0,4,244,124,flow=Background-Established-cmpgw-CVUT
    2011/08/10 10:13:28.886935,3.015055,tcp,69.231.198.54,4144,   ->,147.32.86.179,80,SR_SA,0,0,3,184,122,flow=Background-TCP-Established
    2011/08/10 10:13:31.657243,0.987838,tcp,188.112.70.72,2057,   ->,147.32.84.59,6881,S_RA,0,0,4,244,124,flow=Background-Established-cmpgw-CVUT
    2011/08/10 10:13:42.930890,1.419550,tcp,95.153.177.13,35049,   ->,147.32.84.59,6881,S_RA,0,0,4,244,124,flow=Background-Established-cmpgw-CVU
    

    Next, the botnet traffic, a little over 40,000 records in total. A sample:

    2011/08/10 15:48:36.603724,0.059293,udp,147.32.84.165,2079,  <->,94.127.67.112,53,CON,0,0,2,226,71,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.607360,0.103091,udp,147.32.84.165,2079,  <->,192.41.162.30,53,CON,0,0,2,288,76,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.611327,0.140512,udp,147.32.84.165,2079,  <->,192.55.83.30,53,CON,0,0,2,323,76,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.615374,0.071502,udp,147.32.84.165,2079,  <->,213.171.60.92,53,CON,0,0,2,240,77,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.619348,0.146849,udp,147.32.84.165,2079,  <->,147.32.80.9,53,CON,0,0,2,332,78,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.623326,0.069379,udp,147.32.84.165,2079,  <->,193.232.128.6,53,CON,0,0,2,266,73,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.627332,0.013470,udp,147.32.84.165,2079,  <->,178.248.240.75,53,CON,0,0,2,249,76,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.631326,0.035150,udp,147.32.84.165,2079,  <->,192.12.94.30,53,CON,0,0,2,332,98,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.636594,0.029000,udp,147.32.84.165,2079,  <->,192.93.0.4,53,CON,0,0,2,240,80,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.640670,0.000576,udp,147.32.84.165,2079,  <->,193.232.142.17,53,CON,0,0,2,240,80,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.644677,0.121641,udp,147.32.84.165,2079,  <->,192.31.80.30,53,CON,0,0,2,322,93,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.649388,0.103050,udp,147.32.84.165,2079,  <->,192.41.162.30,53,CON,0,0,2,332,98,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.654584,0.075283,udp,147.32.84.165,2079,  <->,217.10.35.5,53,CON,0,0,2,256,80,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.658527,0.071333,udp,147.32.84.165,2079,  <->,213.171.60.92,53,CON,0,0,2,240,77,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.666330,0.075548,udp,147.32.84.165,2079,  <->,217.10.35.5,53,CON,0,0,2,256,80,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.667170,0.020246,udp,147.32.84.165,2079,  <->,216.239.32.10,53,CON,0,0,2,246,98,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.671732,0.071367,udp,147.32.84.165,2079,  <->,213.171.60.92,53,CON,0,0,2,240,77,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.693473,0.000523,udp,147.32.84.165,2079,  <->,147.32.80.9,53,CON,0,0,2,386,73,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.693817,0.021204,udp,147.32.84.165,2079,  <->,192.33.4.12,53,CON,0,0,2,434,76,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.711265,0.096246,udp,147.32.84.165,2079,  <->,216.239.38.10,53,CON,0,0,2,168,76,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.725172,0.058786,udp,147.32.84.165,2079,  <->,147.32.80.9,53,CON,0,0,2,373,72,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.732670,0.069816,udp,147.32.84.165,2079,  <->,193.232.128.6,53,CON,0,0,2,218,75,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.739366,0.132555,udp,147.32.84.165,2079,  <->,147.32.80.9,53,CON,0,0,2,388,73,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.739638,0.203287,udp,147.32.84.165,2079,  <->,199.19.54.1,53,CON,0,0,2,310,72,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.753076,0.178658,udp,147.32.84.165,2079,  <->,76.74.236.21,53,CON,0,0,2,259,76,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.767083,0.096140,udp,147.32.84.165,2079,  <->,216.239.38.10,53,CON,0,0,2,236,93,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.767456,0.008897,udp,147.32.84.165,2079,  <->,216.239.36.10,53,CON,0,0,2,246,98,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.797369,0.103273,udp,147.32.84.165,2079,  <->,192.41.162.30,53,CON,0,0,2,260,81,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.803168,0.072956,udp,147.32.84.165,2079,  <->,83.242.140.21,53,CON,0,0,2,234,75,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:36.900945,0.138787,udp,147.32.84.165,2079,   ->,207.182.130.90,53,INT,0,,1,81,81,flow=From-Botnet-V42-UDP-Attempt-DNS
    2011/08/10 15:48:36.972182,0.065794,udp,147.32.84.165,2077,  <->,79.137.236.5,53,CON,0,0,2,252,72,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:37.943562,0.000510,udp,147.32.84.165,2079,  <->,147.32.80.9,53,CON,0,0,2,138,69,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:37.943832,0.056818,udp,147.32.84.165,2079,  <->,89.208.17.22,53,CON,0,0,2,234,70,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:37.944078,0.065793,udp,147.32.84.165,2079,  <->,95.163.69.51,53,CON,0,0,2,162,73,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:37.944471,0.060810,udp,147.32.84.165,2079,  <->,194.226.96.8,53,CON,0,0,2,544,76,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:37.944743,0.070236,udp,147.32.84.165,2079,  <->,79.174.74.74,53,CON,0,0,2,213,71,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:37.945022,0.055323,udp,147.32.84.165,2079,  <->,217.16.20.30,53,CON,0,0,2,204,71,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:37.993311,0.065569,udp,147.32.84.165,2079,  <->,95.163.69.51,53,CON,0,0,2,193,68,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:38.005950,0.060812,udp,147.32.84.165,2079,  <->,194.226.96.8,53,CON,0,0,2,544,76,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:38.067526,0.062003,udp,147.32.84.165,2079,  <->,194.85.61.20,53,CON,0,0,2,226,76,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:38.130317,0.000519,udp,147.32.84.165,2079,  <->,147.32.80.9,53,CON,0,0,2,138,69,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:39.034911,0.062867,udp,147.32.84.165,2077,  <->,213.180.204.213,53,CON,0,0,2,148,74,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:39.035175,0.099364,udp,147.32.84.165,2077,  <->,82.146.55.155,53,CON,0,0,2,142,71,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:39.035453,0.000000,udp,147.32.84.165,2077,   ->,89.149.254.87,53,INT,0,,1,72,72,flow=From-Botnet-V42-UDP-Attempt-DNS
    2011/08/10 15:48:39.124911,17.884367,udp,147.32.84.165,2079,  <->,147.32.80.9,53,CON,0,0,2,150,75,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:39.525887,239.334076,tcp,147.32.84.165,1394,   ->,212.117.171.138,65500,SA_SPA,0,0,10,604,122,flow=From-Botnet-V42-TCP-Not-Encrypted-SMTP-Private-Proxy-1
    2011/08/10 15:48:39.526019,9.012423,tcp,147.32.84.165,4944,   ->,188.138.90.25,25,S_,0,,3,186,186,flow=From-Botnet-V42-TCP-Attempt-SPAM
    2011/08/10 15:48:39.526087,9.012417,tcp,147.32.84.165,1400,   ->,74.125.93.27,25,S_,0,,3,186,186,flow=From-Botnet-V42-TCP-Attempt-SPAM
    2011/08/10 15:48:40.036997,0.064627,udp,147.32.84.165,2077,  <->,217.107.217.16,53,CON,0,0,2,217,83,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:40.127212,0.068327,udp,147.32.84.165,2079,  <->,81.177.1.85,53,CON,0,0,2,193,73,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:40.127509,0.000000,udp,147.32.84.165,2079,   ->,173.212.197.124,53,INT,0,,1,77,77,flow=From-Botnet-V42-UDP-Attempt-DNS
    2011/08/10 15:48:41.038227,0.027763,udp,147.32.84.165,2077,   ->,188.40.106.4,53,INT,0,,1,72,72,flow=From-Botnet-V42-UDP-Attempt-DNS
    2011/08/10 15:48:41.038380,0.056562,udp,147.32.84.165,2077,  <->,188.65.208.29,53,CON,0,0,2,144,72,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:41.134482,22.922115,udp,147.32.84.165,2079,  <->,147.32.80.9,53,CON,0,0,2,136,68,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:41.134689,0.000289,udp,147.32.84.165,2079,  <->,147.32.80.9,53,CON,0,0,2,140,70,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:41.134822,0.000535,udp,147.32.84.165,2079,  <->,147.32.80.9,53,CON,0,0,2,138,69,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:42.129326,0.000542,udp,147.32.84.165,2079,  <->,147.32.80.9,53,CON,0,0,2,138,69,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:43.041020,0.056619,udp,147.32.84.165,2077,  <->,82.146.43.2,53,CON,0,0,2,142,71,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:43.143158,0.025064,udp,147.32.84.165,2079,  <->,85.25.126.145,53,CON,0,0,2,270,67,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:43.143529,0.000000,udp,147.32.84.165,2079,   ->,77.82.34.183,53,INT,0,,1,72,72,flow=From-Botnet-V42-UDP-Attempt-DNS
    2011/08/10 15:48:43.143828,0.125798,udp,147.32.84.165,2079,  <->,209.190.16.83,53,CON,0,0,2,356,81,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:43.144038,1.299856,udp,147.32.84.165,2079,  <->,78.46.90.36,53,CON,0,0,2,262,74,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:43.212624,0.055366,udp,147.32.84.165,2079,  <->,217.16.22.30,53,CON,0,0,2,204,71,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:43.631387,3.004279,tcp,147.32.84.165,1389,   ->,199.49.1.56,25,S_,0,,2,124,124,flow=From-Botnet-V42-TCP-Attempt-SPAM
    2011/08/10 15:48:44.262798,0.024404,udp,147.32.84.165,2079,   ->,78.159.114.121,53,INT,0,,1,70,70,flow=From-Botnet-V42-UDP-Attempt-DNS
    2011/08/10 15:48:44.275499,0.055468,udp,147.32.84.165,2079,  <->,90.156.144.47,53,CON,0,0,2,193,73,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:45.043735,0.061122,udp,147.32.84.165,2077,  <->,217.107.217.16,53,CON,0,0,2,217,83,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:45.333775,3.004314,tcp,147.32.84.165,1305,   ->,188.138.90.25,25,S_,0,,2,124,124,flow=From-Botnet-V42-TCP-Attempt-SPAM
    2011/08/10 15:48:45.333905,3.004330,tcp,147.32.84.165,1232,   ->,64.12.139.193,25,S_,0,,2,124,124,flow=From-Botnet-V42-TCP-Attempt-SPAM
    2011/08/10 15:48:45.444050,18.612602,udp,147.32.84.165,2079,  <->,147.32.80.9,53,CON,0,0,2,136,68,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:45.444255,0.000280,udp,147.32.84.165,2079,  <->,147.32.80.9,53,CON,0,0,2,140,70,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:46.045303,0.062259,udp,147.32.84.165,2077,  <->,93.158.134.213,53,CON,0,0,2,148,74,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:46.135442,3.003875,tcp,147.32.84.165,1401,   ->,207.115.21.22,25,S_,0,,2,124,124,flow=From-Botnet-V42-TCP-Attempt-SPAM
    2011/08/10 15:48:46.445639,0.050659,udp,147.32.84.165,2079,  <->,178.218.208.130,53,CON,0,0,2,193,68,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:46.445764,0.000000,udp,147.32.84.165,2079,   ->,173.212.197.124,53,INT,0,,1,77,77,flow=From-Botnet-V42-UDP-Attempt-DNS
    2011/08/10 15:48:46.445905,0.000373,udp,147.32.84.165,2079,  <->,147.32.80.9,53,CON,0,0,2,138,69,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:47.056449,0.027269,udp,147.32.84.165,2077,   ->,188.40.106.4,53,INT,0,,1,72,72,flow=From-Botnet-V42-UDP-Attempt-DNS
    2011/08/10 15:48:47.056523,0.099087,udp,147.32.84.165,2077,  <->,82.146.55.155,53,CON,0,0,2,142,71,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:47.447119,0.000349,udp,147.32.84.165,2079,  <->,147.32.80.9,53,CON,0,0,2,138,69,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:48.058272,0.000000,udp,147.32.84.165,2077,   ->,89.149.254.87,53,INT,0,,1,72,72,flow=From-Botnet-V42-UDP-Attempt-DNS
    2011/08/10 15:48:48.338346,0.000000,tcp,147.32.84.165,1081,   ->,202.59.166.29,25,S_,0,,1,62,62,flow=From-Botnet-V42-TCP-Attempt-SPAM
    2011/08/10 15:48:48.448462,0.000405,udp,147.32.84.165,2079,  <->,147.32.80.9,53,CON,0,0,2,138,69,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:49.059334,0.056870,udp,147.32.84.165,2077,  <->,188.65.208.29,53,CON,0,0,2,144,72,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 15:48:49.449944,0.055294,udp,147.32.84.165,2079,  <->,217.16.16.30,53,CON,0,0,2,204,71,flow=From-Botnet-V42-UDP-DN
    2011/08/10 11:08:59.574857,0.299675,tcp,147.32.84.165,1282,   ->,94.63.150.20,80,FSPA_FSPA,0,0,10,1310,767,flow=From-Botnet-V42-TCP-WEB-Established
    2011/08/10 11:08:59.970514,15.650887,tcp,147.32.84.165,1283,   ->,195.88.191.59,80,FSPA_FSPA,0,0,164,104023,4266,flow=From-Botnet-V42-TCP-Established-HTTP-Binary-Download-3
    2011/08/10 11:09:07.692938,2.924329,tcp,147.32.84.165,1284,   ->,123.194.145.64,6667,S_,0,,2,124,124,flow=From-Botnet-V42-TCP-Attempt
    2011/08/10 11:09:15.281026,0.299838,tcp,147.32.84.165,1285,   ->,94.63.150.20,80,FSPA_FSPA,0,0,10,1301,758,flow=From-Botnet-V42-TCP-WEB-Established
    2011/08/10 11:09:15.704792,0.330985,tcp,147.32.84.165,1286,   ->,94.63.150.20,80,FSPA_FSPA,0,0,10,1310,767,flow=From-Botnet-V42-TCP-WEB-Established
    2011/08/10 11:09:17.696956,1.861224,tcp,147.32.84.165,1287,   ->,58.143.125.236,6667,S_RA,0,0,6,366,186,flow=From-Botnet-V42-TCP-Attempt
    2011/08/10 11:09:22.564057,1.227437,tcp,147.32.84.165,1288,   ->,200.171.4.222,6667,FSPA_FSPA,0,0,10,1148,590,flow=From-Botnet-V42-TCP-CC1-HTTP-Not-Encrypted
    2011/08/10 11:09:23.545442,87.400864,tcp,147.32.84.165,1289,   ->,212.117.171.138,65500,FSPA_FSPA,0,0,31,2627,1607,flow=From-Botnet-V42-TCP-Not-Encrypted-SMTP-Private-Proxy-1
    2011/08/10 11:09:26.500381,2.943700,tcp,147.32.84.165,1290,   ->,187.106.81.34,6667,S_,0,,2,124,124,flow=From-Botnet-V42-TCP-Attempt
    2011/08/10 11:09:26.571864,1.517806,udp,147.32.84.165,1025,  <->,147.32.80.9,53,CON,0,0,3,370,158,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 11:09:27.022695,0.201243,udp,147.32.84.165,1291,  <->,147.32.80.9,53,CON,0,0,2,553,78,flow=From-Botnet-V42-UDP-DNS
    2011/08/10 11:09:27.225336,500.001862,tcp,147.32.84.165,1292,   ->,195.113.232.98,80,SPA_FSPA,0,0,14,5498,531,flow=From-Botnet-V42-TCP-Established-HTTP-Ad-40
    2011/08/10 11:09:28.090806,0.023474,udp,147.32.84.165,1293,  <->,94.100.28.114,9381,CON,0,0,2,120,60,flow=From-Botnet-V42-UDP-Established
    2011/08/10 11:09:28.122236,3545.593750,udp,147.32.84.165,1293,  <->,95.211.58.97,8399,CON,0,0,124,7836,5847,flow=From-Botnet-V42-UDP-Established
    2011/08/10 11:09:30.204903,17.036390,tcp,147.32.84.165,1294,   ->,60.199.114.56,10298,SPA_FSPA,0,0,56,46078,1531,flow=From-Botnet-V42-TCP-Established
    2011/08/10 11:09:30.205052,15.332468,tcp,147.32.84.165,1295,   ->,77.79.4.96,41422,SPA_FSPA,0,0,58,46712,1647,flow=From-Botnet-V42-TCP-Established
    2011/08/10 11:09:30.205209,15.348065,tcp,147.32.84.165,1296,   ->,77.79.4.96,41422,SPA_FSPA,0,0,58,46200,1647,flow=From-Botnet-V42-TCP-Established
    2011/08/10 11:09:30.205413,16.994499,tcp,147.32.84.165,1297,   ->,60.199.114.56,10298,SPA_FSPA,0,0,56,46589,1530,flow=From-Botnet-V42-TCP-Established

    As the labels show, the botnet traffic includes ads, spam, and so on. The data above is processed as follows:

    import socket, struct, sys
    import numpy as np
    import pickle
    
    def loaddata(fileName):
        
    
        file = open(fileName, 'r')
    
        xdata = []
        ydata = []
        xdataT = []
        ydataT = []
        flag=0
        count1=0
        count2=0
        count3=0
        count4=0
    
        #dicts to convert protocols and state to integers
        protoDict = {'arp': 5, 'unas': 13, 'udp': 1, 'rtcp': 7, 'pim': 3, 'udt': 11, 'esp': 12, 'tcp' : 0, 'rarp': 14, 'ipv6-icmp': 9, 'rtp': 2, 'ipv6': 10, 'ipx/spx': 6, 'icmp': 4, 'igmp' : 8}
    
        stateDict = {'': 1, 'FSR_SA': 30, '_FSA': 296, 'FSRPA_FSA': 77, 'SPA_SA': 31, 'FSA_SRA': 1181, 'FPA_R': 46, 'SPAC_SPA': 37, 'FPAC_FPA': 2, '_R': 1, 'FPA_FPA': 784, 'FPA_FA': 66, '_FSRPA': 1, 'URFIL': 431, 'FRPA_PA': 5, '_RA': 2, 'SA_A': 2, 'SA_RA': 125, 'FA_FPA': 17, 'FA_RA': 14, 'PA_FPA': 48, 'URHPRO': 380, 'FSRPA_SRA': 8, 'R_':541, 'DCE': 5, 'SA_R': 1674, 'SA_': 4295, 'RPA_FSPA': 4, 'FA_A': 17, 'FSPA_FSPAC': 7, 'RA_': 2230, 'FSRPA_SA': 255, 'NNS': 47, 'SRPA_FSPAC': 1, 'RPA_FPA': 42, 'FRA_R': 10, 'FSPAC_FSPA': 86, 'RPA_R': 3, '_FPA': 5, 'SREC_SA': 1, 'URN': 339, 'URO': 6, 'URH': 3593, 'MRQ': 4, 'SR_FSA': 1, 'SPA_SRPAC': 1, 'URP': 23598, 'RPA_A': 1, 'FRA_': 351, 'FSPA_SRA': 91, 'FSA_FSA': 26138, 'PA_': 149, 'FSRA_FSPA': 798, 'FSPAC_FSA': 11, 'SRPA_SRPA': 176, 'SA_SA': 33, 'FSPAC_SPA': 1, 'SRA_RA': 78, 'RPAC_PA': 1, 'FRPA_R': 1, 'SPA_SPA': 2989, 'PA_RA': 3, 'SPA_SRPA': 4185, 'RA_FA': 8, 'FSPAC_SRPA': 1, 'SPA_FSA': 1, 'FPA_FSRPA': 3, 'SRPA_FSA': 379, 'FPA_FRA': 7, 'S_SRA': 81, 'FSA_SA': 6, 'State': 1, 'SRA_SRA': 38, 'S_FA': 2, 'FSRPAC_SPA': 7, 'SRPA_FSPA': 35460, 'FPA_A': 1, 'FSA_FPA': 3, 'FRPA_RA': 1, 'FSAU_SA': 1, 'FSPA_FSRPA': 10560, 'SA_FSA': 358, 'FA_FRA': 8, 'FSRPA_SPA': 2807, 'FSRPA_FSRA': 32, 'FRA_FPA': 6, 'FSRA_FSRA': 3, 'SPAC_FSRPA': 1, 'FS_': 40, 'FSPA_FSRA': 798, 'FSAU_FSA': 13, 'A_R': 36, 'FSRPAE_FSPA': 1, 'SA_FSRA': 4, 'PA_PAC': 3, 'FSA_FSRA': 279, 'A_A': 68, 'REQ': 892, 'FA_R': 124, 'FSRPA_SRPA': 97, 'FSPAC_FSRA':20, 'FRPA_RPA': 7, 'FSRA_SPA': 8, 'INT': 85813, 'FRPA_FRPA': 6, 'SRPAC_FSPA': 4, 'SPA_SRA': 808, 'SA_SRPA': 1, 'SPA_FSPA': 2118, 'FSRAU_FSA': 2, 'RPA_PA': 171,'_SPA': 268, 'A_PA': 47, 'SPA_FSRA': 416, 'FSPA_FSRPAC': 2, 'PAC_PA': 5, 'SRPA_SPA': 9646, 'SRPA_FSRA': 13, 'FPA_FRPA': 49, 'SRA_SPA': 10, 'SA_SRA': 838, 'PA_PA': 5979, 'FPA_RPA': 27, 'SR_RA': 10, 'RED': 4579, 'CON': 2190507, 'FSRPA_FSPA':13547, 'FSPA_FPA': 4, 'FAU_R': 2, 'ECO': 2877, 'FRPA_FPA': 72, 'FSAU_SRA': 1, 'FRA_FA': 8, 'FSPA_FSPA': 216341, 'SEC_RA': 19, 'ECR': 3316, 
'SPAC_FSPA': 12, 'SR_A': 34, 'SEC_': 5, 'FSAU_FSRA': 3, 'FSRA_FSRPA': 11, 'SRC': 13, 'A_RPA': 1, 'FRA_PA': 3, 'A_RPE': 1, 'RPA_FRPA': 20, '_SRA': 74, 'SRA_FSPA': 293, 'FPA_': 118, 'FSRPAC_FSRPA': 2, '_FA': 1, 'DNP': 1, 'FSRPA_FSRPA': 379, 'FSRA_SRA': 14, '_FRPA': 1, 'SR_': 59, 'FSPA_SPA': 517, 'FRPA_FSPA': 1, 'PA_A': 159, 'PA_SRA': 1, 'FPA_RA': 5, 'S_': 68710, 'SA_FSRPA': 4, 'FSA_FSRPA': 1, 'SA_SPA': 4, 'RA_A': 5, '_SRPA': 9, 'S_FRA': 156, 'FA_FRPA': 1, 'PA_R': 72, 'FSRPAEC_FSPA': 1, '_PA': 7, 'RA_S': 1, 'SA_FR': 2, 'RA_FPA': 6, 'RPA_': 5, '_FSPA': 2395, 'FSA_FSPA': 230, 'UNK': 2, 'A_RA': 9, 'FRPA_': 6, 'URF': 10, 'FS_SA': 97, 'SPAC_SRPA': 8, 'S_RPA': 32, 'SRPA_SRA': 69, 'SA_RPA': 30, 'PA_FRA': 4, 'FSRA_SA': 49, 'FSRA_FSA': 206, 'PAC_RPA': 1, 'SRA_': 18, 'FA_': 451, 'S_SA': 6917, 'FSPA_SRPA': 427, 'TXD': 542,'SRA_SA': 1514, 'FSPA_FA': 1, 'FPA_FSPA': 10, 'RA_PA': 3, 'SRA_FSA': 709, 'SRPA_SPAC': 3, 'FSPAC_FSRPA': 10, 'A_': 191, 'URNPRO': 2, 'PA_RPA': 81, 'FSPAC_SRA':1, 'SRPA_FSRPA': 3054, 'SPA_': 1, 'FA_FA': 259, 'FSPA_SA': 75, 'SR_SRA': 1, 'FSA_': 2, 'SRPA_SA': 406, 'SR_SA': 3119, 'FRPA_FA': 1, 'PA_FRPA': 13, 'S_R': 34, 'FSPAEC_FSPAE': 3, 'S_RA': 61105, 'FSPA_FSA': 5326, '_SA': 20, 'SA_FSPA': 15, 'SRPAC_SPA': 8, 'FPA_PA': 19, 'FSRPAE_FSA': 1, 'S_A': 1, 'RPA_RPA': 3, 'NRS': 6, 'RSP': 115, 'SPA_FSRPA': 1144, 'FSRPAC_FSPA': 139}
    
        file.readline()
    
        for line in file:
            sd = line[:-1].split(',')
            dur, proto, Sport, Dport, Sip, Dip, totP, totB, label, state = sd[1], sd[2], sd[4], sd[7], sd[3], sd[6], sd[-4], sd[-3], sd[-1], sd[8]
            try:
                Sip = socket.inet_aton(Sip)
                Sip = struct.unpack("!L", Sip)[0]
            except:
                continue
            try:
                Dip = socket.inet_aton(Dip)
                Dip = struct.unpack("!L", Dip)[0]
            except:
                continue
            if Sport=='': continue
            if Dport=='': continue
            #back, nor, bot
            try:
    
                if "Background" in label: 
                    label=0
    
                elif "Normal" in label:
                    label = 0
    
                elif "Botnet" in label: # binary 0/1 labels only; specific botnet types are not distinguished
                    label = 1
    
    
                if flag==0:
                    #Training Dataset
                    if label==0 and count1<20001:
                        xdata.append([float(dur), protoDict[proto], int(Sport), int(Dport), Sip, Dip, int(totP), int(totB), stateDict[state]])
                        ydata.append(label)
                        count1+=1
    
                    elif label==1 and count2<20001:
                        xdata.append([float(dur), protoDict[proto], int(Sport), int(Dport), Sip, Dip, int(totP), int(totB), stateDict[state]])
                        ydata.append(label)
                        count2+=1
    
                    elif count1>19999 and count2>19999:
                        #print("HI")
                        flag=1
    
                else:
                    #Test dataset
                    if label==0 and count3<5001:
                        #print("H")
                        xdataT.append([float(dur), protoDict[proto], int(Sport), int(Dport), Sip, Dip, int(totP), int(totB), stateDict[state]])
                        ydataT.append(label)
                        count3+=1
                    elif label==1 and count4<5001:
                        xdataT.append([float(dur), protoDict[proto], int(Sport), int(Dport), Sip, Dip, int(totP), int(totB), stateDict[state]])
                        ydataT.append(label)
                        count4 += 1
                    elif count3>4999 and count4>4999:
                        break
            except:
                continue
    
        file.close()

        #pickle the dataset for fast loading; the with-block closes the output file
        with open('flowdata.pickle', 'wb') as out:
            pickle.dump([np.array(xdata), np.array(ydata), np.array(xdataT), np.array(ydataT)], out)
    
        #return the training and the test dataset
        return np.array(xdata), np.array(ydata), np.array(xdataT), np.array(ydataT)
    
    if __name__ == "__main__":
        loaddata('flowdata.binetflow')
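
The loader above turns each dotted-quad address into a single unsigned 32-bit integer so that it can be used as a numeric feature. A minimal sketch of that step in isolation follows. The helper name `ip_to_int` is introduced here for illustration and is not part of the original script; it returns None where the loader would skip the row (for example for a hostname or an IPv6 address).

```python
import socket
import struct


def ip_to_int(addr):
    """Return an IPv4 dotted-quad string as an unsigned 32-bit integer, or None.

    Hypothetical helper: it wraps the same inet_aton/unpack pair the loader
    uses and returns None where the loader would `continue`.
    """
    try:
        packed = socket.inet_aton(addr)       # 4 bytes, network byte order
    except OSError:                           # raised for strings inet_aton cannot parse
        return None
    return struct.unpack("!L", packed)[0]     # "!L" = big-endian unsigned 32-bit


if __name__ == "__main__":
    src = ip_to_int("147.32.84.165")          # bot address from the sample above
    # Each octet occupies one byte of the result, most significant first.
    print(src == ((147 << 24) | (32 << 16) | (84 << 8) | 165))
    print(ip_to_int("not-an-address"))
```

Treating addresses as integers is a simplification: numerically close values are not necessarily related hosts, so the classifier may pick up patterns specific to this capture.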
    

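    Note that standardizing features by their mean and std, as done below, is not very robust: real network traffic may follow a different distribution.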

    Build botnet detectors using machine learning algorithms in Python [Tutorial]


    Botnets are networks of connected, compromised computers that an attacker controls remotely to carry out repetitive tasks such as sending spam or flooding targets with traffic. Connected devices play an important role in modern life. Smart home appliances, computers, coffee machines, cameras, and connected cars have made our lives easier. Unfortunately, these exposed devices can easily be targeted by attackers and cybercriminals, who can later use them to enable larger-scale attacks. Security vendors provide many solutions and products to defend against botnets, but in this tutorial, we are going to learn how to build novel botnet detection systems with Python and machine learning techniques.

    You will find all the code discussed, in addition to some other useful scripts, in the following repository: https://github.com/PacktPublishing/Mastering-Machine-Learning-for-Penetration-Testing/tree/master/Chapter05

    This article is an excerpt from a book written by Chiheb Chebbi titled Mastering Machine Learning for Penetration Testing

    We are going to learn how to build different botnet detection systems with several machine learning algorithms. As a first practical lab, let’s build a machine learning-based botnet detector using different classifiers. By now, I hope you have a clear understanding of the major steps of building machine learning systems, so you already know that, as a first step, we need to look for a dataset.

    Many educational institutions and organizations publish datasets collected in their internal laboratories. One of the best-known botnet datasets is the CTU-13 dataset. It is a labeled dataset with botnet, normal, and background traffic, released by CTU University, Czech Republic. In their work, the authors captured real botnet traffic mixed with normal traffic and background traffic. To download the dataset and check out more information about it, you can visit the following link: https://mcfp.weebly.com/the-ctu-13-dataset-a-labeled-dataset-with-botnet-normal-and-background-traffic.html.

    The dataset consists of bidirectional NetFlow files. But what are bidirectional NetFlow files? NetFlow is a network protocol developed by Cisco. The goal of this protocol is to collect IP traffic information and monitor network traffic in order to have a clearer view of the network traffic flow. The main components of a NetFlow architecture are a NetFlow exporter, a NetFlow collector, and flow storage. The following diagram illustrates the different components of a NetFlow infrastructure:

    In NetFlow generally, when host A sends information to host B and host B replies to host A, the two directions are recorded as separate flows; this is called unidirectional NetFlow. In bidirectional NetFlow, the traffic between host A and host B in both directions is treated as one flow. Let’s download the dataset by using the following command:

    $ wget --no-check-certificate https://mcfp.felk.cvut.cz/publicDatasets/CTU-13-Dataset/CTU-13-Dataset.tar.bz2

    Extract the downloaded tar.bz2 file by using the following command:

    $ tar xvjf CTU-13-Dataset.tar.bz2
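
To make the unidirectional versus bidirectional distinction described above concrete, here is a small sketch that folds unidirectional records into bidirectional ones. It is an illustration only, not a description of how the CTU-13 files were produced: the record layout, the `to_bidirectional` name, and the rule of keying a flow on its unordered endpoint pair are assumptions made for this example.

```python
from collections import OrderedDict


def to_bidirectional(records):
    """Merge unidirectional (src, sport, dst, dport, proto, pkts, bytes) records.

    Hypothetical sketch: two records belong to the same bidirectional flow
    when they share a protocol and the same pair of endpoints, regardless of
    which endpoint is the source.
    """
    flows = OrderedDict()
    for src, sport, dst, dport, proto, pkts, nbytes in records:
        a, b = (src, sport), (dst, dport)
        key = (proto,) + tuple(sorted([a, b]))   # direction-independent key
        flow = flows.get(key)
        if flow is None:
            # The first record seen defines which side counts as the source.
            flow = {"src": a, "dst": b, "proto": proto,
                    "tot_pkts": 0, "tot_bytes": 0, "src_bytes": 0}
            flows[key] = flow
        flow["tot_pkts"] += pkts
        flow["tot_bytes"] += nbytes
        if a == flow["src"]:
            flow["src_bytes"] += nbytes
    return list(flows.values())


if __name__ == "__main__":
    unidirectional = [
        ("147.32.84.165", 2079, "147.32.80.9", 53, "udp", 1, 69),   # query A -> B
        ("147.32.80.9", 53, "147.32.84.165", 2079, "udp", 1, 69),   # reply B -> A
    ]
    for flow in to_bidirectional(unidirectional):
        print(flow)
```

The merged record carries a packet total, a byte total, and the bytes sent by the source side, which mirrors the TotPkts, TotBytes, and SrcBytes columns in the samples at the top of this post.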

    The file contains all the datasets, with the different scenarios. For the demonstration, we are going to use dataset 8 (scenario 8). You can select any scenario, use your own collected data, or use any other .binetflow files delivered by other institutions.

    Load the data using pandas as usual:

    >>> import pandas as pd
    >>> data = pd.read_csv("capture20110816-3.binetflow")
    >>> data['Label'] = data.Label.str.contains("Botnet")

    Exploring the data is essential in any data-centric project. For example, you can start by checking the names of the features or the columns:

    >>> data.columns

    The command results in the columns of the dataset: StartTime, Dur, Proto, SrcAddr, Sport, Dir, DstAddr, Dport, State, sTos, dTos, TotPkts, TotBytes, SrcBytes, and Label. The columns represent the features used in the dataset; for example, Dur represents duration, Sport represents the source port, and so on. You can find the full list of features in the chapter’s GitHub repository.
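
As a dependency-free illustration of the same exploration, the sketch below parses a few of the sample rows shown earlier with the standard csv module, lists the column names, and derives the same boolean label as the `str.contains("Botnet")` call. The inline sample and the `load_flows` name are assumptions for the example; on the real file, the pandas calls above are the simpler route.

```python
import csv
import io
from collections import Counter

# Header plus three rows copied from the samples earlier in this post.
SAMPLE = """StartTime,Dur,Proto,SrcAddr,Sport,Dir,DstAddr,Dport,State,sTos,dTos,TotPkts,TotBytes,SrcBytes,Label
2011/08/10 09:46:59.607825,1.026539,tcp,94.44.127.113,1577,   ->,147.32.84.59,6881,S_RA,0,0,4,276,156,flow=Background-Established-cmpgw-CVUT
2011/08/10 15:48:36.603724,0.059293,udp,147.32.84.165,2079,  <->,94.127.67.112,53,CON,0,0,2,226,71,flow=From-Botnet-V42-UDP-DNS
2011/08/10 15:48:39.526019,9.012423,tcp,147.32.84.165,4944,   ->,188.138.90.25,25,S_,0,,3,186,186,flow=From-Botnet-V42-TCP-Attempt-SPAM
"""


def load_flows(text):
    """Parse binetflow-style CSV text into (column names, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        row["Label"] = "Botnet" in row["Label"]   # True for botnet flows
        row["Dir"] = row["Dir"].strip()           # the Dir field is space-padded
        rows.append(row)
    return reader.fieldnames, rows


if __name__ == "__main__":
    columns, flows = load_flows(SAMPLE)
    print(columns)
    print(Counter(flow["Label"] for flow in flows))
```

Counting the labels early is a quick way to see how unbalanced the capture is before deciding how many flows of each class to sample.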

    Before training the model, we need to build some scripts to prepare the data. This time, we are going to build a separate Python script to prepare data, and later we can just import it into the main script.

    I will call the first script DataPreparation.py. Many approaches have been proposed for extracting features and preparing data to build botnet detectors with machine learning. In our case, I customized two scripts inspired by the data loading scripts built by NagabhushanS:

    from __future__ import division
    import os, sys
    import threading
    import numpy as np  # needed by Prepare.run()

    After importing the required Python packages, we created a class called Prepare to select training and testing data:

    class Prepare(threading.Thread):
        def __init__(self, X, Y, XT, YT, accLabel=None):
            threading.Thread.__init__(self)
            self.X = X
            self.Y = Y
            self.XT = XT
            self.YT = YT
            self.accLabel = accLabel

        def run(self):
            # work on copies so the caller's arrays are left untouched
            X = np.zeros(self.X.shape)
            Y = np.zeros(self.Y.shape)
            XT = np.zeros(self.XT.shape)
            YT = np.zeros(self.YT.shape)
            np.copyto(X, self.X)
            np.copyto(Y, self.Y)
            np.copyto(XT, self.XT)
            np.copyto(YT, self.YT)
            # standardize each of the 9 features to zero mean and unit variance;
            # the training and test sets each use their own mean and std
            for i in range(9):
                X[:, i] = (X[:, i] - X[:, i].mean()) / (X[:, i].std())
            for i in range(9):
                XT[:, i] = (XT[:, i] - XT[:, i].mean()) / (XT[:, i].std())
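
The class above standardizes the training and test sets separately, each with its own mean and standard deviation. A common alternative, sketched below, is to compute the statistics on the training set only and reuse them for the test set, so that both sets pass through the same transformation. This is a variation for comparison, not the book's code; the `fit_standardizer` and `apply_standardizer` names and the zero-variance guard are assumptions introduced here.

```python
import numpy as np


def fit_standardizer(X_train):
    """Return per-column mean and std computed from the training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0          # assumption: leave constant columns unscaled
    return mean, std


def apply_standardizer(X, mean, std):
    """Standardize X with previously fitted statistics (returns a new array)."""
    return (X - mean) / std


if __name__ == "__main__":
    # Toy stand-ins for two features (duration, total bytes) of a few flows.
    X_train = np.array([[1.0, 276.0], [3.0, 182.0], [5.0, 760.0]])
    X_test = np.array([[2.0, 244.0], [9.0, 604.0]])

    mean, std = fit_standardizer(X_train)
    print(apply_standardizer(X_train, mean, std).mean(axis=0))  # approximately 0 per column
    print(apply_standardizer(X_test, mean, std))                # test data on the training scale
```

scikit-learn's StandardScaler follows the same pattern: fit on the training data, then transform both sets with the fitted statistics.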

    The second script is called LoadData.py. You can find it on GitHub and use it directly in your projects to load data from .binetflow files and generate a pickle file.

    Let’s use what we developed previously to train the models. After building the data loader and preparing the machine learning algorithms that we are going to use, it is time to train and test the models.

    First, load the data from the pickle file, which is why we need to import the pickle Python library. Don’t forget to import the previous scripts using:

    import LoadData
    import DataPreparation
    import pickle
    file = open('flowdata.pickle', 'rb')
    data  = pickle.load(file)

    Select the data sections:

    Xdata = data[0]
    Ydata =  data[1]
    XdataT = data[2]
    YdataT = data[3]

    We are going to try several classifiers so that we can later select the best one for our model. Import the required modules to use four machine learning algorithms from sklearn:

    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier

    Prepare the data using the DataPreparation module built earlier (import it first with import DataPreparation):

    >>> DataPreparation.Prepare(Xdata,Ydata,XdataT,YdataT)
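
Since Prepare subclasses threading.Thread, constructing it (as in the call above) only stores the arrays; the body of run() executes once start() is called, and join() waits for it to finish. The sketch below shows that lifecycle with a small stand-in class. `Worker` and its `result` attribute are hypothetical and exist only to keep the example self-contained.

```python
import threading


class Worker(threading.Thread):
    """Stand-in for a Thread subclass such as Prepare (hypothetical example)."""

    def __init__(self, values):
        threading.Thread.__init__(self)
        self.values = values
        self.result = None            # filled in by run()

    def run(self):
        # Executes in the new thread once start() has been called.
        self.result = sum(self.values) / len(self.values)


if __name__ == "__main__":
    worker = Worker([1.0, 2.0, 3.0])
    print(worker.result)   # still None: the constructor does not execute run()
    worker.start()         # schedules run() on a separate thread
    worker.join()          # block until run() has returned
    print(worker.result)
```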

    Now we can train the models. We are going to train several classifiers so that we can later select the most suitable technique for our project. The steps are the same as in previous projects: after preparing the data and selecting the features, define the machine learning algorithm, fit the model, and print out its score.

    Let’s start with a decision tree:

    • Decision tree model:
    >>> clf = DecisionTreeClassifier()
    >>> clf.fit(Xdata,Ydata)
    >>> Prediction = clf.predict(XdataT)
    >>> Score = clf.score(XdataT,YdataT)
    >>> print("The Score of the Decision Tree Classifier is", Score * 100)

    The score of the decision tree classifier is 99%

    • Logistic regression model:
    >>> clf = LogisticRegression(C=10000)
    >>> clf.fit(Xdata,Ydata)
    >>> Prediction = clf.predict(XdataT)
    >>> Score = clf.score(XdataT,YdataT)
    >>> print("The Score of the Logistic Regression Classifier is", Score * 100)

    The score of the logistic regression classifier is 96%

    • Gaussian Naive Bayes model:
    >>> clf = GaussianNB()
    >>> clf.fit(Xdata,Ydata)
    >>> Prediction = clf.predict(XdataT)
    >>> Score = clf.score(XdataT,YdataT)
    >>> print("The Score of the Gaussian Naive Bayes classifier is", Score * 100)

    The score of the Gaussian Naive Bayes classifier is 72%

    • k-Nearest Neighbors model:
    >>> clf = KNeighborsClassifier()
    >>> clf.fit(Xdata,Ydata)
    >>> Prediction = clf.predict(XdataT)
    >>> Score = clf.score(XdataT,YdataT)
    >>> print("The Score of the K-Nearest Neighbours classifier is", Score * 100)

    The score of the k-Nearest Neighbors classifier is 96%

    • Neural network model:

    To build a Neural network Model use the following code:

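    >>> from keras.models import Sequential
    >>> from keras.layers import Dense
    >>> from keras.optimizers import SGD
    >>> model = Sequential()
    >>> model.add(Dense(10, input_dim=9, activation="sigmoid"))
    >>> model.add(Dense(10, activation="sigmoid"))
    >>> model.add(Dense(1))
    >>> # the original used lr= and nb_epoch=, plus decay=0.000001; recent Keras
    >>> # releases spell these learning_rate= and epochs= and may reject decay=
    >>> sgd = SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
    >>> model.compile(optimizer=sgd, loss='mse')
    >>> model.fit(Xdata, Ydata, epochs=200, batch_size=100)
    >>> # no metrics were passed to compile(), so evaluate() returns the test loss (MSE), not an accuracy
    >>> Score = model.evaluate(XdataT, YdataT, verbose=0)
    >>> print("The Score of the Neural Network is", Score * 100)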

    With this code, we imported the required Keras modules, we built the layers, we compiled the model with an SGD optimizer, we fit the model, and we printed out the score of the model.
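
The score() calls above report mean accuracy on the test set. For a detector it can also be useful to see how the errors split between missed botnet flows and false alarms. The following sketch computes accuracy, precision, and recall from true and predicted 0/1 labels in plain Python. The `detection_metrics` name and the toy labels are assumptions for illustration, and the numbers are unrelated to the scores reported above.

```python
def detection_metrics(y_true, y_pred):
    """Return accuracy, precision and recall for binary labels (1 = botnet)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # botnet flows caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # botnet flows missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # benign flows passed
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    y_true = [0, 0, 0, 0, 1, 1, 1, 1]
    y_pred = [0, 0, 0, 1, 1, 1, 0, 0]
    accuracy, precision, recall = detection_metrics(y_true, y_pred)
    print("accuracy=%.3f precision=%.3f recall=%.3f" % (accuracy, precision, recall))
```

With the classifiers above, y_pred would be the Prediction array returned by clf.predict(XdataT) and y_true would be YdataT.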

  • Original article: https://www.cnblogs.com/bonelee/p/14930051.html