• Learning Hadoop: A First Look at Map/Reduce with a Small Demo


    Original article: https://blog.csdn.net/liyong199012/article/details/25423221

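    I.    Concepts

           Hadoop MapReduce is a distributed computing framework for processing very large data sets. The framework takes care of hard problems such as distributed data storage, job scheduling, fault tolerance, and communication between machines, so that engineers with no experience in parallel or distributed computing can still write structurally simple programs that process large-scale data in parallel on hundreds or thousands of machines.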

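           Hadoop MapReduce is built on the idea of "divide and conquer". It abstracts a computation into two phases, map and reduce, which can be summarized as "scatter the computation, then merge the results". A MapReduce program first splits the input into independent sets of key/value pairs (key1/value1), which are processed in parallel by multiple map tasks. The framework then sorts the map output (a set of intermediate key/value pairs, key2/value2) by key2 in ascending order; the comparison is done memcmp-style on the keys' byte arrays in memory. All value2 items that belong to the same key2 are grouped together and passed to a reduce task, which computes the final result and emits key3/value3. As an optimization, the key2/value2 pairs produced on the same compute node can be merged locally by a combine step before they are sent over the network. In short, the flow is: input splits, map, (optional combine), sort and group by key, reduce, output.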
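           The flow described above can be sketched in plain Java without any Hadoop dependency. This is only an in-memory illustration under stated assumptions: the class name MapReduceFlowSketch and its methods are hypothetical, word counting stands in for a real job, and a TreeMap (which orders String keys by their natural ordering, not by raw bytes as Hadoop does) plays the role of the sort and group step.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of map -> sort/group -> reduce (not Hadoop code).
public class MapReduceFlowSketch {

    // map: one input line (value1) becomes a list of (word, 1) pairs (key2/value2).
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // sort/group: collect all value2 items per key2, with keys in ascending order.
    static TreeMap<String, List<Integer>> group(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // reduce: merge all values of one key into a single total (value3).
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases one after another on a small in-memory input.
    static TreeMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : group(intermediate).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b a", "b c"))); // prints {a=2, b=2, c=1}
    }
}
```

           In a real cluster the map calls run on different machines and the grouped values are streamed to the reducers; the sketch only shows how the data changes shape from phase to phase.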

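           Comparison of the computation flow of a Hadoop program and a single-machine program:

           The input and output of a computation are usually stored in files, and these files live in HDFS (the Hadoop Distributed File System). The system tries to schedule each computation task on the node that already holds its data rather than moving the data to the compute node, which avoids transferring large amounts of data over the network and saves bandwidth.

           Application developers normally only need to care about the gray parts of the figure. A single-machine program has to handle reading the data, writing the data, and processing it. A Hadoop program only has to implement map and reduce; reading and writing data, transferring data between map and reduce, and fault tolerance are handled automatically by Hadoop MapReduce and HDFS.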

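    II.    Setting Up the Development Environment

           Map/Reduce programs depend on a Hadoop cluster, and Eclipse needs the Hadoop plugin package installed.

           Hadoop cluster setup: see "Hadoop 2.2.0 Cluster Setup".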

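    1.   Install and configure Eclipse

           Download a suitable Eclipse build from the official site and copy the plugin JAR required for Hadoop development into the plugins folder of the Eclipse installation. For a download, see "JAR packages required for Hadoop 2.2.0 development"; you can also build the plugin yourself.

           Start Eclipse and choose Window -> Preferences. If a Hadoop Map/Reduce entry appears there, the plugin was installed successfully.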

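    2.   Configure DFS, which is mainly used to manage input and output data files.

           Choose Window -> Open Perspective -> Other -> Map/Reduce to show the Map/Reduce view. Click the small elephant icon under Map/Reduce Locations, create a new Hadoop Location, and fill in the location's settings.

           A DFS Location node then appears in the project view; it is used to manage input and output data files.

           The Hadoop installation folder also has to be configured: when creating a new Map/Reduce project, click "Configure Hadoop install directory" and enter the Hadoop installation path.

           Right-click an empty folder under DFS Location, upload a text file, and refresh. If the file shows up, the environment is configured correctly.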

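    III.    Programming Model

           The principle of the MapReduce programming model is to take a set of input key/value pairs and produce a set of output key/value pairs. The user of the MapReduce library expresses the computation with two functions: Map and Reduce.

           The user-defined Map function takes one input key/value pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function.

           The user-defined Reduce function takes an intermediate key I and the set of values for that key. It merges these values into a possibly smaller set of values. Typically each Reduce invocation produces zero or one output value. The intermediate values are usually supplied to the Reduce function through an iterator, which makes it possible to handle value sets that are too large to fit in memory.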
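           The two function shapes described above can be written down as plain Java interfaces. This is a minimal sketch, not the Hadoop API: the names ProgrammingModelSketch, MapFunction, ReduceFunction, TOKENIZE, and SUM are hypothetical and exist only for illustration. Note that the reduce side receives an Iterator, so it never needs the whole value set in memory at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical sketch of the Map and Reduce function signatures (not the Hadoop API).
public class ProgrammingModelSketch {

    /** map: one (k1, v1) pair in, any number of intermediate (k2, v2) pairs out. */
    interface MapFunction<K1, V1, K2, V2> {
        void map(K1 key, V1 value, BiConsumer<K2, V2> emit);
    }

    /** reduce: one intermediate key plus an iterator over its values in, zero or more values out. */
    interface ReduceFunction<K2, V2, V3> {
        void reduce(K2 key, Iterator<V2> values, Consumer<V3> emit);
    }

    // Example map function: split a line into words and emit (word, 1) for each.
    static final MapFunction<Long, String, String, Integer> TOKENIZE = (offset, line, emit) -> {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                emit.accept(word, 1);
            }
        }
    };

    // Example reduce function: consume the values one by one and emit a single sum.
    static final ReduceFunction<String, Integer, Integer> SUM = (key, values, emit) -> {
        int total = 0;
        while (values.hasNext()) {
            total += values.next();
        }
        emit.accept(total);
    };

    public static void main(String[] args) {
        List<String> pairs = new ArrayList<>();
        TOKENIZE.map(0L, "to be or not to be", (k, v) -> pairs.add(k + "=" + v));
        System.out.println(pairs); // prints [to=1, be=1, or=1, not=1, to=1, be=1]

        List<Integer> out = new ArrayList<>();
        SUM.reduce("to", Arrays.asList(1, 1).iterator(), out::add);
        System.out.println(out); // prints [2]
    }
}
```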

    IV.    A Small Example

    1.      Preparing the data

           We use a Tomcat access log as the example. The log format is as follows:

    1. 127.0.0.1,-,-,[08/May/2014:13:42:40 +0800],GET / HTTP/1.1,200,11444
    2. 127.0.0.1,-,-,[08/May/2014:13:42:42 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurrentClassPlanVO HTTP/1.1,204,-
    3. 127.0.0.1,-,-,[08/May/2014:13:42:42 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassPlanVO HTTP/1.1,204,-
    4. 127.0.0.1,-,-,[08/May/2014:13:42:47 +0800],GET /jygl/jaxrs/right/isValidUserByType/1-admin-superadmin HTTP/1.1,200,20
    5. 127.0.0.1,-,-,[08/May/2014:13:42:47 +0800],GET /jygl/jaxrs/right/getUserByLoginName/admin HTTP/1.1,200,198
    6. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/right_login2home?loginName=admin&password=superadmin&type=1 HTTP/1.1,200,2525
    7. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/style/style.css HTTP/1.1,304,-
    8. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/js/tree.js HTTP/1.1,304,-
    9. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/js/jquery.js HTTP/1.1,304,-
    10. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/js/frame.js HTTP/1.1,304,-
    11. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/images/logo.png HTTP/1.1,304,-
    12. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/images/leftmenu_bg.gif HTTP/1.1,404,1105
    13. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/menuList.jsp HTTP/1.1,200,47603
    14. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/style/images/header_bg.jpg HTTP/1.1,304,-
    15. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/images/allmenu.gif HTTP/1.1,404,1093
    16. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/images/toggle_menu.gif HTTP/1.1,404,1105
    17. 127.0.0.1,-,-,[08/May/2014:13:42:48 +0800],GET /jygl/jaxrs/article/getArticleList/10-1 HTTP/1.1,200,20913
    18. 127.0.0.1,-,-,[08/May/2014:13:42:48 +0800],GET /jygl/jaxrs/article/getTotalArticleRecords HTTP/1.1,200,22
    19. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:48 +0800],GET /jyglFront/baseInfo_articleList?flag=1 HTTP/1.1,200,8989
    20. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:48 +0800],GET /jyglFront/mainView/studentView/style/images/nav_10.png HTTP/1.1,404,1117
    21. 127.0.0.1,-,-,[08/May/2014:13:43:21 +0800],GET /jygl/jaxrs/right/isValidUserByType/1-admin-superadmin HTTP/1.1,200,20
    22. 127.0.0.1,-,-,[08/May/2014:13:43:21 +0800],GET /jygl/jaxrs/right/getUserByLoginName/admin HTTP/1.1,200,198
    23. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/right_login2home?loginName=admin&password=superadmin&type=1 HTTP/1.1,200,2525
    24. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/js/tree.js HTTP/1.1,304,-
    25. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/js/jquery.js HTTP/1.1,304,-
    26. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/js/frame.js HTTP/1.1,304,-
    27. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/style/style.css HTTP/1.1,304,-
    28. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/menuList.jsp HTTP/1.1,200,47603
    29. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/images/logo.png HTTP/1.1,304,-
    30. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/images/leftmenu_bg.gif HTTP/1.1,404,1105
    31. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/images/toggle_menu.gif HTTP/1.1,404,1105
    32. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/style/images/header_bg.jpg HTTP/1.1,304,-
    33. 127.0.0.1,-,-,[08/May/2014:13:43:21 +0800],GET /jygl/jaxrs/article/getArticleList/10-1 HTTP/1.1,200,20913
    34. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/images/allmenu.gif HTTP/1.1,404,1093
    35. 127.0.0.1,-,-,[08/May/2014:13:43:21 +0800],GET /jygl/jaxrs/article/getTotalArticleRecords HTTP/1.1,200,22
    36. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/baseInfo_articleList?flag=1 HTTP/1.1,200,8989
    37. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/studentView/style/images/nav_10.png HTTP/1.1,404,1117
    38. 127.0.0.1,-,-,[08/May/2014:13:43:25 +0800],GET /jygl/jaxrs/graduate/graduateBatchService/getGraduateBatchByConditions?graduateBatchName=&pageSize=10&pageNo=1 HTTP/1.1,200,597
    39. 127.0.0.1,-,-,[08/May/2014:13:43:25 +0800],GET /jygl/jaxrs/graduate/graduateBatchService/getTotalGraduateBatchByCondition?graduateBatchName= HTTP/1.1,200,21
    40. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:26 +0800],GET /jyglFront/graduate_initGraduateBatch HTTP/1.1,200,8766
    41. 127.0.0.1,-,-,[08/May/2014:13:43:27 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllStudyCenters HTTP/1.1,200,29089
    42. 127.0.0.1,-,-,[08/May/2014:13:43:27 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllGradeInfo HTTP/1.1,200,3785
    43. 127.0.0.1,-,-,[08/May/2014:13:43:27 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
    44. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:28 +0800],GET /jyglFront/graduate_initGraduateQulifyCheck HTTP/1.1,200,26397
    45. 127.0.0.1,-,-,[08/May/2014:13:43:29 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllStudyCenters HTTP/1.1,200,29089
    46. 127.0.0.1,-,-,[08/May/2014:13:43:29 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllGradeInfo HTTP/1.1,200,3785
    47. 127.0.0.1,-,-,[08/May/2014:13:43:29 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
    48. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:29 +0800],GET /jyglFront/graduate_initLeaveSchoolInfo HTTP/1.1,200,20125
    49. 127.0.0.1,-,-,[08/May/2014:13:43:30 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllStudyCenters HTTP/1.1,200,29089
    50. 127.0.0.1,-,-,[08/May/2014:13:43:31 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllGradeInfo HTTP/1.1,200,3785
    51. 127.0.0.1,-,-,[08/May/2014:13:43:31 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
    52. 127.0.0.1,-,-,[08/May/2014:13:43:31 +0800],GET /jygl/jaxrs/graduate/graduateBatchService/getAllGraduateBatch HTTP/1.1,200,597
    53. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:31 +0800],GET /jyglFront/graduate_initGraduateInfo HTTP/1.1,200,28464
    54. 127.0.0.1,-,-,[08/May/2014:14:27:10 +0800],GET / HTTP/1.1,200,11444
    55. 127.0.0.1,-,-,[08/May/2014:14:27:12 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurrentClassPlanVO HTTP/1.1,204,-
    56. 127.0.0.1,-,-,[08/May/2014:14:27:12 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassPlanVO HTTP/1.1,204,-
    57. 127.0.0.1,-,-,[08/May/2014:14:27:34 +0800],GET /jygl/jaxrs/exam/examArrangeService/getExamBatchIdByLatest HTTP/1.1,200,43
    58. 127.0.0.1,-,-,[08/May/2014:14:27:34 +0800],GET /jygl/jaxrs/exam/examArrangeService/getExamBatchNameByEBId/4af2a0424323412e014327739b1702bd HTTP/1.1,200,16
    59. 127.0.0.1,-,-,[08/May/2014:14:27:35 +0800],GET /jygl/jaxrs/exam/examSubscribeService/getUtilObjectThirExamBatchsByEBNN/201403 HTTP/1.1,200,653
    60. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:35 +0800],GET /jyglFront/exam_initgroupsubscribestatistic HTTP/1.1,200,13551
    61. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:37 +0800],GET /jyglFront/exam_initsubstudentsubscribe HTTP/1.1,500,3900
    62. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:41 +0800],GET /jyglFront/supervisor/intoInitAssignmentDetail HTTP/1.1,200,1808
    63. 127.0.0.1,-,-,[08/May/2014:14:27:42 +0800],GET /jygl/jaxrs/right/isValidUserByType/1-admin-superadmin HTTP/1.1,200,20
    64. 127.0.0.1,-,-,[08/May/2014:14:27:42 +0800],GET /jygl/jaxrs/right/getUserByLoginName/admin HTTP/1.1,200,198
    65. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/right_login2home?loginName=admin&password=superadmin&type=1 HTTP/1.1,200,2525
    66. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/js/tree.js HTTP/1.1,304,-
    67. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/style/style.css HTTP/1.1,304,-
    68. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/js/frame.js HTTP/1.1,304,-
    69. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/js/jquery.js HTTP/1.1,304,-
    70. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/menuList.jsp HTTP/1.1,200,47603
    71. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/images/leftmenu_bg.gif HTTP/1.1,404,1105
    72. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/images/allmenu.gif HTTP/1.1,404,1093
    73. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/images/logo.png HTTP/1.1,304,-
    74. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/style/images/header_bg.jpg HTTP/1.1,304,-
    75. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/images/toggle_menu.gif HTTP/1.1,404,1105
    76. 127.0.0.1,-,-,[08/May/2014:14:27:42 +0800],GET /jygl/jaxrs/article/getArticleList/10-1 HTTP/1.1,200,20913
    77. 127.0.0.1,-,-,[08/May/2014:14:27:42 +0800],GET /jygl/jaxrs/article/getTotalArticleRecords HTTP/1.1,200,22
    78. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/baseInfo_articleList?flag=1 HTTP/1.1,200,8989
    79. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:43 +0800],GET /jyglFront/mainView/studentView/style/images/nav_10.png HTTP/1.1,404,1117
    80. 127.0.0.1,-,-,[08/May/2014:14:27:44 +0800],GET /jygl/jaxrs/nationInfo/getAllNationInPage?pageSize=10&pageNo=1 HTTP/1.1,200,374
    81. 127.0.0.1,-,-,[08/May/2014:14:27:44 +0800],GET /jygl/jaxrs/nationInfo/getTotalNations HTTP/1.1,200,22
    82. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:44 +0800],GET /jyglFront/baseInfo_nationInfoList HTTP/1.1,200,7471
    83. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:44 +0800],GET /jyglFront/common/css/menuStyle2.css HTTP/1.1,404,1060
    84. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:44 +0800],GET /jyglFront/common/css/basic.css HTTP/1.1,200,1476
    85. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:45 +0800],GET /jyglFront/common/css/_images/botton2.gif HTTP/1.1,404,1075
    86. 127.0.0.1,-,-,[08/May/2014:14:27:47 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
    87. 127.0.0.1,-,-,[08/May/2014:14:27:47 +0800],GET /jygl/jaxrs/enroll/gradeInfoService/allGradeInfos HTTP/1.1,200,3785
    88. 127.0.0.1,-,-,[08/May/2014:14:27:47 +0800],GET /jygl/jaxrs/teaching/teachingPlanService/getSpeicalListByTwo?gradeID=&educationLevelID= HTTP/1.1,200,12061
    89. 127.0.0.1,-,-,[08/May/2014:14:27:47 +0800],GET /jygl/jaxrs/enroll/studyCenterService/allStudyCentersByUtilObject HTTP/1.1,200,6006
    90. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:48 +0800],GET /jyglFront/teaching/openReplaceChooseCourse HTTP/1.1,200,26455
    91. 127.0.0.1,-,-,[08/May/2014:14:27:49 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassBatchPlanVOList?newClassBatchName=&gradeName=&term=-1 HTTP/1.1,204,-
    92. 127.0.0.1,-,-,[08/May/2014:14:27:49 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassBatchPlanVOList?newClassBatchName=&gradeName=&term=-1 HTTP/1.1,204,-
    93. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:49 +0800],GET /jyglFront/teaching/openChooseCourse HTTP/1.1,200,1611
    94. 127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/enroll/gradeInfoService/currentGradeInfo HTTP/1.1,200,473
    95. 127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
    96. 127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/enroll/gradeInfoService/allGradeInfos HTTP/1.1,200,3785
    97. 127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/teaching/teachingPlanService/hasTeachingPlanInGrade?gradeId=4af2a042437c2c0801437ed1cdea0017 HTTP/1.1,200,20
    98. 127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/teaching/teachingPlanService/hasTeachingPlanInGrade?gradeId=4af2a0423f41d66d013f5a1f766c00ce HTTP/1.1,200,20
    99. 127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/teaching/teachingPlanService/teachingPlanListByEducationLevelAndGradeId?grade=4af2a042437c2c0801437ed1cdea0017&educationLevel= HTTP/1.1,200,4849
    100. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:52 +0800],GET /jyglFront/teaching/teachingPlanList HTTP/1.1,200,22794
    101. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:52 +0800],GET /jyglFront/js/jquery.form.js HTTP/1.1,200,30330
    102. 127.0.0.1,-,-,[08/May/2014:14:28:02 +0800],GET /jygl/jaxrs/exam/examArrangeService/getExamBatchIdByLatest HTTP/1.1,200,43
    103. 127.0.0.1,-,-,[08/May/2014:14:28:02 +0800],GET /jygl/jaxrs/exam/examArrangeService/getExamBatchNameByEBId/4af2a0424323412e014327739b1702bd HTTP/1.1,200,16
    104. 127.0.0.1,-,-,[08/May/2014:14:28:02 +0800],GET /jygl/jaxrs/exam/examSubscribeService/getUtilObjectThirExamBatchsByEBNN/201403 HTTP/1.1,200,653
    105. 0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:28:02 +0800],GET /jyglFront/exam_initgroupsubscribestatistic HTTP/1.1,200,13551
    106. 127.0.0.1,-,-,[08/May/2014:14:28:19 +0800],POST /jygl/jaxrs/right/addUserLog HTTP/1.1,200,-
    107. 127.0.0.1,-,-,[08/May/2014:14:31:42 +0800],GET /jygl/jaxrs/exam/examSubscribeService/groupSubscribe/201403/0/0/201309/1 HTTP/1.1,200,-

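    2.      Problem to solve: count how many times each resource (URL) is accessed.

    3.      Implementation

           Idea: parse the Tomcat log. The map step extracts the URL from each log line and emits it as the key with a value of 1 (one access); the reduce step merges the records that share a key and outputs the total count as the value.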
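           The URL extraction at the heart of the map step can be tried in isolation with plain Java. The sketch below assumes the comma-separated layout of the sample log above, where the fifth field is the request line (for example "GET /path HTTP/1.1"); the class UrlExtractSketch and the method extractUrl are hypothetical names, and malformed lines simply yield null here.

```java
// Hypothetical helper showing how a URL can be cut out of one log line (no Hadoop needed).
public class UrlExtractSketch {

    /** Returns the request URL of one log line, or null if the line does not match the layout. */
    static String extractUrl(String line) {
        String[] fields = line.split(",");
        if (fields.length < 5) {
            return null; // too few fields
        }
        String request = fields[4]; // e.g. "GET /path HTTP/1.1"
        int first = request.indexOf(' ');
        int last = request.lastIndexOf(' ');
        if (first < 0 || first == last) {
            return null; // request line lacks the "METHOD URL PROTOCOL" shape
        }
        // The URL is everything between the first and the last space.
        return request.substring(first + 1, last);
    }

    public static void main(String[] args) {
        String line = "127.0.0.1,-,-,[08/May/2014:13:42:47 +0800],"
                + "GET /jygl/jaxrs/right/getUserByLoginName/admin HTTP/1.1,200,198";
        System.out.println(extractUrl(line)); // prints /jygl/jaxrs/right/getUserByLoginName/admin
    }
}
```

           Splitting on commas is enough for this sample data; a URL that itself contains a comma would shift the fields, so a production parser would need a stricter approach.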

    The code is as follows:

    package org.ly.ccnu;

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class SecondTest extends Configured implements Tool {

        // Counts the input lines that were skipped because they could not be parsed.
        enum Counter {
            LINESKIP
        }

        // Emits (requestUrl, 1) for every log line.
        public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);

            @Override
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String line = value.toString();
                try {
                    // The fifth field is the request line, e.g. "GET /path HTTP/1.1".
                    String[] lineSplit = line.split(",");
                    String requestUrl = lineSplit[4];
                    // Keep only the URL between the first and the last space.
                    requestUrl = requestUrl.substring(requestUrl.indexOf(' ') + 1, requestUrl.lastIndexOf(' '));
                    context.write(new Text(requestUrl), ONE);
                } catch (IndexOutOfBoundsException e) {
                    // Malformed line (too few fields, or no spaces in the request): count it and move on.
                    context.getCounter(Counter.LINESKIP).increment(1);
                }
            }
        }

        // Sums the counts of each URL.
        public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int count = 0;
                for (IntWritable v : values) {
                    // Add the value itself instead of 1 so the result stays correct
                    // even if the values have been pre-aggregated.
                    count += v.get();
                }
                context.write(key, new IntWritable(count));
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            Configuration conf = getConf();
            // Job.getInstance replaces the deprecated Job constructor.
            Job job = Job.getInstance(conf, "logAnalysis");
            job.setJarByClass(SecondTest.class);
            // args[0] is the input path, args[1] the output path (which must not exist yet).
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            // Output key/value types; they must match what Map and Reduce emit.
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int res = ToolRunner.run(new Configuration(), new SecondTest(), args);
            System.exit(res);
        }
    }

    4.      Results

    1. / 2
    2. /jygl/jaxrs/article/getArticleList/10-1 3
    3. /jygl/jaxrs/article/getTotalArticleRecords 3
    4. /jygl/jaxrs/enroll/educationLevelService/allEducationLevels 5
    5. /jygl/jaxrs/enroll/gradeInfoService/allGradeInfos 2
    6. /jygl/jaxrs/enroll/gradeInfoService/currentGradeInfo 1
    7. /jygl/jaxrs/enroll/studyCenterService/allStudyCentersByUtilObject 1
    8. /jygl/jaxrs/exam/examArrangeService/getExamBatchIdByLatest 2
    9. /jygl/jaxrs/exam/examArrangeService/getExamBatchNameByEBId/4af2a0424323412e014327739b1702bd 2
    10. /jygl/jaxrs/exam/examParameterService/getAllGradeInfo 3
    11. /jygl/jaxrs/exam/examParameterService/getAllStudyCenters 3
    12. /jygl/jaxrs/exam/examSubscribeService/getUtilObjectThirExamBatchsByEBNN/201403 2
    13. /jygl/jaxrs/exam/examSubscribeService/groupSubscribe/201403/0/0/201309/1 1
    14. /jygl/jaxrs/graduate/graduateBatchService/getAllGraduateBatch 1
    15. /jygl/jaxrs/graduate/graduateBatchService/getGraduateBatchByConditions?graduateBatchName=&pageSize=10&pageNo=1 1
    16. /jygl/jaxrs/graduate/graduateBatchService/getTotalGraduateBatchByCondition?graduateBatchName= 1
    17. /jygl/jaxrs/nationInfo/getAllNationInPage?pageSize=10&pageNo=1 1
    18. /jygl/jaxrs/nationInfo/getTotalNations 1
    19. /jygl/jaxrs/right/addUserLog 1
    20. /jygl/jaxrs/right/getUserByLoginName/admin 3
    21. /jygl/jaxrs/right/isValidUserByType/1-admin-superadmin 3
    22. /jygl/jaxrs/teaching/teachingPlanService/getSpeicalListByTwo?gradeID=&educationLevelID= 1
    23. /jygl/jaxrs/teaching/teachingPlanService/hasTeachingPlanInGrade?gradeId=4af2a0423f41d66d013f5a1f766c00ce 1
    24. /jygl/jaxrs/teaching/teachingPlanService/hasTeachingPlanInGrade?gradeId=4af2a042437c2c0801437ed1cdea0017 1
    25. /jygl/jaxrs/teaching/teachingPlanService/teachingPlanListByEducationLevelAndGradeId?grade=4af2a042437c2c0801437ed1cdea0017&educationLevel= 1
    26. /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassBatchPlanVOList?newClassBatchName=&gradeName=&term=-1 2
    27. /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassPlanVO 2
    28. /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurrentClassPlanVO 2
    29. /jyglFront/baseInfo_articleList?flag=1 3
    30. /jyglFront/baseInfo_nationInfoList 1
    31. /jyglFront/common/css/_images/botton2.gif 1
    32. /jyglFront/common/css/basic.css 1
    33. /jyglFront/common/css/menuStyle2.css 1
    34. /jyglFront/exam_initgroupsubscribestatistic 2
    35. /jyglFront/exam_initsubstudentsubscribe 1
    36. /jyglFront/graduate_initGraduateBatch 1
    37. /jyglFront/graduate_initGraduateInfo 1
    38. /jyglFront/graduate_initGraduateQulifyCheck 1
    39. /jyglFront/graduate_initLeaveSchoolInfo 1
    40. /jyglFront/js/jquery.form.js 1
    41. /jyglFront/mainView/navigate/images/allmenu.gif 3
    42. /jyglFront/mainView/navigate/images/leftmenu_bg.gif 3
    43. /jyglFront/mainView/navigate/images/logo.png 3
    44. /jyglFront/mainView/navigate/images/toggle_menu.gif 3
    45. /jyglFront/mainView/navigate/js/frame.js 3
    46. /jyglFront/mainView/navigate/js/jquery.js 3
    47. /jyglFront/mainView/navigate/js/tree.js 3
    48. /jyglFront/mainView/navigate/menuList.jsp 3
    49. /jyglFront/mainView/navigate/style/images/header_bg.jpg 3
    50. /jyglFront/mainView/navigate/style/style.css 3
    51. /jyglFront/mainView/studentView/style/images/nav_10.png 3
    52. /jyglFront/right_login2home?loginName=admin&password=superadmin&type=1 3
    53. /jyglFront/supervisor/intoInitAssignmentDetail 1
    54. /jyglFront/teaching/openChooseCourse 1
    55. /jyglFront/teaching/openReplaceChooseCourse 1
    56. /jyglFront/teaching/teachingPlanList 1
  • Original address: https://www.cnblogs.com/boonya/p/9513059.html