• Analyzing Hadoop daemon logs with Hive


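    I wanted to load the Hadoop daemon logs into a Hive table for analysis, so I tried the following.

    Parsing the Hadoop daemon logs
    A regular expression extracts four fields from each line: the date and time, the log level, the class, and the detailed message.
    A Hive table built on those fields then makes the logs easy to query. (The table definition below stores the date and the time in two separate columns.)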

    2015-12-18 22:23:23,357 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 32652 for container-id container_1448915696877_26289_01_000158: 110.6 MB of 2 GB physical memory used; 2.1 GB of 4.2 GB virtual memory used
    2015-12-18 22:23:23,426 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 32615 for container-id container_1448915696877_26289_01_000102: 104.6 MB of 2 GB physical memory used; 2.1 GB of 4.2 GB virtual memory used
    2015-12-18 22:23:23,467 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Uncaught exception in ContainerMemoryManager while managing memory of container_1448915696877_26289_01_000270
    java.lang.IllegalArgumentException: disparate values
            at sun.misc.FDBigInt.quoRemIteration(FloatingDecimal.java:2931)
            at sun.misc.FormattedFloatingDecimal.dtoa(FormattedFloatingDecimal.java:922)
            at sun.misc.FormattedFloatingDecimal.<init>(FormattedFloatingDecimal.java:542)
            at java.util.Formatter$FormatSpecifier.print(Formatter.java:3264)
            at java.util.Formatter$FormatSpecifier.print(Formatter.java:3202)
            at java.util.Formatter$FormatSpecifier.printFloat(Formatter.java:2769)
            at java.util.Formatter$FormatSpecifier.print(Formatter.java:2720)
            at java.util.Formatter.format(Formatter.java:2500)
            at java.util.Formatter.format(Formatter.java:2435)
            at java.lang.String.format(String.java:2148)
            at org.apache.hadoop.util.StringUtils.format(StringUtils.java:123)
            at org.apache.hadoop.util.StringUtils$TraditionalBinaryPrefix.long2String(StringUtils.java:758)
            at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.formatUsageString(ContainersMonitorImpl.java:487)
            at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:399)
    2015-12-18 22:23:23,498 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Uncaught exception in ContainerMemoryManager while managing memory of container_1448915696877_26289_01_000214
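
    Before creating the table, the pattern can be tried on one of the sample lines above. This is only a quick sketch in Python: it reuses the regular expression from the table definition that follows, written as a Python raw string, and the variable names simply mirror the column names.

```python
import re

# Same pattern as the table's input.regex, written as a Python raw string.
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2})\s+(\d{2}.\d{2}.\d{2}.\d{3})\s+(\S+)\s+(\S+)\s+(.*)$"
)

# One of the WARN lines from the sample log, split only for readability.
sample = (
    "2015-12-18 22:23:23,467 WARN "
    "org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor."
    "ContainersMonitorImpl: Uncaught exception in ContainerMemoryManager "
    "while managing memory of container_1448915696877_26289_01_000270"
)

match = LOG_LINE.match(sample)
date1, time1, msgtype, classname, msgtext = match.groups()

print(date1)      # 2015-12-18
print(time1)      # 22:23:23,467
print(msgtype)    # WARN
print(classname)  # the class name, with its trailing colon kept by (\S+)
print(msgtext)    # everything after the class name
```

    Note that the class name column keeps the trailing colon, because the fourth group matches every non-space character up to the next space.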

    DROP TABLE IF EXISTS hadoop_log;

    -- Backslashes in input.regex are doubled because Hive unescapes string literals
    -- before the pattern reaches the SerDe.
    CREATE TABLE hadoop_log (
      date1 STRING,
      time1 STRING,
      msgtype STRING,
      classname STRING,
      msgtext STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
      "input.regex" = "^(\\d{4}-\\d{2}-\\d{2})\\s+(\\d{2}.\\d{2}.\\d{2}.\\d{3})\\s+(\\S+)\\s+(\\S+)\\s+(.*)$",
      "output.format.string" = "%1$s %2$s %3$s %4$s %5$s"
    )
    STORED AS TEXTFILE;

    LOAD DATA LOCAL INPATH "/home/student/hadooplog" INTO TABLE hadoop_log;

    SELECT date1, time1, msgtext FROM hadoop_log WHERE msgtype='ERROR' OR msgtype='WARN' LIMIT 5;

    LOAD DATA LOCAL INPATH "/home/student/hadooplog3" OVERWRITE INTO TABLE hadoop_log;

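    One thing to note: Hive uses \n as the row delimiter, so the original log file has to be preprocessed first. When an exception or error is logged, the entry spans several lines (a stack trace, for example), and without preprocessing each of those extra lines shows up in Hive as a mostly empty record.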
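
    To get a feel for how many such rows a file would produce, the lines that do not start with a timestamp can be counted. The sketch below is only an illustration: split_counts is a hypothetical helper, the sample lines are shortened versions of the log excerpt above (the class name Foo is a placeholder), and it rests on my understanding that RegexSerDe returns NULL columns for any row that does not match the pattern.

```python
import re

# Assumption: a record starts with "YYYY-MM-DD HH:MM:SS,mmm" followed by a space.
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s")


def split_counts(lines):
    """Return (lines that start a record, continuation lines)."""
    starts = sum(1 for line in lines if RECORD_START.match(line))
    return starts, len(lines) - starts


# Shortened stand-ins for the sample log lines; "Foo" is a placeholder class name.
sample = [
    "2015-12-18 22:23:23,467 WARN Foo: Uncaught exception in ContainerMemoryManager",
    "java.lang.IllegalArgumentException: disparate values",
    "        at sun.misc.FDBigInt.quoRemIteration(FloatingDecimal.java:2931)",
    "2015-12-18 22:23:23,498 WARN Foo: Uncaught exception in ContainerMemoryManager",
]

print(split_counts(sample))  # (2, 2)
```

    Here two of the four lines are continuation lines, so loading this sample unchanged would give two usable rows and two rows without usable fields.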

    A short bash or Python script can do the preprocessing. Below is what I wrote while just starting to learn Python; it is quite crude.

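    import re

    # A new record starts with "date time LEVEL"; any other line continues the previous record.
    p = re.compile(r"^\d{4}-\d{2}-\d{2}\s+\d{2}.\d{2}.\d{2}.\d{3}\s+(?:INFO|WARN|ERROR|DEBUG)")
    record = ""  # record being assembled (renamed from str, which shadows the built-in)
    with open('/app/cdh23502/logs/hadoop-student-datanode-nn1.log', 'r') as f, \
            open('/home/student/hadooplog4', 'w') as f2:
        for l in f:
            if record == "":
                record = l.rstrip()
                continue
            if p.match(l):
                # The next record begins: write out the one collected so far.
                print(record)
                f2.write(record + "\n")
                record = l.rstrip()
            else:
                # Continuation line (e.g. a stack trace): fold it into the current record,
                # separated by a space so that words do not run together.
                record = record + " " + l.strip()
        # Write out the last record.
        if record != "":
            print(record)
            f2.write(record + "\n")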
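
    To try the merging logic without touching the real log files, the same idea can be wrapped in a function that accepts any iterable of lines. This is a sketch rather than the script I actually ran: merge_records is a hypothetical name, the record-start pattern assumes the timestamp layout shown in the sample log, and the sample lines are shortened (Foo is a placeholder class name).

```python
import re

# Assumed record-start pattern: timestamp followed by one of the log levels.
RECORD_START = re.compile(
    r"^\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s+(?:INFO|WARN|ERROR|DEBUG)\b"
)


def merge_records(lines):
    """Yield one string per log record, folding continuation lines into it."""
    current = None
    for line in lines:
        line = line.rstrip()
        if RECORD_START.match(line) or current is None:
            # A new record begins: emit the previous one, if any.
            if current is not None:
                yield current
            current = line
        else:
            # Continuation line: append it to the current record on the same line.
            current += " " + line.strip()
    if current is not None:
        yield current


# Shortened stand-ins for the sample log lines; "Foo" is a placeholder class name.
sample = [
    "2015-12-18 22:23:23,467 WARN Foo: Uncaught exception\n",
    "java.lang.IllegalArgumentException: disparate values\n",
    "\tat java.lang.String.format(String.java:2148)\n",
    "2015-12-18 22:23:23,498 WARN Foo: Uncaught exception\n",
]

merged = list(merge_records(sample))
print(len(merged))  # 2
for record in merged:
    print(record)
```

    Because the function is a generator, it can be fed an open file object directly and its results written out one record at a time, so the whole log never has to sit in memory.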
  • Original article: https://www.cnblogs.com/huaxiaoyao/p/5066454.html