I. How Nutch Implements Logging
1. Nutch uses slf4j as its logging interface and log4j as the concrete implementation. For background on the two, see:
http://blog.csdn.net/jediael_lu/article/details/43854571
http://blog.csdn.net/jediael_lu/article/details/43865571
2. In a Java class file, log messages are written as follows:
(1) Obtain a Logger object
public static final Logger LOG = LoggerFactory.getLogger(InjectorJob.class);
(2) Write output through the Logger
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
long start = System.currentTimeMillis();
LOG.info("InjectorJob: starting at " + sdf.format(start));
3. Define the logging properties in log4j.properties
# Define some default values that can be overridden by system properties
hadoop.log.dir=.
hadoop.log.file=hadoop.log

# RootLogger - DailyRollingFileAppender
log4j.rootLogger=INFO,DRFA

# Logging Threshold
log4j.threshold=ALL

#special logging requirements for some commandline tools
log4j.logger.org.apache.nutch.crawl.Crawl=INFO,cmdstdout
log4j.logger.org.apache.nutch.crawl.InjectorJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.host.HostInjectorJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.crawl.GeneratorJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.crawl.DbUpdaterJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.host.HostDbUpdateJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.fetcher.FetcherJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.parse.ParserJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.indexer.IndexingJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.indexer.DeleteDuplicates=INFO,cmdstdout
log4j.logger.org.apache.nutch.indexer.CleaningJob=INFO,cmdstdout
log4j.logger.org.apache.nutch.crawl.WebTableReader=INFO,cmdstdout
log4j.logger.org.apache.nutch.host.HostDbReader=INFO,cmdstdout
log4j.logger.org.apache.nutch.parse.ParserChecker=INFO,cmdstdout
log4j.logger.org.apache.nutch.indexer.IndexingFiltersChecker=INFO,cmdstdout
log4j.logger.org.apache.nutch.plugin.PluginRepository=WARN
log4j.logger.org.apache.nutch.api.NutchServer=INFO,cmdstdout

log4j.logger.org.apache.nutch=INFO
log4j.logger.org.apache.hadoop=WARN
log4j.logger.org.apache.zookeeper=WARN
log4j.logger.org.apache.gora=WARN

#
# Daily Rolling File Appender
#

log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}

# Rollover at midnight
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd

# 30-day backup
#log4j.appender.DRFA.MaxBackupIndex=30
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout

# Pattern format: Date LogLevel LoggerName LogMessage
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
# Debugging Pattern format: Date LogLevel LoggerName (FileName:MethodName:LineNo) LogMessage
#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n

#
# stdout
# Add *stdout* to rootlogger above if you want to use this
#

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n

#
# plain layout used for commandline tools to output to console
#

log4j.appender.cmdstdout=org.apache.log4j.ConsoleAppender
log4j.appender.cmdstdout.layout=org.apache.log4j.PatternLayout
log4j.appender.cmdstdout.layout.ConversionPattern=%m%n

#
# Rolling File Appender
#

#log4j.appender.RFA=org.apache.log4j.RollingFileAppender
#log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}

# Logfile size and 30-day backups
#log4j.appender.RFA.MaxFileSize=1MB
#log4j.appender.RFA.MaxBackupIndex=30

#log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
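To make the DRFA conversion pattern %d{ISO8601} %-5p %c{2} - %m%n more concrete, the following is a minimal sketch that imitates its output by hand using only the JDK. The class PatternSketch and its methods are hypothetical names invented for this illustration; they are not part of Nutch or log4j, and the real formatting is done by log4j's PatternLayout.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper, not part of Nutch or log4j: it imitates by hand what
// the pattern "%d{ISO8601} %-5p %c{2} - %m%n" produces for one log event.
class PatternSketch {

    // %c{2}: keep only the last two dot-separated parts of the logger name.
    static String lastTwoComponents(String loggerName) {
        int last = loggerName.lastIndexOf('.');
        if (last < 0) {
            return loggerName;
        }
        int prev = loggerName.lastIndexOf('.', last - 1);
        return prev < 0 ? loggerName : loggerName.substring(prev + 1);
    }

    static String render(long timeMillis, String level, String loggerName,
                         String message, TimeZone zone) {
        // log4j 1.x prints %d{ISO8601} as "yyyy-MM-dd HH:mm:ss,SSS".
        SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
        iso.setTimeZone(zone);
        // %-5p: level name left-justified and padded to five characters.
        String paddedLevel = String.format("%-5s", level);
        // %m is the message, %n the platform line separator.
        return iso.format(new Date(timeMillis)) + " " + paddedLevel + " "
                + lastTwoComponents(loggerName) + " - " + message
                + System.lineSeparator();
    }

    public static void main(String[] args) {
        System.out.print(render(System.currentTimeMillis(), "INFO",
                "org.apache.nutch.crawl.InjectorJob",
                "InjectorJob: starting", TimeZone.getDefault()));
    }
}
```

With this pattern, a message logged by org.apache.nutch.crawl.InjectorJob appears in the log file with the shortened logger name crawl.InjectorJob, while the cmdstdout appender (pattern %m%n) prints the message text alone.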
II. Analysis of the Nutch Logging Setup
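1. Nutch log output goes through two appenders: cmdstdout and DRFA. (A third appender, stdout, is defined but not attached to any logger.)
The former writes log messages to standard output; the latter writes them to a log file that rolls over once a day.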
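2. The root logger, and therefore the project-wide default, is set to INFO with the DRFA appender.
Nutch's command-line tool classes (InjectorJob, GeneratorJob, FetcherJob, and so on) are redefined as INFO,cmdstdout. Because log4j loggers are additive by default, these classes write to the console and also to the DRFA log file. All other org.apache.nutch classes stay at INFO and log only to DRFA.
hadoop, gora, and zookeeper are redefined as WARN and inherit the DRFA appender from the root logger. The default log file is ./hadoop.log, unless the hadoop.log.dir and hadoop.log.file system properties override it.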
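The way levels and appenders are inherited can be sketched with a small model. This is only an illustration under stated assumptions: LoggerHierarchySketch and its methods are hypothetical names, the table holds just a few entries copied from the log4j.properties above, and additivity is assumed to be left at log4j's default (true). It is not how log4j itself is implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not log4j code: resolves the effective level and the
// appenders of a logger name, assuming additivity stays at its default (true).
class LoggerHierarchySketch {
    // logger name -> {level, appender or null}; "" stands for the root logger.
    static final Map<String, String[]> CONFIG = new LinkedHashMap<>();
    static {
        CONFIG.put("", new String[] {"INFO", "DRFA"});
        CONFIG.put("org.apache.nutch", new String[] {"INFO", null});
        CONFIG.put("org.apache.nutch.crawl.InjectorJob", new String[] {"INFO", "cmdstdout"});
        CONFIG.put("org.apache.nutch.plugin.PluginRepository", new String[] {"WARN", null});
        CONFIG.put("org.apache.hadoop", new String[] {"WARN", null});
    }

    // Chain of names from the logger itself up to the root ("").
    static List<String> ancestry(String name) {
        List<String> chain = new ArrayList<>();
        String current = name;
        while (!current.isEmpty()) {
            chain.add(current);
            int dot = current.lastIndexOf('.');
            current = dot < 0 ? "" : current.substring(0, dot);
        }
        chain.add("");
        return chain;
    }

    // The nearest configured ancestor (or the logger itself) decides the level.
    static String effectiveLevel(String name) {
        for (String n : ancestry(name)) {
            String[] entry = CONFIG.get(n);
            if (entry != null && entry[0] != null) {
                return entry[0];
            }
        }
        throw new IllegalStateException("root logger must define a level");
    }

    // With additivity, appenders of every ancestor are collected, not replaced.
    static List<String> appenders(String name) {
        List<String> result = new ArrayList<>();
        for (String n : ancestry(name)) {
            String[] entry = CONFIG.get(n);
            if (entry != null && entry[1] != null) {
                result.add(entry[1]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.nutch.crawl.InjectorJob",
            "org.apache.nutch.util.NutchJob",
            "org.apache.hadoop.mapred.JobClient"
        };
        for (String name : names) {
            System.out.println(name + " -> " + effectiveLevel(name) + " " + appenders(name));
        }
    }
}
```

In this model InjectorJob resolves to INFO with both cmdstdout and DRFA, an ordinary Nutch class to INFO with DRFA only, and a Hadoop class to WARN with DRFA only, which matches the analysis above.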