• Nutch 1-build

    1. install software

    Cygwin, JDK, Ant, Nutch
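A quick way to confirm that the installed tools are reachable from the Cygwin shell (once step 2 has put them on PATH) is a small helper like the one below. `check_tools` is a hypothetical name, not part of Cygwin or Nutch. It only relies on the shell builtin `command -v`.

```shell
# check_tools is a hypothetical helper, not part of Cygwin or Nutch.
# It prints one line per command and returns 1 if any is missing from PATH.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Typical call inside Cygwin once PATH is set:
# check_tools java javac ant
```

A "missing" line for `java` or `ant` usually means the PATH setting has not been picked up yet.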

    2. configure

    • environment variables

    JAVA_HOME = C:\PROGRA~1\Java\jdk1.7.0_45

    ANT_HOME = C:\PROGRA~1\Ant\apache-ant-1.9.3

    PATH = ...
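The variables above are set in the Windows system settings. If you would rather keep them in the Cygwin shell, the sketch below shows equivalent lines for ~/.bashrc. The install locations are assumptions: they are the Windows values above rewritten as Cygwin /cygdrive paths. The PATH line is only a common choice, since the original elides its PATH value.

```shell
# Hypothetical ~/.bashrc lines for Cygwin. The install locations are
# assumptions copied from the Windows settings above; edit them to match.
# The 8.3 short name PROGRA~1 keeps spaces out of the paths.
export JAVA_HOME="/cygdrive/c/PROGRA~1/Java/jdk1.7.0_45"
export ANT_HOME="/cygdrive/c/PROGRA~1/Ant/apache-ant-1.9.3"

# Put the JDK and Ant launchers in front of the existing PATH.
export PATH="$JAVA_HOME/bin:$ANT_HOME/bin:$PATH"
```

Open a new Cygwin window (or run `source ~/.bashrc`) for the change to take effect. Keeping space-free short names is a cautious choice, because paths containing spaces tend to trip up shell launcher scripts.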

    • copy source file

    copy the apache-nutch-2.2.1-src folder into the Cygwin home directory
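The copy can also be done from the Cygwin shell. The helper below is a hypothetical sketch: `copy_nutch_src` is not a Nutch command, and the download location in the example comment is an assumption. It refuses to copy a folder that lacks build.xml, the Ant build file at the root of the Nutch source tree.

```shell
# copy_nutch_src SRC [DEST] -- hypothetical helper.
# Copies an unpacked Nutch source folder into DEST (default: Cygwin home).
copy_nutch_src() {
  local src="$1" dest="${2:-$HOME}"
  # A Nutch source tree has build.xml at its root; bail out otherwise.
  if [ ! -f "$src/build.xml" ]; then
    echo "not a Nutch source tree: $src" >&2
    return 1
  fi
  cp -r "$src" "$dest/" && echo "copied to $dest/$(basename "$src")"
}

# Example with an assumed download location:
# copy_nutch_src /cygdrive/c/Downloads/apache-nutch-2.2.1-src
```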

    • build

    enter ~/apache-nutch-2.2.1-src, then build it with Ant


    Downloading the dependencies takes about half an hour.
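The exact build command is not shown above. A minimal sketch, assuming the `runtime` target that the Nutch tutorial uses to produce the runtime/local tree tested in step 3. `build_nutch` and the `ANT_CMD` override are hypothetical names added for illustration.

```shell
# build_nutch [SRC] -- hypothetical helper.
# ANT_CMD is a hypothetical override hook (defaults to plain `ant`), handy
# for pointing at a specific Ant install or for a dry run.
build_nutch() {
  local src="${1:-$HOME/apache-nutch-2.2.1-src}"
  if [ ! -f "$src/build.xml" ]; then
    echo "no build.xml in $src" >&2
    return 1
  fi
  # On the first run Ivy downloads the dependency jars (the slow part)
  # before the runtime directories are assembled.
  ( cd "$src" && ${ANT_CMD:-ant} runtime )
}
```

For example, `build_nutch ~/apache-nutch-2.2.1-src`. Running it as `ANT_CMD=echo build_nutch ...` only prints the target name, which is a cheap way to check the path handling without starting the long download.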

    3. test

    Stan@Stan-PC ~/nutch/runtime/local
    $ ls
    bin  conf  lib  plugins  test
    Stan@Stan-PC ~/nutch/runtime/local
    $ bin/nutch
    Usage: nutch COMMAND
    where COMMAND is one of:
     inject         inject new urls into the database
     hostinject     creates or updates an existing host table from a text file
     generate       generate new batches to fetch from crawl db
     fetch          fetch URLs marked during generate
     parse          parse URLs marked during fetch
     updatedb       update web table after parsing
     updatehostdb   update host table after parsing
     readdb         read/dump records from page database
     readhostdb     display entries from the hostDB
     elasticindex   run the elasticsearch indexer
     solrindex      run the solr indexer on parsed batches
     solrdedup      remove duplicates from solr
     parsechecker   check the parser for a given url
     indexchecker   check the indexing filters for a given url
     plugin         load a plugin and run one of its classes main()
     nutchserver    run a (local) Nutch server on a user defined port
     junit          runs the given JUnit test
     CLASSNAME      run the class named CLASSNAME
    Most commands print help when invoked w/o parameters.
    Stan@Stan-PC ~/nutch/runtime/local
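Besides running bin/nutch by hand, the layout can be checked with a small hypothetical helper. `check_runtime` only tests that the directories from the `ls` listing above exist and that bin/nutch is executable; it does not run a crawl.

```shell
# check_runtime [DIR] -- hypothetical helper.
# DIR defaults to the runtime path used in the session above.
check_runtime() {
  local dir="${1:-$HOME/nutch/runtime/local}" sub
  # These subdirectories appear in the `ls` output of a built runtime.
  for sub in bin conf lib plugins; do
    if [ ! -d "$dir/$sub" ]; then
      echo "missing directory: $dir/$sub" >&2
      return 1
    fi
  done
  if [ ! -x "$dir/bin/nutch" ]; then
    echo "bin/nutch is missing or not executable" >&2
    return 1
  fi
  echo "runtime looks complete: $dir"
}
```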


  • Original post: https://www.cnblogs.com/harrysun/p/3516783.html