1. install software
Cygwin, JDK, Ant, Nutch
2. configure
- environment variables
JAVA_HOME = C:\PROGRA~1\Java\jdk1.7.0_45
ANT_HOME = C:\PROGRA~1\Ant\apache-ant-1.9.3
PATH = ...
- copy source files
copy the apache-nutch-2.2.1-src folder into the Cygwin home directory
- build
enter home/apache-nutch-2.2.1-src, then run the build:
ant
It takes about half an hour to download the dependencies.
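
Since the build depends on the environment variables above, it can help to check them from the Cygwin shell before starting the half-hour ant run. The following is only a minimal sketch: check_var, check_cmd and preflight are hypothetical helper names made up for this note, not part of Nutch or Ant, and it assumes that java and ant are meant to be reachable through PATH.

```shell
# Hypothetical pre-build checks; none of these functions ship with Nutch or Ant.

# check_var NAME: succeed if the environment variable NAME is set and non-empty.
check_var() {
    eval "_value=\${$1:-}"
    if [ -z "$_value" ]; then
        echo "missing: $1 is not set"
        return 1
    fi
    echo "ok: $1=$_value"
}

# check_cmd NAME: succeed if the command NAME can be found on PATH.
check_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "missing: $1 not found on PATH"
        return 1
    fi
    echo "ok: $1 found"
}

# preflight: run every check, report all problems, fail if any check failed.
preflight() {
    _failed=0
    check_var JAVA_HOME || _failed=1
    check_var ANT_HOME || _failed=1
    check_cmd java || _failed=1
    check_cmd ant || _failed=1
    return "$_failed"
}
```

After sourcing these functions, running "preflight && ant" from the source folder starts the build only when all four checks pass.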
3. test
Stan@Stan-PC ~/nutch/runtime/local
$ ls
bin  conf  lib  plugins  test

Stan@Stan-PC ~/nutch/runtime/local
$ bin/nutch
Usage: nutch COMMAND
where COMMAND is one of:
 inject         inject new urls into the database
 hostinject     creates or updates an existing host table from a text file
 generate       generate new batches to fetch from crawl db
 fetch          fetch URLs marked during generate
 parse          parse URLs marked during fetch
 updatedb       update web table after parsing
 updatehostdb   update host table after parsing
 readdb         read/dump records from page database
 readhostdb     display entries from the hostDB
 elasticindex   run the elasticsearch indexer
 solrindex      run the solr indexer on parsed batches
 solrdedup      remove duplicates from solr
 parsechecker   check the parser for a given url
 indexchecker   check the indexing filters for a given url
 plugin         load a plugin and run one of its classes main()
 nutchserver    run a (local) Nutch server on a user defined port
 junit          runs the given JUnit test
 or
 CLASSNAME      run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

Stan@Stan-PC ~/nutch/runtime/local
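
The same check can be scripted instead of eyeballing the ls output. This is a rough sketch only: check_runtime is a hypothetical helper written for this note, and it assumes that a usable build has the bin, conf, lib and plugins folders plus the bin/nutch script seen in the session above.

```shell
# Hypothetical helper; not part of Nutch.
# check_runtime DIR: succeed if DIR looks like a built runtime/local folder.
check_runtime() {
    _dir="$1"
    _status=0
    # Folders taken from the ls output above (the test folder is not checked).
    for _sub in bin conf lib plugins; do
        if [ ! -d "$_dir/$_sub" ]; then
            echo "missing directory: $_dir/$_sub"
            _status=1
        fi
    done
    # The launcher script used in the session above.
    if [ ! -f "$_dir/bin/nutch" ]; then
        echo "missing script: $_dir/bin/nutch"
        _status=1
    fi
    return "$_status"
}
```

For example, "check_runtime ~/nutch/runtime/local" prints nothing and returns success when everything is in place, and lists each missing piece otherwise.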
To be continued...