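1. Download Solr 5.3.1
http://mirror.bit.edu.cn/apache/lucene/solr/5.3.1/
wget http://mirror.bit.edu.cn/apache/lucene/solr/5.3.1/solr-5.3.1.tgz
2. Extract the archive (the later steps assume it is extracted under /data, giving /data/solr-5.3.1)
tar zxf solr-5.3.1.tgz
or, for the zip package:
unzip solr-5.3.1.zip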
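The download directory and the archive name both carry the version number, so it is easy for the two to drift apart. The sketch below derives both from one variable. The helper name solr_url is hypothetical, and the mirror is simply the one used above; it may no longer host this release.

```shell
#!/bin/sh
# Keep the version in one place so the directory and the file name always match.
SOLR_VERSION="5.3.1"
MIRROR="http://mirror.bit.edu.cn/apache/lucene/solr"

# Hypothetical helper: print the tarball URL for a given version.
solr_url() {
    printf '%s/%s/solr-%s.tgz\n' "$MIRROR" "$1" "$1"
}

solr_url "$SOLR_VERSION"

# To download and unpack under /data (not executed here):
#   cd /data && wget "$(solr_url "$SOLR_VERSION")" && tar zxf "solr-$SOLR_VERSION.tgz"
```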
3. Configure Solr
1. Copy the Solr web application files
mkdir -p /data/web/solr/solr_app/
cp -r /data/solr-5.3.1/server/solr-webapp/webapp/* /data/web/solr/solr_app/
2. Copy the library (jar) files
cp /data/solr-5.3.1/server/lib/ext/* /data/web/solr/solr_app/WEB-INF/lib/
3. Copy the logging configuration file
mkdir -p /data/web/solr/solr_app/WEB-INF/classes
cp /data/solr-5.3.1/server/resources/log4j.properties /data/web/solr/solr_app/WEB-INF/classes/
4. Change where solr.log is written; by default it ends up in /root/logs/solr.log
vim /data/web/solr/solr_app/WEB-INF/classes/log4j.properties
Change the log path to a directory of your own.
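If you would rather script the log path change than edit the file by hand, something like the following may work. It assumes that the copied log4j.properties defines the log directory on a line starting with solr.log= (worth checking in your copy), and /data/logs/solr is only a placeholder directory. The function name set_solr_log_dir is hypothetical.

```shell
#!/bin/sh
# Rewrite the solr.log property in a log4j.properties file.
# $1 = path to log4j.properties, $2 = new log directory
# Note: the directory must not contain '|' or '&', which are special to sed here.
set_solr_log_dir() {
    sed "s|^solr\.log=.*|solr.log=$2|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Demo on a throwaway file that mimics the two relevant lines.
demo=$(mktemp)
printf '%s\n' 'solr.log=logs' 'log4j.appender.file.File=${solr.log}/solr.log' > "$demo"
set_solr_log_dir "$demo" /data/logs/solr
cat "$demo"
```

For the real file, the call would be set_solr_log_dir /data/web/solr/solr_app/WEB-INF/classes/log4j.properties followed by your log directory.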
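5. Copy solr.xml into the directory that <env-entry-value> in web.xml will point to
mkdir -p /data/web/solr/solr_app/WEB-INF/solr_home
cp /data/solr-5.3.1/example/example-DIH/solr/solr.xml /data/web/solr/solr_app/WEB-INF/solr_home/
6. Configure solr_home
vim /data/web/solr/solr_app/WEB-INF/web.xml
Set the value of <env-entry-value> to /data/web/solr/solr_app/WEB-INF/solr_home (if the env-entry block is commented out in your web.xml, uncomment it first).
Tomcat configuration: in server.xml, the Connector uses connectionTimeout="20000". I have not worked out why, but when this value was set higher, the Solr page failed to load after Tomcat started.
Start Tomcat. At this point there are no cores yet, as shown below: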
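4. Configure a Solr core
1. Go into solr_home to start configuring the index, the analyzer, the data source, and the scheduled task:
cd /data/web/solr/solr_app/WEB-INF/solr_home/
2. Each language gets its own Solr configuration, which starts with a directory for that language. For English, for example:
mkdir pc_EN
cd pc_EN
touch core.properties
mkdir conf
mkdir data
3. Edit core.properties to set the core name and the location where the index is stored:
vim core.properties
The index directory can be created at this point: mkdir -p /data/web/solr/solr_app/WEB-INF/solr_index
File contents:
name=pc_EN
dataDir=/data/web/solr/solr_app/WEB-INF/solr_index/master/pc_EN/data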
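Since every language needs the same directory layout and a nearly identical core.properties, the steps above can be wrapped in a small function. This is only a sketch: create_core is a hypothetical helper, and the demo writes into a temporary directory rather than the real solr_home.

```shell
#!/bin/sh
# Create the directory skeleton and core.properties for one core.
# $1 = solr_home, $2 = core name, $3 = root directory for index data
create_core() {
    mkdir -p "$1/$2/conf" "$1/$2/data"
    cat > "$1/$2/core.properties" <<EOF
name=$2
dataDir=$3/master/$2/data
EOF
}

# Demo in a throwaway solr_home; the index root matches the path used above.
demo_home=$(mktemp -d)
create_core "$demo_home" pc_EN /data/web/solr/solr_app/WEB-INF/solr_index
cat "$demo_home/pc_EN/core.properties"
```

Calling it again with another core name (a second language, say) produces a parallel directory with its own dataDir.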
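4. Go into the conf directory to define the index data format and the data source
cd conf
find /data -name solrconfig.xml
Copy the solrconfig.xml from the rss example folder into pc_EN/conf:
cp /data/solr-5.3.1/example/example-DIH/solr/rss/conf/solrconfig.xml solrconfig.xml
Point solrconfig.xml at website-data-config.xml:
vim solrconfig.xml
Search for name="/dataimport" and set its config parameter to website-data-config.xml.
In the same file, set the response format of Solr search results to xml (the wt default of the /select handler).
Add the following to solrconfig.xml so that schema.xml is listed as a replicated configuration file: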
<requestHandler name="/replication" class="solr.ReplicationHandler" >
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml</str>
  </lst>
</requestHandler>
The complete solrconfig.xml file:
<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<!--
 This is a stripped down config file used for a simple example...
 It is *not* a good example to work from.
-->
<config>
  <luceneMatchVersion>5.3.1</luceneMatchVersion>
  <!-- The DirectoryFactory to use for indexes.
       solr.StandardDirectoryFactory, the default, is filesystem based.
       solr.RAMDirectoryFactory is memory based, not persistent, and doesn't work with replication. -->
  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

  <dataDir>${solr.data.dir:}</dataDir>

  <!-- To enable dynamic schema REST APIs, use the following for <schemaFactory>:

       <schemaFactory class="ManagedIndexSchemaFactory">
         <bool name="mutable">true</bool>
         <str name="managedSchemaResourceName">managed-schema</str>
       </schemaFactory>

       When ManagedIndexSchemaFactory is specified, Solr will load the schema from
       the resource named in 'managedSchemaResourceName', rather than from schema.xml.
       Note that the managed schema resource CANNOT be named schema.xml.  If the managed
       schema does not exist, Solr will create it after reading schema.xml, then rename
       'schema.xml' to 'schema.xml.bak'.

       Do NOT hand edit the managed schema - external modifications will be ignored and
       overwritten as a result of schema modification REST API calls.

       When ManagedIndexSchemaFactory is specified with mutable = true, schema
       modification REST API calls will be allowed; otherwise, error responses will be
       sent back for these requests.
  -->
  <codecFactory class="solr.SchemaCodecFactory"/>
  <schemaFactory class="ClassicIndexSchemaFactory"/>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.data.dir:}</str>
      <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
    </updateLog>
  </updateHandler>

  <query>
    <!-- Max Boolean Clauses

         Maximum number of clauses in each BooleanQuery, an exception
         is thrown if exceeded.

         ** WARNING **

         This option actually modifies a global Lucene property that
         will affect all SolrCores.  If multiple solrconfig.xml files
         disagree on this property, the value at any given moment will
         be based on the last SolrCore to be initialized.

      -->
    <maxBooleanClauses>1024</maxBooleanClauses>


    <!-- Solr Internal Query Caches

         There are two implementations of cache available for Solr,
         LRUCache, based on a synchronized LinkedHashMap, and
         FastLRUCache, based on a ConcurrentHashMap.

         FastLRUCache has faster gets and slower puts in single
         threaded operation and thus is generally faster than LRUCache
         when the hit ratio of the cache is high (> 75%), and may be
         faster under other scenarios on multi-cpu systems.
    -->

    <!-- Filter Cache

         Cache used by SolrIndexSearcher for filters (DocSets),
         unordered sets of *all* documents that match a query.  When a
         new searcher is opened, its caches may be prepopulated or
         "autowarmed" using data from caches in the old searcher.
         autowarmCount is the number of items to prepopulate.  For
         LRUCache, the autowarmed items will be the most recently
         accessed items.

         Parameters:
           class - the SolrCache implementation LRUCache or
               (LRUCache or FastLRUCache)
           size - the maximum number of entries in the cache
           initialSize - the initial capacity (number of entries) of
               the cache.  (see java.util.HashMap)
           autowarmCount - the number of entries to prepopulate from
               an old cache.
      -->
    <filterCache class="solr.FastLRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="0"/>

    <!-- Query Result Cache

         Caches results of searches - ordered lists of document ids
         (DocList) based on a query, a sort, and the range of documents requested.
         Additional supported parameter by LRUCache:
            maxRamMB - the maximum amount of RAM (in MB) that this cache is allowed
                       to occupy
      -->
    <queryResultCache class="solr.LRUCache"
                      size="512"
                      initialSize="512"
                      autowarmCount="0"/>

    <!-- Document Cache

         Caches Lucene Document objects (the stored fields for each
         document).  Since Lucene internal document ids are transient,
         this cache will not be autowarmed.
      -->
    <documentCache class="solr.LRUCache"
                   size="512"
                   initialSize="512"
                   autowarmCount="0"/>

    <!-- custom cache currently used by block join -->
    <cache name="perSegFilter"
           class="solr.search.LRUCache"
           size="30"
           initialSize="0"
           autowarmCount="30"
           regenerator="solr.NoOpRegenerator" />

    <!-- Lazy Field Loading

         If true, stored fields that are not requested will be loaded
         lazily.  This can result in a significant speed improvement
         if the usual case is to not load all stored fields,
         especially if the skipped fields are large compressed text
         fields.
    -->
    <enableLazyFieldLoading>true</enableLazyFieldLoading>

    <!-- Result Window Size

         An optimization for use with the queryResultCache.  When a search
         is requested, a superset of the requested number of document ids
         are collected.  For example, if a search for a particular query
         requests matching documents 10 through 19, and queryWindowSize is 50,
         then documents 0 through 49 will be collected and cached.  Any further
         requests in that range can be satisfied via the cache.
      -->
    <queryResultWindowSize>20</queryResultWindowSize>

    <!-- Maximum number of documents to cache for any entry in the
         queryResultCache.
      -->
    <queryResultMaxDocsCached>200</queryResultMaxDocsCached>

    <!-- Use Cold Searcher

         If a search request comes in and there is no current
         registered searcher, then immediately register the still
         warming searcher and use it.  If "false" then all requests
         will block until the first searcher is done warming.
      -->
    <useColdSearcher>false</useColdSearcher>

    <!-- Max Warming Searchers

         Maximum number of searchers that may be warming in the
         background concurrently.  An error is returned if this limit
         is exceeded.

         Recommend values of 1-2 for read-only slaves, higher for
         masters w/o cache warming.
      -->
    <maxWarmingSearchers>2</maxWarmingSearchers>

  </query>

  <requestDispatcher handleSelect="true" >
    <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" formdataUploadLimitInKB="2048" />
  </requestDispatcher>

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="wt">xml</str>
      <str name="indent">true</str>
      <int name="rows">10</int>
    </lst>
  </requestHandler>

  <requestHandler name="/analysis/field" startup="lazy" class="solr.FieldAnalysisRequestHandler" />

  <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
    <lst name="invariants">
      <str name="q">*:*</str>
    </lst>
    <lst name="defaults">
      <str name="echoParams">all</str>
    </lst>
  </requestHandler>

  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">website-data-config.xml</str>
    </lst>
  </requestHandler>

  <requestHandler name="/replication" class="solr.ReplicationHandler" >
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="replicateAfter">startup</str>
      <str name="confFiles">schema.xml</str>
    </lst>
  </requestHandler>

  <!-- config for the admin interface -->
  <admin>
    <defaultQuery>*:*</defaultQuery>
  </admin>

</config>
schema.xml defines the fields that Solr will index.
The complete schema.xml:
<?xml version="1.0" ?>

<schema name="website" version="1.5">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" />
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true" />
    <fieldType name="booleans" class="solr.BoolField" sortMissingLast="true" multiValued="true"/>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
    <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0" />
    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0" />
    <fieldType name="sfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0" />
    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0" />
    <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0" />
    <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0" />
    <fieldType name="tdates" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0" multiValued="true"/>
    <fieldType name="tints" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
    <fieldType name="tfloats" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
    <fieldType name="tlongs" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
    <fieldType name="tdoubles" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
    <fieldType name="text" class="solr.TextField">
      <analyzer type="index" class="org.apache.lucene.analysis.en.EnglishAnalyzer"/>
      <analyzer type="query" class="org.apache.lucene.analysis.en.EnglishAnalyzer"/>
    </fieldType>
  </types>
  <!-- general -->
  <fields>
    <field name="_version_" type="long" indexed="true" stored="true"/>
    <field name="CultureID" type="int" indexed="false" stored="true" />
    <field name="DescriptionFull" type="text" indexed="true" stored="false" />
    <field name="DescriptionShort" type="text" indexed="true" stored="false" />
    <field name="ImageJSON" type="text" indexed="false" stored="true" />
    <field name="IsHot" type="int" indexed="false" stored="true" />
    <field name="IsMutilColor" type="int" indexed="false" stored="true" default="" />
    <field name="LeiMuNameJSON" type="text" indexed="true" stored="true" />
    <field name="PID" type="string" indexed="true" stored="true" />
    <field name="PropertyText" type="text" indexed="true" stored="true" />
    <field name="RequiredText" type="text" indexed="true" stored="true" />
    <field name="SPUID" type="long" indexed="true" stored="true" />
    <field name="Sort" type="int" indexed="true" stored="true" />
    <field name="Status" type="int" indexed="true" stored="true" />
    <field name="Title" type="text" indexed="true" stored="true" />
    <field name="UpTime" type="date" indexed="true" stored="true" />
    <field name="Price" type="double" indexed="true" stored="true" />
    <field name="SaleCount" type="long" indexed="true" stored="true" />
    <field name="CustomerRatingCount" type="long" indexed="false" stored="true" />
    <field name="DisCount" type="double" indexed="true" stored="true" />
    <field name="Basic_search" type="text" indexed="true" stored="false" multiValued="true"/>
  </fields>

  <!-- field to use to determine and enforce document uniqueness. -->
  <uniqueKey>SPUID</uniqueKey>
  <!-- field for the QueryParser to use when an explicit fieldname is absent -->
  <defaultSearchField>Basic_search</defaultSearchField>
  <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
  <solrQueryParser defaultOperator="OR"/>
  <copyField source="PID" dest="Basic_search" />
  <copyField source="DescriptionFull" dest="Basic_search" />
  <copyField source="DescriptionShort" dest="Basic_search" />
  <copyField source="LeiMuNameJSON" dest="Basic_search" />
  <copyField source="PropertyText" dest="Basic_search" />
  <copyField source="RequiredText" dest="Basic_search" />
  <copyField source="Title" dest="Basic_search" />
</schema>
website-data-config.xml defines the data source and maps the fields of the source data onto the fields declared in schema.xml.
The complete website-data-config.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="URLDataSource" encoding="UTF-8" />
  <document>
    <entity name="website"
            processor="XPathEntityProcessor"
            forEach="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel | /LuceneSpuXmlModel"
            url="http://url/product?cultureId=1&amp;pageSize=100&amp;pageIndex=1&amp;siteId=6&amp;platform=1"
            transformer="RegexTransformer,DateFormatTransformer"
            connectionTimeout="120000"
            readTimeout="300000"
            stream="true">
      <field column="SPUID" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/SPUID" />
      <field column="PID" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/PID" />
      <field column="Title" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/Title" />
      <field column="Status" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/Status" />
      <field column="CultureID" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/CultureID" commonField="true" />
      <field column="LeiMuNameJSON" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/LeiMuNameJSON" />
      <field column="DescriptionShort" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/DescriptionShort" commonField="true" />
      <field column="DescriptionFull" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/DescriptionFull" commonField="true" />
      <field column="Sort" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/Sort" />
      <field column="ImageJSON" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/ImageJSON" />
      <field column="PropertyText" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/PropertyText" />
      <field column="RequiredText" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/RequiredText" />
      <field column="IsHot" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/IsHot" />
      <field column="IsMutilColor" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/IsMutilColor" />
      <field column="UpTime" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/UpTime" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss"/>
      <field column="Price" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/Price" />
      <field column="SaleCount" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/SaleCount" />
      <field column="CustomerRatingCount" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/CustomerRatingCount" />
      <field column="DisCount" xpath="/LuceneSpuXmlModel/LuceneSpuModelList/LuceneSpuModel/DisCount" />

      <field column="$hasMore" xpath="/LuceneSpuXmlModel/HasMore" />
      <field column="$nextUrl" xpath="/LuceneSpuXmlModel/NextPageUrl" />
    </entity>
  </document>
</dataConfig>
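Starting Tomcat at this point makes Solr fail with an error, most likely because the DataImportHandler class referenced in solrconfig.xml is not yet on the classpath.
Copy the data import jars:
cp /data/solr-5.3.1/dist/solr-dataimporthandler-* /data/web/solr/solr_app/WEB-INF/lib/
After restarting Tomcat, Solr starts successfully and the page looks like this: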
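Once the core loads, an import can be started by calling the /dataimport handler over HTTP with a command parameter such as full-import or status. The sketch below only builds and prints the URLs. The host, port 8080, and the /solr context path are assumptions about how the webapp is deployed in Tomcat, and dih_url is a hypothetical helper name.

```shell
#!/bin/sh
# Assumptions (not from the article): Tomcat listens on localhost:8080 and the
# webapp is deployed under the /solr context path.
SOLR_BASE="http://localhost:8080/solr"

# Build a DataImportHandler URL for a core and a DIH command
# (for example full-import or status).
dih_url() {
    printf '%s/%s/dataimport?command=%s\n' "$SOLR_BASE" "$1" "$2"
}

dih_url pc_EN full-import
dih_url pc_EN status

# To send the requests to a running instance (not executed here):
#   curl "$(dih_url pc_EN full-import)"
#   curl "$(dih_url pc_EN status)"
```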
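5. Set up the Solr scheduled import task
1. Copy the data import jars (skip this if they were already copied above)
cp /data/solr-5.3.1/dist/solr-dataimporthandler-* /data/web/solr/solr_app/WEB-INF/lib/
2. One more jar must also be copied into /data/web/solr/solr_app/WEB-INF/lib/:
apache-solr-dataimportscheduler-1.0.jar
3. Edit web.xml and add the following listener element: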
<listener>
  <listener-class>org.apache.solr.handler.dataimport.scheduler.ApplicationListener</listener-class>
</listener>
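Before restarting Tomcat, it can help to confirm that the jars from steps 1 and 2 really are in WEB-INF/lib, since a missing jar tends to show up only as a class loading error at startup. The following is a minimal sketch: missing_jars is a hypothetical helper, and the demo runs against a temporary directory with an empty stand-in file.

```shell
#!/bin/sh
# Print each file name pattern that matches nothing in the given directory.
# $1 = lib directory, remaining arguments = glob patterns to look for
missing_jars() {
    lib_dir=$1
    shift
    for pattern in "$@"; do
        # The pattern is left unquoted on purpose so the shell expands it;
        # ls fails when nothing matches.
        ls "$lib_dir"/$pattern >/dev/null 2>&1 || echo "$pattern"
    done
}

# Demo: only the data import handler jar is present in the throwaway directory.
demo_lib=$(mktemp -d)
touch "$demo_lib/solr-dataimporthandler-5.3.1.jar"
missing_jars "$demo_lib" 'solr-dataimporthandler-*' 'apache-solr-dataimportscheduler-*.jar'
```

Against the real deployment, the first argument would be /data/web/solr/solr_app/WEB-INF/lib; an empty result means every pattern matched at least one file.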