1. UIMA Integration
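You can integrate Solr with the Apache Unstructured Information Management Architecture (UIMA). UIMA lets you define custom pipelines of analysis engines that incrementally add metadata to your documents as annotations.
For more information about Solr UIMA integration, see https://wiki.apache.org/solr/SolrUIMA.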
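The pipeline idea can be made concrete with a toy Python model. It is not the UIMA API: `ToyCas`, `sentence_engine`, `language_engine`, `run_pipeline`, and `to_solr_fields` are hypothetical names, and both engines use deliberately naive heuristics. Each engine adds annotations to a shared document. A table from annotation type and feature to Solr field then turns those annotations into field values, loosely mirroring the `fieldMappings` section of the configuration in 1.1.

```python
from typing import Callable, Dict, List, Tuple


class ToyCas:
    """Toy stand-in for a UIMA CAS: document text plus accumulated annotations."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.annotations: List[dict] = []

    def add(self, type_name: str, begin: int, end: int, **features: str) -> None:
        self.annotations.append({"type": type_name, "begin": begin, "end": end, **features})


def sentence_engine(cas: ToyCas) -> None:
    """Hypothetical engine: annotate each period-terminated sentence."""
    start = 0
    for i, ch in enumerate(cas.text):
        if ch == ".":
            while start < i and cas.text[start].isspace():
                start += 1  # skip whitespace between sentences
            cas.add("SentenceAnnotation", start, i + 1)
            start = i + 1


def language_engine(cas: ToyCas) -> None:
    """Hypothetical engine: naive language guess stored as a feature."""
    padded = f" {cas.text.lower()} "
    cas.add("LanguageFS", 0, len(cas.text), language="en" if " the " in padded else "unknown")


Engine = Callable[[ToyCas], None]


def run_pipeline(text: str, engines: List[Engine]) -> ToyCas:
    cas = ToyCas(text)
    for engine in engines:
        engine(cas)  # later engines see annotations added by earlier ones
    return cas


def to_solr_fields(cas: ToyCas, mappings: Dict[str, List[Tuple[str, str]]]) -> Dict[str, List[str]]:
    """Copy mapped annotation features into multi-valued Solr field values."""
    doc: Dict[str, List[str]] = {}
    for ann in cas.annotations:
        for feature, field in mappings.get(ann["type"], []):
            if feature == "coveredText":
                value = cas.text[ann["begin"]:ann["end"]]  # text spanned by the annotation
            else:
                value = ann[feature]
            doc.setdefault(field, []).append(value)
    return doc


cas = run_pipeline("Solr indexes the text. UIMA adds metadata.", [sentence_engine, language_engine])
mappings = {"SentenceAnnotation": [("coveredText", "sentence")], "LanguageFS": [("language", "language")]}
print(to_solr_fields(cas, mappings))
# → {'sentence': ['Solr indexes the text.', 'UIMA adds metadata.'], 'language': ['en']}
```

Running the engines in sequence is what "incrementally" means here: each stage only appends annotations, so later stages can build on earlier ones without changing the document text.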
1.1 Configuring UIMA
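The Solr UIMA UpdateRequestProcessor is a custom update request processor that takes documents being indexed, sends them to a UIMA pipeline, and then returns the documents enriched with the specified metadata. To configure UIMA, follow these steps: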
1. Copy /solr-4.x.y/dist/solr-uima-4.x.y.jar and its libraries under contrib/uima/lib to a Solr library directory, or add <lib/> directives to solrconfig.xml that point to those jar files:
<lib dir="../../contrib/uima/lib" />
<lib dir="../../dist/" regex="solr-uima-\d.*\.jar" />
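2. In schema.xml, add the metadata fields that UIMA will populate, setting the type, indexed, stored, and multiValued options as appropriate: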
<field name="language" type="string" indexed="true" stored="true" required="false" />
<field name="concept" type="string" indexed="true" stored="true" multiValued="true" required="false" />
<field name="sentence" type="text" indexed="true" stored="true" multiValued="true" required="false" />
3. Add the following snippet to solrconfig.xml:
<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
        <str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
      </lst>
      <str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
      <!-- Set to true if you want to continue indexing even if text processing fails.
           Default is false, in which case Solr throws a RuntimeException and the
           documents in your session are never indexed. -->
      <bool name="ignoreErrors">true</bool>
      <!-- This is optional. It is used for logging when text processing fails.
           If logField is not specified, uniqueKey will be used as logField.
      <str name="logField">id</str>
      -->
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>text</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
          <lst name="mapping">
            <str name="feature">text</str>
            <str name="field">concept</str>
          </lst>
        </lst>
        <lst name="type">
          <str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
          <lst name="mapping">
            <str name="feature">language</str>
            <str name="field">language</str>
          </lst>
        </lst>
        <lst name="type">
          <str name="name">org.apache.uima.SentenceAnnotation</str>
          <lst name="mapping">
            <str name="feature">coveredText</str>
            <str name="field">sentence</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
4. In solrconfig.xml, replace the existing default UpdateRequestHandler or create a new UpdateRequestHandler:
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">uima</str>
  </lst>
</requestHandler>
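Once the /update handler defaults to the uima chain, clients index documents as usual. The Python sketch below posts one document in Solr's XML update format. The URL http://localhost:8983/solr/update (a single-core Solr 4.x instance on the default port) and the function names `build_add_xml` and `post_document` are assumptions for illustration. The document carries a `text` field because that is the field listed under `analyzeFields` in the chain configuration.

```python
import urllib.request
from xml.etree import ElementTree as ET

# Assumed URL: a Solr 4.x instance on the default port with a single core.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(doc_id: str, text: str) -> bytes:
    """Build a Solr XML <add> message with an id and the analyzed `text` field."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (("id", doc_id), ("text", text)):
        field = ET.SubElement(doc, "field", name=name)
        field.text = value  # ElementTree escapes &, <, > when serializing
    return ET.tostring(add, encoding="utf-8")


def post_document(doc_id: str, text: str, url: str = SOLR_UPDATE_URL) -> int:
    """POST one document and commit; the handler's defaults route it through the uima chain."""
    request = urllib.request.Request(
        url + "?commit=true",
        data=build_add_xml(doc_id, text),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status  # HTTP status code, e.g. 200 on success
```

Against a running instance, `post_document("doc1", "Apache Solr can call UIMA analysis engines at index time.")` sends the document. A query such as http://localhost:8983/solr/select?q=id:doc1&fl=id,language,concept,sentence should then show whatever metadata the pipeline produced. Which fields actually get filled depends on the external services configured under `runtimeParameters`.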