Full-Text Indexing in .NET Development: Using Solr (reposted)


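       Preface: Many of you have probably heard of full-text indexing in .NET built on Lucene.Net. Solr is a high-performance full-text search server built on Lucene. It extends Lucene with a richer query language, is configurable and extensible, includes query-performance optimizations, and ships with a complete web administration interface, which makes it an excellent full-text search engine. So I will skip Lucene here and go straight to using Solr. In short, Solr is more convenient, simpler, and easier to pick up than raw Lucene, and development is faster.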

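       When to use Solr: full-text search over large data sets. E-commerce platforms in particular, as well as today's popular cloud computing and Internet of Things applications, all depend on large volumes of data, and Solr is a very good fit for searching them. Solr is also free and open source, so the barrier to entry and the investment are low and results come quickly. I won't repeat Solr's other advantages here; many experts in the cnblogs community have already written plenty of technical posts about Solr, and this post is only meant to get the discussion started.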

       All right, let's begin the Solr journey.

    Solr development steps on the .NET platform

    I. Set up the Solr server, as follows:

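       1. Install the JDK. Although the client side is .NET, Solr itself runs inside Tomcat on the Java virtual machine, so Java is required. The JDK already includes a JRE, so installing the JDK alone is enough, and in my setup no environment variables had to be configured by hand. I installed jdk1.7; official site: http://www.oracle.com/technetwork/java/javase/downloads/index.html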

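       2. Install Tomcat 8.0 (official site: http://tomcat.apache.org/download-80.cgi). After installation, start Tomcat from Monitor Tomcat and open http://localhost:8080/ in a browser; if the Tomcat page appears, the installation succeeded. (The paths below use a Tomcat 7.0 folder; adjust them to match the version you installed.)

       3. Download Solr. I used Solr 4.4. After downloading, configure it as follows: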

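      (1) Unzip Solr 4.4 and create a Solr home directory, for example D:/SorlServer/one. Copy everything inside the example/solr folder of the unzipped Solr 4.4 into that directory.

      (2) Create the Solr web application: copy dist/solr-4.4.0.war from the unzipped Solr 4.4 into Tomcat's webapps folder, for example C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps, and rename it to one.war; Tomcat unpacks it automatically when it starts. Then go to D:\SorlServer\one\collection1\conf, open solrconfig.xml, find the <dataDir> element and change it to <dataDir>${solr.data.dir:c:/SorlServer/one/data}</dataDir>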

    Note: this step is important. Open web.xml under C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps\one\WEB-INF, find the <env-entry> element and uncomment it,

    then set env-entry-value to D:/SorlServer/one, as follows:

    <env-entry>
          <env-entry-name>solr/home</env-entry-name>
          <env-entry-value>D:/SorlServer/one</env-entry-value>
          <env-entry-type>java.lang.String</env-entry-type>
    </env-entry>

       (3) Copy all jar files from /dist/solrj-lib in the unzipped Solr 4.4 into C:\Program Files\Apache Software Foundation\Tomcat 7.0\lib.

      (4) Stop Tomcat, start it again, and visit http://localhost:8080/one; the Solr admin page should open.
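    To check the deployment from .NET instead of a browser, you can call Solr's ping handler. The following is a minimal sketch, assuming the Solr 4.4 example configuration (which registers /admin/ping for each core), the default core collection1, and the base URL http://localhost:8080/one from the steps above; the class SolrHealthCheck is hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical helper: checks that the Solr web app deployed in Tomcat answers requests.
public static class SolrHealthCheck
{
    // Builds the per-core ping URL, e.g. http://localhost:8080/one/collection1/admin/ping?wt=json
    public static string BuildPingUrl(string baseUrl, string core)
    {
        return baseUrl.TrimEnd('/') + "/" + Uri.EscapeDataString(core) + "/admin/ping?wt=json";
    }

    // Returns true if the ping handler answers with HTTP 200 and a "status":"OK" entry.
    public static async Task<bool> PingAsync(string baseUrl, string core)
    {
        using (var client = new HttpClient { Timeout = TimeSpan.FromSeconds(10) })
        {
            try
            {
                HttpResponseMessage response = await client.GetAsync(BuildPingUrl(baseUrl, core));
                if (!response.IsSuccessStatusCode)
                {
                    return false;
                }
                string body = await response.Content.ReadAsStringAsync();
                // Allow optional whitespace in case the response is indented
                return Regex.IsMatch(body, "\"status\"\\s*:\\s*\"OK\"");
            }
            catch (HttpRequestException)
            {
                return false; // Tomcat not running, wrong port, or wrong context name
            }
            catch (TaskCanceledException)
            {
                return false; // request timed out
            }
        }
    }
}
```

    In a console program this could be called as `SolrHealthCheck.PingAsync("http://localhost:8080/one", "collection1").Result`.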

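    Note: for an English-language site no third-party tokenizer is needed, because Solr has built-in English analysis. For other languages (Japanese, Italian, French, and so on), look for the corresponding analyzer packages online. Here we use Chinese word segmentation as the example, since most sites in China are primarily in Chinese.

       4. Configure Chinese word segmentation. Commonly used Chinese tokenizers include Paoding, mmseg4j, and IKAnalyzer. I used IKAnalyzer, which is actively maintained, updated frequently, and works well. Steps: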

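       (1) Copy the IKAnalyzer jar into C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps\one\WEB-INF\lib, and copy IKAnalyzer.cfg.xml into C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps\one\WEB-INF\classes (create the folder if it does not exist), because IKAnalyzer loads its configuration file from the root of the classpath.

       (2) Edit schema.xml under D:\SorlServer\one\collection1\conf and add the following: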

     <!-- Chinese word segmentation -->
     <fieldType name="text_IKFENCHI" class="solr.TextField">
         <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
     </fieldType>

        (3) Stop Tomcat, start it again, and open http://localhost:8080/one/#/collection1/analysis to test it.
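    The analysis page is a front end for Solr's field analysis request handler, so the same check can be scripted. A minimal sketch, assuming the /analysis/field handler registered in the Solr 4.4 example solrconfig.xml and the text_IKFENCHI field type added above; the class AnalysisUrl is hypothetical. Chinese text has to be percent-encoded as UTF-8, which Uri.EscapeDataString does.

```csharp
using System;

// Hypothetical helper: builds a request to Solr's field analysis handler so you can see
// how a field type (for example text_IKFENCHI) splits a piece of text into tokens.
public static class AnalysisUrl
{
    public static string Build(string coreUrl, string fieldType, string text)
    {
        return coreUrl.TrimEnd('/')
            + "/analysis/field?wt=json"
            + "&analysis.fieldtype=" + Uri.EscapeDataString(fieldType)
            // Non-ASCII characters are encoded as UTF-8 percent escapes
            + "&analysis.fieldvalue=" + Uri.EscapeDataString(text);
    }
}
```

    Opening the resulting URL in a browser (or fetching it with HttpClient) returns the tokens produced at each analysis stage.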

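        That completes the server-side Solr configuration.

    II. Solr development on the .NET platform:

       1. Download a Solr client library. I used EasyNet.Solr by Terry from the cnblogs community, hosted on Microsoft's open-source site: http://easynet.codeplex.com/

    Terry has already wrapped the Solr client very thoroughly, with many ready-made methods and parameter settings that we can use directly. We use EasyNet.Solr to create the index and then query it, as follows:

      (1) Download the EasyNet.Solr source and put it into your project directly, or build it into a DLL and add a reference. Including the source is the better option, because you can then adjust it to your own needs.

      (2) Create the index entity class, that is, the data we want to store in the index. For example, a product entity class: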

      

    using System;
    using System.Collections.Generic;
    
    namespace Seek.SearchIndex
    {
        public partial class IndexProductModel
        {
            public IndexProductModel()
            {
            }
    
            #region  Properties
            public int ID { get; set; }
            public int ProductID { get; set; }
            public string ClassPath { get; set; }
            public int ClassID1 { get; set; }
            public int ClassID2 { get; set; }
            public int ClassID3 { get; set; }
            public string Title { get; set; }
            public string Model { get; set; }
            public string PriceRange { get; set; }
            public string AttributeValues { get; set; }
            public string ProductImages { get; set; }
            public int MemberID { get; set; }
            public System.DateTime CreateDate { get; set; }
            public System.DateTime LastEditDate { get; set; }
            public string FileName { get; set; }
            public string ProductType { get; set; }
            public string Summary { get; set; }
            public string Details { get; set; }
            public string RelatedKeywords { get; set; }
            public int MemberGrade { get; set; }
            #endregion
        }
    }
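    The indexer below builds each Solr document by matching the entity's property names against DataRow columns. If you already hold IndexProductModel instances (for example from an ORM), the same idea works directly on objects. A minimal sketch using plain reflection; SolrFieldMapper is hypothetical, and it produces a name/value dictionary instead of an EasyNet.Solr document so that it does not depend on the library.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical helper: turns an entity into field-name/value pairs for a Solr document.
public static class SolrFieldMapper
{
    public static Dictionary<string, object> ToFieldMap(object entity)
    {
        if (entity == null) throw new ArgumentNullException(nameof(entity));
        var fields = new Dictionary<string, object>();
        foreach (PropertyInfo property in entity.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            // Skip write-only properties and indexers
            if (!property.CanRead || property.GetIndexParameters().Length > 0) continue;
            object value = property.GetValue(entity, null);
            // Leave out empty values instead of sending nulls to Solr
            if (value == null || value is DBNull) continue;
            // Property name == Solr field name, so schema.xml needs a field with exactly this name
            fields[property.Name] = value;
        }
        return fields;
    }
}
```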

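         (3) Configure the Solr server side, that is, declare a schema field for each property of the index entity class. Go to D:\SorlServer\one\collection1\conf, open schema.xml, and configure it as follows. Because the ID field below takes over the role of the example schema's lower-case id field, also change <uniqueKey>id</uniqueKey> to <uniqueKey>ID</uniqueKey>; otherwise Solr rejects documents that lack the id field.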

       

    <field name="ID" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="ProductID" type="int" indexed="true" stored="true"/>
    <!-- Fast highlighting (FastVectorHighlighter) requires termVectors="true" termPositions="true" termOffsets="true" -->
    <!-- text_en_splitting is an English analyzer; for Chinese titles the text_IKFENCHI type defined above is the better fit -->
    <field name="Title" type="text_en_splitting" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
    <field name="Model" type="text_en_splitting" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
    <field name="ClassPath" type="string" indexed="true" stored="true"/>
    <field name="ClassID1" type="int" indexed="true" stored="true"/>
    <field name="ClassID2" type="int" indexed="true" stored="true"/>
    <field name="ClassID3" type="int" indexed="true" stored="true"/>
    <field name="PriceRange" type="string" indexed="true" stored="true"/>
    <field name="AttributeValues" type="string" indexed="true" stored="true"/>
    <field name="ProductImages" type="string" indexed="true" stored="true"/>
    <field name="MemberID" type="int" indexed="true" stored="true"/>
    <field name="CreateDate" type="date" indexed="true" stored="true"/>
    <field name="LastEditDate" type="date" indexed="true" stored="true"/>
    <field name="FileName" type="string" indexed="true" stored="true"/>
    <field name="ProductType" type="string" indexed="true" stored="true"/>
    <!-- string fields are not tokenized, so Summary and Details only match exact whole-value queries;
         use a text type such as text_IKFENCHI if they must be full-text searchable -->
    <field name="Summary" type="string" indexed="true" stored="false"/>
    <field name="Details" type="string" indexed="true" stored="false"/>
    <field name="RelatedKeywords" type="string" indexed="true" stored="true"/>
    <field name="MemberType" type="string" indexed="true" stored="true"/>
    <field name="MemberGrade" type="int" indexed="true" stored="true"/>
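    Field names must match property names exactly (case-sensitive). For instance, the schema above declares MemberType, which IndexProductModel does not have. A quick consistency check helps catch such drift; this is a minimal sketch with LINQ to XML, assuming schema.xml is readable from the machine running the indexer. SchemaChecker is hypothetical, and it only looks at explicit field elements (dynamic fields are ignored).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Xml.Linq;

// Hypothetical helper: compares an entity type's public properties with the <field> elements of schema.xml.
public static class SchemaChecker
{
    // MissingInSchema: properties with no matching <field name="...">;
    // UnusedInSchema: schema fields the entity never fills.
    public static (List<string> MissingInSchema, List<string> UnusedInSchema) Compare(string schemaXml, Type entityType)
    {
        XDocument doc = XDocument.Parse(schemaXml);
        // Only explicit <field> elements; <dynamicField> and <copyField> are ignored in this sketch
        var schemaFields = new HashSet<string>(
            doc.Descendants("field")
               .Select(f => (string)f.Attribute("name"))
               .Where(n => n != null),
            StringComparer.Ordinal);
        var properties = new HashSet<string>(
            entityType.GetProperties(BindingFlags.Public | BindingFlags.Instance).Select(p => p.Name),
            StringComparer.Ordinal);

        var missing = properties.Where(p => !schemaFields.Contains(p)).OrderBy(p => p, StringComparer.Ordinal).ToList();
        var unused = schemaFields.Where(f => !properties.Contains(f)).OrderBy(f => f, StringComparer.Ordinal).ToList();
        return (missing, unused);
    }
}
```

    For a real file, pass `System.IO.File.ReadAllText(@"D:\SorlServer\one\collection1\conf\schema.xml")` together with `typeof(IndexProductModel)`. Fields such as _version_ will show up as unused; that is expected.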

        (4) Build the index. It is best to write a separate client program that generates the index; here are the key parts of my own indexer:

       

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using Seek.SearchIndex;
    using System.Data;
    using System.Threading;
    using System.Configuration;
    using System.Reflection;
    using EasyNet.Solr;
    using EasyNet.Solr.Impl;
    using EasyNet.Solr.Commons;
    using System.Xml.Linq;
    using EasyNet.Solr.Commons.Params;
    using System.Threading.Tasks;
    
    namespace Seek.SearchIndex
    {
        /// <summary>
        /// Indexer: reads products from the database and sends them to Solr
        /// </summary>
        public class Indexer
        {
            private readonly static OptimizeOptions optimizeOptions = new OptimizeOptions();
            private readonly static CommitOptions commitOptions = new CommitOptions() { SoftCommit = true };
            private readonly static ISolrResponseParser<NamedList, EasyNet.Solr.ResponseHeader> binaryResponseHeaderParser = new BinaryResponseHeaderParser();
            private readonly static IUpdateParametersConvert<NamedList> updateParametersConvert = new BinaryUpdateParametersConvert();
            private readonly static ISolrQueryConnection<NamedList> connection = new SolrQueryConnection<NamedList>() { ServerUrl = ConfigurationManager.AppSettings["SolrServer"] };
            private readonly static ISolrUpdateConnection<NamedList, NamedList> solrUpdateConnection = new SolrUpdateConnection<NamedList, NamedList>() { ServerUrl = ConfigurationManager.AppSettings["SolrServer"], ContentType = "application/javabin" };
            private readonly static ISolrUpdateOperations<NamedList> solr = new SolrUpdateOperations<NamedList, NamedList>(solrUpdateConnection, updateParametersConvert) { ResponseWriter = "javabin" };
            private readonly static ISolrQueryOperations<NamedList> solrQuery = new SolrQueryOperations<NamedList>(connection) { ResponseWriter = "javabin" };
    
            public enum State
            {
                /// <summary>
                /// Running
                /// </summary>
                Runing,
                /// <summary>
                /// Stopped
                /// </summary>
                Stop,
                /// <summary>
                /// Paused (can be resumed)
                /// </summary>
                Break
            }
            /// <summary>
            /// Main window
            /// </summary>
            private Main form;
            /// <summary>
            /// Worker thread
            /// </summary>
            public Thread t;
            /// <summary>
            /// Current state (volatile: set by the UI thread, read by the worker thread)
            /// </summary>
            public volatile State state = State.Stop;
            /// <summary>
            /// Number of documents currently in the index
            /// </summary>
            private long currentIndex = 0;
    
            public long CurrentIndex
            {
                get { return currentIndex; }
                set { currentIndex = value; }
            }
    
            private int _startId = AppCongfig.StartId;
    
            public int StartId
            {
                get { return _startId; }
                set { _startId = value; }
            }
    
            /// <summary>
            /// Total number of products
            /// </summary>
            private int productsCount = 0;
            /// <summary>
            /// Start time
            /// </summary>
            private DateTime startTime = DateTime.Now;
            /// <summary>
            /// End time
            /// </summary>
            private DateTime endTime = DateTime.MinValue;
            private static object syncLock = new object();
            #region Singleton
            // volatile makes the double-checked locking in GetInstance safe
            private static volatile Indexer instance = null;
    
            private Indexer(Main _form)
            {
                form = _form;
                productsCount = DataAccess.GetCount(0);       // total number of products
                form.fullerTsslMaxNum.Text = productsCount.ToString();
                form.fullerProgressBar.Minimum = 0;
                form.fullerProgressBar.Maximum = productsCount;
            }
            public static Indexer GetInstance(Main form)
            {
                if (instance == null)
                {
                    lock (syncLock)
                    {
                        if (instance == null)
                        {
                            instance = new Indexer(form);
                        }
                    }
                }
                return instance;
            }
            #endregion
    
            /// <summary>
            /// Start
            /// </summary>
            public void Start()
            {
                ThreadStart ts = new ThreadStart(FullerRun);
                t = new Thread(ts);
                t.Start();
            }
            /// <summary>
            /// Stop
            /// </summary>
            public void Stop()
            {
                state = State.Stop;
            }
            /// <summary>
            /// Pause (break)
            /// </summary>
            public void Break()
            {
                state = State.Break;
            }
    
    
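            /// <summary>
            /// Build and send one batch of index documents (single-threaded)
            /// </summary>
            public void InitIndex(object data)
            {
                DataTable list = data as DataTable;
                if (list == null)
                {
                    return;
                }
                var docs = new List<SolrInputDocument>();
                // Entity properties; each name must match a DataTable column and a field in schema.xml
                PropertyInfo[] properties = typeof(IndexProductModel).GetProperties();
                foreach (DataRow pro in list.Rows)
                {
                    var model = new SolrInputDocument();
                    foreach (PropertyInfo propertyInfo in properties)
                    {
                        // Skip properties the query did not return, and NULL values
                        if (!list.Columns.Contains(propertyInfo.Name))
                        {
                            continue;
                        }
                        object val = pro[propertyInfo.Name];
                        if (val != DBNull.Value)
                        {
                            model.Add(propertyInfo.Name, new SolrInputField(propertyInfo.Name, val));
                        }
                    }
                    docs.Add(model);
    
                    StartId = Convert.ToInt32(pro["ID"]);
                }
                // Send the batch first, then read the new position and count back from Solr.
                // No commit is sent here: the new documents (and therefore what GetStartId reads)
                // only become visible after a commit that opens a new searcher, e.g. autoSoftCommit
                // or autoCommit with openSearcher=true in solrconfig.xml.
                var result = solr.Update("/update", new UpdateOptions() { Docs = docs });
                GetStartId();
                // Note: WinForms controls should only be updated on the UI thread (Control.Invoke)
                lock (syncLock)
                {
                    if (currentIndex <= productsCount)
                    {
                        form.fullerProgressBar.Value = (int)currentIndex;
                    }
                    form.fullerTsslCurrentNum.Text = currentIndex.ToString();
                }
            }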
    
            /// <summary>
            /// Build one batch of index documents in parallel (one request per row)
            /// </summary>
            public void CreateIndexer(DataTable dt)
            {
                GetStartId();
                PropertyInfo[] properties = typeof(IndexProductModel).GetProperties();
                Parallel.ForEach<DataRow>(dt.AsEnumerable(), (row) =>
                {
                    if (row != null)
                    {
                        var model = new SolrInputDocument();
                        foreach (PropertyInfo propertyInfo in properties)
                        {
                            // Skip properties the query did not return, and NULL values
                            if (!dt.Columns.Contains(propertyInfo.Name))
                            {
                                continue;
                            }
                            object val = row[propertyInfo.Name];
                            if (val != DBNull.Value)
                            {
                                model.Add(propertyInfo.Name, new SolrInputField(propertyInfo.Name, val));
                            }
                        }
                        var docs = new List<SolrInputDocument> { model };
                        var result = solr.Update("/update", new UpdateOptions() { Docs = docs });
                    }
                });
                // Rows finish in arbitrary order under Parallel.ForEach, so take the largest ID
                // after the loop instead of assigning StartId from each worker.
                if (dt.Rows.Count > 0)
                {
                    StartId = dt.AsEnumerable().Max(r => Convert.ToInt32(r["ID"]));
                }
    
                //GetStartId();
                lock (syncLock)
                {
                    if (currentIndex <= productsCount)
                    {
                        form.fullerProgressBar.Value = (int)currentIndex;
                    }
                    form.fullerTsslCurrentNum.Text = currentIndex.ToString();
                }
            }
    
            /// <summary>
            /// Full indexing run (executed on the worker thread created by Start)
            /// </summary>
            public void FullerRun()
            {
                //GetStartId();
                //form.fullerTsslCurrentNum.Text = currentIndex.ToString();
                DataTable dt = DataAccess.GetNextProductsInfo(StartId);
                StartId = AppCongfig.StartId;
                if (state == State.Break)
                {
                    this.SendMesasge("Full indexing resumed, start ID [" + StartId + "]...");
                }
                else
                {
                    startTime = DateTime.Now;
                    this.SendMesasge("Full indexing started, start ID [" + StartId + "]...");
                }
                state = State.Runing;
                form.btnInitIndex.Enabled = false;
                form.btnSuspend.Enabled = true;
                form.btnStop.Enabled = true;
    
                while (dt != null && dt.Rows.Count > 0 && state == State.Runing)
                {
                    try
                    {
                        InitIndex(dt);           // single-threaded
                        // CreateIndexer(dt);    // multi-threaded
                    }
                    catch (Exception ex)
                    {
                        state = State.Stop;
                        form.btnInitIndex.Enabled = true;
                        form.btnSuspend.Enabled = false;
                        form.btnStop.Enabled = false;
                        GetStartId();
                        this.SendMesasge(ex.Message);
                    }
                    form.fullerTsslTimeSpan.Text = "Elapsed: " + GetTimeSpanShow(DateTime.Now - startTime) + ", estimated remaining: " + GetTimeSpanForecast();
    
                    try
                    {
                        dt = DataAccess.GetNextProductsInfo(StartId); // fetch the next batch of products
                    }
                    catch (Exception err)
                    {
                        // Stop instead of re-indexing the previous batch forever
                        state = State.Stop;
                        this.SendMesasge("Failed to fetch the next batch of products, start ID [" + StartId + "]: " + err.Message);
                    }
                }
                if (state == State.Runing)
                {
                    state = State.Stop;
                    form.btnInitIndex.Enabled = true;
                    form.btnSuspend.Enabled = false;
                    form.btnStop.Enabled = false;
                    AppCongfig.SetValue("StartId", StartId.ToString());
                    this.SendMesasge("Full indexing completed, total indexed [" + currentIndex + "], last product ID " + StartId);
                }
                else if (state == State.Break)
                {
                    GetStartId();
                    state = State.Break;
                    form.btnInitIndex.Enabled = true;
                    form.btnSuspend.Enabled = false;
                    form.btnStop.Enabled = false;
                    AppCongfig.SetValue("StartId", StartId.ToString());
                    this.SendMesasge("Full indexing paused, current position [" + currentIndex + "], last product ID " + StartId);
                }
                else if (state == State.Stop)
                {
                    GetStartId();
                    state = State.Stop;
                    this.SendMesasge("Full indexing stopped, indexed [" + currentIndex + "], last product ID " + StartId);
                    form.btnInitIndex.Enabled = true;
                    form.btnSuspend.Enabled = false;
                    form.btnStop.Enabled = false;
                    AppCongfig.SetValue("StartId", StartId.ToString());
                    productsCount = DataAccess.GetCount(StartId);       // product count
                    form.fullerTsslMaxNum.Text = productsCount.ToString();
                    form.fullerProgressBar.Minimum = 0;
                    form.fullerProgressBar.Maximum = productsCount;
                }
                endTime = DateTime.Now;
            }
    
            /// <summary>
            /// Thread entry point for building index data (matches ParameterizedThreadStart)
            /// </summary>
            /// <param name="threadDataParam"></param>
            public void MultiThreadCreateIndex(object threadDataParam)
            {
                InitIndex(threadDataParam);
            }
    
            /// <summary>
            /// Read the largest indexed ProductID (resume point) and the document count from Solr
            /// </summary>
            private void GetStartId()
            {
                IDictionary<string, ICollection<string>> options = new Dictionary<string, ICollection<string>>();
                options[CommonParams.SORT] = new string[] { "ProductID desc" };
                options[CommonParams.START] = new string[] { "0" };
                options[CommonParams.ROWS] = new string[] { "1" };
                // fl (field list): return only ProductID; hl.fl would only affect highlighting
                options["fl"] = new string[] { "ProductID" };
                options[CommonParams.Q] = new string[] { "*:*" };
                var result = solrQuery.Query("/select", null, options);
                var solrDocumentList = (SolrDocumentList)result.Get("response");
                if (solrDocumentList != null && solrDocumentList.Count() > 0)
                {
                    currentIndex = solrDocumentList.NumFound;
                    StartId = Convert.ToInt32(solrDocumentList[0]["ProductID"]);
                    //AppCongfig.SetValue("StartId", solrDocumentList[0]["ProductID"].ToString());
                }
                else
                {
                    currentIndex = 0;
                    StartId = 0;
                    // AppCongfig.SetValue("StartId", "0");
                }
            }
    
    
            /// <summary>
            /// Optimize the index
            /// </summary>
            public void Optimize()
            {
                this.SendMesasge("Optimizing the index, please wait...");
                var result = solr.Update("/update", new UpdateOptions() { OptimizeOptions = optimizeOptions });
                var header = binaryResponseHeaderParser.Parse(result);
                this.SendMesasge("Index optimization took " + header.QTime + " ms");
            }
    
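            /// <summary>
            /// Append a message to the UI message grid
            /// </summary>
            /// <param name="message">Message text</param>
            protected void SendMesasge(string message)
            {
                // FullerRun runs on a worker thread; WinForms controls must be updated on the UI thread
                if (form.fullerDgvMessage.InvokeRequired)
                {
                    form.fullerDgvMessage.Invoke(new Action(() => SendMesasge(message)));
                    return;
                }
                form.fullerDgvMessage.Rows.Add(form.fullerDgvMessage.Rows.Count + 1, message, DateTime.Now.ToString());
            }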
            /// <summary>
            /// Format a time span for display
            /// </summary>
            /// <param name="ts">Time span</param>
            /// <returns>Text such as "1d 2h 3min 4s"</returns>
            protected string GetTimeSpanShow(TimeSpan ts)
            {
                string text = "";
                if (ts.Days > 0)
                {
                    text += ts.Days + "d ";
                }
                if (ts.Hours > 0)
                {
                    text += ts.Hours + "h ";
                }
                if (ts.Minutes > 0)
                {
                    text += ts.Minutes + "min ";
                }
                if (ts.Seconds > 0)
                {
                    text += ts.Seconds + "s";
                }
                return text.TrimEnd();
            }
            /// <summary>
            /// Estimate the remaining time
            /// </summary>
            /// <returns></returns>
            protected string GetTimeSpanForecast()
            {
                if (currentIndex != 0)
                {
                    TimeSpan tsed = DateTime.Now - startTime;
                    double d = ((tsed.TotalMilliseconds / currentIndex) * productsCount) - tsed.TotalMilliseconds;
                    return GetTimeSpanShow(TimeSpan.FromMilliseconds(d));
                }
                return "";
            }
        }
    }
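    The indexer sends CreateDate and LastEditDate to EasyNet.Solr as DateTime values in the javabin format. If you instead post documents as XML or JSON (for example while debugging with a browser or curl), Solr's date fields (type="date") expect ISO 8601 in UTC with a trailing Z, such as 2014-03-24T10:30:00Z. A minimal sketch of that conversion; SolrDate is hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: formats a DateTime in the ISO 8601 UTC form Solr date fields accept.
public static class SolrDate
{
    public static string Format(DateTime value)
    {
        // ToUniversalTime treats Local and Unspecified values as local time;
        // values with DateTimeKind.Utc are returned unchanged.
        DateTime utc = value.ToUniversalTime();
        // Whole seconds only in this sketch; fractional seconds are dropped.
        return utc.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);
    }
}
```

    Database values are usually DateTimeKind.Unspecified, so they are assumed to be in the indexing machine's local time zone.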

        (5) Run the indexer to build the index. Here is my indexer's UI, as shown in the figure:

     

       You can follow the progress of index generation at any time.
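    The "estimated remaining" figure comes from GetTimeSpanForecast, which extrapolates linearly: remaining ≈ elapsed / indexed × total − elapsed. A standalone version of that calculation (Forecast is a hypothetical name) makes the arithmetic easy to check: 100 of 400 products in 10 minutes leaves an estimated 30 minutes.

```csharp
using System;

// Standalone version of the linear estimate used by GetTimeSpanForecast (Forecast is a hypothetical name).
public static class Forecast
{
    // remaining = elapsed / indexed * total - elapsed; null while nothing has been indexed yet
    public static TimeSpan? Remaining(TimeSpan elapsed, long indexed, long total)
    {
        if (indexed <= 0) return null;
        double remainingMs = elapsed.TotalMilliseconds / indexed * total - elapsed.TotalMilliseconds;
        // The indexed count can exceed the total (e.g. when the total is recounted from a later StartId),
        // so clamp at zero instead of returning a negative span.
        return TimeSpan.FromMilliseconds(Math.Max(0, remainingMs));
    }
}
```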

      (6) Once the index is built, you can test it in the Solr admin UI at http://localhost:8080/one/#/collection1/query.
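    Queries can also be sent as plain HTTP GET requests to the core's /select handler, which is handy for checking what the indexer's resume-position query sees (all documents, sorted by ProductID descending, one row, only ProductID). A minimal sketch that builds such a URL, assuming the core URL http://localhost:8080/one/collection1; SelectUrl is hypothetical, and wt=json is chosen only so the response is readable in a browser.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: builds a /select URL from query parameters.
public static class SelectUrl
{
    public static string Build(string coreUrl, IEnumerable<KeyValuePair<string, string>> parameters)
    {
        // Each name and value is percent-encoded; repeated names (e.g. several fq) are allowed
        string query = string.Join("&", parameters.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));
        return coreUrl.TrimEnd('/') + "/select?" + query;
    }
}
```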

     

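    That covers the groundwork for Solr: setting up the server and generating the index from a client. Client-side querying will be covered in detail later. Coming next:

    1. Full-text search, tokenizer configuration, and keyword autocomplete like Google's and Baidu's

    2. Facet queries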

     

     

        

  • Original article: https://www.cnblogs.com/rainbowzc/p/3621624.html