Add site search to your website: a Compass getting-started tutorial
syxChina(syxchina.cnblogs.com)
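1 Preface
I have been learning some new things lately, hoping to add some substance to my graduation project. After a long stretch of SSH projects I also wanted to try something new and round out what I already knew, and search is unquestionably important. As a Java developer you have probably heard of Lucene, a top-level Apache open-source project. I have just finished learning Lucene and know the basics well enough to use it, but I have to say that the documentation and examples on the official site are not much help; fortunately there is plenty of material on the internet. While searching for Lucene material I came across Compass and lucene-nutch, both built on Lucene.
Lucene can index and search whatever content you hand it, but to search a local database or web pages you first have to pull the data out, index it, and then search. So you start to wonder whether you could search the database directly, with the database content as the source of the index and the index updated along with the database CRUD operations. That is where Compass comes in. Compass is very convenient for site search, and its official Spring and Hibernate support makes it even more so. Lucene-nutch searches web pages on top of Lucene. If there is a need, I will share what I learned about Lucene and lucene-nutch as a quick start; the rest can be left to the documentation and Google.
I should mention that Compass appears to have stopped being updated in 2009, and according to posts online it only supports Lucene versions below 3.0. It is a good project and I do not know why development stopped. I tried analyzers written for Lucene 3.0 and later and they do not work; for Chinese I use JE-Analyzer.jar. My environment:
Spring 3.1.0 + Hibernate 3.6.6 + Compass 2.2.0.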
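2 Introduction to Compass
Compass is a powerful, transactional, high-performance object/search engine mapping (OSEM) and Java persistence framework. Compass includes:
* a search engine abstraction layer (using the Lucene search engine),
* OSEM (Object/Search Engine Mapping) support,
* transaction management,
* a simple Google-like keyword query language,
* an extensible, modular framework,
* a simple API.
Official website: look it up on Google.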
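To make the OSEM idea more concrete, here is a minimal sketch in plain Java of how annotated getters can be turned into a flat searchable "document". The `Indexed` annotation, the `Note` class and `toDocument` are hypothetical names invented for this illustration; they only imitate the role that `@SearchableProperty` plays in Compass and are not part of its API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class OsemSketch {
    // Hypothetical marker annotation; it only imitates the role of Compass's @SearchableProperty.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Indexed {}

    // Hypothetical domain object with two indexed properties and one that is ignored.
    static class Note {
        private final String title;
        private final String body;
        Note(String title, String body) { this.title = title; this.body = body; }
        @Indexed public String getTitle() { return title; }
        @Indexed public String getBody() { return body; }
        public String getSecret() { return "not indexed"; }
    }

    // Map every @Indexed getter to a (property name -> value) entry of a flat document.
    static Map<String, String> toDocument(Object bean) {
        Map<String, String> doc = new TreeMap<String, String>();
        for (Method m : bean.getClass().getMethods()) {
            if (m.isAnnotationPresent(Indexed.class) && m.getName().startsWith("get")) {
                String name = m.getName().substring(3);
                name = Character.toLowerCase(name.charAt(0)) + name.substring(1);
                try {
                    doc.put(name, String.valueOf(m.invoke(bean)));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("cannot read property " + name, e);
                }
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        // prints {body=Object/search engine mapping, title=Compass}
        System.out.println(toDocument(new Note("Compass", "Object/search engine mapping")));
    }
}
```

A real OSEM layer also has to handle ids, boosts, storage options and type conversion, which this sketch leaves out.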
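3 Using Compass on its own
Compass can be used without integrating it into Hibernate or Spring. The following example is excerpted from the web; here is the code: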
@Searchable
public class Book {
private String id; // ID
private String title; // title
private String author; // author
private float price; // price
public Book() {
}
public Book(String id, String title, String author, float price) {
super();
this.id = id;
this.title = title;
this.author = author;
this.price = price;
}
@SearchableId
public String getId() {
return id;
}
@SearchableProperty(boost = 2.0F, index = Index.TOKENIZED, store = Store.YES)
public String getTitle() {
return title;
}
@SearchableProperty(index = Index.TOKENIZED, store = Store.YES)
public String getAuthor() {
return author;
}
@SearchableProperty(index = Index.NO, store = Store.YES)
public float getPrice() {
return price;
}
public void setId(String id) {
this.id = id;
}
public void setTitle(String title) {
this.title = title;
}
public void setAuthor(String author) {
this.author = author;
}
public void setPrice(float price) {
this.price = price;
}
@Override
public String toString() {
return "[" + id + "] " + title + " - " + author + " $ " + price;
}
}
public class Searcher {
protected Compass compass;
public Searcher() {
}
public Searcher(String path) {
compass = new CompassAnnotationsConfiguration()//
.setConnection(path).addClass(Book.class)//
.setSetting("compass.engine.highlighter.default.formatter.simple.pre", "<font color='red'>")//
.setSetting("compass.engine.highlighter.default.formatter.simple.post", "</font>")//
.buildCompass();//
Runtime.getRuntime().addShutdownHook(new Thread() {
public void run() {
compass.close();
}
});
}
/**
* Add a book to the index.
* @param book the book to index
*/
public void index(Book book) {
CompassSession session = null;
CompassTransaction tx = null;
try {
session = compass.openSession();
tx = session.beginTransaction();
session.create(book);
tx.commit();
} catch (RuntimeException e) {
if (tx != null)
tx.rollback();
throw e;
} finally {
if (session != null) {
session.close();
}
}
}
/**
* Remove a book from the index.
* @param book the book to remove
*/
public void unIndex(Book book) {
CompassSession session = null;
CompassTransaction tx = null;
try {
session = compass.openSession();
tx = session.beginTransaction();
session.delete(book);
tx.commit();
} catch (RuntimeException e) {
// tx is still null if openSession() or beginTransaction() failed
if (tx != null)
tx.rollback();
throw e;
} finally {
if (session != null) {
session.close();
}
}
}
/**
* Re-index a book: remove it from the index, then add it again.
* @param book the book to re-index
*/
public void reIndex(Book book) {
unIndex(book);
index(book);
}
/**
* Search the index.
* @param queryString the query string
* @return the matching books (empty if there are none)
*/
public List<Book> search(String queryString) {
CompassSession session = null;
CompassTransaction tx = null;
try {
session = compass.openSession();
tx = session.beginTransaction();
CompassHits hits = session.find(queryString);
int n = hits.length();
List<Book> books = new ArrayList<Book>(n);
for (int i = 0; i < n; i++) {
books.add((Book) hits.data(i));
}
// close the hits and commit even when nothing matched
hits.close();
tx.commit();
return books;
} catch (RuntimeException e) {
// tx is still null if openSession() or beginTransaction() failed
if (tx != null)
tx.rollback();
throw e;
} finally {
if (session != null) {
session.close();
}
}
}
}
public class Main {
static List<Book> db = new ArrayList<Book>();
static Searcher searcher = new Searcher("index");
public static void main(String[] args) {
add(new Book(UUID.randomUUID().toString(), "Thinking in Java", "Bruce", 109.0f));
add(new Book(UUID.randomUUID().toString(), "Effective Java", "Joshua", 12.4f));
add(new Book(UUID.randomUUID().toString(), "Java Thread Programing", "Paul", 25.8f));
long begin = System.currentTimeMillis();
int count = 30;
for(int i=1; i<count; i++) {
if(i%10 == 0) {
long end = System.currentTimeMillis();
System.err.println(String.format("Indexed [%d], remaining [%d], elapsed [%ds], estimated remaining [%ds].", i,count-i,(end-begin)/1000, (int)((count-i)*((end-begin)/(i*1000.0))) ));
}
String uuid = new Date().toString();
add(new Book(uuid, uuid.substring(0, uuid.length()/2), uuid.substring(uuid.length()/2), (float)Math.random()*100));
}
int n;
do {
n = displaySelection();
switch (n) {
case 1:
listBooks();
break;
case 2:
addBook();
break;
case 3:
deleteBook();
break;
case 4:
searchBook();
break;
case 5:
return;
}
} while (n != 0);
}
static int displaySelection() {
System.out.println("\n==select==");
System.out.println("1. List all books");
System.out.println("2. Add book");
System.out.println("3. Delete book");
System.out.println("4. Search book");
System.out.println("5. Exit");
int n = readKey();
if (n >= 1 && n <= 5)
return n;
return 0;
}
/**
* Add a book to both the database and the index.
*
* @param book the book to add
*/
private static void add(Book book) {
db.add(book);
searcher.index(book);
}
/**
* Print all books in the database.
*/
public static void listBooks() {
System.out.println("==Database==");
int n = 1;
for (Book book : db) {
System.out.println(n + ")" + book);
n++;
}
}
/**
* Read a book from user input and add it to the database and the index.
*/
public static void addBook() {
String title = readLine(" Title: ");
String author = readLine(" Author: ");
String price = readLine(" Price: ");
Book book = new Book(UUID.randomUUID().toString(), title, author, Float.valueOf(price));
add(book);
}
/**
* Delete a book from both the database and the index.
*/
public static void deleteBook() {
listBooks();
System.out.println("Book index: ");
int n = readKey();
Book book = db.remove(n - 1);
searcher.unIndex(book);
}
/**
* Search for books matching the entered keyword.
*/
public static void searchBook() {
String queryString = readLine(" Enter keyword: ");
List<Book> books = searcher.search(queryString);
System.out.println(" ====search results:" + books.size() + "====");
for (Book book : books) {
System.out.println(book);
}
}
public static int readKey() {
BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
try {
int n = reader.read();
n = Integer.parseInt(Character.toString((char) n));
return n;
} catch (Exception e) {
throw new RuntimeException(e);
}
}
public static String readLine(String propt) {
System.out.println(propt);
BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
try {
return reader.readLine();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
}
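Inserting data and indexing it this way is slow; the approach in the next section is faster. Note that no analyzer is configured above, so the default one is used, and Chinese text is split into individual characters.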
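To illustrate the difference the analyzer makes, the following sketch contrasts per-character splitting with a dictionary-based forward maximum matching segmenter. It is plain Java written for this illustration only: the dictionary and the class name `SegmentSketch` are made up, and it is not the actual algorithm of Lucene's default analyzer or of JE-Analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SegmentSketch {
    // One token per character, which is roughly what happens to Chinese text
    // when no Chinese-aware analyzer is configured.
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<String>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(String.valueOf(text.charAt(i)));
        }
        return tokens;
    }

    // Forward maximum matching: at each position take the longest dictionary word
    // (up to maxLen characters), falling back to a single character.
    static List<String> maxMatch(String text, Set<String> dict, int maxLen) {
        List<String> tokens = new ArrayList<String>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxLen);
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Toy dictionary (an assumption for this example).
        Set<String> dict = new HashSet<String>(Arrays.asList("全文", "搜索", "引擎", "搜索引擎"));
        System.out.println(unigrams("全文搜索引擎"));          // [全, 文, 搜, 索, 引, 擎]
        System.out.println(maxMatch("全文搜索引擎", dict, 4)); // [全文, 搜索引擎]
    }
}
```

With per-character tokens a query has to match on single characters, while word-level tokens let the index match whole terms, at the cost of needing a dictionary.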
4 Integrating Compass with Spring + Hibernate
4-1 JAR files
4-2 Configuration files
beans.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:context="http://www.springframework.org/schema/context"
xmlns:aop="http://www.springframework.org/schema/aop" xmlns:tx="http://www.springframework.org/schema/tx"
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
http://www.springframework.org/schema/context
http://www.springframework.org/schema/context/spring-context-3.0.xsd
http://www.springframework.org/schema/tx
http://www.springframework.org/schema/tx/spring-tx-3.0.xsd
http://www.springframework.org/schema/aop
http://www.springframework.org/schema/aop/spring-aop-3.0.xsd">
<context:annotation-config />
<context:component-scan base-package="com.syx.compass"></context:component-scan>
<aop:aspectj-autoproxy></aop:aspectj-autoproxy>
<import resource="hibernate-beans.xml"/>
<import resource="compass-beans.xml"/>
</beans>
compass-beans.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="...">
<!-- Main Compass configuration -->
<bean id="compass" class="org.compass.spring.LocalCompassBean">
<property name="compassSettings">
<props>
<prop key="compass.engine.connection">file://compass</prop><!-- where the index is stored -->
<prop key="compass.transaction.factory">
org.compass.spring.transaction.SpringSyncTransactionFactory</prop>
<prop key="compass.engine.analyzer.default.type">
jeasy.analysis.MMAnalyzer</prop><!-- analyzer (word segmenter) -->
<prop key="compass.engine.highlighter.default.formatter.simple.pre">
<![CDATA[<font color="red"><b>]]></prop>
<prop key="compass.engine.highlighter.default.formatter.simple.post">
<![CDATA[</b></font>]]></prop>
</props>
</property>
<property name="transactionManager">
<ref bean="txManager" />
</property>
<property name="compassConfiguration" ref="annotationConfiguration" />
<property name="classMappings">
<list>
<value>com.syx.compass.test1.Article</value>
</list>
</property>
</bean>
<bean id="annotationConfiguration"
class="org.compass.annotations.config.CompassAnnotationsConfiguration">
</bean>
<bean id="compassTemplate" class="org.compass.core.CompassTemplate">
<property name="compass" ref="compass" />
</bean>
<!-- Keep the index in sync: update it whenever data in the database changes -->
<bean id="hibernateGps" class="org.compass.gps.impl.SingleCompassGps"
init-method="start" destroy-method="stop">
<property name="compass">
<ref bean="compass" />
</property>
<property name="gpsDevices">
<list>
<ref bean="hibernateGpsDevice"/>
</list>
</property>
</bean>
<!-- Hibernate device: connects Compass and Hibernate -->
<bean id="hibernateGpsDevice"
class="org.compass.spring.device.hibernate.dep.SpringHibernate3GpsDevice">
<property name="name">
<value>hibernateDevice</value>
</property>
<property name="sessionFactory">
<ref bean="sessionFactory" />
</property>
<property name="mirrorDataChanges">
<value>true</value>
</property>
</bean>
<!-- Rebuild the index on a schedule (using Quartz) or when the Spring ApplicationContext starts -->
<bean id="compassIndexBuilder"
class="com.syx.compass.test1.CompassIndexBuilder"
lazy-init="false">
<property name="compassGps" ref="hibernateGps" />
<property name="buildIndex" value="false" />
<property name="lazyTime" value="1" />
</bean>
<!-- Search service class -->
<bean id="searchService" class="com.syx.compass.test1.SearchServiceBean">
<property name="compassTemplate">
<ref bean="compassTemplate" />
</property>
</bean>
</beans>
hibernate-beans.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="...">
<!-- DataSource -->
<bean id="dataSource" class="com.mchange.v2.c3p0.ComboPooledDataSource">
<property name="driverClass" value="${jdbc.driverClassName}" />
<property name="jdbcUrl" value="${jdbc.url}" />
<property name="user" value="${jdbc.username}" />
<property name="password" value="${jdbc.password}" />
<property name="autoCommitOnClose" value="true" />
<property name="checkoutTimeout" value="${cpool.checkoutTimeout}" />
<property name="initialPoolSize" value="${cpool.minPoolSize}" />
<property name="minPoolSize" value="${cpool.minPoolSize}" />
<property name="maxPoolSize" value="${cpool.maxPoolSize}" />
<property name="maxIdleTime" value="${cpool.maxIdleTime}" />
<property name="acquireIncrement" value="${cpool.acquireIncrement}" />
<!-- <property name="maxIdleTimeExcessConnections" value="${cpool.maxIdleTimeExcessConnections}"/> -->
</bean>
<bean
class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
<property name="locations">
<value>classpath:jdbc.properties</value>
</property>
</bean>
<!-- SessionFactory -->
<bean id="sessionFactory"
class="org.springframework.orm.hibernate3.annotation.AnnotationSessionFactoryBean">
<property name="dataSource" ref="dataSource" />
<property name="annotatedClasses">
<list>
<value>com.syx.compass.model.Article</value>
<value>com.syx.compass.model.Author</value>
<value>com.syx.compass.test1.Article</value>
</list>
</property>
<property name="hibernateProperties">
<props>
<prop key="hibernate.dialect">org.hibernate.dialect.MySQLDialect</prop>
<prop key="hibernate.current_session_context_class">thread</prop>
<prop key="javax.persistence.validation.mode">none</prop>
<prop key="hibernate.show_sql">true</prop>
<prop key="hibernate.format_sql">false</prop>
<prop key="hibernate.hbm2ddl.auto">update</prop>
</props>
</property>
</bean>
<bean id="hibernateTemplate" class="org.springframework.orm.hibernate3.HibernateTemplate">
<property name="sessionFactory" ref="sessionFactory"></property>
</bean>
<bean id="txManager"
class="org.springframework.orm.hibernate3.HibernateTransactionManager">
<property name="sessionFactory" ref="sessionFactory" />
</bean>
</beans>
jdbc.properties
jdbc.driverClassName=com.mysql.jdbc.Driver
jdbc.hostname=localhost
jdbc.url=jdbc:mysql://localhost:3306/compass
jdbc.username=root
jdbc.password=root
cpool.checkoutTimeout=5000
cpool.minPoolSize=1
cpool.maxPoolSize=4
cpool.maxIdleTime=25200
cpool.maxIdleTimeExcessConnections=1800
cpool.acquireIncrement=5
log4j.properties
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.rootLogger=error, stdout
4-3 Source code
@Searchable(alias = "article")
@Entity(name="_article")
public class Article {
private Long ID; // identifier
private String content; // body text
private String title; // article title
private Date createTime; // creation time
public Article(){}
public Article(Long iD, String content, String title, Date createTime) {
ID = iD;
this.content = content;
this.title = title;
this.createTime = createTime;
}
public String toString() {
return String.format("%d,%s,%s,%s", ID, title, content, createTime.toString());
}
@SearchableId
@Id
@GeneratedValue
public Long getID() {
return ID;
}
public void setID(Long id) {
ID = id;
}
@SearchableProperty(index = Index.TOKENIZED, store = Store.YES)
public String getContent() {
return content;
}
public void setContent(String content) {
this.content = content;
}
@SearchableProperty(index = Index.TOKENIZED, store = Store.YES)
public String getTitle() {
return title;
}
public void setTitle(String title) {
this.title = title;
}
@SearchableProperty(index = Index.TOKENIZED, store = Store.YES)
public Date getCreateTime() {
return createTime;
}
public void setCreateTime(Date createTime) {
this.createTime = createTime;
}
}
public class CompassIndexBuilder implements InitializingBean {
// Whether to build the index; set to false to disable this builder.
private boolean buildIndex = false;
// Delay before the indexing thread starts, in seconds.
private int lazyTime = 10;
// Compass GPS wrapper
private CompassGps compassGps;
// Indexing thread
private Thread indexThread = new Thread() {
@Override
public void run() {
try {
Thread.sleep(lazyTime * 1000);
System.out.println("begin compass index...");
long beginTime = System.currentTimeMillis();
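// Rebuild the index.
// If the index files defined for the Compass entities already exist, a temporary index is built first
// and replaces the old one once indexing has finished.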
compassGps.index();
long costTime = System.currentTimeMillis() - beginTime;
System.out.println("compass index finished.");
System.out.println("costed " + costTime + " milliseconds");
} catch (InterruptedException e) {
e.printStackTrace();
}
}
};
/**
* Implements <code>InitializingBean</code>; starts the indexing thread once dependency injection is complete.
*/
public void afterPropertiesSet() throws Exception {
if (buildIndex) {
indexThread.setDaemon(true);
indexThread.setName("Compass Indexer");
indexThread.start();
}
}
public void setBuildIndex(boolean buildIndex) {
this.buildIndex = buildIndex;
}
public void setLazyTime(int lazyTime) {
this.lazyTime = lazyTime;
}
public void setCompassGps(CompassGps compassGps) {
this.compassGps = compassGps;
}
}
public class SearchServiceBean {
private CompassTemplate compassTemplate;
/** Query the index. */
public Map find(final String keywords, final String type, final int start, final int end) {
return compassTemplate.execute(new CompassCallback<Map>() {
public Map doInCompass(CompassSession session) throws CompassException {
List result = new ArrayList();
int totalSize = 0;
Map container = new HashMap();
CompassQuery query = session.queryBuilder().queryString(keywords).toQuery();
CompassHits hits = query.setAliases(type).hits();
totalSize = hits.length();
container.put("size", totalSize);
int max = 0;
if (end < hits.length()) {
max = end;
} else {
max = hits.length();
}
if (type.equals("article")) {
for (int i = start; i < max; i++) {
Article article = (Article) hits.data(i);
String title = hits.highlighter(i).fragment("title");
if (title != null) {
article.setTitle(title);
}
String content = hits.highlighter(i).setTextTokenizer(CompassHighlighter.TextTokenizer.AUTO).fragment("content");
if (content != null) {
article.setContent(content);
}
result.add(article);
}
}
container.put("result", result);
return container;
}
});
}
public CompassTemplate getCompassTemplate() {
return compassTemplate;
}
public void setCompassTemplate(CompassTemplate compassTemplate) {
this.compassTemplate = compassTemplate;
}
}
public class MainTest {
public static ClassPathXmlApplicationContext applicationContext;
private static HibernateTemplate hibernateTemplate;
@BeforeClass
public static void init() {
System.out.println("spring init...");
applicationContext = new ClassPathXmlApplicationContext("beans.xml");
hibernateTemplate = applicationContext.getBean(HibernateTemplate.class);
System.out.println("spring ok");
}
@Test
public void addData() {
System.out.println("addData");
// In compass-beans.xml, on the bean with id="compassIndexBuilder",
// set buildIndex=true and lazyTime=1;
// the index is then rebuilt automatically from the data in the database.
try {
Thread.sleep(10000000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
@Test
public void search() {
String keyword = "全文搜索引擎";
SearchServiceBean ssb = applicationContext.getBean(SearchServiceBean.class);
Map map = ssb.find(keyword, "article", 0, 100); // the first search loads the dictionary
long begin = System.currentTimeMillis();
map = ssb.find(keyword, "article", 0, 100); // only the second search is timed
long end = System.currentTimeMillis();
System.out.println(String.format(
"search: [%s], time (ms): %d, hits: %d", keyword, end-begin, map.get("size")));
List<Article> list = (List<Article>) map.get("result");
for(Article article : list) {
System.out.println(article);
}
}
}
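The `find` method of `SearchServiceBean` returns the hits in the window from `start` up to `end`, cutting `end` back to the number of hits. The same windowing can be sketched on a plain list as follows; `PageSketch` and `page` are names made up for this illustration, and unlike the original code the sketch also clamps a negative `start`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PageSketch {
    // Return the elements in [start, end), clamped to the available range.
    static <T> List<T> page(List<T> hits, int start, int end) {
        int from = Math.max(0, start);
        int to = Math.min(end, hits.size());
        if (from >= to) {
            return new ArrayList<T>(); // window lies outside the results
        }
        return new ArrayList<T>(hits.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(page(hits, 0, 2));   // [a, b]
        System.out.println(page(hits, 3, 100)); // [d, e]
        System.out.println(page(hits, 10, 20)); // []
    }
}
```

The total hit count is kept separately (the `size` entry in the returned map), so a caller can compute how many pages exist.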
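4-4 Notes
compass-beans.xml sets the index directory and the analyzer. For testing, we add data directly in the database, build the index at startup, and measure the speed.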
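compass-beans.xml also sets the highlighter's `pre` and `post` strings, which are wrapped around matched terms in the fragments returned by `hits.highlighter(i).fragment(...)`. The effect can be approximated on plain strings as in the sketch below. This is only an illustration with made-up names (`HighlightSketch`, `highlight`); the real highlighter works on analyzed tokens and selects fragments, which this sketch does not attempt.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HighlightSketch {
    // Wrap every case-insensitive occurrence of term in text with the pre/post markup.
    static String highlight(String text, String term, String pre, String post) {
        Pattern p = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the markup or the match literal
            m.appendReplacement(out, Matcher.quoteReplacement(pre + m.group() + post));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints Thinking in <font color="red"><b>Java</b></font>
        System.out.println(highlight("Thinking in Java", "java",
                "<font color=\"red\"><b>", "</b></font>"));
    }
}
```

Because the highlighted text contains markup, it should be escaped or trusted accordingly before being written into a page.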
4-5 Testing
Using MySQL, I wrote a function that adds data:
DELIMITER $$
CREATE
FUNCTION `compass`.`addDateSyx`(num int(8))
RETURNS varchar(32)
BEGIN
declare i int(8);
set i = 0;
while ( i < num) DO
insert into _article (title,content, createTime) values (i, num-i, now());
set i = i + 1;
end while;
return "OK";
END$$
DELIMITER ;
4-5-1 Test with 10,000 rows of identical Chinese data
For this test, change the insert in the database function to use Chinese sample text:
insert into _article (title,content, createTime) values ('用compass实现站内全文搜索引擎(一)', 'Compass是一个强大的,事务的,高性能的对象/搜索引擎映射(OSEM:object/search engine mapping)与一个Java持久层框架.Compass包括:
* 搜索引擎抽象层(使用Lucene搜索引荐),
* OSEM (Object/Search Engine Mapping) 支持,
* 事务管理,
* 类似于Google的简单关键字查询语言,
* 可扩展与模块化的框架,
* 简单的API.
如果你需要做站内搜索引擎,而且项目里用到了hibernate,那用compass是你的最佳选择。 ', now());
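Insert the data:
select addDateSyx1(10000); -- with hibernate.hbm2ddl.auto=update in the Hibernate settings
Build the index:
10,000 rows in 8045 ms, which is decent.
Index size:
Search:
The text was indeed segmented. With the default analyzer every Chinese character becomes its own token and searching is faster; with JE-Analyzer, 116 ms is still acceptable.
4-5-2 Test with 100,000 rows of identical Chinese data
Insert the data:
MySQL takes about 12 s for 100,000 rows.
Build the index:
The index size is about what I expected, but building it took much longer than I expected, and I did not want to try again.
Search:
With 100,000 rows, 243 ms is quite good. Once the index is built, searching works well.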
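For comparison, the timings reported above can be converted into throughput figures. The small helper below is only a convenience for that arithmetic; the inputs are the numbers from this test (10,000 rows indexed in 8045 ms, and 100,000 rows inserted into MySQL in about 12 s), and `ThroughputSketch` is a made-up name.

```java
import java.util.Locale;

class ThroughputSketch {
    // Items processed per second, given a count and an elapsed time in milliseconds.
    static double perSecond(long items, long millis) {
        return items * 1000.0 / millis;
    }

    public static void main(String[] args) {
        // Indexing 10,000 rows in 8045 ms: prints 1243.0
        System.out.println(String.format(Locale.ROOT, "%.1f", perSecond(10000, 8045)));
        // MySQL inserting 100,000 rows in about 12 s: prints 8333.3
        System.out.println(String.format(Locale.ROOT, "%.1f", perSecond(100000, 12000)));
    }
}
```

So indexing ran at roughly 1,200 rows per second in the 10,000-row test, several times slower than the raw inserts.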
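5 Summary
Compass is pleasant to use and should cover basic needs. I do not know why such a good project stopped being updated; had it continued, Hibernate Search might never have appeared.
Because Compass is no longer updated, features from Lucene 3.0 onward cannot be used, which is a pity. Compass can build the index automatically (manual CRUD on the index is also possible), but wrapping Lucene directly to do what Compass does should give a better implementation; I look forward to someone taking that on.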
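As a rough idea of what such a wrapper has to do, the sketch below mirrors every write on a store into a tiny in-memory inverted index, which is the behaviour that `mirrorDataChanges` provides for Hibernate in the configuration above. Everything here (`MirrorSketch`, `Index`, `Store`) is hypothetical and uses only the JDK; a real implementation would delegate to Lucene and hook into the ORM's event listeners.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class MirrorSketch {
    // Tiny inverted index: lower-cased whitespace-separated token -> ids of rows containing it.
    static class Index {
        private final Map<String, Set<Long>> postings = new HashMap<String, Set<Long>>();

        void add(long id, String text) {
            for (String token : text.toLowerCase().split("\\s+")) {
                if (token.isEmpty()) continue;
                Set<Long> ids = postings.get(token);
                if (ids == null) {
                    ids = new TreeSet<Long>();
                    postings.put(token, ids);
                }
                ids.add(id);
            }
        }

        void remove(long id) {
            for (Set<Long> ids : postings.values()) {
                ids.remove(id);
            }
        }

        Set<Long> search(String token) {
            Set<Long> ids = postings.get(token.toLowerCase());
            return ids == null ? new TreeSet<Long>() : new TreeSet<Long>(ids);
        }
    }

    // Store whose create/update/delete operations are mirrored into the index.
    static class Store {
        private final Map<Long, String> rows = new HashMap<Long, String>();
        private final Index index;

        Store(Index index) { this.index = index; }

        void save(long id, String text) { // create or update
            rows.put(id, text);
            index.remove(id);             // drop stale tokens before re-adding
            index.add(id, text);
        }

        void delete(long id) {
            rows.remove(id);
            index.remove(id);
        }
    }

    public static void main(String[] args) {
        Index index = new Index();
        Store store = new Store(index);
        store.save(1, "Thinking in Java");
        store.save(2, "Effective Java");
        System.out.println(index.search("java")); // [1, 2]
        store.delete(1);
        System.out.println(index.search("java")); // [2]
    }
}
```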
References:
There was also a good article on ITEYE, but I accidentally closed the page...