Lucene Series: FieldCache

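The field cache loads the values of one specific field from all documents into memory, so that the field's values can be accessed randomly by document ID.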

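Purpose and Use Cases

When you need the value of a field for individual documents, IndexSearcher.doc(docId) returns the Document with all of its field values, but this is slow and only returns the values of Stored fields.
FieldCache provides an array of field values that can be accessed randomly by docId. It is an advanced internal API that users normally do not call directly; Lucene uses the field cache internally for features such as sorting and filtering by field value.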

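How It Works

How the field cache is built:
The inverted index is un-inverted: the (field value -> doc) structure is converted into a (doc -> field value) structure, which yields an array of field values.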
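
To make the un-inverting step concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class `UnInvertSketch`, the method `unInvert`, and the toy `postings` map are all hypothetical names chosen for illustration, and a real index stores terms and postings in a very different form.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Lucene code: un-invert a toy inverted index.
class UnInvertSketch {
    /**
     * @param postings toy inverted index: field value -> IDs of the documents containing it
     * @param maxDoc   number of documents in the segment
     * @return array indexed by docId; documents without a value keep the default 0
     */
    static int[] unInvert(Map<Integer, int[]> postings, int maxDoc) {
        int[] values = new int[maxDoc];
        for (Map.Entry<Integer, int[]> entry : postings.entrySet()) {
            for (int docId : entry.getValue()) {
                // Assumes a single value per document; a second value would overwrite the first.
                values[docId] = entry.getKey();
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<Integer, int[]> postings = new TreeMap<>();
        postings.put(10, new int[]{0, 3}); // value 10 appears in docs 0 and 3
        postings.put(25, new int[]{1});    // value 25 appears in doc 1
        System.out.println(Arrays.toString(unInvert(postings, 4))); // prints [10, 25, 0, 10]
    }
}
```

Once the array exists, looking up a document's value is a single array access, which is what makes random access by docId cheap.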

Lucene provides the following way to obtain a field cache explicitly:
```
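/**
 * reader            the index reader for a single segment
 * field             the field name
 * setDocsWithField  if true, a bitset is also built that marks whether each document has this field
 */
// Method declared on FieldCache; call it through FieldCache.DEFAULT.
FieldCache.Ints getInts(AtomicReader reader, String field, boolean setDocsWithField) throws IOException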
```
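The first time the field cache is accessed for a given reader and field, the values of all documents are read and loaded into memory as one large one-dimensional array. The cache is managed with a WeakHashMap whose key is the reader instance plus the field name and whose value is the array of field values. When the reader instance is closed or is no longer referenced, the corresponding cache entry is cleared. Every call between the first access and that cleanup returns a reference to the same array.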
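
The caching behaviour described above can be sketched with the JDK's own `WeakHashMap`. This is an assumption-laden illustration, not Lucene's FieldCacheImpl: the class `WeakFieldCacheSketch`, its `loads` counter, and the use of a plain `Object` as the reader are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical sketch, not Lucene's FieldCacheImpl.
class WeakFieldCacheSketch {
    // Outer key: the reader object, held weakly. Inner key: the field name.
    private final Map<Object, Map<String, int[]>> cache = new WeakHashMap<>();

    // Counts how many times the expensive load actually ran.
    int loads = 0;

    synchronized int[] getInts(Object reader, String field, int maxDoc) {
        Map<String, int[]> perReader = cache.computeIfAbsent(reader, r -> new HashMap<>());
        return perReader.computeIfAbsent(field, f -> {
            loads++;
            return new int[maxDoc]; // stand-in for un-inverting the index
        });
    }
}
```

Calling `getInts` twice with the same reader object and field returns the same array and runs the load only once. Because the outer map holds the reader weakly, the entry becomes eligible for removal once nothing else references that reader.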

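The field cache has two shortcomings:
1. It stays resident in memory, and its size is the total number of documents * the size of the value type.
2. The initial load is slow, because it has to traverse the inverted index and convert the types.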
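
As a rough worked example of the first point, the sketch below multiplies a document count by the size of the value type. The class `FieldCacheSizeSketch`, the method `estimateBytes`, and the figure of 100 million documents are hypothetical; the estimate ignores array headers and the optional docs-with-field bitset.

```java
// Hypothetical helper for a back-of-the-envelope estimate.
class FieldCacheSizeSketch {
    /** Bytes held by one cached array: one slot per document. */
    static long estimateBytes(long maxDoc, int bytesPerValue) {
        return maxDoc * bytesPerValue;
    }

    public static void main(String[] args) {
        // 100 million documents, int field (4 bytes per value)
        System.out.println(estimateBytes(100_000_000L, Integer.BYTES)); // prints 400000000
        // same documents, long field (8 bytes per value)
        System.out.println(estimateBytes(100_000_000L, Long.BYTES));    // prints 800000000
    }
}
```

Under these assumptions, a single int field costs about 400 MB and a long field about 800 MB, and that cost is paid per cached field.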

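Points to note:
1. The field must be single-valued; a string field must not be tokenized (NOT_ANALYZED).
2. The field must be indexed (INDEXED).
3. Supported numeric types are byte, short, int, long, float, and double.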

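Improvements:
Lucene addressed these shortcomings by building the doc -> field value structure at indexing time, so the values no longer need to be fully resident in memory or obtained by traversing and parsing the inverted index. This relies on DocValues: when a field is indexed with a DocValues type, loading the FieldCache first tries to obtain the DocValues and only falls back to traversing the inverted index if that fails, as the excerpt below shows. DocValues will be covered in a separate article.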
```
final NumericDocValues valuesIn = reader.getNumericDocValues(field);
if (valuesIn != null) {
  // Not cached here by FieldCacheImpl (cached instead
  // per-thread by SegmentReader):
  return new Ints() {
    @Override
    public int get(int docID) {
      return (int) valuesIn.get(docID);
    }
  };
}
// Otherwise fall back to un-inverting the inverted index (omitted here).
```
Some APIs

Based on Lucene 4.10.0
```
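// Classes used: org.apache.lucene.search (FieldCache, Filter, FieldCacheRangeFilter, FieldCacheTermsFilter),
// org.apache.lucene.index (AtomicReader, BinaryDocValues), org.apache.lucene.util (BytesRef).
// reader is an AtomicReader for one segment; field and setDocsWithField are as described above.

// Get the int values of a field
FieldCache.Ints ints = FieldCache.DEFAULT.getInts(reader, field, setDocsWithField);
// Look up the field value of a document by docId
int value = ints.get(docId);
// Get the string values of a field
BinaryDocValues terms = FieldCache.DEFAULT.getTerms(reader, field, setDocsWithField);
String text = terms.get(docId).utf8ToString();

// Filters built on FieldCache
Filter f = FieldCacheRangeFilter.newIntRange("left", 0, 100, true, true);
Filter filter = new FieldCacheTermsFilter("type",
                new BytesRef[]{new BytesRef("science"), new BytesRef("it")});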
```
References
http://blog.trifork.com/2011/10/27/introducing-lucene-index-doc-values/
Original post: https://www.cnblogs.com/whuqin/p/4981950.html