Improving Performance with JanusGraph Indexes

Translated and compiled by 纪玉奇


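JanusGraph supports two types of indexes: graph indexes and vertex-centric indexes. Graph indexes are typically used to look up vertices or edges by their properties. Vertex-centric indexes make graph traversals efficient, especially when a vertex has many incident edges.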

Graph Index

A graph index is a global index structure over the entire graph. It lets users efficiently retrieve vertices or edges by their properties. For example:
g.V().has('name', 'hercules')
g.E().has('reason', textContains('loves'))
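These queries look up vertices or edges by property value. Without an index, they require a full scan of all vertices or edges, which is unacceptable for large graphs.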
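JanusGraph supports two kinds of graph index: composite indexes and mixed indexes.

Composite indexes are fast and efficient. However, they are limited to equality lookups on a particular, predefined combination of property keys.

Mixed indexes can be used for lookups on any combination of the indexed keys. They support multiple predicates beyond equality, depending on the backing index store.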

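Both types of index are created through the JanusGraph management system:
JanusGraphManagement.buildIndex(String, Class)
The first argument is the index name, which must be unique. The second is the type of element to index (for example, Vertex.class).

An index built on property keys that are defined in the same management transaction as the index is available immediately. Otherwise, a reindex procedure must be run to bring the index in sync with the existing data, and the index remains unavailable until that procedure completes. It is therefore recommended to define indexes together with the initial schema.
Note: without an index, a query falls back to a full scan, which performs very poorly. Such queries can be prohibited with the force-index option (query.force-index).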
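The difference between an index lookup and a full scan can be illustrated with a small in-memory model. The Java sketch below is hypothetical and is not JanusGraph code. ToyGraph, its hash-map "index" and the forceIndex flag are illustrative stand-ins for a graph index and the query.force-index option. A lookup uses the index when one exists and otherwise scans every vertex. If forceIndex is set and no index exists, the query is refused.

```java
import java.util.*;

// Hypothetical toy model, not JanusGraph code: vertices are property maps, and an
// "index" is a hash map from a property value to the vertices holding that value.
class ToyGraph {
    final List<Map<String, String>> vertices = new ArrayList<>();
    final Map<String, Map<String, List<Map<String, String>>>> indexes = new HashMap<>();
    boolean forceIndex = false; // stands in for the query.force-index option

    void addVertex(Map<String, String> v) {
        vertices.add(v);
        // keep every existing index up to date
        indexes.forEach((key, idx) -> {
            String val = v.get(key);
            if (val != null) idx.computeIfAbsent(val, x -> new ArrayList<>()).add(v);
        });
    }

    // Build an index on key, including vertices that already exist
    void addIndex(String key) {
        Map<String, List<Map<String, String>>> idx = new HashMap<>();
        for (Map<String, String> v : vertices) {
            String val = v.get(key);
            if (val != null) idx.computeIfAbsent(val, x -> new ArrayList<>()).add(v);
        }
        indexes.put(key, idx);
    }

    // Rough analogue of g.V().has(key, value)
    List<Map<String, String>> has(String key, String value) {
        Map<String, List<Map<String, String>>> idx = indexes.get(key);
        if (idx != null) return idx.getOrDefault(value, Collections.emptyList()); // index lookup
        if (forceIndex) throw new IllegalStateException("no index on '" + key + "'; full scan refused");
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> v : vertices) { // full scan: touches every vertex
            if (value.equals(v.get(key))) out.add(v);
        }
        return out;
    }
}
```

In a real deployment the scan runs over the storage backend rather than an in-memory list, so its cost grows with the size of the graph.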

Composite Index

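A composite index retrieves vertices or edges by one key or by a fixed combination of several keys. In other words, the keys a query must constrain are fixed by the index definition.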
// Never create new indexes while a transaction is active (this can cause a deadlock)
graph.tx().rollback()
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
// Composite index for looking up vertices by name
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
// Composite index for looking up vertices by name and age
mgmt.buildIndex('byNameAndAgeComposite', Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
mgmt.commit()
// Wait for the indexes to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameComposite').call()
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameAndAgeComposite').call()
// Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
mgmt.updateIndex(mgmt.getGraphIndex("byNameAndAgeComposite"), SchemaAction.REINDEX).get()
mgmt.commit()
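Note that a composite index is used only when the query's equality conditions cover all of its keys.

With the indexes above, g.V().has('name', 'hercules') and g.V().has('age', 30).has('name', 'hercules') can both use an index. g.V().has('age', 30) cannot, because no index is defined on age alone.

For g.V().has('name', 'hercules').has('age', inside(20, 50)), only byNameComposite can be used, for the name condition. Composite indexes support exact matches only, not range queries, so byNameAndAgeComposite does not apply.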
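The matching rule above can be written down as a small predicate. The Java sketch below is a hypothetical model of that rule, not JanusGraph's query optimizer. An index qualifies only when every one of its keys appears among the query's equality conditions. Any further conditions, such as a range on age, would have to be checked after the lookup.

```java
import java.util.*;

// Hypothetical model of the composite-index rule, not JanusGraph's optimizer.
class CompositeIndexRule {
    // indexKeys: keys of the composite index; equalityKeys: keys the query constrains with equality.
    // Range-constrained keys are deliberately absent from equalityKeys.
    static boolean canUse(List<String> indexKeys, Set<String> equalityKeys) {
        return equalityKeys.containsAll(indexKeys);
    }

    // Sorted names of the indexes (index name -> keys) that could serve a query
    static List<String> usableIndexes(Map<String, List<String>> indexes, Set<String> equalityKeys) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : indexes.entrySet()) {
            if (canUse(e.getValue(), equalityKeys)) out.add(e.getKey());
        }
        Collections.sort(out);
        return out;
    }
}
```

For g.V().has('name', 'hercules').has('age', inside(20, 50)), only name is an equality key, so in this model only byNameComposite qualifies.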

Index Uniqueness

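A composite index can also enforce property uniqueness in the graph. If a composite graph index is defined as unique(), at most one vertex or edge can exist for any given combination of the indexed property values.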
// Never create new indexes while a transaction is active
graph.tx().rollback()
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
mgmt.buildIndex('byNameUnique', Vertex.class).addKey(name).unique().buildCompositeIndex()
mgmt.commit()
// Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameUnique').call()
// Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameUnique"), SchemaAction.REINDEX).get()
mgmt.commit()
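Note: for storage backends configured for eventual consistency, the index's consistency must be set to allow locking, for example with mgmt.setConsistency(index, ConsistencyModifier.LOCK).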
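A unique composite index boils down to a simple rule: one exact value tuple maps to at most one element. The Java sketch below is a hypothetical in-memory model of that rule only. It does not model transactions or the locking that eventually consistent backends need.

```java
import java.util.*;

// Hypothetical model of a unique composite index (not JanusGraph code):
// each exact tuple of indexed values may map to at most one element id.
class UniqueCompositeIndex {
    private final Map<List<?>, Long> entries = new HashMap<>();

    void add(List<?> values, long elementId) {
        Long existing = entries.putIfAbsent(values, elementId);
        // re-adding the same element is allowed; a different element is a violation
        if (existing != null && existing != elementId) {
            throw new IllegalArgumentException("unique constraint violated for " + values);
        }
    }

    Optional<Long> lookup(List<?> values) {
        return Optional.ofNullable(entries.get(values));
    }
}
```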

Mixed Index

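A mixed index retrieves vertices or edges by any combination of its keys. Mixed indexes are more flexible and support range queries and other predicates beyond equality. On the other hand, they are slower than composite indexes for most equality queries.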

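Unlike composite indexes, mixed indexes require an indexing backend to be configured. JanusGraph can support multiple indexing backends in a single installation. Each one must be identified by a unique name in the JanusGraph configuration, called the indexing backend name.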
// Never create new indexes while a transaction is active
graph.tx().rollback()
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
mgmt.buildIndex('nameAndAge', Vertex.class).addKey(name).addKey(age).buildMixedIndex("search")
mgmt.commit()
// Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'nameAndAge').call()
// Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("nameAndAge"), SchemaAction.REINDEX).get()
mgmt.commit()
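The code above creates an index named nameAndAge on the name and age properties. It assigns the index to the indexing backend "search", which corresponds to the index.search.backend setting in the configuration file. If the backend were named solrsearch instead, an index.solrsearch.backend setting would be required.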
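The backend name ties the index definition to the configuration, so it can help to check which names a properties file actually defines. The helper below is a hypothetical sketch built on java.util.Properties. It assumes the index.NAME.backend key pattern described above and ignores all other settings.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.*;

// Hypothetical helper: collects NAME from every key of the form index.NAME.backend.
class IndexBackendNames {
    private static final String PREFIX = "index.";
    private static final String SUFFIX = ".backend";

    static Set<String> backendNames(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Set<String> names = new TreeSet<>();
        for (String key : props.stringPropertyNames()) {
            // require a non-empty NAME between the prefix and the suffix
            if (key.startsWith(PREFIX) && key.endsWith(SUFFIX)
                    && key.length() > PREFIX.length() + SUFFIX.length()) {
                names.add(key.substring(PREFIX.length(), key.length() - SUFFIX.length()));
            }
        }
        return names;
    }
}
```

For a configuration that contains both index.search.backend and index.solrsearch.backend, the helper returns search and solrsearch. These are the two names that buildMixedIndex() could refer to.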
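The following example makes text search the search behavior for the name key by passing a mapping parameter (age is a numeric key and keeps the default mapping):
mgmt.buildIndex('nameAndAge', Vertex.class).addKey(name, Mapping.TEXT.asParameter()).addKey(age).buildMixedIndex("search")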
For more details, see Chapter 21, Index Parameters and Full-Text Search.
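To illustrate what Mapping.TEXT changes, the sketch below approximates the difference between a tokenized TEXT mapping and a whole-value STRING mapping. This is a simplification resting on assumptions. Real tokenization is done by the indexing backend's analyzer (for example Elasticsearch, Solr or Lucene) and may split, normalize or stem words differently. The class and method names are hypothetical.

```java
import java.util.*;

// Rough approximation, not any backend's actual analyzer:
// - TEXT: the value is split into lower-cased word tokens; textContains matches a token.
// - STRING: the whole value is kept as one term, suited to exact matches.
class TextMappingSketch {
    static Set<String> tokens(String value) {
        Set<String> out = new HashSet<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) out.add(t); // skip the empty token produced by leading separators
        }
        return out;
    }

    // approximates textContains(word) against a TEXT-mapped value
    static boolean textContains(String value, String word) {
        return tokens(value).contains(word.toLowerCase(Locale.ROOT));
    }

    // approximates an equality match against a STRING-mapped value
    static boolean stringEquals(String value, String query) {
        return value.equals(query);
    }
}
```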
Mixed indexes support range queries and arbitrary combinations of the indexed keys, not only equality:
g.V().has('name', textContains('hercules')).has('age', inside(20, 50))
g.V().has('name', textContains('hercules'))
g.V().has('age', lt(50))
Mixed indexes support full-text search, range search, geo search and more. See Chapter 20, Search Predicates and Data Types.
Note: unlike composite indexes, mixed indexes do not support uniqueness.
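A mixed index can answer inside() and lt() because the backend keeps values in an ordered, searchable form, whereas a composite index supports only exact lookups. The hypothetical Java sketch below uses a TreeMap as a stand-in for such an index on age. It follows the Gremlin semantics of inside(a, b), where both bounds are exclusive, and of lt(b).

```java
import java.util.*;

// Hypothetical stand-in for a mixed index on 'age': a sorted map from value to element ids.
class RangeIndexSketch {
    private final TreeMap<Integer, List<Long>> byAge = new TreeMap<>();

    void add(int age, long id) {
        byAge.computeIfAbsent(age, k -> new ArrayList<>()).add(id);
    }

    // inside(lo, hi): lo < age < hi, both bounds exclusive
    List<Long> inside(int lo, int hi) {
        List<Long> out = new ArrayList<>();
        for (List<Long> ids : byAge.subMap(lo, false, hi, false).values()) out.addAll(ids);
        return out;
    }

    // lt(bound): age < bound
    List<Long> lt(int bound) {
        List<Long> out = new ArrayList<>();
        for (List<Long> ids : byAge.headMap(bound, false).values()) out.addAll(ids);
        return out;
    }
}
```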

Adding Property Keys

New property keys can be added to an existing mixed index. After that, they can be used in query conditions.
// Never create new indexes while a transaction is active
graph.tx().rollback()
mgmt = graph.openManagement()
// Create a new property key
location = mgmt.makePropertyKey('location').dataType(Geoshape.class).make()
nameAndAge = mgmt.getGraphIndex('nameAndAge')
// Add the new key to the existing mixed index
mgmt.addIndexKey(nameAndAge, location)
mgmt.commit()
// Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'nameAndAge').call()
// Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("nameAndAge"), SchemaAction.REINDEX).get()
mgmt.commit()
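If the property key is created in the same management transaction, it can be queried immediately. If the key is already in use, the reindex procedure must be run so that the index covers all existing elements. Until that procedure completes, the key cannot be used through this index.
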
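Reindexing is needed because an index added to a key that is already in use knows nothing about the existing values. The hypothetical Java model below (not JanusGraph's implementation) shows this. After addIndexKey() the index only sees later writes, and reindex() backfills the elements written earlier. It uses a plain string value instead of a Geoshape to stay self-contained.

```java
import java.util.*;

// Hypothetical model, not JanusGraph's implementation. Updates of an existing
// vertex's value are not handled (stale entries are left behind); this is a toy.
class ReindexSketch {
    final Map<Long, String> valueByVertex = new LinkedHashMap<>(); // stored property values
    Map<String, List<Long>> index = null;                         // null: key not indexed yet

    void setValue(long id, String value) {
        valueByVertex.put(id, value);
        if (index != null) index.computeIfAbsent(value, k -> new ArrayList<>()).add(id);
    }

    // Analogous to addIndexKey(): starts empty, no backfill of existing data
    void addIndexKey() {
        index = new HashMap<>();
    }

    // Analogous to SchemaAction.REINDEX: rebuild the index from all stored values
    void reindex() {
        Map<String, List<Long>> rebuilt = new HashMap<>();
        valueByVertex.forEach((id, value) -> rebuilt.computeIfAbsent(value, k -> new ArrayList<>()).add(id));
        index = rebuilt;
    }

    List<Long> lookup(String value) {
        if (index == null) throw new IllegalStateException("key is not indexed");
        return index.getOrDefault(value, Collections.emptyList());
    }
}
```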
Mapping Parameters

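When a property key is added to a mixed index, whether through the index builder or through addIndexKey, an optional set of parameters can be specified. These parameters control how the property value is mapped in the indexing backend. See the mapping parameters overview section.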

Ordering

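The order in which graph query results are returned can be specified with order().by(), which takes two arguments:
- the name of the property key to order by
- the order direction: incr (ascending) or decr (descending). Later TinkerPop versions rename these to asc and desc.
For example:
g.V().has('name', textContains('hercules')).order().by('age', decr).limit(10)
This returns 10 vertices whose name contains 'hercules', ordered by age in descending order.
Keep the following in mind when using ordering:
- Composite graph indexes do not natively support ordering. All matching results are loaded into memory and then sorted, which is very expensive for large result sets.
- Mixed graph indexes support ordering natively, but the property key used for ordering must have been added to the mixed index beforehand. If it is not part of the index, the entire result set must be loaded into memory to be sorted.
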
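The cost difference between the two cases can be sketched in Java. This is a hypothetical contrast, not JanusGraph code. sortInMemory loads every match and sorts it, as happens without index support for ordering. readFromSortedIndex uses a TreeMap as a stand-in for a mixed index that contains age. It reads entries in descending order and stops after the limit, so the remaining entries are never visited.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical contrast for order().by('age', decr).limit(n); not JanusGraph code.
class OrderLimitSketch {
    // Without ordering support: materialize all matches, sort, then truncate
    static List<Integer> sortInMemory(Collection<Integer> ages, int limit) {
        return ages.stream()
                   .sorted(Comparator.reverseOrder())
                   .limit(limit)
                   .collect(Collectors.toList());
    }

    // With an index already sorted by age: walk it in descending order and stop early
    static List<String> readFromSortedIndex(TreeMap<Integer, List<String>> namesByAge, int limit) {
        List<String> out = new ArrayList<>();
        for (List<String> names : namesByAge.descendingMap().values()) {
            for (String n : names) {
                if (out.size() == limit) return out;
                out.add(n);
            }
        }
        return out;
    }
}
```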
Label Constraint

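Sometimes we do not want to index every vertex or edge that has a given property, only those with a particular label. For example, to index only vertices with the god label, use indexOnly to restrict the index to vertices or edges with that label: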
// Never create new indexes while a transaction is active
graph.tx().rollback()
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
god = mgmt.getVertexLabel('god')
// Index only vertices with the label god
mgmt.buildIndex('byNameAndLabel', Vertex.class).addKey(name).indexOnly(god).buildCompositeIndex()
mgmt.commit()
// Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameAndLabel').call()
// Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameAndLabel"), SchemaAction.REINDEX).get()
mgmt.commit()
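Label constraints work the same way for mixed indexes. When a composite index with a label constraint is defined as unique, the uniqueness constraint applies only to properties of vertices or edges with that label.
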
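The effect of indexOnly on a unique index can be modeled in a few lines. In the hypothetical Java sketch below (not JanusGraph code), elements without the configured label are never written to the index. As a result, the uniqueness check only compares elements that share that label.

```java
import java.util.*;

// Hypothetical model of indexOnly(label), optionally combined with unique().
class LabelConstrainedIndex {
    private final String onlyLabel;
    private final boolean unique;
    private final Map<String, List<Long>> byName = new HashMap<>();

    LabelConstrainedIndex(String onlyLabel, boolean unique) {
        this.onlyLabel = onlyLabel;
        this.unique = unique;
    }

    void onAdd(long id, String label, String name) {
        if (!onlyLabel.equals(label)) return; // other labels are not indexed at all
        List<Long> ids = byName.computeIfAbsent(name, k -> new ArrayList<>());
        if (unique && !ids.isEmpty()) {
            throw new IllegalArgumentException("duplicate name among '" + onlyLabel + "': " + name);
        }
        ids.add(id);
    }

    List<Long> lookup(String name) {
        return byName.getOrDefault(name, Collections.emptyList());
    }
}
```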
Composite versus Mixed Indexes

1. Use composite indexes for exact-match lookups. Composite indexes do not require an external indexing system and usually perform better than mixed indexes.

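As an exception, use a mixed index for exact matches in either of two cases. The first is when the number of distinct values is small (for example, the 12 months). The second is when one value is associated with many elements in the graph (that is, low selectivity).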

2. Use mixed indexes for range, full-text and geo-spatial queries. A mixed index can also speed up order().by() queries.
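The two guidelines can be condensed into a small decision function. The Java sketch below is a hypothetical helper that encodes only the rules of thumb stated above. It is not a JanusGraph API, and it does not attempt to quantify what counts as "low selectivity".

```java
// Hypothetical helper encoding the rules of thumb above; not a JanusGraph API.
enum Constraint { EQUALITY, RANGE, FULL_TEXT, GEO }

enum IndexChoice { COMPOSITE, MIXED }

class IndexAdvisor {
    // lowSelectivity: few distinct values (e.g. 12 months) or one value shared by many elements
    static IndexChoice choose(Constraint c, boolean lowSelectivity) {
        if (c != Constraint.EQUALITY) return IndexChoice.MIXED; // range, full-text, geo
        return lowSelectivity ? IndexChoice.MIXED : IndexChoice.COMPOSITE;
    }
}
```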

Vertex-centric Indexes

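A vertex-centric index is a local index structure built for each vertex. In large graphs, a vertex can have thousands of incident edges. Traversing through such vertices is very slow, because the matching edges must be filtered in memory. Vertex-centric indexes speed up these traversals by using a local index structure.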

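For example:
h = g.V().has('name', 'hercules').next()
g.V(h).outE('battled').has('time', inside(10, 20)).inV()
Without a vertex-centric index, this traversal must iterate over all battled edges to find the matching ones, which is very inefficient when the number of edges is large.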
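The idea behind a vertex-centric index can be sketched as a per-vertex sorted structure. In the hypothetical Java model below (not JanusGraph's storage layout), each vertex keeps its battled edges in a TreeMap keyed by time. The inside(10, 20) condition then becomes a range read on that vertex's map instead of a filter over every incident edge. Index direction and sort order (such as Order.decr) are not modeled.

```java
import java.util.*;

// Hypothetical model of a vertex-centric index on 'battled' edges sorted by 'time'.
class VertexCentricSketch {
    // vertex id -> (time -> ids of the vertices reached by 'battled' edges with that time)
    private final Map<Long, TreeMap<Integer, List<Long>>> battledByTime = new HashMap<>();

    void addBattled(long from, long to, int time) {
        battledByTime.computeIfAbsent(from, k -> new TreeMap<>())
                     .computeIfAbsent(time, k -> new ArrayList<>())
                     .add(to);
    }

    // Like g.V(v).outE('battled').has('time', inside(lo, hi)).inV(): a local range read
    List<Long> battledInside(long v, int lo, int hi) {
        List<Long> out = new ArrayList<>();
        TreeMap<Integer, List<Long>> edges = battledByTime.get(v);
        if (edges == null) return out;
        for (List<Long> targets : edges.subMap(lo, false, hi, false).values()) out.addAll(targets);
        return out;
    }
}
```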
Building a vertex-centric index speeds up such queries:
// Never create new indexes while a transaction is active
graph.tx().rollback()
mgmt = graph.openManagement()
// Look up the property key
time = mgmt.getPropertyKey('time')
// Look up the edge label
battled = mgmt.getEdgeLabel('battled')
// Create the vertex-centric index
mgmt.buildEdgeIndex(battled, 'battlesByTime', Direction.BOTH, Order.decr, time)
mgmt.commit()
// Wait for the index to become available (a vertex-centric index is a relation index)
ManagementSystem.awaitRelationIndexStatus(graph, 'battlesByTime', 'battled').call()
// Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getRelationIndex(battled, 'battlesByTime'), SchemaAction.REINDEX).get()
mgmt.commit()
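The code above builds a vertex-centric index on battled edges, in both directions, ordered by time in descending order. The arguments of buildEdgeIndex() are:
- the label of the edges to index;
- the index name;
- the edge direction. BOTH means that traversals in both the IN and OUT directions can use the index, while restricting the index to a single direction halves the storage and maintenance cost;
- the sort order of the index. If no order is given, it defaults to ascending;
- the property keys to index. More than one property key may be given.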
// Never create new indexes while a transaction is active
graph.tx().rollback()
mgmt = graph.openManagement()
time = mgmt.getPropertyKey('time')
rating = mgmt.makePropertyKey('rating').dataType(Double.class).make()
battled = mgmt.getEdgeLabel('battled')
mgmt.buildEdgeIndex(battled, 'battlesByRatingAndTime', Direction.OUT, Order.decr, rating, time)
mgmt.commit()
// Wait for the index to become available
ManagementSystem.awaitRelationIndexStatus(graph, 'battlesByRatingAndTime', 'battled').call()
// Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getRelationIndex(battled, 'battlesByRatingAndTime'), SchemaAction.REINDEX).get()
mgmt.commit()
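The code above builds the battlesByRatingAndTime index on rating and time. The order of the property keys in the index matters. A query can use the index only for conditions that follow that key order, starting with the first key: rating first, then time.
h = g.V().has('name', 'hercules').next()
g.V(h).outE('battled').property('rating', 5.0) // Add some rating properties
g.V(h).outE('battled').has('rating', gt(3.0)).inV()
g.V(h).outE('battled').has('rating', 5.0).has('time', inside(10, 50)).inV()
g.V(h).outE('battled').has('time', inside(10, 50)).inV()
Of the three queries above, the first two can use the index. The third constrains only time, which does not match the index's rating-then-time order.

Multiple vertex-centric indexes can be built for the same edge label to support different traversals, and JanusGraph automatically selects the most efficient one. Vertex-centric indexes support only equality and range/interval constraints.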
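The key-order rule behaves like a prefix match on the index's sort keys. In the hypothetical Java model below (not JanusGraph's optimizer), the index serves constraints key by key in sort-key order. It stops at the first unconstrained key or right after the first range constraint. If no key is served, the index is not usable, and any remaining constraints would be filtered afterwards.

```java
import java.util.*;

// Hypothetical model of which constraints a vertex-centric index sorted by (rating, time) can serve.
class SortKeyPrefixRule {
    enum Kind { EQ, RANGE }

    static List<String> servedKeys(List<String> sortKeys, Map<String, Kind> constraints) {
        List<String> served = new ArrayList<>();
        for (String key : sortKeys) {
            Kind kind = constraints.get(key);
            if (kind == null) break;       // the prefix ends at the first unconstrained key
            served.add(key);
            if (kind == Kind.RANGE) break; // keys after a range constraint cannot be used
        }
        return served; // empty list: the index cannot be used for this query
    }
}
```

Applied to the three queries above, this model serves rating for the first query, rating and time for the second, and nothing for the third.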
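Note: to support ordering, the property keys used in a vertex-centric index must have an explicitly defined data type (not Object.class) with a native sort order. If the data type is floating point, JanusGraph's Decimal or Precision data types must be used.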

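A vertex-centric index built on an edge label defined in the same management transaction is available immediately. If the edge label is already in use, the reindex procedure must be run, and the index is unavailable until it completes.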

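Note: JanusGraph automatically builds vertex-centric indexes per edge label and property key. Queries by edge label or property key therefore remain efficient even when a vertex has thousands of edges.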

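Vertex-centric indexes cannot speed up unconstrained traversals that must visit all incident edges of a given label. Such traversals become slower as the number of edges grows. They can often be rewritten as constrained traversals to improve performance.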

Ordering Traversals

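The following queries use local() and limit() to retrieve an ordered subset of the edges traversed:
h = g.V().has('name', 'hercules').next()
g.V(h).local(outE('battled').order().by('time', decr).limit(10)).inV().values('name')
g.V(h).local(outE('battled').has('rating', 5.0).order().by('time', decr).limit(10)).values('place')
These queries are very efficient when the ordering key and direction match those of a vertex-centric index.
Note: ordered vertex queries are a JanusGraph extension to Gremlin. Using them requires somewhat verbose syntax, and the _() step is needed to convert the JanusGraph result back into a Gremlin pipeline.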
Original post: https://www.cnblogs.com/jiyuqi/p/7132986.html
