使用saiku的过程中发现一个重要问题,速度慢!下面是跟踪和优化过程
一、首先抓包,发现ajax请求:http://l-tdata2.tkt.cn6.qunar.com:8080/saiku/rest/saiku/api/query/execute
里面的参数不少,下面是截屏
二、看日志:发现了mdx语句
WITH SET [~ROWS_create_date_create_date] AS {[create_date].[create_date].[2016-04-12]} SET [~ROWS_dimPartner_dimPartner] AS Hierarchize({{[dimPartner].[dimPartner].[All dimPartners]}, {[dimPartner].[dimPartner].[name].Members}}) SET [~ROWS_in_track_in_track] AS {[in_track].[in_track].[All in_tracks]} SET [~ROWS_product_product] AS {[product].[product].[All products]} SET [~ROWS_self_self] AS {[self].[self].[All selfs]} SET [~ROWS_sight_sight] AS {[sight].[sight].[All sights]} SET [~ROWS_ticket_type_ticket_type] AS {[ticket_type].[ticket_type].[All ticket_types]} SET [~ROWS_order_status_order_status] AS {[order_status].[order_status].[All order_statuss]} SET [~ROWS_refund_status_refund_status] AS {[refund_status].[refund_status].[All refund_statuss]} SELECT NON EMPTY {[Measures].[money], [Measures].[quantity], [Measures].[qunar_income], [Measures].[order_num]} ON COLUMNS, NON EMPTY Order(NonEmptyCrossJoin([~ROWS_create_date_create_date], NonEmptyCrossJoin([~ROWS_dimPartner_dimPartner], NonEmptyCrossJoin([~ROWS_in_track_in_track], NonEmptyCrossJoin([~ROWS_product_product], NonEmptyCrossJoin([~ROWS_self_self], NonEmptyCrossJoin([~ROWS_sight_sight], NonEmptyCrossJoin([~ROWS_ticket_type_ticket_type], NonEmptyCrossJoin([~ROWS_order_status_order_status], [~ROWS_refund_status_refund_status])))))))), [Measures].[money], BDESC) ON ROWS FROM [com_order_detail_cube] 2016-04-15 17:02:55,948 INFO [org.saiku.datasources.connection.SaikuOlapConnection] Clearing cache 2016-04-15 17:03:26,603 WARN [mondrian.rolap.RolapSchema] Model is in legacy format 2016-04-15 17:04:09,714 INFO [org.saiku.datasources.connection.SaikuOlapConnection] Catalogs:1 2016-04-15 17:06:03,660 DEBUG [org.saiku.service.olap.ThinQueryService] Query End 2016-04-15 17:06:03,661 INFO [org.saiku.service.olap.ThinQueryService] RUN#:84 Size: 14/7 Execute: 190420ms Format: 0ms Totals: 0ms Total: 190420ms
观察日志,发现前端一直执行不返回。分析主要原因是执行mdx需要很长时间,190秒
3、找代码:org.saiku.web.rest.resources.Query2Resource的execute方法
继续追踪代码:org.saiku.service.olap.ThinQueryService的execute方法()。下面是核心重点:
private CellDataSet execute(ThinQuery tq, ICellSetFormatter formatter) { try { Long start = (new Date()).getTime(); log.debug("Query Start"); CellSet cellSet = executeInternalQuery(tq); //这是执行mdx语句的地方,需要较长时间 log.debug("Query End"); String runId = "RUN#:" + ID_GENERATOR.get(); Long exec = (new Date()).getTime(); CellDataSet result = OlapResultSetUtil.cellSet2Matrix(cellSet,formatter); Long format = (new Date()).getTime(); if (ThinQuery.Type.QUERYMODEL.equals(tq.getType()) && formatter instanceof FlattenedCellSetFormatter && tq.hasAggregators()) { calculateTotals(tq, result, cellSet, formatter); } Long totals = (new Date()).getTime(); log.info(runId + " Size: " + result.getWidth() + "/" + result.getHeight() + " Execute: " + (exec - start) + "ms Format: " + (format - exec) + "ms Totals: " + (totals - format) + "ms Total: " + (totals - start) + "ms"); result.setRuntime(new Double(format - start).intValue()); return result; } catch (Exception | Error e) { throw new SaikuServiceException("Can't execute query: " + tq.getName(),e); } }
4、查看数据执行的sql,看看为什么执行的很慢
4.1 选择情况
首先任何的筛选都是对立方体内的字段进行全表的扫描,比如我的立方体对应的数据表是:com_order_detail_view,时间对应的字段是create_date,那么选择时间的时候,捕获执行的sql如下:
select "com_order_detail_view"."create_date" as "c0" from "com_order_detail_view" as "com_order_detail_view" group by "com_order _detail_view"."create_date" order by "com_order_detail_view"."create_date" ASC NULLS LAST
发现根本没有where条件。好吧,这个可以理解!
4.2 执行情况
筛选的时候,为了提升效率,选择了一个日期,并且只是选择了name字段作为区分。执行时间:190s
抓取的sql如下:
4.2.1
select "dim_partner"."name" as "c0", sum("com_order_detail_view"."money") as "m0", sum("com_order_detail_view"."quantity") as "m1", sum("com_order_detail_view"."qunar_income") as "m2", count(distinct "com_order_detail_view"."display_id") as "m3" from "com_order_detail_view" as "com_order_detail_view", "dim_partner" as "dim_partner" where "com_order_detail_view"."partner" = "dim_partner"."code" group by "dim_partner"."name"
4.2.2
select sum("com_order_detail_view"."money") as "m0", sum("com_order_detail_view"."quantity") as "m1", sum("com_order_detail_view"."qunar_income") as "m2", count(distinct "com_order_detail_view"."display_id") as "m3" from "com_order_detail_view" as "com_order_detail_view"
没有发现where条件。猜测可能是选择日期没有在过滤条件里面,所以全表扫描,那么将日期放入过滤条件,mdx被修改为:
WITH SET [~FILTER] AS {[create_date].[create_date].[2016-04-01]} SET [~ROWS_dimPartner_dimPartner] AS Hierarchize({{[dimPartner].[dimPartner].[All dimPartners]}, {[dimPartner].[dimPartner].[name].Members}}) SET [~ROWS_in_track_in_track] AS {[in_track].[in_track].[All in_tracks]} SET [~ROWS_product_product] AS {[product].[product].[All products]} SET [~ROWS_self_self] AS {[self].[self].[All selfs]} SET [~ROWS_sight_sight] AS {[sight].[sight].[All sights]} SET [~ROWS_ticket_type_ticket_type] AS {[ticket_type].[ticket_type].[All ticket_types]} SET [~ROWS_order_status_order_status] AS {[order_status].[order_status].[All order_statuss]} SET [~ROWS_refund_status_refund_status] AS {[refund_status].[refund_status].[All refund_statuss]} SELECT NON EMPTY {[Measures].[money], [Measures].[quantity], [Measures].[qunar_income], [Measures].[order_num]} ON COLUMNS, NON EMPTY Order(NonEmptyCrossJoin([~ROWS_dimPartner_dimPartner], NonEmptyCrossJoin([~ROWS_in_track_in_track], NonEmptyCrossJoin([~ROWS_product_product], NonEmptyCrossJoin([~ROWS_self_self], NonEmptyCrossJoin([~ROWS_sight_sight], NonEmptyCrossJoin([~ROWS_ticket_type_ticket_type], NonEmptyCrossJoin([~ROWS_order_status_order_status], [~ROWS_refund_status_refund_status]))))))), [Measures].[money], BDESC) ON ROWS FROM [com_order_detail_cube] WHERE [~FILTER] 2016-04-15 17:13:02,448 DEBUG [org.saiku.service.olap.ThinQueryService] Query End 2016-04-15 17:13:02,449 INFO [org.saiku.service.olap.ThinQueryService] RUN#:86 Size: 13/8 Execute: 20679ms Format: 1ms Totals: 0ms Total: 20680ms
发现有了效果,执行时间:20s。下面是抓取的sql
4.2.3
select "com_order_detail_view"."create_date" as "c0", "dim_partner"."name" as "c1", sum("com_order_detail_view"."money") as "m0", sum("com_order_detail_view"."quantity") as "m1", sum("com_order_detail_view"."qunar_income") as "m2", count(distinct "com_order_detail_view"."display_id") as "m3" from "com_order_detail_view" as "com_order_detail_view", "dim_partner" as "dim_partner" where "com_order_detail_view"."create_date" = DATE '2016-04-01' and "com_order_detail_view"."partner" = "dim_partner"."code" group by "com_order_detail_view"."create_date", "dim_partner"."name"
4.2.4
select "com_order_detail_view"."create_date" as "c0", sum("com_order_detail_view"."money") as "m0", sum("com_order_detail_view"."quantity") as "m1", sum("com_order_detail_view"."qunar_income") as "m2", count(distinct "com_order_detail_view"."display_id") as "m3" from "com_order_detail_view" as "com_order_detail_view" where "com_order_detail_view"."create_date" = DATE '2016-04-01' group by "com_order_detail_view"."create_date"
总结:使用saiku的时候,将时间条件放在《行》或者《列》里面,基本不起作用。最好放入在《过滤》里面