背景
在后端开发中优化查询效率始终是一个永恒的主题,如何使用更加高效的查询方式来优化服务器的查询性能是优化用户体验的一个非常重要的方式,在充分理解本篇文章之前有必要对SQL SERVER数据库中的一些基本概念有一个充分的认识,聚集索引、非聚集索引、覆盖查询这些基本的概念有一个充分的认识,我想了解这些东西最好的方式还是去看微软的官方文档,有了这些基本的认识之后你才能够对整篇文章有一个更加充分的认识,同时也能够加深你对这些基础概念的理解,否则下次再碰到这些可以使用覆盖查询的场景时还是不清楚到底该如何使用。
另外你也可以看一下这篇博客,这个也能在一定程度上加深你对本篇文章的理解程度。
情景再现
我们先来看一下在我们的项目中的一个统计报表中按照一个维度的查询的EFCore代码(这段代码个人认为在很大程度上有很大的优化空间),然后具体交代里面的重点内容,这里面的input是前端界面传入的查询条件,这里的pageRequest是前端传入的分页请求,在读下面这些代码之前我们对立面主要的一些实体做一些描述:RepairContract(维修合同)-->RepairContractWorkItem(维修项目)-->RepairContractMaterial(维修材料),这个里面一个维修合同包含多个维修项目,而一个维修项目下面又包含多个维修材料,所以这三者之前的关系是1:N: N,这里的_repairContractRepository的定义如下 private readonly IRepository<RepairContract, Guid> _repairContractRepository,这个是使用ABP框架默认的实体注入的方式,这里我们使用_repairContractWorkItemRepository.GetAll().Include(rw => rw.Materials)这样的一个语句就能够查询出对应的维修项目并且带出该维修项目下面的所有维修材料,这个就是能够在一定程度上简化代码的方式。
/// <summary> /// 按维修类别 /// </summary> /// <param name="input">查询输入</param> /// <param name="pageRequest">分页请求</param> /// <returns></returns> public PageWithTotal<GetRepairStatisticByRepairTypeOutput, GetRepairStatisticByRepairTypeOutput> GetRepairStatisticByRepairType(GetRepairStatisticInput input, PageRequest pageRequest) { var repairContracts = GetRepairContract(input); var outputQuery = (from rc in repairContracts join rw in _repairContractWorkItemRepository.GetAll().Include(rw => rw.Materials) on rc.Id equals rw.RepairContractId select new { RepairContractId = rc.Id, rw.RepairType, RepairContractWorkItemId = rw.Id, Amount = (rw.LaborFeeAfter ?? 0) + (rw.Materials.Any() ? rw.Materials.Sum(m => m.MaterialFeeAfter ?? 0) : 0), MaterialCost = (decimal?)(!_partInDetailRepository.GetAll().Where(p => rw.Materials.Any(m => p.SourceBillId == m.Id)).Any() ? _partOutDetailRepository.GetAll().Where(p => rw.Materials.Any(m => p.SourceBillId == m.Id)).Sum(de => de.CostPrice * de.OutQuantity) : _partOutDetailRepository.GetAll().Where(p => rw.Materials.Any(m => p.SourceBillId == m.Id)).Sum(de => de.CostPrice * de.OutQuantity) - _partInDetailRepository.GetAll().Where(p => rw.Materials.Any(m => p.SourceBillId == m.Id)).Sum(de => de.CostPrice * de.InQuantity)) }).ToList(); var output = outputQuery.GroupBy(o => o.RepairType).Select(g => new GetRepairStatisticByRepairTypeOutput { RepairType = g.Key, Amount = g.Select(de => new { de.RepairContractId, de.RepairContractWorkItemId, de.Amount }).Distinct().Sum(de => de.Amount), OrderQuantity = g.Select(de => de.RepairContractId).Distinct().Count(), AverageRepairFee = !g.Select(de => de.RepairContractId).Distinct().Any() ? 0 : Math.Round( g.Select(de => new { de.RepairContractId, de.RepairContractWorkItemId, de.Amount }).Distinct().Sum(de => de.Amount) / g.Select(de => de.RepairContractId).Distinct().Count(), 2), MaterialCost = g.Select(de => new { de.RepairContractId, de.RepairContractWorkItemId, MaterialCost = de.MaterialCost ?? 0 }).Distinct() .Sum(de => de.MaterialCost) }).ToList(); var totalCount = output.Count; var total = new GetRepairStatisticByRepairTypeOutput { Amount = output.Sum(o => o.Amount), AverageRepairFee = output.Sum(o => o.OrderQuantity) == 0 ? 0 : Math.Round(output.Sum(o => o.Amount) / output.Sum(o => o.OrderQuantity), 2), OrderQuantity = output.Sum(o => o.OrderQuantity), MaterialCost = output.Sum(o => o.MaterialCost) }; var outputPage = output.AsQueryable().PageAndOrderBy(pageRequest).ToList(); return new PageWithTotal<GetRepairStatisticByRepairTypeOutput, GetRepairStatisticByRepairTypeOutput>(pageRequest, totalCount, outputPage, total); }
这里我们贴出上面的代码中的子查询GetRepairContract(input),这里的input是前端输入的查询条件的一个Dto输入,我们也来看看这段代码
private IQueryable<RepairContract> GetRepairContract(GetRepairStatisticInput input) { var repairContracts = _repairContractRepository.GetAll().Where(r => r.DealerId == SdtSession.TenantId && r.Status != RepairContractStatus.作废) .WhereIf(!string.IsNullOrWhiteSpace(input.ServiceAdvisorName), r => r.ServiceAdvisorName.Contains(input.ServiceAdvisorName)) .WhereIf(!string.IsNullOrWhiteSpace(input.LicensePlate), r => r.LicensePlate.Contains(input.LicensePlate)) .WhereIf(!string.IsNullOrWhiteSpace(input.CustomerName), r => r.CustomerName.Contains(input.CustomerName)) .WhereIf(!string.IsNullOrWhiteSpace(input.CreatorName), r => r.CreatorName.Contains(input.CreatorName)) .WhereIf(!string.IsNullOrWhiteSpace(input.ProductCategoryCode), r => r.ProductCategoryCode.Contains(input.ProductCategoryCode)) .WhereIf(!string.IsNullOrWhiteSpace(input.ProductCode), r => r.ProductCode.Contains(input.ProductCode)) .WhereIf(input.RepairType.Any(), r => _repairContractWorkItemRepository.GetAll().Any(rc => rc.RepairContractId == r.Id && input.RepairType.Contains(rc.RepairType))) .WhereIf(input.Status.Any(), r => input.Status.Contains(r.Status)) .WhereIf(input.BeginArriveTime.HasValue, r => input.BeginArriveTime <= r.CreateTime) .WhereIf(input.EndArriveTime.HasValue, r => r.CreateTime <= input.EndArriveTime) .WhereIf(input.BeginFinishTime.HasValue, r => input.BeginFinishTime <= r.FinishTime) .WhereIf(input.EndFinishTime.HasValue, r => r.FinishTime <= input.EndFinishTime) .WhereIf(input.BeginSettlementTime.HasValue, r => input.BeginSettlementTime <= r.SettlementTime) .WhereIf(input.EndSettlementTime.HasValue, r => r.SettlementTime <= input.EndSettlementTime); return repairContracts; }
这些都是一些常见的EFCore查询的语句,这里我们重点来看看我们的EFCore语句最终生成的SQL语句到底是怎么样的?
SELECT [r].[Id] AS [RepairContractId], [rw].[RepairType], [rw].[Id] AS [RepairContractWorkItemId], COALESCE([rw].[LaborFeeAfter], 0.0) + CASE WHEN ( SELECT CASE WHEN EXISTS ( SELECT 1 FROM [RepairContractMaterial] AS [r0] WHERE [rw].[Id] = [r0].[RepairContractWorkItemId]) THEN CAST(1 AS BIT) ELSE CAST(0 AS BIT) END ) = 1 THEN ( SELECT SUM(COALESCE([m].[MaterialFeeAfter], 0.0)) FROM [RepairContractMaterial] AS [m] WHERE [rw].[Id] = [m].[RepairContractWorkItemId] ) ELSE 0.0 END AS [Amount], CASE WHEN NOT (( SELECT CASE WHEN EXISTS ( SELECT 1 FROM [PartInDetail] AS [p] WHERE EXISTS ( SELECT 1 FROM [RepairContractMaterial] AS [m0] WHERE ([p].[SourceBillId] = [m0].[Id]) AND ([rw].[Id] = [m0].[RepairContractWorkItemId]))) THEN CAST(1 AS BIT) ELSE CAST(0 AS BIT) END ) = 1) THEN ( SELECT SUM([p0].[CostPrice] * [p0].[OutQuantity]) FROM [PartOutDetail] AS [p0] WHERE EXISTS ( SELECT 1 FROM [RepairContractMaterial] AS [m1] WHERE ([p0].[SourceBillId] = [m1].[Id]) AND ([rw].[Id] = [m1].[RepairContractWorkItemId])) ) ELSE ( SELECT SUM([p1].[CostPrice] * [p1].[OutQuantity]) FROM [PartOutDetail] AS [p1] WHERE EXISTS ( SELECT 1 FROM [RepairContractMaterial] AS [m2] WHERE ([p1].[SourceBillId] = [m2].[Id]) AND ([rw].[Id] = [m2].[RepairContractWorkItemId])) ) - ( SELECT SUM([p2].[CostPrice] * [p2].[InQuantity]) FROM [PartInDetail] AS [p2] WHERE EXISTS ( SELECT 1 FROM [RepairContractMaterial] AS [m3] WHERE ([p2].[SourceBillId] = [m3].[Id]) AND ([rw].[Id] = [m3].[RepairContractWorkItemId])) ) END AS [MaterialCost] FROM [RepairContract] AS [r] INNER JOIN [RepairContractWorkItem] AS [rw] ON [r].[Id] = [rw].[RepairContractId] WHERE (((([r].[DealerId] = '457ef8b6-c935-4eac-18f2-08d70733bcf8') AND ([r].[Status] <> -1)) AND [r].[Status] IN (1, 2, 3, 4, 5, 6)) AND ('2020-04-01T00:00:00' <= [r].[CreateTime])) AND ([r].[CreateTime] <= '2020-04-30T23:59:59')
这里需要注意,我们这里生成的SQL都是调用ToList()方法之前的部分,使用了ToList方法意味着将IQueryable类型的对象最终转换为IEnumerable对象,所以后面代码中的GroupBy其实都是在内存中进行的。在这段SQL中我们为了表达今天这篇博客的主题,我们重点来看下面这段获取Amount的方式。
CASE WHEN ( SELECT CASE WHEN EXISTS ( SELECT 1 FROM [RepairContractMaterial] AS [r0] WHERE [rw].[Id] = [r0].[RepairContractWorkItemId]) THEN CAST(1 AS BIT) ELSE CAST(0 AS BIT) END ) = 1 THEN ( SELECT SUM(COALESCE([m].[MaterialFeeAfter], 0.0)) FROM [RepairContractMaterial] AS [m] WHERE [rw].[Id] = [m].[RepairContractWorkItemId] ) ELSE 0.0 END AS [Amount]
这段SQL需要根据当前维修项目获取对应的维修材料,并将维修材料里面的MaterialFeeAfter(实收材料费)进行汇总,由于在我们的数据库中维修项目和维修材料记录非常多,数据量非常大所以用维修项目去关联维修项目下面的材料时候由于在维修材料中有对应的维修项目外键 public Guid RepairContractWorkItemId { get; set; },所以在维修材料中是存在维修项目的索引的,按照道理这个查询应该不会慢,但是我们最后发现在某些时候这个查询还是会超时的,原因是这两张表的数据是在是太多了,怎么办?怎么优化查询效率?
解决方法
最后我们发现在维修材料上面建立的这个和维修项目上面的索引是一个非聚集索引IX_RepairContractMaterial_RepairContractWorkItemId,即使通过这个项目找到了最终的维修材料还是需要通过维修材料的Id还是要到最终的物理位置找到这条记录然后取出里面的实收材料费的,这个过程如果熟悉MySQL数据库的话,这个过程就是称之为回表,其实SQL SERVER也是这样,所以由于维修项目下面的维修材料是非常多的,所以这个通过非聚集索引找最终的记录,然后取这条记录里面的MaterialFeeAfter(实收材料费)实际上是一个非常耗时的过程。分析到了这里我们想到了能不能够省略掉第二个步骤,我们想到了SQL SERVER数据库中的覆盖查询的理念,由于我们这边使用到的维修材料里面的字段确实不多,我们完全可以把这个字段也包含在非聚集索引里面,从而优化查询效率,就这样干,我们来看看在EFCore中该怎样进行处理。
在当前的DbContext下面的OnModelCreating方法下面,我们加了下面的一段内容
modelBuilder.Entity<RepairContractMaterial>(rwi => { rwi.HasIndex(r => r.RepairContractWorkItemId) .ForSqlServerInclude(nameof(RepairContractMaterial.MaterialFeeAfter)); });
这里我们用到了ForSqlServerInclude方法将字段MaterialFeeAfter包含进非聚集索引里面,后面我们在Add-Migration 看看生成的SQL到底长什么样?
CREATE INDEX [IX_RepairContractMaterial_RepairContractWorkItemId] ON [RepairContractMaterial] ([RepairContractWorkItemId]) INCLUDE ([MaterialFeeAfter]);
我们发现这里和我们常见的创建索引的SQL相比,后面多了一个Include参数,这里经过上面的过程,整个优化过程完毕,最终达到了我们想要的优化效果。