• 使用EntityFramework访问数据时的一些效率问题


    一、准备工作

      1、数据库模型:

      如你所见,EF模型是上图中三个表,第四个则是数据库视图。

      2、数据:

      先在HeadAddress表中插入三条数据,再在EndAddress表中也插入三条数据,最后往Customer表中插入三万条随机数据作为测试数据。

    二、效率比较

      1、视图 vs 跨表:遍历所有用户信息(HeadAddress、EndAddress、Customer中的字段)

     1                  // 视图(ToList)
     2                  var temp = _DataContext.CustomerView;
     3                  foreach (var item in temp) ;
     4                  // 跨表(ToList)
     5                  var temp = _DataContext.CustomerSet.Select(c => new
     6                  {
     7                      Name = c.Name,
     8                      Sex = c.Sex,
     9                      Street = c.EndAddress.Street,
    10                      Number = c.EndAddress.Number,
    11                      Province = c.EndAddress.HeadAddress.Province,
    12                      City = c.EndAddress.HeadAddress.City,
    13                      County = c.EndAddress.HeadAddress.County
    14                  });
    15                  foreach (var item in temp) ;

      对应的SQL:

    1 SELECT 
    2 [Extent1].[Name] AS [Name], 
    3 [Extent1].[Sex] AS [Sex], 
    4 [Extent1].[Province] AS [Province], 
    5 [Extent1].[City] AS [City], 
    6 [Extent1].[County] AS [County], 
    7 [Extent1].[Street] AS [Street], 
    8 [Extent1].[Number] AS [Number]
    9 FROM [dbo].[CustomerView] AS [Extent1]
     1 SELECT 
     2 [Extent1].[EndAddressId] AS [EndAddressId], 
     3 [Extent1].[Name] AS [Name], 
     4 [Extent1].[Sex] AS [Sex], 
     5 [Extent2].[Street] AS [Street], 
     6 [Extent2].[Number] AS [Number], 
     7 [Extent3].[Province] AS [Province], 
     8 [Extent3].[City] AS [City], 
     9 [Extent3].[County] AS [County]
    10 FROM   [dbo].[CustomerSet] AS [Extent1]
    11 INNER JOIN [dbo].[EndAddressSet] AS [Extent2] ON [Extent1].[EndAddressId] = [Extent2].[Id]
    12 INNER JOIN [dbo].[HeadAddressSet] AS [Extent3] ON [Extent2].[HeadAddressId] = [Extent3].[Id]

      结果:

      

      在接下来的所有统计中,我都没有把第1次(即上图中的0次)的时间算在平均时间内(因为EF第一次访问有初始时间)。可见使用视图做遍历效果并没有提升,但是当我把测试代码改为:

     1                  // 视图(ToList)
     2                  var temp = _DataContext.CustomerView.ToList();
     3                  foreach (var item in temp) ;
     4                  // 跨表(ToList)
     5                  var temp = _DataContext.CustomerSet.Select(c => new
     6                  {
     7                      Name = c.Name,
     8                      Sex = c.Sex,
     9                      Street = c.EndAddress.Street,
    10                      Number = c.EndAddress.Number,
    11                      Province = c.EndAddress.HeadAddress.Province,
    12                      City = c.EndAddress.HeadAddress.City,
    13                      County = c.EndAddress.HeadAddress.County
    14                  }).ToList();

      时,我发现效率发生了明显改变:

      我们看到,视图ToList所用时间与上次相比几乎一致,甚至还有缩短,而使用跨表查找然后ToList耗时大大增加。至于原因,我认为可能是视图的ToList结果在数据结构内部为我们节省了非常多的工作。

      2、视图 vs 跨表:遍历省份是“湖北”的用户信息(HeadAddress、EndAddress、Customer中的字段)  

     1                 // 视图
     2                 var temp = _DataContext.CustomerView.Where(c => c.Province == "湖北");
     3                 foreach (var item in temp) ;
     4                 // 跨表
     5                 var temp = _DataContext.CustomerSet.Where(c => c.EndAddress.HeadAddress.Province == "湖北")
     6                     .Select(c => new
     7                     {
     8                         Name = c.Name,
     9                         Sex = c.Sex,
    10                         Street = c.EndAddress.Street,
    11                         Number = c.EndAddress.Number,
    12                         Province = c.EndAddress.HeadAddress.Province,
    13                         City = c.EndAddress.HeadAddress.City,
    14                         County = c.EndAddress.HeadAddress.County
    15                     });
    16                 foreach (var item in temp) ;

      对应的SQL与上面的非常相似,就是在最后多了一个Where语句,结果:

      我们发现两者时间消耗基本一致,同样如果改为使用ToList的话,使用视图会比跨表查询快5ms左右。

      3、Foreach vs Linq:测试把所有不为性别空的数据输出为List<T>,source是用户信息集

    1                 var source = _DataContext.CustomerSet;
    2                 //  foreach
    3                 List<Customer> temp = new List<Customer>();
    4                 foreach (var item in source)
    5                     if (item.Sex != null)
    6                         temp.Add(item);
    7                 // Linq
    8                 source.Where(c => c.Sex != null).ToList();

      它们执行的SQL语句:

    1 SELECT 
    2 [Extent1].[Id] AS [Id], 
    3 [Extent1].[Name] AS [Name], 
    4 [Extent1].[Sex] AS [Sex], 
    5 [Extent1].[EndAddressId] AS [EndAddressId]
    6 FROM [dbo].[CustomerSet] AS [Extent1]
    1 SELECT 
    2 [Extent1].[Id] AS [Id], 
    3 [Extent1].[Name] AS [Name], 
    4 [Extent1].[Sex] AS [Sex], 
    5 [Extent1].[EndAddressId] AS [EndAddressId]
    6 FROM [dbo].[CustomerSet] AS [Extent1]
    7 WHERE [Extent1].[Sex] IS NOT NULL

      使用Foreach的时候是全部查询出来,然后进行筛选,而使用Linq的Where则是先筛选了Sex Not Null的数据,再组成List。我想大家都能猜到结果了,没错,Linq大大领先传统的foreach:

      这也验证了一点,ToList()效率确实非常地高!

      4、SelectMany vs Select New:使用SelectMany和new生成两次分组的数据,source是所有的用户信息,生成的分组数据是先按姓别分组,再按省份分组的数据集,要求保存两次分组的依据

     1                 var source = _DataContext.CustomerView;
     2                 // SelectMany
     3                 var result = source
     4                     .GroupBy(c => c.Sex)
     5                     .SelectMany(c => c
     6                         .GroupBy(a => a.Province)
     7                         .GroupBy(a => c.Key));
     8                 foreach (var items in result)
     9                     foreach (var item in items)
    10                         ;
    11                 // Select New
    12                 var result = source
    13                     .GroupBy(c => c.Sex)
    14                     .Select(c => new {
    15                         Key = c.Key,
    16                         Value = c.GroupBy(a => a.Province)
    17                     });
    18                 foreach (var items in result)
    19                     foreach (var item in items)
    20                         ;

      使用SelectMany得到的结果数据类型:

      使用Select New得到的结果数据类型:

      SelectMany执行的SQL:

     1 SELECT 
     2 [Project6].[Sex] AS [Sex], 
     3 [Project6].[C2] AS [C1], 
     4 [Project6].[C1] AS [C2], 
     5 [Project6].[C4] AS [C3], 
     6 [Project6].[Province] AS [Province], 
     7 [Project6].[C3] AS [C4], 
     8 [Project6].[Name] AS [Name], 
     9 [Project6].[Sex1] AS [Sex1], 
    10 [Project6].[Province1] AS [Province1], 
    11 [Project6].[City] AS [City], 
    12 [Project6].[County] AS [County], 
    13 [Project6].[Street] AS [Street], 
    14 [Project6].[Number] AS [Number]
    15 FROM ( SELECT 
    16     [Project4].[C1] AS [C1], 
    17     [Project4].[Sex] AS [Sex], 
    18     [Project4].[C2] AS [C2], 
    19     [Filter3].[Province1] AS [Province], 
    20     [Filter3].[Name] AS [Name], 
    21     [Filter3].[Sex] AS [Sex1], 
    22     [Filter3].[Province2] AS [Province1], 
    23     [Filter3].[City] AS [City], 
    24     [Filter3].[County] AS [County], 
    25     [Filter3].[Street] AS [Street], 
    26     [Filter3].[Number] AS [Number], 
    27     CASE WHEN ([Filter3].[Province1] IS NULL) THEN CAST(NULL AS int) WHEN ([Filter3].[Name] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C3], 
    28     CASE WHEN ([Filter3].[Province1] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C4]
    29     FROM   (SELECT 
    30         [Distinct3].[C1] AS [C1], 
    31         [Distinct1].[Sex] AS [Sex], 
    32         1 AS [C2]
    33         FROM   (SELECT DISTINCT 
    34             [Extent1].[Sex] AS [Sex]
    35             FROM [dbo].[CustomerView] AS [Extent1] ) AS [Distinct1]
    36         CROSS APPLY  (SELECT DISTINCT 
    37             [Distinct1].[Sex] AS [C1]
    38             FROM ( SELECT DISTINCT 
    39                 [Extent2].[Province] AS [Province]
    40                 FROM [dbo].[CustomerView] AS [Extent2]
    41                 WHERE ([Distinct1].[Sex] = [Extent2].[Sex]) OR (([Distinct1].[Sex] IS NULL) AND ([Extent2].[Sex] IS NULL))
    42             )  AS [Distinct2] ) AS [Distinct3] ) AS [Project4]
    43     OUTER APPLY  (SELECT [Distinct4].[Province] AS [Province1], [Extent4].[Name] AS [Name], [Extent4].[Sex] AS [Sex], [Extent4].[Province] AS [Province2], [Extent4].[City] AS [City], [Extent4].[County] AS [County], [Extent4].[Street] AS [Street], [Extent4].[Number] AS [Number]
    44         FROM   (SELECT DISTINCT 
    45             [Extent3].[Province] AS [Province]
    46             FROM [dbo].[CustomerView] AS [Extent3]
    47             WHERE ([Project4].[Sex] = [Extent3].[Sex]) OR (([Project4].[Sex] IS NULL) AND ([Extent3].[Sex] IS NULL)) ) AS [Distinct4]
    48         LEFT OUTER JOIN [dbo].[CustomerView] AS [Extent4] ON (([Project4].[Sex] = [Extent4].[Sex]) OR (([Project4].[Sex] IS NULL) AND ([Extent4].[Sex] IS NULL))) AND ([Distinct4].[Province] = [Extent4].[Province])
    49         WHERE ([Project4].[C1] = [Project4].[Sex]) OR (([Project4].[C1] IS NULL) AND ([Project4].[Sex] IS NULL)) ) AS [Filter3]
    50 )  AS [Project6]
    51 ORDER BY [Project6].[Sex] ASC, [Project6].[C1] ASC, [Project6].[C4] ASC, [Project6].[Province] ASC, [Project6].[C3] ASC

      Select New执行的SQL:

     1 SELECT 
     2 [Project4].[C1] AS [C1], 
     3 [Project4].[Sex] AS [Sex], 
     4 [Project4].[C3] AS [C2], 
     5 [Project4].[Province] AS [Province], 
     6 [Project4].[C2] AS [C3], 
     7 [Project4].[Name] AS [Name], 
     8 [Project4].[Sex1] AS [Sex1], 
     9 [Project4].[Province1] AS [Province1], 
    10 [Project4].[City] AS [City], 
    11 [Project4].[County] AS [County], 
    12 [Project4].[Street] AS [Street], 
    13 [Project4].[Number] AS [Number]
    14 FROM ( SELECT 
    15     [Project2].[Sex] AS [Sex], 
    16     [Project2].[C1] AS [C1], 
    17     [Join1].[Province1] AS [Province], 
    18     [Join1].[Name] AS [Name], 
    19     [Join1].[Sex] AS [Sex1], 
    20     [Join1].[Province2] AS [Province1], 
    21     [Join1].[City] AS [City], 
    22     [Join1].[County] AS [County], 
    23     [Join1].[Street] AS [Street], 
    24     [Join1].[Number] AS [Number], 
    25     CASE WHEN ([Join1].[Province1] IS NULL) THEN CAST(NULL AS int) WHEN ([Join1].[Name] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2], 
    26     CASE WHEN ([Join1].[Province1] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C3]
    27     FROM   (SELECT 
    28         [Distinct1].[Sex] AS [Sex], 
    29         1 AS [C1]
    30         FROM ( SELECT DISTINCT 
    31             [Extent1].[Sex] AS [Sex]
    32             FROM [dbo].[CustomerView] AS [Extent1]
    33         )  AS [Distinct1] ) AS [Project2]
    34     OUTER APPLY  (SELECT [Distinct2].[Province] AS [Province1], [Extent3].[Name] AS [Name], [Extent3].[Sex] AS [Sex], [Extent3].[Province] AS [Province2], [Extent3].[City] AS [City], [Extent3].[County] AS [County], [Extent3].[Street] AS [Street], [Extent3].[Number] AS [Number]
    35         FROM   (SELECT DISTINCT 
    36             [Extent2].[Province] AS [Province]
    37             FROM [dbo].[CustomerView] AS [Extent2]
    38             WHERE ([Project2].[Sex] = [Extent2].[Sex]) OR (([Project2].[Sex] IS NULL) AND ([Extent2].[Sex] IS NULL)) ) AS [Distinct2]
    39         LEFT OUTER JOIN [dbo].[CustomerView] AS [Extent3] ON (([Project2].[Sex] = [Extent3].[Sex]) OR (([Project2].[Sex] IS NULL) AND ([Extent3].[Sex] IS NULL))) AND ([Distinct2].[Province] = [Extent3].[Province]) ) AS [Join1]
    40 )  AS [Project4]
    41 ORDER BY [Project4].[Sex] ASC, [Project4].[C3] ASC, [Project4].[Province] ASC, [Project4].[C2] ASC

      上面两种方法,都保留了每层的分组依据(性别、省份)的值,语法上来看,可能SelectMany更紧凑,Select New更为清晰,不过效率上由于SelectMany的投影操作,所以耗时会更多一些:

      不过我们也可以看到,3W条数据,时间差别也并不是很大。如果我们只需要保存最后一层分组依据(省份)的值,把测试代码改为:

    1                // 只保留省份分组Key
    2                var result = source
    3                     .GroupBy(c => c.Sex)
    4                     .Select(c => c
    5                         .GroupBy(a => a.Province));
    6                 foreach (var items in result)
    7                     foreach (var item in items)
    8                         ;

      这样,消耗的时间平均会在309.4ms左右,但就不知道哪个组是哪个性别了:

      4、SelectMany vs Double Foreach:测试使用SelectMany和双重循环来遍历两次分组后的数据,并统计生成List<T>,source是:先按姓别分组再按省份分组的数据集,List<T>是:各姓别在各省份的人数,T:Sex,Province,Count。

     1 // 临时数据结果类
     2 class TempDTO { public bool? Sex; public string Province; public int Count;}
     3                // 数据源
     4                 var source = _DataContext.CustomerView.GroupBy(c => c.Sex)
     5                              .Select(c => new
     6                              {
     7                                  Key = c.Key,
     8                                  Value = c.GroupBy(b => b.Province)
     9                              });
    10                 // SelectMany
    11                 var temp = source.SelectMany(c => c.Value.Select(b => new TempDTO()
    12                     {
    13                         Sex = c.Key,
    14                         Province = b.Key,
    15                         Count = b.Count()
    16                     })).ToList();
    17                 // 双得Foreach
    18                 List<TempDTO> temp = new List<TempDTO>();
    19                 foreach (var items in source)
    20                 {
    21                     bool? sex = items.Key;
    22                     foreach (var item in items.Value)
    23                     {
    24                         temp.Add(new TempDTO()
    25                         {
    26                             Sex = sex,
    27                             Province = item.Key,
    28                             Count = item.Count()
    29                         });
    30                     }
    31                 }

      结果:

      产生这么悬殊的结果也出乎我的意料,起初我认为是因为SelectMany中ToList的原因,但是后来更改了测试方法并没有改变这一现象,而且上面两种方法得到的结果也是完全一致的。于是我想可能是由于Linq的延时查询技术在起作用。因为source返回的结果类型是IQuerable<T>,它并没有真实地查询,在使用SelectMany时会对生成的SQL语句一起进行优化,而Foreach则是先把source中的每个结果都算了出来,再一个一个地填。验证的方法很简单,把source添加一个ToList()就行了:

    1    var source = _DataContext.CustomerView.GroupBy(c => c.Sex)
    2                 .Select(c => new
    3                 {
    4                     Key = c.Key,
    5                     Value = c.GroupBy(b => b.Province)
    6                 }).ToList();

      所得测试结果非常小,都在1ms左右,多次测试难以认定哪种方法更好。于是我增加了50次循环量,所得结果:

      可见SelectMany和双重Foreach在执行效率上实际上是一致的,当然前提说数据源是已经计算好的。

    三、总结

      写了一段时间的数据库,经常会被这些问题困扰,担心这担心那,于是便有此文,总的来说,这些方法都差不多,只是在不同的应用环境下(关键是:是否要把数据ToList保存起来)有不同的结果。

    转载请注明原址:http://www.cnblogs.com/lekko/archive/2013/01/03/2843080.html 

  • 相关阅读:
    第三十五篇 os模块、sys模块、json模块、pickle模块
    第三十三篇 包
    <词云图>疾风剑豪-亚索词云图
    <爬虫>常见网址的爬虫整理
    <爬虫>反反爬虫的各种知识
    <爬虫>崔庆才的爬虫课
    <随便写>番茄工作法笔记
    <就业指导>为了找到更好的工作
    <人事面试>人事面试整理
    <面试题>面试题整理(101-200)
  • 原文地址:https://www.cnblogs.com/lekko/p/2843080.html
Copyright © 2020-2023  润新知