Entity Framework Core performance tuning – a worked example
This is the first in a series of articles on performance tuning EF Core. In this article I take an example book selling site that I have built for my book and see how fast I can get it to perform. I have got some really excellent performance out of Entity Framework Core (EF Core) and I wanted to share it with you all.
- Entity Framework Core performance tuning – a worked example (this article).
- EF Core – Combining SQL and NoSQL for better performance.
- NEW IN 2019 – “Building a robust CQRS database with EF Core and Cosmos DB“
Note: This article is from chapter 13 of my book, and gives more information on the implementation (the chapter is 33 pages long). Manning Publications have offered the coupon to get 37% off my book, “Entity Framework in Action” – use fccefcoresmith.
Executive summary for those in a hurry
I build an example book selling web site using EF Core version 2.0.0 and filled it with 100,000 books and ½ million reviews. I then went through four staged of performance tuning. The figure below shows you the results through these stages for two different, but demanding areas of performance:
- Displaying 100 books out of 100,000 possible books.
- Displaying 100 books out of 100,000 possible books, sorted by average votes
(timings are how long the web request took to finish, as measured by Chrome on my development PC).
The four stages on the graph are:
- Part 1: Working with EF Core built queries
- I get my LINQ code in a form that EF Core can turn it into good SQL
- I performance tune the existing EF Core query by adding a DbFunction.
- Part 2: I build a SQL query by copying EF Core’s SQL and ‘tweaked’ one bit to get some extra performance
- Part 3: I added cached values to the database that held hard-to-calculate values, such as the average votes for a book.
You can try the final, Part 3, “tuned up” example book selling site at http://efcoreinaction.com/.
Note: The site www.efcoreinaction.com isn’t running on some massively powerful cloud system, but on shared hosting site that costs £3.50/month. (see Web Wiz, who are a great hosting site for .NET. They have always been early supporters of new .NET features.)
Setting the scene – my example application
Right from the second chapter of my book I introduce an example book selling site, and as the book progresses I add more feature to the application. Admittedly the site only has a few features, but I purposely included some sorting and filtering features that can be a challenge when the site has a lot of books to show. The diagram below shows you the features of the site.
What makes the book list a challenge?
I tried to create an example application which was at least credible, and included some complicated queries. The diagram below shows you the listing of a book, and some information on how easy or complex it is to get that data from the database.
On top of that the sorting, filtering and paging commands adds another level of complexity. I came up with four sort and filter tests that got progressively difficult, each done with page size of 10 and 100 to see how much showing more books effected performance. Here are the four tests.
- Easy – just show the page, with default everything.
- Moderate – sort the books by their publication date.
- Hard – sort by average votes, which is calculated from all the reviews linked to a book.
- Double hard – filter by average votes and the sort by price, both of which are calculated values.
Note: I did all my testing on my development PC using the ASP.NET Core example book selling site running in debug mode. The timings are from Chrome, and are the total time it takes for the page to return (including downloading the content). For comparison, accessing the About page, which has no database accesses in it, typically takes 11 ms to load, so the database accesses are always going to be slower than that.
My four steps in performance tuning
I went through four iterations of tuning of the application, and I will cover each one in this article. The stages are:
- Tuning up my EF Core query
- Making sure EF Core could translate all parts of my LINQ query to SQL
- Tweaking my query by adding a user defined function
- Going SQL – moving the query to Dapper
- Modifying the database – adding cached values (this is running on efcoreinaction.com)
Part 1a – Getting my query to translate into SQL
I started this performance tuning process by looking at what EF Core can do with well written code. Throughout my book, I have recommended the LINQ Select query, as this creates the most efficient database access. The Select query works well for the book display as it needs to get data from other tables, this the Reviews, and it doesn’t need some of the columns in the Books table, such as the book’s Description column, which could be very long. Here is my Select query object.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
|
public static IQueryable<BookListDto> MapBookToDto( this IQueryable<Book> books) { return books.Select(p => new BookListDto { BookId = p.BookId, Title = p.Title, Price = p.Price, PublishedOn = p.PublishedOn, ActualPrice = p.Promotion == null ? p.Price : p.Promotion.NewPrice, PromotionPromotionalText = p.Promotion == null ? null : p.Promotion.PromotionalText, AuthorNamesOrdered = p.AuthorsLink .OrderBy(q => q.Order) .Select(q => q.Author.Name), ReviewsCount = p.Reviews.Count, ReviewsAverageVotes = p.Reviews.Select(y => ( double ?)y.NumStars).Average() }); } |
Note the calculation at the end of the MapBookToDto code, which calculates the average votes from all the reviews linked to this book. I took me a bit of time to get the LINQ command in the right format – finally solved it by asking a question of the EF Core issues page. But the result is that average is done in the database using the SQL command AVG. This was critical in getting the performance I needed, not only in calculating the value to so in each book info, but also to make the “sort by average votes” as quick as it was.
Here are the results for 100,000 books (same data as on the www.efcoreinaction.com site).
Test | 10 books | 100 books | Comment |
All books | 45 ms | 230 ms | Sorts on primary key |
Sort by publication dates | 45 ms | 230 ms | Sorts on PublishedOn, with index |
Sort by votes | 500 ms | 650 ms | Must average ½ million reviews! |
Sort by price, filter by 4+ votes | 450 ms | 600 ms | Less books, so quicker |
The “sort on votes” performance, at 500ms sounds slow, but it was a lot quicker than I expected. EF Core produced the ideal SQL, using the SQL AVG command, but it had to be called 100,000 times, and had a total of over ½ million Reviews to process. For some applications, this level of performance would be acceptable, but for this example I deem that this is too slow, and I will be looking again at the “sort on votes” in Part 2 and Part 3.
The one item that did seem very slow was the display of 100 books. I analysed that and, because the Authors names came back as a collection, EF Core ran a separate query for each book – so, instead of 2 database access it had 12 database accesses for 10 books, and 102 database accesses for the 100 books.
I know that every separate database access costs, so I wanted to remove those extra database accesses. And because I knew something that EF Core didn’t know, then I could fix it, as you will see in the next Part.
Part 1b – tweak the query with the DbFunction feature
So, in my first performance tweak, I wanted to see if I keep all the good SQL that EF Core has produced, but inject a bit of my own SQL to improve the performance of the AuthorNamesOrdered part. What I knew was I only wanted the names to concatenate them into a comma-delimited string, and I could do that in SQL very easily.
EF Core 2.0.0, released in August 2017, has a great new feature contribution by Paul Middleton. This allows you to register a SQL user defined function (UDF) with EF Core and then use it in the query. This is a really useful tool, and I expect to be using DbFunctions a lot in the future.
So, with help from Stack Overflow I produced a UDF, as shown below.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
|
CREATE FUNCTION AuthorsStringUdf (@bookId int ) RETURNS NVARCHAR(4000) AS BEGIN -- Thanks to https://stackoverflow.com/a/194887/1434764 DECLARE @Names AS NVARCHAR(4000) SELECT @Names = COALESCE (@Names + ', ' , '' ) + a. Name FROM Authors AS a, Books AS b, BookAuthor AS ba WHERE ba.BookId = @bookId AND ba.AuthorId = a.AuthorId AND ba.BookId = b.BookId ORDER BY ba.[ Order ] RETURN @Names END GO |
I could then replace the extraction of a list of author’s names with a call to my UDF, as shown below.
1
2
3
4
5
6
7
8
9
10
|
public static IQueryable<BookListDto> MapBookToDto( this IQueryable<Book> books) { return books.Select(p => new BookListDto { //… other code removed for clarity AuthorsOrdered = UdfDefinitions.AuthorsStringUdf(p.BookId), //… etc. }); } |
The UDF returns a single string, so there is no need for extra database accessed. After this change, there were only two databases accessed, a count for paging and then the real query, no matter how many books in a page.
Note: I have not covered how to register the UDF with EF Core, nor the best way to add the UDF to the EF Core created database. There just isn’t room in this article. There is some information on this in the this GitHub documentation. I do have more information on DbFunctions in chapter 8 of my book, and how to add UDFs and other SQL code to an EF Core database in chapter 11.
Before I show you the improved performance figures let me first show you the new SQL that EF Core produces. EF Core now produces very readable SQL, almost like how a human would lay it out (I did a couple of minor changes to the layout to make the diagram work, but is mostly as EF Core created it).
The only change in the SQL from the Part 1a version is the call to the UDF, but the effect on performance is great, especially on the 100-book display, is good. See the figures below
Here are the results for using the 100,000 book test data, with the ‘Improvement’ columns showing how much faster the Part 1b implementation is compared to Part 1a’s performance.
Test | 10 books | Improvement | 100 books | Improvement |
All books | 30 ms | 150% | 80 ms | 250% |
Sort by publication dates | 30 ms | 150% | 80 ms | 250% |
Sort by votes | 480 ms | minor | 530 ms | minor |
Sort by price, filter by 4+ votes | 420 ms | minor | 470 ms | minor |
Part 2 – converting the query to hand-written SQL
When I studied the SQL from part 1b, I noticed that EF Core repeated any calculation in the SQL ORDER BY section, rather than referring to the SELECT’s alias name of the calculated value. I did some tests and they showed using the SELECT’s alias name, ReviewsAverageVote, which holds the calculated average vote, in the ORDER BY section would improve performance.
Based on this I decided to replace the EF Core query with a fully, hand-code SQL command. That meant I had to write the SQL code, but because EF Core’s SQL is so great (and readable), I just copied it (!), and replaced the one area, the ORDER BY part, with some tweaked SQL as shown below – see the ReviewsAverageVotes value used in the ORDER BY at the end of the SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
|
SELECT TOP (@pageSize) [b].[BookId], [b].[Title], [b].[Price], [b].[PublishedOn], CASE WHEN [p.Promotion].[PriceOfferId] IS NULL THEN [b].[Price] ELSE [p.Promotion].[NewPrice] END AS [ActualPrice], [p.Promotion].[PromotionalText] AS [PromotionPromotionalText], [dbo].AuthorsStringUdf([b].[BookId]) AS [AuthorsOrdered], ( SELECT COUNT (*) FROM [Review] AS [r] WHERE [b].[BookId] = [r].[BookId] ) AS [ReviewsCount], ( SELECT AVG ( CAST ([y].[NumStars] AS float )) FROM [Review] AS [y] WHERE [b].[BookId] = [y].[BookId] ) AS [ReviewsAverageVotes] FROM [Books] AS [b] LEFT JOIN [PriceOffers] AS [p.Promotion] ON [b].[BookId] = [p.Promotion].[BookId] WHERE ([b].[SoftDeleted] = 0) ORDER BY [ReviewsAverageVotes] DESC |
The next problem was how I would execute the SQL. I could have used ADO.NET but the well-respected mini-OR/M called Dapper. Dapper has a feature that maps the data returned from the SQL to a .NET class by matching the class property names to the column names returned. EF Core also has a feature that can provide the correct type of connection for Dapper to access the database, which makes using Dapper pretty easy.
The unit test code below shows how to use Dapper with an EF Core’s application DbContext – the Dapper part is the line “Query<Book>(“SELECT * FROM Books”)” in the code below.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
|
[Fact] public void DapperReadBooksEfCoreSqlServer() { //SETUP //… I removed the options and database setup //so that it’s easier to see the Dapper part using ( var context = new EfCoreContext(options)) { //ATTEMPT var books = context.Database .GetDbConnection() .Query<Book>( "SELECT * FROM Books" ); //VERIFY books.Count().ShouldEqual(4); } } |
The book list query I implemented in chapter 2 was split up into four separate query objects: MapBookToDto, OrderBooksBy, FilterBooksBy, and a generic paging method. That meant each part was easy to understand and could be separately tested. Duplicating this query in Dapper requires me to sacrifice the elegance of query objects and move to a design that combines snippets of SQL to create the query. But that’s what I need to do to squeeze out the best performance.
The results of swapping over to this improved SQL is shown in the table below, with the big change in bold (Sort on Average votes).
Test scenario | Part 1b – EF Core | Part 2 – SQL | Comment |
Show books | 10 =30 ms 100 = 80 ms |
10 =30 ms 100 = 85 ms |
No real change – within normal variations |
Sort on PublishedOn | 10 = 30 ms 100 = 80 ms |
10 = 30 ms 100 = 90 ms |
No real change – within normal variations |
Sort by average votes | 10 = 500 ms 100 = 530 ms |
10 = 325 ms 100 = 390 ms |
The improved SQL is about 40% faster than EF Core |
Filter by average votes (4 or more), sort by price | 10 = 440 ms 100 = 490 ms |
10 = 455 ms 100 = 520 ms |
No real change – within normal variations |
As you can see, the only part that improved was the “sort by average votes” situation. What I found interesting is that none of the other more mundane queries had improved. I kind of expected them to because Dapper’s current site states its ten times faster or more than EF (most likely EF6.x), which didn’t show up in these results.
I therefore tried a simple “read one book row” test, and yes, in that case Dapper is about seven times faster than EF Core. That makes sense, as EF Core has several things to do on loading, like relational fixup, while Dapper just executes the SQL. But it turns out, when it comes to large or complex queries where the database execution is the limiting factor, then the performance is down to the quality SQL produced, and not the library you use. The fact is, small accesses aren’t normally the ones that need performance tuning, as they are quick anyway, using Dapper or EF Core only has a difference if the SQL is different, which in the ”sort by average votes” case it was.
But the question is, “was all that effort to swap to SQL worth it?” Before you decide that you should read Part 3, where I use another approach that take more work, but improves performance much more than my hand-tuned SQL does.
Part 3 – Modifying the database to increase performance
The final stage I decided to try modifying the database and adding cached values for the calculations that take a lot of time. These are:
- The Price, which varies depending on whether there is a PriceOffer promotion linked to the book.
- The average votes and number of reviews for this book
- The AuthorsString, containing the comma-delimited list of authors of a book.
Caching is a great tool for improving performance, but ensuring the cached value is up to date needs very careful handing. This is especially true in a database, where you may have a simultaneous update of the same row, which could make the cached value incorrect.
In chapter 13 in my book I cover how I handled the three types of caching, which need different techniques. But for this article I’m going to focus on the most difficult one of the three, that is the caching of the average votes and number of reviews for this book. This is difficult because I still need to keep the actual Review entities so that users can look at the Review’s comments on a book. This means I have two versions of the average votes value: one that is found by averaging the reviews dynamically, as I have done in Part 1a to 2, or the cached value held in the AverageVotes in my modified Book entity class. Keeping these in set is a challenge.
To keep my cached values up to date I need to make some fairly large changes to how things work, as summarised below:
- I still need the Reviews property, which sets up the one-to-many relationship between a Book entity and its Review But I must ensure that Review entities cannot be added or removed from the Reviews property, that is, all adds or removes must go through my AddReview and RemoveReview methods.I also I need to “lock down” the Review entity class by making its properties have private setters, and stop any methods outside of the DataLayer from being able to create a Review entity instance.
- Because the AverageVotes and ReviewsCount properties are cached values then there is the possibility of a concurrent adding or removing of a Review, which could invalidate the cached values.
I will deal with each of these two stages separately
Lock down access to Reviews collection property
EF Core allows navigational properties to be set up which EF Core will use them when doing certain read and writes to the database. My initial Book entity class had a property called Reviews that contained a list of Review entities for this book instance. I would then add or remove a Review instance from the list to change what reviews where linked to the Book.
For my caching of the values related to the Reviews to be kept up to date I need intercept all changes to the Book’s Reviews. To do that I use a feature in EF Core called backing fields. This allow me to ‘hide’ the Reviews list, and provide a IEnumerable<Review> Reviews property, which a developer cannot add or remove from. Here is what the updated Book entity class looks like
1
2
3
4
5
6
7
8
9
10
|
public class Book { private readonly List<Review> _reviews = new List<Review>(); //… other properties removed for clarity public IEnumerable<Review> Reviews => _reviews.ToList(); … |
Now I can add two methods, AddReview and RemoveReview, to the Book class that update its backing field _reviews and at the same time recalculate the AverageVotes and ReviewCount cache properties.
The other back door to changing the Reviews is if the develop loaded a Review entity and altered its values, say changing the reviews vote from 1 to 5. To stop the developer from do that I set all the Review class property setters to private, and I make the constructor internal, so only my DataLayer can create a new instance of the Review class.
The figure below shows the backing field in the Book class and its AddReview method in blue, with the “locked down” Review class in purple.
Handling simultaneous additions of a Reviews to the same book
The other problem is multiple users adding Reviews at the same time. There is a small window of time between the loading of the current Reviews linked to a Book and the saving of the new review collection and the associated cached values. In that time window, another Review could be added/removed from the same book by another user, which would make the cached values out of step with the database.
The solution to this problem is to use EF Core’s concurrency conflict feature. What I do is add the attribute [ConcurrencyCheck] to the AverageVotes and ReviewsCount properties. This means that EF Core will check they haven’t changed since the Book instance was loaded, which is window where other updates could make my calculations invalid. If they have been changed then I can write some code that recalculates the cached values using the current state of the database. I’m not going to show the code for that because the article is already very long. You can find my concurrency code here.
The performance results from Part 3 – cached values
The results from all this is impressive, as you can see in the table below
Test | 10 books | Improvement over Part 2 (SQL) | 100 books | Improvement over Part 2 (SQL) |
All books | 30 ms | no improvement | 80 ms | no improvement |
Sort by publication dates | 30 ms | no improvement | 80 ms | no improvement |
Sort by votes | 30 ms | 12 times better | 80 ms | 5 times better |
Sort by price, filter by 4+ votes | 30 ms | 12 times better | 80 ms | 5 times better |
Extra notes on performance:
- I tried ½ million books: the 10-book time was 85 ms and the 100-book time was 150 ms. For this amount of books the database server is starting to be a limiting factor.
- In Parts 1a, 1b and 2 if the sort by price, filtered by +1 votes, then the timings were all over a second in length. But this Part 3 implementation stills comes in at 30ms.
- I took the SQL produced by EF Core in Part 3 and used Dapper to execute it, but there was no further performance improvement so it wasn’t worth doing.
Obviously, the standout improvement are the last two items, which were the ones I was targeting. You can see this code in action, with 100,000 books, at http://efcoreinaction.com/
Summary of the whole performance tuning work
Below is a repeat of the chart I had at the beginning that shows you the results for Part 1a to 3 for the 100-book display and 100-book display with sort on average votes.
However, what this graph does convey is that the Part 3 performance improvements make all the 10-book displays, or any sort or filter, come in at around 30 ms. As the 10-book display is the default then the site is now really fast. Also, the 30 ms is the time for the ASP.NET page to display, which includes other overheads outside the database access – in actual fact, the database accesses are between 10 ms to 15 ms.
The other questions I asked myself was how much effort did each stage take and where there any problem or adverse effects to the application? The table below gives my answers to these questions.
Part | Effort | Comments |
1a. Straight EF Core | Few days | Finding the correct format for the LINQ average wasn’t easy. I had to ask the EF Core team via EF Core’s GitHub issue page. But once I got this right EF Core produced great SQL code. |
1b. +DbFunction | ½ day | Very easy. I like the DbFunction a lot, and in this case, it made a significant improvement to the performance. |
2. Better SQL | One day | Fairly easy, as I simply copied the SQL code that EF Core produces and tweaked the bit that my experiments said would help. But only useful if you can come up with some better SQL, i.e. the difference in speed of Dapper over EF Core only helps on small, very quick database accesses, which you don’t need to performance tune anyway! |
3. Cached values | Three days | Definitely hard work, but also a fantastic result. I think the performance changes to the code don’t hinder future development of the application at all, and in some cases improves it. |
Overall I am very pleased with the process. EF Core produces great SQL code from the get go, but only because I made sure my LINQ queries in a way such that EF Core can translate every part into SQL. I had to persevere with the LINQ Average method to get EF Core to translate that to SQL.
Note I think plenty of developer will fall foul of this problem and think EF Core is slow – but before you blame EF Core check your logs for QueryClientEvaluationWarning warnings, which tell you part of your LINQ query didn’t translate to SQL and is being run in software. See the section “Understanding the limitations of Client vs. Server evaluation” in my article about EF Core’s Client vs. Server evaluation feature.
But when I had to go beyond straight EF Core I found there was plenty I could do. And I didn’t go as far as I wanted to due to time and limits on the chapter length – my next step was to change the application’s architecture (I give a few ideas at the end of chapter 13).
Conclusion
I took a query that was designed around a fictitious, but believable example, which wasn’t a walk over to implement. I got the best out of EF Core, and I think that performance was quite credible. I then pushed through two more stages, ending with a complex, but eminently implementable solution where the slow parts of the query were between eight and fifteen times faster from where I started. But more importantly all the queries for a 10-book display are at 30 ms.
You most likely selected EF Core for your database access library because it allowed you to develop your application rapidly (good choice!). That means you should have a working application quickly, and can then you can performance tune the parts that need it.
When you start performance tuning I would recommend persevering with conventional EF Core (see my note about QueryClientEvaluationWarning). But at some point you might decide that the standard EF Core approach won’t produce the performance you need. That’s when this article could be useful to you, so bookmark this URL for that day 。
转载:https://www.thereformedprogrammer.net/entity-framework-core-performance-tuning-a-worked-example/