• Using HttpClient


    Should we create a new single instance of HttpClient for all requests?

     

    Recently I came across this blog post from ASP.NET Monsters, which talks about issues with using HttpClient in the following way:

    using(var client = new HttpClient())
    {
    }
    

    As per the blog post, if we dispose the HttpClient after every request it can keep the TCP connections open. This can potentially lead to System.Net.Sockets.SocketException.

    The correct way as per the post is to create a single instance of HttpClient as it helps to reduce waste of sockets.

    From the post:

    If we share a single instance of HttpClient then we can reduce the waste of sockets by reusing them:

    using System;
    using System.Net.Http;

    namespace ConsoleApplication
    {
        public class Program
        {
            private static HttpClient Client = new HttpClient();
            public static void Main(string[] args)
            {
                Console.WriteLine("Starting connections");
                for(int i = 0; i<10; i++)
                {
                    var result = Client.GetAsync("http://aspnetmonsters.com").Result;
                    Console.WriteLine(result.StatusCode);
                }
                Console.WriteLine("Connections done");
                Console.ReadLine();
            }
        }
    }
    

    I have always disposed of the HttpClient object after using it, as I felt this was the best way of using it. But this blog post now makes me feel I was doing it wrong all along.

    It seems like a compelling blog post. However, before making a decision, I would first run the same tests that the blog writer ran, but against your own code. I would also try to find out a bit more about HttpClient and its behavior.

    This post states:

    An HttpClient instance is a collection of settings applied to all requests executed by that instance. In addition, every HttpClient instance uses its own connection pool, isolating its requests from requests executed by other HttpClient instances.

    So what is probably happening when an HttpClient is shared is that the connections are being reused, which is fine if you don't require persistent connections. The only way you're going to know for sure whether or not this matters for your situation is to run your own performance tests.

    If you dig, you'll find several other resources that address this issue (including a Microsoft Best Practices article), so it's probably a good idea to implement anyway (with some precautions).

    References

    You're Using Httpclient Wrong and It Is Destabilizing Your Software
    Singleton HttpClient? Beware of this serious behaviour and how to fix it
    Microsoft Patterns and Practices - Performance Optimization: Improper Instantiation
    Single instance of reusable HttpClient on Code Review
    Singleton HttpClient doesn't respect DNS changes (CoreFX)
    General advice for using HttpClient

    To properly close the TCP connection, we need to complete a FIN - FIN+ACK - ACK packet sequence (just like SYN - SYN+ACK - ACK, when opening a TCP connection). If we just call a .Close() method (usually happens when an HttpClient is disposing), and we don't wait for the remote side to confirm our close request (with FIN+ACK), we end up with the TIME_WAIT state on the local TCP port, because we disposed our listener (HttpClient) and we never got the chance to reset the port state to a proper closed state, once the remote peer sends us the FIN+ACK packet.

    The proper way to close the TCP connection would be to call the .Close() method and wait for the close event from the other side (FIN+ACK) to arrive on our side. Only then we can send our final ACK and dispose the HttpClient.

    Just to add, it makes sense to keep TCP connections open if you are performing HTTP requests, because of the "Connection: Keep-Alive" HTTP header. Furthermore, you might instead ask the remote peer to close the connection for you by setting the HTTP header "Connection: Close". That way, your local ports will always be properly closed, instead of being in a TIME_WAIT state.
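
    As a minimal sketch of the "Connection: Close" suggestion above, the request below sets that header through the ConnectionClose property on the request headers. The class and method names are made up for illustration, and which side ends up in TIME_WAIT depends on which peer closes the connection first, so this is worth measuring rather than assuming.

    ```csharp
    using System;
    using System.Net.Http;

    public static class ConnectionCloseExample
    {
        // Builds a GET request that carries the "Connection: close" header.
        public static HttpRequestMessage BuildRequest(string url)
        {
            var request = new HttpRequestMessage(HttpMethod.Get, url);
            request.Headers.ConnectionClose = true; // adds "close" to the Connection header
            return request;
        }
    }
    ```

    The resulting message can be passed to HttpClient.SendAsync like any other request.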

    What is the overhead of creating a new HttpClient per call in a WebAPI client?

    HttpClient has been designed to be re-used for multiple calls. Even across multiple threads. The HttpClientHandler has Credentials and Cookies that are intended to be re-used across calls. Having a new HttpClient instance requires re-setting up all of that stuff. Also, the DefaultRequestHeaders property contains properties that are intended for multiple calls. Having to reset those values on each request defeats the point.

    Another major benefit of HttpClient is the ability to add HttpMessageHandlers into the request/response pipeline to apply cross cutting concerns. These could be for logging, auditing, throttling, redirect handling, offline handling, capturing metrics. All sorts of different things. If a new HttpClient is created on each request, then all of these message handlers need to be setup on each request and somehow any application level state that is shared between requests for these handlers also needs to be provided.
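
    To make the pipeline idea concrete, here is a rough sketch of a message handler that times each request and keeps a shared counter. TimingHandler and PipelineExample are hypothetical names, not part of the original answer; only DelegatingHandler, HttpClientHandler and HttpClient come from the framework.

    ```csharp
    using System;
    using System.Diagnostics;
    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;

    // Hypothetical handler: logs how long each request takes.
    public class TimingHandler : DelegatingHandler
    {
        // Application-level state shared by every request sent through this handler.
        public int RequestCount;

        public TimingHandler(HttpMessageHandler innerHandler) : base(innerHandler) { }

        protected override async Task<HttpResponseMessage> SendAsync(
            HttpRequestMessage request, CancellationToken cancellationToken)
        {
            var stopwatch = Stopwatch.StartNew();
            var response = await base.SendAsync(request, cancellationToken);
            Interlocked.Increment(ref RequestCount);
            Console.WriteLine($"{request.Method} {request.RequestUri} took {stopwatch.ElapsedMilliseconds} ms");
            return response;
        }
    }

    public static class PipelineExample
    {
        // Built once; every request sent through Client passes through Timing first.
        public static readonly TimingHandler Timing = new TimingHandler(new HttpClientHandler());
        public static readonly HttpClient Client = new HttpClient(Timing);
    }
    ```

    With a new HttpClient per request, both the handler chain and the RequestCount state would have to be rebuilt or passed in every time.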

    The more you use the features of HttpClient, the more you will see that reusing an existing instance makes sense.

    However, the biggest issue, in my opinion is that when a HttpClient class is disposed, it disposes HttpClientHandler, which then forcibly closes the TCP/IP connection in the pool of connections that is managed by ServicePointManager. This means that each request with a new HttpClient requires re-establishing a new TCP/IP connection.

    From my tests, using plain HTTP on a LAN, the performance hit is fairly negligible. I suspect this is because there is an underlying TCP keepalive that is holding the connection open even when HttpClientHandler tries to close it.

    On requests that go over the internet, I have seen a different story. I have seen a 40% performance hit due to having to re-open the request every time.

    I suspect the hit on a HTTPS connection would be even worse.

    My advice is to keep an instance of HttpClient for the lifetime of your application for each distinct API that you connect to.
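
    One way to follow this advice, sketched under the assumption that the application talks to two separate APIs, is a static holder with one long-lived client per API. The class name and both base addresses are placeholders.

    ```csharp
    using System;
    using System.Net.Http;

    public static class ApiClients
    {
        // One long-lived client per remote API; both URLs are placeholders.
        public static readonly HttpClient Orders =
            new HttpClient { BaseAddress = new Uri("https://orders.example.com/") };

        public static readonly HttpClient Billing =
            new HttpClient { BaseAddress = new Uri("https://billing.example.com/") };
    }
    ```

    Callers then use ApiClients.Orders.GetAsync("...") with a relative URL instead of constructing a client themselves.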

    If you want your application to scale, the difference is HUGE! Depending on the load, you will see very different performance numbers. As Darrel Miller mentions, the HttpClient was designed to be re-used across requests. This was confirmed by guys on the BCL team who wrote it.

    A recent project I had was to help a very large and well-known online computer retailer scale out for Black Friday/holiday traffic for some new systems. We ran into some performance issues around the usage of HttpClient. Since it implements IDisposable, the devs did what you would normally do by creating an instance and placing it inside of a using() statement. Once we started load testing the app brought the server to its knees - yes, the server not just the app. The reason is that every instance of HttpClient opens a port on the server. Because of non-deterministic finalization of GC and the fact that you are working with computer resources that span across multiple OSI layers, closing network ports can take a while. In fact Windows OS itself can take up to 20 secs to close a port (per Microsoft). We were opening ports faster than they could be closed - server port exhaustion which hammered the CPU to 100%. My fix was to change the HttpClient to a static instance which solved the problem. Yes, it is a disposable resource, but any overhead is vastly outweighed by the difference in performance. I encourage you to do some load testing to see how your app behaves.

    You can also check out the WebAPI Guidance page for documentation and example at https://www.asp.net/web-api/overview/advanced/calling-a-web-api-from-a-net-client

    Pay special attention to this call-out:

    HttpClient is intended to be instantiated once and re-used throughout the life of an application. Especially in server applications, creating a new HttpClient instance for every request will exhaust the number of sockets available under heavy loads. This will result in SocketException errors.

    If you find that you need to use a static HttpClient with different headers, base address, etc. what you will need to do is to create the HttpRequestMessage manually and set those values on the HttpRequestMessage. Then, use the HttpClient:SendAsync(HttpRequestMessage requestMessage, ...)
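
    The approach described above might look roughly like this. PerRequestSettings, BuildRequest and the bearer-token parameter are illustrative assumptions; the point is only that the per-request values live on the HttpRequestMessage, not on the shared client.

    ```csharp
    using System;
    using System.Net.Http;
    using System.Net.Http.Headers;
    using System.Threading.Tasks;

    public static class PerRequestSettings
    {
        private static readonly HttpClient Client = new HttpClient();

        // Builds a request with its own absolute URL and headers,
        // leaving the shared client's defaults untouched.
        public static HttpRequestMessage BuildRequest(string url, string bearerToken)
        {
            var request = new HttpRequestMessage(HttpMethod.Get, url);
            request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", bearerToken);
            request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
            return request;
        }

        // Sends the request through the shared client and returns the body as text.
        public static async Task<string> GetAsync(string url, string bearerToken)
        {
            using (var request = BuildRequest(url, bearerToken))
            using (var response = await Client.SendAsync(request))
            {
                response.EnsureSuccessStatusCode();
                return await response.Content.ReadAsStringAsync();
            }
        }
    }
    ```

    Disposing the request and response messages is fine here; it is only the client (and its handler) that should be long-lived.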

     

    HttpClient does not throw an exception when the HTTP response contains an error code. Instead, the IsSuccessStatusCode property is false if the status is an error code. If you prefer to treat HTTP error codes as exceptions, call HttpResponseMessage.EnsureSuccessStatusCode on the response object. EnsureSuccessStatusCode throws an exception if the status code falls outside the range 200–299. Note that HttpClient can throw exceptions for other reasons — for example, if the request times out.
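
    The two styles of status handling described above can be compared without any network access by constructing responses by hand. This is only a sketch; StatusCheckExample and its method names are made up for illustration.

    ```csharp
    using System;
    using System.Net;
    using System.Net.Http;

    public static class StatusCheckExample
    {
        // Non-throwing check: true only for status codes 200-299.
        public static bool IsOk(HttpResponseMessage response)
        {
            return response.IsSuccessStatusCode;
        }

        // Throwing check: reports whether EnsureSuccessStatusCode raised an exception.
        public static bool Throws(HttpResponseMessage response)
        {
            try
            {
                response.EnsureSuccessStatusCode();
                return false;
            }
            catch (HttpRequestException)
            {
                return true;
            }
        }
    }
    ```

    For a hand-built new HttpResponseMessage(HttpStatusCode.NotFound), IsOk returns false and Throws returns true; for HttpStatusCode.OK it is the other way round.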

     

     

     

    Call a Web API From a .NET Client (C#)

    HttpClient is intended to be instantiated once and reused throughout the life of an application. The following conditions can result in SocketException errors:

    • Creating a new HttpClient instance per request.
    • Server under heavy load.

    Creating a new HttpClient instance per request can exhaust the available sockets.

     

     

    But the most visible examples from Microsoft don't call Dispose() either explicitly or implicitly. For instance:

    In the announcement's comments, someone asked the Microsoft employee:

    After checking your samples, I saw that you didn't perform the dispose action on HttpClient instance. I have used all instances of HttpClient with using statement on my app and I thought that it is the right way since HttpClient implements the IDisposable interface. Am I on the right path?

    His answer was:

    In general that is correct although you have to be careful with "using" and async as they don't really mix in .NET 4. In .NET 4.5 you can use "await" inside a "using" statement.

    Btw, you can reuse the same HttpClient as many times as you like, so typically you won't create/dispose them all the time.

    The second paragraph is superfluous to this question, which is not concerned about how many times you can use an HttpClient instance, but about if it is necessary to dispose it after you no longer need it.

    This is plain wrong: "As a rule, when you use an IDisposable object, you should declare and instantiate it in a using statement". I would always read the documentation on the class implementing IDisposable before deciding whether I should use a using for it. As the author of libraries where I implement IDisposable because I need to release unmanaged resources, I would be horrified if consumers created and disposed an instance each time instead of re-using an existing instance. That is not to say you shouldn't dispose of the instance eventually.

     

    Do HttpClient and HttpClientHandler have to be disposed?

    The general consensus is that you do not (should not) need to dispose of HttpClient.

    Many people who are intimately involved in the way it works have stated this.

    See Darrel Miller's blog post and a related SO post: HttpClient crawling results in memory leak for reference.

    I'd also strongly suggest that you read the HttpClient chapter from Designing Evolvable Web APIs with ASP.NET for context on what is going on under the hood, particularly the "Lifecycle" section quoted here:

    Although HttpClient does indirectly implement the IDisposable interface, the standard usage of HttpClient is not to dispose of it after every request. The HttpClient object is intended to live for as long as your application needs to make HTTP requests. Having an object exist across multiple requests enables a place for setting DefaultRequestHeaders and prevents you from having to re-specify things like CredentialCache and CookieContainer on every request as was necessary with HttpWebRequest.

    Or even open up DotPeek.

    To clarify your answer, would it be correct to say that "you do not need to dispose of HttpClient IF YOU HOLD ON TO THE INSTANCE TO REUSE IT LATER"? For instance, if a method is called repeatedly and creates a new HttpClient instance (even though it's not the recommended pattern in most cases), would it still be correct to say this method should not dispose the instance (that will not be reused)? It could lead to thousands of undisposed instances. In other words, that you should try and reuse the instances, but if you don't reuse, you'd better dispose them (to release the connections)?

    Yes. If for some reason you do repeatedly create and destroy HttpClient instances then yes, you should Dispose it. I'm not suggesting to ignore the IDisposable interface, just trying to encourage people to re-use instances. 

    Just to add further credence to this answer, I spoke with the HttpClient team today and they confirmed that HttpClient was not designed to be used per-request. An instance of HttpClient should be kept alive whilst a client application continues to interact with a particular host.

    The current answers are a bit confusing and misleading, and they are missing some important DNS implications. I'll try to summarize where things stand clearly.

    1. Generally speaking, most IDisposable objects should ideally be disposed when you are done with them, especially those that own named/shared OS resources. HttpClient is no exception, since as Darrel Miller points out it allocates cancellation tokens, and request/response bodies can be unmanaged streams.
    2. However, the best practice for HttpClient says you should create one instance and reuse it as much as possible (using its thread-safe members in multi-threaded scenarios). Therefore, in most scenarios you'll never dispose of it simply because you will be needing it all the time.
    3. The problem with re-using the same HttpClient "forever" is that the underlying HTTP connection might remain open against the originally DNS-resolved IP, regardless of DNS changes. This can be an issue in scenarios like blue/green deployment and DNS-based failover. There are various approaches for dealing with this issue, the most reliable one involving the server sending out a Connection:close header after DNS changes take place. Another possibility involves recycling the HttpClient on the client side, either periodically or via some mechanism that learns about the DNS change. See https://github.com/dotnet/corefx/issues/11224 for more information (I suggest reading it carefully before blindly using the code suggested in the linked blog post).
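
    As an illustration of the client-side recycling idea in point 3, one option that is not mentioned in the answer above, and which assumes .NET Core 2.1 or later, is to cap how long pooled connections live using SocketsHttpHandler.PooledConnectionLifetime. The class name and the two-minute value are arbitrary choices for this sketch.

    ```csharp
    using System;
    using System.Net.Http;

    public static class DnsAwareClient
    {
        // Assumes .NET Core 2.1 or later, where SocketsHttpHandler is available.
        public static readonly SocketsHttpHandler Handler = new SocketsHttpHandler
        {
            // Pooled connections older than this are retired and replaced,
            // so new connections pick up DNS changes. Two minutes is arbitrary.
            PooledConnectionLifetime = TimeSpan.FromMinutes(2)
        };

        // A single shared client built on top of that handler.
        public static readonly HttpClient Client = new HttpClient(Handler);
    }
    ```

    This keeps the single shared HttpClient while still letting connections turn over periodically.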

    If you do need to dispose the HttpClient for whatever reason, you should keep a static instance of the HttpMessageHandler around, as disposing that one is actually the cause of the issues attributed to disposing the HttpClient. HttpClient has a constructor overload that allows you to specify that the supplied handler should not be disposed, in which case you can reuse the HttpMessageHandler with other HttpClient instances.
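
    A minimal sketch of that constructor overload follows. SharedHandlerExample and CreateClient are hypothetical names; the overload itself is HttpClient(HttpMessageHandler handler, bool disposeHandler).

    ```csharp
    using System;
    using System.Net.Http;

    public static class SharedHandlerExample
    {
        // The handler owns the connection pool, so it is the object that is kept alive.
        private static readonly HttpClientHandler SharedHandler = new HttpClientHandler();

        // Each caller gets a fresh HttpClient that can be disposed safely:
        // disposeHandler: false leaves SharedHandler and its connections intact.
        public static HttpClient CreateClient()
        {
            return new HttpClient(SharedHandler, disposeHandler: false);
        }
    }
    ```

    A caller can then wrap CreateClient() in a using block without tearing down the underlying connections.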

    In my understanding, calling Dispose() is necessary only when it's locking resources you need later (like a particular connection). It's always recommended to free resources you're no longer using, even if you don't need them again, simply because you shouldn't generally be holding onto resources you're not using (pun intended).

    The Microsoft example is not incorrect, necessarily. All resources used will be released when the application exits. And in the case of that example, that happens almost immediately after the HttpClient is done being used. In like cases, explicitly calling Dispose() is somewhat superfluous.

    But, in general, when a class implements IDisposable, the understanding is that you should Dispose() of its instances as soon as you're fully ready and able. I'd posit this is particularly true in cases like HttpClient wherein it's not explicitly documented as to whether resources or connections are being held onto/open. In the case wherein the connection will be reused again [soon], you'll want to forgo Dispose()ing of it -- you're not "fully ready" in that case.

    See also: IDisposable.Dispose Method and When to call Dispose

    Since it doesn't appear that anyone has mentioned it here yet, the new best way to manage HttpClient and HttpClientHandler in .NET Core 2.1 is to use HttpClientFactory.

    It solves most of the aforementioned issues and gotchas in a clean and easy to use way. From Steve Gordon's great blog post:

    Add the following packages to your .NET Core (2.1.1 or later) project:

    Microsoft.AspNetCore.All
    Microsoft.Extensions.Http
    

    Add this to Startup.cs:

    services.AddHttpClient();
    

    Inject and use:

    using System.Net.Http;
    using System.Threading.Tasks;
    using Microsoft.AspNetCore.Mvc;

    [Route("api/[controller]")]
    public class ValuesController : Controller
    {
        private readonly IHttpClientFactory _httpClientFactory;
    
        public ValuesController(IHttpClientFactory httpClientFactory)
        {
            _httpClientFactory = httpClientFactory;
        }
    
        [HttpGet]
        public async Task<ActionResult> Get()
        {
            var client = _httpClientFactory.CreateClient();
            var result = await client.GetStringAsync("http://www.google.com");
            return Ok(result);
        }
    }
    

    Explore the series of posts in Steve's blog for lots more features.

    HttpClient, it lives, and it is glorious.

    Along with the latest release of WCF Web API there was an updated version of HttpClient. With it came a bunch of breaking changes; most notably, there are no more sync methods for doing HTTP requests. This is a change that brings consistency with Microsoft’s new policy that all APIs that take more than 30ms (or is it 50ms?) should be async requests. Yes, it’s a bit annoying to get used to, but I believe in the long run it will be worth it.

    YOU'RE USING HTTPCLIENT WRONG AND IT IS DESTABILIZING YOUR SOFTWARE

    I’ve been using HttpClient wrong for years and it finally came back to bite me. My site was unstable and my clients furious. With a simple fix, performance improved greatly and the instability disappeared.

    At the same time I actually improved the performance of the application through more efficient socket usage.

    Microservices can be a bear to deal with. As more services are added and monoliths are broken down, there tend to be more communication paths between services. There are many options for communicating, but HTTP is an ever popular option. If the microservices are built in C# or any .NET language then chances are you’ve made use of HttpClient. I know I did.

    The typical usage pattern looked a little bit like this:

    using(var client = new HttpClient())
    {
    //do something with http client
    }

    Here’s the Rub

    The using statement is a C# nicety for dealing with disposable objects. Once the using block is complete then the disposable object, in this case HttpClient, goes out of scope and is disposed.

    The dispose method is called and whatever resources are in use are cleaned up. This is a very typical pattern in .NET and we use it for everything from database connections to stream writers. Really any object which has external resources that must be cleaned up uses the IDisposable interface.

    And you can’t be blamed for wanting to wrap it with the using. First of all, it’s considered good practice to do so. In fact, the official docs for using state:

    As a rule, when you use an IDisposable object, you should declare and instantiate it in a using statement.

    Secondly, all code you may have seen since…the inception of HttpClient would have told you to use a using statement block, including recent docs on the ASP.NET site itself. The internet is generally in agreement as well.

    But HttpClient is different. Although it implements the IDisposable interface it is actually a shared object. This means that under the covers it is reentrant and thread safe. Instead of creating a new instance of HttpClient for each execution you should share a single instance of HttpClient for the entire lifetime of the application.

    See For Yourself

    Here is a simple program written to demonstrate the use of HttpClient:


    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    namespace ConsoleApplication
    {
        public class Program
        {
            public static async Task Main(string[] args)
            {
                Console.WriteLine("Starting connections");
                for(int i = 0; i<10; i++)
                {
                    // A new client (and connection pool) per iteration: the pattern under test.
                    using(var client = new HttpClient())
                    {
                        var result = await client.GetAsync("http://aspnetmonsters.com");
                        Console.WriteLine(result.StatusCode);
                    }
                }
                Console.WriteLine("Connections done");
            }
        }
    }

    This will open up 10 requests to one of the best sites on the internet http://aspnetmonsters.com and do a GET. We just print the status code so we know it is working. The output is going to be:

    C:\code\socket> dotnet run
    Project socket (.NETCoreApp,Version=v1.0) will be compiled because inputs were modified
    Compiling socket for .NETCoreApp,Version=v1.0

    Compilation succeeded.
    0 Warning(s)
    0 Error(s)

    Time elapsed 00:00:01.2501667


    Starting connections
    OK
    OK
    OK
    OK
    OK
    OK
    OK
    OK
    OK
    OK
    Connections done

    But Wait, There’s More!

    All work and everything is right with the world. Except that it isn’t. If we pull out the netstat tool and look at the state of sockets on the machine running this we’ll see:

    C:\code\socket>NETSTAT.EXE
    ...
    Proto Local Address Foreign Address State
    TCP 10.211.55.6:12050 waws-prod-bay-017:http TIME_WAIT
    TCP 10.211.55.6:12051 waws-prod-bay-017:http TIME_WAIT
    TCP 10.211.55.6:12053 waws-prod-bay-017:http TIME_WAIT
    TCP 10.211.55.6:12054 waws-prod-bay-017:http TIME_WAIT
    TCP 10.211.55.6:12055 waws-prod-bay-017:http TIME_WAIT
    TCP 10.211.55.6:12056 waws-prod-bay-017:http TIME_WAIT
    TCP 10.211.55.6:12057 waws-prod-bay-017:http TIME_WAIT
    TCP 10.211.55.6:12058 waws-prod-bay-017:http TIME_WAIT
    TCP 10.211.55.6:12059 waws-prod-bay-017:http TIME_WAIT
    TCP 10.211.55.6:12060 waws-prod-bay-017:http TIME_WAIT
    TCP 10.211.55.6:12061 waws-prod-bay-017:http TIME_WAIT
    TCP 10.211.55.6:12062 waws-prod-bay-017:http TIME_WAIT
    TCP 127.0.0.1:1695 SIMONTIMMS742B:1696 ESTABLISHED
    ...

    Huh, that’s weird…the application has exited and yet there are still a bunch of these connections open to the Azure machine which hosts the ASP.NET Monsters website. They are in the TIME_WAIT state which means that the connection has been closed on one side (ours) but we’re still waiting to see if any additional packets come in on it because they might have been delayed on the network somewhere. Here is a diagram of TCP/IP states I stole from https://www4.cs.fau.de/Projects/JX/Projects/TCP/tcpstate.html.
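
    If you would rather check this from code than from netstat, a rough equivalent is to count the TIME_WAIT entries reported by IPGlobalProperties. This is only a sketch; the TimeWaitCounter name is made up, and the count covers every TCP connection on the machine, not just the ones opened by this program.

    ```csharp
    using System;
    using System.Linq;
    using System.Net.NetworkInformation;

    public static class TimeWaitCounter
    {
        // Counts local TCP connections currently in the TIME_WAIT state.
        public static int Count()
        {
            return IPGlobalProperties.GetIPGlobalProperties()
                .GetActiveTcpConnections()
                .Count(c => c.State == TcpState.TimeWait);
        }
    }
    ```

    Calling TimeWaitCounter.Count() before and after the loop gives a quick view of how many sockets the test left behind.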

    Windows will hold a connection in this state for 240 seconds (it is set by [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpTimedWaitDelay]). There is a limit to how quickly Windows can open new sockets, so if you exhaust the connection pool then you’re likely to see an error like:

    Unable to connect to the remote server
    System.Net.Sockets.SocketException: Only one usage of each socket address (protocol/network a

    Searching for that in the Googles will give you some terrible advice about decreasing the connection timeout. In fact, decreasing the timeout can lead to other detrimental consequences when applications that properly use HttpClient or similar constructs are run on the server. We need to understand what “properly” means and fix the underlying problem instead of tinkering with machine level variables.

    The Fix is In

    I really must thank Harald S. Ulrksen and Darrel Miller for pointing me to The Patterns and Practices documents on this.

    If we share a single instance of HttpClient then we can reduce the waste of sockets by reusing them:

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    namespace ConsoleApplication
    {
        public class Program
        {
            // One client shared by every request in the application.
            private static HttpClient Client = new HttpClient();

            public static async Task Main(string[] args)
            {
                Console.WriteLine("Starting connections");
                for(int i = 0; i<10; i++)
                {
                    var result = await Client.GetAsync("http://aspnetmonsters.com");
                    Console.WriteLine(result.StatusCode);
                }
                Console.WriteLine("Connections done");
                Console.ReadLine();
            }
        }
    }

    Note here that we have just one instance of HttpClient shared for the entire application. Everything still works like it used to (actually a little faster due to socket reuse). Netstat now just shows:

    In the production scenario I had the number of sockets was averaging around 4000, and at peak would exceed 5000, effectively crushing the available resources on the server, which then caused services to fall over. After implementing the change, the sockets in use dropped from an average of more than 4000 to being consistently less than 400, and usually around 100.

    This is dramatic. If you have any kind of load at all you need to remember these two things:

    1. Make your HttpClient static.
    2. Do not dispose of or wrap your HttpClient in a using unless you explicitly are looking for a particular behaviour (such as causing your services to fail).

    Wrapping Up

    The socket exhaustion problems we had been struggling with for months disappeared and our client threw a virtual parade. I cannot overstate how unobvious this bug was. For years we have been conditioned to dispose of objects that implement IDisposable and many refactoring tools like R# and CodeRush actually warn if you don’t. In this case disposing of HttpClient was the wrong thing to do. It is unfortunate that HttpClient implements IDisposable and encourages the wrong behaviour.

  • Original article: https://www.cnblogs.com/panpanwelcome/p/12144813.html