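Preface
The previous article covered the theory behind HealthCheck; this one looks at how it is applied in practice.
- Health checks can monitor the use of memory, disk, and other physical server resources to determine whether the server is in a healthy state.
- Health checks can test an app's dependencies, such as databases and external service endpoints, to confirm that they are available and working normally.
- Health probes can be used by container orchestrators and load balancers to check an app's status.
Source Code Study
To add HealthCheck to an app, you typically configure the Startup class as follows: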
public void ConfigureServices(IServiceCollection services)
{
    services.AddHealthChecks();
}

public void Configure(IApplicationBuilder app)
{
    app.UseRouting();

    app.UseEndpoints(endpoints =>
    {
        endpoints.MapHealthChecks("/health");
    });
}
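The call to services.AddHealthChecks(); takes us to the extension method that registers HealthCheckService. It is defined on IServiceCollection in HealthCheckServiceCollectionExtensions: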
public static class HealthCheckServiceCollectionExtensions
{
    public static IHealthChecksBuilder AddHealthChecks(this IServiceCollection services)
    {
        services.TryAddSingleton<HealthCheckService, DefaultHealthCheckService>();
        services.TryAddEnumerable(ServiceDescriptor.Singleton<IHostedService, HealthCheckPublisherHostedService>());
        return new HealthChecksBuilder(services);
    }
}
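This extension method tries to register HealthCheckService as a singleton; because it uses TryAddSingleton, the registration is skipped if one already exists. HealthCheckService itself is an abstract class with a single abstract method, which runs the health checks and returns the aggregated health status. The abstract method looks like this: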
public abstract Task<HealthReport> CheckHealthAsync(
    Func<HealthCheckRegistration, bool> predicate,
    CancellationToken cancellationToken = default);
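To make the role of the predicate parameter more concrete, here is a minimal sketch that resolves HealthCheckService from a plain ServiceCollection and runs only the checks tagged "ready". The class name PredicateDemo, the check names, and the tags are made up for illustration, and the sketch assumes the Microsoft.Extensions.Diagnostics.HealthChecks, Microsoft.Extensions.DependencyInjection, and Microsoft.Extensions.Logging packages are referenced.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical demo class; not part of the framework.
public static class PredicateDemo
{
    public static async Task<HealthReport> RunAsync()
    {
        var services = new ServiceCollection();

        // The default HealthCheckService implementation needs an ILogger.
        services.AddLogging();

        // Two illustrative checks, distinguished by tag.
        services.AddHealthChecks()
            .AddCheck("self", () => HealthCheckResult.Healthy(), tags: new[] { "live" })
            .AddCheck("db", () => HealthCheckResult.Degraded("slow"), tags: new[] { "ready" });

        using var provider = services.BuildServiceProvider();
        var service = provider.GetRequiredService<HealthCheckService>();

        // Only registrations for which the predicate returns true are executed.
        return await service.CheckHealthAsync(r => r.Tags.Contains("ready"));
    }
}
```

Only the "db" registration matches the predicate here, so the report should contain a single entry, and the aggregated status follows that entry.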
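HealthCheckService has one default derived class, DefaultHealthCheckService. Its constructor verifies that no two health checks share the same name and throws an exception if they do; the name comparison is case-insensitive. The abstract method that this class implements is the core of the health check feature, and its implementation is fairly involved.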
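The case-insensitive duplicate-name validation described above can be sketched with standard collections. This is an illustration of the idea rather than the framework's code: the helper name DuplicateNameCheck.Validate and the exception message are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating case-insensitive duplicate detection.
public static class DuplicateNameCheck
{
    public static void Validate(IEnumerable<string> names)
    {
        // Group names while ignoring case, so "db" and "DB" land in the same group.
        var duplicates = names
            .GroupBy(n => n, StringComparer.OrdinalIgnoreCase)
            .Where(g => g.Count() > 1)
            .Select(g => g.Key)
            .ToList();

        if (duplicates.Count > 0)
        {
            throw new ArgumentException(
                "Duplicate health check names: " + string.Join(", ", duplicates));
        }
    }
}
```

Calling Validate(new[] { "db", "cache" }) returns normally, while Validate(new[] { "db", "DB" }) throws, because the two names differ only in case.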
Here is the source of CheckHealthAsync as implemented in DefaultHealthCheckService:
public override async Task<HealthReport> CheckHealthAsync(
    Func<HealthCheckRegistration, bool> predicate,
    CancellationToken cancellationToken = default)
{
    var registrations = _options.Value.Registrations;
    if (predicate != null)
    {
        registrations = registrations.Where(predicate).ToArray();
    }

    var totalTime = ValueStopwatch.StartNew();
    Log.HealthCheckProcessingBegin(_logger);

    var tasks = new Task<HealthReportEntry>[registrations.Count];
    var index = 0;
    using (var scope = _scopeFactory.CreateScope())
    {
        foreach (var registration in registrations)
        {
            tasks[index++] = Task.Run(() => RunCheckAsync(scope, registration, cancellationToken), cancellationToken);
        }

        await Task.WhenAll(tasks).ConfigureAwait(false);
    }

    index = 0;
    var entries = new Dictionary<string, HealthReportEntry>(StringComparer.OrdinalIgnoreCase);
    foreach (var registration in registrations)
    {
        entries[registration.Name] = tasks[index++].Result;
    }

    var totalElapsedTime = totalTime.GetElapsedTime();
    var report = new HealthReport(entries, totalElapsedTime);
    Log.HealthCheckProcessingEnd(_logger, report.Status, totalElapsedTime);
    return report;
}
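1. The method is well instrumented: it logs throughout and times the whole run. The logs record not only the elapsed time of the health check set as a whole, but also the duration of each named check.
2. The checks run concurrently: each one is started with Task.Run, and the method then waits for all of them with await Task.WhenAll(tasks).ConfigureAwait(false);. Keep in mind that too many health check tasks will degrade system performance, so this is a trade-off to weigh.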
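The start-everything-then-await pattern from point 2 can be reproduced in isolation. The sketch below uses simulated checks (FakeCheckAsync and its delays are invented for illustration) and collects the results into a case-insensitive dictionary, loosely mirroring how the entries are assembled in the source above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical demo of running several checks concurrently.
public static class ConcurrentChecksDemo
{
    // Simulated check: waits for a while, then reports a status string.
    private static async Task<string> FakeCheckAsync(string name, int delayMs)
    {
        await Task.Delay(delayMs).ConfigureAwait(false);
        return name + ": Healthy";
    }

    public static async Task<Dictionary<string, string>> RunAllAsync()
    {
        var checks = new (string Name, int DelayMs)[]
        {
            ("database", 200), ("cache", 200), ("queue", 200)
        };

        var total = Stopwatch.StartNew();
        var tasks = new Task<string>[checks.Length];
        for (var i = 0; i < checks.Length; i++)
        {
            var check = checks[i]; // local copy captured by the lambda
            tasks[i] = Task.Run(() => FakeCheckAsync(check.Name, check.DelayMs));
        }

        // All checks are already running; this waits for the slowest one.
        await Task.WhenAll(tasks).ConfigureAwait(false);
        total.Stop();

        var results = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        for (var i = 0; i < checks.Length; i++)
        {
            results[checks[i].Name] = tasks[i].Result; // safe: every task has completed
        }

        Console.WriteLine($"Ran {checks.Length} checks in {total.ElapsedMilliseconds} ms");
        return results;
    }
}
```

Because the simulated checks overlap, the total time should be close to the slowest check rather than the sum of all delays. The cost is that every check occupies resources at the same moment, which is the trade-off mentioned above.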
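CheckHealthAsync also calls a private method, RunCheckAsync, which is where each health check is actually executed. When RunCheckAsync finishes, it creates a HealthReportEntry and returns it to CheckHealthAsync, which assembles the entries into a HealthReport. At that point the abstract method has completed.
Below is the source of the RunCheckAsync method: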
private async Task<HealthReportEntry> RunCheckAsync(IServiceScope scope, HealthCheckRegistration registration, CancellationToken cancellationToken)
{
    cancellationToken.ThrowIfCancellationRequested();

    var healthCheck = registration.Factory(scope.ServiceProvider);

    using (_logger.BeginScope(new HealthCheckLogScope(registration.Name)))
    {
        var stopwatch = ValueStopwatch.StartNew();
        var context = new HealthCheckContext { Registration = registration };

        Log.HealthCheckBegin(_logger, registration);

        HealthReportEntry entry;
        CancellationTokenSource timeoutCancellationTokenSource = null;
        try
        {
            HealthCheckResult result;

            var checkCancellationToken = cancellationToken;
            if (registration.Timeout > TimeSpan.Zero)
            {
                timeoutCancellationTokenSource = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
                timeoutCancellationTokenSource.CancelAfter(registration.Timeout);
                checkCancellationToken = timeoutCancellationTokenSource.Token;
            }

            result = await healthCheck.CheckHealthAsync(context, checkCancellationToken).ConfigureAwait(false);

            var duration = stopwatch.GetElapsedTime();

            entry = new HealthReportEntry(
                status: result.Status,
                description: result.Description,
                duration: duration,
                exception: result.Exception,
                data: result.Data,
                tags: registration.Tags);

            Log.HealthCheckEnd(_logger, registration, entry, duration);
            Log.HealthCheckData(_logger, registration, entry);
        }
        catch (OperationCanceledException ex) when (!cancellationToken.IsCancellationRequested)
        {
            var duration = stopwatch.GetElapsedTime();
            entry = new HealthReportEntry(
                status: HealthStatus.Unhealthy,
                description: "A timeout occured while running check.",
                duration: duration,
                exception: ex,
                data: null);

            Log.HealthCheckError(_logger, registration, ex, duration);
        }

        // Allow cancellation to propagate if it's not a timeout.
        catch (Exception ex) when (ex as OperationCanceledException == null)
        {
            var duration = stopwatch.GetElapsedTime();
            entry = new HealthReportEntry(
                status: HealthStatus.Unhealthy,
                description: ex.Message,
                duration: duration,
                exception: ex,
                data: null);

            Log.HealthCheckError(_logger, registration, ex, duration);
        }

        finally
        {
            timeoutCancellationTokenSource?.Dispose();
        }

        return entry;
    }
}
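The timeout handling in RunCheckAsync relies on a linked CancellationTokenSource plus an exception filter that tells a timeout apart from a cancellation requested by the caller. The following stand-alone sketch isolates that pattern; TimeoutDemo, RunWithTimeoutAsync, and the returned strings are invented for illustration and stand in for the real HealthReportEntry handling.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper isolating the linked-token timeout pattern.
public static class TimeoutDemo
{
    public static async Task<string> RunWithTimeoutAsync(
        Func<CancellationToken, Task<string>> check,
        TimeSpan timeout,
        CancellationToken cancellationToken = default)
    {
        // The linked source is cancelled when the caller cancels OR when the timeout elapses.
        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        if (timeout > TimeSpan.Zero)
        {
            timeoutCts.CancelAfter(timeout);
        }

        try
        {
            return await check(timeoutCts.Token).ConfigureAwait(false);
        }
        // The caller's token is not cancelled, so the cancellation must come from the timeout.
        catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
        {
            return "Unhealthy: timed out";
        }
        // If the caller cancelled, the filter is false and the exception propagates.
    }
}
```

As a usage example, a check that awaits Task.Delay(TimeSpan.FromSeconds(5), ct) with a 50 ms timeout is reported as timed out, whereas the same check invoked with an already-cancelled caller token lets the OperationCanceledException escape.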
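Official Usage Scenarios
For more details, see: https://docs.microsoft.com/zh-cn/aspnet/core/host-and-deploy/health-checks?view=aspnetcore-3.1
- Database probe: for example, run select 1 from tableName and judge health from the database's response.
- Entity Framework Core DbContext probe: the DbContext check confirms that the app can communicate with the database configured for an EF Core DbContext.
- Separate readiness and liveness probes: in some hosting scenarios initialization is time-consuming, so the app may be running normally but not yet able to process requests and respond.
- Metric-based probe with a custom response writer: for example, checking whether memory usage exceeds a limit, whether CPU usage is too high, or whether the connection count has reached its ceiling.
- Filtering by port: restrict health check requests to a specific port; typically used in container environments, where the endpoint responds on the port configured when the container starts.
- Distributing a health check library: implement the check interface as a standalone class that receives its parameters through dependency injection, and write the check logic against those parameters.
- Health check publisher: if an IHealthCheckPublisher is added to DI, the health check system periodically runs the checks and calls PublishAsync with the result. This suits push-based health monitoring systems, as opposed to ones that poll the app.
- Restricting health checks with MapWhen: use MapWhen to conditionally branch the request pipeline for health check endpoints.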
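Two of the scenarios above, the metric-based probe and the separate readiness and liveness probes, can be combined in one short sketch. MemoryHealthCheck, its 1 GB threshold, the tag names, and the /health/live and /health/ready paths are all assumptions chosen for illustration; the sketch targets an ASP.NET Core 3.1 style Startup class.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical metric-based check: compares managed memory against a threshold.
public class MemoryHealthCheck : IHealthCheck
{
    private const long ThresholdBytes = 1024L * 1024L * 1024L; // assumed limit: 1 GB

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        long allocated = GC.GetTotalMemory(forceFullCollection: false);
        var data = new Dictionary<string, object> { ["AllocatedBytes"] = allocated };

        // On failure, report the status configured at registration time.
        var result = allocated < ThresholdBytes
            ? HealthCheckResult.Healthy("Memory usage is within the threshold.", data)
            : new HealthCheckResult(context.Registration.FailureStatus,
                "Memory usage exceeds the threshold.", null, data);

        return Task.FromResult(result);
    }
}

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddHealthChecks()
            // Liveness: the process is up and able to answer.
            .AddCheck("self", () => HealthCheckResult.Healthy(), tags: new[] { "live" })
            // Readiness: the app is in a state where it can serve traffic.
            .AddCheck<MemoryHealthCheck>("memory",
                failureStatus: HealthStatus.Degraded, tags: new[] { "ready" });
    }

    public void Configure(IApplicationBuilder app)
    {
        app.UseRouting();
        app.UseEndpoints(endpoints =>
        {
            // Each endpoint runs only the checks carrying the matching tag.
            endpoints.MapHealthChecks("/health/live",
                new HealthCheckOptions { Predicate = r => r.Tags.Contains("live") });
            endpoints.MapHealthChecks("/health/ready",
                new HealthCheckOptions { Predicate = r => r.Tags.Contains("ready") });
        });
    }
}
```

Splitting the endpoints by tag lets an orchestrator restart a container only when the liveness probe fails, while merely withholding traffic when the readiness probe fails.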