• A Kubernetes eviction adventure


    1. Overview

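      When Kubernetes eviction checks for DiskPressure on nodefs, what it measures is the filesystem that holds the kubelet's root-dir. The kubelet's default root-dir is /var/lib/kubelet, and it can be changed with the --root-dir flag. The relevant source: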

        kubernetes/cmd/kubelet/app/options/options.go

       

    const defaultRootDir = "/var/lib/kubelet"
    
    fs.StringVar(&f.RootDirectory, "root-dir", f.RootDirectory, "Directory path for managing kubelet files (volume mounts,etc).")
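To make the default-versus-override behaviour concrete, here is a minimal sketch that registers a flag of the same name on a standard-library `flag.FlagSet`. The kubelet itself uses pflag, so this is a stand-in and not the kubelet's code; `parseRootDir` and the path `/data/kubelet` are hypothetical names chosen for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// defaultRootDir mirrors the constant quoted from options.go.
const defaultRootDir = "/var/lib/kubelet"

// parseRootDir is a hypothetical helper: it registers a root-dir flag on a
// standard-library FlagSet (the kubelet itself uses pflag) and returns the
// effective value after parsing args.
func parseRootDir(args []string) (string, error) {
	rootDir := defaultRootDir
	fs := flag.NewFlagSet("kubelet-sketch", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the output on parse errors
	fs.StringVar(&rootDir, "root-dir", rootDir, "Directory path for managing kubelet files (volume mounts,etc).")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return rootDir, nil
}

func main() {
	def, _ := parseRootDir(nil)
	custom, _ := parseRootDir([]string{"--root-dir=/data/kubelet"})
	fmt.Println(def)    // default when the flag is absent
	fmt.Println(custom) // value supplied on the command line
}
```

If root-dir is moved to a separate disk, that disk is the one the nodefs eviction signal ends up watching, so it is worth knowing which value is actually in effect on each node.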

     kubernetes/pkg/kubelet/eviction/helpers.go

      

    // diskUsage converts used bytes into a resource quantity.
    func diskUsage(fsStats *statsapi.FsStats) *resource.Quantity {
        if fsStats == nil || fsStats.UsedBytes == nil {
            return &resource.Quantity{Format: resource.BinarySI}
        }
        usage := int64(*fsStats.UsedBytes)
        return resource.NewQuantity(usage, resource.BinarySI)
    }
    
    // rankDiskPressureFunc returns a rankFunc that measures the specified fs stats.
    func rankDiskPressureFunc(fsStatsToMeasure []fsStatsType, diskResource v1.ResourceName) rankFunc {
        return func(pods []*v1.Pod, stats statsFunc) {
            orderedBy(exceedDiskRequests(stats, fsStatsToMeasure, diskResource), priority, disk(stats, fsStatsToMeasure, diskResource)).Sort(pods)
        }
    }
    
    // Excerpt from makeSignalObservations: records the nodefs.available signal.
    if nodeFs := summary.Node.Fs; nodeFs != nil {
        if nodeFs.AvailableBytes != nil && nodeFs.CapacityBytes != nil {
            result[evictionapi.SignalNodeFsAvailable] = signalObservation{
                available: resource.NewQuantity(int64(*nodeFs.AvailableBytes), resource.BinarySI),
                capacity:  resource.NewQuantity(int64(*nodeFs.CapacityBytes), resource.BinarySI),
                time:      nodeFs.Time,
            }
        }
    }

    // NodeStats, from the kubelet stats API package (statsapi), is the type of summary.Node.
    type NodeStats struct {
        // Reference to the measured Node.
        NodeName string `json:"nodeName"`
        // Stats of system daemons tracked as raw containers.
        // The system containers are named according to the SystemContainer* constants.
        // +optional
        // +patchMergeKey=name
        // +patchStrategy=merge
        SystemContainers []ContainerStats `json:"systemContainers,omitempty" patchStrategy:"merge" patchMergeKey:"name"`
        // The time at which data collection for the node-scoped (i.e. aggregate) stats was (re)started.
        StartTime metav1.Time `json:"startTime"`
        // Stats pertaining to CPU resources.
        // +optional
        CPU *CPUStats `json:"cpu,omitempty"`
        // Stats pertaining to memory (RAM) resources.
        // +optional
        Memory *MemoryStats `json:"memory,omitempty"`
        // Stats pertaining to network resources.
        // +optional
        Network *NetworkStats `json:"network,omitempty"`
        // Stats pertaining to total usage of filesystem resources on the rootfs used by node k8s components.
        // NodeFs.Used is the total bytes used on the filesystem.
        // +optional
        Fs *FsStats `json:"fs,omitempty"`
        // Stats about the underlying container runtime.
        // +optional
        Runtime *RuntimeStats `json:"runtime,omitempty"`
        // Stats about the rlimit of system.
        // +optional
        Rlimit *RlimitStats `json:"rlimit,omitempty"`
    }
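The excerpts above boil down to reading two optional fields and comparing them. The following is a minimal sketch of that step, assuming a trimmed local type: `fsStats` and `nodeFsAvailablePercent` are hypothetical names, not part of the Kubernetes API, and the 8 GiB / 100 GiB figures are made up.

```go
package main

import "fmt"

// fsStats is a trimmed, hypothetical stand-in for statsapi.FsStats; only the
// two fields used by the nodefs.available observation are kept.
type fsStats struct {
	AvailableBytes *uint64
	CapacityBytes  *uint64
}

// nodeFsAvailablePercent returns available space as a percentage of capacity.
// The second result is false when the stats are missing, mirroring the nil
// checks in the quoted excerpt.
func nodeFsAvailablePercent(fs *fsStats) (float64, bool) {
	if fs == nil || fs.AvailableBytes == nil || fs.CapacityBytes == nil || *fs.CapacityBytes == 0 {
		return 0, false
	}
	return float64(*fs.AvailableBytes) / float64(*fs.CapacityBytes) * 100, true
}

func main() {
	// Hypothetical node: 8 GiB free on a 100 GiB filesystem.
	avail, capacity := uint64(8<<30), uint64(100<<30)
	pct, ok := nodeFsAvailablePercent(&fsStats{AvailableBytes: &avail, CapacityBytes: &capacity})
	fmt.Println(pct, ok)
}
```

A node in this state would sit below a 10% available threshold, which is the condition that triggers eviction under a setting such as nodefs.available: 10%.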

     

    2. The incident

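       This happened a few months ago. Someone changed the fluentd pattern. fluentd was deployed as a DaemonSet with a hostPath mount of /var/log, and its own log output went to syslog. As a result, every log line that did not match the pattern was written to /var/log/syslog, more than 7 GB in a single hour. Disk usage then went straight to 90%, and the eviction policy we had set in the kubelet was as follows: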

      

    evictionHard:
      imagefs.available: 15%
      memory.available: 100Mi
      nodefs.available: 10%
      nodefs.inodesFree: 5%

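    Once usage of the disk holding the kubelet's root-dir reaches 90% (that is, nodefs.available drops below 10%), the kubelet starts evicting pods. fluentd itself reported no error; logs that did not match the pattern were simply written to syslog. So when you use it, make sure the log output path and the log level are set properly.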

     

    3. Aftermath

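    We reached the conclusion above by reading the source code and restored service as an emergency measure. (The alert threshold for the system disk had not subtracted the eviction threshold set in the kubelet.) Afterwards we re-planned the monitoring thresholds and set attributes on the production nodes so that different workloads are deployed on different nodes.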
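One way to read "subtract the eviction threshold" is to derive the disk alert from the kubelet setting instead of choosing it independently. The sketch below shows that reading under stated assumptions: both helper names and the 15-point margin are hypothetical, not values from the incident.

```go
package main

import "fmt"

// evictionUsagePercent converts an eviction threshold expressed as
// "available percent" into the disk usage percent at which eviction starts.
func evictionUsagePercent(evictionAvailablePct int) int {
	return 100 - evictionAvailablePct
}

// alertUsagePercent returns a usage percent for the disk alert that fires
// marginPct percentage points before eviction would start.
func alertUsagePercent(evictionAvailablePct, marginPct int) int {
	return evictionUsagePercent(evictionAvailablePct) - marginPct
}

func main() {
	fmt.Println(evictionUsagePercent(10))  // nodefs.available: 10% -> eviction at 90% usage
	fmt.Println(alertUsagePercent(10, 15)) // hypothetical 15-point margin -> alert at 75% usage
}
```

Deriving the alert this way means a later change to evictionHard moves the alert with it, instead of leaving an alert that only fires once pods are already being evicted.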

  • Original article: https://www.cnblogs.com/cuishuai/p/10980224.html