K8s Core Resource Objects: Pod (Health Checks)

Based on Kubernetes 1.25

What is a health check

K8s supports three kinds of health checks (probes):

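  • livenessProbe: liveness probe
    • Indicates whether the container is running and whether it needs to be restarted
    • If the liveness probe fails, the kubelet kills the container, which is then restarted according to its restart policy
  • readinessProbe: readiness probe
    • Indicates whether the container is ready to accept requests
    • If the readiness probe fails, the Endpoints controller removes the Pod's IP from the Endpoints of every Service that matches the Pod
  • startupProbe: startup probe
    • When a startup probe is configured, the other probes are disabled until it succeeds
    • If the startup probe fails, the kubelet kills the container, which is then restarted according to its restart policy
    • Intended for containers that take a long time to start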
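
The rules in the list above can be summarized as a small decision table. The following is a minimal sketch, not kubelet code: the type `probeType` and the functions `onFailure` and `shouldRun` are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// probeType is a hypothetical enum for the three probe kinds.
type probeType string

const (
	liveness  probeType = "liveness"
	readiness probeType = "readiness"
	startup   probeType = "startup"
)

// onFailure describes the consequence of a failed probe of the given kind.
func onFailure(p probeType) string {
	switch p {
	case liveness, startup:
		return "kill the container; restart it according to restartPolicy"
	case readiness:
		return "remove the Pod IP from the Endpoints of matching Services"
	}
	return "unknown probe type"
}

// shouldRun reports whether a probe of kind p is active, given whether the
// container defines a startup probe and whether that probe has succeeded.
func shouldRun(p probeType, hasStartup, started bool) bool {
	if p == startup {
		// The startup probe only runs until it succeeds once.
		return hasStartup && !started
	}
	// Liveness and readiness are held back until the startup probe succeeds.
	return !hasStartup || started
}

func main() {
	for _, p := range []probeType{liveness, readiness, startup} {
		fmt.Printf("%-9s failure -> %s\n", p, onFailure(p))
	}
	fmt.Println("liveness active before startup succeeds:", shouldRun(liveness, true, false))
	fmt.Println("liveness active after startup succeeds: ", shouldRun(liveness, true, true))
}
```

The gating in `shouldRun` is why a startup probe suits slow-starting containers: the liveness probe cannot kill the container while it is still initializing.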

ProbeManager

ProbeManager manages Pod probes; in the source it is declared as an interface named Manager.

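  • When AddPod is called, a probe worker is created for every container in the Pod that defines a probe
  • Each worker periodically probes its assigned container and caches the result
  • When UpdatePodStatus is called, the Manager uses the cached results to set the corresponding state (such as Ready) in the PodStatus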
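
The worker-and-cache pattern described above can be sketched as follows. This is a simplified illustration, not the kubelet implementation: `resultCache`, `worker`, `probeOnce`, `run`, and `podReady` are hypothetical names, and the probe itself is an arbitrary function that returns true or false.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// resultCache stores the latest probe result per container ID.
type resultCache struct {
	mu      sync.RWMutex
	results map[string]bool
}

func newResultCache() *resultCache {
	return &resultCache{results: make(map[string]bool)}
}

func (c *resultCache) set(id string, ok bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.results[id] = ok
}

func (c *resultCache) get(id string) (ok, found bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	ok, found = c.results[id]
	return ok, found
}

// worker probes a single container and records the outcome in the cache.
type worker struct {
	containerID string
	probe       func() bool // hypothetical probe function
	cache       *resultCache
}

// probeOnce runs the probe a single time and caches the result.
func (w *worker) probeOnce() { w.cache.set(w.containerID, w.probe()) }

// run probes immediately and then once per period until stop is closed.
func (w *worker) run(period time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for {
		w.probeOnce()
		select {
		case <-stop:
			return
		case <-ticker.C:
		}
	}
}

// podReady is true only if every listed container has a cached success.
func podReady(cache *resultCache, containerIDs []string) bool {
	for _, id := range containerIDs {
		if ok, found := cache.get(id); !found || !ok {
			return false
		}
	}
	return true
}

func main() {
	cache := newResultCache()
	stop := make(chan struct{})
	var wg sync.WaitGroup
	workers := []*worker{
		{containerID: "app", probe: func() bool { return true }, cache: cache},
		{containerID: "sidecar", probe: func() bool { return false }, cache: cache},
	}
	for _, w := range workers {
		wg.Add(1)
		go func(w *worker) {
			defer wg.Done()
			w.run(10*time.Millisecond, stop)
		}(w)
	}
	time.Sleep(50 * time.Millisecond)
	close(stop)
	wg.Wait()
	fmt.Println("pod ready:", podReady(cache, []string{"app", "sidecar"}))
}
```

Separating the probing loop from the status update means that reading the Pod's readiness never blocks on a slow probe; it only consults the cache.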

The interface definition:


// Manager manages pod probing. It creates a probe "worker" for every container that specifies a
// probe (AddPod). The worker periodically probes its assigned container and caches the results. The
// manager use the cached probe results to set the appropriate Ready state in the PodStatus when
// requested (UpdatePodStatus). Updating probe parameters is not currently supported.
type Manager interface {
// AddPod creates new probe workers for every container probe. This should be called for every
// pod created.
AddPod(pod *v1.Pod)

// StopLivenessAndStartup handles stopping liveness and startup probes during termination.
StopLivenessAndStartup(pod *v1.Pod)

// RemovePod handles cleaning up the removed pod state, including terminating probe workers and
// deleting cached results.
RemovePod(pod *v1.Pod)

// CleanupPods handles cleaning up pods which should no longer be running.
// It takes a map of "desired pods" which should not be cleaned up.
CleanupPods(desiredPods map[types.UID]sets.Empty)

// UpdatePodStatus modifies the given PodStatus with the appropriate Ready state for each
// container based on container running status, cached probe results and worker states.
UpdatePodStatus(types.UID, *v1.PodStatus)
}

Probe data structure

// Probe describes a health check to be performed against a container to determine whether it is
// alive or ready to receive traffic.
type Probe struct {
// The action taken to determine the health of a container
// Probe mechanism; four are supported: exec (command line), HTTP, TCP, gRPC
ProbeHandler
// Length of time before health checking is activated. In seconds.
// +optional
// How long to wait after the container starts before the first probe
InitialDelaySeconds int32
// Length of time before health checking times out. In seconds.
// +optional
// Timeout for each individual probe
TimeoutSeconds int32
// How often (in seconds) to perform the probe.
// +optional
// Interval between probes
PeriodSeconds int32
// Minimum consecutive successes for the probe to be considered successful after having failed.
// Must be 1 for liveness and startup.
// +optional
// Minimum number of consecutive successes before the probe is considered successful
SuccessThreshold int32
// Minimum consecutive failures for the probe to be considered failed after having succeeded.
// +optional
// Minimum number of consecutive failures before the probe is considered failed
FailureThreshold int32
// Optional duration in seconds the pod needs to terminate gracefully upon probe failure.
// The grace period is the duration in seconds after the processes running in the pod are sent
// a termination signal and the time when the processes are forcibly halted with a kill signal.
// Set this value longer than the expected cleanup time for your process.
// If this value is nil, the pod's terminationGracePeriodSeconds will be used. Otherwise, this
// value overrides the value provided by the pod spec.
// Value must be non-negative integer. The value zero indicates stop immediately via
// the kill signal (no opportunity to shut down).
// This is a beta field and requires enabling ProbeTerminationGracePeriod feature gate.
// +optional
// Optional grace period for shutting the Pod down gracefully after a probe failure
TerminationGracePeriodSeconds *int64
}
...
// ProbeHandler defines a specific action that should be taken in a probe.
// One and only one of the fields must be specified.
type ProbeHandler struct {
// Exec specifies the action to take.
// +optional
Exec *ExecAction
// HTTPGet specifies the http request to perform.
// +optional
HTTPGet *HTTPGetAction
// TCPSocket specifies an action involving a TCP port.
// +optional
TCPSocket *TCPSocketAction

// GRPC specifies an action involving a GRPC port.
// This is a beta field and requires enabling GRPCContainerProbe feature gate.
// +featureGate=GRPCContainerProbe
// +optional
GRPC *GRPCAction
}
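
How SuccessThreshold and FailureThreshold turn individual probe outcomes into a settled result can be illustrated with a small state machine. This is a sketch under assumptions: `probeState`, `newProbeState`, and `observe` are hypothetical names, and the logic only mirrors the "minimum consecutive successes/failures" wording of the field comments above.

```go
package main

import "fmt"

// probeState tracks runs of identical raw probe outcomes (hypothetical type).
type probeState struct {
	successThreshold int
	failureThreshold int
	healthy          bool // last settled result
	lastOK           bool // outcome of the previous raw probe
	run              int  // length of the current run of identical outcomes
}

func newProbeState(successThreshold, failureThreshold int, initial bool) *probeState {
	return &probeState{
		successThreshold: successThreshold,
		failureThreshold: failureThreshold,
		healthy:          initial,
	}
}

// observe records one raw probe outcome and returns the settled result.
// The settled result only flips once the current run reaches the threshold.
func (s *probeState) observe(ok bool) bool {
	if ok == s.lastOK {
		s.run++
	} else {
		s.lastOK = ok
		s.run = 1
	}
	if ok && s.run >= s.successThreshold {
		s.healthy = true
	}
	if !ok && s.run >= s.failureThreshold {
		s.healthy = false
	}
	return s.healthy
}

func main() {
	// SuccessThreshold=1, FailureThreshold=3, starting from a healthy state.
	s := newProbeState(1, 3, true)
	for i, ok := range []bool{false, false, false, true} {
		fmt.Printf("probe %d raw=%v settled=%v\n", i+1, ok, s.observe(ok))
	}
}
```

With these values a single failed probe does not change the settled result; only the third consecutive failure does, which gives a rough detection delay of FailureThreshold × PeriodSeconds.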

Four probe mechanisms

Exec (command line)

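The kubelet runs a command inside the container to perform the probe:

  • Exit code 0: the probe succeeds (for a liveness probe, the container is considered alive)

  • Non-zero exit code: the probe fails (for a liveness probe, the container is killed and restarted)

  • The Command field is the command line executed inside the container

    • The working directory is the root (/) of the container's filesystem
    • The command is exec'd directly, not run inside a shell; to use shell features, invoke the shell explicitly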
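
The exit-code rule and the "no shell" behaviour can be tried locally with Go's os/exec package. This is only an analogy: the sketch runs commands on the local machine rather than inside a container, `execProbe` is a hypothetical helper, and it assumes a POSIX `sh` is available on the PATH.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// execProbe runs command directly (no shell) and reports success when it
// exits with status 0 before the timeout expires.
func execProbe(timeoutSeconds int, command ...string) bool {
	if len(command) == 0 {
		return false
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutSeconds)*time.Second)
	defer cancel()
	// Run returns nil only when the command started and exited with status 0.
	return exec.CommandContext(ctx, command[0], command[1:]...).Run() == nil
}

func main() {
	// The whole string is taken as one executable name, so this fails:
	// without a shell, nothing splits or interprets the command line.
	fmt.Println(execProbe(1, "test -d /"))
	// Invoking the shell explicitly lets it parse the command line.
	fmt.Println(execProbe(1, "sh", "-c", "test -d / && exit 0"))
	// A non-zero exit status counts as a failed probe.
	fmt.Println(execProbe(1, "sh", "-c", "exit 1"))
}
```

In a Pod spec the same idea applies to the command array: each element is one argument, so pipes, `&&`, or variable expansion only work when the first elements are something like `sh` and `-c`.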

HTTP

The probe sends an HTTP GET request:

  • A status code >= 200 and < 400 counts as success
  • Any other status code counts as failure

Relevant fields:

  • Ref:https://github.com/kubernetes/kubernetes/blob/88e994f6bf8fc88114c5b733e09afea339bea66d/pkg/apis/core/types.go#L2023

    // HTTPGetAction describes an action based on HTTP Get requests.
    type HTTPGetAction struct {
    // Optional: Path to access on the HTTP server.
    // +optional
    // Path to request on the HTTP server
    Path string
    // Required: Name or number of the port to access on the container.
    // +optional
    // Name or number of the container port to access
    Port intstr.IntOrString
    // Optional: Host name to connect to, defaults to the pod IP. You
    // probably want to set "Host" in httpHeaders instead.
    // +optional
    // Host name to connect to; defaults to the Pod IP
    Host string
    // Optional: Scheme to use for connecting to the host, defaults to HTTP.
    // +optional
    // Scheme; defaults to HTTP
    Scheme URIScheme
    // Optional: Custom headers to set in the request. HTTP allows repeated headers.
    // +optional
    // Custom request headers
    HTTPHeaders []HTTPHeader
    }
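
The status-code rule for HTTP probes (success for codes from 200 up to, but not including, 400) can be sketched with the standard library. The helpers `isSuccessStatus` and `httpProbe` are hypothetical names; the sketch also disables redirect following so that 3xx codes are seen as returned, which is a simplification and not a description of how the kubelet handles redirects.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// isSuccessStatus applies the HTTP probe rule: 200 <= code < 400.
func isSuccessStatus(code int) bool {
	return code >= http.StatusOK && code < http.StatusBadRequest
}

// httpProbe sends one GET request and evaluates the status code.
func httpProbe(url string, timeout time.Duration) bool {
	client := &http.Client{
		Timeout: timeout,
		// Return 3xx responses as-is instead of following them (simplification).
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	resp, err := client.Get(url)
	if err != nil {
		return false // connection errors and timeouts count as failure
	}
	defer resp.Body.Close()
	return isSuccessStatus(resp.StatusCode)
}

func main() {
	// Two throwaway local servers stand in for a container's health endpoint.
	healthy := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer healthy.Close()
	broken := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusInternalServerError)
	}))
	defer broken.Close()

	fmt.Println("healthy endpoint:", httpProbe(healthy.URL, time.Second))
	fmt.Println("broken endpoint: ", httpProbe(broken.URL, time.Second))
}
```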

TCP

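The kubelet tries to open a TCP connection to the container on the specified port; the probe succeeds if the connection can be established.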

// TCPSocketAction describes an action based on opening a socket
type TCPSocketAction struct {
// Required: Port to connect to.
// +optional
// Name or number of the container port to access
Port intstr.IntOrString
// Optional: Host name to connect to, defaults to the pod IP.
// +optional
// Host name to connect to; defaults to the Pod IP
Host string
}
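
A TCP check of this kind amounts to "can a connection be opened within the timeout". The following sketch shows the idea with net.DialTimeout; `tcpProbe` and `startListener` are hypothetical helpers, and the listener on a loopback port merely stands in for a container port.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// tcpProbe succeeds if a TCP connection to addr can be opened in time.
func tcpProbe(addr string, timeoutSeconds int) bool {
	conn, err := net.DialTimeout("tcp", addr, time.Duration(timeoutSeconds)*time.Second)
	if err != nil {
		return false
	}
	// No data is exchanged; an established connection is enough.
	conn.Close()
	return true
}

// startListener opens a listener on a free loopback port for the demo and
// returns its address together with a function that closes it.
func startListener() (addr string, stop func()) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	return ln.Addr().String(), func() { ln.Close() }
}

func main() {
	addr, stop := startListener()
	fmt.Println("port open:  ", tcpProbe(addr, 1))
	stop()
	fmt.Println("port closed:", tcpProbe(addr, 1))
}
```

Because only the connection attempt is checked, a TCP probe confirms that something is listening on the port, not that the application behind it is answering requests correctly.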

gRPC

gRPC probes are still in beta as of 1.25 and require the GRPCContainerProbe feature gate to be enabled.


type GRPCAction struct {
// Port number of the gRPC service.
// Note: Number must be in the range 1 to 65535.
// Port number of the container to access (numeric only, since the field is an int32)
Port int32

// Service is the name of the service to place in the gRPC HealthCheckRequest
// (see https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
//
// If this is not specified, the default behavior is to probe the server's overall health status.
// +optional
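// Service name placed in the gRPC HealthCheckRequest; Port must always be configured
// If the health status is not served under the default service, the Service field must be set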
Service *string
}