K8s Workloads: Job

Based on Kubernetes 1.25

What Is a Job

A Job creates one or more Pods and keeps retrying their execution until a specified number of Pods terminate successfully.

  • As Pods terminate successfully, the Job tracks the number of successful Pods; once that count reaches the target, the Job is complete
  • Deleting a Job deletes all of its Pods

JobSpec

// JobSpec describes how the job execution will look like.
type JobSpec struct {

// Specifies the maximum desired number of pods the job should
// run at any given time. The actual number of pods running in steady state will
// be less than this number when ((.spec.completions - .status.successful) < .spec.parallelism),
// i.e. when the work left to do is less than max parallelism.
// More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
// +optional
// Parallelism: the maximum desired number of Pods the Job runs at any given time
Parallelism *int32 `json:"parallelism,omitempty" protobuf:"varint,1,opt,name=parallelism"`

// Specifies the desired number of successfully finished pods the
// job should be run with. Setting to nil means that the success of any
// pod signals the success of all pods, and allows parallelism to have any positive
// value. Setting to 1 means that parallelism is limited to 1 and the success of that
// pod signals the success of the job.
// More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
// +optional
// Completions: the desired number of successfully finished Pods for the Job
Completions *int32 `json:"completions,omitempty" protobuf:"varint,2,opt,name=completions"`

// Specifies the duration in seconds relative to the startTime that the job
// may be continuously active before the system tries to terminate it; value
// must be positive integer. If a Job is suspended (at creation or through an
// update), this timer will effectively be stopped and reset when the Job is
// resumed again.
// +optional
// The maximum time (relative to startTime) the Job may stay active; must be a positive integer; suspending the Job pauses and resets the timer
ActiveDeadlineSeconds *int64 `json:"activeDeadlineSeconds,omitempty" protobuf:"varint,3,opt,name=activeDeadlineSeconds"`

// Specifies the policy of handling failed pods. In particular, it allows to
// specify the set of actions and conditions which need to be
// satisfied to take the associated action.
// If empty, the default behaviour applies - the counter of failed pods,
// represented by the jobs's .status.failed field, is incremented and it is
// checked against the backoffLimit. This field cannot be used in combination
// with restartPolicy=OnFailure.
//
// This field is alpha-level. To use this field, you must enable the
// `JobPodFailurePolicy` feature gate (disabled by default).
// +optional
// Policy for handling failed Pods
// If empty, the default behavior applies: the failed-Pod counter (the Job's .status.failed field) is incremented and checked against backoffLimit
PodFailurePolicy *PodFailurePolicy `json:"podFailurePolicy,omitempty" protobuf:"bytes,11,opt,name=podFailurePolicy"`

// Specifies the number of retries before marking this job failed.
// Defaults to 6
// +optional
// Number of retries before the Job is marked failed; defaults to 6
BackoffLimit *int32 `json:"backoffLimit,omitempty" protobuf:"varint,7,opt,name=backoffLimit"`

// TODO enabled it when https://github.com/kubernetes/kubernetes/issues/28486 has been fixed
// Optional number of failed pods to retain.
// +optional
// FailedPodsLimit *int32 `json:"failedPodsLimit,omitempty" protobuf:"varint,9,opt,name=failedPodsLimit"`

// A label query over pods that should match the pod count.
// Normally, the system sets this field for you.
// More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
// +optional
// Label selector matching the Job's Pods
Selector *metav1.LabelSelector `json:"selector,omitempty" protobuf:"bytes,4,opt,name=selector"`

// manualSelector controls generation of pod labels and pod selectors.
// Leave `manualSelector` unset unless you are certain what you are doing.
// When false or unset, the system pick labels unique to this job
// and appends those labels to the pod template. When true,
// the user is responsible for picking unique labels and specifying
// the selector. Failure to pick a unique label may cause this
// and other jobs to not function correctly. However, You may see
// `manualSelector=true` in jobs that were created with the old `extensions/v1beta1`
// API.
// More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#specifying-your-own-pod-selector
// +optional
// Controls whether the user-supplied custom selector is enabled
ManualSelector *bool `json:"manualSelector,omitempty" protobuf:"varint,5,opt,name=manualSelector"`

// Describes the pod that will be created when executing a job.
// More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
// Template for the Pods the Job creates
Template corev1.PodTemplateSpec `json:"template" protobuf:"bytes,6,opt,name=template"`

// ttlSecondsAfterFinished limits the lifetime of a Job that has finished
// execution (either Complete or Failed). If this field is set,
// ttlSecondsAfterFinished after the Job finishes, it is eligible to be
// automatically deleted. When the Job is being deleted, its lifecycle
// guarantees (e.g. finalizers) will be honored. If this field is unset,
// the Job won't be automatically deleted. If this field is set to zero,
// the Job becomes eligible to be deleted immediately after it finishes.
// +optional
// Limits the lifetime of a Job that has finished execution (Complete or Failed)
// After this duration the finished Job (and its Pods) is automatically deleted
// If unset, the Job is not deleted automatically
TTLSecondsAfterFinished *int32 `json:"ttlSecondsAfterFinished,omitempty" protobuf:"varint,8,opt,name=ttlSecondsAfterFinished"`

// CompletionMode specifies how Pod completions are tracked. It can be
// `NonIndexed` (default) or `Indexed`.
//
// `NonIndexed` means that the Job is considered complete when there have
// been .spec.completions successfully completed Pods. Each Pod completion is
// homologous to each other.
//
// `Indexed` means that the Pods of a
// Job get an associated completion index from 0 to (.spec.completions - 1),
// available in the annotation batch.kubernetes.io/job-completion-index.
// The Job is considered complete when there is one successfully completed Pod
// for each index.
// When value is `Indexed`, .spec.completions must be specified and
// `.spec.parallelism` must be less than or equal to 10^5.
// In addition, The Pod name takes the form
// `$(job-name)-$(index)-$(random-string)`,
// the Pod hostname takes the form `$(job-name)-$(index)`.
//
// More completion modes can be added in the future.
// If the Job controller observes a mode that it doesn't recognize, which
// is possible during upgrades due to version skew, the controller
// skips updates for the Job.
// +optional
// How Pod completions are tracked: NonIndexed or Indexed
// NonIndexed (default): the Job is complete once .spec.completions Pods have succeeded
// Indexed: every index from 0 to .spec.completions-1 must have a successfully completed Pod before the Job is complete
CompletionMode *CompletionMode `json:"completionMode,omitempty" protobuf:"bytes,9,opt,name=completionMode,casttype=CompletionMode"`

// Suspend specifies whether the Job controller should create Pods or not. If
// a Job is created with suspend set to true, no Pods are created by the Job
// controller. If a Job is suspended after creation (i.e. the flag goes from
// false to true), the Job controller will delete all active Pods associated
// with this Job. Users must design their workload to gracefully handle this.
// Suspending a Job will reset the StartTime field of the Job, effectively
// resetting the ActiveDeadlineSeconds timer too. Defaults to false.
//
// +optional
// Suspends the Job: suspending deletes all running Pods, resets the Job's StartTime, and effectively resets the ActiveDeadlineSeconds timer
Suspend *bool `json:"suspend,omitempty" protobuf:"varint,10,opt,name=suspend"`
}

Job Concurrency

Jobs can be run in three patterns: non-parallel, fixed completion count, and work queue

Non-parallel

  • Only one Pod runs at a time; a new Pod is started only if the previous one fails
  • Once a Pod terminates successfully, the Job is complete
  • No need to set .spec.completions or .spec.parallelism; both default to 1

Fixed completion count

  • Set .spec.completions to a positive number
  • The Job is complete once the number of successful Pods reaches .spec.completions
  • Optionally set .spec.completionMode=Indexed, in which case Pod names carry an index starting from 0
    • Each Pod automatically gets the batch.kubernetes.io/job-completion-index annotation and the JOB_COMPLETION_INDEX environment variable
  • With .spec.completions set, .spec.parallelism can be used to control concurrency
    • With completions=10 and parallelism=3, the Job keeps up to 3 Pods running concurrently until 10 Pods have succeeded
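The parallelism cap described above can be sketched as follows; per the Parallelism doc comment, the number of Pods running in steady state is min(parallelism, completions - succeeded). This is an illustration of the rule, not the controller's actual code.

```go
package main

import "fmt"

// steadyStateActive returns how many Pods run in steady state:
// never more than parallelism, and never more than the work left to do.
func steadyStateActive(parallelism, completions, succeeded int32) int32 {
	remaining := completions - succeeded
	if remaining < parallelism {
		return remaining
	}
	return parallelism
}

func main() {
	// completions=10, parallelism=3, 8 Pods already succeeded:
	// only 2 units of work remain, so only 2 Pods run.
	fmt.Println(steadyStateActive(3, 10, 8)) // 2
	// With no successes yet, the full parallelism of 3 is used.
	fmt.Println(steadyStateActive(3, 10, 0)) // 3
}
```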

Work queue

  • Leave .spec.completions unset and set .spec.parallelism to a non-negative number
  • Pods coordinate through an external work queue (e.g. a message queue); each Pod works independently and can tell whether the overall task is done. Once any Pod exits successfully, the Job is complete and no new Pods are created
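The work-queue pattern can be simulated with goroutines standing in for parallel Pods: a shared channel plays the role of the external message queue, each "Pod" pulls items until the queue is drained, then exits successfully. This is only an illustration of the coordination pattern, not the Job controller's mechanics.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkQueue drains items with `parallelism` workers and returns the
// total number of items processed across all workers.
func runWorkQueue(items []string, parallelism int) int {
	queue := make(chan string, len(items))
	for _, it := range items {
		queue <- it
	}
	close(queue) // no more work will be added

	var wg sync.WaitGroup
	processed := make([]int, parallelism) // per-worker counters, no shared writes
	for w := 0; w < parallelism; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for range queue { // each worker pulls until the queue is empty
				processed[id]++
			}
		}(w)
	}
	wg.Wait()

	total := 0
	for _, n := range processed {
		total += n
	}
	return total
}

func main() {
	// 5 work items, 3 parallel "Pods": every item is handled exactly once.
	fmt.Println(runWorkQueue([]string{"a", "b", "c", "d", "e"}, 3)) // 5
}
```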

Indexed Job

Enabled by setting .spec.completionMode=Indexed; each Pod then gets its own completion index
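The naming scheme from the CompletionMode doc comment (Pod name `$(job-name)-$(index)-$(random-string)`, hostname `$(job-name)-$(index)`) can be sketched like this; the random suffix here is a made-up stand-in for the string the controller appends.

```go
package main

import "fmt"

// indexedPodName builds the Pod name for an Indexed Job:
// $(job-name)-$(index)-$(random-string).
func indexedPodName(jobName string, index int, randomSuffix string) string {
	return fmt.Sprintf("%s-%d-%s", jobName, index, randomSuffix)
}

// indexedPodHostname builds the Pod hostname: $(job-name)-$(index).
func indexedPodHostname(jobName string, index int) string {
	return fmt.Sprintf("%s-%d", jobName, index)
}

func main() {
	// "xk9q2" is a hypothetical random suffix for illustration only.
	fmt.Println(indexedPodName("sample-job", 0, "xk9q2")) // sample-job-0-xk9q2
	fmt.Println(indexedPodHostname("sample-job", 0))      // sample-job-0
}
```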

Custom Pod failure policy

PodFailurePolicy is an alpha feature added in 1.25 (behind the JobPodFailurePolicy feature gate) that describes how failed Pods count against the backoffLimit

  • Background: Jobs that run many Pods across multiple nodes need to retry Pods to cope with infrastructure failures; with only backoffLimit > 0, every failure consumes a retry and all Pods may end up restarted, wasting resources
  • The policy allows Pod failures caused by infrastructure problems to be retried without incrementing the backoffLimit counter

The policy is expressed through the following types:

// PodFailurePolicyRule describes how a pod failure is handled when the requirements are met.
// One of OnExitCodes and onPodConditions, but not both, can be used in each rule.
type PodFailurePolicyRule struct {
// Specifies the action taken on a pod failure when the requirements are satisfied.
// Possible values are:
// - FailJob: indicates that the pod's job is marked as Failed and all
// running pods are terminated.
// - Ignore: indicates that the counter towards the .backoffLimit is not
// incremented and a replacement pod is created.
// - Count: indicates that the pod is handled in the default way - the
// counter towards the .backoffLimit is incremented.
// Additional values are considered to be added in the future. Clients should
// react to an unknown action by skipping the rule.
// The action taken on a failed Pod when the requirements are satisfied
// One of three values:
// FailJob: the Job is marked Failed and all running Pods are terminated
// Ignore: the backoffLimit counter is not incremented and a replacement Pod is created
// Count: the Pod is handled the default way and the backoffLimit counter is incremented
Action PodFailurePolicyAction `json:"action" protobuf:"bytes,1,req,name=action"`

// Represents the requirement on the container exit codes.
// +optional
// Requirement on the container exit codes
OnExitCodes *PodFailurePolicyOnExitCodesRequirement `json:"onExitCodes" protobuf:"bytes,2,opt,name=onExitCodes"`

// Represents the requirement on the pod conditions. The requirement is represented
// as a list of pod condition patterns. The requirement is satisfied if at
// least one pattern matches an actual pod condition. At most 20 elements are allowed.
// +listType=atomic
// +optional
// Patterns matched against the Pod's conditions; at most 20 elements allowed
OnPodConditions []PodFailurePolicyOnPodConditionsPattern `json:"onPodConditions" protobuf:"bytes,3,opt,name=onPodConditions"`
}

// PodFailurePolicy describes how failed pods influence the backoffLimit.
type PodFailurePolicy struct {
// A list of pod failure policy rules. The rules are evaluated in order.
// Once a rule matches a Pod failure, the remaining of the rules are ignored.
// When no rule matches the Pod failure, the default handling applies - the
// counter of pod failures is incremented and it is checked against
// the backoffLimit. At most 20 elements are allowed.
// +listType=atomic
Rules []PodFailurePolicyRule `json:"rules" protobuf:"bytes,1,opt,name=rules"`
}
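The first-match semantics described in the PodFailurePolicy doc comment (rules evaluated in order, first match wins, unmatched failures fall back to the default counting) can be sketched with simplified stand-ins for the real API types:

```go
package main

import "fmt"

// rule is a simplified stand-in for PodFailurePolicyRule, matching on
// container exit codes only.
type rule struct {
	action    string  // "FailJob", "Ignore", or "Count"
	exitCodes []int32 // exit codes this rule matches
}

// actionForFailure evaluates rules in order; the first rule matching the
// exit code decides the action, and an unmatched failure gets the default
// "Count" handling (increment the backoffLimit counter).
func actionForFailure(rules []rule, exitCode int32) string {
	for _, r := range rules {
		for _, c := range r.exitCodes {
			if c == exitCode {
				return r.action // first matching rule wins; the rest are ignored
			}
		}
	}
	return "Count"
}

func main() {
	// Hypothetical policy: exit code 42 means the app gave up (fail the Job);
	// 137/143 suggest an infrastructure kill (retry without counting).
	rules := []rule{
		{action: "FailJob", exitCodes: []int32{42}},
		{action: "Ignore", exitCodes: []int32{137, 143}},
	}
	fmt.Println(actionForFailure(rules, 137)) // Ignore
	fmt.Println(actionForFailure(rules, 1))   // Count
}
```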

TTL mechanism for finished Jobs

By default Jobs use the OrphanDependents deletion policy, so Kubernetes keeps their Pods around

Setting the Job's TTLSecondsAfterFinished field lets the TTL controller automatically clean up finished Jobs

  • Deleting the Job object cascades to its dependent objects
  • If set to 0, the Job is eligible for deletion immediately after it finishes
  • If unset, the Job is never deleted automatically