K8s Workloads: Job

Based on Kubernetes 1.25

What Is a Job

A Job creates one or more Pods and keeps retrying their execution until a specified number of Pods terminate successfully.

  • As Pods terminate successfully, the Job tracks the number of successful Pods; once that count reaches the target, the Job is complete
  • Deleting a Job deletes all of its Pods

JobSpec

// JobSpec describes how the job execution will look like.
type JobSpec struct {

// Specifies the maximum desired number of pods the job should
// run at any given time. The actual number of pods running in steady state will
// be less than this number when ((.spec.completions - .status.successful) < .spec.parallelism),
// i.e. when the work left to do is less than max parallelism.
// More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
// +optional
// Parallelism: the maximum desired number of Pods the Job runs at any given time
Parallelism *int32 `json:"parallelism,omitempty" protobuf:"varint,1,opt,name=parallelism"`

// Specifies the desired number of successfully finished pods the
// job should be run with. Setting to nil means that the success of any
// pod signals the success of all pods, and allows parallelism to have any positive
// value. Setting to 1 means that parallelism is limited to 1 and the success of that
// pod signals the success of the job.
// More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
// +optional
// Completions: the desired number of successfully finished Pods for the Job
Completions *int32 `json:"completions,omitempty" protobuf:"varint,2,opt,name=completions"`

// Specifies the duration in seconds relative to the startTime that the job
// may be continuously active before the system tries to terminate it; value
// must be positive integer. If a Job is suspended (at creation or through an
// update), this timer will effectively be stopped and reset when the Job is
// resumed again.
// +optional
// The maximum time (relative to startTime) the Job may stay active; must be a positive integer; suspending the Job pauses and resets the timer
ActiveDeadlineSeconds *int64 `json:"activeDeadlineSeconds,omitempty" protobuf:"varint,3,opt,name=activeDeadlineSeconds"`

// Specifies the policy of handling failed pods. In particular, it allows to
// specify the set of actions and conditions which need to be
// satisfied to take the associated action.
// If empty, the default behaviour applies - the counter of failed pods,
// represented by the jobs's .status.failed field, is incremented and it is
// checked against the backoffLimit. This field cannot be used in combination
// with restartPolicy=OnFailure.
//
// This field is alpha-level. To use this field, you must enable the
// `JobPodFailurePolicy` feature gate (disabled by default).
// +optional
// Policy for handling failed Pods
// If empty, the default behavior applies: the failed-Pod counter (the Job's .status.failed field) is incremented and checked against backoffLimit
PodFailurePolicy *PodFailurePolicy `json:"podFailurePolicy,omitempty" protobuf:"bytes,11,opt,name=podFailurePolicy"`

// Specifies the number of retries before marking this job failed.
// Defaults to 6
// +optional
// Number of retries before the Job is marked failed; defaults to 6
BackoffLimit *int32 `json:"backoffLimit,omitempty" protobuf:"varint,7,opt,name=backoffLimit"`

// TODO enabled it when https://github.com/kubernetes/kubernetes/issues/28486 has been fixed
// Optional number of failed pods to retain.
// +optional
// FailedPodsLimit *int32 `json:"failedPodsLimit,omitempty" protobuf:"varint,9,opt,name=failedPodsLimit"`

// A label query over pods that should match the pod count.
// Normally, the system sets this field for you.
// More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
// +optional
// Label selector matching the Job's Pods
Selector *metav1.LabelSelector `json:"selector,omitempty" protobuf:"bytes,4,opt,name=selector"`

// manualSelector controls generation of pod labels and pod selectors.
// Leave `manualSelector` unset unless you are certain what you are doing.
// When false or unset, the system pick labels unique to this job
// and appends those labels to the pod template. When true,
// the user is responsible for picking unique labels and specifying
// the selector. Failure to pick a unique label may cause this
// and other jobs to not function correctly. However, You may see
// `manualSelector=true` in jobs that were created with the old `extensions/v1beta1`
// API.
// More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#specifying-your-own-pod-selector
// +optional
// Controls whether the user-supplied custom selector is enabled
ManualSelector *bool `json:"manualSelector,omitempty" protobuf:"varint,5,opt,name=manualSelector"`

// Describes the pod that will be created when executing a job.
// More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
// Template for the Pods the Job creates
Template corev1.PodTemplateSpec `json:"template" protobuf:"bytes,6,opt,name=template"`

// ttlSecondsAfterFinished limits the lifetime of a Job that has finished
// execution (either Complete or Failed). If this field is set,
// ttlSecondsAfterFinished after the Job finishes, it is eligible to be
// automatically deleted. When the Job is being deleted, its lifecycle
// guarantees (e.g. finalizers) will be honored. If this field is unset,
// the Job won't be automatically deleted. If this field is set to zero,
// the Job becomes eligible to be deleted immediately after it finishes.
// +optional
// Limits the lifetime of a Job that has finished execution (Complete or Failed)
// After this duration the finished Job (and its Pods) is automatically deleted
// If unset, the Job is not deleted automatically
TTLSecondsAfterFinished *int32 `json:"ttlSecondsAfterFinished,omitempty" protobuf:"varint,8,opt,name=ttlSecondsAfterFinished"`

// CompletionMode specifies how Pod completions are tracked. It can be
// `NonIndexed` (default) or `Indexed`.
//
// `NonIndexed` means that the Job is considered complete when there have
// been .spec.completions successfully completed Pods. Each Pod completion is
// homologous to each other.
//
// `Indexed` means that the Pods of a
// Job get an associated completion index from 0 to (.spec.completions - 1),
// available in the annotation batch.kubernetes.io/job-completion-index.
// The Job is considered complete when there is one successfully completed Pod
// for each index.
// When value is `Indexed`, .spec.completions must be specified and
// `.spec.parallelism` must be less than or equal to 10^5.
// In addition, The Pod name takes the form
// `$(job-name)-$(index)-$(random-string)`,
// the Pod hostname takes the form `$(job-name)-$(index)`.
//
// More completion modes can be added in the future.
// If the Job controller observes a mode that it doesn't recognize, which
// is possible during upgrades due to version skew, the controller
// skips updates for the Job.
// +optional
// How Pod completions are tracked: NonIndexed or Indexed
// NonIndexed (default): the Job is complete once .spec.completions Pods have succeeded
// Indexed: every index from 0 to .spec.completions-1 must have a successfully completed Pod before the Job is complete
CompletionMode *CompletionMode `json:"completionMode,omitempty" protobuf:"bytes,9,opt,name=completionMode,casttype=CompletionMode"`

// Suspend specifies whether the Job controller should create Pods or not. If
// a Job is created with suspend set to true, no Pods are created by the Job
// controller. If a Job is suspended after creation (i.e. the flag goes from
// false to true), the Job controller will delete all active Pods associated
// with this Job. Users must design their workload to gracefully handle this.
// Suspending a Job will reset the StartTime field of the Job, effectively
// resetting the ActiveDeadlineSeconds timer too. Defaults to false.
//
// +optional
// Suspends the Job: suspending deletes all running Pods, resets the Job's StartTime, and effectively resets the ActiveDeadlineSeconds timer
Suspend *bool `json:"suspend,omitempty" protobuf:"varint,10,opt,name=suspend"`
}

Job Concurrency

Jobs can be run in three patterns: non-parallel, fixed completion count, and work queue

Non-parallel

  • Only one Pod runs at a time; a new Pod is started only if the previous one fails
  • Once a Pod terminates successfully, the Job is complete
  • No need to set .spec.completions or .spec.parallelism; both default to 1

Fixed completion count

  • Set .spec.completions to a positive number
  • The Job is complete once the number of successful Pods reaches .spec.completions
  • Optionally set .spec.completionMode=Indexed, in which case Pod names carry an index starting from 0
    • Each Pod automatically gets the batch.kubernetes.io/job-completion-index annotation and the JOB_COMPLETION_INDEX environment variable
  • With .spec.completions set, .spec.parallelism can be used to control concurrency
    • With completions=10 and parallelism=3, the Job keeps up to 3 Pods running concurrently until 10 Pods have succeeded
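The parallelism cap described above can be sketched as follows; per the Parallelism doc comment, the number of Pods running in steady state is min(parallelism, completions - succeeded). This is an illustration of the rule, not the controller's actual code.

```go
package main

import "fmt"

// steadyStateActive returns how many Pods run in steady state:
// never more than parallelism, and never more than the work left to do.
func steadyStateActive(parallelism, completions, succeeded int32) int32 {
	remaining := completions - succeeded
	if remaining < parallelism {
		return remaining
	}
	return parallelism
}

func main() {
	// completions=10, parallelism=3, 8 Pods already succeeded:
	// only 2 units of work remain, so only 2 Pods run.
	fmt.Println(steadyStateActive(3, 10, 8)) // 2
	// With no successes yet, the full parallelism of 3 is used.
	fmt.Println(steadyStateActive(3, 10, 0)) // 3
}
```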

Work queue

  • Leave .spec.completions unset and set .spec.parallelism to a non-negative number
  • Pods coordinate through an external work queue (e.g. a message queue); each Pod works independently and can tell whether the overall task is done. Once any Pod exits successfully, the Job is complete and no new Pods are created
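The work-queue pattern can be simulated with goroutines standing in for parallel Pods: a shared channel plays the role of the external message queue, each "Pod" pulls items until the queue is drained, then exits successfully. This is only an illustration of the coordination pattern, not the Job controller's mechanics.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkQueue drains items with `parallelism` workers and returns the
// total number of items processed across all workers.
func runWorkQueue(items []string, parallelism int) int {
	queue := make(chan string, len(items))
	for _, it := range items {
		queue <- it
	}
	close(queue) // no more work will be added

	var wg sync.WaitGroup
	processed := make([]int, parallelism) // per-worker counters, no shared writes
	for w := 0; w < parallelism; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for range queue { // each worker pulls until the queue is empty
				processed[id]++
			}
		}(w)
	}
	wg.Wait()

	total := 0
	for _, n := range processed {
		total += n
	}
	return total
}

func main() {
	// 5 work items, 3 parallel "Pods": every item is handled exactly once.
	fmt.Println(runWorkQueue([]string{"a", "b", "c", "d", "e"}, 3)) // 5
}
```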

Indexed Job

Enabled by setting .spec.completionMode=Indexed; each Pod then gets its own completion index
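The naming scheme from the CompletionMode doc comment (Pod name `$(job-name)-$(index)-$(random-string)`, hostname `$(job-name)-$(index)`) can be sketched like this; the random suffix here is a made-up stand-in for the string the controller appends.

```go
package main

import "fmt"

// indexedPodName builds the Pod name for an Indexed Job:
// $(job-name)-$(index)-$(random-string).
func indexedPodName(jobName string, index int, randomSuffix string) string {
	return fmt.Sprintf("%s-%d-%s", jobName, index, randomSuffix)
}

// indexedPodHostname builds the Pod hostname: $(job-name)-$(index).
func indexedPodHostname(jobName string, index int) string {
	return fmt.Sprintf("%s-%d", jobName, index)
}

func main() {
	// "xk9q2" is a hypothetical random suffix for illustration only.
	fmt.Println(indexedPodName("sample-job", 0, "xk9q2")) // sample-job-0-xk9q2
	fmt.Println(indexedPodHostname("sample-job", 0))      // sample-job-0
}
```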

Custom Pod failure policy

PodFailurePolicy is an alpha feature added in 1.25 (behind the JobPodFailurePolicy feature gate) that describes how failed Pods count against the backoffLimit

  • Background: Jobs that run many Pods across multiple nodes need to retry Pods to cope with infrastructure failures; with only backoffLimit > 0, every failure consumes a retry and all Pods may end up restarted, wasting resources
  • The policy allows Pod failures caused by infrastructure problems to be retried without incrementing the backoffLimit counter

The policy is expressed through the following types:

// PodFailurePolicyRule describes how a pod failure is handled when the requirements are met.
// One of OnExitCodes and onPodConditions, but not both, can be used in each rule.
type PodFailurePolicyRule struct {
// Specifies the action taken on a pod failure when the requirements are satisfied.
// Possible values are:
// - FailJob: indicates that the pod's job is marked as Failed and all
// running pods are terminated.
// - Ignore: indicates that the counter towards the .backoffLimit is not
// incremented and a replacement pod is created.
// - Count: indicates that the pod is handled in the default way - the
// counter towards the .backoffLimit is incremented.
// Additional values are considered to be added in the future. Clients should
// react to an unknown action by skipping the rule.
// The action taken on a failed Pod when the requirements are satisfied
// One of three values:
// FailJob: the Job is marked Failed and all running Pods are terminated
// Ignore: the backoffLimit counter is not incremented and a replacement Pod is created
// Count: the Pod is handled the default way and the backoffLimit counter is incremented
Action PodFailurePolicyAction `json:"action" protobuf:"bytes,1,req,name=action"`

// Represents the requirement on the container exit codes.
// +optional
// Requirement on the container exit codes
OnExitCodes *PodFailurePolicyOnExitCodesRequirement `json:"onExitCodes" protobuf:"bytes,2,opt,name=onExitCodes"`

// Represents the requirement on the pod conditions. The requirement is represented
// as a list of pod condition patterns. The requirement is satisfied if at
// least one pattern matches an actual pod condition. At most 20 elements are allowed.
// +listType=atomic
// +optional
// Patterns matched against the Pod's conditions; at most 20 elements allowed
OnPodConditions []PodFailurePolicyOnPodConditionsPattern `json:"onPodConditions" protobuf:"bytes,3,opt,name=onPodConditions"`
}

// PodFailurePolicy describes how failed pods influence the backoffLimit.
type PodFailurePolicy struct {
// A list of pod failure policy rules. The rules are evaluated in order.
// Once a rule matches a Pod failure, the remaining of the rules are ignored.
// When no rule matches the Pod failure, the default handling applies - the
// counter of pod failures is incremented and it is checked against
// the backoffLimit. At most 20 elements are allowed.
// +listType=atomic
Rules []PodFailurePolicyRule `json:"rules" protobuf:"bytes,1,opt,name=rules"`
}
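The first-match semantics described in the PodFailurePolicy doc comment (rules evaluated in order, first match wins, unmatched failures fall back to the default counting) can be sketched with simplified stand-ins for the real API types:

```go
package main

import "fmt"

// rule is a simplified stand-in for PodFailurePolicyRule, matching on
// container exit codes only.
type rule struct {
	action    string  // "FailJob", "Ignore", or "Count"
	exitCodes []int32 // exit codes this rule matches
}

// actionForFailure evaluates rules in order; the first rule matching the
// exit code decides the action, and an unmatched failure gets the default
// "Count" handling (increment the backoffLimit counter).
func actionForFailure(rules []rule, exitCode int32) string {
	for _, r := range rules {
		for _, c := range r.exitCodes {
			if c == exitCode {
				return r.action // first matching rule wins; the rest are ignored
			}
		}
	}
	return "Count"
}

func main() {
	// Hypothetical policy: exit code 42 means the app gave up (fail the Job);
	// 137/143 suggest an infrastructure kill (retry without counting).
	rules := []rule{
		{action: "FailJob", exitCodes: []int32{42}},
		{action: "Ignore", exitCodes: []int32{137, 143}},
	}
	fmt.Println(actionForFailure(rules, 137)) // Ignore
	fmt.Println(actionForFailure(rules, 1))   // Count
}
```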

TTL mechanism for finished Jobs

By default Jobs use the OrphanDependents deletion policy, so Kubernetes keeps their Pods around

Setting the Job's TTLSecondsAfterFinished field lets the TTL controller automatically clean up finished Jobs

  • Deleting the Job object cascades to its dependent objects
  • If set to 0, the Job is eligible for deletion immediately after it finishes
  • If unset, the Job is never deleted automatically