K8s Workloads: Job

Based on Kubernetes 1.25

What is a Job

A Job creates one or more Pods and keeps retrying their execution until a specified number of Pods terminate successfully.

  • As Pods terminate successfully, the Job tracks the count of successful Pods; once that count reaches the specified number, the Job is complete.
  • Deleting a Job deletes all of its Pods.
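
A minimal Job manifest can sketch this behavior (the name, image, and command below are illustrative, not from the source):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job             # hypothetical name
spec:
  backoffLimit: 4            # retry failed Pods up to 4 times before failing the Job
  template:
    spec:
      restartPolicy: Never   # Job Pods must use Never or OnFailure
      containers:
      - name: worker
        image: busybox:1.36  # illustrative image
        command: ["sh", "-c", "echo done"]
```

The Job controller keeps creating Pods from this template until one terminates successfully.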

JobSpec

// JobSpec describes how the job execution will look like.
type JobSpec struct {

    // Specifies the maximum desired number of pods the job should
    // run at any given time. The actual number of pods running in steady state will
    // be less than this number when ((.spec.completions - .status.successful) < .spec.parallelism),
    // i.e. when the work left to do is less than max parallelism.
    // More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
    // +optional
    // Parallelism: the maximum desired number of Pods the Job runs at any given time.
    Parallelism *int32 `json:"parallelism,omitempty" protobuf:"varint,1,opt,name=parallelism"`

    // Specifies the desired number of successfully finished pods the
    // job should be run with. Setting to nil means that the success of any
    // pod signals the success of all pods, and allows parallelism to have any positive
    // value. Setting to 1 means that parallelism is limited to 1 and the success of that
    // pod signals the success of the job.
    // More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
    // +optional
    // Completions: the number of successfully finished Pods the Job needs.
    Completions *int32 `json:"completions,omitempty" protobuf:"varint,2,opt,name=completions"`

    // Specifies the duration in seconds relative to the startTime that the job
    // may be continuously active before the system tries to terminate it; value
    // must be positive integer. If a Job is suspended (at creation or through an
    // update), this timer will effectively be stopped and reset when the Job is
    // resumed again.
    // +optional
    // ActiveDeadlineSeconds: the maximum time the Job may stay active; must be a
    // positive integer; the timer is paused while the Job is suspended.
    ActiveDeadlineSeconds *int64 `json:"activeDeadlineSeconds,omitempty" protobuf:"varint,3,opt,name=activeDeadlineSeconds"`

    // Specifies the policy of handling failed pods. In particular, it allows to
    // specify the set of actions and conditions which need to be
    // satisfied to take the associated action.
    // If empty, the default behaviour applies - the counter of failed pods,
    // represented by the jobs's .status.failed field, is incremented and it is
    // checked against the backoffLimit. This field cannot be used in combination
    // with restartPolicy=OnFailure.
    //
    // This field is alpha-level. To use this field, you must enable the
    // `JobPodFailurePolicy` feature gate (disabled by default).
    // +optional
    // PodFailurePolicy: the policy for handling failed Pods.
    // If empty, the default behavior applies: the failed-Pod counter (the Job's
    // .status.failed field) is incremented and checked against the backoffLimit.
    PodFailurePolicy *PodFailurePolicy `json:"podFailurePolicy,omitempty" protobuf:"bytes,11,opt,name=podFailurePolicy"`

    // Specifies the number of retries before marking this job failed.
    // Defaults to 6
    // +optional
    // BackoffLimit: the number of retries before the Job is marked failed.
    BackoffLimit *int32 `json:"backoffLimit,omitempty" protobuf:"varint,7,opt,name=backoffLimit"`

    // TODO enabled it when https://github.com/kubernetes/kubernetes/issues/28486 has been fixed
    // Optional number of failed pods to retain.
    // +optional
    // FailedPodsLimit *int32 `json:"failedPodsLimit,omitempty" protobuf:"varint,9,opt,name=failedPodsLimit"`

    // A label query over pods that should match the pod count.
    // Normally, the system sets this field for you.
    // More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
    // +optional
    // Selector: a label selector over the Job's Pods; normally set by the system.
    Selector *metav1.LabelSelector `json:"selector,omitempty" protobuf:"bytes,4,opt,name=selector"`

    // manualSelector controls generation of pod labels and pod selectors.
    // Leave `manualSelector` unset unless you are certain what you are doing.
    // When false or unset, the system pick labels unique to this job
    // and appends those labels to the pod template. When true,
    // the user is responsible for picking unique labels and specifying
    // the selector. Failure to pick a unique label may cause this
    // and other jobs to not function correctly. However, You may see
    // `manualSelector=true` in jobs that were created with the old `extensions/v1beta1`
    // API.
    // More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#specifying-your-own-pod-selector
    // +optional
    // ManualSelector: enables specifying your own Selector.
    ManualSelector *bool `json:"manualSelector,omitempty" protobuf:"varint,5,opt,name=manualSelector"`

    // Describes the pod that will be created when executing a job.
    // More info: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
    // Template: the template for the Pods the Job creates.
    Template corev1.PodTemplateSpec `json:"template" protobuf:"bytes,6,opt,name=template"`

    // ttlSecondsAfterFinished limits the lifetime of a Job that has finished
    // execution (either Complete or Failed). If this field is set,
    // ttlSecondsAfterFinished after the Job finishes, it is eligible to be
    // automatically deleted. When the Job is being deleted, its lifecycle
    // guarantees (e.g. finalizers) will be honored. If this field is unset,
    // the Job won't be automatically deleted. If this field is set to zero,
    // the Job becomes eligible to be deleted immediately after it finishes.
    // +optional
    // TTLSecondsAfterFinished: limits the lifetime of a Job that has finished
    // (Complete or Failed); once this duration has elapsed the Job is
    // automatically deleted. If unset, the Job is not deleted automatically.
    TTLSecondsAfterFinished *int32 `json:"ttlSecondsAfterFinished,omitempty" protobuf:"varint,8,opt,name=ttlSecondsAfterFinished"`

    // CompletionMode specifies how Pod completions are tracked. It can be
    // `NonIndexed` (default) or `Indexed`.
    //
    // `NonIndexed` means that the Job is considered complete when there have
    // been .spec.completions successfully completed Pods. Each Pod completion is
    // homologous to each other.
    //
    // `Indexed` means that the Pods of a
    // Job get an associated completion index from 0 to (.spec.completions - 1),
    // available in the annotation batch.kubernetes.io/job-completion-index.
    // The Job is considered complete when there is one successfully completed Pod
    // for each index.
    // When value is `Indexed`, .spec.completions must be specified and
    // `.spec.parallelism` must be less than or equal to 10^5.
    // In addition, The Pod name takes the form
    // `$(job-name)-$(index)-$(random-string)`,
    // the Pod hostname takes the form `$(job-name)-$(index)`.
    //
    // More completion modes can be added in the future.
    // If the Job controller observes a mode that it doesn't recognize, which
    // is possible during upgrades due to version skew, the controller
    // skips updates for the Job.
    // +optional
    // CompletionMode: how Pod completions are tracked; NonIndexed or Indexed.
    // NonIndexed (default): the Job is complete once .spec.completions Pods have succeeded.
    // Indexed: Pods carry indexes from 0 to .spec.completions-1; the Job succeeds
    // only when every index has a successfully completed Pod.
    CompletionMode *CompletionMode `json:"completionMode,omitempty" protobuf:"bytes,9,opt,name=completionMode,casttype=CompletionMode"`

    // Suspend specifies whether the Job controller should create Pods or not. If
    // a Job is created with suspend set to true, no Pods are created by the Job
    // controller. If a Job is suspended after creation (i.e. the flag goes from
    // false to true), the Job controller will delete all active Pods associated
    // with this Job. Users must design their workload to gracefully handle this.
    // Suspending a Job will reset the StartTime field of the Job, effectively
    // resetting the ActiveDeadlineSeconds timer too. Defaults to false.
    //
    // +optional
    // Suspend: suspends the Job; suspension deletes all running Pods, resets the
    // Job's StartTime, and effectively resets the ActiveDeadlineSeconds timer.
    Suspend *bool `json:"suspend,omitempty" protobuf:"varint,10,opt,name=suspend"`
}

Job Concurrency

Jobs can be run in three ways: non-parallel, with a fixed completion count, or as a work queue.

Non-parallel

  • Only one Pod runs at a time; another Pod is started only if that Pod fails.
  • Once a Pod terminates successfully, the Job is complete.
  • There is no need to set .spec.completions or .spec.parallelism; both default to 1.

Fixed completion count

  • Set .spec.completions to a positive number.
  • The Job is complete once the number of successful Pods reaches .spec.completions.
  • You can set .spec.completionMode=Indexed, in which case each Pod name carries an index, starting from 0.
    • Each Pod automatically gets the batch.kubernetes.io/job-completion-index annotation and a JOB_COMPLETION_INDEX environment variable holding its index.
  • With .spec.completions set, you can also set .spec.parallelism to control the degree of concurrency.
    • With completions set to 10 and .spec.parallelism set to 3, the Job tries to keep 3 Pods running in parallel until 10 Pods have succeeded.
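
The fixed-completion-count pattern can be sketched as an Indexed Job manifest (the name, image, and command are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-demo         # hypothetical name
spec:
  completions: 10            # one successful Pod needed per index 0..9
  parallelism: 3             # keep up to 3 Pods running at a time
  completionMode: Indexed
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox:1.36  # illustrative image
        # each Pod can read its own index from JOB_COMPLETION_INDEX
        command: ["sh", "-c", "echo index=$JOB_COMPLETION_INDEX"]
```

Each index can thus process a distinct shard of the work.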

Work queue

  • Leave .spec.completions unset and set .spec.parallelism to a non-negative number.
  • The work queue is managed externally (e.g. via a message queue); each Pod works independently and can determine whether the overall task is done. Once any Pod exits successfully, no new Pods are created and the Job completes.

Indexed Job

Enabled by setting .spec.completionMode=Indexed, which gives each Pod its own completion index.

Custom Pod Failure Policy

PodFailurePolicy is an alpha feature added in 1.25; it describes how failed Pods count against the backoffLimit.

  • Background: Jobs running many Pods across multiple nodes need Pod retries to ride out infrastructure failures. The traditional approach, setting backoffLimit > 0, counts every failure the same way and restarts Pods indiscriminately, wasting resources.
  • It allows Pod failures caused by infrastructure problems to be retried without incrementing the backoffLimit counter.

The relevant type definitions:

// PodFailurePolicyRule describes how a pod failure is handled when the requirements are met.
// One of OnExitCodes and onPodConditions, but not both, can be used in each rule.
type PodFailurePolicyRule struct {
    // Specifies the action taken on a pod failure when the requirements are satisfied.
    // Possible values are:
    // - FailJob: indicates that the pod's job is marked as Failed and all
    // running pods are terminated.
    // - Ignore: indicates that the counter towards the .backoffLimit is not
    // incremented and a replacement pod is created.
    // - Count: indicates that the pod is handled in the default way - the
    // counter towards the .backoffLimit is incremented.
    // Additional values are considered to be added in the future. Clients should
    // react to an unknown action by skipping the rule.
    // Action: what to do with a failed Pod when the requirements match.
    // One of three values:
    // FailJob: the Job is marked Failed and all running Pods are terminated.
    // Ignore: the backoffLimit counter is not incremented and a replacement Pod is created.
    // Count: the Pod is handled the default way; the backoffLimit counter is incremented.
    Action PodFailurePolicyAction `json:"action" protobuf:"bytes,1,req,name=action"`

    // Represents the requirement on the container exit codes.
    // +optional
    // OnExitCodes: requirement on container exit codes.
    OnExitCodes *PodFailurePolicyOnExitCodesRequirement `json:"onExitCodes" protobuf:"bytes,2,opt,name=onExitCodes"`

    // Represents the requirement on the pod conditions. The requirement is represented
    // as a list of pod condition patterns. The requirement is satisfied if at
    // least one pattern matches an actual pod condition. At most 20 elements are allowed.
    // +listType=atomic
    // +optional
    // OnPodConditions: patterns matched against Pod conditions; at most 20 elements.
    OnPodConditions []PodFailurePolicyOnPodConditionsPattern `json:"onPodConditions" protobuf:"bytes,3,opt,name=onPodConditions"`
}

// PodFailurePolicy describes how failed pods influence the backoffLimit.
type PodFailurePolicy struct {
    // A list of pod failure policy rules. The rules are evaluated in order.
    // Once a rule matches a Pod failure, the remaining of the rules are ignored.
    // When no rule matches the Pod failure, the default handling applies - the
    // counter of pod failures is incremented and it is checked against
    // the backoffLimit. At most 20 elements are allowed.
    // +listType=atomic
    Rules []PodFailurePolicyRule `json:"rules" protobuf:"bytes,1,opt,name=rules"`
}
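
A manifest using these types might look like the following sketch (names and values are illustrative; in 1.25 this requires the JobPodFailurePolicy feature gate):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: failure-policy-demo    # hypothetical name
spec:
  backoffLimit: 6
  podFailurePolicy:
    rules:
    - action: FailJob          # a "permanent" error: fail the whole Job at once
      onExitCodes:
        containerName: worker  # optional: restrict the rule to one container
        operator: In
        values: [42]           # illustrative exit code
    - action: Ignore           # infrastructure disruptions don't consume backoffLimit
      onPodConditions:
      - type: DisruptionTarget
        status: "True"
  template:
    spec:
      restartPolicy: Never     # podFailurePolicy cannot be combined with OnFailure
      containers:
      - name: worker
        image: busybox:1.36    # illustrative image
        command: ["sh", "-c", "exit 0"]
```

Rules are evaluated in order; the first match wins, and unmatched failures fall back to the default backoffLimit counting.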

TTL for Finished Jobs

The default deletion policy for Jobs is OrphanDependents, so Kubernetes keeps the Pods of a deleted Job.

Setting the Job's TTLSecondsAfterFinished field lets the TTL controller automatically clean up finished resources:

  • Deleting the Job object cascades to its dependent objects.
  • If set to 0, the Job is deleted automatically as soon as it finishes.
  • If unset, the Job is never deleted automatically.
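
A sketch of a Job with a TTL (the name, image, and command are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: ttl-demo               # hypothetical name
spec:
  ttlSecondsAfterFinished: 100 # delete the Job (and, by cascade, its Pods) 100s after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox:1.36    # illustrative image
        command: ["sh", "-c", "echo done"]
```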