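K8s Workloads: StatefulSet

Based on Kubernetes 1.25

What is a StatefulSet

A StatefulSet manages stateful applications. Its short name is sts.

  • The Pods it manages need stable network identifiers and storage volumes, so that the stateful application's data persists and stays reachable
  • Each managed Pod is assigned an ordinal ID that increases sequentially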

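Use cases

  • Stable, unique network identifiers (DNS names)
  • Stable storage of its own for each Pod (volumeClaimTemplates)
  • Ordered scale-up and scale-down, with graceful cleanup when replicas are removed
  • Ordered rolling updates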

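Limitations

  • Storage for a Pod must either be provisioned through a PVC backed by a StorageClass or be created in advance
  • Deleting or scaling down a StatefulSet does not delete the associated volumes, which keeps the data safe
  • Deleting a StatefulSet gives no guarantee that its Pods terminate cleanly
    • For ordered, graceful termination, scale the StatefulSet down to 0 first, then delete it
  • Rolling updates under the default Pod Management Policy (OrderedReady) can end up in a broken state that requires manual intervention

⚠️: Do not force-delete Pods managed by a StatefulSet. The controller's own mechanism ensures that at most one Pod with a given identity is serving at any time.

A force deletion can leave more than one Pod with the same identity serving at once.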

StatefulSetSpec

// A StatefulSetSpec is the specification of a StatefulSet.
type StatefulSetSpec struct {
// replicas is the desired number of replicas of the given Template.
// These are replicas in the sense that they are instantiations of the
// same Template, but individual replicas also have a consistent identity.
// If unspecified, defaults to 1.
// TODO: Consider a rename of this field.
// +optional
// Desired number of Pods; defaults to 1.
Replicas *int32 `json:"replicas,omitempty" protobuf:"varint,1,opt,name=replicas"`

// selector is a label query over pods that should match the replica count.
// It must match the pod template's labels.
// More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
// Label selector; it must match the labels of the Pod template.
Selector *metav1.LabelSelector `json:"selector" protobuf:"bytes,2,opt,name=selector"`

// template is the object that describes the pod that will be created if
// insufficient replicas are detected. Each pod stamped out by the StatefulSet
// will fulfill this Template, but have a unique identity from the rest
// of the StatefulSet. Each pod will be named with the format
// <statefulsetname>-<podindex>. For example, a pod in a StatefulSet named
// "web" with index number "3" would be named "web-3".
// The only allowed template.spec.restartPolicy value is "Always".
// Template that describes the Pods.
Template v1.PodTemplateSpec `json:"template" protobuf:"bytes,3,opt,name=template"`

// volumeClaimTemplates is a list of claims that pods are allowed to reference.
// The StatefulSet controller is responsible for mapping network identities to
// claims in a way that maintains the identity of a pod. Every claim in
// this list must have at least one matching (by name) volumeMount in one
// container in the template. A claim in this list takes precedence over
// any volumes in the template, with the same name.
// TODO: Define the behavior if a claim already exists with the same name.
// +optional
// +listType=atomic
// List of PVC templates that Pods may reference.
VolumeClaimTemplates []v1.PersistentVolumeClaim `json:"volumeClaimTemplates,omitempty" protobuf:"bytes,4,rep,name=volumeClaimTemplates"`

// serviceName is the name of the service that governs this StatefulSet.
// This service must exist before the StatefulSet, and is responsible for
// the network identity of the set. Pods get DNS/hostnames that follow the
// pattern: pod-specific-string.serviceName.default.svc.cluster.local
// where "pod-specific-string" is managed by the StatefulSet controller.
// Name of the Service that governs this StatefulSet.
ServiceName string `json:"serviceName" protobuf:"bytes,5,opt,name=serviceName"`

// podManagementPolicy controls how pods are created during initial scale up,
// when replacing pods on nodes, or when scaling down. The default policy is
// `OrderedReady`, where pods are created in increasing order (pod-0, then
// pod-1, etc) and the controller will wait until each pod is ready before
// continuing. When scaling down, the pods are removed in the opposite order.
// The alternative policy is `Parallel` which will create pods in parallel
// to match the desired scale without waiting, and on scale down will delete
// all pods at once.
// +optional
// Pod management mode of the StatefulSet: OrderedReady (the default) or Parallel.
PodManagementPolicy PodManagementPolicyType `json:"podManagementPolicy,omitempty" protobuf:"bytes,6,opt,name=podManagementPolicy,casttype=PodManagementPolicyType"`

// updateStrategy indicates the StatefulSetUpdateStrategy that will be
// employed to update Pods in the StatefulSet when a revision is made to
// Template.
// Strategy for updating the Pods in the StatefulSet when the Template changes.
UpdateStrategy StatefulSetUpdateStrategy `json:"updateStrategy,omitempty" protobuf:"bytes,7,opt,name=updateStrategy"`

// revisionHistoryLimit is the maximum number of revisions that will
// be maintained in the StatefulSet's revision history. The revision history
// consists of all revisions not represented by a currently applied
// StatefulSetSpec version. The default value is 10.
// Caps the number of revisions kept in the StatefulSet's revision history.
RevisionHistoryLimit *int32 `json:"revisionHistoryLimit,omitempty" protobuf:"varint,8,opt,name=revisionHistoryLimit"`

// Minimum number of seconds for which a newly created pod should be ready
// without any of its container crashing for it to be considered available.
// Defaults to 0 (pod will be considered available as soon as it is ready)
// +optional
// Minimum time a Pod must stay ready before it counts as available.
MinReadySeconds int32 `json:"minReadySeconds,omitempty" protobuf:"varint,9,opt,name=minReadySeconds"`

// persistentVolumeClaimRetentionPolicy describes the lifecycle of persistent
// volume claims created from volumeClaimTemplates. By default, all persistent
// volume claims are created as needed and retained until manually deleted. This
// policy allows the lifecycle to be altered, for example by deleting persistent
// volume claims when their stateful set is deleted, or when their pod is scaled
// down. This requires the StatefulSetAutoDeletePVC feature gate to be enabled,
// which is beta.
// +optional
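// Lifecycle of the PVCs created from volumeClaimTemplates.
// By default the claims are all retained; this option lets them be deleted automatically.
// Requires the StatefulSetAutoDeletePVC feature gate (alpha in 1.25; the upstream comment above, which says beta, comes from a later release).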
PersistentVolumeClaimRetentionPolicy *StatefulSetPersistentVolumeClaimRetentionPolicy `json:"persistentVolumeClaimRetentionPolicy,omitempty" protobuf:"bytes,10,opt,name=persistentVolumeClaimRetentionPolicy"`

// ordinals controls the numbering of replica indices in a StatefulSet. The
// default ordinals behavior assigns a "0" index to the first replica and
// increments the index by one for each additional replica requested.
// +optional
Ordinals *StatefulSetOrdinals `json:"ordinals,omitempty" protobuf:"bytes,11,opt,name=ordinals"`
}
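
The naming rules mentioned in the Template and ServiceName comments can be sketched in a few lines of Go. This is only an illustration, not controller code: the helper names (podName, podDNSName, pvcName) and the sample values "web", "nginx" and "www" are made up for the example. The PVC pattern <template name>-<pod name> is not stated in the struct above and is included here as the commonly documented convention; the cluster domain is assumed to be cluster.local.

```go
package main

import "fmt"

// podName returns the name given to the Pod with the given ordinal:
// <statefulset name>-<ordinal>.
func podName(set string, ordinal int) string {
	return fmt.Sprintf("%s-%d", set, ordinal)
}

// podDNSName returns the Pod's stable DNS name behind the governing Service.
// clusterDomain is a cluster setting; "cluster.local" is only the usual value.
func podDNSName(set string, ordinal int, service, namespace, clusterDomain string) string {
	return fmt.Sprintf("%s.%s.%s.svc.%s", podName(set, ordinal), service, namespace, clusterDomain)
}

// pvcName returns the name of the claim created from a volumeClaimTemplate
// for one Pod: <template name>-<pod name> (assumed convention, see lead-in).
func pvcName(template, set string, ordinal int) string {
	return fmt.Sprintf("%s-%s", template, podName(set, ordinal))
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(podName("web", i))
		fmt.Println("  dns:", podDNSName("web", i, "nginx", "default", "cluster.local"))
		fmt.Println("  pvc:", pvcName("www", "web", i))
	}
}
```

Because every name is derived from the set name and the ordinal alone, a Pod that is rescheduled to another node comes back with the same name, the same DNS record and the same claim.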

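Pod management policy

  • OrderedReady: the default
    • When a StatefulSet with N replicas is created, its Pods are created one at a time in the order 0, 1, …, N-1
    • When Pods are removed (for example on scale-down), they are terminated one at a time in the order N-1, …, 0
    • Before a scale-up creates a new Pod, all Pods with lower ordinals must be Running and Ready
    • Before a Pod is terminated, all Pods with higher ordinals must have shut down completely
  • Parallel: creates or terminates all Pods in parallel, without waiting for one Pod before acting on the next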
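
The OrderedReady ordering can be illustrated with a small Go sketch. It is a simplified model, not the controller's implementation: scaleSteps is a hypothetical helper, and it assumes every step finishes (Pod Running and Ready, or fully terminated) before the next one starts.

```go
package main

import "fmt"

// scaleSteps lists the actions an OrderedReady scale operation performs, in
// order, when a set goes from `current` to `desired` replicas.
func scaleSteps(set string, current, desired int) []string {
	var steps []string
	// Scale up: create the missing ordinals from low to high.
	for i := current; i < desired; i++ {
		steps = append(steps, fmt.Sprintf("create %s-%d", set, i))
	}
	// Scale down: remove the surplus ordinals from high to low.
	for i := current - 1; i >= desired; i-- {
		steps = append(steps, fmt.Sprintf("delete %s-%d", set, i))
	}
	return steps
}

func main() {
	fmt.Println(scaleSteps("web", 0, 3)) // [create web-0 create web-1 create web-2]
	fmt.Println(scaleSteps("web", 3, 1)) // [delete web-2 delete web-1]
}
```

Under Parallel the same set of Pods would be created or deleted, but all at once instead of step by step.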

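Pod update strategy

Two update strategies are available:

  • OnDelete: changing the template does not destroy any Pod; a Pod is recreated from the new definition only after you delete it manually (or delete and recreate the StatefulSet object)
    • Every Pod has to be replaced this way before the whole set runs the new template
    • Network identifiers do not change
  • RollingUpdate: the default; updates Pods one at a time, so a rollout is somewhat slower
    • Starts from the Pod with the highest ordinal and works down, one Pod at a time, until the Pod with the lowest ordinal is updated
    • The Pod being updated must be Running and Ready before the controller moves on to the Pod before it
    • Each Pod keeps its name and DNS hostname after being recreated; only its IP address may change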

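⚠️ With the default management policy OrderedReady, a rolling update can get stuck:

  • An update to the Pod template leaves some Pod permanently unable to start
  • The StatefulSet stops the rolling update and waits
  • Reverting the template is not enough: you also have to delete every Pod that was already created from the faulty template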
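
Why the rollout stalls can be seen in a toy simulation. The sketch below is an assumption-laden model rather than real controller logic: rollOut and its becomesReady callback are hypothetical, and readiness is reduced to a single yes/no answer per ordinal.

```go
package main

import "fmt"

// rollOut simulates a RollingUpdate under OrderedReady for `replicas` Pods.
// becomesReady reports whether the Pod with the given ordinal reaches
// Running and Ready after being recreated from the new template.
// It returns the ordinals updated so far and the ordinal the rollout is
// stuck on, or -1 if the rollout completed.
func rollOut(replicas int, becomesReady func(ordinal int) bool) (updated []int, stuckAt int) {
	// Work from the highest ordinal down to 0, one Pod at a time.
	for i := replicas - 1; i >= 0; i-- {
		if !becomesReady(i) {
			// The controller waits on this Pod and never reaches the rest.
			return updated, i
		}
		updated = append(updated, i)
	}
	return updated, -1
}

func main() {
	// Hypothetical faulty template: no recreated Pod ever becomes ready.
	updated, stuckAt := rollOut(3, func(int) bool { return false })
	fmt.Println(updated, stuckAt) // [] 2

	// Healthy template: every Pod becomes ready.
	updated, stuckAt = rollOut(3, func(int) bool { return true })
	fmt.Println(updated, stuckAt) // [2 1 0] -1
}
```

In the faulty case the rollout stops at the very first Pod it touches (the highest ordinal), which is why that Pod has to be deleted by hand once the template is fixed.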

RollingUpdate parameters

// RollingUpdateStatefulSetStrategy is used to communicate parameter for RollingUpdateStatefulSetStrategyType.
type RollingUpdateStatefulSetStrategy struct {
// Partition indicates the ordinal at which the StatefulSet should be partitioned
// for updates. During a rolling update, all pods from ordinal Replicas-1 to
// Partition are updated. All pods from ordinal Partition-1 to 0 remain untouched.
// This is helpful in being able to do a canary based deployment. The default value is 0.
// +optional
// Ordinal at which the StatefulSet is partitioned for updates.
Partition *int32 `json:"partition,omitempty" protobuf:"varint,1,opt,name=partition"`
// The maximum number of pods that can be unavailable during the update.
// Value can be an absolute number (ex: 5) or a percentage of desired pods (ex: 10%).
// Absolute number is calculated from percentage by rounding up. This can not be 0.
// Defaults to 1. This field is alpha-level and is only honored by servers that enable the
// MaxUnavailableStatefulSet feature. The field applies to all pods in the range 0 to
// Replicas-1. That means if there is any unavailable pod in the range 0 to Replicas-1, it
// will be counted towards MaxUnavailable.
// +optional
// Maximum number of Pods that may be unavailable during the update; defaults to 1.
// Alpha-level; requires the MaxUnavailableStatefulSet feature gate.
MaxUnavailable *intstr.IntOrString `json:"maxUnavailable,omitempty" protobuf:"varint,2,opt,name=maxUnavailable"`
}
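
The effect of Partition, as described in the comment above, can be sketched as follows. ordinalsToUpdate is a hypothetical helper written for this note; it only models which ordinals a rolling update touches and in what order, and it ignores MaxUnavailable.

```go
package main

import "fmt"

// ordinalsToUpdate returns, in update order, the ordinals a rolling update
// touches: from replicas-1 down to partition. Ordinals below partition are
// left on the old revision.
func ordinalsToUpdate(replicas, partition int) []int {
	var out []int
	for i := replicas - 1; i >= partition && i >= 0; i-- {
		out = append(out, i)
	}
	return out
}

func main() {
	fmt.Println(ordinalsToUpdate(5, 0)) // [4 3 2 1 0]  default: update everything
	fmt.Println(ordinalsToUpdate(5, 3)) // [4 3]        canary: only the top two Pods
	fmt.Println(ordinalsToUpdate(5, 5)) // []           nothing is updated
}
```

Lowering the partition step by step (for example 5, then 3, then 0) gives a staged rollout: each step widens the range of Pods that move to the new revision.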