Kube-controller-manager(GarbageController)

Based on Kubernetes 1.25

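The GarbageController is responsible for cascading deletion, i.e. garbage collection (GC), of resource objects.

  • In Kubernetes, the ownership relationships between resources are recorded mainly through OwnerReference.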

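OwnerReference struct definition:

  • UID: the UID of the object's owner (parent) resource.

  • BlockOwnerDeletion: when true, and the owner is being deleted with the Foreground policy, the owner cannot be removed until this object has been deleted.

  • The GC Controller supports three deletion policies:

    • Orphan: delete only the current object; its dependents are not cascade-deleted.
    • Foreground: delete the current object's dependents first, then the object itself.
    • Background: delete the current object first, then cascade-delete its dependents.
    • With Orphan and Foreground, a finalizer is added to the object, and the GC Controller removes it during the deletion process.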
  • Ref:https://github.com/kubernetes/apimachinery/blob/fdcfc2723dc8dbef76ea77d085e7d75e2991546f/pkg/apis/meta/v1/types.go#L291

    // OwnerReference contains enough information to let you identify an owning
    // object. An owning object must be in the same namespace as the dependent, or
    // be cluster-scoped, so there is no namespace field.
    // +structType=atomic
    type OwnerReference struct {
        // API version of the referent.
        APIVersion string `json:"apiVersion" protobuf:"bytes,5,opt,name=apiVersion"`
        // Kind of the referent.
        // More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
        Kind string `json:"kind" protobuf:"bytes,1,opt,name=kind"`
        // Name of the referent.
        // More info: http://kubernetes.io/docs/user-guide/identifiers#names
        Name string `json:"name" protobuf:"bytes,3,opt,name=name"`
        // UID of the referent.
        // More info: http://kubernetes.io/docs/user-guide/identifiers#uids
        UID types.UID `json:"uid" protobuf:"bytes,4,opt,name=uid,casttype=k8s.io/apimachinery/pkg/types.UID"`
        // If true, this reference points to the managing controller.
        // +optional
        Controller *bool `json:"controller,omitempty" protobuf:"varint,6,opt,name=controller"`
        // If true, AND if the owner has the "foregroundDeletion" finalizer, then
        // the owner cannot be deleted from the key-value store until this
        // reference is removed.
        // See https://kubernetes.io/docs/concepts/architecture/garbage-collection/#foreground-deletion
        // for how the garbage collector interacts with this field and enforces the foreground deletion.
        // Defaults to false.
        // To set this field, a user needs "delete" permission of the owner,
        // otherwise 422 (Unprocessable Entity) will be returned.
        // +optional
        BlockOwnerDeletion *bool `json:"blockOwnerDeletion,omitempty" protobuf:"varint,7,opt,name=blockOwnerDeletion"`
    }
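
The relationship between the three deletion policies and the finalizers can be sketched as follows. This is a minimal illustration rather than code from Kubernetes: the types are local mirrors of the API values ("Orphan", "Background", "Foreground") and finalizer names ("orphan", "foregroundDeletion"), and finalizerFor is a hypothetical helper introduced only for this example.

```go
package main

import "fmt"

// DeletionPropagation mirrors the three propagation policy values of the
// Kubernetes API. It is a local stand-in, not the apimachinery type.
type DeletionPropagation string

const (
	DeletePropagationOrphan     DeletionPropagation = "Orphan"
	DeletePropagationBackground DeletionPropagation = "Background"
	DeletePropagationForeground DeletionPropagation = "Foreground"
)

// Finalizer names that the garbage collector acts on.
const (
	FinalizerOrphanDependents = "orphan"
	FinalizerDeleteDependents = "foregroundDeletion"
)

// finalizerFor is a hypothetical helper: it returns the finalizer placed on an
// object that is deleted with the given policy, or "" when none is needed.
func finalizerFor(policy DeletionPropagation) string {
	switch policy {
	case DeletePropagationOrphan:
		return FinalizerOrphanDependents
	case DeletePropagationForeground:
		return FinalizerDeleteDependents
	default:
		// Background: the object itself is removed right away, so no
		// finalizer has to hold it back.
		return ""
	}
}

func main() {
	policies := []DeletionPropagation{
		DeletePropagationOrphan,
		DeletePropagationForeground,
		DeletePropagationBackground,
	}
	for _, p := range policies {
		fmt.Printf("%-10s -> finalizer %q\n", p, finalizerFor(p))
	}
}
```

Only Orphan and Foreground need a finalizer, because in both cases the object must remain in storage until the GC Controller has finished its work on the dependents.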

Controller initialization

Only Event resource objects are skipped; the GC Controller does not track them.

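Main execution logic

  1. gc.dependencyGraphBuilder.Run

    This goroutine takes resource objects and their event types off the graphChanges work queue and updates the dependency graph maintained by the GC Controller.

    If a change to the graph means that some objects need to be deleted, they are put on the attemptToOrphan or attemptToDelete work queue.

  2. gc.runAttemptToOrphanWorker

    Takes items off the attemptToOrphan queue and performs Orphan-policy deletion.

  3. gc.runAttemptToDeleteWorker

    Takes items off the attemptToDelete queue and performs Foreground or Background deletion.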
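
The three stages above can be sketched with plain goroutines and channels. This is a simplified model under stated assumptions: the real controller uses rate-limited work queues rather than channels, and the event type, route, and run are hypothetical names introduced here. The sketch only models routing of objects that are already marked for deletion.

```go
package main

import (
	"fmt"
	"sync"
)

// event is a simplified stand-in for an item on the graphChanges queue.
type event struct {
	name      string
	deleting  bool   // true when DeletionTimestamp is set
	finalizer string // "orphan", "foregroundDeletion", or ""
}

// route mimics the graph builder's decision about which queue an object
// belongs on. It returns "" when the object needs no GC work.
func route(e event) string {
	switch {
	case e.deleting && e.finalizer == "orphan":
		return "attemptToOrphan"
	case e.deleting && e.finalizer == "foregroundDeletion":
		return "attemptToDelete"
	default:
		return ""
	}
}

// run wires the three stages together and reports what each worker handled.
func run(events []event) (orphaned, deleted []string) {
	graphChanges := make(chan event)
	attemptToOrphan := make(chan string)
	attemptToDelete := make(chan string)

	var workers sync.WaitGroup
	workers.Add(2)
	go func() { // stand-in for runAttemptToOrphanWorker
		defer workers.Done()
		for name := range attemptToOrphan {
			orphaned = append(orphaned, name)
		}
	}()
	go func() { // stand-in for runAttemptToDeleteWorker
		defer workers.Done()
		for name := range attemptToDelete {
			deleted = append(deleted, name)
		}
	}()
	go func() { // stand-in for dependencyGraphBuilder.Run
		for e := range graphChanges {
			switch route(e) {
			case "attemptToOrphan":
				attemptToOrphan <- e.name
			case "attemptToDelete":
				attemptToDelete <- e.name
			}
		}
		close(attemptToOrphan)
		close(attemptToDelete)
	}()

	for _, e := range events {
		graphChanges <- e
	}
	close(graphChanges)
	workers.Wait()
	return orphaned, deleted
}

func main() {
	orphaned, deleted := run([]event{
		{name: "deploy-a", deleting: true, finalizer: "orphan"},
		{name: "deploy-b", deleting: true, finalizer: "foregroundDeletion"},
		{name: "deploy-c"},
	})
	fmt.Println("orphan worker handled:", orphaned)
	fmt.Println("delete worker handled:", deleted)
}
```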

Updating the resource dependency graph

concurrentUIDToNode uses a map to record the nodes (resource objects) in the dependency graph, keyed by UID.
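
A minimal sketch of such a lock-protected map is shown below. It assumes a simplified node holding only a UID, and the UID type and newConcurrentUIDToNode constructor are introduced here for illustration; it is not the upstream definition.

```go
package main

import (
	"fmt"
	"sync"
)

// UID is a local stand-in for the Kubernetes object UID type.
type UID string

// node is reduced to its identity for this sketch.
type node struct {
	uid UID
}

// concurrentUIDToNode guards a UID-to-node map with a read/write lock so that
// one writer and several readers can use it safely.
type concurrentUIDToNode struct {
	lock      sync.RWMutex
	uidToNode map[UID]*node
}

// newConcurrentUIDToNode is a hypothetical constructor for the sketch.
func newConcurrentUIDToNode() *concurrentUIDToNode {
	return &concurrentUIDToNode{uidToNode: make(map[UID]*node)}
}

// Write stores or replaces the node under its UID.
func (m *concurrentUIDToNode) Write(n *node) {
	m.lock.Lock()
	defer m.lock.Unlock()
	m.uidToNode[n.uid] = n
}

// Read looks a node up by UID.
func (m *concurrentUIDToNode) Read(uid UID) (*node, bool) {
	m.lock.RLock()
	defer m.lock.RUnlock()
	n, ok := m.uidToNode[uid]
	return n, ok
}

// Delete removes the node with the given UID, if present.
func (m *concurrentUIDToNode) Delete(uid UID) {
	m.lock.Lock()
	defer m.lock.Unlock()
	delete(m.uidToNode, uid)
}

func main() {
	graph := newConcurrentUIDToNode()
	graph.Write(&node{uid: "uid-1"})

	_, found := graph.Read("uid-1")
	fmt.Println("uid-1 present after Write:", found)

	graph.Delete("uid-1")
	_, found = graph.Read("uid-1")
	fmt.Println("uid-1 present after Delete:", found)
}
```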

node struct:

  • Ref:https://github.com/kubernetes/kubernetes/blob/88e994f6bf8fc88114c5b733e09afea339bea66d/pkg/controller/garbagecollector/graph.go#L42

    // The single-threaded GraphBuilder.processGraphChanges() is the sole writer of the
    // nodes. The multi-threaded GarbageCollector.attemptToDeleteItem() reads the nodes.
    // WARNING: node has different locks on different fields. setters and getters
    // use the respective locks, so the return values of the getters can be
    // inconsistent.
    type node struct {
        // The resource object itself.
        identity objectReference
        // dependents will be read by the orphan() routine, we need to protect it with a lock.
        dependentsLock sync.RWMutex
        // dependents are the nodes that have node.identity as a
        // metadata.ownerReference.
        // In other words, the dependents (children) of this resource object.
        dependents map[*node]struct{}
        // this is set by processGraphChanges() if the object has non-nil DeletionTimestamp
        // and has the FinalizerDeleteDependents.
        deletingDependents     bool
        deletingDependentsLock sync.RWMutex
        // this records if the object's deletionTimestamp is non-nil.
        beingDeleted     bool
        beingDeletedLock sync.RWMutex
        // this records if the object was constructed virtually and never observed via informer event
        virtual     bool
        virtualLock sync.RWMutex
        // when processing an Update event, we need to compare the updated
        // ownerReferences with the owners recorded in the graph.
        owners []metav1.OwnerReference
    }
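
To show how the dependents and owners fields work together, here is a minimal sketch of finding the dependents that block an owner's foreground deletion. It assumes simplified local types (ownerRef, node, newNode, addDependent are illustrative names) and omits the per-field locks of the real struct.

```go
package main

import "fmt"

// ownerRef keeps only the OwnerReference fields this sketch needs.
type ownerRef struct {
	UID                string
	BlockOwnerDeletion *bool
}

// node is a simplified graph node without locks.
type node struct {
	uid        string
	owners     []ownerRef
	dependents map[*node]struct{}
}

// newNode is a hypothetical constructor for the sketch.
func newNode(uid string, owners ...ownerRef) *node {
	return &node{uid: uid, owners: owners, dependents: map[*node]struct{}{}}
}

// addDependent records d as a dependent of n.
func (n *node) addDependent(d *node) {
	n.dependents[d] = struct{}{}
}

// blockingDependents returns the dependents whose reference to n has
// BlockOwnerDeletion set to true.
func (n *node) blockingDependents() []*node {
	var out []*node
	for dep := range n.dependents {
		for _, o := range dep.owners {
			if o.UID == n.uid && o.BlockOwnerDeletion != nil && *o.BlockOwnerDeletion {
				out = append(out, dep)
				break
			}
		}
	}
	return out
}

func main() {
	block := true
	owner := newNode("rs-1")
	blocking := newNode("pod-1", ownerRef{UID: "rs-1", BlockOwnerDeletion: &block})
	nonBlocking := newNode("pod-2", ownerRef{UID: "rs-1"})
	owner.addDependent(blocking)
	owner.addDependent(nonBlocking)

	for _, d := range owner.blockingDependents() {
		fmt.Println("blocks foreground deletion of", owner.uid+":", d.uid)
	}
}
```

The dependents field is a set of node pointers, so membership updates are cheap, while owners keeps the full references so that a later Update event can be compared against them.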

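Update flow of the dependency graph:

  1. The event is Add or Update and the object is not yet in the dependency graph

    1. gb.insertNode

      The GC Controller creates a node and inserts it into the dependency graph. Insertion does more than add the node to the graph: the node's owners must be updated as well, so that each owner lists the node among its dependents.

    2. gb.processTransitions

      If the object has the Orphan finalizer and a non-empty DeletionTimestamp, it is being deleted with the Orphan policy, so its node is put on the attemptToOrphan work queue.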

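  2. The event is Add or Update and the object is already in the dependency graph

    1. referencesDiffs

      Compares the owners recorded in the node with the object's current OwnerReference list to find the owners that were added, removed, or changed. For these changes, the following steps run:

      • gb.addUnblockedOwnersToDeleteQueue

        For each removed owner, the GC Controller checks whether the current object was previously blocking that owner's foreground deletion; if so, the owner is put on the attemptToDelete work queue.

      • gb.addDependentToOwners

        If the object gained new owners, the nodes of those owners are updated in the dependency graph: the object is added to the dependents of each new owner's node.

    2. markBeingDeleted

      If the object's DeletionTimestamp is non-empty, the node's beingDeleted field is set to true. This field affects the later deletion steps.

    3. gb.processTransitions

      Based on the object's DeletionTimestamp and finalizers, the node is put on the attemptToDelete or attemptToOrphan work queue.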

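  3. The event is Delete

    1. gb.removeNode

      The GC Controller removes the current node from the dependency graph. As the inverse of insertNode, this also removes the node from the dependents of each of its owners.

    2. markBeingDeleted

      Based on the object's DeletionTimestamp and finalizers, the node is put on the attemptToDelete or attemptToOrphan work queue.

    3. existingNode.dependents

      The dependents of the current node are put on the attemptToDelete work queue.

      If the object is being deleted with the Background policy, this is the point at which each of its dependents is put on the attemptToDelete work queue.

    4. existingNode.owners

      Each owner of the current node is checked; if an owner is in the middle of a foreground deletion, that owner is also put on the attemptToDelete work queue.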
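
The owner comparison performed for an object that is already in the graph can be sketched as follows. This is a simplified illustration, assuming a local ownerRef type with comparable fields; diffOwners is a hypothetical name and not the upstream function.

```go
package main

import "fmt"

// ownerRef is a simplified, comparable stand-in for an OwnerReference.
type ownerRef struct {
	UID                string
	Name               string
	BlockOwnerDeletion bool
}

// diffOwners compares the owners recorded in the graph (old) with the owners
// on the freshly observed object (cur). Owners are matched by UID.
func diffOwners(old, cur []ownerRef) (added, removed, changed []ownerRef) {
	oldByUID := make(map[string]ownerRef, len(old))
	for _, o := range old {
		oldByUID[o.UID] = o
	}

	seen := make(map[string]bool, len(cur))
	for _, c := range cur {
		seen[c.UID] = true
		prev, existed := oldByUID[c.UID]
		switch {
		case !existed:
			added = append(added, c)
		case prev != c:
			// Same owner, but some field (for example
			// BlockOwnerDeletion) differs.
			changed = append(changed, c)
		}
	}

	for _, o := range old {
		if !seen[o.UID] {
			removed = append(removed, o)
		}
	}
	return added, removed, changed
}

func main() {
	recorded := []ownerRef{
		{UID: "a", Name: "owner-a", BlockOwnerDeletion: true},
		{UID: "b", Name: "owner-b"},
	}
	observed := []ownerRef{
		{UID: "b", Name: "owner-b", BlockOwnerDeletion: true},
		{UID: "c", Name: "owner-c"},
	}

	added, removed, changed := diffOwners(recorded, observed)
	fmt.Println("added:  ", added)
	fmt.Println("removed:", removed)
	fmt.Println("changed:", changed)
}
```

In this model, a removed owner whose reference had BlockOwnerDeletion set is the case in which the owner may have become unblocked and is worth re-queuing for deletion.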

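Orphan deletion

The Orphan deletion policy is carried out in runAttemptToOrphanWorker.

The flow is as follows:

  1. gc.orphanDependents

    Removes the association between the object and its dependents.

  2. gc.removeFinalizer

    Removes the Orphan finalizer.

    The finalizer gives the GC Controller a chance to clean up the object's ownership relationships before the object is deleted.

    Once that cleanup is complete, the GC Controller removes the Orphan finalizer, after which the object is actually deleted.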
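
The two steps of the Orphan flow can be sketched on in-memory objects. This is a simplified model, assuming a local object type that holds only owner UIDs and finalizers; the real controller applies these changes through API patches, and the helper names here are illustrative.

```go
package main

import "fmt"

// object is a simplified resource: only the fields the Orphan flow touches.
type object struct {
	name       string
	ownerUIDs  []string
	finalizers []string
}

// without returns a copy of list that omits every occurrence of s.
func without(list []string, s string) []string {
	var out []string
	for _, v := range list {
		if v != s {
			out = append(out, v)
		}
	}
	return out
}

// orphanDependents drops the reference to ownerUID from every dependent,
// which turns the dependents into orphans.
func orphanDependents(ownerUID string, dependents []*object) {
	for _, d := range dependents {
		d.ownerUIDs = without(d.ownerUIDs, ownerUID)
	}
}

// removeFinalizer drops one finalizer from the object. Once no finalizers
// remain, an object with a DeletionTimestamp can really be removed.
func removeFinalizer(o *object, finalizer string) {
	o.finalizers = without(o.finalizers, finalizer)
}

func main() {
	owner := &object{name: "rs-1", finalizers: []string{"orphan"}}
	pods := []*object{
		{name: "pod-1", ownerUIDs: []string{"uid-rs-1"}},
		{name: "pod-2", ownerUIDs: []string{"uid-rs-1", "uid-other"}},
	}

	// Step 1: detach the dependents. Step 2: release the owner.
	orphanDependents("uid-rs-1", pods)
	removeFinalizer(owner, "orphan")

	for _, p := range pods {
		fmt.Println(p.name, "owners left:", len(p.ownerUIDs))
	}
	fmt.Println(owner.name, "finalizers left:", len(owner.finalizers))
}
```

The order matters: the finalizer is removed only after the dependents have been detached, so the owner never disappears while dependents still point at it.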

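Cascading deletion

Cascading deletion is carried out in runAttemptToDeleteWorker.

The main flow is as follows:

  1. item.isBeingDeleted() && !item.isDeletingDependents()

    Checks whether the object is already being deleted but is not in foreground deletion; if so, there is nothing further to do and processing returns.

  2. item.isDeletingDependents()

    Checks whether the object is in foreground deletion; if so, foreground deletion is performed (processDeletingDependentsItem).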

    • Ref:https://github.com/kubernetes/kubernetes/blob/88e994f6bf8fc88114c5b733e09afea339bea66d/pkg/controller/garbagecollector/garbagecollector.go#L620

      // process item that's waiting for its dependents to be deleted
      func (gc *GarbageCollector) processDeletingDependentsItem(item *node) error {
          blockingDependents := item.blockingDependents()
          if len(blockingDependents) == 0 {
              klog.V(2).Infof("remove DeleteDependents finalizer for item %s", item.identity)
              return gc.removeFinalizer(item, metav1.FinalizerDeleteDependents)
          }
          for _, dep := range blockingDependents {
              if !dep.isDeletingDependents() {
                  klog.V(2).Infof("adding %s to attemptToDelete, because its owner %s is deletingDependents", dep.identity, item.identity)
                  gc.attemptToDelete.Add(dep)
              }
          }
          return nil
      }

  3. gc.classifyReferences

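    The owners listed in the current object's OwnerReference are classified, and cascading deletion is handled according to the class.

    gc.classifyReferences sorts the owners into three classes:

    • Solid: the owner exists and is not waiting for other objects to finish foreground deletion
    • WaitingForDependentsDeletion: the owner is waiting for other objects to finish foreground deletion
    • Dangling: the owner no longer exists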
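
The three classes can be illustrated with a small sketch. It assumes a hypothetical in-memory view of the cluster (the live map) instead of the API lookups the real controller performs, and ownerState and classify are names introduced only for this example.

```go
package main

import "fmt"

// ownerState describes what is known about an owner that still exists.
type ownerState struct {
	deleting               bool // DeletionTimestamp is set
	hasForegroundFinalizer bool // carries the "foregroundDeletion" finalizer
}

// classify sorts owner UIDs into the three classes. Owners missing from
// live are treated as no longer existing.
func classify(ownerUIDs []string, live map[string]ownerState) (solid, dangling, waiting []string) {
	for _, uid := range ownerUIDs {
		state, exists := live[uid]
		switch {
		case !exists:
			dangling = append(dangling, uid)
		case state.deleting && state.hasForegroundFinalizer:
			waiting = append(waiting, uid)
		default:
			solid = append(solid, uid)
		}
	}
	return solid, dangling, waiting
}

func main() {
	live := map[string]ownerState{
		"uid-a": {},
		"uid-b": {deleting: true, hasForegroundFinalizer: true},
	}
	solid, dangling, waiting := classify([]string{"uid-a", "uid-b", "uid-c"}, live)
	fmt.Println("solid:   ", solid)
	fmt.Println("waiting: ", waiting)
	fmt.Println("dangling:", dangling)
}
```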