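HPA Horizontal Autoscaling Flow

Overview

The HPA (Horizontal Pod Autoscaler) automatically adjusts the replica count of a Deployment, ReplicaSet, or StatefulSet based on Pod resource utilization (such as CPU or memory) or on custom metrics.

Core Architecture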

┌──────────────────────────────────────────────────────────────────────────────┐
│                        HorizontalController                                 │
│              pkg/controller/podautoscaler/horizontal.go                    │
│                                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐     │
│  │ HPA Lister  │  │ Pod Lister  │  │   Queue     │  │Recommendations│     │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘     │
└──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                        ReplicaCalculator                                   │
│              pkg/controller/podautoscaler/replica_calculator.go            │
│                                                                             │
│  - Computes the desired replica count                                       │
│  - Handles missing metrics                                                  │
│  - Handles unready Pods                                                     │
└──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                        MetricsClient                                       │
│              pkg/controller/podautoscaler/metrics/                         │
│                                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐     │
│  │Resource API │  │Pods API    │  │Container API│  │External API │     │
│  │ (CPU/Mem)   │  │(Custom)    │  │(Container)  │  │(External)   │     │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘     │
└──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                        ScaleClient                                         │
│              staging/src/k8s.io/client-go/scale/                           │
│                                                                             │
│  - Reads the current replica count                                          │
│  - Updates the target replica count                                         │
└──────────────────────────────────────────────────────────────────────────────┘

Core Data Structures

HPA API Types

// staging/src/k8s.io/api/autoscaling/v2/types.go (simplified, JSON tags omitted)

type HorizontalPodAutoscalerSpec struct {
    ScaleTargetRef CrossVersionObjectReference      // scale target
    MinReplicas    *int32                           // minimum replica count
    MaxReplicas    int32                            // maximum replica count
    Metrics        []MetricSpec                     // metric configuration
    Behavior       *HorizontalPodAutoscalerBehavior // scaling behavior
}

type MetricSpec struct {
    Type MetricSourceType  // Resource, Pods, Object, ContainerResource, External

    Resource         *ResourceMetricSource
    Pods             *PodsMetricSource
    Object           *ObjectMetricSource
    ContainerResource *ContainerResourceMetricSource
    External         *ExternalMetricSource
}

type MetricTarget struct {
    Type               MetricTargetType  // Utilization, Value, AverageValue
    Value              *resource.Quantity
    AverageValue       *resource.Quantity
    AverageUtilization *int32  // CPU/memory utilization as a percentage of the request
}

Scaling Behavior Configuration

type HorizontalPodAutoscalerBehavior struct {
    ScaleUp   *HPAScalingRules // scale-up behavior
    ScaleDown *HPAScalingRules // scale-down behavior
}

type HPAScalingRules struct {
    StabilizationWindowSeconds *int32               // stabilization window
    SelectPolicy               *ScalingPolicySelect // which policy wins
    Policies                   []HPAScalingPolicy   // scaling policies
}

type HPAScalingPolicy struct {
    Type          ScalingPolicyType  // Pods, Percent
    Value         int32
    PeriodSeconds int32
}

Controller Workflow

Main Loop

File: pkg/controller/podautoscaler/horizontal.go
Function: HorizontalController.Run() (around line 195; the exact line depends on the Kubernetes version)

┌──────────────────────────────────────────────────────────────────────────────┐
│                             Controller startup                              │
│                    HorizontalController.Run(ctx, workers)                    │
└──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                             Wait for cache sync                             │
│                WaitForCacheSync(hpaListerSynced, podListerSynced)           │
└──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                                Start workers                                │
│                    for i := 0; i < workers; i++                              │
│                         go wait.Until(worker, time.Second)                   │
└──────────────────────────────────────────────────────────────────────────────┘

Single Reconcile Pass

Function: HorizontalController.reconcileAutoscaler() (about 400 lines)

┌──────────────────────────────────────────────────────────────────────────────┐
│                        Get an HPA key from the queue                        │
│                         queue.Get() -> key                                   │
└──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                     Fetch the HPA and the scale target                      │
│              hpaLister.Get(namespace, name)                                  │
│              scaleClient.Scales(namespace).Get(resource)                     │
└──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                                Fetch metrics                                │
│           metricsClient.Get{Resource,Raw,Object,External}Metric()           │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ 1. Resource metrics (CPU/Memory)                                     │   │
│  │    -> metrics.k8s.io API                                             │   │
│  │                                                                      │   │
│  │ 2. Pods metrics (custom per-Pod metrics)                             │   │
│  │    -> custom.metrics.k8s.io API                                      │   │
│  │                                                                      │   │
│  │ 3. ContainerResource metrics (per container)                         │   │
│  │    -> metrics.k8s.io API (container level)                           │   │
│  │                                                                      │   │
│  │ 4. Object metrics                                                    │   │
│  │    -> custom.metrics.k8s.io API                                      │   │
│  │                                                                      │   │
│  │ 5. External metrics                                                  │   │
│  │    -> external.metrics.k8s.io API                                    │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌──────────────────────────────────────────────────────────────────────────────┐
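│                      Compute the desired replica count                      │
│                   ReplicaCalculator.GetResourceReplicas()                   │
│                                                                             │
│  1. Fetch the metric value of every Pod                                     │
│  2. Exclude unready and terminating Pods                                    │
│  3. Compute utilization or average value                                    │
│  4. Apply the scaling formula                                               │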
└──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌──────────────────────────────────────────────────────────────────────────────┐
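│                           Apply scaling policies                            │
│                                                                             │
│  1. Compute the raw desired replica count                                   │
│     desiredReplicas = ceil[currentReplicas * (currentMetric / target)]      │
│                                                                             │
│  2. Apply ScaleUp rules                                                     │
│     - Cap the increase per period (default: larger of 100% or 4 Pods)       │
│     - Stabilization window (default 0s, prevents flapping)                  │
│                                                                             │
│  3. Apply ScaleDown rules                                                   │
│     - Cap the decrease per period (default: 100%)                           │
│     - Stabilization window (default 300s)                                   │
│                                                                             │
│  4. Clamp to Min/Max                                                        │
│     desiredReplicas = max(minReplicas, min(desiredReplicas, maxReplicas))   │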
└──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                       Update the target replica count                       │
│                   scaleClient.Scales().Update(replicas)                      │
│                                                                             │
│  - Sets spec.replicas through the target's scale subresource                │
│  - The workload's own controller then adds or removes Pods                  │
└──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                            Update the HPA status                            │
│                   hpaClient.UpdateStatus(hpa)                                │
│                                                                             │
│  - currentReplicas                                                          │
│  - desiredReplicas                                                          │
│  - currentMetrics                                                           │
│  - conditions                                                               │
└──────────────────────────────────────────────────────────────────────────────┘

Replica Count Formula

Basic Formula

desiredReplicas = ceil[currentReplicas * (currentMetricValue / targetMetricValue)]
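
The basic formula, combined with the Min/Max clamp from the flow above, can be sketched in a few lines of Go. This is a minimal illustration, not the controller's code: the function name desiredReplicas and its parameters are assumptions made for the example, and it ignores tolerance, readiness, and behavior policies.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the basic HPA ratio formula and clamps the result
// to [minReplicas, maxReplicas]. Hypothetical helper for illustration only.
func desiredReplicas(current int32, currentMetric, targetMetric float64, minReplicas, maxReplicas int32) int32 {
	raw := int32(math.Ceil(float64(current) * currentMetric / targetMetric))
	if raw < minReplicas {
		return minReplicas
	}
	if raw > maxReplicas {
		return maxReplicas
	}
	return raw
}

func main() {
	fmt.Println(desiredReplicas(10, 70, 50, 2, 20)) // 10 * 70/50 = 14
	fmt.Println(desiredReplicas(10, 70, 50, 2, 12)) // capped at maxReplicas = 12
}
```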

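CPU Utilization Calculation

// Simplified from pkg/controller/podautoscaler/replica_calculator.go.
// The real method is GetResourceReplicas; it also takes a context, namespace,
// selector and container name, and returns utilization, timestamp and error.
import "math"

func resourceReplicas(currentReplicas, targetUtilization int32, readyPodCount int64,
    usage, requests map[string]int64, tolerance float64) int32 {
    // Current utilization = sum(usage) / sum(requests) * 100
    var usageTotal, requestTotal int64
    for pod, u := range usage {
        usageTotal += u
        requestTotal += requests[pod]
    }
    if requestTotal == 0 {
        return currentReplicas // no requests set: utilization is undefined
    }
    utilization := usageTotal * 100 / requestTotal
    usageRatio := float64(utilization) / float64(targetUtilization)

    // Within tolerance: keep the current replica count
    if math.Abs(1.0-usageRatio) <= tolerance {
        return currentReplicas
    }
    // The desired count is scaled from the number of ready Pods
    return int32(math.Ceil(usageRatio * float64(readyPodCount)))
}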

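Handling Special Cases

1. Missing metrics

// No usable metrics for any Pod: return an error and wait for metrics;
// the replica count stays unchanged.
if len(metrics) == 0 {
    return currentReplicas, err
}
// Pods that only lack a sample are assumed to use 100% of their request
// on scale-down and 0% on scale-up.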

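2. Unready Pods

// Unready Pods are left out of the first utilization calculation
readyPods = filter(pods, IsReady)
usageRatio = utilization(readyPods) / targetUtilization
// If that ratio points to a scale-up, count each unready Pod as using 0%
// and recompute, which dampens the scale-up
if usageRatio > 1.0 && len(unreadyPods) > 0 {
    newUsageRatio = utilization(readyPods + unreadyPods at 0%) / targetUtilization
    desiredReplicas = ceil(newUsageRatio * (len(readyPods) + len(unreadyPods)))
}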
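
The effect of unready Pods on a scale-up can be made concrete with a small runnable model. The sketch below is an assumption-laden simplification, not the controller's implementation: the helper name scaleWithUnready is hypothetical, every Pod is assumed to have the same request, and utilization is given directly in percent of that request.

```go
package main

import (
	"fmt"
	"math"
)

// scaleWithUnready is a simplified model: every Pod has the same request, and
// readyUtil holds each ready Pod's utilization in percent of that request.
func scaleWithUnready(current int32, readyUtil []float64, unready int, target, tolerance float64) int32 {
	if len(readyUtil) == 0 {
		return current // no usable metrics: leave the replica count alone
	}
	var sum float64
	for _, u := range readyUtil {
		sum += u
	}
	ratio := sum / float64(len(readyUtil)) / target

	if ratio > 1.0 && unready > 0 {
		// Scale-up with unready Pods: count each unready Pod as 0% and recompute.
		total := float64(len(readyUtil) + unready)
		newRatio := sum / total / target
		if math.Abs(1.0-newRatio) <= tolerance || newRatio < 1.0 {
			return current // change too small, or the direction flipped
		}
		return int32(math.Ceil(newRatio * total))
	}
	if math.Abs(1.0-ratio) <= tolerance {
		return current
	}
	return int32(math.Ceil(ratio * float64(len(readyUtil))))
}

func main() {
	// 3 ready Pods at 100%, 1 unready, target 50%: the ratio drops from 2.0 to 1.5.
	fmt.Println(scaleWithUnready(4, []float64{100, 100, 100}, 1, 50, 0.1)) // 6
	// 2 ready Pods at 90%, 2 unready: the recomputed ratio is 0.9, so no change.
	fmt.Println(scaleWithUnready(4, []float64{90, 90}, 2, 50, 0.1)) // 4
}
```

In the first call a naive calculation over ready Pods alone would double the workload; counting the unready Pod as idle yields 6 instead, which is the dampening the text describes.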

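3. During initialization

// CPU initialization period (default 5 minutes)
if time.Since(podStartTime) < cpuInitializationPeriod {
    // The Pod's CPU sample is ignored if the Pod is unready or the sample
    // was taken before the Pod last became ready; the replica count is not
    // forced to minReplicas
    ignoreSample = !podReady || sampleTime.Before(readyTransitionTime.Add(metricWindow))
}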

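Scaling Behavior Policies

Scale-Up Policy (ScaleUp)

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0   # no stabilization window
    selectPolicy: Max               # pick the policy that allows the largest change
    policies:
    - type: Percent
      value: 100                    # add at most 100% of the current replicas
      periodSeconds: 15             # per 15-second period
    - type: Pods
      value: 4                      # add at most 4 Pods
      periodSeconds: 15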
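
How the two scale-up policies combine under selectPolicy: Max can be sketched as follows. This is a simplified illustration under stated assumptions: the policy type and the scaleUpLimit function are hypothetical names, and the sketch assumes no scaling has already happened earlier in the current period, which the real controller does account for.

```go
package main

import (
	"fmt"
	"math"
)

// policy mirrors the two fields of a scaling policy that matter here.
type policy struct {
	kind  string // "Pods" or "Percent"
	value int32
}

// scaleUpLimit returns the highest replica count the policies allow within one
// period, assuming selectPolicy is Max and no earlier change in that period.
func scaleUpLimit(current int32, policies []policy) int32 {
	limit := current
	for _, p := range policies {
		var proposed int32
		switch p.kind {
		case "Pods":
			proposed = current + p.value
		case "Percent":
			proposed = int32(math.Ceil(float64(current) * (1 + float64(p.value)/100)))
		}
		if proposed > limit {
			limit = proposed
		}
	}
	return limit
}

func main() {
	defaults := []policy{{"Percent", 100}, {"Pods", 4}}
	fmt.Println(scaleUpLimit(1, defaults))  // 5: adding 4 Pods beats doubling
	fmt.Println(scaleUpLimit(10, defaults)) // 20: doubling beats adding 4 Pods
}
```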

Scale-Down Policy (ScaleDown)

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # 5-minute stabilization window
    selectPolicy: Max
    policies:
    - type: Percent
      value: 100                     # remove at most 100% of the current replicas
      periodSeconds: 15

Metric Types in Detail

1. Resource metrics

metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 70

Source: metrics-server
Calculation: average utilization across all Pods, relative to their requests

2. Pods metrics

metrics:
- type: Pods
  pods:
    metric:
      name: packets-per-second
    target:
      type: AverageValue
      averageValue: 1k

Source: a custom metrics adapter
Calculation: average value across all Pods

3. ContainerResource metrics

metrics:
- type: ContainerResource
  containerResource:
    name: cpu
    container: application
    target:
      type: Utilization
      averageUtilization: 80

Source: metrics-server
Calculation: average utilization of the named container across all Pods

4. External metrics

metrics:
- type: External
  external:
    metric:
      name: queue_messages_ready
      selector:
        matchLabels:
          queue: "my-queue"
    target:
      type: AverageValue
      averageValue: 30

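Source: an external metrics adapter
Calculation: with an AverageValue target, the external metric value divided by the current number of Pods is compared with the target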

Key Code Paths

File	Description
`pkg/controller/podautoscaler/horizontal.go`	HPA controller main logic
`pkg/controller/podautoscaler/replica_calculator.go`	Replica count calculation
`pkg/controller/podautoscaler/metrics/`	Metrics clients
`staging/src/k8s.io/api/autoscaling/v2/types.go`	HPA API types
`staging/src/k8s.io/client-go/scale/`	Scale subresource client

Troubleshooting Common Problems

1. Metrics unavailable

# Check that metrics-server is running
kubectl get pods -n kube-system | grep metrics-server

# Query the metrics API
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods

2. HPA cannot fetch metrics

# Inspect the HPA status and events
kubectl describe hpa <name>

# Common errors
# - unable to get metrics for resource cpu: no metrics returned from resource metrics API
# - the HPA was unable to compute the replica count

3. Frequent scaling flapping

- Lengthen the stabilization window
- Adjust the tolerance
- Combine multiple metrics

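4. Replica count does not change

# Inspect the conditions
kubectl get hpa <name> -o jsonpath='{.status.conditions}'

# Conditions to look for
# - AbleToScale=False (for example reason FailedGetScale)
# - ScalingActive=False (for example reason FailedGetResourceMetric)
# - ScalingLimited=True (reason TooManyReplicas or TooFewReplicas)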

Configuration Example

Complete HPA Configuration

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60

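Interview Questions

Basic

1. How does the HPA work?

Reference answer:
The HPA (Horizontal Pod Autoscaler) automatically adjusts the replica count of a Deployment, ReplicaSet, or StatefulSet based on Pod resource utilization or custom metrics.

Core flow:

1. Metrics Server collects Pod metrics
2. The HPA controller fetches the metrics periodically (every 15 seconds by default)
3. It computes desired replicas = ceil(current replicas × current metric value / target metric value)
4. It applies the scaling behavior (stabilization window, rate limits)
5. It updates the replicas field of the target through the scale subresource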

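2. Which metric types does the HPA support?

Reference answer:

Metric type	Description	Data source
Resource	CPU/memory utilization of the Pods	metrics.k8s.io
Pods	Custom per-Pod metrics	custom.metrics.k8s.io
ContainerResource	Per-container resource metrics	metrics.k8s.io
Object	Metrics describing another Kubernetes object (for example an Ingress)	custom.metrics.k8s.io
External	Metrics from outside the cluster	external.metrics.k8s.io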

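3. What is the default HPA scaling behavior?

Reference answer:

- Scale-up: takes effect immediately, with no stabilization window
- Scale-down: 5-minute (300s) stabilization window by default
- Scale-up rate: per 15-second period, at most double the replicas or add 4 Pods, whichever is larger
- Scale-down rate: up to 100% of the replicas per 15-second period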

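Intermediate

4. Explain the HPA replica count formula.

Reference answer:

desiredReplicas = ceil[currentReplicas × (currentMetricValue / targetMetricValue)]

Example:

- Current replicas: 10
- Current CPU utilization: 70%
- Target CPU utilization: 50%

desiredReplicas = ceil[10 × (70 / 50)] = ceil[14] = 14

Notes:

- If currentMetric < target, a scale-down may be triggered
- The result is bounded by minReplicas and maxReplicas
- With multiple metrics, the largest result wins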

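5. What is the stabilization window, and what is it for?

Reference answer:
The stabilization window prevents flapping: instead of acting on the latest recommendation alone, the controller considers every recommendation computed within the window.

Configuration example:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # 5-minute scale-down window
  scaleUp:
    stabilizationWindowSeconds: 60   # 1-minute scale-up window

How it works:

- Scale-up: uses the smallest recommendation within the window
- Scale-down: uses the largest recommendation within the window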

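6. How do you configure scaling policies?

Reference answer:

behavior:
  scaleUp:
    stabilizationWindowSeconds: 60
    selectPolicy: Max  # pick the policy that allows the largest change
    policies:
    - type: Percent
      value: 100        # add at most 100% per period
      periodSeconds: 15
    - type: Pods
      value: 4          # add at most 4 Pods per period
      periodSeconds: 15
  scaleDown:
    stabilizationWindowSeconds: 300
    selectPolicy: Max
    policies:
    - type: Percent
      value: 50         # remove at most 50% per period
      periodSeconds: 60

Policy types:

- Percent: a percentage of the current replicas
- Pods: an absolute number of Pods

Select policy:

- Max: pick the policy that allows the largest change (the default)
- Min: pick the policy that allows the smallest change
- Disabled: disable scaling in that direction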

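7. How does the HPA handle unready Pods?

Reference answer:

On scale-up:

- Unready Pods are counted as using 0% of their request
- This lowers the recomputed average, so the scale-up is dampened rather than accelerated

On scale-down:

- Pods with missing metrics are counted as using 100% of their request
- This dampens the scale-down while their real usage is unknown

CPU initialization period (default 5 minutes):

- For the first 5 minutes after a Pod starts, its CPU sample is ignored if the Pod is unready or the sample was taken before it became ready
- This avoids wrong scaling decisions caused by cold-start CPU spikes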

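Advanced

8. How does a multi-metric HPA work?

Reference answer:
With multiple metrics, the HPA computes a desired replica count for each metric and takes the largest.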

metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 70
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 80
- type: Pods
  pods:
    metric:
      name: requests-per-second
    target:
      type: AverageValue
      averageValue: 1k

Calculation:

- CPU metric -> 15 replicas
- Memory metric -> 12 replicas
- QPS metric -> 18 replicas

Final decision: max(15, 12, 18) = 18 replicas
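
The "take the largest" step is small enough to show directly. The sketch below uses a hypothetical combine function over per-metric proposals; the metric names are simply the ones from the example above.

```go
package main

import "fmt"

// combine returns the largest per-metric proposal and the metric that produced
// it. ok is false when no proposals were supplied.
func combine(proposals map[string]int32) (replicas int32, metric string, ok bool) {
	for name, p := range proposals {
		if !ok || p > replicas {
			replicas, metric, ok = p, name, true
		}
	}
	return replicas, metric, ok
}

func main() {
	replicas, metric, _ := combine(map[string]int32{
		"cpu":                 15,
		"memory":              12,
		"requests-per-second": 18,
	})
	fmt.Println(replicas, metric) // 18 requests-per-second
}
```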

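9. Explain the HPA tolerance mechanism.

Reference answer:
Tolerance keeps small fluctuations from triggering unnecessary scaling.

Default: 0.1 (10%)

How it works:

// pkg/controller/podautoscaler/replica_calculator.go
if math.Abs(1.0 - usageRatio) <= c.tolerance {
    // within tolerance: do not scale
    return currentReplicas
}

Example:

- Target CPU: 50%
- Current CPU: 54%
- Usage ratio: 54/50 = 1.08
- |1.0 - 1.08| = 0.08 <= 0.1 (tolerance)

Result: no scaling; the current replica count is kept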
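
The tolerance check can be tried out in isolation. The helper withinTolerance below is a hypothetical name for illustration; it mirrors the comparison quoted above. Ratios that sit exactly on the boundary, such as 55/50, depend on floating-point rounding, so the example uses values clearly inside and outside the band.

```go
package main

import (
	"fmt"
	"math"
)

// withinTolerance reports whether current/target is close enough to 1.0 that
// no scaling should happen.
func withinTolerance(current, target, tolerance float64) bool {
	return math.Abs(1.0-current/target) <= tolerance
}

func main() {
	fmt.Println(withinTolerance(54, 50, 0.1)) // true: ratio 1.08
	fmt.Println(withinTolerance(46, 50, 0.1)) // true: ratio 0.92
	fmt.Println(withinTolerance(60, 50, 0.1)) // false: ratio 1.2
}
```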

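10. Can HPA and VPA be used together? What are the limits?

Reference answer:

They can be used together, under conditions:

HPA driven by CPU/memory:
- Should not be combined with VPA on the same resource
- VPA adjusts requests dynamically, which changes the utilization the HPA computes
HPA driven by custom or external metrics:
- Can be combined with VPA
- The HPA scales on metrics such as QPS, which do not depend on resource requests

Recommended approach:

# HPA uses a custom metric
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: 100

# VPA manages resource requests
# (this does not affect the QPS metric the HPA uses)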

11. How do you drive the HPA from Prometheus metrics?

Reference answer:

1. Deploy Prometheus Adapter:

helm install prometheus-adapter prometheus-community/prometheus-adapter

2. Configure custom metric rules:

# prometheus-adapter configuration
rules:
  custom:
  - seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
    resources:
      overrides:
        kubernetes_namespace: {resource: "namespace"}
        kubernetes_pod_name: {resource: "pod"}
    name:
      as: "http_requests_per_second"
    metricsQuery: 'sum(rate(http_requests_total{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'

3. Create the HPA:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"

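12. Explain how the HPA's scaleTargetRef works.

Reference answer:
The HPA interacts with the target through its scale subresource.

1. Scale subresource definition:

// staging/src/k8s.io/api/autoscaling/v1/types.go (simplified, JSON tags omitted)
type Scale struct {
    metav1.TypeMeta
    metav1.ObjectMeta
    Spec   ScaleSpec
    Status ScaleStatus
}

type ScaleSpec struct {
    Replicas int32 // desired replica count
}

type ScaleStatus struct {
    Replicas int32  // observed replica count
    Selector string // label selector for the Pods, in string form
}

2. HPA operation flow:

// 1. Get the Scale object
gr := schema.GroupResource{Group: "apps", Resource: "deployments"}
scale, err := scaleClient.Scales(namespace).Get(ctx, gr, "my-app", metav1.GetOptions{})
if err != nil {
    return err
}

// 2. Set the new replica count
scale.Spec.Replicas = newReplicas

// 3. Update the Scale object
_, err = scaleClient.Scales(namespace).Update(ctx, gr, scale, metav1.UpdateOptions{})

3. Supported resources:

- Deployment
- ReplicaSet
- StatefulSet
- Any CRD that implements the scale subresource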

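Scenario Questions

13. The HPA reacts too slowly to a traffic spike. How do you tune it?

Reference answer:

1. Tune the scale-up policy:

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0  # no stabilization window
    selectPolicy: Max
    policies:
    - type: Percent
      value: 900      # add up to 900% per period (10x the current replicas)
      periodSeconds: 15
    - type: Pods
      value: 20       # add up to 20 Pods per period
      periodSeconds: 15

2. Shorten the sync period (default 15s):

kube-controller-manager --horizontal-pod-autoscaler-sync-period=10s

3. Use predictive scaling:

- Forecast from historical data
- Use KEDA or a custom controller

4. Keep warm replicas:

# set a reasonable minReplicas
minReplicas: 5  # rather than 1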

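14. The HPA keeps flapping between scale-up and scale-down. How do you fix it?

Reference answer:

1. Lengthen the stabilization window:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 600  # 10 minutes
  scaleUp:
    stabilizationWindowSeconds: 120  # 2 minutes

2. Adjust the tolerance:

kube-controller-manager --horizontal-pod-autoscaler-tolerance=0.2 # 20%

3. Use multiple metrics:

metrics:
# the CPU metric fluctuates a lot
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 70
# add a steadier metric
- type: Pods
  pods:
    metric:
      name: active_connections
    target:
      type: AverageValue
      averageValue: 100

4. Adjust the metric collection window:

- Use a longer collection window (for example 2m instead of 30s)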

15. How do you autoscale on queue length?

Reference answer:

Option 1: use KEDA:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-scaler
spec:
  scaleTargetRef:
    name: my-worker
  pollingInterval: 15
  cooldownPeriod: 30
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
  - type: rabbitmq
    metadata:
      host: amqp://guest:guest@rabbitmq:5672
      queueName: myqueue
      queueLength: "10"  # one replica per 10 messages

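Option 2: use an External metric:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-worker-hpa        # illustrative name
spec:
  scaleTargetRef:            # target reused from the KEDA example
    apiVersion: apps/v1
    kind: Deployment
    name: my-worker
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: queue_messages_ready
        selector:
          matchLabels:
            queue: "myqueue"
      target:
        type: AverageValue
        averageValue: 10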

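16. What is a sensible value for maxReplicas?

Reference answer:

Formula:

maxReplicas = ceil(expected peak replicas × 1.5)

expected peak replicas = ceil(expected peak QPS / per-Pod capacity)
The factor 1.5 is emergency headroom above the expected peak.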
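
As a worked example of this sizing rule, the sketch below computes maxReplicas from an expected peak load. The function name sizeMaxReplicas and the sample numbers are assumptions for illustration; the 1.5 headroom factor is the one given above.

```go
package main

import (
	"fmt"
	"math"
)

// sizeMaxReplicas estimates maxReplicas from the expected peak load, the load
// one Pod can handle, and a headroom factor (1.5 in the text above).
func sizeMaxReplicas(peakQPS, perPodQPS, headroom float64) int32 {
	peakReplicas := math.Ceil(peakQPS / perPodQPS)
	return int32(math.Ceil(peakReplicas * headroom))
}

func main() {
	// 5000 QPS at peak, 200 QPS per Pod: 25 Pods at peak, 38 with headroom.
	fmt.Println(sizeMaxReplicas(5000, 200, 1.5)) // 38
}
```

The result is only a starting point; the factors listed next (cluster capacity, budget, downstream limits) can force a lower cap.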

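Factors to consider:

Resource limits:

# check cluster resources
kubectl describe nodes | grep -A 5 "Allocated resources"

Budget limits:
- Work out the cost per Pod
- Use maxReplicas to cap the maximum cost
Downstream dependencies:
- Database connection limits
- External API rate limits
Startup time:
- If a Pod takes 2 minutes to start
- Consider scaling out ahead of time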
Best practice:

spec:
  minReplicas: 3       # minimum for high availability
  maxReplicas: 50      # sized to cluster capacity
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # leave 30% headroom