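Cloud Run / GKE Deployment: A Complete Guide to Cloud Deployment
A detailed walkthrough of deploying an ADK Go Agent on Google Cloud Run and GKE, covering logging configuration, metrics, traces, and secret management.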
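Cloud-native deployment is more than putting a container in the cloud: it means making full use of the platform's managed capabilities, such as autoscaling, load balancing, service discovery, secret management, and observability. Google Cloud offers a full spectrum of deployment options, from serverless (Cloud Run) to Kubernetes (GKE). Understanding how they differ and where each fits is what lets you choose the right architecture for an Agent system.
This article covers deployment practice on Cloud Run and GKE, including CI/CD pipelines, secret management, monitoring, and cost optimization.
Cloud Run Deployment: Serverless Simplicity
Cloud Run is Google Cloud's fully managed container platform. It abstracts away server management so that you only deal with container images. For Agent workloads, its request-based billing and autoscaling are attractive: with no traffic the instance count can drop to 0, and new instances are started automatically when requests arrive. Starting a new instance is not free, though: a cold start typically takes on the order of seconds, depending on image size and initialization work (see the cold start section below).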
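To make the scaling behavior concrete, a rough capacity estimate can help when choosing concurrency and max-instances values. The sketch below applies Little's law (in-flight requests ≈ arrival rate × average latency). The traffic numbers are hypothetical and `instancesNeeded` is an illustrative helper; the real autoscaler also reacts to CPU utilization, so treat the result as a sizing hint rather than a prediction.

```go
package main

import (
	"fmt"
	"math"
)

// instancesNeeded estimates how many instances are required to serve
// rps requests per second when each request takes avgLatencySec seconds
// and each instance handles up to concurrency requests at once.
// It applies Little's law: in-flight requests = rps * avgLatencySec.
func instancesNeeded(rps, avgLatencySec float64, concurrency int) int {
	if rps <= 0 || avgLatencySec <= 0 || concurrency <= 0 {
		return 0
	}
	inFlight := rps * avgLatencySec
	return int(math.Ceil(inFlight / float64(concurrency)))
}

func main() {
	// Hypothetical workload: 200 req/s, 4 s per LLM-backed request,
	// 100 concurrent requests per instance.
	fmt.Println(instancesNeeded(200, 4, 100)) // prints 8
}
```

If the estimate exceeds the configured maximum instance count, requests will queue or be rejected during peaks, so either the limit or the per-instance concurrency needs revisiting.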
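Build and Push the Image
# Note: Container Registry (gcr.io) has been superseded by Artifact Registry;
# new projects typically use REGION-docker.pkg.dev/$PROJECT_ID/REPO/my-agent instead.
# Option 1: Cloud Build (recommended; no local Docker required)
# `gcloud builds submit --tag` cannot pass Docker build args; use a
# cloudbuild.yaml (option 3) when the Dockerfile needs VERSION.
gcloud builds submit \
  --tag gcr.io/$PROJECT_ID/my-agent:v1.2.3 \
  .
# Option 2: build locally and push (requires Docker and `gcloud auth configure-docker`)
docker build \
  --build-arg VERSION=v1.2.3 \
  -t gcr.io/$PROJECT_ID/my-agent:v1.2.3 \
  -t gcr.io/$PROJECT_ID/my-agent:latest \
  .
docker push gcr.io/$PROJECT_ID/my-agent:v1.2.3
docker push gcr.io/$PROJECT_ID/my-agent:latest
# Option 3: Cloud Build config file (supports multi-step builds)
gcloud builds submit --config cloudbuild.yaml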
# cloudbuild.yaml
steps:
  # Build the image
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'build'
      - '--build-arg'
      - 'VERSION=${_VERSION}'
      - '--build-arg'
      - 'BUILD_TIME=${_BUILD_TIME}'
      - '--build-arg'
      - 'GIT_COMMIT=${SHORT_SHA}'
      - '-t'
      - 'gcr.io/$PROJECT_ID/my-agent:${_VERSION}'
      - '-t'
      - 'gcr.io/$PROJECT_ID/my-agent:latest'
      - '.'
  # Push the image
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/my-agent:${_VERSION}']
  # Security scan
  - name: 'gcr.io/cloud-builders/gcloud'
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        gcloud artifacts docker images scan \
          gcr.io/$PROJECT_ID/my-agent:${_VERSION} \
          --remote
  # Deploy to Cloud Run
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - 'run'
      - 'deploy'
      - 'my-agent'
      - '--image'
      - 'gcr.io/$PROJECT_ID/my-agent:${_VERSION}'
      - '--region'
      - 'asia-east1'
      - '--platform'
      - 'managed'
      - '--no-traffic' # deploy the revision without routing traffic until it is verified
substitutions:
  _VERSION: v1.2.3
  _BUILD_TIME: '2026-05-29T10:00:00Z'
images:
  - 'gcr.io/$PROJECT_ID/my-agent:${_VERSION}'
  - 'gcr.io/$PROJECT_ID/my-agent:latest'
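The build passes VERSION, BUILD_TIME, and GIT_COMMIT as Docker build args, but how the binary consumes them is not shown. One common approach, sketched here as an assumption about the Dockerfile, is to forward them to the Go linker with `-ldflags "-X ..."` so the running service can report which build it is. The variable names and the `buildInfo` helper are illustrative.

```go
package main

import "fmt"

// These values can be overridden at build time, for example:
//
//	go build -ldflags "-X main.version=v1.2.3 -X main.gitCommit=abc1234 -X main.buildTime=2026-05-29T10:00:00Z"
//
// The defaults below apply to a plain `go build` or `go run`.
var (
	version   = "dev"
	gitCommit = "unknown"
	buildTime = "unknown"
)

// buildInfo formats the injected values for startup logs or a /version endpoint.
func buildInfo() string {
	return fmt.Sprintf("%s (commit %s, built %s)", version, gitCommit, buildTime)
}

func main() {
	fmt.Println("starting my-agent", buildInfo())
}
```

Logging this string at startup makes it easy to confirm which revision is serving traffic after a deploy or rollback.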
Cloud Run Deployment Configuration
# Basic deployment
gcloud run deploy my-agent \
  --image gcr.io/$PROJECT_ID/my-agent:v1.2.3 \
  --platform managed \
  --region asia-east1 \
  --memory 1Gi \
  --cpu 1 \
  --concurrency 100 \
  --max-instances 10 \
  --min-instances 1 \
  --port 8080 \
  --timeout 300 \
  --set-env-vars "LOG_LEVEL=info,LOG_FORMAT=json,MAX_CONCURRENT=100" \
  --allow-unauthenticated
# Deploy from a YAML file (recommended, since it can be version-controlled)
gcloud run services replace service.yaml
# service.yaml - Cloud Run service configuration
# Note: `gcloud run services replace` does not expand shell variables,
# so substitute $PROJECT_ID before applying this file.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-agent
  annotations:
    run.googleapis.com/ingress: all # allow all inbound traffic
spec:
  template:
    metadata:
      annotations:
        # Autoscaling
        autoscaling.knative.dev/minScale: "1" # keep at least 1 instance (avoids cold starts)
        autoscaling.knative.dev/maxScale: "20" # at most 20 instances
        # Per-instance concurrency is set by containerConcurrency below
        # CPU allocation
        run.googleapis.com/cpu-throttling: "false" # CPU always allocated (suits long-lived connections)
        run.googleapis.com/startup-cpu-boost: "true" # extra CPU during startup
        # Execution environment (a revision-level setting): second generation
        run.googleapis.com/execution-environment: gen2
    spec:
      containerConcurrency: 100 # maximum concurrent requests per instance
      timeoutSeconds: 300 # request timeout: 5 minutes
      serviceAccountName: my-agent-sa@$PROJECT_ID.iam.gserviceaccount.com
      containers:
        - image: gcr.io/$PROJECT_ID/my-agent:v1.2.3
          ports:
            - containerPort: 8080
          env:
            - name: LOG_LEVEL
              value: "info"
            - name: LOG_FORMAT
              value: "json"
            - name: MAX_CONCURRENT
              value: "100"
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: redis-url
                  key: latest
            - name: GOOGLE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: google-api-key
                  key: latest
          resources:
            limits:
              cpu: "2"
              memory: "2Gi"
          startupProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 6 # must become ready within roughly 30 s
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            periodSeconds: 10
            failureThreshold: 3
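service.yaml sets both `containerConcurrency: 100` and a `MAX_CONCURRENT` environment variable, but how the application uses `MAX_CONCURRENT` is not shown. A minimal sketch, assuming it is meant as an in-process cap on in-flight requests: a buffered channel acts as a semaphore, and requests beyond the limit get a 503 instead of queuing. The names `limitConcurrency`, `maxConcurrentFromEnv`, and `demo` are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
	"strconv"
)

// maxConcurrentFromEnv reads MAX_CONCURRENT, falling back to def when the
// variable is unset or not a positive integer.
func maxConcurrentFromEnv(def int) int {
	n, err := strconv.Atoi(os.Getenv("MAX_CONCURRENT"))
	if err != nil || n <= 0 {
		return def
	}
	return n
}

// limitConcurrency allows at most max requests into next at the same time.
// Requests beyond the limit are rejected immediately with 503.
func limitConcurrency(max int, next http.Handler) http.Handler {
	slots := make(chan struct{}, max)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case slots <- struct{}{}:
			defer func() { <-slots }()
			next.ServeHTTP(w, r)
		default:
			w.Header().Set("Retry-After", "1")
			http.Error(w, "too many concurrent requests", http.StatusServiceUnavailable)
		}
	})
}

// demo occupies the single slot with one blocked request and reports the
// status codes seen by a second, concurrent request and by a later one.
func demo() (during, after int) {
	entered, release := make(chan struct{}), make(chan struct{})
	h := limitConcurrency(1, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/slow" {
			close(entered)
			<-release
		}
		w.WriteHeader(http.StatusOK)
	}))
	call := func(path string) int {
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
		return rec.Code
	}
	done := make(chan struct{})
	go func() { call("/slow"); close(done) }()
	<-entered // the slow request now holds the only slot
	during = call("/fast")
	close(release)
	<-done
	after = call("/fast")
	return during, after
}

func main() {
	during, after := demo()
	fmt.Println(during, after) // prints 503 200
	fmt.Println("configured limit:", maxConcurrentFromEnv(100))
}
```

Keeping the in-process limit at or below `containerConcurrency` means the platform, not the application, does most of the load shedding; the middleware then only guards against bursts.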
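Secret Management: Secret Manager
In production, never hard-code secrets in source code or in plain-text environment variable definitions; store them in Google Secret Manager instead:
# Create a secret
echo -n "your-api-key" | gcloud secrets create google-api-key --data-file=-
# Create a secret for the Redis connection string
echo -n "redis://10.0.0.3:6379/0" | gcloud secrets create redis-url --data-file=-
# List the versions of a secret
gcloud secrets versions list google-api-key
# Update a secret (adds a new version)
echo -n "new-api-key" | gcloud secrets versions add google-api-key --data-file=-
Referencing a secret in Cloud Run: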
# Option 1: inject as an environment variable
env:
  - name: GOOGLE_API_KEY
    valueFrom:
      secretKeyRef:
        name: google-api-key
        key: latest # or a specific version number
# Option 2: mount as a file (safer; avoids leaking through the environment)
volumeMounts:
  - name: secrets
    mountPath: /secrets
volumes:
  - name: secrets
    secret:
      secretName: google-api-key
      items:
        - key: latest
          path: api-key.txt
Reading the secret in Go code:
import (
	"context"
	"fmt"
	"os"

	secretmanager "cloud.google.com/go/secretmanager/apiv1"
	"cloud.google.com/go/secretmanager/apiv1/secretmanagerpb"
)

// getSecret returns the latest version of the named secret in projectID.
func getSecret(ctx context.Context, projectID, name string) (string, error) {
	client, err := secretmanager.NewClient(ctx)
	if err != nil {
		return "", err
	}
	defer client.Close()
	req := &secretmanagerpb.AccessSecretVersionRequest{
		Name: fmt.Sprintf("projects/%s/secrets/%s/versions/latest", projectID, name),
	}
	result, err := client.AccessSecretVersion(ctx, req)
	if err != nil {
		return "", err
	}
	return string(result.Payload.Data), nil
}

// initSecrets reads the secrets once at startup and exposes them to the
// rest of the process through environment variables.
func initSecrets(ctx context.Context, projectID string) error {
	apiKey, err := getSecret(ctx, projectID, "google-api-key")
	if err != nil {
		return fmt.Errorf("failed to get API key: %w", err)
	}
	os.Setenv("GOOGLE_API_KEY", apiKey)
	redisURL, err := getSecret(ctx, projectID, "redis-url")
	if err != nil {
		return fmt.Errorf("failed to get Redis URL: %w", err)
	}
	os.Setenv("REDIS_URL", redisURL)
	return nil
}
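When the secret is mounted as a file (option 2 above), the application reads it from disk rather than calling the Secret Manager API. A minimal sketch, assuming the `/secrets/api-key.txt` path from the volume example; `cleanSecret` and `readSecretFile` are hypothetical helpers. Trimming matters because a secret created with `echo` but without `-n` carries a trailing newline.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// cleanSecret strips surrounding whitespace and rejects empty values.
func cleanSecret(raw []byte) (string, error) {
	s := strings.TrimSpace(string(raw))
	if s == "" {
		return "", errors.New("secret is empty")
	}
	return s, nil
}

// readSecretFile loads a secret from a mounted file.
func readSecretFile(path string) (string, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return "", fmt.Errorf("read secret %s: %w", path, err)
	}
	return cleanSecret(raw)
}

func main() {
	apiKey, err := readSecretFile("/secrets/api-key.txt")
	if err != nil {
		fmt.Println("secret unavailable:", err)
		return
	}
	// Never log the value itself; the length is enough to confirm it loaded.
	fmt.Println("loaded API key, length", len(apiKey))
}
```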
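Cold Start Optimization
Cold starts are the main challenge on Cloud Run, especially for Agent applications that may need to load models or initialize connections:
# 1. Keep a minimum number of instances (avoids a full cold start)
gcloud run services update my-agent --min-instances 1
# 2. Use a startupProbe so traffic arrives only after the app is ready
# 3. Keep the container image lean (less to load and initialize at startup)
# 4. Choose the execution environment with cold starts in mind: gen2 performs
#    better under sustained load, but gen1 generally has shorter cold starts
gcloud run services update my-agent --execution-environment gen1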
// Speed up startup: defer initialization of non-critical components
func main() {
	// 1. Start the HTTP server first so the instance becomes ready quickly
	server := startServer()
	// 2. Initialize the remaining components in the background
	go func() {
		if err := initLLMClient(); err != nil {
			log.Printf("failed to init LLM client: %v", err)
		}
	}()
	go func() {
		if err := initRedis(); err != nil {
			log.Printf("failed to init Redis: %v", err)
		}
	}()
	// 3. Wait for a shutdown signal
	// ...
}
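One caveat with starting the server first: if `/health` answers 200 before the background initialization finishes, the startup probe passes and traffic can reach an instance whose LLM client or Redis connection is not ready. A minimal sketch of one way to handle this, assuming a hypothetical `readiness` type: the health handler returns 503 until every background task has reported in.

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// readiness counts background initializers that have not finished yet.
type readiness struct {
	pending atomic.Int32
}

// newReadiness expects n components to call done before reporting ready.
func newReadiness(n int32) *readiness {
	r := &readiness{}
	r.pending.Store(n)
	return r
}

// done marks one component as initialized.
func (r *readiness) done() { r.pending.Add(-1) }

// status is 200 once all components are initialized and 503 before that.
func (r *readiness) status() int {
	if r.pending.Load() <= 0 {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

// ServeHTTP lets the gate be mounted directly as the /health handler.
func (r *readiness) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(r.status())
}

func main() {
	ready := newReadiness(2) // two components: LLM client and Redis
	mux := http.NewServeMux()
	mux.Handle("/health", ready)

	fmt.Println(ready.status()) // prints 503: nothing is initialized yet
	ready.done()                // e.g. after initLLMClient succeeds
	ready.done()                // e.g. after initRedis succeeds
	fmt.Println(ready.status()) // prints 200
}
```

In this design a component whose initialization fails never calls `done`, so the instance stays unready and the startup probe eventually fails, which is usually preferable to serving requests with a missing dependency.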
GKE Deployment: The Full Control of Kubernetes
When you need finer-grained control (custom networking, persistent storage, complex scaling policies), GKE is the better choice.
Create the Cluster
# Create a Standard cluster
gcloud container clusters create my-cluster \
  --zone asia-east1-a \
  --machine-type e2-standard-2 \
  --num-nodes 3 \
  --enable-autoscaling \
  --min-nodes 2 \
  --max-nodes 10 \
  --enable-autorepair \
  --enable-autoupgrade \
  --disk-size 100GB \
  --disk-type pd-ssd \
  --workload-pool=$PROJECT_ID.svc.id.goog # enables Workload Identity
# Fetch credentials
gcloud container clusters get-credentials my-cluster --zone asia-east1-a
# Verify the connection
kubectl get nodes
Deployment Manifests
# k8s/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: agent-system
  labels:
    istio-injection: enabled # enables Istio sidecar injection (only takes effect if Istio is installed)
---
# k8s/serviceaccount.yaml
# Note: kubectl does not expand $PROJECT_ID; substitute it before applying.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-agent
  namespace: agent-system
  annotations:
    iam.gke.io/gcp-service-account: my-agent-sa@$PROJECT_ID.iam.gserviceaccount.com
---
# k8s/deployment.yaml
# Note: kubectl does not expand $PROJECT_ID; substitute it (for example with
# envsubst or kustomize) before applying.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-agent
  namespace: agent-system
  labels:
    app: my-agent
    version: v1.2.3
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1 # at most 1 extra Pod during an upgrade
      maxUnavailable: 0 # keep all replicas available during an upgrade
  selector:
    matchLabels:
      app: my-agent
  template:
    metadata:
      labels:
        app: my-agent
        version: v1.2.3
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: my-agent
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
      # Affinity: spread Pods across different nodes
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - my-agent
                topologyKey: kubernetes.io/hostname
      # Graceful shutdown
      terminationGracePeriodSeconds: 60
      containers:
        - name: agent
          image: gcr.io/$PROJECT_ID/my-agent:v1.2.3
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
            - name: metrics
              containerPort: 9090
              protocol: TCP
          env:
            - name: PORT
              value: "8080"
            - name: LOG_LEVEL
              value: "info"
            - name: LOG_FORMAT
              value: "json"
            - name: MAX_CONCURRENT
              value: "200"
            - name: MAX_SESSIONS
              value: "50000"
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: agent-secrets
                  key: redis-url
            - name: GOOGLE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: agent-secrets
                  key: google-api-key
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "2000m"
              memory: "2Gi"
          # Health checks
          startupProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 12 # allows roughly 60 s for startup
          livenessProbe:
            httpGet:
              path: /health
              port: http
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health
              port: http
            periodSeconds: 5
            failureThreshold: 3
          # Container security context
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          # Mount a writable temporary directory
          volumeMounts:
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: tmp
          emptyDir:
            sizeLimit: 100Mi
---
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-agent
  namespace: agent-system
  labels:
    app: my-agent
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 80
      targetPort: http
      protocol: TCP
    - name: metrics
      port: 9090
      targetPort: metrics
      protocol: TCP
  selector:
    app: my-agent
---
# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-agent
  namespace: agent-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-agent
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
---
# k8s/ingress.yaml
# Assumes the global static IP my-agent-ip and the FrontendConfig
# my-agent-frontend-config already exist; neither is defined in these manifests.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-agent
  namespace: agent-system
  annotations:
    kubernetes.io/ingress.class: gce
    kubernetes.io/ingress.global-static-ip-name: my-agent-ip
    networking.gke.io/managed-certificates: my-agent-cert
    networking.gke.io/v1beta1.FrontendConfig: my-agent-frontend-config
spec:
  rules:
    - host: my-agent.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-agent
                port:
                  number: 80
---
# k8s/certificate.yaml
apiVersion: networking.gke.io/v1
kind: ManagedCertificate
metadata:
  name: my-agent-cert
  namespace: agent-system
spec:
  domains:
    - my-agent.example.com
---
# k8s/secrets.yaml
# Placeholder values only: do not commit real secrets to version control.
apiVersion: v1
kind: Secret
metadata:
  name: agent-secrets
  namespace: agent-system
type: Opaque
stringData:
  redis-url: "redis://10.0.0.3:6379/0"
  google-api-key: "your-api-key"
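The Deployment sets `terminationGracePeriodSeconds: 60`, which only helps if the process reacts to SIGTERM. A minimal sketch of a graceful shutdown in Go, assuming a plain `net/http` server. The 50-second drain budget is an illustrative value chosen to stay below the 60-second grace period, and `serveUntilDone` and `selfCheck` are hypothetical helpers. The demo sends SIGTERM to itself so that it terminates on its own (Unix only).

```go
package main

import (
	"context"
	"errors"
	"log"
	"net"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// serveUntilDone serves on ln until ctx is cancelled, then stops accepting
// new connections and waits up to drain for in-flight requests to finish.
func serveUntilDone(ctx context.Context, srv *http.Server, ln net.Listener, drain time.Duration) error {
	errCh := make(chan error, 1)
	go func() { errCh <- srv.Serve(ln) }()

	select {
	case err := <-errCh:
		return err // the server failed before shutdown was requested
	case <-ctx.Done():
	}

	shutdownCtx, cancel := context.WithTimeout(context.Background(), drain)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		return err
	}
	if err := <-errCh; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

// selfCheck starts a server on a random local port, cancels the context
// right away, and reports whether the shutdown path completed cleanly.
func selfCheck() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return serveUntilDone(ctx, &http.Server{Handler: http.NotFoundHandler()}, ln, time.Second)
}

func main() {
	// Kubernetes sends SIGTERM when the Pod is being terminated.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
	defer stop()

	ln, err := net.Listen("tcp", "127.0.0.1:0") // use ":8080" inside the container
	if err != nil {
		log.Fatal(err)
	}

	// Demonstration only: deliver SIGTERM to this process after 100 ms,
	// standing in for the signal the kubelet would send.
	go func() {
		time.Sleep(100 * time.Millisecond)
		_ = syscall.Kill(os.Getpid(), syscall.SIGTERM)
	}()

	srv := &http.Server{Handler: http.NotFoundHandler()}
	if err := serveUntilDone(ctx, srv, ln, 50*time.Second); err != nil {
		log.Fatal(err)
	}
	log.Println("shutdown complete")
}
```

Combined with `maxUnavailable: 0` in the rolling update strategy, draining on SIGTERM lets in-flight Agent requests finish before a Pod is replaced.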
Deployment Commands
# Apply all manifests
kubectl apply -f k8s/
# Check the rollout status
kubectl get deployments -n agent-system
kubectl get pods -n agent-system -w
# Tail the Pod logs
kubectl logs -f deployment/my-agent -n agent-system
# Check the HPA status
kubectl get hpa -n agent-system
# Scale manually
kubectl scale deployment my-agent --replicas=5 -n agent-system
# Rolling update
kubectl set image deployment/my-agent agent=gcr.io/$PROJECT_ID/my-agent:v1.2.4 -n agent-system
# Roll back
kubectl rollout undo deployment/my-agent -n agent-system
# View the rollout history
kubectl rollout history deployment/my-agent -n agent-system
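When reading the `kubectl get hpa` output, it helps to know how the controller picks a replica count. The Kubernetes documentation gives the core rule as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with no change when the ratio is within a tolerance of 1 (0.1 by default). The sketch below applies that rule with the 70% CPU target and the 3 to 20 replica bounds from hpa.yaml. It ignores the `behavior` rate limits and the second (memory) metric, and `desiredReplicas` is an illustrative name.

```go
package main

import (
	"fmt"
	"math"
)

// tolerance is the default band around the target within which the HPA
// leaves the replica count unchanged.
const tolerance = 0.1

// desiredReplicas applies the core HPA rule for a single metric and clamps
// the result to [min, max].
func desiredReplicas(current int, currentUtil, targetUtil float64, min, max int) int {
	ratio := currentUtil / targetUtil
	desired := current
	if math.Abs(ratio-1) > tolerance {
		desired = int(math.Ceil(float64(current) * ratio))
	}
	if desired < min {
		desired = min
	}
	if desired > max {
		desired = max
	}
	return desired
}

func main() {
	fmt.Println(desiredReplicas(3, 140, 70, 3, 20))  // 6: CPU at twice the target
	fmt.Println(desiredReplicas(3, 73, 70, 3, 20))   // 3: within tolerance
	fmt.Println(desiredReplicas(10, 700, 70, 3, 20)) // 20: capped at maxReplicas
	fmt.Println(desiredReplicas(6, 20, 70, 3, 20))   // 3: floored at minReplicas
}
```

With several metrics configured, as in hpa.yaml, the controller computes a value per metric and uses the largest one.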
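Monitoring Configuration: Cloud-Native Observability
Cloud Logging (Structured Logs)
import (
	"context"
	"net/http"
	"os"
	"strings"

	"cloud.google.com/go/logging"
	"google.golang.org/genproto/googleapis/api/monitoredres"
)

// Handler carries the logger and the project ID needed to build trace names.
type Handler struct {
	logger    *logging.Logger
	projectID string
}

func setupCloudLogging(ctx context.Context, projectID string) (*logging.Logger, error) {
	client, err := logging.NewClient(ctx, projectID)
	if err != nil {
		return nil, err
	}
	// Entries are buffered; flush the logger (or close the client) on shutdown.
	logger := client.Logger("my-agent", logging.CommonResource(
		&monitoredres.MonitoredResource{
			Type: "cloud_run_revision",
			Labels: map[string]string{
				"service_name":  "my-agent",
				"revision_name": os.Getenv("K_REVISION"),
			},
		},
	))
	return logger, nil
}

// Using Cloud Logging
func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// The header has the form TRACE_ID/SPAN_ID;o=OPTIONS, while Entry.Trace
	// expects projects/PROJECT_ID/traces/TRACE_ID.
	var trace string
	if id, _, _ := strings.Cut(r.Header.Get("X-Cloud-Trace-Context"), "/"); id != "" {
		trace = "projects/" + h.projectID + "/traces/" + id
	}
	// Write a structured log entry
	h.logger.Log(logging.Entry{
		Severity: logging.Info,
		Payload: map[string]interface{}{
			"message":     "request received",
			"request_id":  r.Header.Get("X-Request-ID"),
			"user_id":     r.Header.Get("X-User-ID"),
			"path":        r.URL.Path,
			"method":      r.Method,
			"remote_addr": r.RemoteAddr,
		},
		Trace: trace, // links the entry to its trace
	})
	// ... handle the request ...
}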
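On Cloud Run, an alternative that avoids the client library is to write one JSON object per line to stdout: the platform's log collection maps special fields such as `severity`, `message`, and `logging.googleapis.com/trace` onto the log entry. A minimal sketch using only the standard library, where `traceName` and `logJSON` are hypothetical helpers and `my-project` is a placeholder project ID:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// traceName converts an X-Cloud-Trace-Context header value
// ("TRACE_ID/SPAN_ID;o=OPTIONS") into the resource name Cloud Logging
// expects ("projects/PROJECT_ID/traces/TRACE_ID"). It returns "" when the
// header is missing.
func traceName(projectID, header string) string {
	traceID, _, _ := strings.Cut(header, "/")
	if traceID == "" {
		return ""
	}
	return fmt.Sprintf("projects/%s/traces/%s", projectID, traceID)
}

// logJSON writes one structured entry as a single JSON line.
func logJSON(w io.Writer, severity, message, trace string, fields map[string]any) error {
	entry := map[string]any{"severity": severity, "message": message}
	if trace != "" {
		entry["logging.googleapis.com/trace"] = trace
	}
	for k, v := range fields {
		entry[k] = v
	}
	return json.NewEncoder(w).Encode(entry)
}

func main() {
	trace := traceName("my-project", "105445aa7843bc8bf206b12000100000/1;o=1")
	_ = logJSON(os.Stdout, "INFO", "request received", trace, map[string]any{
		"path":   "/chat",
		"method": "POST",
	})
}
```

This fits the `LOG_FORMAT=json` setting used in the deployment configuration and keeps logging free of network calls on the request path.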
Cloud Monitoring (Metrics)
import (
	"context"
	"fmt"
	"time"

	monitoring "cloud.google.com/go/monitoring/apiv3/v2"
	"cloud.google.com/go/monitoring/apiv3/v2/monitoringpb"
	metricpb "google.golang.org/genproto/googleapis/api/metric"
	"google.golang.org/genproto/googleapis/api/monitoredres"
	"google.golang.org/protobuf/types/known/timestamppb"
)

// recordMetric writes one data point for a custom latency metric.
func recordMetric(ctx context.Context, client *monitoring.MetricClient, projectID string, value float64) error {
	req := &monitoringpb.CreateTimeSeriesRequest{
		Name: fmt.Sprintf("projects/%s", projectID),
		TimeSeries: []*monitoringpb.TimeSeries{
			{
				Metric: &metricpb.Metric{
					Type: "custom.googleapis.com/agent/request_latency",
					Labels: map[string]string{
						"service": "my-agent",
					},
				},
				Resource: &monitoredres.MonitoredResource{
					Type:   "global",
					Labels: map[string]string{"project_id": projectID},
				},
				Points: []*monitoringpb.Point{
					{
						Interval: &monitoringpb.TimeInterval{
							EndTime: timestamppb.New(time.Now()),
						},
						Value: &monitoringpb.TypedValue{
							Value: &monitoringpb.TypedValue_DoubleValue{
								DoubleValue: value,
							},
						},
					},
				},
			},
		},
	}
	return client.CreateTimeSeries(ctx, req)
}
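Calling `recordMetric` once per request would write many points to the same time series in quick succession, and Cloud Monitoring limits how frequently a single series accepts points. A common pattern is to aggregate in memory and write one value per interval. A minimal sketch with a hypothetical `latencyAgg` type; reporting the mean (rather than a distribution) and the flush interval are assumptions:

```go
package main

import (
	"fmt"
	"sync"
)

// latencyAgg accumulates request latencies between flushes.
type latencyAgg struct {
	mu    sync.Mutex
	sumMs float64
	count int
}

// observe records the latency of one request in milliseconds.
func (a *latencyAgg) observe(ms float64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.sumMs += ms
	a.count++
}

// snapshot returns the mean latency and sample count since the previous
// snapshot, then resets the aggregator. ok is false when nothing was observed.
func (a *latencyAgg) snapshot() (meanMs float64, n int, ok bool) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.count == 0 {
		return 0, 0, false
	}
	meanMs, n = a.sumMs/float64(a.count), a.count
	a.sumMs, a.count = 0, 0
	return meanMs, n, true
}

func main() {
	agg := &latencyAgg{}
	agg.observe(100)
	agg.observe(300)
	mean, n, _ := agg.snapshot()
	fmt.Println(mean, n) // prints 200 2
	// In the service, a time.Ticker would call snapshot periodically (for
	// example once a minute) and pass the mean to the metric writer.
}
```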
Cloud Trace (Distributed Tracing)
// Note: the OpenCensus libraries used here are no longer maintained; new
// projects would normally use the OpenTelemetry SDK with a Cloud Trace exporter.
import (
	"context"

	"contrib.go.opencensus.io/exporter/stackdriver"
	"go.opencensus.io/trace"
)

func initTracing(projectID string) error {
	exporter, err := stackdriver.NewExporter(stackdriver.Options{
		ProjectID: projectID,
	})
	if err != nil {
		return err
	}
	trace.RegisterExporter(exporter)
	trace.ApplyConfig(trace.Config{
		DefaultSampler: trace.ProbabilitySampler(0.1), // 10% sampling rate
	})
	return nil
}

// Using tracing while handling a request
func handleRequest(ctx context.Context, req *Request) {
	ctx, span := trace.StartSpan(ctx, "agent.handleRequest")
	defer span.End()
	span.AddAttributes(
		trace.StringAttribute("request_id", req.ID),
		trace.StringAttribute("user_id", req.UserID),
	)
	// Child span: the LLM call
	ctx, llmSpan := trace.StartSpan(ctx, "llm.call")
	resp, err := llmClient.Generate(ctx, req.Input)
	llmSpan.End()
	if err != nil {
		span.SetStatus(trace.Status{Code: trace.StatusCodeInternal, Message: err.Error()})
		return
	}
	span.AddAttributes(trace.Int64Attribute("response_length", int64(len(resp))))
}
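Troubleshooting Common Problems
Q: Cloud Run cold starts are slow
Root causes:
- The container image is large and slow to load
- The application does too much work during initialization
- min-instances is not configured, so instances start only when a request arrives
- The second-generation execution environment is in use (it generally has longer cold starts than the first generation)
Optimizations:
# 1. Configure a minimum number of instances (a cost vs. latency trade-off)
gcloud run services update my-agent --min-instances 1
# 2. If cold-start latency matters more than gen2's features, try the first-generation environment
gcloud run services update my-agent --execution-environment gen1
# 3. Reduce the image size (for example with a distroless base image)
# 4. Defer initialization of non-critical components
# 5. Use a startupProbe to confirm readiness
Cost impact:
- min-instances=0: no cost while idle, but requests that trigger a cold start incur its delay (5-30 s)
- min-instances=1: one instance is always running, so the first request avoids a cold start, but the instance is billed continuously
- min-instances=3: suits high-availability scenarios; the idle baseline costs 3 times that of min-instances=1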
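Q: A GKE Pod cannot be scheduled
Diagnosis:
# Inspect the Pod
kubectl describe pod my-agent-xxx -n agent-system
# List events
kubectl get events -n agent-system --sort-by='.lastTimestamp'
# Check node resources
kubectl top nodes
kubectl describe node
# Find Pending Pods and where the others were placed
kubectl get pods -n agent-system -o wide --field-selector=status.phase=Pending
Common causes and fixes:
| Cause | Symptom | Fix |
|---|---|---|
| Insufficient resources | Insufficient cpu / Insufficient memory | Scale up the node pool or lower requests |
| Image pull failure | ImagePullBackOff (the Pod is scheduled but cannot start) | Check the image name, permissions, and network |
| Node taints | PodToleratesNodeTaints (newer clusters report "untolerated taint") | Add tolerations |
| Affinity conflict | MatchNodeSelector (newer clusters report "didn't match Pod's node affinity/selector") | Adjust the affinity configuration |
| PVC not bound | UnboundImmediatePVC ("pod has unbound immediate PersistentVolumeClaims") | Check the StorageClass |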
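Q: How should secrets be passed securely?
Comparison of options:
| Option | Security | Complexity | Use case |
|---|---|---|---|
| Environment variables | Low | Low | Development and testing |
| Kubernetes Secret | Medium | Medium | Basic GKE setups |
| Secret Manager | High | Medium | Recommended for Cloud Run |
| Workload Identity (with Secret Manager) | Highest | High | GKE production |
| HashiCorp Vault | Highest | High | Multi-cloud environments |
Workload Identity configuration (GKE best practice). Workload Identity does not store secrets itself; it lets Pods authenticate as a GCP service account, which is then granted access to Secret Manager:
# 1. Create the GCP service account
gcloud iam service-accounts create my-agent-sa \
  --display-name="My Agent Service Account"
# 2. Grant access to the secret in Secret Manager (repeat for redis-url)
gcloud secrets add-iam-policy-binding google-api-key \
  --member="serviceAccount:my-agent-sa@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"
# 3. Bind the Kubernetes ServiceAccount to the GCP service account
gcloud iam service-accounts add-iam-policy-binding \
  my-agent-sa@$PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:$PROJECT_ID.svc.id.goog[agent-system/my-agent]"
# 4. Annotate the ServiceAccount in Kubernetes (see serviceaccount.yaml above)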
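Cost Optimization Strategies
Cloud Run Cost Optimization
# 1. Cap the maximum number of instances
gcloud run services update my-agent --max-instances 10
# 2. Tune concurrency (each instance serves more requests)
gcloud run services update my-agent --concurrency 100
# 3. Size memory sensibly (do not over-provision)
gcloud run services update my-agent --memory 512Mi
# 4. Set a request timeout (prevents requests from holding an instance too long)
gcloud run services update my-agent --timeout 300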
GKE Cost Optimization
# Use Spot VMs (can save 60-90%)
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-spot: "true"
      tolerations:
        - key: cloud.google.com/gke-spot
          operator: Equal
          value: "true"
          effect: NoSchedule
# Enable cluster autoscaling
gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --min-nodes 1 \
  --max-nodes 10
# Use the e2-medium machine type (a low-cost, shared-core option; check that
# the Pod's CPU and memory requests still fit on a node of this size)
gcloud container node-pools create spot-pool \
  --cluster my-cluster \
  --machine-type e2-medium \
  --spot \
  --num-nodes 1 \
  --enable-autoscaling \
  --min-nodes 0 \
  --max-nodes 10
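Summary
Module 7 is complete. It covered:
- The deeper mechanics of the Agent Runtime architecture
- End-to-end production practice for CLI deployment
- High-availability options for Web UI deployment
- Security and optimization for Docker containerization
- Cloud deployment strategies for Cloud Run and GKE
Next up is Module 8: the A2A protocol, or how Agents communicate with each other.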
