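OpenTelemetry Provider (OTLP Traces + Metrics)

This Provider is built on the OpenTelemetry Go SDK. It initializes the global Tracer and Meter, supports OTLP export over gRPC or HTTP, and collects runtime metrics.

Configuration (config.toml)

```
[OTEL]
ServiceName = "my-service"
Version = "1.0.0"
Env = "dev"

# Export endpoint (choose one)
EndpointGRPC = "otel-collector:4317"
EndpointHTTP = "otel-collector:4318"

# Authentication (optional)
Token = "Bearer <token>" # a bare token also works; the Provider adds the Bearer prefix automatically

# Security (optional)
InsecureGRPC = true # whether the gRPC exporter uses an insecure connection
InsecureHTTP = true # whether the HTTP exporter uses an insecure connection

# Sampling (optional)
Sampler = "always" # always|ratio
SamplerRatio = 0.1 # only takes effect when Sampler = "ratio"; range 0..1

# Batching (optional, milliseconds)
BatchTimeoutMs = 5000
ExportTimeoutMs = 10000
MaxQueueSize = 2048
MaxExportBatchSize = 512

# Metrics (optional, milliseconds)
MetricReaderIntervalMs = 10000 # metric export interval
RuntimeReadMemStatsIntervalMs = 5000 # runtime metrics read interval
```

Enabling

```
import "test/providers/otel"

func providers() container.Providers {
	return container.Providers{
		otel.DefaultProvider(),
	}
}
```

Usage

- Traces: obtain the global Tracer from `go.opentelemetry.io/otel`, or use the wrappers in this repository's `providers/otel/funcs.go`.

```
ctx, span := otel.Tracer("my-service").Start(ctx, "my-op")
// ...
span.End()
```

- Metrics: create instruments through `otel.Meter("my-service")`, or use the convenience functions in `providers/otel/funcs.go`.

Differences from the Tracing Provider and when to use each

- The Tracing Provider (Jaeger + OpenTracing) handles traces only and suits projects that already use OpenTracing.
- The OTEL Provider (OpenTelemetry) unifies traces and metrics and connects to the OTLP ecosystem. It suits new projects and teams that want unified observability.
- The two can be mixed at first: keep Jaeger for traces, enable OTEL runtime metrics alongside it, and migrate gradually.

Quick start (local Collector)

A minimal docker-compose file:

```
services:
  otel-collector:
    image: otel/opentelemetry-collector:0.104.0
    command: ["--config=/etc/otelcol-config.yml"]
    volumes:
      - ./otelcol-config.yml:/etc/otelcol-config.yml:ro
    ports:
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
```

Example otelcol-config.yml:

```
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317 # listen on all interfaces so the mapped port is reachable from the host
      http:
        endpoint: 0.0.0.0:4318

exporters:
  debug:
    verbosity: detailed

processors:
  batch:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```

Application side:

```
[OTEL]
EndpointGRPC = "127.0.0.1:4317"
InsecureGRPC = true
```

Failures and degradation

- Collector or network failure: the OTEL SDK exports in asynchronous batches and does not block business logic. Spans and metric data points may be dropped.
- Startup failure: an initialization error prevents the application from starting. If startup must succeed even when the Collector is unreachable, a switch that degrades to a no-op can be added (to be implemented as needed).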
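
The degrade-to-no-op switch mentioned under startup failure could look roughly like the sketch below. It is not part of the Provider: `initTelemetry`, `shutdownFunc`, `allowDegraded`, and `failingSetup` are hypothetical names used only for illustration, and the real setup function would be whatever the Provider runs to build its exporters. The idea relies on the OpenTelemetry API falling back to no-op implementations when no SDK provider has been registered, so skipping registration is enough to degrade.

```go
package main

import (
	"errors"
	"log"
)

// shutdownFunc releases telemetry resources. It does nothing when
// telemetry has been degraded to a no-op. (Hypothetical type.)
type shutdownFunc func() error

// initTelemetry runs the given setup function. If setup fails and
// allowDegraded is true, the error is logged and a no-op shutdown is
// returned so the application can still start. If allowDegraded is
// false, the error is returned and the caller is expected to abort.
func initTelemetry(setup func() (shutdownFunc, error), allowDegraded bool) (shutdownFunc, error) {
	shutdown, err := setup()
	if err == nil {
		return shutdown, nil
	}
	if !allowDegraded {
		return nil, err
	}
	log.Printf("otel: init failed, continuing without telemetry: %v", err)
	return func() error { return nil }, nil
}

// failingSetup stands in for a real setup function whose Collector
// cannot be reached. (Hypothetical, for demonstration only.)
func failingSetup() (shutdownFunc, error) {
	return nil, errors.New("collector unreachable")
}
```

With `allowDegraded` set to true, `initTelemetry(failingSetup, true)` returns a usable no-op shutdown and a nil error. With it set to false, the original error is returned, which matches the current behavior of blocking startup.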
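
The configuration section notes that `Token` may be given with or without the `Bearer` prefix. A minimal sketch of that kind of normalization follows. The function name `normalizeToken`, the case-insensitive prefix check, and the handling of empty input are assumptions made for illustration; the Provider's actual implementation may differ.

```go
package main

import "strings"

// normalizeToken returns a value suitable for an Authorization header.
// Assumptions (not confirmed by the Provider source): surrounding
// whitespace is trimmed, an empty token stays empty, and an existing
// "Bearer " prefix is detected case-insensitively and left untouched.
func normalizeToken(token string) string {
	token = strings.TrimSpace(token)
	if token == "" {
		return ""
	}
	const prefix = "Bearer "
	if len(token) >= len(prefix) && strings.EqualFold(token[:len(prefix)], prefix) {
		return token
	}
	return prefix + token
}
```

Under these assumptions, both `normalizeToken("abc")` and `normalizeToken("Bearer abc")` yield `Bearer abc`, so either form in config.toml produces the same header value.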