# OpenTelemetry Observability

Karrio includes built-in support for OpenTelemetry, providing distributed tracing, metrics collection, and log correlation across all services. This enables comprehensive monitoring and debugging of your shipping operations.
## Overview

OpenTelemetry automatically instruments Karrio to provide:

- **Distributed Tracing**: Follow requests across API and worker services
- **Performance Metrics**: HTTP response times, database query performance, error rates
- **Log Correlation**: Logs automatically include trace and span IDs for easy debugging
- **Custom Spans**: Business logic operations such as rate calculations and carrier API calls
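Log correlation means every log line carries the IDs of the trace and span that produced it, so a log entry can be jumped to directly from a trace. The stdlib-only sketch below illustrates the pattern with a `logging.Filter`; the hard-coded IDs are stand-ins for the real values that OpenTelemetry's logging instrumentation injects automatically:

```python
import logging

class TraceContextFilter(logging.Filter):
    """Attach trace/span IDs to every record so they can appear in
    the log format (IDs here are hard-coded for illustration)."""
    def __init__(self, trace_id: str, span_id: str):
        super().__init__()
        self.trace_id, self.span_id = trace_id, span_id

    def filter(self, record):
        record.otelTraceID = self.trace_id
        record.otelSpanID = self.span_id
        return True  # never drop the record, only annotate it

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    "%(levelname)s trace=%(otelTraceID)s span=%(otelSpanID)s %(message)s"))
logger = logging.getLogger("karrio.demo")
logger.addHandler(handler)
logger.addFilter(TraceContextFilter(
    "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7"))
logger.warning("rate request failed")  # log line now carries trace context
```

With real instrumentation the filter is unnecessary: the SDK injects the IDs of the currently active span into each record.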
## Supported Backends

Karrio can export telemetry data to various OpenTelemetry-compatible backends:

### Distributed Tracing Systems

- **Jaeger**: Popular open-source distributed tracing system
  - gRPC endpoint: `http://jaeger:4317`
  - HTTP endpoint: `http://jaeger:4318`
- **Zipkin**: Another popular distributed tracing system
  - Endpoint: `http://zipkin:9411/api/v2/spans`
- **Grafana Tempo**: Cloud-native distributed tracing backend
  - OTLP endpoint: `http://tempo:4317`
  - Native OTLP support with excellent Grafana integration
### Metrics and Monitoring

- **Prometheus**: Time-series metrics database via OTLP metrics export
  - Use with an OpenTelemetry Collector configured for Prometheus export
  - Endpoint: `http://otel-collector:4317` → Prometheus scrape endpoint
- **Grafana**: Visualization platform supporting multiple data sources
  - Connect to Prometheus for metrics visualization
  - Connect to Jaeger/Tempo for distributed tracing
  - Unified observability dashboards
### Logging Systems

- **Loki**: Log aggregation system by Grafana Labs
  - Collects logs with trace correlation via OpenTelemetry logging instrumentation
  - Integrates seamlessly with Grafana for log visualization
- **Elasticsearch/ELK Stack**: Enterprise logging solution
  - Use the OpenTelemetry Collector with the Elasticsearch exporter
  - Full-text search and log analytics
### Error Tracking and APM

- **Sentry**: Error tracking and performance monitoring
  - Already integrated in Karrio, and supports OTLP traces
  - Automatic error correlation with distributed traces
- **OTLP Collector**: OpenTelemetry Collector for routing to multiple backends
  - Central hub for telemetry data routing and processing
  - Default endpoint: `http://otel-collector:4317`
### Cloud Providers

- **AWS**
  - AWS X-Ray: Use the OTLP collector with the X-Ray exporter
  - CloudWatch: Metrics and logs via the OTLP collector
- **Google Cloud**
  - Cloud Trace: Configure with the appropriate headers
  - Cloud Monitoring: Metrics via the OTLP protocol
- **Azure**
  - Application Insights: Use a connection string for OTLP export
  - Azure Monitor: Full observability suite
## Quick Setup Options

### Option 1: Jaeger (Simple Tracing)

Perfect for development and getting started with distributed tracing.

1. Start Jaeger:

   ```bash
   docker compose -f docker-compose.yml -f docker-compose.otel.yml up -d
   ```

2. Configure Karrio in `.env`:

   ```bash
   OTEL_ENABLED=true
   OTEL_SERVICE_NAME=karrio-api
   OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317
   OTEL_EXPORTER_OTLP_PROTOCOL=grpc
   OTEL_ENVIRONMENT=development
   ```

3. Access the Jaeger UI:
   - URL: http://localhost:16686
   - Select the `karrio-api` service
   - View traces and performance data
### Option 2: Complete Grafana Stack

A full observability solution with metrics, logs, and tracing.

1. Start the full stack:

   ```bash
   docker compose -f docker-compose.yml -f docker-compose.observability.yml up -d
   ```

2. Configuration (automatic via Docker Compose):
   - The OpenTelemetry Collector routes data to all backends
   - Karrio is automatically configured to send to the collector

3. Access the dashboards:
   - Grafana: http://localhost:3000 (admin/admin)
   - Prometheus: http://localhost:9090
   - Tempo: http://localhost:3200
   - Loki: http://localhost:3100
## Configuration Examples

### Basic Configuration

Enable OpenTelemetry with Jaeger:

```bash
# Required settings
OTEL_ENABLED=true
OTEL_SERVICE_NAME=karrio-api
OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317

# Optional settings
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_ENVIRONMENT=production
OTEL_RESOURCE_ATTRIBUTES=team=logistics,region=us-west-2
```
### Grafana Stack Configuration

Route all telemetry through the OpenTelemetry Collector:

```bash
# OpenTelemetry with the Grafana Stack via the OTLP Collector
OTEL_ENABLED=true
OTEL_SERVICE_NAME=karrio-api
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_ENVIRONMENT=production
OTEL_RESOURCE_ATTRIBUTES=team=logistics,region=us-west-2
```
### Metrics-Only Configuration

Send only metrics to Prometheus:

```bash
# Direct Prometheus metrics export via the OTLP Collector
OTEL_ENABLED=true
OTEL_SERVICE_NAME=karrio-api
OTEL_EXPORTER_OTLP_ENDPOINT=http://prometheus-otel-collector:4317
OTEL_METRICS_EXPORTER=otlp
# Disable traces if only metrics are needed
OTEL_TRACES_EXPORTER=none
```
### Grafana Tempo Configuration

Send traces directly to Grafana Tempo:

```bash
# Direct to Grafana Tempo
OTEL_ENABLED=true
OTEL_SERVICE_NAME=karrio-api
OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_ENVIRONMENT=production
```
### Sentry with OpenTelemetry

Combine the existing Sentry integration with OTLP traces:

```bash
# Sentry already configured + OTLP traces
SENTRY_DSN=https://your-dsn@sentry.io/project-id
OTEL_ENABLED=true
OTEL_SERVICE_NAME=karrio-api
OTEL_EXPORTER_OTLP_ENDPOINT=https://your-org.sentry.io/api/project-id/envelope/
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer your-auth-token
```
### Cloud Provider Configuration

Examples for cloud providers requiring authentication:

```bash
# AWS X-Ray via the OTLP Collector
OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://aws-otel-collector:4317
OTEL_RESOURCE_ATTRIBUTES=aws.region=us-west-2,service.namespace=karrio

# Google Cloud Trace
OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=https://cloudtrace.googleapis.com:443
OTEL_EXPORTER_OTLP_HEADERS=x-goog-api-key=your-api-key

# Generic cloud provider with authentication
OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.your-provider.com:4317
OTEL_EXPORTER_OTLP_HEADERS=api-key=your-api-key,tenant-id=your-tenant
```
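`OTEL_EXPORTER_OTLP_HEADERS` (and `OTEL_RESOURCE_ATTRIBUTES`) use a comma-separated `key=value` format. A stdlib-only sketch of how such a value can be parsed, for reference when debugging header settings (the OpenTelemetry SDK does this internally):

```python
def parse_otlp_headers(value: str) -> dict:
    """Parse a comma-separated key=value list, as used by
    OTEL_EXPORTER_OTLP_HEADERS and OTEL_RESOURCE_ATTRIBUTES."""
    headers = {}
    for pair in value.split(","):
        if not pair.strip():
            continue  # tolerate trailing commas
        # partition on the FIRST '=' so values like
        # "Bearer your-auth-token" survive intact
        key, _, val = pair.partition("=")
        headers[key.strip()] = val.strip()
    return headers

print(parse_otlp_headers("api-key=your-api-key,tenant-id=your-tenant"))
```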
## What Gets Traced

When OpenTelemetry is enabled, Karrio automatically instruments:

### HTTP Operations

- **Incoming API Requests**: All REST and GraphQL endpoints
- **Response Times**: Request duration and status codes
- **Client IP and User Agent**: Request metadata
- **Error Tracking**: Failed requests with stack traces
### Database Operations

- **SQL Queries**: PostgreSQL queries with execution time
- **Connection Pool**: Database connection metrics
- **Query Parameters**: Parameterized queries (sanitized)
- **Transaction Tracking**: Database transaction spans

### Cache Operations

- **Redis Commands**: Cache hits/misses and command execution
- **Cache Keys**: Key patterns and access frequency
- **Performance Metrics**: Cache response times
### Background Tasks

- **Task Processing**: Background job execution times
- **Queue Metrics**: Task queue depth and processing rates
- **Error Tracking**: Failed background tasks
- **Context Propagation**: Traces follow requests from the API to the worker
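Context propagation works by passing the trace context between processes, typically as a W3C `traceparent` header attached to the task message. A minimal stdlib-only sketch of building and parsing that header (illustrative; in practice OpenTelemetry's propagators handle this):

```python
import secrets

def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build a W3C traceparent header: version-traceid-spanid-flags."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"

def parse_traceparent(header: str) -> dict:
    """Split a traceparent header back into its four fields."""
    version, trace_id, span_id, flags = header.split("-")
    return {"version": version, "trace_id": trace_id,
            "span_id": span_id, "sampled": flags == "01"}

# e.g. inject the current context into a background task's headers
trace_id = secrets.token_hex(16)  # 32 hex chars
span_id = secrets.token_hex(8)    # 16 hex chars
header = make_traceparent(trace_id, span_id)
print(header)
```

The worker extracts the same header on the receiving side, so its spans join the trace started by the API request.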
### External API Calls

- **Shipping Carriers**: API calls to UPS, FedEx, USPS, etc.
- **Rate Shopping**: Multiple carrier rate requests
- **Webhooks**: Outbound webhook delivery
- **Third-party Services**: External service integrations

### Business Logic

- **Rate Calculations**: Shipping cost computation spans
- **Address Validation**: Address verification operations
- **Document Generation**: Label and invoice generation
- **Workflow Processing**: Multi-step shipping operations
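Business-logic spans like these are typically created with a context manager wrapped around the operation. The stdlib-only sketch below mimics that pattern (the real API is OpenTelemetry's `tracer.start_as_current_span`; the span name and attributes here are hypothetical):

```python
import time
from contextlib import contextmanager

@contextmanager
def span(name, attributes=None):
    """Toy stand-in for tracer.start_as_current_span: records a
    name, attributes, and duration for the wrapped operation."""
    record = {"name": name, "attributes": attributes or {}}
    start = time.perf_counter()
    try:
        yield record
    finally:
        # duration is recorded even if the body raises
        record["duration_ms"] = (time.perf_counter() - start) * 1000

# Hypothetical rate-calculation step wrapped in a span
with span("rates.calculate", {"carrier": "ups"}) as s:
    average = sum((10.5, 12.0, 9.75)) / 3  # placeholder computation
print(s["name"], f"{s['duration_ms']:.2f} ms")
```

The same shape applies to any of the operations listed above: one span per logical step, with attributes carrying the business context.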
## Viewing and Using Traces

### Jaeger UI

- **Service Selection**: Choose `karrio-api` or `karrio-worker`
- **Operation Filtering**: Filter by endpoint or operation type
- **Time Range**: Select a time period to analyze
- **Trace Analysis**: Click a trace to see its detailed span breakdown
- **Performance Insights**: Identify slow operations and bottlenecks
### Grafana Dashboard

- **Service Overview**: High-level service health metrics
- **Distributed Tracing**: Tempo integration for trace exploration
- **Log Correlation**: Loki logs automatically linked to traces
- **Custom Dashboards**: Create business-specific monitoring views
- **Alerting**: Set up alerts on SLA violations or error rates
### Troubleshooting with Traces

**High Response Times:**

- Identify slow database queries
- Find external API bottlenecks
- Analyze the request processing pipeline

**Error Investigation:**

- Trace error propagation across services
- Correlate errors with specific user actions
- Identify the root cause in distributed operations

**Performance Optimization:**

- Find the most expensive operations
- Identify N+1 query problems
- Optimize carrier API usage patterns
## Production Considerations

### Sampling

For high-traffic deployments, configure sampling to reduce overhead:

```bash
# Sample 10% of traces in production
OTEL_TRACES_SAMPLER=traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1
```
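The `traceidratio` sampler keeps a trace when its (random) trace ID falls below a bound derived from the ratio, so every service that sees the same trace ID makes the same keep/drop decision. A stdlib-only sketch of the idea (illustrative, not the SDK's exact implementation):

```python
import random

def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep the trace if the low 64 bits of the trace ID fall
    below ratio * 2**64; deterministic per trace ID."""
    bound = round(ratio * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound

# Over many random 128-bit trace IDs, roughly `ratio` are kept
rng = random.Random(42)
ids = [rng.getrandbits(128) for _ in range(100_000)]
kept = sum(should_sample(i, 0.1) for i in ids)
print(f"kept {kept / len(ids):.1%} of traces")  # close to 10%
```

Because the decision is a pure function of the trace ID, no coordination between services is needed to get consistent sampling across a distributed trace.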
### Resource Attributes

Add meaningful metadata for production monitoring:

```bash
OTEL_RESOURCE_ATTRIBUTES=\
service.namespace=karrio,\
service.instance.id=api-01,\
deployment.environment=production,\
team=logistics,\
region=us-west-2,\
cluster=prod-west
```
### Security

Ensure secure transmission of telemetry data:

```bash
# Use TLS for production endpoints
OTEL_EXPORTER_OTLP_ENDPOINT=https://secure-collector.company.com:4317

# Include authentication headers
OTEL_EXPORTER_OTLP_HEADERS=authorization=Bearer your-secure-token
```
## Performance Impact

OpenTelemetry is designed to have minimal performance impact:

- **CPU Overhead**: < 5% in typical workloads
- **Memory Overhead**: ~50 MB for the instrumentation libraries
- **Network**: Batched export reduces bandwidth usage
- **Sampling**: Reduces overhead in high-traffic scenarios
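The network saving comes from batching: finished spans are buffered and exported in groups rather than one network call each. A stdlib-only sketch of that pattern (illustrative; the SDK's `BatchSpanProcessor` also flushes on a timer and bounds the queue size):

```python
class BatchExporter:
    """Buffer finished spans and flush them in batches, cutting
    export traffic from one call per span to one call per batch."""
    def __init__(self, batch_size=512):
        self.batch_size = batch_size
        self.buffer = []
        self.export_calls = 0

    def on_end(self, span):
        self.buffer.append(span)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.export_calls += 1  # one network call per batch
            self.buffer.clear()

exporter = BatchExporter(batch_size=100)
for i in range(1_000):
    exporter.on_end(f"span-{i}")
exporter.flush()  # drain any partial final batch
print(exporter.export_calls)  # 10 batches instead of 1,000 calls
```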
This comprehensive OpenTelemetry integration provides full visibility into your Karrio deployment, enabling proactive monitoring, fast troubleshooting, and performance optimization of your shipping operations.