Observability

cronix trigger emits an OpenTelemetry trace per fire. The trace shape is locked by D-037; this page is the operator-facing reference. Wire any OTLP backend (Honeycomb, Tempo, Datadog, Jaeger, an OpenTelemetry Collector) and you get a coherent picture of fire-and-handler behavior without writing any glue.

Enabling traces

cronix trigger --otel <app>.<job>

The flag opts into OTel emission. Configuration follows the standard OTel SDK environment variables:

export OTEL_EXPORTER_OTLP_ENDPOINT=https://otel.example.com
export OTEL_EXPORTER_OTLP_HEADERS="authorization=Bearer ..."
export OTEL_SERVICE_NAME=cronix-trigger
export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=prod,service.namespace=ops"

The default is safe: if OTEL_EXPORTER_OTLP_ENDPOINT is unset, --otel is a no-op and the shim runs as if the flag weren’t there.
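A wrapper script can check up front whether a fire would emit anything. The sketch below only restates the rule above as a predicate. The function name otel_would_emit is hypothetical, and treating an empty value like an unset one is an assumption, not documented cronix behavior.

```python
import os
from typing import Mapping, Optional

ENDPOINT_VAR = "OTEL_EXPORTER_OTLP_ENDPOINT"


def otel_would_emit(flag_passed: bool,
                    env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only if --otel was passed and the endpoint variable is set."""
    env = os.environ if env is None else env
    # Assumption: an empty or whitespace-only value counts as "unset".
    return flag_passed and bool(env.get(ENDPOINT_VAR, "").strip())


if __name__ == "__main__":
    print(otel_would_emit(True))
```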

Trace shape

Every fire produces one root span and N attempt spans as children, where N is the number of HTTP attempts the retry policy ran: 1 when the first attempt succeeds or is rejected with a 4xx, up to max_attempts when retries are exhausted.

cronix.trigger.fire (root)
├─ cronix.trigger.lock (only if concurrency_scope: global)
├─ cronix.trigger.attempt (attempt 1)
├─ cronix.trigger.attempt (attempt 2 — if retried)
└─ cronix.trigger.attempt (attempt N)

Host-scope flock acquisitions add a span event (cronix.lock.acquired) to the root span instead of a child span — they’re <1ms and don’t warrant the storage overhead.
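When asserting on exported traces in a test, it can help to compute the span names a fire should produce. This is a minimal sketch of the shape described above. The helper name expected_span_names is hypothetical.

```python
from typing import List


def expected_span_names(attempts_made: int, concurrency_scope: str) -> List[str]:
    """List the span names one fire should produce, root first."""
    if attempts_made < 0:
        raise ValueError("attempts_made must be >= 0")
    if concurrency_scope not in ("host", "global"):
        raise ValueError("concurrency_scope must be 'host' or 'global'")

    names = ["cronix.trigger.fire"]
    # Only global scope gets a lock child span; host scope records a
    # cronix.lock.acquired event on the root span instead.
    if concurrency_scope == "global":
        names.append("cronix.trigger.lock")
    # One attempt span per HTTP attempt the retry policy ran.
    names.extend(["cronix.trigger.attempt"] * attempts_made)
    return names
```

For example, a globally scoped fire that succeeded on its second attempt should yield the root span, one lock span, and two attempt spans.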

cronix.trigger.fire — the root span

Covers the full fire, lock acquisition through final HTTP response (or retry exhaustion).

| Attribute | Type | Notes |
| --- | --- | --- |
| cronix.app | string | App ID from the job spec |
| cronix.job | string | Job name from the job spec |
| cronix.run_id | string | UUIDv7, constant across retry attempts (matches the Cronix-Run-Id HTTP header) |
| cronix.schedule | string | The 5-field cron expression that fired |
| cronix.intended_fire_time | RFC3339 | When the host scheduler intended to fire |
| cronix.actual_fire_time | RFC3339 | When the shim actually started |
| cronix.backend | string | crontab / systemd-timer / kubernetes / aws-scheduler / vercel |
| cronix.timeout_seconds | int | Per-attempt HTTP timeout |
| cronix.concurrency_policy | string | Allow / Forbid / Replace |
| cronix.concurrency_scope | string | host / global |
| cronix.max_attempts | int | From the retry policy |
| cronix.outcome | string | success / app_rejected / retries_exhausted / lock_contended / internal_error (set on span end) |
| cronix.attempts_made | int | Final attempt count |

Status: OK if outcome=success, otherwise ERROR.
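The outcome-to-status rule is small enough to write as a lookup. The sketch below is illustrative only: the names ROOT_OUTCOMES and root_span_status are hypothetical, and the status is returned as a plain string rather than an OTel SDK status object.

```python
ROOT_OUTCOMES = (
    "success",
    "app_rejected",
    "retries_exhausted",
    "lock_contended",
    "internal_error",
)


def root_span_status(outcome: str) -> str:
    """Map cronix.outcome to the root span status: OK only for success."""
    if outcome not in ROOT_OUTCOMES:
        raise ValueError(f"unknown cronix.outcome: {outcome!r}")
    return "OK" if outcome == "success" else "ERROR"
```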

cronix.trigger.attempt — one per HTTP attempt

| Attribute | Type | Notes |
| --- | --- | --- |
| cronix.attempt | int | 1-indexed |
| http.request.method | string | Per HTTP semconv |
| url.full | string | Per HTTP semconv |
| http.response.status_code | int | Once the response arrives |
| cronix.retry_reason | string | 5xx / network / timeout; set when this attempt was followed by a retry |
| cronix.backoff_seconds | float | Sleep before this attempt; 0 for attempt 1 |

Status: OK for 2xx, ERROR otherwise. 4xx is ERROR because it terminates the fire (no retry).
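To make the per-attempt rules concrete, here is a rough sketch that classifies one attempt. The function classify_attempt and its error argument are hypothetical, not part of cronix. It returns the span status together with the reason that would justify a retry. Per the table above, cronix.retry_reason is only recorded when a retry actually follows, so the last allowed attempt would not carry it.

```python
from typing import Optional, Tuple


def classify_attempt(status_code: Optional[int],
                     error: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (span_status, retryable_reason) for one HTTP attempt.

    error is "network" or "timeout" when no response arrived, else None.
    """
    if error is not None:
        if error not in ("network", "timeout"):
            raise ValueError(f"unknown error kind: {error!r}")
        return "ERROR", error
    if status_code is None:
        raise ValueError("need a status code when no error is given")
    if 200 <= status_code < 300:
        return "OK", None
    if 500 <= status_code < 600:
        return "ERROR", "5xx"
    # Everything else is ERROR with no retry reason. The page states this
    # explicitly for 4xx, which terminates the fire.
    return "ERROR", None
```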

cronix.trigger.lock — only when concurrency_scope: global

| Attribute | Type | Notes |
| --- | --- | --- |
| cronix.lock.backend | string | redis (v1; pluggable in v2) |
| cronix.lock.scope | string | always global |
| cronix.lock.key | string | cronix:lock:<app>:<job> |
| cronix.lock.outcome | string | acquired / contended |
| cronix.lock.ttl_seconds | int | Set to the job’s timeout_seconds |

Status: OK on acquired, ERROR on contended (propagates to root as outcome=lock_contended).
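As an illustration of how the lock attributes relate to the job spec, the sketch below assembles them for a given app and job. The helper lock_span_attributes is hypothetical. It mirrors the table and says nothing about how the shim talks to Redis.

```python
from typing import Dict, Union


def lock_span_attributes(app: str, job: str, timeout_seconds: int,
                         acquired: bool) -> Dict[str, Union[str, int]]:
    """Build the attribute set documented for cronix.trigger.lock."""
    return {
        "cronix.lock.backend": "redis",
        "cronix.lock.scope": "global",
        # Key layout from the table: cronix:lock:<app>:<job>
        "cronix.lock.key": f"cronix:lock:{app}:{job}",
        "cronix.lock.outcome": "acquired" if acquired else "contended",
        # The TTL is set to the job's timeout_seconds.
        "cronix.lock.ttl_seconds": timeout_seconds,
    }
```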

Span events

Short-lived steps that don’t warrant child spans:

| Event | Where | Attributes |
| --- | --- | --- |
| cronix.lock.acquired | root span (host scope only) | cronix.lock.scope=host, cronix.lock.duration_ms |
| cronix.sign.completed | each cronix.trigger.attempt span | (none; signing is consistently <1ms) |
| cronix.secrets.resolved | root span | cronix.secrets.count |

Propagation

Every outbound HTTP attempt carries a W3C traceparent header. If your handler is OTel-instrumented, its spans chain naturally off cronix.trigger.attempt:

cronix.trigger.fire
└─ cronix.trigger.attempt ← your handler's incoming-request span chains here
   └─ your handler's app spans
      ├─ db.query
      └─ external.api.call

The shim does not extract a traceparent from anywhere — the host scheduler isn’t OTel-aware, so there’s no inbound trace context to propagate.
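If your handler is not OTel-instrumented, you can still log the incoming trace ID by reading the header yourself. The sketch below parses a version-00 W3C traceparent value (version, 32-hex trace ID, 16-hex parent span ID, 2-hex flags). The function name parse_traceparent is hypothetical. It is deliberately strict and rejects the longer values that future header versions may allow.

```python
import re
from typing import Dict, Optional, Union

_TRACEPARENT = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


def parse_traceparent(header: str) -> Optional[Dict[str, Union[str, bool]]]:
    """Parse a traceparent header value; return None if it is invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the W3C spec.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        # For a cronix fire this is the span ID of cronix.trigger.attempt.
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }
```

An instrumented framework does this extraction for you; the manual version is mainly useful for correlating plain-text handler logs with the trace.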

Querying examples

The attribute set is designed so adopters can answer common ops questions with one or two filters.

“Show me every failed fire for app billing-service in the last hour.”

{ cronix.app = "billing-service" AND cronix.outcome != "success" }

“Which jobs are getting lock_contended consistently?”

{ cronix.outcome = "lock_contended" } | group by cronix.app, cronix.job, count()

“Which retry policies are inadequate?” (jobs hitting retries-exhausted regularly)

{ cronix.outcome = "retries_exhausted" } | group by cronix.app, cronix.job, count()

“Trace this specific fire end-to-end” (from the structured log’s run_id)

{ cronix.run_id = "<uuid-from-log>" }

The cronix.run_id is the join key between structured logs and OTel traces. If you’re using the same observability platform for both, set up a derived link from log lines containing run_id=<value> to the trace search above.
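If your platform has no built-in derived links, a small script can do the same join. The sketch below pulls a UUIDv7 out of a log line containing run_id=<value> and builds the trace search shown above. The function name trace_query_for_log_line and the sample log line in the usage note are hypothetical. Only the run_id=<value> fragment comes from this page.

```python
import re
from typing import Optional

# UUIDv7: version nibble is 7, variant nibble is one of 8, 9, a, b.
_RUN_ID = re.compile(
    r"\brun_id=([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-7[0-9a-fA-F]{3}"
    r"-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12})\b"
)


def trace_query_for_log_line(line: str) -> Optional[str]:
    """Return the trace search for the run_id found in line, or None."""
    match = _RUN_ID.search(line)
    if match is None:
        return None
    return '{ cronix.run_id = "%s" }' % match.group(1)
```

Calling it on a line such as `level=error run_id=018f4e7a-3c2b-7d4e-9a1b-2c3d4e5f6071 outcome=retries_exhausted` yields a query you can paste into the trace search.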

Cross-language consistency

The attribute set defined here is the contract that every cronix-compatible SDK must use when emitting traces. The TypeScript SDK (@awbx/cronix-sdk) emits identical attributes from its in-process trigger path; future Rust / Python / Java SDKs do the same. A query filtering on cronix.app = "billing-service" finds fires regardless of which SDK / language emitted them.
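One way to hold an SDK to this contract is a conformance check over exported spans. The sketch below lists the root-span attribute names from this page and reports any that a finished span is missing. The names ROOT_ATTRIBUTES and missing_root_attributes are hypothetical, and treating every listed attribute as mandatory on a finished root span is an assumption.

```python
from typing import List, Mapping

ROOT_ATTRIBUTES = frozenset({
    "cronix.app",
    "cronix.job",
    "cronix.run_id",
    "cronix.schedule",
    "cronix.intended_fire_time",
    "cronix.actual_fire_time",
    "cronix.backend",
    "cronix.timeout_seconds",
    "cronix.concurrency_policy",
    "cronix.concurrency_scope",
    "cronix.max_attempts",
    "cronix.outcome",
    "cronix.attempts_made",
})


def missing_root_attributes(attrs: Mapping[str, object]) -> List[str]:
    """Return the documented root-span attributes absent from attrs, sorted."""
    return sorted(ROOT_ATTRIBUTES - set(attrs))
```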

Implementation status

Shipped in cronix trigger --otel since v0.11.0. Pass --backend <name> to populate the cronix.backend attribute; the host scheduler invoking the shim should set it (e.g., the Helm chart sets --backend=kubernetes, the systemd unit sets --backend=systemd-timer).

The TypeScript SDK’s in-process trigger path emits the same trace shape — adopters running cronix as a library, not a separate process, get identical attributes and naturally chain into the same OTel pipeline.

Going deeper