Observability
The **Health** console (App → Health) aggregates metrics and SLA figures for each provisioned Project. The Project, Organization, and platform levels each expose a health endpoint with the same semantics.
Health endpoints
- Project · /api/projetos/{id}/saude: Project metrics and SLA.
- Organization · /api/tenants/atual/saude: aggregate of the Organization's Projects.
- Platform · /api/admin/saude: consolidated view. Restricted to Tesserra operators.
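As a rough illustration of how a client might address the three endpoints above, the sketch below builds each URL and an authenticated request object. The host and the Bearer header are taken from the reading example on this page; the function names are hypothetical and not part of any Tesserra SDK.

```python
from urllib.parse import quote
from urllib.request import Request

# Host as used in the reading example on this page.
BASE_URL = "https://tesserra.io"


def project_health_url(project_id: str) -> str:
    """Health endpoint for a single Project (the id is percent-encoded)."""
    return f"{BASE_URL}/api/projetos/{quote(project_id, safe='')}/saude"


def organization_health_url() -> str:
    """Health endpoint for the current Organization."""
    return f"{BASE_URL}/api/tenants/atual/saude"


def platform_health_url() -> str:
    """Consolidated platform endpoint (restricted to Tesserra operators)."""
    return f"{BASE_URL}/api/admin/saude"


def build_request(url: str, token: str) -> Request:
    """Attach the Bearer token; hypothetical helper, not an official client."""
    return Request(url, headers={"Authorization": f"Bearer {token}"})
```

The resulting request can be passed to `urllib.request.urlopen` and the body decoded with `json.load`.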
Exposed metrics
Available metrics (typically over a 60-minute window unless noted): cpu (%), ram (%), rps (req/s), p50/p95/p99 (ms), erros (%), sla_mes (%), sla_24h (%), uptime (ratio).
Current telemetry is derived from the provisioning plan and external probes. Prometheus and OpenTelemetry integration for metrics directly from Compute is planned.
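The metric names and units listed above can be mirrored in a small typed structure on the client side. This is a minimal sketch, not an official schema: the class name `HealthMetrics` is hypothetical, and `sla_24h` is treated as optional only because it is listed as a metric but does not appear in the sample response on this page.

```python
from dataclasses import dataclass
from typing import Optional

# Metrics that appear in the sample "saude" object on this page.
REQUIRED = ("cpu", "ram", "rps", "p50", "p95", "p99", "erros", "sla_mes", "uptime")


@dataclass(frozen=True)
class HealthMetrics:
    cpu: float      # percent
    ram: float      # percent
    rps: float      # requests per second
    p50: float      # milliseconds
    p95: float      # milliseconds
    p99: float      # milliseconds
    erros: float    # error rate, percent
    sla_mes: float  # monthly SLA, percent
    uptime: float   # ratio
    sla_24h: Optional[float] = None  # assumption: may be absent from a response

    @classmethod
    def from_dict(cls, data: dict) -> "HealthMetrics":
        """Build from a "saude" object, rejecting payloads with missing keys."""
        missing = [key for key in REQUIRED if key not in data]
        if missing:
            raise ValueError(f"missing metrics: {', '.join(missing)}")
        values = {key: float(data[key]) for key in REQUIRED}
        if data.get("sla_24h") is not None:
            values["sla_24h"] = float(data["sla_24h"])
        return cls(**values)
```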
Reading example
curl https://tesserra.io/api/tenants/atual/saude \
  -H "Authorization: Bearer $TESSERRA_TOKEN"
{
  "tenant": "acme",
  "projetos": [
    {
      "id": "...",
      "nome": "Loja Acme",
      "estado": "provisionado",
      "saude": {
        "cpu": 32.4,
        "ram": 51.2,
        "rps": 142,
        "p50": 28, "p95": 96, "p99": 220,
        "erros": 0.12,
        "sla_mes": 99.97,
        "uptime": 1.0
      },
      "zonas": ["uk-wlv-1", "de-ham-1"]
    }
  ],
  "agregado": {
    "incidentes_abertos": 0,
    "ultima_atualizacao": "2026-05-28T20:31:14Z"
  }
}
Alerts
Insight resources configured with alertas: true send an email to the Organization's managers when:
- the error rate (erros) exceeds 1% for more than 5 minutes;
- p95 stays above 2× the week's baseline for 10 minutes;
- a Compute is unavailable for 60 consecutive seconds;
- monthly SLA falls below the plan commitment.
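The four conditions above can be expressed as simple checks over time-stamped samples. The sketch below is only an illustration of the thresholds, not the platform's actual alerting code. It assumes each series is a list of (unix seconds, value) pairs, that availability is encoded as 1.0 (up) or 0.0 (down), and that the weekly p95 baseline and the plan's SLA commitment are supplied by the caller. All function and alert names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, metric value)


def longest_breach_seconds(samples: Sequence[Sample],
                           breached: Callable[[float], bool]) -> float:
    """Longest span, in seconds, covered by consecutive breaching samples."""
    longest = 0.0
    start = None
    for ts, value in samples:
        if breached(value):
            if start is None:
                start = ts  # first sample of a new breach run
            longest = max(longest, ts - start)
        else:
            start = None  # run interrupted by a healthy sample
    return longest


def triggered_alerts(erros: Sequence[Sample], p95: Sequence[Sample],
                     availability: Sequence[Sample], p95_baseline: float,
                     sla_mes: float, sla_commitment: float) -> List[str]:
    alerts = []
    # Error rate above 1% for more than 5 minutes.
    if longest_breach_seconds(erros, lambda v: v > 1.0) > 5 * 60:
        alerts.append("erros")
    # p95 above twice the weekly baseline, sustained for 10 minutes.
    if longest_breach_seconds(p95, lambda v: v > 2 * p95_baseline) >= 10 * 60:
        alerts.append("p95")
    # Compute unavailable for 60 consecutive seconds.
    if longest_breach_seconds(availability, lambda v: v == 0.0) >= 60:
        alerts.append("compute_unavailable")
    # Monthly SLA below the plan commitment.
    if sla_mes < sla_commitment:
        alerts.append("sla_mes")
    return alerts
```

Measuring the span between the first and last breaching sample means the result depends on the sampling interval; a production evaluator would likely also account for gaps in the series.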
Custom webhooks (Slack, Discord, Opsgenie, PagerDuty) are planned.
Audit
Sensitive actions are recorded in auth_eventos (timestamp, user, IP, detail). Managers can export the log as CSV at App → Audit; platform admins do the same at Admin → Audit.
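An exported audit CSV can be summarised with the standard library. This sketch assumes the export has a header row with the column names timestamp, user, ip, and detail. The field list comes from the description above, but the exact header spelling is an assumption, and the function name is hypothetical.

```python
import csv
import io
from collections import Counter


def count_events_by_user(csv_text: str) -> Counter:
    """Count audit events per user in an exported CSV.

    Assumes a header row containing a "user" column (hypothetical name).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["user"] for row in reader)
```

Calling `count_events_by_user(text).most_common(5)` would list the five most active users in the export.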