REFERENCE ARCHITECTURE
AI inference workload
Serve models via Compute (self-hosted) or Tesserra managed agents — cache with Datastore Redis, audit in Postgres, datasets in Archive.
AdvancedPlan: Business+ / EnterpriseResilience: multi-zona
Diagram
Use cases
- Customer support copilot
- Document Q&A
- Vertical regulated agents
Components
| Layer | Role | Product |
|---|---|---|
| Perimeter | Prompt injection / abuse filtering | Sentinel |
| Edge | Public chat endpoint | Gateway |
| Inference | Self-hosted model serving | Compute + Autoscale |
| Managed AI | Tesserra agent API | Compute (agents) |
| Network | Private east-west | Fabric |
| Cache | Prompt/response cache | Datastore (Redis) |
| Audit | Compliance log store | Datastore (PostgreSQL) |
| Knowledge | RAG document store | Archive |
| Operations | Token and latency SLOs | Insight |
Products
Compute
View product docs →Datastore
View product docs →Archive
View product docs →Gateway
View product docs →Autoscale
View product docs →Insight
View product docs →When to use
- Customer-facing chat, copilots, document Q&A.
- Vertical agents (health, legal, fiscal) with audit requirements.
- GPU-backed inference on dedicated Enterprise nuclei.
When to avoid
- Batch-only offline training — use Archive + external training cluster.
- Sub-100ms latency without edge cache and warm replicas.
Design notes
Two inference paths
Option A: your model in a Compute container (vLLM, Ollama, TGI). Option B: Tesserra agents via API key with token billing.
Cache layer
Datastore Redis keyed by (model, prompt_hash) cuts cost and latency for repeated queries.
Compliance
Log every prompt/response to a dedicated Datastore. Set Insight retention to match regulatory requirements.
Blueprint
Use this JSON as a starting point when creating a project via the Tesserra API or console. Replace image URLs, domains, and resource references with your values.
json
{
"nome": "AI Chat",
"recursos": [
{
"tipo": "compute",
"nome": "inference",
"config": {
"subtipo": "api",
"porta": 8000,
"cpu": "2.0",
"memoria_mb": 4096,
"replicas_min": 2
}
},
{
"tipo": "datastore",
"nome": "cache",
"config": {
"engine": "redis",
"tamanho_gb": 5
}
},
{
"tipo": "datastore",
"nome": "audit",
"config": {
"engine": "postgres",
"tamanho_gb": 50,
"ha": true
}
},
{
"tipo": "archive",
"nome": "datasets",
"config": {
"subtipo": "arquivos",
"tamanho_gb": 100,
"publico": false
}
},
{
"tipo": "gateway",
"nome": "dns",
"config": {
"dominio": "chat.acme.io",
"tls": true,
"alvo_recurso_id": "inference"
}
},
{
"tipo": "autoscale",
"nome": "scale",
"config": {
"alvo_recurso_id": "inference",
"metrica": "requests",
"limiar": 70
}
},
{
"tipo": "insight",
"nome": "obs",
"config": {
"retencao_dias": 90,
"alertas": true
}
}
]
}Related
Event-driven integration
Asynchronous workflows with Conduit (Kafka), Lane (RabbitMQ), and Beacon (NATS) — decouple producers from consumers and scale workers independently.
Multi-region active-active
Compute replicas across continental Zones, Gateway geo-routing, asynchronous or synchronous Datastore replication — Business and Enterprise tiers.
Secure edge ingress
Defense-in-depth at the perimeter — Sentinel WAF, Gateway TLS termination, Fabric isolation, and Insight security monitoring.