REFERENCE ARCHITECTURE

AI inference workload

Serve models via Compute (self-hosted) or Tesserra managed agents — cache with Datastore Redis, audit in Postgres, datasets in Archive.

AdvancedPlan: Business+ / EnterpriseResilience: multi-zona

Diagram

Use cases

Customer support copilot
Document Q&A
Vertical regulated agents

Components

Layer	Role	Product
Perimeter	Prompt injection / abuse filtering	Sentinel
Edge	Public chat endpoint	Gateway
Inference	Self-hosted model serving	Compute + Autoscale
Managed AI	Tesserra agent API	Compute (agents)
Network	Private east-west	Fabric
Cache	Prompt/response cache	Datastore (Redis)
Audit	Compliance log store	Datastore (PostgreSQL)
Knowledge	RAG document store	Archive
Operations	Token and latency SLOs	Insight

Products

Compute

View product docs →

Datastore

View product docs →

When to use

Customer-facing chat, copilots, document Q&A.
Vertical agents (health, legal, fiscal) with audit requirements.
GPU-backed inference on dedicated Enterprise nuclei.

When to avoid

Batch-only offline training — use Archive + external training cluster.
Sub-100ms latency without edge cache and warm replicas.

Design notes

Two inference paths

Option A: your model on a Compute Linux instance (vLLM, Ollama, TGI). Option B: Tesserra agents via API key with token billing.

Cache layer

Datastore Redis keyed by (model, prompt_hash) cuts cost and latency for repeated queries.

Compliance

Log every prompt/response to a dedicated Datastore. Set Insight retention to match regulatory requirements.

Blueprint

Use this JSON as a starting point when creating a project via the Tesserra API or console. Replace image URLs, domains, and resource references with your values.

json

{
  "nome": "AI Chat",
  "recursos": [
    {
      "tipo": "compute",
      "nome": "inference",
      "config": {
        "subtipo": "api",
        "porta": 8000,
        "cpu": "2.0",
        "memoria_mb": 4096,
        "replicas_min": 2
      }
    },
    {
      "tipo": "datastore",
      "nome": "cache",
      "config": {
        "engine": "redis",
        "tamanho_gb": 5
      }
    },
    {
      "tipo": "datastore",
      "nome": "audit",
      "config": {
        "engine": "postgres",
        "tamanho_gb": 50,
        "ha": true
      }
    },
    {
      "tipo": "archive",
      "nome": "datasets",
      "config": {
        "subtipo": "arquivos",
        "tamanho_gb": 100,
        "publico": false
      }
    },
    {
      "tipo": "gateway",
      "nome": "dns",
      "config": {
        "dominio": "chat.acme.io",
        "tls": true,
        "alvo_recurso_id": "inference"
      }
    },
    {
      "tipo": "autoscale",
      "nome": "scale",
      "config": {
        "alvo_recurso_id": "inference",
        "metrica": "requests",
        "limiar": 70
      }
    },
    {
      "tipo": "insight",
      "nome": "obs",
      "config": {
        "retencao_dias": 90,
        "alertas": true
      }
    }
  ]
}

Event-driven integration

Asynchronous workflows with Conduit (Kafka), Lane (RabbitMQ), and Beacon (NATS) — decouple producers from consumers and scale workers independently.

Multi-region active-active

Compute replicas across continental Zones, Gateway geo-routing, asynchronous or synchronous Datastore replication — Business and Enterprise tiers.

Secure edge ingress

Defense-in-depth at the perimeter — Sentinel WAF, Gateway TLS termination, Fabric isolation, and Insight security monitoring.