REFERENCE ARCHITECTURE

AI inference workload

Serve models via Compute (self-hosted) or Tesserra managed agents — cache with Datastore Redis, audit in Postgres, datasets in Archive.

AdvancedPlan: Business+ / EnterpriseResilience: multi-zona

Diagram

Use cases

  • Customer support copilot
  • Document Q&A
  • Vertical regulated agents

Components

LayerRoleProduct
PerimeterPrompt injection / abuse filteringSentinel
EdgePublic chat endpointGateway
InferenceSelf-hosted model servingCompute + Autoscale
Managed AITesserra agent APICompute (agents)
NetworkPrivate east-westFabric
CachePrompt/response cacheDatastore (Redis)
AuditCompliance log storeDatastore (PostgreSQL)
KnowledgeRAG document storeArchive
OperationsToken and latency SLOsInsight

Products

When to use

  • Customer-facing chat, copilots, document Q&A.
  • Vertical agents (health, legal, fiscal) with audit requirements.
  • GPU-backed inference on dedicated Enterprise nuclei.

When to avoid

  • Batch-only offline training — use Archive + external training cluster.
  • Sub-100ms latency without edge cache and warm replicas.

Design notes

Two inference paths

Option A: your model in a Compute container (vLLM, Ollama, TGI). Option B: Tesserra agents via API key with token billing.

Cache layer

Datastore Redis keyed by (model, prompt_hash) cuts cost and latency for repeated queries.

Compliance

Log every prompt/response to a dedicated Datastore. Set Insight retention to match regulatory requirements.

Blueprint

Use this JSON as a starting point when creating a project via the Tesserra API or console. Replace image URLs, domains, and resource references with your values.

json
{
  "nome": "AI Chat",
  "recursos": [
    {
      "tipo": "compute",
      "nome": "inference",
      "config": {
        "subtipo": "api",
        "porta": 8000,
        "cpu": "2.0",
        "memoria_mb": 4096,
        "replicas_min": 2
      }
    },
    {
      "tipo": "datastore",
      "nome": "cache",
      "config": {
        "engine": "redis",
        "tamanho_gb": 5
      }
    },
    {
      "tipo": "datastore",
      "nome": "audit",
      "config": {
        "engine": "postgres",
        "tamanho_gb": 50,
        "ha": true
      }
    },
    {
      "tipo": "archive",
      "nome": "datasets",
      "config": {
        "subtipo": "arquivos",
        "tamanho_gb": 100,
        "publico": false
      }
    },
    {
      "tipo": "gateway",
      "nome": "dns",
      "config": {
        "dominio": "chat.acme.io",
        "tls": true,
        "alvo_recurso_id": "inference"
      }
    },
    {
      "tipo": "autoscale",
      "nome": "scale",
      "config": {
        "alvo_recurso_id": "inference",
        "metrica": "requests",
        "limiar": 70
      }
    },
    {
      "tipo": "insight",
      "nome": "obs",
      "config": {
        "retencao_dias": 90,
        "alertas": true
      }
    }
  ]
}
Documentation · Tesserra