Agent Infrastructure as Code.

Orloj is an open-source runtime for AI agent systems. Define agents, tools, permissions, and workflows in YAML. Deploy them with governance built in.

git clone https://github.com/OrlojHQ/orloj.git
The Problem

Production agents need governance.

Today
  • Agents call tools they shouldn't touch
  • Token costs spike with no budget controls
  • Failures are silent. No retries, no dead-letters
  • Multi-agent wiring is bespoke glue code
  • No audit trail when things go wrong at 3am
With Orloj
  • Tool permissions enforced at the execution layer
  • Token caps and model whitelists per agent
  • Lease-based retry, idempotent replay, dead-letter handling
  • Declarative YAML graphs with fan-out and join gates
  • Full task trace and message lifecycle logging
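To make "idempotent replay" concrete, here is a minimal Python sketch, not Orloj's implementation. It assumes each message carries a unique ID and that results are cached in memory; `IdempotentExecutor` and `handle` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict


class IdempotentExecutor:
    """Runs a handler at most once per message ID and replays the stored result."""

    def __init__(self) -> None:
        # message_id -> result of the first successful run (in-memory for the sketch)
        self._results: Dict[str, str] = {}

    def handle(self, message_id: str, handler: Callable[[], str]) -> str:
        # A redelivered message returns the cached result instead of re-running.
        if message_id in self._results:
            return self._results[message_id]
        result = handler()
        self._results[message_id] = result
        return result
```

With this shape, a message that is redelivered after a retry does not repeat its side effects; a production runtime would persist the results rather than hold them in a dict.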
Why Orloj
01

Agents as declarative manifests, not programs.

Version-controlled manifests for agents, tools, models, and workflows. Apply them with one command. Diff them in PRs. Roll back with git revert.

02

Governance enforced at the execution layer.

Policies, roles, and tool permissions are checked before every agent turn and tool call. Unauthorized actions fail closed with audit trails. These are hard constraints, not prompt instructions.
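A fail-closed check can be sketched in a few lines of Python. This is an illustration under assumptions, not the Orloj runtime: the `Gate` and `Decision` types and the role-to-tools mapping are hypothetical. The point it shows is that anything not explicitly granted is denied, and every decision is recorded.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Decision:
    allowed: bool
    reason: str


@dataclass
class Gate:
    # Hypothetical shape: role name -> tools that role may call.
    role_tools: Dict[str, Set[str]]
    audit_log: List[dict] = field(default_factory=list)

    def check(self, agent: str, roles: List[str], tool: str) -> Decision:
        # Collect every tool granted by the agent's roles; unknown roles grant nothing.
        granted: Set[str] = set()
        for role in roles:
            granted |= self.role_tools.get(role, set())
        if tool in granted:
            decision = Decision(True, "granted by role")
        else:
            # Fail closed: no explicit grant means the call is denied.
            decision = Decision(False, "no role grants this tool")
        # Both allowed and denied calls leave an audit record.
        self.audit_log.append(
            {"agent": agent, "tool": tool, "allowed": decision.allowed, "reason": decision.reason}
        )
        return decision
```

Because the check runs outside the model, a prompt cannot talk its way past it; the agent only ever sees the structured denial.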

03

Production reliability built into the runtime.

Lease-based task ownership, capped exponential retry with jitter, dead-letter handling, fan-out/fan-in orchestration, cron scheduling. The primitives your agent system is missing.
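For readers unfamiliar with the pattern, capped exponential retry with jitter and a dead-letter fallback might look like the Python sketch below. It assumes "full jitter" (a uniform draw up to the capped ceiling), which is one common variant and not necessarily the one Orloj uses; `backoff_delay`, `run_with_retry`, and the default values are hypothetical.

```python
import random
import time
from typing import Callable, List, Optional


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)]."""
    source = rng if rng is not None else random
    ceiling = min(cap, base * (2 ** attempt))
    return source.uniform(0.0, ceiling)


def run_with_retry(task: Callable[[], str], max_attempts: int = 4,
                   dead_letters: Optional[List[str]] = None,
                   sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Retry a task with backoff; park the last error in dead_letters on exhaustion."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception as exc:  # a sketch: real code would filter retryable errors
            last_error = exc
            if attempt + 1 < max_attempts:
                sleep(backoff_delay(attempt))
    if dead_letters is not None:
        dead_letters.append(repr(last_error))
    return None
```

The cap keeps late retries from waiting indefinitely, and the jitter spreads out workers that failed at the same moment so they do not retry in lockstep.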

See It

One command. Full agent system.

orlojctl apply -f ./your-system/ creates every resource (agents, graph, governance, task) in a single declarative pass.
Step 1
Define an agent
Agents declare their model, tools, permissions, and execution limits. Data, not code.
apiVersion: orloj.dev/v1
kind: Agent
metadata:
  name: research-agent
spec:
  model_ref: openai-default
  prompt: |
    You are a research assistant.
    Produce concise, evidence-backed answers.
  tools:
    - web_search
    - vector_db
  roles:
    - analyst-role
  limits:
    max_steps: 6
    timeout: 30s
Step 2
Compose agents into a workflow
Agent Systems compose agents into directed graphs: pipelines, hierarchies, or swarm loops. The runtime handles message routing, fan-out, and join gates.
apiVersion: orloj.dev/v1
kind: AgentSystem
metadata:
  name: report-system
spec:
  agents:
    - planner-agent
    - research-agent
    - writer-agent
  graph:
    planner-agent:
      next: research-agent
    research-agent:
      next: writer-agent
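As a rough mental model of how a pipeline graph like the one above could be walked, the Python sketch below follows `next` edges from a starting agent. It is an assumption-laden illustration: the dict mirrors the manifest's `graph` field, while `execution_order` is a hypothetical helper and handles only the linear pipeline case.

```python
from typing import Dict, List, Optional, Set

# Mirrors the graph section of the report-system manifest.
graph: Dict[str, Dict[str, str]] = {
    "planner-agent": {"next": "research-agent"},
    "research-agent": {"next": "writer-agent"},
}


def execution_order(graph: Dict[str, Dict[str, str]], start: str) -> List[str]:
    """Follow 'next' edges from start and return the agents in visit order."""
    order: List[str] = []
    seen: Set[str] = set()
    current: Optional[str] = start
    while current is not None:
        if current in seen:
            # A plain pipeline should never revisit an agent.
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        order.append(current)
        # An agent with no entry (or no 'next') ends the pipeline.
        current = graph.get(current, {}).get("next")
    return order
```

Calling `execution_order(graph, "planner-agent")` visits the planner, then the researcher, then the writer, which has no outgoing edge and therefore ends the run.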
Step 3
Enforce governance
Policies are checked before every agent turn and tool call. This is a runtime gate, not a prompt instruction. Blocked calls return structured errors with full audit traces.
apiVersion: orloj.dev/v1
kind: AgentPolicy
metadata:
  name: cost-and-security-policy
spec:
  apply_mode: scoped
  target_systems:
    - report-system
  max_tokens_per_run: 50000
  allowed_models:
    - gpt-4o
  blocked_tools:
    - filesystem_delete
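The three limits in this policy translate naturally into a pre-call check. The Python sketch below is illustrative only: the field names are taken from the manifest, but the `AgentPolicy` dataclass, the `violation` function, and the message strings are hypothetical and say nothing about how Orloj evaluates policies internally.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AgentPolicy:
    max_tokens_per_run: int
    allowed_models: List[str]
    blocked_tools: List[str]


# Values copied from the cost-and-security-policy manifest.
policy = AgentPolicy(
    max_tokens_per_run=50000,
    allowed_models=["gpt-4o"],
    blocked_tools=["filesystem_delete"],
)


def violation(policy: AgentPolicy, *, model: str, tokens_used: int,
              tool: Optional[str] = None) -> Optional[str]:
    """Return a reason string if the step breaks the policy, else None."""
    if model not in policy.allowed_models:
        return f"model '{model}' is not in allowed_models"
    if tokens_used > policy.max_tokens_per_run:
        return f"run used {tokens_used} tokens, limit is {policy.max_tokens_per_run}"
    if tool is not None and tool in policy.blocked_tools:
        return f"tool '{tool}' is in blocked_tools"
    return None
```

A runtime gate would run a check like this before each turn and tool call and turn any non-`None` result into a structured error.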
Capabilities

Built for production. Not prototypes.

Graph-based orchestration

Pipeline, hierarchical, and swarm-loop topologies. Fan-out/fan-in with join gates. Turn-bounded loops for iterative agent coordination.
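Fan-out with a join gate can be pictured as "send one message to several branches, continue only when all have answered". The Python sketch below shows that idea with threads; it assumes a wait-for-all join, and `fan_out_and_join` and the branch callables are hypothetical stand-ins for agents.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def fan_out_and_join(branches: Dict[str, Callable[[str], str]], message: str) -> Dict[str, str]:
    """Run every branch on the same message and return once all have finished."""
    with ThreadPoolExecutor(max_workers=max(1, len(branches))) as pool:
        # Fan-out: submit the same message to each branch.
        futures = {name: pool.submit(fn, message) for name, fn in branches.items()}
        # Join gate: result() blocks, so the dict is complete only when all branches are.
        return {name: future.result() for name, future in futures.items()}
```

If any branch raises, `result()` re-raises the exception here, so a downstream agent never runs on a partial set of inputs.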

Model routing

Bind agents to any provider (OpenAI, Anthropic, Gemini, Ollama, etc.) through ModelEndpoint resources. Swap models without changing agent manifests.
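The indirection is what makes swapping possible: the agent names an endpoint, and the endpoint names the provider. A small Python sketch of that lookup follows; the `model_ref` field comes from the agent manifest, while the endpoint fields (`provider`, `model`), the example values, and `resolve_endpoint` are assumptions made for illustration.

```python
from typing import Dict

# Hypothetical ModelEndpoint records keyed by name; field names are illustrative.
model_endpoints: Dict[str, Dict[str, str]] = {
    "openai-default": {"provider": "openai", "model": "gpt-4o"},
}

# The agent only stores a reference, never provider details.
agents: Dict[str, Dict[str, str]] = {
    "research-agent": {"model_ref": "openai-default"},
}


def resolve_endpoint(agent_name: str, agents: Dict[str, Dict[str, str]],
                     endpoints: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Look up the endpoint an agent is bound to via its model_ref."""
    ref = agents[agent_name]["model_ref"]
    if ref not in endpoints:
        raise KeyError(f"model_ref '{ref}' does not match any ModelEndpoint")
    return endpoints[ref]
```

Repointing `openai-default` at a different provider changes what `resolve_endpoint` returns without touching the agent record.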

Tool isolation

Four isolation backends: direct, sandboxed, container, and WASM, configurable per tool based on risk level. High-risk tools default to a read-only filesystem, no network access, and a memory cap.

Fail-closed governance

AgentPolicy, AgentRole, and ToolPermission are evaluated inline during execution. Unauthorized actions are denied and logged, never silently ignored.

MCP support

Connect MCP servers as native tool types. The McpServer controller auto-discovers tools and makes them available to agents with full governance applied.

Observability

Task trace, message lifecycle tracking, per-agent metrics, and live event streaming through the built-in web console.

Architecture

Server. Workers. Governance.

Orloj runs as a server/worker architecture that scales from a single process to distributed deployments. Governance is enforced inline at the worker layer.
Server · orlojd
API Server · REST, watch, web console
Resource Store · mem or Postgres
Task Scheduler · assignment, cron, webhooks
Services · reconciliation loops per resource
assigns tasks ↓
Workers · orlojworker
Model Gateway · OpenAI, Anthropic, Ollama
Tool Runtime · sandboxed, container, WASM
Message Bus · mem or NATS JetStream
Task Worker · lease-based, concurrent
Governance · enforced inline at every step
AgentPolicy · AgentRole · ToolPermission
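The lease-based Task Worker in the diagram relies on a simple rule: a worker owns a task only until its lease expires, so a crashed worker cannot hold a task forever. The Python sketch below illustrates that rule with an in-memory table and explicit timestamps; `LeaseTable`, its TTL, and the method names are hypothetical, and a distributed deployment would keep leases in shared storage.

```python
from typing import Dict, Tuple


class LeaseTable:
    """Tracks which worker owns each task and when that ownership expires."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        # task_id -> (worker_id, expiry time)
        self._leases: Dict[str, Tuple[str, float]] = {}

    def acquire(self, task_id: str, worker_id: str, now: float) -> bool:
        """Take or renew the lease; fail if another worker holds a live lease."""
        holder = self._leases.get(task_id)
        if holder is not None and holder[0] != worker_id and holder[1] > now:
            return False
        # Free, expired, or already ours: (re)start the lease clock.
        self._leases[task_id] = (worker_id, now + self.ttl)
        return True
```

A worker that keeps calling `acquire` before expiry renews its claim; one that stops (because it crashed) loses the task to the next worker after `ttl` seconds.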

Development mode

Single process. In-memory storage. Sequential execution. No external dependencies.

orlojd --embedded-worker --storage-backend=memory

Production mode

Postgres state. NATS JetStream messaging. Horizontal worker scaling. Parallel fan-out.

orlojd --storage-backend=postgres
orlojworker --agent-message-bus-backend=nats-jetstream
Templates

Starter templates for real operational workflows.

Each template will be a ready-to-deploy Orloj manifest for a common infrastructure task. They are on the roadmap, and community contributions are welcome.
Coming soon

Incident response triage

Webhook-triggered. Agents pull logs, correlate metrics, check recent deployments. Read-only tool permissions mean investigation agents can look but can't roll back infrastructure.

Coming soon

Compliance evidence collector

Pipeline agents check contracts against regulatory requirements. Model whitelists keep sensitive content off unapproved providers. Every finding is traced and auditable.

Coming soon

CVE investigation pipeline

Researcher, analyst, and editor stages in a hierarchical agent system. The researcher can query CVE databases; only the editor can write to the output. Token budgets enforced per run.

Coming soon

Secret rotation auditor

Agents scan infrastructure for stale or exposed secrets using WASM-isolated tools. Metadata-only access patterns let agents audit secrets without reading secret values.

20 templates planned. See the full roadmap → or contribute a template →

Get Started

Running in five minutes.

1

Clone and start

git clone https://github.com/OrlojHQ/orloj.git && cd orloj
go run ./cmd/orlojd \
  --storage-backend=memory \
  --embedded-worker \
  --model-gateway-provider=mock
2

Deploy a pipeline

orlojctl apply -f examples/blueprints/pipeline/
3

Check results

orlojctl get task bp-pipeline-task
# → Status: Succeeded
Three starter blueprints included: pipeline, hierarchical, and swarm-loop. Connect a real model provider in minutes. Read the full quickstart →
Community

Built in the open. Contribute from day one.

Orloj is Apache 2.0. The full runtime is open source: governance, orchestration, scheduling, observability. Enterprise features (SSO, compliance packaging, hosted cloud) will be built on top.

GitHub

Star the repo, read the source, open an issue.

github.com/OrlojHQ/orloj →

Discord

Ask questions, share what you're building, join weekly community calls.

discord.gg/orloj →

Contribute

Good first issues labeled. Architecture docs available. PRs welcome.

Contributing guide →

Stop wiring. Start declaring.

Define your agents, enforce your policies, and ship to production.
Get Started → · Read the Docs