Obversary-OS · runtime / orchestration

Obversary-OS is the higher-level orchestration / runtime layer of the stack — agents, tools, workflows, APIs. It is the layer meant to sit above the two memory repos (earth-database as the local canonical core and memory-dropbox as the event-sourced substrate experiment), but the current public repo is still a prototype with its own event-memory store. Its job is one specific thing:

Decide what to do when something happens, and write down what you decided so the substrate can remember it.

That’s the whole shape. The rest is implementation detail.

Repository: github.com/obversary/Obversary-OS

Why a runtime layer at all

Most agent prototypes I’ve read collapse seven things into one object: prompt construction, task routing, tool use, memory, execution, evaluation, persistence. That works for a demo. It falls apart the moment you ask the system a basic question, such as “why did you pick that tool?”, because the picking was never a first-class event with its own log line.

Obversary-OS forces those responsibilities into separate layers so each decision can be observed. Signal detection is its own module. Role selection is its own module. Workflow assembly, execution, event emission, and memory updates are each their own module with their own surface. None of them knows about the others except through events.

That’s what I mean when I say modular. Not “split into files.” Split into things that don’t share state except through the event bus. The substrate underneath only works if the runtime above it is willing to be that disciplined.

The core loop

flowchart LR
  signal[signal received] --> role[role selected]
  role --> workflow[workflow assembled]
  workflow --> task[task executed]
  task --> event[event recorded]
  event --> memory[(memory updated)]
  memory -. context for next signal .-> signal

Every arrow is an event on the bus. That means every arrow can be subscribed to, inspected, or used as input to the next decision. When something fails, you can trace the failure back to a specific arrow instead of pulling apart one big function. The dotted return line is the intended closed loop — today the prototype records that history locally; wiring it into the memory substrates is the next step before I call the loop complete.
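To make the "trace the failure back to a specific arrow" point concrete, here is a minimal sketch of the loop as a chain of topics. The topic names in STAGES, the Bus class, and run_loop are all hypothetical stand-ins I made up for illustration; they are not the repo's actual modules or event names. Each stage only subscribes to one topic and publishes the next, so the trace ends up with one entry per step.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

Event = dict[str, Any]
Handler = Callable[[Event], Awaitable[None]]

# Hypothetical topic names, one per stage in the diagram above.
STAGES = [
    "signal.received",
    "role.selected",
    "workflow.assembled",
    "task.executed",
    "event.recorded",
    "memory.updated",
]


class Bus:
    """Tiny in-process bus, assumed for this sketch only."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.handlers.setdefault(topic, []).append(handler)

    async def publish(self, topic: str, payload: Event) -> None:
        for handler in self.handlers.get(topic, []):
            await handler({"topic": topic, "payload": payload})


async def run_loop(signal: Event) -> list[str]:
    bus = Bus()
    trace: list[str] = []

    def stage(following: Optional[str]) -> Handler:
        async def handle(event: Event) -> None:
            # Every step leaves an observable entry behind.
            trace.append(event["topic"])
            if following is not None:
                await bus.publish(following, event["payload"])
        return handle

    # Wire each stage to the next; stages share nothing except the bus.
    for i, topic in enumerate(STAGES):
        following = STAGES[i + 1] if i + 1 < len(STAGES) else None
        bus.subscribe(topic, stage(following))

    await bus.publish(STAGES[0], signal)
    return trace
```

Calling asyncio.run(run_loop({"text": "hi"})) returns the topics in the order they fired. If a stage raised, the trace would stop at the last topic that completed, which is the property the paragraph above is describing.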

Request path

End-to-end, the way a real HTTP request flows through the runtime:

flowchart LR
  subgraph api["api/"]
    POST["POST /roles/run"]
    GETe["GET /memory/events"]
  end

  subgraph core["core/"]
    TM[TaskManager]
    EB[Event Bus]
  end

  subgraph cog["cognition/"]
    AGT["selected role<br/>(planner / executor / critic)"]
  end

  subgraph mem["memory/"]
    MS[Memory Store]
  end

  POST --> TM
  TM --> AGT
  TM --> EB
  EB --> MS
  GETe --> MS

POST /roles/run enters the task manager, which fires task.submitted on the bus, dispatches to the selected role, and continues to fire lifecycle events (task.started, task.completed, task.failed) as the run progresses. The memory store subscribes to all events. GET /memory/events reads from that store. That’s why the runtime is auditable at prototype scale — recent recorded decisions are on one endpoint, in order, ready to inspect.
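The lifecycle described above can be sketched in a few lines. This is an illustration under assumptions, not the repo's TaskManager: run_task, the Publish and Role type aliases, and the payload fields are hypothetical. Only the four topic names (task.submitted, task.started, task.completed, task.failed) come from the description above. The publish callable stands in for the event bus, and the list in demo stands in for a memory store that subscribes to everything.

```python
import asyncio
from typing import Any, Awaitable, Callable

Publish = Callable[[str, dict[str, Any]], Awaitable[None]]
Role = Callable[[str], Awaitable[Any]]


async def run_task(task_id: str, role: Role, publish: Publish) -> None:
    """Fire lifecycle events around a single role invocation."""
    await publish("task.submitted", {"task_id": task_id})
    await publish("task.started", {"task_id": task_id})
    try:
        result = await role(task_id)
    except Exception as exc:
        # A failure becomes an event instead of disappearing into a log.
        await publish("task.failed", {"task_id": task_id, "error": str(exc)})
        return
    await publish("task.completed", {"task_id": task_id, "result": result})


async def demo() -> list[str]:
    log: list[str] = []  # stand-in for a store subscribed to all events

    async def publish(topic: str, payload: dict[str, Any]) -> None:
        log.append(topic)

    async def ok_role(task_id: str) -> str:
        return f"done:{task_id}"

    async def bad_role(task_id: str) -> str:
        raise RuntimeError("boom")

    await run_task("123", ok_role, publish)
    await run_task("124", bad_role, publish)
    return log
```

Passing publish in as a parameter keeps the task manager ignorant of who is listening, which is the same separation the diagram draws between TaskManager, Event Bus, and Memory Store.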

The smallest meaningful behavior

The whole idea — publish an event on a topic, have a subscriber receive it — is in examples/minimal_demo.py:

import asyncio
from api.event_bus import event_bus

async def main() -> None:
    received: list[dict] = []

    async def handler(event: dict) -> None:
        received.append(event)

    await event_bus.subscribe("demo.minimal", handler)
    await event_bus.publish("demo.minimal", {"message": "hello"})

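    assert received[0]["topic"] == "demo.minimal"
    assert received[0]["payload"]["message"] == "hello"

# Entry point so that running the file actually executes the coroutine.
if __name__ == "__main__":
    asyncio.run(main())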

That’s it. That’s the kernel. Every layer above this is just more topics, more handlers, more structure on the payloads. If you want to know whether the runtime works, you run that demo first and look at it as one event going from publisher to subscriber. Everything else is the same shape, repeated.
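For readers who want to see what could sit behind subscribe and publish, here is a self-contained sketch of a bus that would satisfy the two assertions in the demo. The EventBus class below is my assumption about the shape, not a copy of the repo's event_bus module: the only things taken from the demo are the two async method names and the fact that handlers receive a dict with "topic" and "payload" keys.

```python
import asyncio
from typing import Any, Awaitable, Callable

Handler = Callable[[dict[str, Any]], Awaitable[None]]


class EventBus:
    """Hypothetical in-process bus; not the repo's actual implementation."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = {}

    async def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers.setdefault(topic, []).append(handler)

    async def publish(self, topic: str, payload: dict[str, Any]) -> None:
        # Wrap the payload in an envelope so subscribers also see the topic.
        event = {"topic": topic, "payload": payload}
        # Iterate over a copy so a handler may subscribe during delivery.
        for handler in list(self._handlers.get(topic, [])):
            await handler(event)


async def demo() -> list[dict[str, Any]]:
    bus = EventBus()
    received: list[dict[str, Any]] = []

    async def handler(event: dict[str, Any]) -> None:
        received.append(event)

    await bus.subscribe("demo.minimal", handler)
    await bus.publish("demo.minimal", {"message": "hello"})
    return received
```

This version delivers to handlers one at a time, in subscription order. That is a deliberate simplification: ordered delivery is easy to reason about when the goal is an inspectable history, at the cost of one slow handler delaying the rest.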

Running it

There are two surfaces — a no-server demo and a real HTTP layer.

Minimal demo (no HTTP server):

python examples/minimal_demo.py

API server:

uvicorn api.main:app --reload

Submit a task:

curl -X POST http://127.0.0.1:8000/roles/run \
  -H "Content-Type: application/json" \
  -d '{"role_name":"default","task_id":"123"}'

Read recent events from memory:

curl "http://127.0.0.1:8000/memory/events?limit=20"

The /memory/events route is the part that matters most, because it’s the thing that makes the runtime’s behavior legible from outside. In the current prototype, every decision the runtime records is visible there in order, with timestamps and payloads. That is the surface a real substrate adapter should preserve, not magic away.
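As a rough picture of what a limit=20 read implies on the storage side, here is a sketch of a bounded in-memory history that returns the most recent events in order, each with a timestamp and payload. The MemoryStore class, its capacity parameter, and the record and recent method names are hypothetical; the repo's memory/store.py may be shaped differently.

```python
from collections import deque
from datetime import datetime, timezone
from typing import Any


class MemoryStore:
    """Hypothetical bounded event history, oldest events dropped first."""

    def __init__(self, capacity: int = 1000) -> None:
        self._events: deque[dict[str, Any]] = deque(maxlen=capacity)

    def record(self, topic: str, payload: dict[str, Any]) -> None:
        self._events.append({
            "topic": topic,
            "payload": payload,
            # UTC ISO-8601 timestamp taken at record time.
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def recent(self, limit: int = 20) -> list[dict[str, Any]]:
        """Return up to `limit` of the latest events, oldest first."""
        if limit <= 0:
            return []
        return list(self._events)[-limit:]
```

A substrate adapter that replaces this store would need to keep the same two guarantees a reader relies on: events come back in the order they were recorded, and nothing about the payload is rewritten on the way out.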

What ships in the repo

The packages, in roughly the order they fire during a request:

api/ — FastAPI app. Routers for roles, tasks, memory. The HTTP surface.
schemas/ — pydantic-style shapes for Task, Signal, Event, Role, Workflow.
cognition/ — signals.py, signal_detector.py, roles.py, role_templates.py (planner / executor / critic patterns), planner.py, critic.py, synthesizer.py. The decision layer.
workflows/ — assembler.py, patterns.py. Composing roles into multi-step flows.
core/ — registry.py, orchestrator.py, execution_engine.py, event_bus.py, router.py. The central nervous system.
memory/store.py — the in-memory event history view. The connection point to the substrate.
tools/runner.py and models/ — tool execution and model adapters. Plug points.
tests/ — test_minimal_flow.py, test_event_bus.py, test_agents_api.py, test_task_manager.py, test_imports.py. Coverage of the loop.

The longer architecture write-up lives in docs/ARCHITECTURE.md in the repo, with a Mermaid diagram of the request path. That doc is the one I’d read second, after examples/minimal_demo.py.

What this connects to

Obversary-OS doesn’t exist for its own sake. It’s a runtime for something, with a clear line between what exists now and what it is meant to connect to next:

It currently writes events to its own small memory/event view. The intended substrate adapters are earth-database when the target is the local canonical core, or memory-dropbox when the target is the larger event-sourced substrate experiment.
It can hand off to ingestion lanes like PDF Intelligence Core when the request involves real-world input, but the handoff should be treated as an integration boundary until it is wired and tested.
It creates the kind of routing and tool-use fields that structured failure traces need later. That is not the same thing as a finished trace pipeline; it is the runtime-side shape that makes one possible.
It’s the layer failure-induced benchmarks should eventually feed back into, when the failures from yesterday’s runs become the harder questions for tomorrow’s.

The thesis (Why memory is the substrate) treats modular ingestion as the scalability mechanism. Obversary-OS is the layer that makes modular ingestion possible in practice — because if the runtime isn’t disciplined about emitting events, no substrate underneath can compose anything from the pieces.

Honest scope

It’s a prototype. Roles are pluggable, but the ones in cognition/role_templates.py use a mock LLM; swap a real client in when you’re ready. The orchestrator is in-process and asyncio-based, not distributed. There’s no auth, no production task queue, no SLOs. It’s a research-shaped scaffold for studying what happens when you take “every decision is an event” seriously, not a platform.
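One way the mock-to-real swap could look, sketched under assumptions: the LLMClient protocol, MockLLM, and PlannerRole below are names I invented for illustration and are not taken from cognition/role_templates.py. The idea is only that a role depends on a small interface, so a real client can replace the mock without touching the role.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Hypothetical interface a role needs from any model client."""

    def complete(self, prompt: str) -> str: ...


class MockLLM:
    """Deterministic stand-in, useful for tests and demos."""

    def complete(self, prompt: str) -> str:
        return f"[mock] {prompt}"


class PlannerRole:
    """Hypothetical role that only talks to the LLMClient interface."""

    def __init__(self, llm: LLMClient) -> None:
        self.llm = llm

    def run(self, task: str) -> str:
        return self.llm.complete(f"Plan the steps for: {task}")
```

A real client would be any object with a matching complete method, for example a thin wrapper around whichever provider SDK you use. Because Protocol checks structure rather than inheritance, the wrapper does not need to import anything from the runtime.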

That’s enough for the question I’m trying to answer with this layer. The harder questions — distributed orchestration, persistent event store, durable replay — are real, and they’re worth doing only after the in-process version shows the discipline is worth preserving.