Projects overview¶
This is the map. If you’ve already read Why memory is the substrate, this is the page that tells you which articles correspond to which part of that bet, in roughly the order I’d recommend reading them.
The work splits into a few layers, and they’re meant to be read as layers — not as separate products.
The substrate — two sibling memory repos¶
The bottom of the stack. Two deliberate siblings, not “primary and backup”:
earth-database · local memory core — the canonical substrate. Local, embedded, SQLite with FTS5, provenance, deterministic trust utilities, and a JSONL event log that makes decisions inspectable. The smallest honest version of the idea.
Memory Dropbox · event-sourced substrate experiment — the larger sibling. Postgres, Redis, Qdrant, worker. Where derived memories, observation memories, and agent-facing memory experiments get tested at larger scope than fits in one embedded database.
What memory substrates are for — the research-speed page. Six domains treated as experiment directions where either substrate could make patterns easier to preserve, inspect, and replay: crime scene intelligence as the load-bearing stress test, plus research operations, engineering incident memory, legal matters, clinical operations, and creative production.
The runtime¶
The thing that decides what to do, given the substrate.
Obversary-OS — modular runtime for roles, workflows, tools, and model interfaces. It is the layer meant to sit above both memory repos. The current prototype makes routing and task decisions visible through an event-memory surface; the substrate wiring is the next boundary, not a finished integration.
Applied ingestion¶
The first real test of whether the substrate works on messy input.
PDF Intelligence Core — runnable pipeline: PDF → validated Markdown + audit → chunks + traces → vectors → deterministic graph with provenance. Every stage writes inspectable artifacts to disk.
PDF to Markdown Tools for AI Pipelines — the tool landscape. Which extractor for which kind of document, why no single converter wins, how to evaluate them on your own files. Read this if you’re starting a document pipeline from zero.
PDF staging workspace — directory convention for keeping inbox, intermediates, and outputs separated.
Evaluation and failure¶
The argument that failure is the second memory.
Evaluation Systems — the lane hub.
Memory-guided evaluation — evaluation that keeps routing-time decisions and learning-time updates separate so the system can be inspected instead of hand-waved.
Structured failure traces — schema for trajectories that ended in failure, so failures become comparable objects instead of one-off log lines.
Failure clusters as interventions — turning recurring failure shapes into system changes.
Failure-induced benchmarks — using clustered failures to build harder questions instead of running yesterday’s static sets forever.
Failure-sliced eval — measuring across slices so an aggregate score can’t smother a real problem.
Foundations¶
The translation layer between math and code, and between code and behavior.
Math Boundaries for AI Systems — where math actually buys you something in an AI system, and where engineering judgment has to take over. I wrote this for myself first.
Agent / RL foundations¶
Smaller pedagogical pieces. Useful if you’re building from scratch and want to see the shape of an agent before the libraries do the showing for you.
Blank RL Agent Template — minimal
Agentclass, thelearn()progression from memory tally to tabular Q to neural batch.RL Agent Skeleton — short hub linking the two RL articles.
PyTorch DQN Agent Walkthrough — the same scaffold with PyTorch and a non-empty
learn().
Experiments¶
Focused studies that don’t fit cleanly into a layer above.
Failure discovery on binary reasoning — small controlled experiment in whether failure clusters can recover known reasoning categories from failure data alone.
Security research¶
The lane where the rest of the stack meets the adversary. Privacy as freedom, memory as substrate, and security architecture as the practical enforcement of both.
Security Research — the lane hub. What this area covers, how it ties back to the rest of the stack, and why AI-era security belongs on a research site.
Prompt injection, and why “just sanitize the input” isn’t enough — plain-English walkthrough of the vulnerability class, direct vs indirect, the CyberScoop-reported incidents (Google Antigravity, OpenAI Atlas, the Five Eyes guidance), and the layered defense doctrine.
earth-database · trust-aware memory substrate — the repo that carries the working version of the doctrine. SQLite-backed, WAL mode, FTS5 search, JSONL event log, plus deterministic trust-boundary utilities for classification, injection scanning, policy checks, and retrieval wrapping.
How to read the work, if you’re new here¶
If you’re another research engineer thinking about memory, evaluation, or agent runtimes, this is the order I’d recommend:
Why memory is the substrate — the bet.
earth-database — the local canonical memory core, including the trust-boundary work. This is the smallest honest version of the substrate idea.
Memory Dropbox — the event-sourced substrate experiment at larger scope.
Obversary-OS — the runtime prototype that makes decisions inspectable before the substrate integration is treated as real.
PDF Intelligence Core — real input turned into inspectable artifacts the substrate can later preserve.
Memory-guided evaluation and Structured failure traces — failure as the second memory.
Everything else is a deeper cut on one of those.
Status¶
These are early-stage. The goal isn’t a finished platform — it’s to make the system-building process visible while it’s still in progress. If something works, the repo is live and the article says so. If something is still a bet, I say that too.