The company that runs itself: an architecture overview

I run a niche vertical-data product that is operated, day to day, by a fleet of LLM agents organized like a company: a CEO, domain managers, workers, and auditors. This is the opening map of how that works, and where it breaks.

There is a business I run that, on most days, I barely touch. It scrapes a steady stream of upstream sources, cleans and links the data, scores it with a stack of proprietary models, ships the results to a small set of customers, and writes about itself on social media. The work that keeps it alive (pulling fresh data, catching regressions, deploying fixes, drafting outreach) is done by a fleet of language-model agents. I am the single human in the loop, and most of what reaches me is a one-page brief each morning.

I want to be precise about what this is and is not. It is not a magic self-improving machine. It is a messy, opinionated, occasionally self-sabotaging org chart made of prompts and cron jobs, held together by a shared database and a handful of hard-won doctrines. It works. It also fails in ways that are funny in hindsight and expensive in the moment. This series is about how it is built: the patterns that generalize to any business, not the vertical I happen to be in. This first piece is the map.

The shape of the thing

The system is organized like a company on purpose. The metaphor is not decoration; it is load-bearing. When you have dozens of agents firing on schedules, "who is responsible for this, and who do they answer to?" is the only question that keeps the fleet from dissolving into noise.

The structure is a tree:

Human (owner)
└── CEO agent (runs a few times a day, synthesizes one brief for the human)
    ├── Domain Manager (data)
    │   ├── Worker  (does the labor)
    │   └── Auditor (verifies the worker's output, escalates on failure)
    ├── Domain Manager (scraping)
    │   ├── Worker
    │   └── Ingest worker
    ├── Domain Manager (build / code)
    │   ├── Worker
    │   └── Auditor (auto-deploys when checks pass)
    ├── Domain Manager (research)
    ├── Domain Manager (content / social)
    └── standalone auditors (docs, "audit of audits", memory)

Four kinds of agents, four jobs:

  • Workers do the labor: scrape a source, dedup a table, write a piece of code, draft a post.
  • Managers own a domain. They don't do much hands-on work; they monitor their workers, verify outputs at a higher level, and close each run by writing a note to the CEO's inbox.
  • Auditors sit beside workers and check their output. An auditor is not a manager. It has no people; it has a job, which is suspicion. A failed audit escalates.
  • The CEO is the single human-facing throat. Every manager's note and every escalation lands in one place, and the CEO synthesizes them into the brief I read with coffee.

The doctrine I enforce hardest: every agent reports to someone, and every scheduled job is watched by a named agent. An agent with no manager, or a cron with no watcher, is a finding: a bug to be fixed, not a quirk to be tolerated. I have a literal page that exists only to make that auditable, and it regularly turns up orphans. More on those later.
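As a minimal sketch of what such an orphan check could look like (not the actual page): the `Agent` and `CronJob` records, their field names, and the `find_orphans` function are all hypothetical, and the sketch assumes each agent records one manager and each cron records one watcher.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agent:
    name: str
    reports_to: Optional[str]  # None means no manager is recorded


@dataclass(frozen=True)
class CronJob:
    name: str
    watcher: Optional[str]  # None means nobody watches this schedule


def find_orphans(agents, crons, root="human"):
    """Return one finding per agent without a valid manager
    and per cron without a valid watcher."""
    agent_names = {a.name for a in agents}
    findings = []
    for a in agents:
        # A valid manager is the human owner or another known agent.
        has_manager = a.reports_to == root or (
            a.reports_to in agent_names and a.reports_to != a.name
        )
        if not has_manager:
            findings.append(f"agent {a.name}: no valid manager")
    for c in crons:
        # A watcher must be a named agent that actually exists.
        if c.watcher not in agent_names:
            findings.append(f"cron {c.name}: no valid watcher")
    return findings
```

A dangling reference (a manager or watcher name that maps to no living agent) counts as an orphan here, not just a missing value, since both leave the job unwatched in practice.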

The brain: one shared database as memory

None of these agents talk to each other directly. They coordinate through a shared Postgres database (I call it the brain), and almost everything flows through two tables.

The first is a key/value store of long-lived knowledge: operating rules, schema patterns, role definitions, the documentation of how every part of the system works. An agent's first act in any session is to read its own role and the current project status out of the brain:

-- every session starts here
SELECT content FROM brain WHERE file_key = 'SKILL_<MY_ROLE>';
SELECT content FROM brain WHERE file_key = 'PROJECT_STATUS';

The second is a single task queue. There is no per-domain queue: one table, filtered by mode (data, scrape, build, research, content). Agents claim work from it, mark it in progress, and close it out. The status vocabulary is small and strict, because the subtle bugs live here. A careless query that treated "superseded" tasks as workable once left hundreds of dead rows looking claimable, and agents happily re-did finished work.

SELECT * FROM task_queue
WHERE mode = 'data'
  AND status NOT IN ('complete','failed','superseded','stale')  -- all four terminal states
ORDER BY priority, created_at;
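Claiming work from that queue can be sketched as follows. This is an illustration under stated assumptions, not the real implementation: it uses SQLite from the Python standard library as a stand-in for Postgres, and the `id` and `claimed_by` columns, the `'pending'` and `'in_progress'` status names, and the `make_queue` and `claim_next_task` helpers are all hypothetical.

```python
import sqlite3


def make_queue():
    """Create an in-memory stand-in for the single task queue."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_queue ("
        "id INTEGER PRIMARY KEY, mode TEXT, status TEXT, "
        "priority INTEGER, created_at TEXT, claimed_by TEXT)"
    )
    return conn


def claim_next_task(conn, mode, agent):
    """Claim the highest-priority pending task for `mode`.

    Returns the task id, or None if nothing is claimable.
    """
    with conn:  # one transaction: commit on success, roll back on error
        row = conn.execute(
            "SELECT id FROM task_queue "
            "WHERE mode = ? AND status = 'pending' "
            "ORDER BY priority, created_at LIMIT 1",
            (mode,),
        ).fetchone()
        if row is None:
            return None
        # Guarded update: it only takes effect if the row is still pending.
        cur = conn.execute(
            "UPDATE task_queue SET status = 'in_progress', claimed_by = ? "
            "WHERE id = ? AND status = 'pending'",
            (agent, row[0]),
        )
        return row[0] if cur.rowcount == 1 else None
```

The sketch filters on an allow-list (`status = 'pending'`) rather than excluding terminal states, so a newly added status cannot become claimable by accident. On Postgres with concurrent workers, the select step would more likely use `SELECT ... FOR UPDATE SKIP LOCKED` so two sessions never pick the same row.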

The brain is also where agents log to each other across time. Sessions crash. A worker that discovers something the next session needs to know writes it down immediately, not at the end, because there might not be an end. This sounds obvious until you have lost a chain of reasoning to a crashed session for the third time. The rule became doctrine: log when something major happens, not when you're done.
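The "log immediately" rule can be illustrated with a small sketch. It assumes a hypothetical `session_log` table and hypothetical `log_now` and `run_session` helpers, with SQLite standing in for the brain; the point is only that each note is committed the moment it is produced, not batched until the session ends.

```python
import sqlite3


def log_now(conn, agent, note):
    """Write a note and commit at once, so a later crash cannot lose it."""
    with conn:  # commits when the block exits without an error
        conn.execute(
            "INSERT INTO session_log (agent, note) VALUES (?, ?)",
            (agent, note),
        )


def run_session(conn, agent, steps):
    """Run steps in order; each step returns a note (or None).

    Notes are logged as soon as they exist. If a later step raises,
    everything logged before it is already durable.
    """
    for step in steps:
        note = step()
        if note:
            log_now(conn, agent, note)
```

If the second step of a session crashes, the note from the first step is already in the table for the next session to read.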

A whole article in this series is about this shared memory: why a boring SQL database beats anything fancier, and how the schema shapes agent behavior more than the prompts do.

Schedules, not a server

The system has no central loop. It is a collection of agents on cron schedules: a data worker every hour, scrapers every few hours, managers a couple of times a day, auditors interleaved, the CEO a handful of times daily. Each fire is a fresh, mostly stateless session that boots up, reads the brain, does a bounded chunk of work, writes its results back, and exits.

This is the single best decision in the architecture and also the source of the most exotic failures. The good part: agents are cheap, isolated, and restartable. A crash costs one fire, not the company. The bad part: scheduled agents are invisible to each other and easy to get almost right. I have had jobs that were marked active but never actually fired. Jobs that fired but silently logged nothing. A detector cron whose job was to catch silent failures that itself silently stopped.

The lesson, learned the hard way over a multi-week stretch where a whole class of scoring jobs failed with nobody watching: a detector is an alarm, not a responder. Every schedule needs a named agent that confirms the schedule actually fired and escalates when it didn't. The scheduling article gets into the watcher-of-watchers problem in detail.
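A watcher's core check can be sketched in a few lines. This assumes the watcher knows each schedule's expected interval and the time of its last logged fire; the `missed_fires` function, its parameters, and the 1.5x grace factor are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta


def missed_fires(schedules, last_fired, now, grace=1.5):
    """Return names of schedules that look like they stopped firing.

    schedules:  dict of name -> expected interval (timedelta)
    last_fired: dict of name -> datetime of the last logged fire;
                a missing name means the job has never logged a fire
    grace:      how many intervals may pass before the job counts as missed
    """
    missed = []
    for name, interval in sorted(schedules.items()):
        last = last_fired.get(name)
        if last is None or now - last > interval * grace:
            missed.append(name)
    return missed
```

Treating "never logged a fire" as a miss covers the case of a job that is marked active but has never actually run. The function only detects; in the terms above it is still an alarm, and the named agent calling it is the one responsible for escalating.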

The immune system

Because the workers are fallible and the schedules are leaky, a surprising fraction of the fleet does nothing but check the rest of the fleet. There are auditors for data quality, for the documentation drifting out of sync with reality, for cron health, and (my favorite) an "audit of audits" that runs weekly and verifies the auditors themselves are doing their jobs.

The design principle here is separation of doing and checking. The agent that writes the data is not the agent that blesses it. A read-only doc auditor flags drift but is forbidden from fixing it: flagging and fixing are different jobs held by different agents, on purpose, so that "I'll just quietly patch this" can't hide a systemic problem. There is also a structural backstop below the agents: a database function that runs daily and asks, bluntly, is data still flowing through every pipeline? If a canonical pipeline goes silent, it appends a line to the CEO's inbox before any human notices. The auditing layer is its own article, because building an immune system for an autonomous system turns out to be most of the work.
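The structural backstop might look something like the sketch below. The real check is a database function; this version uses Python with SQLite as a stand-in, and the `canonical_rows` and `ceo_inbox` tables, their columns, and the `flag_silent_pipelines` helper are all hypothetical.

```python
import sqlite3


def flag_silent_pipelines(conn, pipelines, now_iso, max_age_hours=24):
    """Append one ceo_inbox line per pipeline with no recent rows.

    pipelines: explicit list of pipeline names that are expected to flow
    now_iso:   current time as 'YYYY-MM-DD HH:MM:SS'
    Returns the names that were flagged.
    """
    flagged = []
    with conn:
        for name in pipelines:
            fresh = conn.execute(
                "SELECT COUNT(*) FROM canonical_rows "
                "WHERE pipeline = ? AND loaded_at >= datetime(?, ?)",
                (name, now_iso, f"-{max_age_hours} hours"),
            ).fetchone()[0]
            if fresh == 0:
                conn.execute(
                    "INSERT INTO ceo_inbox (line) VALUES (?)",
                    (f"pipeline {name} is silent: no rows in {max_age_hours}h",),
                )
                flagged.append(name)
    return flagged
```

The list of expected pipelines is passed in explicitly rather than derived from the data, because a pipeline that has never landed a row would not show up in any query over the table itself.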

Memory and learning

Stateless sessions have no memory by default, which means the fleet would make the same mistakes forever. So there is a memory worker: a nightly agent that reads the last several session logs for each agent, distills recurring failure patterns into a per-agent memory row, and files root-cause tasks when a mistake keeps recurring. The next time that agent boots, it reads its accumulated scar tissue along with its role.
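The "keeps recurring" test can be sketched simply. The distillation itself is language-model work; this sketch assumes that step has already reduced each session log to a list of failure-pattern labels, and the `recurring_failures` function and its threshold are hypothetical.

```python
from collections import Counter


def recurring_failures(session_logs, min_sessions=2):
    """Return failure patterns seen in at least `min_sessions` sessions.

    session_logs: list of sessions, each a list of failure-pattern labels.
    A pattern repeated inside one session still counts once for that session.
    """
    seen = Counter()
    for log in session_logs:
        for pattern in set(log):  # de-duplicate within a session
            seen[pattern] += 1
    return sorted(p for p, n in seen.items() if n >= min_sessions)
```

Counting per session rather than per occurrence keeps one noisy run from looking like a chronic problem; only patterns that survive across sessions would earn a memory note or a root-cause task.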

This is the closest thing the system has to learning, and it is deliberately crude: synthesis of past logs into durable notes, not weight updates. It is also where the most interesting behavior emerges, and it gets its own piece.

The pipelines underneath

All of the above is the operating system. The actual product is a set of per-domain pipelines:

  • A scrape layer that pulls from upstream sources through rotating infrastructure and lands everything in staging tables.
  • A promotion step that moves staged data into canonical tables only after it passes checks. Nothing writes to canonical directly, and that rule alone has prevented countless bad-data incidents.
  • A scoring layer that blends several weighted sub-scores into the composite signals customers actually pay for.
  • A delivery layer that turns all of it into briefs and dashboards.

Each pipeline is a domain with its own manager, workers, and auditors, and each gets its own article in this series.
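The promotion and scoring steps can be sketched generically. Nothing here reflects the actual checks, sub-scores, or weights, which the article does not specify; `promote` and `composite_score` are hypothetical helpers, and the weighted average is just one plausible way to blend sub-scores.

```python
def promote(staged_rows, checks):
    """Split staged rows into (promoted, rejected).

    A row is promoted only if every check passes, so nothing reaches
    the canonical side without going through this gate.
    """
    promoted, rejected = [], []
    for row in staged_rows:
        if all(check(row) for check in checks):
            promoted.append(row)
        else:
            rejected.append(row)
    return promoted, rejected


def composite_score(sub_scores, weights):
    """Blend sub-scores into one number as a weighted average.

    sub_scores and weights are dicts keyed by component name;
    weights do not need to sum to 1.
    """
    if set(sub_scores) != set(weights):
        raise ValueError("sub_scores and weights must name the same components")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(sub_scores[k] * weights[k] for k in weights) / total
```

Checks are plain predicates, for example `lambda row: row["id"] is not None`, so adding a new rule after an incident means appending one function to the list.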

What this is honestly like

It is not a utopia. On any given week something is subtly broken: an agent escalating a question it should have just answered, a manager whose status write is malformed, a role defined in the brain that maps to no living agent, a cron that's been firing into the void. The doctrines in this series (every agent reports to someone, every job has a watcher, everything stages before it goes canonical, log immediately, investigate before you escalate) are not best practices I read somewhere. Each one is a scar from a specific outage.

But it runs. A small data business does the work of a team, most of the cost is tokens, and the human surface area is one brief a day plus the occasional decision only I can make. The rest of this series takes the map apart one region at a time: the shared brain, agent memory, the org chart, scheduling, the immune system, and the pipelines. Start anywhere. They all connect back here.

Takeaways

  • Pick an org metaphor and enforce it. Workers do, managers own, auditors check, one agent talks to the human. "Who is responsible and who do they report to" is the structure that keeps a fleet from becoming noise.
  • Coordinate through one boring shared store. A plain database for durable knowledge plus a single status-typed task queue beats clever inter-agent messaging, and the status vocabulary will matter more than you expect.
  • Separate doing from checking. The agent that produces output should never be the one that approves it, and your detectors need their own watchers.
  • Assume sessions die. Make every fire stateless and restartable, log knowledge the moment you learn it, and never trust that a scheduled job ran just because it was scheduled.
  • Every doctrine should be a scar. Don't adopt rules for elegance; adopt them when a specific failure forces you to, and write down which failure each one prevents.
