Building Anvil Registry With Codex
Anvil Registry started as a fairly simple irritation: npm install is an enormous amount of trust squeezed into one command.
That is not a new observation. The JavaScript ecosystem has spent years collecting examples of why dependency installs deserve more scrutiny: lifecycle scripts, package confusion, typo-squatting, unexpected maintainer changes, compromised releases, and the occasional package that appears to have been assembled in a room with no windows and a lot of confidence.
The useful question was smaller: what would I actually want in front of install traffic?
The answer became Anvil: an open-source npm registry gateway and Node devcontainer harness for safer dependency installs. The gateway sits between package managers and upstream registries, proxies metadata and tarballs, caches artefacts, applies deterministic policy, queues analysis, and gives reviewers a place to understand decisions. The Node Base image gives unknown repositories a safer local install path, with safe mode, observed mode, reports, and explicit handling for packages that need lifecycle scripts.
It is alpha software. Very deliberately alpha software. The point of open sourcing it now is to make the shape inspectable while the decisions are still malleable, not to pretend a security tool is finished because a README learned to stand upright.
The spec came first
I wrote the spec before building the system.
That sounds obvious, but with AI coding tools it is dangerously easy to skip. You can ask for a thing, get a tree of files back, and only later notice that the model has built a pleasant-looking alternate universe. Technically impressive. Operationally not your problem.
So I used plan mode first.
The spec described the actual product boundary: registry gateway, policy engine, analysis worker, admin surfaces, CLI, docs, local Docker path, AWS/SST deployment path, and the companion Node Base image. It also named the non-negotiables:
- deterministic policy is the enforcement layer
- optional LLM review can add context, but does not get to be the authority
- package identity and cached artefacts matter
- overrides need reasons and expiry
- private package metadata should not wander off for AI review unless explicitly enabled
- local development should be possible without pretending every user has production infrastructure
That gave Codex something better than a prompt. It gave it constraints.
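One of those constraints, "overrides need reasons and expiry", is concrete enough to sketch. What follows is a minimal illustration, not Anvil's implementation: the `PackageId` and `Override` shapes and both function names are assumptions made up for this example.

```typescript
// Hypothetical shapes for illustration only; not Anvil's real schema.
interface PackageId {
  name: string;
  version: string;
}

interface Override {
  pkg: PackageId;
  reason: string; // why a human allowed this package
  expiresAt: Date; // after this instant the override no longer applies
}

// An override counts only if it carries a non-empty reason and has not expired.
function isOverrideActive(override: Override, now: Date): boolean {
  const hasReason = override.reason.trim().length > 0;
  const notExpired = override.expiresAt.getTime() > now.getTime();
  return hasReason && notExpired;
}

// Find an active override for an exact name and version, if one exists.
function findActiveOverride(
  overrides: Override[],
  pkg: PackageId,
  now: Date,
): Override | undefined {
  return overrides.find(
    (o) =>
      o.pkg.name === pkg.name &&
      o.pkg.version === pkg.version &&
      isOverrideActive(o, now),
  );
}
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the check deterministic and easy to test, which is the same property the policy layer is supposed to have.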
Then I used /goal on high... for 15 hours
Once the spec was written, I used Codex with /goal on high and let it carry the larger delivery loop.
The useful part was not that it generated code quickly, although it did. The useful part was that it could keep returning to the goal: read the existing files, make the next small change, run the checks, inspect the failure, patch the gap, update the docs, then keep going.
That loop matters on a project like this because the work crosses a lot of surfaces:
- Fastify services for gateway, admin, and worker behaviour
- shared TypeScript packages for config, policy, registry clients, persistence, queueing, analysis, provenance, and types
- a CLI that has to be scriptable enough for local and CI use
- Docker Compose for local trials
- SST infrastructure for deployment experiments
- public docs and repo docs that need to match the code
- Node Base helper scripts for safer install workflows
None of those pieces is impossible alone. The difficulty is keeping them aligned. A registry gateway with stale docs is a puzzle box with ports. A policy engine with unclear override behaviour is decorative security. A CLI that cannot explain its decisions is just another command that fails with an attitude.
Codex was useful because it could hold more of that coordination work at once.
The human job did not disappear
The human job moved up a level.
I still had to decide what counted as the product, what belonged in the first public version, what was too grand for alpha, and where the security posture needed to be boring rather than clever. I also had to review the result like someone who might have to maintain it later, because future me is not an abstract stakeholder. Future me has a calendar and a temper.
The biggest judgement calls were about authority.
LLM review is useful as structured reviewer context. It can summarize suspicious patterns, flag interesting changes, and help a human focus. It should not be the thing that allows a package through the gate. Deterministic policy owns that decision. That boundary is not anti-AI. It is basic systems hygiene.
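That authority boundary can be made structural rather than left as a convention. The sketch below assumes hypothetical `PolicyResult`, `LlmReview`, and `Decision` types; none of these names come from the Anvil codebase. The point it illustrates is that the verdict is copied from the deterministic policy result, while the LLM review can only attach notes or ask for a human to look.

```typescript
// Hypothetical types; names are illustrative, not Anvil's API.
type Verdict = "allow" | "block";

interface PolicyResult {
  verdict: Verdict;
  ruleId: string; // which deterministic rule produced the verdict
}

interface LlmReview {
  summary: string;
  suspicion: "low" | "medium" | "high";
}

interface Decision {
  verdict: Verdict;
  ruleId: string;
  reviewerNotes: string | null; // advisory context only
  needsHumanReview: boolean;
}

// The verdict always comes from deterministic policy.
// The optional LLM review adds context but cannot change the verdict.
function decide(policy: PolicyResult, review?: LlmReview): Decision {
  return {
    verdict: policy.verdict,
    ruleId: policy.ruleId,
    reviewerNotes: review ? review.summary : null,
    needsHumanReview: review !== undefined && review.suspicion === "high",
  };
}
```

Because `decide` never reads `review` when computing `verdict`, a reassuring summary cannot turn a block into an allow. The `needsHumanReview` flag is one assumed way to surface a worried review without giving it enforcement power.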
The same applied to the open-source posture. The project needed to be public enough to inspect, run, and critique, but honest about being rough alpha software. Security tooling that oversells itself is how teams end up trusting the brochure instead of the behaviour.
What exists now
The current public version includes the main project shape:
- a registry gateway for npm-compatible install traffic
- policy decisions before tarball access
- metadata and tarball proxying
- cached analysis keyed around package identity
- worker-based static analysis
- CLI commands for explain, scan, warm, queues, reports, overrides, and policy tests
- admin review surfaces
- Docker Compose local infrastructure
- SST deployment scaffolding
- a Node 22 devcontainer base image with safe and observed install modes
- public docs at anvil-registry.vercel.app
It is not the final form. Good. Final forms are suspicious this early.
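Two items from that list, policy decisions before tarball access and cached analysis keyed around package identity, come down to ordering and key design. The following is a simplified, synchronous sketch under stated assumptions: `PackageIdentity`, `analysisCacheKey`, and `serveTarball` are hypothetical names, and a real gateway would do this work asynchronously. The `integrity` field stands in for something like the `dist.integrity` value that npm registry metadata publishes.

```typescript
// Hypothetical sketch; none of these names come from the Anvil codebase.
interface PackageIdentity {
  name: string;
  version: string;
  integrity: string; // content hash string taken from registry metadata
}

type Verdict = "allow" | "block";
type PolicyCheck = (id: PackageIdentity) => Verdict;
type TarballFetcher = (id: PackageIdentity) => string;

// Key the analysis cache on the full identity, so a tarball whose contents
// differ under the same name and version does not reuse an earlier result.
function analysisCacheKey(id: PackageIdentity): string {
  return `${id.name}@${id.version}#${id.integrity}`;
}

// Run policy first; only touch the tarball when the verdict is "allow".
function serveTarball(
  id: PackageIdentity,
  policy: PolicyCheck,
  fetchTarball: TarballFetcher,
): string | null {
  if (policy(id) === "block") {
    return null; // the fetcher is never called for blocked packages
  }
  return fetchTarball(id);
}
```

The ordering is the useful part: a blocked package never reaches the fetch step, so nothing downstream has to remember to check the verdict again.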
The useful version of open source is not "look, a repo". It is a working invitation: inspect the architecture, run the quickstart, disagree with the policy defaults, test the install path, and point at the parts that are still too trusting.
What I learned
Codex is strongest when it is not asked to improvise the product.
Writing the spec in plan mode first changed the whole build. It made the agent more useful because the work had edges. /goal on high then turned the build into an extended delivery loop rather than a series of disconnected prompts.
That is the pattern I trust more now:
- make the product boundary explicit
- write down the boring constraints
- let Codex inspect before editing
- keep changes tied to a goal
- run the checks
- update the docs while the code is still warm
- review the output with human judgement, because syntax is not strategy and confidence is not correctness
Anvil Registry exists because dependency installs are too important to leave entirely to vibes.
Codex helped build it because the work was specified, constrained, checked, and reviewed. That is the version of AI-assisted development I can get behind: not magic, not replacement, just a very capable collaborator doing useful work inside a well-defined box.