Vertical Software Factory

I’ve had a vague sense that something was wrong with how unreliable even the best coding agents are, without knowing what to do about it. Then I came upon this post by Mary Rose Cook on X. She had found a way to get reliable output from AI coding agents by constraining the domain: the agent gets a framework and tools scoped to one specific kind of software, and it produces that kind of software reliably. A nice side effect is that it’s cheaper, since less code has to be generated. This is a vertical software factory, and I think it’s the best way to reliably build high-quality software with AI coding agents.

I’ll walk through the post section by section with my commentary. To explore what the idea might look like in a different domain, I’ll use a randomly selected industry that I asked ChatGPT about.

A constrained domain

An arcade game with vector graphics on a mobile touch device browser in a portrait 16x9 aspect ratio. That’s my target domain. Even within these extremely tight constraints, there are infinite possibilities for expression. But now, the chances of success are much higher. The framework can make decisions upfront so I can reduce model drift whilst imposing minimal limits on the possibilities. And the framework can supply built-ins that are highly likely to be useful.

This is the third paragraph in the post, but it’s the most important one. The constrained domain makes everything else possible. It lets you provide a framework where most of the decisions about data models, workflows, use cases and their implementations are already made.

I randomly picked project management for mid-sized construction companies that specialize in restaurants.

A framework that supplies decisions and built-ins

Every game is given a game framework upfront. This framework fixes many decisions. That game entities have a certain, fixed data shape. That behavior abstraction is done with prototypal inheritance. That the coordinates of an entity represent its top left. All of this keeps the code generation aligned. And this framework includes generally useful built-ins. An update/event/draw loop. A WebGL canvas render surface. A collision detection and resolution system. A particle system. A system to detect input. All of this reduces the amount of code that must be generated.

This is what makes production reliable. Most decisions are already made. The agent has the tools it needs to build the kind of software being asked of it. It does not have to generate that much.
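To make that concrete, here is a minimal sketch in Python of what "decisions and built-ins" could look like. It is my own illustration, not Cook’s framework: the names Entity, overlaps and Game are hypothetical, and the real thing runs in a browser on WebGL. The point is only the split between the fixed part (data shape, top-left coordinates, update loop, collision check) and the small part the agent would have to generate.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Fixed data shape: every entity has exactly these fields."""
    name: str
    x: float  # left edge; coordinates always mean the top-left corner
    y: float  # top edge
    w: float
    h: float
    vx: float = 0.0  # velocity in units per frame
    vy: float = 0.0


def overlaps(a: Entity, b: Entity) -> bool:
    """Built-in axis-aligned box check that relies on the top-left convention."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


class Game:
    """Built-in update loop: move everything, then report overlapping pairs."""

    def __init__(self, entities, on_collide):
        self.entities = list(entities)
        self.on_collide = on_collide

    def step(self) -> None:
        for e in self.entities:
            e.x += e.vx
            e.y += e.vy
        for i, a in enumerate(self.entities):
            for b in self.entities[i + 1:]:
                if overlaps(a, b):
                    self.on_collide(a, b)


# Everything below is the part an agent would generate for one specific game.
hits = []
boat = Entity("boat", x=0, y=0, w=10, h=10, vx=1)
rock = Entity("rock", x=25, y=0, w=10, h=10)
game = Game([boat, rock], on_collide=lambda a, b: hits.append((a.name, b.name)))
for _ in range(30):
    game.step()
```

The generated portion is six lines. Everything above it was decided once, ahead of time, and never has to be regenerated or re-argued.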

For mid-sized construction companies that specialize in restaurants, ChatGPT says the system should model: commercial relationships; scope and documents; schedules and workflows; cost and change controls; procurement, with special attention to long-lead items; field execution; and inspections and closeout.

A vertical SaaS company ships one piece of software with all those parts assembled one fixed way. A vertical software factory has all the parts ready to be assembled around the specific workflows of each company. The cost of that customization becomes the cost of the tokens to put it together.
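As a rough sketch of "parts ready to be assembled", assume each part is a prebuilt module that declares what it depends on, and each company’s build is just a list of the modules it wants. The module names below loosely follow the ChatGPT list above, but the registry, the dependency edges and the assemble function are all hypothetical.

```python
# Hypothetical registry of prebuilt, already-verified parts.
MODULES = {
    "schedules": {"requires": []},
    "cost_controls": {"requires": []},
    "change_orders": {"requires": ["cost_controls"]},
    "procurement": {"requires": ["schedules"]},
    "inspections": {"requires": ["schedules"]},
    "closeout": {"requires": ["inspections"]},
}


def assemble(requested):
    """Return the requested modules plus their dependencies, in load order."""
    order = []

    def visit(name):
        if name not in MODULES:
            raise KeyError(f"unknown module: {name}")
        if name in order:
            return
        for dep in MODULES[name]["requires"]:
            visit(dep)  # dependencies are placed before the module itself
        order.append(name)

    for name in requested:
        visit(name)
    return order


# One company only wants closeout tracking and change orders.
build = assemble(["closeout", "change_orders"])
```

Here the agent’s job shrinks to choosing the list and writing the glue for one company’s workflows. Asking for a part that does not exist fails immediately instead of being improvised.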

Authentic verification

When generating code, it’s becoming common practice to build a means of verification into every prompt. Telling the model to write tests. Telling the model to drive the browser to try a new feature. Naturally, for code generation that just works, we also require verification of every change. The framework has a little test harness that takes a stream of inputs and some frame numbers indicating when screenshots should be captured. The game is then run headlessly (several seconds of gameplay in just a few tens of milliseconds) and the screens are captured. Then the model can do what it does best: make an interpretation of whether some criteria have been achieved. The model can, for example, build a speedboat that sprays water behind it, run the game, send input to make the speedboat go, grab a screenshot and decide: does that look like a boat spraying water?

A constrained domain lets you constrain what needs to be verified. The framework and built-ins are already verified. You just need to verify this specific arrangement of them.
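The shape of the harness Cook describes (a stream of inputs plus the frame numbers to capture at) can be sketched in a few lines. This is an assumption-heavy stand-in, not her implementation: run_headless and boat_update are hypothetical names, and I capture a copy of the game state where her harness captures a screenshot for the model to look at.

```python
def run_headless(update, state, inputs, capture_frames, total_frames):
    """Run `update` frame by frame with no rendering and no real-time waits.

    inputs: dict mapping frame number -> input applied on that frame
    capture_frames: set of frame numbers at which to snapshot the state
    """
    captures = {}
    for frame in range(total_frames):
        state = update(state, inputs.get(frame))
        if frame in capture_frames:
            captures[frame] = dict(state)  # copy so later frames can't alter it
    return captures


def boat_update(state, inp):
    """Toy game logic: the throttle builds speed, and speed produces spray."""
    throttle = state["throttle"]
    if inp == "press":
        throttle = True
    elif inp == "release":
        throttle = False
    speed = state["speed"] + 1 if throttle else max(0, state["speed"] - 1)
    return {
        "throttle": throttle,
        "speed": speed,
        "x": state["x"] + speed,
        "spray": speed >= 3,
    }


start = {"throttle": False, "speed": 0, "x": 0, "spray": False}
captures = run_headless(
    boat_update,
    start,
    inputs={2: "press", 8: "release"},
    capture_frames={0, 6, 20},
    total_frames=21,
)
```

The captures at frames 0, 6 and 20 are what gets handed to the judge: before the throttle, at speed, and after the boat has coasted to a stop.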

For Cook, that’s a test harness that runs the game. For the construction company, it might be simulating a whole project from start to finish, day by day, week by week. You could even replay data from a past project through your software to show the company how much time and money it could have saved. A constrained domain means a domain-specific test harness coupled with AI agents, so you can verify the software at a level of detail that would have been impractical before.
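Here is a toy version of that day-by-day simulation, under heavy assumptions: tasks have a fixed duration and a list of prerequisites, nothing else is modeled, and the task names and numbers (including the kitchen hood as the long-lead item) are invented for illustration rather than taken from any real project.

```python
def simulate(tasks, max_days=365):
    """Step through a project one day at a time.

    tasks: dict name -> {"days": working days needed, "after": [prerequisites]}
    Returns dict name -> day (1-based) on which each task finished.
    """
    remaining = {name: t["days"] for name, t in tasks.items()}
    finished = {}
    for day in range(1, max_days + 1):
        # A task can progress today only if every prerequisite finished earlier.
        ready = [
            name for name, t in tasks.items()
            if name not in finished
            and all(finished.get(p, day) < day for p in t["after"])
        ]
        for name in ready:
            remaining[name] -= 1
            if remaining[name] == 0:
                finished[name] = day
        if len(finished) == len(tasks):
            break
    return finished


def plan(order_hood_after):
    """Hypothetical restaurant build-out; only the hood order timing varies."""
    return {
        "demolition": {"days": 3, "after": []},
        "order_hood": {"days": 10, "after": order_hood_after},  # long-lead item
        "rough_in": {"days": 4, "after": ["demolition"]},
        "install_hood": {"days": 2, "after": ["rough_in", "order_hood"]},
        "inspection": {"days": 1, "after": ["install_hood"]},
    }


early = simulate(plan(order_hood_after=[]))             # ordered on day 1
late = simulate(plan(order_hood_after=["demolition"]))  # ordered after demolition
days_saved = late["inspection"] - early["inspection"]
```

Replaying a past project would mean feeding in the recorded durations and ordering dates instead of made-up ones, then comparing the finish day against what the software’s workflow would have enforced.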

A manipulable artifact

Prompting can be tiresome. Language is ambiguous. The model can interpret a prompt in a way the game designer didn’t intend, and make the wrong change. Language is clumsy. It’s hard to precisely indicate any point in a continuum. A color. A point. An amount. But there’s a solution. Do it the old way. Give the game developer a user interface through which to express their intent. A color picker to choose the color of the water. A slider to select the density of the spray ejected by a speedboat. A drag and drop interaction to move game entities into the right place. Some of these are framework built-ins. Every game needs to position entities. But many are generated dynamically by the model, based on the exact needs of just this game.

With the constrained domain, you can allow the software to be safely manipulated within the constraints of the industry and even the specific company. You can allow a project manager to build different versions of the app for each project, adding or removing parts, prompting a one-off component, reshaping the software as their process changes. It becomes realistic to envision Super Mario Maker for mid-sized construction companies specializing in restaurant build-outs.
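One way to picture "safely manipulated" is that every control carries its own limits, so whatever the project manager drags or picks lands on a valid value. The sketch below is hypothetical throughout: the Slider and Choice types, the two settings (a retainage percentage and a change-order approval chain) and their ranges are my own examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slider:
    label: str
    lo: float
    hi: float
    step: float

    def apply(self, raw: float) -> float:
        """Clamp to the allowed range, then snap to the nearest step."""
        clamped = min(self.hi, max(self.lo, raw))
        steps = round((clamped - self.lo) / self.step)
        return self.lo + steps * self.step


@dataclass(frozen=True)
class Choice:
    label: str
    options: tuple

    def apply(self, raw: str) -> str:
        """Accept only values the framework knows how to handle."""
        if raw not in self.options:
            raise ValueError(f"{raw!r} is not one of {self.options}")
        return raw


# Hypothetical controls exposed to a project manager.
CONTROLS = {
    "retainage_percent": Slider("Retainage (%)", lo=0, hi=10, step=0.5),
    "approval_chain": Choice("Change-order approval", ("pm_only", "pm_then_owner")),
}


def manipulate(settings: dict, key: str, raw):
    """Return new settings with one control changed; the input is not mutated."""
    return {**settings, key: CONTROLS[key].apply(raw)}


base = {"retainage_percent": 5.0, "approval_chain": "pm_only"}
edited = manipulate(base, "retainage_percent", 12.3)  # out of range, gets clamped
```

A control generated for one project would simply be another entry in that table, with the same guarantee that its output stays inside the range the rest of the software expects.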

Published: 2026-04-19

Last Edited: 2026-04-19