Simulation. What nobody tells you.
Simulation was one of the first things we noticed was worth the squeeze when working with LLMs.
When building any tool that's fundamentally powered by an LLM, "what can be simulated?" is the first question we ask. Always.
Anyone building serious AI products would probably say the same. But they might not disclose it publicly, because a lot of the moat lies within these simulated environments.
You might ask: why simulate at all? Won't the LLM give me the same output anyway?
Well, that depends on the complexity of the task, and on what's currently loaded into the context window and being manipulated there.
For us, it's always been about context engineering: getting to a better contextual understanding of a task.
Simulation is a cheat code. But your simulations become useless if you can't score them appropriately. How you do evaluation matters.
You want your evaluation of outputs and engineered context blobs to be as deep and useful as possible.
The opposite of this is SLOP: zero-shot nonsense that amazes every now and then, but the moment you want a robust system, it fails at a rate far too high to make any sense.
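To make "simulate, then score" concrete, here is a minimal Python sketch. It assumes a hypothetical `simulate_agent` function standing in for an LLM call and a hypothetical binary `score` function; neither comes from a real library or from our stack. The point is the shape: run the same context many times, score every output, and look at the distribution rather than one lucky sample.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class EvalReport:
    mean_score: float
    pass_rate: float
    runs: int


def simulate_agent(context: str, rng: random.Random) -> str:
    """Hypothetical stand-in for an LLM call that returns a noisy output.
    In this toy model, richer context makes a good answer more likely."""
    p_good = min(0.95, 0.3 + 0.01 * len(context.split()))
    return "good" if rng.random() < p_good else "slop"


def score(output: str) -> float:
    """Hypothetical scorer: 1.0 for a good output, 0.0 for slop."""
    return 1.0 if output == "good" else 0.0


def evaluate(context: str, runs: int = 1000, threshold: float = 0.5, seed: int = 0) -> EvalReport:
    """Simulate the same context many times and summarize the scores."""
    rng = random.Random(seed)  # seeded so a re-run reproduces the same report
    scores = [score(simulate_agent(context, rng)) for _ in range(runs)]
    passed = sum(s >= threshold for s in scores)
    return EvalReport(mean_score=statistics.mean(scores), pass_rate=passed / runs, runs=runs)


if __name__ == "__main__":
    thin = evaluate("summarize this lead")
    rich = evaluate(" ".join(["relevant-context"] * 40))
    print(thin.pass_rate, rich.pass_rate)  # the richer context passes more often
```

In a real setup the scorer is where most of the work goes; a 0/1 check is just the simplest thing that runs.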
We want less slop and more greatness.
In this sense, software is moving closer to gaming. Games run in game engines, so by definition they run in simulated environments. Go one step further, and simulation games as a genre are where I see the most overlap and signal.
Now, jigs.
In woodworking, when you want to make the same part over and over again, you don't measure every piece. You push it into a jig.
That removes heaps of friction and repetitive work, which can otherwise lead to failures (yes, measuring is extremely error-prone).
Obviously, jigs are used in all sorts of manufacturing. But in software, we kinda forgot how important they are.
If you're building a complex agentic system or behaviour without a jig, there are far too many points to measure manually. Regardless of your skill, that will lead to measurement errors, and those lead to unreliable outputs.
Constructing a jig around what you're building, so you can tweak and adjust parameters, prompts, and engineered context blobs at both a low and a high level, gives you the levers you need to build not something good, but something fantastic.
Loads of legacy creative software is already built around this idea. Any node-based system is fundamentally a jig. And a jig can have non-visual parts too.
If you're building agentic systems, being able to simulate tens of thousands of pathways, then measure, analyze, tweak, and re-run, will make your product stand out and excel where others fall short.
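A jig for an agent can start as a plain parameter sweep: declare the levers, run every combination across thousands of simulated cases, and rank the configurations. The sketch below assumes three hypothetical levers (`temperature`, `context_depth`, `prompt_variant`) and a toy `run_pathway` success model; in practice each pathway would be a real simulated agent run with a real scorer.

```python
import itertools
import random

# Hypothetical levers; names and values are illustrative only.
GRID = {
    "temperature": [0.2, 0.7, 1.0],
    "context_depth": [1, 3, 5],
    "prompt_variant": ["terse", "detailed"],
}


def run_pathway(params: dict, rng: random.Random) -> bool:
    """Hypothetical simulated pathway: True if the agent handled the case well.
    The success model below is a toy stand-in for a real run plus a real scorer."""
    p = 0.4
    p += 0.05 * params["context_depth"]
    p -= 0.2 * params["temperature"]
    p += 0.1 if params["prompt_variant"] == "detailed" else 0.0
    return rng.random() < max(0.0, min(1.0, p))


def sweep(grid: dict, cases: int = 2000, seed: int = 0) -> list[tuple[dict, float]]:
    """Run every combination of levers across `cases` simulated pathways
    and return (params, success_rate) pairs, best first."""
    keys = list(grid)
    results = []
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        # Same seed for every config (common random numbers), so the
        # comparison mostly reflects the levers rather than sampling noise.
        rng = random.Random(seed)
        successes = sum(run_pathway(params, rng) for _ in range(cases))
        results.append((params, successes / cases))
    results.sort(key=lambda r: r[1], reverse=True)
    return results


if __name__ == "__main__":
    for params, rate in sweep(GRID)[:3]:
        print(f"{rate:.3f}  {params}")
```

With 100+ parameters a full grid explodes, so you would likely switch to random or adaptive search. The jig itself stays the same: fixed inputs, adjustable levers, repeatable measurement.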
At Flocurve we built a jig around our growth and outreach agents.
100+ parameters. Signal matching on one end, context analysis on the other. Everything in between is a lever.
Tweak one, watch the others move. Run it across thousands of prospects, see what holds.
What comes out is consensus. Not a single model guessing. A system that has triangulated the same prospect from 100+ angles and agreed.
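One way to read "consensus from 100+ angles" in code: many independent checks each vote on a prospect, and the system only calls it a match when enough of them agree. This is an illustrative assumption, not Flocurve's implementation. The angles here are hypothetical threshold checks over made-up signal names (`hiring_signal`, `funding_signal`, `tech_fit`); a real angle might be a separate prompt, model, or context slice.

```python
from dataclasses import dataclass
from typing import Callable

# A prospect is a bag of hypothetical signal scores in [0, 1].
Prospect = dict[str, float]
# An "angle" is any independent check that votes match / no match.
Angle = Callable[[Prospect], bool]


@dataclass
class Verdict:
    match: bool
    agreement: float  # fraction of angles that voted "match"


def make_angles(features: list[str], thresholds: list[float]) -> list[Angle]:
    """Build one threshold check per (feature, threshold) pair.
    Default arguments bind f and t so each lambda keeps its own values."""
    return [lambda p, f=f, t=t: p.get(f, 0.0) >= t for f in features for t in thresholds]


def consensus(prospect: Prospect, angles: list[Angle], quorum: float = 0.8) -> Verdict:
    """Only call it a match when at least `quorum` of the angles agree."""
    votes = [angle(prospect) for angle in angles]
    agreement = sum(votes) / len(votes)
    return Verdict(match=agreement >= quorum, agreement=agreement)


# 3 hypothetical signals x 35 thresholds = 105 angles.
ANGLES = make_angles(
    ["hiring_signal", "funding_signal", "tech_fit"],
    [i / 40 for i in range(1, 36)],
)

if __name__ == "__main__":
    strong = {"hiring_signal": 0.9, "funding_signal": 0.9, "tech_fit": 0.9}
    mixed = {"hiring_signal": 0.9, "funding_signal": 0.11, "tech_fit": 0.51}
    print(consensus(strong, ANGLES))  # every angle agrees
    print(consensus(mixed, ANGLES))   # one strong signal is not enough
```

The `quorum` is itself a lever: raising it trades fewer matches for more confident ones.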
That's how you go from "this looks like a lead" to actual matchmaking. Customer targeting that doesn't degrade the moment you scale.
It's funny, because this is largely what goes on in training LLMs: tweaking, simulating, pre-runs, re-runs, emulating, and evaluating.
Yet somehow the successful builders on the inference side aren't very vocal about doing the same.
Training does this in public.
Inference does it in private.
The builders who win at inference will be the ones treating it like training.