Trying something new: agentic engineering

I'm probably late to this, but here's what shifted for me: I stopped just letting Claude Code work and started thinking more about how to set it up so I still make the important calls while staying mostly hands-off on the implementation. Turns out that distinction matters.

Fixing an error - with agents

Here is a familiar situation: Sentry shows an error, in this case in a Blade file for an email. It is being worked on as I write this, and how it is being worked on is the part that is new (for me):

Earlier I was trying out the Sentry CLI, and listing issues turned up this error, which looked like a candidate for a quick fix. So I opened Solo, which is where the agent orchestration part matters: one lead agent can keep a scratchpad and todos, dispatch bounded worker agents through Solo MCP, watch their progress, and integrate their handoffs. I used a skill - basically a reusable instruction file for that workflow (add link) - to tell it to look at the error with this ID in the CLI. From there the orchestration more or less took over. One agent read the error and wrote down its findings, then another picked up the implementation. That one did not just patch the view; it started with a regression test and worked from there.

What struck me was how thoroughly the agents treated this, even though it sits in a less critical part of the app. I might honestly have added a quick "?" and called it done. Instead, the agent went for the actual underlying cause.

It even went one step further: at the end, Claude dispatched a QA agent, which found a second, very similar notification with the same bug. So where a "quick fix" would have meant adding a single "?" to the code, we ended up with actual improvements plus tests around them.
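To make the difference concrete, here is a rough sketch of what "regression test first, and cover the sibling too" can look like. The real fix was in Blade templates; this is only a Python analog, and every name in it (Order, Customer, customer_name, the two render functions) is made up for illustration. It also does not claim to reproduce the actual cause of the bug, only the shape of a test that runs both notifications against the same bad input.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    name: str


@dataclass
class Order:
    number: str
    customer: Optional[Customer]  # hypothetical: the relation may be missing


def customer_name(order: Order) -> str:
    """Shared helper: one place decides what a missing customer means."""
    return order.customer.name if order.customer else "customer"


def render_order_shipped(order: Order) -> str:
    return f"Hi {customer_name(order)}, order {order.number} has shipped."


def render_order_delayed(order: Order) -> str:
    return f"Hi {customer_name(order)}, order {order.number} is delayed."


def test_notifications_render_without_customer() -> None:
    """Regression test: run every sibling notification against the bad input."""
    order = Order(number="A-1", customer=None)
    for render in (render_order_shipped, render_order_delayed):
        assert render(order).startswith("Hi customer,")


test_notifications_render_without_customer()
```

A guard added only at the line Sentry pointed to would have fixed the first notification and left the second one waiting to fail. Looping the test over both is what catches that.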

The PR ended up at +386/-19 lines, which is funny considering I still think a person might have looked at the same bug and just added a single "?". But the interesting part is that 367 of those added lines, roughly 95%, were in two test files (183 and 184 lines). I remember hearing somewhere that agentic engineering is slower than running one agent in one main Claude window and burns a lot more tokens, but that what you get back for that cost is quality. This felt like a very literal version of that trade: more time and tokens, but also a more complete fix with tests around it.

That seems to be the important shift for me: agentic engineering is not about throwing more agents at a problem. It is about keeping control of the shape of the work while delegating bounded pieces of it. The orchestration is the work.

Tools

As mentioned earlier, Solo plays a big role in this: it is essentially a terminal with an MCP server, which allows Claude or Codex to read other terminal tabs and thus spawn instances of themselves or even each other, send prompts, inspect output, and adjust settings. It also lets you create scratchpads (think: Confluence page) and todos (think: Jira tickets), which can be used to give the agents context and steer them. Agents can leave comments on the todos and in that way keep track of their findings. Right now I am additionally trying out something called mulch for generating long-lasting memories.
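For anyone who prefers to see the moving parts, here is a minimal sketch of the scratchpad, todo and handoff idea. This is not Solo's API, which I am not reproducing here; Workspace, Todo, run_worker and orchestrate are hypothetical names, and the worker is a plain function standing in for what would really be a separate agent session.

```python
from dataclasses import dataclass, field


@dataclass
class Todo:
    """A bounded piece of work, comparable to a ticket."""
    title: str
    role: str  # hypothetical worker role, e.g. "investigate"
    status: str = "open"
    comments: list[str] = field(default_factory=list)


@dataclass
class Workspace:
    """Shared context the lead agent keeps for its workers."""
    scratchpad: list[str] = field(default_factory=list)
    todos: list[Todo] = field(default_factory=list)


def run_worker(todo: Todo, workspace: Workspace) -> str:
    """Stand-in for a dispatched agent; returns its handoff as text."""
    context = " | ".join(workspace.scratchpad)
    return f"{todo.role} finished '{todo.title}' (context: {context})"


def orchestrate(workspace: Workspace) -> None:
    """The lead works through the todos in order and integrates each handoff."""
    for todo in workspace.todos:
        handoff = run_worker(todo, workspace)
        todo.comments.append(handoff)  # findings stay attached to the todo
        workspace.scratchpad.append(f"{todo.role}: done")  # visible to later workers
        todo.status = "done"


workspace = Workspace(scratchpad=["Sentry error in an email Blade view"])
workspace.todos = [
    Todo("Read the error and write down findings", "investigate"),
    Todo("Write a regression test, then fix", "implement"),
    Todo("Look for sibling bugs", "qa"),
]
orchestrate(workspace)
for todo in workspace.todos:
    print(todo.status, "-", todo.comments[-1])
```

The point of the sketch is that each worker only sees its own todo plus whatever the lead has written to the scratchpad, which is what keeps the pieces bounded.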

Achieving reliable results

None of this is magic, which is the encouraging part. LLMs seem to need the same basic primitives any team does: a place to write things down, a way to track what's next, some form of memory, and a few fresh eyes when the context gets long. Solo and mulch are my current attempt at giving them that.