Putting My AI-Augmented Workflow Through Its First Real Test

Luiz Cavalieri

Apr 28, 2026 • 10 min read

In Part 1 I described the setup I'd been trialling on side projects: Claude Code as the engine, Conductor to run multiple agents in parallel, and RPI/spec-driven development on top to keep the whole thing from going sideways. That post was theory. This one is what happened the first time I pointed the whole stack at something I actually cared about: my own homepage, which doubles as my lab for experiments like this.

I gave myself the Anzac Day long weekend. It took less than 24 hours, squeezed in between lunch with friends, a few trips to the shopping mall, and some episodes of "From" and "Game of Thrones" (yes, it's 2026, and I'm watching GOT for the first time).

When the dust settled, there were 17 sibling branches, 18 task briefs, 29 commits, one umbrella PR, and a follow-on spec for a feature I hadn't even planned to scope yet. Below is how that worked, what surprised me, and the two things that didn't work the way I expected.

First, the brief

The old luizcavalieri.dev was a competent, generic dark-teal engineering portfolio. Cards, gradients, the usual. The new design — internal name "Storyline" — is something else: cream-and-ink editorial palette, hazard-yellow accents, Bricolage Grotesque display set against Fraunces italics, magazine-style top strip with mono labels. Same content, completely different voice. The point was to make the site feel like a publication someone runs, not a CV with extra CSS.

The brief Claude Design captured (I had to replace Rugby with Soccer)

Scope: home page only. About and Projects can defer. The new chrome (navbar, footer, layout) wraps only the home route — the existing chrome keeps serving the other pages until I get round to them. That single constraint — don't touch what you don't have to — turned out to be load-bearing for everything that came next.

From design file to dependency graph

The first two hours weren't spent coding. They were spent specifying.

I started in a fresh Conductor session (Claude Code under the hood) with two inputs: the Storyline design HTML (one big file with all the new sections) and a zip of assets. I asked Claude to plan the migration under my existing Agent OS scaffold. What came back was "a plan" in the chat sense, but its centrepiece was a proposed spec folder: an overall spec, a branch strategy, a design map, and one brief per task:

agent-os/specs/2026-04-25-storyline-redesign/
├── spec.md             # the why + scope + what's deferred
├── branch-strategy.md  # wave/branch topology
├── design-map.md       # line-range index into the design HTML
└── tasks/              # 17 self-contained briefs, one per branch
    ├── 1a-tokens.md
    ├── 1b-favicons.md
    ├── 1c-lc-mark.md
    ├── 1d-meta-html.md
    ├── 2a-navbar.md
    ├── 2b-footer.md
    ├── 2c-root-layout.md
    ├── 3a-home-scaffold.md
    ├── 4a-hero.md
    ├── 4b-achievements.md
    ├── 4c-expertise.md
    ├── 4d-pull-quote.md
    ├── 4e-path.md
    ├── 4f-writing.md
    ├── 5a-mobile.md
    ├── 5b-a11y-focus.md
    └── 5c-perf-fonts.md

The task briefs are the load-bearing piece, and they all follow the same shape:

Branch name and PR target.
Wave + dependencies — which other briefs must merge first, which it can run in parallel with.
Files I own (only touch these) and files I read but do not modify.
Standards to apply — explicit pointers into agent-os/standards/index.yml, so the agent only loads the ~3 standards files it needs, not the whole library.
Visual structure — line-range pointers into the design HTML, so the agent reads its slice of the design file, not the whole 600-line document.
Concrete implementation notes — content data inlined, exact class names and clamp values, edge cases called out.
Verification — what command to run, what to check.
Out of scope — a hard fence.
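To make that shape concrete, here is a minimal sketch of the brief sections above as a TypeScript type, plus a small lint that catches the most common fencing mistakes. The real briefs are Markdown files; the `TaskBrief` field names and the `lintBrief` helper are hypothetical, chosen only to mirror the list.

```typescript
// Hypothetical model of a task brief; field names mirror the sections listed above.
interface TaskBrief {
  id: string;                 // e.g. "4a"
  branch: string;             // e.g. "luizcavalieri/sl-hero"
  prTarget: string;           // the umbrella branch
  wave: number;
  dependsOn: string[];        // brief ids that must merge first
  parallelSafeWith: string[]; // brief ids it can run alongside
  owns: string[];             // files this agent may modify
  readsOnly: string[];        // files it may read but not modify
  standards: string[];        // pointers into agent-os/standards/index.yml
  verify: string[];           // commands to run before shipping
  outOfScope: string[];       // the hard fence
}

// Returns a list of problems; an empty list means the brief is well-fenced.
function lintBrief(brief: TaskBrief): string[] {
  const problems: string[] = [];
  const owned = new Set(brief.owns);
  for (const file of brief.readsOnly) {
    if (owned.has(file)) problems.push(`${file} is both owned and read-only`);
  }
  if (brief.owns.length === 0) problems.push("brief owns no files");
  if (brief.verify.length === 0) problems.push("brief has no verification step");
  if (brief.dependsOn.includes(brief.id)) problems.push("brief depends on itself");
  return problems;
}

// Values taken from the 4A excerpt; standards and out-of-scope items are elided.
const hero: TaskBrief = {
  id: "4a",
  branch: "luizcavalieri/sl-hero",
  prTarget: "luizcavalieri/storyline-redesign",
  wave: 4,
  dependsOn: ["3a"],
  parallelSafeWith: ["4b", "4c", "4d", "4e", "4f"],
  owns: ["src/containers/HeroSection/HeroSection.tsx"],
  readsOnly: ["src/assets/lc-mosaic.svg", "agent-os/standards/index.yml"],
  standards: [],
  verify: ["yarn build"],
  outOfScope: [],
};
console.log(lintBrief(hero)); // []
```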

Here's the opening of 4a-hero.md — the brief for the hero section, one of the more complex tasks — to make this concrete:

# Task 4A: sl-hero

**Branch:** `luizcavalieri/sl-hero` → PR target
`luizcavalieri/storyline-redesign`
**Wave:** 4 · **Depends on:** 3A merged
**Parallel-safe with:** 4B, 4C, 4D, 4E, 4F

## Objective
HeroSection renders the Storyline editorial hero: hot-red eyebrow,
massive Bricolage display headline with hazard highlight + Fraunces
italic accents, Fraunces standfirst, LC mosaic mark on the side with
a hard 8px shadow, mono `by`-block, and a hero-foot meta strip.

## Files I own (ONLY touch these)
- src/containers/HeroSection/HeroSection.tsx
- Any co-located CSS under src/containers/HeroSection/

## Files I read but do not modify
- .context/attachments/design-bundle/sites/storyline.html
  lines 254–274 (hero HTML) and 63–80 (hero CSS)
- src/assets/lc-mosaic.svg (added in 1C)
- agent-os/standards/index.yml

That's one of the 17 briefs, and the rest look much the same: tight, scoped, self-contained. Each is small enough to fit in a fresh Claude session with room to spare. No "what was the other agent doing" cross-talk. No spec drift.

The thing I want to draw a circle around is what those briefs aren't. They aren't tickets in a backlog. They aren't loose acceptance criteria. They aren't "hey, build the hero section." They are surgical — they tell the agent which lines of the design file to read, which standards to load, which files it can touch, and which files it must leave alone. The agent doesn't have to make scoping decisions because I've already made them.

That work — the spec folder — is the bottleneck. Everything else falls out of it.

The wave plan

Once the briefs were in place, the dependency graph wrote itself. Here's the topology I shipped against:

WAVE 1 — Foundation       4 in parallel
  sl-tokens · sl-favicons · sl-lc-mark · sl-meta-html
                         ↓ all four merge into umbrella
                         
WAVE 2 — Shell            2 in parallel
  sl-navbar · sl-footer
                         ↓ both merge
                         
WAVE 2.5 — Layout         1 (sequential)
  sl-storyline-layout
                         ↓ merges
                         
WAVE 3 — Scaffold         1 (sequential)
  sl-home-scaffold
                         ↓ merges
                         
WAVE 4 — Sections         6 in parallel
  sl-hero · sl-achievements · sl-expertise
  sl-pull-quote · sl-path · sl-writing
                         ↓ all six merge
                         
WAVE 5 — Polish           3 in parallel
  sl-mobile · sl-a11y-focus · sl-perf-fonts
                         ↓ all three merge
                         
                         ↓
                  Umbrella PR → main

Five waves. Maximum 6-way parallelism. The shape was: parallelise where files don't overlap, sequence where they do. Wave 4 was the payoff — six section rebuilds, six branches, six PRs, one afternoon.
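If you want to derive a first-cut wave plan from the briefs' "Depends on" lines, a level-by-level topological sort (Kahn's algorithm) does it: each wave holds every task whose dependencies merged in earlier waves. This is a sketch, not how the plan above was produced; the dependency edges below are my guess at a subset of the real ones, and the algorithm knows nothing about file overlap, which still has to be checked separately.

```typescript
// Group tasks into waves: a task lands in the first wave after all of its
// dependencies have merged. Throws on unknown dependencies or cycles.
function planWaves(deps: Record<string, string[]>): string[][] {
  const known = new Set(Object.keys(deps));
  const remaining = new Map<string, Set<string>>();
  for (const [task, ds] of Object.entries(deps)) {
    for (const d of ds) {
      if (!known.has(d)) throw new Error(`${task} depends on unknown task ${d}`);
    }
    remaining.set(task, new Set(ds));
  }
  const waves: string[][] = [];
  while (remaining.size > 0) {
    // Everything with no unmerged dependencies can run in parallel now.
    const ready = [...remaining]
      .filter(([, ds]) => ds.size === 0)
      .map(([task]) => task)
      .sort();
    if (ready.length === 0) {
      throw new Error(`dependency cycle among: ${[...remaining.keys()].join(", ")}`);
    }
    for (const task of ready) remaining.delete(task);
    for (const ds of remaining.values()) {
      for (const task of ready) ds.delete(task); // "merge" this wave
    }
    waves.push(ready);
  }
  return waves;
}

// Hypothetical edges for a subset of the branches.
const deps: Record<string, string[]> = {
  "sl-tokens": [], "sl-favicons": [], "sl-lc-mark": [], "sl-meta-html": [],
  "sl-navbar": ["sl-tokens", "sl-lc-mark"],
  "sl-footer": ["sl-tokens"],
  "sl-storyline-layout": ["sl-navbar", "sl-footer"],
  "sl-home-scaffold": ["sl-storyline-layout"],
  "sl-hero": ["sl-home-scaffold"],
  "sl-achievements": ["sl-home-scaffold"],
  "sl-pull-quote": ["sl-home-scaffold"],
};
console.log(planWaves(deps).map((w) => w.length)); // [ 4, 2, 1, 1, 3 ]
```

With these edges the layout branch falls into its own wave between the shell and the scaffold, which matches the "Wave 2.5" in the diagram.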

The PR topology mirrored the wave structure. One umbrella branch — luizcavalieri/storyline-redesign — with one draft umbrella PR (#26) targeting main. Every sibling PR targeted the umbrella, not main. That gave me a stack of small, reviewable PRs without the merge-queue gymnastics you usually need for stacked work:

main
 └─ luizcavalieri/storyline-redesign  (umbrella PR #26 — draft)
     ├─ luizcavalieri/sl-tokens         → PR → umbrella
     ├─ luizcavalieri/sl-favicons       → PR → umbrella
     ├─ luizcavalieri/sl-lc-mark        → PR → umbrella
     ├─ luizcavalieri/sl-meta-html      → PR → umbrella
     ├─ luizcavalieri/sl-navbar         → PR → umbrella
     ├─ luizcavalieri/sl-footer         → PR → umbrella
     ├─ luizcavalieri/sl-storyline-layout → PR → umbrella
     ├─ luizcavalieri/sl-home-scaffold  → PR → umbrella
     ├─ luizcavalieri/sl-hero           → PR → umbrella
     ├─ luizcavalieri/sl-achievements   → PR → umbrella
     ├─ luizcavalieri/sl-expertise      → PR → umbrella
     ├─ luizcavalieri/sl-pull-quote     → PR → umbrella
     ├─ luizcavalieri/sl-path           → PR → umbrella
     ├─ luizcavalieri/sl-writing        → PR → umbrella
     ├─ luizcavalieri/sl-mobile         → PR → umbrella
     ├─ luizcavalieri/sl-a11y-focus     → PR → umbrella
     └─ luizcavalieri/sl-perf-fonts     → PR → umbrella

When the umbrella PR finally lands on main, it lands as a single cohesive change with 17 reviewable PRs in its history. The reviewer (currently still me, but the model holds for a team) gets to look at any wave in isolation.
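The topology boils down to one rule about `--base`: siblings target the umbrella, and the umbrella (opened as a draft) targets main. A small sketch of that rule as `gh pr create` arguments follows. `--base`, `--head`, `--title`, `--body` and `--draft` are real `gh` flags; the `prCreateArgs` helper and the body text are hypothetical.

```typescript
// Hypothetical helper encoding the PR topology above.
const UMBRELLA = "luizcavalieri/storyline-redesign";

function prCreateArgs(head: string, title: string): string[] {
  const isUmbrella = head === UMBRELLA;
  const args = [
    "pr", "create",
    "--base", isUmbrella ? "main" : UMBRELLA, // siblings never target main directly
    "--head", head,
    "--title", title,
    "--body", isUmbrella ? "Umbrella PR for the Storyline redesign." : `Part of ${UMBRELLA}.`,
  ];
  if (isUmbrella) args.push("--draft"); // the umbrella PR was opened as a draft
  return args;
}

// In Node, these could be passed to child_process.execFileSync("gh", args).
const heroArgs = prCreateArgs(
  "luizcavalieri/sl-hero",
  "feat(hero): implement Storyline editorial HeroSection",
);
console.log(heroArgs);
```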

Conductor as the dispatcher

This is where Conductor stopped being a nice idea and started carrying its weight.

Each task brief became one Conductor workspace. Each workspace is its own isolated git checkout, its own Claude Code session, its own trusted shell. The prompt I pasted into each one was four lines:

Read agent-os/specs/2026-04-25-storyline-redesign/tasks/<brief>.md
and execute it. Do not read other task briefs or the umbrella plan.
When done, run yarn build, commit, push, open a PR with base
luizcavalieri/storyline-redesign.

That's it. That's the whole instruction.

It works because the brief is doing all the heavy lifting. The prompt doesn't need to explain the goal, the constraints, or the standards — those are all in the file the agent is about to read. The prompt's only job is to point the agent at the right brief, fence it off from the others, and tell it how to ship.
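Because only the brief filename changes between workspaces, the prompt is effectively a template. A minimal sketch that reproduces the four lines above for each Wave 4 brief; the `dispatchPrompt` helper is hypothetical, since in practice I pasted the text by hand.

```typescript
// The four-line dispatch prompt as a template; only the brief filename varies.
const SPEC_DIR = "agent-os/specs/2026-04-25-storyline-redesign";
const UMBRELLA = "luizcavalieri/storyline-redesign";

function dispatchPrompt(briefFile: string): string {
  return [
    `Read ${SPEC_DIR}/tasks/${briefFile}`,
    "and execute it. Do not read other task briefs or the umbrella plan.",
    "When done, run yarn build, commit, push, open a PR with base",
    `${UMBRELLA}.`,
  ].join("\n");
}

// One prompt per Wave 4 workspace, using the brief names from the spec folder.
const wave4Briefs = [
  "4a-hero.md", "4b-achievements.md", "4c-expertise.md",
  "4d-pull-quote.md", "4e-path.md", "4f-writing.md",
];
const prompts = wave4Briefs.map(dispatchPrompt);
console.log(prompts[0]);
```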

When Wave 4 came around (the six section rebuilds), I opened six Conductor workspaces side by side. Pasted the four-line prompt into each. Watched them work. Three were done in five minutes. The longest took about 15 minutes. None of them stepped on each other because the briefs had been engineered so they couldn't.

The git log for that wave looks like this:

7642837 feat(writing): rebuild WritingSection as Storyline proj-list
4309971 feat(path): rebuild LeadershipStorySection as sl-path grid
9885c4e feat(skills): rebuild SkillsSection as Storyline tile grid
5579863 feat(achievements): rebuild ProjectsSection as 3-tile grid
70fd5bb feat(hero): implement Storyline editorial HeroSection
864d0ac feat(pull-quote): implement PullQuote editorial section

Six commits, six branches, six PRs, no rebases, no conflicts.
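"No conflicts" holds only if no two briefs in the same wave claim overlapping files. That condition is cheap to check mechanically before dispatching a wave. A minimal sketch, assuming each brief's "Files I own" list is available as strings and that an entry ending in `/` means a whole directory (as in the hero brief's "any co-located CSS under src/containers/HeroSection/"). The helper names are hypothetical, and only the HeroSection path comes from the brief; the other paths are guesses based on the component names in the git log.

```typescript
// Two ownership entries overlap if they are the same file, or one is a
// directory ("/"-terminated) containing the other.
function overlaps(a: string, b: string): boolean {
  if (a === b) return true;
  if (a.endsWith("/") && b.startsWith(a)) return true;
  if (b.endsWith("/") && a.startsWith(b)) return true;
  return false;
}

// Report every pair of parallel tasks whose owned files overlap.
function ownershipConflicts(wave: Record<string, string[]>): string[] {
  const entries = Object.entries(wave);
  const conflicts: string[] = [];
  for (let i = 0; i < entries.length; i++) {
    for (let j = i + 1; j < entries.length; j++) {
      const [taskA, filesA] = entries[i];
      const [taskB, filesB] = entries[j];
      for (const a of filesA) {
        for (const b of filesB) {
          if (overlaps(a, b)) conflicts.push(`${taskA} vs ${taskB}: ${a} overlaps ${b}`);
        }
      }
    }
  }
  return conflicts;
}

const wave4: Record<string, string[]> = {
  "sl-hero": ["src/containers/HeroSection/"],
  "sl-expertise": ["src/containers/SkillsSection/"],
  "sl-writing": ["src/containers/WritingSection/"],
};
console.log(ownershipConflicts(wave4)); // []
```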

The bit that surprised me most

Two things surprised me. One good. One annoying.

The good one: the bottleneck wasn't writing the code. The bottleneck was the spec. Wave 4's six parallel sections were built green on the first try because every meaningful decision had already been made — what classes, what content, what files, what standards. The time I'd spent sharpening briefs paid back at roughly 6× in the implementation phase. If anything, I under-invested in the spec for Wave 1 (briefs needed a couple of revisions in the comments before agents shipped them) and over-invested by Wave 4 (briefs went straight through). More spec, fewer corrections downstream. It's the same dynamic as code review — every minute spent reviewing a doc is twenty minutes saved fixing the code it would have produced.

The annoying one: I tried to be too clever in the orchestration. My initial idea was to have one Claude Code session orchestrate all four Wave 1 agents itself, using the harness's Agent isolation:worktree tool to spawn parallel subagents. Single window, one prompt, fanned out. It didn't work. The subagents inherited a sandboxed permission set that blocked git, yarn, and gh, with no clean way to elevate them mid-flight.

It took me a bit to realise that this was a Claude Code harness limitation, not a Conductor one. Conductor handles parallelism cleanly because each workspace is its own real shell with its own trusted permissions. The "smart" path was over-engineered. The dumb path — open six windows, paste six prompts — was the right one.

When a tool is built for parallelism, use its parallelism. Not a clever simulation inside a different tool.

The polish round

Around 18 hours in, the umbrella PR was a coherent thing. Build green. Tests passing. Type-checking clean.

It also looked broken in the browser.

Not unusably broken — the structure was right, the colours were right, the typography was right. But hero highlights were bleeding behind text on tablet widths, the LC logo was pushing the headline off-grid at desktop widths, the headline copy needed one more pass, and there was leftover Vite-scaffold CSS in index.css that no agent had thought to remove because no agent had been asked to.

That became a separate task brief, x1-hero-fix.md, and a run of follow-up commits on the umbrella branch. The git log for that round:

7efff38 fix(hero): fix highlight bleed, grid squash, and overflow
        + purge Vite-scaffold CSS
0627970 feat(hero): switch standfirst to editorial-voice copy
767aae6 style(hero): top-align grid so LC mark sits at top right
f645c3e fix(hero): cap LC mark to its grid column so the headline
        isn't covered
bc3a507 style(hero): left-align headline and standfirst on desktop,
        center on mobile
c494203 style(hero): update headline copy and cap desktop font size
        to 110px
20aa5d7 style: widen root max-width to 1600px and update writing
        section copy

Each one is a small editorial fix in a real browser. The lesson is one I'd written down before, but apparently needed to relearn: build-green isn't ship-green. Twenty minutes in a real browser, scrolling through real breakpoints, viewing the page as a reader rather than a reviewer, is not optional. The agents can't do that part for you, because it isn't a coding problem.
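One of those fixes capped the desktop headline at 110px. For anyone doing a similar fix, the common way to get type that scales with the viewport but stops at a cap is a CSS `clamp()` whose middle term is a straight line between two breakpoints. This sketch derives that line; apart from the 110px cap, the sizes and breakpoints are made-up example values, and `fluidClamp` is a hypothetical helper rather than anything from the briefs.

```typescript
// Derive clamp(min, intercept + slope·vw, max) so the font size grows linearly
// from minPx at minViewportPx to maxPx at maxViewportPx, capped at both ends.
interface FluidType {
  minPx: number;
  maxPx: number;
  minViewportPx: number;
  maxViewportPx: number;
}

function round3(n: number): number {
  return Math.round(n * 1000) / 1000;
}

function fluidClamp({ minPx, maxPx, minViewportPx, maxViewportPx }: FluidType): string {
  if (maxViewportPx <= minViewportPx) throw new Error("viewport range is empty");
  // Line through (minViewportPx, minPx) and (maxViewportPx, maxPx).
  const slope = (maxPx - minPx) / (maxViewportPx - minViewportPx);
  const interceptPx = minPx - slope * minViewportPx;
  // 1vw is 1% of the viewport width, so a slope of s px/px is s * 100 vw.
  const preferred = `${round3(interceptPx)}px + ${round3(slope * 100)}vw`;
  return `clamp(${minPx}px, ${preferred}, ${maxPx}px)`;
}

const headline = fluidClamp({ minPx: 40, maxPx: 110, minViewportPx: 320, maxViewportPx: 1600 });
console.log(headline); // clamp(40px, 22.5px + 5.469vw, 110px)
```

Math functions accept sum expressions directly, so the middle argument does not need its own `calc()` wrapper.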

The byproduct

Around mid-project, I stopped to ask Claude an unrelated question: how much work would it be to set up per-PR preview deployments — preview-pr26.luizcavalieri.dev, that kind of thing — so I could share the umbrella branch with someone before merging it.

The answer was longer than the question deserved. Two GitHub Actions workflows, some Terraform/IAM additions, an architecture diagram, a cost estimate (~$0.20–1.20 per PR), an effort estimate (~4–6 hours). Useful enough that we agreed the answer should become its own spec under agent-os/specs/2026-04-27-pr-preview-deployments/spec.md, with a roadmap entry pointing at it for whenever I get round to that work.
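The two concrete numbers in that answer, the hostname pattern and the per-PR cost band, are easy to encode so the spec can be sanity-checked later. A minimal sketch; the helper names are hypothetical, the hostname pattern follows the preview-pr26 example, and the cost band is the rough ~$0.20–1.20 estimate, not a measured figure.

```typescript
// Hypothetical helpers for the PR preview-deployment spec.
const PREVIEW_DOMAIN = "luizcavalieri.dev";
const COST_PER_PR_USD = { low: 0.2, high: 1.2 }; // rough estimate from the spec

function previewHost(pr: number): string {
  if (!Number.isInteger(pr) || pr <= 0) throw new Error(`invalid PR number: ${pr}`);
  return `preview-pr${pr}.${PREVIEW_DOMAIN}`;
}

// Round a dollar amount to whole cents to hide floating-point noise.
function roundToCents(usd: number): number {
  return Math.round(usd * 100) / 100;
}

// Estimated spend band for a given number of previewed PRs.
function previewCostRange(prCount: number): { low: number; high: number } {
  return {
    low: roundToCents(prCount * COST_PER_PR_USD.low),
    high: roundToCents(prCount * COST_PER_PR_USD.high),
  };
}

console.log(previewHost(26)); // preview-pr26.luizcavalieri.dev
console.log(previewCostRange(10)); // { low: 2, high: 12 }
```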

That was the part of the workflow I didn't expect to value as much as I do. The spec folder isn't just instructions for agents right now — it's a running record of decisions and their reasoning, building up naturally as a side effect of working this way. Every spec is a small, opinionated design doc. Six months from now, when I come back to this site, I won't have to reconstruct what I was thinking. The folder will tell me.

The numbers

For the people who like the numbers:

1 umbrella PR (#26)
17 sibling branches, 17 PRs targeting the umbrella
18 task briefs (17 + the post-ship hero-fix)
29 commits on the umbrella branch
5 waves
Max 6-way parallelism in Wave 4
< 24 hours end-to-end, including the polish round
1 follow-on spec produced as exhaust

What I'd take into Part 3

If Part 1 was the philosophy and Part 2 is the proof, Part 3 — whenever it lands — will probably be about pushing this from "personal site" to "actual team workflow," because the bits that obviously need to scale are visible from here.

The thing I'd say if you're trying any of this: don't start from the agents. Start from the dependency graph. The model in Part 1 was that the senior skill is writing precise specs and decomposing complex changes into well-bounded pieces. This redesign is what that looks like in practice — a folder of seventeen briefs, a wave plan that tells you which six can run in parallel, and a four-line prompt template that points each agent at exactly the slice of the work it owns.

The agents were the easy part. The hard part — and the part I'd happily redo before writing a single line of code next time — was the dependency graph. That's the artifact. The code falls out of it.

This is Part 2 of an ongoing series. Part 1 — My AI-Augmented Coding Workflow sets up the tools (Claude Code, Conductor, RPI) and explains why specs matter more than agents.