I’ve been leading software engineering teams for 15+ years, and frankly, the playbook has stayed pretty much the same. Even back then, the good companies knew how to organize around outcomes and customer value, iterate quickly, empower teams, build tooling and automation, and keep improving their ways of working.
However, many things are different now. More has happened in 2026 than in my entire career before that. Our team has doubled their output this spring, with no extra headcount.
And I’ve gone from shipping an occasional DevEx improvement (note: I have a day job as the CEO) to cranking out features and migration projects between meetings, delivering 100+ pull requests to production. I felt I had to get hands-on to try to understand the organizational implications of these tools.
After reading Steve Yegge’s post about Anthropic’s ways of working, things clicked for me. Adding agents on top of existing processes doesn’t work. The organization itself has to change.
If you want the full gains from AI coding tools, you need to change your ways of working. It’s not enough that you get your engineers to install Cursor and Claude Code and start producing code faster.
Speeding up the coding step is a local optimization, and local optimizations won’t get you the maximum benefit. To illustrate my point about organization-level efficiency, let’s consider a (simplified) example of what happens when a customer reports a bug:
The organization could easily spend 10+ hours on this bug — 8 people in a team meeting spending 15 minutes already contributes 2 hours to that.
The agentic organization, on the other hand, has a completely different approach:
The total time spent is a few minutes from the support engineer and a few minutes from the engineer. Most notably, there’s no coordination required.
Of course, not every bug is this straightforward. Many require more investigation — figuring out expected behavior, checking external APIs, understanding the blast radius, deciding who needs to be informed before anything gets changed. Those design and organizational questions still take significant human time, and today’s agents don’t help much with them. But if an increasing number of the simpler fixes can be routed through agents, the queue of bugs that need that full coordination loop gets shorter — and the team gets more done overall.
That’s why local optimizations don’t get you very far. Despite the engineer getting a 3x productivity boost from AI tools, the first organization only gained 2-5%. The agentic organization, on the other hand, really could become 10x more productive.
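The arithmetic behind that gap can be sketched in a few lines. This is a rough illustration, not a measurement: the 10-hour total and the 3x coding speedup come from the example above, while the half hour of actual coding time is an assumed figure, and `time_saved_fraction` is a hypothetical helper name.

```python
def time_saved_fraction(total_hours: float, coding_hours: float, coding_speedup: float) -> float:
    """Share of total organizational time saved when only the coding step gets faster."""
    if not 0 <= coding_hours <= total_hours:
        raise ValueError("coding_hours must be between 0 and total_hours")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    # Everything that is not coding (meetings, triage, handoffs) takes as long as before.
    other_hours = total_hours - coding_hours
    new_total = other_hours + coding_hours / coding_speedup
    return 1 - new_total / total_hours


# Assumption: 0.5 of the 10 hours spent on the bug is actual coding.
saved = time_saved_fraction(total_hours=10, coding_hours=0.5, coding_speedup=3)
print(f"{saved:.1%} of the total time saved")
```

Under these assumptions the 3x coding boost saves roughly 3% of the total time. Even an infinitely fast coding step could not save more than the 5% that coding represented in the first place, which is why the coordination steps are the ones worth removing.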
The job of an engineer used to be mostly about delivering a product. The new job is to deliver a system that delivers a product.
Parts of the system are the same as before, but coding agents bring some new requirements:
A team that takes care of their delivery system will regularly ship small improvements to their platform. Did you just spend an extra 2 hours guiding your coding agent around your codebase? Document it in an agent skill, so that your colleague will have an easier time.
In the most advanced cases, this can be packaged into your own agentic development environment. Ramp built Inspect, and Stripe has Minions. Providing excellent defaults and all the right context means that you only need to prompt what you want, and the agent can figure out how to get there.
The types of work software development teams do haven’t changed much. There’s still planned roadmap work, exploratory sessions when you’re figuring something out, and a steady stream of smaller maintenance items. What changes in an agentic organization is how you run each one — and in particular, how much human coordination each requires.
The most common type of work is still roadmap work. There’s existing customer demand and a well-validated idea of what should be built. There might be more questions about how it should be built now, though.
But the same dynamic still applies — you want to split work into reasonably sized increments, and you want to use the team’s expertise to create a good plan. What’s “reasonably sized” is probably a bit different in the AI age, as some previously “one-day tasks” might get done accidentally as part of prompting something bigger.
We’ve found that talking is very expensive, and all teams are encouraged to start every feature planning session with a prototype. Someone from the team will spend 2-4 hours to prompt an end-to-end solution before the planning meeting.
In the meeting, we can now make decisions like
When there are more unknowns, we might consider a special type of synchronous session. We call them Campfires (after Steve Yegge’s post), and it’s like a planning meeting, but instead of a plan, we ship the first iteration of the feature.
These tend to be meetings of 3-5 people (product, design, engineers) taking 2-4 hours. Someone’s “driving” and sharing their screen, while others are commenting and helping with the code review.
Mob programming was invented a while ago, but I never warmed up to it. I always felt that typing code together in a meeting room focuses our efforts on the trivialities, rather than higher-level architectural and product decisions.
One of the biggest productivity gains from AI tools is the flexibility of the implementation strategy. You can try three different paths and pick the one that seems the easiest way forward, as opposed to committing to a strategy before starting the implementation. I’ve noticed that I change my mind about the implementation path far more frequently than I used to.
It’s important to realize that shaping a feature is only the first part. You should assume that building the 1.0 is maybe 50% of the work (and again, five people in the room means that the clock is ticking faster), and there are going to be more bug fixes and improvements as you ship it to production and get to try it with customers.
It’s better to dedicate some time to refinement, without a strict backlog. Instead, try to think of an exit condition. “Do we feel good about pushing this to all of our customers? If not, what’s the biggest blocker?” Rinse and repeat.
We’re regularly shipping features that would have been considered “two-week user stories” for the whole team.
When you’re shipping features faster, you’re signing up for more maintenance. This is a losing proposition, unless you can design the system to do something about that.
While it’s very useful to get the whole team’s input on designing a completely new feature area, the smaller work items benefit from LESS communication. Going back to our “customer reports a bug” example, communication and coordination can be the majority of time spent on these things.
Once you’ve established a “service level” of what kind of things your teams will try to cover, you should build systems to make this as easy as possible.
You could have automations creating pull requests based on:
Optimally, simple fixes will be just approved and merged by an engineer. When something turns out to be more complicated, we might just wait for the issue to re-occur before deciding to prioritize it.
Some of these changes are not automatically created by a bot, but rather prompted by an engineer using a cloud agent. These might be off-the-shelf tools or something custom-built to allow everyone to make changes by prompting a bot in Slack.
It’s critical to maintain a level of quality in your work. If every feature generates 10x the maintenance work compared to implementation, and all of it arrives randomly over time, no amount of AI automation is enough to manage that complexity.
The signal that your stream is working: an engineer can approve and merge a fix in under five minutes without context-switching into the problem. When that stops being true — when simple fixes turn into investigations — it’s time to move the issue into the roadmap.
It’s incredibly easy to be “productive” if you ignore quality. Agents will certainly generate as much code as they can, because that’s what they were built to do.
AI is an amplifier, for better and worse. Any existing ownership gaps will be filled with low-quality slop, but also your best people can now be several times more productive.
One reason AI tools have a bad reputation is that we all see the bugs that result from using them. I believe the number of defects is influenced by four main factors:
If we’re able to complete two months of work in one week, we’re also introducing a volume of changes that’s guaranteed to have bugs. Two months’ worth of bugs surfacing in one week will feel like everything’s falling apart, but objectively, it seems like a great deal to spend one week building and one week stabilizing something that used to take 8 weeks. Especially when many of those bugs wouldn’t have surfaced pre-production no matter how much time we spent on the implementation.
The second factor is less obvious. AI also helps us move beyond our own comfort zone. Maybe you’re not a mobile engineer, but with some encouragement from AI you can still create a PR in the iOS app. Breaking silos is a great benefit that eliminates dependencies, but it also means you’re now working on a codebase and technology stack you’re not as familiar with — so of course, it’s going to have quality implications.
Engineering skill matters because AI is a multiplier. A senior engineer reviewing AI-generated code will catch what a junior approves without reading. This is not an argument against juniors using AI tools — it’s an argument for investing in their growth, and being realistic about review quality in the meantime.
If you’ve managed to build a culture of quality and ownership, people will challenge each other in planning, code review, and outside of any formal process. The team is comfortable discarding code that doesn’t clear the bar — “works for one use case” isn’t sufficient if the overall shape is wrong.
You also need to address the code review bottleneck, which is inevitable when more code is generated. Just getting rid of code reviews without any consideration is not the right path, but neither is sticking to all of your old ways.
Taking real responsibility over your code is critical. If you’re just presenting AI slop for your team to review without reading it yourself, code review will become an extremely frustrating process. In addition to reading what you produce, you should use AI agents to challenge your own creations — both by instructing them manually, and by implementing automated code review bots.
The fourth factor — the underlying platform — is your safety net. Fast CI, good tests, and observability are what let you absorb a high volume of changes without incidents compounding. If your test suite takes 30 minutes and flakes 20% of the time, agents will find creative ways to work around it.
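To put a number on that example, here is a small sketch of what a flaky suite costs per change. It assumes each run fails spuriously with the same probability, independently of earlier runs, and that the change itself is correct, so the only remedy is to rerun until green. The function name is hypothetical.

```python
def expected_minutes_to_green(suite_minutes: float, flake_rate: float) -> float:
    """Expected CI time for a correct change when flaky failures force full reruns."""
    if not 0 <= flake_rate < 1:
        raise ValueError("flake_rate must be in [0, 1)")
    # The number of runs until the first pass follows a geometric distribution
    # with success probability (1 - flake_rate), so its mean is 1 / (1 - flake_rate).
    expected_runs = 1 / (1 - flake_rate)
    return suite_minutes * expected_runs


# The 30-minute suite that flakes 20% of the time:
print(expected_minutes_to_green(suite_minutes=30, flake_rate=0.2))
```

Under these assumptions the average change waits about 37.5 minutes for a green build rather than 30, and one change in five needs at least one rerun. That overhead is paid on every change, so it grows with the volume of changes agents produce.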
Just like linters helped us avoid a category of nitpicking in PR reviews, reviewer agents will help us move from manual verification to focusing on higher-level topics. We have a separate PR reviewer agent to look at database migrations, because they are the most common source of production downtime. Now we can move the discussion to “is this the optimal data model?” and “does this feature actually make sense?”
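One way to wire up a specialized reviewer like this is to route pull requests by the paths they touch. The sketch below is illustrative only and not a description of our actual setup: the reviewer names, the glob patterns, and the `reviewers_for` helper are all hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical routing table: reviewer agent name -> path patterns that trigger it.
REVIEWER_RULES = {
    "migration-reviewer": ["db/migrations/*", "*.sql"],
    "general-reviewer": ["*"],
}


def reviewers_for(changed_files: list[str]) -> list[str]:
    """Return the reviewer agents whose patterns match at least one changed file."""
    selected = []
    for reviewer, patterns in REVIEWER_RULES.items():
        if any(fnmatch(path, pattern) for path in changed_files for pattern in patterns):
            selected.append(reviewer)
    return selected


print(reviewers_for(["db/migrations/0042_add_index.sql", "src/app.py"]))
print(reviewers_for(["src/app.py"]))
```

With this table, a pull request that touches a migration file gets both reviewers, while an application-only change gets just the general one. Keeping the rules in plain data makes it cheap to add another narrow reviewer when a new category of recurring incident shows up.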
Of course, you don’t just want to run the system on good vibes — you need to know how it’s working, and what kind of debt you might be accumulating.
If you’re delivering amazing features but your bug backlog is doubling every month, your system will grind to a halt very soon.
Some things to pay attention to:
Turns out, these are all existing things covered under the umbrella of engineering effectiveness.
These signals give you a hint whether you’re investing properly in the system that delivers your product, or if you’re just trying to extract all the nice features AI can give you.
All of this is likely to cause an identity crisis among your engineers. Many of us got into tech because we enjoyed crafting code, while testing and code reviews were the downside of the job that we just accepted.
Now that coding agents are in the picture, delegating work on a task-by-task level rarely makes sense. What makes more sense is delegating ownership. Engineers need to come to terms with the fact that the old way of taking tickets is now better suited to agents.
Being able to absorb this level of ownership means we need to be close to customers and understand the “why” behind features and architectural decisions. This has always been valuable, but previously the levels of seniority indicated your readiness to carry this responsibility. Now we need new hires to demonstrate these skills quite early.
From an employee’s perspective, these things make the work more chaotic. There are more things in flight, more uncertainty about the direction, more reviewing and managing instead of uninterrupted coding time.
And yet, for an organization that’s able to accept the new reality, the upside is the difference between capturing the 10x productivity gain and settling for the 5%.
