Titus Winters brings infrastructure thinking to software economics — revealing why manual deployment processes are pure overhead and how to quantify the true cost of defects with a (sometimes) simple formula. His framework splits software into three phases: creative development, factory-like deployment, and operational maintenance, arguing that humans belong only in the first.
Drawing from 13 years at Google where he led C++ infrastructure and shaped how millions of lines of code get written, Titus shares hard-won insights: platform teams need 10-20% of engineering capacity, scheduling developers at 100% guarantees disaster, and that “important but not urgent” 20% buffer determines whether teams accelerate or stagnate.
Watch the episode on YouTube →
(0:00) Introductions
(3:58) Why Titus is drawn to this space
(6:21) How do you measure the impact of preventing a bug?
(8:36) How this paper came to be
(10:45) Why is it so hard for non-technical leadership to understand engineering?
(12:42) How relevant is the framework for smaller organizations?
(13:52) Estimating the true cost of defects
(17:40) Measure systems, not individuals
(18:25) Why humans in the deployment loop are pure toil
(22:17) Why DORA doesn’t measure the squishy, creative part of software development
(27:26) Infrastructure investment below 10% kills companies
(29:39) Changes in regulations that affect software companies
(31:36) How much should organizations spend on ‘platformization’
(36:00) Why teams should run at 70% capacity, not 100%
(40:15) Product vs engineering responsibilities
(42:04) Titus’ hot take on AI
(44:55) Where to find the paper
Titus: That’s a recipe for burnout and disaster because if anything goes wrong, you have nothing to spare. And so the lesson that I was given 10 years ago by directors that I respect completely at Google was: you want to schedule your teams at 70, maybe 80% of planned work, and then you have a budget of 20 or 30% of the time for important but not urgent stuff.
Rebecca: I’m Rebecca Murphey, and this is Engineering Unblocked. Engineering Unblocked is brought to you by Swarmia, the engineering intelligence platform that’s trusted by some of the best software companies in the world, including startups like Superhuman, scale-ups like Miro and Honeycomb, and companies from the Fortune 500. On today’s show, I’m talking to Titus Winters. He’s a senior principal scientist at Adobe (we’re gonna talk about what that is) and one of the authors of a new paper that’s pretty cool called “Develop, Deploy, Operate: A holistic view of commercial software development.” So Titus, welcome. I was so excited to get a hold of you once I saw this paper come across my screen. So thanks for being here.
Titus: Thanks for having me.
Rebecca: What should people know about you? Besides that you were an author on this paper and you work at Adobe?
Titus: Right. So people in the C++ community probably know me. I did 13 years at Google, eventually growing into roughly running the C++ monorepo codebase there. So that’s style guide, education, training, the low-level libraries. Basically all of the nuts and bolts. Most lines of code that come out of people’s fingers at Google had some relevance to what my team and I were attached to.
Rebecca: Yes.
Titus: I spent a few years on the C++ standards committee. I basically chaired the design of the C++20-era changes. I stepped away from all of that post-COVID, left Google, and about a year and a half ago I joined Adobe as a senior principal scientist. I’m roughly the senior principal scientist for developer experience. I like to say developer experience is a polite fiction. We’re in the job of productivity. We just recognize that you can’t easily measure that, and that the single best proxy we have is: do you hate the tools we have given you to do your job? So it’s not give everyone a donut and a pony. It’s make sure that they have the knowledge, expertise, skills, tools, and platform to be effective in the work that we’re asking them to do.
Rebecca: So maybe give everyone an M3 though. We did that at Stripe. Well, it was just M1s then, but…
Titus: I mean, there’s a lot of argument for penny-wise dollar foolishness when it comes to hardware acquisitions.
Rebecca: So I’m imagining you’re not the only senior principal scientist at Adobe.
Titus: No.
Rebecca: So is this kind of the new term for architect? What would you call this?
Titus: No, this is strictly a job level. It’s one step down from VP, like a senior director equivalent for individual contributors. And I think there are some people that have senior architect as the title, but it’s kind of the same thing.
Rebecca: Got it. So you’ve had the interesting experience of creating software because C++ is software. But you’ve obviously also created software that is for writing software. And so I’m curious, having seen both sides of software development, is that an advantage for you? How does that shape how you think about software?
Titus: What I find I’m just drawn to is I keep stumbling towards a problem and being like, “No, no, no, no. We can’t fix that here. We have to fix that at a lower level.” And over and over again, I just keep getting trapped by, “Nope, there’s a lower level fix for that, to do this properly.” And I just wind up in the basement.
Rebecca: I think that any role where you’re writing software for people who write software brings an additional perspective to software.
Titus: And I think one of the questions that I wind up asking people when they’re interviewing for teams with me is: are you motivated by seeing users out in the wild go use your beautiful new thing and be able to point your parents at it and be like, “Look, mom, look at the thing I did”? Or are you motivated by “I hate this problem, I want to see it go away. I don’t want anyone ever to have this again”? And some people do both, but I find a lot of people are majority one side or the other.
Rebecca: I thought I wanted to do the first and then I discovered the power of doing the second. And the joy of code archaeology. Maybe not joy in the moment, but eventual joy. So this paper — you told me a little bit when we were talking earlier about how this came to be. But I’m gonna give you my bad, very bad one-sentence summary of the paper, and we’re gonna talk for the next 45 minutes about what it’s actually about. So it basically offers a framework for understanding and optimizing the commercial value of software development. And I really like that term “commercial” because why else are we doing this, right?
Titus: Right, this is business.
Rebecca: We’re not doing it for fun. Maybe we have fun, but that’s not why we’re doing it. And more importantly, it challenges traditional productivity measurement approaches and challenges typical traditional ways of thinking about how software creates value and how engineers create value. What experiences were you having or what things were you seeing that kind of led you to believe this framework needs to be out in the world?
Titus: This has been a thing that I’ve been pushing towards for many years. And I think it stemmed from being in an infrastructure sort of role at Google. And contrary to everyone’s belief, Google’s not infinite headcount, but especially not if you’re down in the basement. Like it’s not shiny to be like, “Hey, yeah, let’s teach everyone how to be a better engineer.” It makes sense, but it’s somehow not compelling. And so I just kept having to refine the argument of what’s the value of better infrastructure. And how do you look at, and how do you measure the impact of preventing a bug or refactoring a crummy API? Because that was a lot of what we were doing.
But historically the orgs that I was attached to luckily also had the people that were doing fleet efficiency optimization work, and that’s a space where you can measure things really, really well. Like, how many QPS do we get per unit of compute? What’s the overall cost of memcpy or the memory allocator? The seminal papers on stuff like the data center tax, those people were all cousins of mine. And I was always furious. I love those guys, they’re great. But I was always furious at how much they could just measure, in many, many dollars, what the value of an improvement to the memory allocator strategy looks like, whereas trying to draw an equivalent, even a rough estimate of what’s the value of education, bug prevention, whatever, just was so hard.
And I eventually started feeling like I had some traction on it during the pandemic. There’s an ACCU keynote of mine from 2021 or maybe 2022 that starts getting at some of these ideas. And then the problem is that I can’t just point someone to an hour-long talk to explain how they’re thinking about all of this wrong.
And so eventually, ideas you repeat long enough start to get condensed. I think I had it down to about eight or ten pages by the time I left Google, and it was like: this is still kind of a mess, but there are a lot of really big ideas in here. This is maybe the most important thing I’ve written, and I’ve written some things.
But that was kind of where it was. I made sure to hand that off to people that I trusted and that had been thinking about stuff with me before I left. And to especially her great credit, Leah Rivers was the PM director that I was working with at the time. She’s a brilliant thinker. I love her work. And she picked that up and kept tinkering with it nights and weekends for more than a year, and then eventually reached out and she’s like, “I think we have this in a shape that it could be published.” And I was like, that’s amazing, because I cannot tell you how often I need this thing as a citation.
Rebecca: You also talked about the need for — and just to paraphrase what I think you just said — it’s a communication tool for decision makers, about how to think correctly about the commercial value of the work that they’re doing.
Why is that so hard?
Titus: I don’t know, exactly.
Rebecca: A senior leader can’t be as unfamiliar with marketing or finance or sales as they are allowed to be around technology and the business of technology. And that’s not my original insight, Christine Miao wrote about that the other day. What is the instinct that you’re fighting against when you need this paper?
Titus: Well, it depends on the leader. In tech, especially in bigger tech, you’re gonna eventually wind up with executives that may or may not have actually had a tech background. They may be professionally executives and that’s great in some respects. We need the business acumen, you need the decision making, you need all of that stuff. Managing a large org is a distinct skill from programming.
But some of the time they just don’t have that background at all. Some of the time I think it goes to: this is still just a very young industry, right? The term software engineering is just over 50 years old. I was talking with Nicole Forsgren of DORA and DX a couple years ago, and I still wish that I had a written version of this quote because I say it all the time. So Nicole, if you’re listening…
But she said to me something like, “Accountants still have annual meetings to talk about best practices in accounting. And accountants are mentioned in the Bible. We’re 50 years old. Give us a minute?” Right? And my point there is if you’ve been out of the trenches for 10 years, that’s 20% of the history of our field, right? Of course the way that you did it then is not necessarily quite the right model for now.
And we have — Adobe grew up as a shrink-wrapped software company, and still, a lot of our money makers like Photoshop make a lot of money, comparatively. And there are still echoes of the shrink-wrapped mindset in the development practice over there.
Rebecca: I’m sure. How could there not be? Because Photoshop, how old is Photoshop?
Titus: 30 plus? No, it’s gotta be more than that. Probably 40 years old. And so I think a lot of it is just they don’t have to have this nuanced sort of understanding of what the optimization and investment strategy looks like for what we currently understand to be inherent complexity versus accidental rework and toil. And just trying to get some of those axioms out, I think, is valuable on its own.
But then building on top of that and getting to the point of here is a napkin sketch mechanism where you could simulate the whole software process for a large team and start looking at this as cost minimization, which — it’s a commercial enterprise, right? This is not for funsies. This is a business.
Rebecca: That brings up an interesting point. How relevant is this framework for an engineering org of 30 versus, obviously Google is a little bit bigger than that. And it’s obviously valuable — I’ve been in organizations much smaller than Google where this is also obviously valuable. But I’m also constantly reminded in my job that there are a ton of software engineering organizations with 30 or 40 people that work, and especially in the age of AI, maybe that’s even gonna be more — but we’ll talk about that later. But anyway, how do you see this as applicable across the entire spectrum of engineering, software engineering companies, of size?
Titus: I think that all of the core principles here should generalize nicely. It has been 20 years since I was working in a 50-person company. So admittedly my background on that is out of date by 40% of the history of our field. But I go to the example of what’s the value of having caught this bug early, right? And one of the points in the paper is you effectively can’t answer this. You might, if it really, really, really mattered, be able to put together an approximate answer for one case. But if you’re trying to automate this for all bugs, it’s just a nightmare. It’s just the wrong question.
Instead, a very slight tweak on it is: let’s estimate the cost of our software process as we are executing it and see if there’s any systemic mechanism to reduce that. Right? And my favorite — and this has stuck for most of five years now, I think — my favorite rule of thumb for estimating the cost of a defect, when you do find and fix it, is the number of engineers exposed times the amount of time since that code was written into a buffer, right?
And this sounds simple, but has really, really nice properties because if you talk about auto formatting and white space fixes, right? Most of that is being done in your IDE right now, and it’s one developer times half a second. That is about how much I’m willing to pay for that, right? I’m not actually gonna block a release for something — the cost of that would be incommensurate with blocking the release, right? We have an instinct for that. So the model isn’t capturing how important is your defect, it’s just what is the cost of a defect that you catch at this particular time.
And one of the really important parts of that goes to your what’s the size of the organization is just the population size, right? If I catch a bug in unit testing before I send it out for code review, that’s one engineer times however long I’ve been working on this PR. Once I send it for code review, it’s two engineers times slightly longer than that because now we have to synchronize our brains on what’s actually going on here and communicate and all of that.
And maybe it’s not actually twice as expensive, but it’s kind of twice as expensive, right? And if you scale it all the way up to like log4j, well that was 10 million Java developers times a decade, which yeah, is approximately the cost of a Black Swan event. And there’s gotta be a discount factor, some constant in front of all of this, but within a constant factor, this actually gives us a lot of guidance on these things.
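(To make the rule of thumb concrete, here is a minimal sketch of that napkin math in Python. The function name, the stage examples, and every number below are illustrative assumptions, not figures from the paper; the constant discount factor Titus mentions is left as a free parameter.)

```python
def defect_cost(engineers_exposed: int, hours_since_written: float,
                discount: float = 1.0) -> float:
    """Rule-of-thumb cost of a defect caught now: engineers exposed x time since it was written.

    `discount` stands in for the unknown constant factor; the result is in
    engineer-hours, not dollars.
    """
    return discount * engineers_exposed * hours_since_written

# Illustrative stages; every number here is made up for the sketch.
examples = {
    "autoformat fix in the IDE":  defect_cost(1, 0.5 / 3600),              # one dev, half a second
    "caught by local unit tests": defect_cost(1, 4),                       # one dev, a few hours into a PR
    "caught in code review":      defect_cost(2, 8),                       # author + reviewer, a bit later
    "caught in post-merge CI":    defect_cost(25, 24),                     # whole team exposed for a day
    "log4j-scale incident":       defect_cost(10_000_000, 10 * 365 * 24),  # ~10M devs x ~a decade
}

for stage, cost in examples.items():
    print(f"{stage:>28}: ~{cost:,.2f} engineer-hours")
```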
Rebecca: Well, and you’re bringing up a really important point. We see a lot of people gravitating, a lot of leaders gravitating toward measurement of individual outcomes, right, or of individual performance or delivery impact, whatever. And you bring up the point that I very much agree with that that’s rarely your problem. And that the systemic fixes are much more — not only do they solve the problem for today, but they solve the problem for forever. Or some version of forever. That’s probably five to 10 years. So I think that’s such an important observation here. And then I love how much you talk about systems thinking and fixing the system.
So let’s talk a little bit about what’s in the framework. First and foremost, the title of the paper is “Develop, Deploy, Operate.” And you talk about those as the three phases of delivering value through software. So can you just talk me through what each of those are?
Titus: So I really like to look at this as: everything that happens up through merge, committing your PR, is the develop phase. And this is fundamentally human and bespoke and artistic. This is the craft of software, right? And it’s really challenging to get useful signal there. Your productivity is not the number of PRs. Your productivity is not the number of lines of code you produced. We’ve known that for a long, long time. Because every individual change is different and bespoke. Right. And you can get some sort of aggregate signals of how many changes we are making, but it all feels a little apples and oranges. It’s real squishy.
Once you merge, and this is one of the axioms in the paper, once you merge, there is no theoretical reason for a human to be in the loop, right? This is where we do automated testing and all of the pieces that answer: can we assemble a release candidate? So this is all CI, or at least all of post-merge CI. And the more we can get the word out to the industry that humans in the loop at this point are toil, the better. It’s toil, it’s overhead, it’s pure rework. If you’re looking for impact, driving down the human involvement in that phase is purely a win, so long as you don’t make your deployment stability worse.
Rebecca: Not only is it just — we all know that having a human phase in your deployment is bad from a getting-code-out-into-the-world standpoint. But also, somebody just reminded me today: while that engineer is waiting for their last change to go out through QA, they’re starting five new things, or 10, by the time that QA process is done. And so you’re creating bottlenecks; you already have a bottleneck and you’re creating more flow straight into that bottleneck.
I still have conversations with leaders who believe that they cannot deliver software without a manual QA process because they’re in a highly regulated environment or something. This paper is a tool that a leader in that environment could use with their leadership to say we need to get out of this environment. What other advice would you give that person who’s trying to advocate for this change?
Titus: Even if you take it as a given that, okay, you’re in a regulated environment, some amount of manual QA is required — I don’t know if that’s true, but for the sake of argument — figure out what the minimum amount of manual QA that’s required is, whether by regulation or just in practice, to ensure that you don’t ship something bad. And try to drive it down to that minimum amount, right? Free up engineering time for other things.
It is very likely that you’re spending more on engineer paychecks than any other cost in your company by a substantial margin, unless you are Google or Facebook or OpenAI, right? And so just drive down that human wasted cost so that they’re free to do other things with that capacity. So…automate.
And all of this rhymes with everything that DORA has been saying for more than a decade now, and I don’t think that it’s a coincidence that all of the DORA metrics done well are specifically measuring the software deployment capability. Right? It’s from the point that you merged. It’s a factory process and therefore we can compare apples to apples. Right. And DORA isn’t saying anything about the engineering efficiency process. The creative part is just outside of what DORA talks about.
Rebecca: This is kind of a tangent, but there’s a project that my team did a couple of jobs ago where we basically made it so something that you used to need an engineer to do, now you didn’t. How would this framework identify an opportunity like that?
Titus: So, I mean, going back to what’s the cost of our defects? If you’re doing this as QA after it got merged, the whole team’s been exposed to it, you’re hours, days, weeks after your changes went in, and then you’re spending actual human effort on it, that sounds substantially more expensive to me, both by our rule-of-thumb framework and by the principle that human involvement after merge is toil, than it would be if you spent a small amount, say a month of that toil budget, to build the appropriate unit test framework to make sure that all of those strings are present and that you can A/B test or opt in the users in appropriate locales or whatever the thing that you have to do is.
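(The trade-off Titus sketches here is essentially a break-even calculation between recurring manual QA toil and a one-time automation investment. A minimal sketch, with every number invented purely for illustration:)

```python
# Hypothetical break-even sketch: recurring manual QA toil vs. one-time automation.
# All of these numbers are made up; substitute your own.

manual_hours_per_release = 6   # engineer-hours of manual checking per release
releases_per_month = 8         # how often that checking is repeated
automation_hours = 160         # roughly one engineer-month to build the test framework

monthly_toil = manual_hours_per_release * releases_per_month
break_even_months = automation_hours / monthly_toil

print(f"Recurring toil: {monthly_toil} engineer-hours per month")
print(f"Automation pays for itself after ~{break_even_months:.1f} months")
```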
Rebecca: So you’re talking about develop, we’ve talked about deploy. What’s operate?
Titus: And then operate is release, and SRE, and all of that stuff. And I would say for operate it is clear we’ll never drive SRE need down to zero, no matter how good we get. I think the same is practically true in deploy as well. We want to reduce that as far as we can without impacting stability and software quality. So theoretical zero, practical small. But we want the vast majority of our human involvement to be in the develop part of all of this.
And the overarching framework that I really like, which I think got a little bit lost in some of the edits, although it’s sprinkled in there — I think that the correct model for thinking about software is a car company. And that the dealership is much like our operate, right? We can measure actual customer success. How many cars are driven off the lot? We run the mechanic shops and we see how many things are coming back. These are user-reported defects and how much this is costing us and all that stuff. It affects brand reputation when you ship a car and then have to do a recall. These are expensive and should be avoided.
The automotive manufacturing factories are largely automated, probably not fully, but are really focused on efficiency and making sure that everything is basically treated the same and we just have a full pipeline. Some of the same lessons apply going back to early DevOps stuff, the Phoenix Project, et cetera, right? You want to reduce the amount of work in progress. If it takes six weeks from the point where you start building one car to it actually leaves the factory, that’s probably not great. And if you can get that down to a smaller amount of time, that’s probably more efficient.
And then in all of this model, humans are the design part of this, right? How do you design a new car to ship that blueprint to the factory? Right? And if you think about what automotive design teams would find more effective, right? It’s what’s the design vocabulary, common languages and infrastructure, common library stuff, what are reusable components so that they don’t have to build out every wiring harness and new — I don’t know, transmission, wheels. I’m not a car guy.
But find the software analogs of those. And just as importantly, if you’re thinking about how you would measure the impact of that part: it’s not how many cars are shipped. It’s not dollars, right? You gotta look at something squishier because it’s the human and creative part.
Rebecca: And there’s a taste aspect of that too, right? You have to understand your user and stuff for sure. And that’s not necessarily a straightforward, straight line process.
You mentioned that you want as many of your engineers as possible to be in the develop mode, and not in deploy and operate. But to get there requires developers to work on the systems that power deploy and operate until those systems are self-sustaining, which they probably never are if you’re continuing to write code, but they can become self-sustaining on a weekly and monthly cadence, just not on a forever cadence.
So how do you talk to leaders about the need for that? How do you convey the value that they could get if they spend 10% of their engineers on this versus 20% of their engineers on this? Or do you look at it like we should solve this problem and this problem and this problem and we should just make sure those problems are staffed when we want to solve them?
Titus: I’ve done both. I don’t know that I’ve been successful yet, but these are certainly arguments that I’ve raised, and I think both lines are really valuable in different ways. I have seen research and deep case studies from partners and coworkers, ones I wish had actually gotten published but which haven’t made it out yet, showing that no large organization that has invested less than 10% of its engineering effort in infrastructure has survived longer than a handful of years. You will nosedive at less than 10. I don’t know what number makes things happy and sustainable, but I know less than 10 doesn’t work.
I think one of the problems is it’s much easier to go to finance with a coherent, “here is the package of investment and what you’re likely to get out of it”. But that requires being able to speak non-engineer and in a dollars and business sense style about the nature of this problem.
And that’s again, part of why I think this paper was necessary.
The other thing that doesn’t make it into the paper but is relevant these days is just the amount of government regulation on software that’s coming around. In the production services space, we get FedRAMP and things like that. And the Europeans just passed a Cyber Resilience Act, which I’m shocked that we’re not talking about yet.
I’m not a lawyer, but my reading of what happened last November is that all EU member states must, within two years, pass regulations saying software providers are financially liable for the results of their defects. And if the courts find that they’re not following best practices, for whatever that means in software engineering and security, they may face additional fines of up to 2.5% of global revenue. So hey, time to take this all seriously. Right?
Rebecca: That’s exactly the kind of thing. Never mind that just basic security and compliance today requires that investment. Or GDPR: how many companies had to spin up a platform team just to do that? And they probably should have had one before.
Titus: And the side effect is it’s increasingly not just production software, it’s stuff on consumer devices, it’s desktop, it’s everything. And I don’t anticipate that that’s going to get smaller or easier. And if we don’t have a framework for figuring out what a reasonable investment looks like, or how to measure, or maybe not measure but at least reason about, the impact of engineers working on the platform team, what do you even do? And so I don’t know if this is the right answer, but I don’t have anything better, so here’s my attempt.
Rebecca: So I’ve made many attempts to math this out. And Claude and I have made many, many, many React simulations for different spend and the different outcomes that you could see. My personal hunch is that the correct number stabilizes somewhere between 15 and 20%. But there may be times when you have to go up as high as 25%, especially if you haven’t been doing the work along the way. And that 25% may not look like it’s all on a platform team; it may be that 25% or even more of your engineering organization is focused on platformization stuff, which does not look at all like product stuff.
Rebecca: Well, you just brought out the word impact, so it’s almost like you knew what we were going to talk about next. Because that’s another thing that you talk about in this paper, which is the four forms of impact: project success, hardware resource efficiency, engineering capacity, and strategic capabilities.
I love this because I think that companies know this, but haven’t necessarily articulated it quite so clearly and well, right? We know in our hearts that strategic capabilities are good, but in our capitalist brains, we say we have to build more product, build more product, build more product, make more money.
So I really like the divisions that you’ve laid out here. Is this something where you need to pick one? Is this something where it’s more just a language for discussing the kinds of impact that you can have?
Titus: I think it’s the language. This is really trying to introduce vocabulary there. And to your point about strategic impact, I was having a discussion today or yesterday about having better statistical data about engineering process and outcomes internal to Adobe.
And one of the points that I was making there is there are things that we could be continually measuring that might only need to be looked up once a quarter by some exec to make an important decision, but the ability to make that decision faster or better may pay for all of this.
You’re never going to actually be able to put a specific bulletproof dollar value on it, but it may still be the most strategically valuable enablement around. Right. And so the strategic value thing is kind of the misc bucket. And the best I’ve got for that is you gotta have people that are willing to put their reputation on the line for “this matters for these reasons, and if I’m wrong, you should fire me because I don’t have good taste.”
Rebecca: I think strategic capabilities are, again, everybody knows that they need them but don’t necessarily have a language for them. Also, like you said, can’t measure the impact of them.
We can acknowledge that there is impact, but one of the challenges that I’ve seen on some platform engineering teams is that, yeah, they have a really hard time talking about their impact. I did some work that made developers really, really happy and I’m confident that it was correct work. I am confident that it delivered value to the company, but the most measurable thing I have is it made developers really, really happy that we were using modern technology in this thing instead of this really cobbled together system that had not evolved well over the years.
So number one, I think in that organization I was relatively fortunate that there was language for appreciating that kind of impact. But there isn’t always language for appreciating that kind of impact or weighing it against, well this person exceeded expectations because their product initiative raised our KPI by 10%. In platform we can talk about dollars saved around the hardware, right?
Titus: Yeah. Kind of.
Rebecca: We can talk about time saved, I guess, but that’s squishy as well.
Titus: Yeah. It’s very squishy, and so we specifically phrase it as engineering capacity, not engineering productivity or time. Because although engineer paychecks are probably your biggest expense, and therefore engineer time is dollar denominated at some level, if I save all of your engineers 10% of their time in a week, you don’t get to transfer that into a different account, right? It doesn’t show up on a spreadsheet, right? What we have given you is the capacity to do more engineering, but by and large, absent wild swings in hiring or layoffs, you basically know how much you’re gonna pay for engineering for the year. And the question is, how much return do you get on that predictable cost?
At big tech, at least, cloud spend or data center spend is roughly the same sort of scenario, right? You know how many machines you have, you know how much the cost is to operate those, right? It’s a question of efficiency, right? The machine efficiency argument leads to an interesting thought, though, which is if you run your data centers at 95% capacity, they’re falling apart. Right. You want an individual machine or data center at like 80%, 85%, right? Go talk to SRE or queueing theorists, right? You start to panic when you’re getting up into those nineties.
This drives me nuts when we talk about task assignment and human productivity, right? You cannot schedule your developer teams at 100% capacity. That’s a recipe for burnout and disaster because if anything goes wrong, you have nothing to spare. And so the lesson that I was given 10 years ago by directors that I respect at Google was: you want to schedule your teams at 70, maybe 80% of planned work, and then you have a budget of 20 or 30% of the time for important but not urgent stuff. Going back to Eisenhower Matrix terminology, right?
It’s not for playing ping pong, it’s for team bonding, education, paying off tech debt, having meetings, communication overhead, figuring out things, experiments. It’s for all of those things that are important but not urgent and aren’t gonna make it onto your product roadmap.
And at basically all levels of the hierarchy, fractally, you need to have some of that capacity in reserve so that when it is a crisis you could re-pivot and all hands on deck to go actually fix it for a couple weeks and then go back to normal. Otherwise you screwed up.
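(For readers who want the queueing-theory intuition behind those utilization numbers, here is a minimal sketch assuming a simple M/M/1 queue, which is our assumption rather than a model from the paper: average queueing delay grows like utilization divided by one minus utilization, so the slide from 80% to 95% busy is not a 15% degradation but a several-fold one.)

```python
# Minimal M/M/1 illustration of why running anything "hot" gets painful.
# Assumption: work arrives randomly and is served one item at a time;
# real teams and data centers are messier, but the shape of the curve holds.

def relative_wait(utilization: float) -> float:
    """Average queueing delay, relative to the time the task itself takes: rho / (1 - rho)."""
    return utilization / (1.0 - utilization)

for rho in (0.70, 0.80, 0.85, 0.90, 0.95, 0.99):
    print(f"{rho:.0%} utilized -> queueing delay ~{relative_wait(rho):.1f}x the work itself")
```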
Rebecca: And also the normal work that that team is doing will be delivered more predictably if they are at 70% capacity than if they’re at 100% capacity.
Titus: They’ll be happier and growing and getting better. They will be accelerating. And the results will actually be predictable and probably better.
Rebecca: Right, right. I think going back to your car analogy, you see people treating the software component, the develop piece, as the manufacturing process rather than as the design process.
Rebecca: Continuing your analogy, if in theory it takes a design team two months to design the next car, then they clearly should be doing six every year and there’s no time for anything else. And the first time that one of those goes over budget, you’ve just destroyed the whole cycle. No, don’t do this.
Well, that’s interesting, because obviously GM doesn’t need six new cars this year. Probably. And so you could respond to that by saying, oh, we’ll just hire the designers when we need them, right?
Titus: ‘Cause institutional knowledge and ramp up don’t exist. We’ll just move people over to this team when we need them.
Rebecca: Yeah, I have worked with product managers who have intuitively understood this and they understand the value of doing KTLO work and tech debt reduction and all of that. And they do make time for it. And I’ve worked with product managers who maybe don’t do that. So what experience do you have in talking to product leaders about these topics?
Titus: When I look at product director and above, I have never met anyone that didn’t actually agree, which says to me either I’ve been lucky (possible) or even success in the product management hierarchy does require you to be wise enough to understand the reality of, yeah, this is gonna be a balance. Down at the frontline PM level, then it’s a little bit squishier. I say that all wisdom in tech is either pithy anecdote or aphorism or war story. And there’s a quote from Rob Sargeant — I think he’s a director in Photoshop these days. I love this quote. He says, “It’s product’s job to build the right thing. It’s engineering’s job to build the thing right.”
They can tell you, go build this thing. But the iron triangle of project management still exists — and I hate that that’s the Wikipedia page I cite the most often. You don’t get to pick all of feature set, staffing, and delivery date, right? And so they get to tell you what to work on. You get to tell them how long it’s gonna take and if something unexpected happens, they should be planning for it.
Rebecca: Titus, this has been fascinating, number one, and we are running out of time. So I’m going to condense all the other questions that I was going to ask you into one question, which is: with all of this experience, with all the perspective from developing the ideas in this paper, what hot take do you have about the current state of the software engineering industry?
Titus: Okay, hot take, it’s 2025. I guess it has to be AI, right?
Rebecca: Doesn’t have to be! Your hot take could be it’s not AI.
Titus: I am a professional skeptic. I have been a curmudgeon forever. If too many people are going one way, I will very aggressively go the other way. It’s also probably part of why I wind up in infrastructure roles.
But my stance here is, yes, everyone should be using those tools if you can get over the ethical and environmental issues around it, et cetera, which I certainly hope get better. But none of this is obviating the need for engineers or humans in the loop. The next thing that gets vibe-coded and posted to a million people — this never works. It’s just all bad.
And the one statistic from a high-ranking tech exec that I really liked came from a podcast with Sundar from Google about a month ago. He said their estimate is a 10% productivity gain. And that’s for Google-class engineers, where they have gone to great lengths to put it in small places all throughout the workflow, right? 10%. That is a number that I actually believe. I keep estimating we get 5 or 10% for production-quality software.
If it’s just programming and vibe coding, a nephew project, yeah, sure. 10 times. You can do that stuff much better.
Rebecca: I had a blast spending $500 with Claude for projects.
Titus: Absolutely. But all of this goes back to — it starts echoing with my favorite definition of the difference between programming and software engineering. Which is: it’s programming when “clever” is a compliment, and it’s software engineering when “clever” is an accusation. And if you understand how deep that cut is, then you’re probably a software engineer.
Rebecca: Yes. I love it. Well, we’ll have to record another podcast to talk all about AI because even those three minutes I have so much to say. But we are of the same heart in this topic.
So wrapping up here. How can people find this paper and how can they use it?
Titus: The paper is on ACM Queue. That’s Q-U-E-U-E. If you search for “Develop, Deploy, Operate,” that should probably pop up. Certainly, I’ve posted about it on LinkedIn a bit. You could probably find me there too. And yeah, I hope that you find that to be a useful citation. I would give just about anything for that to be downloaded into every executive brain in the world right now. ’Cause I just want it to be rational.
Rebecca: Seems a lot more important than AI adoption, doesn’t it?
Titus: Yeah. I do not understand why our choice of IDE is suddenly a VP level strategy discussion, but that’s just me.
Rebecca: Here we are. All right. Well, on that note, Titus, it has been excellent talking to you. Really, I am so glad that this artifact exists. As soon as I saw it, I was like, I need to talk to this guy. So thank you again for coming on. I appreciate you being here.
Titus: Thank you so much for having me.
Rebecca: Well, that’s the show. Engineering Unblocked is brought to you by Swarmia, the engineering intelligence platform that’s trusted by some of the best software companies in the world.