Cybersecurity programs are infamous as gatekeepers, power tripping in their virtual CAT machines to dump roadblocks and jackhammer potholes on software delivery. This draws ire from nearly every other software delivery stakeholder but is often justified due to the “fact” that cybersecurity reflects a uniquely complex set of challenges.
This is not, in fact, a fact. It’s closer to propaganda. Cybersecurity isn’t that special. And cybersecurity shouldn’t be that special if we want to minimize the damage of cyberattacks in our software systems.
In this post, I’ll excoriate special snowflake security programs, then offer ways we can make our programs constructive rather than constrictive.
Is cybersecurity special?
For the sake of honest discourse, let’s consider some of the reasons cyber thought leaders cite for why cybersecurity is a special snowflake – and debunk them:
1. It’s hard to prove our value because it’s based on counterfactuals.
Site reliability engineering (SRE) teams face the same challenge, but mewl much less about it.
2. There are so many software changes across so many teams and how could we possibly make improvements to those activities?
This is literally the mission of platform engineering teams, and many can even provide evidence for the value of their improvements.
3. Attackers are especially clever humans whose purpose in life is to harm us.
Attackers can be clever, but often aren’t – and also the reliability failures we see from “especially clever” developers/teams can foment far more harm than attackers could ever dream. Interns were destroying data long before ransomware gangs were.
4. We’re seen as a cost center!
Again, talk to your SREs or platform engineers or, if you want to hear what it’s really like to be continually dismissed, your D&I colleagues.
5. Software systems are so complex, how could we ever hope to understand them enough to secure them?
Ask basically anyone in your engineering org whether software complexity makes their job harder, especially anyone who mumbles about Lamport clocks during architecture reviews. You think having a nation state as an adversary is bad? They’re fighting against the fundamental laws of physics.
With all due respect
In sum, cybersecurity really isn’t as esoteric and arcane a problem as we believe. Yet, I’m often met with disbelief by cyberppl that things are really that hard for their platform/infra/devops/SRE colleagues. Sometimes it feels like cybersecurity leaders and engineers think that all these teams were automagically bequeathed respect – autonomy, budget, authority – by the business.
And, in that vein, I’ve heard the CISO role described as “the hardest job in corporate America” – and maybe it is if you’re scrambling to cover up felonies – but nearly every problem I hear security leaders complain about is mirrored on the infra, platform, and SRE side of things.[1]
This respect must be earned. These other teams earn it by solving reliability and developer productivity challenges in clever ways. They do the hard work of thinky thinky and buildy buildy rather than foisting cumbersome policies and tools on software engineers in what I call the SSB model (for “sink or swim, bitch”).[2] They don’t carve 100 security commandments into Confluence; they build patterns, frameworks, and tooling that encode the right requirements to make the better way the easier, faster way for software engineers.
If cybersecurity wants to earn similar respect, it can’t keep roadblocking and gatekeeping software. It can’t pretend like security failure is so distinct in importance and impact that it requires completely separate workflows, stacks, reviews, tooling, design, and basically everything else. Attackers accessing our systems without our consent is one type of failure, but not the only kind. Reliability failures are arguably both more frequent and more damaging when they occur; developer productivity failures can mean the difference between successful market differentiation and losing market share.
Resilience, not roadblocks
This is precisely why I emphasize software resilience, because it encompasses our reliability and cybersecurity concerns. It’s about our goal outcome: we want systems that can adapt to failures and opportunities alike in an ever-changing world. Those failures can stem from cyberattackers, from performance bugs, or from a broken developer experience (DX) – the difference really doesn’t matter as much as we think.
The common “enemy” is unintended behavior. Indeed, it is our ancient archnemesis, the eternal foe that formal methods could not vanquish. Having separate pipelines, observability stacks, or review processes for every contributing factor to unintended behavior would be an operational disaster, and yet cybersecurity insists on precisely this for itself.
It really doesn’t make sense for cybersecurity to have its own special snowflake process for things. It does not make sense operationally, philosophically, or socially. It does not make sense for sustaining systems resilience. It does not even make sense for software security.
If we want to sustain resilience at scale, then we should brainstorm strategies to improve system resilience across failures. In a practical sense, it doesn’t matter whether a service goes down due to a performance bug or an attacker exploiting a security bug; the outcome to the business is lost revenue either way.
For example, a queue or message broker could help us restore service health in either scenario.[3] Should we ignore this design-based solution in favor of the security team needing to review and approve every single software release because it potentially has exploitable bugs and now the backlog is at least six months long which leads to larger batch sizes by software engineers which leads to higher failure rates and by the Eight Divines[4], why does anyone actually think this is an okay state of affairs? The empire building may feel good, but it’s bad for everyone.
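To make the queue example concrete, here’s a minimal sketch (all names illustrative; a bounded in-memory queue stands in for a real broker like RabbitMQ or SQS) of how buffering work between producer and consumer means a consumer outage – whether caused by a performance bug or an exploit-triggered crash – doesn’t drop requests:

```python
import queue

# Bounded in-memory queue standing in for a real message broker.
work = queue.Queue(maxsize=1000)

def submit(order):
    """Producer: enqueue work instead of calling the consumer directly."""
    work.put(order)  # accumulates safely while the consumer is down

def drain():
    """Consumer: after recovering, process the backlog that piled up."""
    processed = []
    while not work.empty():
        processed.append(work.get())
        work.task_done()
    return processed

# Simulate a consumer outage: orders keep arriving, nothing is lost.
for i in range(3):
    submit(f"order-{i}")

# Consumer comes back up and catches up on the backlog.
print(drain())  # → ['order-0', 'order-1', 'order-2']
```

The design choice is the point: service health is restored the same way regardless of why the consumer went down, which is exactly why the failure’s cause matters less than the recovery mechanism.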
By now, there are some security leaders[5] fuming that I’m telling them to rip out their precious pet process. This is about the time when, in an IRL convo, they usually retort, “Well, what are you saying, that we shouldn’t care about cybersecurity at all?” That is never what I’m saying. We should care about cybersecurity but we should not silo it or treat its concerns as separate because it actually worsens the outcomes we purportedly care about long-term.
So, what do we do instead of roadblocking? There are a ton of opportunities cybersecurity and platform engineering teams can pursue to achieve the end goal we seek – and far more effectively than heavy-handed cyber design / code reviews. Let’s go through some.
What to do instead of roadblocking
1. Self-certification to guidelines. Let product engineering teams self-certify based on (relatively) complete guidelines we’ve written beforehand. Some cybersecurity teams claim they already do this, but software engineers often find these guidelines too vague and will thus work around them. Typically, this is because the cybersecurity team operates with an “I know it when I see it” mentality about what “secure” looks like. And because the cybersecurity team does not have software engineering expertise, it’s often divorced from how software delivery actually works.
2. Follow Platform/SRE’s lead. Integrate security design reviews into the general design/architecture review process (and same with code reviews). Collaborate with site reliability engineering (SRE) or platform engineering teams on this; learn how they ensure their reliability requirements are considered during design review and… do exactly that, basically. Ultimately, both stakeholders (reliability and cybersecurity) want similar outcomes, so the more we can find design opportunities that eliminate or reduce hazards in the system – towards resilience and security by design – the safer and more reliable our code will be (i.e. higher quality).
3. Build standardized patterns. Build standard solutions for “cross-cutting concerns” that apply to all services and software (also called “paved roads” or “patterns”). These are typically built by platform engineering teams for cross-cutting concerns related to performance and reliability. Where platform engineering teams have built them for cybersecurity concerns, it is because they grew frustrated by the cybersecurity team’s inertia and inefficacy, and thus built the standard solutions themselves.
There are a few cybersecurity teams who already build these standard solutions in practice (like at Netflix, Block, and others who I wish would publicize their efforts, hint hint). But I also still hear CISOs who claim doing so is “impossible” (it is objectively not).
For instance, we can create architectural or coding patterns that make it easier to write safer code (or, conversely, make it harder to introduce certain classes of bugs). We can provide standardized libraries for middleware that’s fraught with hazards, like authentication. Product engineering teams thereby save time by not having to implement authN themselves and we gain confidence that there aren’t a bunch of potentially jank, ad-hoc authN disasters lurking.
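As a sketch of what such a standardized authN library might look like (every name here is hypothetical, and a dict stands in for a real token service), product teams import one vetted decorator instead of each hand-rolling their own checks:

```python
import functools

# Stand-in for a real token-verification service; illustrative only.
VALID_TOKENS = {"token-abc": "alice"}

class AuthError(Exception):
    """Raised when a request cannot be authenticated."""

def authenticated(handler):
    """Paved-road decorator: handler only runs for verified callers."""
    @functools.wraps(handler)
    def wrapper(request):
        user = VALID_TOKENS.get(request.get("token"))
        if user is None:
            raise AuthError("invalid or missing token")
        return handler(request, user=user)
    return wrapper

# A product team's endpoint: no ad-hoc authN logic in sight.
@authenticated
def get_profile(request, user):
    return f"profile for {user}"

print(get_profile({"token": "token-abc"}))  # → profile for alice
```

Because the check lives in one shared library, fixing an authN bug or rotating the verification backend happens once, centrally – rather than hunting down every ad-hoc implementation.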
4. Abandon the perimeter model. Abandon the perimeter, moat-and-castle model that requires all software to be “secure” always and forever to uphold the security properties we want – because guess what, that assumption will inevitably fail. Many organizations already have dev, test, or staging environments that allow us to evaluate our assumptions about how our software works. These environments also serve as a pattern for creating isolated environments that can contain failure impact (see #7).
If your organization doesn’t already have a test or staging environment, that’s a great starter project for your cybersecurity team to make an impact; while you’re at it, advocate for integration testing, too.
5. Advise, don’t dictate. Provide an advisory service where software engineers can ask us for assistance or guidance on security-related matters. It’s better if designers and engineers can balance security tradeoffs against the other tradeoffs they face – such as reliability, maintainability, and time-to-market – instead of the security team dictating design requirements. This is a key way to align the cybersecurity program with the business.
6. Ask platform teams to integrate security. Work proactively with infrastructure and platform teams to integrate security use cases into their designs, so that the wider product engineering community can delegate security concerns to the libraries and templates they’re using.
7. Provide isolation patterns. Provide software engineering teams with mechanisms to isolate COTS software. The goal is to, at a minimum, make it compliant and reduce incident impact by design. That way, our engineering teams don’t have to beg vendors to implement our organization’s special snowflake security requirements (what the rest of the world would consider “bizarre”) or fork open-source software (OSS) to implement those requirements ourselves.
8. Conduct user research.[6] Make the more secure / safer way the more delightful way, too: faster, easier, simpler. By conducting user research – taking the time to understand our users’ goals, constraints, workflows, and emotional journeys in a given activity – we can tailor the solutions we build and implement to help them achieve what they want in a safe way.
As one relatively common example, rolling out SSO can result in a delightful experience for engineering teams because all their apps are now in one place; the user just needs to SSO into their apps for access rather than maintaining N separate accounts (reducing the number of steps).[7] It means we’ve improved security while boosting speed and ease of use. Our business will be dead chuffed.
Towards better cybering
I mentioned that I’ve heard people sincerely describe the CISO role as one of the hardest jobs in corporate America. They should be delighted to hear that the CISOs I know who follow at least some of the above seem to enjoy their jobs much more than those who are still attempting to use carriage whips to steer their devs.
In the model described above – and indeed in the Platform Resilience Engineering discipline I describe in my book – we get to build things and solve real problems for real humans in an empathetic way. We can behold tangible improvements, not vague successes like “improved risk coverage by X%.”[8]
We don’t have to feel so alone in the cybersecurity struggle. This approach gives us opportunities to learn from our platform engineering and SRE colleagues – to develop a collaborative vibe rather than a combative one. These teams want to help us solve security problems; they don’t want to create or implement roadblocks, but we shouldn’t either. We should want sustained resilience – because that means we can sustain our organization’s success.
tl;dr Paved roads, not roadblocks.
Thanks to Lita Cho and Gregory Poirier for feedback.
[1] The exception is when the cybersecurity program also covers content moderation, including screening for sexual and child abuse; that is truly one of the most difficult jobs in corporate America.
[2] Instead, we want the SSB of “Science & Sensemaking, Bitch [affectionate].”
[3] What especially troubles me is when I give this example, many security leaders do not know what queues or message brokers are…
[4] I know the Talos worshippers will be offended but like, really, he deserves a place alongside Akatosh, a time lord dragon?? To be clear, I’m vehemently not on Team Thalmor and whatever they want to do with the Towers, but I’m also not really down with monarchy and how effective was Tiber’s unification of Tamriel in the long run, anyway? He’s basically a war lord that fled a climate crisis only to completely reshape an entire ecosystem because his people hated jungles and fuck the people already living there, right? Imperial and Nord stans, come at me.
[5] There are other security leaders who are excited by this prospect but still filled with disbelief. And a select few who are like “hells yeah, show me the way.” That tends to make the project (or conversation) more fun.
[6] There are eight opportunities described, which mirrors the Eight Divines quip from earlier because I’m a ho for foreshadowing.
[7] I refer to this goal of reducing the number of steps our users must perform as “What Would Gilbreth Do?”, which I introduced in my book.
[8] There are still some wayward engineering leaders who believe code coverage is a good metric, but way way more cybersecurity leaders who only recently discovered it as a metric and love it. This is why I will soon flee to the woods to live in a cottage and forsake my role as cybersecurity Cassandra. This also makes the eighth footnote, which, per footnote 6, is yet again a reprisal of the motif.