IT admins are always off, hurrying somewhere or the other, putting out fires. It seems that their job has moved from empowering organizations to ensuring systems stay online for efficient work.

All the while, they manage every compliance requirement, only for it all to crash and burn when an error in one system takes down the applications end users are working in. Then everyone panics, and the DevOps team, the CI/CD team, and everyone in between start blaming IT for not doing its job.

That should be part of the job description – “Blame the IT guys.”

It’s a meme at this point.

However, these tech teams don’t understand that IT complexity arises because of the layers and layers of applications, services, and systems running in sync with each other.

And the solution is as gargantuan as the problem itself. There are many fixes that IT teams employ, but can there be a definitive one, and do leaders, beyond the CIO, get the severity of it all?

Let’s venture to search for a clear answer.

Why is the IT architecture such a mess?

No, it isn’t because the IT teams love being surrounded by wires and servers, and that’s why they build equally messy systems.

The reason behind it is that there is no clear answer to writing better code, nor can it be handled by a single system. Yes, there have been attempts at improving architecture with methods like modular monolith structures and clean architectures, but these, too, increase complexity instead of decreasing it.

The core reason is simple: too many applications and API calls muddle the process. There’s your HR software, then the CRM, then the ERP, the SaaS products, the finance products, and so on, and then there are the native applications.

Then there are internal products and the external tools that need access to them, and the list goes on. Your IT team is not taking care of just one core business function.

They’re taking care of all of it.

And there’s another layer that goes unnoticed: the humans involved in the process.

Why does IT complexity arise?

There are many layers stacked on top of each other. Systems sending telemetry reports, the APIs calling AI systems and other software, and then there are the failures.

If a single instance crashes, maybe it’s the Kubernetes (K8s) engineer’s job, but what happens when the entire application node crashes?

That’s the problem of the software engineers, the CI/CD team, and the IT team together. Imagine so many people in one room, waiting patiently for the problem to be solved. There’s going to be tension there and the possibility of: what if we can’t solve this?

There’s the CEO breathing down your neck and the users waiting for applications to go back online. If you’re a SaaS company, these downtimes are a blow to your reputation and lead to losses.

These are the stakes, very human stakes, that give rise to complexity. And it’s an organizational problem, not just the IT department’s.

Can IT complexity be solved?

Okay, this question has no set answer, like at all. Many organizations have tried and failed. Simplification is not possible, and it doesn’t need to be.

Everyone has tried simplifying, and that created limited systems that can’t be scaled. And for start-ups and SaaS companies, well, that’s not going to fare well at all.

Growth is what makes a start-up. And SaaS must be scalable and resilient.

However, the conclusion leaders force on the IT team is to reduce complexity. There are many practices and tools aimed at it, whether that’s scrum meetings, tools like Atlas, or initiatives that reduce human error. Maybe there’s less chaos on the field when teams collaborate to solve the problem.

Yes, that is the natural choice. However, there is something deeper that operates within this system. And it cannot be solved for.

It is an immutable law of all systems: entropy.

IT complexity can be managed, not solved.

This piece has actually turned into a venture. While researching, we found many disparate solutions. Some pieces suggest using a tool or using a single vendor, but won’t these create newer complexities?

Most leaders take a brute-force approach to solving complexity: let the IT guy do it. And while your CIO does cut costs for you and trims everything that can possibly be trimmed, they will eventually hit limits they cannot scale past.

Yet, leaders want growth. Growth at the price of what exactly?

The answer is clear: operational efficiency.

Companies sacrifice operational efficiency to save money and protect customer uptime, and end up in a loop where that very trade-off starts hurting the business. Let’s talk about Larry Tesler’s law of conservation of complexity, better known as Tesler’s law (go figure!).

It posits that every application has an inherent amount of complexity that cannot be removed or hidden. Instead, it must be dealt with either in product development or in user interaction.

The simplest example of this is the GUI. Organizations manage the complexity while users click buttons and pull levers. The same is happening to your servers. You can use managed services, which is actually the easier option, but that only moves the complexity onto someone else’s plate, and then the provider has to manage it.

But the point is to make leaders aware of the variables involved and what they need to do.

So we’ve got down to the awareness of it all. Eliminating complexity won’t actually make your systems smarter; it makes them rigid, and rigid systems cannot be scaled.

What role does entropy play in IT complexity?

Every system experiences entropy. It drifts toward a state of equilibrium, i.e., from structure to non-structure, and this non-structure is essential. Balanced entropy in computing actually makes the systems efficient.

But, as applications are stacked, the entropy reaches a tipping point where the information devolves into disorder or the rate of disorder begins to increase. For example, here’s a simulation we ran.

Imagine you have 73 servers that are each 98% resilient, plus a 74th server with 95% resilience. Assuming the servers fail independently, the probability that at least one of them fails is about 78%, and that is just the baseline. Now imagine what happens when applications and nodes fail in succession.
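The arithmetic behind that figure can be checked in a few lines. This is only a sketch: it assumes that "resilience" means the probability a server stays up over the period in question and that servers fail independently of each other. Neither assumption is spelled out in the example, and the function name is purely illustrative.

```python
from math import prod

def prob_at_least_one_failure(resiliences):
    """Chance that at least one server fails, assuming independent failures.

    Each entry is the probability that one server stays up
    (an assumed reading of "resilience").
    """
    # Every server must survive for the fleet to survive,
    # so the survival probabilities multiply.
    p_all_up = prod(resiliences)
    return 1.0 - p_all_up

# 73 servers at 98% plus one server at 95%, as in the example.
fleet = [0.98] * 73 + [0.95]
print(f"{prob_at_least_one_failure(fleet):.0%}")
```

The point the numbers make is that individually reliable servers still add up to an unreliable fleet: each extra node multiplies in another chance of failure.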

It’s chaos.

And that’s why there are days when your systems crash, servers can’t be accessed, or end users experience downtime. And there’s at least one day when this happens.

Remember CrowdStrike? The cascading effect is entropy at work. Since all systems are interdependent, one failure could lead to the next. And it does this to achieve equilibrium.

In short, your systems fail because it’s the path of least resistance.

So what can help you here? As leaders, you need a practice that has been around for well over a decade.

It’s called Chaos Engineering.

And your CTOs, CIOs, and even junior engineers probably, and hopefully, know about it.

Chaos Engineering: Pioneered by Netflix

What is the most server-intensive task in the world? AI data centers are obviously number one. And number 2 has to be Netflix (opinion alert).

They completely changed how businesses move to the cloud. Every organization wants to create a resilient system, but how exactly?

This is the answer. It has the makings of a great strategy.

  1. It’s context-based.
  2. It asks questions that are relevant to your problem.
  3. It simulates and gives probabilities of failures and weak points.
  4. It has a cool name.

So what does it do? Essentially, engineers at Netflix understood that their servers serve a lot of people, and that failure is inevitable: it’s only a matter of time before something crashes.

What happens when 1.5 million or 10 million people log in to watch Stranger Things? Of course, Netflix being Netflix, that won’t crash them because they will be prepared. How? Using their Simian Army.

No, it’s not like the monkeys Lex Luthor uses in the pocket universe in Superman. It’s a method of anticipating failure points and preparing for them. They developed this by imagining a monkey with a wrench wreaking havoc on their systems.

By shutting down instances and entire servers to see what would break and what would remain functional, they found vulnerabilities that they could never have anticipated. They knew that IT complexity wouldn’t have a clean solution.
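To make the idea concrete, here is a toy version of such an experiment. It is not Netflix’s tooling: the service names, the dependency map, and both functions are hypothetical, and the "shutdown" is only simulated. The sketch kills one service and reports what breaks and what keeps working.

```python
import random

# Hypothetical service map: each service lists the services it depends on.
DEPENDENCIES = {
    "web": ["auth", "catalog"],
    "auth": ["database"],
    "catalog": ["database", "cache"],
    "database": [],
    "cache": [],
}

def is_functional(service, down, deps=DEPENDENCIES):
    """A service works only if it and everything it depends on is up."""
    if service in down:
        return False
    return all(is_functional(dep, down, deps) for dep in deps[service])

def run_experiment(victim, deps=DEPENDENCIES):
    """Simulate shutting down one service; return (broken, still functional)."""
    down = {victim}
    broken = sorted(s for s in deps if not is_functional(s, down, deps))
    healthy = sorted(s for s in deps if s not in broken)
    return broken, healthy

if __name__ == "__main__":
    rng = random.Random(2011)  # fixed seed so a run can be repeated
    victim = rng.choice(sorted(DEPENDENCIES))
    broken, healthy = run_experiment(victim)
    print(f"killed {victim}: broken={broken}, still functional={healthy}")
```

Even in this tiny map, taking down the cache breaks the catalog and the web front end while authentication survives, which is the kind of hidden dependency these experiments are meant to surface.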

Netflix knew they had to break things (virtually, of course) to see what was affected and what was not. That was 14 years ago in 2011.

Now, Kubernetes, Grafana, and other tools make it easier to handle failures, but the complexity hasn’t gone anywhere. Instead, chaos engineering might become a focal point of future software development.

As AI builds code or users delve into AI-assisted coding, what would matter the most? Identifying failure points as complexity arises.

In short, what will matter is a person who can anticipate failure, create systems for it, and manage complexity. That, of course, will require clear documentation: yes, documentation that cannot be replicated by AI, written by someone who sees clear patterns and observes systems as they become more complex.

Maybe the future of development is not less complexity but more complexity stuffed into efficient packets. That’s worth exploring.

About The Author

Ciente

Tech Publisher

Ciente is a B2B expert specializing in content marketing, demand generation, ABM, branding, and podcasting. With a results-driven approach, Ciente helps businesses build strong digital presences, engage target audiences, and drive growth. Its tailored strategies and innovative solutions ensure measurable success across every stage of the customer journey.
