Capacity, Redundancy, and Risk Reduction in the Cloud: A Practitioner's Perspective
I’ll start with a rather blunt one-liner: on-premises infrastructure gets a bad reputation in cloud-first conversations. It's framed as legacy, the thing you're migrating away from, weighed down by a lifetime of technical debt. That framing is wrong, and it leads to poor architectural decisions.
On-premises compute has genuine, enduring strengths: data sovereignty, predictable capital costs, ultra-low latency for workloads that demand it, and full control over the hardware stack. For certain regulated industries, workload profiles, and cost structures, running your own infrastructure isn't a throwback to the past; it's the right answer today.
The honest challenge with on-premises isn't that it's inferior; it's that it's capacity-constrained by design, because you can only run what you've bought. And right now, buying is harder and more expensive than it has been in years.
What began as a broad disruption across automotive, consumer electronics, and enterprise hardware has matured into something more structural and more concentrated: a severe shortage of the advanced AI-grade silicon and high-bandwidth memory (HBM) that modern infrastructure depends on. Samsung's memory chief warned as recently as April 2026 that significant shortages across memory products are expected to continue through at least 2027. Dell's assessment has been blunter still: no meaningful relief is expected until 2028.
The numbers behind that assessment deserve full attention. DRAM prices climbed over 300% in 2025 as data centre demand consumed around half of global memory production, up from 32% just five years earlier. DDR5 RDIMM costs in particular are projected to surge a further 100% across 2026. TSMC's 2nm fabrication capacity, which produces the most advanced AI chips on the market, is fully booked through 2028. Nvidia's latest GPU generations carry wait-lists stretching well beyond a year, with hyperscalers such as Microsoft, Google, and Meta dominating the allocation queue ahead of enterprise customers.
If you're planning to expand your on-premises AI or compute capacity right now, you're competing in that queue, at those prices, with those lead times. That's not a reason to abandon on-premises. It is, however, a compelling reason to be strategic about what you run there, and to have cloud available as a genuine, tested, elastic extension of your estate.
This is no secret: Azure regions are subject to the same supply chain physics as everyone else. Behind the resources and subscriptions, an Azure region is like any other datacentre: racks, hosts, and power limits in physical buildings. And Microsoft is buying chips from the same constrained global supply as you are.
The difference, however, is that the hyperscalers do have a structural advantage: purchasing power. Microsoft, Google, and Meta are committing close to $700 billion combined in capital expenditure in 2026, the majority for AI infrastructure. They can secure allocation that smaller buyers cannot. But even that firepower has limits. TSMC's 3nm node, which powers today's most advanced AI chips including Nvidia's latest generations, has been running above 100% utilisation, with maintenance being deferred to sustain output.
The consequence for enterprise customers is real and immediate. Certain niche Azure VM SKUs — particularly GPU-accelerated instances — carry wait-lists or are quota-restricted in specific regions. Industry analysts project that 30 to 50% of planned 2026 data centre capacity will slip to 2028, driven by a combination of chip constraints, power grid connection delays, and raw material shortages including specialised gases essential to semiconductor fabrication. If your cloud strategy assumes unlimited, instant access to high-performance compute in your preferred region, the current reality is more complicated than that.
The point isn't to undermine confidence in Azure — it remains the most capable and mature enterprise cloud platform available, and Microsoft's investment commitments are genuine. The point is that cloud capacity should be understood, planned for, and architected accordingly — not assumed. Which brings us to what a well-designed hybrid estate actually looks like.
The right architecture isn't cloud-first or on-premises-first. It's workload-first. That means making a deliberate decision for each class of workload: what belongs on infrastructure you own, and what belongs on infrastructure you rent elastically.
On-premises remains the right home for workloads with consistent, predictable demand profiles; data that carries sovereignty or compliance constraints; latency-sensitive processing that cannot tolerate a network hop; and compute that you've already invested in and is running efficiently.
Cloud, specifically Azure, earns its place as the elastic layer: the capacity you reach for when demand spikes, when you need to stand up new workloads without a significant capital commitment, when you require geographic reach your datacentre can't provide, or when you need AI-grade GPU compute that simply cannot be procured for on-premises right now. Given that the hardware shortage is expected to persist through 2027 and possibly 2028, cloud access to GPU capacity may be, for many organisations, the only viable path to running AI workloads at meaningful scale in the near term.
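One way to make the workload-first decision repeatable is to write the criteria down as a small rule set. The sketch below is illustrative only: the `Workload` fields and the `place` function are hypothetical names, and the rules simply restate the criteria from the two paragraphs above. It is not a definitive placement policy, and a real assessment would weigh cost and risk in far more detail.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload class; all fields are illustrative."""
    name: str
    steady_demand: bool         # consistent, predictable demand profile
    sovereignty_bound: bool     # data carries sovereignty or compliance constraints
    latency_critical: bool      # cannot tolerate a network hop
    needs_gpu: bool             # needs AI-grade GPU compute
    onprem_capacity_free: bool  # already-purchased hardware has room for it


def place(w: Workload) -> str:
    """Return 'on-premises' or 'cloud' using the criteria described in the text."""
    # Hard constraints first: sovereignty and latency pin a workload on-premises.
    if w.sovereignty_bound or w.latency_critical:
        return "on-premises"
    # GPU demand that owned hardware cannot absorb goes to cloud.
    if w.needs_gpu and not w.onprem_capacity_free:
        return "cloud"
    # Steady demand on existing, available hardware stays where it is.
    if w.steady_demand and w.onprem_capacity_free:
        return "on-premises"
    # Everything else (spiky, new, or geographically distributed) is elastic.
    return "cloud"


if __name__ == "__main__":
    # Example workload classes, invented purely for illustration.
    examples = [
        Workload("core-ledger", True, True, False, False, True),
        Workload("model-training", False, False, False, True, False),
        Workload("seasonal-campaign", False, False, False, False, True),
    ]
    for w in examples:
        print(f"{w.name}: {place(w)}")
```

The ordering of the rules is the design choice that matters here: constraints that cannot be negotiated (sovereignty, latency) are evaluated before anything driven by cost or convenience.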
The critical discipline, however, is that the bursting path must be designed, architected, and tested before you need it. Organisations that treat cloud as an emergency overflow can run into trouble: they hit quota limits at exactly the wrong moment and operate without the governance controls that make cloud economically rational. Build the integration intentionally. Know your quotas, and validate them periodically.
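A periodic quota validation can be as simple as comparing what a burst would need against what is currently free. The sketch below assumes you have already exported quota usage into plain dictionaries with `name`, `current`, and `limit` keys; that record shape, the quota names, the sample figures, and the 80% warning threshold are all assumptions for illustration rather than anything Azure-specific.

```python
def quota_headroom(usages, required, warn_ratio=0.8):
    """Compare required burst capacity against quota records.

    usages   -- list of dicts with 'name', 'current', 'limit' (assumed shape)
    required -- dict mapping quota name to the extra units a burst would need
    Returns a list of (name, status, detail) tuples for anything that needs
    attention; an empty list means the burst fits with room to spare.
    """
    by_name = {u["name"]: u for u in usages}
    findings = []
    for name, needed in required.items():
        record = by_name.get(name)
        if record is None:
            findings.append((name, "missing", "no quota record found"))
            continue
        free = record["limit"] - record["current"]
        if needed > free:
            findings.append((name, "blocked", f"need {needed}, only {free} free"))
        elif record["limit"] > 0 and (record["current"] + needed) / record["limit"] > warn_ratio:
            findings.append((name, "tight", f"burst would exceed {warn_ratio:.0%} of limit"))
    return findings


if __name__ == "__main__":
    # Hypothetical quota names and figures.
    usages = [
        {"name": "gpu_family_vcpus", "current": 40, "limit": 48},
        {"name": "general_purpose_vcpus", "current": 100, "limit": 500},
    ]
    required = {"gpu_family_vcpus": 24, "general_purpose_vcpus": 64}
    for name, status, detail in quota_headroom(usages, required):
        print(f"{name}: {status} ({detail})")
```

Running a check like this on a schedule, rather than at the moment of need, is what turns "know your quotas" into something you can act on while there is still time to request an increase.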
Not all workloads carry the same risk profile, and not all Azure services provide the same resilience guarantees. The mistake I see repeatedly is organisations applying a single redundancy model across their entire cloud estate — either over-engineering commodity workloads or, more dangerously, under-engineering critical ones.
Consider the three layers of Azure redundancy and how your services should align to them:
Microsoft's partial answer to regional constraints is more regional choice. Azure's European footprint has expanded dramatically, but most customers are still architecting around the regions they were using pre-pandemic.
The old mental model was simple but limited: North Europe (Ireland) or West Europe (Netherlands), with UK South and UK West for customers who needed in-country residency. Four regions, two obvious pairs. That model is now significantly out of date.
Azure currently operates or has recently opened regions across France (Paris), Germany (Frankfurt), Sweden, Norway, Poland, Italy (Milan), Denmark (Copenhagen), Austria (Vienna), Belgium (Brussels), Spain, Finland, and Switzerland in addition to the original European and UK locations. That's over twenty European regions, many with full Availability Zone support.
This expansion matters for redundancy in ways that go beyond simple disaster recovery. More regions mean more architectural options for decentralising workloads, reducing concentration risk, improving end-user latency, and satisfying data sovereignty requirements across different jurisdictions. Customers who are still pinning everything to West Europe and UK South are leaving resilience and performance on the table.
Sweden Central is worth calling out specifically. It has matured quietly from a regional option into a genuinely capable tier-1 Azure region. It supports Availability Zones, offers a broad PaaS service catalogue including AKS, Azure SQL, Cosmos DB, Event Hubs, and Azure OpenAI, and has a strong compliance posture covering ISO 27001, SOC 1/2/3, and the EU attestations that procurement teams increasingly require. For UK-based customers, Sweden Central sits at roughly 20 to 30 milliseconds round-trip latency, well within typical latency tolerances, which makes it a realistic secondary or even co-primary location, not just a theoretical DR target.
| Region | Estimated latency back to UK |
| --- | --- |
| UK West | 5 ms |
| West Europe | 8 ms |
| North Europe | 10 ms |
| France Central | 15 ms |
| Germany West Central | 20 ms |
| Switzerland North | 25 ms |
| Sweden Central | 30 ms |
| Norway East | 38 ms |
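The estimates in the table above can be turned into a quick shortlist of candidate secondary regions for a given latency budget. In the sketch below, the latency figures are the table's estimates; the `shortlist` function, the example budgets, and the idea of excluding regions you already use are hypothetical. Measured latency from your own network should replace these estimates before any real decision.

```python
# Estimated round-trip latency back to the UK, in milliseconds (from the table).
ESTIMATED_LATENCY_MS = {
    "UK West": 5,
    "West Europe": 8,
    "North Europe": 10,
    "France Central": 15,
    "Germany West Central": 20,
    "Switzerland North": 25,
    "Sweden Central": 30,
    "Norway East": 38,
}


def shortlist(budget_ms, exclude=()):
    """Return regions within the latency budget, nearest first.

    budget_ms -- maximum acceptable estimated round-trip latency
    exclude   -- regions to leave out, e.g. ones already heavily used
    """
    candidates = [
        region
        for region, latency in ESTIMATED_LATENCY_MS.items()
        if latency <= budget_ms and region not in exclude
    ]
    return sorted(candidates, key=ESTIMATED_LATENCY_MS.get)


if __name__ == "__main__":
    # Candidates within 30 ms, skipping a region the estate already leans on.
    for region in shortlist(30, exclude={"West Europe"}):
        print(f"{region}: ~{ESTIMATED_LATENCY_MS[region]} ms")
```

Latency is only one filter. Service availability, Availability Zone support, and data residency requirements would each need their own check before a region makes the final list.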
The broader European region expansion creates a genuine opportunity to rethink how workloads are distributed. Rather than a primary/DR pair selected years ago and never revisited, consider:
With GPU and memory shortages expected to persist through 2027 and beyond, it's more important than ever to plan both sides of a hybrid estate deliberately.
A well-designed hybrid estate comes down to knowing which workloads belong where, and having the elastic connection to cloud capacity that lets you grow without waiting for hardware.
The hardware shortage is real, it's current, and the analysts aren't expecting meaningful relief before 2028. That's not a reason to panic. It's a reason to plan and to make sure that your on-premises estate and your Azure footprint are designed to complement each other.
To learn more about this topic and your architectural options, join me on the 2nd June for our live webinar conversation, “Reducing Risk in the Cloud: Designing for Resilience, Availability and Change”, covering all of the above in depth and how Trustmarque Ultima are helping our customers every day across their hybrid cloud estates.
About the author
As the hybrid cloud & data professional services director, Chris is responsible for Trustmarque Ultima delivery teams and talking to customers on their strategy across cloud, data & AI for Azure, Microsoft Fabric, and AI Foundry, with roots in on-premises datacentre design in a career spanning 20 years.