The possibilities and limitations of Private AI

Interest in AI is growing rapidly. Organizations are experimenting, developers are building, and few people still question whether AI can deliver value. The discussion has shifted to how, where, and under what conditions AI should be deployed. And that is where things become interesting.

AI is a bit like two-part epoxy: it is a combination of model and data, convenience and risk, scale and context. As a result, the choices organizations make today have direct implications for their infrastructure, costs, and data sovereignty.

Written by Larik-Jan Verschuren
Posted on 16-06-2026

Three powers and the trade-offs they create

To understand what private AI really means, it helps to look at how the global AI landscape is structured. Broadly speaking, three major forces are shaping the market.

The United States is leading the way. American models such as OpenAI's GPT models, Anthropic's Claude and Google's Gemini are the most advanced and capable, but also the most expensive. They are delivered through APIs and are widely accessible. However, every query means that data leaves the organization and can come within the reach of US legislation such as the CLOUD Act. Organizations that do not view this as a concern can immediately benefit from the most advanced AI currently available. Those that do see it as a concern face a fundamental choice.

China has developed open-source models that can be hosted independently. Models such as Qwen are highly efficient and readily available, can be deployed in a private data center, and can be queried without data ever leaving the organization’s environment. This offers greater control and sovereignty, but requires significant hardware investment. On average, these models also trail the leading US models by roughly eight months.

Europe has Mistral, a French AI developer and provider whose models are architecturally strong and fully aligned with European legislation. Available through APIs, Mistral offers one of the most secure options for organizations concerned about compliance and data residency. However, Mistral's models also trail those from both the US and China, and unless Europe substantially increases its investment in AI development, that gap is unlikely to close anytime soon.

The key question is not which model is the smartest. The real question is what an organization wants to achieve with AI, what data is required to do so, and what level of risk it is willing to accept.

The addictive nature of convenience

There is something undeniably tempting about working with large language models through an API. A few lines of code and intelligence becomes instantly available. Dashboards are populated, alerts are analyzed, reports are generated. The barrier to entry is low. The results are impressive, and that is precisely what makes it so dangerously easy.

The convenience of an API creates a reflex: if something can be solved with an AI query, that becomes the default approach. Tasks that were once handled by a script, written once and then run at virtually no cost, are now replaced by continuous API calls with associated token costs. Those costs may seem negligible per query, but at scale they quickly become substantial and difficult to manage.
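To make the scaling effect concrete, here is a minimal Python sketch of the arithmetic. The blended price of €10 per million tokens, the 2,000 tokens per query and the 50,000 queries per day are hypothetical figures chosen for illustration, not quotes from any provider; real prices differ per model and usually distinguish between input and output tokens.

```python
def monthly_api_cost(queries_per_day: int, tokens_per_query: int,
                     price_per_million_tokens: float, days: int = 30) -> float:
    """Estimate API spend in euros for a recurring workload over `days` days."""
    total_tokens = queries_per_day * tokens_per_query * days
    return total_tokens / 1_000_000 * price_per_million_tokens


# Hypothetical workload: an alert pipeline that sends every alert to an external API.
# Assumed blended price: €10 per million tokens.
per_query = monthly_api_cost(1, 2_000, 10.0, days=1)
per_month = monthly_api_cost(50_000, 2_000, 10.0)
print(f"one query: €{per_query:.2f}, one month: €{per_month:,.2f}")
# one query: €0.02, one month: €30,000.00
```

Two cents per query looks harmless; under these assumptions the same workload costs €30,000 per month, every month.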

What organizations too rarely ask themselves is this: when is genuine intelligence required, and when would a traditional script or a smaller local model be sufficient? Not every alert, batch process or automated task justifies a query to a frontier model. Yet the threshold for doing so is so low that the decision is seldom made consciously.
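One way to make that decision deliberate rather than reflexive is an explicit routing rule in front of any AI integration. The sketch below is hypothetical: the `Route` names, the `choose_route` function and its four yes/no criteria are illustrative assumptions, not a standard or a product API.

```python
from enum import Enum


class Route(Enum):
    SCRIPT = "traditional script"
    LOCAL_MODEL = "smaller local model"
    FRONTIER_API = "frontier model via API"


def choose_route(rule_based: bool, needs_language_understanding: bool,
                 needs_frontier_reasoning: bool, data_is_sensitive: bool) -> Route:
    """Return the cheapest route that is still good enough (hypothetical policy)."""
    # Fixed logic such as thresholds, lookups or pattern matches needs no model at all.
    if rule_based or not needs_language_understanding:
        return Route.SCRIPT
    # Only hard reasoning on data that may leave the organization goes to a frontier API.
    if needs_frontier_reasoning and not data_is_sensitive:
        return Route.FRONTIER_API
    # Everything else stays on local infrastructure, accepting some quality loss.
    return Route.LOCAL_MODEL


# A disk-full alert with a fixed threshold: a script suffices.
print(choose_route(True, False, False, False).value)
# Summarizing internal contracts: demanding, but sensitive, so it stays local.
print(choose_route(False, True, True, True).value)
```

The order of the checks encodes the cost argument of this section: options that are free per run come first, per-token options come last.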

There is another concern. Increasingly, the code that AI generates goes unreviewed. It works, so why look any further? This lack of oversight and awareness is where the real vulnerability lies, not only in terms of costs, but also with regard to data security, integrity and governance.

Data is the real asset

As mentioned earlier, the value of AI consists of two components: the model and the data. Organizations that focus solely on the model are missing half of the equation.

A powerful model without organization-specific data will only produce generic output. A powerful model that is connected to internal systems, customer information, operational data or business-critical knowledge can generate insights that truly differentiate an organization. That data is the real crown jewel. And those crown jewels are exactly what can be exposed when external AI services are used without sufficient awareness or control.

This is not a theoretical risk. Within municipalities, government agencies and educational institutions, employees are increasingly installing AI clients on their own initiative and feeding them all kinds of information: contracts, policy documents, customer data and internal analyses. Not out of malicious intent, but out of convenience and a lack of understanding about what happens to that data once it leaves the organization.

Private AI is therefore not a technical luxury. It is an organizational response to a very real risk.

Private AI: attractive but not cheap

The promise of private AI is clear: the experience of a powerful language model, fully under your own control, running on your own infrastructure, without data leaving the organization. It sounds like the ideal solution.

The reality is that this requires a significant investment. Large open-source models with more than 120B parameters demand substantial GPU memory to run. A consumer-grade GPU typically has 12–32GB of memory, while data center GPUs such as NVIDIA's H200 offer around 141GB.
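A common rule of thumb for the weights alone is parameters × bytes per parameter: 16-bit weights take 2 bytes each, 8-bit quantization 1 byte and 4-bit quantization half a byte. The sketch below applies that rule to a 120B model, using the 32GB and 141GB capacities mentioned above. The 20% overhead margin for KV cache and runtime buffers is an assumption; actual requirements depend on the model, the inference engine and the context length.

```python
import math

# Approximate storage per parameter for common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Memory for the weights alone in GB (1e9 params x bytes per param = GB)."""
    return params_billion * BYTES_PER_PARAM[precision]


def gpus_needed(params_billion: float, precision: str,
                gpu_memory_gb: float, overhead: float = 1.2) -> int:
    """GPUs needed, with an assumed 20% margin for KV cache and runtime buffers."""
    required_gb = weight_memory_gb(params_billion, precision) * overhead
    return math.ceil(required_gb / gpu_memory_gb)


for precision in BYTES_PER_PARAM:
    print(f"120B @ {precision}: {weight_memory_gb(120, precision):.0f} GB weights, "
          f"{gpus_needed(120, precision, 32)} x 32GB GPUs "
          f"or {gpus_needed(120, precision, 141)} x 141GB GPUs")
```

At 16-bit precision the weights alone need 240 GB, more than any single GPU mentioned here, which is why quantization and multi-GPU setups quickly enter the picture.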

Model quality is partly determined by available GPU memory: the more memory available, the larger the model that can be loaded, the wider the context window, and the better the caching performance, which improves response speed.
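The link between memory and context window can be made concrete with the usual KV-cache estimate: for every token in the context, each layer stores one key and one value vector per KV head. The model dimensions below (80 layers, 8 KV heads, head dimension 128, 16-bit values) are hypothetical values typical of a large model with grouped-query attention, and the formula ignores engine-specific details such as paged or quantized caches.

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int, context_tokens: int,
                concurrent_requests: int = 1, bytes_per_value: int = 2) -> float:
    """Estimated KV-cache size in GB for the given context length and concurrency."""
    # Factor 2: one key and one value vector per layer, KV head and token.
    bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return bytes_per_token * context_tokens * concurrent_requests / 1e9


# Hypothetical model: 80 layers, 8 KV heads, head dimension 128, 16-bit cache.
for context in (8_192, 32_768, 131_072):
    print(f"{context:>7} tokens: {kv_cache_gb(80, 8, 128, context):5.1f} GB per request")
```

Under these assumptions the cache takes roughly 2.7 GB at 8K tokens but about 43 GB at 128K tokens, per request and on top of the weights. Memory that is not needed for weights or cache can hold reusable cached prompts, which is where the speed benefit comes from.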

A full enterprise-grade private AI stack can easily cost hundreds of thousands of euros. The outcome is appealing, but that price tag makes the business case somewhat uncomfortable.

That said, there are well-considered, smaller-scale alternatives. A smaller local model, deployed for a specific and well-defined use case, is often a more sensible choice than building a full-scale AI factory. It is not about the largest infrastructure, but about the right infrastructure for the right problem.

VCF 9 enables private AI

VMware Cloud Foundation 9 includes a so-called Private AI Practice: an integrated way to make GPU capacity available, segment it, and connect it to applications and containers. The platform can indeed help set up a sovereign AI environment on private infrastructure. However, it is important to understand what VCF 9 does and does not solve.

VMware has traditionally been a virtualization platform: it divides large physical resources into smaller, logically separated units. This is useful for standard workloads, but large AI models behave differently. They require as much contiguous GPU memory as possible. In that context, splitting hardware is counterproductive: the more you virtualize, the less memory is available per model.
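The effect of partitioning can be shown with simple division. This is a deliberately simplified sketch: it assumes one 141GB GPU split into equal slices and two hypothetical models needing 10 GB and 60 GB, whereas real partitioning schemes such as NVIDIA MIG or vGPU work with fixed profiles rather than arbitrary equal splits.

```python
def slice_memory_gb(gpu_memory_gb: float, slices: int) -> float:
    """Memory available to each slice when a GPU is split into equal parts."""
    return gpu_memory_gb / slices


def fits(model_memory_gb: float, gpu_memory_gb: float, slices: int) -> bool:
    """True if a model (weights plus cache) fits inside a single slice."""
    return model_memory_gb <= slice_memory_gb(gpu_memory_gb, slices)


GPU_GB = 141  # one large data center GPU
for slices in (1, 2, 4, 7):
    print(f"{slices} slice(s) of {slice_memory_gb(GPU_GB, slices):.1f} GB: "
          f"10 GB model fits={fits(10, GPU_GB, slices)}, "
          f"60 GB model fits={fits(60, GPU_GB, slices)}")
```

The small model fits in every slice, which is the scenario where virtualized GPUs shine; the large model only fits when the GPU is split in two or not split at all.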

VCF 9 therefore has a clear use case: specific, well-defined scenarios where a smaller model needs to run within an existing private cloud environment. Think of an organization that wants to automate a targeted AI task and keep data private, but does not have the budget or scale for a full AI factory. In that scenario, VCF 9 with its Private AI Practice can deliver, at the push of a button, a platform that exposes GPUs, connects applications, and manages containers. That is valuable functionality.

However, for organizations that want to host large open-source models to build a broad Claude-like internal experience, VCF 9 is not the primary choice. In that case, bare-metal GPU infrastructure with maximum available memory is the logical path. That has little to do with VMware.

Technology is not ideology. The right choice depends on the use case.

Scale determines the use case

There is a clear split in how organizations approach private AI.

On one side are organizations aiming to build a full internal AI experience. They want to run a large, powerful model on their own hardware, fully sovereign, with full control over data and output. This requires serious infrastructure and a well-thought-out business model. Model scale determines output quality, and quality comes at a price.

On the other side are organizations with a specific, well-defined need. They want to automate one process, analyze one category of alerts, or query one data source. For that, a large model is unnecessary. A smaller, fine-tuned local model can provide sufficient intelligence without the infrastructure overhead of a full AI platform, and without the ongoing token costs of an external API.
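Whether a local model actually beats an external API on cost comes down to a break-even calculation: the one-off hardware investment divided by the monthly saving. The sketch below uses hypothetical figures (€40,000 for hardware, €3,000 per month in API spend, €500 per month for power and maintenance); real quotes should replace them before drawing any conclusion.

```python
import math


def break_even_months(hardware_cost: float, monthly_api_cost: float,
                      monthly_local_cost: float) -> float:
    """Months until owned hardware becomes cheaper than paying per token.

    Returns math.inf when running locally costs as much as, or more than, the API.
    """
    monthly_saving = monthly_api_cost - monthly_local_cost
    if monthly_saving <= 0:
        return math.inf
    return hardware_cost / monthly_saving


print(break_even_months(40_000, 3_000, 500))  # 16.0 months
print(break_even_months(40_000, 400, 500))    # inf: the API stays cheaper
```

With a steady, high-volume workload the hardware pays for itself; with light usage it never does, which is why a clearly defined use case matters before buying GPUs.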

Between these two extremes sits the mid-market and SME segment: organizations that are thinking about their data but do not want to invest hundreds of thousands of euros in their own hardware. For them, a hybrid approach is often the most logical path: hosting with a provider that already manages their data, offers a sovereign platform, and can expose GPU capacity without the need to build a full AI factory in-house.

The question behind the question

When organizations think about Private AI, they are effectively answering four questions at once:

What do we want to do with AI, and how intensive will its use be?
What data is involved, and how sensitive is it?
Which model fits the use case, and how much quality loss is acceptable?
What infrastructure is required, and what does it realistically cost?

Only once these questions are answered does it make sense to discuss platforms, tooling, and architecture.
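As an illustration, the four answers can be captured as a rough first filter that points to one of the approaches discussed in this article. The `Answers` fields, the `suggest_approach` function and the decision order are hypothetical simplifications, meant as a starting point for the conversation rather than a verdict.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    broad_assistant: bool        # Q1: broad internal assistant rather than one targeted task
    sensitive_data: bool         # Q2: data must not leave the organization
    accepts_quality_loss: bool   # Q3: a smaller model is good enough
    can_fund_own_gpus: bool      # Q4: budget for GPU infrastructure of its own


def suggest_approach(a: Answers) -> str:
    """Map the four answers to a starting point (hypothetical heuristic)."""
    if not a.sensitive_data:
        return "external API"
    if a.broad_assistant and not a.accepts_quality_loss:
        # A large model on dedicated hardware, or sovereign hosting if that is unaffordable.
        return "bare-metal GPUs" if a.can_fund_own_gpus else "sovereign hosting provider"
    # A targeted task, or a use case where a smaller model suffices.
    return ("small local model in the private cloud" if a.can_fund_own_gpus
            else "sovereign hosting provider")


print(suggest_approach(Answers(broad_assistant=False, sensitive_data=True,
                               accepts_quality_loss=True, can_fund_own_gpus=True)))
```

Note that cost intensity (the second half of the first question) is left out for brevity; in practice a high query volume would also push a non-sensitive workload away from per-token pricing.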

VCF 9, bare-metal GPUs, Chinese open-source models, European APIs: these are all tools. The point is to choose the right tool for the right job.

Fundaments supports that decision-making process, not out of a preference for any specific platform, but from years of experience running infrastructure for critical workloads and the conviction that architectural choices start with the question, not the technology.
