
Model-Agnostic AI Architecture: Why Lock-In Is A Red Flag

If your AI architecture has `import openai` at the application layer, you are one model deprecation away from a fire drill. Model-agnostic by architecture is non-negotiable. Here is how it works and why most agencies skip it.

The Risk

In May 2026, three major AI providers shipped breaking changes within two weeks of each other. OpenAI deprecated an API version. Anthropic restructured pricing. Google released a new generation that obsoleted prompts written six months prior.

The teams that survived the month had one thing in common: a thin routing layer between their application code and any specific provider. The teams that didn't were on the phone with their agency arguing about who pays for the emergency upgrade.

This is why model-agnostic by architecture is non-negotiable.

What It Means

A model-agnostic architecture has three properties:

1. Application code calls a generic AI interface, not a provider-specific SDK.

```ts
// Bad
import OpenAI from "openai";
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const response = await openai.chat.completions.create({ ... });

// Good
import { ai } from "@/lib/ai";
const response = await ai.complete({ ... });
```

2. The routing layer can swap providers per call type without changing application code.

The library (LiteLLM, OpenRouter, or a custom 100-300 line wrapper) accepts a generic call and routes it to whichever provider is currently best for the task. Routing rules are configurable.
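
As a rough illustration of property 2, a custom wrapper could be shaped like the sketch below. Everything in it is hypothetical: the `CallType` values, the `ProviderAdapter` interface, the provider and model names, and the stub adapters that stand in for real SDK calls. It is not the API of LiteLLM or OpenRouter.

```typescript
// Hypothetical call types an application might distinguish.
type CallType = "summarize" | "classify" | "chat";

interface CompletionRequest {
  type: CallType;
  prompt: string;
}

interface CompletionResponse {
  provider: string;
  model: string;
  text: string;
}

// Every provider is wrapped behind the same minimal adapter shape.
interface ProviderAdapter {
  complete(model: string, prompt: string): Promise<string>;
}

interface Route {
  provider: string;
  model: string;
}

// Routing rules are plain data, so they can live in config rather than code.
type RoutingRules = Record<CallType, Route>;

function createAi(adapters: Record<string, ProviderAdapter>, rules: RoutingRules) {
  return {
    async complete(req: CompletionRequest): Promise<CompletionResponse> {
      const route = rules[req.type];
      const adapter = adapters[route.provider];
      if (!adapter) {
        throw new Error(`No adapter registered for provider "${route.provider}"`);
      }
      const text = await adapter.complete(route.model, req.prompt);
      return { provider: route.provider, model: route.model, text };
    },
  };
}

// Stub adapters for illustration; real ones would call each provider's SDK.
const stub = (name: string): ProviderAdapter => ({
  complete: async (model, prompt) => `[${name}/${model}] ${prompt}`,
});

const adapters: Record<string, ProviderAdapter> = {
  "provider-a": stub("provider-a"),
  "provider-b": stub("provider-b"),
};

const rules: RoutingRules = {
  summarize: { provider: "provider-a", model: "small-model" },
  classify: { provider: "provider-b", model: "cheap-model" },
  chat: { provider: "provider-a", model: "large-model" },
};

const ai = createAi(adapters, rules);
```

Application code only ever sees `ai.complete({ type, prompt })`. Moving `classify` to another provider is a change to `rules`, not to any caller.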

3. Provider keys live as environment variables, not in code.

Adding a provider the routing library already supports takes minutes: add the env var, add a routing rule. No application code changes. At most, the process restarts to pick up the new configuration.
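
One way to keep keys out of code is to resolve them from the environment at startup and fail fast when a routing rule points at a provider with no key. The sketch below is a minimal version of that idea. `OPENAI_API_KEY` comes from the example earlier in this article; the other variable names, and the helper names `loadProviderKeys` and `assertRoutable`, are assumptions made for illustration.

```typescript
// Map of provider name -> environment variable holding its key.
// Only OPENAI_API_KEY appears in this article; the other names are assumed.
const KEY_VARS: Record<string, string> = {
  openai: "OPENAI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  google: "GEMINI_API_KEY",
};

type Env = Record<string, string | undefined>;

// Returns only the providers whose key is present and non-empty.
function loadProviderKeys(env: Env): Map<string, string> {
  const keys = new Map<string, string>();
  for (const [provider, varName] of Object.entries(KEY_VARS)) {
    const value = env[varName]?.trim();
    if (value) keys.set(provider, value);
  }
  return keys;
}

// Throws if any provider named in the routing rules has no key configured.
function assertRoutable(routedProviders: string[], keys: Map<string, string>): void {
  const missing = routedProviders.filter((p) => !keys.has(p));
  if (missing.length > 0) {
    throw new Error(`Missing API keys for: ${missing.join(", ")}`);
  }
}
```

In a Node process this would typically be called once at startup as `loadProviderKeys(process.env)`, followed by `assertRoutable` with the providers named in the routing config.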

What This Costs You

The first time you build this, it costs maybe a day. Maybe two if the team has never done it before.

After that, it costs nothing. The library does the work.

What It Saves You

Every time a provider deprecates an API version: no application changes. Update the routing library or swap the routing rule. Done.

Every time a provider raises prices: route the high-volume calls to a cheaper provider. Zero code change.

Every time a new model lands: test it against your evals. If it scores better at lower cost, swap. Zero code change.
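
The "better at lower cost" rule can be written down so the swap decision is mechanical. The sketch below assumes your eval suite yields a single score and a measured cost per model. The `EvalResult` shape, the function names, and all of the numbers are illustrative, not real benchmark results.

```typescript
interface EvalResult {
  model: string;
  score: number;          // eval suite pass rate, 0 to 1
  costPer1kCalls: number; // USD, measured on the same eval run
}

// A candidate qualifies only if it scores better AND costs less than the current model.
function beats(candidate: EvalResult, current: EvalResult): boolean {
  return (
    candidate.score > current.score &&
    candidate.costPer1kCalls < current.costPer1kCalls
  );
}

// Returns the best-scoring qualifying candidate, or the current model if none qualifies.
function pickModel(current: EvalResult, candidates: EvalResult[]): EvalResult {
  const qualifying = candidates.filter((c) => beats(c, current));
  if (qualifying.length === 0) return current;
  return qualifying.reduce((best, c) => (c.score > best.score ? c : best));
}

// Illustrative numbers only.
const current: EvalResult = { model: "model-a", score: 0.91, costPer1kCalls: 4.0 };
const candidates: EvalResult[] = [
  { model: "model-b", score: 0.93, costPer1kCalls: 2.5 }, // better and cheaper
  { model: "model-c", score: 0.95, costPer1kCalls: 6.0 }, // better but pricier
  { model: "model-d", score: 0.88, costPer1kCalls: 1.0 }, // cheaper but worse
];

const chosen = pickModel(current, candidates);
console.log(chosen.model); // model-b
```

Real decisions usually weigh more than two numbers (latency, context length, data terms), so treat this as a starting rule rather than a policy.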

Every time a customer asks "are you locked into one AI vendor?": show them the routing config.

What "Locked In" Looks Like

These are red flags in any AI agency proposal:

  • "We use OpenAI exclusively because it's the best."
  • "Our platform handles all the AI provider stuff for you." (Translation: their platform is the lock-in.)
  • "Switching providers would require a rewrite."
  • "We negotiated a special rate with [provider] so it's already the cheapest."

The first three are technical lock-in. The fourth is financial lock-in. Both are bad.

What "Free" Looks Like

These are green flags:

  • "We use LiteLLM as the routing layer. Here is the config."
  • "You pay model providers directly through your own accounts."
  • "We test new models against your eval suite quarterly."
  • "Our routing rules are in your repo. You can change them."

If your AI vendor cannot say these four things, they are selling you lock-in.

The Tooling

The libraries that handle this well in 2026:

  • LiteLLM. Most mature, supports 100+ providers, drop-in OpenAI-compatible interface.
  • OpenRouter. Hosted gateway that puts many providers behind one OpenAI-compatible API. A managed alternative to running LiteLLM yourself, and usable as the routing layer itself.
  • Vercel AI SDK. Provider-agnostic and ergonomic, but optimized for streaming UX.
  • In-house wrapper. For teams that want full control. Usually 100-300 lines.

We default to LiteLLM on most engagements. It is boring, stable, and exactly the right level of abstraction.

The Bigger Picture

Model-agnostic architecture is not just risk management. It is a competitive advantage.

A team that can swap providers in hours can:

  • Optimize cost per output continuously
  • Test new model releases the day they ship
  • Negotiate with providers from a position of strength
  • Recover from provider outages without downtime

A team locked in cannot do any of that.
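
Outage recovery is the easiest of these to picture in code. A minimal sketch, assuming each provider sits behind the same call signature: try providers in order and return the first success. The `NamedProvider` type and `completeWithFallback` function are hypothetical, and a production version would also need timeouts, retries, and monitoring.

```typescript
type Complete = (prompt: string) => Promise<string>;

interface NamedProvider {
  name: string;
  complete: Complete;
}

interface FallbackResult {
  provider: string;
  text: string;
}

// Tries each provider in order and returns the first successful result.
// If every provider fails, the error lists each failure in order.
async function completeWithFallback(
  providers: NamedProvider[],
  prompt: string,
): Promise<FallbackResult> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      const text = await p.complete(prompt);
      return { provider: p.name, text };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${p.name}: ${message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}

// Stubs for illustration: the primary simulates an outage, the secondary answers.
const primary: NamedProvider = {
  name: "primary",
  complete: async () => {
    throw new Error("simulated outage");
  },
};

const secondary: NamedProvider = {
  name: "secondary",
  complete: async (prompt) => `pong: ${prompt}`,
};
```

Calling `completeWithFallback([primary, secondary], "ping")` resolves with the secondary provider's answer while the primary is down. The order of the array is the failover policy, which again makes it configuration rather than application code.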

This is one of the seven engineering principles every Kastling engagement enforces. We do not ship architecture our clients will regret in six months.
