Production AI Stalls When Teams Treat Retrieval Design and Cost Discipline as Afterthoughts

Enterprise AI has crossed from demos into production, but the teams seeing measurable ROI are not the ones moving fastest. They are the ones that treated retrieval design, token economics, and security architecture as first-order engineering problems before writing a single prompt. The gap between a working prototype and a system that survives at enterprise scale still catches most organizations off guard, and the failure pattern is consistent: teams optimize for speed to demo rather than speed to sustained value.

Trent Mcfarlane, Senior AI Architect at Zero Click Dev, has spent the past several years taking AI from prototype to enterprise production. At Alaska Airlines, he founded the AI Operations Program and architected the AI Landing Zone that now powers over 100 application teams. At Microsoft, he led delivery of six AI product initiatives including a RAG-powered skilling engine and an early evaluation framework for LLMs. His work spans AI gateways, multi-agent architecture, identity and access control, and observability across the full stack.

"Most teams start with the fastest possible path, dump everything into a system and scale it. But production AI only works when you step back and design for structure, latency, and cost from day one," says Mcfarlane.

Back office first

Mcfarlane argues that the biggest and fastest AI wins are happening internally, not in customer-facing products. The reason is adoption: an internal sponsor is easier to persuade than an external buyer. "Getting a business to agree to a 3x productivity gain is much easier than a customer. A customer wants to see the productivity gain, wants evidence of it, wants people who have done it before. A business is thinking about how much that compounds and how much it saves over time."

He describes these tools as intellectual processing units for how businesses actually think and operate. Product designers, product managers, developers, and even leadership teams are already seeing workflow improvements from AI tools. The opportunity in the back office is larger because so much of business work is formatting, synthesizing, and passing structured thinking between people. "That doesn't have to be human to human anymore. And that's where we're seeing the biggest transition."

Retrieval is the product

At Microsoft, Mcfarlane worked on a recommender engine for enterprise training built on the Microsoft Learn catalog. The system needed to ingest multiple data types, including modules, courses, demo environments, and documentation, and produce AI-driven training paths for enterprise customers. The first instinct was predictable. "Everyone's going to download as much data as they can and try and process it and then look at that bill and go, oh yeah, I can't do this ten times today."

The team learned quickly that retrieval design determined everything: cost, speed, and whether the system could scale. They moved work out of the LLM wherever possible, applying chunking strategies, preprocessing pipelines, and even classic algorithms to reduce token consumption. "We're pulling out algorithms from the beginning of computer science because if we put this into a bash script, it's 100 tokens to run the script versus 1,000 tokens for the model to trigger it and do it itself."
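One way to picture moving work out of the LLM is a deterministic preprocessing pass that chunks and deduplicates text before any of it reaches a model. The sketch below is illustrative only and is not the pipeline Mcfarlane's team built. The function names are hypothetical, and the token estimate assumes one token per whitespace-separated word, which a real tokenizer will not match.

```python
import hashlib


def estimate_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word windows of `size`, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def dedupe(chunks: list[str]) -> list[str]:
    """Drop chunks whose lowercased text was already seen, keeping first occurrences in order."""
    seen = set()
    unique = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk.lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique


if __name__ == "__main__":
    raw = ["Reset your password", "reset your password", "Configure single sign-on"]
    kept = dedupe(raw)
    before = sum(estimate_tokens(c) for c in raw)
    after = sum(estimate_tokens(c) for c in kept)
    print(f"estimated tokens before: {before}, after: {after}")
```

None of this needs a model call. Hashing and windowing run in ordinary code, so the model sees only the chunks that survive.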

Keeping data structured from the start, rather than dumping everything into a NoSQL store and scaling, was equally critical. "People jump straight to vector databases or Cosmos-scale setups, but the real question is how structured your data is before you ever involve an LLM." For teams building production retrieval infrastructure, those architecture choices compound over time.
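The point about structure can be sketched as a plain metadata filter that narrows a catalog before any ranking or model call happens. This is a minimal illustration under stated assumptions: the `CatalogItem` fields, the sample values, and the keyword-overlap score are hypothetical and do not reflect the actual Microsoft Learn schema or the team's ranking method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogItem:
    # Hypothetical structured record; field names are assumptions.
    item_id: str
    kind: str   # e.g. "module", "course", "documentation"
    level: str  # e.g. "beginner", "advanced"
    title: str


def select_candidates(items, kind, level, query, limit=3):
    """Filter on structured fields first, then rank the survivors by keyword overlap."""
    terms = set(query.lower().split())

    def score(item):
        return len(terms & set(item.title.lower().split()))

    # Exact-match filters on structured fields cost no tokens at all.
    matches = [item for item in items if item.kind == kind and item.level == level]
    ranked = sorted(matches, key=score, reverse=True)
    # Only items with at least one matching term are worth passing downstream.
    return [item for item in ranked if score(item) > 0][:limit]


if __name__ == "__main__":
    catalog = [
        CatalogItem("m1", "module", "beginner", "Intro to cloud storage"),
        CatalogItem("m2", "module", "beginner", "Cloud networking basics"),
        CatalogItem("c1", "course", "beginner", "Cloud storage deep dive"),
        CatalogItem("m3", "module", "advanced", "Cloud storage performance"),
    ]
    for item in select_candidates(catalog, "module", "beginner", "cloud storage"):
        print(item.item_id, item.title)
```

The design choice is the ordering: cheap, exact filters on structured fields run first, so whatever retrieval or model step follows sees a short candidate list instead of the whole catalog.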

Gateways are not new

On security, Mcfarlane is direct: enterprise AI security is enterprise security. "We're not inventing anything new. We're going with cybersecurity standards, we're following cybersecurity frameworks, and we work with those frameworks as they evolve."

The tools are mature. Companies like Kong, F5, and Okta are shipping production-grade controls that most organizations have not yet managed to build on their own. Mcfarlane says the buy case for security tooling is strong right now. "I can't build that in three months. Even if I could, I don't want to spend three months building it."

He reserves his strongest advice for AI gateways and MCP gateways, which he says too many teams still treat with unnecessary hesitation. "Gateways have existed for a long time. They're one of the most mature pieces of technology we have in software. These new gateways are not novelty. They are structural controls now being used for new workloads."

He points to projects like OpenClaw and Hermes, where entire companies are organized around securing gateway infrastructure, as evidence that the pattern matters. Combined with sandboxed development environments, gateways close the gap between experimentation and governed production. "The days of running AI workloads on your machine need to come to an end. Go to Cloudflare, get a free AI gateway, and you will be ten times more secure from day one."
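To make "structural controls" concrete, the sketch below shows the kind of checks a gateway applies in front of a model: identify the caller, enforce a budget, record every decision, and only then forward the request. It is an in-process toy, not how Cloudflare, Kong, or any other vendor implements a gateway. The `Gateway` class, the per-caller word budget, and the backend callable are all hypothetical, and cost is approximated by word count.

```python
from dataclasses import dataclass, field
from typing import Callable


class GatewayDenied(Exception):
    """Raised when a request fails a gateway policy check."""


@dataclass
class Gateway:
    backend: Callable[[str], str]  # stand-in for the real model endpoint
    budgets: dict[str, int]        # caller id -> allowed prompt words (assumed policy)
    used: dict[str, int] = field(default_factory=dict)
    audit_log: list[tuple[str, int, str]] = field(default_factory=list)

    def handle(self, caller: str, prompt: str) -> str:
        cost = len(prompt.split())  # crude cost proxy, not a real token count
        if caller not in self.budgets:
            self.audit_log.append((caller, cost, "denied:unknown_caller"))
            raise GatewayDenied(f"unknown caller: {caller}")
        spent = self.used.get(caller, 0)
        if spent + cost > self.budgets[caller]:
            self.audit_log.append((caller, cost, "denied:over_budget"))
            raise GatewayDenied(f"budget exceeded for {caller}")
        # Record usage and the decision before forwarding to the backend.
        self.used[caller] = spent + cost
        self.audit_log.append((caller, cost, "allowed"))
        return self.backend(prompt)


if __name__ == "__main__":
    gateway = Gateway(backend=lambda p: p.upper(), budgets={"team-a": 5})
    print(gateway.handle("team-a", "hello there world"))
    try:
        gateway.handle("team-b", "who am i")
    except GatewayDenied as exc:
        print("denied:", exc)
    print(gateway.audit_log)
```

Every request, allowed or denied, leaves an audit entry, which is the property that separates a governed path from a workload running on a laptop.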

The views and opinions expressed are those of Trent Mcfarlane and do not represent the official policy or position of any organization.
