Enterprises keep deploying conversational AI that looks agentic but behaves like a scripted chatbot with a generative coat of paint. The deterministic flow runs until it breaks, and then the system either fails outright or produces an irrelevant response. When these projects fail, the cause is usually organizational: RAND found that 84% of AI project failures trace back to leadership rather than the technology itself. The projects that succeed take a fundamentally different approach. They start agentic and layer in deterministic controls where compliance, legal, and regulatory workflows demand fixed boundaries. The architecture is hybrid by design, not by accident.
Joseph Huffnagle, Global Vice President of Field Engineering and Delivery at Parloa, leads pre-sales, solution consulting, and technical delivery for an AI-native conversational platform serving enterprise CX organizations. Before Parloa, he built and led global solution engineering teams at Five9, Dialpad, Nextiva, and OneReach.ai, contributing directly to over $300 million in revenue across CCaaS, UCaaS, and conversational AI.
"We go agentic first with deterministic flows. In some cases, you still need those. You need legal and regulatory things that are adherent to those guardrails, which are now the most important part," says Huffnagle.
One bot, many agents
Huffnagle draws a hard line between generative AI and agentic AI. "Gen AI is really cute for making fun pictures you can post on LinkedIn. Agentic, which is where we sit, becomes the workflow engine." The distinction matters because the two call for fundamentally different architectures: one generates content, the other orchestrates work.
A true agentic system does not run a single bot through a giant script. It runs one conversational interface backed by multiple subtask agents that execute work in the background. "It's one overseer and a myriad of different agents running in the background, getting all the work done."
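The overseer pattern can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Parloa's implementation: the `Overseer` class, the `lookup_order` and `check_balance` agents, and the keyword-based `plan` method are all hypothetical, and keyword matching stands in for the model-driven planning a production overseer would use.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

# Hypothetical subtask agent: takes the caller's utterance, returns a result string.
SubtaskAgent = Callable[[str], Awaitable[str]]


async def lookup_order(utterance: str) -> str:
    await asyncio.sleep(0)  # stand-in for a real backend or API call
    return "order: shipped"


async def check_balance(utterance: str) -> str:
    await asyncio.sleep(0)  # stand-in for a real backend or API call
    return "balance: $42.00"


@dataclass
class Overseer:
    agents: dict[str, SubtaskAgent]

    def plan(self, utterance: str) -> list[str]:
        # Keyword matching stands in for the LLM-based planning a real overseer would do.
        text = utterance.lower()
        return [name for name in self.agents if name in text]

    async def handle(self, utterance: str) -> str:
        tasks = self.plan(utterance)
        if not tasks:
            return "Could you tell me a bit more about what you need?"
        # Run every selected subtask agent concurrently, behind the single conversation.
        results = await asyncio.gather(*(self.agents[t](utterance) for t in tasks))
        # Only the overseer speaks to the customer, merging subtask results into one reply.
        return "; ".join(results)
```

Calling `asyncio.run(Overseer({"order": lookup_order, "balance": check_balance}).handle("Where is my order and what's my balance?"))` returns `"order: shipped; balance: $42.00"`: both subtask agents do their work, but the customer hears a single reply from one interface.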
That multi-agent architecture still requires deterministic boundaries in regulated industries. In healthcare, financial services, and higher education under FERPA, agents must reason and converse freely while following fixed flows for qualification, compliance, and data handling.
"Our agents still have to be agentic, have the full conversational capabilities, do the ingestion of data points and knowledge in real time, while still following a flow that is relevant for them." Huffnagle calls the harness around those agents the most important part of any deployment.
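A deterministic harness around a free-form agent might look like the following sketch. It assumes a hypothetical regulated flow with two fixed steps (identity verification, then a disclosure) that must finish before the agent may touch protected records. The `Harness` class and `Step` names are illustrative and not taken from any real platform; the point is only that the harness, not the model, enforces the order.

```python
from enum import Enum, auto


class Step(Enum):
    VERIFY_IDENTITY = auto()
    READ_DISCLOSURE = auto()
    OPEN = auto()  # all required steps done; protected data may be used


# Hypothetical fixed, auditable order: the harness decides what comes next.
REQUIRED_ORDER = [Step.VERIFY_IDENTITY, Step.READ_DISCLOSURE, Step.OPEN]


class Harness:
    def __init__(self) -> None:
        self.index = 0

    @property
    def step(self) -> Step:
        return REQUIRED_ORDER[self.index]

    def complete(self, step: Step) -> None:
        # Steps can only be completed in the fixed order; anything else is rejected.
        if step is not self.step:
            raise ValueError(f"expected {self.step.name}, got {step.name}")
        if self.index < len(REQUIRED_ORDER) - 1:
            self.index += 1

    def may_access_records(self) -> bool:
        return self.step is Step.OPEN

    def handle(self, wants_records: bool) -> str:
        # The agent may converse freely at any point, but protected data
        # stays locked until every deterministic step has been completed.
        if wants_records and not self.may_access_records():
            return f"blocked: finish {self.step.name} first"
        return "agent responds freely"
```

Keeping the guardrail outside the model means a compliance reviewer can audit a short, fixed list of steps instead of reasoning about prompt behavior.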
Parloa's internal research finds that 96% of the companies it evaluates still run deterministic press-one-for-sales, press-two-for-support IVR systems. "They'll tell you they're this bleeding-edge AI company, and you go, well, we couldn't even get a hold of you."
One KPI, then scale
The rollout discipline that separates production deployments from permanent science projects comes down to a single question: what KPI are you trying to move? Huffnagle says the moment a client names a specific metric, the team works backward from there. "The more what-ifs become the science fair project. What we try to say is, what are you trying to solve? The second they give us that, we walk it back and give them a plan."
He advocates for low-effort, high-impact use cases that ground the integration in real infrastructure planning rather than expansive aspirations. That means understanding the client's backend, whether it is SAP or another system, mapping the data interconnectivity, defining the human-in-the-loop or human-on-the-loop topology, and designing agent memory to prevent context rot. "A lot of folks don't even understand that there's memory you need for these agents. If your prompt is 40 million paragraphs, you're going to get context rot."
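One common way to keep agent memory from bloating the prompt is a bounded window of recent turns plus a small set of pinned facts. The sketch below assumes that technique; it is a generic illustration, not how Parloa or any specific platform implements memory, and the `AgentMemory` class and its four-turn default budget are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    max_recent: int = 4  # hypothetical budget of turns kept verbatim
    facts: dict[str, str] = field(default_factory=dict)  # durable details, e.g. an account ID

    def __post_init__(self) -> None:
        # A deque with maxlen drops the oldest turn once the window is full,
        # so the prompt stays bounded no matter how long the call runs.
        self.recent: deque[str] = deque(maxlen=self.max_recent)

    def add_turn(self, turn: str) -> None:
        self.recent.append(turn)

    def remember(self, key: str, value: str) -> None:
        # Facts that must outlive the window are pinned separately.
        self.facts[key] = value

    def build_context(self) -> str:
        fact_lines = [f"{key}: {value}" for key, value in sorted(self.facts.items())]
        return "\n".join(["# Known facts", *fact_lines, "# Recent turns", *self.recent])
```

A production system would typically add summarization of evicted turns or retrieval over a longer store; the window-plus-facts split is simply the smallest structure that keeps context from growing without limit.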
Huffnagle says the projects that fail fastest are the ones that try to defeat the final boss before learning the basics. Clients arrive with giant initiatives, then keep adding scope until nothing ships. "Those are the ones in that 84% that fail. They went straight to the final boss. Learn the basics, get in, then iterate."
Measurement beyond containment
Huffnagle pushes back on containment rate as the primary success metric for conversational AI. Keeping calls away from human agents is a starting point, but it does not capture whether the agent actually resolved the customer's issue. "Did we contain 70% of your calls? No. Did we complete 90% of your calls? That should be something we're all driving for." The distinction matters because completion measures real work performed, not just interactions deflected.
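The gap between the two metrics is easy to see with a small calculation. The definitions below are assumptions for illustration, since vendors define these rates differently: containment counts every call that never reached a human, while completion counts only calls the AI agent actually resolved. The `Call` record and the ten sample calls are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Call:
    escalated_to_human: bool  # the call left the AI agent for a human
    task_completed: bool      # the customer's request was resolved by the AI agent


def containment_rate(calls: list[Call]) -> float:
    # Counts every call that never reached a human, including abandoned ones.
    if not calls:
        return 0.0
    return sum(not c.escalated_to_human for c in calls) / len(calls)


def completion_rate(calls: list[Call]) -> float:
    # Counts only calls where the AI agent finished the job without a handoff.
    if not calls:
        return 0.0
    return sum(c.task_completed and not c.escalated_to_human for c in calls) / len(calls)


# Hypothetical sample: 4 resolved, 3 abandoned without escalation, 3 escalated.
calls = [Call(False, True)] * 4 + [Call(False, False)] * 3 + [Call(True, False)] * 3
print(containment_rate(calls), completion_rate(calls))  # 0.7 0.4
```

In this made-up sample, a 70% containment rate hides the fact that three of the seven contained callers left without an answer; completion reports 40%.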
For enterprises still hesitating, Huffnagle frames the risk in competitive terms. Companies that put one or two use cases into production on an agentic architecture start innovating in their CX stack. Those that keep treating AI as a research exercise can fall years behind within a single deployment cycle.
"If you're not doing it, your competition is. And if they get to just one or two small use cases and create innovation, you're now years behind from one simple impact." The composable platforms that support modern API infrastructure and integration through the Agent2Agent (A2A) protocol and the Model Context Protocol (MCP) are the ones that let teams iterate as the market moves. "As long as you have bought into a platform that allows you to scale with the market, that's where you get it."