Nobody Is Talking About the Part of AI That Actually Matters

In practice, AI is only as dependable as the infrastructure and data pipelines that support it, from data capture through to transmission and validation. That layer is increasingly where real differentiation will occur.

The reality is that AI ultimately manifests in very tangible moments: a call routed through the public switched telephone network to a patient, a hotel guest, or a customer expecting service. When that interaction happens, the determining factor is not model quality alone, but the reliability of the underlying infrastructure.

That infrastructure is telecom, and it is on the verge of becoming one of the most strategically important and contested layers in enterprise technology.

The problem with watching the model wars 

The way the AI industry talks about itself, you would think the only things that matter are benchmarks, context windows, and which foundation model a startup is building on. That conversation is not wrong exactly; model capability matters. However, it is dangerously incomplete, and the incompleteness is going to catch a lot of organizations off guard. 

Think about what an AI voice agent actually needs to be operational in a hospital, a school district, a hotel chain, or a mid-size retail business. It needs a real phone number. It needs a SIP trunk to carry audio from the PSTN to wherever the model is running. It needs a session border controller to handle NAT traversal, codec negotiation, and fraud prevention. It needs routing logic sophisticated enough to escalate to a human when the agent reaches its limits. All of this needs to operate under 200 milliseconds of round-trip latency, because that is the threshold below which a voice conversation still feels natural to a human being. Cross that threshold, and the agent sounds broken regardless of how smart it is. 
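To make that 200-millisecond constraint concrete, here is a minimal sketch of a round-trip latency budget for the call path described above. The per-hop figures are illustrative assumptions, not measurements from any particular deployment; only the 200 ms threshold comes from the article.

```python
# Rough round-trip latency budget for an AI voice agent call path.
# Per-hop numbers below are illustrative assumptions, not measured values.

BUDGET_MS = 200  # round-trip latency above which speech stops feeling natural

latency_ms = {
    "pstn_to_sip_trunk": 30,          # carrier leg into the SIP trunk
    "session_border_controller": 10,  # NAT traversal, codec negotiation, fraud checks
    "media_transport": 40,            # audio path to wherever the model runs
    "speech_to_text": 50,             # streaming transcription
    "model_inference": 40,            # the part the model wars focus on
    "text_to_speech": 25,             # synthesized reply back to the caller
}

total = sum(latency_ms.values())
headroom = BUDGET_MS - total
print(f"total {total} ms, headroom {headroom} ms")
for hop, ms in latency_ms.items():
    print(f"  {hop}: {ms} ms ({ms / BUDGET_MS:.0%} of budget)")
```

Note how little of the budget the model itself consumes: most of the milliseconds live in the telecom and media layers that someone else has to operate.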

None of that is built by OpenAI, or Anthropic, or Google. They build the reasoning. Someone else has to build the pipe, and building a production-grade voice pipe that works across every carrier, every device, and every regulatory environment is genuinely hard. It takes years. It takes operational experience that does not transfer from software engineering. And the companies that have already done it are not the ones getting written about in TechCrunch.

The historical pattern nobody wants to apply here 

We have seen this before. When the internet scaled in the late 1990s, the routing infrastructure companies (the Ciscos, the bandwidth providers, the network equipment vendors) captured value through every cycle of browser wars and dot-com churn. When mobile scaled, the companies that owned spectrum and towers stayed valuable through smartphone generation after generation. When cloud computing scaled, AWS and Azure became among the most profitable businesses in the world, not by building the best application but by owning the layer that every application ran on.

The pattern is not subtle: intelligence commoditizes faster than infrastructure. The application on top changes; the pipe it runs through is harder to build and harder to replace. Yet every time a new platform transition happens, the conversation concentrates on the intelligence layer until it is too late to get a good position on the infrastructure layer. 

We are in that window right now with AI agents and voice. 

Where the market actually is 

The scale of what is forming here is worth sitting with for a moment, because the numbers are not incremental. 

On an integrated stack, an AI agent can easily escalate from automated handling to a human, switch from voice to SMS, and pull conversation history across channels. That is very different from cobbling those capabilities together from three separate vendors. The stack integration is what makes AI agents actually work in production.

The four industries that are going to prove this out 

Not every vertical is moving at the same pace, and understanding that sequencing matters as much as understanding the opportunity itself.

| Vertical | Primary use case | Where it stands today | The real friction |
|---|---|---|---|
| Retail & SMB | AI receptionist, order tracking, loyalty outreach | Now | Essentially none: fast decisions, no compliance overhead except PCI, immediate ROI |
| Hospitality | Voice booking, guest services, multi-property communications | Next | Low: GM-level decisions, no regulated data, the ROI story closes fast |
| Education | Emergency alerts, hybrid classroom, attendance communications | Planning | Annual budget cycles, FERPA certification; slower, but near-zero churn once in |
| Healthcare | Telehealth routing, patient scheduling, clinical call management | Considering | HIPAA BAAs, EHR integrations, multi-stakeholder procurement, and a $9.8B market |

Those are production numbers, and they are the kind of ROI that makes every other technology investment look timid by comparison. These deployments are happening right now; they are working, and they are building the reference base that will make healthcare and education conversations much easier to have in 12 months. 

Healthcare is a long game, and the most important one. The sales cycle is genuinely painful: HIPAA Business Associate Agreements (BAAs), clinical IT security reviews, EHR integration requirements, and procurement committees that include physicians, administrators, and compliance officers who do not always agree.

That friction is real, but it is also the reason that once a healthcare organization deploys compliant AI voice infrastructure, it almost never leaves. The compliance work it has done together with the vendor and the integrations embedded in clinical workflows are switching costs that compound into something close to permanent. Healthcare is not the first win. It is the most valuable one.

The part of the story that engineering teams already know 

There is a detail about how AI voice agents are actually being built today that does not get discussed in the business press, but that anyone building in this space runs into immediately: a large share of these systems are built on Asterisk, the open-source telephony framework.

This matters strategically because of what happens when a developer ecosystem standardizes on a piece of infrastructure. Enterprise adoption follows developer adoption by roughly 18 to 24 months. The engineers building AI voice systems on Asterisk today are the architects who will specify infrastructure for enterprise deployments in 2027. And Sangoma, the company that has stewarded Asterisk for the past two decades, holds the deepest relationship with that developer community and the most influence over where the framework goes next. It is not a startup discovering telephony; it is the same organization that built it.

Red Hat built an extraordinary business by doing exactly this with Linux. Elastic did it with search. The open-source moat is one of the most durable forms of competitive advantage in technology, precisely because it is not based on locking customers in. The technology is so deeply embedded in how people build things that moving away requires rebuilding from scratch.

What organizations should actually be doing right now 

If you are a technology or operations leader thinking about AI agent deployment, the question worth asking is not which model to use. It is whether the voice infrastructure you are building on is production-grade, compliance-certified for your industry, and capable of surviving a 5x increase in call volume when the deployment goes live.

Most of the AI voice projects that fail in production fail at the infrastructure layer. The model was strong. The latency was fine in testing. The compliance documentation looked adequate. And then a real clinical environment, or a multi-location retail chain, or a school district with 40,000 students pushed the system in ways the integration was not designed for, and the whole thing fell apart. 
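One way to pressure-test the "5x call volume" question before going live is classic teletraffic math. The sketch below uses the standard Erlang B formula to estimate what fraction of calls get blocked when offered traffic surges past trunk capacity; the channel count and traffic figures are hypothetical, chosen only to show the shape of the failure.

```python
def erlang_b(offered_erlangs: float, channels: int) -> float:
    """Blocking probability for `channels` concurrent trunk channels at a
    given offered load, via the standard Erlang B recurrence:
    B(0) = 1;  B(m) = A*B(m-1) / (m + A*B(m-1))."""
    b = 1.0
    for m in range(1, channels + 1):
        b = (offered_erlangs * b) / (m + offered_erlangs * b)
    return b

# Hypothetical deployment: 30 trunk channels sized for ~20 erlangs of
# offered traffic, then hit with a 5x surge at go-live.
baseline, surge = 20.0, 100.0
for load in (baseline, surge):
    print(f"{load:.0f} erlangs on 30 channels -> "
          f"{erlang_b(load, 30):.1%} of calls blocked")
```

Under the baseline load, blocking is well under a few percent; under the 5x surge, the majority of calls never reach the agent at all, no matter how good the model is. That cliff is exactly the kind of infrastructure failure the paragraph above describes.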

There is another dimension to this that most discussions about AI agents skip entirely. When an AI agent communicates with a human, or increasingly, with another AI agent, every one of those interactions generates a trail of sensitive data that has to be authenticated, timestamped, and secured across the full communication stack. Who spoke. When. What was said. Whether the voice on the line was who it claimed to be. Whether the data exchanged between agents was intercepted or tampered with in transit. These are not edge cases. They are the basic operating requirements of any AI voice deployment in a regulated environment, and they become exponentially harder to guarantee when a business is stitching together voice from one vendor, messaging from another, video from a third, and network security from a fourth.
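As an illustration of what "authenticated, timestamped, and secured" can mean at the data level, here is a minimal sketch of a tamper-evident interaction log built as a hash chain, where each record commits to its predecessor. The field names and structure are invented for the example; this is a toy demonstrating the idea of a verifiable audit trail, not a compliance implementation.

```python
import hashlib
import json
import time

def append_record(chain: list[dict], event: dict) -> dict:
    """Append an interaction event (who spoke, what was said) to a
    tamper-evident log by hashing each record together with the hash
    of the record before it."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    record = {
        "timestamp": time.time(),  # when the interaction happened
        "event": event,            # who spoke, what was said, which channel
        "prev_hash": prev_hash,    # link back to the previous record
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["hash"] = hashlib.sha256(payload).hexdigest()
    chain.append(record)
    return record

def verify(chain: list[dict]) -> bool:
    """Recompute every hash; editing any earlier record breaks the chain."""
    prev = "0" * 64
    for rec in chain:
        body = {k: v for k, v in rec.items() if k != "hash"}
        if rec["prev_hash"] != prev:
            return False
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if digest != rec["hash"]:
            return False
        prev = rec["hash"]
    return True

log: list[dict] = []
append_record(log, {"speaker": "agent", "utterance": "How can I help?"})
append_record(log, {"speaker": "caller", "utterance": "Refill my prescription."})
print(verify(log))   # intact chain verifies
log[0]["event"]["utterance"] = "tampered"
print(verify(log))   # any after-the-fact edit is detectable
```

The point is not this particular scheme: it is that "whether the audit trail holds up" is a property you engineer into the stack, and it only holds end to end if one party owns every link in the chain.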

Nobody in that arrangement owns the end-to-end chain. Nobody can guarantee where the data lives, who touched it, or whether the audit trail holds up under a HIPAA review or a breach investigation. The compliance certificate on the wall of any single vendor covers only that vendor’s slice. The gaps between vendors, the handoffs, the API calls, the data in transit between platforms, belong to nobody. That liability gap is where breaches happen, where regulatory exposure accumulates, and where AI agent deployments quietly fail the trust requirements that enterprise and healthcare customers cannot compromise on.

A single trusted provider that owns the full stack, voice, messaging, video, and network & security, does not just simplify procurement. It closes the liability gap entirely, gives businesses a single chain of custody for every interaction, and makes compliance something that is structurally guaranteed rather than manually assembled across contracts with vendors who each disclaim responsibility for what happens outside their boundary. Sangoma is one of the very few providers in the market that can make that guarantee, because it built the entire stack itself.

The window to make that decision thoughtfully, before competitive pressure forces a rushed choice, is probably 12 to 18 months. After that, the organizations that got the infrastructure right early will have reference deployments, compounding compliance investment, and installed bases that their competitors will spend years trying to dislodge. 

The AI agent economy is being built right now. And like every platform transition before it, the infrastructure layer is going to matter more, and for longer, than the intelligence layer everyone is currently arguing about.