Jan 29, 2026 AI Engineering 5 min read

How We Used It to Run VoIP + an AI Voice Agent


Solomon Amsalu
Senior AI/ML Engineer


Image source: Ethix Media


Running a call center is about more than answering the phone. It is about reliability, structure, and making sure every call takes the right path, particularly in healthcare, where errors are expensive and wasting people's time is a real problem.

In this project, we worked with a healthcare organization that relied on human agents for nearly every call. People handled appointment scheduling, service questions, and internal coordination. The setup worked, but it was costly, slow to scale, and harder to manage as call volume grew.

Our goal was not to replace humans with AI. It was to build a system that handles routine tasks efficiently, supports staff when things get complicated, and keeps humans in control where it matters. We started with telephony, not AI.


Understanding the Setup We Walked Into

The company already had an established phone system. Cisco phones handled the customer-facing numbers, and callers knew those numbers. Replacing everything at once was neither possible nor necessary.

Internally, things were less efficient. Employees routinely used mobile phones or external lines to call each other, even though they all worked at the same organization. There was no proper internal VoIP extension system.

Meanwhile, inbound call volume kept growing. Many of the calls were repetitive:

  • scheduling appointments
  • asking about services
  • confirming existing appointments
  • being routed to the right department

After-hours calls were another problem. Either calls were missed, or the organization paid for night-shift staffing.

The new system had to extend the existing workflows, not disrupt them.

 


 

Why We Started With a PBX, Not AI

When thinking about automation, it is tempting to jump straight to AI. But AI does not solve call routing. It does not handle SIP, transfers, or interruptions. For that, you need a solid PBX. That's why we chose Asterisk.

Asterisk gave us reliable call control:

  • Answering calls
  • Playing audio
  • Recording speech
  • Handling transfers and queues
  • Processing DTMF input

These are foundational capabilities. Without them, any AI layer built on top is fragile.


 


 

 

What Asterisk Handled

We used Asterisk as the core call server, the point where every call arrived and routing decisions were made. It served two main roles.

Internal VoIP communication

Every employee was assigned a SIP account and an extension. With a softphone client, internal calls were quick and easy. Extension-to-extension calls, transfers, and ring groups worked the way a PBX is supposed to handle them.

Inbound customer calls

The public numbers stayed on the Cisco side. Calls were forwarded to Asterisk over a SIP trunk. From the caller's point of view, nothing changed. From the system's point of view, Asterisk was the new call flow controller. This is where the AI voice agent came in.

How the AI Voice Agent Actually Works

The AI part wasn't magic. It was a simple, controlled, predictable loop. A typical call followed this flow:

  1. Asterisk answers the call
  2. A greeting is played
  3. The caller speaks
  4. Asterisk records the audio
  5. The audio is sent to our backend
  6. The backend returns a generated audio response
  7. Asterisk plays the response
  8. The loop continues

This continues until the call ends or is handed off to a human. When escalation was needed, we did not ask the AI to work it out. Asterisk simply transferred the call, as any conventional call center would, to a queue, an extension, or an agent. That separation of responsibilities mattered.
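
The loop above can be sketched in Python. This is a minimal sketch, not the production code: the `CallControl` interface, its method names, the `backend` callable, and the `support-queue` destination are all hypothetical stand-ins, since the mechanism that connects Asterisk to the backend is not described here. How the need for escalation is detected is also an assumption (an `escalate` flag on the backend reply), as is the `max_turns` cap that hands the caller to a human instead of looping forever.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


class CallControl(Protocol):
    """Hypothetical view of what the PBX does for a single call."""

    def answer(self) -> None: ...
    def play(self, audio: bytes) -> None: ...
    def record(self) -> Optional[bytes]: ...  # None means the caller hung up
    def transfer(self, destination: str) -> None: ...


@dataclass
class BackendReply:
    audio: bytes            # generated speech to play back
    escalate: bool = False  # assumed signal that a human is needed


def run_call(call: CallControl,
             backend: Callable[[bytes], BackendReply],
             greeting: bytes,
             human_queue: str = "support-queue",
             max_turns: int = 20) -> str:
    """Run the answer / play / record / respond loop for one call.

    Returns "hangup", "transferred", or "max_turns" to say how it ended.
    """
    call.answer()
    call.play(greeting)
    for _ in range(max_turns):
        caller_audio = call.record()
        if caller_audio is None:
            return "hangup"
        reply = backend(caller_audio)
        if reply.escalate:
            # The PBX performs the transfer; the AI layer is not involved.
            call.transfer(human_queue)
            return "transferred"
        call.play(reply.audio)
    # Assumed safety net: after max_turns, hand the caller to a human.
    call.transfer(human_queue)
    return "max_turns"
```

Keeping the transfer inside the call-control object mirrors the split described above: the backend only produces audio, and everything that moves a call stays on the PBX side.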


Asterisk is very good at call control. It should not have to run AI pipelines, clean audio, or convert files. So we placed a FastAPI service between Asterisk and the AI models. Asterisk handled the calls. FastAPI handled the computation.

 


 

The backend:

  • cleaned and converted audio
  • ran speech-to-text
  • fed the text into a RAG pipeline
  • generated responses
  • converted the responses to audio

This separation kept the system stable. Asterisk did not have to change every time the AI logic changed, and the AI pipeline could improve independently.
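
The backend stages listed above can be sketched as a chain of functions. This is a minimal sketch under stated assumptions: every stage is a placeholder callable with a hypothetical name, because the actual speech-to-text, RAG, and text-to-speech components are not named here, and the stub data is made up. In a real service a method like `handle_turn` would sit behind a FastAPI endpoint; the web layer is left out so the block runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class VoicePipeline:
    """Five injected stages, so each can be swapped without touching the rest."""

    clean_audio: Callable[[bytes], bytes]       # denoise / convert format
    transcribe: Callable[[bytes], str]          # speech-to-text
    retrieve: Callable[[str], list[str]]        # RAG: fetch relevant passages
    generate: Callable[[str, list[str]], str]   # answer from question + passages
    synthesize: Callable[[str], bytes]          # text-to-speech

    def handle_turn(self, raw_audio: bytes) -> bytes:
        """Turn one caller recording into one audio reply."""
        audio = self.clean_audio(raw_audio)
        question = self.transcribe(audio)
        passages = self.retrieve(question)
        answer = self.generate(question, passages)
        return self.synthesize(answer)


# Stub stages with made-up data, for illustration only.
pipeline = VoicePipeline(
    clean_audio=lambda raw: raw.strip(b"\x00"),
    transcribe=lambda audio: audio.decode("utf-8"),
    retrieve=lambda q: ["Clinic hours: 8am to 5pm"] if "hours" in q else [],
    generate=lambda q, passages: passages[0] if passages else "Let me connect you to a person.",
    synthesize=lambda text: text.encode("utf-8"),
)

reply_audio = pipeline.handle_turn(b"\x00what are your hours\x00")
```

Because the PBX only ever sees audio in and audio out, any of the five stages can be replaced without a change on the Asterisk side.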

The Asterisk Features That Made the Difference

What separates a demo voice bot from a real system is how it handles people.

  • People interrupt.
  • People press buttons.
  • People change their minds mid-sentence.

Asterisk handled all of this well.

Barge-in support allowed callers to interrupt audio playback and speak immediately. When that happened, playback stopped, recording started, and the system responded without delay.
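
The barge-in behavior can be modeled as a small state machine. This is a sketch of the behavior only, not of how Asterisk implements it: the phase and event names are hypothetical, and ending a recording on silence is an assumption made for the example.

```python
from enum import Enum, auto


class Phase(Enum):
    PLAYING = auto()     # audio is being played to the caller
    RECORDING = auto()   # the caller's speech is being captured
    PROCESSING = auto()  # the backend is preparing a reply


class Event(Enum):
    CALLER_SPEAKS = auto()
    PLAYBACK_DONE = auto()
    SILENCE = auto()
    REPLY_READY = auto()


TRANSITIONS = {
    # Barge-in: speech during playback stops it and starts recording.
    (Phase.PLAYING, Event.CALLER_SPEAKS): Phase.RECORDING,
    (Phase.PLAYING, Event.PLAYBACK_DONE): Phase.RECORDING,
    (Phase.RECORDING, Event.SILENCE): Phase.PROCESSING,
    (Phase.PROCESSING, Event.REPLY_READY): Phase.PLAYING,
}


def next_phase(phase: Phase, event: Event) -> Phase:
    """Return the next phase; events that do not apply leave the phase unchanged."""
    return TRANSITIONS.get((phase, event), phase)
```

The first transition is the one that separates a real system from a demo: without it, a caller who starts talking over the prompt is ignored until playback ends.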

DTMF routing allowed callers to press keys instead of talking. Some users prefer simple options like “press 0 to talk to a human.” That behavior was reliable and predictable.
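
Key-press routing amounts to a lookup table. The sketch below is illustrative: only the "0 for a human" option comes from this post, while the other digits and all destination names are hypothetical.

```python
from typing import Mapping, Optional

# Only "0" -> human is taken from the post; the other entries are made up.
DTMF_ROUTES: Mapping[str, str] = {
    "0": "human-agents",
    "1": "appointments",
    "2": "services-info",
}


def route_dtmf(digit: str, routes: Mapping[str, str] = DTMF_ROUTES) -> Optional[str]:
    """Return the destination for a key press, or None to stay in the voice loop."""
    return routes.get(digit)
```

Returning `None` for an unmapped key lets the call continue with the voice agent instead of failing on unexpected input.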

Because Asterisk already has mature call center building blocks, the AI became just one part of the dialplan, not the entire system.

What the Company Ended Up With

At the end of the project, the company had:

  • A real internal VoIP system employees actually used
  • An AI-assisted inbound call flow that handled a large portion of routine calls
  • Clean escalation to humans when needed
  • 24/7 availability without extra staffing
  • Fewer errors and less manual repetition

Most importantly, callers were never trapped in a loop. When a human was needed, the system got out of the way.

 


Closing Thoughts

This project wasn’t about chasing trends. It was about building something that works under real conditions. AI voice systems succeed when they’re built on reliable infrastructure, designed with clear boundaries, and focused on assisting humans, not replacing them. That’s how voice AI becomes useful, not frustrating.

AI Agents RAG MLOps
