This post is part of the technical series introducing Azara.
If you are keen to try out the dev features of Azara, you can sign up for the SDK release here
In the previous post, we covered why Azara is more than just a chatbot or GPT wrapper.
It bootstraps your projects to provide these features to your clients, so you don’t have to build them yourself, but allows you to extend the platform as you need to.
In this post, we cover the technical architecture from a high level, to understand how it fits together.
Hopefully this will also be helpful for those of you who are building your own systems, and looking for examples of how others have done this.
It lays the foundation for the most important section in the core components post, coming next.
High level technical architecture of the Azara platform
Azara is built on AWS and follows the AWS Well-Architected Framework to provide secure and scalable services.
It consists of a few major components, namely:
API Server: FastAPI server running on AWS EC2, which provides the core agent functionality.
WebUI Frontend: Next.js / React frontend running on Vercel, which allows users to administer their agents.
Web Widgets: TypeScript web widgets which are embedded in a customer's website to provide lightweight, public-facing agent access.
Plugins: Composable, distributed architecture running on Celery, which integrates with 3rd-party services as Python or LLM tools, e.g. Calendar, HubSpot, Slack, etc.
Channels: Specialized plugins which implement channels for chat/voice/email interaction with agents, e.g. WhatsApp, Slack, Telephony, Gmail, etc.
Celery Task automation: Workflows are executed cheaply at scale and in a fault-tolerant manner using Celery running on one or more EC2 instances. So far, load testing has shown that a single server suffices for most use cases, so the auto-scaling doesn't get used.
Dev/MLOps platform: Internal developer and ops platform which provides single-pane-of-glass management of all operations, plus automation and audit reporting for the entire stack. Designed to be scalable and to meet regulatory and enterprise requirements.
AWS S3 & other services
API Server
The API Server is designed to be scalable and fault tolerant. It consists of a multi-container setup, with the major components being:
FastAPI server
PostgreSQL database
Weaviate for vector database
Redis for Semantic and other caching
Celery / Flower / RabbitMQ for distributed, fault tolerant plugin / tool execution
Various built in performance and debugging support tools
Scalability and Fault Tolerance
The API server is implemented as a load balanced EC2 server, with sharding and sticky sessions at the organization level. The plugin tools and workflows are executed in a distributed, fault tolerant manner across autoscaling servers using Celery, Redis and RabbitMQ.
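To make the idea of organization-level sharding concrete, here is a minimal sketch of how an organization ID could be mapped to a backend so that all of an organization's requests land on the same server. This is an illustration under stated assumptions, not Azara's actual routing code: the backend names and the `backend_for_org` function are hypothetical.

```python
import hashlib

# Hypothetical backend pool; the real server names are not given in this post.
BACKENDS = ["api-1", "api-2", "api-3"]


def backend_for_org(org_id: str, backends: list = BACKENDS) -> str:
    """Deterministically map an organization ID to one backend."""
    # Use a stable hash. Python's built-in hash() is salted per process,
    # so it would route the same organization differently after a restart.
    digest = hashlib.sha256(org_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    for org in ("acme", "globex", "initech"):
        print(org, "->", backend_for_org(org))
```

Plain modulo hashing like this remaps most organizations whenever the pool size changes; a consistent-hashing ring would be one way to limit that if servers are added or removed often.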
Load testing of a single server shows minimal CPU and RAM usage even at 1,000 concurrent users, ramping up at 50 users/sec, chatting and executing workflows. The production server runs on an m6i.4xlarge (16 vCPU, 64 GB RAM). As can be seen, the impact is minimal.
Multiple Uvicorn instances (10+) are running, and containers are replicated for load balancing too.
Plugins
Plugins are the core of the Azara platform. They are Python modules which are injected into the platform at runtime, and can be loaded dynamically either by the API server or as an LLM tool for function calling.
Additionally, these integration plugins form the backbone of the workflow tasks. Each plugin implements an interface to the 3rd-party service it wraps, and connects the Chat Agent to that service either directly, using function calling, or as a workflow task.
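For readers building something similar, runtime loading of a single-file Python module can be done with the standard library alone. The sketch below shows one possible approach using `importlib`; the `load_plugin` helper and the example file name are assumptions made for illustration, and Azara's own loader may work differently.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_plugin(path: str) -> ModuleType:
    """Load a single-file plugin module from a filesystem path at runtime."""
    plugin_path = Path(path)
    # The module name is taken from the file name, e.g. calendar_plugin.py
    spec = importlib.util.spec_from_file_location(plugin_path.stem, plugin_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load plugin from {path}")
    module = importlib.util.module_from_spec(spec)
    # Executes the plugin's top-level code, defining its functions and classes
    spec.loader.exec_module(module)
    return module
```

A caller could then look up functions on the returned module with `getattr`, for example `getattr(load_plugin("calendar_plugin.py"), "create_event")`, where both names are hypothetical.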
Simple by design
Plugins are designed to be super simple to implement. This is because one of the greatest impediments to extending systems is inertia. When it's painful or hard to write the code that integrates 3rd-party systems, the application is often abandoned in favor of simpler systems. Because integrations dominate AI agentic platforms as they scale, we have taken great care to make this process as simple as possible.
Plugins (and channels / scenarios etc) have the following features:
Designed to be composable. Our plugin management system will allow for multiple library versions and protect the developer and end user from version clashes.
Plugins can be consumed in multiple ways:
As an LLM custom @tool for function calling
As an executable task in a workflow
As a dynamically loadable Python module
As a channel (for those plugins which implement the channel interface) for chatting via the service, e.g. an agent listening on a Slack channel
Decorators such as @route make it easy to add metadata to each route. The metadata turns the route into a LangChain custom tool (e.g. via @tool) and also supplies the UI data needed to make the route visible to the LLM and to the user or developer consuming the plugin.
Each plugin is a single module, though it may have an associated requirements.txt. We parse the AST of the plugin to process imports etc. in a smart manner.
Plugins are deployed by pushing them to the plugins repo, which then publishes them to the servers in a continuous delivery methodology.
Plugins implement multiple levels of authentication. This tends to be one of the more painful parts of integrating many services, so our plugins provide multiple mechanisms to simplify this.
No auth
API key
Access key + secret key
1-click OAuth (social login)
A plugin may have default authentication credentials configured; however, a developer or user may override these at the instance level to provide fine-grained use of plugins, e.g. comparing data between 2 accounts in a workflow.
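The decorator approach in the feature list above can be sketched in plain Python. This is a simplified illustration, not Azara's implementation: the `route` decorator here only records metadata in a registry, and `RouteInfo`, `ROUTES`, and `create_event` are hypothetical names. A real version would presumably also hand the function to the LLM framework's tool mechanism.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class RouteInfo:
    name: str
    description: str  # text an LLM could use to decide when to call the route
    label: str        # human-readable name a UI could display
    func: Callable


# Hypothetical registry; the post does not show how Azara stores route metadata.
ROUTES: Dict[str, RouteInfo] = {}


def route(description: str, label: Optional[str] = None):
    """Register a function as a plugin route and attach metadata to it."""
    def decorator(func: Callable) -> Callable:
        ROUTES[func.__name__] = RouteInfo(
            name=func.__name__,
            description=description,
            label=label or func.__name__.replace("_", " ").title(),
            func=func,
        )
        return func  # the function itself is returned unchanged
    return decorator


@route("Create a calendar event with a title and start time.", label="Create event")
def create_event(title: str, start: str) -> dict:
    # Placeholder body; a real plugin would call the 3rd-party service here.
    return {"title": title, "start": start, "status": "created"}
```

Because the decorated function is returned unchanged, it can still be called directly as an ordinary Python function, while the registry gives a tool layer or a UI one place to enumerate the available routes.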
We are also currently working on a plugin creator and an AI agent which will write plugins given the URL of a Python library on GitHub for that service. These are currently undergoing QA/QC and will be available in Q4 2024.
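Returning to the AST-based import handling mentioned in the feature list above, the standard `ast` module is enough to discover which packages a single-file plugin imports without executing it. The sketch below is one possible way to do this; `imported_packages` is a hypothetical helper, and what Azara does with the result is not described in this post.

```python
import ast
from typing import Set


def imported_packages(source: str) -> Set[str]:
    """Return the top-level package names imported by a piece of Python source."""
    tree = ast.parse(source)
    found: Set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import os.path" -> "os"
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports such as "from . import helpers"
            if node.level == 0 and node.module:
                found.add(node.module.split(".")[0])
    return found
```

The resulting set could, for instance, be compared against the plugin's requirements.txt to flag a dependency that is imported but not declared.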
Channels
Channels are specialized integration plugins.
They include additional methods implemented on a subset of the integration plugins, e.g. WhatsApp, Gmail, WebWidget (WW), Slack, etc., which provide a webhook to allow the plugin to be a conversation channel to an agent in the chatroom.
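As a rough illustration of what "additional methods" for a channel might look like, here is a minimal sketch of a channel interface: one method parses an incoming webhook payload, another sends the reply back through the service. The `Channel` base class, its method names, and the payload shape are all assumptions made for this example and are not taken from the Azara SDK.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Tuple


class Channel(ABC):
    """Hypothetical channel interface: webhook in, agent reply out."""

    @abstractmethod
    def parse_webhook(self, payload: dict) -> Tuple[str, str]:
        """Return (conversation_id, message_text) from a provider payload."""

    @abstractmethod
    def send(self, conversation_id: str, text: str) -> None:
        """Deliver the agent's reply back through the service."""

    def handle_webhook(self, payload: dict, agent: Callable[[str], str]) -> str:
        # Shared flow: parse the message, ask the agent, send the reply
        conversation_id, text = self.parse_webhook(payload)
        reply = agent(text)
        self.send(conversation_id, reply)
        return reply


class InMemoryChannel(Channel):
    """Toy channel that records replies instead of calling a real service."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []

    def parse_webhook(self, payload: dict) -> Tuple[str, str]:
        return payload["conversation"], payload["text"]

    def send(self, conversation_id: str, text: str) -> None:
        self.sent.append((conversation_id, text))
```

With this shape, a web framework route only needs to pass the request body to `handle_webhook`, and each service-specific channel differs only in how it parses payloads and sends replies.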
That’s the quick and lean high level architecture – in the next blog post, we will dig into the core components which make up the composable architecture of the platform: The Plugins, Channels, Workflow, Agentic Scenarios.
Till next time …