Agents are the new API (almost)

I've been thinking a lot lately about all the different ways to build and implement AI agents in small- to medium-sized businesses (SMBs). As the team iterated through the different shapes an agent could take and the roles that set agents up for the most success, we saw a pattern emerge. We originally set out to develop somewhat general-purpose agents that companies outside of our industry could deploy and see an impact from almost immediately. But as we began to develop these agents, we found that a lot of tailoring was still required for them to integrate cleanly with our various systems of record. And the more tailored an agent became, the less general it was. The dilemma was a real head-scratcher: how do we build general-purpose agents that are broadly applicable across software stacks and just work? There had to be a trade-off in there somewhere.

Around this same time, a separate project attempting to turn natural language prompts into structured API calls collided with the work we were doing on agents. The thesis was pretty simple - pass your unstructured query through an LLM acting as a filter, which then formats a payload that your target service can accept. With enough context, the LLM would be able to understand the shape of the target service's endpoint, organize the input data from the natural language query, prompt the user for any missing or forgotten data, and fire off the request on the user's behalf.
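A minimal sketch of that filter idea in Python, assuming a hand-written endpoint schema. The `EndpointSchema` class, the `create_order` endpoint, and `llm_extract` are hypothetical; `llm_extract` is a deterministic stand-in for the LLM call (it only parses `key: value` pairs) so the example runs offline. What it shows is the shape of the loop: extract what you can, then report which required fields the user still needs to supply before the request is sent.

```python
from dataclasses import dataclass, field


@dataclass
class EndpointSchema:
    """Hypothetical description of a target endpoint's payload."""
    name: str
    required: list[str]
    optional: list[str] = field(default_factory=list)


def llm_extract(query: str, schema: EndpointSchema) -> dict[str, str]:
    """Stand-in for the LLM 'filter'. A real system would prompt a model with
    the query plus the schema; this toy version only parses 'key: value' pairs."""
    allowed = set(schema.required) | set(schema.optional)
    extracted: dict[str, str] = {}
    for part in query.split(","):
        if ":" not in part:
            continue
        key, value = part.split(":", 1)
        key = key.strip().lower().replace(" ", "_")
        if key in allowed:
            extracted[key] = value.strip()
    return extracted


def build_request(query: str, schema: EndpointSchema) -> tuple[dict[str, str], list[str]]:
    """Return the payload so far plus the required fields still missing,
    which the agent would go back and ask the user for."""
    payload = llm_extract(query, schema)
    missing = [name for name in schema.required if name not in payload]
    return payload, missing


if __name__ == "__main__":
    schema = EndpointSchema("create_order", required=["customer", "sku", "quantity"])
    payload, missing = build_request("customer: Acme, sku: A-100", schema)
    print(payload)   # {'customer': 'Acme', 'sku': 'A-100'}
    print(missing)   # ['quantity']
```

In a real deployment, the "missing" list is what the LLM would turn into a follow-up question for the user rather than sending an incomplete request.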

We began to see the overlaps between the two projects and the inherent flaws that each brought to the table. In the first project, the fear was that for a given agent role (sales assistant, production scheduler, etc.), we would doom ourselves to drown in technical debt with each new customer we took on. The agents' source code would be dominated by application-specific logic, most of it for applications any given customer didn't even use. This shared codebase would also create CI/CD issues: patching a bug or adding a feature for one customer could affect every other customer running that same image. Additionally, any time the API shape changed for one of these applications, the agent could be rendered useless. The second project had a similar problem: the LLM had to be a subject matter expert on every API surface it interacted with and stay up to date as those surfaces evolved.

The pivotal moment came when we had the idea to put an agent on each side of the transaction. You can think of this as analogous to a client-server architecture. The calling agent (client) sits adjacent to the end user and takes in their intent, plus any supplemental data or information it retrieves. The answering agent (server) sits adjacent to the service and is a subject matter expert on the capabilities and requirements of that service. When the user prompts the calling agent, that agent no longer needs to be an expert on the service being called; it only needs to know how to contact the answering agent. This is what we call "brokering a transaction". The calling agent and the answering agent exchange authentication (as needed), requirements, and information. If the calling agent meets all the necessary requirements, the answering agent fulfills the request (via the service API) and returns a response. The calling agent then returns that response to the user, and the transaction is complete. If the calling agent does not meet all the requirements and cannot meet them by its own means, it returns a thorough explanation to the user in natural language.
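To make the broker flow concrete, here is a minimal Python sketch under simplifying assumptions: authentication is omitted, requirements are just named fields, and the service API is any function that takes a dict. `CallingAgent`, `AnsweringAgent`, `negotiate`, `fulfill`, and `broker` are illustrative names, not an existing framework. The point it shows is the division of knowledge: the calling agent only knows the user's data, and only the answering agent knows what the service requires.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AnsweringAgent:
    """Sits next to the service and knows what it requires (hypothetical API)."""
    requirements: list[str]
    service_api: Callable[[dict], dict]  # stand-in for the real service call

    def negotiate(self, offered: dict) -> list[str]:
        # Report which requirements the caller has not yet satisfied.
        return [req for req in self.requirements if req not in offered]

    def fulfill(self, offered: dict) -> dict:
        # Pass only the fields the service actually needs.
        return self.service_api({req: offered[req] for req in self.requirements})


@dataclass
class CallingAgent:
    """Sits next to the user; knows the user's data, not the service's API."""
    user_context: dict

    def broker(self, intent: str, answerer: AnsweringAgent) -> str:
        unmet = answerer.negotiate(self.user_context)
        if unmet:
            # Cannot meet the requirements on its own: explain in plain language.
            return (f"I couldn't {intent} because the service also needs "
                    f"{', '.join(unmet)}, which I don't have.")
        return f"Done: {answerer.fulfill(self.user_context)}"
```

For example, a dealership answerer built as `AnsweringAgent(["make", "model", "plate"], book_fn)` either books the appointment or, if the user's context lacks a plate number, yields an explanation naming `plate` as the missing piece.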

While we increased the number of agents in the overall transaction, we reduced the complexity at which the user's agent has to operate. The user's agent no longer needs to be both a subject matter expert (SME) on every tool it wants to use and a generalist on the user's preferences and methods.

This is best illustrated through an example. Say the user asks their personal assistant agent to schedule a service appointment at the car dealership. The calling agent sifts through a directory of agents and finds the answering agent for that dealership. The caller explains the situation, and the answerer replies with all the information it needs to achieve the user's desired outcome. Luckily, the caller already knows everything required, such as the vehicle's make, model, and license plate number. The caller reviews the user's calendar and proposes availability. Eventually the caller and the answerer agree on the details, and the answerer passes them into the dealership's API. Because the answerer is a subject matter expert on that API, the request is accepted. The response is passed back to the calling agent, which performs any follow-up actions it deems necessary based on the user's preferences.
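The directory lookup and scheduling steps in that example can be sketched in a few lines. This assumes the directory is a plain dict from business name to an agent address, and that both sides express availability as lists of `datetime` values: the caller proposes the user's free slots in chronological order, and the answerer accepts the first one the dealership also has open. `find_answerer`, `negotiate_slot`, "Main Street Motors", and the `agent://` address are all hypothetical, made up for illustration.

```python
from datetime import datetime
from typing import Optional

# Hypothetical agent directory: business name -> answering agent address.
DIRECTORY = {
    "Main Street Motors": "agent://mainstreetmotors/service",
}


def find_answerer(business: str) -> Optional[str]:
    """Look up the answering agent for a business, if it has one."""
    return DIRECTORY.get(business)


def negotiate_slot(user_free: list[datetime],
                   dealer_open: list[datetime]) -> Optional[datetime]:
    """Caller proposes slots earliest-first; answerer accepts the first
    one that is also open on the dealership's side."""
    open_slots = set(dealer_open)
    for proposal in sorted(user_free):
        if proposal in open_slots:
            return proposal
    return None  # no overlap: escalate back to the user
```

Returning `None` on no overlap leaves the decision of what to do next (propose new times, or ask the user) to the calling agent's escalation logic.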

This example describes the best-case scenario, but what if the calling agent has incomplete information? What if the answering agent says, "We actually offer three different maintenance packages. Which one do you want?" Here we must focus on the expectations of the user. The user may want to make every decision themselves, every time. Or they may want their agent to figure things out and solve problems on its own. It's worth restating the calling agent's overall role here: Personal Assistant Agent. What would a human personal assistant do? They would weigh the options against what they understand their boss (the user) expects and escalate accordingly. Back to our example, maybe the caller just recalls previous appointments, picks the same package, and never asks the user. Maybe it messages the user with all three options and simply waits for a reaction indicating which one they want. This is all specific to the expectations of the user.
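One way to encode "escalate based on the user's expectations" is as an explicit preference the calling agent checks before answering. The sketch below assumes just two autonomy levels and a history of past choices keyed by decision type; `Autonomy`, `UserPreferences`, `resolve_choice`, and the package names are hypothetical, and a real assistant would likely infer these preferences rather than read them from a flag.

```python
from dataclasses import dataclass, field
from enum import Enum


class Autonomy(Enum):
    ALWAYS_ASK = "always_ask"    # user wants to make every decision
    USE_HISTORY = "use_history"  # agent may repeat a previous choice


@dataclass
class UserPreferences:
    autonomy: Autonomy
    # decision type -> what the user chose last time
    history: dict[str, str] = field(default_factory=dict)


def resolve_choice(decision: str, options: list[str],
                   prefs: UserPreferences) -> tuple[str, object]:
    """Return ("decided", option) when the agent can choose on its own,
    or ("escalate", options) when the user should pick."""
    if prefs.autonomy is Autonomy.USE_HISTORY:
        previous = prefs.history.get(decision)
        if previous in options:
            return "decided", previous
    return "escalate", options
```

With `options = ["Basic", "Standard", "Premium"]`, a hands-off user who picked "Standard" last time gets `("decided", "Standard")`, while a hands-on user gets all three options back to choose from.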

At first, we got caught up throwing the majority of our resources behind the calling agent (client). The answering agent, as previously mentioned, felt like an endless black hole of one-off development. The issue we found with this approach was that we couldn't really compete with the interface, functionality, and frictionless experience of the frontier labs' LLM chat apps. Additionally, users had already built up so much context in those apps that recreating it would simply be inefficient. Now we were stuck. Neither side of the transaction seemed like an efficient place to try to gain ground. And yet something about the architecture still felt right. We just needed a better way to arrange the pieces.

Around this time, I came across an article with a very helpful visual that applied directly to our proposed architecture. In summary, we can think of traditional SaaS applications as the immaculate dining room of a nice restaurant. For the last 20 years, that dining room has been tailored to deliver the best customer (user) experience, one commit at a time. But now the paradigm is shifting. Humans are no longer the only customers. Agents are increasingly acting on users' behalf, and these agents couldn't care less how nice the dining (user) experience is. They are headed to the drive thru window to get the user's food to go. Now, to tie back into what was mentioned earlier, the agent doesn't jump through the drive thru window and start ordering the kitchen staff around. That would require an understanding of kitchen operations that the agent doesn't have and frankly doesn't care to have. Instead, the agent has a conversation with the drive thru agent - someone who understands the kitchen's operations and can translate a desired outcome into execution steps. This drive thru agent also doesn't have to prepare the order itself. Instead, it can serve as the orchestrator, delegating tasks to other agents and skill sets within the kitchen.

This analogy ultimately solidified the merit of the architecture described and helped compress incredibly complex, cutting-edge technology into a real-world analogy that just made sense to people - and to us! It also gave us another pivot to lean into as we tried to get unstuck: the drive thru.

In our original architecture, answering agents were narrowly scoped to specific pieces of software or technical functions, which is why we called them subject matter experts. The drive thru paradigm allowed us to rethink that requirement. The subject matter experts in our analogy are the kitchen staff, but the drive thru agent serves an entirely different purpose. Again, this agent most often serves as the orchestrator, marshaling available resources to deliver an outcome to the calling agent. Given that function, this agent is less about technical knowledge and more about networking and communicating with other agents, both inside and potentially outside of its organization. And for an agent like this to be effective, it must have access to a network of agents. This is the current state of what we are building: a network of drive thru agents. Agents that can be accessed publicly from an LLM app via MCP (Model Context Protocol). Agents that, when given a task outside of their domain, can search for and find skilled agents to delegate that task to. Agents that your customers can access directly, unauthenticated, to get information about your business, exchange information, participate in commerce, etc.
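The routing behavior of a drive thru agent can be sketched as a search over a network of skilled agents, checking its own organization first and then outside partners. This is a minimal, in-process Python sketch under stated assumptions: skills are plain string tags, each agent is just a callable, and MCP exposure and real agent discovery are left out entirely. `SkilledAgent`, `DriveThruAgent`, and the example agents are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SkilledAgent:
    """A 'kitchen staff' agent with a set of skill tags (hypothetical)."""
    name: str
    skills: frozenset[str]
    handle: Callable[[str], str]


class DriveThruAgent:
    """Orchestrator: owns no skills itself, only knows who to ask."""

    def __init__(self, internal: list[SkilledAgent], external: list[SkilledAgent]):
        self.internal = internal  # agents inside the organization
        self.external = external  # partner agents outside it

    def find(self, skill: str) -> Optional[SkilledAgent]:
        # Prefer in-house agents, then fall back to the wider network.
        for agent in self.internal + self.external:
            if skill in agent.skills:
                return agent
        return None

    def serve(self, skill: str, request: str) -> str:
        agent = self.find(skill)
        if agent is None:
            return f"Sorry, no agent in our network can handle '{skill}' yet."
        return f"{agent.name}: {agent.handle(request)}"
```

Checking internal agents before external ones is one plausible policy, not the only one; a real network might rank candidates by cost, latency, or reputation instead.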

In parting, we see one more analogy that could prove useful for imagining one possible future: the website. 30 years ago, the website reduced the friction of exchange in both B2B and D2C interactions. It allowed customers to self-serve, 24/7. And we saw consumer behavior follow the technology, not the other way around. Drive thru agents have the potential to reduce this friction by yet another order of magnitude, and it is our belief that consumer behavior will follow technology again. We see a world where you perform product and service discovery directly through an LLM chat app instead of searching the web. And through a network of real-time drive thru agents, you are returned the best solution for your problem, every time. We see a future where every small business prioritizes a drive thru agent over a website, because a drive thru agent provides a consistent user experience and allows companies to compete less on marketing and more on the merit of the offering.

At this point, I've drifted almost entirely away from thinking that agents are the new API. They may be so much more - they may be the new website.

This post serves as an ongoing thesis on how the world of AI agents is evolving and where it might be heading.