What in the world IS an AI agent?

Emissary

When 'What' Matters More Than 'How'

21st Nov

3 mins

BLOG

No scrolling session on LinkedIn or Twitter is complete without a few (hundred) mentions of AI agents. Apparently, we're all building them and they're about to take our jobs, but a concrete definition seems as elusive as successful implementations. And every description of an agent sounds just like an AI-enabled software workflow. But how do you build something you can't define?

Here's how we think about it: Agents transact on outcomes, workflows transact on processes. That is, an AI agent allows users to define desired outcomes (and potentially a metric to measure deltas between delivered and desired), while workflows compel users to outline a specific process to be followed.
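One way to picture the distinction is the minimal Python sketch below. The workflow is handed a fixed list of steps, while the agent is handed only a scoring function and returns whichever candidate pathway scores best. Every name here (`run_workflow`, `run_agent`, the toy string pathways) is a hypothetical illustration, not part of any real agent framework, and a real agent would search for pathways rather than pick from a short hand-written list.

```python
from typing import Callable, Iterable

Step = Callable[[str], str]

def run_workflow(steps: Iterable[Step], task: str) -> str:
    """Process interface: the user dictates HOW, step by step."""
    result = task
    for step in steps:
        result = step(result)
    return result

def run_agent(candidates: Iterable[Iterable[Step]],
              score: Callable[[str], float],
              task: str) -> str:
    """Outcome interface: the user only supplies WHAT good looks like (score)."""
    outputs = [run_workflow(path, task) for path in candidates]
    return max(outputs, key=score)

# Toy steps, assumed purely for illustration.
strip, upper, title = str.strip, str.upper, str.title

# Workflow: the user prescribes the exact steps.
workflow_output = run_workflow([strip, upper], "  quarterly report ")

# Agent: the user only states the desired outcome (trimmed, title-cased text).
def score(text: str) -> float:
    return 1.0 if text == text.strip() and text.istitle() else 0.0

agent_output = run_agent(
    [[strip, upper], [strip, title], [upper]],
    score,
    "  quarterly report ",
)
```

In this sketch the user of `run_agent` never says which steps to run; they only define the evaluation, which is the part the rest of this post argues people are better at.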

“Just tell me WHAT to do, not HOW to do it and I'll figure it out.”

The agentic paradigm is not entirely new; it is actually the basis of all deep learning, where we provide desired outcomes for inputs and allow models to learn representations that enable them to optimize for those outcomes. But it has always been challenging to realize, given the vastness of the search space and the lack of good evaluation data. Until now, we have mostly enabled learning and evaluation through data, leading to a very limited usage universe. The true unlock of the agentic universe will emerge from evaluation by function instead of data. Here's what we're excited and not so excited about:

Why agents might win over workflows

Digitizing business processes is HARD

Ask someone to write down how they do their job, every single step, at least to a level that could reproduce an approximation of their current outcomes. Try this yourself: can you formalize how you REALLY do your job?

You'll find immediately that your workflows are complex, intricate and hard to formalize. Most task workflows are not formalized at all. In the few extremely structured jobs where they are, there are established workflows, and then there are practiced workflows - that is, how people are supposed to do their job or say they do their job is very different from how they actually do it. Now imagine trying to convince workers to do this, while conveying that the mere act of being honest about their workflows puts their livelihoods at risk. To even convert a basic workflow into a digitized one, it would first have to be formalized, then digitized, then monitored, managed and constantly updated by existing executors, all in the face of opposition by those same executors. This is why the hundreds of 'AI Agent' platforms that are truly just AI workflow platforms are struggling with not only retention but also acquisition.

People are great evaluators, bad generators. And they LOVE evaluating.

For most people, judging outcomes is far easier than generating work product. Ask any white-collar worker whether a specific work product they're responsible for is good or not, and they'll be able to evaluate it instantly, and even tell you exactly why in most cases. Even at medium skill levels, ascertaining the quality of work is far easier than generating high quality work.

It's also a lot more pleasurable than sitting down and writing out a workflow. That's why we have far more critics than creators: evaluating work is much more enjoyable than doing it. From armchair coaches for football games to the millions of restaurant critics, humans truly LOVE judging. An approach to digitizing enterprise intelligence that relies on aligning outcomes therefore has an inherent advantage over a process interface, because the human in the loop is a far more willing supporter of the paradigm.

Exploring execution pathways unreasonable to humans becomes possible

Humans tend to have relatively high intellectual inertia. If we do things a certain way to a satisfactory level, we tend not to explore new execution pathways, even if significantly more optimal pathways exist. We also have real limitations on time, attention and information persistence. As such, an approach disengaged from the human process enables us to explore a whole expanded world of execution pathways while maintaining outcome alignment. With agents, we no longer need to follow human execution pathways; we can expand the scope of the possible to find the optimal.

Why agents might lose

Lack of scalable evaluation

Given the larger search space of an outcomes interface and sparseness of data at the individual level, the only meaningful pathway to creating a universe of agents is creating evaluation functions instead of evaluating through data. That is, focus on digitizing the evaluator, instead of the process. This is extremely challenging, but extremely necessary, because the last decade of AI has shown us that the volume of data needed to be used for evaluation is simply unsustainable.

The previously explored alternative of federated learning, in which companies share evaluation data, would reduce their individual task-level alpha to 0, destroying any business advantage. Without a scalable, generalizable way to create specialized evaluation functions, the agentic paradigm could be forcibly reduced to a workflow paradigm that can be achieved without outcome-level alignment.

Compounding executor uncertainty

Generalized models as executors are failing us. Exploring a near-infinite search space of potential execution pathways is already challenging, but doing so with highly unreliable unit executors would be pure foolishness. An agentic universe presupposes a world where unit-level executions are guaranteed to at least human levels of certainty. This is easy to see through the compounding of uncertainty: if each step provides an 80% guarantee of desired output (the widely accepted upper bound of performance on a prompt execution), an agent traversing a 4-step pathway will have a 41% success rate (0.8^4 ≈ 0.41). Most enterprises will never be able to sustainably keep that in production, and will soon revert to human approaches.
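The arithmetic behind that figure can be checked with a short Python sketch. It assumes every step succeeds independently with the same probability, which is a simplification, and the function names are illustrative only.

```python
def pathway_success(step_success: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return step_success ** steps

def required_step_success(target: float, steps: int) -> float:
    """Per-step reliability needed to reach a target end-to-end success rate."""
    return target ** (1.0 / steps)

four_step = pathway_success(0.80, 4)      # 0.4096, i.e. about 41%
ten_step = pathway_success(0.80, 10)      # about 0.107
needed = required_step_success(0.95, 4)   # about 0.987
```

Read the other way round, the same formula says that a 4-step pathway needs roughly 98.7% reliability per step to succeed 95% of the time end to end, under the same independence assumption.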

Overpromise, underdeliver

A key risk with agents is managing expectations. With an outcome-focused approach, one must account for the fact that, unlike the past few decades of software, performance will not climb quickly and then plateau, but rather start poorly and improve exponentially over time through continued evaluation and feedback. All AI systems are poor to start, and managing that expectation will be key to avoid souring a market that is once bitten, twice shy.

Where do Agents make sense? Now and Forever.

Should everything be an agent? The easiest way to answer that question is to ask yourself whether you'd plan your morning routine from scratch every day. That would be absurd. We create efficiencies by planning sporadically and executing frequently. Agentic systems could be amazing, but planning at the same cadence as executing would STILL be highly inefficient in most commercial scenarios. That's why even the most dynamic white-collar work, such as consulting firms working on drastically different objectives across drastically different domains, relies on systems and standards as far as possible: repeatable workflows that are reliable, cost-efficient and consistent.
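As a rough Python sketch of planning sporadically and executing frequently: the plan is produced once and cached, and every later execution reuses it. The `plan` and `execute` functions and the three-step plan are hypothetical stand-ins; the caching uses the standard library's `functools.lru_cache`.

```python
from functools import lru_cache

plan_calls = 0  # counts how often the (expensive) planner actually runs

@lru_cache(maxsize=None)
def plan(objective: str) -> tuple:
    """Stand-in for an expensive agentic planning step (hypothetical)."""
    global plan_calls
    plan_calls += 1
    return ("gather inputs", f"do: {objective}", "review output")

def execute(objective: str) -> list:
    """Cheap, repeatable execution that reuses the cached plan."""
    return [f"ran {step}" for step in plan(objective)]

# Thirty "mornings" of execution, but only one planning call.
for _ in range(30):
    last_run = execute("morning routine")
```

After the loop, `plan_calls` is 1: the planner ran once and the resulting workflow was executed thirty times.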

Agentic systems of the future will be best suited to spaces which require dynamic planning, which most current white-collar jobs do not. A consulting firm or strategy team might consistently use agents to explore net new pathways, but outside of that, we believe that agentic systems will focus on initiating and improving workflows, as opposed to executing them - and as such, most nontechnical people will interface sporadically with agents, while overseeing their products.

Working on AI agents and looking to build consistent unit components and executors that provide the certainty and determinism you need? Stuck dealing with compounding uncertainty blockers? Finetune your own specialized classifiers, SLMs and embedding models on Emissary today!
