Guide for engineers and architects: building reliable AI agents in production

Written by
François De Fitte
Cofounder @Basalt
Published on
July 20, 2025

Introduction

Artificial intelligence has profoundly transformed software development, with AI agents now replacing and reshaping digital products. Beyond simple chatbots that merely respond to questions, modern AI agents can reason, take actions, manage context, and interact with other tools to solve complex business problems. Building production-quality AI agents that are reliable, scalable, and maintainable requires a new engineering discipline. While AI may seem magical, experience shows that fundamental engineering principles remain essential, even as large language models (LLMs) become far more capable.

Beyond the magic: The need for robustness in production

Many products branded as “AI agents” are essentially deterministic code with LLM calls inserted at key points to make the experience feel “magical.” The classic agent approach, giving an LLM a goal and a set of tools, then looping until the goal is reached, is often insufficient for production demands.

Builders often start with plug-and-play frameworks that accelerate initial development. However, these frameworks typically hit a quality ceiling of 70–80%, which is inadequate for most customer-facing features. Exceeding this threshold requires regaining control and redesigning the architecture, often forcing teams to start over. Effective agents are primarily composed of software engineering components, not just prompt loops.

The “12-Factor Agents” principles address the critical question: What are the foundations needed to build LLM-powered software that is robust enough for production? These principles aim to make AI software more reliable, scalable, and maintainable.

The anatomy of a production agent: a core loop

At the heart of every AI agent is a fundamental loop consisting of three main steps:

  1. The LLM determines the next workflow step, producing a structured JSON output (a “tool call”).
  2. Deterministic code executes the tool call.
  3. The result is appended to the context window.

This loop repeats until the next step signals “done.” The initial context can be a user message, event, or webhook.
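
The three steps above can be sketched as a short loop. This is a minimal illustration, not a reference implementation: fake_llm stands in for a real model call, and the TOOLS registry, the event shapes, and the max_steps budget are all assumptions made for the example.

```python
import json
from typing import Callable

# Hypothetical tool registry; names and behavior are illustrative only.
TOOLS: dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(a + b),
}


def fake_llm(context: list[dict]) -> str:
    """Stand-in for a real model call: returns the next step as JSON text."""
    # Once a tool result is in the context, report "done"; otherwise ask for a tool.
    if any(event["type"] == "tool_result" for event in context):
        return json.dumps({"tool": "done", "args": {"answer": context[-1]["data"]}})
    return json.dumps({"tool": "add", "args": {"a": 2, "b": 3}})


def run_agent(initial_event: dict, llm=fake_llm, max_steps: int = 10) -> str:
    context = [initial_event]
    for _ in range(max_steps):
        # Step 1: the LLM chooses the next step as structured JSON.
        step = json.loads(llm(context))
        if step["tool"] == "done":
            return step["args"]["answer"]
        # Step 2: deterministic code executes the tool call.
        result = TOOLS[step["tool"]](**step["args"])
        # Step 3: the result is appended to the context window.
        context.append({"type": "tool_result", "data": result})
    raise RuntimeError("step budget exhausted before the agent signalled done")


answer = run_agent({"type": "user_message", "data": "What is 2 + 3?"})
```

The step budget is a design choice rather than part of the loop itself: it keeps a misbehaving model from looping indefinitely.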

Key factors for reliable agents

1. Natural language to tool calls

Since LLMs are inherently limited to “text-in, text-out,” an agent’s ability to trigger external tools is crucial to extend its capabilities. This allows the agent to interact with the outside world beyond mere conversation. For example, when asked about the weather, the agent formats a tool call like get_weather("San Francisco"). External software executes this action, retrieves real-time data or calls APIs, and returns the result as an “observation” that the agent uses to generate its final response. This transforms agents from passive responders into active, world-aware problem solvers.
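
As a rough sketch of the weather example, the model's text output can be parsed into a typed tool call and handed to ordinary code. The JSON shape, the ToolCall class, and the stubbed get_weather return values are assumptions for illustration; a real implementation would call a weather API.

```python
import json
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    arguments: dict


def parse_tool_call(llm_output: str) -> ToolCall:
    """Turn the model's JSON text into a typed tool call."""
    payload = json.loads(llm_output)
    return ToolCall(name=payload["name"], arguments=payload["arguments"])


def get_weather(city: str) -> dict:
    # Stub with made-up values; a real version would query a weather service.
    return {"city": city, "temperature_c": 18, "conditions": "fog"}


# Example of what the model might emit for "What's the weather in San Francisco?"
llm_output = '{"name": "get_weather", "arguments": {"city": "San Francisco"}}'

call = parse_tool_call(llm_output)
observation = get_weather(**call.arguments)  # fed back to the model as context
```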

2. Own your prompts

Prompt engineering is the craft of designing and refining instructions to guide LLMs effectively. For product managers, engineers, and architects, mastering prompt engineering is essential to prototype, test, and iterate AI use cases efficiently. Clear, contextual, and well-structured prompts reduce engineering dependencies, improve output quality, and ensure reliability and reproducibility.
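
One way to own a prompt is to keep it in the codebase as a named, versioned template instead of leaving it inside a framework. The sketch below assumes a hypothetical support use case; the template text, the SUPPORT_PROMPT_V2 name, and the parameters are illustrative.

```python
from string import Template

# Hypothetical prompt, kept in version control like any other source file.
SUPPORT_PROMPT_V2 = Template(
    "You are a support assistant for $product.\n"
    "Answer in at most $max_sentences sentences.\n"
    "Customer message:\n$message\n"
)


def render_prompt(product: str, message: str, max_sentences: int = 3) -> str:
    # substitute() raises KeyError if a placeholder is left unfilled,
    # so a broken template fails loudly instead of reaching the model.
    return SUPPORT_PROMPT_V2.substitute(
        product=product, message=message, max_sentences=max_sentences
    )


prompt = render_prompt("Acme", "I cannot log in.")
```

Because the prompt is plain code, it can be reviewed, diffed, and tested like the rest of the system.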

3. Own your context window

Unlike traditional bots, true agents have memory. Managing context means tracking conversation history and remembering prior actions to adapt behavior accordingly. This “memory” is vital for multi-step workflows and personalized experiences, avoiding repetitive questions and maintaining user journey consistency. A genuine agent leverages all accumulated information as active memory within its reasoning loop.
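
A minimal sketch of context management, assuming the agent's history is stored as a list of typed events: the code decides which events reach the model and how they are serialized. The tag-style format and the max_events cutoff are illustrative choices, not a prescribed format.

```python
def build_context(events: list[dict], max_events: int = 20) -> str:
    """Serialize the most recent events into the text the model will see."""
    if max_events <= 0:
        raise ValueError("max_events must be positive")
    recent = events[-max_events:]  # keep only the tail of the history
    return "\n".join(f"<{e['type']}>{e['data']}</{e['type']}>" for e in recent)


history = [
    {"type": "user_message", "data": "Where is my order?"},
    {"type": "tool_call", "data": "lookup_order(42)"},
    {"type": "tool_result", "data": "shipped"},
]
context = build_context(history, max_events=2)
```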

4. Tools are just structured outputs

Interacting with tools is simply generating structured outputs, usually JSON, that deterministic code executes. This API-based integration enables agents to fetch data, trigger actions, update records, or automate external processes.
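
The sketch below treats a tool call as nothing more than validated structured data. It assumes a hypothetical ticketing use case; the intent names and dataclasses are invented for the example. The model's JSON is parsed into a typed object, and plain code decides what to do with it.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CreateTicket:
    title: str
    priority: str


@dataclass(frozen=True)
class RequestClarification:
    question: str


# Hypothetical mapping from the "intent" field to the expected structure.
INTENTS = {
    "create_ticket": CreateTicket,
    "request_clarification": RequestClarification,
}


def parse_step(raw: str):
    payload = json.loads(raw)
    intent = payload.pop("intent")
    if intent not in INTENTS:
        raise ValueError(f"unknown intent: {intent}")
    # Raises TypeError if the model's fields do not match the dataclass.
    return INTENTS[intent](**payload)


def execute(step) -> str:
    if isinstance(step, CreateTicket):
        return f"ticket created: {step.title} ({step.priority})"
    if isinstance(step, RequestClarification):
        return f"asked user: {step.question}"
    raise TypeError(f"unsupported step: {step!r}")


raw = '{"intent": "create_ticket", "title": "Login fails", "priority": "high"}'
outcome = execute(parse_step(raw))
```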

5. Unify execution state and business state

This principle means keeping the agent’s internal execution state (where it is in the workflow) aligned with the actual business state it acts on, so that the agent does not take inconsistent or inappropriate actions.
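
One possible way to keep the two aligned, offered here as an assumption rather than a prescribed design, is to store a single event log and derive the execution state from it instead of tracking it separately. The event types below are hypothetical.

```python
def derive_state(events: list[dict]) -> dict:
    """Compute where the agent is in its workflow from the event log alone."""
    state = {"status": "running", "steps_taken": 0, "awaiting_approval": False}
    for event in events:
        kind = event["type"]
        if kind == "tool_result":
            state["steps_taken"] += 1
        elif kind == "approval_requested":
            state["awaiting_approval"] = True
            state["status"] = "paused"
        elif kind == "approval_granted":
            state["awaiting_approval"] = False
            state["status"] = "running"
        elif kind == "done":
            state["status"] = "finished"
    return state


log = [
    {"type": "user_message"},
    {"type": "tool_result"},
    {"type": "approval_requested"},
]
state = derive_state(log)
```

Since the state is recomputed from the log, there is no second copy that can drift out of sync with what actually happened.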

6. Own your control flow

Acting as the “conductor,” control flow manages the sequence of actions, handles exceptions, and ensures smooth end-to-end operation. It orchestrates branching paths, fallback scenarios, and escalations to humans when needed. This orchestration is essential for real business workflows beyond simple Q&A sessions.
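
A minimal sketch of owned control flow: after the model proposes a step, ordinary code decides whether to continue, finish, or hand off to a human. The Action enum, the HIGH_RISK_TOOLS set, and the error threshold are assumptions chosen for illustration.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    PAUSE_FOR_HUMAN = "pause_for_human"
    FINISH = "finish"


# Hypothetical tools that should never run without human sign-off.
HIGH_RISK_TOOLS = {"issue_refund", "delete_account"}


def next_action(step: dict, consecutive_errors: int, max_errors: int = 3) -> Action:
    # Fallback: repeated failures escalate instead of retrying forever.
    if consecutive_errors >= max_errors:
        return Action.PAUSE_FOR_HUMAN
    if step["tool"] == "done":
        return Action.FINISH
    # Branch: risky operations wait for a human decision.
    if step["tool"] in HIGH_RISK_TOOLS:
        return Action.PAUSE_FOR_HUMAN
    return Action.CONTINUE


decision = next_action({"tool": "issue_refund"}, consecutive_errors=0)
```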

7. Small, focused agents

Multi-agent orchestration, where specialized agents collaborate to divide tasks and leverage expertise, enables parallel processing and robust solutions for complex, multi-step business processes. Designing modular agents with clear objectives is preferable to building monolithic, all-in-one agents.
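
As a rough illustration of modular design, each agent can be described by a narrow prompt and a short tool list, with a deterministic router choosing between them. The agent names, prompts, and tools below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    name: str
    system_prompt: str
    tools: tuple[str, ...]


# Hypothetical agents, each with one clear objective and a small tool set.
AGENTS = {
    "billing": AgentSpec(
        name="billing",
        system_prompt="Resolve invoice and refund questions.",
        tools=("lookup_invoice", "issue_refund"),
    ),
    "shipping": AgentSpec(
        name="shipping",
        system_prompt="Answer delivery status questions.",
        tools=("track_package",),
    ),
}


def route(task_category: str) -> AgentSpec:
    """Pick the focused agent for a task; unknown categories fail loudly."""
    try:
        return AGENTS[task_category]
    except KeyError:
        raise ValueError(f"no agent handles category: {task_category}") from None


agent = route("shipping")
```

Keeping each tool list short limits what any single agent can do, which makes its behavior easier to test.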

A modular approach to AI integration

The fastest path for developers to deliver high-quality AI software to customers is to integrate small, modular agent concepts into existing products rather than rewriting from scratch. These modular concepts can be defined and applied by most qualified software engineers—even those without deep AI experience.

Conclusion

Building reliable AI agents for production goes far beyond the “magic” of LLMs. It requires a deep understanding of engineering principles and rigorous application of techniques that support reliability, scalability, and maintainability. The “12-Factor Agents” provide a roadmap to achieve the production-grade quality indispensable for real-world AI systems.
