The product manager’s role in agentic AI innovation

Published on September 22, 2025

The product manager, master builder of agentic AI: accelerating innovation without coding


The emergence of AI agents marks a profound transformation in the software world: rather than merely enhancing existing tools, agents actively replace them, redefining digital products. In this new era, the product manager (PM) takes on a strategic, central role in AI agent development, especially on the non-technical side. Building AI agents raises new questions, risks, and organizational changes, demanding a different mindset for product design, team collaboration, and value measurement. PMs are now positioned as true leaders in defining how AI delivers value, ensuring alignment with customer needs and company strategy.


Prompt engineering: the PM’s “superpower”


Prompt engineering is the art of designing and refining instructions to effectively guide large language models (LLMs). For product managers, it is a genuine “superpower,” enabling them to prototype, test, and iterate rapidly on AI use cases without writing a single line of code. This skill reduces reliance on engineers for testing and prototyping, improves the quality of AI-based features, and bridges the gap between business needs and LLM capabilities.

Well-executed prompt engineering translates product intent into actionable AI behaviors. It is a strategic lever allowing PMs to lead AI initiatives and accelerate product innovation. By mastering prompt design, PMs can swiftly transform ideas into working AI demos, which speeds up prototyping and validation and drastically shortens innovation cycles. Carefully crafted prompts reduce hallucinations, errors, and unpredictable AI behaviors, producing more reliable outputs aligned with user expectations. Moreover, prompt engineering acts as a universal language that facilitates cross-functional communication between product, design, and engineering teams.
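As an illustration, a reusable prompt template lets a PM encode product intent once and iterate only on the wording. The sketch below is a minimal example under assumed conventions; the role, constraints, and placeholders are hypothetical, and no specific LLM API is implied:

```python
# Minimal sketch of a reusable prompt template a PM might iterate on.
# The role, constraints, and placeholder names are illustrative
# assumptions, not a prescribed format.

SUPPORT_SUMMARY_PROMPT = """\
You are a customer-support assistant for a payments product.

Task: Summarize the ticket below in at most {max_sentences} sentences.
Constraints:
- Use a neutral, professional tone.
- Mention the customer's core issue first.
- If key details are missing, say "insufficient information" instead of guessing.

Ticket:
{ticket_text}
"""

def build_prompt(ticket_text: str, max_sentences: int = 3) -> str:
    """Fill the template; each revision of the wording is a new testable version."""
    return SUPPORT_SUMMARY_PROMPT.format(
        ticket_text=ticket_text.strip(),
        max_sentences=max_sentences,
    )

if __name__ == "__main__":
    print(build_prompt("Card payments fail with error 402 since Monday."))
```

Because the template is just data, each wording change can be versioned and compared against the same set of test tickets, which is what makes rapid, no-code-adjacent iteration possible.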

User feedback and robust feedback loops


Building a performant AI agent involves more than monitoring its output; it requires a systematic feedback loop that turns every failure into a concrete improvement opportunity. A robust feedback loop is essential for continuous agent improvement, converting failures and edge cases into repeatable tests that help the agent learn, adapt, and become more resilient over time.

Evaluating an AI agent goes beyond checking whether the output “looks good”: it means verifying that the agent fulfills the user’s request, is factually accurate, safe, and brand-compliant, works across scenarios, and remains consistent in production. Thanks to modern tools and structured no-code workflows, PMs and non-technical team members can now fully contribute to, and even lead, AI feature evaluation.

These no-code tools allow PMs to:

  • Define and apply targeted evaluation criteria at scale: use pre-built evaluators or “LLM-as-a-Judge” style prompts to assess relevance, factuality, tone, clarity, hallucinations, structural format, and product alignment.
  • Generate structured evaluation reports: establish a baseline for model behavior, visualize scores across iterations, and share results with teams without spreadsheets or manual tagging.
  • Track regressions and monitor quality in production: set thresholds, receive alerts when outputs degrade, and maintain a continuous feedback loop.
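The “LLM-as-a-Judge” pattern mentioned above amounts to prompting a second model to grade the first model’s output against explicit criteria. The rubric, scoring scale, and JSON schema below are illustrative assumptions, and the judge call is stubbed so the sketch runs without a real LLM client:

```python
# Illustrative "LLM-as-a-Judge" rubric prompt. The criteria, 1-5 scale,
# and JSON schema are assumptions, not a fixed standard.

import json

JUDGE_PROMPT = """\
You are an impartial evaluator. Score the assistant answer below
against each criterion from 1 (poor) to 5 (excellent).

Criteria: relevance, factuality, tone, clarity.

Question: {question}
Answer: {answer}

Respond with JSON only, e.g. {{"relevance": 4, "factuality": 5, "tone": 4, "clarity": 3}}
"""

def call_judge_model(prompt: str) -> str:
    """Stub returning a fixed verdict; swap in a real LLM client here."""
    return '{"relevance": 4, "factuality": 5, "tone": 4, "clarity": 4}'

def judge(question: str, answer: str) -> dict:
    """Ask the judge model for scores and parse its JSON verdict."""
    prompt = JUDGE_PROMPT.format(question=question, answer=answer)
    return json.loads(call_judge_model(prompt))

scores = judge("What is Stripe Radar?", "Stripe Radar detects fraudulent payments.")
print(scores)  # → {'relevance': 4, 'factuality': 5, 'tone': 4, 'clarity': 4}
```

Requesting structured JSON rather than free-text commentary is what makes the verdicts aggregatable into the score dashboards and regression alerts described above.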

The evaluation process fits within a five-step cycle: define test cases, configure evaluators, run agents, analyze results, and iterate. Every production failure should be logged with full context (input, output, evaluation) and immediately added as a new test scenario. The goal is to turn every failure into a reproducible test and resolve it swiftly. Continuous monitoring of AI systems is crucial because the real world is dynamic, and even high-performing models may degrade over time due to data or concept drift. Evaluation and monitoring form a virtuous cycle in which insights from production monitoring refine offline evaluation criteria. Evaluation is not a one-time event but an iterative process throughout the LLM product lifecycle.
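The five-step cycle can be sketched as a small test harness. Everything here is a hypothetical sketch: the test cases and keyword evaluator are stand-ins for richer criteria, and the agent call is stubbed:

```python
# Sketch of the cycle: define test cases, configure an evaluator,
# run the agent, analyze results, iterate. Names are illustrative
# assumptions; the agent call is stubbed.

from dataclasses import dataclass, field

@dataclass
class TestCase:
    input_text: str
    expected_keyword: str  # simple stand-in for a richer evaluator

@dataclass
class EvalReport:
    passed: int = 0
    failed: list = field(default_factory=list)

def run_agent(prompt: str) -> str:
    """Stub for an LLM/agent call; replace with a real client."""
    return f"Summary: {prompt}"

def keyword_evaluator(output: str, case: TestCase) -> bool:
    """Minimal evaluator: does the output mention the expected keyword?"""
    return case.expected_keyword.lower() in output.lower()

def run_suite(cases: list) -> EvalReport:
    report = EvalReport()
    for case in cases:
        output = run_agent(case.input_text)
        if keyword_evaluator(output, case):
            report.passed += 1
        else:
            # Log full context so the failure becomes a new test scenario.
            report.failed.append({"input": case.input_text, "output": output})
    return report

cases = [
    TestCase("Refund request for order 1042", "refund"),
    TestCase("Card declined with code 402", "declined"),
]
report = run_suite(cases)
print(f"passed={report.passed}, failed={len(report.failed)}")  # → passed=2, failed=0
```

Each entry in `report.failed` carries the input/output pair needed to reproduce the failure, so appending it to `cases` on the next iteration closes the feedback loop described above.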

Stripe’s approach: bottom-up experimentation and continuous feedback


Stripe’s approach exemplifies how PMs can accelerate adoption and business impact of AI agents. Stripe adopted a fast, bottom-up approach to AI agent deployment. Small teams of engineers rapidly prototype new AI features and roll them out across the company, letting real usage reveal the most relevant use cases.

This experimentation-driven mindset enabled Stripe to move quickly, continually refine agents based on user feedback, and drive adoption by technical and non-technical teams alike. AI agents are integrated into Stripe’s core products, such as the AI assistants in Stripe Radar and Sigma, which let users interact with complex systems in natural language, breaking down technical barriers and broadening access to advanced capabilities. Real value comes from widespread adoption: by making agents usable by all, Stripe ensures its AI features have a measurable commercial impact. Deployment doesn’t stop at launch; Stripe continuously measures adoption, collects user feedback, and iterates on agents to improve their performance and reliability. AI agents not only make products more powerful but also accelerate internal productivity and transform product team workflows.
