Basalt is the end-to-end platform to evaluate, monitor and improve your agents.
Compare models, prompts, or agents. See what performs best in minutes, not days.
Spot errors with AI evaluators. Choose from 50+ templates or design your own.
Iterate on prompts and multi-tool agents directly from the UI.
Trace your agents and monitor usage directly in production.
Evaluate your workflow at each step to ensure agent quality.
Set criteria and get alerts for production errors.
Co-pilot
Multi-model
SDK
Versioning
Co-pilot guidance
Co-pilot improvements