
How to Build Human-in-the-Loop for AI Agents (Practical Guide)

Many AI automations break down on edge cases their models weren't trained for. This guide shows how to pause automation workflows at strategic points for human approval, keeping the speed benefits of AI while preventing costly mistakes in banking, healthcare, and other high-stakes applications.

The Oversight Problem in AI Automation

Every business automating processes eventually hits the same wall - some decisions are too risky to fully delegate to AI. Banking agents can check balances automatically, but should they transfer $50,000 without human approval? Medical diagnosis systems might suggest treatments, but should they proceed without doctor review?

This is where human-in-the-loop patterns become essential. By strategically pausing workflows at predetermined risk points, businesses maintain automation efficiency while preventing costly mistakes. The challenge lies in implementing these guardrails without breaking the natural flow of operations.

Key insight: Human oversight isn't about distrusting AI - it's about recognizing that some edge cases will always require human judgment. Implementing approval workflows lets you deploy AI faster since you don't need perfect automation before going live.

Method 1: LLM as Router with Pydantic

The first approach structures the agent as a decision router using Pydantic models for validation. At 4:32 in the video, we see how this works for a banking agent handling balance checks, deposits, and transfers.

The system defines an Action model with three possible types (check_balance, deposit, transfer) and implements validation rules. Any transfer over $100 automatically requires confirmation, enforced through model validators rather than relying on the LLM's judgment.

Implementation tip: By baking approval requirements into the Pydantic model (like transfer amounts >$100), you stop the LLM from skipping the safeguard, whether by mistake or because a manipulated prompt convinced it to.

When executed, the workflow first generates an action plan, then filters for the operations that need approval. Actions that don't need approval run immediately. For transfers over the threshold, it pauses execution and waits for manual confirmation before continuing.
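A minimal sketch of the router flow, assuming Pydantic v2. The field names (`type`, `amount`, `to_account`, `requires_approval`), the `Plan` wrapper, and the `execute` stub are illustrative, not the video's exact code, and the LLM call itself is left out: any model output that parses as `Plan` JSON works. The `model_validator` flags every transfer above $100 regardless of what the model wrote, and `run_plan` only pauses on flagged actions, using `input()` as in the terminal demo.

```python
from typing import Callable, Literal, Optional

from pydantic import BaseModel, model_validator

APPROVAL_THRESHOLD = 100.0  # transfers above this amount need a human


class Action(BaseModel):
    type: Literal["check_balance", "deposit", "transfer"]
    amount: Optional[float] = None
    to_account: Optional[str] = None
    requires_approval: bool = False

    @model_validator(mode="after")
    def enforce_rules(self) -> "Action":
        # Money-moving actions must carry a positive amount.
        if self.type in ("deposit", "transfer") and (self.amount is None or self.amount <= 0):
            raise ValueError(f"{self.type} needs a positive amount")
        # The approval rule lives in code: whatever the LLM wrote in
        # requires_approval, large transfers are always flagged.
        if self.type == "transfer" and self.amount > APPROVAL_THRESHOLD:
            self.requires_approval = True
        return self


class Plan(BaseModel):
    actions: list[Action]


def execute(action: Action) -> str:
    # Hypothetical stand-in for the real banking call.
    return f"executed {action.type}"


def ask_in_terminal(action: Action) -> bool:
    reply = input(f"Approve transfer of ${action.amount:.2f} to {action.to_account}? [y/N] ")
    return reply.strip().lower() == "y"


def run_plan(plan: Plan, confirm: Callable[[Action], bool] = ask_in_terminal) -> list[str]:
    results = []
    for action in plan.actions:
        # Only flagged actions pause; everything else runs immediately.
        if action.requires_approval and not confirm(action):
            results.append(f"skipped {action.type}")
            continue
        results.append(execute(action))
    return results
```

Usage might look like `run_plan(Plan.model_validate_json(llm_output))`, where `llm_output` is the model's JSON response. Swapping in a different `confirm` callable is also how the same plan logic could later be connected to a web or notification-based approval.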

Method 2: Tool Calling Interception

The second approach intercepts specific tool calls during execution. Rather than pre-planning all actions, this method lets the LLM decide which tools to use dynamically, then intercepts high-risk operations.

At 8:15 in the tutorial, we see how the system monitors tool calls in real time. When it detects a transfer, it checks the amount against the approval threshold. Small transfers proceed automatically, while large ones trigger the confirmation workflow.
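The interception step can be sketched as a dispatcher that sits between the LLM's tool calls and the real functions. This assumes OpenAI-style tool calls with a `name` and a JSON-string `arguments` field; the tool implementations and the `confirm` callback are hypothetical stand-ins, and the loop that sends results back to the model is omitted.

```python
import json
from typing import Any, Callable

APPROVAL_THRESHOLD = 100.0  # the $100 threshold used throughout this guide


def check_balance(account: str) -> str:
    return f"balance for {account}: $1,250.00"  # hypothetical fixed value


def transfer(to_account: str, amount: float) -> str:
    return f"transferred ${amount:.2f} to {to_account}"  # hypothetical tool


TOOLS: dict[str, Callable[..., str]] = {
    "check_balance": check_balance,
    "transfer": transfer,
}


def needs_approval(name: str, args: dict[str, Any]) -> bool:
    # The interception rule: only large transfers count as high risk here.
    return name == "transfer" and float(args["amount"]) > APPROVAL_THRESHOLD


def handle_tool_call(
    tool_call: dict[str, str],
    confirm: Callable[[str, dict[str, Any]], bool],
) -> str:
    """Runs one tool call chosen by the LLM, pausing for a human when needed."""
    name = tool_call["name"]
    args = json.loads(tool_call["arguments"])  # assumed to arrive as a JSON string
    if needs_approval(name, args) and not confirm(name, args):
        # Returned as the tool result so the LLM can explain the refusal.
        return f"{name} was declined by a human reviewer"
    return TOOLS[name](**args)
```

Returning the refusal as the tool result, rather than raising, lets the LLM keep control of the conversation and tell the user why nothing happened.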

Key difference: The tool calling method doesn't require pre-planning all actions. This works better for complex workflows where the LLM needs flexibility in choosing tools, but still requires oversight for specific high-risk operations.

Production Implementation Patterns

While the Python examples demonstrate the core concept, production systems require more sophisticated patterns. The terminal-based approval workflow won't work when your agent runs behind an API serving web or mobile clients.

Two primary patterns emerge for real-world implementations: Server-Sent Events (SSE) streaming for chat applications where users await responses, and async patterns for backend workflows where approvals happen via notifications. Both rely on the same three critical concepts, covered after the two patterns.

SSE Streaming for Chat Applications

At 12:40 in the video, we examine the SSE streaming pattern used in chat interfaces. When the agent identifies an operation needing approval, it:

  1. Saves the complete workflow state to a database
  2. Signals the frontend that approval is required
  3. Ends the stream instead of holding the connection open while waiting for user input

The frontend displays approval buttons. When one is clicked, a new API call resumes execution by reloading the saved state from the database. Because nothing waits in memory, the pattern keeps working even if users take hours or days to respond.
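The three steps and the resume call might be sketched as two generators that each yield Server-Sent Events frames. The table name, event names (`approval_required`, `result`), and the `execute` stub are assumptions, and the in-memory SQLite database stands in for a real one (it would not survive a restart). The proposed tool call is passed in rather than produced by an LLM to keep the sketch self-contained.

```python
import json
import sqlite3
import uuid
from typing import Any, Iterator

APPROVAL_THRESHOLD = 100.0

# Any durable store works; in-memory SQLite keeps the sketch self-contained.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pending_runs (run_id TEXT PRIMARY KEY, state TEXT NOT NULL)")


def sse(event: str, data: dict[str, Any]) -> str:
    # One SSE frame: event name, JSON payload, blank-line terminator.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def execute(call: dict[str, Any]) -> str:
    args = call["arguments"]
    return f"transferred ${args['amount']:.2f} to {args['to_account']}"  # hypothetical tool


def stream_turn(messages: list[dict[str, Any]], call: dict[str, Any]) -> Iterator[str]:
    """One chat turn; `call` stands in for the tool call the LLM chose."""
    yield sse("status", {"text": "working"})
    if call["name"] == "transfer" and call["arguments"]["amount"] > APPROVAL_THRESHOLD:
        run_id = str(uuid.uuid4())
        # 1. Save everything needed to resume later.
        state = {"messages": messages, "pending_call": call}
        db.execute("INSERT INTO pending_runs VALUES (?, ?)", (run_id, json.dumps(state)))
        db.commit()
        # 2. Tell the frontend to render approve/deny buttons.
        yield sse("approval_required", {"run_id": run_id, "call": call})
        # 3. End the stream; no connection stays open while the user decides.
        return
    yield sse("result", {"text": execute(call)})


def resume_turn(run_id: str, approved: bool) -> Iterator[str]:
    """Handles the approve/deny request by reloading state from the database."""
    row = db.execute("SELECT state FROM pending_runs WHERE run_id = ?", (run_id,)).fetchone()
    if row is None:
        yield sse("error", {"text": "unknown or already handled run"})
        return
    db.execute("DELETE FROM pending_runs WHERE run_id = ?", (run_id,))
    db.commit()
    state = json.loads(row[0])
    call = state["pending_call"]
    text = execute(call) if approved else f"{call['name']} declined by user"
    # A full agent would append `text` to state["messages"] and call the LLM again.
    yield sse("result", {"text": text})
```

In a web framework, each generator would back a streaming response with the `text/event-stream` media type; in FastAPI, for example, `StreamingResponse(stream_turn(...), media_type="text/event-stream")`.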

Async Pattern for Backend Workflows

For non-interactive processes (like invoice approvals), the async pattern at 18:30 uses notifications instead of streaming. When approval is needed:

  1. The workflow saves its state
  2. Sends a Slack/email notification with approve/deny buttons
  3. Pauses execution indefinitely

Clicking the approval button calls a resume endpoint with the original workflow ID. The system reloads the saved state and continues execution where it left off. This suits operations where no user is actively waiting.
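A rough sketch of the pause, notify, and resume cycle for an invoice approval. The URL scheme, table layout, `decision` query parameter, and `pay_invoice` step are hypothetical, and actual Slack or email delivery is hidden behind the `notify` callable. The conditional `UPDATE` makes resume idempotent, so a second click on the same button does nothing.

```python
import json
import sqlite3
import uuid
from typing import Any, Callable

BASE_URL = "https://example.com/workflows"  # hypothetical resume endpoint

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE workflows (id TEXT PRIMARY KEY, status TEXT NOT NULL, state TEXT NOT NULL)"
)

# (message, approve_url, deny_url) -> None; wrap Slack, email, etc. behind this.
Notifier = Callable[[str, str, str], None]


def pay_invoice(state: dict[str, Any]) -> str:
    return f"paid invoice {state['invoice_id']}"  # hypothetical next step


def pause_for_approval(state: dict[str, Any], notify: Notifier) -> str:
    workflow_id = str(uuid.uuid4())
    db.execute(
        "INSERT INTO workflows VALUES (?, 'pending', ?)", (workflow_id, json.dumps(state))
    )
    db.commit()
    url = f"{BASE_URL}/{workflow_id}/resume"
    notify(f"Approval needed: {state['summary']}", f"{url}?decision=approve", f"{url}?decision=deny")
    return workflow_id  # the worker stops here; nothing waits in memory


def resume(workflow_id: str, decision: str) -> str:
    """Handler behind the resume endpoint that the notification buttons call."""
    new_status = "approved" if decision == "approve" else "denied"
    # Claim the workflow atomically so a double click cannot run it twice.
    cur = db.execute(
        "UPDATE workflows SET status = ? WHERE id = ? AND status = 'pending'",
        (new_status, workflow_id),
    )
    db.commit()
    if cur.rowcount == 0:
        return "already handled or unknown"
    (raw,) = db.execute("SELECT state FROM workflows WHERE id = ?", (workflow_id,)).fetchone()
    state = json.loads(raw)
    if new_status == "denied":
        return f"rejected: {state['summary']}"
    return pay_invoice(state)  # continue where the workflow left off
```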

Three Critical Implementation Concepts

Regardless of which pattern you choose, three concepts are essential for reliable human-in-the-loop implementations:

1. Deferred execution: Never proceed automatically when approval is needed. Save the operation for later rather than holding connections open.

2. State serialization: Persist the complete agent context - messages, tool calls, parameters - everything needed to resume accurately.

3. Stateless resume: Reload everything from storage rather than keeping it in memory. This survives server restarts and long delays.
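State serialization and stateless resume can be illustrated with plain dataclasses and a file-per-run store. The `AgentState` and `ToolCall` shapes and the message format are assumptions loosely modeled on chat-style APIs; a production system would more likely use a database than files, but the principle is the same: `resume` receives only a run ID and rebuilds everything from storage.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Optional


@dataclass
class ToolCall:
    id: str
    name: str
    arguments: dict[str, Any]


@dataclass
class AgentState:
    run_id: str
    messages: list[dict[str, Any]] = field(default_factory=list)
    pending_call: Optional[ToolCall] = None  # the call awaiting approval

    def to_json(self) -> str:
        return json.dumps(asdict(self))  # asdict also converts the nested ToolCall

    @classmethod
    def from_json(cls, raw: str) -> "AgentState":
        data = json.loads(raw)
        pending = data["pending_call"]
        return cls(
            run_id=data["run_id"],
            messages=data["messages"],
            pending_call=ToolCall(**pending) if pending else None,
        )


class FileStateStore:
    """One JSON file per run, so state outlives the process that wrote it."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)

    def save(self, state: AgentState) -> None:
        (self.directory / f"{state.run_id}.json").write_text(state.to_json())

    def load(self, run_id: str) -> AgentState:
        return AgentState.from_json((self.directory / f"{run_id}.json").read_text())


def resume(run_id: str, approved: bool, store: FileStateStore) -> AgentState:
    # Stateless resume: everything comes from storage, nothing from memory.
    state = store.load(run_id)
    call = state.pending_call
    if call is None:
        raise ValueError(f"run {run_id} has nothing awaiting approval")
    # A real agent would run the tool here, then call the LLM with state.messages.
    outcome = f"{call.name} executed" if approved else f"{call.name} declined by reviewer"
    state.messages.append({"role": "tool", "tool_call_id": call.id, "content": outcome})
    state.pending_call = None
    store.save(state)
    return state
```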

At 22:15 in the tutorial, we see how these principles combine to create approval workflows that work reliably at scale, even when approvals take days or involve multiple systems.

Watch the Full Tutorial

See these concepts in action with complete code examples in the video tutorial below. The demo at 6:45 shows the terminal-based approval workflow, while 14:20 walks through the production SSE pattern implementation.


Key Takeaways

Implementing human oversight transforms AI agents from risky experiments into reliable business tools. By strategically pausing workflows at predetermined risk points, you maintain automation speed while preventing costly mistakes.

In summary: 1) Identify your approval triggers 2) Choose between router or tool-calling patterns 3) Implement state persistence for reliable resumption. This creates AI workflows that scale safely.

Frequently Asked Questions

Common questions about human-in-the-loop AI

Human-in-the-loop refers to strategic points where an AI agent pauses execution to request human approval before continuing. This is critical for high-stakes operations like financial transactions where automated decisions carry risk.

The system identifies cases requiring oversight (like transfers over $100) and halts until manual confirmation is received. This creates a safety net while maintaining automation efficiency for routine operations.

  • Key feature: Selective pausing rather than constant oversight
  • Works for both real-time and asynchronous workflows
  • Implementation requires state persistence for reliability

Human oversight reduces risk in AI-powered operations while maintaining automation efficiency. It allows deployment of AI agents for straightforward cases while routing complex cases to humans.

This hybrid approach speeds implementation since you don't need perfect automation before going live. You can start with human review for all cases, then gradually automate the predictable ones as confidence grows.
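The start-with-full-review approach can be expressed as a small policy object that is loosened over time. The class, action names, and limits below are illustrative assumptions, not recommendations; the idea is that a missing limit means "ask a human", so an empty policy reviews everything on day one.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalPolicy:
    # Auto-approval limit per action type. A missing entry means
    # "always ask a human", so an empty policy reviews every case.
    auto_approve_limits: dict[str, float] = field(default_factory=dict)

    def needs_human(self, action_type: str, amount: float = 0.0) -> bool:
        limit = self.auto_approve_limits.get(action_type)
        return limit is None or amount > limit


# Go-live: every action goes to a reviewer.
DAY_ONE = ApprovalPolicy()

# Later: automate the predictable cases, keep humans on the risky ones.
LATER = ApprovalPolicy(
    {"check_balance": float("inf"), "deposit": 1_000.0, "transfer": 100.0}
)
```

Because the rules live in data rather than in prompts, each loosening step is a reviewable change instead of a model behavior shift.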

  • 50% automation is often achievable immediately with oversight
  • Reduces compliance risk in regulated industries
  • Provides training data to improve fully automated cases

The first method uses LLMs as routers with Pydantic models to structure output and validate actions. The second uses tool calling where the system intercepts specific function executions.

Both approaches filter for high-risk operations and implement approval workflows before proceeding. The router method plans all actions upfront, while tool calling monitors execution dynamically.

  • Router method better for predictable workflows
  • Tool calling more flexible for complex agents
  • Both require state persistence in production

Instead of waiting on terminal input, production implementations persist state across API calls. The system saves workflow state to a database when approval is needed, then resumes execution after receiving confirmation.

This maintains reliability across network requests and user delays. The terminal demo uses Python's input() function, but production systems need to handle cases where approvals take hours or days.

  • Critical difference: State survives process restarts
  • Works across distributed systems
  • Handles delayed user responses

Three critical concepts: 1) Deferred execution - pause instead of proceeding automatically 2) State serialization - persist agent context and pending actions 3) Stateless resume - reload everything from storage rather than holding in memory.

Together these ensure reliability even with delayed approvals. The system doesn't need to maintain open connections or keep processes running while waiting for human input.

  • State serialization is the most challenging part
  • Must capture all context needed to resume accurately
  • Works across server restarts and long delays

SSE streaming works for real-time chat applications where users await responses. Async patterns suit backend workflows where notifications (Slack/email) trigger approvals.

The core difference is whether the user actively waits or receives delayed notifications. Chat interfaces use streaming to maintain responsiveness, while backend processes can wait indefinitely for approvals.

  • SSE streaming for interactive applications
  • Async for background processing
  • Same state persistence underlies both

Financial services (transfers/approvals), healthcare (treatment plans), legal (contract reviews), and any domain where AI decisions carry compliance risk benefit most from human oversight.

The pattern balances automation speed with human judgment for high-stakes operations. It's particularly valuable in regulated industries where audit trails and accountability are required.

  • Banking: Large transfers, fraud flags
  • Healthcare: Treatment recommendations
  • Legal: Contract clause analysis

GrowwStacks builds production-ready AI agents with human oversight workflows tailored to your business requirements. We implement state persistence, approval interfaces, and resume logic so your team can safely automate high-value processes.

Our solutions integrate with your existing systems and compliance requirements. We handle the technical complexity while you focus on defining the approval rules and business logic.

  • Custom approval workflows for your use case
  • Seamless integration with existing systems
  • Free 30-minute consultation to assess your needs

Ready to Implement Human Oversight in Your AI Workflows?

Every day without proper safeguards risks costly automation mistakes. GrowwStacks can implement approval workflows in your existing systems within 2 weeks, combining AI efficiency with human judgment where it matters most.