Build Local Long-Running AI Agents That Don't Lose Context

Discover how to overcome the frustrating "context window problem" where AI agents forget their progress on complex tasks. This implementation using LangChain and Ollama creates reliable checkpoints so your agents can resume work exactly where they left off.

The Frustrating Context Window Problem

Every developer using AI agents eventually hits the same wall. Your agent starts strong - adding features, writing tests, making commits. Then suddenly it starts hallucinating, rewriting files, and introducing bugs. This isn't random failure - it's the context window limitation in action.

As Anthropic's research shows, even as models improve, maintaining consistent progress across multiple context windows remains an unsolved challenge. When your agent restarts, it begins with fresh memory, losing all context of previous work.

Key insight: Agents must work in discrete sessions where each restart means starting from scratch. Without checkpoints, complex tasks become impossible to complete reliably.

The core symptoms are unmistakable. Your agent might:

  • Forget which features it already implemented
  • Rewrite working code with broken implementations
  • Lose track of completed tests and validations
  • Repeat work already marked as complete

Anthropic's Two-Agent Solution

Anthropic's breakthrough was recognizing that single-agent architectures fundamentally can't solve this problem. Their solution splits responsibilities between two specialized agents:

1. The Initializer Agent

Acts as the architect creating:

  • Complete feature lists with implementation steps
  • Structured project state tracking
  • Initial project scaffolding
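
As a rough illustration of the initializer's output, the sketch below writes a features list and a minimal scaffold to disk. It assumes the plan has already been produced (in the article's setup that would come from the model), and the names `initialize_project`, `features.json`, and the `tests` directory are hypothetical choices, not taken from the video.

```python
import json
from pathlib import Path


def initialize_project(root: Path, plan: list[dict]) -> Path:
    """Create scaffolding and a features list with every feature marked not passing."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "tests").mkdir(exist_ok=True)  # hypothetical scaffolding layout

    records = [
        {
            "name": item["name"],
            "description": item["description"],
            "steps": list(item["steps"]),
            "passing": False,  # nothing has been implemented yet
        }
        for item in plan
    ]
    path = root / "features.json"  # hypothetical file name
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return path
```

Starting every feature at `"passing": false` gives the coding agent an unambiguous to-do list on its first session.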

2. The Coding Agent

Works incrementally using:

  • Feature lists as progress trackers
  • Git commits as versioned checkpoints
  • Test results as validation markers

Epiphany moment: By separating planning from execution, the system creates natural breakpoints where work can safely pause and resume without losing context.

Local Implementation with LangChain

The beauty of this architecture is its model-agnostic design. We've implemented it locally using:

  • LangChain 1.1 for agent orchestration
  • Ollama running quantized 3B parameter models
  • Pydantic for structured output validation

This stack delivers several advantages:

  1. Complete local operation - no API latency or costs
  2. Structured outputs prevent error accumulation
  3. Checkpoint files are human-readable JSON

Implementation tip: Always run agents in isolated directories with restricted permissions. Our demo uses simple Python functions, but the same architecture scales to complex projects.

How Checkpoints Actually Work

The checkpoint system relies on three key artifacts maintained in the agent environment:

1. Features List JSON

{
  "name": "factorial",
  "description": "Implement factorial function",
  "steps": ["Create factorial.py", "Write tests"],
  "passing": false
}
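
A small sketch of how a coding agent's tooling might read and update this file. It assumes the file holds a JSON array of entries shaped like the one above; the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def load_features(path: Path) -> list[dict]:
    """Read the features list from disk."""
    return json.loads(path.read_text(encoding="utf-8"))


def next_incomplete(features: list[dict]) -> Optional[dict]:
    """Return the first feature whose 'passing' flag is false, or None."""
    return next((f for f in features if not f.get("passing", False)), None)


def mark_passing(path: Path, name: str) -> None:
    """Flip one feature to passing and write the file back."""
    features = load_features(path)
    for feature in features:
        if feature["name"] == name:
            feature["passing"] = True
            break
    else:
        raise KeyError(f"unknown feature: {name}")
    # Write to a temporary file and rename it over the original, so an
    # interrupted write does not leave a truncated checkpoint behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(features, indent=2), encoding="utf-8")
    tmp.replace(path)
```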

2. Git History

Last 5 commits provide versioned restore points

3. Code Files

Implementation files with passing tests

When the coding agent resumes work:

  1. It checks the features list for incomplete items
  2. Reviews git history for last working state
  3. Verifies existing implementations with tests
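
The resume steps above can be sketched as a function that condenses the three artifacts into a short briefing for a fresh session. This is an illustration under assumptions: `recent_commits` and `build_resume_context` are hypothetical helpers, and the wording of the briefing is invented for the example.

```python
import subprocess
from pathlib import Path


def recent_commits(repo: Path, limit: int = 5) -> list[str]:
    """Return up to `limit` one-line commit summaries, newest first."""
    result = subprocess.run(
        ["git", "log", "-n", str(limit), "--pretty=format:%h %s"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def build_resume_context(
    features: list[dict], commits: list[str], tests_passed: bool
) -> str:
    """Summarize project state so a new session needs no chat history."""
    done = [f["name"] for f in features if f.get("passing", False)]
    pending = [f["name"] for f in features if not f.get("passing", False)]
    lines = [
        "Completed features: " + (", ".join(done) or "none"),
        "Pending features: " + (", ".join(pending) or "none"),
        "Recent commits:",
        *[f"  {c}" for c in commits],
        "Existing tests: "
        + ("passing" if tests_passed else "FAILING, fix before new work"),
    ]
    return "\n".join(lines)
```

The resulting text is small enough to fit at the top of any context window, which is the point: the briefing replaces the lost conversation.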

The Coding Agent in Action

Let's walk through a complete cycle, shown at 8:22 in the video:

  1. Agent selects next incomplete feature from JSON
  2. Checks git history for related commits
  3. Implements code following feature steps
  4. Runs validation tests
  5. If tests pass, marks feature complete and commits

Key advantage: The agent doesn't need the full conversation history - just these structured artifacts provide enough context to resume work reliably.
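
One way to picture this cycle in code is a function that takes each step as a callable, so the model call, test runner, and git wrapper can be swapped freely. This is a sketch, not the video's implementation: `run_cycle` and its parameter names are hypothetical.

```python
from typing import Callable, Optional


def run_cycle(
    features: list[dict],
    related_commits: Callable[[dict], list[str]],
    implement: Callable[[dict, list[str]], None],
    run_tests: Callable[[dict], bool],
    commit: Callable[[str], None],
) -> Optional[str]:
    """Run one cycle; return the completed feature's name, or None."""
    # Step 1: select the next incomplete feature.
    feature = next((f for f in features if not f["passing"]), None)
    if feature is None:
        return None  # nothing left to do

    # Step 2: gather git history relevant to this feature.
    history = related_commits(feature)

    # Step 3: implement it (in the article, this is where the model writes code).
    implement(feature, history)

    # Step 4: validate. On failure, leave 'passing' false so a later
    # session picks the same feature up again.
    if not run_tests(feature):
        return None

    # Step 5: record the checkpoint.
    feature["passing"] = True
    commit(f"Implement {feature['name']}")
    return feature["name"]
```

Because the flag only flips after the tests pass, an interrupted or failed session never marks work as done that is not.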

Critical Safety Considerations

While powerful, this approach introduces serious risks if implemented carelessly:

⚠️ Shell Access

Our demo agents have full shell access - never do this in production without:

  • Command whitelisting
  • Filesystem restrictions
  • Resource limits
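
As an illustration of command whitelisting, the sketch below checks a requested command against a fixed set of programs and returns an argument list meant to be run without a shell. The allowed set and the `check_command` name are hypothetical, and a filter like this is only one layer: an allowed tool such as a test runner can still execute arbitrary project code.

```python
import shlex

ALLOWED_PROGRAMS = {"git", "ls", "pytest"}  # hypothetical allowlist
SHELL_METACHARS = set(";|&<>`$")


def check_command(command: str) -> list[str]:
    """Return argv for an allowed command, or raise PermissionError."""
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        raise PermissionError(f"unparseable command: {exc}") from exc
    if not argv:
        raise PermissionError("empty command")
    if argv[0] not in ALLOWED_PROGRAMS:
        raise PermissionError(f"program not allowed: {argv[0]}")
    # Conservative extra check: refuse anything that looks like shell syntax,
    # even though the argv is intended for execution without a shell.
    if any(ch in SHELL_METACHARS for token in argv for ch in token):
        raise PermissionError("shell metacharacters are not allowed")
    return argv
```

The returned list would then be passed to something like `subprocess.run(argv, shell=False, timeout=...)`, with the timeout acting as a simple resource limit.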

✅ Safe Implementation

Production systems should:

  • Run in Docker containers
  • Restrict to project directories
  • Log all actions

Remember: An agent with write access can modify or delete files just as easily as it can create.
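
A minimal sketch of the "restrict to project directories" and "log all actions" points, assuming file paths requested by the agent arrive as strings. The function and logger names are hypothetical, and this check complements container isolation rather than replacing it.

```python
import logging
from pathlib import Path

logger = logging.getLogger("agent.actions")  # hypothetical logger name


def resolve_in_project(project_root: Path, requested: str) -> Path:
    """Resolve a requested path and refuse anything outside the project root."""
    root = project_root.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the
    # comparison below sees the real destination.
    target = (root / requested).resolve()
    if target != root and root not in target.parents:
        logger.warning("blocked path outside project: %s", requested)
        raise PermissionError(f"{requested!r} escapes the project directory")
    logger.info("allowed path: %s", target)
    return target
```

Every file read or write the agent performs would go through this function first, so both the allowed and the blocked attempts end up in the log.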

Real Implementation Results

The video demonstrates the system implementing:

  • Factorial function with unit tests
  • Fibonacci sequence generator
  • Validation test suites

  • Features completed: 100%
  • Passing implementations: 2
  • Passing test suites: 4

Beyond the demo: Anthropic reports this architecture successfully scales to web applications with dozens of features across multiple context windows.

Watch the Full Tutorial

See the complete implementation from 3:15 where the initializer agent creates the feature list through to 12:40 where the coding agent completes the Fibonacci implementation.

Long-running AI agent tutorial video

Key Takeaways

This architecture solves three fundamental problems in agentic workflows:

  1. Context loss between sessions
  2. Progress tracking across interruptions
  3. Verification of completed work

In summary: By splitting planning and execution while maintaining structured checkpoints, we can create reliable long-running agents that overcome context window limitations.

Frequently Asked Questions

Common questions about long-running AI agents

The core challenge is context window limitations where agents lose track of previous work when their session restarts. Without checkpoints, each new session begins with fresh memory, making it impossible to resume complex tasks reliably.

This manifests as agents forgetting completed work, rewriting files unnecessarily, or introducing errors that didn't exist in previous iterations.

  • Agents work in discrete sessions
  • No memory carries between runs
  • Complex tasks require continuity

Anthropic splits the problem into two agents: an initializer that creates feature lists and checkpoints, and a coding agent that uses these checkpoints to resume work. The system maintains progress through JSON files tracking completed features and git commits.

This architecture means:

  • Initializer sets up the roadmap
  • Coding agent follows the roadmap
  • Checkpoints prevent context loss

This implementation uses LangChain for agent orchestration, Ollama as the local LLM provider (running quantized 3B parameter models), and Pydantic for structured output validation. The entire system runs locally without cloud dependencies.

Key components:

  • LangChain 1.1 - agent framework
  • Ollama - local model execution
  • Pydantic - validation and structure

The system maintains three key artifacts: a features list JSON file tracking implementation status, git commit history for version control, and code files containing the actual implementations. The coding agent checks these artifacts to determine where to resume work.

When the agent restarts:

  • Checks features list for incomplete items
  • Reviews git history for last working state
  • Verifies existing implementations with tests

Critical precautions include running in a sandboxed environment, restricting file system access, and implementing command whitelisting. The demonstration gives agents full shell access, which should never be done on production systems without proper safeguards.

Always:

  • Run in containers
  • Restrict to project directories
  • Log all actions

While demonstrated with simple Python functions, Anthropic reports success with larger web applications. Scaling requires more sophisticated feature breakdowns and additional verification steps, but the core checkpoint architecture remains valid.

The principles apply to:

  • Web apps with dozens of features
  • Multi-file projects
  • Teams of agents working together

The main limitations are dependency on well-structured feature definitions, potential error accumulation across checkpoints, and the need for human verification at scale. The system works best for modular tasks with clear completion criteria.

Real-world challenges include:

  • Defining features clearly
  • Maintaining checkpoint integrity
  • Human oversight at scale

GrowwStacks specializes in implementing reliable AI agent systems for business automation. We can design custom checkpoint architectures, integrate with your existing tools, and deploy secure sandboxed environments for agent operation.

Our services include:

  • Custom agent system design
  • Safe implementation
  • Ongoing maintenance

Ready to Build Reliable Long-Running Agents?

Don't let context windows make your agents forget their progress. Let GrowwStacks implement Anthropic's checkpoint system tailored to your specific needs.