Build Local Long-Running AI Agents That Don't Lose Context
Discover how to overcome the frustrating "context window problem" where AI agents forget their progress on complex tasks. This implementation using LangChain and Ollama creates reliable checkpoints so your agents can resume work exactly where they left off.
The Frustrating Context Window Problem
Every developer using AI agents eventually hits the same wall. Your agent starts strong: adding features, writing tests, making commits. Then it starts hallucinating, rewriting files, and introducing bugs. This isn't random failure; it's the context window limitation in action.
As Anthropic's research shows, even as models improve, maintaining consistent progress across multiple context windows remains an open challenge. When your agent restarts, it begins with a fresh context and no memory of previous work.
Key insight: Agents work in discrete sessions, and each restart means starting from scratch. Without checkpoints, complex tasks cannot be completed reliably.
The core symptoms are unmistakable. Your agent might:
- Forget which features it already implemented
- Rewrite working code with broken implementations
- Lose track of completed tests and validations
- Repeat work already marked as complete
Anthropic's Two-Agent Solution
Anthropic's key observation was that a single agent working across many sessions can't reliably solve this problem on its own. Their solution splits responsibilities between two specialized agents:
1. The Initializer Agent
Acts as the architect, creating:
- Complete feature lists with implementation steps
- Structured project state tracking
- Initial project scaffolding
2. The Coding Agent
Works incrementally using:
- Feature lists as progress trackers
- Git commits as versioned checkpoints
- Test results as validation markers
Epiphany moment: By separating planning from execution, the system creates natural breakpoints where work can safely pause and resume without losing context.
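One way to make the planning/execution split concrete is to let only the initializer define features and let the coding agent change nothing except each feature's `passing` flag. The minimal sketch below assumes the feature-list fields used later in this article (`name`, `description`, `steps`, `passing`); `merge_coding_update` is a hypothetical helper, not part of LangChain or of Anthropic's harness.

```python
from copy import deepcopy

# Hypothetical policy: fields only the initializer agent may define.
# The coding agent may only flip "passing".
PLANNER_FIELDS = ("name", "description", "steps")


def merge_coding_update(plan: list[dict], update: list[dict]) -> list[dict]:
    """Accept the coding agent's feature list only if it changed nothing but "passing"."""
    if len(plan) != len(update):
        raise ValueError("coding agent may not add or remove features")
    merged = deepcopy(plan)  # never mutate the initializer's plan in place
    for original, proposed, target in zip(plan, update, merged):
        for field in PLANNER_FIELDS:
            if proposed.get(field) != original.get(field):
                raise ValueError(f"coding agent edited {field!r} of {original['name']!r}")
        passing = proposed.get("passing")
        if not isinstance(passing, bool):
            raise ValueError(f"'passing' must be a bool for {original['name']!r}")
        target["passing"] = passing
    return merged
```

Rejecting the whole update, rather than silently fixing it, keeps the roadmap under the initializer's control and makes a misbehaving coding session easy to spot.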
Local Implementation with LangChain
The beauty of this architecture is its model-agnostic design. We've implemented it locally using:
- LangChain 1.1 for agent orchestration
- Ollama running quantized 3B parameter models
- Pydantic for structured output validation
This stack delivers several advantages:
- Complete local operation: no network round-trips or per-token API costs
- Structured outputs limit error accumulation by rejecting malformed responses
- Checkpoint files are human-readable JSON
Implementation tip: Always run agents in isolated directories with restricted permissions. Our demo uses simple Python functions, but the same architecture scales to complex projects.
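Pydantic's job in this stack is to reject malformed model output before it reaches the checkpoint files. The sketch below shows the same idea with only the standard library, so it runs without LangChain, Ollama, or Pydantic installed. `Feature` and `parse_feature_list` are hypothetical names; in the actual stack a Pydantic model would play this role (with Pydantic v2, `TypeAdapter(list[Model]).validate_json(raw)` performs a comparable check).

```python
import json
from dataclasses import dataclass


@dataclass
class Feature:  # hypothetical stand-in for a Pydantic model
    name: str
    description: str
    steps: list[str]
    passing: bool = False


def parse_feature_list(raw: str) -> list[Feature]:
    """Parse an LLM's JSON reply into Feature objects, failing loudly on a bad shape."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of features")
    features = []
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"feature {i} is not an object")
        name, desc, steps = item.get("name"), item.get("description"), item.get("steps")
        if not isinstance(name, str) or not name:
            raise ValueError(f"feature {i}: 'name' must be a non-empty string")
        if not isinstance(desc, str):
            raise ValueError(f"feature {i}: 'description' must be a string")
        if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
            raise ValueError(f"feature {i}: 'steps' must be a list of strings")
        passing = item.get("passing", False)
        if not isinstance(passing, bool):
            raise ValueError(f"feature {i}: 'passing' must be true or false")
        features.append(Feature(name, desc, steps, passing))
    return features
```

A `ValueError` here is the signal to re-prompt the model rather than write a corrupt checkpoint, which is how structured outputs keep one bad response from compounding across sessions.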
How Checkpoints Actually Work
The checkpoint system relies on three key artifacts maintained in the agent environment:
1. Features List JSON
```json
{
  "name": "factorial",
  "description": "Implement factorial function",
  "steps": ["Create factorial.py", "Write tests"],
  "passing": false
}
```
2. Git History
Last 5 commits provide versioned restore points
3. Code Files
Implementation files with passing tests
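Because every restart rebuilds state from these files, the features list should never be left half-written if the agent crashes mid-save. A minimal sketch, assuming the list lives in a file named `features.json` in the project root (the filename and both helpers are hypothetical): write to a temporary file in the same directory, then swap it in with `os.replace`, which renames atomically on POSIX systems.

```python
import json
import os
import tempfile
from pathlib import Path

CHECKPOINT = "features.json"  # hypothetical checkpoint filename


def save_features(project: Path, features: list[dict]) -> None:
    """Write the feature list atomically so a crash never leaves a truncated file."""
    target = project / CHECKPOINT
    fd, tmp = tempfile.mkstemp(dir=project, prefix=".features-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(features, f, indent=2)  # keep the checkpoint human-readable
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the swap
        os.replace(tmp, target)  # atomic rename on POSIX
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)  # don't leave stray temp files behind
        raise


def load_features(project: Path) -> list[dict]:
    """Return the saved feature list, or [] before the initializer has run."""
    target = project / CHECKPOINT
    if not target.exists():
        return []
    return json.loads(target.read_text(encoding="utf-8"))
```

Keeping the temporary file in the same directory matters: `os.replace` can fail across filesystems.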
When the coding agent resumes work:
- It checks the features list for incomplete items
- Reviews git history for last working state
- Verifies existing implementations with tests
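The first two resume checks can be sketched as a pair of small functions: one picks the next incomplete feature, the other reads recent history with `git log`. This assumes the feature-list shape shown above and a project that is a git repository; `next_feature` and `recent_commits` are hypothetical names, and the default of five commits mirrors the "last 5 commits" described earlier.

```python
import subprocess
from pathlib import Path
from typing import Optional


def next_feature(features: list[dict]) -> Optional[dict]:
    """Return the first feature not yet marked passing, or None when all pass."""
    return next((f for f in features if not f.get("passing")), None)


def recent_commits(project: Path, n: int = 5) -> list[str]:
    """Return up to n one-line commit summaries, newest first ([] if unavailable)."""
    try:
        result = subprocess.run(
            ["git", "-C", str(project), "log", "--oneline", "-n", str(n)],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:  # git is not installed
        return []
    if result.returncode != 0:  # not a repository, or no commits yet
        return []
    return result.stdout.splitlines()
```

The third check, re-running the test suite before trusting any `"passing": true` entry, belongs in the coding loop itself.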
The Coding Agent in Action
Here is one complete cycle, shown at 8:22 in the video:
1. Agent selects the next incomplete feature from the JSON
2. Checks git history for related commits
3. Implements code following the feature's steps
4. Runs validation tests
5. If tests pass, marks the feature complete and commits
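The five-step cycle maps onto a short driver function. In this sketch the model call, test runner, and commit are passed in as plain callables; `implement`, `run_tests`, and `commit` are hypothetical hooks (in a LangChain/Ollama setup, `implement` would presumably wrap the agent call). Injecting them keeps the control flow visible and testable, and a feature is marked complete only after its tests pass.

```python
from typing import Callable


def run_session(
    features: list[dict],
    history: list[str],
    implement: Callable[[dict, list[str]], None],  # hypothetical agent hook
    run_tests: Callable[[dict], bool],  # hypothetical test hook
    commit: Callable[[str], None],  # hypothetical git hook
) -> bool:
    """Work on one incomplete feature; return True if it was completed this session."""
    # 1. Select the next incomplete feature from the JSON list.
    feature = next((f for f in features if not f.get("passing")), None)
    if feature is None:
        return False  # nothing left to do
    # 2. Gather related commits as context (a simple name match).
    related = [c for c in history if feature["name"] in c]
    # 3. Implement the code following the feature's steps.
    implement(feature, related)
    # 4. Run validation tests.
    if not run_tests(feature):
        return False  # leave "passing" false so the next session retries
    # 5. Mark complete and commit as a checkpoint.
    feature["passing"] = True
    commit(f"feat: {feature['name']} passing")
    return True
```

A real loop would also persist the updated features file before committing, so the commit captures both the code and the checkpoint.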
Key advantage: The agent doesn't need the full conversation history; these structured artifacts alone provide enough context to resume work reliably.
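Since the artifacts carry the state, the context for a fresh session can be rebuilt from them instead of from chat history. A minimal sketch, assuming the feature-list shape above; `build_resume_prompt` and the prompt wording are hypothetical, not taken from the video or from Anthropic's prompts.

```python
def build_resume_prompt(features: list[dict], commits: list[str], max_commits: int = 5) -> str:
    """Summarize checkpoint artifacts into compact context for a fresh agent session."""
    done = [f["name"] for f in features if f.get("passing")]
    todo = [f for f in features if not f.get("passing")]
    lines = [f"Completed features ({len(done)}/{len(features)}): {', '.join(done) or 'none'}"]
    lines.append("Recent commits:")
    # Fall back to a placeholder line when there is no history yet.
    lines += [f"  {c}" for c in commits[:max_commits]] or ["  (none)"]
    if todo:
        nxt = todo[0]
        lines.append(f"Next feature: {nxt['name']} - {nxt.get('description', '')}")
        lines += [f"  step: {s}" for s in nxt.get("steps", [])]
    else:
        lines.append("All features pass; verify tests and stop.")
    return "\n".join(lines)
```

The output stays a few lines long no matter how many sessions came before, which is what lets a small local model resume without running out of context.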
Critical Safety Considerations
While powerful, this approach introduces serious risks if implemented carelessly:
⚠️ Shell Access
Our demo agents have full shell access. Never do this in production without:
- Command whitelisting
- Filesystem restrictions
- Resource limits
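Command whitelisting and filesystem restrictions can start as a check that runs before any shell command the agent proposes. A minimal sketch, assuming commands arrive as strings and the project lives under a single root directory; the allowlist contents and `check_command` are hypothetical, and this is a first filter, not a substitute for a container.

```python
import shlex
from pathlib import Path

# Hypothetical allowlist. Test runners still execute project code,
# so this is a first filter, not a sandbox.
ALLOWED = {"pytest", "git", "ls", "cat"}
# Block command chaining, substitution, and redirection outright.
FORBIDDEN_CHARS = set(";|&$`<>")


def check_command(command: str, project_root: Path) -> list[str]:
    """Return argv if the command passes the allowlist and path checks."""
    if FORBIDDEN_CHARS & set(command):
        raise PermissionError("shell metacharacters are not allowed")
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        raise PermissionError(f"unparseable command: {exc}") from exc
    if not argv or argv[0] not in ALLOWED:
        raise PermissionError(f"command not in allowlist: {argv[:1]}")
    root = project_root.resolve()
    for arg in argv[1:]:
        if arg.startswith("-"):
            continue  # treat flags as non-paths (a simplification)
        # An absolute arg replaces root in the join, so "/etc/passwd" is caught too.
        candidate = (root / arg).resolve()
        if not candidate.is_relative_to(root):  # Python 3.9+
            raise PermissionError(f"path escapes project root: {arg}")
    return argv
```

Returning an argv list (rather than the original string) lets the caller run it without `shell=True`, so the shell never reinterprets what was checked.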
✅ Safe Implementation
Production systems should:
- Run in Docker containers
- Restrict to project directories
- Log all actions
Remember: an agent with write access can modify or delete files just as easily as it can create them.
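Logging every action and capping runtime can be combined in one wrapper around `subprocess.run`. This sketch assumes an append-only JSON Lines audit file; `run_logged` and the log location are hypothetical. The timeout is only a coarse resource limit; CPU and memory caps are better enforced by the container runtime.

```python
import json
import subprocess
import time
from pathlib import Path


def run_logged(
    argv: list[str], cwd: Path, log_path: Path, timeout: float = 60.0
) -> subprocess.CompletedProcess:
    """Run argv in cwd with a timeout and append one JSON audit record per call."""
    record = {"time": time.time(), "argv": argv, "cwd": str(cwd)}
    try:
        result = subprocess.run(
            argv, cwd=cwd, capture_output=True, text=True, timeout=timeout
        )
        record.update(
            returncode=result.returncode,
            stdout=result.stdout[-2000:],  # keep the log compact
            stderr=result.stderr[-2000:],
        )
        return result
    except (subprocess.TimeoutExpired, OSError) as exc:
        record["error"] = f"{type(exc).__name__}: {exc}"
        raise
    finally:
        # Runs on success, failure, and timeout alike, so every action is logged.
        with log_path.open("a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
```

Keeping the audit file outside the directory the agent can write to prevents the agent from editing its own history.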
Real Implementation Results
The video demonstrates the system implementing:
- Factorial function with unit tests
- Fibonacci sequence generator
- Validation test suites
Beyond the demo: Anthropic reports that this architecture scales to web applications with dozens of features across multiple context windows.
Watch the Full Tutorial
See the complete implementation from 3:15, where the initializer agent creates the feature list, through 12:40, where the coding agent completes the Fibonacci implementation.
Key Takeaways
This architecture solves three fundamental problems in agentic workflows:
- Context loss between sessions
- Progress tracking across interruptions
- Verification of completed work
In summary: By splitting planning and execution while maintaining structured checkpoints, we can create reliable long-running agents that overcome context window limitations.
Frequently Asked Questions
Common questions about long-running AI agents
The core challenge is context window limitations where agents lose track of previous work when their session restarts. Without checkpoints, each new session begins with fresh memory, making it impossible to resume complex tasks reliably.
This manifests as agents forgetting completed work, rewriting files unnecessarily, or introducing errors that didn't exist in previous iterations.
- Agents work in discrete sessions
- No memory carries between runs
- Complex tasks require continuity
Anthropic splits the problem into two agents: an initializer that creates feature lists and checkpoints, and a coding agent that uses these checkpoints to resume work. The system maintains progress through JSON files tracking completed features and git commits.
This architecture means:
- Initializer sets up the roadmap
- Coding agent follows the roadmap
- Checkpoints prevent context loss
This implementation uses LangChain for agent orchestration, Ollama as the local LLM provider (running quantized 3B parameter models), and Pydantic for structured output validation. The entire system runs locally without cloud dependencies.
Key components:
- LangChain 1.1 - agent framework
- Ollama - local model execution
- Pydantic - validation and structure
The system maintains three key artifacts: a features list JSON file tracking implementation status, git commit history for version control, and code files containing the actual implementations. The coding agent checks these artifacts to determine where to resume work.
When the agent restarts:
- Checks features list for incomplete items
- Reviews git history for last working state
- Verifies existing implementations with tests
Critical precautions include running in a sandboxed environment, restricting file system access, and implementing command whitelisting. The demonstration gives agents full shell access, which should never be done on production systems without proper safeguards.
Always:
- Run in containers
- Restrict to project directories
- Log all actions
While this demo uses simple Python functions, Anthropic reports success with larger web applications. Scaling requires more sophisticated feature breakdowns and additional verification steps, but the core checkpoint architecture remains valid.
The principles apply to:
- Web apps with dozens of features
- Multi-file projects
- Teams of agents working together
The main limitations are dependency on well-structured feature definitions, potential error accumulation across checkpoints, and the need for human verification at scale. The system works best for modular tasks with clear completion criteria.
Real-world challenges include:
- Defining features clearly
- Maintaining checkpoint integrity
- Human oversight at scale
GrowwStacks specializes in implementing reliable AI agent systems for business automation. We can design custom checkpoint architectures, integrate with your existing tools, and deploy secure sandboxed environments for agent operation.
Our services include:
- Custom agent system design
- Safe implementation
- Ongoing maintenance
Ready to Build Reliable Long-Running Agents?
Don't let context window limits make your agents forget their progress. Let GrowwStacks implement Anthropic's checkpoint system tailored to your specific needs.