Letta AI with Sarah Wooders — Weaviate Podcast #117!

Connor Shorten

Introduction and Background

The Weaviate Podcast recently featured Sarah Wooders, co-founder and CTO of Letta, a company that emerged from UC Berkeley’s research ecosystem. Sarah brings a strong systems background to her role, having been a PhD student in Berkeley’s RISE Lab (now the Sky Computing Lab), where she worked alongside notable figures such as Ion Stoica, known for the Ray and Spark projects. During her time at Berkeley, Sarah collaborated with her co-founder Charles Packer on the groundbreaking MemGPT project, which introduced the concept of self-managed memory for AI agents.

With Charles handling the conceptual grounding of MemGPT, Sarah contributed her systems expertise, focusing on how to build agents as services and manage persistent state in a scalable way. This collaborative effort eventually evolved into Letta, which has expanded beyond memory management to become a comprehensive framework for building stateful AI agents that maintain memory and context across extended conversations.

The Evolution from MemGPT to Letta

MemGPT represented a significant breakthrough as the first paper to introduce self-managed memory for agents. However, as the project developed, Sarah and her team recognized that effective memory management needs to happen within the framework itself rather than being left entirely to individual implementations. This insight led to Letta’s development as a general-purpose agents framework with a strong focus on context management.

The team’s core philosophy can be summarized by Sarah’s statement that “memory is everything.” How an agent manages its context window is critical to its performance, as it directly impacts what information the LLM can access when generating responses. Without good context management, it becomes impossible to determine whether issues stem from the LLM itself or from limitations in the context window. Letta addresses this by taking responsibility for memory management, context handling, and providing tools for agents to access both their immediate context and archival information.

Context Compilation and Window Management

One of the most fascinating aspects of Letta’s approach is its handling of context windows. Sarah explained that even as context windows have grown dramatically (from 4,000 tokens to over 200,000 tokens in some models), it’s often impractical to use the entire available space. When developers pack a full 200,000 tokens into a context window, it becomes extremely difficult to read through and debug issues. Additionally, recent research suggests that the impressive reasoning capabilities observed with smaller context windows don’t necessarily scale well to extremely large contexts. Large contexts also make inference slow and expensive.

Letta addresses these challenges through what they call “context compilation.” The system structures the context window with dedicated sections, including a message buffer and in-context memory. It also manages recursive summarization and maintains statistics about external data so the LLM knows when to query external sources. Rather than maximizing token usage, Letta focuses on optimizing the 30,000 or so tokens that most practical applications tend to use, ensuring the most important information is available to the model within that budget.
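
To make this concrete, here is a minimal, self-contained sketch of what context compilation can look like: fixed sections (system prompt, in-context memory, a recursive summary, and external-data statistics) are always included, and the message buffer is trimmed to fit a token budget. The section layout, budget, and helper names are illustrative assumptions rather than Letta’s actual implementation.

```python
# Illustrative sketch of "context compilation": assembling a bounded context
# window from dedicated sections. Names and layout are assumptions for
# illustration, not Letta's actual implementation.
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer (~4 characters per token).
    return max(1, len(text) // 4)


@dataclass
class AgentContext:
    system_prompt: str
    memory_blocks: dict[str, str]   # e.g. {"persona": "...", "human": "..."}
    recursive_summary: str          # summary of evicted conversation history
    external_stats: str             # e.g. "archival memory holds 1,204 passages"
    message_buffer: list[str] = field(default_factory=list)

    def compile(self, budget: int = 30_000) -> str:
        """Build the prompt: keep the fixed sections, trim old messages to fit."""
        fixed = [
            self.system_prompt,
            "\n".join(f"<{label}>\n{text}\n</{label}>"
                      for label, text in self.memory_blocks.items()),
            f"[conversation summary]\n{self.recursive_summary}",
            f"[external data]\n{self.external_stats}",
        ]
        remaining = budget - sum(count_tokens(part) for part in fixed)

        # Keep the most recent messages that still fit the remaining budget.
        kept: list[str] = []
        for message in reversed(self.message_buffer):
            cost = count_tokens(message)
            if cost > remaining:
                break
            kept.append(message)
            remaining -= cost

        return "\n\n".join(fixed + list(reversed(kept)))
```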

In-Context Memory Management

A distinctive feature of Letta’s approach is how it handles memory management. Unlike systems that rely on predetermined rules, Letta offloads memory management responsibility to the LLM itself. The model decides what information to save in its in-context memory by calling specific tools (such as “core memory replace” or “core memory append”). This creates a more general-purpose, flexible approach that improves as LLMs get better.
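
As a sketch of what it means to hand memory management to the model, the functions below mimic the kind of memory-editing tools described above, operating on labeled blocks of in-context memory. The signatures and labels are assumptions for illustration, not Letta’s exact tool definitions.

```python
# Illustrative memory-editing tools an agent can call to manage its own
# in-context memory. Signatures are assumptions, not Letta's exact API.
core_memory: dict[str, str] = {
    "human": "Name: (unknown)",
    "persona": "I am a helpful assistant.",
}


def core_memory_append(label: str, content: str) -> str:
    """Append new information to a named memory block."""
    core_memory[label] += "\n" + content
    return f"Appended to '{label}'."


def core_memory_replace(label: str, old_content: str, new_content: str) -> str:
    """Replace outdated information inside a named memory block."""
    core_memory[label] = core_memory[label].replace(old_content, new_content)
    return f"Updated '{label}'."


# After learning the user's name, the model might emit a tool call like:
core_memory_replace("human", "Name: (unknown)", "Name: Connor")
```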

Memory in Letta is divided into different sections, including a human section (containing information about the user) and a persona section (containing information about the agent itself). This structure allows agents to maintain consistency in their behavior while also adapting based on interactions. For example, if a user provides feedback about how the agent should behave, the agent can update its persona memory to incorporate this feedback, effectively “learning” from the interaction in a human-readable way.

The Agent Development Environment (ADE)

To make agent development more accessible and transparent, Letta has created an Agent Development Environment (ADE), available in both their desktop application and cloud service. Unlike many other frameworks that operate as “black boxes,” the ADE gives developers clear visibility into all aspects of their agents’ operation.

The environment includes a context window viewer that allows developers to adjust token budgets and see how different parts of the context window are being allocated. It provides visibility into in-context memory and archival memory, letting developers watch as these are edited during agent operation. The ADE also supports “tool rules” that constrain agent behavior, such as requiring specific tool sequences or designating which tool an agent must call first.
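
As a rough illustration of the tool-rule idea, the snippet below encodes a rule that a particular tool must run first, a rule that one tool must be followed by another, and a rule that a tool ends the agent’s turn. The rule kinds and field names are assumptions, not the ADE’s actual configuration format.

```python
# Simplified illustration of "tool rules" constraining agent behavior.
# Rule kinds and field names are assumptions, not Letta's actual schema.
from dataclasses import dataclass


@dataclass
class ToolRule:
    kind: str                 # "run_first", "must_follow", or "terminal"
    tool: str
    next_tool: str | None = None


rules = [
    ToolRule(kind="run_first", tool="search_archival_memory"),   # starting action
    ToolRule(kind="must_follow", tool="search_archival_memory",
             next_tool="send_message"),                          # enforced sequence
    ToolRule(kind="terminal", tool="send_message"),              # ends the turn
]
```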

Sarah emphasized that the ADE differs from graph-based or drag-and-drop interfaces by staying closely tied to the core abstractions of LLMs. Everything configured in the ADE directly relates to how the context window is structured, making the relationship between configuration changes and context modifications transparent to developers.

Database Integration and Persistence

Letta treats agents as services with all state persisted in a database. In-context memory, conversation history, agent definitions, and tool definitions are all stored as database rows. This approach makes debugging easier, as developers can examine the exact configurations and context that led to specific results.
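
For intuition, here is a rough sketch of agent state modeled as rows, with agents referencing persisted memory blocks, messages, and tools by ID. Table and field names are assumptions for illustration, not Letta’s actual database schema.

```python
# Rough sketch of agent state persisted as database rows. Field names are
# assumptions for illustration, not Letta's actual schema.
from dataclasses import dataclass


@dataclass
class MemoryBlockRow:
    block_id: str
    label: str                    # e.g. "human" or "persona"
    value: str                    # human-readable memory content


@dataclass
class MessageRow:
    message_id: str
    agent_id: str
    role: str                     # "user", "assistant", or "tool"
    content: str


@dataclass
class AgentRow:
    agent_id: str
    system_prompt: str
    tool_ids: list[str]           # references to persisted tool definitions
    memory_block_ids: list[str]   # blocks compiled into this agent's context
```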

An interesting capability enabled by this database-centric approach is shared memory between agents. Because sections of in-context memory (called “blocks”) are persisted in the database, multiple agents can be created with shared memory blocks. When one agent updates a shared memory block, that change is reflected in the context windows of all agents connected to it. This, combined with inter-agent messaging, enables sophisticated coordination between multiple agents.
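
A minimal sketch of the shared-block idea, using plain dictionaries: both agents reference the same block record, so an edit made through one agent shows up the next time the other compiles its context. Names are illustrative assumptions, not Letta’s API.

```python
# Minimal sketch of a shared memory block: two agents reference the same
# block record, so one agent's edit appears in the other's context window.
blocks = {"blk_1": {"label": "project_notes", "value": "Launch planned for Q3."}}

agents = {
    "agent_a": {"memory_block_ids": ["blk_1"]},
    "agent_b": {"memory_block_ids": ["blk_1"]},
}


def compile_memory(agent_id: str) -> str:
    """Render the memory section of an agent's context from its attached blocks."""
    return "\n".join(
        f"<{blocks[b]['label']}>\n{blocks[b]['value']}\n</{blocks[b]['label']}>"
        for b in agents[agent_id]["memory_block_ids"]
    )


# Agent A updates the shared block; Agent B's next compiled context reflects it.
blocks["blk_1"]["value"] = "Launch moved to Q4 after user feedback."
assert "Q4" in compile_memory("agent_b")
```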

Tool Execution and Multi-Agent Systems

In Letta’s framework, everything an agent does is considered a tool call. Even sending a message is implemented as a tool where the agent specifies both its internal reasoning and the message content. This consistent approach makes the system model-agnostic and supports a wide range of agent behaviors, from workflow agents to conversational agents to multi-agent systems.
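
Here is a sketch of what a message-sending tool might look like as a JSON-style schema, with the model supplying both its internal reasoning and the user-facing content. The field names are assumptions for illustration, not Letta’s exact definition.

```python
# Illustrative tool schema for sending a message: the model provides both its
# private reasoning and the message content. Field names are assumptions,
# not Letta's exact definition.
send_message_tool = {
    "name": "send_message",
    "description": "Send a message to the user.",
    "parameters": {
        "type": "object",
        "properties": {
            "inner_thoughts": {
                "type": "string",
                "description": "Private reasoning, not shown to the user.",
            },
            "message": {
                "type": "string",
                "description": "The message content delivered to the user.",
            },
        },
        "required": ["inner_thoughts", "message"],
    },
}
```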

Unlike some frameworks that merely return requests to run tools, Letta handles tool execution for the developer. This creates a more streamlined development experience, as tools can be iterated on within the ADE before being attached to agents. For security, Letta uses tool sandboxes (via E2B) when running in the cloud, while local deployments run tools directly on the machine.

The tool-centric approach extends to multi-agent coordination as well. Letta implements multi-agent systems as agents talking to each other through tools. An agent can broadcast messages to multiple other agents, send messages with or without waiting for responses, and share information through the previously mentioned shared memory blocks.
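
A rough sketch of tool-based coordination: one tool sends a message and waits for a reply, one fires and forgets, and one broadcasts to a group. Function names and the transport helper are illustrative assumptions rather than Letta’s built-in tools.

```python
# Illustrative multi-agent messaging tools. Names and signatures are
# assumptions, not Letta's actual built-in tools.
def deliver(target_agent_id: str, message: str) -> str:
    """Stand-in for the framework's message transport (queue, HTTP call, etc.)."""
    print(f"-> {target_agent_id}: {message}")
    return f"ack from {target_agent_id}"


def send_message_to_agent_and_wait(target_agent_id: str, message: str) -> str:
    """Deliver a message to another agent and block until it replies."""
    return deliver(target_agent_id, message)


def send_message_to_agent_async(target_agent_id: str, message: str) -> None:
    """Deliver a message without waiting for a response."""
    deliver(target_agent_id, message)


def broadcast_to_agents(target_agent_ids: list[str], message: str) -> None:
    """Send the same message to every agent in a group."""
    for agent_id in target_agent_ids:
        deliver(agent_id, message)
```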

Future Directions and Takeaways

Looking ahead, Sarah expressed particular excitement about agents’ ability to derive insights from data. Rather than just using RAG or traditional analytics, she envisions powerful LLMs with statefulness examining large datasets across multiple calls, similar to how a human might read through a book or analyze Excel sheets to extract meaningful information.

The conversation with Sarah highlighted several important principles for building effective agent systems. Stateful agents require thoughtful memory management within a well-designed framework. Context compilation remains crucial even as context windows grow. Giving LLMs responsibility for their own memory management creates more natural, flexible systems. Structuring the context window with dedicated sections improves agent performance. And finally, persisting agent state in databases enables more robust, debuggable agent systems that can coordinate effectively in multi-agent scenarios.

You can find the full episode of Weaviate Podcast #117 with Sarah Wooders at the following links:

YouTube: https://youtu.be/JgBKaI6MNpQ

Spotify: https://spotifycreators-web.app.link/e/e1qCzXCnrRb
