Chat Flow
This document explains how Coqui assembles chat context for a single turn, how conversation summaries are created and re-injected, and how that differs between the REPL and API execution paths.
It focuses on five distinct context channels:
- The system prompt assembled for the current turn.
- Persisted conversation history loaded from storage.
- Conversation summaries persisted back into history.
- Persistent memory injected into the system prompt.
- The current user input for the new turn.
The goal is to make the ordering explicit so it is clear when summaries enhance context and when they can compete with recent conversation.
Core Rule
For a normal turn, the provider sees context in this order:
- A fresh system prompt for the current turn.
- Loaded non-system conversation history.
- Any persisted conversation summary message that is part of that history.
- The preserved recent messages that were kept after summarization.
- The current user message for the new turn.
This means the summary is injected before the preserved recent conversation, and the current user message is still the last message in the provider payload.
Main Components
| Component | Responsibility |
|---|---|
| AgentRunner | Loads history, decides whether to summarize, builds the agent, and runs the turn. |
| OrchestratorAgent | Builds the current system prompt, including memory, project, toolkit, and notification context. |
| SessionStorage | Persists messages and turns, reloads conversation history, and hides summarized rows from future loads. |
| ConversationSummarizer | Replaces older history with a summary while preserving recent user turns and their associated replies. |
| MemoryStore | Stores long-lived memories, including session_summary memories created during summarization. |
| MemorySummarizer | Compresses stored memories into # CORE MEMORIES for system-prompt injection. |
| AbstractAgent | Builds the final provider payload: system prompt, history, then current input. |
End-to-End Turn Flow
The normal turn path starts in AgentRunner::doRun() and ends in AbstractAgent::run().
System Prompt Assembly
The system prompt is rebuilt for each turn by OrchestratorAgent::instructions() and then wrapped by OrchestratorAgent::getSystemPromptText().
Prompt assembly order
OrchestratorAgent::instructions() builds the base instruction text in this order:
- The role prompt or orchestrator prompt.
- Deferred toolkit hints.
- `# CORE MEMORIES` prepended at the start of the instructions block.
- `# KEY CONTEXT REMINDER` appended at the end of the instructions block.
- Active project context appended after memory context.
- Notification prompt section appended at render time when present.
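The assembly order above can be sketched as a simple string builder. This is an illustrative Python rendering, not the actual PHP implementation; the function name `build_instructions` and its parameters are hypothetical stand-ins for the logic in OrchestratorAgent::instructions().

```python
def build_instructions(role_prompt, toolkit_hints, core_memories,
                       key_reminder, project_context, notification=None):
    """Hypothetical sketch of the instruction-assembly order described above."""
    parts = []
    if core_memories:
        # Memory context is prepended, so it precedes everything else
        parts.append("# CORE MEMORIES\n" + core_memories)
    parts.append(role_prompt)          # role or orchestrator prompt
    parts.extend(toolkit_hints)        # deferred toolkit hints
    if key_reminder:
        parts.append("# KEY CONTEXT REMINDER\n" + key_reminder)
    if project_context:
        parts.append(project_context)  # project context after memory context
    if notification:
        parts.append(notification)     # added at render time when present
    return "\n\n".join(parts)
```

The key property is positional: memory context always lands at the top of the instruction block, which is why it can be more salient than history.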
OrchestratorAgent::getSystemPromptText() then wraps those instructions with SystemPrompt in this order:
- `# IDENTITY AND PURPOSE`
- `# ITERATION BUDGET` when the role has a finite iteration cap
- `# TOOL USAGE RULES`
Unlike tool schemas, toolkit guidelines are embedded in the system prompt text. Tool schemas are sent separately through the provider tool payload.
History Loading
SessionStorage::loadConversation() reconstructs the conversation from persisted messages ordered by created_at.
Important rules:
- Only rows with `is_summarized = 0` are loaded.
- Persisted `system` messages can still exist in storage, but `AbstractAgent::run()` skips system messages from loaded history when building the provider payload.
- Persisted summary messages are stored as `user` messages, not `system` messages, so they survive history injection.
- The current user prompt is stored before the turn runs, but the in-memory payload still appends the current prompt explicitly as the final input message.
That produces a clean separation between the fresh system prompt for the current turn and the stored conversational history from prior turns.
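The loading rule can be sketched as a filter plus a sort. This is a hypothetical Python illustration of the behavior described above, not the actual SessionStorage query; the row shape is assumed.

```python
def load_conversation(rows):
    """Sketch: reconstruct history from persisted rows.

    Only rows that have not been folded into a summary
    (is_summarized == 0) are visible, ordered by created_at.
    """
    visible = [r for r in rows if r["is_summarized"] == 0]
    return sorted(visible, key=lambda r: r["created_at"])
```

In practice this filtering would happen in the storage query itself; the point is that summarized rows never reach the provider payload again.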
How Summarization Works
ConversationSummarizer::summarizeAndPersist() is the main path used by AgentRunner::autoSummarizeIfNeeded().
Trigger
Auto-summarization can fire before the turn when either of these thresholds is crossed:
- Estimated token usage exceeds `agents.defaults.context.autoSummarizeThreshold`.
- User-turn count exceeds `agents.defaults.context.autoSummarizeTurnThreshold` when turn-based mode is enabled.
There is also an in-loop fallback through SummarizePruningStrategy if the conversation still exceeds budget during the agent loop.
This is separate from the budget-exit threshold. Auto-summarization decides whether to compress history before or during the turn; budget exit decides whether the current iteration is close enough to the context window that the agent should wrap up. The budget-exit check uses the latest provider-reported usage for the current iteration, not a cumulative session-total counter, and it coexists with the normal maxIterations limit.
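The two trigger conditions can be sketched as a single predicate. This is an illustrative Python sketch under assumed config keys, not the actual AgentRunner::autoSummarizeIfNeeded() code.

```python
def should_auto_summarize(estimated_tokens, user_turns, cfg):
    """Sketch of the pre-turn auto-summarization trigger.

    Fires when either the token threshold or, in turn-based mode,
    the user-turn threshold is crossed.
    """
    if estimated_tokens > cfg["autoSummarizeThreshold"]:
        return True
    if cfg.get("turnBasedMode") and user_turns > cfg["autoSummarizeTurnThreshold"]:
        return True
    return False
```

Note that this predicate is about compressing history; the separate budget-exit check compares the latest provider-reported usage against the context window instead.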
Split behavior
The summarizer does not keep the last N raw messages. It keeps the last N user turns.
splitConversation() works like this:
- Find all indices where the message role is `user`.
- Pick the Nth-from-last user index as the cut point.
- Summarize everything before that cut point.
- Preserve everything from that cut point onward.
This means recent replies and tool results are preserved only if they occur at or after the preserved user-turn boundary.
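The split can be sketched in a few lines. This is a hypothetical Python rendering of the steps above, not the actual splitConversation() implementation.

```python
def split_conversation(messages, keep_user_turns):
    """Sketch: cut at the Nth-from-last user message.

    Returns (to_summarize, to_preserve). Everything before the cut
    point is summarized; everything from it onward is kept verbatim.
    """
    user_idx = [i for i, m in enumerate(messages) if m["role"] == "user"]
    if len(user_idx) <= keep_user_turns:
        return [], messages                 # too short to summarize
    cut = user_idx[-keep_user_turns]        # Nth-from-last user index
    return messages[:cut], messages[cut:]
```

Because the cut lands on a user message, each preserved user turn keeps the assistant and tool messages that follow it, while replies attached to older turns move into the summary.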
Persisted summary behavior
After summarization:
- Old rows before the cut point are marked `is_summarized = 1`.
- A new summary message is persisted as a `user` message.
- The summary marker tells the model to treat the summary as background context and focus on the most recent messages below it.
- The summary is also stored in `MemoryStore` as a `session_summary` memory entry.
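The persistence step can be sketched as follows. This is an illustrative Python sketch of the described behavior; the function name, the marker text, and the in-memory row shape are all assumptions, not the real ConversationSummarizer code.

```python
def persist_summary(rows, cut_index, summary_text):
    """Sketch: hide old rows and insert the summary before preserved ones."""
    # Rows before the cut point are hidden from future history loads
    for r in rows[:cut_index]:
        r["is_summarized"] = 1
    # The summary is stored as a *user* message so it survives the
    # system-message filter applied when history is re-injected
    summary_row = {
        "role": "user",
        "content": "[Conversation summary - background context]\n" + summary_text,
        "is_summarized": 0,
    }
    return rows[:cut_index] + [summary_row] + rows[cut_index:]
```

On the next load, the hidden rows are filtered out and the summary row appears as ordinary history, positioned before the preserved recent messages.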
Provider Payload Order
AbstractAgent::run() constructs the final provider conversation in a strict order:
- Add a fresh `SystemMessage` containing the current turn's system prompt.
- Inject loaded history, skipping any history messages whose role is `system`.
- Append the current input message.
That final sequence matters more than database order because it is the exact payload sent to the model.
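The three steps above can be sketched as a list build. This is an illustrative Python sketch of the ordering, not the actual AbstractAgent::run() code; the message dict shape is assumed.

```python
def build_payload(system_prompt, history, current_input):
    """Sketch of the final provider-facing message sequence."""
    # 1. Fresh system prompt for the current turn
    payload = [{"role": "system", "content": system_prompt}]
    # 2. Loaded history, with any stale stored system messages skipped
    payload += [m for m in history if m["role"] != "system"]
    # 3. Current user input is always the last message
    payload.append({"role": "user", "content": current_input})
    return payload
```

Any persisted summary travels inside `history` as a user message, so it naturally lands after the fresh system prompt and before the preserved recent turns.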
Practical implication
This already satisfies the primary history-ordering requirement:
- The summary appears before the preserved recent messages.
- The most recent conversation remains available after the summary.
Memory Interaction
Summaries affect the model through two separate channels.
Channel 1: Conversation history
The persisted summary is inserted into message history as a user message. On the next turn it appears before the preserved recent messages.
Channel 2: Persistent memory
The same summary content is also saved as a session_summary memory entry. MemoryStore::getCoreSummary() includes active memories across all areas, and MemorySummarizer::getSummary() compresses those memories into # CORE MEMORIES.
That means a session summary can reappear in the system prompt even before history is injected.
Why Focus Can Still Drift
Even though the ordering is correct, the model can still appear to drift toward older work for a few reasons.
1. Summary content can be seen twice
The summary can appear once in history and again through session_summary memory injection into # CORE MEMORIES.
2. Memory is injected before history
# CORE MEMORIES is prepended to the instruction block before any history is appended. This gives persistent memory a very prominent position in the current turn.
3. Recent preservation is based on user-turn count
The summarizer preserves the last N user turns, not necessarily the last N detailed tool or assistant messages. If a lot happened just before the cut point, some detail can move into the summary sooner than expected.
4. Tool-heavy history is compressed
Very long individual messages are truncated before summarization. The summary is intended to preserve outcomes and next steps, not full tool output fidelity.
REPL and API Differences
The core chat assembly path is the same in both modes. The differences are mostly at the entry and observation layers.
| Concern | REPL | API |
|---|---|---|
| Turn execution | Synchronous in the REPL process | Started by the API server and streamed through SSE |
| Observer | TerminalObserver streams to stdout | SseObserver streams structured events |
| Chat-context assembly | Same AgentRunner and AbstractAgent flow | Same AgentRunner and AbstractAgent flow |
| Background execution | REPL can create tasks but does not execute them | API server executes tasks, loops, schedules, and webhooks |
For standard prompt turns, the ordering of system prompt, history, summary, preserved recent messages, and current input is the same in both modes.
Background Tasks and Loops
Background tasks and loop stages use the same summarization and prompt-building primitives when they run a turn, but they differ operationally:
- Background tasks execute in separate processes managed by the API server.
- Loop stages run as separate background task sessions.
- Each stage gets its own execution session, but loop work can still share artifacts, todos, and sprint context through the work-scope session model.
Those differences change where chat context is sourced from, but they do not change the internal ordering inside a single turn once AgentRunner starts running.
Verified Guarantees
The current implementation already guarantees the following:
- The summary is inserted before the preserved recent messages in history.
- The preserved recent messages remain available after summarization.
- The current user input is appended after both the summary and the preserved recent history.
- Old summarized rows are hidden from future history loads by `is_summarized = 1`.
Known Attention Risks
The following are not ordering bugs, but they can still affect perceived focus:
- `session_summary` memories can make old summary content reappear in `# CORE MEMORIES`.
- Core memory injection happens before history, so memory can be more salient than expected.
- Preserved recency is measured by user messages, not by total conversational detail.
- Summaries intentionally compress detail, especially large tool outputs.
Reading the Next Turn After a Summary
After a summary is created, the next turn effectively looks like this:
System prompt for current turn
- role instructions
- core memories
- key context reminder
- project context
- notification context
Loaded non-system history
- conversation summary user message
- preserved recent messages
Current turn input
- latest user message

That is the actual model-facing shape to keep in mind when diagnosing post-summary focus loss.