
Chat Flow

This document explains how Coqui assembles chat context for a single turn, how conversation summaries are created and re-injected, and how that differs between the REPL and API execution paths.

It focuses on five distinct context channels:

  1. The system prompt assembled for the current turn.
  2. Persisted conversation history loaded from storage.
  3. Conversation summaries persisted back into history.
  4. Persistent memory injected into the system prompt.
  5. The current user input for the new turn.

The goal is to make the ordering explicit so it is clear when summaries enhance context and when they can compete with recent conversation.

Core Rule

For a normal turn, the provider sees context in this order:

  1. A fresh system prompt for the current turn.
  2. Loaded non-system conversation history.
  3. Any persisted conversation summary message that is part of that history.
  4. The preserved recent messages that were kept after summarization.
  5. The current user message for the new turn.

This means the summary is injected before the preserved recent conversation, and the current user message is still the last message in the provider payload.

Main Components

| Component | Responsibility |
| --- | --- |
| AgentRunner | Loads history, decides whether to summarize, builds the agent, and runs the turn. |
| OrchestratorAgent | Builds the current system prompt, including memory, project, toolkit, and notification context. |
| SessionStorage | Persists messages and turns, reloads conversation history, and hides summarized rows from future loads. |
| ConversationSummarizer | Replaces older history with a summary while preserving recent user turns and their associated replies. |
| MemoryStore | Stores long-lived memories, including session_summary memories created during summarization. |
| MemorySummarizer | Compresses stored memories into # CORE MEMORIES for system-prompt injection. |
| AbstractAgent | Builds the final provider payload: system prompt, history, then current input. |

End-to-End Turn Flow

The normal turn path starts in AgentRunner::doRun() and ends in AbstractAgent::run().

System Prompt Assembly

The system prompt is rebuilt for each turn by OrchestratorAgent::instructions() and then wrapped by OrchestratorAgent::getSystemPromptText().

Prompt assembly order

OrchestratorAgent::instructions() builds the base instruction text in this order:

  1. The role prompt or orchestrator prompt.
  2. Deferred toolkit hints.
  3. # CORE MEMORIES prepended at the start of the instructions block.
  4. # KEY CONTEXT REMINDER appended at the end of the instructions block.
  5. Active project context appended after memory context.
  6. Notification prompt section appended at render time when present.

OrchestratorAgent::getSystemPromptText() then wraps those instructions with SystemPrompt in this order:

  1. # IDENTITY AND PURPOSE
  2. # ITERATION BUDGET when the role has a finite iteration cap
  3. # TOOL USAGE RULES

Unlike tool schemas, toolkit guidelines are embedded in the system prompt text. Tool schemas are sent separately through the provider tool payload.
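The assembly order above can be sketched as follows. This is an illustrative Python rendering of the described ordering only, not the actual Coqui implementation (which lives in OrchestratorAgent); all function and parameter names here are hypothetical.

```python
def assemble_instructions(role_prompt, toolkit_hints, core_memories,
                          key_reminder, project_ctx, notification_ctx):
    """Illustrative sketch of the instructions() ordering described above."""
    parts = []
    if core_memories:                        # prepended at the start
        parts.append("# CORE MEMORIES\n" + core_memories)
    parts.append(role_prompt)                # role or orchestrator prompt
    if toolkit_hints:                        # deferred toolkit hints
        parts.append(toolkit_hints)
    if key_reminder:                         # appended at the end
        parts.append("# KEY CONTEXT REMINDER\n" + key_reminder)
    if project_ctx:                          # after memory context
        parts.append(project_ctx)
    if notification_ctx:                     # added at render time
        parts.append(notification_ctx)
    return "\n\n".join(parts)
```

The key takeaway is positional: core memories lead the instructions block, so they sit above everything else the model reads for the turn.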

History Loading

SessionStorage::loadConversation() reconstructs the conversation from persisted messages ordered by created_at.

Important rules:

  1. Only rows with is_summarized = 0 are loaded.
  2. Persisted system messages can still exist in storage, but AbstractAgent::run() skips system messages from loaded history when building the provider payload.
  3. Persisted summary messages are stored as user messages, not system messages, so they survive history injection.
  4. The current user prompt is stored before the turn runs, but the in-memory payload still appends the current prompt explicitly as the final input message.

That produces a clean separation between the fresh system prompt for the current turn and the stored conversational history from prior turns.
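The loading rules can be sketched as two filters. This is a hypothetical Python illustration of the behavior described above (the real logic lives in SessionStorage::loadConversation() and AbstractAgent::run()); the row shape is assumed.

```python
def load_conversation(rows):
    """Reconstruct history from persisted rows, ordered by created_at.
    Rows already folded into a summary (is_summarized = 1) are skipped."""
    return [r for r in sorted(rows, key=lambda r: r["created_at"])
            if r["is_summarized"] == 0]
```

Persisted system messages may survive this filter, but they are dropped later when the provider payload is built, so the fresh system prompt is the only system message the model sees.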

How Summarization Works

ConversationSummarizer::summarizeAndPersist() is the main path used by AgentRunner::autoSummarizeIfNeeded().

Trigger

Auto-summarization can fire before the turn when either of these thresholds is crossed:

  1. Estimated token usage exceeds agents.defaults.context.autoSummarizeThreshold.
  2. User-turn count exceeds agents.defaults.context.autoSummarizeTurnThreshold when turn-based mode is enabled.

There is also an in-loop fallback through SummarizePruningStrategy if the conversation still exceeds budget during the agent loop.

This is separate from the budget-exit threshold. Auto-summarization decides whether to compress history before or during the turn; budget exit decides whether the current iteration is close enough to the context window that the agent should wrap up. The budget-exit check uses the latest provider-reported usage for the current iteration, not a cumulative session-total counter, and it coexists with the normal maxIterations limit.
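The two pre-turn triggers reduce to a simple either/or check. A minimal sketch, assuming a flat config dict whose keys mirror the agents.defaults.context settings named above:

```python
def should_auto_summarize(estimated_tokens, user_turns, cfg):
    """Fire summarization before the turn if either threshold is crossed."""
    if estimated_tokens > cfg["autoSummarizeThreshold"]:
        return True
    if cfg.get("turnBasedMode") and user_turns > cfg["autoSummarizeTurnThreshold"]:
        return True
    return False
```

Note that this decision is independent of the budget-exit check, which looks at the latest provider-reported usage inside the agent loop rather than these pre-turn estimates.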

Split behavior

The summarizer does not keep the last N raw messages. It keeps the last N user turns.

splitConversation() works like this:

  1. Find all indices where the message role is user.
  2. Pick the Nth-from-last user index as the cut point.
  3. Summarize everything before that cut point.
  4. Preserve everything from that cut point onward.

This means recent replies and tool results are preserved only if they occur at or after the preserved user-turn boundary.
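The four steps above can be sketched directly. This is an illustrative Python version of the described cut-point logic, not the actual splitConversation() implementation:

```python
def split_conversation(messages, keep_user_turns):
    """Cut at the Nth-from-last user message so whole recent user turns,
    including the replies and tool results that follow them, survive."""
    user_idx = [i for i, m in enumerate(messages) if m["role"] == "user"]
    if len(user_idx) <= keep_user_turns:
        return [], messages          # too little history to summarize
    cut = user_idx[-keep_user_turns]
    return messages[:cut], messages[cut:]
```

Because the cut point is always a user message, the preserved tail starts at a user turn, which keeps each surviving turn intact.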

Persisted summary behavior

After summarization:

  1. Old rows before the cut point are marked is_summarized = 1.
  2. A new summary message is persisted as a user message.
  3. The summary marker tells the model to treat the summary as background context and focus on the most recent messages below it.
  4. The summary is also stored in MemoryStore as a session_summary memory entry.
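The bookkeeping after a summary can be sketched as follows. This is a hypothetical illustration of the steps listed above; the marker wording and data shapes are assumptions, not the actual Coqui strings.

```python
# Assumed marker text; the real marker tells the model to treat the
# summary as background and focus on the messages below it.
SUMMARY_MARKER = ("[Conversation summary: background context only. "
                  "Focus on the most recent messages below.]\n")

def persist_summary(storage, summarized_rows, summary_text):
    """Hide old rows from future loads and store the summary as a user message."""
    for row in summarized_rows:
        row["is_summarized"] = 1     # excluded by future history loads
    storage.append({
        "role": "user",              # stored as user, so it survives injection
        "content": SUMMARY_MARKER + summary_text,
    })
```

Storing the summary as a user message is the detail that matters: persisted system messages are skipped at payload time, so a system-role summary would silently vanish from future turns.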

Provider Payload Order

AbstractAgent::run() constructs the final provider conversation in a strict order:

  1. Add a fresh SystemMessage containing the current turn’s system prompt.
  2. Inject loaded history, skipping any history messages whose role is system.
  3. Append the current input message.

That final sequence matters more than database order because it is the exact payload sent to the model.
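The three-step construction can be sketched in a few lines. This is an illustrative Python version of the ordering described above, not the actual AbstractAgent::run() code:

```python
def build_payload(system_prompt, history, current_input):
    """Fresh system prompt, then non-system history, then current input."""
    payload = [{"role": "system", "content": system_prompt}]
    payload += [m for m in history if m["role"] != "system"]
    payload.append({"role": "user", "content": current_input})
    return payload
```

Any persisted summary rides along inside the history slice, already positioned before the preserved recent messages, while the current input is always last.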

Practical implication

This already satisfies the primary history-ordering requirement:

  1. The summary appears before the preserved recent messages.
  2. The most recent conversation remains available after the summary.

Memory Interaction

Summaries affect the model through two separate channels.

Channel 1: Conversation history

The persisted summary is inserted into message history as a user message. On the next turn it appears before the preserved recent messages.

Channel 2: Persistent memory

The same summary content is also saved as a session_summary memory entry. MemoryStore::getCoreSummary() includes active memories across all areas, and MemorySummarizer::getSummary() compresses those memories into # CORE MEMORIES.

That means a session summary can reappear in the system prompt even before history is injected.

Why Focus Can Still Drift

Even though the ordering is correct, the model can still appear to drift toward older work for a few reasons.

1. Summary content can be seen twice

The summary can appear once in history and again through session_summary memory injection into # CORE MEMORIES.

2. Memory is injected before history

# CORE MEMORIES is prepended to the instruction block before any history is appended. This gives persistent memory a very prominent position in the current turn.

3. Recent preservation is based on user-turn count

The summarizer preserves the last N user turns, not necessarily the last N detailed tool or assistant messages. If a lot happened just before the cut point, some detail can move into the summary sooner than expected.

4. Tool-heavy history is compressed

Very long individual messages are truncated before summarization. The summary is intended to preserve outcomes and next steps, not full tool output fidelity.

REPL and API Differences

The core chat assembly path is the same in both modes. The differences are mostly at the entry and observation layers.

| Concern | REPL | API |
| --- | --- | --- |
| Turn execution | Synchronous in the REPL process | Started by the API server and streamed through SSE |
| Observer | TerminalObserver streams to stdout | SseObserver streams structured events |
| Chat-context assembly | Same AgentRunner and AbstractAgent flow | Same AgentRunner and AbstractAgent flow |
| Background execution | REPL can create tasks but does not execute them | API server executes tasks, loops, schedules, and webhooks |

For standard prompt turns, the ordering of system prompt, history, summary, preserved recent messages, and current input is the same in both modes.

Background Tasks and Loops

Background tasks and loop stages use the same summarization and prompt-building primitives when they run a turn, but they differ operationally:

  1. Background tasks execute in separate processes managed by the API server.
  2. Loop stages run as separate background task sessions.
  3. Each stage gets its own execution session, but loop work can still share artifacts, todos, and sprint context through the work-scope session model.

Those differences change where chat context is sourced from, but they do not change the internal ordering inside a single turn once AgentRunner starts running.

Verified Guarantees

The current implementation already guarantees the following:

  1. The summary is inserted before the preserved recent messages in history.
  2. The preserved recent messages remain available after summarization.
  3. The current user input is appended after both the summary and the preserved recent history.
  4. Old summarized rows are hidden from future history loads by is_summarized = 1.

Known Attention Risks

The following are not ordering bugs, but they can still affect perceived focus:

  1. session_summary memories can make old summary content reappear in # CORE MEMORIES.
  2. Core memory injection happens before history, so memory can be more salient than expected.
  3. Preserved recency is measured by user messages, not by total conversational detail.
  4. Summaries intentionally compress detail, especially large tool outputs.

Reading the Next Turn After a Summary

After a summary is created, the next turn effectively looks like this:

System prompt for current turn
  - role instructions
  - core memories
  - key context reminder
  - project context
  - notification context

Loaded non-system history
  - conversation summary user message
  - preserved recent messages

Current turn input
  - latest user message

That is the actual model-facing shape to keep in mind when diagnosing post-summary focus loss.
