spec: machine-readable export for agent conversations (#10112) #10500
lonexreb wants to merge 3 commits into warpdotdev:master
I'm starting a first review of this spec-only pull request. You can view the conversation on Warp. I completed the review and no human review was requested for this pull request.
Overview
This spec adds a versioned JSON export for agent conversations. The direction is clear, but several requirements are ambiguous or internally inconsistent enough to risk divergent or unsafe implementation.
Concerns
- The export setting is named two different ways.
- The `tool_call` schema constrains input/output as objects even though lossless tool I/O may be scalar or text.
- CLI export is promised, but redaction is defined only via the rendered UI tree.
- Sensitive reasoning export is left as an open question while the behavior defaults to including it.
- A decoder round-trip test conflicts with the V1 non-goal of no re-import/restore path.
Security
- CLI redaction needs an explicit redacted-source contract so headless exports cannot accidentally read raw content.
- Reasoning blocks need an explicit include/exclude policy and controls before implementation.
Verdict
Found: 0 critical, 5 important, 0 suggestions
Request changes
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Powered by Oz
### B1. UI entry point

The existing "Export Conversation" UI gains a format selector with two options: `Markdown` (existing, default behavior preserved) and `JSON` (new). Settings → Agents → Conversation Export adds `default_export_format` (enum, default `Markdown`) which seeds that selector.
default_export_format conflicts with agent.conversation_export.default_format later in the spec; choose one canonical setting key so UI, CLI, migration, and tests implement the same API.
- `{ "type": "text", "text": "..." }`
- `{ "type": "reasoning", "text": "...", "duration_ms": <int> }`
- `{ "type": "tool_call", "tool": "<name>", "input": { ... }, "output": { ... }, "duration_ms": <int>, "status": "ok" | "error" }`
This shape constrains `input` and `output` to JSON objects, but lossless tool I/O also needs strings, arrays, numbers/null, and possibly structured stdout/stderr; define these fields as arbitrary JSON values plus truncation metadata, or add a wrapper shape.
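One possible reading of the wrapper-shape option is a discriminated value that records the JSON kind next to the payload. The sketch below is illustrative only: the helper `to_typed_value` and the `type` / `value` field names are assumptions made for this example, not the spec's definition.

```python
import json


def to_typed_value(value):
    """Wrap a JSON-compatible value in a discriminated shape (hypothetical helper)."""
    # bool must be tested before int/float: in Python, bool is a subclass of int.
    if value is None:
        kind = "null"
    elif isinstance(value, bool):
        kind = "boolean"
    elif isinstance(value, (int, float)):
        kind = "number"
    elif isinstance(value, str):
        kind = "string"
    elif isinstance(value, list):
        kind = "array"
    elif isinstance(value, dict):
        kind = "object"
    else:
        raise TypeError(f"not a JSON value: {type(value).__name__}")
    return {"type": kind, "value": value}


# Example tool_call block: an object input and a plain-text output.
block = {
    "type": "tool_call",
    "tool": "grep",  # tool name invented for the example
    "input": to_typed_value({"pattern": "TODO"}),
    "output": to_typed_value("src/main.rs:12: // TODO"),
    "duration_ms": 7,
    "status": "ok",
}
print(json.dumps(block, indent=2))
```

With a wrapper like this, scalar and text outputs no longer have to be forced into an object, and truncation metadata has an obvious place to attach.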
### B4. Redaction is preserved

Any content already redacted in the displayed conversation MUST appear redacted in the JSON export — typically as a `"[REDACTED]"` placeholder string in the relevant `text` / `input` / `output` field. The export pipeline reuses the redaction state from the rendered conversation tree; it never re-reads the raw underlying source. This guarantees secrets that were stripped from the UI cannot reappear in an export.
- T8. Integration: CLI `--format json --output -` and UI export produce byte-identical (or at minimum JSON-structurally-identical) output for the same conversation snapshot.
- T9. Unit: filename slug matches `<conversation_title-or-id>-<timestamp>.warp-export.json` for several title shapes including titles with spaces, slashes, and unicode.
- T10. Unit: a conversation with an inline image serializes as a `data_uri` content block.
- T11. Unit: unknown content-block type encountered during decode round-trips unchanged (forward-compat probe).
- Should we ALSO ship YAML and JSONL? Suggest V1 = JSON only, V1.5 = JSONL (one message per line) for streaming consumption. Defer YAML unless concrete demand surfaces.
- Should image blocks support an external-file mode (`{"type":"image","path":"..."}`) for very large images, with the JSON exported alongside an `assets/` sibling directory? Suggest defer to V1.5.
- Should there be an explicit "include reasoning blocks" toggle? Reasoning may be sensitive in some workflows. Default include; offer toggle if early users ask.
Round-1 fixes for oz-for-oss review: (1) export setting unified as
/oz-review
I'm re-reviewing this spec-only pull request in response to a review request. You can view the conversation on Warp. I completed the review and no human review was requested for this pull request.
Overview
This spec defines a JSON export format for agent conversations with UI and CLI parity, redaction preservation, typed content blocks, truncation controls, and telemetry updates.
Concerns
- The spec requires validation against a documented JSON Schema, but the schema itself is not included or located, so implementers and tests do not have a concrete contract to validate against.
- Several typed block shapes remain placeholders, which undermines the promised round-trip fidelity for plan steps and code diffs.
- Tool-output truncation is ambiguous now that outputs are discriminated `TypedValue`s that may be strings, arrays, objects, numbers, booleans, or null.
Security
- Reasoning is included by default even though the spec acknowledges users may not want it in shared exports; the export action needs an explicit per-export disclosure or override so users do not accidentally include sensitive reasoning content.
Verdict
Found: 0 critical, 3 important, 1 suggestion
Request changes
- `{ "type": "reasoning", "text": "...", "duration_ms": <int> }`
- `{ "type": "tool_call", "tool": "<name>", "input": <TypedValue>, "output": <TypedValue>, "duration_ms": <int>, "status": "ok" | "error" }`
- `{ "type": "plan_step", "title": "...", "status": "pending" | "in_progress" | "complete", "items": [ ... ] }`
- `{ "type": "code_diff", "files": [ { "path": "...", "before": "...", "after": "...", "hunks": [ ... ] } ] }`
plan_step.items and code_diff.hunks are still placeholders, so the spec does not define enough structure to deliver the promised round-trip fidelity; define the item and hunk schemas, including line ranges and old/new content semantics.
Default: large tool outputs are included in full.

If the user enables `agent.conversation_export.truncate_large_outputs`, each tool-call `output` exceeding `agent.conversation_export.large_output_limit_kb` (default 64 KiB) is truncated to that limit. The block additionally carries `"truncated": true` and `"original_size_bytes": <int>` so consumers can detect and re-fetch if needed.
Truncation is ambiguous for `TypedValue` outputs; state whether the byte limit applies to serialized JSON, nested strings, or only string outputs, and whether the original type must be preserved after truncation.
- A5. With `truncate_large_outputs = true` and limit `64 KiB`, the same tool call's `output` is truncated to 64 KiB and the block carries `"truncated": true` and `"original_size_bytes": 204800`.
- A6. The CLI `--format json` flag produces output identical to the UI export for the same conversation. The CLI consumes `RedactedConversationView` (B4.1) and runs the same exporter the UI runs.
- A6a. CLI export of a conversation containing redacted secrets produces `[REDACTED]` placeholders in exactly the same fields the UI export does (CLI redaction parity).
- A7. The exported JSON validates against the documented JSON Schema for `schema_version 1.0.0`. (Schema-decoder validation; this is NOT a re-import test — V1 has no re-import path. See Non-Goals.)
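A parity check along the lines of A6a could compare the set of JSON paths that carry the placeholder in each export. This is only a sketch: `redacted_paths` is a hypothetical helper, and the `messages` / `content` layout of the sample documents is assumed for illustration rather than taken from the spec.

```python
def redacted_paths(node, path=()):
    """Return the set of JSON paths whose string value contains the placeholder."""
    found = set()
    if isinstance(node, dict):
        for key, child in node.items():
            found |= redacted_paths(child, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            found |= redacted_paths(child, path + (index,))
    elif isinstance(node, str) and "[REDACTED]" in node:
        found.add(path)
    return found


# Two exports of the same conversation snapshot (layout assumed for the example).
ui_export = {"messages": [{"content": [{"type": "text", "text": "token is [REDACTED]"}]}]}
cli_export = {"messages": [{"content": [{"type": "text", "text": "token is [REDACTED]"}]}]}

# Parity holds when both surfaces redact exactly the same fields.
assert redacted_paths(ui_export) == redacted_paths(cli_export)
```

Comparing paths rather than whole documents keeps the check focused on redaction even if other fields, such as timestamps, differ between runs.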
### B4.2 Reasoning blocks: include/exclude policy

V1 INCLUDES reasoning blocks (`{ "type": "reasoning", ... }`) by default. Some users will not want reasoning surfaced in shared exports, so V1 ships an explicit toggle:
💡 [SUGGESTION] [SECURITY] Reasoning is exported by default despite being sensitive in shared transcripts; require the export UI to show the include-reasoning state and allow a per-export override or confirmation before writing reasoning blocks.
Addressed all 3 important + 1 suggestion + 1 security concern from the latest oz-for-oss review:
Commit: a7b9853
/oz-review
I'm re-reviewing this spec-only pull request in response to a review request. You can view the conversation on Warp. I completed the review and no human review was requested for this pull request.
Overview
The spec defines a new machine-readable JSON export for agent conversations, including UI/CLI surfaces, schema versioning, redaction expectations, truncation behavior, and validation tests. The overall structure is complete, but several requirements are internally inconsistent or infeasible as written.
Concerns
- Reasoning export is default-on even though the spec explicitly acknowledges reasoning can contain sensitive draft thoughts or API keys.
- String truncation mixes raw UTF-8 byte limits with canonical JSON byte sizing, so implementations cannot satisfy both requirements consistently.
- The JSON Schema requirements ask the schema to enforce runtime byte-size conditions that JSON Schema cannot know.
- The test plan requires decode round-tripping despite V1 declaring no re-import/decode path.
- Plan-step status coverage is ambiguous between top-level `plan_step.status` and nested `items[].status`.
Security
- Reasoning inclusion should use a safe default or explicit per-export opt-in before sensitive reasoning is serialized.
Verdict
Found: 0 critical, 5 important, 1 suggestion
Request changes
V1 INCLUDES reasoning blocks (`{ "type": "reasoning", ... }`) by default. Some users will not want reasoning surfaced in shared exports, so V1 ships an explicit toggle:

- Setting: `agent.conversation_export.include_reasoning` — bool, default `true`.
| `output.type` | Truncation behavior when `original_size_bytes > limit_bytes` |
|---|---|
| `string` | Replace `value` with the first `limit_bytes` bytes of the original UTF-8 string, snapped backward to the nearest valid codepoint boundary so the result is well-formed UTF-8. |
B5.1 measures `original_size_bytes` using canonical JSON bytes, but this rule truncates string values to raw UTF-8 bytes; the serialized value can still exceed `limit_bytes` because JSON adds quotes/escapes. Define string truncation against canonical serialized size or change B5.1 to measure raw string bytes.
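The mismatch can be illustrated with a small sketch. Everything here is an assumption made for the example: `truncate_utf8` and `canonical_size` are hypothetical helpers, and compact `json.dumps` output stands in for whatever "canonical JSON" means in B5.1.

```python
import json


def truncate_utf8(text, limit_bytes):
    """Keep at most limit_bytes of UTF-8, backing up to a codepoint boundary."""
    raw = text.encode("utf-8")
    if len(raw) <= limit_bytes:
        return text
    cut = limit_bytes
    # Continuation bytes look like 0b10xxxxxx; step back until the cut sits on a lead byte.
    while cut > 0 and (raw[cut] & 0xC0) == 0x80:
        cut -= 1
    return raw[:cut].decode("utf-8")


def canonical_size(value):
    """Serialized size in bytes, using compact JSON as a stand-in for canonical JSON."""
    encoded = json.dumps(value, ensure_ascii=False, separators=(",", ":"))
    return len(encoded.encode("utf-8"))


original = 'say "hi"\n' * 4 + "é"
limit = 16
kept = truncate_utf8(original, limit)

# Raw bytes respect the limit, but quotes and escapes push the serialized size past it.
print(len(kept.encode("utf-8")), canonical_size(kept))
```

The raw string fits the limit while its serialized form does not, which is why the two measurements cannot both be the contract.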
`TypedValue` variants in B2.1.1), `plan_step` (B2.1.2),
`code_diff` (B2.1.3), `image`. Required vs optional fields and
enum values match this spec exactly.
- **Truncation flags**: the schema constrains `truncated`,
JSON Schema cannot enforce the iff rule that truncation fields appear only when `original_size_bytes > limit_bytes`, because the configured limit and canonical byte size are not available to the schema. Restrict this section to schema-enforceable type/field constraints and leave size-dependent checks to tests.
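A size-dependent check of this kind fits naturally in a test, where the measured size and the configured limit are both known. The following is a minimal sketch under that assumption; `check_truncation_flags` is a hypothetical helper, not something the spec defines.

```python
def check_truncation_flags(block, original_size_bytes, limit_bytes):
    """Return violations of 'truncation fields present iff the output was over the limit'."""
    should_truncate = original_size_bytes > limit_bytes
    has_flag = block.get("truncated") is True
    has_size = "original_size_bytes" in block
    problems = []
    if should_truncate and not (has_flag and has_size):
        problems.append("over limit but truncation fields missing")
    if not should_truncate and ("truncated" in block or has_size):
        problems.append("under limit but truncation fields present")
    if has_size and block["original_size_bytes"] != original_size_bytes:
        problems.append("original_size_bytes does not match measured size")
    return problems


limit = 64 * 1024  # 64 KiB, the default limit quoted in the spec

# The A5 case: a 204800-byte output truncated under the default limit.
truncated_block = {"type": "tool_call", "truncated": True, "original_size_bytes": 204800}
small_block = {"type": "tool_call"}

print(check_truncation_flags(truncated_block, 204800, limit))
print(check_truncation_flags(small_block, 1024, limit))
```

The schema can still require that `truncated` is a boolean and `original_size_bytes` an integer; only the conditional presence moves to tests.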
- T_reasoning_omitted. Unit: with `include_reasoning = false`, reasoning blocks are omitted entirely from `content` arrays — no empty marker, no `[REDACTED]` stub — and surrounding text blocks remain in their original order. Cover the `[text, reasoning, text]` -> `[text, text]` case explicitly.
- T9. Unit: filename slug matches `<conversation_title-or-id>-<timestamp>.warp-export.json` for several title shapes including titles with spaces, slashes, and unicode.
- T10. Unit: a conversation with an inline image serializes as a `data_uri` content block.
- T11. Unit: unknown content-block `type` encountered during decode round-trips unchanged (forward-compat probe).
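The `[text, reasoning, text]` -> `[text, text]` case from T_reasoning_omitted can be sketched as a plain filter over a `content` array. The helper name `drop_reasoning` is hypothetical and the sample blocks are invented for illustration.

```python
def drop_reasoning(content, include_reasoning):
    """Return content blocks in order, omitting reasoning blocks when the setting is off."""
    if include_reasoning:
        return list(content)
    # No empty marker and no stub: the reasoning block is simply left out.
    return [block for block in content if block.get("type") != "reasoning"]


content = [
    {"type": "text", "text": "before"},
    {"type": "reasoning", "text": "draft thoughts", "duration_ms": 120},
    {"type": "text", "text": "after"},
]

filtered = drop_reasoning(content, include_reasoning=False)
print([block["type"] for block in filtered])
```

Because the filter only removes elements, the relative order of the surrounding text blocks is preserved by construction.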
- T10. Unit: a conversation with an inline image serializes as a `data_uri` content block.
- T11. Unit: unknown content-block `type` encountered during decode round-trips unchanged (forward-compat probe).
- T_schema_artifact_in_tree. Unit: `app/src/ai/agent/export_schema/v1.0.0/schema.json` exists, parses as JSON Schema draft 2020-12, and matches the generator output of `export_schema.rs` byte-for-byte.
- T_plan_step_items_concrete. Unit: a plan step exercising every `status` value (including `blocked` with a non-null `blocked_reason` and `cancelled`), with mixed `subitems` depths, round-trips losslessly and validates against the schema.
plan_step status is defined as only pending | in_progress | complete, while blocked/cancelled live on items[]; this test reads as requiring top-level plan steps to exercise blocked and cancelled. Reword it to plan_step.items or extend the top-level plan_step.status enum.
contain those fields — they only appear when an actual truncation
occurred.

#### B5.3 Worked example (object)
💡 [SUGGESTION] The heading says object, but the example is a string output; rename it so implementers do not look for an object truncation fixture.
Suggested change:
- #### B5.3 Worked example (object)
+ #### B5.3 Worked example (string)
Spec-only PR for #10112 — JSON export option for agent conversations.
Summary
Warp can already export agent conversations to Markdown, but Markdown is for human reading: tool calls, plan steps, reasoning blocks, and structured tool I/O collapse into prose. Scripts and tools that want to consume agent transcripts have to parse rendered Markdown, which is brittle. This spec adds a versioned JSON export option to the existing Export Conversation flow so transcripts are losslessly machine-readable.
Scope (V1)
- […] `schema_version` semver, V1.0.0) with typed content blocks: text, reasoning, tool_call, plan_step, code_diff, image.
- […] `truncated` / `original_size_bytes` flags so consumers can detect).
- […] `warp export-conversation <id> --format json --output path` (or stdout via `-`).
Non-Goals
Schema versioning
`schema_version` follows semver. Decoders MUST accept the current major, SHOULD reject unknown majors, and SHOULD pass through unknown minor-version fields and content-block types unchanged. This lets future Warp versions evolve the schema while older tools can still parse past exports.
Spec document
`specs/GH10112/SPEC.md`: full Behavior Contract (B1–B8 with the typed content-block schema), Acceptance Criteria (A1–A10), Implementation Pointers verified against the codebase (`app/src/ai/agent/conversation.rs`, `app/src/ai/agent/conversation_yaml.rs`, `app/src/ai/agent_conversations_model.rs`, `app/src/ai/agent/api/convert_conversation.rs`), Tests (T1–T11), Open Questions, and Telemetry notes.
Labels
ready-to-spec
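From a consumer's side, the decoder rule under Schema versioning might look roughly like the sketch below. It is not part of the spec: `check_major`, `partition_blocks`, the top-level `content` key, and the `audio_clip` block type are all invented for illustration.

```python
CURRENT_MAJOR = 1  # assumption: this consumer targets schema_version 1.x.y
KNOWN_BLOCK_TYPES = {"text", "reasoning", "tool_call", "plan_step", "code_diff", "image"}


def check_major(export):
    """Reject exports whose major version this consumer does not understand."""
    major = int(export["schema_version"].split(".")[0])
    if major != CURRENT_MAJOR:
        raise ValueError(f"unsupported schema major version: {major}")
    return major


def partition_blocks(content):
    """Split blocks into known and unknown types; unknown blocks are kept untouched."""
    known = [b for b in content if b.get("type") in KNOWN_BLOCK_TYPES]
    unknown = [b for b in content if b.get("type") not in KNOWN_BLOCK_TYPES]
    return known, unknown


export = {
    "schema_version": "1.2.0",
    "content": [
        {"type": "text", "text": "hello"},
        {"type": "audio_clip", "uri": "..."},  # invented future block type
    ],
}

check_major(export)
known, unknown = partition_blocks(export["content"])
print(len(known), len(unknown))
```

Keeping unknown blocks in a separate list, instead of discarding them, is what lets an older tool pass newer content through unchanged.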