| field | value |
| --- | --- |
| name | `longform-to-shorts` |
| description | Turn a podcast, interview, talk, screen recording, or long YouTube video into strong vertical shorts by selecting moments, snapping cut boundaries, reframing to 9:16, and rendering with native captions. |
| allowedTools | |
| model | `inherit` |
| argumentHint | `{"timestampsPath":"output/content-machine/audio/timestamps.json","sourceMediaPath":"input/source.mp4","outputDir":"runs/source-clips/longform-to-shorts","maxCandidates":3}` |
| entrypoint | `node --import tsx scripts/harness/longform-to-shorts.ts` |

Inputs:

| name | description | required |
| --- | --- | --- |
| `timestampsPath` | Word-level timestamps from the longform transcript or ASR pass. | true |
| `sourceMediaPath` | Optional local source video or audio path for analysis and clipping provenance. | false |
| `outputDir` | Directory for candidate, boundary, approval, and handoff artifacts. | false |
| `approvedCandidateIds` | Candidate ids to approve after review. | false |

Outputs:

| name | description |
| --- | --- |
| `highlight-candidates.v1.json` | Ranked short-form candidate moments. |
| `render-handoff.v1.json` | Explicit handoff naming what still must be clipped, reframed, rendered, and reviewed. |
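The exact `timestamps.json` schema is not pinned down here; as an assumption, the selector needs at least word text plus start and end seconds, roughly:

```ts
// Assumed shape of the timestampsPath file. Field names are
// illustrative and may differ from the real ASR output.
interface WordTimestamp {
  text: string;  // the spoken word
  start: number; // seconds from media start
  end: number;   // seconds from media start
}

interface TimestampsFile {
  words: WordTimestamp[];
}
```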
- The user already has a long video or URL and wants multiple shorts,
not a net-new faceless script.
- The main problem is moment selection, clipping, reframing, and
captioning.
- The source is a talk, podcast, interview, screen recording, or
commentary video where transcript quality matters more than stock
sourcing.
- Start with transcript and structure, not with random timestamps.
- Score candidate clips on hook, coherence, value density, emotional
  intensity, and payoff; a scoring sketch follows this list.
- Snap cut points to speech boundaries, sentence endings, and silences;
  a boundary-snap sketch follows as well.
- Reframe for portrait based on source type: speaker, cursor, or
general center-safe crop.
- Use aggressive captions only after clip quality is proven.
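A minimal sketch of the scoring step, assuming each criterion is already normalized to 0..1. The weights are illustrative placeholders, not the skill's actual tuning:

```ts
// Illustrative weights for ranking candidate clips; treat these
// numbers as placeholders to tune against real review outcomes.
interface CandidateScores {
  hook: number;               // 0..1: does the opening earn attention?
  coherence: number;          // 0..1: does the clip stand alone?
  valueDensity: number;       // 0..1: insight per second
  emotionalIntensity: number; // 0..1: energy or tension on screen
  payoff: number;             // 0..1: does the clip resolve its setup?
}

const WEIGHTS: CandidateScores = {
  hook: 0.3,
  coherence: 0.25,
  valueDensity: 0.2,
  emotionalIntensity: 0.1,
  payoff: 0.15,
};

function rankScore(scores: CandidateScores): number {
  return (Object.keys(WEIGHTS) as (keyof CandidateScores)[]).reduce(
    (sum, key) => sum + WEIGHTS[key] * scores[key],
    0,
  );
}
```

Candidates in `highlight-candidates.v1.json` can then be ranked by this score, descending.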
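And a boundary-snap sketch over the word-level timestamps from `timestampsPath`, reusing the word shape assumed earlier; the pause threshold is a guess, not a specified default:

```ts
type Word = { text: string; start: number; end: number }; // seconds

// Move a rough cut time into the middle of the nearest pause between
// words, so the cut never lands inside a word. minGapSec is an assumed
// threshold for what counts as a real pause.
function snapToPause(
  roughSec: number,
  words: Word[],
  minGapSec = 0.25,
): number {
  let best = roughSec;
  let bestDistance = Infinity;
  for (let i = 0; i < words.length - 1; i++) {
    const gap = words[i + 1].start - words[i].end;
    if (gap < minGapSec) continue; // speech is continuous here
    const boundary = words[i].end + gap / 2;
    const distance = Math.abs(boundary - roughSec);
    if (distance < bestDistance) {
      bestDistance = distance;
      best = boundary;
    }
  }
  return best; // falls back to roughSec if no pause qualifies
}
```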
- long-form local video file or URL
- optional transcript or transcript cache
- target platform
- optional source type hint: talking-head, podcast, screen-recording,
  mixed
- candidate clip list or approved clip plan
- per-clip timestamps
- portrait-ready render inputs
- final short MP4s plus review bundles if executed end to end
- Use the executable `longform-to-shorts.flow` when a Claude Code,
  Codex CLI, Cursor, or similar harness needs the selection path in one
  run-scoped call.
- Use the direct `longform-to-shorts` runtime when one harness tool
  call is enough and the flow manifest is not needed.
- Use `longform-highlight-select` directly only when you need to
  inspect or replace the selection stage in isolation.
- Use `reverse-engineer-winner` for reference analysis only, not raw
  clipping.
- Use `video-render` and `publish-prep-review` for final output and
  review.
- Use `longform-clip-extract` after approval to cut source ranges and
  write clip-local render inputs.
- Use `references/production-shape.md` for the concrete boundary-snap,
  reframe, and review sequence.
- The current executable path stops at `render-handoff.v1.json`; it
  does not call `video-render` directly. Run `longform-clip-extract`
  first, then reframe if needed (a minimal crop sketch follows this
  list).
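The reframe step itself belongs to `reframe-vertical`, but for the general center-safe case the crop reduces to a plain ffmpeg call. A sketch, assuming a 1920x1080 landscape source and placeholder file names (speaker or cursor tracking would shift the crop window off center):

```sh
# Center-safe 9:16 crop of a 1920x1080 source: take a centered
# 608x1080 slice (crop centers by default), scale it to 1080x1920,
# and pass the audio through untouched.
ffmpeg -i clip.mp4 -vf "crop=608:1080,scale=1080:1920" -c:a copy clip_vertical.mp4
```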
Inside a harness, prefer outcome prompts such as:

> Use Content Machine to turn this longform video into candidate
> shorts. Run the longform-to-shorts flow, show me the candidate plan,
> and do not render until I approve a candidate.

Repo-local command form:

```sh
cat skills/longform-to-shorts/examples/request.json | \
  node --import tsx scripts/harness/longform-to-shorts.ts
```
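The request body mirrors the argumentHint fields above; a minimal example, with paths taken from the hint rather than a real run:

```json
{
  "timestampsPath": "output/content-machine/audio/timestamps.json",
  "sourceMediaPath": "input/source.mp4",
  "outputDir": "runs/source-clips/longform-to-shorts",
  "maxCandidates": 3
}
```

After reviewing `highlight-candidates.v1.json`, rerun with `approvedCandidateIds` set to the chosen candidate ids to produce the approval and handoff artifacts.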
- Pull from `yt-dlp`, transcript, scene detection, and blueprint files
  when available.
- The first executable selector is transcript/timestamp based. Frame,
speaker, face, and cursor signals should be added after candidate
moment selection is stable.
- Use `reframe-vertical` for crop strategy.
- Use `short-form-captions` after moment selection and reframing are
  stable.
Reference repos:

- `AgriciDaniel/claude-shorts`
- `imgly/videoclipper`
- `iDoust/youtube-clip`
- `mutonby/openshorts`
- Chosen clips make sense without full-video context.
- Start and end points do not cut across words or thoughts.
- Portrait framing keeps the active subject or screen action readable.
- Captions fit the clip instead of compensating for a weak selection.
- Final clips feel like distinct shorts, not arbitrary excerpts.