| field | value |
| --- | --- |
| name | `longform-to-shorts` |
| description | Turn a podcast, interview, talk, screen recording, or long YouTube video into strong vertical shorts by selecting moments, snapping cut boundaries, reframing to 9:16, and rendering with native captions. |
| allowedTools | |
| model | `inherit` |
| argumentHint | `{"timestampsPath":"output/content-machine/audio/timestamps.json","sourceMediaPath":"input/source.mp4","outputDir":"runs/source-clips/longform-to-shorts","maxCandidates":3}` |
| entrypoint | `node --import tsx scripts/harness/longform-to-shorts.ts` |

Inputs:

| name | description | required |
| --- | --- | --- |
| `timestampsPath` | Word-level timestamps from the longform transcript or ASR pass. | true |
| `sourceMediaPath` | Optional local source video or audio path for analysis and clipping provenance. | false |
| `outputDir` | Directory for candidate, boundary, approval, and handoff artifacts. | false |
| `approvedCandidateIds` | Candidate ids to approve after review. | false |

Outputs:

| name | description |
| --- | --- |
| `highlight-candidates.v1.json` | Ranked short-form candidate moments. |
| `render-handoff.v1.json` | Explicit handoff naming what still must be clipped, reframed, rendered, and reviewed. |
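The exact `timestamps.json` schema is not pinned down here; as an assumption, the selector needs at least word text plus start and end seconds, roughly:

```ts
// Assumed shape of the timestampsPath file. Field names are
// illustrative and may differ from the real ASR output.
interface WordTimestamp {
  text: string;  // the spoken word
  start: number; // seconds from media start
  end: number;   // seconds from media start
}

interface TimestampsFile {
  words: WordTimestamp[];
}
```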
- The user already has a long video or URL and wants multiple shorts,
not a net-new faceless script.
- The main problem is moment selection, clipping, reframing, and
captioning.
- The source is a talk, podcast, interview, screen recording, or
commentary video where transcript quality matters more than stock
sourcing.
- Start with transcript and structure, not with random timestamps.
- Score candidate clips on hook, coherence, value density, emotional
  intensity, and payoff; a scoring sketch follows this list.
- Snap cut points to speech boundaries, sentence endings, and silences;
  a boundary-snap sketch follows as well.
- Reframe for portrait based on source type: speaker, cursor, or
general center-safe crop.
- Use aggressive captions only after clip quality is proven.
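A minimal sketch of the scoring step, assuming each criterion is already normalized to 0..1. The weights are illustrative placeholders, not the skill's actual tuning:

```ts
// Illustrative weights for ranking candidate clips; treat these
// numbers as placeholders to tune against real review outcomes.
interface CandidateScores {
  hook: number;               // 0..1: does the opening earn attention?
  coherence: number;          // 0..1: does the clip stand alone?
  valueDensity: number;       // 0..1: insight per second
  emotionalIntensity: number; // 0..1: energy or tension on screen
  payoff: number;             // 0..1: does the clip resolve its setup?
}

const WEIGHTS: CandidateScores = {
  hook: 0.3,
  coherence: 0.25,
  valueDensity: 0.2,
  emotionalIntensity: 0.1,
  payoff: 0.15,
};

function rankScore(scores: CandidateScores): number {
  return (Object.keys(WEIGHTS) as (keyof CandidateScores)[]).reduce(
    (sum, key) => sum + WEIGHTS[key] * scores[key],
    0,
  );
}
```

Candidates in `highlight-candidates.v1.json` can then be ranked by this score, descending.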
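And a boundary-snap sketch over the word-level timestamps from `timestampsPath`, reusing the word shape assumed earlier; the pause threshold is a guess, not a specified default:

```ts
type Word = { text: string; start: number; end: number }; // seconds

// Move a rough cut time into the middle of the nearest pause between
// words, so the cut never lands inside a word. minGapSec is an assumed
// threshold for what counts as a real pause.
function snapToPause(
  roughSec: number,
  words: Word[],
  minGapSec = 0.25,
): number {
  let best = roughSec;
  let bestDistance = Infinity;
  for (let i = 0; i < words.length - 1; i++) {
    const gap = words[i + 1].start - words[i].end;
    if (gap < minGapSec) continue; // speech is continuous here
    const boundary = words[i].end + gap / 2;
    const distance = Math.abs(boundary - roughSec);
    if (distance < bestDistance) {
      bestDistance = distance;
      best = boundary;
    }
  }
  return best; // falls back to roughSec if no pause qualifies
}
```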
- long-form local video file or URL
- optional transcript or transcript cache
- target platform
- optional source type hint: talking-head, podcast, screen-recording,
  mixed
- candidate clip list or approved clip plan
- per-clip timestamps
- portrait-ready render inputs
- final short MP4s plus review bundles if executed end to end
- Use the executable `longform-to-shorts.flow` when a Claude Code,
  Codex CLI, Cursor, or similar harness needs the selection path in one
  run-scoped call.
- Use the direct `longform-to-shorts` runtime when one harness tool
  call is enough and the flow manifest is not needed.
- Use `longform-highlight-select` directly only when you need to
  inspect or replace the selection stage in isolation.
- Use `reverse-engineer-winner` for reference analysis only, not raw
  clipping.
- Use `video-render` and `publish-prep-review` for final output and
  review.
- Use `longform-clip-extract` after approval to cut source ranges and
  write clip-local render inputs.
- Use `references/production-shape.md` for the concrete boundary-snap,
  reframe, and review sequence.
- The current executable path stops at `render-handoff.v1.json`; it
  does not call `video-render` directly. Run `longform-clip-extract`
  first, then reframe if needed (a minimal crop sketch follows this
  list).
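The reframe step itself belongs to `reframe-vertical`, but for the general center-safe case the crop reduces to a plain ffmpeg call. A sketch, assuming a 1920x1080 landscape source and placeholder file names (speaker or cursor tracking would shift the crop window off center):

```sh
# Center-safe 9:16 crop of a 1920x1080 source: take a centered
# 608x1080 slice (crop centers by default), scale it to 1080x1920,
# and pass the audio through untouched.
ffmpeg -i clip.mp4 -vf "crop=608:1080,scale=1080:1920" -c:a copy clip_vertical.mp4
```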
Inside a harness, prefer outcome prompts such as:

> Use Content Machine to turn this longform video into candidate
> shorts. Run the longform-to-shorts flow, show me the candidate plan,
> and do not render until I approve a candidate.

Repo-local command form:

```sh
cat skills/longform-to-shorts/examples/request.json | \
  node --import tsx scripts/harness/longform-to-shorts.ts
```
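The request body mirrors the argumentHint fields above; a minimal example, with paths taken from the hint rather than a real run:

```json
{
  "timestampsPath": "output/content-machine/audio/timestamps.json",
  "sourceMediaPath": "input/source.mp4",
  "outputDir": "runs/source-clips/longform-to-shorts",
  "maxCandidates": 3
}
```

After reviewing `highlight-candidates.v1.json`, rerun with `approvedCandidateIds` set to the chosen candidate ids to produce the approval and handoff artifacts.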
- Pull from `yt-dlp`, transcript, scene detection, and blueprint files
  when available.
- The first executable selector is transcript/timestamp based. Frame,
speaker, face, and cursor signals should be added after candidate
moment selection is stable.
- Use `reframe-vertical` for crop strategy.
- Use `short-form-captions` after moment selection and reframing are
  stable.
Reference repos:

- `AgriciDaniel/claude-shorts`
- `imgly/videoclipper`
- `iDoust/youtube-clip`
- `mutonby/openshorts`
- Chosen clips make sense without full-video context.
- Start and end points do not cut across words or thoughts.
- Portrait framing keeps the active subject or screen action readable.
- Captions fit the clip instead of compensating for a weak selection.
- Final clips feel like distinct shorts, not arbitrary excerpts.