Every well-structured Hyperframes video flows through the same 7 steps, whether it starts from a website, a PDF, a CSV, or a blank page. Each step produces a named artifact that the next step depends on, so your AI agent (and you) always know what’s done, what’s next, and where the creative decisions live on disk. This pipeline is the backbone of the website-to-video workflow, but it’s just as useful when you’re scripting a brand reel from scratch, turning research notes into a launch teaser, or learning Hyperframes for the first time. Most of the production-grade launch videos HeyGen ships are organized this way.
The seven steps
Each step produces an artifact that feeds the next:

| # | Step | Output | What happens |
|---|---|---|---|
| 1 | Capture | capture/ | Extract screenshots, design tokens, fonts, assets, animations from a source |
| 2 | Design | DESIGN.md | Brand reference: colors, typography, components, do’s and don’ts |
| 3 | Script | SCRIPT.md | Narration text with hook, story, proof, and CTA |
| 4 | Storyboard | STORYBOARD.md | Per-beat creative direction: mood, assets, animations, transitions |
| 5 | VO + Timing | narration.wav + transcript.json | TTS audio with word-level timestamps |
| 6 | Build | compositions/*.html | Animated HTML compositions, one per beat |
| 7 | Validate | Snapshot PNGs + lint/validate pass | Visual verification and runtime checks before delivery |
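Because every step leaves a named artifact on disk, a small script can work out where a project stands. The sketch below is one way to do that, not part of Hyperframes: `next_step` and `STEPS` are hypothetical names, the artifact names come from the table above, and `snapshots/frame-*.png` is taken from the Step 7 output. It only checks that files exist, so it cannot tell whether lint and validate actually passed.

```python
from pathlib import Path

# (step name, glob patterns that must each match at least one path).
# Assumption: narration is saved as .wav; adjust the pattern if you use .mp3.
STEPS = [
    ("Capture", ["capture"]),
    ("Design", ["DESIGN.md"]),
    ("Script", ["SCRIPT.md"]),
    ("Storyboard", ["STORYBOARD.md"]),
    ("VO + Timing", ["narration.wav", "transcript.json"]),
    ("Build", ["compositions/*.html"]),
    ("Validate", ["snapshots/frame-*.png"]),
]


def next_step(root):
    """Return the first step whose artifacts are missing, or None if all exist."""
    root = Path(root)
    for name, patterns in STEPS:
        # Every pattern for the step must match at least one existing path.
        if not all(any(root.glob(pattern)) for pattern in patterns):
            return name
    return None
```

Calling `next_step(".")` from the project root returns the name of the step to work on next. For greenfield projects that skip capture, drop the first entry from `STEPS`.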
Project layout
A typical project directory after the pipeline runs keeps captured inputs under capture/ so they’re cleanly separated from the build outputs. Everything downstream lives at the project root.
Step 1: Capture
Output: capture/
When the video is grounded in an existing source (a website, a brand site, a competitor reference), start with capture. Hyperframes ships a built-in capture command for websites. Captured material is written to capture/ so later steps can reference paths instead of inlining content.
Gate: You can describe the source’s visual identity in one or two sentences and name its top colors, fonts, and standout assets.
Step 2: Design
Output: DESIGN.md in the project root
DESIGN.md is the brand cheat sheet. It encodes the visual identity factually so every downstream decision can reference exact colors, fonts, and components instead of inventing them. It’s a reference document, not a creative plan. The creative work happens in the storyboard.
A typical DESIGN.md has six sections:
| Section | What it captures |
|---|---|
| Overview | 3-4 sentences describing layout patterns, color strategy, typography tone |
| Colors | 5-10 HEX values with semantic roles (primary surface, accent warm, etc.) |
| Typography | Font families with weights, roles, and distinctive usage |
| Components | Patterns the brand uses: bento grids, logo walls, gradient meshes |
| Imagery | Asset categories and how the brand uses them |
| Do’s and Don’ts | Hard rules: “white backgrounds, never dark”, “no drop shadows” |
DESIGN.md is also the input format for Open Design and Claude Design; both produce a DESIGN.md you can drop into a Hyperframes project.
Gate: DESIGN.md exists with all six sections filled in from real captured data (or chosen deliberately for greenfield projects).
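The gate above can be partly automated. This is a minimal sketch that assumes each of the six sections appears as a markdown heading (`#`, `##`, and so on) with the exact name from the table; that heading format is an assumption, and `missing_sections` is a hypothetical helper. It checks that the headings exist, not that they were filled in from real captured data.

```python
import re

REQUIRED = ["Overview", "Colors", "Typography", "Components", "Imagery", "Do's and Don'ts"]


def _norm(text):
    # Treat curly and straight apostrophes as the same character.
    return text.replace("\u2019", "'").strip().lower()


def missing_sections(design_md):
    """Return the required section names that have no matching heading."""
    headings = {
        _norm(match.group(1))
        for match in re.finditer(r"^#+[ \t]+(.+?)[ \t]*$", design_md, re.MULTILINE)
    }
    return [name for name in REQUIRED if _norm(name) not in headings]
```

An empty list means all six headings are present.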
Step 3: Script
Output: SCRIPT.md in the project root
SCRIPT.md is the narration backbone. Scene durations come from the narration, not from guessing, so write the script before the storyboard and time beats to spoken words.
A typical structure: hook (one sentence that earns attention), story (what the product or topic is), proof (numbers, components, customers), CTA (one clear action). Reference real features, real stats, and real components from capture/extracted/visible-text.txt. Don’t invent claims the source doesn’t support.
For videos without narration (brand reels, music-driven teasers), SCRIPT.md becomes a per-beat copy plan instead: the on-screen text and headlines, with timing notes.
Gate: SCRIPT.md exists in the project root.
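One cheap way to catch invented claims is to compare the numbers in the script against the captured text. The sketch below is a crude heuristic under stated assumptions: `unsupported_numbers` is a hypothetical helper, it only looks at numeric tokens, and it treats formatting differences (for example `10,000` versus `10000`) as mismatches. It flags candidates for a human or agent to review; it does not prove a claim is supported.

```python
import re

# Matches integers and numbers with internal commas or dots, e.g. 5, 10,000, 99.9.
NUMBER = re.compile(r"\d[\d,.]*\d|\d")


def unsupported_numbers(script, visible_text):
    """Return numbers that appear in the script but not in the captured text."""
    in_source = set(NUMBER.findall(visible_text))
    return sorted({n for n in NUMBER.findall(script) if n not in in_source})
```

In a project, `script` would be the contents of SCRIPT.md and `visible_text` the contents of capture/extracted/visible-text.txt.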
Step 4: Storyboard
Output: STORYBOARD.md in the project root
STORYBOARD.md tells the engineer (human or agent) exactly what to build for each beat: mood, camera, animations, transitions, assets, depth layers, sound effects. It’s where the creative choices get pinned down.
Each beat in STORYBOARD.md typically covers:
| Field | What it specifies |
|---|---|
| Timing | 0.0s - 5.8s, taken from transcript.json once Step 5 runs |
| Narration line | The exact words spoken during this beat |
| Mood & camera | One sentence describing the feel and the shot |
| Assets | Which captured images, icons, and fonts go in this beat, referenced by path |
| Techniques | 2-3 picks from the techniques library: SVG path drawing, Canvas 2D, CSS 3D, per-word typography, Lottie, video compositing, typing effects, variable fonts, MotionPath, velocity transitions, audio-reactive |
| Transitions | How this beat enters from the previous one and exits to the next |
| SFX | Short, specific sound effects (e.g. “whoosh on logo entry, soft tick on counter”) |
Gate: STORYBOARD.md exists with beat-by-beat direction and an asset audit that names every file used.
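The asset audit is easy to back with a check that every referenced file is really on disk. This sketch assumes assets are referenced in STORYBOARD.md as plain paths starting with `capture/`; `missing_assets` is a hypothetical helper, and the regular expression will miss paths that contain spaces.

```python
import re
from pathlib import Path

# Assumption: asset references look like capture/<something>, without spaces.
ASSET_PATH = re.compile(r"capture/[\w./-]+")


def missing_assets(storyboard_md, root):
    """Return referenced capture/ paths that do not exist under root."""
    root = Path(root)
    # Strip a trailing period picked up from the end of a sentence.
    referenced = {path.rstrip(".") for path in ASSET_PATH.findall(storyboard_md)}
    return sorted(path for path in referenced if not (root / path).exists())
```

An empty result means every path the storyboard names resolves to a real file or directory.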
Step 5: VO and timing
Outputs: narration.wav (or .mp3), narration.txt, transcript.json
Generate the TTS narration, then transcribe it for word-level timestamps. Those timestamps are the source of truth for every beat duration downstream.
| File | What it contains |
|---|---|
| narration.wav | The TTS audio that ships with the final render |
| narration.txt | The exact spoken text with pronunciation substitutions applied (API → A P I, $2T → two trillion). Distinct from SCRIPT.md so you can regenerate the audio later with a different voice without redoing the substitutions. |
| transcript.json | [{ text, start, end }] for every word. Every later step reads this for timing. |
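The pronunciation substitutions that turn script text into narration.txt can be kept in a small lookup table. The following is a sketch, not the way Hyperframes does it: `SUBSTITUTIONS` and `apply_substitutions` are hypothetical names, and the two entries are the examples from the table above.

```python
import re

# Written form -> spoken form. Extend this table for your own script.
SUBSTITUTIONS = {
    "API": "A P I",
    "$2T": "two trillion",
}


def apply_substitutions(text, table=SUBSTITUTIONS):
    """Replace each written form with its spoken form, matching whole tokens only."""
    # Longest keys first so a longer key wins over one of its prefixes.
    keys = sorted(table, key=len, reverse=True)
    # Lookarounds replace word boundaries, which do not work next to "$".
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(key) for key in keys) + r")(?!\w)"
    )
    return pattern.sub(lambda match: table[match.group(0)], text)
```

Keeping the table in the project means a later voice swap can reuse the same narration.txt, or regenerate it from the script with one call.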
See /hyperframes-media for the skill that picks a TTS option. After generating audio, update STORYBOARD.md with the real beat boundaries from transcript.json.
Gate: narration.wav, narration.txt, and transcript.json exist. STORYBOARD.md beat timings reference real timestamps, not estimates.
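Real beat boundaries can be derived by walking the transcript in order and consuming as many words as each beat's narration line contains. This is a minimal sketch: `beat_boundaries` is a hypothetical helper, and it assumes the transcript has exactly one entry per whitespace-separated word of the narration, in the same order. If your transcriber splits or merges tokens differently, the counts will drift and you will need a fuzzier alignment.

```python
def beat_boundaries(narration_lines, words):
    """Map each beat's narration line to (start, end) seconds.

    narration_lines: one string per beat, in playback order.
    words: transcript entries shaped like {"text": ..., "start": ..., "end": ...}.
    """
    beats = []
    index = 0
    for line in narration_lines:
        count = len(line.split())
        chunk = words[index:index + count]
        if count == 0 or len(chunk) < count:
            raise ValueError(f"transcript does not cover beat: {line!r}")
        # A beat runs from its first word's start to its last word's end.
        beats.append((chunk[0]["start"], chunk[-1]["end"]))
        index += count
    return beats
```

The result leaves the silence between beats unassigned. If you want contiguous beats, one option is to end each beat where the next one starts.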
Step 6: Build
Output: compositions/<beat-name>.html, one HTML file per beat
This is where the storyboard becomes runnable HTML. Each composition is a self-contained file that imports captured assets by path, uses the exact colors and fonts from DESIGN.md, and animates with the techniques the storyboard picked.
For multi-beat videos, spawn a focused sub-agent per beat. Each one gets fresh context, the storyboard section for its beat, the asset paths it needs, and the relevant technique references. That produces noticeably better output than building every beat in one long-running context.
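Handing each sub-agent only its own beat means slicing STORYBOARD.md first. The sketch below assumes every beat starts with a heading line beginning `## Beat`; that heading convention is an assumption for illustration, and `split_beats` is a hypothetical helper.

```python
import re


def split_beats(storyboard_md):
    """Return one text section per beat, in document order."""
    # Assumption: each beat begins with a line that starts with "## Beat".
    starts = [m.start() for m in re.finditer(r"^## Beat\b", storyboard_md, re.MULTILINE)]
    ends = starts[1:] + [len(storyboard_md)]
    return [storyboard_md[start:end].strip() for start, end in zip(starts, ends)]
```

Each returned section can then be combined with the asset paths and technique references for that beat to form the sub-agent's context.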
After each composition is built, run a self-review for layout, asset placement, and animation quality. The /hyperframes skill encodes the composition rules: required class="clip" attributes, GSAP timeline registration, data-* attribute semantics, and adapter registries.
Gate: Every composition is self-reviewed. No overlapping elements, no misplaced assets, no static images sitting unanimated.
Step 7: Validate
Outputs: snapshots/frame-*.png, lint and validate passing with zero errors
Three checks before delivery:
- lint catches missing attributes, timeline registration issues, tween conflicts, and CSS-transform vs. GSAP conflicts.
- validate loads each composition in headless Chrome and surfaces runtime JS errors, missing assets, and failed network requests.
- snapshot captures frames at specific timestamps so you can see your output without a full render.
The pipeline delivers the localhost Studio URL as the handoff. Your AI agent runs npx hyperframes preview and shares the project URL. Rendering to MP4 happens on demand.
Gate: lint and validate pass with zero errors. Snapshot frames look right. The Studio preview URL is ready to share.
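Since snapshots are taken at specific timestamps, the beat boundaries in STORYBOARD.md give a reasonable place to start. One possible choice, shown here as a sketch, is the midpoint of each beat; `snapshot_times` is a hypothetical helper, and midpoints are an assumption about what makes a useful frame. Beats with late entrances may need a timestamp closer to the end.

```python
def snapshot_times(beats):
    """Return one timestamp per beat: the midpoint of its (start, end) range."""
    return [round((start + end) / 2, 2) for start, end in beats]
```

The returned values are in seconds, matching the units used in transcript.json.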
Iterating
The pipeline is built around named artifacts on disk so you can re-enter anywhere without re-running everything:
- To rework the creative plan, edit STORYBOARD.md: change a beat’s mood, swap an asset, retime the entrance, then ask the agent to rebuild just that beat.
- For surgical tweaks, open a composition file directly (e.g. compositions/beat-3-proof.html) and adjust animations, colors, or layout. npx hyperframes preview shows changes live.
- To rebuild one beat from scratch, prompt the agent: “Rebuild beat 2 with more energy. Use the product screenshot as full-bleed background.” It reads STORYBOARD.md, DESIGN.md, and the transcript, then regenerates just that file.
- To swap the voice without redoing Step 3, re-run TTS against narration.txt, which already has the pronunciation substitutions baked in.
When to use the pipeline
The pipeline is the recommended structure for:
- Capturing a website with the /website-to-hyperframes skill, which follows it end-to-end.
- Shipping a product launch. Most of the HeyGen launch videos use this artifact layout.
- Any narrative video with three or more beats, where a storyboard pays for itself.
- Learning Hyperframes, because the artifacts leave every creative decision inspectable on disk.
STORYBOARD.md.
Next steps
- Website to Video: The full website-to-video workflow built on this pipeline.
- Prompting: How to invoke the pipeline through your AI agent.
- Launch Videos: Real production projects organized around this pipeline.
- CLI Reference: Every command the pipeline calls.