fix: double-audio scaffold, lint rules, docs guide, Gemini 3.1 by ukimsanov · Pull Request #299 · heygen-com/hyperframes

ukimsanov · 2026-04-17T04:17:26Z

What

Remove scaffold index.html from captures, add two new lint rules, add website-to-video docs guide, and switch to Gemini 3.1 Flash Lite for image captioning.

Why

The scaffold index.html in captures/ had data-composition-id and audio elements that collided with the agent-built index.html, causing double audio playback
No lint rules existed to catch multiple root compositions or overlapping audio tracks
The docs site had no guide for the website-to-video workflow
Gemini 3.1 Flash Lite is 2.5x faster with richer descriptions than 2.5 Flash

How

scaffolding.ts: Removed index.html generation from capture output (captures are data folders, not video projects)
lintProject.ts: Added multiple_root_compositions (error) and duplicate_audio_track (warning) project-level rules
website-to-video.mdx: New Mintlify guide with Steps, Tabs, AccordionGroup components
cli.mdx: Added capture command to Create tab, snapshot to Preview tab
contentExtractor.ts: Switched model to gemini-3.1-flash-lite-preview

Test plan

Manual testing performed — capture on basecamp.com produces no scaffold index.html
Gemini 3.1 tested — 28 images captioned with detailed descriptions
Documentation updated

Double-audio bug fix:
- scaffolding.ts: stop writing index.html in captures/ (root cause — runtime discovered scaffold + real index.html as two compositions)
- New lint rule: multiple_root_compositions — errors if >1 root HTML
- New lint rule: duplicate_audio_track — warns on overlapping audio

Capture improvements (from testing 30+ websites):
- Catalog runs BEFORE extractHtml (which mutates DOM — converts img src to data URLs). HeyKuba: 2 images → 78.
- networkidle2 instead of networkidle0 (unblocks SPAs with WebSockets)
- Lazy-load image wait, CSS background-image cataloging
- SVG naming from class/id/parent (not just aria-label)
- Gemini batch 5→20, pause 12s→2s, maxOutputTokens 300→500
- Asset descriptions sorted: captioned first

Docs:
- New guide: guides/website-to-video.mdx (full tutorial)
- CLI docs: added capture and snapshot commands
- docs.json: website-to-video in Guides nav

Gemini 3.1 Flash Lite Preview: 2.5x faster TTFT, 45% faster output, slightly cheaper ($0.25/M vs $0.30/M input), near-2.5-Flash quality. Descriptions are actually more detailed in testing.

mintlify · 2026-04-17T04:17:29Z

Preview deployment for your docs. Learn more about Mintlify Previews.

Project	Status	Preview	Updated (UTC)
hyperframes	🟢 Ready	View Preview	Apr 17, 2026, 4:18 AM

jrusso1020

Nice bug find and the guide reads well. A few things I'd want addressed before merging — two of them are correctness bugs in the new lint rules, and I think the PR is doing too many things at once.

🔴 Blockers

1. `lintMultipleRootCompositions` can never fire

const rootFiles = results.map((r) => r.file).filter((f) => !f.startsWith("compositions/"));
if (rootFiles.length > 1) { ... }

results is built earlier in lintProject: exactly one push for "index.html" and N pushes for "compositions/${file}". After the filter, rootFiles always has length 1, so the > 1 branch is unreachable. This rule can't catch the bug it was designed to catch.

To actually detect a stray scaffold, the lint needs to walk the project directory for *.html files at the root and compare against project.indexPath — not filter what lintProject already chose to read.
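The directory walk suggested above could look roughly like the sketch below. To keep it self-contained, it assumes the project directory has already been read into a map of relative path to file contents; `ProjectSnapshot` and `findRootCompositions` are hypothetical names, not the functions in lintProject.ts, and treating "root" as "no directory separator in the path" is also an assumption.

```typescript
// Hypothetical snapshot of a project directory: relative path -> file contents.
type ProjectSnapshot = Map<string, string>;

/** Returns root-level .html files that declare a data-composition-id. */
function findRootCompositions(files: ProjectSnapshot): string[] {
  const roots: string[] = [];
  files.forEach((html, relPath) => {
    const isRootLevel = !relPath.includes("/");
    const isHtml = relPath.toLowerCase().endsWith(".html");
    const declaresComposition = /\bdata-composition-id\s*=/.test(html);
    if (isRootLevel && isHtml && declaresComposition) roots.push(relPath);
  });
  return roots.sort();
}

const snapshot: ProjectSnapshot = new Map([
  ["index.html", '<div data-composition-id="main"></div>'],
  ["scaffold.html", '<div data-composition-id="scaffold"></div>'],
  ["notes.html", "<p>no composition here</p>"],
  ["compositions/intro.html", '<div data-composition-id="intro"></div>'],
]);

// Two root-level compositions are found, so the "> 1" branch is now reachable.
const rootCompositions = findRootCompositions(snapshot);
```

Because the scan looks at what is on disk rather than at the list of files the linter chose to read, a stray scaffold is counted even when it is not the configured index file.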

2. `lintDuplicateAudioTracks` regex is order-sensitive

The regex requires attributes in the source as data-track-index → data-start → data-duration. The old scaffold that caused this bug wrote them as:

<audio id="narration" data-start="0" data-duration="28" data-track-index="0" data-volume="1" src="narration.wav">

data-start before data-track-index — that audio tag would not match this regex. Any agent-authored <audio> with a different attribute order is silently skipped.

Fix: match <audio[^>]*> first, then extract each attribute with its own regex against the tag body. While you're there: because the scan walks allHtmlSources (root + every composition), the same <audio> reachable through both a root and a sub-composition will flag as a duplicate of itself — worth deduping by (src, start, duration) or scoping to a single file.
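A minimal sketch of that two-step approach follows: match each `<audio>` tag first, then read every attribute with its own regex, then dedupe by (src, start, duration). The names `extractAttr`, `parseAudioTags`, and `dedupeAudio` are illustrative, the defaults for missing attributes are assumptions, and the attribute regex only handles double-quoted values.

```typescript
interface AudioTag {
  src: string | null;
  start: number;
  duration: number;
  trackIndex: number;
}

/** Reads one double-quoted attribute from a tag, regardless of attribute order. */
function extractAttr(tag: string, name: string): string | null {
  // Leading \s keeps "src" from matching inside a longer name such as "data-src".
  const match = new RegExp(`\\s${name}\\s*=\\s*"([^"]*)"`).exec(tag);
  return match?.[1] ?? null;
}

/** Finds every <audio ...> opening tag and extracts its timing attributes. */
function parseAudioTags(html: string): AudioTag[] {
  const tags = html.match(/<audio\b[^>]*>/gi) ?? [];
  return tags.map((tag) => ({
    src: extractAttr(tag, "src"),
    start: Number(extractAttr(tag, "data-start") ?? 0), // assumed default
    duration: Number(extractAttr(tag, "data-duration") ?? Infinity), // assumed default
    trackIndex: Number(extractAttr(tag, "data-track-index") ?? 0), // assumed default
  }));
}

/** Drops repeats of the same clip, keyed by (src, start, duration). */
function dedupeAudio(tags: AudioTag[]): AudioTag[] {
  const seen = new Set<string>();
  return tags.filter((tag) => {
    const key = [tag.src, tag.start, tag.duration].join("|");
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// The scaffold's attribute order, plus the same clip written in a different order.
const scaffoldTag =
  '<audio id="narration" data-start="0" data-duration="28" data-track-index="0" data-volume="1" src="narration.wav">';
const reorderedTag =
  '<audio src="narration.wav" data-track-index="0" data-duration="28" data-start="0">';

const parsedAudio = parseAudioTags(scaffoldTag + reorderedTag);
const uniqueAudio = dedupeAudio(parsedAudio);
```

Both tags parse to the same values despite the different attribute order, and the dedupe step collapses them into a single entry.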

Also, landing two new lint rules with zero unit tests is what let #1 slip through — a single fixture per rule would have caught it.

🟡 Should fix

3. Docs/code drift on the Gemini model

You updated the model in three places but missed one:

step-1-capture.md → "Gemini 3.1 Flash Lite" ✅
contentExtractor.ts → updated ✅
docs/packages/cli.mdx:357 → still says "Gemini 2.5 Flash vision (~$0.001/image)" ❌

4. `gemini-3.1-flash-lite-preview` is a preview model

The PR body claims "2.5x faster with richer descriptions" without numbers. Preview endpoints get deprecated on Google's schedule, not ours, so two things I'd want:

A short note with the actual measurements — latency, sample caption quality, any rate-limit changes (the batching code still assumes 2000 RPM).
An easy swap path — either an env override (HYPERFRAMES_GEMINI_MODEL) or a constant at the top of contentExtractor.ts so the next swap is one line.
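One way the swap path could be shaped is sketched below, combining both suggestions: a single default constant plus an env override. `DEFAULT_GEMINI_MODEL` and `resolveGeminiModel` are hypothetical names, and the environment is passed in as a plain object so the sketch stays self-contained; real code would presumably pass `process.env`.

```typescript
// Single place to change when the preview model is retired.
const DEFAULT_GEMINI_MODEL = "gemini-3.1-flash-lite-preview";

/** Picks the captioning model: env override if set and non-blank, else the default. */
function resolveGeminiModel(env: Record<string, string | undefined>): string {
  const override = env["HYPERFRAMES_GEMINI_MODEL"]?.trim();
  return override ? override : DEFAULT_GEMINI_MODEL;
}

const defaultModel = resolveGeminiModel({});
const overriddenModel = resolveGeminiModel({
  HYPERFRAMES_GEMINI_MODEL: "gemini-2.5-flash-lite",
});
const blankOverride = resolveGeminiModel({ HYPERFRAMES_GEMINI_MODEL: "   " });
```

Treating a blank value as "not set" is a design choice made here, so that an empty variable in a shell profile does not produce an empty model name.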

5. Drop the `greensock/gsap-skills` install line

(From @jrusso1020) We ship skills/gsap/ in this repo, so pointing users at greensock/gsap-skills is now redundant and a second source of truth we don't control. Remove both the npx skills add greensock/gsap-skills line in docs/guides/website-to-video.mdx and any similar references.

6. Lead with explicit skill invocation in the guide

(From @jrusso1020, with my take) The current guide shows implicit discovery as the happy path and treats explicit invocation as a troubleshooting fallback. I'd flip that for the published docs:

Deterministic — users get the same behavior every time, no "why didn't it trigger?" support threads.
Teachable — the docs actually name the thing they're telling you to use.
Self-documenting in transcripts — easier to tell what ran when someone pastes a session.

I wouldn't make it the only pattern though — ambient discovery is part of what makes the product feel magical, and removing that story would be a loss. Concretely:

Use the /website-to-hyperframes skill to create a 25-second product launch
video from https://stripe.com. Bold, cinematic, financial infrastructure energy.

…as the primary example, with a <Note> afterwards saying "Agents will also trigger this skill automatically when they see a URL and a video request — the explicit form is just more predictable."

7. `docs/docs.json` indentation

             "pages": [
"guides/website-to-video",
               "guides/prompting",

New entry is flush-left while its siblings are at 14 spaces. Mintlify parsed it (CI green) but this drifts on the next formatter pass. One-line fix.

🟢 Scope

Four independent things in one PR: a P0-ish bug fix, two lint rules, a Mintlify guide, and a model swap. I'd split into:

PR A (ship now): scaffold removal — self-contained, fixes a real bug.
PR B: lint rules, once they work + tests.
PR C: docs guide + cli.mdx edits + gsap-skills removal.
PR D: Gemini 3.1 swap with benchmark + env override.

If anything regresses, revert blast radius becomes one feature instead of four.

👍 What's good

The root-cause write-up in scaffolding.ts is exactly right and the inline comment explaining why we no longer emit index.html is the kind of breadcrumb I want to find in six months.
The website-to-video guide is well-structured, uses Mintlify components idiomatically, and the "With/Without Gemini" <Tabs> comparison is a nice touch.

Requesting changes on #1, #2, #3, #5. Happy to re-review once those are addressed; the rest are strong suggestions.

- lintMultipleRootCompositions: scan filesystem for HTML files with data-composition-id (was filtering results array — always 1 entry)
- lintDuplicateAudioTracks: order-independent attribute extraction, dedup by (src,start,duration,trackIndex), Infinity fallback for missing data-duration (matches runtime behavior)
- 10 new tests for both lint rules
- docs: explicit skill invocation, remove gsap-skills, fix indentation
- Gemini: env override (HYPERFRAMES_GEMINI_MODEL), benchmark data in code comment (49 imgs: 3.1-lite ~507ms/img, 2.5-lite ~230ms/img)
- cli.mdx: version-agnostic "Gemini vision" reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ukimsanov · 2026-04-17T14:16:58Z

Addressed all 7 review items

🔴 Blockers (all fixed)

1. lintMultipleRootCompositions can never fire — Rewrote to scan the project directory filesystem for .html files containing data-composition-id, instead of filtering the results array (which always has exactly 1 root entry). 3 tests added.

2. lintDuplicateAudioTracks regex is order-sensitive — Replaced single order-dependent regex with <audio[^>]*> match + per-attribute extractAttr() helper. Also:

Deduplicates by (src, start, duration, trackIndex) to avoid false positives when the same audio appears in root + sub-composition
Falls back to Infinity when data-duration is absent (matches runtime behavior at hyperframes-player.ts:635)
Handles Infinity.toFixed() safely in warning messages
Regex created inside the for loop to avoid g-flag lastIndex carryover across HTML strings
7 tests added (attribute order, non-overlapping, different tracks, dedup, Infinity fallback, Infinity formatting, g-flag regression)
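The overlap check with the Infinity fallback can be illustrated with the sketch below. It is not the code from lintProject.ts: `AudioTrack`, `formatTime`, and `findOverlappingAudio` are hypothetical names, the input is assumed to be already parsed and deduplicated, and only clips on the same track index are assumed to count as overlapping.

```typescript
interface AudioTrack {
  src: string;
  start: number;
  duration: number; // Infinity when data-duration is absent
  trackIndex: number;
}

/** Formats a time for a warning; toFixed on Infinity would just yield "Infinity". */
function formatTime(seconds: number): string {
  return Number.isFinite(seconds) ? `${seconds.toFixed(1)}s` : "end of composition";
}

/** Returns one warning per pair of clips whose time ranges intersect on the same track. */
function findOverlappingAudio(tracks: AudioTrack[]): string[] {
  const warnings: string[] = [];
  tracks.forEach((a, i) => {
    tracks.slice(i + 1).forEach((b) => {
      if (a.trackIndex !== b.trackIndex) return;
      const aEnd = a.start + a.duration;
      const bEnd = b.start + b.duration;
      // Two half-open intervals intersect when each starts before the other ends.
      if (a.start < bEnd && b.start < aEnd) {
        warnings.push(
          `track ${a.trackIndex}: ${a.src} (${formatTime(a.start)} to ${formatTime(aEnd)}) ` +
            `overlaps ${b.src} (${formatTime(b.start)} to ${formatTime(bEnd)})`,
        );
      }
    });
  });
  return warnings;
}

const overlapWarnings = findOverlappingAudio([
  { src: "narration.wav", start: 0, duration: 28, trackIndex: 0 },
  { src: "music.mp3", start: 10, duration: Infinity, trackIndex: 0 },
  { src: "sting.wav", start: 5, duration: 2, trackIndex: 1 },
]);
```

Here the clip without a duration is treated as running to the end, so it overlaps the narration on track 0, while the clip on track 1 is ignored.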

3. cli.mdx still says "Gemini 2.5 Flash" — Changed to version-agnostic "Gemini vision".

5. Drop greensock/gsap-skills — Removed from website-to-video.mdx.

🟡 Should fix (all addressed)

4. Preview model benchmark + env override — Added HYPERFRAMES_GEMINI_MODEL env override. Benchmark results (49 images, paid tier, sequential):

Model	Success rate	Latency/img	Caption length
gemini-3.1-flash-lite-preview	49/49	~507ms	131 chars avg
gemini-2.5-flash-lite	49/49	~230ms	117 chars avg
gemini-2.5-flash	15/49	~1324ms	111 chars avg

3.1-flash-lite-preview produces richer captions (+14 chars avg) with higher variance on cold starts. 2.5-flash-lite is faster and more reliable. Keeping 3.1 as default for caption quality; easy swap via env var. Benchmark data added as code comment.

6. Lead with explicit skill invocation — Step 2 example updated, <Note> about ambient discovery added, troubleshooting accordion updated.

7. docs.json indentation — Fixed flush-left entry to 14 spaces.

Bonus from self-review

fixHint in lintProjectAudioFiles now includes data-duration="__DURATION__" (was missing, users copying the hint would get broken audio)
lintDuplicateAudioTracks handles missing data-duration by falling back to Infinity (matches runtime at hyperframes-player.ts:635)

Not splitting the PR

Keeping as one PR per discussion — the changes are small and interdependent (lint rules reference the same scaffold bug the removal fixes, docs reference the same Gemini model the code uses).

ukimsanov added 2 commits April 16, 2026 22:58

feat(capture): switch Gemini 2.5 Flash → 3.1 Flash Lite

de5b53c

Gemini 3.1 Flash Lite Preview: 2.5x faster TTFT, 45% faster output, slightly cheaper ($0.25/M vs $0.30/M input), near-2.5-Flash quality. Descriptions are actually more detailed in testing.

mintlify bot deployed to staging - docs April 17, 2026 04:18 View deployment

ukimsanov requested review from jrusso1020 and vanceingalls April 17, 2026 04:20

jrusso1020 requested changes Apr 17, 2026

mintlify bot deployed to staging - docs April 17, 2026 14:16 View deployment

ukimsanov requested review from jrusso1020 and miguel-heygen April 17, 2026 14:17

miguel-heygen approved these changes Apr 17, 2026

jrusso1020 approved these changes Apr 17, 2026

ukimsanov merged commit a95539a into main Apr 17, 2026
14 checks passed

ukimsanov merged 3 commits into main from feat/capture-improvements-v2
