fix: double-audio scaffold, lint rules, docs guide, Gemini 3.1 #299

Merged
ukimsanov merged 3 commits into main from
feat/capture-improvements-v2
Apr 17, 2026

Conversation

@ukimsanov
Collaborator

What

Remove scaffold index.html from captures, add two new lint rules, add website-to-video docs guide, and switch to Gemini 3.1 Flash Lite for image captioning.

Why

  • The scaffold index.html in captures/ had data-composition-id and audio elements that collided with the agent-built index.html, causing double audio playback
  • No lint rules existed to catch multiple root compositions or overlapping audio tracks
  • The docs site had no guide for the website-to-video workflow
  • Gemini 3.1 Flash Lite is 2.5x faster with richer descriptions than 2.5 Flash

How

  • scaffolding.ts: Removed index.html generation from capture output (captures are data folders, not video projects)
  • lintProject.ts: Added multiple_root_compositions (error) and duplicate_audio_track (warning) project-level rules
  • website-to-video.mdx: New Mintlify guide with Steps, Tabs, AccordionGroup components
  • cli.mdx: Added capture command to Create tab, snapshot to Preview tab
  • contentExtractor.ts: Switched model to gemini-3.1-flash-lite-preview

Test plan

  • Manual testing performed — capture on basecamp.com produces no scaffold index.html
  • Gemini 3.1 tested — 28 images captioned with detailed descriptions
  • Documentation updated

Double-audio bug fix:
- scaffolding.ts: stop writing index.html in captures/ (root cause —
  runtime discovered scaffold + real index.html as two compositions)
- New lint rule: multiple_root_compositions — errors if >1 root HTML
- New lint rule: duplicate_audio_track — warns on overlapping audio

Capture improvements (from testing 30+ websites):
- Catalog runs BEFORE extractHtml (which mutates DOM — converts img src
  to data URLs). HeyKuba: 2 images → 78.
- networkidle2 instead of networkidle0 (unblocks SPAs with WebSockets)
- Lazy-load image wait, CSS background-image cataloging
- SVG naming from class/id/parent (not just aria-label)
- Gemini batch 5→20, pause 12s→2s, maxOutputTokens 300→500
- Asset descriptions sorted: captioned first

Docs:
- New guide: guides/website-to-video.mdx (full tutorial)
- CLI docs: added capture and snapshot commands
- docs.json: website-to-video in Guides nav

Gemini 3.1 Flash Lite Preview: 2.5x faster TTFT, 45% faster output,
slightly cheaper ($0.25/M vs $0.30/M input), near-2.5-Flash quality.
Descriptions are actually more detailed in testing.
@mintlify

mintlify bot commented Apr 17, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
|---|---|---|---|
| hyperframes | 🟢 Ready | View Preview | Apr 17, 2026, 4:18 AM |

💡 Tip: Enable Workflows to automatically generate PRs for you.

Collaborator

@jrusso1020 jrusso1020 left a comment

Nice bug find and the guide reads well. A few things I'd want addressed before merging — two of them are correctness bugs in the new lint rules, and I think the PR is doing too many things at once.

🔴 Blockers

1. lintMultipleRootCompositions can never fire

const rootFiles = results.map((r) => r.file).filter((f) => !f.startsWith("compositions/"));
if (rootFiles.length > 1) { ... }

results is built earlier in lintProject: exactly one push for "index.html" and N pushes for "compositions/${file}". After the filter, rootFiles always has length 1, so the > 1 branch is unreachable. This rule can't catch the bug it was designed to catch.

To actually detect a stray scaffold, the lint needs to walk the project directory for *.html files at the root and compare against project.indexPath — not filter what lintProject already chose to read.
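A minimal sketch of that directory-based check, assuming the project's files are already available as a map of relative path to contents. `ProjectFiles`, `findRootCompositions`, and the message text are hypothetical names for illustration, not the repository's actual API; a real rule would read the files from disk. Treating the `data-composition-id` attribute as the marker of a composition is also an assumption, taken from the PR description of the scaffold.

```typescript
// Hypothetical shape: relative path -> file contents.
type ProjectFiles = Record<string, string>;

// Return root-level .html files that declare a composition.
function findRootCompositions(files: ProjectFiles): string[] {
  return Object.entries(files)
    // Root level only: no directory separator in the relative path.
    .filter(([path]) => !path.includes("/") && path.toLowerCase().endsWith(".html"))
    // Assumption: a composition root carries a data-composition-id attribute.
    .filter(([, html]) => /\bdata-composition-id\s*=/.test(html))
    .map(([path]) => path)
    .sort();
}

// Returns an error message when more than one root composition exists, else null.
function lintMultipleRootCompositions(files: ProjectFiles): string | null {
  const roots = findRootCompositions(files);
  if (roots.length <= 1) return null;
  return `multiple_root_compositions: found ${roots.length} root compositions (${roots.join(", ")})`;
}
```

Because the check looks at what is actually in the directory rather than at the list of files the linter chose to read, the "more than one" branch becomes reachable.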

2. lintDuplicateAudioTracks regex is order-sensitive

The regex requires attributes in the source as data-track-index → data-start → data-duration. The old scaffold that caused this bug wrote them as:

<audio id="narration" data-start="0" data-duration="28" data-track-index="0" data-volume="1" src="narration.wav">

data-start before data-track-index — that audio tag would not match this regex. Any agent-authored <audio> with a different attribute order is silently skipped.

Fix: match <audio[^>]*> first, then extract each attribute with its own regex against the tag body. While you're there: because the scan walks allHtmlSources (root + every composition), the same <audio> reachable through both a root and a sub-composition will flag as a duplicate of itself — worth deduping by (src, start, duration) or scoping to a single file.
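The two-step approach could look roughly like the sketch below. `extractAttr`, `parseAudioTags`, and the `AudioTag` shape are hypothetical names, and defaulting a missing `data-start` or `data-track-index` to 0 is an assumption rather than documented runtime behavior.

```typescript
interface AudioTag {
  src: string | null;
  start: number;
  duration: number | null; // null when data-duration is absent
  trackIndex: number;
}

// Read one quoted attribute value from a single tag's source, wherever it appears.
function extractAttr(tag: string, name: string): string | null {
  // Leading \s keeps "src" from matching inside "data-src".
  const match = new RegExp(`\\s${name}\\s*=\\s*(?:"([^"]*)"|'([^']*)')`, "i").exec(tag);
  if (!match) return null;
  return match[1] ?? match[2] ?? null;
}

function parseAudioTags(html: string): AudioTag[] {
  // Step 1: find whole <audio ...> opening tags, ignoring attribute order.
  const tagPattern = /<audio\b[^>]*>/gi;
  const tags: AudioTag[] = [];
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(html)) !== null) {
    const tag = match[0];
    // Step 2: pull each attribute out of the tag independently.
    const duration = extractAttr(tag, "data-duration");
    tags.push({
      src: extractAttr(tag, "src"),
      start: Number(extractAttr(tag, "data-start") ?? "0"), // assumed default
      duration: duration === null ? null : Number(duration),
      trackIndex: Number(extractAttr(tag, "data-track-index") ?? "0"), // assumed default
    });
  }
  return tags;
}
```

With this shape, the scaffold's tag quoted above parses the same way no matter which attribute comes first.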

Also, landing two new lint rules with zero unit tests is what let #1 slip through — a single fixture per rule would have caught it.

🟡 Should fix

3. Docs/code drift on the Gemini model

The model is referenced in three places; two were updated and one was missed:

  • step-1-capture.md → "Gemini 3.1 Flash Lite" ✅
  • contentExtractor.ts → updated ✅
  • docs/packages/cli.mdx:357 still says "Gemini 2.5 Flash vision (~$0.001/image)"

4. gemini-3.1-flash-lite-preview is a preview model

The PR body claims "2.5x faster with richer descriptions" without numbers. Preview endpoints get deprecated on Google's schedule, not ours, so two things I'd want:

  • A short note with the actual measurements — latency, sample caption quality, any rate-limit changes (the batching code still assumes 2000 RPM).
  • An easy swap path — either an env override (HYPERFRAMES_GEMINI_MODEL) or a constant at the top of contentExtractor.ts so the next swap is one line.
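One possible shape for that swap path, combining both options: a single constant plus an environment override. `DEFAULT_GEMINI_MODEL`, `Env`, and `resolveGeminiModel` are hypothetical names; only the variable name `HYPERFRAMES_GEMINI_MODEL` and the model ID come from this PR.

```typescript
// The one line to change on the next model swap.
const DEFAULT_GEMINI_MODEL = "gemini-3.1-flash-lite-preview";

// Minimal stand-in for an environment object such as Node's process.env.
type Env = Record<string, string | undefined>;

// Prefer a non-empty override; otherwise fall back to the default.
function resolveGeminiModel(env: Env): string {
  const override = env["HYPERFRAMES_GEMINI_MODEL"]?.trim();
  return override ? override : DEFAULT_GEMINI_MODEL;
}
```

In a Node process this would be called as `resolveGeminiModel(process.env)`. Taking the environment as a parameter keeps the function easy to unit test.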

5. Drop the greensock/gsap-skills install line

(From @jrusso1020) We ship skills/gsap/ in this repo, so pointing users at greensock/gsap-skills is now redundant and a second source of truth we don't control. Remove both the npx skills add greensock/gsap-skills line in docs/guides/website-to-video.mdx and any similar references.

6. Lead with explicit skill invocation in the guide

(From @jrusso1020, with my take) The current guide shows implicit discovery as the happy path and treats explicit invocation as a troubleshooting fallback. I'd flip that for the published docs:

  • Deterministic — users get the same behavior every time, no "why didn't it trigger?" support threads.
  • Teachable — the docs actually name the thing they're telling you to use.
  • Self-documenting in transcripts — easier to tell what ran when someone pastes a session.

I wouldn't make it the only pattern though — ambient discovery is part of what makes the product feel magical, and removing that story would be a loss. Concretely:

Use the /website-to-hyperframes skill to create a 25-second product launch
video from https://stripe.com. Bold, cinematic, financial infrastructure energy.

…as the primary example, with a <Note> afterwards saying "Agents will also trigger this skill automatically when they see a URL and a video request — the explicit form is just more predictable."

7. docs/docs.json indentation

             "pages": [
"guides/website-to-video",
               "guides/prompting",

New entry is flush-left while its siblings are at 14 spaces. Mintlify parsed it (CI green) but this drifts on the next formatter pass. One-line fix.

🟢 Scope

Four independent things in one PR: a P0-ish bug fix, two lint rules, a Mintlify guide, and a model swap. I'd split into:

  • PR A (ship now): scaffold removal — self-contained, fixes a real bug.
  • PR B: lint rules, once they work + tests.
  • PR C: docs guide + cli.mdx edits + gsap-skills removal.
  • PR D: Gemini 3.1 swap with benchmark + env override.

If anything regresses, revert blast radius becomes one feature instead of four.

👍 What's good

  • The root-cause write-up in scaffolding.ts is exactly right and the inline comment explaining why we no longer emit index.html is the kind of breadcrumb I want to find in six months.
  • The website-to-video guide is well-structured, uses Mintlify components idiomatically, and the "With/Without Gemini" <Tabs> comparison is a nice touch.

Requesting changes on #1, #2, #3, #5. Happy to re-review once those are addressed; the rest are strong suggestions.

- lintMultipleRootCompositions: scan filesystem for HTML files with
  data-composition-id (was filtering results array — always 1 entry)
- lintDuplicateAudioTracks: order-independent attribute extraction,
  dedup by (src,start,duration,trackIndex), Infinity fallback for
  missing data-duration (matches runtime behavior)
- 10 new tests for both lint rules
- docs: explicit skill invocation, remove gsap-skills, fix indentation
- Gemini: env override (HYPERFRAMES_GEMINI_MODEL), benchmark data in
  code comment (49 imgs: 3.1-lite ~507ms/img, 2.5-lite ~230ms/img)
- cli.mdx: version-agnostic "Gemini vision" reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ukimsanov
Collaborator Author

Addressed all 7 review items

🔴 Blockers (all fixed)

1. lintMultipleRootCompositions can never fire — Rewrote to scan the project directory filesystem for .html files containing data-composition-id, instead of filtering the results array (which always has exactly 1 root entry). 3 tests added.

2. lintDuplicateAudioTracks regex is order-sensitive — Replaced single order-dependent regex with <audio[^>]*> match + per-attribute extractAttr() helper. Also:

  • Deduplicates by (src, start, duration, trackIndex) to avoid false positives when the same audio appears in root + sub-composition
  • Falls back to Infinity when data-duration is absent (matches runtime behavior at hyperframes-player.ts:635)
  • Handles Infinity.toFixed() safely in warning messages
  • Regex created inside the for loop to avoid g-flag lastIndex carryover across HTML strings
  • 7 tests added (attribute order, non-overlapping, different tracks, dedup, Infinity fallback, Infinity formatting, g-flag regression)
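The dedup and Infinity handling described above can be sketched as follows. This is an illustration under stated assumptions, not the merged implementation: `AudioClip`, `findOverlaps`, and `formatSeconds` are hypothetical names, the "open-ended" label is invented for the example, and comparing only clips on the same track index is inferred from the "different tracks" test name.

```typescript
interface AudioClip {
  src: string;
  start: number;
  duration: number | null; // null when data-duration is absent
  trackIndex: number;
}

// A missing duration is treated as unbounded, as the PR says the runtime does.
function endOf(clip: AudioClip): number {
  return clip.start + (clip.duration ?? Infinity);
}

// Avoid printing "Infinity" in user-facing messages.
function formatSeconds(value: number): string {
  return Number.isFinite(value) ? `${value.toFixed(2)}s` : "open-ended";
}

function findOverlaps(clips: AudioClip[]): string[] {
  // Drop exact repeats so a clip seen via root and sub-composition counts once.
  const unique = new Map<string, AudioClip>();
  for (const clip of clips) {
    unique.set(`${clip.src}|${clip.start}|${clip.duration}|${clip.trackIndex}`, clip);
  }
  const sorted = Array.from(unique.values()).sort((a, b) => a.start - b.start);

  const warnings: string[] = [];
  sorted.forEach((a, i) => {
    for (const b of sorted.slice(i + 1)) {
      // Assumption: only clips on the same track are compared.
      if (a.trackIndex !== b.trackIndex) continue;
      // b starts at or after a (sorted), so they overlap if b starts before a ends.
      if (b.start < endOf(a)) {
        warnings.push(
          `duplicate_audio_track: ${b.src} starts at ${formatSeconds(b.start)} ` +
            `while ${a.src} plays until ${formatSeconds(endOf(a))} on track ${a.trackIndex}`,
        );
      }
    }
  });
  return warnings;
}
```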

3. cli.mdx still says "Gemini 2.5 Flash" — Changed to version-agnostic "Gemini vision".

5. Drop greensock/gsap-skills — Removed from website-to-video.mdx.

🟡 Should fix (all addressed)

4. Preview model benchmark + env override — Added HYPERFRAMES_GEMINI_MODEL env override. Benchmark results (49 images, paid tier, sequential):

| Model | Success rate | Latency/img | Caption length |
|---|---|---|---|
| gemini-3.1-flash-lite-preview | 49/49 | ~507ms | 131 chars avg |
| gemini-2.5-flash-lite | 49/49 | ~230ms | 117 chars avg |
| gemini-2.5-flash | 15/49 | ~1324ms | 111 chars avg |

3.1-flash-lite-preview produces richer captions (+14 chars avg) with higher variance on cold starts. 2.5-flash-lite is faster and more reliable. Keeping 3.1 as default for caption quality; easy swap via env var. Benchmark data added as code comment.

6. Lead with explicit skill invocation — Step 2 example updated, <Note> about ambient discovery added, troubleshooting accordion updated.

7. docs.json indentation — Fixed flush-left entry to 14 spaces.

Bonus from self-review

  • fixHint in lintProjectAudioFiles now includes data-duration="__DURATION__" (was missing, users copying the hint would get broken audio)
  • lintDuplicateAudioTracks handles missing data-duration by falling back to Infinity (matches runtime at hyperframes-player.ts:635)

Not splitting the PR

Keeping as one PR per discussion — the changes are small and interdependent (lint rules reference the same scaffold bug the removal fixes, docs reference the same Gemini model the code uses).

@ukimsanov ukimsanov merged commit a95539a into main Apr 17, 2026
14 checks passed

3 participants