feat: LTX-2 support #1458

Open
pwilkin wants to merge 2 commits into leejet:master from pwilkin:ltx-2

@pwilkin pwilkin commented Apr 23, 2026

Please have mercy, had to murder my Claude Code to get this working.

SD_CUDA_DEVICE=1 SD_CUDA_DEVICE_CLIP=-1 SD_CUDA_DEVICE_VAE=0 timeout 1800 ./bin/sd-cli -M vid_gen \
    --diffusion-model /media/ilintar/D_SSD/models/ltx-2/ltx-2.3-22b-dev-Q5_K_S.gguf \
    --llm /media/ilintar/D_SSD/models/ltx-2/gemma-3-12b-it-qat-IQ4_XS.gguf \
    --vae /media/ilintar/D_SSD/models/ltx-2/ltx-2.3-22b-dev_video_vae.safetensors \
    -m /media/ilintar/D_SSD/models/ltx-2/ltx-2.3-22b-dev_embeddings_connectors.safetensors \
    --gemma-tokenizer /home/ilintar/.cache/huggingface/hub/models--google--gemma-3-12b-it/snapshots/96b6f1eccf38110c56df3a15bffe176da04bfd80/tokenizer.json \
    -W 640 -H 480 --video-frames 25 --steps 60 --fps 24 --cfg-scale 6.0 --seed 42 \
    -p "a cat walking on a sandy beach at sunset, cinematic, 4k" \
    -o /tmp/ltx2_smoke.webm
ltx2_smoke_v2.webm

@Green-Sky
Contributor

I think there is some good stuff we can pull out of here (:

btw, gemma-3-12b-it-qat-IQ4_XS.gguf: why an IQ4 quant of the QAT model?

@pwilkin
Author

pwilkin commented Apr 24, 2026

@Green-Sky that's a very good question, probably "because I wasn't thinking about it" is the proper answer ;)

@JohnLoveJoy

Great work. How does this perform compared to ComfyUI?

@pwilkin
Author

pwilkin commented Apr 24, 2026

Haven't compared yet but gonna optimize further.

@mudler
Contributor

mudler commented Apr 24, 2026

Wow! I was actually playing with it myself as well, letting Claude go at it by itself. I'll open up a PR just for reference; I got this working with Claude as well yesterday.

this is the result I got with it

ltx23_fix

@pwilkin
Author

pwilkin commented Apr 24, 2026

Slightly funky still, so guess there's a subtle error somewhere, but I added fitting, so I managed to get 80 frames at 720p ("a black cat jumping at a brown mouse on green grass"):

ltx2_cat_mouse_720p.webm

@pwilkin
Author

pwilkin commented Apr 24, 2026

@mudler yours looks much better, wonder if that's quants or if my implementation has a bug somewhere.

Edit: might be distilled vs full too though.

@pwilkin
Author

pwilkin commented Apr 24, 2026

Probably FA is the culprit here - I'm running this on 26 GB VRAM total (3080 10 GB + 5060 16 GB), so really struggling to get anything reasonable :)

Comment thread src/stable-diffusion.cpp
// SD_CUDA_DEVICE_VAE VAE (falls back to SD_CUDA_DEVICE)
// SD_CUDA_DEVICE_CONTROL ControlNet (falls back to SD_CUDA_DEVICE)
// SD_VK_DEVICE same pattern for the Vulkan build
// Setting any of these to -1 forces CPU for that component.
Contributor

Just as a reminder: this should be coordinated with #1184.

Author

Yeah, this is just a rough PoC for now.
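
For readers following the thread, the fallback rule in the comment under review can be sketched roughly as below. This is an illustration only, not the code from the PR: `read_device_env` and `resolve_component_device` are hypothetical names, the default device index of 0 is an assumption, and the sketch needs C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper (not from the PR): parse an integer device index from
// an environment variable. Returns nullopt if the variable is unset, empty,
// or not a plain base-10 integer.
static std::optional<int> read_device_env(const char* name) {
    assert(name != nullptr);
    const char* value = std::getenv(name);
    if (value == nullptr || *value == '\0') {
        return std::nullopt;
    }
    char* end = nullptr;
    const long parsed = std::strtol(value, &end, 10);
    if (end == value || *end != '\0') {
        return std::nullopt;
    }
    return static_cast<int>(parsed);
}

// Resolve the device for one component, given a suffix such as "VAE" or
// "CONTROL". Lookup order: SD_CUDA_DEVICE_<suffix>, then SD_CUDA_DEVICE,
// then default_device (assumed to be 0 here).
// A result of -1 means "keep this component on the CPU".
int resolve_component_device(const std::string& suffix, int default_device = 0) {
    if (auto specific = read_device_env(("SD_CUDA_DEVICE_" + suffix).c_str())) {
        return *specific;
    }
    if (auto global = read_device_env("SD_CUDA_DEVICE")) {
        return *global;
    }
    return default_device;
}
```

With the environment from the first command in this thread (`SD_CUDA_DEVICE=1 SD_CUDA_DEVICE_CLIP=-1 SD_CUDA_DEVICE_VAE=0`), this sketch would resolve `"CLIP"` to -1 (CPU), `"VAE"` to 0, and `"CONTROL"` to 1 through the fallback.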

@mudler
Contributor

mudler commented Apr 24, 2026

> @mudler yours looks much better, wonder if that's quants or if my implementation has a bug somewhere.
>
> Edit: might be distilled vs full too though.

I'm using the distilled model:

~/ltxv-sd-cpp/build-cuda/bin/sd-cli -M vid_gen \
    -m ltxv-models/ltx-2.3-22b-distilled.safetensors \
    --text-encoder gemma-3-12b-it \
    -p 'a cat walking across a grassy field' \
    -W 768 -H 512 --video-frames 121 \
    --steps 8 --cfg-scale 1 \
    -o /tmp/ltx23_clean.webp --seed 42

@pwilkin
Author

pwilkin commented Apr 24, 2026

@mudler yeah I'm doing full for some reason (probably the same one that caused me to pick IQ4_XS :D)

@pwilkin
Author

pwilkin commented Apr 24, 2026

So apparently there are some major divergences between CPU and CUDA Gemma3, which is a bit surprising (and it happens on both Q4_0 and the IQ4_XS quants).
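
One way to put a number on a divergence like this, assuming the text-encoder output from each backend has been dumped into a flat float buffer of the same length, is to compare the two buffers by maximum absolute difference and cosine similarity. The sketch below is a generic comparison and not the check actually used here; `Divergence` and `compare_outputs` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of how far two backend outputs are apart.
struct Divergence {
    double max_abs_diff;       // largest element-wise |cpu - gpu|
    double cosine_similarity;  // 1.0 means the vectors point the same way
};

// Compare two equally sized output buffers (e.g. CPU vs CUDA embeddings).
// Accumulates in double to avoid adding float rounding error of its own.
Divergence compare_outputs(const std::vector<float>& cpu,
                           const std::vector<float>& gpu) {
    assert(cpu.size() == gpu.size() && !cpu.empty());
    double max_abs = 0.0, dot = 0.0, norm_cpu = 0.0, norm_gpu = 0.0;
    for (std::size_t i = 0; i < cpu.size(); ++i) {
        const double a = cpu[i];
        const double b = gpu[i];
        max_abs = std::max(max_abs, std::fabs(a - b));
        dot += a * b;
        norm_cpu += a * a;
        norm_gpu += b * b;
    }
    const double denom = std::sqrt(norm_cpu) * std::sqrt(norm_gpu);
    // If either vector is all zeros the cosine is undefined; report 0.0.
    return {max_abs, denom > 0.0 ? dot / denom : 0.0};
}
```

A cosine similarity close to 1 with a small maximum difference would point at ordinary floating-point noise between backends, while a clearly lower value would suggest a real bug in one code path. Where to draw that line is a judgment call and is not specified in this thread.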

5 participants