Add architecture registration + fallback loading path for newly released HF model types #27
Conversation
Agent-Logs-Url: https://github.com/codewithdark-git/QuantLLM/sessions/274cc6b0-d42e-47b1-9673-1f6db346ecf2 Co-authored-by: codewithdark-git <144595403+codewithdark-git@users.noreply.github.com>
@copilot Thank you for this PR. It’s a very well-targeted and timely improvement. Directly addressing the “architecture not recognized by transformers” error for brand-new models is one of the highest-priority issues for QuantLLM right now. The overall direction — adding a lightweight registration system + multi-tier fallback loading — is excellent and aligns perfectly with our goal of staying competitive with Unsloth on new-model support.
What I Like
- Clean and focused scope.
- Good API design: `register_architecture()`, `model_type_override`, `base_model_fallback`, and `from_config_only` are intuitive and powerful.
- Smart use of token-based regex matching with caching instead of naive string checks.
- Solid integration with the existing quantization path (`_should_apply_quantization`).
- Clear error messages with actionable suggestions — this is a big UX win.
- Comprehensive tests and updated documentation (including contribution template).
This PR already makes QuantLLM significantly more robust for recently released models.
Suggested Improvements (Before Merging)
Here are a few concrete refinements I recommend:
1. **Security warning for `trust_remote_code=True`**

   Since this flag is now more prominently exposed for new models, we should add a prominent warning (both in code and docs) when it is used with unregistered or very new models. Something like:

   ```python
   if trust_remote_code and not is_registered:
       logger.warning(
           "trust_remote_code=True was enabled for an unregistered architecture. "
           "Only use this for models from trusted sources."
       )
   ```

   Also add a clear note in `docs/guide/loading-models.md`.
2. **Improve the fallback error message**

   The current `RuntimeError` is good, but we can make it even more helpful by including the resolved base model type and a one-line registration example:

   ```python
   f"Architecture '{config.model_type}' is not recognized.\n"
   f"Try: register_architecture('{config.model_type}', base_model_type='llama')\n"
   f"or use model_type_override='llama'."
   ```
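As a self-contained sketch of that message builder (the function name and the `'llama'` default are illustrative, not the PR's actual code):

```python
def unrecognized_arch_message(model_type: str, suggested_base: str = "llama") -> str:
    # Build an actionable error message that includes a one-line fix.
    return (
        f"Architecture '{model_type}' is not recognized.\n"
        f"Try: register_architecture('{model_type}', base_model_type='{suggested_base}')\n"
        f"or use model_type_override='{suggested_base}'."
    )
```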
3. **Make `base_model_fallback=True` the default**

   For a maximum “it just works” experience with new models, consider making `base_model_fallback=True` the default in `turbo()` and `TurboModel.from_pretrained()`. Users can still disable it with `base_model_fallback=False` if needed. This would reduce friction significantly.
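To make the proposed default concrete, a toy sketch of the resolution step (the function and the hard-coded `"llama"` fallback target are assumptions; the real logic would presumably resolve the base type from the registry):

```python
def resolve_model_type(model_type, known_types, base_model_fallback=True):
    """Pick the model type to load; fall back to a base architecture by default."""
    if model_type in known_types:
        return model_type
    if base_model_fallback:
        return "llama"  # illustrative fallback target
    raise RuntimeError(f"Architecture '{model_type}' is not recognized.")
```

With the default on, brand-new model types load via the fallback path unless the caller opts out.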
4. **Minor code polish**

   - Ensure `_compiled_model_name_pattern` uses `functools.lru_cache(maxsize=None)` for thread-safety and performance.
   - Add a small comment explaining the fallback priority order in `_load_model_with_fallback`.
   - Verify that when fallback succeeds, quantization (NF4, double quant, etc.) is still applied correctly — maybe add one integration test for 4-bit loading after fallback.
5. **Documentation**

   Add one concrete real-world-style example in the loading guide using a hypothetical model released “yesterday” (e.g., `Qwen3-8B` or similar).
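For that guide example, the registration flow could be exercised with something like this toy registry (a sketch only; the names mirror the PR's API, the internals are assumed):

```python
_ARCH_REGISTRY = {}

def register_architecture(model_type, base_model_type):
    """Map a not-yet-supported model_type onto a known base architecture."""
    _ARCH_REGISTRY[model_type] = base_model_type

def resolve(model_type):
    # Unregistered types pass through unchanged.
    return _ARCH_REGISTRY.get(model_type, model_type)

# Guide example: a model type released "yesterday" that transformers
# does not yet recognize, mapped onto a llama-style base.
register_architecture("qwen3", base_model_type="llama")
```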
Overall Verdict
This PR is very close to ready — I’d rate it ~85% complete. With the above adjustments (especially the security warning and default fallback behavior), it will be production-ready and a strong addition to the library.
Once these points are addressed, I’m happy to approve and merge.
Let me know if you’d like me to push any of these changes directly or if you have questions about any suggestion.
Thanks again for the solid work!
Agent-Logs-Url: https://github.com/codewithdark-git/QuantLLM/sessions/8867f3b4-18ae-4207-b2e8-51444418c7aa Co-authored-by: codewithdark-git <144595403+codewithdark-git@users.noreply.github.com>