
Add architecture registration + fallback loading path for newly released HF model types #27

Merged — codewithdark-git merged 6 commits into main from copilot/add-flexible-model-registration, Apr 25, 2026
Conversation

Contributor

Copilot AI commented Apr 25, 2026

  • Confirm baseline state and inspect current fallback implementation/docs/tests
  • Add trust_remote_code safety warning for unregistered/new architectures
  • Improve fallback RuntimeError with resolved/base model-type guidance and registration example
  • Ensure defaults/polish: explicit base_model_fallback default path, fallback-order comment, unbounded regex cache
  • Add fallback quantization regression test
  • Update loading docs with security note and a concrete "released yesterday" example
  • Run targeted lint/tests for touched files and summarize results
  • Address automated review feedback (test import order)

Copilot AI and others added 2 commits April 25, 2026 07:18
Copilot AI changed the title from "[WIP] Add flexible model class registration and fallback system" to "Add architecture registration + fallback loading path for newly released HF model types", Apr 25, 2026
Copilot AI requested a review from codewithdark-git April 25, 2026 07:25
Owner

@codewithdark-git codewithdark-git left a comment


@copilot Thank you for this PR. It’s a very well-targeted and timely improvement. Directly addressing the “architecture not recognized by transformers” error for brand-new models is one of the highest-priority issues for QuantLLM right now. The overall direction — adding a lightweight registration system + multi-tier fallback loading — is excellent and aligns perfectly with our goal of staying competitive with Unsloth on new-model support.

What I Like

  • Clean and focused scope.
  • Good API design: register_architecture(), model_type_override, base_model_fallback, and from_config_only are intuitive and powerful.
  • Smart use of token-based regex matching with caching instead of naive string checks.
  • Solid integration with the existing quantization path (_should_apply_quantization).
  • Clear error messages with actionable suggestions — this is a big UX win.
  • Comprehensive tests and updated documentation (including contribution template).

This PR already makes QuantLLM significantly more robust for recently released models.
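For context, the registration + multi-tier fallback pattern praised above can be sketched with plain Python. This is an illustrative sketch only — the function names mirror those mentioned in this review, but the signatures and internals are assumptions, not QuantLLM's actual implementation:

```python
# Illustrative sketch of an architecture registry with multi-tier fallback.
# Names mirror the review discussion; internals are assumed, not QuantLLM's code.
_ARCHITECTURE_REGISTRY: dict[str, str] = {}

def register_architecture(model_type: str, base_model_type: str) -> None:
    """Map an unrecognized model_type onto a known base implementation."""
    _ARCHITECTURE_REGISTRY[model_type] = base_model_type

def resolve_model_type(model_type: str, known_types: set[str]) -> str:
    """Resolution order: exact match first, then the registry, then a helpful error."""
    if model_type in known_types:
        return model_type
    if model_type in _ARCHITECTURE_REGISTRY:
        return _ARCHITECTURE_REGISTRY[model_type]
    raise RuntimeError(
        f"Architecture '{model_type}' is not recognized. "
        f"Try: register_architecture('{model_type}', base_model_type='llama')"
    )

# A brand-new (hypothetical) architecture falls back to a known base type.
register_architecture("qwen3_next", "llama")
print(resolve_model_type("qwen3_next", {"llama", "mistral"}))  # → llama
```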

Suggested Improvements (Before Merging)

Here are a few concrete refinements I recommend:

  1. Security Warning for trust_remote_code=True
    Since this flag is now more prominently exposed for new models, we should add a prominent warning (both in code and docs) when it is used with unregistered or very new models. Something like:

    if trust_remote_code and not is_registered:
        logger.warning("trust_remote_code=True was enabled for an unregistered architecture. "
                       "Only use this for models from trusted sources.")

    Also add a clear note in docs/guide/loading-models.md.

  2. Improve Fallback Error Message
    The current RuntimeError is good, but we can make it even more helpful by including the resolved base model type and a one-line registration example:

    raise RuntimeError(
        f"Architecture '{config.model_type}' is not recognized.\n"
        f"Try: register_architecture('{config.model_type}', base_model_type='llama')\n"
        f"or use model_type_override='llama'."
    )
  3. Make base_model_fallback=True the Default
    For maximum “it just works” experience with new models, consider making base_model_fallback=True the default in turbo() and TurboModel.from_pretrained(). Users can still disable it with base_model_fallback=False if needed. This would reduce friction significantly.

  4. Minor Code Polish

    • Ensure _compiled_model_name_pattern uses functools.lru_cache(maxsize=None) for thread-safety and performance.
    • Add a small comment explaining the fallback priority order in _load_model_with_fallback.
    • Verify that when fallback succeeds, quantization (NF4, double quant, etc.) is still applied correctly — maybe add one integration test for 4-bit loading after fallback.
  5. Documentation
    Add one concrete real-world-style example in the loading guide using a hypothetical model released “yesterday” (e.g., Qwen3-8B or similar).
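As a reference for point 4, the cached token-based matching could look roughly like the following. This is a sketch under assumed semantics (whole-token matching on `-`, `_`, `/` boundaries), not the library's actual `_compiled_model_name_pattern`:

```python
# Sketch of a cached, token-based model-name matcher (illustrative assumptions;
# not QuantLLM's actual implementation).
import functools
import re

@functools.lru_cache(maxsize=None)
def _compiled_model_name_pattern(model_type: str) -> re.Pattern:
    # Match the model type as a whole token (e.g. "llama" in "meta-llama/Llama-3"),
    # not as a bare substring, to avoid false positives like "ollama".
    return re.compile(
        rf"(?:^|[-_/]){re.escape(model_type)}(?:$|[-_/0-9])", re.IGNORECASE
    )

def name_matches(model_name: str, model_type: str) -> bool:
    return _compiled_model_name_pattern(model_type).search(model_name) is not None

print(name_matches("meta-llama/Llama-3-8B", "llama"))  # → True
print(name_matches("mistralai/Mistral-7B", "llama"))   # → False
```

`functools.lru_cache(maxsize=None)` keeps the compiled pattern per model type without eviction, and the stdlib cache is safe to call from multiple threads.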

Overall Verdict

This PR is very close to ready — I’d rate it ~85% complete. With the above adjustments (especially the security warning and default fallback behavior), it will be production-ready and a strong addition to the library.

Once these points are addressed, I’m happy to approve and merge.

Let me know if you’d like me to push any of these changes directly or if you have questions about any suggestion.

Thanks again for the solid work!

@codewithdark-git codewithdark-git marked this pull request as ready for review April 25, 2026 07:44
Copilot AI review requested due to automatic review settings April 25, 2026 07:44


Copilot AI and others added 3 commits April 25, 2026 13:28
Agent-Logs-Url: https://github.com/codewithdark-git/QuantLLM/sessions/8867f3b4-18ae-4207-b2e8-51444418c7aa

Co-authored-by: codewithdark-git <144595403+codewithdark-git@users.noreply.github.com>
Copilot AI requested a review from codewithdark-git April 25, 2026 13:40
@codewithdark-git codewithdark-git merged commit c32c63d into main Apr 25, 2026
2 of 4 checks passed


Development

Successfully merging this pull request may close these issues.

New LLM Architectures Not Recognized by Transformers — Add Flexible Model Class Registration and Fallback System
