
New LLM Architectures Not Recognized by Transformers — Add Flexible Model Class Registration and Fallback System #26

@codewithdark-git

Description

When a brand-new model (released 1-2 days ago) is quantized via QuantLLM, loading fails with an error stating that the architecture is not recognized by the transformers package.

QuantLLM currently supports 45+ architectures (Llama 2/3, Mistral, Qwen, Phi, Gemma, etc.) through TurboModel and turbo() loading. However, it relies heavily on Hugging Face transformers AutoModel classes without sufficient fallback or dynamic registration for very recent models.

Unsloth handles this with flexible model class resolution (resolve_model_class with fallbacks) and by quickly shipping targeted fixes/custom kernels, allowing new models to be loaded and quantized even when the standard transformers auto classes fail.

Current Behavior

model = turbo("new-model-org/NewModel-7B")  # Fails if architecture not yet in transformers

Expected Behavior
Support new models within days of release by:

  • Dynamic architecture mapping and registration
  • Fallback to base classes with manual config overrides
  • Auto-detection of common patterns (e.g., Llama-like, Qwen-like); see the sketch after this list

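As a rough illustration of this flow, the sketch below shows what an Unsloth-style resolution chain could look like inside QuantLLM. The helper name load_with_fallback and the LLAMA_LIKE mapping are hypothetical, not existing QuantLLM or transformers API; only the transformers auto classes are real.

from transformers import AutoConfig, AutoModelForCausalLM, LlamaForCausalLM

# Hypothetical mapping from unrecognized model_type strings to a "close enough" known class.
LLAMA_LIKE = {"newmodel": LlamaForCausalLM}

def load_with_fallback(model_id: str):
    try:
        # 1. Normal path: the architecture is already known to transformers.
        return AutoModelForCausalLM.from_pretrained(model_id)
    except (KeyError, ValueError):
        pass
    try:
        # 2. Fallback: let the repo's own modeling code register the class.
        return AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    except (KeyError, ValueError):
        pass
    # 3. Pattern detection: map a Llama-like config onto a known class.
    #    Only valid when the checkpoint's weight layout matches the target class.
    config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
    model_cls = LLAMA_LIKE.get(getattr(config, "model_type", None))
    if model_cls is None:
        raise ValueError(f"Unrecognized architecture: {getattr(config, 'architectures', None)}")
    return model_cls.from_pretrained(model_id, config=config)
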
Proposed Fix

  • In core/turbo_model.py and model registration logic, implement a registry system (similar to PEFT or Unsloth-style resolution).
  • Add a register_architecture() utility for rapid community/PR support of new models (a sketch follows this list).
  • Allow loading with trust_remote_code=True + custom model_type overrides when auto-class fails.
  • Include a from_config_only or base_model_fallback mode for newest releases.
  • Document a fast contribution path for new architectures (already mentioned in README, but make it actionable with templates).

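A minimal sketch of what the proposed registry and register_architecture() utility could look like. The names, and the idea that turbo() consults this registry after the normal path fails, are assumptions about the proposed design rather than current QuantLLM behavior; AutoConfig.register and AutoModelForCausalLM.register are real transformers APIs.

from typing import Optional, Type

from transformers import AutoConfig, AutoModelForCausalLM, PretrainedConfig, PreTrainedModel

# Hypothetical registry: maps a config.model_type string to a concrete model class.
_ARCHITECTURE_REGISTRY = {}

def register_architecture(model_type: str,
                          model_cls: Type[PreTrainedModel],
                          config_cls: Optional[Type[PretrainedConfig]] = None) -> None:
    """Register a new/unreleased architecture so loading no longer fails."""
    _ARCHITECTURE_REGISTRY[model_type] = model_cls
    if config_cls is not None:
        # Also register with transformers' auto classes so AutoModelForCausalLM works directly.
        AutoConfig.register(model_type, config_cls)
        AutoModelForCausalLM.register(config_cls, model_cls)

def resolve_model_class(model_id: str) -> Type[PreTrainedModel]:
    # Called only after the standard AutoModel path has failed.
    config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
    model_type = getattr(config, "model_type", "")
    if model_type in _ARCHITECTURE_REGISTRY:
        return _ARCHITECTURE_REGISTRY[model_type]
    raise ValueError(f"No registered class for model_type '{model_type}'")

Community support for a new release could then be a one-liner in a PR or user script, e.g. register_architecture("newmodel", LlamaForCausalLM) for a Llama-like model, after which turbo("new-model-org/NewModel-7B") would resolve through the registry instead of failing.
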
This will resolve the immediate "architecture not recognized" error for recent models.
