Description
When a brand-new model (released 1-2 days ago) is quantized via QuantLLM, loading fails with an error that the architecture is not recognized by the `transformers` package.
QuantLLM currently supports 45+ architectures (Llama 2/3, Mistral, Qwen, Phi, Gemma, etc.) through `TurboModel` and `turbo()` loading. However, it relies heavily on the Hugging Face `transformers` AutoModel classes, without sufficient fallback or dynamic registration for very recent models.
Unsloth handles this by using flexible model class resolution (`resolve_model_class` with fallbacks) and by quickly adding targeted fixes/custom kernels, allowing new models to load and quantize even when standard `transformers` fails.
Current Behavior
```python
model = turbo("new-model-org/NewModel-7B")  # Fails if architecture not yet in transformers
```
Expected Behavior
Support new models within days of release by:
- Dynamic architecture mapping and registration
- Fallback to base classes with manual config overrides
- Auto-detection of common patterns (e.g., Llama-like, Qwen-like)
Proposed Fix
- In `core/turbo_model.py` and the model registration logic, implement a registry system (similar to PEFT- or Unsloth-style resolution).
- Add a `register_architecture()` utility for rapid community/PR support of new models.
- Allow loading with `trust_remote_code=True` plus custom `model_type` overrides when the auto-class fails.
- Include a `from_config_only` or `base_model_fallback` mode for the newest releases.
- Document a fast contribution path for new architectures (already mentioned in the README, but make it actionable with templates).
This will resolve the immediate "architecture not recognized" error for recent models.
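The registry and fallback resolution from the Proposed Fix could be sketched roughly as follows. This is illustrative only: `register_architecture()` is the utility proposed above, but the internal names (`_REGISTRY`, `resolve_model_class`) and the prefix-matching strategy are assumptions, not QuantLLM's actual API:

```python
# Illustrative registry: architecture name -> loader callable.
_REGISTRY = {}

def register_architecture(name, loader):
    """Register a loader for an architecture (e.g. via a community PR)."""
    _REGISTRY[name] = loader

def resolve_model_class(architecture, fallback=None):
    """Resolve a loader: exact match first, then prefix match (so a
    registered 'Llama' entry covers 'LlamaForCausalLM'), then the
    caller-supplied fallback (e.g. a base-class loader)."""
    if architecture in _REGISTRY:
        return _REGISTRY[architecture]
    for name, loader in _REGISTRY.items():
        if architecture.startswith(name):
            return loader
    if fallback is not None:
        return fallback
    raise ValueError(f"Unknown architecture: {architecture!r}")
```

With this shape, a brand-new `NewModelForCausalLM` that fails exact resolution could still load through a registered prefix entry or through the fallback path (e.g. a `trust_remote_code=True` base-class load), instead of erroring out immediately.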