---
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
tags:
- scaling-laws
- neural-scaling
- performance-prediction
- configuration-to-performance
- pytorch
library_name: transformers
---

# NCPL-intermediate: Neural Configuration to Performance Scaling Law

This model predicts the performance of neural network configurations using scaling laws. It is trained on the Marin and StepLaw datasets to forecast performance metrics based on model configurations.

## Model Description

**NCPL-intermediate** (Neural Configuration to Performance Scaling Law - Intermediate) is a specialized forecasting model that:

- Takes pretraining configurations as input
- Predicts intermediate performance metrics using learned scaling law patterns
- Combines text embeddings from a base transformer with numeric values processed by a dedicated MLP
- Supports multiple scaling law formulations (Marin, StepLaw)

### Architecture

The model consists of:

1. **Base Model**: Qwen/Qwen3-1.7B
   - Provides contextual embeddings for text tokens

2. **Numeric MLP**:
   - Processes numeric values (performance metrics, configuration parameters)
   - Projects numeric inputs to the same hidden dimension as the text embeddings
   - Architecture: Linear(1 → 2*hidden_size) → ReLU → Linear(2*hidden_size → hidden_size)

3. **Prediction Head**:
   - Linear layer mapping from hidden_size to scalar predictions
   - Outputs a performance forecast at each token position
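
Taken together, a minimal PyTorch sketch of this architecture might look like the following. This is a hypothetical re-implementation for illustration only: the class name and the assumption that embeddings at numeric positions are *replaced* by the MLP output (rather than combined some other way) are not taken from the official code.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class ForecasterSketch(nn.Module):
    """Illustrative sketch of the architecture above (not the official code)."""

    def __init__(self, base_model_name="Qwen/Qwen3-1.7B"):
        super().__init__()
        self.base = AutoModel.from_pretrained(base_model_name)
        h = self.base.config.hidden_size
        # Numeric MLP: Linear(1 -> 2h) -> ReLU -> Linear(2h -> h)
        self.numeric_mlp = nn.Sequential(
            nn.Linear(1, 2 * h), nn.ReLU(), nn.Linear(2 * h, h)
        )
        # Prediction head: hidden_size -> one scalar per token position
        self.head = nn.Linear(h, 1)

    def forward(self, input_ids, is_number_mask, number_values_filled, attention_mask=None):
        # Ordinary token embeddings from the base transformer
        embeds = self.base.get_input_embeddings()(input_ids)
        # Assumption: at numeric positions, swap in MLP projections of the raw values
        num_embeds = self.numeric_mlp(number_values_filled.unsqueeze(-1))
        embeds = torch.where(is_number_mask.unsqueeze(-1), num_embeds, embeds)
        hidden = self.base(inputs_embeds=embeds, attention_mask=attention_mask).last_hidden_state
        return self.head(hidden).squeeze(-1)  # shape: (batch, seq_len)
```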

## Training Data

The model was trained on:

- **Datasets**: Marin and StepLaw scaling law datasets
- **Training configuration** (a two-stage schedule, sketched below):
  - Stage 1: 10 epochs with learning rate 5e-5 (base model frozen)
  - Stage 2: 400 epochs with learning rate 1e-5 (full fine-tuning)
  - Batch size: 480 (across 8 GPUs)
  - Weight decay: 0.01
  - Loss: MSE (Mean Squared Error)
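
The two stages could be wired up roughly as follows. This is a minimal sketch, assuming "frozen base model" means only the numeric MLP and prediction head receive gradients in stage 1, and reusing the hypothetical `ForecasterSketch` from the architecture section; it is not the repository's actual training script.

```python
import torch

def make_optimizer(model, lr, freeze_base):
    # Hypothetical helper: toggle gradients on the base transformer
    for p in model.base.parameters():
        p.requires_grad = not freeze_base
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=lr, weight_decay=0.01)

model = ForecasterSketch()    # from the architecture sketch above
loss_fn = torch.nn.MSELoss()  # MSE loss, as stated in the card

# Stage 1: 10 epochs at lr 5e-5, base model frozen
optimizer = make_optimizer(model, lr=5e-5, freeze_base=True)
# ... training loop over 10 epochs ...

# Stage 2: 400 epochs at lr 1e-5, full fine-tuning
optimizer = make_optimizer(model, lr=1e-5, freeze_base=False)
# ... training loop over 400 epochs ...
```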

## Usage

The `ScalingLawForecaster` class can be found in the [GitHub repository](https://github.com/zhqwqwq/Configuration-to-Performance-Scaling-Law).

```python
import torch
from transformers import AutoTokenizer
# Get ScalingLawForecaster from: https://github.com/zhqwqwq/Configuration-to-Performance-Scaling-Law
from model import ScalingLawForecaster

# Load model
model = ScalingLawForecaster(
    base_model_name="Qwen/Qwen3-1.7B",
    init_from_pretrained=True,
    force_fp32=True
)

# Load checkpoint
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

# Prepare inputs (placeholders here; see the "Input Format" section below)
# input_ids: tokenized text sequence
# is_number_mask: boolean mask indicating which tokens are numeric
# number_values_filled: actual numeric values (0 at non-numeric positions)

with torch.no_grad():
    predictions = model(
        input_ids=input_ids,
        is_number_mask=is_number_mask,
        number_values_filled=number_values_filled,
        attention_mask=attention_mask
    )
```
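
Since the prediction head emits one scalar per token position, `predictions` should contain a forecast at every position (expected shape `(batch_size, sequence_length)`); consult the repository for which positions correspond to the forecast targets.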

## Input Format

The model expects three key inputs:

1. **input_ids** (torch.LongTensor): Tokenized sequence with special numeric tokens
2. **is_number_mask** (torch.BoolTensor): Boolean mask marking numeric token positions
3. **number_values_filled** (torch.FloatTensor): Actual numeric values at marked positions
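
For illustration, one plausible way to assemble these tensors is sketched below. The `<NUM>` placeholder convention is an assumption made for this example; the repository's preprocessing defines the actual numeric-token format.

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

# Hypothetical convention: each number in the configuration text is replaced
# by a single "<NUM>" placeholder, with the raw values recorded in order.
tokenizer.add_special_tokens({"additional_special_tokens": ["<NUM>"]})
num_id = tokenizer.convert_tokens_to_ids("<NUM>")

text = "n_layers = <NUM> , hidden_size = <NUM> , learning_rate = <NUM>"
values = [24.0, 2048.0, 3e-4]

enc = tokenizer(text, return_tensors="pt")
input_ids = enc["input_ids"]            # torch.LongTensor
attention_mask = enc["attention_mask"]

# Boolean mask over the placeholder positions
is_number_mask = input_ids == num_id    # torch.BoolTensor

# Raw values at masked positions, 0 elsewhere (one value per placeholder)
number_values_filled = torch.zeros_like(input_ids, dtype=torch.float32)
number_values_filled[is_number_mask] = torch.tensor(values)
# Note: adding a new token would require resizing the base model's embedding
# table if these ids were ever embedded directly.
```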

## Intended Use

This model is designed for:

- **Scaling law research**: Understanding how neural network performance scales with configuration
- **Performance forecasting**: Predicting model performance before full training
- **Configuration optimization**: Finding optimal hyperparameters based on scaling patterns
- **Resource planning**: Estimating computational requirements for different model sizes

## Limitations

- Trained specifically on the Marin and StepLaw datasets; generalization to other settings will likely require at least fine-tuning
- Requires properly formatted inputs, with numeric tokens replaced and masked as described above

## Citation

If you use this model in your research, please cite:

```bibtex
@article{ncpl2026,
  title   = {Neural Configuration to Performance Scaling Law},
  author  = {Huaqing Zhang and Kaiyue Wen and Tengyu Ma},
  journal = {arXiv preprint arXiv:2602.10300},
  year    = {2026},
  url     = {https://www.arxiv.org/abs/2602.10300}
}
```