cmpatino (HF Staff) committed
Commit b0df9a3 · verified · Parent: d10cc5f

docs: simplify Usage to from_pretrained (now works after modeling fix)

Files changed (1): README.md +5 -15

README.md CHANGED
@@ -74,23 +74,13 @@ This model implements key DeepSeek-V4 innovations at a miniature scale:
 
 ```python
 import torch
-from safetensors.torch import load_file
-from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
-from huggingface_hub import hf_hub_download
-
-# Load model (recommended: manual load for reliability)
-config = AutoConfig.from_pretrained("cmpatino/nanowhale-100m-base", trust_remote_code=True)
-model = AutoModelForCausalLM.from_config(config, trust_remote_code=True).float()
-
-# Download and load weights
-weights_path = hf_hub_download("cmpatino/nanowhale-100m-base", "model.safetensors")
-state_dict = load_file(weights_path)
-model.load_state_dict(state_dict, strict=True)
-model = model.cuda().eval()
+from transformers import AutoModelForCausalLM, AutoTokenizer
 
+model = AutoModelForCausalLM.from_pretrained(
+    "cmpatino/nanowhale-100m-base", trust_remote_code=True, dtype=torch.float32
+).cuda().eval()
 tokenizer = AutoTokenizer.from_pretrained("cmpatino/nanowhale-100m-base")
 
-# Generate
 input_ids = tokenizer.encode("The meaning of life is", return_tensors="pt").cuda()
 output = model.generate(input_ids, max_new_tokens=100, temperature=0.7, top_p=0.9,
                         pad_token_id=tokenizer.eos_token_id)
@@ -102,7 +92,7 @@ print(tokenizer.decode(output[0], skip_special_tokens=True))
 - **Small model**: 110M params with 129K vocab means ~37% of parameters are in embeddings, limiting model capacity
 - **Limited training**: Only 5K steps / 2.6B tokens — significantly undertrained compared to production models
 - **Pretrained only**: This is a base model without instruction tuning. Outputs are language-model completions, not conversations.
-- **bf16 NaN**: Use fp32 — the Hyper-Connections architecture produces values that overflow bf16 range at this scale.
+- **fp32 recommended**: The Hyper-Connections architecture can produce values that overflow bf16 range at this scale. Use `dtype=torch.float32`.
 - **Custom architecture**: Requires `trust_remote_code=True`
 
 ## License
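The fp32 recommendation in the updated limitations bullet can be checked at runtime rather than taken on faith. Below is a minimal sketch of a non-finite-activation guard using standard PyTorch forward hooks; `attach_nonfinite_checks` is a hypothetical helper (not part of this repo), demonstrated on a stand-in `nn.Linear` since loading the real model needs network access and a GPU:

```python
import torch
import torch.nn as nn

def attach_nonfinite_checks(model: nn.Module):
    """Register forward hooks that raise if any submodule emits a
    NaN/inf tensor. Hypothetical debugging helper, not part of the
    nanowhale-100m-base repo."""
    def hook(module, args, output):
        if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
            raise RuntimeError(f"non-finite output from {type(module).__name__}")
    # One hook per submodule; keep the handles so they can be removed later.
    return [m.register_forward_hook(hook) for m in model.modules()]

# Toy demo on a stand-in module (substitute the loaded model in practice).
layer = nn.Linear(4, 4)
handles = attach_nonfinite_checks(layer)
out = layer(torch.randn(1, 4))  # fp32 forward pass; hook raises only on NaN/inf
for h in handles:
    h.remove()
```

Run once with the hooks attached after loading in a candidate dtype; if generation completes without a `RuntimeError`, that dtype did not produce overflowing activations on that input.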