RWKV7-G1 "GooseOne" pure RNN reasoning model
These are BASE models (pretrained on web/code/synthetic data plus instruction/chat/reasoning data), suitable for post-training and fine-tuning (see https://huggingface.co/spaces/Jellyfish042/UncheatableEval for their language-modeling performance).
More info & Gradio demo: https://rwkv.com/
Search "RWKV Chat" in the Play Store / App Store for our local inference app
RWKV Chat: https://rwkv.halowang.cloud/ (local inference for mobile/desktop) and https://github.com/RWKV-APP/RWKV_APP
GGUF: https://huggingface.co/collections/shoumenchougou/rwkv7-gxx-gguf
Ollama GGUF: https://ollama.com/mollysama
RWKV-7 pth => GGUF script: https://github.com/MollySophia/rwkv-mobile/blob/master/converter/convert_rwkv_pth_to_gguf.py
Training: https://github.com/BlinkDL/RWKV-LM and https://github.com/Joluck/RWKV-PEFT
Efficient inference: https://github.com/BlinkDL/Albatross
- 145+ token/s RWKV-7 7.2B fp16 bsz1 decoding @ RTX5090 (always const speed & vram)
- 10250+ token/s RWKV-7 7.2B fp16 bsz960 decoding @ RTX5090 (always const speed & vram)
- 9650+ token/s RWKV-7 7.2B fp16 bsz320 decoding @ RTX5090 (always const speed & vram)
- 11289 token/s RWKV-7 7.2B fp16 bsz1 prefill @ RTX5090 (always const speed & vram)
pip inference: https://pypi.org/project/rwkv/
mobile inference: https://github.com/MollySophia/rwkv-mobile
Please always use the latest models (with the newest date); they are better at everything.
Note: rwkv7a has DeepEmbed
Decoding suggestions (note: these are for the RWKV pip package, which applies temperature after top-p):
Chat: temp 1, topp 0.5, alpha_presence 2, alpha_frequency 0.1, alpha_decay 0.99
Creative (great for fiction etc.): temp 0.6, topp 0.6 ~ 0.8, alpha_presence 2, alpha_frequency 0.2, alpha_decay 0.99
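The sampling order matters: top-p is applied to the raw distribution first, then temperature reshapes what survives the cutoff. Below is a minimal NumPy sketch of that order together with the presence/frequency repetition penalties; it is an illustration of the settings above, not the pip package's actual code, and `sample_logits` / `occurrence` are names chosen here for clarity.

```python
import numpy as np

def sample_logits(logits, occurrence, temp=1.0, top_p=0.5,
                  alpha_presence=2.0, alpha_frequency=0.1):
    # occurrence: {token_id: count} of tokens already generated.
    logits = np.array(logits, dtype=np.float64)
    # Repetition penalty: flat presence term + per-occurrence frequency term.
    for tok, cnt in occurrence.items():
        logits[tok] -= alpha_presence + cnt * alpha_frequency
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    # Top-p (nucleus) cutoff FIRST: keep the smallest set of tokens
    # whose cumulative probability reaches top_p.
    sorted_probs = np.sort(probs)[::-1]
    cutoff = sorted_probs[np.argmax(np.cumsum(sorted_probs) >= top_p)]
    probs[probs < cutoff] = 0.0
    # ... THEN temperature, applied only to the surviving tokens.
    if temp != 1.0:
        probs = probs ** (1.0 / temp)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```

After each generated token, the counts in `occurrence` would be multiplied by alpha_decay (0.99 above) so old repetitions fade out.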
There should not be any trailing space at the end of your input (so strip it), or you will upset the tokenizer and see non-English responses.
Chat prompt (note: better to replace all \n\n in USER_PROMPT with \n, since \n\n is used as the chat-round separator in the pretraining data):
System: YOU_CAN_USE_SYSTEM_IF_NEEDED
User: PREVIOUS_STUFF
Assistant: PREVIOUS_STUFF
User: USER_PROMPT
Assistant:
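The two rules above (replace \n\n inside messages, never leave trailing whitespace before "Assistant:") can be captured in a small helper. This is a hypothetical convenience function sketched here, not part of any RWKV package:

```python
def build_chat_prompt(history, user_prompt, system=None):
    """Build an RWKV chat prompt.

    \n\n is the chat-round separator in the pretraining data, so it must not
    appear inside any single message; a trailing space would also upset the
    tokenizer, so the prompt ends exactly at 'Assistant:'.
    history: list of (user_message, assistant_message) pairs.
    """
    parts = []
    if system:
        parts.append("System: " + system.replace("\n\n", "\n"))
    for user_msg, assistant_msg in history:
        parts.append("User: " + user_msg.replace("\n\n", "\n"))
        parts.append("Assistant: " + assistant_msg.replace("\n\n", "\n"))
    parts.append("User: " + user_prompt.strip().replace("\n\n", "\n"))
    parts.append("Assistant:")
    return "\n\n".join(parts)
```

The model's completion is then generated directly after the final "Assistant:".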
Think prompt (for hard prompts):
User: USER_PROMPT
Assistant: <think
Fake think prompt (great result, highly recommended):
User: USER_PROMPT
Assistant: <think></think
Think prompt, alternative style, for G1c and newer models. Note there is a space before the "(think)" after USER_PROMPT:
User: USER_PROMPT (think)
Assistant: <think
Shorter think, same style:
User: USER_PROMPT (think a bit)
Assistant: <think
Longer think, same style:
User: USER_PROMPT (think a lot)
Assistant: <think
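The think-prompt variants above differ only in the hint appended to USER_PROMPT and in whether the `<think` tag is left open (the tag is deliberately unclosed so the model continues inside it). A hypothetical helper covering all of them, sketched here for illustration:

```python
def build_think_prompt(user_prompt, mode="think"):
    """Build an RWKV think prompt.

    mode: 'think' / 'bit' / 'lot' -> the G1c+ "(think)" hint styles,
          'plain' -> bare <think (older style),
          'fake'  -> empty <think></think block (great results).
    """
    p = user_prompt.strip().replace("\n\n", "\n")
    if mode == "fake":
        return f"User: {p}\n\nAssistant: <think></think"
    if mode == "plain":
        return f"User: {p}\n\nAssistant: <think"
    # Note the single space before the "(think...)" hint.
    hint = {"think": "(think)", "bit": "(think a bit)",
            "lot": "(think a lot)"}[mode]
    return f"User: {p} {hint}\n\nAssistant: <think"
```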
FIM prompt (for G1c and newer models, works for text & code & everything):
✿prefix✿When I was young, I only liked to✿suffix✿and that’s how I first got interested in AI research.✿middle✿
Better (recommended):
✿prefix✿✿suffix✿and that’s how I first got interested in AI research.✿middle✿When I was young, I only liked to
Note: "✿" is always tokenized to a single token by the RWKV tokenizer, which is why it was picked as the delimiter.
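Both FIM layouts can be assembled mechanically. A hypothetical builder, sketched under the two templates shown above (the "better" variant leaves the prefix slot empty and moves the prefix text after ✿middle✿ so the model continues directly from it):

```python
FIM = "\u273f"  # "✿", a single token in the RWKV tokenizer

def build_fim_prompt(prefix, suffix, better=True):
    """Build an RWKV fill-in-the-middle prompt for G1c and newer models."""
    if better:
        # Recommended: prefix placed after ✿middle✿; the model's output
        # continues the prefix until it connects to the suffix.
        return f"{FIM}prefix{FIM}{FIM}suffix{FIM}{suffix}{FIM}middle{FIM}{prefix}"
    # Plain variant: model generates the middle after the final marker.
    return f"{FIM}prefix{FIM}{prefix}{FIM}suffix{FIM}{suffix}{FIM}middle{FIM}"
```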
Gxx = Data Version
G0x = trained for less than 1 epoch, as training a large model for a full epoch is expensive :(
G0 G0a G0a2 G0a3 ... G0b ... = adding more (newer and better) data, so G0a has better-quality (but less) data than G1
G1x = trained for more than 1 epoch
G1 G1a G1a2 G1a3 ... G1b ... = adding more (newer and better) data; note G1a has better-quality (and more) data than G0a
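Under this scheme a model tag splits into an epoch class (the digit) and a data revision (the trailing letters/digits). A hypothetical parser, purely to illustrate the naming convention:

```python
import re

def parse_data_version(tag):
    """Split an RWKV data-version tag like 'G1a3' into its two parts:
    the epoch class (0 = <1 epoch, 1 = >1 epoch) and the data revision."""
    m = re.fullmatch(r"G([01])([a-z]?\d*)", tag)
    if not m:
        raise ValueError(f"not a Gxx data-version tag: {tag!r}")
    epochs = "more than 1 epoch" if m.group(1) == "1" else "less than 1 epoch"
    revision = m.group(2) or "base"
    return epochs, revision
```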