RWKV7-G1 "GooseOne" Pure RNN Reasoning Model (Safetensors)
This repository contains safetensors-converted versions of the BlinkDL/rwkv7-g1 models.
Attribution: This is an unofficial conversion. The original model and all research credit belong to BlinkDL. The documentation below is reproduced from the original repository.
Using with Candle (Rust)
These safetensors weights can be used with the Candle ML framework:
- RWKV Candle Example: https://github.com/huggingface/candle/tree/main/candle-examples/examples/rwkv
These are BASE models (pretrained on web/code/synthetic + instruction/chat/reasoning data), suitable for post-training and fine-tuning (check https://huggingface.co/spaces/Jellyfish042/UncheatableEval to see their performance on language modeling).
More info & Gradio demo: https://rwkv.com/
For developers: https://github.com/BlinkDL/RWKV-LM
RWKV-7 pth => GGUF script: https://github.com/MollySophia/rwkv-mobile/blob/master/converter/convert_rwkv_pth_to_gguf.py
GGUF: https://huggingface.co/collections/shoumenchougou/rwkv7-gxx-gguf
Ollama GGUF: https://ollama.com/mollysama
Use rwkv pip package 0.8.32+ for RWKV-7 inference: https://pypi.org/project/rwkv/
Note: rwkv7a has DeepEmbed
Efficient inference project: https://github.com/BlinkDL/Albatross
RWKV APP: https://github.com/RWKV-APP/RWKV_APP (local inference on Android/iOS)
Please always use the latest models (newest date); they are better at everything.
Decoding Suggestion (note: these settings are for the RWKV pip package, which applies temperature after top-p):
Chat: temp 1, topp 0.5, alpha_presence 2, alpha_frequency 0.1, alpha_decay 0.99
Creative (great for fiction etc.): temp 0.6, topp 0.6 ~ 0.8, alpha_presence 2, alpha_frequency 0.2, alpha_decay 0.99
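The "temp after topp" order matters: the nucleus cut is taken on the raw probabilities, and temperature only reshapes the survivors. A minimal sketch of that order (illustrative only, not the rwkv package's actual sampler; the greedy final pick is just for determinism):

```python
def sample_top_p_then_temp(probs, temperature=1.0, top_p=0.5):
    """Nucleus (top-p) cut first, then temperature, mirroring the
    order used by the rwkv pip package. Illustrative sketch only."""
    # Token indices sorted by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= top_p:  # smallest set whose mass reaches top_p
            break
    # Temperature applied AFTER the cut: p ** (1/T), then renormalize.
    weights = [probs[i] ** (1.0 / temperature) for i in kept]
    z = sum(weights)
    weights = [w / z for w in weights]
    # Greedy pick for determinism; real decoding samples from `weights`.
    return kept[max(range(len(kept)), key=lambda j: weights[j])]
```

With top_p 0.5 and a peaked distribution, only the top token survives the cut, so low top-p dominates the behavior regardless of temperature.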
Your input must not end with a space (so strip it), or you will upset the tokenizer and get non-English responses.
Chat prompt (note: replace all \n\n in USER_PROMPT with \n, since \n\n is used as the chat-round separator in the pretraining data):
System: YOU_CAN_USE_SYSTEM_IF_NEEDED
User: PREVIOUS_STUFF
A: PREVIOUS_STUFF
User: USER_PROMPT
A:
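The template above can be assembled with a small helper that also handles the two caveats (no trailing whitespace, no \n\n inside the user text). A sketch of the format, not official RWKV code; the function name and signature are my own:

```python
def build_chat_prompt(user_prompt, history=(), system=None):
    """Assemble the RWKV chat prompt format. `history` is a sequence of
    (user, assistant) pairs from previous rounds. Replaces \n\n in the
    user text with \n (\n\n is the chat-round separator) and strips
    trailing whitespace so the tokenizer is not upset."""
    parts = []
    if system:
        parts.append(f"System: {system}")
    for user_turn, assistant_turn in history:
        parts.append(f"User: {user_turn}")
        parts.append(f"A: {assistant_turn}")
    cleaned = user_prompt.replace("\n\n", "\n").strip()
    parts.append(f"User: {cleaned}")
    parts.append("A:")  # ends without a trailing space
    return "\n\n".join(parts)
```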
Think prompt (for hard prompts):
User: USER_PROMPT
A: <think
Fake think prompt (great result, highly recommended):
User: USER_PROMPT
A: <think></think
Think prompt, alternative style, for G1c and newer models. Note there is a space before the "(think)" after USER_PROMPT:
User: USER_PROMPT (think)
A: <think
Shorter think, same style:
User: USER_PROMPT (think a bit)
A: <think
Longer think, same style:
User: USER_PROMPT (think a lot)
A: <think
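The think variants above differ only in the hint appended to the user text and the seed placed after "A:". A helper covering them (a sketch under my own naming; the mode names are hypothetical, the template strings come from the examples above):

```python
def build_think_prompt(user_prompt, mode="plain"):
    """Seed the model's reply with an unclosed <think tag, per the
    templates above. Modes:
      "plain" -> basic think prompt
      "fake"  -> <think></think seed (skips reasoning; often strong)
      "think" / "bit" / "lot" -> G1c-style hints appended to the
        user text (note the leading space before the parenthesis)."""
    hint = {"plain": "", "fake": "",
            "think": " (think)",
            "bit": " (think a bit)",
            "lot": " (think a lot)"}[mode]
    seed = "<think></think" if mode == "fake" else "<think"
    cleaned = user_prompt.replace("\n\n", "\n").strip()
    return f"User: {cleaned}{hint}\n\nA: {seed}"
```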
FIM prompt (for G1c and newer models, works for text & code & everything):
✿prefix✿When I was young, I only liked to✿suffix✿and that's how first I got interested in AI research.✿middle✿
Better (recommended):
✿prefix✿✿suffix✿and that's how first I got interested in AI research.✿middle✿When I was young, I only liked to
Note: "✿" always tokenizes to a single token in the RWKV tokenizer, which is why it was chosen.
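Both FIM layouts can be produced with a tiny formatter (a sketch of the sentinel format shown above, not official RWKV tooling; the function name is my own):

```python
SENT = "✿"  # a single token in the RWKV tokenizer

def build_fim_prompt(prefix, suffix, suffix_first=True):
    """Build a fill-in-the-middle prompt with the ✿ sentinels.
    suffix_first=True is the recommended layout, where the prefix
    text is placed after the ✿middle✿ marker."""
    if suffix_first:  # recommended
        return (f"{SENT}prefix{SENT}{SENT}suffix{SENT}{suffix}"
                f"{SENT}middle{SENT}{prefix}")
    return (f"{SENT}prefix{SENT}{prefix}{SENT}suffix{SENT}{suffix}"
            f"{SENT}middle{SENT}")
```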
Gxx = Data Version
G0x = less than 1 epoch, as training 1 epoch for a large model is expensive :(
G0 G0a G0a2 G0a3 ... G0b ... = adding more (newer and better) data, so G0a has better quality (but less) data than G1
G1x = more than 1 epoch
G1 G1a G1a2 G1a3 ... G1b ... = adding more (newer and better) data, note G1a has better quality (and more) data than G0a
Model tree for DanielClough/rwkv7-g1-safetensors
Base model: BlinkDL/rwkv7-g1