=== New version out: v1.1.0! ===
I want to know what this is.
These models are Bidirectional LSTMs with attention.
How to run?
Each model's version folder includes a script to run. The same script can also train, so for inference set 'RETRAIN' and 'continueTrain' to False. continueTrain continues training from the last checkpoint; RETRAIN trains the model from scratch and overwrites any model that has the same file name.
Directory layout: folderWhereTheIAMAMscriptIs/ = Script model/ = tokenizer.pkl = chatbot.keras
Note: Ensure tokenizer.pkl is in the same directory as the script.
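As a rough illustration of how the two flags combine, here is a minimal sketch. The `select_mode` helper and the mode names are hypothetical and not taken from the actual scripts, and the precedence when both flags are True is an assumption.

```python
# Hypothetical values; the real scripts define their own flags.
RETRAIN = False        # True: train from scratch, overwriting a saved model with the same file name
continueTrain = False  # True: resume training from the last checkpoint


def select_mode(retrain: bool, continue_train: bool) -> str:
    """Return which of the three described behaviours the flags select."""
    if retrain:
        # Assumption: RETRAIN takes precedence if both flags are True.
        return "train_from_scratch"
    if continue_train:
        return "continue_from_checkpoint"
    # Both flags False: the script only runs inference.
    return "inference"


print(select_mode(RETRAIN, continueTrain))
```

With both flags left at False, as recommended for inference, this selects the inference path.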
But I want to run this NOW! (or not download it)
There's an HF Space for the 2M param model that does use beam search:
- https://huggingface.co/spaces/DJF-on-arm/IAMAM-DEMO-2M (This demo uses beam search)
- https://huggingface.co/spaces/DJF-on-arm/IAMAM-Math-Autobiography-generator (This demo uses beam search, use it to make stories on math!)
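If you haven't met beam search before, here is a minimal sketch of the idea: keep the few highest-scoring partial sequences at each step instead of only the single best token. This is not the demos' code. `TOY_MODEL`, `next_token_probs`, and the tokens are hypothetical stand-ins for a real model's next-token predictions.

```python
import math

# Hypothetical toy "model": maps the last token to next-token probabilities.
TOY_MODEL = {
    "<start>": {"math": 0.6, "i": 0.4},
    "math": {"is": 0.7, "<end>": 0.3},
    "i": {"am": 0.9, "<end>": 0.1},
    "is": {"fun": 0.8, "<end>": 0.2},
    "am": {"math": 0.4, "<end>": 0.6},
    "fun": {"<end>": 1.0},
}


def next_token_probs(sequence):
    """Stand-in for a model call: look up probabilities for the next token."""
    return TOY_MODEL[sequence[-1]]


def beam_search(beam_width=2, max_len=5):
    """Return up to beam_width (log_prob, tokens) pairs, best first."""
    beams = [(0.0, ["<start>"])]
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == "<end>":
                # Finished sequences are carried over unchanged.
                candidates.append((score, seq))
                continue
            for token, prob in next_token_probs(seq).items():
                # Summing log-probabilities avoids underflow on long sequences.
                candidates.append((score + math.log(prob), seq + [token]))
        # Keep only the beam_width highest-scoring candidates.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(seq[-1] == "<end>" for _, seq in beams):
            break
    return beams


for log_prob, tokens in beam_search():
    print(round(math.exp(log_prob), 3), " ".join(tokens))
```

A larger `beam_width` explores more alternatives at the cost of more model calls per step.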
How was this trained?
Each model's version folder contains a rough guide that tells you how that model was trained.
This is cool, but what's the best one here and what should I use?
There are some you should use and some you shouldn't:
- version 1.0.x (2M params) It's mid.
- version 1.1.x (40M params) DO NOT USE. It performs the same as, or only marginally better than, the v1.0.0 2M model, unless you're using it to make math autobiographies!
- version 1.2.x (7M params) This is a great base model, consider using this! I recommend using or trying version 1.2.x
Why are some of these spitting out junk? (limitations)
These models may produce incoherent or incorrect outputs because:
- They are not fine-tuned using reinforcement learning (e.g., PPO)
- They could be undertrained
- They do not use, and have never used, RLHF
- They are trained primarily on supervised data only
- No system prompts or instruction tuning are applied
- Some models are trained from scratch without leveraging external pretrained base models*
*: Some models will be trained off of other IAMAM models but will never use base models that are not part of the IAMAM project!
Why does it say some (probably all) models' .keras files are 'unsafe'?
Because I'm using callbacks that are custom Python definitions. These .keras files should not actually be unsafe.
However, if you do not trust me, don't want to take any risk whatsoever, or this just feels off to you, YOU DON'T HAVE TO BELIEVE ME. (Please don't go around trusting random people online, or random models whose training code isn't public and can't be verified as the code that produced the model!)
Well, how can I know it is safe?
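If you want to audit the custom logic before running it, you can read the model's config without loading the model. A .keras file is a zip archive, and its config.json describes the architecture:
import json
import zipfile
# Reading config.json straight from the archive shows the architecture/config without executing the custom code
with zipfile.ZipFile("path_to_model.keras") as archive:
    config = json.loads(archive.read("config.json"))
print(json.dumps(config, indent=2))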
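Reading a long config by eye is tedious, so here is a minimal sketch that walks a config dictionary and lists the entries worth a closer look. It assumes each serialized object is a dict with "module", "class_name", and "registered_name" keys, and it treats anything outside a "keras" module (plus Lambda layers, which can carry arbitrary Python code) as needing review. `find_custom_objects`, `example_config`, and `MyAttention` are hypothetical and only for illustration.

```python
def find_custom_objects(node, found=None):
    """Recursively collect serialized objects that are not plain Keras built-ins."""
    if found is None:
        found = []
    if isinstance(node, dict):
        class_name = node.get("class_name")
        # Names like "__tuple__" are serialization helpers, not objects to audit.
        if class_name is not None and not class_name.startswith("__"):
            module = node.get("module") or ""
            if not module.startswith("keras") or class_name == "Lambda":
                found.append((module or None, class_name, node.get("registered_name")))
        for value in node.values():
            find_custom_objects(value, found)
    elif isinstance(node, list):
        for item in node:
            find_custom_objects(item, found)
    return found


# Hypothetical hand-written config, not taken from any IAMAM model.
example_config = {
    "module": "keras",
    "class_name": "Sequential",
    "config": {
        "layers": [
            {"module": "keras.layers", "class_name": "LSTM",
             "config": {"units": 8}, "registered_name": None},
            {"module": None, "class_name": "MyAttention",
             "config": {}, "registered_name": "Custom>MyAttention"},
            {"module": "keras.layers", "class_name": "Lambda",
             "config": {}, "registered_name": None},
        ]
    },
}

for entry in find_custom_objects(example_config):
    print(entry)
```

Anything this reports is a pointer to where to look in the training code, not a verdict on whether the model is safe.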