This is a 'tiny' masked language model, fine-tuned from prajjwal1/bert-tiny on synthetic oncology clinical text as a preparatory step to training TinyBertOncoTagger.
Training data: https://huggingface.co/datasets/ksg-dfci/mmai-synthetic/blob/main/all_synthetic_notes.parquet
Training script: https://github.com/kenlkehl/matchminer-ai-training/blob/main/3b_train_tiny_oncbert.py
Training script call:
accelerate launch 3b_train_tiny_oncbert.py \
    --data trial_space_lineitems.csv:trial_text \
           trial_space_lineitems.csv:this_space \
           trial_space_lineitems.csv:trial_boilerplate_text \
           all_synthetic_notes.parquet:synthetic_note \
    --output_dir ./onc_bert_tiny \
    --per_device_train_batch_size 64
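Each `--data` value in the call above pairs a file with the text column to read from it, in the form `path:column`. The training script's own parsing is not reproduced here; the following is only a minimal sketch of how such specs could be split and grouped by file. The names `DataSpec`, `parse_data_spec`, and `group_columns_by_file` are hypothetical and are not taken from `3b_train_tiny_oncbert.py`.

```python
from typing import Dict, List, NamedTuple


class DataSpec(NamedTuple):
    """One `path:column` entry (hypothetical helper, not from the training script)."""
    path: str
    column: str


def parse_data_spec(spec: str) -> DataSpec:
    # Split on the LAST colon so that paths which themselves contain a colon
    # (for example a Windows drive letter) are still handled.
    path, sep, column = spec.rpartition(":")
    if not sep or not path or not column:
        raise ValueError(f"expected 'path:column', got {spec!r}")
    return DataSpec(path, column)


def group_columns_by_file(specs: List[str]) -> Dict[str, List[str]]:
    # Collect the requested text columns per file, preserving argument order,
    # so each file would only need to be read once.
    grouped: Dict[str, List[str]] = {}
    for spec in specs:
        parsed = parse_data_spec(spec)
        grouped.setdefault(parsed.path, []).append(parsed.column)
    return grouped


# The four specs passed to --data in the call above.
specs = [
    "trial_space_lineitems.csv:trial_text",
    "trial_space_lineitems.csv:this_space",
    "trial_space_lineitems.csv:trial_boilerplate_text",
    "all_synthetic_notes.parquet:synthetic_note",
]

grouped = group_columns_by_file(specs)
for path, columns in grouped.items():
    print(path, columns)
```

Grouping by file shows that the call draws three text columns from `trial_space_lineitems.csv` and one from `all_synthetic_notes.parquet`.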
Base model: prajjwal1/bert-tiny