PulseLM: A Foundation Dataset and Benchmark for PPG-Text Learning
Paper • 2603.03331 • Published
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import neurokit2 as nk

# Load the PulseLM model and tokenizer (custom model code, hence trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Manhph2211/PulseLM",
    trust_remote_code=True,
    dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Manhph2211/PulseLM", trust_remote_code=True)
model.eval()
device = next(model.parameters()).device

# Simulate a 10 s PPG segment sampled at 125 Hz with a 50 bpm heart rate
ppg = nk.ppg_simulate(duration=10, sampling_rate=125, heart_rate=50)
ppg = torch.from_numpy(ppg).unsqueeze(0).to(device)  # add a batch dimension

# Build the chat prompt
messages = [
    {"role": "system", "content": "You are a physiological signal analysis expert."},
    {"role": "user", "content": "Classify the heart rate level shown in this PPG segment."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(device)

# Greedy decoding; the PPG tensor is passed alongside the text inputs
with torch.no_grad():
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        ppg=ppg,
        max_new_tokens=32,
        do_sample=False,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True).strip())
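The quick start above feeds a simulated signal that is already at 125 Hz and 10 s long. Real PPG recordings usually arrive at a different sampling rate, so they need to be resampled and cropped (or padded) to that shape before being passed to the model. A minimal preprocessing sketch, assuming SciPy for resampling (the model's own repository may ship its preferred preprocessing; `prepare_ppg` is an illustrative helper, not part of the PulseLM API):

```python
import numpy as np
from scipy.signal import resample_poly

def prepare_ppg(signal: np.ndarray, fs: int, target_fs: int = 125, seconds: int = 10) -> np.ndarray:
    """Resample a raw PPG trace to target_fs and fix its length to a seconds-long window.

    Illustrative helper only; the sampling rate and window length mirror the
    simulated segment in the quick start (125 Hz, 10 s).
    """
    # Rational-factor polyphase resampling applies an anti-aliasing filter internally.
    resampled = resample_poly(signal, up=target_fs, down=fs)
    n = target_fs * seconds
    if len(resampled) >= n:
        return resampled[:n]        # crop to the window
    return np.pad(resampled, (0, n - len(resampled)))  # zero-pad short segments

# Example: a 10 s recording at 100 Hz becomes 1250 samples at 125 Hz
raw = np.sin(2 * np.pi * 1.0 * np.arange(0, 10, 1 / 100))
segment = prepare_ppg(raw, fs=100)
```

The resulting array can then replace the `nk.ppg_simulate(...)` output in the snippet above before the `torch.from_numpy(...)` conversion.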
If you find this work useful, please consider citing our paper:
@misc{pham2026pulselmfoundationdatasetbenchmark,
  title={PulseLM: A Foundation Dataset and Benchmark for PPG-Text Learning},
  author={Hung Manh Pham and Jinyang Wu and Xiao Ma and Yiming Zhang and Yixin Xu and Aaqib Saeed and Bin Zhu and Zhou Pan and Dong Ma},
  year={2026},
  eprint={2603.03331},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2603.03331},
}
Base model
meta-llama/Llama-3.1-8B