---
library_name: transformers
tags: []
---

# Nemotron-Diffusion-Exp-Ministral-8B

Developed by the [DLER team](https://nv-dler.github.io/) at NVR. The model will be updated actively. Contact Yonggan Fu and Pavlo Molchanov with any questions.


## Environment

Docker image path: `/lustre/fsw/portfolios/nvr/users/yongganf/docker/megatron_py25_dllm_ministral.sqsh` on CW-DFW. Request an interactive node with the following command, replacing `{account}` with your account name:

```
srun -A {account} --partition interactive --time 4:00:00 --gpus 8 \
  --container-image /lustre/fsw/portfolios/nvr/users/yongganf/docker/megatron_py25_dllm_ministral.sqsh \
  --container-mounts=$HOME:/home,/lustre:/lustre \
  --pty bash
```

## Chat with Our Model


```
import torch
from transformers import AutoModel, AutoTokenizer

repo_name = "nvidia/Nemotron-Diffusion-Exp-Ministral-8B"

# trust_remote_code=True loads the custom model code shipped with the repository.
tokenizer = AutoTokenizer.from_pretrained(repo_name, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_name, trust_remote_code=True)
model = model.cuda().to(torch.bfloat16)

user_input = input("User: ").strip()

# Tokenize the prompt and move it to the GPU.
prompt_ids = tokenizer(user_input, return_tensors="pt").input_ids.to(device="cuda")

# generate() returns the output token ids (prompt included) and the number of function evaluations (NFE).
out_ids, nfe = model.generate(
    prompt_ids,
    max_new_tokens=128,
    steps=128,
    block_length=32,
    shift_logits=False,
    causal_context=True,
    threshold=0.9,
)

# Drop the prompt tokens and decode only the newly generated part.
response = tokenizer.batch_decode(out_ids[:, prompt_ids.shape[1]:], skip_special_tokens=True)[0]
print(f"Model: {response}")
print(f"[Num Function Eval (NFE)={nfe}]")
```
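
The script above prints the NFE count returned by `generate`. One rough way to read that number is to divide the number of generated tokens by it. The helper below is only an illustrative sketch and not part of the model's API: `tokens_per_nfe` is a hypothetical name, it assumes `nfe` is a plain positive integer and that each function evaluation corresponds to one model forward pass, and the example values are made up rather than measured.

```python
def tokens_per_nfe(num_new_tokens: int, nfe: int) -> float:
    """Average number of generated tokens per function evaluation.

    Hypothetical helper for interpreting the NFE value; not provided by the model.
    """
    if nfe <= 0:
        raise ValueError("nfe must be a positive integer")
    return num_new_tokens / nfe


# Made-up example values: 128 new tokens produced in 64 function evaluations.
print(tokens_per_nfe(128, 64))
```

In the chat script, `num_new_tokens` could be taken as `out_ids.shape[1] - prompt_ids.shape[1]`, since the output is sliced at the prompt length before decoding. Under the assumptions above, a ratio greater than 1 would mean that, on average, more than one token was produced per evaluation.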