---
license: apache-2.0
base_model:
- Jackrong/Qwopus3.5-9B-v3
tags:
- oncology
- pancreatic-cancer
- pdac
- clinical-nlp
- medical-llm
- text-generation
- research
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

<p align="center">
  <img src="./assets/onca-logo-horizontal.svg" alt="Onca logo" width="520">
</p>

# Onca 1.0 9B

## Model Summary

Onca 1.0 is an open 9B language model for pancreatic cancer clinical tasks. It is designed for four PDAC-relevant task families:

- clinical trial screening
- case-specific clinical reasoning
- structured pathology report extraction
- molecular variant evidence reasoning

This release is the main FP16/BF16-compatible checkpoint and serves as the reference Hugging Face release for the Onca 1.0 model family.

## Base Model

Onca 1.0 is fine-tuned from `Jackrong/Qwopus3.5-9B-v3`, a Qwen3.5-derived 9B dense reasoning model. The released checkpoint reflects task-focused supervised fine-tuning for pancreatic cancer workflows while preserving the underlying Qwen3.5-class architecture and tokenizer setup.

## Training Scope

The model was trained on 37,364 prepared rows from openly available sources. The multitask mixture covers:

- trial eligibility screening
- oncology clinical reasoning
- CAP-aligned pathology abstraction
- CIViC-style variant interpretation

The project was built around an open-data, open-weight, single-workstation pipeline so the workflow can be audited and reproduced without private institutional corpora.

## Intended Use

Onca 1.0 is intended for:

- research on oncology-focused language models
- benchmarking PDAC-oriented clinical NLP workflows
- prototyping structured extraction and screening pipelines
- local experimentation in privacy-sensitive environments

## Out-of-Scope Use

Onca 1.0 is not intended for:

- direct clinical care
- autonomous treatment recommendations
- unsupervised patient-facing use
- deployment as a validated medical device or diagnostic system

This is a research model and does not replace clinician judgment.

## Evaluation Summary

In the companion manuscript, Onca 1.0 was evaluated across 11 panels against Woollie-7B, CancerLLM-7B, OpenBioLLM-8B, and the unfine-tuned Qwopus base. Headline results reported in the draft include:

- Trial Screening: 81.6 F1
- Clinical Reasoning: 14.1 composite
- Pathology Extraction: 30.5 field exact-match
- PubMedQA Cancer: 68.3 macro-F1
- PubMedQA: 66.5 macro-F1

The strongest gains appear in workflow-proximal tasks such as trial review and pathology structuring. Variant evidence reasoning remains more difficult than the other task groups.
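
The card does not spell out how these scores are computed. As a rough illustration only, the sketch below shows one common way to compute a field-level exact-match rate and a macro-averaged F1. The function names and the normalization (string form, stripped, lowercased) are assumptions made for this example and may differ from the scoring used in the companion manuscript.

```python
def field_exact_match(predicted: dict, reference: dict) -> float:
    """Fraction of reference fields whose predicted value matches exactly
    after light normalization (assumed: string form, stripped, lowercased)."""
    if not reference:
        return 0.0

    def norm(value):
        return str(value).strip().lower()

    hits = sum(
        1
        for key, ref_value in reference.items()
        if key in predicted and norm(predicted[key]) == norm(ref_value)
    )
    return hits / len(reference)


def macro_f1(y_true: list, y_pred: list) -> float:
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for cls in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```

Macro averaging weights every class equally, so a rare label counts as much as a common one.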

## Limitations

- The model is specialized for pancreatic cancer and oncology-adjacent workflows rather than general medicine.
- Training data come from openly available sources rather than private institutional notes, which improves reproducibility but does not fully capture real-world documentation style.
- Benchmark sample sizes for several panels are deliberately limited, so results on those panels should be interpreted with care.
- Performance is uneven across task families and does not imply broad medical competence.
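
To make the sample-size caveat concrete, here is a minimal sketch of a Wilson score interval for a proportion such as accuracy. It is a generic statistical illustration, not part of the Onca evaluation code, and the counts mentioned afterwards are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width
```

With a hypothetical 40 correct answers out of 50, the interval spans roughly twenty percentage points, while the same proportion at 400 out of 500 gives a much narrower interval.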

## Usage

This repository contains the main unquantized (FP16/BF16-compatible) checkpoint files. A standard `transformers` loading pattern is:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Joesh1/onca-1.0-9B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
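model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype stored with the checkpoint
    device_map="auto",   # place weights on available devices; requires `accelerate`
)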
```

Format inference prompts with the tokenizer and chat template files included in this repository, for example through `tokenizer.apply_chat_template`.

### Quick Chat Helper

```python
def run_onca(prompt, system_prompt="You are Onca 1.0, a pancreatic-cancer clinical research assistant."):
    """Run one chat turn and return only the newly generated text."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]
    # Render the conversation with the repository's chat template.
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Greedy decoding for repeatable outputs; sampling parameters such as
        # temperature only take effect when do_sample=True.
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            do_sample=False,
        )
    # Drop the prompt tokens so only the completion is decoded.
    completion = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(completion, skip_special_tokens=True)
```
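
Because the base model is described as a reasoning model, completions may include a reasoning span before the final answer. The card does not say whether Onca emits `<think>...</think>` tags, so the helper below is a sketch under that assumption; `strip_reasoning` is a hypothetical name, not part of this repository.

```python
import re

# Assumed tag format; adjust if the chat template uses different markers.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Return the text with any <think>...</think> reasoning span removed."""
    cleaned = _THINK_BLOCK.sub("", text)
    # Some templates open the tag inside the prompt, so the completion
    # contains only the closing tag; keep what follows the last one.
    if "</think>" in cleaned:
        cleaned = cleaned.rsplit("</think>", 1)[1]
    return cleaned.strip()
```

It can be applied to the string returned by `run_onca` before any downstream parsing.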

### Example 1: Trial Screening

```python
prompt = """
Task: Trial eligibility screening for pancreatic cancer.

Patient summary:
- 63-year-old with metastatic PDAC
- Liver metastases present
- ECOG 1
- Prior gemcitabine plus nab-paclitaxel
- Total bilirubin 0.9 mg/dL
- ANC 2.4 x 10^9/L
- Platelets 188 x 10^9/L
- No active infection
- No brain metastases

Trial criteria:
- Histologically confirmed metastatic pancreatic adenocarcinoma
- ECOG 0-1
- Progression after 1 prior systemic regimen
- Adequate marrow and hepatic function
- Exclude uncontrolled infection or CNS metastases

Return:
1. Eligibility label: eligible / ineligible / unclear
2. Criterion-by-criterion reasoning
3. Missing information, if any
"""

print(run_onca(prompt))
```
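
For pipeline use, the free-text answer usually needs to be reduced to one of the three labels. The following is a minimal sketch that assumes the model echoes the requested "Eligibility label" heading; that output format is not guaranteed, and `parse_eligibility_label` is a hypothetical helper. It returns `None` when no label follows the heading so that such outputs can be routed to manual review.

```python
import re
from typing import Optional

# "ineligible" is listed first so it is tried before the shorter "eligible".
_LABEL_PATTERN = re.compile(
    r"eligibility label\W*(ineligible|eligible|unclear)\b",
    flags=re.IGNORECASE,
)


def parse_eligibility_label(output: str) -> Optional[str]:
    """Return the label that follows 'Eligibility label', or None if absent."""
    match = _LABEL_PATTERN.search(output)
    return match.group(1).lower() if match else None
```

Anchoring on the heading avoids misreading phrases such as "not eligible" elsewhere in the reasoning text.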

### Example 2: Clinical Reasoning

```python
prompt = """
Task: Pancreatic cancer clinical reasoning.

Case:
A 58-year-old patient has borderline resectable PDAC in the pancreatic head.
CA19-9 is elevated. ECOG is 0. Germline testing is pending. No distant metastases
are seen on imaging.

Please provide:
1. A concise assessment
2. A high-level management plan
3. Key factors that could change the plan
4. Important limitations or uncertainties

Do not present this as medical advice. Keep it research-oriented.
"""

print(run_onca(prompt))
```

### Example 3: Pathology Extraction

```python
prompt = """
Task: Structured pathology extraction.

Extract the report into JSON with the following fields:
specimen_type, primary_site, histology, tumor_grade, tumor_size_cm,
margin_status, lymphovascular_invasion, perineural_invasion,
lymph_nodes_examined, lymph_nodes_positive, pT, pN, pM,
ajcc_stage, treatment_effect, tumor_focality, additional_findings

Report:
Whipple resection specimen showing moderately differentiated pancreatic ductal
adenocarcinoma, 3.1 cm, centered in the pancreatic head. Tumor extends into
peripancreatic soft tissue. All margins are negative; closest margin is 0.4 cm
at the uncinate margin. Perineural invasion is present. Lymphovascular invasion
is present. Sixteen lymph nodes examined, 3 positive for metastatic carcinoma.
Pathologic stage: pT2 pN1. No distant metastasis identified in specimen.
"""

print(run_onca(prompt))
```
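
Model output may wrap the JSON in surrounding prose or a code fence. The sketch below pulls out the first parseable JSON object and reports which of the requested fields are missing. The helper names are hypothetical, the field list is copied from the prompt above, and only the Python standard library is used.

```python
import json

PATHOLOGY_FIELDS = [
    "specimen_type", "primary_site", "histology", "tumor_grade", "tumor_size_cm",
    "margin_status", "lymphovascular_invasion", "perineural_invasion",
    "lymph_nodes_examined", "lymph_nodes_positive", "pT", "pN", "pM",
    "ajcc_stage", "treatment_effect", "tumor_focality", "additional_findings",
]


def extract_json_object(text: str) -> dict:
    """Parse the first valid JSON object that starts at a '{' in text."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


def missing_fields(record: dict, required=PATHOLOGY_FIELDS) -> list:
    """List the required field names that are absent from record."""
    return [name for name in required if name not in record]
```

A non-empty result from `missing_fields` is a signal to re-prompt or flag the report for manual abstraction.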

### Example 4: Variant Evidence Interpretation

```python
prompt = """
Task: Variant evidence reasoning for pancreatic cancer.

Variant:
- Gene: BRCA2
- Alteration: pathogenic loss-of-function variant
- Tumor type: pancreatic ductal adenocarcinoma

Return a JSON object with:
- gene
- alteration
- disease
- evidence_summary
- therapeutic_implication
- diagnostic_implication
- prognostic_implication
- evidence_direction
- confidence

Keep the answer concise and note uncertainty when evidence is incomplete.
"""

print(run_onca(prompt))
```

### Prompting Tips

- Ask for a specific output format such as bullet points or JSON.
- For extraction tasks, list the exact fields you want returned.
- For screening tasks, provide both the patient summary and the trial criteria.
- For reasoning tasks, request uncertainties and missing data explicitly.
- Treat outputs as research artifacts that require expert review.
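
As one way to apply these tips consistently, a small prompt builder can assemble the screening prompt from structured inputs. This is a sketch only: `build_screening_prompt` is a hypothetical helper, and its wording mirrors the trial screening example in this card rather than any required format.

```python
def build_screening_prompt(patient_summary: list[str], trial_criteria: list[str]) -> str:
    """Assemble a trial screening prompt with an explicit output format."""
    if not patient_summary or not trial_criteria:
        raise ValueError("both patient_summary and trial_criteria are required")
    patient_block = "\n".join(f"- {item}" for item in patient_summary)
    criteria_block = "\n".join(f"- {item}" for item in trial_criteria)
    return (
        "Task: Trial eligibility screening for pancreatic cancer.\n\n"
        f"Patient summary:\n{patient_block}\n\n"
        f"Trial criteria:\n{criteria_block}\n\n"
        "Return:\n"
        "1. Eligibility label: eligible / ineligible / unclear\n"
        "2. Criterion-by-criterion reasoning\n"
        "3. Missing information, if any\n"
    )
```

Raising on an empty list enforces the tip that both the patient summary and the trial criteria must be supplied.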

## Files in This Repository

- `model-00001-of-00004.safetensors` through `model-00004-of-00004.safetensors`: sharded model weights
- `model.safetensors.index.json`: shard index
- `config.json`: model architecture configuration
- `generation_config.json`: default generation settings
- `tokenizer.json` and `tokenizer_config.json`: tokenizer files
- `chat_template.jinja`: chat formatting template
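
In sharded safetensors checkpoints saved by `transformers`, the index file holds a `weight_map` that maps each tensor name to the shard containing it. The sketch below counts tensors per shard from an already loaded index; the function name and the tensor names in the comment are illustrative, not taken from this repository.

```python
from collections import Counter


def tensors_per_shard(index: dict) -> dict:
    """Count how many tensors the index assigns to each shard file."""
    # Illustrative weight_map entry:
    #   "model.embed_tokens.weight": "model-00001-of-00004.safetensors"
    return dict(Counter(index["weight_map"].values()))
```

After loading `model.safetensors.index.json` with `json.load`, comparing the returned shard names against the files on disk is a quick check that a download is complete.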

## Related Variants

Quantized releases are provided separately:

- `JosephKBS/onca-1.0-9B-Int8`
- `JosephKBS/onca-1.0-9B-Int4`

## License

This release is provided under the Apache 2.0 license. Users should also review the license and usage terms of the upstream base model and any referenced datasets or benchmarks.

## Citation

If you use Onca 1.0, please cite the accompanying manuscript once it is publicly available. Until then, a temporary reference is:

```bibtex
@misc{shim2026onca,
  title  = {Onca: An Open 9B Language Model for Pancreatic Cancer Clinical Tasks},
  author = {Shim, Kwan Bo},
  year   = {2026},
  note   = {Preprint in preparation}
}
```

## Acknowledgments

This project builds on the work of the Qwen and Qwopus model developers, as well as the many institutions and open-data contributors who created and maintained the public datasets used in training and evaluation.