Onca 1.0 9B

Model Summary

Onca 1.0 is an open 9B language model for pancreatic cancer clinical tasks. It is designed for four task families relevant to pancreatic ductal adenocarcinoma (PDAC):

clinical trial screening
case-specific clinical reasoning
structured pathology report extraction
molecular variant evidence reasoning

This release is the main FP16/BF16-compatible checkpoint intended as the reference Hugging Face release for the Onca 1.0 model family.

Base Model

Onca 1.0 is fine-tuned from Jackrong/Qwopus3.5-9B-v3, a Qwen3.5-derived 9B dense reasoning model. The released checkpoint reflects task-focused supervised fine-tuning for pancreatic cancer workflows while preserving the underlying Qwen3.5-class architecture and tokenizer setup.

Training Scope

The model was trained on 37,364 prepared rows from openly available sources. The multitask mixture covers:

trial eligibility screening
oncology clinical reasoning
CAP-aligned pathology abstraction
CIViC-style variant interpretation

The project was built around an open-data, open-weight, single-workstation pipeline so the workflow can be audited and reproduced without private institutional corpora.

Intended Use

Onca 1.0 is intended for:

research on oncology-focused language models
benchmarking PDAC-oriented clinical NLP workflows
prototyping structured extraction and screening pipelines
local experimentation in privacy-sensitive environments

Out-of-Scope Use

Onca 1.0 is not intended for:

direct clinical care
autonomous treatment recommendations
unsupervised patient-facing use
deployment as a validated medical device or diagnostic system

This is a research model and does not replace clinician judgment.

Evaluation Summary

In the companion manuscript, Onca 1.0 was evaluated across 11 panels against Woollie-7B, CancerLLM-7B, OpenBioLLM-8B, and the unfine-tuned Qwopus base. Headline results reported in the draft include:

Trial Screening: 81.6 F1
Clinical Reasoning: 14.1 composite
Pathology Extraction: 30.5 field exact-match
PubMedQA Cancer: 68.3 macro-F1
PubMedQA: 66.5 macro-F1

The strongest gains appear in workflow-proximal tasks such as trial review and pathology structuring. Variant evidence reasoning remains more difficult than the other task groups.
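
For readers unfamiliar with the macro-F1 figures above, the sketch below shows one common way to compute it: per-class F1 averaged with equal weight across classes. It is an illustration under assumptions, not the manuscript's scoring code; the function name `macro_f1` and the toy labels are hypothetical, and the yes/no/maybe labels simply mirror the PubMedQA answer format.

```python
from collections import Counter

def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy data for illustration only; these are not benchmark items.
gold = ["yes", "no", "maybe", "yes", "no", "yes"]
pred = ["yes", "no", "yes", "yes", "maybe", "no"]
print(round(macro_f1(gold, pred), 3))
```

Because every class counts equally, a rare class that is never predicted correctly pulls the score down sharply, which is worth keeping in mind when comparing macro-F1 with accuracy.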

Limitations

The model is specialized for pancreatic cancer and oncology-adjacent workflows rather than general medicine.
Training data come from openly available sources rather than private institutional notes, which improves reproducibility but does not fully capture real-world documentation style.
Benchmark sample sizes for several panels are deliberately limited, so results on those panels should be interpreted with care.
Performance is uneven across task families and does not imply broad medical competence.
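
To make the sample-size caveat concrete, the sketch below computes a Wilson score interval for an accuracy-style proportion at two panel sizes. The panel sizes and the 70% score are hypothetical and are not taken from the manuscript. The interval applies to proportions such as exact-match rates; F1 scores would need a different approach, for example bootstrapping.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical panels: the same 70% score observed on 50 and on 500 items.
for n in (50, 500):
    lo, hi = wilson_interval(round(0.7 * n), n)
    print(f"n={n}: [{lo:.3f}, {hi:.3f}]")
```

The same point estimate carries a much wider interval on the smaller panel, which is why small differences between models on small panels may not be meaningful.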

Usage

This repository contains the main unquantized (FP16/BF16-compatible) checkpoint files. A standard transformers loading pattern is:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Joesh1/onca-1.0-9B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

Inference formatting should follow the included tokenizer and chat template files in this repository.

Quick Chat Helper

# Assumes `torch`, `tokenizer`, and `model` are defined as in the loading snippet above.
def run_onca(prompt, system_prompt="You are Onca 1.0, a pancreatic-cancer clinical research assistant."):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]
    # Render the conversation with the chat template shipped in this repository.
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            do_sample=False,  # greedy decoding; a temperature setting would be ignored
        )
    # Drop the prompt tokens and decode only the newly generated part.
    completion = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(completion, skip_special_tokens=True)

Example 1: Trial Screening

prompt = """
Task: Trial eligibility screening for pancreatic cancer.

Patient summary:
- 63-year-old with metastatic PDAC
- Liver metastases present
- ECOG 1
- Prior gemcitabine plus nab-paclitaxel
- Total bilirubin 0.9 mg/dL
- ANC 2.4
- Platelets 188
- No active infection
- No brain metastases

Trial criteria:
- Histologically confirmed metastatic pancreatic adenocarcinoma
- ECOG 0-1
- Progression after 1 prior systemic regimen
- Adequate marrow and hepatic function
- Exclude uncontrolled infection or CNS metastases

Return:
1. Eligibility label: eligible / ineligible / unclear
2. Criterion-by-criterion reasoning
3. Missing information, if any
"""

print(run_onca(prompt))
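
When screening outputs feed a pipeline, the label usually has to be pulled out of the free-text answer. The sketch below is one way to do that, assuming the model echoes the "Eligibility label" heading requested in the prompt. That output shape is an assumption, not a documented guarantee, and `parse_eligibility_label` is a hypothetical helper name.

```python
import re

# Matches "Eligibility label" followed by one of the three requested labels,
# tolerating an optional colon or dash and markdown bold markers.
_LABEL_PATTERN = re.compile(
    r"eligibility label\s*[:\-]?\s*\**\s*(ineligible|eligible|unclear)\b",
    flags=re.IGNORECASE,
)

def parse_eligibility_label(completion):
    """Return 'eligible', 'ineligible', or 'unclear'; None if no label is found."""
    match = _LABEL_PATTERN.search(completion)
    return match.group(1).lower() if match else None

# Hypothetical completion text, for illustration only.
sample = "1. Eligibility label: Eligible\n2. Criterion-by-criterion reasoning: ..."
print(parse_eligibility_label(sample))
```

Returning `None` rather than guessing lets the caller route unparseable answers to manual review.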

Example 2: Clinical Reasoning

prompt = """
Task: Pancreatic cancer clinical reasoning.

Case:
A 58-year-old patient has borderline resectable PDAC in the pancreatic head.
CA19-9 is elevated. ECOG is 0. Germline testing is pending. No distant metastases
are seen on imaging.

Please provide:
1. A concise assessment
2. A high-level management plan
3. Key factors that could change the plan
4. Important limitations or uncertainties

Do not present this as medical advice. Keep it research-oriented.
"""

print(run_onca(prompt))

Example 3: Pathology Extraction

prompt = """
Task: Structured pathology extraction.

Extract the report into JSON with the following fields:
specimen_type, primary_site, histology, tumor_grade, tumor_size_cm,
margin_status, lymphovascular_invasion, perineural_invasion,
lymph_nodes_examined, lymph_nodes_positive, pT, pN, pM,
ajcc_stage, treatment_effect, tumor_focality, additional_findings

Report:
Whipple resection specimen showing moderately differentiated pancreatic ductal
adenocarcinoma, 3.1 cm, centered in the pancreatic head. Tumor extends into
peripancreatic soft tissue. All margins are negative; closest margin is 0.4 cm
at the uncinate margin. Perineural invasion is present. Lymphovascular invasion
is present. Sixteen lymph nodes examined, 3 positive for metastatic carcinoma.
Pathologic stage: pT2 pN1. No distant metastasis identified in specimen.
"""

print(run_onca(prompt))
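
Once the output has been parsed into a dictionary, a field-level exact-match score can be computed against a reference annotation. The sketch below is an illustrative scorer and is not the metric implementation used in the manuscript. The normalization rules and the names `normalize` and `field_exact_match` are assumptions. The reference values come from the example report above, while the predicted values are made up and include one deliberate error.

```python
def normalize(value):
    """Lowercase and collapse whitespace; treat None like an empty string."""
    if value is None:
        return ""
    return " ".join(str(value).lower().split())

def field_exact_match(predicted, reference):
    """Fraction of reference fields whose normalized predicted value matches."""
    if not reference:
        raise ValueError("reference must contain at least one field")
    hits = sum(
        normalize(predicted.get(field)) == normalize(expected)
        for field, expected in reference.items()
    )
    return hits / len(reference)

reference = {
    "histology": "pancreatic ductal adenocarcinoma",
    "tumor_size_cm": 3.1,
    "pT": "pT2",
    "pN": "pN1",
    "lymph_nodes_examined": 16,
    "lymph_nodes_positive": 3,
}
# Hypothetical model output with one wrong field (lymph_nodes_positive).
predicted = dict(reference, histology="Pancreatic ductal adenocarcinoma", lymph_nodes_positive=2)
print(f"{field_exact_match(predicted, reference):.3f}")
```

Exact match is strict by design: a clinically equivalent but differently worded value counts as a miss unless the normalization step covers it.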

Example 4: Variant Evidence Interpretation

prompt = """
Task: Variant evidence reasoning for pancreatic cancer.

Variant:
- Gene: BRCA2
- Alteration: pathogenic loss-of-function variant
- Tumor type: pancreatic ductal adenocarcinoma

Return a JSON object with:
- gene
- alteration
- disease
- evidence_summary
- therapeutic_implication
- diagnostic_implication
- prognostic_implication
- evidence_direction
- confidence

Keep the answer concise and note uncertainty when evidence is incomplete.
"""

print(run_onca(prompt))
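
Models sometimes wrap the requested JSON in extra prose, so a tolerant parser is useful before downstream use. The sketch below decodes the first JSON object found in a completion and reports which of the requested keys are missing. The helper names and the sample completion are hypothetical, and the key list simply mirrors the prompt above.

```python
import json

REQUIRED_KEYS = {
    "gene", "alteration", "disease", "evidence_summary",
    "therapeutic_implication", "diagnostic_implication",
    "prognostic_implication", "evidence_direction", "confidence",
}

def extract_json_object(completion):
    """Decode the first JSON object embedded in free text, or return None."""
    decoder = json.JSONDecoder()
    start = completion.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(completion, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = completion.find("{", start + 1)
    return None

def missing_keys(obj):
    """Requested keys that are absent from the parsed object."""
    return sorted(REQUIRED_KEYS - set(obj))

# Hypothetical completion, for illustration only.
sample = 'Result:\n{"gene": "BRCA2", "alteration": "pathogenic loss-of-function variant", "confidence": "moderate"}\nEnd.'
parsed = extract_json_object(sample)
print(parsed["gene"], missing_keys(parsed))
```

Checking for missing keys before use keeps incomplete answers from silently entering a structured dataset.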

Prompting Tips

Ask for a specific output format such as bullet points or JSON.
For extraction tasks, list the exact fields you want returned.
For screening tasks, provide both the patient summary and the trial criteria.
For reasoning tasks, request uncertainties and missing data explicitly.
Treat outputs as research artifacts that require expert review.
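
The extraction tip above can be turned into a small prompt builder so the field list is always explicit and consistent across runs. This is a sketch under assumptions: `build_extraction_prompt` is a hypothetical helper, and the instruction to use null for unstated fields is an added convention rather than something the model card specifies.

```python
def build_extraction_prompt(report, fields, task="Structured pathology extraction."):
    """Assemble an extraction prompt that names every requested field."""
    if not fields:
        raise ValueError("list at least one field to extract")
    return (
        f"Task: {task}\n\n"
        "Extract the report into JSON with the following fields:\n"
        f"{', '.join(fields)}\n\n"
        # Assumed convention, not taken from the model card:
        "Use null for any field the report does not state.\n\n"
        f"Report:\n{report.strip()}\n"
    )

example_prompt = build_extraction_prompt(
    "Whipple resection specimen showing moderately differentiated PDAC, 3.1 cm.",
    ["specimen_type", "histology", "tumor_grade", "tumor_size_cm", "margin_status"],
)
print(example_prompt)
```

The resulting string can be passed to `run_onca` in the same way as the handwritten prompts in the examples.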

Files in This Repository

model-00001-of-00004.safetensors through model-00004-of-00004.safetensors: sharded model weights
model.safetensors.index.json: shard index
config.json: model architecture configuration
generation_config.json: default generation settings
tokenizer.json and tokenizer_config.json: tokenizer files
chat_template.jinja: chat formatting template
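
To check how tensors are spread across the shards, the index file can be read with the standard library alone. The sketch below assumes the usual Hugging Face sharded-checkpoint layout, a `weight_map` from tensor name to shard file plus an optional `metadata.total_size`. The tiny index shown is synthetic, and its tensor names are illustrative rather than copied from this repository.

```python
import json
from collections import Counter

def load_index(path="model.safetensors.index.json"):
    """Read a shard index from a local copy of the repository."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

def summarize_shard_index(index):
    """Return ({shard file: tensor count}, total size in bytes or None)."""
    per_shard = Counter(index["weight_map"].values())
    total_bytes = index.get("metadata", {}).get("total_size")
    return dict(sorted(per_shard.items())), total_bytes

# Synthetic index for illustration; a real one would come from load_index().
example_index = {
    "metadata": {"total_size": 1024},
    "weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
        "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
        "lm_head.weight": "model-00004-of-00004.safetensors",
    },
}
per_shard, total_bytes = summarize_shard_index(example_index)
print(per_shard, total_bytes)
```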

Related Variants

Quantized releases are provided separately:

JosephKBS/onca-1.0-9B-Int8
JosephKBS/onca-1.0-9B-Int4
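
As a rough guide when choosing between the reference checkpoint and the quantized variants, weight memory scales with bytes per parameter. The arithmetic below assumes a nominal 9 billion parameters, so the exact count may differ. It covers weights only and leaves out the KV cache, activations, and any quantization scales or metadata.

```python
# Approximate storage cost per parameter for each precision.
BYTES_PER_PARAM = {"bf16/fp16": 2.0, "int8": 1.0, "int4": 0.5}

# Assumption: nominal parameter count implied by the "9B" model name.
NOMINAL_PARAMS = 9e9

def weight_memory_gb(num_params, bytes_per_param):
    """Weights-only memory estimate in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

for name, size in BYTES_PER_PARAM.items():
    print(f"{name}: ~{weight_memory_gb(NOMINAL_PARAMS, size):.1f} GB")
```

Actual memory use at inference time will be higher than these figures, especially with long prompts or large batch sizes.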

License

This release is provided under the Apache 2.0 license. Users should also review the license and usage terms of the upstream base model and any referenced datasets or benchmarks.

Citation

If you use Onca 1.0, please cite the accompanying manuscript once it is publicly available. Until then, a temporary reference is:

@misc{shim2026onca,
  title  = {Onca: An Open 9B Language Model for Pancreatic Cancer Clinical Tasks},
  author = {Shim, Kwan Bo},
  year   = {2026},
  note   = {Preprint in preparation}
}

Acknowledgments

This project builds on the work of the Qwen and Qwopus model developers, as well as the many institutions and open-data contributors who created and maintained the public datasets used in training and evaluation.