DeepRefine
DeepRefine is a general LLM-based reasoning model for agent-compiled knowledge refinement: it improves the quality of any pre-constructed knowledge base against user queries, making it more suitable for downstream tasks.

Below is a single refinement pass for a specific query, given precomputed results from the model's interactions with a database.
```python
DEEPREFINE_JUDGEMENT_SYSTEM_PROMPT = """
As an advanced judgement assistant, your task is to judge whether the given question is answerable based on the provided KG context.
Evaluate whether the given question is answerable based on the provided KG context. Output your judgment in the following format:
<judge>Yes</judge> or <judge>No</judge>
**Important:** You must think carefully about the question and the KG context before making your judgment. And output your judgment result directly in the specified format.
"""

DEEPREFINE_JUDGEMENT_USER_PROMPT = """
Question: {question}
Knowledge Graph (KG) context: {triples_string}
"""

DEEPREFINE_ERROR_ABDUCTION_SYSTEM_PROMPT = """
As an advanced error abduction assistant, your task is to analyze the error reasons based on the given interaction history.
Analyze the reasons of the unanswerable questions based on the given interaction history from the incompleteness, incorrectness, and redundancy perspectives. Output your analysis in the following format:
<abduction>...</abduction>
**Important:** You must think carefully about the interaction history before making your analysis. And output your analysis result directly in the specified format.
"""

DEEPREFINE_ERROR_ABDUCTION_USER_PROMPT = """
Interaction history: {interaction_history}
"""

DEEPREFINE_ACTION_SYSTEM_PROMPT = """
As an advanced knowledge graph refinement assistant, your task is to generate a series of actions (**within 10 actions**) to refine the given KG to make it more suitable for answering the given question.
Based on the given KG and the analysed error reasons, refine the given KG to make it more easily for retrieval and answering the given question. You have the following three types of actions to conduct:
- insert_edge(subject, relation, object): Insert a new edge into the KG to complete the missing information.
- delete_edge(subject, relation, object): Delete an edge from the KG to remove the redundant information or conflicting information.
- replace_node(old_entity, new_entity): Replace an entity in the KG to correct the errors or deal with disambiguation.
Output a series of actions (**within 10 actions**) in the following format:
<refinement>insert_edge("...", "...", "...")|delete_edge("...", "...", "...")|replace_node("...", "...")|...</refinement>
**Important:** You must think carefully about the given KG and the analysed error reasons before making your refinement. DO NOT DELETE ANY IRRELEVANT TRIPLES FROM THE ORIGINAL KG. TRY TO KEEP THE ORIGINAL KG AS MUCH AS POSSIBLE. DO NOT GENERATE TOO MANY ACTIONS. And output your refinement result directly in the specified format.
"""

DEEPREFINE_ACTION_USER_PROMPT = """
Original Text: {original_text}
KG: {triples_string}
Question: {question}
Error reasons: {error_reasons}
"""
```
```python
import torch
from dataclasses import dataclass
from typing import Dict, List, Optional

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "HaoyuHuang/DeepRefine-v1-8B"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    device_map="auto" if device == "cuda" else None,
    trust_remote_code=True,
)
if device != "cuda":
    model = model.to(device)


def call_model(system_prompt: str, user_prompt: str, max_new_tokens: int = 8192) -> str:
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding; temperature is ignored when sampling is off
        )
    # Decode only the newly generated tokens, excluding the prompt.
    generated_ids = outputs[0, inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(generated_ids, skip_special_tokens=True).strip()


@dataclass
class RetrievalStepResult:
    """Result of a single retrieval/inference step, kept for debugging and analysis."""
    num_hops: int
    base_top_k: int
    query: str
    retrieved_subgraph: List[Dict[str, str]]
    raw_response: str
    answerable: bool
    answer: Optional[str] = None


# Placeholders -- substitute your own data here.
question = "your_question"
triples_string = "retrieved_triples"
interaction_history = "list_of_RetrievalStepResult"
original_text = "triple_related_original_text"
```
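The prompts above expect the retrieved triples serialized into a single string for the `{triples_string}` slot. The card does not pin down an exact format, so the sketch below assumes one `(subject, relation, object)` triple per line; the helper name `triples_to_string` and the sample triples are hypothetical.

```python
def triples_to_string(triples):
    """Serialize (subject, relation, object) tuples for the {triples_string} slot.

    Assumed format (not specified by the model card): one "(s, r, o)" per line.
    """
    return "\n".join(f"({s}, {r}, {o})" for s, r, o in triples)


# Example (hypothetical data):
triples = [
    ("Barack Obama", "born_in", "Honolulu"),
    ("Honolulu", "located_in", "Hawaii"),
]
triples_string = triples_to_string(triples)
```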
## Answerability Judgement Call

```python
judgement_user_prompt = DEEPREFINE_JUDGEMENT_USER_PROMPT.format(
    question=question,
    triples_string=triples_string,
)
judgement_result = call_model(
    DEEPREFINE_JUDGEMENT_SYSTEM_PROMPT,
    judgement_user_prompt,
)
print("Answerability Judgement:", judgement_result)
```
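Per the judgement system prompt, the verdict arrives wrapped in `<judge>` tags. A small helper (hypothetical, not part of the released code) can turn it into a boolean, defaulting to `False` when no tag is found:

```python
import re

def parse_judgement(response: str) -> bool:
    """Extract the <judge>Yes</judge> / <judge>No</judge> verdict from the raw output."""
    match = re.search(r"<judge>\s*(Yes|No)\s*</judge>", response, re.IGNORECASE)
    # Treat a missing or malformed tag as "not answerable".
    return bool(match) and match.group(1).lower() == "yes"
```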
## Error Abduction Call

```python
abduction_user_prompt = DEEPREFINE_ERROR_ABDUCTION_USER_PROMPT.format(
    interaction_history=interaction_history,
)
abduction_result = call_model(
    DEEPREFINE_ERROR_ABDUCTION_SYSTEM_PROMPT,
    abduction_user_prompt,
)
print("Error Abduction:", abduction_result)
```
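The abduction system prompt wraps the analysis in `<abduction>` tags; a hypothetical helper along the same lines can strip them before passing the text on as `error_reasons`:

```python
import re

def extract_abduction(response: str) -> str:
    """Pull the analysis text out of the <abduction>...</abduction> tags.

    Returns an empty string when the tag is absent.
    """
    match = re.search(r"<abduction>(.*?)</abduction>", response, re.DOTALL)
    return match.group(1).strip() if match else ""
```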
## Refinement Actions Generation Call

```python
actions_user_prompt = DEEPREFINE_ACTION_USER_PROMPT.format(
    original_text=original_text,
    triples_string=triples_string,
    question=question,
    error_reasons=abduction_result,
)
actions_result = call_model(
    DEEPREFINE_ACTION_SYSTEM_PROMPT,
    actions_user_prompt,
)
print("Refinement Actions:", actions_result)

# Parse the refinement actions and wire them into the operators of your own database.
# ...
```
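As a starting point for that interface, here is a sketch of parsing the `<refinement>` action string into `(action, args)` tuples, following the action names and arities declared in the system prompt. The helper names are hypothetical, and the simple comma split breaks if an argument itself contains a comma; adapt it to your own operators.

```python
import re

# Action names follow the DEEPREFINE_ACTION_SYSTEM_PROMPT above.
ACTION_RE = re.compile(r'(insert_edge|delete_edge|replace_node)\(([^)]*)\)')

def parse_refinement(response: str):
    """Parse the <refinement>...</refinement> block into (action_name, args) tuples."""
    match = re.search(r"<refinement>(.*?)</refinement>", response, re.DOTALL)
    if not match:
        return []
    actions = []
    for name, raw_args in ACTION_RE.findall(match.group(1)):
        # Naive split: assumes no commas inside argument strings.
        args = [a.strip().strip('"').strip("'") for a in raw_args.split(",")]
        actions.append((name, args))
    return actions
```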
Alternatively, the model can be pulled and run via Docker Model Runner:

```shell
docker model run hf.co/HaoyuHuang2/DeepRefine-v1-8B
```