Instructions to use IAAR-Shanghai/MemPrivacy-4B-RL with libraries, inference providers, notebooks, and local apps.

### Transformers

How to use IAAR-Shanghai/MemPrivacy-4B-RL with Transformers — either through the high-level `pipeline` helper or by loading the model directly:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="IAAR-Shanghai/MemPrivacy-4B-RL")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("IAAR-Shanghai/MemPrivacy-4B-RL")
model = AutoModelForCausalLM.from_pretrained("IAAR-Shanghai/MemPrivacy-4B-RL")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
### vLLM

How to use IAAR-Shanghai/MemPrivacy-4B-RL with vLLM — install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "IAAR-Shanghai/MemPrivacy-4B-RL"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "IAAR-Shanghai/MemPrivacy-4B-RL",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
### SGLang

How to use IAAR-Shanghai/MemPrivacy-4B-RL with SGLang — install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "IAAR-Shanghai/MemPrivacy-4B-RL" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "IAAR-Shanghai/MemPrivacy-4B-RL",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Or use the Docker image:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "IAAR-Shanghai/MemPrivacy-4B-RL" \
  --host 0.0.0.0 \
  --port 30000
```

The server can then be called with the same `curl` request as above.
### Docker Model Runner

How to use IAAR-Shanghai/MemPrivacy-4B-RL with Docker Model Runner:

```shell
docker model run hf.co/IAAR-Shanghai/MemPrivacy-4B-RL
```
```yaml
base_model:
  - Qwen3-4B
language:
  - en
  - zh
license: cc-by-nc-nd-4.0
pipeline_tag: text-generation
library_name: transformers
tags:
  - privacy
  - privacy-detection
  - memory
  - personalized-memory
  - memory-system
  - memory-management
  - agent
  - agent-memory
  - information-security
  - information-extraction
  - edge-cloud
inference: false
```
<h1 align="center">
🛡️ MemPrivacy-4B-RL
</h1>

<p align="center">
<div style="display: flex; justify-content: center; gap: 10px;">
  <a href="https://github.com/MemTensor/MemPrivacy">
    <img src="https://img.shields.io/badge/GitHub-Repository-blue?logo=github" alt="GitHub"/>
  </a>
  <a href="https://huggingface.co/IAAR-Shanghai/MemPrivacy-4B-RL">
    <img src="https://img.shields.io/badge/🤗%20Hugging%20Face-MemPrivacy--4B--RL-yellow" alt="Hugging Face"/>
  </a>
  <a href="https://arxiv.org/abs/2605.09530">
    <img src="https://img.shields.io/badge/Paper-arXiv-red?logo=arxiv" alt="Paper"/>
  </a>
</div>
</p>
MemPrivacy-4B-RL is a lightweight, privacy-preserving model developed from the Qwen3-4B base model and further optimized through reinforcement learning. It was introduced in the paper [MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents](https://huggingface.co/papers/2605.09530).

It is designed specifically for personalized memory management in edge-cloud agents, enabling more reliable, adaptive, and privacy-aware memory operations. This model functions as the core local extraction engine within the **MemPrivacy framework**. Instead of relying on aggressive masking that destroys task-relevant semantics, the model accurately identifies privacy-sensitive spans on edge devices, categorizes them according to a four-level privacy taxonomy, and replaces them with semantically structured, type-aware placeholders (e.g., `<Email_1>`) before transmitting data to the cloud.
---

## ✨ Key Features & Capabilities

* **High-Precision Privacy Extraction**: Achieves state-of-the-art performance in privacy information extraction, substantially surpassing strong general-purpose reasoning models like GPT-5.2 and Gemini-3.1-Pro.
* **Four-Level Privacy Taxonomy (PL1-PL4)**: Capable of identifying and classifying privacy-relevant content based on identifiability, expected harm, and operational exploitability.
* **Semantic Utility Preservation**: By decoupling privacy protection from semantic destruction, the use of typed placeholders ensures that cloud agents retain the relational and semantic cues required for effective memory formation and retrieval.
* **Edge-Optimized Efficiency**: Designed for resource-constrained local deployment, maintaining high accuracy while significantly reducing inference latency compared to massive general-purpose LLMs.
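The PL1-PL4 taxonomy lends itself to level-dependent handling on the edge device. The sketch below is purely illustrative — the level-to-action mapping is a hypothetical policy, not the framework's official behavior; the paper defines the levels by identifiability, expected harm, and operational exploitability:

```python
# Hypothetical edge-side handling policy keyed on the PL1-PL4 taxonomy.
# The actions below are illustrative assumptions, not the framework's spec.
POLICY = {
    "PL1": "keep",         # low-sensitivity context may pass through as-is
    "PL2": "placeholder",  # identifiers: replace with typed placeholders
    "PL3": "placeholder",  # sensitive attributes (e.g., medical health)
    "PL4": "block",        # operationally exploitable secrets (e.g., codes)
}

def decide(instance: dict) -> str:
    """Return the handling action for one extracted privacy instance."""
    # Default to masking when the level is missing or unknown.
    return POLICY.get(instance.get("privacy_level"), "placeholder")

print(decide({"original_text": "89757",
              "privacy_type": "Verification Code",
              "privacy_level": "PL4"}))  # -> block
```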
---
## Usage Example

The model accepts conversational text alongside basic user identifiers and extracts a structured list of privacy instances, detailing the original text, the specific privacy type, and its corresponding privacy level.

**Input:**

```text
User Name: Zhang San
Dialogue Text: Hello, my name is Zhang San, and my mobile number is 13800138000. I've been having insomnia recently, and the doctor diagnosed me with mild depression. Here is a photo of my prescription. Also, I just received a verification code 89757, please fill it in for me. By the way, I like spicy food and I speak quite directly.
```
**Output:**

```json
[
  {
    "original_text": "Zhang San",
    "privacy_type": "Real Name",
    "privacy_level": "PL2"
  },
  {
    "original_text": "13800138000",
    "privacy_type": "Phone Number",
    "privacy_level": "PL2"
  },
  {
    "original_text": "mild depression",
    "privacy_type": "Medical Health",
    "privacy_level": "PL3"
  },
  {
    "original_text": "89757",
    "privacy_type": "Verification Code",
    "privacy_level": "PL4"
  }
]
```
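Extracted instances like these drive the typed-placeholder rewrite described above. A minimal sketch of that step — the placeholder naming scheme (`<Phone_Number_1>`, etc.) is an assumption for illustration; the framework's exact format may differ:

```python
# Replace each extracted span with a typed, indexed placeholder, keeping a
# reverse map on the edge device so original values never reach the cloud.
def mask(text: str, instances: list[dict]) -> tuple[str, dict]:
    counters: dict[str, int] = {}
    reverse_map: dict[str, str] = {}
    for inst in instances:
        ptype = inst["privacy_type"].replace(" ", "_")
        counters[ptype] = counters.get(ptype, 0) + 1
        placeholder = f"<{ptype}_{counters[ptype]}>"
        text = text.replace(inst["original_text"], placeholder)
        reverse_map[placeholder] = inst["original_text"]
    return text, reverse_map

instances = [
    {"original_text": "13800138000", "privacy_type": "Phone Number", "privacy_level": "PL2"},
    {"original_text": "89757", "privacy_type": "Verification Code", "privacy_level": "PL4"},
]
masked, reverse_map = mask("My number is 13800138000, code 89757.", instances)
print(masked)  # My number is <Phone_Number_1>, code <Verification_Code_1>.
```

The reverse map stays local, so the edge device can restore the original values in any cloud response before showing it to the user.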
### Structured Privacy Extraction with vLLM

This example shows how to use vLLM to perform structured privacy information extraction from user-AI dialogues, constrained by a JSON schema.
```py
import json

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
from vllm.sampling_params import StructuredOutputsParams

# JSON schema that constrains the model to emit a list of privacy instances.
privacy_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "original_text": {"type": "string"},
            "privacy_type": {"type": "string"},
            "privacy_level": {
                "type": "string",
                "enum": ["PL1", "PL2", "PL3", "PL4"]
            }
        },
        "required": ["original_text", "privacy_type", "privacy_level"],
        "additionalProperties": False
    }
}

model_name_or_path = "IAAR-Shanghai/MemPrivacy-4B-RL"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

sampling_params = SamplingParams(
    temperature=0.1,
    top_p=0.1,
    repetition_penalty=1.05,
    max_tokens=6144,
    structured_outputs=StructuredOutputsParams(json=privacy_schema),
)

model = LLM(model=model_name_or_path, dtype="float16", gpu_memory_utilization=0.9)

# Example input: the user identifier and the dialogue turn to scan.
name = "Zhang San"
current_input = {
    "role": "user",
    "content": "Hello, my name is Zhang San, and my mobile number is 13800138000..."
}

# Render the chat prompt and run schema-constrained generation. The exact task
# prompt used in the paper (including how the user name is injected) is in the
# GitHub repository; this sketch uses the model's default chat template.
prompt = tokenizer.apply_chat_template(
    [current_input], tokenize=False, add_generation_prompt=True
)
outputs = model.generate([prompt], sampling_params)
privacy_instances = json.loads(outputs[0].outputs[0].text)
print(privacy_instances)

# For full implementation details, please refer to the GitHub repository.
```
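Because generation is schema-constrained, downstream code can parse the output directly without defensive error handling. A small sketch that filters instances by level (the sample string below is hard-coded in the model's output format so the snippet runs without a GPU):

```python
import json

# Sample output in the schema-constrained format shown above.
raw = (
    '[{"original_text": "mild depression", "privacy_type": "Medical Health", '
    '"privacy_level": "PL3"}, {"original_text": "89757", '
    '"privacy_type": "Verification Code", "privacy_level": "PL4"}]'
)

instances = json.loads(raw)
# Keep only the higher-sensitivity findings (PL3 and above) for strict review.
high_risk = [i for i in instances if i["privacy_level"] in ("PL3", "PL4")]
print([i["privacy_type"] for i in high_risk])  # ['Medical Health', 'Verification Code']
```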
---

## Citation

```bibtex
@misc{chen2026memprivacyprivacypreservingpersonalizedmemory,
  title={MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents},
  author={Yining Chen and Jihao Zhao and Bo Tang and Haofen Wang and Yue Zhang and Fei Huang and Feiyu Xiong and Zhiyu Li},
  year={2026},
  eprint={2605.09530},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/2605.09530},
}
```
## Disclaimers

This project is intended for **privacy research and evaluation**. Do **not** use it to process real user secrets without proper security controls, threat modeling, and compliance review. Always follow local laws and organizational policies.