---
base_model:
- Qwen3-4B
language:
- en
- zh
license: cc-by-nc-nd-4.0
pipeline_tag: text-generation
library_name: transformers
tags:
- privacy
- privacy-detection
- memory
- personalized-memory
- memory-system
- memory-management
- agent
- agent-memory
- information-security
- information-extraction
- edge-cloud
inference: false
---
<h1 align="center">
🛡️ MemPrivacy-4B-RL
</h1>
<p align="center">
<div style="display: flex; justify-content: center; gap: 10px;">
<a href="https://github.com/MemTensor/MemPrivacy">
<img src="https://img.shields.io/badge/GitHub-Repository-blue?logo=github" alt="GitHub"/>
</a>
<a href="https://huggingface.co/IAAR-Shanghai/MemPrivacy-4B-RL">
<img src="https://img.shields.io/badge/🤗%20Hugging%20Face-MemPrivacy--4B--RL-yellow" alt="Hugging Face"/>
</a>
<a href="https://arxiv.org/abs/2605.09530">
<img src="https://img.shields.io/badge/Paper-arXiv-red?logo=arxiv" alt="Paper"/>
</a>
</div>
</p>
MemPrivacy-4B-RL is a lightweight, privacy-preserving model developed from the Qwen3-4B base model and further optimized through reinforcement learning. It was introduced in the paper [MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents](https://huggingface.co/papers/2605.09530).
It is designed specifically for personalized memory management in edge-cloud agents, enabling more reliable, adaptive, and privacy-aware memory operations. This model functions as the core local extraction engine within the **MemPrivacy framework**. Instead of relying on aggressive masking that destroys task-relevant semantics, the model accurately identifies privacy-sensitive spans on edge devices, categorizes them according to a four-level privacy taxonomy, and replaces them with semantically structured, type-aware placeholders (e.g., `<Email_1>`) before transmitting data to the cloud.
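The placeholder substitution described above can be sketched in a few lines of Python. This is a minimal illustration, not the MemPrivacy implementation: the helper names `mask_with_placeholders` and `restore_placeholders`, the `<Type_N>` naming derived from `privacy_type`, and the plain string replacement are all assumptions. The input is assumed to be a list of dicts with `original_text` and `privacy_type` keys, matching the output format shown in the usage example.

```python
import re
from collections import defaultdict


def mask_with_placeholders(text, instances):
    """Replace each extracted span with a typed placeholder (hypothetical helper).

    Returns the masked text and a placeholder -> original mapping that stays
    on the edge device.
    """
    counters = defaultdict(int)  # per-type running index
    mapping = {}                 # placeholder -> original text
    seen = set()                 # originals already replaced
    # Longest spans first, so a short span never clobbers part of a longer one.
    ordered = sorted(instances, key=lambda i: len(i["original_text"]), reverse=True)
    for item in ordered:
        original = item["original_text"]
        if original in seen or original not in text:
            continue
        # Assumed naming: "Phone Number" -> "<Phone_Number_1>"
        label = re.sub(r"\W+", "_", item["privacy_type"]).strip("_")
        counters[label] += 1
        placeholder = f"<{label}_{counters[label]}>"
        mapping[placeholder] = original
        seen.add(original)
        text = text.replace(original, placeholder)
    return text, mapping


def restore_placeholders(text, mapping):
    """Put the original values back into text returned from the cloud."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


if __name__ == "__main__":
    dialogue = "Hello, my name is Zhang San, and my mobile number is 13800138000."
    extracted = [
        {"original_text": "Zhang San", "privacy_type": "Real Name", "privacy_level": "PL2"},
        {"original_text": "13800138000", "privacy_type": "Phone Number", "privacy_level": "PL2"},
    ]
    masked, mapping = mask_with_placeholders(dialogue, extracted)
    print(masked)
    print(restore_placeholders(masked, mapping) == dialogue)
```

Keeping the mapping local is what lets the cloud side work with typed placeholders while the edge device restores the real values afterwards. A production version would also need to handle spans that overlap or that collide with placeholder text, which this sketch ignores.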
---
## ✨ Key Features & Capabilities
* **High-Precision Privacy Extraction**: Achieves state-of-the-art performance in privacy information extraction, substantially surpassing strong general-purpose reasoning models like GPT-5.2 and Gemini-3.1-Pro.
* **Four-Level Privacy Taxonomy (PL1-PL4)**: Capable of identifying and classifying privacy-relevant content based on identifiability, expected harm, and operational exploitability.
* **Semantic Utility Preservation**: Typed placeholders decouple privacy protection from semantic destruction, so cloud agents retain the relational and semantic cues required for effective memory formation and retrieval.
* **Edge-Optimized Efficiency**: Designed for resource-constrained local deployment, maintaining high accuracy while significantly reducing inference latency compared to much larger general-purpose LLMs.
---
## Usage Example
The model accepts conversational text alongside basic user identifiers and extracts a structured list of privacy instances, detailing the original text, the specific privacy type, and its corresponding privacy level.
**Input:**
```text
User Name: Zhang San
Dialogue Text: Hello, my name is Zhang San, and my mobile number is 13800138000. I've been having insomnia recently, and the doctor diagnosed me with mild depression. Here is a photo of my prescription. Also, I just received a verification code 89757, please fill it in for me. By the way, I like spicy food and I speak quite directly.
```
**Output:**
```json
[
  {
    "original_text": "Zhang San",
    "privacy_type": "Real Name",
    "privacy_level": "PL2"
  },
  {
    "original_text": "13800138000",
    "privacy_type": "Phone Number",
    "privacy_level": "PL2"
  },
  {
    "original_text": "mild depression",
    "privacy_type": "Medical Health",
    "privacy_level": "PL3"
  },
  {
    "original_text": "89757",
    "privacy_type": "Verification Code",
    "privacy_level": "PL4"
  }
]
```
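Downstream code will often want to act only on instances at or above a given privacy level. The sketch below shows one way to do that with the output above. It assumes that the levels are ordered from PL1 (least sensitive) to PL4 (most sensitive), which the example suggests but this card does not state outright. The helper name `select_by_level` and the `LEVEL_ORDER` table are illustrative.

```python
import json

# Assumed ordering: a higher number means a more sensitive level.
LEVEL_ORDER = {"PL1": 1, "PL2": 2, "PL3": 3, "PL4": 4}


def select_by_level(instances, min_level="PL3"):
    """Keep only instances whose privacy level is at or above min_level."""
    threshold = LEVEL_ORDER[min_level]
    return [
        item for item in instances
        if LEVEL_ORDER.get(item["privacy_level"], 0) >= threshold
    ]


if __name__ == "__main__":
    raw_output = """
    [
      {"original_text": "Zhang San", "privacy_type": "Real Name", "privacy_level": "PL2"},
      {"original_text": "13800138000", "privacy_type": "Phone Number", "privacy_level": "PL2"},
      {"original_text": "mild depression", "privacy_type": "Medical Health", "privacy_level": "PL3"},
      {"original_text": "89757", "privacy_type": "Verification Code", "privacy_level": "PL4"}
    ]
    """
    instances = json.loads(raw_output)
    for item in select_by_level(instances, min_level="PL3"):
        print(item["privacy_level"], item["privacy_type"], item["original_text"])
```

Instances with an unrecognized level are treated as below every threshold here. Whether that is the right default depends on the deployment, and a stricter policy might treat unknown levels as the most sensitive instead.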
### Structured Privacy Extraction with vLLM
This example shows how to use vLLM to perform structured privacy information extraction from user-AI dialogues, constrained by a JSON schema.
```py
import json

from vllm import LLM, SamplingParams
from vllm.sampling_params import StructuredOutputsParams
from transformers import AutoTokenizer

privacy_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "original_text": {"type": "string"},
            "privacy_type": {"type": "string"},
            "privacy_level": {
                "type": "string",
                "enum": ["PL1", "PL2", "PL3", "PL4"]
            }
        },
        "required": ["original_text", "privacy_type", "privacy_level"],
        "additionalProperties": False
    }
}

model_name_or_path = "IAAR-Shanghai/MemPrivacy-4B-RL"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

sampling_params = SamplingParams(
    temperature=0.1,
    top_p=0.1,
    repetition_penalty=1.05,
    max_tokens=6144,
    structured_outputs=StructuredOutputsParams(json=privacy_schema)
)

model = LLM(model=model_name_or_path, dtype='float16', gpu_memory_utilization=0.9)

# Example input processing
name = 'Zhang San'
current_input = {
    "role": "user",
    "content": "Hello, my name is Zhang San, and my mobile number is 13800138000..."
}

# For full implementation details, please refer to the GitHub repository.
```
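Even with schema-constrained decoding, the generated string can be unusable, for example when generation stops at `max_tokens` before the array is closed. The following sketch re-checks a raw output string against the same shape as `privacy_schema` using only the standard library. The function name `parse_privacy_output` is hypothetical and not part of vLLM or the MemPrivacy repository.

```python
import json

REQUIRED_KEYS = {"original_text", "privacy_type", "privacy_level"}
ALLOWED_LEVELS = {"PL1", "PL2", "PL3", "PL4"}


def parse_privacy_output(raw):
    """Parse a raw model output string and check it against the expected shape.

    Raises ValueError if the string is not a JSON array of objects with exactly
    the three required string fields and a known privacy level.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of privacy instances")
    for index, item in enumerate(data):
        if not isinstance(item, dict) or set(item) != REQUIRED_KEYS:
            raise ValueError(f"item {index} must have exactly the keys {sorted(REQUIRED_KEYS)}")
        if not all(isinstance(item[key], str) for key in REQUIRED_KEYS):
            raise ValueError(f"item {index} has a non-string field")
        if item["privacy_level"] not in ALLOWED_LEVELS:
            raise ValueError(f"item {index} has unknown level {item['privacy_level']!r}")
    return data


if __name__ == "__main__":
    good = '[{"original_text": "89757", "privacy_type": "Verification Code", "privacy_level": "PL4"}]'
    print(parse_privacy_output(good))

    truncated = '[{"original_text": "89757", "privacy_type": "Verif'
    try:
        parse_privacy_output(truncated)
    except ValueError as err:
        print("rejected:", err)
```

Failing closed on a malformed result is a deliberate choice in this sketch: if the extraction output cannot be trusted, it is safer not to forward the dialogue to the cloud than to forward it unmasked.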
---
## Citation
```bibtex
@misc{chen2026memprivacyprivacypreservingpersonalizedmemory,
title={MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents},
author={Yining Chen and Jihao Zhao and Bo Tang and Haofen Wang and Yue Zhang and Fei Huang and Feiyu Xiong and Zhiyu Li},
year={2026},
eprint={2605.09530},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2605.09530},
}
```
## Disclaimers
This project is intended for **privacy research and evaluation**. Do **not** use it to process real user secrets without proper security controls, threat modeling, and compliance review. Always follow local laws and organizational policies.