Instructions to use IAAR-Shanghai/MemPrivacy-4B-RL with libraries, inference providers, notebooks, and local apps.

### Transformers

How to use IAAR-Shanghai/MemPrivacy-4B-RL with Transformers — either through the high-level `pipeline` helper or by loading the model directly:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="IAAR-Shanghai/MemPrivacy-4B-RL")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("IAAR-Shanghai/MemPrivacy-4B-RL")
model = AutoModelForCausalLM.from_pretrained("IAAR-Shanghai/MemPrivacy-4B-RL")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
### vLLM

How to use IAAR-Shanghai/MemPrivacy-4B-RL with vLLM — install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "IAAR-Shanghai/MemPrivacy-4B-RL"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "IAAR-Shanghai/MemPrivacy-4B-RL",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
### SGLang

How to use IAAR-Shanghai/MemPrivacy-4B-RL with SGLang — install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "IAAR-Shanghai/MemPrivacy-4B-RL" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "IAAR-Shanghai/MemPrivacy-4B-RL",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Or use the Docker image:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "IAAR-Shanghai/MemPrivacy-4B-RL" \
  --host 0.0.0.0 \
  --port 30000
```

The server can then be called with the same `curl` request as above.
### Docker Model Runner

How to use IAAR-Shanghai/MemPrivacy-4B-RL with Docker Model Runner:

```shell
docker model run hf.co/IAAR-Shanghai/MemPrivacy-4B-RL
```
```yaml
base_model:
  - Qwen3-4B
language:
  - en
  - zh
license: cc-by-nc-nd-4.0
pipeline_tag: text-generation
library_name: transformers
tags:
  - privacy
  - privacy-detection
  - memory
  - personalized-memory
  - memory-system
  - memory-management
  - agent
  - agent-memory
  - information-security
  - information-extraction
  - edge-cloud
inference: false
```
<h1 align="center">
🛡️ MemPrivacy-4B-RL
</h1>

<p align="center">
<div style="display: flex; justify-content: center; gap: 10px;">
  <a href="https://github.com/MemTensor/MemPrivacy">
    <img src="https://img.shields.io/badge/GitHub-Repository-blue?logo=github" alt="GitHub"/>
  </a>
  <a href="https://huggingface.co/IAAR-Shanghai/MemPrivacy-4B-RL">
    <img src="https://img.shields.io/badge/🤗%20Hugging%20Face-MemPrivacy--4B--RL-yellow" alt="Hugging Face"/>
  </a>
  <a href="https://arxiv.org/abs/2605.09530">
    <img src="https://img.shields.io/badge/Paper-arXiv-red?logo=arxiv" alt="Paper"/>
  </a>
</div>
</p>
MemPrivacy-4B-RL is a lightweight, privacy-preserving model developed from the Qwen3-4B base model and further optimized through reinforcement learning. It was introduced in the paper [MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents](https://huggingface.co/papers/2605.09530).

It is designed specifically for personalized memory management in edge-cloud agents, enabling more reliable, adaptive, and privacy-aware memory operations. This model functions as the core local extraction engine within the **MemPrivacy framework**. Instead of relying on aggressive masking that destroys task-relevant semantics, the model accurately identifies privacy-sensitive spans on edge devices, categorizes them according to a four-level privacy taxonomy, and replaces them with semantically structured, type-aware placeholders (e.g., `<Email_1>`) before transmitting data to the cloud.
---

## ✨ Key Features & Capabilities

* **High-Precision Privacy Extraction**: Achieves state-of-the-art performance in privacy information extraction, substantially surpassing strong general-purpose reasoning models like GPT-5.2 and Gemini-3.1-Pro.
* **Four-Level Privacy Taxonomy (PL1-PL4)**: Capable of identifying and classifying privacy-relevant content based on identifiability, expected harm, and operational exploitability.
* **Semantic Utility Preservation**: By decoupling privacy protection from semantic destruction, the use of typed placeholders ensures that cloud agents retain the relational and semantic cues required for effective memory formation and retrieval.
* **Edge-Optimized Efficiency**: Designed for resource-constrained local deployment, maintaining high accuracy while significantly reducing inference latency compared to massive general-purpose LLMs.
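The PL1-PL4 taxonomy lends itself to level-dependent handling on the edge device. The sketch below is purely illustrative — the level-to-action mapping is a hypothetical policy, not the framework's official behavior; the paper defines the levels by identifiability, expected harm, and operational exploitability:

```python
# Hypothetical edge-side handling policy keyed on the PL1-PL4 taxonomy.
# The actions below are illustrative assumptions, not the framework's spec.
POLICY = {
    "PL1": "keep",         # low-sensitivity context may pass through as-is
    "PL2": "placeholder",  # identifiers: replace with typed placeholders
    "PL3": "placeholder",  # sensitive attributes (e.g., medical health)
    "PL4": "block",        # operationally exploitable secrets (e.g., codes)
}

def decide(instance: dict) -> str:
    """Return the handling action for one extracted privacy instance."""
    # Default to masking when the level is missing or unknown.
    return POLICY.get(instance.get("privacy_level"), "placeholder")

print(decide({"original_text": "89757",
              "privacy_type": "Verification Code",
              "privacy_level": "PL4"}))  # -> block
```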
---
## Usage Example

The model accepts conversational text alongside basic user identifiers and extracts a structured list of privacy instances, detailing the original text, the specific privacy type, and its corresponding privacy level.

**Input:**

```text
User Name: Zhang San
Dialogue Text: Hello, my name is Zhang San, and my mobile number is 13800138000. I've been having insomnia recently, and the doctor diagnosed me with mild depression. Here is a photo of my prescription. Also, I just received a verification code 89757, please fill it in for me. By the way, I like spicy food and I speak quite directly.
```
**Output:**

```json
[
  {
    "original_text": "Zhang San",
    "privacy_type": "Real Name",
    "privacy_level": "PL2"
  },
  {
    "original_text": "13800138000",
    "privacy_type": "Phone Number",
    "privacy_level": "PL2"
  },
  {
    "original_text": "mild depression",
    "privacy_type": "Medical Health",
    "privacy_level": "PL3"
  },
  {
    "original_text": "89757",
    "privacy_type": "Verification Code",
    "privacy_level": "PL4"
  }
]
```
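Extracted instances like these drive the typed-placeholder rewrite described above. A minimal sketch of that step — the placeholder naming scheme (`<Phone_Number_1>`, etc.) is an assumption for illustration; the framework's exact format may differ:

```python
# Replace each extracted span with a typed, indexed placeholder, keeping a
# reverse map on the edge device so original values never reach the cloud.
def mask(text: str, instances: list[dict]) -> tuple[str, dict]:
    counters: dict[str, int] = {}
    reverse_map: dict[str, str] = {}
    for inst in instances:
        ptype = inst["privacy_type"].replace(" ", "_")
        counters[ptype] = counters.get(ptype, 0) + 1
        placeholder = f"<{ptype}_{counters[ptype]}>"
        text = text.replace(inst["original_text"], placeholder)
        reverse_map[placeholder] = inst["original_text"]
    return text, reverse_map

instances = [
    {"original_text": "13800138000", "privacy_type": "Phone Number", "privacy_level": "PL2"},
    {"original_text": "89757", "privacy_type": "Verification Code", "privacy_level": "PL4"},
]
masked, reverse_map = mask("My number is 13800138000, code 89757.", instances)
print(masked)  # My number is <Phone_Number_1>, code <Verification_Code_1>.
```

The reverse map stays local, so the edge device can restore the original values in any cloud response before showing it to the user.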
### Structured Privacy Extraction with vLLM

This example shows how to use vLLM to perform structured privacy information extraction from user-AI dialogues, constrained by a JSON schema.
```py
import json

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
from vllm.sampling_params import StructuredOutputsParams

# JSON schema that constrains the model to emit a list of privacy instances.
privacy_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "original_text": {"type": "string"},
            "privacy_type": {"type": "string"},
            "privacy_level": {
                "type": "string",
                "enum": ["PL1", "PL2", "PL3", "PL4"]
            }
        },
        "required": ["original_text", "privacy_type", "privacy_level"],
        "additionalProperties": False
    }
}

model_name_or_path = "IAAR-Shanghai/MemPrivacy-4B-RL"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

sampling_params = SamplingParams(
    temperature=0.1,
    top_p=0.1,
    repetition_penalty=1.05,
    max_tokens=6144,
    structured_outputs=StructuredOutputsParams(json=privacy_schema),
)

model = LLM(model=model_name_or_path, dtype="float16", gpu_memory_utilization=0.9)

# Example input: the user identifier and the dialogue turn to scan.
name = "Zhang San"
current_input = {
    "role": "user",
    "content": "Hello, my name is Zhang San, and my mobile number is 13800138000..."
}

# Render the chat prompt and run schema-constrained generation. The exact task
# prompt used in the paper (including how the user name is injected) is in the
# GitHub repository; this sketch uses the model's default chat template.
prompt = tokenizer.apply_chat_template(
    [current_input], tokenize=False, add_generation_prompt=True
)
outputs = model.generate([prompt], sampling_params)
privacy_instances = json.loads(outputs[0].outputs[0].text)
print(privacy_instances)

# For full implementation details, please refer to the GitHub repository.
```
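Because generation is schema-constrained, downstream code can parse the output directly without defensive error handling. A small sketch that filters instances by level (the sample string below is hard-coded in the model's output format so the snippet runs without a GPU):

```python
import json

# Sample output in the schema-constrained format shown above.
raw = (
    '[{"original_text": "mild depression", "privacy_type": "Medical Health", '
    '"privacy_level": "PL3"}, {"original_text": "89757", '
    '"privacy_type": "Verification Code", "privacy_level": "PL4"}]'
)

instances = json.loads(raw)
# Keep only the higher-sensitivity findings (PL3 and above) for strict review.
high_risk = [i for i in instances if i["privacy_level"] in ("PL3", "PL4")]
print([i["privacy_type"] for i in high_risk])  # ['Medical Health', 'Verification Code']
```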
---

## Citation

```bibtex
@misc{chen2026memprivacyprivacypreservingpersonalizedmemory,
  title={MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents},
  author={Yining Chen and Jihao Zhao and Bo Tang and Haofen Wang and Yue Zhang and Fei Huang and Feiyu Xiong and Zhiyu Li},
  year={2026},
  eprint={2605.09530},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/2605.09530},
}
```
## Disclaimers

This project is intended for **privacy research and evaluation**. Do **not** use it to process real user secrets without proper security controls, threat modeling, and compliance review. Always follow local laws and organizational policies.