# P2P: DeepSeek-R1-8B Fine-Tuned for MBTI Personality Prediction

A QLoRA fine-tuned adapter for predicting Myers-Briggs Type Indicator (MBTI) personality types from social media posts, based on the P2P framework.

Author: Omar Gamal ElKady | ITI - AI Track, Intake 46
## Paper
Ma, T., Feng, K., Rong, Y., & Zhao, K. (2025). From Post To Personality: Harnessing LLMs for MBTI Prediction in Social Media. In Proceedings of CIKM '25. ACM.
## Model Details
- Developed by: Omar Gamal ElKady
- Affiliation: Information Technology Institute (ITI), AI Track, Intake 46
- Model type: LoRA adapter for causal language model
- Language: English
- Base model: DeepSeek-R1-Distill-Llama-8B (4-bit quantized)
- Fine-tuning method: QLoRA via Unsloth
- License: MIT
## Training Configuration
| Parameter | Value |
|---|---|
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Target Modules | q_proj, v_proj |
| Training Data | PersonalityCafe forum (8,675 users, 50 posts each) |
| Training Samples | 9,418 (after SMOTE oversampling for class balance) |
| Optimizer | AdamW (8-bit) |
| Learning Rate | 1e-4 |
| Batch Size | 8 (effective) |
| Epochs | 3 (with early stopping) |
| Hardware | NVIDIA H200 (Lightning AI) |
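With rank 16 applied only to `q_proj` and `v_proj`, the adapter stays tiny relative to the 8B base. A back-of-envelope count of the trainable parameters, assuming the standard Llama-3.1-8B dimensions underlying the distill (hidden size 4096, 32 decoder layers, grouped-query attention with 8 KV heads, so `v_proj` maps 4096 to 1024):

```python
# Estimate LoRA trainable parameters for r=16 on q_proj and v_proj.
# Dimensions are assumed from the Llama-3.1-8B architecture:
# hidden_size=4096, 32 layers, 8 KV heads * 128 head_dim = 1024 for v_proj.
r = 16
hidden = 4096
n_layers = 32
kv_dim = 1024

def lora_params(d_in, d_out, rank):
    # Each LoRA pair adds A (rank x d_in) and B (d_out x rank)
    return rank * (d_in + d_out)

per_layer = lora_params(hidden, hidden, r) + lora_params(hidden, kv_dim, r)
total = per_layer * n_layers
print(f"{total:,} trainable parameters")  # 6,815,744 (~0.08% of 8B)
```

Only these roughly 6.8M parameters receive gradients; the 4-bit base weights stay frozen, which is what lets QLoRA training fit on a single GPU.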
## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the 4-bit quantized base model
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit",
    device_map="auto",
    torch_dtype="float16",
)

# Attach the fine-tuned LoRA adapter
model = PeftModel.from_pretrained(base_model, "OmarGamal488/P2P-DeepSeek-R1-8B-MBTI-LoRA")
tokenizer = AutoTokenizer.from_pretrained("OmarGamal488/P2P-DeepSeek-R1-8B-MBTI-LoRA")

# Build the Alpaca-style prompt and predict the MBTI type
prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Predict the 4-letter Myers-Briggs personality type of the user from their social-media posts. Return only four uppercase letters.
### Input:
love debate theory logic prefer alone question everything analyze pattern fascinated philosophy
### Response:
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
# Output: INTP
```
## Pipeline Overview

This model is the local LLM component of the P2P framework:

```
User Posts --> [This Model] --> Personality Features + Hidden States
                                             |
                                             v
                                     FAISS (k=5 RAG)
                                             |
                                             v
                           [DeepSeek-V3 API] --> 4-letter MBTI Type
```
The model serves two roles in the pipeline:
- Feature Extraction — Reads user posts and generates a structured personality assessment covering E/I, S/N, T/F, J/P dimensions
- Hidden State Encoding — Produces 4096-dim internal representations used for FAISS similarity search
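The retrieval step over those 4096-dim encodings can be illustrated with a plain NumPy nearest-neighbour search, which is exactly what a flat L2 FAISS index (`faiss.IndexFlatL2`) computes. The vectors below are random stand-ins for real pooled hidden states:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_users, k = 4096, 100, 5

# Stand-ins for mean-pooled hidden states of labelled training users
index_vectors = rng.standard_normal((n_users, dim)).astype("float32")

def search(query, vectors, k):
    # Exact L2 nearest-neighbour search, equivalent to faiss.IndexFlatL2
    dists = np.linalg.norm(vectors - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return nearest, dists[nearest]

query = index_vectors[42]  # query with a stored vector for illustration
ids, dists = search(query, index_vectors, k)
print(ids)  # row 42 comes back first, at distance 0
```

The MBTI labels of the k=5 retrieved users are then passed as in-context examples to the DeepSeek-V3 API call that emits the final 4-letter type.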
## Training Data
PersonalityCafe forum dataset from Kaggle:
- 8,675 users with self-reported MBTI labels
- 50 most recent posts per user
- Preprocessing: lowercase, URL removal, stopword removal, lemmatization
- Split: 60% train / 20% validation / 20% test (stratified by type)
- SMOTE-style oversampling applied to minority classes (minimum 500 per type)
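The text-cleaning steps above can be sketched with the standard library. The URL pattern and stopword list here are illustrative stand-ins (a real pipeline would typically use an NLTK-style stopword list and an actual lemmatizer, which is omitted below):

```python
import re

STOPWORDS = {"i", "the", "a", "an", "and", "to", "of", "is", "it", "in"}  # illustrative subset

def preprocess(post: str) -> str:
    post = post.lower()                                 # lowercase
    post = re.sub(r"https?://\S+", "", post)            # URL removal
    tokens = re.findall(r"[a-z']+", post)               # simple tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    # Lemmatization omitted; in practice apply e.g. a WordNet lemmatizer here
    return " ".join(tokens)

print(preprocess("I LOVE debating theory! See https://example.com in the morning"))
# -> "love debating theory see morning"
```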
### Class Distribution
| Type | Count | Type | Count |
|---|---|---|---|
| INFP | 1,832 | ISFP | 271 |
| INFJ | 1,470 | ENTJ | 231 |
| INTP | 1,304 | ISTJ | 205 |
| INTJ | 1,091 | ENFJ | 190 |
| ENTP | 685 | ISFJ | 166 |
| ENFP | 675 | ESTP | 89 |
| ISTP | 337 | ESFP | 48 |
| ESFJ | 42 | ESTJ | 39 |
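The imbalance ratio and the minimum-500 oversampling floor can be checked directly from the table. Note the counts are full-dataset figures; the floor is applied to the training split, so the arithmetic below only demonstrates the mechanism:

```python
counts = {
    "INFP": 1832, "INFJ": 1470, "INTP": 1304, "INTJ": 1091,
    "ENTP": 685, "ENFP": 675, "ISTP": 337, "ISFP": 271,
    "ENTJ": 231, "ISTJ": 205, "ENFJ": 190, "ISFJ": 166,
    "ESTP": 89, "ESFP": 48, "ESFJ": 42, "ESTJ": 39,
}

ratio = max(counts.values()) / min(counts.values())
print(f"imbalance ratio: {ratio:.0f}x")  # 1832 / 39 -> 47x

# Minimum-500 floor: minority classes are oversampled up to the floor,
# majority classes are left untouched
targets = {t: max(n, 500) for t, n in counts.items()}
```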
## Limitations

- Trained on self-reported MBTI labels, which may contain ~30% noise (per MBTIBench, Li et al. 2025)
- Severe class imbalance (47x ratio between INFP and ESTJ) addressed with oversampling but not fully resolved
- Best used as part of the full P2P pipeline with RAG, not as a standalone classifier
- MBTI as a personality framework has limited scientific validity compared to Big Five
## Citation

```bibtex
@inproceedings{ma2025p2p,
  title={From Post To Personality: Harnessing LLMs for MBTI Prediction in Social Media},
  author={Ma, Tian and Feng, Kaiyu and Rong, Yu and Zhao, Kangfei},
  booktitle={Proceedings of the 34th ACM International Conference on Information and Knowledge Management},
  year={2025},
  publisher={ACM}
}
```
## Framework Versions
- PEFT 0.18.1
- Transformers 4.57.6
- Unsloth 2026.4.4
- PyTorch 2.10.0