P2P: DeepSeek-R1-8B Fine-Tuned for MBTI Personality Prediction

A QLoRA fine-tuned adapter for predicting Myers-Briggs personality types (MBTI) from social media posts, based on the P2P framework.

Author: Omar Gamal ElKady | ITI - AI Track, Intake 46

Paper

Ma, T., Feng, K., Rong, Y., & Zhao, K. (2025). From Post To Personality: Harnessing LLMs for MBTI Prediction in Social Media. In Proceedings of CIKM '25. ACM.

Model Details

  • Developed by: Omar Gamal ElKady
  • Affiliation: Information Technology Institute (ITI), AI Track, Intake 46
  • Model type: LoRA adapter for causal language model
  • Language: English
  • Base model: DeepSeek-R1-Distill-Llama-8B (4-bit quantized)
  • Fine-tuning method: QLoRA via Unsloth
  • License: MIT

Training Configuration

Parameter         Value
LoRA Rank         16
LoRA Alpha        32
Target Modules    q_proj, v_proj
Training Data     PersonalityCafe forum (8,675 users, 50 posts each)
Training Samples  9,418 (after SMOTE oversampling for class balance)
Optimizer         AdamW (8-bit)
Learning Rate     1e-4
Batch Size        8 (effective)
Epochs            3 (with early stopping)
Hardware          NVIDIA H200 (Lightning AI)
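The adapter settings above map directly onto a `peft` LoraConfig. A minimal sketch of that configuration (the `lora_dropout` and `bias` values are assumptions, not stated in the table):

```python
from peft import LoraConfig

# Hyperparameters from the training-configuration table above.
lora_config = LoraConfig(
    r=16,                                 # LoRA rank
    lora_alpha=32,                        # scaling factor (alpha / r = 2.0)
    target_modules=["q_proj", "v_proj"],  # attention query/value projections
    lora_dropout=0.0,                     # assumption: not stated in the table
    bias="none",                          # assumption: not stated in the table
    task_type="CAUSAL_LM",
)
```

With alpha = 2 × rank, the adapter updates are scaled by a factor of 2 before being added to the frozen base weights, a common default for QLoRA setups.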

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model (already 4-bit quantized via bitsandbytes)
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit",
    device_map="auto",
    torch_dtype=torch.float16,
)

# Attach fine-tuned adapter
model = PeftModel.from_pretrained(base_model, "OmarGamal488/P2P-DeepSeek-R1-8B-MBTI-LoRA")
tokenizer = AutoTokenizer.from_pretrained("OmarGamal488/P2P-DeepSeek-R1-8B-MBTI-LoRA")

# Predict MBTI
prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Predict the 4-letter Myers-Briggs personality type of the user from their social-media posts. Return only four uppercase letters.

### Input:
love debate theory logic prefer alone question everything analyze pattern fascinated philosophy

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
# Output: INTP

Pipeline Overview

This model is the local LLM component of the P2P framework:

User Posts --> [This Model] --> Personality Features + Hidden States
                                           |
                                           v
                                    FAISS (k=5 RAG)
                                           |
                                           v
                               [DeepSeek-V3 API] --> 4-letter MBTI Type

The model serves two roles in the pipeline:

  1. Feature Extraction — Reads user posts and generates a structured personality assessment covering E/I, S/N, T/F, J/P dimensions
  2. Hidden State Encoding — Produces 4096-dim internal representations used for FAISS similarity search
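For the second role, hidden states are obtained from the model (e.g. via `output_hidden_states=True`) and pooled into one 4096-dim vector per user; the FAISS step is then an inner-product top-k search over the stored users. A NumPy sketch of that retrieval step, standing in for `faiss.IndexFlatIP` (which computes the same scores at scale); the encoding bank and dimensions are assumptions:

```python
import numpy as np

def top_k_neighbors(query: np.ndarray, bank: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k stored users most similar to `query`.

    `bank` is an (N, 4096) matrix of per-user hidden-state encodings and
    `query` a (4096,) vector. Rows are L2-normalized so that inner product
    equals cosine similarity, matching FAISS IndexFlatIP on normalized data.
    """
    bank_n = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    q_n = query / np.linalg.norm(query)
    scores = bank_n @ q_n                 # cosine similarity to every stored user
    return np.argsort(-scores)[:k]        # indices of the k highest scores
```

The k=5 retrieved users (and their known types) are what the pipeline passes to the DeepSeek-V3 API as RAG context.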

Training Data

PersonalityCafe forum dataset from Kaggle:

  • 8,675 users with self-reported MBTI labels
  • 50 most recent posts per user
  • Preprocessing: lowercase, URL removal, stopword removal, lemmatization
  • Split: 60% train / 20% validation / 20% test (stratified by type)
  • SMOTE-style oversampling applied to minority classes (minimum 500 per type)
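The first three preprocessing steps can be sketched as follows. The stopword list here is a tiny illustrative subset (the real pipeline would use a full list such as NLTK's), and lemmatization is omitted:

```python
import re

# Illustrative subset only; the actual pipeline used a complete stopword list.
STOPWORDS = {"a", "an", "and", "i", "in", "is", "it", "my", "of", "the", "to"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")

def preprocess(post: str) -> str:
    """Lowercase a post, strip URLs, and drop stopwords (lemmatization omitted)."""
    text = URL_RE.sub(" ", post.lower())
    tokens = re.findall(r"[a-z']+", text)
    return " ".join(t for t in tokens if t not in STOPWORDS)

print(preprocess("I LOVE debate https://example.com and the theory of logic"))
# -> love debate theory logic
```

The resulting bag-of-content-words style matches the example input shown in the usage prompt above.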

Class Distribution

Type   Count
INFP   1,832
INFJ   1,470
INTP   1,304
INTJ   1,091
ENTP   685
ENFP   675
ISTP   337
ISFP   271
ENTJ   231
ISTJ   205
ENFJ   190
ISFJ   166
ESTP   89
ESFP   48
ESFJ   42
ESTJ   39

Limitations

  • Trained on self-reported MBTI labels, which may contain ~30% noise (per MBTIBench, Li et al. 2025)
  • Severe class imbalance (47x ratio between INFP and ESTJ) addressed with oversampling but not fully resolved
  • Best used as part of the full P2P pipeline with RAG, not as a standalone classifier
  • MBTI as a personality framework has limited scientific validity compared to Big Five

Citation

@inproceedings{ma2025p2p,
  title={From Post To Personality: Harnessing LLMs for MBTI Prediction in Social Media},
  author={Ma, Tian and Feng, Kaiyu and Rong, Yu and Zhao, Kangfei},
  booktitle={Proceedings of the 34th ACM International Conference on Information and Knowledge Management},
  year={2025},
  publisher={ACM}
}

Framework Versions

  • PEFT 0.18.1
  • Transformers 4.57.6
  • Unsloth 2026.4.4
  • PyTorch 2.10.0