# P2P: DeepSeek-R1-8B Fine-Tuned for MBTI Personality Prediction

A QLoRA fine-tuned adapter for predicting Myers-Briggs Type Indicator (MBTI) personality types from social media posts, based on the P2P framework.

Author: Omar Gamal ElKady | ITI - AI Track, Intake 46
## Paper
Ma, T., Feng, K., Rong, Y., & Zhao, K. (2025). From Post To Personality: Harnessing LLMs for MBTI Prediction in Social Media. In Proceedings of CIKM '25. ACM.
## Model Details
- Developed by: Omar Gamal ElKady
- Affiliation: Information Technology Institute (ITI), AI Track, Intake 46
- Model type: LoRA adapter for causal language model
- Language: English
- Base model: DeepSeek-R1-Distill-Llama-8B (4-bit quantized)
- Fine-tuning method: QLoRA via Unsloth
- License: MIT
## Training Configuration
| Parameter | Value |
|---|---|
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Target Modules | q_proj, v_proj |
| Training Data | PersonalityCafe forum (8,675 users, 50 posts each) |
| Training Samples | 9,418 (after SMOTE oversampling for class balance) |
| Optimizer | AdamW (8-bit) |
| Learning Rate | 1e-4 |
| Batch Size | 8 (effective) |
| Epochs | 3 (with early stopping) |
| Hardware | NVIDIA H200 (Lightning AI) |
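With rank 16 applied only to `q_proj` and `v_proj`, the adapter stays tiny relative to the 8B base. A back-of-envelope count of the trainable parameters, assuming the standard Llama-3.1-8B dimensions underlying the distill (hidden size 4096, 32 decoder layers, grouped-query attention with 8 KV heads, so `v_proj` maps 4096 to 1024):

```python
# Estimate LoRA trainable parameters for r=16 on q_proj and v_proj.
# Dimensions are assumed from the Llama-3.1-8B architecture:
# hidden_size=4096, 32 layers, 8 KV heads * 128 head_dim = 1024 for v_proj.
r = 16
hidden = 4096
n_layers = 32
kv_dim = 1024

def lora_params(d_in, d_out, rank):
    # Each LoRA pair adds A (rank x d_in) and B (d_out x rank)
    return rank * (d_in + d_out)

per_layer = lora_params(hidden, hidden, r) + lora_params(hidden, kv_dim, r)
total = per_layer * n_layers
print(f"{total:,} trainable parameters")  # 6,815,744 (~0.08% of 8B)
```

Only these roughly 6.8M parameters receive gradients; the 4-bit base weights stay frozen, which is what lets QLoRA training fit on a single GPU.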
## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the 4-bit quantized base model
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit",
    device_map="auto",
    torch_dtype="float16",
)

# Attach the fine-tuned LoRA adapter
model = PeftModel.from_pretrained(base_model, "OmarGamal488/P2P-DeepSeek-R1-8B-MBTI-LoRA")
tokenizer = AutoTokenizer.from_pretrained("OmarGamal488/P2P-DeepSeek-R1-8B-MBTI-LoRA")

# Build the Alpaca-style prompt and predict the MBTI type
prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Predict the 4-letter Myers-Briggs personality type of the user from their social-media posts. Return only four uppercase letters.
### Input:
love debate theory logic prefer alone question everything analyze pattern fascinated philosophy
### Response:
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
# Output: INTP
```
## Pipeline Overview

This model is the local LLM component of the P2P framework:

```
User Posts --> [This Model] --> Personality Features + Hidden States
                                             |
                                             v
                                     FAISS (k=5 RAG)
                                             |
                                             v
                           [DeepSeek-V3 API] --> 4-letter MBTI Type
```
The model serves two roles in the pipeline:
- Feature Extraction — Reads user posts and generates a structured personality assessment covering E/I, S/N, T/F, J/P dimensions
- Hidden State Encoding — Produces 4096-dim internal representations used for FAISS similarity search
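The retrieval step over those 4096-dim encodings can be illustrated with a plain NumPy nearest-neighbour search, which is exactly what a flat L2 FAISS index (`faiss.IndexFlatL2`) computes. The vectors below are random stand-ins for real pooled hidden states:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_users, k = 4096, 100, 5

# Stand-ins for mean-pooled hidden states of labelled training users
index_vectors = rng.standard_normal((n_users, dim)).astype("float32")

def search(query, vectors, k):
    # Exact L2 nearest-neighbour search, equivalent to faiss.IndexFlatL2
    dists = np.linalg.norm(vectors - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return nearest, dists[nearest]

query = index_vectors[42]  # query with a stored vector for illustration
ids, dists = search(query, index_vectors, k)
print(ids)  # row 42 comes back first, at distance 0
```

The MBTI labels of the k=5 retrieved users are then passed as in-context examples to the DeepSeek-V3 API call that emits the final 4-letter type.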
## Training Data
PersonalityCafe forum dataset from Kaggle:
- 8,675 users with self-reported MBTI labels
- 50 most recent posts per user
- Preprocessing: lowercase, URL removal, stopword removal, lemmatization
- Split: 60% train / 20% validation / 20% test (stratified by type)
- SMOTE-style oversampling applied to minority classes (minimum 500 per type)
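The text-cleaning steps above can be sketched with the standard library. The URL pattern and stopword list here are illustrative stand-ins (a real pipeline would typically use an NLTK-style stopword list and an actual lemmatizer, which is omitted below):

```python
import re

STOPWORDS = {"i", "the", "a", "an", "and", "to", "of", "is", "it", "in"}  # illustrative subset

def preprocess(post: str) -> str:
    post = post.lower()                                 # lowercase
    post = re.sub(r"https?://\S+", "", post)            # URL removal
    tokens = re.findall(r"[a-z']+", post)               # simple tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    # Lemmatization omitted; in practice apply e.g. a WordNet lemmatizer here
    return " ".join(tokens)

print(preprocess("I LOVE debating theory! See https://example.com in the morning"))
# -> "love debating theory see morning"
```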
### Class Distribution
| Type | Count | Type | Count |
|---|---|---|---|
| INFP | 1,832 | ISFP | 271 |
| INFJ | 1,470 | ENTJ | 231 |
| INTP | 1,304 | ISTJ | 205 |
| INTJ | 1,091 | ENFJ | 190 |
| ENTP | 685 | ISFJ | 166 |
| ENFP | 675 | ESTP | 89 |
| ISTP | 337 | ESFP | 48 |
| ESFJ | 42 | ESTJ | 39 |
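The imbalance ratio and the minimum-500 oversampling floor can be checked directly from the table. Note the counts are full-dataset figures; the floor is applied to the training split, so the arithmetic below only demonstrates the mechanism:

```python
counts = {
    "INFP": 1832, "INFJ": 1470, "INTP": 1304, "INTJ": 1091,
    "ENTP": 685, "ENFP": 675, "ISTP": 337, "ISFP": 271,
    "ENTJ": 231, "ISTJ": 205, "ENFJ": 190, "ISFJ": 166,
    "ESTP": 89, "ESFP": 48, "ESFJ": 42, "ESTJ": 39,
}

ratio = max(counts.values()) / min(counts.values())
print(f"imbalance ratio: {ratio:.0f}x")  # 1832 / 39 -> 47x

# Minimum-500 floor: minority classes are oversampled up to the floor,
# majority classes are left untouched
targets = {t: max(n, 500) for t, n in counts.items()}
```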
## Limitations

- Trained on self-reported MBTI labels, which may contain ~30% noise (per MBTIBench, Li et al. 2025)
- Severe class imbalance (47x ratio between INFP and ESTJ) addressed with oversampling but not fully resolved
- Best used as part of the full P2P pipeline with RAG, not as a standalone classifier
- MBTI as a personality framework has limited scientific validity compared to Big Five
## Citation

```bibtex
@inproceedings{ma2025p2p,
  title={From Post To Personality: Harnessing LLMs for MBTI Prediction in Social Media},
  author={Ma, Tian and Feng, Kaiyu and Rong, Yu and Zhao, Kangfei},
  booktitle={Proceedings of the 34th ACM International Conference on Information and Knowledge Management},
  year={2025},
  publisher={ACM}
}
```
## Framework Versions
- PEFT 0.18.1
- Transformers 4.57.6
- Unsloth 2026.4.4
- PyTorch 2.10.0