# Evaluation Results
Evaluation was conducted on the DocRED dev set.
## Dev Set Performance
- Best Epoch: 29 / 30
- Training Loss: 0.0023
## Main Metrics
- Micro F1: 59.25%
- Precision: 63.11%
- Recall: 55.83%
## Interpretation
This V2 checkpoint achieves a Micro F1 of 59.25% on the DocRED dev set, with a Precision of 63.11% and Recall of 55.83%.
Compared to V1 (Micro F1 60.71%), V2 scores about 1.5 points lower overall but maintains a similar precision-recall balance.
In both runs precision exceeds recall by roughly seven points, indicating relatively conservative predictions that reduce false positives at the cost of some recall.
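The micro-averaged metrics above pool true positives over all documents rather than averaging per-document scores. A minimal sketch of that computation, using hypothetical relation facts (the real DocRED evaluation compares (document, head entity, tail entity, relation) tuples):

```python
# Micro precision / recall / F1 over pooled predicted and gold relation facts.
# The facts below are illustrative placeholders, not real DocRED output.

def micro_prf(pred: set, gold: set):
    tp = len(pred & gold)  # facts predicted AND in the gold set
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {("doc1", "e0", "e1", "P17"),
        ("doc1", "e2", "e3", "P131"),
        ("doc2", "e0", "e1", "P27")}
pred = {("doc1", "e0", "e1", "P17"),   # correct
        ("doc1", "e2", "e3", "P150")}  # wrong relation label

p, r, f1 = micro_prf(pred, gold)
print(f"P={p:.2%} R={r:.2%} F1={f1:.2%}")  # P=50.00% R=33.33% F1=40.00%
```

Note that F1 is the harmonic mean of precision and recall; the reported 63.11% precision and 55.83% recall do combine to the stated 59.25% F1.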
## V1 vs V2 Comparison
| Metric | V1 (best_model_f1_56_64) | V2 (best_model_V2) |
|---|---|---|
| Micro F1 | 60.71% | 59.25% |
| Precision | 65.34% | 63.11% |
| Recall | 56.70% | 55.83% |
## Notes
- This model is designed for document-level relation extraction on the DocRED benchmark.
- V2 was trained as an ablation/comparison run against V1 to verify reproducibility and threshold sensitivity.
- Performance may vary depending on preprocessing details, threshold settings, and evaluation configuration.
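The threshold sensitivity mentioned above is typically probed by sweeping a global confidence threshold over per-candidate relation scores and selecting the value that maximizes dev micro F1. A hedged sketch, with illustrative scores and labels (not real model output):

```python
# Sweep a global decision threshold over candidate relation scores and
# report the threshold that maximizes micro F1 on held-out labels.
# All inputs here are toy values for illustration.

def sweep_threshold(scores, labels, thresholds):
    """Return (best_threshold, best_f1).

    scores: model confidence per candidate relation
    labels: 1 if the candidate is a gold relation, else 0
    """
    best_t, best_f1 = None, -1.0
    n_gold = sum(labels)
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        n_pred = sum(1 for s in scores if s >= t)
        p = tp / n_pred if n_pred else 0.0
        r = tp / n_gold if n_gold else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
t, f1 = sweep_threshold(scores, labels, [0.1, 0.3, 0.5, 0.7])
print(t, round(f1, 3))  # 0.3 0.857
```

Because F1 can shift noticeably with the threshold, reported numbers are only comparable when the same thresholding procedure is used for both checkpoints.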
# License and Dataset Notice
## Code / Model License
This project is built upon several open-source works:
- HuggingFace Transformers / BERT - Apache License 2.0
- ATLOP - MIT License
- GAIN - MIT License
- DREEAM - based on the original paper and implementation references
## Dataset Notice
This model is trained and evaluated on the DocRED dataset.
DocRED is intended for research use. Users should separately review the dataset's original terms and conditions before any redistribution or commercial use.
# Intended Use
This repository is intended for:
- academic research
- experimentation on document-level relation extraction
- knowledge graph construction pipelines
- benchmark comparison and ablation studies
It is not intended for production use without additional validation.