TREC_AP_88-90 – Information Retrieval Model

Model description

  • This model is trained on the TREC AP 88–90 newswire collection for ad‑hoc information retrieval and ranking.
  • Input: a text query and one or several candidate documents or passages.
  • Output: a relevance score or generated text used to rank the candidates.

Intended uses & limitations

  • Intended uses:
    • Research on traditional and neural information retrieval.
    • Benchmarking on the TREC AP 88–90 collection.
    • Experiments for the SysCRED project on credibility and ranking.
  • Limitations:
    • English‑only newswire domain; performance may degrade on other domains.
    • Not designed for safety‑critical or high‑stakes decision making.
    • The underlying corpus contains historical biases present in news media of that period.

How to use

  • Example with transformers and a text‑retrieval pipeline:
from transformers import AutoTokenizer, AutoModel
from sentence_transformers import SentenceTransformer, util  # if you use SBERT-style embeddings

model_id = "DomLoyer/TREC_AP_88-90"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
# encode queries and documents then compute similarity / ranking as in your paper or codebase


# TREC_AP_88-90 Model

## Model Summary

This repository contains resources related to experiments on the **TREC AP 88–90** newswire collection.  
It is intended for research in information retrieval and evaluation of models trained or tested on the AP 1988–1990 subset of TREC.

A snapshot of this work is archived on Zenodo with the DOI: **10.5281/zenodo.17917839**.  
Please refer to the Zenodo record for a citable, versioned release of the code and experimental setup.

## Intended Use

- Evaluation of retrieval models on the AP 88–90 collection.  
- Reproducibility of experiments for IR research.  
- Analysis of ranking performance and credibility-related experiments (SysCRED context).

This repository is **not** a redistribution of the original Associated Press documents.  
Users must obtain the AP 88–90 collection from the official TREC/NIST source and comply with their license.

## Training Data

The experiments are based on the **TREC AP 88–90** newswire data.  
All copyrights for the underlying texts remain with the original rights holders (Associated Press / TREC).

## Files

This repository may contain:

- Configuration files, scripts, and notebooks used for the experiments.  
- Trained models or precomputed indexes derived from the AP 88–90 corpus (without redistributing the raw documents).

## Citation

If you use this repository or the associated Zenodo archive in academic work, please cite:

```bibtex
@dataset{loyer_trec_ap_88_90_zenodo,
  author       = {Dominique Loyer},
  title        = {TREC\_AP\_88-90 Resources},
  year         = {2025},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.17917839},
  url          = {https://doi.org/10.5281/zenodo.17917839}
}

Limitations

  • The original AP documents are not included here.
  • Usage is restricted to research and evaluation purposes.

TREC AP 88-90 Implementation Analysis

Executive Summary

Question: Is the TREC AP 88-90 repository implemented in the systemFactChecking repository?

Answer: ✅ YES - The TREC repository is successfully integrated and implemented in the systemFactChecking repository.


1. Repository Overview

1.1 TREC_AP_88-90 Repository

  • Owner: DominiqueLoyer
  • Full Name: DominiqueLoyer/TREC_AP_88-90
  • Purpose: Complete Python implementation of information retrieval models evaluated on TREC AP 88-90 collections
  • Key Features:
    • BM25 ranking model with parameter tuning
    • TF-IDF vector space model
    • Query expansion techniques (RM3, pseudo-relevance feedback)
    • Comprehensive evaluation metrics (MAP, NDCG, Precision@K)
    • Integration with Pyserini for efficient indexing and retrieval
    • Comparative analysis of stemming strategies
  • Dataset: TREC AP (Associated Press) 88-90 collection containing 165,000 documents

1.2 systemFactChecking Repository

  • Owner: DominiqueLoyer
  • Full Name: DominiqueLoyer/systemFactChecking
  • Purpose: Fact Checking System for Information Credibility Verification
  • Description: A neuro-symbolic AI system combining Symbolic AI (rule-based reasoning with OWL ontologies), Neural AI (transformer models), and IR Engine (BM25, TF-IDF, PageRank)
  • Current Version: v2.3.0 (February 2026)
  • DOI: 10.5281/zenodo.18436691

2. TREC Integration Evidence

2.1 Core TREC Modules in systemFactChecking

The following TREC-related modules are implemented in 02_Code/syscred/:

A. trec_retriever.py (14,958 bytes)

  • Purpose: Main TREC retrieval module for evidence gathering
  • Key Features:
    • BM25, TF-IDF, QLD scoring models
    • Pyserini/Lucene integration (optional)
    • Evidence retrieval for fact-checking
    • Pseudo-Relevance Feedback (PRF) query expansion
    • In-memory fallback when Pyserini is unavailable
  • Citation: Based on TREC_AP88-90_5juin2025.py
  • Main Classes:
    • Evidence: Dataclass representing retrieved evidence
    • RetrievalResult: Complete result from evidence retrieval
    • TRECRetriever: Main retriever class with fact-checking interface
  • Key Methods:
    retrieve_evidence(claim, k, model, use_prf) -> RetrievalResult
    batch_retrieve(claims, k, model) -> List[RetrievalResult]
    

B. trec_dataset.py (14,212 bytes)

  • Purpose: TREC AP88-90 dataset loader and topic management
  • Key Features:
    • TREC topic parsing
    • Query relevance judgments (qrels)
    • Dataset management
  • Main Class: TRECDataset, TRECTopic

C. ir_engine.py (12,310 bytes)

  • Purpose: Information Retrieval engine with multiple ranking models
  • Key Features:
    • BM25, TF-IDF, Query Likelihood Dirichlet
    • Porter stemming
    • Stop word removal
    • Pseudo-Relevance Feedback
    • In-memory and Pyserini-based search

D. eval_metrics.py (11,558 bytes)

  • Purpose: TREC evaluation metrics
  • Metrics Implemented:
    • Mean Average Precision (MAP)
    • Normalized Discounted Cumulative Gain (NDCG)
    • Precision@K, Recall@K
    • Mean Reciprocal Rank (MRR)
    • F1 Score

2.2 TREC Test and Demo Files

A. demo_trec.py

  • Complete demonstration of TREC capabilities integrated into SysCRED
  • Shows evidence retrieval, metrics calculation, topic handling
  • Sample outputs with AP88-90 style document IDs

B. test_trec_integration.py (9,814 bytes)

  • Unit tests for TREC integration
  • Validates retriever, dataset loader, and metrics

C. run_trec_benchmark.py (12,828 bytes)

  • Benchmark script for TREC evaluation
  • Performance testing against TREC standards

2.3 Package-Level Integration

The TREC modules are formally integrated into the syscred package (__init__.py):

# TREC Integration (NEW - Feb 2026)
from syscred.trec_retriever import TRECRetriever, Evidence, RetrievalResult
from syscred.trec_dataset import TRECDataset, TRECTopic

__all__ = [
    # ... other exports
    'TRECRetriever',
    'TRECDataset',
    'TRECTopic',
    'Evidence',
    'RetrievalResult',
]

Version: Marked as v2.3.0 (February 2026) with TREC integration noted as "NEW"


3. Technical Implementation Details

3.1 Citation and Attribution

The TREC retriever explicitly references the original TREC_AP88-90 work:

"""
Based on: TREC_AP88-90_5juin2025.py
(c) Dominique S. Loyer - PhD Thesis Prototype
Citation Key: loyerEvaluationModelesRecherche2025
"""

3.2 Shared Components

Both repositories share:

  • BM25 Parameters: k1=0.9, b=0.4 (optimized on AP88-90)
  • Evaluation Metrics: MAP, NDCG, Precision@K, Recall@K
  • Dataset: TREC AP (Associated Press) 88-90 collection (165,000 documents)
  • Preprocessing: Porter stemming, stop word removal
  • Query Expansion: Pseudo-Relevance Feedback (PRF)

3.3 Integration Architecture

systemFactChecking (Fact-Checking System)
    └── syscred/ (Core Package)
        ├── verification_system.py (Main credibility pipeline)
        │   └── Uses TRECRetriever for evidence gathering
        ├── trec_retriever.py (Evidence retrieval)
        │   └── Based on TREC_AP88-90 methodology
        ├── trec_dataset.py (Dataset loader)
        ├── ir_engine.py (BM25, TF-IDF, QLD)
        └── eval_metrics.py (MAP, NDCG, P@K)

3.4 Evidence Retrieval Workflow

  1. Input: Claim to verify (e.g., "Climate change is caused by human activities")
  2. Processing:
    • Preprocess claim (stemming, stop word removal)
    • Search using BM25/TF-IDF/QLD
    • Optionally apply PRF for query expansion
  3. Output: RetrievalResult containing:
    • List of Evidence objects (doc_id, text, score, rank)
    • Search time, model used
    • Expanded query (if PRF applied)

4. Use Cases and Applications

4.1 In systemFactChecking

The TREC retriever serves as the evidence gathering component for:

  • Credibility Verification: Finding supporting/refuting documents for claims
  • Fact-Checking Pipeline: First stage of neuro-symbolic verification
  • Source Validation: Retrieving relevant documents from trusted corpora

4.2 Example Usage

from syscred import TRECRetriever

# Initialize retriever
retriever = TRECRetriever(use_stemming=True, enable_prf=True)

# Retrieve evidence for a claim
result = retriever.retrieve_evidence(
    claim="Climate change is caused by human activities",
    k=10
)

# Process evidence
for evidence in result.evidences:
    print(f"[{evidence.score:.4f}] {evidence.text[:100]}...")

5. Performance Metrics

5.1 TREC_AP_88-90 Repository Benchmarks

From the README:

  • Baseline (BM25, long queries): MAP=0.2205
  • With Query Expansion (RM3): MAP=0.2948 (+34%)
  • Best Configuration: Long query terms + BM25 + RM3 expansion

5.2 Integration in systemFactChecking

The evaluation metrics from TREC are used to validate:

  • Evidence retrieval quality
  • Credibility scoring accuracy
  • System performance benchmarks

6. Cross-Repository File Mapping

TREC_AP_88-90 systemFactChecking Purpose
TREC_AP88-90_5juin2025.py 02_Code/syscred/trec_retriever.py Main retrieval logic
Evaluation metrics 02_Code/syscred/eval_metrics.py MAP, NDCG, P@K, MRR
IR models 02_Code/syscred/ir_engine.py BM25, TF-IDF, QLD
- 02_Code/syscred/trec_dataset.py Dataset loader
- 02_Code/demo_trec.py Demo script
- 02_Code/syscred/test_trec_integration.py Integration tests

7. Code Search Results

GitHub code search found 54 occurrences of "TREC" in the systemFactChecking repository, including:

  1. Module imports and exports
  2. Function implementations
  3. Test cases
  4. Documentation strings
  5. Configuration parameters
  6. Demo scripts

This extensive integration demonstrates that TREC is not just referenced but is a core component of the fact-checking system.


8. Recent Updates (2026)

Version History

  • v2.3.0 (Feb 2026): TREC integration marked as "NEW"
  • v2.2 (Jan 29, 2026): GraphRAG, interactive graph visualization
  • v2.0 (Jan 2026): Complete rewrite with modular architecture
  • v1.0 (Apr 2025): Initial prototype

The TREC integration represents a significant enhancement in v2.3.0, bridging classic Information Retrieval evaluation with modern neuro-symbolic fact-checking.


9. Publications and Documentation

Related to TREC_AP_88-90

  • Evaluation of Information Retrieval Models and Query Expansion on the TREC AP 88-90 Collection
  • Evaluation de Modeles de Recherche d'Information
  • Evaluation de modeles de ponderations pour la recherche d'information sur TREC AP 88-90

Related to systemFactChecking

  • Modeling and Hybrid System for Verification of Sources Credibility
  • Ontology of a Verification System
  • SysCRED Documentation (21,031 bytes)

10. Conclusion

Summary of Findings

✅ CONFIRMED: The TREC AP 88-90 repository is fully implemented and integrated in the systemFactChecking repository.

Key Integration Points

  1. Code Reuse: Core TREC retrieval logic adapted to trec_retriever.py
  2. Methodology: BM25 parameters and evaluation metrics directly transferred
  3. Architecture: TREC forms the IR backbone of the credibility verification system
  4. Documentation: Explicit citations linking the two repositories
  5. Testing: Comprehensive test suite validates TREC integration

Integration Quality

  • Depth: Deep integration as core component, not superficial reference
  • Completeness: All major TREC features (retrieval, metrics, PRF) included
  • Maintenance: Active development with v2.3.0 release in Feb 2026
  • Documentation: Well-documented with citations and examples

Recommendation

The TREC implementation in systemFactChecking is production-ready and represents a successful bridge between classic Information Retrieval (TREC) and modern AI-powered fact-checking systems.


Appendix: Repository Links


Analysis Date: February 3, 2026
Analyst: GitHub Copilot Agent
Analysis Type: Cross-Repository Implementation Verification

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for DomLoyer/TREC_AP_88-90

Finetuned
(730)
this model

Dataset used to train DomLoyer/TREC_AP_88-90

Collection including DomLoyer/TREC_AP_88-90