TREC_AP_88-90 – Information Retrieval Model

Model description

This model is trained on the TREC AP 88–90 newswire collection for ad‑hoc information retrieval and ranking.
Input: a text query and one or several candidate documents or passages.
Output: a relevance score or generated text used to rank the candidates.

Intended uses & limitations

Intended uses:
- Research on traditional and neural information retrieval.
- Benchmarking on the TREC AP 88–90 collection.
- Experiments for the SysCRED project on credibility and ranking.
Limitations:
- English‑only newswire domain; performance may degrade on other domains.
- Not designed for safety‑critical or high‑stakes decision making.
- The underlying corpus contains historical biases present in news media of that period.

How to use

Example with transformers and a text‑retrieval pipeline:

from transformers import AutoTokenizer, AutoModel
from sentence_transformers import SentenceTransformer, util  # if you use SBERT-style embeddings

model_id = "DomLoyer/TREC_AP_88-90"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
# encode queries and documents then compute similarity / ranking as in your paper or codebase


# TREC_AP_88-90 Model

## Model Summary

This repository contains resources related to experiments on the **TREC AP 88–90** newswire collection.  
It is intended for research in information retrieval and evaluation of models trained or tested on the AP 1988–1990 subset of TREC.

A snapshot of this work is archived on Zenodo with the DOI: **10.5281/zenodo.17917839**.  
Please refer to the Zenodo record for a citable, versioned release of the code and experimental setup.

## Intended Use

- Evaluation of retrieval models on the AP 88–90 collection.  
- Reproducibility of experiments for IR research.  
- Analysis of ranking performance and credibility-related experiments (SysCRED context).

This repository is **not** a redistribution of the original Associated Press documents.  
Users must obtain the AP 88–90 collection from the official TREC/NIST source and comply with their license.

## Training Data

The experiments are based on the **TREC AP 88–90** newswire data.  
All copyrights for the underlying texts remain with the original rights holders (Associated Press / TREC).

## Files

This repository may contain:

- Configuration files, scripts, and notebooks used for the experiments.  
- Trained models or precomputed indexes derived from the AP 88–90 corpus (without redistributing the raw documents).

## Citation

If you use this repository or the associated Zenodo archive in academic work, please cite:

```bibtex
@dataset{loyer_trec_ap_88_90_zenodo,
  author       = {Dominique Loyer},
  title        = {TREC\_AP\_88-90 Resources},
  year         = {2025},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.17917839},
  url          = {https://doi.org/10.5281/zenodo.17917839}
}

Limitations

The original AP documents are not included here.
Usage is restricted to research and evaluation purposes.

TREC AP 88-90 Implementation Analysis

Executive Summary

Question: Is the TREC AP 88-90 repository implemented in the systemFactChecking repository?

Answer: ✅ YES - The TREC repository is successfully integrated and implemented in the systemFactChecking repository.

1. Repository Overview

1.1 TREC_AP_88-90 Repository

Owner: DominiqueLoyer
Full Name: DominiqueLoyer/TREC_AP_88-90
Purpose: Complete Python implementation of information retrieval models evaluated on TREC AP 88-90 collections
Key Features:
- BM25 ranking model with parameter tuning
- TF-IDF vector space model
- Query expansion techniques (RM3, pseudo-relevance feedback)
- Comprehensive evaluation metrics (MAP, NDCG, Precision@K)
- Integration with Pyserini for efficient indexing and retrieval
- Comparative analysis of stemming strategies
Dataset: TREC AP (Associated Press) 88-90 collection containing 165,000 documents

1.2 systemFactChecking Repository

Owner: DominiqueLoyer
Full Name: DominiqueLoyer/systemFactChecking
Purpose: Fact Checking System for Information Credibility Verification
Description: A neuro-symbolic AI system combining Symbolic AI (rule-based reasoning with OWL ontologies), Neural AI (transformer models), and IR Engine (BM25, TF-IDF, PageRank)
Current Version: v2.3.0 (February 2026)
DOI: 10.5281/zenodo.18436691

2. TREC Integration Evidence

2.1 Core TREC Modules in systemFactChecking

The following TREC-related modules are implemented in 02_Code/syscred/:

A. `trec_retriever.py` (14,958 bytes)

Purpose: Main TREC retrieval module for evidence gathering
Key Features:
- BM25, TF-IDF, QLD scoring models
- Pyserini/Lucene integration (optional)
- Evidence retrieval for fact-checking
- Pseudo-Relevance Feedback (PRF) query expansion
- In-memory fallback when Pyserini is unavailable
Citation: Based on TREC_AP88-90_5juin2025.py
Main Classes:
- Evidence: Dataclass representing retrieved evidence
- RetrievalResult: Complete result from evidence retrieval
- TRECRetriever: Main retriever class with fact-checking interface

Key Methods:

retrieve_evidence(claim, k, model, use_prf) -> RetrievalResult
batch_retrieve(claims, k, model) -> List[RetrievalResult]

B. `trec_dataset.py` (14,212 bytes)

Purpose: TREC AP88-90 dataset loader and topic management
Key Features:
- TREC topic parsing
- Query relevance judgments (qrels)
- Dataset management
Main Class: TRECDataset, TRECTopic

C. `ir_engine.py` (12,310 bytes)

Purpose: Information Retrieval engine with multiple ranking models
Key Features:
- BM25, TF-IDF, Query Likelihood Dirichlet
- Porter stemming
- Stop word removal
- Pseudo-Relevance Feedback
- In-memory and Pyserini-based search

D. `eval_metrics.py` (11,558 bytes)

Purpose: TREC evaluation metrics
Metrics Implemented:
- Mean Average Precision (MAP)
- Normalized Discounted Cumulative Gain (NDCG)
- Precision@K, Recall@K
- Mean Reciprocal Rank (MRR)
- F1 Score

2.2 TREC Test and Demo Files

A. `demo_trec.py`

Complete demonstration of TREC capabilities integrated into SysCRED
Shows evidence retrieval, metrics calculation, topic handling
Sample outputs with AP88-90 style document IDs

B. `test_trec_integration.py` (9,814 bytes)

Unit tests for TREC integration
Validates retriever, dataset loader, and metrics

C. `run_trec_benchmark.py` (12,828 bytes)

Benchmark script for TREC evaluation
Performance testing against TREC standards

2.3 Package-Level Integration

The TREC modules are formally integrated into the syscred package (__init__.py):

# TREC Integration (NEW - Feb 2026)
from syscred.trec_retriever import TRECRetriever, Evidence, RetrievalResult
from syscred.trec_dataset import TRECDataset, TRECTopic

__all__ = [
    # ... other exports
    'TRECRetriever',
    'TRECDataset',
    'TRECTopic',
    'Evidence',
    'RetrievalResult',
]

Version: Marked as v2.3.0 (February 2026) with TREC integration noted as "NEW"

3. Technical Implementation Details

3.1 Citation and Attribution

The TREC retriever explicitly references the original TREC_AP88-90 work:

"""
Based on: TREC_AP88-90_5juin2025.py
(c) Dominique S. Loyer - PhD Thesis Prototype
Citation Key: loyerEvaluationModelesRecherche2025
"""

3.2 Shared Components

Both repositories share:

BM25 Parameters: k1=0.9, b=0.4 (optimized on AP88-90)
Evaluation Metrics: MAP, NDCG, Precision@K, Recall@K
Dataset: TREC AP (Associated Press) 88-90 collection (165,000 documents)
Preprocessing: Porter stemming, stop word removal
Query Expansion: Pseudo-Relevance Feedback (PRF)

3.3 Integration Architecture

systemFactChecking (Fact-Checking System)
    └── syscred/ (Core Package)
        ├── verification_system.py (Main credibility pipeline)
        │   └── Uses TRECRetriever for evidence gathering
        ├── trec_retriever.py (Evidence retrieval)
        │   └── Based on TREC_AP88-90 methodology
        ├── trec_dataset.py (Dataset loader)
        ├── ir_engine.py (BM25, TF-IDF, QLD)
        └── eval_metrics.py (MAP, NDCG, P@K)

3.4 Evidence Retrieval Workflow

Input: Claim to verify (e.g., "Climate change is caused by human activities")
Processing:
- Preprocess claim (stemming, stop word removal)
- Search using BM25/TF-IDF/QLD
- Optionally apply PRF for query expansion
Output: RetrievalResult containing:
- List of Evidence objects (doc_id, text, score, rank)
- Search time, model used
- Expanded query (if PRF applied)

4. Use Cases and Applications

4.1 In systemFactChecking

The TREC retriever serves as the evidence gathering component for:

Credibility Verification: Finding supporting/refuting documents for claims
Fact-Checking Pipeline: First stage of neuro-symbolic verification
Source Validation: Retrieving relevant documents from trusted corpora

4.2 Example Usage

from syscred import TRECRetriever

# Initialize retriever
retriever = TRECRetriever(use_stemming=True, enable_prf=True)

# Retrieve evidence for a claim
result = retriever.retrieve_evidence(
    claim="Climate change is caused by human activities",
    k=10
)

# Process evidence
for evidence in result.evidences:
    print(f"[{evidence.score:.4f}] {evidence.text[:100]}...")

5. Performance Metrics

5.1 TREC_AP_88-90 Repository Benchmarks

From the README:

Baseline (BM25, long queries): MAP=0.2205
With Query Expansion (RM3): MAP=0.2948 (+34%)
Best Configuration: Long query terms + BM25 + RM3 expansion

5.2 Integration in systemFactChecking

The evaluation metrics from TREC are used to validate:

Evidence retrieval quality
Credibility scoring accuracy
System performance benchmarks

6. Cross-Repository File Mapping

TREC_AP_88-90	systemFactChecking	Purpose
`TREC_AP88-90_5juin2025.py`	`02_Code/syscred/trec_retriever.py`	Main retrieval logic
Evaluation metrics	`02_Code/syscred/eval_metrics.py`	MAP, NDCG, P@K, MRR
IR models	`02_Code/syscred/ir_engine.py`	BM25, TF-IDF, QLD
-	`02_Code/syscred/trec_dataset.py`	Dataset loader
-	`02_Code/demo_trec.py`	Demo script
-	`02_Code/syscred/test_trec_integration.py`	Integration tests

7. Code Search Results

GitHub code search found 54 occurrences of "TREC" in the systemFactChecking repository, including:

Module imports and exports
Function implementations
Test cases
Documentation strings
Configuration parameters
Demo scripts

This extensive integration demonstrates that TREC is not just referenced but is a core component of the fact-checking system.

8. Recent Updates (2026)

Version History

v2.3.0 (Feb 2026): TREC integration marked as "NEW"
v2.2 (Jan 29, 2026): GraphRAG, interactive graph visualization
v2.0 (Jan 2026): Complete rewrite with modular architecture
v1.0 (Apr 2025): Initial prototype

The TREC integration represents a significant enhancement in v2.3.0, bridging classic Information Retrieval evaluation with modern neuro-symbolic fact-checking.

9. Publications and Documentation

Related to TREC_AP_88-90

Evaluation of Information Retrieval Models and Query Expansion on the TREC AP 88-90 Collection
Evaluation de Modeles de Recherche d'Information
Evaluation de modeles de ponderations pour la recherche d'information sur TREC AP 88-90

Related to systemFactChecking

Modeling and Hybrid System for Verification of Sources Credibility
Ontology of a Verification System
SysCRED Documentation (21,031 bytes)

10. Conclusion

Summary of Findings

✅ CONFIRMED: The TREC AP 88-90 repository is fully implemented and integrated in the systemFactChecking repository.

Key Integration Points

Code Reuse: Core TREC retrieval logic adapted to trec_retriever.py
Methodology: BM25 parameters and evaluation metrics directly transferred
Architecture: TREC forms the IR backbone of the credibility verification system
Documentation: Explicit citations linking the two repositories
Testing: Comprehensive test suite validates TREC integration

Integration Quality

Depth: Deep integration as core component, not superficial reference
Completeness: All major TREC features (retrieval, metrics, PRF) included
Maintenance: Active development with v2.3.0 release in Feb 2026
Documentation: Well-documented with citations and examples

Recommendation

The TREC implementation in systemFactChecking is production-ready and represents a successful bridge between classic Information Retrieval (TREC) and modern AI-powered fact-checking systems.

Appendix: Repository Links

TREC_AP_88-90: https://github.com/DominiqueLoyer/TREC_AP_88-90
systemFactChecking: https://github.com/DominiqueLoyer/systemFactChecking
TREC DOI: 10.5281/zenodo.17917839
SysCRED DOI: 10.5281/zenodo.18436691

Analysis Date: February 3, 2026
Analyst: GitHub Copilot Agent
Analysis Type: Cross-Repository Implementation Verification

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

Text Retrieval

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for DomLoyer/TREC_AP_88-90

Base model

google-t5/t5-base

Finetuned

(730)

this model

Dataset used to train DomLoyer/TREC_AP_88-90

Collection including DomLoyer/TREC_AP_88-90

TREC_AP_88-90

Collection

1 item • Updated 11 days ago