OCR Vision Character Model

This model is a character-level language model trained on OCR-extracted text from historical JFK documents.

Overview

This model was fine-tuned from GPT-2 using nanoGPT, Andrej Karpathy's training framework. The training data consists of text extracted from declassified JFK documents using Google Vision OCR.

Training Process

  1. Source Documents: PDF files were downloaded from the National Archives JFK document releases
  2. Text Extraction: Google Vision API was used to perform OCR on the PDF documents
  3. Model Training: The extracted text was used to fine-tune a GPT-2 model using the nanoGPT framework
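
The hand-off between steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the actual preparation script: it assumes a character-level vocabulary, as the description at the top states (a GPT-2 fine-tune would normally use GPT-2's BPE tokenizer instead), and the `pages` strings, function names, and 90/10 split are hypothetical stand-ins for the real OCR output and settings.

```python
from array import array


def build_vocab(text):
    """Map each distinct character to an integer id (sorted for stable ids)."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    """Turn a string into a list of character ids."""
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    """Turn a sequence of character ids back into a string."""
    return "".join(itos[i] for i in ids)


def split_and_pack(ids, train_fraction=0.9):
    """Split ids into train/validation parts, packed as unsigned 16-bit ints."""
    cut = int(len(ids) * train_fraction)
    return array("H", ids[:cut]), array("H", ids[cut:])


# Hypothetical stand-in for OCR output, one string per page.
pages = ["SECRET\nMEMORANDUM FOR THE RECORD", "SUBJECT: Meeting notes"]
text = "\n".join(pages)

stoi, itos = build_vocab(text)
ids = encode(text, stoi)
train_ids, val_ids = split_and_pack(ids)
```

Writing each array to disk with `array.tofile` would produce flat binary files of 16-bit token ids, which is the kind of `train.bin` / `val.bin` input nanoGPT's training script expects; the file names and the split ratio here are assumptions.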

Training Data

The model was trained on text extracted from the March 18, 2025 JFK document release from the National Archives.

Note: This is a work in progress. Future versions will be trained on all documents from the JFK release.

