SOP Heading Enricher

A Next.js application that uses a local LLM (Llama 3 via Ollama) to automatically detect and format headings in SOP (Standard Operating Procedure) documents.

The Problem

Many SOP documents have headings styled identically to body text: same font size, no bold formatting. This makes it impossible for automated parsers (like python-docx) to reliably identify section boundaries for RAG chunking.

The Solution

This tool:

  1. Parses your .docx SOP file and extracts all paragraphs
  2. Sends the paragraphs to your locally running Llama 3 model via Ollama
  3. Identifies which paragraphs are headings vs body text
  4. Increases the font size of detected headings and makes them bold, writing the result to a new .docx file
  5. Lets you review the AI's work in a split-screen view and manually correct any mistakes
  6. Downloads the enriched document ready for your parser

Prerequisites

  • Node.js 18+
  • Ollama installed and running locally (install guide)
  • Llama 3 model pulled in Ollama

Install Ollama and pull Llama 3:

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull Llama 3
ollama pull llama3

# Verify it's running
curl http://localhost:11434/api/tags

Quick Start

# Clone the repo
git clone https://huggingface.co/dwijverma2/sop-heading-enricher
cd sop-heading-enricher

# Install dependencies
npm install

# Start the dev server
npm run dev

Open http://localhost:3000 in your browser.

Usage

  1. Upload: Drop a .docx SOP file into the upload area
  2. Enrich: Click "Enrich with AI" to run heading detection
  3. Review: The split-screen shows original (left) vs enriched (right)
  4. Adjust: Toggle checkboxes to correct any misclassified paragraphs
  5. Apply: Click "Apply Changes" to regenerate the document with your corrections
  6. Download: Click "Download Enriched" to save the formatted file

Configuration

Heading Font Size

Use the dropdown in the control bar to set the heading font size (14pt to 28pt). Default is 18pt.

Model Selection

You can use any model available in your Ollama installation. Type the model name in the "Model" field. Default is llama3.
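Before enriching a document, it can help to confirm that the model typed into the "Model" field is actually pulled. The following is a minimal sketch, assuming the JSON shape returned by Ollama's GET /api/tags endpoint (a `models` array whose entries have a `name` such as `llama3:latest`). The helper name `hasModel` and the sample payload are hypothetical and not part of this repo.

```javascript
// Hypothetical helper: checks whether a model name appears in the parsed
// JSON returned by Ollama's GET /api/tags endpoint.
function hasModel(tagsResponse, wanted) {
  const models = Array.isArray(tagsResponse?.models) ? tagsResponse.models : [];
  // A name without an explicit tag (e.g. "llama3") is treated as ":latest".
  const normalize = (name) => (name.includes(":") ? name : `${name}:latest`);
  const target = normalize(wanted.trim());
  return models.some(
    (m) => typeof m.name === "string" && normalize(m.name) === target
  );
}

// Illustrative payload only, not real output from any machine.
const sampleTags = {
  models: [{ name: "llama3:latest" }, { name: "mistral:7b" }],
};

console.log(hasModel(sampleTags, "llama3"));  // true
console.log(hasModel(sampleTags, "mistral")); // false: only the 7b tag is present
```

In a real check, `sampleTags` would be replaced by the parsed body of a request to `/api/tags` on your Ollama URL.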

Custom Prompt

Click "Custom Prompt" to override the system prompt used for heading detection. Useful if your SOPs have domain-specific heading patterns.

Ollama URL

By default, the app connects to Ollama at http://localhost:11434. To change this, set the environment variable:

OLLAMA_URL=http://192.168.1.100:11434 npm run dev
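How the app reads this variable internally is not shown here. As a rough sketch of the idea, assuming a hypothetical helper named `resolveOllamaUrl`, server-side code could fall back to the documented default like this:

```javascript
// Hypothetical helper: resolves the Ollama base URL from an env-like object,
// falling back to the documented default when OLLAMA_URL is unset or blank.
const DEFAULT_OLLAMA_URL = "http://localhost:11434";

function resolveOllamaUrl(env = process.env) {
  const raw = (env.OLLAMA_URL || "").trim();
  // Strip trailing slashes so paths like "/api/tags" can be appended safely.
  return (raw || DEFAULT_OLLAMA_URL).replace(/\/+$/, "");
}

console.log(resolveOllamaUrl({}));
// http://localhost:11434
console.log(resolveOllamaUrl({ OLLAMA_URL: "http://192.168.1.100:11434/" }));
// http://192.168.1.100:11434
```

Passing the environment in as an argument keeps the function easy to test without touching `process.env`.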

Project Structure

sop-heading-enricher/
├── app/
│   ├── api/
│   │   ├── parse-docx/route.js    # Upload & parse .docx
│   │   ├── enrich/route.js         # LLM heading detection
│   │   ├── apply-changes/route.js  # Manual heading corrections
│   │   └── download/route.js       # Download enriched .docx
│   ├── globals.css                 # Tailwind + custom styles
│   ├── layout.js                   # Root layout
│   └── page.js                     # Main page component
├── components/
│   ├── UploadArea.js               # Drag & drop file upload
│   ├── SplitView.js                # Side-by-side document comparison
│   ├── ControlBar.js               # Enrichment controls & settings
│   └── Toast.js                    # Notification toasts
├── lib/
│   ├── docx-service.js             # DOCX parsing & modification (PizZip + XML)
│   ├── llm-service.js              # Ollama/Llama3 API integration
│   └── file-store.js               # In-memory session storage
├── package.json
├── next.config.js
├── postcss.config.js
├── jsconfig.json
└── README.md

How It Works (Technical)

DOCX Parsing

A .docx file is a ZIP archive containing XML. We use PizZip to unzip it and fast-xml-parser to parse/modify the document.xml inside. Font sizes in OOXML are stored in half-points (24pt = "48").
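To make the half-point convention and the paragraph structure concrete, here is a dependency-free sketch. It is an illustration only: it uses regular expressions instead of fast-xml-parser, skips the unzip step, and ignores nested paragraphs, self-closing empty paragraphs, and XML entity decoding. All function names and the sample XML are hypothetical.

```javascript
// Font sizes in OOXML (w:sz) are stored in half-points, as strings.
function pointsToHalfPoints(pt) {
  return String(Math.round(pt * 2));
}

function halfPointsToPoints(val) {
  return Number(val) / 2;
}

// Simplified extraction of paragraph text and the first explicit font size.
// NOT how the app does it (the app parses the XML with fast-xml-parser).
function extractParagraphs(documentXml) {
  const paragraphs = [];
  // Matches <w:p> or <w:p attr="...">, but not <w:pPr>, <w:pStyle/>, etc.
  const paragraphRe = /<w:p(?:\s[^>\/]*)?>([\s\S]*?)<\/w:p>/g;
  for (const [, inner] of documentXml.matchAll(paragraphRe)) {
    // Matches <w:t> or <w:t xml:space="preserve">, but not <w:tab/> or <w:tbl>.
    const textRe = /<w:t(?:\s[^>]*)?>([\s\S]*?)<\/w:t>/g;
    const text = [...inner.matchAll(textRe)].map((m) => m[1]).join("");
    const size = inner.match(/<w:sz w:val="(\d+)"\s*\/>/);
    paragraphs.push({
      text,
      sizePt: size ? halfPointsToPoints(size[1]) : null,
    });
  }
  return paragraphs;
}

// Hand-written sample fragment, standing in for word/document.xml.
const sampleXml =
  "<w:body>" +
  '<w:p><w:pPr><w:pStyle w:val="Normal"/></w:pPr>' +
  '<w:r><w:rPr><w:sz w:val="22"/></w:rPr><w:t>1. Purpose</w:t></w:r></w:p>' +
  '<w:p><w:r><w:t xml:space="preserve">This SOP </w:t></w:r>' +
  "<w:r><w:tab/><w:t>applies.</w:t></w:r></w:p>" +
  "</w:body>";

console.log(pointsToHalfPoints(24)); // 48
console.log(extractParagraphs(sampleXml));
```

In the sample, the would-be heading "1. Purpose" is stored at 11pt (`w:val="22"`), the same size as ordinary body text, which is exactly the situation described in The Problem.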

LLM Integration

All paragraphs are batched into a single Ollama API call with a structured prompt. We use Ollama's format: "json" option to constrain the model's output to valid JSON containing the heading indices.
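A batched request of this kind could look roughly like the sketch below. It targets Ollama's POST /api/chat endpoint with `stream: false` and `format: "json"`. The prompt wording, the `{"headings": [...]}` reply schema, and every function name are assumptions made for illustration; the actual prompt and schema in lib/llm-service.js may differ.

```javascript
// Builds the JSON body for Ollama's POST /api/chat endpoint.
// Prompt text and reply schema are assumptions, not the app's real prompt.
function buildEnrichRequest(paragraphs, model = "llama3") {
  const numbered = paragraphs.map((text, i) => `[${i}] ${text}`).join("\n");
  return {
    model,
    stream: false,  // one complete response instead of streamed chunks
    format: "json", // ask Ollama to constrain output to valid JSON
    messages: [
      {
        role: "system",
        content:
          'You label SOP paragraphs. Reply with JSON like {"headings": [0, 3]}.',
      },
      { role: "user", content: numbered },
    ],
  };
}

// Valid JSON is not necessarily the expected shape, so validate defensively.
function parseHeadingIndices(content, paragraphCount) {
  let parsed;
  try {
    parsed = JSON.parse(content);
  } catch {
    return [];
  }
  const raw = Array.isArray(parsed?.headings) ? parsed.headings : [];
  const valid = raw.filter(
    (i) => Number.isInteger(i) && i >= 0 && i < paragraphCount
  );
  return [...new Set(valid)].sort((a, b) => a - b);
}

// Hypothetical wiring. Needs a running Ollama server, so it is not called here.
async function detectHeadings(paragraphs, baseUrl = "http://localhost:11434") {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildEnrichRequest(paragraphs)),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = await res.json();
  return parseHeadingIndices(data.message?.content ?? "", paragraphs.length);
}
```

The validation step matters because `format: "json"` only governs syntax: the model can still return out-of-range or duplicate indices, which `parseHeadingIndices` drops.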

Formatting

Detected headings get their w:sz (font size) and w:b (bold) properties modified in the XML tree, then the ZIP is repacked into a new .docx.
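As a sketch of that modification step, the function below updates a run-properties (w:rPr) node represented as a plain object. The object shape assumes fast-xml-parser output with attributes kept under an "@_" prefix, which depends on parser options and may not match what lib/docx-service.js actually uses. The function name is hypothetical.

```javascript
// Returns a copy of a run-properties (w:rPr) object with heading formatting.
// Assumed shape: parsed XML with attributes stored under the "@_" prefix.
function applyHeadingFormat(rPr, headingPt = 18) {
  const halfPoints = String(Math.round(headingPt * 2)); // 18pt -> "36"
  return {
    ...(rPr || {}),
    "w:b": { "@_w:val": "1" },         // bold on
    "w:sz": { "@_w:val": halfPoints }, // font size in half-points
  };
}

// Illustrative body-text run properties: Calibri at 11pt, not bold.
const bodyRun = {
  "w:rFonts": { "@_w:ascii": "Calibri" },
  "w:sz": { "@_w:val": "22" },
};

const headingRun = applyHeadingFormat(bodyRun, 18);
console.log(headingRun["w:sz"]["@_w:val"]); // 36
console.log(headingRun["w:b"]["@_w:val"]);  // 1
console.log(bodyRun["w:sz"]["@_w:val"]);    // 22 (the input is not mutated)
```

Returning a copy rather than mutating in place keeps the original paragraph data available, which is convenient when a reviewer later unchecks a misclassified heading.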

Tech Stack

  • Next.js 15 (App Router, JavaScript)
  • Tailwind CSS 4
  • PizZip: ZIP manipulation for .docx files
  • fast-xml-parser: XML parsing/building for OOXML
  • Ollama: Local LLM inference server

License

MIT
