Extract Text and Knowledge from Images with Open Vision Language Models
Upload Your Images
Start with a folder of images containing text you want to extract. These could be handwritten recipes, documents, or any images with text content.
Upload them directly to AI Sheets:
The images appear in a spreadsheet format:
Apply AI Actions to Your Columns
Each column can be processed with AI actions. Click the overlay on any column to see available operations:
Image columns support text extraction, visual question answering, object detection, and custom actions. Text columns offer summarization, keyword extraction, and translation.
Extract Text Using OCR
AI Sheets includes a template for text extraction:
Here's an example handwritten recipe:
The default extraction captures all visible text:
MEMORANDUM:
From
To
1 Box Duncan Hines Yellow Cake Mix
1 Box instant lemon pudding
2/3 cups water
1/2 cup Mozola oil
4 eggs
Lemon flavoring to taste.
Put in mixing bowl and beat for 10 min.
and REMEMBER... for Quality PRINTING
CALL OR WRITE
Gatling & Pierce
PRINTERS
TELEPHONE 332-2579
22 YEARS OF SERVICE IN NORTHEASTERN CAROLINA
The default template extracts everything, including headers and footers. For cleaner results, use a custom prompt:
This produces focused recipe details:
- 1 box Duncan Hines Yellow Cake Mix
- 1 box instant lemon pudding
- 2/3 cups water
- 1/2 cup Mazola oil
- 4 eggs
- Lemon flavoring to taste
- Put in mixing bowl and beat for 10 minutes
Compare Vision Language Models for OCR Accuracy
The default model Qwen/Qwen2.5-VL-7B-Instruct handles most tasks well. For complex handwriting, try more powerful models like Qwen/Qwen3-VL-235B-A22B-Reasoning:
Comparison on difficult handwriting:
| Qwen/Qwen2.5-VL-7B-Instruct | Qwen/Qwen3-VL-235B-A22B-Reasoning |
|---|---|
| in large bowl combine meat, onion, bread crumbs 1/2 nutmeg & cheese - as you add sprinkle around. Then blend - Last sprinkle blend again Bake in large pan for 10-15 min. at 350. Let stand 5 min before serving. | in lg bowl combine meat, onion, bread crumbs 1/4 nutmeg & cheese - as you add sprinkle around. then blend - last spinach blend again. Bake in lg pan for 50-60 min. @ 350 - let stand 5 min before serving |
The larger model catches critical details like "spinach" and corrects the cooking time from "10-15 min" to "50-60 min."
Process Extracted Text
After extraction, transform the text into structured formats:
This creates formatted HTML for each recipe:
Transform Images
Apply image-to-image models for visual transformations. Convert to black and white:
Result:
Export Your Dataset
Export the processed dataset to Hugging Face Hub:
The final dataset is available at aisheets/unlocked-recipes.
Resources
Try AI Sheets directly or deploy locally from the GitHub repository. For questions, use the Community tab or open a GitHub issue.












