A powerful Arabic OCR model (proficient learner)
Overview
This model is an advanced Arabic OCR system designed to combine deep linguistic understanding with high accuracy in visual text extraction.
The model was trained using a unique strategy focused on:
- Reducing the model's active capacity during training
- Maintaining the stability of visual features
- Promoting genuine language understanding rather than rote memorization
Key Features
- Model size: Approximately 2 GB
- Performance: Outperforms much larger models in most tasks
- Type: Strong learner (requires carefully controlled inference settings)
- Deep understanding of Arabic language context
- Intelligent spelling correction
- High visual accuracy in text extraction
- Noise reduction
- Highly stable training behavior
- Strong generalization on unseen data

Evaluation Results
| Metric | Value |
|---|---|
| Evaluation loss | 0.1041 |
| Training-evaluation gap | 0% - 2.5% |
| Stability | Excellent |

This indicates a near-perfect balance between training and evaluation, with minimal overfitting.
Training Philosophy
1. Reduce Training Capacity
The model was trained using only half its capacity in order to:
- Preserve visual representations
- Prevent degradation of image features
- Improve overall stability

2. From "Memorizing Shapes" to "Learning Rules"
Instead of memorizing word shapes, the model now learns grammar rules and image-text relationships.

3. Controlling Inference
The training included:
- Reducing excessive inference
- Limiting the linking of complex ideas
- Restoring processed information to its original size before output

Objective: force the model to copy text accurately instead of paraphrasing it.

4. Multilevel Reasoning Capability
The model was given internal reasoning capabilities during:
- Reading the page
- Analyzing the text
- Generating output

This leads to:
- Better understanding of unseen data
- Stronger real-world performance

Inference Settings (Very Important)
This is a powerful learner, so it requires precise control during inference.
Use Cases
- OCR for Arabic books
- Text extraction from images
- Manuscript digitization
- Document processing
- Text enhancement after OCR

Important Notes
The model may attempt to autocorrect the text if it is not properly constrained. To copy text exactly, use a directive such as: "Extract the text exactly as it is, without correction or paraphrasing."
Why is the model small?
Despite its small size (approximately 2 GB), its strong performance is due to:
- Effective training methodology
- Minimized cognitive noise
- Focus on significant patterns
- Highly efficient representation learning

Conclusion
This model achieves a rare balance between:
- Visual accuracy
- Language comprehension
- Training stability

It can be considered a sophisticated model for Arabic OCR that competes with larger systems.
| License | Model Size | Python |
|---|---|---|
| Apache-2.0 | 2.2GB | 3.12 |
Important Notes
In some cases, the model may attempt to correct the text if it is not properly configured. For exact copying, use a clear prompt such as: "Extract the text as is, without modification".
- Do not use high temperature settings; they cause hallucinations.
- Use "Restricted" settings for optimal accuracy.
- Best suited for OCR tasks, not creative writing.
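The exact-copy directive from the notes above goes in the text part of the chat message. The sketch below builds such a message in the same layout as the usage script in this card. The helper name `build_messages` and the constant `STRICT_INSTRUCTION` are illustrative assumptions, and whether the model follows a custom directive as reliably as the default `Text Recognition:` prompt is not confirmed here.

```python
# Hypothetical helper: not part of transformers or of this model's code.
STRICT_INSTRUCTION = (
    "Extract the text exactly as it is, without correction or paraphrasing."
)


def build_messages(image_ref, instruction=STRICT_INSTRUCTION):
    """Return a single-turn chat message: one image plus one text instruction."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_ref},
                {"type": "text", "text": instruction},
            ],
        }
    ]


# "test_image.png" is the placeholder file name used elsewhere in this card.
messages = build_messages("test_image.png")
```

The resulting `messages` list can be passed to `processor.apply_chat_template(...)` in place of the hard-coded list.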
Recommended Settings
The recommended generation settings are:
with torch.no_grad():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=512,  # caps output length, which limits repetition loops
        do_sample=True,
        temperature=0.4,
        top_p=0.9,
        repetition_penalty=1.1,
    )
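This card shows two decoding setups: the sampling settings listed here, and the greedy decoding (`do_sample=False`) used in the web demo script. A minimal sketch for switching between them is shown below. The helper name `generation_kwargs` and its `deterministic` flag are assumptions made for illustration; only the keyword names and values come from this card.

```python
# Hypothetical helper that returns keyword arguments for model.generate().
def generation_kwargs(deterministic=False, max_new_tokens=512):
    """Return greedy settings when deterministic, else the sampling settings above."""
    if deterministic:
        # Greedy decoding, as in the web demo script: no sampling at all.
        return {"max_new_tokens": max_new_tokens, "do_sample": False}
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": 0.4,
        "top_p": 0.9,
        "repetition_penalty": 1.1,
    }


# Usage (assumes `model` and `inputs` already exist, as in the usage script):
# generated_ids = model.generate(**inputs, **generation_kwargs(deterministic=True))
```

Greedy decoding returns the same output for the same image on every run, which is usually what exact transcription needs.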
git clone https://github.com/zai-org/glm-ocr.git
cd glm-ocr
uv venv --python 3.12 --seed && source .venv/bin/activate
uv pip install -e .
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

MODEL_PATH = "sherif1313/Arabic-GLM-OCR-v2"

# One user turn: the image to read, followed by the OCR prompt.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "url": "test_image.png"
            },
            {
                "type": "text",
                "text": "Text Recognition:"
            }
        ],
    }
]

processor = AutoProcessor.from_pretrained(MODEL_PATH)
model = AutoModelForImageTextToText.from_pretrained(
    pretrained_model_name_or_path=MODEL_PATH,
    torch_dtype="auto",
    device_map="auto",
)

# Apply the chat template, tokenize, and move the tensors to the model's device.
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)
inputs.pop("token_type_ids", None)

generated_ids = model.generate(**inputs, max_new_tokens=2018)
# Decode only the newly generated tokens (everything after the prompt).
output_text = processor.decode(generated_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False)
print(output_text)
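The decode step above slices the generated sequence at the prompt length. The slicing in the script implies that `generate` returns the prompt tokens followed by the new tokens. A minimal sketch of that slicing on plain Python lists follows; the token IDs are made-up values, and `strip_prompt` is a hypothetical helper, not a transformers function.

```python
# Hypothetical helper illustrating the prompt-trimming slice on plain lists.
def strip_prompt(generated_ids, prompt_length):
    """Return only the tokens that come after the first prompt_length tokens."""
    if prompt_length > len(generated_ids):
        raise ValueError("prompt_length exceeds the generated sequence length")
    return generated_ids[prompt_length:]


prompt_ids = [101, 7, 8, 9]          # made-up prompt token IDs
full_output = prompt_ids + [42, 43]  # prompt tokens followed by new tokens
new_tokens = strip_prompt(full_output, len(prompt_ids))
print(new_tokens)  # [42, 43]
```

Without this slice, the decoded string would begin with the prompt text instead of the recognized text.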
How to use it on the web
import gradio as gr
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch
from PIL import Image
import re

# --- MODEL CONFIGURATION ---
MODEL_PATH = "sherif1313/Arabic-GLM-OCR-v2"

# Detect the device automatically
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
print(f"OCR engine starting: Device={device} | Dtype={dtype}")

# --- MODEL INITIALIZATION (with error checking) ---
try:
    print("Loading processor...")
    processor = AutoProcessor.from_pretrained(MODEL_PATH, trust_remote_code=True)
    print("Loading model (this may take a few minutes)...")
    model = AutoModelForImageTextToText.from_pretrained(
        MODEL_PATH,
        dtype=dtype,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        device_map="auto"
    )
    model.eval()
    print("Model is ready to use!")
except Exception as e:
    print(f"Failed to load model: {e}")
    raise  # Stop execution if the model fails to load

# --- EXAMPLE IMAGES (make sure these files are in the same folder as the script) ---
EXAMPLE_IMAGES = [
]

# --- OCR FUNCTION ---
def proses_intelijen(image):
    if image is None:
        return "Please upload an image first."
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": "Text Recognition:"}
            ],
        }
    ]
    try:
        # --- Process the image and generate the text ---
        inputs = processor.apply_chat_template(
            messages,
            add_generation_prompt=True,
            tokenize=True,
            return_dict=True,
            return_tensors="pt"
        ).to(model.device)
        with torch.no_grad():
            generated_ids = model.generate(
                **inputs,
                max_new_tokens=512,
                do_sample=False
            )
        # Keep only the newly generated tokens (drop the prompt).
        hasil = generated_ids[0][len(inputs["input_ids"][0]):]
        teks_final = processor.decode(hasil, skip_special_tokens=True)

        # --- Cleanup logic (remove HTML tags and repeated phrases) ---
        # 1. Remove HTML tags (such as <html>, <td>, etc.)
        teks_final = re.sub(r'<[^>]+>', '', teks_final)
        # 2. Remove consecutively repeated sentences.
        # The pattern finds any sentence or group of words that appears two or
        # more times in a row and replaces it with a single occurrence.
        # (.{10,}?) captures at least 10 characters (so short repeats are left alone)
        # (\s+\1)+ requires whitespace followed by the same captured text, one or more times
        teks_final = re.sub(r'(\b.{10,}?)(\s+\1)+', r'\1', teks_final)
        return teks_final
    except Exception as e:
        return f"An error occurred: {str(e)}"

# --- GRADIO INTERFACE ---
css_custom = """
.container { max-width: 1200px; margin: auto; padding-top: 20px; }
h1 { text-align: center; color: #3b82f6; }
"""

with gr.Blocks(css=css_custom, title="Arabic GLM-OCR", theme=gr.themes.Soft()) as app:
    with gr.Column(elem_classes="container"):
        gr.Markdown("# Arabic GLM-OCR")
        gr.Markdown("Arabic OCR powered by GLM-OCR.")
        with gr.Row():
            with gr.Column(scale=1):
                input_img = gr.Image(type="pil", label="Upload Image", height=450)
                scan_btn = gr.Button("START SCAN", variant="primary", size="lg")
            with gr.Column(scale=1):
                output_txt = gr.Textbox(label="Extracted Text", lines=24)
        # Add clickable example images
        gr.Examples(
            examples=EXAMPLE_IMAGES,
            inputs=input_img,
            outputs=output_txt,
            fn=proses_intelijen,
            cache_examples=False,  # Set to True to speed things up (requires disk space)
            label="Example Images (click to load)"
        )
        # Connect the button to the OCR function
        scan_btn.click(fn=proses_intelijen, inputs=input_img, outputs=output_txt)

if __name__ == "__main__":
    app.queue().launch(allowed_paths=["examples"])
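The post-processing step inside `proses_intelijen` can be tried in isolation, without loading the model. A minimal sketch using the same two regular expressions is shown below. The helper name `clean_ocr_text` and the sample string are illustrative assumptions.

```python
import re

# Same patterns as in the web demo script above.
TAG_RE = re.compile(r"<[^>]+>")                # any HTML-like tag
REPEAT_RE = re.compile(r"(\b.{10,}?)(\s+\1)+")  # a 10+ character run repeated back to back


def clean_ocr_text(text):
    """Strip HTML-like tags, then collapse consecutive repeated phrases."""
    text = TAG_RE.sub("", text)
    return REPEAT_RE.sub(r"\1", text)


sample = "<td>hello world foo hello world foo end</td>"
print(clean_ocr_text(sample))  # hello world foo end
```

Two limits follow from the patterns themselves: repeats shorter than 10 characters are left untouched, and because `.` does not match a newline, a phrase that spans several lines is not collapsed.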
Base model: zai-org/GLM-OCR








