💜 GitHub   |   🤗 Hugging Face   |   📚 Cookbooks  
🖥️ Demo  

# 🏆 sherif1313/Arabic-GLM-OCR-v2

A powerful Arabic OCR model and a proficient learner.

## 📌 Overview

This model is an advanced Arabic OCR system designed to combine deep linguistic understanding with high accuracy in visual text extraction.

The model was trained using a unique strategy focused on:

- Reducing the model's active capacity during training
- Maintaining the stability of visual features
- Promoting genuine language understanding rather than rote memorization

## 🚀 Key Features

🔹 **Model size:** approximately 2 GB
🔹 **Performance:** outperforms much larger models on most tasks
🔹 **Type:** robust learner (requires precise control during inference)

✅ Deep understanding of Arabic language context
✅ Intelligent spelling correction
✅ High visual accuracy in text extraction
✅ Noise reduction
✅ Highly stable training behavior
✅ Strong generalization to unseen data

## 🧪 Evaluation Results

| Metric | Value |
| --- | --- |
| Evaluation loss | 0.1041 |
| Training-evaluation gap | 0% - 2.5% (excellent stability) |

📌 This indicates near-perfect training equilibrium with minimal overfitting.

## 🧠 Training Philosophy

### 1. Reduce Training Capacity

The model was trained using only half its capacity in order to:

- Preserve visual representations
- Prevent image deterioration
- Improve overall stability

### 2. From "Memorizing Shapes" to "Learning Rules"

Instead of memorizing word shapes, the model now learns grammar rules and image-text relationships.

### 3. Controlling Inference

The training included:

- Reducing excessive inference
- Limiting the linking of complex ideas
- Reverting processed information to its original size before output

🎯 **Objective:** force the model to copy text accurately instead of paraphrasing it.
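As a quick illustration, the exact-copy directive can be wired into a chat payload. This is a minimal sketch: `build_exact_copy_messages` is a hypothetical helper name, the message schema mirrors the usage example later in this card, and the directive string is the one this card recommends.

```python
def build_exact_copy_messages(image_path: str) -> list:
    """Build a chat payload that asks for verbatim extraction, not paraphrase."""
    # Directive recommended by this model card to suppress autocorrection.
    directive = "Extract the text exactly as it is, without correction or paraphrasing."
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_path},
                {"type": "text", "text": directive},
            ],
        }
    ]

messages = build_exact_copy_messages("page_01.png")
print(messages[0]["content"][1]["text"])
```

The resulting `messages` list can be passed to `processor.apply_chat_template(...)` exactly as in the usage section below.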

### 4. Multilevel Reasoning Capability

The model was given internal inference capabilities during:

- Reading the page
- Analyzing the text
- Generating output

This leads to:

- Better understanding of unseen data
- Stronger real-world performance

## ⚙️ Inference Settings (Very Important)

⚠️ This is a powerful learner; it requires precise control during inference.

## 🎯 Use Cases

📄 OCR for Arabic books
📰 Text extraction from images
📚 Manuscript digitization
🧾 Document processing
🔍 Text enhancement after OCR

## 📦 Why is the model small?

Despite its small size (approximately 2 GB), its outstanding performance is due to:

- An effective training methodology
- Minimized cognitive noise
- A focus on significant patterns
- Highly efficient representation learning

## 🏁 Conclusion

This model achieves a rare balance between:

- Visual accuracy 👁️
- Language comprehension 🧠
- Training stability ⚖️

💡 It can be considered a sophisticated model for Arabic OCR, competing with larger systems.

| License | Model Size | Python |
| --- | --- | --- |
| Apache-2.0 | 2.2 GB | 3.12 |

## ⚠️ Important Notes

In some cases, the model may attempt to correct the text if it is not properly configured. For exact copying, use a clear prompt such as: "Extract the text as is, without modification."

โŒ Do not use high temperature settings โ†’ will cause hallucinations. โœ… Use "Restricted" settings for optimal accuracy. โœ… Best suited for OCR tasks, not creative writing. Send feedback Press tab for actions

Recommended settings include:

with torch.no_grad():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.4,         # low temperature keeps output close to the source text
        top_p=0.9,
        repetition_penalty=1.1,  # discourages looping on repeated phrases
    )


## 🛠️ How to use it

git clone https://github.com/zai-org/glm-ocr.git
cd glm-ocr
uv venv --python 3.12 --seed && source .venv/bin/activate
uv pip install -e .

from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

MODEL_PATH = "sherif1313/Arabic-GLM-OCR-v2"
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "url": "test_image.png"
            },
            {
                "type": "text",
                "text": "Text Recognition:"
            }
        ],
    }
]
processor = AutoProcessor.from_pretrained(MODEL_PATH)
model = AutoModelForImageTextToText.from_pretrained(
    pretrained_model_name_or_path=MODEL_PATH,
    torch_dtype="auto",
    device_map="auto",
)
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)
inputs.pop("token_type_ids", None)
generated_ids = model.generate(**inputs, max_new_tokens=2018)
output_text = processor.decode(generated_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False)
print(output_text)
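Since the decode above keeps special tokens (`skip_special_tokens=False`), the raw output may still contain chat-template markers. Below is a hedged cleanup sketch that assumes `<|...|>`-style tokens; the exact token format depends on the GLM-OCR tokenizer, and the simpler alternative is just passing `skip_special_tokens=True` to `decode`.

```python
import re

def strip_special_tokens(text: str) -> str:
    # Assumption: special tokens look like <|assistant|> or <|endoftext|>;
    # verify against the actual tokenizer before relying on this pattern.
    return re.sub(r"<\|[^|>]*\|>", "", text).strip()

cleaned = strip_special_tokens("<|assistant|>extracted text<|endoftext|>")
print(cleaned)  # → extracted text
```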

## 🛠️ How to use it (web UI)


import gradio as gr
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch
from PIL import Image
import re  

# --- MODEL CONFIGURATION ---
MODEL_PATH = "sherif1313/Arabic-GLM-OCR-v2"

# Detect the device automatically
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
print(f"🚀 OCR engine starting: Device={device} | Dtype={dtype}")

# --- MODEL INITIALIZATION (with error checking) ---
try:
    print("⏳ Loading processor...")
    processor = AutoProcessor.from_pretrained(MODEL_PATH, trust_remote_code=True)

    print("⏳ Loading model (this may take a few minutes)...")
    model = AutoModelForImageTextToText.from_pretrained(
        MODEL_PATH,
        dtype=dtype,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        device_map="auto"
    )
    model.eval()
    print("✅ Model ready to use!")
except Exception as e:
    print(f"❌ Failed to load the model: {e}")
    raise  # stop execution if the model fails to load

# --- EXAMPLE IMAGES (make sure these files exist in the same folder as this script) ---
EXAMPLE_IMAGES = [
    
]

# --- OCR FUNCTION ---

def proses_intelijen(image):
    if image is None:
        return "⚠️ Please upload an image first."

    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": "Text Recognition:"}
            ],
        }
    ]

    try:
        # --- Process the image and generate text (same as the standalone example above) ---
        inputs = processor.apply_chat_template(
            messages,
            add_generation_prompt=True,
            tokenize=True,
            return_dict=True,
            return_tensors="pt"
        ).to(model.device)

        with torch.no_grad():
            generated_ids = model.generate(
                **inputs,
                max_new_tokens=512,
                do_sample=False
            )

        hasil = generated_ids[0][len(inputs["input_ids"][0]):]
        teks_final = processor.decode(hasil, skip_special_tokens=True)

        # ----------------------------------------------------------------
        # --- Advanced cleanup: strip repetition, HTML tags, stray dots ---
        # ----------------------------------------------------------------

        # 1. Remove leftover HTML tags (e.g. <html>, <td>, etc.)
        teks_final = re.sub(r'<[^>]+>', '', teks_final)

        # 2. Collapse consecutive repetitions of the same sentence
        # (very important for this model's failure mode). This looks for
        # any span of text that appears two or more times in a row and
        # replaces it with a single occurrence.
        # (.{10,}?) captures a span of 10+ characters (avoids matching short repeats)
        # (\s+\1)+ matches whitespace followed by that same span, repeated
        teks_final = re.sub(r'(\b.{10,}?)(\s+\1)+', r'\1', teks_final)



        # ----------------------------------------------------------------

        return teks_final

    except Exception as e:
        return f"🚨 An error occurred: {str(e)}"

# --- GRADIO INTERFACE ---
css_custom = """
.container { max-width: 1200px; margin: auto; padding-top: 20px; }
h1 { text-align: center; color: #3b82f6; }
"""

with gr.Blocks(css=css_custom, theme=gr.themes.Soft(), title="Arabic GLM-OCR") as app:
    with gr.Column(elem_classes="container"):
        gr.Markdown("# Arabic GLM-OCR")
        gr.Markdown("Arabic OCR powered by GLM-OCR.")

        with gr.Row():
            with gr.Column(scale=1):
                input_img = gr.Image(type="pil", label="Upload Image", height=450)
                scan_btn = gr.Button("🚀 START SCAN", variant="primary", size="lg")

            with gr.Column(scale=1):
                output_txt = gr.Textbox(label="Extracted Text", lines=24)

        # Add clickable example images
        gr.Examples(
            examples=EXAMPLE_IMAGES,
            inputs=input_img,
            outputs=output_txt,
            fn=proses_intelijen,
            cache_examples=False,  # set to True to speed this up (uses disk space)
            label="Example images (click to load)"
        )

    # Wire the button to the OCR function
    scan_btn.click(fn=proses_intelijen, inputs=input_img, outputs=output_txt)

if __name__ == "__main__":
    app.queue().launch(allowed_paths=["examples"])
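The two cleanup regexes in `proses_intelijen` can be exercised in isolation; the sample string below is made up for demonstration, but the patterns are the ones used in the script:

```python
import re

raw = "<td>This sentence repeats. This sentence repeats.</td>"

# 1. Strip leftover HTML tags.
text = re.sub(r'<[^>]+>', '', raw)

# 2. Collapse consecutive duplicated spans of 10+ characters into one.
text = re.sub(r'(\b.{10,}?)(\s+\1)+', r'\1', text)

print(text)  # → This sentence repeats.
```

The 10-character minimum in the capture group is the guard against collapsing legitimate short repeats (single words or particles that genuinely occur twice).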
---

**Model format:** Safetensors · **Model size:** 1B params · **Tensor type:** F16
**Base model:** zai-org/GLM-OCR (this model is a fine-tune of it)