
	---
	language:
	- ko
	pipeline_tag: image-to-text
	---

	# deplot_kr

	deplot_kr is a Korean image-to-data model based on Google's pix2struct architecture; the data it produces is a table in text form.
	It was fine-tuned from [DePlot](https://huggingface.co/google/deplot) on a dataset of 300,000 Korean chart image-text pairs.

	## How to use

	To run a prediction, pass a chart image to the model.
	The model predicts the chart's underlying data table in text form.

	```python
	from transformers import Pix2StructForConditionalGeneration, AutoProcessor
	from PIL import Image

	# Load the processor and the fine-tuned model from the Hub
	processor = AutoProcessor.from_pretrained("brainventures/deplot_kr")
	model = Pix2StructForConditionalGeneration.from_pretrained("brainventures/deplot_kr")

	image_path = "IMAGE_PATH"  # path to a chart image
	image = Image.open(image_path)

	# Convert the image into flattened patches and an attention mask
	inputs = processor(images=image, return_tensors="pt")
	# Generate the token ids of the linearized data table
	pred = model.generate(**inputs, max_length=1024)
	# Decode the generated ids back into text
	print(processor.batch_decode(pred, skip_special_tokens=True)[0])

	```

	Model Input Image
	![model_input_image](./sample.jpg)

	Model Output - Prediction

	대상:
	제목: 2011-2021 보건복지 분야 일자리의 <unk>증
	유형: 단일형 일반 세로 <unk>대형
	\| 보건(천 명) \| 복지(천 명)
	1분위 \| 29.7 \| 178.4
	2분위 \| 70.8 \| 97.3
	3분위 \| 86.4 \| 61.3
	4분위 \| 28.2 \| 16.0
	5분위 \| 52.3 \| 0.9



	### Preprocessing

	According to [Liu et al.(2023)](https://arxiv.org/pdf/2212.10505.pdf)...

	- markdown format
	- `|` : separates cells (columns)
	- `\n` : separates rows
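
The format above can be turned back into rows and cells with a few lines of Python. This is a minimal sketch, not part of the model's code: `parse_table` is a hypothetical helper, and it assumes the decoded text uses a literal newline between rows, `|` between cells, and `key: value` lines for the metadata that precedes the table, as in the sample prediction on this card.

```python
def parse_table(text):
    """Split a decoded prediction into metadata and table rows.

    Assumption: rows are separated by newlines, cells by "|", and
    metadata lines look like "key: value" and contain no "|".
    """
    meta = {}
    rows = []
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue
        if "|" in line:
            # Table row: split on the cell separator and trim each cell
            rows.append([cell.strip() for cell in line.split("|")])
        elif ":" in line:
            # Metadata line: split on the first colon only
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta, rows


# Shortened copy of the sample prediction shown on this card
sample = (
    "대상:\n"
    "제목: 2011-2021 보건복지 분야 일자리의 <unk>증\n"
    "유형: 단일형 일반 세로 <unk>대형\n"
    " | 보건(천 명) | 복지(천 명)\n"
    "1분위 | 29.7 | 178.4\n"
    "2분위 | 70.8 | 97.3"
)

meta, rows = parse_table(sample)
header, data = rows[0], rows[1:]
```

The first cell of the header row is empty because the header line starts with a separator; the remaining header cells are the column names.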


	### Train

	The model was trained in a TPU environment.
	- num_warmup_steps : 1,000
	- num_training_steps : 40,000
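
To illustrate how these two step counts typically interact, here is a minimal sketch of a learning-rate multiplier with linear warmup followed by cosine decay. The card does not state the schedule shape or the base learning rate, so the cosine decay and the function name `lr_multiplier` are assumptions made for illustration only.

```python
import math

NUM_WARMUP_STEPS = 1_000      # from the card
NUM_TRAINING_STEPS = 40_000   # from the card


def lr_multiplier(step,
                  num_warmup_steps=NUM_WARMUP_STEPS,
                  num_training_steps=NUM_TRAINING_STEPS):
    """Factor applied to the base learning rate at a given step.

    Assumption: linear warmup, then cosine decay to zero. The actual
    schedule used for deplot_kr is not documented on the card.
    """
    if step < num_warmup_steps:
        # Linear ramp from 0 to 1 over the warmup steps
        return step / max(1, num_warmup_steps)
    # Fraction of the post-warmup phase that has elapsed
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

Under this assumed schedule the multiplier reaches 1.0 at step 1,000 and falls back to 0 at step 40,000.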

	## Evaluation Results

	This model achieves the following results:

	| Metric | Score (%) |
	|:---|---:|
	| RNSS (Relative Number Set Similarity) | 98.1615 |
	| RMS (Relative Mapping Similarity) Precision | 83.1615 |
	| RMS Recall | 26.3549 |
	| RMS F1 Score | 31.5633 |
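
For readers unfamiliar with RNSS, the sketch below follows the definition given in the DePlot paper (Liu et al., 2023): each predicted number is matched to a target number so that the total relative distance is minimal, and the score is one minus that total divided by the size of the larger set. It is an illustration, not the evaluation script used for the numbers above. The function names, the handling of empty sets and zero targets, and the brute-force matching (practical only for a handful of numbers) are assumptions; a real evaluation would use an assignment solver such as `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations


def relative_distance(pred, target):
    """D(p, t) = min(1, |p - t| / |t|); the zero-target case is an assumption."""
    if target == 0:
        return 0.0 if pred == 0 else 1.0
    return min(1.0, abs(pred - target) / abs(target))


def rnss(predicted, target):
    """Relative Number Set Similarity between two lists of numbers.

    Brute-force minimal-cost matching, for small inputs only.
    Empty-set handling below is an assumption of this sketch.
    """
    if not predicted and not target:
        return 1.0
    if not predicted or not target:
        return 0.0
    n, m = len(predicted), len(target)
    if n <= m:
        # Assign every prediction to a distinct target
        best = min(
            sum(relative_distance(p, target[j]) for p, j in zip(predicted, idx))
            for idx in permutations(range(m), n)
        )
    else:
        # Assign every target to a distinct prediction
        best = min(
            sum(relative_distance(predicted[i], t) for i, t in zip(idx, target))
            for idx in permutations(range(n), m)
        )
    return 1.0 - best / max(n, m)
```

Because the numbers are treated as unordered sets, RNSS ignores where a value sits in the table, whereas RMS also takes the row and column headers into account.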

	## Contact

	For questions and comments, please use the discussion tab or email gloria@brainventur.com