ty1413
/

NetZeroTarget_Classification

Text Classification

text-embeddings-inference




Update notes.txt

bd81b64 verified over 1 year ago



	Description:

	- trained to classify the type of net zero target described in a text. The text comes from company ESG reports; the labels come from Net Zero Tracker.
	- text was truncated to 128 tokens before tokenization.
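As a rough illustration of the 128-token truncation mentioned above, the sketch below cuts each text to a maximum number of tokens and reports what share of texts lose content. It assumes whitespace-separated words as a stand-in for tokens, because the notes do not say which tokenizer was used; a subword tokenizer would usually produce more tokens for the same text, so the real share is likely higher. The function names are hypothetical.

```python
from __future__ import annotations


def truncate_to_max_tokens(text: str, max_tokens: int = 128) -> tuple[str, bool]:
    """Keep at most max_tokens whitespace-separated tokens.

    Assumption: whitespace splitting stands in for the real tokenizer.
    Returns the (possibly shortened) text and whether anything was cut.
    """
    tokens = text.split()
    was_truncated = len(tokens) > max_tokens
    return " ".join(tokens[:max_tokens]), was_truncated


def truncated_share(texts: list[str], max_tokens: int = 128) -> float:
    """Fraction of texts that lose content at the given token limit."""
    if not texts:
        return 0.0
    flags = [truncate_to_max_tokens(t, max_tokens)[1] for t in texts]
    return sum(flags) / len(flags)
```

Running truncated_share over the text column would give a first estimate of how many examples are affected by the limit.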



	Problems:
	- the model keeps outputting the same label regardless of input
	- the text column is quite unstructured: entries vary in length, some include URLs and others don't, some include excerpts from the ESG report, etc.
	- truncation might have resulted in loss of data
	- should try a text generation task instead
	- too many labels make the model behave poorly
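One way to investigate the constant-output problem is to compare the label distribution of the training data with the distribution of the model's predictions. The sketch below is a minimal diagnostic under stated assumptions: labels and predictions are plain Python lists, the 0.95 threshold is arbitrary, and the function names are hypothetical. If the label the model always predicts is also the majority label in the training data, class imbalance across the many labels may be contributing, though this check alone does not prove it.

```python
from __future__ import annotations

from collections import Counter


def majority_share(labels: list[str]) -> tuple[str, float]:
    """Return the most common label and its share of all labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def has_collapsed(predictions: list[str], threshold: float = 0.95) -> bool:
    """True if a single label makes up at least `threshold` of the predictions.

    Assumption: 0.95 is an arbitrary cut-off chosen for illustration.
    """
    _, share = majority_share(predictions)
    return share >= threshold
```

Calling majority_share on the training labels also gives the accuracy of a trivial "always predict the majority class" baseline, which is a useful number to compare the model against.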



	Moving Forward:

	- better text preprocessing: remove URLs, etc.
	- change the task to text generation, which might perform better. (This means ClimateBert cannot be used as the base model.)
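The URL-removal step could look roughly like the sketch below. It is an assumption about what "better preprocessing" might involve, not the cleaning actually applied to this dataset: the regular expression treats anything starting with http://, https:// or www. up to the next whitespace as a URL, and clean_text is a hypothetical helper name.

```python
import re

# Assumed URL pattern: a scheme or "www." followed by non-whitespace characters.
# It also swallows punctuation attached to the URL, such as a closing bracket.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Remove URLs, then collapse runs of whitespace into single spaces."""
    without_urls = URL_PATTERN.sub(" ", text)
    return WHITESPACE.sub(" ", without_urls).strip()


if __name__ == "__main__":
    sample = "Target: net zero by 2050. See https://example.com/report.pdf for details."
    print(clean_text(sample))
```

Removing URLs before tokenization also frees up room within a fixed token limit, since long links can consume many tokens without carrying information about the target type.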