---
base_model: bert-base-multilingual-uncased
datasets:
- OFAI/omp
license: apache-2.0
tags:
- embedding_space_map
- BaseLM:bert-base-multilingual-uncased
---

# ESM OFAI/omp

An Embedding Space Map (ESM) for bert-base-multilingual-uncased, trained on the OFAI/omp intermediate task (subset posts_labeled).

## Model Details

### Model Description

This ESM is a linear network that approximates how fine-tuning bert-base-multilingual-uncased on OFAI/omp changes the model's text embeddings.

- **Developed by:** David Schulte
- **Model type:** ESM
- **Base Model:** bert-base-multilingual-uncased
- **Intermediate Task:** OFAI/omp
- **ESM architecture:** linear
- **Language(s) (NLP):** [More Information Needed]
- **License:** Apache-2.0

## Training Details

### Intermediate Task
- **Task ID:** OFAI/omp
- **Subset:** posts_labeled
- **Text Column:** Body
- **Label Column:** Category
- **Dataset Split:** train
- **Sample size:** 10000
- **Sample seed:** 42
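
The settings above describe how the intermediate-task training data was drawn. Below is a minimal sketch of a seeded sample without replacement. It assumes the rows arrive as dicts, for example from `datasets.load_dataset("OFAI/omp", "posts_labeled", split="train")`. The helper `sample_intermediate_task` is hypothetical, and hf-dataset-selector may draw its sample differently.

```python
import random
from typing import Any, Dict, List, Sequence, Tuple


def sample_intermediate_task(
    rows: Sequence[Dict[str, Any]],
    text_col: str = "Body",
    label_col: str = "Category",
    sample_size: int = 10000,
    seed: int = 42,
) -> List[Tuple[str, Any]]:
    """Hypothetical helper: draw a reproducible sample of (text, label) pairs."""
    if len(rows) > sample_size:
        # Seeded sampling without replacement: the same seed yields the same subset.
        rows = random.Random(seed).sample(list(rows), sample_size)
    return [(row[text_col], row[label_col]) for row in rows]


# Toy rows standing in for the real dataset (assumption: not OFAI/omp data).
toy_rows = [{"Body": f"post {i}", "Category": i % 3} for i in range(50)]
sample = sample_intermediate_task(toy_rows, sample_size=10, seed=42)
print(len(sample), sample[0])
```

Fixing the seed makes the ESM training data reproducible. If the split has fewer rows than `sample_size`, all rows are used.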

### Training Procedure

#### Language Model Training Hyperparameters
- **Epochs:** 3
- **Batch size:** 32
- **Learning rate:** 2e-05
- **Weight Decay:** 0.01
- **Optimizer:** AdamW

#### ESM Training Hyperparameters
- **Epochs:** 10
- **Batch size:** 32
- **Learning rate:** 0.001
- **Weight Decay:** 0.01
- **Optimizer:** AdamW
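
The ESM hyperparameters can be made concrete with a minimal NumPy sketch. It fits a linear ESM (`base @ W + b`) to pairs of base-model and fine-tuned embeddings by minimizing mean squared error with mini-batch AdamW, using the values listed above. Several details are assumptions, not taken from this card: the identity initialization, the AdamW betas and epsilon (PyTorch defaults), the names `train_linear_esm` and `esm_mse`, and the random stand-in embeddings. The actual hf-dataset-selector training code may differ.

```python
import numpy as np


def train_linear_esm(base, tuned, epochs=10, batch_size=32, lr=1e-3,
                     weight_decay=0.01, seed=42):
    """Hypothetical sketch: fit tuned ≈ base @ W + b by mini-batch AdamW on MSE."""
    rng = np.random.default_rng(seed)
    n, d_in = base.shape
    d_out = tuned.shape[1]
    # Assumption: start from the identity map (base and fine-tuned share a width).
    W = np.eye(d_in, d_out)
    b = np.zeros(d_out)
    params = [W, b]
    m = [np.zeros_like(p) for p in params]  # AdamW first-moment estimates
    v = [np.zeros_like(p) for p in params]  # AdamW second-moment estimates
    beta1, beta2, eps, step = 0.9, 0.999, 1e-8, 0  # PyTorch AdamW defaults
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            x, y = base[idx], tuned[idx]
            err = x @ W + b - y
            scale = 2.0 / err.size  # gradient of the mean over all squared errors
            grads = [scale * x.T @ err, scale * err.sum(axis=0)]
            step += 1
            for p, g, m_p, v_p in zip(params, grads, m, v):
                m_p[...] = beta1 * m_p + (1 - beta1) * g
                v_p[...] = beta2 * v_p + (1 - beta2) * g * g
                m_hat = m_p / (1 - beta1 ** step)
                v_hat = v_p / (1 - beta2 ** step)
                # Adam step plus decoupled weight decay, both updating p in place.
                p -= lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * p)
    return W, b


def esm_mse(base, tuned, W, b):
    """Mean squared error between mapped base embeddings and fine-tuned ones."""
    return float(np.mean((base @ W + b - tuned) ** 2))


# Toy stand-ins for real embeddings (assumption): a small linear drift.
data_rng = np.random.default_rng(0)
base = data_rng.normal(size=(1000, 16))
drift = np.eye(16) + 0.1 * data_rng.normal(size=(16, 16))
tuned = base @ drift + 0.1 * data_rng.normal(size=16)

W, b = train_linear_esm(base, tuned)
print("MSE before:", esm_mse(base, tuned, np.eye(16), np.zeros(16)))
print("MSE after: ", esm_mse(base, tuned, W, b))
```

Because the map is linear, a least-squares solve would also work. The loop is kept only to mirror the optimizer settings in the list.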
## Model Evaluation

### Evaluation of ESM
MSE:

## What are Embedding Space Maps?

Embedding Space Maps (ESMs) are neural networks that approximate the effect of fine-tuning a language model on a task. They can quickly transform embeddings from a base model to approximate how a fine-tuned model would embed the input text.
ESMs can be used for intermediate task selection with the ESM-LogME workflow.
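
The ESM-LogME workflow has four steps:

1. Embed the target dataset with the base model.
2. Pass those embeddings through each candidate ESM.
3. Score the transformed embeddings against the target labels with LogME.
4. Rank the intermediate tasks by that score.

The NumPy sketch below follows the published LogME evidence-maximization procedure for classification labels. The names `logme` and `rank_tasks`, the representation of an ESM as a linear map `(W, b)`, and the toy data are assumptions. This is not the hf-dataset-selector implementation.

```python
import numpy as np


def logme(features, labels):
    """LogME score of features for class labels; higher suggests better transfer."""
    f = np.asarray(features, dtype=np.float64)  # shape (n, d)
    labels = np.asarray(labels)
    n, d = f.shape
    # SVD of the symmetric matrix F^T F yields its eigenvectors and eigenvalues.
    u, s, vh = np.linalg.svd(f.T @ f)
    evidences = []
    for cls in np.unique(labels):
        y = (labels == cls).astype(np.float64)  # one-vs-rest target
        proj = vh @ (f.T @ y)
        alpha, beta = 1.0, 1.0  # prior precision of weights, precision of noise
        lam = alpha / beta
        for _ in range(11):
            gamma = np.sum(s / (s + lam))  # effective number of parameters
            m = u @ (proj * beta / (alpha + beta * s))  # posterior mean of weights
            alpha_de = np.sum(m * m) + 1e-5
            beta_de = np.sum((y - f @ m) ** 2) + 1e-5
            alpha = gamma / alpha_de
            beta = (n - gamma) / beta_de
            new_lam = alpha / beta
            if abs(new_lam - lam) / lam < 0.01:
                break
            lam = new_lam
        # Log evidence of y under the fitted alpha and beta, averaged per sample.
        evidence = (d / 2 * np.log(alpha) + n / 2 * np.log(beta)
                    - 0.5 * np.sum(np.log(alpha + beta * s))
                    - beta / 2 * beta_de - alpha / 2 * alpha_de
                    - n / 2 * np.log(2 * np.pi))
        evidences.append(evidence / n)
    return float(np.mean(evidences))


def rank_tasks(target_embeddings, target_labels, esms):
    """Score each hypothetical linear ESM (W, b) and sort tasks best-first."""
    scores = {task: logme(target_embeddings @ W + b, target_labels)
              for task, (W, b) in esms.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy target task (assumption): dimensions 0 and 1 encode the binary label.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
base = 0.1 * rng.normal(size=(200, 8))
base[:, 0] += labels
base[:, 1] += 1 - labels
drop = np.eye(8)
drop[0, 0] = drop[1, 1] = 0.0  # an ESM that discards the label-bearing dimensions
esms = {"task_keep": (np.eye(8), np.zeros(8)), "task_drop": (drop, np.zeros(8))}
print(rank_tasks(base, labels, esms))
```

In this toy setup, the ESM that preserves the label-bearing dimensions should rank first. No fine-tuning happens at selection time, which is what makes the workflow cheap.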

## How can I use Embedding Space Maps for Intermediate Task Selection?

We release [**hf-dataset-selector**](https://pypi.org/project/hf-dataset-selector), a Python package for intermediate task selection using Embedding Space Maps.

**hf-dataset-selector** fetches the ESMs for a given language model and uses them to find the best dataset for intermediate training before fine-tuning on the target task. ESMs are found by their tags on the Hugging Face Hub.
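
This card's metadata carries the tags `embedding_space_map` and `BaseLM:bert-base-multilingual-uncased`. The sketch below shows how such a tag-based lookup could work. It assumes that hf-dataset-selector matches on exactly these two tags; its actual query may differ. `esm_tags` and `find_esms` are hypothetical helpers. `find_esms` calls `HfApi.list_models` from `huggingface_hub` and therefore needs network access.

```python
from typing import List


def esm_tags(base_model: str) -> List[str]:
    """Hypothetical helper: the tag pair that marks an ESM for a given base model."""
    return ["embedding_space_map", f"BaseLM:{base_model}"]


def find_esms(base_model: str) -> List[str]:
    """Hypothetical helper: list Hub repo IDs carrying both ESM tags (needs network)."""
    from huggingface_hub import HfApi  # imported lazily so esm_tags works offline

    models = HfApi().list_models(filter=esm_tags(base_model))
    return [model.id for model in models]


print(esm_tags("bert-base-multilingual-uncased"))
# Online usage: find_esms("bert-base-multilingual-uncased")
```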

```python
from hfselect import Dataset, compute_task_ranking

# Load target dataset from the Hugging Face Hub
dataset = Dataset.from_hugging_face(
    name="stanfordnlp/imdb",
    split="train",
    text_col="text",
    label_col="label",
    is_regression=False,
    num_examples=1000,
    seed=42
)

# Fetch ESMs and rank tasks
task_ranking = compute_task_ranking(
    dataset=dataset,
    model_name="bert-base-multilingual-uncased"
)

# Display top 5 recommendations
print(task_ranking[:5])
```

For more information on how to use ESMs, please have a look at the [official GitHub repository](https://github.com/davidschulte/hf-dataset-selector).

## Citation

If you use this Embedding Space Map, please cite our [paper](https://arxiv.org/abs/2410.15148).

**BibTeX:**

```
@misc{schulte2024moreparameterefficientselectionintermediate,
      title={Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning},
      author={David Schulte and Felix Hamborg and Alan Akbik},
      year={2024},
      eprint={2410.15148},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.15148},
}
```

**APA:**

```
Schulte, D., Hamborg, F., & Akbik, A. (2024). Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning. arXiv preprint arXiv:2410.15148.
```