---
language: code
tags:
- code
- translation
- codet5
- vbnet
- csharp
- programming
- source-code
datasets:
- custom
license: mit
library_name: transformers
pipeline_tag: translation
model_type: codet5
---
# CodeT5 VB.NET → C# Translator

This is a fine-tuned version of [Salesforce/codet5-base](https://huggingface.co/Salesforce/codet5-base) for translating VB.NET source code into C#.

---

# Evaluation Metrics

**BLEU score:** 0.4506
- 1-gram precision: 0.6698
- 2-gram precision: 0.5402
- 3-gram precision: 0.4656
- 4-gram precision: 0.4132
- Brevity penalty: 0.8773
- Length ratio (output length / reference length): 0.8843
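
The BLEU score can be checked against its components. Corpus BLEU multiplies the brevity penalty by the geometric mean of the 1- to 4-gram precisions. When outputs are shorter than the references, the penalty is exp(1 - 1/ratio). The short sketch below recomputes both values from the rounded numbers above. It assumes uniform n-gram weights and no smoothing, because the card doesn't name the BLEU implementation. `bleu_from_components` is an illustrative helper, not part of any library.

```python
import math


def bleu_from_components(precisions, length_ratio):
    """Recompute corpus BLEU from n-gram precisions and the output/reference length ratio.

    Illustrative helper: assumes uniform weights (1/N per n-gram order) and no smoothing.
    """
    # Geometric mean of the n-gram precisions, computed in log space
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    # Brevity penalty: none for outputs at least as long as the references,
    # otherwise exp(1 - reference_length / output_length) = exp(1 - 1 / ratio)
    bp = 1.0 if length_ratio >= 1.0 else math.exp(1.0 - 1.0 / length_ratio)
    return bp * math.exp(log_mean), bp


score, bp = bleu_from_components([0.6698, 0.5402, 0.4656, 0.4132], length_ratio=0.8843)
print(f"BLEU = {score:.4f}, brevity penalty = {bp:.4f}")
```

The inputs are themselves rounded to four decimals. The recomputed values therefore match the reported 0.4506 and 0.8773 to within about 0.0001, which shows the listed components are internally consistent.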
|
|
| **ROUGE Scores:** |
| - ROUGE-1: 0.5836 |
| - ROUGE-2: 0.4586 |
| - ROUGE-L: 0.5378 |
| - ROUGE-Lsum: 0.5781 |
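
ROUGE-L scores a translation by the longest common subsequence (LCS) of tokens it shares with the reference. It then combines LCS-based precision and recall into an F1 score. Unlike BLEU, it rewards in-order overlap without requiring contiguous n-grams. ROUGE-Lsum is the summary-level variant: it splits text on newlines and computes a union LCS across lines, so it can differ from ROUGE-L on multi-line code.

The sketch below assumes plain whitespace tokenization. The card doesn't state which ROUGE implementation produced the numbers above, and library implementations typically normalize text before tokenizing (for example by lowercasing and dropping punctuation). This sketch illustrates the idea and won't reproduce those scores exactly.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (O(len(a) * len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best of left/up
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 using whitespace tokenization (an assumption, see above)."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0  # also covers empty inputs, avoiding division by zero
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


print(rouge_l_f1("var x = 5;", "int x = 5;"))
```

In this example three of the four tokens (`x`, `=`, `5;`) match in order. Precision and recall are therefore both 0.75, and so is the F1 score.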
|
|
| --- |
|
|
| # π§ Usage |
|
|
| ```python |
| from transformers import AutoTokenizer, AutoModelForSeq2SeqLM |
| |
| model = AutoModelForSeq2SeqLM.from_pretrained("{repo_id}") |
| tokenizer = AutoTokenizer.from_pretrained("{repo_id}") |
| |
| vb_code = "Dim x As Integer = 5" |
| inputs = tokenizer(f"translate VB.NET to C#: {vb_code}", return_tensors="pt") |
| outputs = model.generate(**inputs) |
| print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
| ``` |
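
To translate several snippets at once, the same prompt format can be batched. Padding aligns inputs of different lengths, and beam search (`num_beams`) explores several candidate outputs instead of a single greedy one. This is a sketch, not a tested recipe for this checkpoint. The 512-token input cap, `max_new_tokens=256` and `num_beams=4` are assumed starting values. `build_prompt`, `translate_batch` and `load_model` are hypothetical helper names.

```python
PREFIX = "translate VB.NET to C#: "  # same prefix as in the Usage example above


def build_prompt(vb_code):
    """Prepend the task prefix used in the Usage example."""
    return PREFIX + vb_code


def translate_batch(snippets, model, tokenizer, max_new_tokens=256, num_beams=4):
    """Translate a list of VB.NET snippets to C# in a single generate() call."""
    inputs = tokenizer(
        [build_prompt(s) for s in snippets],
        return_tensors="pt",
        padding=True,     # pad shorter prompts to the longest one in the batch
        truncation=True,  # cut prompts longer than max_length (assumed cap)
        max_length=512,
    )
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=num_beams)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


def load_model(repo_id):
    """Load model and tokenizer from the Hub (lazy import keeps the helpers above dependency-free)."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    return AutoModelForSeq2SeqLM.from_pretrained(repo_id), AutoTokenizer.from_pretrained(repo_id)
```

For example, `model, tokenizer = load_model("{repo_id}")` followed by `translate_batch(["Dim x As Integer = 5", "Dim y As Integer = 7"], model, tokenizer)` returns one decoded string per input. As written, `translate_batch` leaves the tokenized inputs on the CPU. For GPU inference, move the model and the inputs to the same device.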
|
|
| # π Dataset Format |
|
|
| Training data was in JSONL with fields: |
| "vb_code": VB.NET input |
| "csharp_code": corresponding C# output |
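
The following is a minimal sketch of turning such a file into training pairs. It assumes each source gets the same `translate VB.NET to C#: ` prefix used in the Usage example, because the card doesn't say how inputs were formatted during fine-tuning. `load_pairs` is a hypothetical helper, and the sample record is illustrative, not taken from the training set.

```python
import json

PREFIX = "translate VB.NET to C#: "  # assumption: mirrors the inference prompt


def load_pairs(lines):
    """Parse JSONL lines into (prefixed VB.NET source, C# target) pairs."""
    pairs = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        record = json.loads(line)
        if "vb_code" not in record or "csharp_code" not in record:
            raise ValueError(f"line {line_no}: expected 'vb_code' and 'csharp_code' fields")
        pairs.append((PREFIX + record["vb_code"], record["csharp_code"]))
    return pairs


# Illustrative record (not from the actual training data)
sample = ['{"vb_code": "Dim x As Integer = 5", "csharp_code": "int x = 5;"}']
print(load_pairs(sample))
```

A real file is read the same way, since iterating over a text file yields its lines: `with open(path, encoding="utf-8") as f: pairs = load_pairs(f)`.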
|
|
| # π License |
|
|
| MIT |
|
|