Task Overview
The rapid advancement of generative models has made it increasingly challenging to distinguish machine-generated code from human-written code, particularly across different programming languages, domains, and generation techniques.
SemEval-2026 Task 13 focuses on developing systems capable of detecting machine-generated code under diverse conditions. The evaluation emphasizes generalization to unseen programming languages, generator families, and application scenarios.
The task is divided into three subtasks.
Subtask A: Binary Machine-Generated Code Detection
Goal:
Given a code snippet, determine whether it is:
- Fully human-written, or
- Fully machine-generated
Training Languages: C++, Python, Java
Training Domain: Algorithmic (e.g., LeetCode-style problems)
Evaluation Settings:
| Setting | Language | Domain |
|---|---|---|
| (i) Seen Languages & Seen Domains | C++, Python, Java | Algorithmic |
| (ii) Unseen Languages & Seen Domains | Go, PHP, C#, C, JS | Algorithmic |
| (iii) Seen Languages & Unseen Domains | C++, Python, Java | Research, Production |
| (iv) Unseen Languages & Unseen Domains | Go, PHP, C#, C, JS | Research, Production |
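The four settings are the cross product of seen/unseen language and seen/unseen domain. A minimal sketch of that mapping follows, useful for breaking down validation scores per setting. The function name `evaluation_setting` and the string spellings of languages and domains are assumptions for illustration; the dataset may encode them differently. Any language or domain outside the training sets is treated as unseen.

```python
# Seen sets taken from the training description above.
SEEN_LANGUAGES = {"C++", "Python", "Java"}
SEEN_DOMAINS = {"Algorithmic"}


def evaluation_setting(language: str, domain: str) -> str:
    """Return the evaluation setting ("i" to "iv") for a language/domain pair.

    Hypothetical helper: the exact spellings used in the data are an assumption.
    """
    seen_language = language in SEEN_LANGUAGES
    seen_domain = domain in SEEN_DOMAINS
    if seen_language and seen_domain:
        return "i"    # seen languages, seen domains
    if not seen_language and seen_domain:
        return "ii"   # unseen languages, seen domains
    if seen_language and not seen_domain:
        return "iii"  # seen languages, unseen domains
    return "iv"       # unseen languages, unseen domains
```

Only setting (i) is covered by the training distribution, so a per-setting breakdown is only possible if held-out data in other languages or domains is available.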
Dataset Size:
- Train: 500,000 samples (238,000 human-written, 262,000 machine-generated)
- Validation: 100,000 samples
Data Format:
Each dataset includes the following fields:
- code: The code snippet
- label: Binary label (0 for human-written, 1 for machine-generated)
- language: Programming language of the snippet
Label mappings are provided in task_A/label_to_id.json and task_A/id_to_label.json.
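The two mapping files can be loaded and cross-checked as sketched below. This assumes each file is a flat JSON object (label name to id, and id to label name); the actual layout of the files is not described here, so treat the parsing as an assumption. `load_label_maps` and `maps_are_consistent` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_label_maps(task_dir):
    """Load both mapping files from a task directory.

    Assumes flat JSON objects. JSON object keys are always strings,
    so ids are cast back to int after loading.
    """
    task_dir = Path(task_dir)
    with open(task_dir / "label_to_id.json", encoding="utf-8") as f:
        label_to_id = {str(k): int(v) for k, v in json.load(f).items()}
    with open(task_dir / "id_to_label.json", encoding="utf-8") as f:
        id_to_label = {int(k): str(v) for k, v in json.load(f).items()}
    return label_to_id, id_to_label


def maps_are_consistent(label_to_id, id_to_label):
    """Return True if the two mappings are exact inverses of each other."""
    if len(label_to_id) != len(id_to_label):
        return False
    return all(id_to_label.get(i) == name for name, i in label_to_id.items())
```

A call such as `load_label_maps("task_A")` would then return both dictionaries, provided the files follow the assumed layout.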
Evaluation Metric:
The primary metric for Subtask A is Macro F1-score, ensuring balanced performance across both classes.
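Macro F1 computes the F1-score for each class separately and takes the unweighted mean, so the slightly smaller human-written class counts as much as the machine-generated one. The sketch below is a plain-Python illustration, not the official scorer. Returning 0.0 for a class with no true positives, false positives, or false negatives is an assumption about edge-case handling.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores.

    Illustrative implementation; the official scorer may handle
    empty classes differently.
    """
    scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with `y_true = [0, 0, 1, 1]` and `y_pred = [0, 1, 1, 1]`, class 0 scores 2/3 and class 1 scores 4/5, giving a macro F1 of 11/15 (about 0.733).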
Submission Format:
Participants must submit a .csv file containing:
- id: Unique identifier for each code snippet
- label: Predicted label (0 or 1)
A sample submission file is available in the task_A/ directory.
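A submission file with these two columns could be written as sketched below. The header row and the column order are assumptions; the sample submission in task_A/ is the authoritative reference. `write_submission` is a hypothetical helper name.

```python
import csv


def write_submission(path, predictions):
    """Write (id, label) pairs to a CSV file with an "id,label" header.

    Assumes a header row is expected; check the sample submission file.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "label"])
        for sample_id, label in predictions:
            # Subtask A is binary, so reject anything other than 0 or 1.
            if label not in (0, 1):
                raise ValueError(f"invalid label {label!r} for id {sample_id!r}")
            writer.writerow([sample_id, label])
```

Calling `write_submission("submission.csv", [(0, 1), (1, 0)])` produces a three-line file: the header followed by one row per snippet.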
Baseline Models:
Baseline implementations are provided in the baselines/ directory, including starter code and pre-trained checkpoints for models such as GraphCodeBERT and UniXcoder.
Restrictions:
- No external training data may be used; only the provided datasets are allowed.
- Specialized AI-generated code detectors are not permitted. General-purpose code models (e.g., CodeBERT, StarCoder) are allowed.