Task Overview

The rapid advancement of generative models has made it increasingly challenging to distinguish machine-generated code from human-written code, particularly across different programming languages, domains, and generation techniques.

SemEval-2026 Task 13 focuses on developing systems capable of detecting machine-generated code under diverse conditions. The evaluation emphasizes generalization to unseen programming languages, generator families, and application scenarios.

The task is divided into three subtasks.


Subtask A: Binary Machine-Generated Code Detection

Goal:
Given a code snippet, determine whether it is:

  • Fully human-written, or
  • Fully machine-generated

Training Languages: C++, Python, Java
Training Domain: Algorithmic (e.g., LeetCode-style problems)

Evaluation Settings:

Setting | Language | Domain
--- | --- | ---
(i) Seen Languages & Seen Domains | C++, Python, Java | Algorithmic
(ii) Unseen Languages & Seen Domains | Go, PHP, C#, C, JS | Algorithmic
(iii) Seen Languages & Unseen Domains | C++, Python, Java | Research, Production
(iv) Unseen Languages & Domains | Go, PHP, C#, C, JS | Research, Production
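
The four settings follow from two yes/no questions: was the language seen in training, and was the domain seen in training? The sketch below illustrates that logic. The helper name `evaluation_setting`, the exact language and domain strings, and the availability of a domain value per sample are all assumptions made for illustration (the data format lists only `code`, `label`, and `language`).

```python
# Seen languages and the seen domain come from the training description.
# The exact string spellings used in the dataset are an assumption.
SEEN_LANGUAGES = {"C++", "Python", "Java"}
SEEN_DOMAINS = {"Algorithmic"}


def evaluation_setting(language: str, domain: str) -> str:
    """Return the setting label (i)-(iv) for a sample (hypothetical helper)."""
    seen_language = language in SEEN_LANGUAGES
    seen_domain = domain in SEEN_DOMAINS
    if seen_language and seen_domain:
        return "(i)"
    if seen_domain:
        return "(ii)"   # unseen language, seen domain
    if seen_language:
        return "(iii)"  # seen language, unseen domain
    return "(iv)"       # unseen language and unseen domain
```

Grouping validation predictions by this label is one way to estimate how much performance drops outside setting (i).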

Dataset Size:

  • Train: 500,000 samples (238,000 human-written, 262,000 machine-generated)
  • Validation: 100,000 samples

Data Format:
Every sample in each dataset split has the following fields:

  • code: The code snippet
  • label: Binary label (0 for human-written, 1 for machine-generated)
  • language: Programming language of the snippet

Label mappings are provided in task_A/label_to_id.json and task_A/id_to_label.json.
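
As a rough illustration of the record layout, the sketch below checks that a sample carries the three listed fields and a valid binary label. The helper `check_record`, the example record, and the label names in `LABEL_NAMES` are assumptions for illustration; the actual contents of the JSON mapping files are not shown here and may use different names.

```python
REQUIRED_FIELDS = ("code", "label", "language")

# Assumed label names mirroring the 0/1 description above; the real
# id_to_label.json may spell them differently.
LABEL_NAMES = {0: "human-written", 1: "machine-generated"}


def check_record(record: dict) -> str:
    """Validate one sample and return its label name (hypothetical helper)."""
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    if record["label"] not in LABEL_NAMES:
        raise ValueError(f"unexpected label: {record['label']!r}")
    return LABEL_NAMES[record["label"]]


# Made-up example record, not taken from the dataset.
example = {"code": "print('hello')", "label": 0, "language": "Python"}
label_name = check_record(example)
```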

Evaluation Metric:
The primary metric for Subtask A is Macro F1-score, ensuring balanced performance across both classes.
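
Macro F1 computes an F1-score for each class separately and then averages them with equal weight, so a detector cannot score well by favoring the majority class. The following is a minimal pure-Python sketch of that definition, not the official scorer; the function name `macro_f1` and the convention of scoring 0.0 for a class with no true or predicted instances are assumptions.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores (illustrative sketch)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); assumed to be 0.0 when undefined.
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Toy example: class 0 has F1 = 2/3, class 1 has F1 = 4/5.
score = macro_f1([0, 0, 1, 1], [0, 1, 1, 1])
```

In the toy example the per-class scores are 2/3 and 4/5, so the macro average is about 0.733.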

Submission Format:
Participants must submit a .csv file containing:

  • id: Unique identifier for each code snippet
  • label: Predicted label (0 or 1)

A sample submission file is available in the task_A/ directory.
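
A submission file with these two columns could be written as sketched below. The helper `write_submission`, the example ids, and the column order are assumptions; the sample submission file in task_A/ remains the authoritative reference for the exact layout.

```python
import csv
import io


def write_submission(rows, fileobj):
    """Write (id, label) pairs as CSV with a header row (hypothetical helper)."""
    writer = csv.writer(fileobj)
    writer.writerow(["id", "label"])
    for sample_id, label in rows:
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label!r}")
        writer.writerow([sample_id, label])


# Made-up ids; an in-memory buffer stands in for a real file here.
buffer = io.StringIO()
write_submission([(101, 0), (102, 1)], buffer)
```

To write to disk instead, open the target with `open(path, "w", newline="")` and pass the file object in place of the buffer.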

Baseline Models:
Baseline implementations are provided in the baselines/ directory, including starter code and pre-trained checkpoints for models such as GraphCodeBERT and UniXcoder.

Restrictions:

  • No external training data may be used; only the provided datasets are allowed.
  • Specialized AI-generated code detectors are not permitted. General-purpose code models (e.g., CodeBERT, StarCoder) are allowed.