Task Overview

The rapid advancement of generative models has made it increasingly challenging to distinguish machine-generated code from human-written code, particularly across different programming languages, domains, and generation techniques.

SemEval-2026 Task 13 focuses on developing systems capable of detecting machine-generated code under diverse conditions. The evaluation emphasizes generalization to unseen programming languages, generator families, and application scenarios.

The task is divided into three subtasks.

Subtask A: Binary Machine-Generated Code Detection

Goal:
Given a code snippet, determine whether it is:

Fully human-written, or
Fully machine-generated

Training Languages: C++, Python, Java
Training Domain: Algorithmic (e.g., LeetCode-style problems)

Evaluation Settings:

Setting	Language	Domain
(i) Seen Languages & Seen Domains	C++, Python, Java	Algorithmic
(ii) Unseen Languages & Seen Domains	Go, PHP, C#, C, JS	Algorithmic
(iii) Seen Languages & Unseen Domains	C++, Python, Java	Research, Production
(iv) Unseen Languages & Unseen Domains	Go, PHP, C#, C, JS	Research, Production
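When analysing results locally, it can help to bucket snippets by the four settings above. The sketch below encodes the table directly as a lookup. The helper name `evaluation_setting` is hypothetical, the language and domain strings are assumed to match the table's spelling, and because domain is not one of the fields listed under Data Format, the caller must supply it from elsewhere.

```python
# Hypothetical helper: map a (language, domain) pair to evaluation setting (i)-(iv).
# The name strings mirror the task table; spellings in the real data are assumed.
SEEN_LANGUAGES = {"C++", "Python", "Java"}
UNSEEN_LANGUAGES = {"Go", "PHP", "C#", "C", "JS"}
SEEN_DOMAINS = {"Algorithmic"}
UNSEEN_DOMAINS = {"Research", "Production"}


def evaluation_setting(language: str, domain: str) -> str:
    """Return '(i)', '(ii)', '(iii)' or '(iv)' for a language/domain pair."""
    if language in SEEN_LANGUAGES:
        lang_seen = True
    elif language in UNSEEN_LANGUAGES:
        lang_seen = False
    else:
        raise ValueError(f"language not listed in the task table: {language!r}")

    if domain in SEEN_DOMAINS:
        domain_seen = True
    elif domain in UNSEEN_DOMAINS:
        domain_seen = False
    else:
        raise ValueError(f"domain not listed in the task table: {domain!r}")

    # Rows of the table, keyed by (language seen?, domain seen?).
    return {
        (True, True): "(i)",
        (False, True): "(ii)",
        (True, False): "(iii)",
        (False, False): "(iv)",
    }[(lang_seen, domain_seen)]
```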

Dataset Size:

Train: 500,000 samples (238,000 human-written, 262,000 machine-generated)
Validation: 100,000 samples

Data Format:
Each dataset includes the following fields:

code: The code snippet
label: Binary label (0 for human-written, 1 for machine-generated)
language: Programming language of the snippet
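A minimal sketch of validating one record against the three fields listed above, assuming records arrive as Python dicts (for example, rows read from the dataset). The `Sample` class and `parse_sample` function are hypothetical names, not part of the task's starter code.

```python
from dataclasses import dataclass

REQUIRED_FIELDS = ("code", "label", "language")


@dataclass(frozen=True)
class Sample:
    """One Subtask A record with the three documented fields."""
    code: str
    label: int  # 0 = human-written, 1 = machine-generated
    language: str


def parse_sample(record: dict) -> Sample:
    """Check that a raw record has the expected fields and a binary label."""
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    label = int(record["label"])
    if label not in (0, 1):
        raise ValueError(f"label must be 0 or 1, got {label}")
    return Sample(code=str(record["code"]), label=label, language=str(record["language"]))
```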

Label mappings are provided in task_A/label_to_id.json and task_A/id_to_label.json.
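The sketch below loads both mapping files and checks that they are inverses of each other. It assumes each file is a flat JSON object (label name to id, and id to label name, respectively); the actual label names are not specified here. Because JSON object keys are always strings, the ids read from id_to_label.json have to be converted back to integers. `load_label_maps` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Dict, Tuple


def load_label_maps(directory: str = "task_A") -> Tuple[Dict[str, int], Dict[int, str]]:
    """Load label_to_id.json and id_to_label.json and verify they agree.

    Assumes both files are flat JSON objects (an assumption, not documented here).
    """
    base = Path(directory)
    raw_l2i = json.loads((base / "label_to_id.json").read_text(encoding="utf-8"))
    raw_i2l = json.loads((base / "id_to_label.json").read_text(encoding="utf-8"))
    label_to_id = {str(k): int(v) for k, v in raw_l2i.items()}
    # JSON keys are always strings, so convert the ids back to int.
    id_to_label = {int(k): str(v) for k, v in raw_i2l.items()}
    # Each mapping should be the exact inverse of the other.
    if {v: k for k, v in label_to_id.items()} != id_to_label:
        raise ValueError("label_to_id.json and id_to_label.json are not inverses")
    return label_to_id, id_to_label
```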

Evaluation Metric:
The primary metric for Subtask A is the macro-averaged F1-score, which averages the per-class F1 scores with equal weight, so a system cannot score well by performing well on only one class.
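Macro F1 is the unweighted mean of the per-class F1 scores, where each class's F1 is 2TP / (2TP + FP + FN). A dependency-free sketch for the binary case follows; in practice `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` should give the same values.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)) -> float:
    """Unweighted mean of per-class F1 scores over the given labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); treat it as 0 when the class appears
        # in neither y_true nor y_pred.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

As an illustration of why the macro average matters: on data with the training split's class ratio (238,000 human-written vs. 262,000 machine-generated), a trivial predictor that always outputs 1 reaches about 0.69 F1 on the machine-generated class but 0 on the human-written class, for a macro F1 of roughly 0.344, even though its accuracy is 0.524.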

Submission Format:
Participants must submit a .csv file with the following columns:

id: Unique identifier for each code snippet
label: Predicted label (0 or 1)

A sample submission file is available in the task_A/ directory.
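A minimal sketch of writing predictions in the required two-column format with Python's standard `csv` module. The `id,label` header and column order are assumed from the field list above; compare against the sample file in task_A/ before submitting. `write_submission` is a hypothetical helper.

```python
import csv
from typing import Sequence


def write_submission(path: str, ids: Sequence, labels: Sequence[int]) -> None:
    """Write predictions to a CSV with an `id,label` header."""
    # Validate everything first so bad input never leaves a half-written file.
    if len(ids) != len(labels):
        raise ValueError("ids and labels must have the same length")
    if len(set(ids)) != len(ids):
        raise ValueError("each id must appear exactly once")
    bad = [label for label in labels if label not in (0, 1)]
    if bad:
        raise ValueError(f"labels must be 0 or 1, found {bad[:5]!r}")

    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "label"])
        for snippet_id, label in zip(ids, labels):
            writer.writerow([snippet_id, int(label)])
```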

Baseline Models:
Baseline implementations are provided in the baselines/ directory, including starter code and pre-trained checkpoints for models such as GraphCodeBERT and UniXcoder.

Restrictions:

No external training data may be used; only the provided datasets are allowed.
Specialized AI-generated code detectors are not permitted. General-purpose code models (e.g., CodeBERT, StarCoder) are allowed.


Model tree for dzungpham/graphcodebert-code-classification

Base model: microsoft/unixcoder-base
This model: fine-tuned from the base model above