anugrah55 committed on
Commit
03cd10a
·
verified ·
1 Parent(s): e8f2f91

Update Colab badge to point at this Space
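
The new badge link in this commit wraps the notebook's Hugging Face URL in a Colab `#fileId=` fragment, with the colon percent-encoded. A minimal sketch of how such a badge line could be generated is below. The helper name `colab_badge_markdown` and its parameters are hypothetical, and the `#fileId=` URL shape is copied from the added line in this diff rather than from any Colab documentation.

```python
from urllib.parse import quote

# Badge image URL, as it appears in the notebook's first markdown cell.
COLAB_BADGE_SVG = "https://colab.research.google.com/assets/colab-badge.svg"


def colab_badge_markdown(space_id: str, notebook_path: str, branch: str = "main") -> str:
    """Hypothetical helper: build the 'Open In Colab' badge markdown for a
    notebook stored in a Hugging Face Space (URL shape assumed from the diff)."""
    notebook_url = f"https://huggingface.co/spaces/{space_id}/blob/{branch}/{notebook_path}"
    # quote() leaves "/" untouched by default and percent-encodes ":",
    # which yields the "https%3A//..." form seen in the new badge link.
    file_id = quote(notebook_url)
    colab_url = f"https://colab.research.google.com/#fileId={file_id}"
    return f"[![Open In Colab]({COLAB_BADGE_SVG})]({colab_url})"


badge = colab_badge_markdown("anugrah55/opensleuth-colab", "train_opensleuth_grpo.ipynb")
print(badge)
```

Generating the line from the Space id and file name keeps the badge in sync if the notebook is later moved or renamed.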

Files changed (1)
  1. train_opensleuth_grpo.ipynb +14 -14
train_opensleuth_grpo.ipynb CHANGED
@@ -2,12 +2,12 @@
  "cells": [
  {
  "cell_type": "markdown",
- "id": "7086c037",
+ "id": "52f6a469",
  "metadata": {},
  "source": [
  "# OpenSleuth — GRPO training on a free-tier Colab T4\n",
  "\n",
- "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/anugrah55/opensleuth/blob/main/colab/train_opensleuth_grpo.ipynb)\n",
+ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/#fileId=https%3A//huggingface.co/spaces/anugrah55/opensleuth-colab/blob/main/train_opensleuth_grpo.ipynb)\n",
  "\n",
  "**OpenSleuth** is an *Algorithmic Detective* RL environment. An LLM agent reverse-engineers an unknown black-box Python function by **probing** it with inputs and then **submitting** a Python replica. The environment scores submissions by domain-aware fuzz-testing against the hidden reference, with a complexity penalty so the agent can't just memorise its probes inside a giant `if/else`.\n",
  "\n",
@@ -36,7 +36,7 @@
  {
  "cell_type": "code",
  "execution_count": null,
- "id": "9307eb3f",
+ "id": "765d1f38",
  "metadata": {},
  "outputs": [],
  "source": [
@@ -57,7 +57,7 @@
  {
  "cell_type": "code",
  "execution_count": null,
- "id": "bb6ecbad",
+ "id": "2bfa6d1e",
  "metadata": {},
  "outputs": [],
  "source": [
@@ -70,7 +70,7 @@
  {
  "cell_type": "code",
  "execution_count": null,
- "id": "6c81d26f",
+ "id": "fb82de78",
  "metadata": {},
  "outputs": [],
  "source": [
@@ -148,7 +148,7 @@
  {
  "cell_type": "code",
  "execution_count": null,
- "id": "fdd9c63b",
+ "id": "73e199af",
  "metadata": {},
  "outputs": [],
  "source": [
@@ -260,7 +260,7 @@
  {
  "cell_type": "code",
  "execution_count": null,
- "id": "c2e1c7e5",
+ "id": "953947fc",
  "metadata": {},
  "outputs": [],
  "source": [
@@ -537,7 +537,7 @@
  {
  "cell_type": "code",
  "execution_count": null,
- "id": "88230844",
+ "id": "3236b1da",
  "metadata": {},
  "outputs": [],
  "source": [
@@ -574,7 +574,7 @@
  {
  "cell_type": "code",
  "execution_count": null,
- "id": "14ca2743",
+ "id": "ccdba521",
  "metadata": {},
  "outputs": [],
  "source": [
@@ -617,7 +617,7 @@
  {
  "cell_type": "code",
  "execution_count": null,
- "id": "202de2fb",
+ "id": "fffee452",
  "metadata": {},
  "outputs": [],
  "source": [
@@ -687,7 +687,7 @@
  {
  "cell_type": "code",
  "execution_count": null,
- "id": "03875ee7",
+ "id": "871b7fc9",
  "metadata": {},
  "outputs": [],
  "source": [
@@ -702,7 +702,7 @@
  {
  "cell_type": "code",
  "execution_count": null,
- "id": "7bd608a9",
+ "id": "11008e19",
  "metadata": {},
  "outputs": [],
  "source": [
@@ -724,7 +724,7 @@
  {
  "cell_type": "code",
  "execution_count": null,
- "id": "a5ab224e",
+ "id": "3c3b99bb",
  "metadata": {},
  "outputs": [],
  "source": [
@@ -775,7 +775,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "728aaee9",
+ "id": "cdbdfdd7",
  "metadata": {},
  "source": [
  "## Next steps\n",