ODIN Claude Sonnet 4.6 committed
Commit 2be1f47 · 1 Parent(s): 20862bb

Add HuggingFace Spaces deployment support

- Root app.py: HF Spaces entrypoint, auto-downloads data from KoopaK/OdinDB
on first cold start via snapshot_download, marks completion with .hf_downloaded
- src/agents/app.py: expose module-level demo + _figures_dir for import
- README.md: add HF Spaces YAML frontmatter (sdk: gradio, app_file: app.py)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
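
The README frontmatter named in the last bullet is what tells Spaces which SDK and entry file to use. A minimal sketch of a local sanity check follows, assuming the frontmatter holds only flat `key: value` pairs. `read_frontmatter` is a hypothetical helper, not part of this commit or of any HF library.

```python
def read_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter at the top of the file
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing fence reached
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing fence: treat as malformed


# Sample text only; the real README carries more keys.
readme = """---
title: ODIN
sdk: gradio
app_file: app.py
---
# ODIN
"""
meta = read_frontmatter(readme)
```

A pre-push check could then compare `meta["sdk"]` and `meta["app_file"]` against the values the commit message states.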

Files changed (3)
  1. README.md +183 -171
  2. app.py +40 -0
  3. src/agents/app.py +10 -12
README.md CHANGED
@@ -1,171 +1,183 @@
- # ODIN — Operational Drilling Intelligence Network
-
- > Multi-agent AI system for subsurface and drilling engineering analysis
- > Built on the public Equinor Volve Field dataset · SPE GCS 2026 ML Challenge
-
- ---
-
- ## Overview
-
- ODIN is a CrewAI-powered multi-agent system that answers complex drilling engineering questions by reasoning over structured data (WITSML, EDM) and unstructured reports (Daily Drilling Reports). It combines real-time data retrieval, RAG over domain knowledge, and a Gradio chat interface with inline Plotly visualizations.
-
- **Key capabilities:**
- - Drill phase distribution & NPT breakdown analysis
- - ROP / WOB / RPM performance profiling
- - Cross-well KPI comparison
- - BHA configuration review and handover summaries
- - Stuck-pipe and wellbore stability root-cause analysis
- - Evidence-cited answers with confidence levels
-
- ---
-
- ## Architecture
-
- ```
- User Query
-     │
-     ▼
- Orchestrator (orchestrator.py)
-     │  Classifies query → lean or full crew
-     │
-     ├── LEAN  (chart / compare queries, ~40s)
-     │     Analyst ──► Lead (Odin)
-     │
-     └── FULL  (deep analysis, ~80s)
-           Lead ──► Analyst ──► Historian ──► Lead (Odin)
- ```
-
- **Agents:**
- | Agent | Role |
- |---|---|
- | **Odin (Lead)** | Synthesizes findings, grounds in Volve KB |
- | **Data Analyst** | Runs DDR / WITSML / EDM queries & Python charts |
- | **Historian** | Searches operational history, validates stats |
-
- **Tools available to agents:**
- - `DDR_Query` — Daily Drilling Report search
- - `WITSML_Analyst` — Realtime drilling log analysis
- - `EDM_Technical_Query` — Casing, BHA, formation data
- - `CrossWell_Comparison` — Multi-well KPI comparison
- - `VolveHistory_SearchTool` — RAG over Volve campaign history
- - `python_interpreter` — Pandas + Plotly for custom charts
-
- ---
-
- ## Tech Stack
-
- | Layer | Technology |
- |---|---|
- | LLM | Google Gemini 2.5 Flash (via `google-generativeai`) |
- | Agent framework | CrewAI 1.10 |
- | RAG / Vector store | ChromaDB + `sentence-transformers` |
- | Data processing | Pandas, NumPy, PDFPlumber |
- | Visualisation | Plotly (HTML) + Kaleido (PNG) |
- | UI | Gradio 6 |
-
- ---
-
- ## Data
-
- This project uses the **Equinor Volve Field open dataset** (released under the Volve Data Sharing Agreement).
-
- > Download from: [https://www.equinor.com/energy/volve-data-sharing](https://www.equinor.com/energy/volve-data-sharing)
-
- After downloading, extract to `data/raw/` and run the ETL pipeline:
-
- ```bash
- python src/data_pipeline/run_pipeline.py
- ```
-
- Then build the knowledge base:
-
- ```bash
- python src/rag/build_volve_db.py
- python src/rag/build_openviking_db.py
- ```
-
- ---
-
- ## Quickstart (judges)
-
- ```bash
- # 1. Clone & install
- git clone <repo-url>
- cd odin
- python -m venv venv
- source venv/bin/activate # Windows: venv\Scripts\activate
- pip install -r requirements.txt
-
- # 2. Download runtime data (~400 MB knowledge bases + processed CSVs)
- python scripts/download_data.py
-
- # 3. Add your Gemini API key
- cp .env.example .env
- # Edit .env: set GOOGLE_API_KEY=<your key>
- # Free key at: https://aistudio.google.com/app/apikey
-
- # 4. Run
- python src/agents/app.py
- ```
-
- Open `http://localhost:7860` in your browser.
-
- ---
-
- ## Project Structure
-
- ```
- odin/
- ├── src/
- │   ├── agents/                  # Main application
- │   │   ├── app.py               # Gradio UI (entry point)
- │   │   ├── orchestrator.py      # Query routing & streaming
- │   │   ├── crew.py              # CrewAI agent definitions & tasks
- │   │   ├── tools.py             # DDR / WITSML / EDM / RAG tools
- │   │   └── data_tools.py        # Python interpreter tool + data helpers
- │   │
- │   ├── data_pipeline/           # ETL: raw Volve data → processed CSV
- │   │   ├── run_pipeline.py
- │   │   ├── parse_witsml_logs.py
- │   │   ├── parse_ddr_xml.py
- │   │   └── parse_edm.py
- │   │
- │   └── rag/                     # Knowledge base builders
- │       ├── build_volve_db.py
- │       └── build_openviking_db.py
- │
- ├── tests/
- │   └── prompts/                 # Agent prompt test cases
- │
- ├── data/                        # ← NOT in git (download separately)
- │   ├── raw/                     # Original Volve dataset
- │   ├── processed/               # ETL output (CSV / Parquet)
- │   └── knowledge_base/          # ChromaDB vector stores
- │
- ├── outputs/                     # NOT in git (generated at runtime)
- │   └── figures/                 # Plotly charts (HTML + PNG)
- │
- ├── requirements.txt
- ├── .env.example
- └── promptfooconfig.yaml         # Evaluation harness (PromptFoo)
- ```
-
- ---
-
- ## Rate Limits
-
- The system is tuned for the Gemini free tier (15 RPM):
-
- | Crew mode | LLM calls | Target time |
- |---|---|---|
- | Lean (chart / compare) | ~6 calls | ~40s |
- | Full (deep analysis) | ~10 calls | ~80s |
-
- Automatic 429 retry with exponential back-off (10 → 20 → 40 → 60s) is built in.
-
- ---
-
- ## License
-
- Source code: MIT
- Volve dataset: [Volve Data Sharing Agreement](https://www.equinor.com/energy/volve-data-sharing) (not included in this repo)
+ ---
+ title: ODIN — Operational Drilling Intelligence Network
+ emoji: 🛢️
+ colorFrom: slate
+ colorTo: green
+ sdk: gradio
+ sdk_version: 6.9.0
+ app_file: app.py
+ pinned: true
+ license: mit
+ ---
+
+ # ODIN — Operational Drilling Intelligence Network
+
+ > Multi-agent AI system for subsurface and drilling engineering analysis
+ > Built on the public Equinor Volve Field dataset · SPE GCS 2026 ML Challenge
+
+ ---
+
+ ## Overview
+
+ ODIN is a CrewAI-powered multi-agent system that answers complex drilling engineering questions by reasoning over structured data (WITSML, EDM) and unstructured reports (Daily Drilling Reports). It combines real-time data retrieval, RAG over domain knowledge, and a Gradio chat interface with inline Plotly visualizations.
+
+ **Key capabilities:**
+ - Drill phase distribution & NPT breakdown analysis
+ - ROP / WOB / RPM performance profiling
+ - Cross-well KPI comparison
+ - BHA configuration review and handover summaries
+ - Stuck-pipe and wellbore stability root-cause analysis
+ - Evidence-cited answers with confidence levels
+
+ ---
+
+ ## Architecture
+
+ ```
+ User Query
+     │
+     ▼
+ Orchestrator (orchestrator.py)
+     │  Classifies query → lean or full crew
+     │
+     ├── LEAN  (chart / compare queries, ~40s)
+     │     Analyst ──► Lead (Odin)
+     │
+     └── FULL  (deep analysis, ~80s)
+           Lead ──► Analyst ──► Historian ──► Lead (Odin)
+ ```
+
+ **Agents:**
+ | Agent | Role |
+ |---|---|
+ | **Odin (Lead)** | Synthesizes findings, grounds in Volve KB |
+ | **Data Analyst** | Runs DDR / WITSML / EDM queries & Python charts |
+ | **Historian** | Searches operational history, validates stats |
+
+ **Tools available to agents:**
+ - `DDR_Query` — Daily Drilling Report search
+ - `WITSML_Analyst` — Realtime drilling log analysis
+ - `EDM_Technical_Query` — Casing, BHA, formation data
+ - `CrossWell_Comparison` — Multi-well KPI comparison
+ - `VolveHistory_SearchTool` — RAG over Volve campaign history
+ - `python_interpreter` — Pandas + Plotly for custom charts
+
+ ---
+
+ ## Tech Stack
+
+ | Layer | Technology |
+ |---|---|
+ | LLM | Google Gemini 2.5 Flash (via `google-generativeai`) |
+ | Agent framework | CrewAI 1.10 |
+ | RAG / Vector store | ChromaDB + `sentence-transformers` |
+ | Data processing | Pandas, NumPy, PDFPlumber |
+ | Visualisation | Plotly (HTML) + Kaleido (PNG) |
+ | UI | Gradio 6 |
+
+ ---
+
+ ## Data
+
+ This project uses the **Equinor Volve Field open dataset** (released under the Volve Data Sharing Agreement).
+
+ > Download from: [https://www.equinor.com/energy/volve-data-sharing](https://www.equinor.com/energy/volve-data-sharing)
+
+ After downloading, extract to `data/raw/` and run the ETL pipeline:
+
+ ```bash
+ python src/data_pipeline/run_pipeline.py
+ ```
+
+ Then build the knowledge base:
+
+ ```bash
+ python src/rag/build_volve_db.py
+ python src/rag/build_openviking_db.py
+ ```
+
+ ---
+
+ ## Quickstart (judges)
+
+ ```bash
+ # 1. Clone & install
+ git clone <repo-url>
+ cd odin
+ python -m venv venv
+ source venv/bin/activate # Windows: venv\Scripts\activate
+ pip install -r requirements.txt
+
+ # 2. Download runtime data (~400 MB knowledge bases + processed CSVs)
+ python scripts/download_data.py
+
+ # 3. Add your Gemini API key
+ cp .env.example .env
+ # Edit .env: set GOOGLE_API_KEY=<your key>
+ # Free key at: https://aistudio.google.com/app/apikey
+
+ # 4. Run
+ python src/agents/app.py
+ ```
+
+ Open `http://localhost:7860` in your browser.
+
+ ---
+
+ ## Project Structure
+
+ ```
+ odin/
+ ├── src/
+ │   ├── agents/                  # Main application
+ │   │   ├── app.py               # Gradio UI (entry point)
+ │   │   ├── orchestrator.py      # Query routing & streaming
+ │   │   ├── crew.py              # CrewAI agent definitions & tasks
+ │   │   ├── tools.py             # DDR / WITSML / EDM / RAG tools
+ │   │   └── data_tools.py        # Python interpreter tool + data helpers
+ │   │
+ │   ├── data_pipeline/           # ETL: raw Volve data → processed CSV
+ │   │   ├── run_pipeline.py
+ │   │   ├── parse_witsml_logs.py
+ │   │   ├── parse_ddr_xml.py
+ │   │   └── parse_edm.py
+ │   │
+ │   └── rag/                     # Knowledge base builders
+ │       ├── build_volve_db.py
+ │       └── build_openviking_db.py
+ │
+ ├── tests/
+ │   └── prompts/                 # Agent prompt test cases
+ │
+ ├── data/                        # ← NOT in git (download separately)
+ │   ├── raw/                     # Original Volve dataset
+ │   ├── processed/               # ETL output (CSV / Parquet)
+ │   └── knowledge_base/          # ChromaDB vector stores
+ │
+ ├── outputs/                     # NOT in git (generated at runtime)
+ │   └── figures/                 # Plotly charts (HTML + PNG)
+ │
+ ├── requirements.txt
+ ├── .env.example
+ └── promptfooconfig.yaml         # Evaluation harness (PromptFoo)
+ ```
+
+ ---
+
+ ## Rate Limits
+
+ The system is tuned for the Gemini free tier (15 RPM):
+
+ | Crew mode | LLM calls | Target time |
+ |---|---|---|
+ | Lean (chart / compare) | ~6 calls | ~40s |
+ | Full (deep analysis) | ~10 calls | ~80s |
+
+ Automatic 429 retry with exponential back-off (10 → 20 → 40 → 60s) is built in.
+
+ ---
+
+ ## License
+
+ Source code: MIT
+ Volve dataset: [Volve Data Sharing Agreement](https://www.equinor.com/energy/volve-data-sharing) (not included in this repo)
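
The README's Rate Limits section describes 429 retries with waits of 10, 20, 40, then 60 s. The retry code itself is not part of this diff, so the following is only a minimal sketch of a capped doubling schedule that reproduces those numbers. `backoff_delays` and `call_with_retry` are hypothetical names, and `RateLimitError` stands in for whatever exception the real client raises.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 error."""


def backoff_delays(base: float = 10.0, cap: float = 60.0,
                   attempts: int = 4) -> Iterator[float]:
    """Yield base, 2*base, 4*base, ... with each value capped at `cap`."""
    for n in range(attempts):
        yield min(base * 2 ** n, cap)


def call_with_retry(fn: Callable[[], T],
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fn`, sleeping through the schedule on rate-limit errors."""
    for delay in backoff_delays():
        try:
            return fn()
        except RateLimitError:
            sleep(delay)
    return fn()  # final attempt; any error now propagates
```

Passing `sleep` as a parameter keeps the schedule testable without real waiting. The cap explains why the last step in the README is 60 s and not 80 s.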
app.py ADDED
@@ -0,0 +1,40 @@
+ """
+ HuggingFace Spaces entry point for ODIN.
+ Downloads runtime data from KoopaK/OdinDB on first cold start, then launches the app.
+ """
+ import os
+ import sys
+ from pathlib import Path
+
+ ROOT = Path(__file__).parent
+ sys.path.insert(0, str(ROOT))
+
+ # ── Download data from HF Hub if not already present ──────────────────────────
+ _data_dir = ROOT / "data"
+ _marker = _data_dir / "processed" / ".hf_downloaded"
+
+ if not _marker.exists():
+     print("First run — downloading runtime data from KoopaK/OdinDB …")
+     try:
+         from huggingface_hub import snapshot_download
+         snapshot_download(
+             repo_id = "KoopaK/OdinDB",
+             repo_type = "dataset",
+             local_dir = str(_data_dir),
+             ignore_patterns=["*.git*"],
+         )
+         _marker.parent.mkdir(parents=True, exist_ok=True)
+         _marker.touch()
+         print("Data download complete.")
+     except Exception as e:
+         print(f"Warning: data download failed — {e}")
+         print("App will start but data tools may return empty results.")
+
+ # ── Launch ────────────────────────────────────────────────────────────────────
+ from src.agents.app import demo, _figures_dir
+
+ demo.launch(
+     server_name = "0.0.0.0",
+     server_port = 7860,
+     allowed_paths= [str(_figures_dir)],
+ )
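
One property of the download block in app.py: the marker is written only after `snapshot_download` returns, so a failed download is retried on the next start. A minimal sketch of that pattern in isolation follows. `ensure_once` and its `fetch` callback are hypothetical names; the commit inlines this logic and does not use a helper.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_once(marker: Path, fetch: Callable[[], None]) -> bool:
    """Run `fetch` unless `marker` exists; return True if fetch ran."""
    if marker.exists():
        return False
    fetch()  # may raise, in which case the marker is never written
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True


with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "processed" / ".hf_downloaded"
    runs: list[str] = []
    first = ensure_once(marker, lambda: runs.append("download"))
    second = ensure_once(marker, lambda: runs.append("download"))
```

Here the second call is skipped because the marker from the first call already exists.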
src/agents/app.py CHANGED
@@ -1051,23 +1051,21 @@ def build_app():
  # ENTRY POINT
  # ─────────────────────────────────────────────────────────────────────────────
 
- if __name__ == "__main__":
+ def _make_demo():
      base_proj_dir = Path(__file__).resolve().parents[2]
      figures_dir = base_proj_dir / "outputs" / "figures"
      figures_dir.mkdir(parents=True, exist_ok=True)
+     return build_app(), figures_dir
 
-     theme = gr.themes.Soft(
-         primary_hue="emerald",
-         secondary_hue="slate",
-         neutral_hue="slate",
-         font=gr.themes.GoogleFont("Inter"),
-     )
-     app = build_app()
-     app.launch(
+
+ # Module-level demo for HF Spaces (imported by root app.py)
+ demo, _figures_dir = _make_demo()
+
+
+ if __name__ == "__main__":
+     demo.launch(
          server_name="0.0.0.0",
          server_port=7860,
          share=False,
-         allowed_paths=[str(figures_dir)],
-         theme=theme,
-         css=CUSTOM_CSS,
+         allowed_paths=[str(_figures_dir)],
      )