fix(deploy): seed data/raw + run bbb_pipeline before bbb_model in build
Build-time crash: data/raw/bbbp.csv is gitignored, so the previous
COPY data/raw/ instruction copied an empty directory into the image;
bbb_pipeline therefore never produced bbbp_features.parquet, and
bbb_model.main() raised FileNotFoundError on the missing parquet.
Fix: drop the now-useless COPY data/raw/ and instead RUN a single
shell chain that:
1. mkdir -p data/raw data/processed
2. cp tests/fixtures/bbbp_sample.csv data/raw/bbbp.csv (self-contained)
3. python -m src.pipelines.bbb_pipeline (produces parquet)
4. python -m src.models.bbb_model (trains the model, persists it with joblib)
The existing deploy smoke tests still pass: they only assert that
'src.models.bbb_model' appears in the Dockerfile, which it still does.
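The smoke tests could also pin the new ordering, so a regression like the
one fixed here fails in CI rather than at image-build time. A minimal
sketch, assuming a hypothetical helper `check_build_order` that only
inspects the Dockerfile text and never builds the image:

```python
import re


def check_build_order(dockerfile_text: str) -> list[str]:
    """Return the problems found in a Dockerfile's text (empty list if OK).

    Hypothetical helper: a plain text check, it does not run docker build.
    """
    problems = []
    # Raw data is now seeded from tests/fixtures at build time,
    # so a COPY of data/raw/ should no longer appear.
    if re.search(r"^\s*COPY\s+data/raw/", dockerfile_text, flags=re.MULTILINE):
        problems.append("COPY data/raw/ is still present")

    # str.find returns -1 when the module name is absent.
    pipeline = dockerfile_text.find("src.pipelines.bbb_pipeline")
    model = dockerfile_text.find("src.models.bbb_model")
    if pipeline == -1:
        problems.append("src.pipelines.bbb_pipeline is never run")
    if model == -1:
        problems.append("src.models.bbb_model is never run")

    # The pipeline must produce the parquet before the model reads it.
    if pipeline != -1 and model != -1 and pipeline > model:
        problems.append("bbb_model runs before bbb_pipeline")
    return problems
```

In a pytest module this could be called as
`assert check_build_order(Path(name).read_text()) == []` for both
"Dockerfile" and "Dockerfile.hf". Matching on text order is deliberately
crude; it catches a swapped chain but not a pipeline that silently
writes nothing.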
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Dockerfile +7 -4
- Dockerfile.hf +7 -4
--- a/Dockerfile
+++ b/Dockerfile
@@ -31,12 +31,15 @@ RUN pip install -r requirements.txt
 # --- project source ---
 COPY src/ ./src/
 COPY tests/fixtures/ ./tests/fixtures/
-COPY data/raw/ ./data/raw/
 COPY supervisord.conf ./supervisord.conf
 
-# ---
-#
-
+# --- seed raw data from fixtures, run pipeline, train model at image-build time ---
+# data/raw/bbbp.csv is gitignored locally; we seed it from the test fixture so
+# the deploy is self-contained. First call to /predict/bbb is then instant.
+RUN mkdir -p data/raw data/processed && \
+cp tests/fixtures/bbbp_sample.csv data/raw/bbbp.csv && \
+python -m src.pipelines.bbb_pipeline && \
+python -m src.models.bbb_model
 
 # --- HF Spaces convention ---
 EXPOSE 7860

--- a/Dockerfile.hf
+++ b/Dockerfile.hf
@@ -31,12 +31,15 @@ RUN pip install -r requirements.txt
 # --- project source ---
 COPY src/ ./src/
 COPY tests/fixtures/ ./tests/fixtures/
-COPY data/raw/ ./data/raw/
 COPY supervisord.conf ./supervisord.conf
 
-# ---
-#
-
+# --- seed raw data from fixtures, run pipeline, train model at image-build time ---
+# data/raw/bbbp.csv is gitignored locally; we seed it from the test fixture so
+# the deploy is self-contained. First call to /predict/bbb is then instant.
+RUN mkdir -p data/raw data/processed && \
+cp tests/fixtures/bbbp_sample.csv data/raw/bbbp.csv && \
+python -m src.pipelines.bbb_pipeline && \
+python -m src.models.bbb_model
 
 # --- HF Spaces convention ---
 EXPOSE 7860