mekosotto Claude Opus 4.7 (1M context) committed on
Commit fec13e9 · 1 Parent(s): fca5ceb

fix(deploy): seed data/raw + run bbb_pipeline before bbb_model in build

Build-time crash: data/raw/bbbp.csv is gitignored locally, so the
previous COPY data/raw/ instruction copied an empty directory into
the image; bbb_pipeline never produced bbbp_features.parquet, and
bbb_model.main() raised FileNotFoundError on the missing parquet.
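
The failure mode above can be made easier to diagnose with an explicit input check. The following is a minimal sketch, not the project's actual code: `require_input` is a hypothetical helper, and the idea that `bbb_model` would call it before reading the parquet is an assumption for illustration only.

```python
from pathlib import Path


def require_input(path: Path, produced_by: str) -> Path:
    """Return `path` if it is an existing file, else raise with a hint.

    Hypothetical helper: `produced_by` names the module expected to
    generate the file, so the error points at the missing build step.
    """
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} is missing; run `python -m {produced_by}` first to generate it."
        )
    return path
```

With a check like this, a build that skips the pipeline step would fail with a message naming `src.pipelines.bbb_pipeline` instead of a bare missing-file error.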

Fix: drop the useless COPY data/raw/ and replace the model-only RUN
with a single shell chain:
1. mkdir -p data/raw data/processed
2. cp tests/fixtures/bbbp_sample.csv data/raw/bbbp.csv (self-contained)
3. python -m src.pipelines.bbb_pipeline (produces parquet)
4. python -m src.models.bbb_model (trains, persists joblib)

The existing deploy smoke tests still pass — they only assert that
'src.models.bbb_model' appears in the Dockerfile, which it still does.
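
Because a substring assertion like the one described above would also have passed on the broken Dockerfile, a stricter check may be worth considering. The sketch below is an assumption, not the repository's test: `modules_run_in_order` is a hypothetical helper that checks each `python -m <module>` invocation appears in the expected order. It matches plain text, so a commented-out command would also count.

```python
def modules_run_in_order(dockerfile_text: str, modules: list[str]) -> bool:
    """Return True if every `python -m <module>` occurs, in the given order."""
    cursor = 0
    for module in modules:
        needle = f"python -m {module}"
        index = dockerfile_text.find(needle, cursor)
        if index == -1:
            return False
        # Continue searching after this match so order is enforced.
        cursor = index + len(needle)
    return True


# The RUN instructions before and after this commit, copied from the diff.
OLD_RUN = "RUN python -m src.models.bbb_model\n"
NEW_RUN = (
    "RUN mkdir -p data/raw data/processed && \\\n"
    "    cp tests/fixtures/bbbp_sample.csv data/raw/bbbp.csv && \\\n"
    "    python -m src.pipelines.bbb_pipeline && \\\n"
    "    python -m src.models.bbb_model\n"
)
EXPECTED_ORDER = ["src.pipelines.bbb_pipeline", "src.models.bbb_model"]
```

Here `modules_run_in_order(NEW_RUN, EXPECTED_ORDER)` holds while the same call on `OLD_RUN` does not, even though both strings contain `src.models.bbb_model`.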

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (2)
  1. Dockerfile +7 -4
  2. Dockerfile.hf +7 -4
Dockerfile CHANGED
@@ -31,12 +31,15 @@ RUN pip install -r requirements.txt
 # --- project source ---
 COPY src/ ./src/
 COPY tests/fixtures/ ./tests/fixtures/
-COPY data/raw/ ./data/raw/
 COPY supervisord.conf ./supervisord.conf
 
-# --- build BBB model artifact at image-build time ---
-# This makes the first /predict/bbb call instant on cold start.
-RUN python -m src.models.bbb_model
+# --- seed raw data from fixtures, run pipeline, train model at image-build time ---
+# data/raw/bbbp.csv is gitignored locally; we seed it from the test fixture so
+# the deploy is self-contained. First call to /predict/bbb is then instant.
+RUN mkdir -p data/raw data/processed && \
+    cp tests/fixtures/bbbp_sample.csv data/raw/bbbp.csv && \
+    python -m src.pipelines.bbb_pipeline && \
+    python -m src.models.bbb_model
 
 # --- HF Spaces convention ---
 EXPOSE 7860
Dockerfile.hf CHANGED
@@ -31,12 +31,15 @@ RUN pip install -r requirements.txt
 # --- project source ---
 COPY src/ ./src/
 COPY tests/fixtures/ ./tests/fixtures/
-COPY data/raw/ ./data/raw/
 COPY supervisord.conf ./supervisord.conf
 
-# --- build BBB model artifact at image-build time ---
-# This makes the first /predict/bbb call instant on cold start.
-RUN python -m src.models.bbb_model
+# --- seed raw data from fixtures, run pipeline, train model at image-build time ---
+# data/raw/bbbp.csv is gitignored locally; we seed it from the test fixture so
+# the deploy is self-contained. First call to /predict/bbb is then instant.
+RUN mkdir -p data/raw data/processed && \
+    cp tests/fixtures/bbbp_sample.csv data/raw/bbbp.csv && \
+    python -m src.pipelines.bbb_pipeline && \
+    python -m src.models.bbb_model
 
 # --- HF Spaces convention ---
 EXPOSE 7860