
Changelog: Immediate and Short-term Fixes

Summary

This document tracks the immediate and short-term fixes implemented based on the repository analysis.


Immediate Fixes (Completed)

1. ✅ Comprehensive Logging Added (src/edgeeda/cli.py)

Changes:

  • Added _setup_logging() function to configure logging to both file and console
  • Log file created at {out_dir}/tuning.log
  • Added detailed logging throughout the tuning loop:
    • Experiment start/configuration
    • Each action proposal (variant, fidelity, knobs)
    • Make command execution results
    • Metadata extraction attempts
    • Reward computation results
    • Summary statistics at completion

Benefits:

  • Full visibility into tuning process
  • Easy debugging of failures
  • Historical record of experiments
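
The dual file/console setup described above can be sketched roughly as follows. This is a minimal illustration, not the code in cli.py: the function name `setup_logging`, the logger name `"edgeeda"`, and the format string are assumptions; only the `{out_dir}/tuning.log` location comes from this changelog.

```python
import logging
from pathlib import Path


def setup_logging(out_dir: str, level: int = logging.INFO) -> logging.Logger:
    """Configure a logger that writes to {out_dir}/tuning.log and to the console."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("edgeeda")  # logger name is an assumption
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate lines via the root logger

    # Close and drop handlers left over from an earlier call, so that
    # calling this function twice does not duplicate every log line.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    file_handler = logging.FileHandler(Path(out_dir) / "tuning.log")
    console_handler = logging.StreamHandler()
    for handler in (file_handler, console_handler):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```

Passing `level=logging.WARNING` to a function like this keeps the log file but drops the per-action INFO messages.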

2. ✅ SurrogateUCBAgent Knob Storage Fixed (src/edgeeda/agents/surrogate_ucb.py)

Changes:

  • Initialize _variant_knobs dictionary in __init__() instead of lazy initialization
  • Removed hasattr() checks - always use self._variant_knobs
  • Ensures knob values are always available for promotion logic

Benefits:

  • Prevents AttributeError when promoting variants
  • More reliable multi-fidelity optimization
  • Cleaner code without hasattr checks
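
To illustrate why eager initialization removes the AttributeError, here is a small sketch. `KnobStore` and its methods are hypothetical stand-ins for the bookkeeping inside SurrogateUCBAgent; only the `_variant_knobs` attribute name is taken from this changelog.

```python
from typing import Any, Dict, Optional


class KnobStore:
    """Hypothetical stand-in for the knob bookkeeping in SurrogateUCBAgent."""

    def __init__(self) -> None:
        # Eager initialization: the attribute exists before any observation,
        # so later lookups never need a hasattr() guard.
        self._variant_knobs: Dict[str, Dict[str, Any]] = {}

    def remember(self, variant: str, knobs: Dict[str, Any]) -> None:
        # Store a copy so later mutation by the caller does not alter history.
        self._variant_knobs[variant] = dict(knobs)

    def knobs_for_promotion(self, variant: str) -> Optional[Dict[str, Any]]:
        # An unknown variant yields None instead of raising AttributeError.
        return self._variant_knobs.get(variant)
```

With lazy initialization, a promotion attempted before the first observation would touch an attribute that does not exist yet; here the lookup simply returns `None`.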

3. ✅ Configuration Validation Added (src/edgeeda/config.py)

Changes:

  • Added _validate_config() function with comprehensive checks:
    • Budget validation (total_actions > 0, max_expensive >= 0, max_expensive <= total_actions)
    • Fidelities validation (non-empty)
    • Knobs validation (non-empty, min < max, valid types)
    • Reward weights validation (non-empty)
    • Reward candidates validation (at least one list non-empty)

Benefits:

  • Catches configuration errors early
  • Clear error messages for invalid configs
  • Prevents runtime failures from bad configs
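
The checks listed above could look roughly like this when applied to a plain dictionary. This is a sketch under assumptions: the real `_validate_config()` may operate on typed config objects, and the key names (`budget`, `knobs`, `reward`, `type`, `min`, `max`) and the set of valid knob types are illustrative.

```python
from typing import Any, Dict

VALID_KNOB_TYPES = {"int", "float"}  # assumed set of knob types


def validate_config(cfg: Dict[str, Any]) -> None:
    """Raise ValueError describing the first rule the config violates."""
    budget = cfg.get("budget") or {}
    total = budget.get("total_actions", 0)
    expensive = budget.get("max_expensive", 0)
    if total <= 0:
        raise ValueError(f"total_actions must be > 0, got {total}")
    if expensive < 0:
        raise ValueError(f"max_expensive must be >= 0, got {expensive}")
    if expensive > total:
        raise ValueError(
            f"max_expensive ({expensive}) must be <= total_actions ({total})"
        )

    if not cfg.get("fidelities"):
        raise ValueError("fidelities must be non-empty")

    knobs = cfg.get("knobs") or {}
    if not knobs:
        raise ValueError("knobs must be non-empty")
    for name, spec in knobs.items():
        if spec.get("type") not in VALID_KNOB_TYPES:
            raise ValueError(f"knob {name!r} has invalid type {spec.get('type')!r}")
        lo, hi = spec.get("min"), spec.get("max")
        if lo is None or hi is None or not lo < hi:
            raise ValueError(f"knob {name!r} requires min < max, got {lo} and {hi}")

    reward = cfg.get("reward") or {}
    if not reward.get("weights"):
        raise ValueError("reward weights must be non-empty")
    candidates = reward.get("candidates") or {}
    # At least one candidate list must contain an entry.
    if not any(candidates.values()):
        raise ValueError("at least one reward candidate list must be non-empty")
```

Raising on the first violation keeps each message specific; collecting all violations into one error is an equally reasonable design.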

4. ✅ Improved Error Messages (src/edgeeda/orfs/runner.py)

Changes:

  • Added RunResult.is_success() method
  • Added RunResult.error_summary() method that:
    • Extracts error lines from stderr
    • Falls back to last few lines if no error keywords found
    • Provides concise error information

Benefits:

  • Better error visibility
  • Easier debugging of failed make commands
  • Structured error information
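
A result object with these two methods might be shaped like the sketch below. The field names, the keyword list, and the `max_lines` parameter are assumptions made for illustration; the changelog only states that error lines are extracted from stderr, with a fallback to the last few lines.

```python
from dataclasses import dataclass
from typing import Tuple

ERROR_KEYWORDS: Tuple[str, ...] = ("error", "fatal", "failed")  # assumed keywords


@dataclass
class RunResult:
    return_code: int
    stdout: str
    stderr: str

    def is_success(self) -> bool:
        return self.return_code == 0

    def error_summary(self, max_lines: int = 5) -> str:
        # Ignore blank lines, then prefer lines mentioning an error keyword.
        lines = [ln for ln in self.stderr.splitlines() if ln.strip()]
        hits = [ln for ln in lines if any(k in ln.lower() for k in ERROR_KEYWORDS)]
        # Fall back to the tail of stderr when no keyword matched.
        chosen = hits if hits else lines
        return "\n".join(chosen[-max_lines:])
```

Callers can then log `result.error_summary()` for a failed make command instead of dumping the whole of stderr.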

5. ✅ Robust Metadata Extraction (src/edgeeda/orfs/metrics.py)

Changes:

  • Added logging throughout metadata search process
  • Improved find_best_metadata_json() with:
    • Multiple pattern matching (exact matches first, then patterns)
    • Better error handling for missing directories
    • Debug logging for search process
  • Enhanced load_json() with:
    • Specific exception handling
    • Error logging for different failure modes

Benefits:

  • More reliable metadata discovery, using this search order:
    • Tries exact matches: metadata.json, metrics.json
    • Then pattern matches: *metadata*.json, *metrics*.json
    • Falls back to any JSON file
  • Better debugging when metadata is missing
  • Clear error messages for JSON parsing failures
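
The search order and the exception handling described above can be sketched as follows. This is not the code in metrics.py: the function name `find_metadata_json`, the recursive search, and the tie-break by sorted path are assumptions, and the real `find_best_metadata_json()` may rank candidates differently.

```python
import json
import logging
from pathlib import Path
from typing import Any, Optional, Union

logger = logging.getLogger(__name__)

EXACT_NAMES = ("metadata.json", "metrics.json")
PATTERNS = ("*metadata*.json", "*metrics*.json")


def find_metadata_json(root: str) -> Optional[Path]:
    """Return the first match: exact names, then patterns, then any JSON file."""
    base = Path(root)
    if not base.is_dir():
        logger.debug("metadata directory does not exist: %s", base)
        return None
    for pattern in (*EXACT_NAMES, *PATTERNS, "*.json"):
        hits = sorted(p for p in base.rglob(pattern) if p.is_file())
        if hits:
            logger.debug("pattern %r matched %s", pattern, hits[0])
            return hits[0]  # tie-break by path order (an assumption)
    return None


def load_json(path: Union[str, Path]) -> Optional[Any]:
    """Load a JSON file, logging a distinct message for each failure mode."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        logger.error("metadata file not found: %s", path)
    except json.JSONDecodeError as exc:
        logger.error("invalid JSON in %s: %s", path, exc)
    except OSError as exc:
        logger.error("could not read %s: %s", path, exc)
    return None
```

Returning `None` for every failure lets the tuning loop treat a missing or corrupt metadata file the same way, while the log records which case occurred.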

6. ✅ Retry Logic for Transient Failures (src/edgeeda/orfs/runner.py)

Changes:

  • Added max_retries parameter to run_make() method
  • Implements exponential backoff (2^attempt seconds)
  • Handles:
    • Subprocess failures (retries on non-zero return codes)
    • Timeout exceptions
    • General exceptions during execution

Benefits:

  • Handles transient network/filesystem issues
  • Reduces false failures from temporary problems
  • Configurable retry behavior
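
The retry loop with exponential backoff can be illustrated with a standalone helper. This is a sketch, not the actual `run_make()` implementation: the name `run_with_retries` and the injectable `sleep` parameter are hypothetical, and only the `max_retries` parameter and the 2^attempt delay come from this changelog.

```python
import subprocess
import time
from typing import Callable, List, Optional


def run_with_retries(
    cmd: List[str],
    max_retries: int = 0,
    timeout_sec: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[subprocess.CompletedProcess]:
    """Run cmd up to max_retries + 1 times.

    Returns the first successful result, otherwise the result of the last
    attempt (None if that attempt timed out or could not be started).
    """
    result: Optional[subprocess.CompletedProcess] = None
    for attempt in range(max_retries + 1):
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout_sec
            )
            if result.returncode == 0:
                return result
        except subprocess.TimeoutExpired:
            result = None  # the command exceeded timeout_sec
        except OSError:
            result = None  # e.g. the executable was not found
        if attempt < max_retries:
            sleep(2 ** attempt)  # waits 1 s, 2 s, 4 s, ...
    return result
```

With `max_retries=0` the loop body runs exactly once and never sleeps, which matches the stated default of no retries.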

Short-term Fixes (Completed)

7. ✅ Unit Tests for Agents (tests/test_agents.py)

New Test File:

  • test_random_search_proposes() - Validates random search action proposals
  • test_random_search_observe() - Tests observe method
  • test_successive_halving_initialization() - Tests SH agent setup
  • test_successive_halving_propose() - Tests action proposals
  • test_successive_halving_promotion() - Tests multi-fidelity promotion
  • test_surrogate_ucb_initialization() - Tests SurrogateUCB setup
  • test_surrogate_ucb_propose() - Tests action proposals
  • test_surrogate_ucb_observe() - Tests observation storage
  • test_surrogate_ucb_knob_storage() - Tests knob storage for promotion
  • test_surrogate_ucb_surrogate_fitting() - Tests surrogate model fitting
  • test_agent_action_consistency() - Tests all agents produce valid actions

Coverage:

  • All three agent types
  • Key agent behaviors (propose, observe, promotion)
  • Edge cases and error handling

8. ✅ Unit Tests for Metrics (tests/test_metrics.py)

New Test File:

  • test_flatten_metrics_*() - Tests metric flattening (simple, complex, empty, leaf values)
  • test_coerce_float_*() - Tests float coercion (int, float, string, invalid)
  • test_pick_first_*() - Tests metric key selection (found, not found, case-insensitive, multiple candidates)
  • test_load_json_*() - Tests JSON loading (valid, invalid, missing)

Coverage:

  • All metrics utility functions
  • Edge cases and error conditions
  • Type coercion and matching logic
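
For readers unfamiliar with these utilities, the sketch below shows one plausible shape for them together with two pytest-style tests. Everything here is an assumption made for illustration: the signatures, the `.` key separator, and the concrete test names are hypothetical and may differ from metrics.py and tests/test_metrics.py.

```python
from typing import Any, Dict, Iterable, Optional


def flatten_metrics(data: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with '.' (assumed)."""
    if not isinstance(data, dict):
        return {prefix: data} if prefix else {}
    flat: Dict[str, Any] = {}
    for key, value in data.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_metrics(value, name))
        else:
            flat[name] = value
    return flat


def coerce_float(value: Any) -> Optional[float]:
    """Convert ints, floats, and numeric strings to float; None otherwise."""
    if isinstance(value, bool):
        return None  # treat booleans as non-numeric
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def pick_first(flat: Dict[str, Any], candidates: Iterable[str]) -> Optional[float]:
    """Return the first candidate key present (case-insensitive) as a float."""
    lowered = {k.lower(): v for k, v in flat.items()}
    for cand in candidates:
        value = coerce_float(lowered.get(cand.lower()))
        if value is not None:
            return value
    return None


def test_flatten_metrics_complex() -> None:
    nested = {"a": {"b": 1, "c": {"d": "2"}}, "e": 3}
    assert flatten_metrics(nested) == {"a.b": 1, "a.c.d": "2", "e": 3}


def test_pick_first_case_insensitive() -> None:
    assert pick_first({"Area": "10"}, ["power", "area"]) == 10.0
```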

9. ✅ Updated Dependencies (requirements.txt)

Changes:

  • Added pytest>=7.0 for running unit tests

Testing

To run the new tests:

# Install pytest if not already installed
pip install pytest

# Run all tests
pytest tests/ -v

# Run a specific test file
pytest tests/test_agents.py -v
pytest tests/test_metrics.py -v

# Run with coverage (requires the pytest-cov plugin: pip install pytest-cov)
pytest tests/ --cov=edgeeda --cov-report=html

Usage Examples

Using Logging

Logs are automatically created when running edgeeda tune:

edgeeda tune --config configs/gcd_nangate45.yaml --budget 24
# Logs are written to {out_dir}/tuning.log, for example runs/tuning.log

Using Retry Logic

Retry logic is available but defaults to 0 retries. To enable:

# In cli.py, modify run_make calls:
rr = runner.run_make(
    target=make_target,
    design_config=cfg.design.design_config,
    flow_variant=action.variant,
    overrides={k: str(v) for k, v in action.knobs.items()},
    timeout_sec=args.timeout,
    max_retries=2,  # Add this parameter
)

Configuration Validation

Invalid configurations now fail early with clear messages:

# This will raise ValueError:
cfg = load_config("invalid_config.yaml")
# ValueError: total_actions must be > 0, got -5

Files Modified

  1. src/edgeeda/cli.py - Added logging throughout
  2. src/edgeeda/agents/surrogate_ucb.py - Fixed knob storage
  3. src/edgeeda/config.py - Added validation
  4. src/edgeeda/orfs/runner.py - Improved error messages, added retry logic
  5. src/edgeeda/orfs/metrics.py - Enhanced metadata extraction with logging
  6. requirements.txt - Added pytest
  7. tests/test_agents.py - New test file
  8. tests/test_metrics.py - New test file

Next Steps

Recommended Follow-ups:

  1. Run Tests: Install pytest and verify all tests pass
  2. Test Logging: Run a small experiment and verify logs are created
  3. Test Retry Logic: Manually test retry behavior with transient failures
  4. Validate Config: Try invalid configs to see validation in action

Future Enhancements:

  • Add integration tests with mock ORFS runner
  • Add performance benchmarks
  • Add more visualization options
  • Implement parallel execution
  • Add resume from checkpoint functionality

Notes

  • All changes maintain backward compatibility
  • No breaking changes to existing APIs
  • Log output can be reduced (though not fully disabled) by raising the log level to WARNING or ERROR
  • Retry logic defaults to 0 (no retries) to maintain current behavior
  • Tests require pytest but don't affect runtime dependencies