Spaces: DocSA/LP_2-AI_Assistant

DocUA Claude Sonnet 4.6 committed on Mar 6

Commit 805748d

1 Parent(s): 028f33a

feat: Add validation to skip duplicate header rows in batch processing

- Skip rows where majority of cell values match column names (header duplicates)
- Add tests_data/ to .gitignore to prevent binary xlsx files from being committed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Files changed (2)

.gitignore +1 -0
interface.py +7 -0

.gitignore CHANGED

@@ -55,6 +55,7 @@ isolated-lp-generation/
 # Ignore test files
 data_test/
+tests_data/
 # Ignore added documents
 Add_docs/

interface.py CHANGED

@@ -418,6 +418,13 @@ async def process_batch_testing(
             court_decision_text = str(row['text'])
+            # Skip rows where cell values match column names — these are duplicate header rows
+            col_values = {str(col): str(row[col]) for col in row.index}
+            header_matches = sum(1 for col, val in col_values.items() if val == col)
+            if header_matches >= max(1, len(col_values) // 2):
+                results.append("ПРОПУЩЕНО: рядок містить назви колонок (дублікат заголовка)")
+                continue
             # Generate legal position
             try:
                 legal_position_json = generate_legal_position(
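
The header-duplicate check added in this hunk can be tried in isolation. The following is a minimal sketch, not the code from `interface.py`: it assumes each row is available as a plain mapping of column name to cell value (the commit reads values from `row.index`), and the helper name `is_duplicate_header_row` and the sample rows are hypothetical.

```python
from typing import Mapping


def is_duplicate_header_row(row: Mapping[str, object]) -> bool:
    """Return True when enough cells repeat their own column name.

    Mirrors the threshold used in the diff: the number of matching cells
    must be at least max(1, number_of_columns // 2).
    """
    matches = sum(1 for col, val in row.items() if str(val) == str(col))
    return matches >= max(1, len(row) // 2)


# Hypothetical sample data: the second row repeats the header, as can
# happen when several exported sheets are concatenated into one file.
rows = [
    {"id": "1", "text": "First decision", "court": "Supreme Court"},
    {"id": "id", "text": "text", "court": "court"},
    {"id": "2", "text": "Second decision", "court": "Appeal Court"},
]

# Collect the positions of rows that the check would skip.
skipped = [i for i, row in enumerate(rows) if is_duplicate_header_row(row)]
print(skipped)
```

Because the threshold uses integer division, it means "at least half, rounded down" rather than a strict majority: in a three-column row a single cell equal to its column name is already enough to skip the row.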