Space: anky2002/FORENSIQ


anky2002 committed 14 days ago

Commit 0d484f2 (verified) · Parent: 60d6a7a

Upload agents/semantic_agent.py with huggingface_hub

Files changed (1)

agents/semantic_agent.py +254 -133

@@ -1,5 +1,11 @@
-"""FORENSIQ — Semantic Consistency Agent (23 features via VLM)
-Uses Qwen2.5-VL-72B with expert forensic prompts for deep visual reasoning.
 """
 import os, base64, io, json, re, numpy as np
 from PIL import Image
@@ -44,211 +50,326 @@ def _score(parsed):
     if v=="AUTHENTIC": return -0.4
     return 0.0
-# ═══ SYSTEM PROMPTS (23 features grouped into 5 VLM calls) ═══════════
-SYS_LIGHTING = """You are a world-class forensic photogrammetrist with 20+ years analyzing lighting in images for legal proceedings. You understand radiometry, photometry, and the physics of light transport at an expert level.
-Your analysis capabilities:
-1. SHADOW GEOMETRY: Trace every shadow to its casting object. All shadow vectors must converge to consistent light source position(s). Shadow length encodes sun elevation via tan(θ) = object_height/shadow_length. Penumbra width encodes light source angular size.
-2. INVERSE SQUARE LAW: Light intensity I = P/(4πr²). Surfaces equidistant from a point light must have equal irradiance. Check illumination falloff on flat surfaces (walls, floors, tables).
-3. SPECULAR HIGHLIGHTS: Each specular reflection encodes light source direction via the reflection law (angle of incidence = angle of reflection). Check that specular highlights across different objects in the scene are consistent with the same light source(s).
-4. AMBIENT OCCLUSION: Contact shadows and ambient occlusion should be darkest in concavities and where objects touch surfaces. AI often forgets these subtle cues.
-5. COLOR TEMPERATURE: All illuminated surfaces under the same light should share its color temperature. Mixed lighting (warm/cool) must be physically motivated (window + lamp).
-6. SUBSURFACE SCATTERING: Thin objects (ears, fingers, leaves) backlit by strong light should show red/warm translucency. AI rarely gets this right.
-7. CAUSTICS: Light through transparent objects (glass, water) creates caustic patterns. If present, they must match the refracting geometry.
-8. INTER-REFLECTIONS: Colored surfaces bounce colored light onto nearby surfaces. A red wall should tint nearby white objects slightly red.
-Report ALL violations with specific image region references. Be precise and clinical."""
-USR_LIGHTING = """Perform a complete lighting forensic analysis of this image.
-For each of these 8 sub-analyses, provide a separate assessment:
-1. Shadow Direction Convergence — trace visible shadows, do they converge?
-2. Inverse Square Law — does light intensity fall off naturally?
-3. Specular Highlight Consistency — are reflections physically consistent?
-4. Ambient Occlusion — are contact shadows present and correct?
-5. Color Temperature Consistency — does illumination color match across the scene?
-6. Subsurface Scattering — if thin translucent objects are visible, is SSS correct?
-7. Caustics — if transparent objects are present, are caustics correct?
-8. Inter-reflections — do colored surfaces bounce light correctly?
 Respond in JSON:
 {
-    "shadow_convergent": true/false,
-    "inverse_square_ok": true/false,
     "specular_consistent": true/false,
-    "ambient_occlusion_ok": true/false,
     "color_temp_consistent": true/false,
     "sss_correct": true/false/null,
     "caustics_correct": true/false/null,
     "interreflections_ok": true/false/null,
-    "anomalies": ["specific anomaly descriptions"],
     "confidence": 0.0-1.0,
     "verdict": "AUTHENTIC" or "SUSPICIOUS" or "MANIPULATED",
-    "explanation": "detailed reasoning citing specific image regions"
 }"""
-SYS_ANATOMY = """You are a forensic anatomist and medical illustrator with encyclopedic knowledge of human body structure. AI-generated images violate anatomy in specific, detectable ways.
-Your detection capabilities:
-1. HANDS: Exactly 5 fingers per hand. Each finger has 3 phalanges (thumb: 2). Joints bend in ONE direction only. Nails are on the dorsal side. Thumb opposes other fingers. Palm lines, knuckle creases, and tendons must be consistent.
-2. FACIAL STRUCTURE: Bilateral symmetry (not perfect, but close). Eyes at same height, same size, same iris color. Ears at eye level, same size and shape. Teeth follow dental arch. Nostrils are symmetric.
-3. BODY PROPORTIONS: Head ≈ 1/7.5 of body height. Arm span ≈ height. Legs ≈ 50% of height. Elbow at waist level. Knee at mid-leg.
-4. SKIN TEXTURE: Consistent pore density. Wrinkles follow muscle fiber directions. No texture discontinuities.
-5. HAIR: Consistent direction of growth. No floating strands disconnected from scalp. Hairline follows natural patterns.
-6. EYES: Catchlight reflections should match between eyes and match the lighting direction. Iris has consistent color and pattern. Sclera is white with subtle veins.
-7. CLOTHING/ACCESSORIES: Fabric drapes under gravity. Seams are continuous. Buttons/zippers are physically connected. Jewelry doesn't float.
-Count fingers explicitly. Note any impossible joint angles. Check ear symmetry precisely."""
-USR_ANATOMY = """Perform a thorough anatomical forensic analysis of this image.
-Analyze each of these 7 categories:
-1. HAND ANATOMY — Count fingers on each visible hand. Check joint angles, nail placement, proportions.
-2. FACIAL SYMMETRY — Check eye alignment, ear symmetry, nose/mouth centering, teeth.
-3. BODY PROPORTIONS — Check limb ratios, joint positions, head-to-body ratio.
-4. SKIN & TEXTURE — Check pore consistency, wrinkle patterns, texture continuity.
-5. HAIR — Check growth direction, hairline, strand connectivity.
-6. EYE DETAILS — Check catchlights, iris consistency, sclera, eyelash direction.
-7. CLOTHING PHYSICS — Check fabric draping, seam continuity, accessory placement.
-If NO people are visible, set contains_people=false.
 Respond in JSON:
 {
     "contains_people": true/false,
     "hands_correct": true/false/null,
-    "finger_count": "e.g. 'Left: 5, Right: 5' or 'Left: 6 (extra pinky)'",
     "face_symmetric": true/false/null,
     "proportions_ok": true/false/null,
     "skin_natural": true/false/null,
     "hair_natural": true/false/null,
     "eyes_consistent": true/false/null,
     "clothing_ok": true/false/null,
-    "anomalies": ["specific anatomical errors"],
     "confidence": 0.0-1.0,
     "verdict": "AUTHENTIC" or "SUSPICIOUS" or "MANIPULATED",
-    "explanation": "detailed reasoning with specific observations"
 }"""
-SYS_PHYSICS = """You are a forensic physicist specializing in physical plausibility analysis. Generative AI learns visual patterns but does NOT understand physics. Your job is to find violations.
-Your analysis domains:
-1. MATERIAL BRDF: Metals are specular and reflect environment. Glass refracts and distorts background. Matte surfaces have diffuse reflection only. Wet surfaces have higher specularity. The same material must have consistent reflectance properties.
-2. PERSPECTIVE GEOMETRY: All parallel lines in 3D converge to the same vanishing point. Vertical lines should remain vertical (unless tilted camera). Objects at the same distance should have the same scale.
-3. GRAVITY & MECHANICS: Objects rest on surfaces, not float. Liquids are level. Fabric drapes downward. Hair falls with gravity (unless in motion). Structures must be load-bearing.
-4. SCALE CONSISTENCY: Known objects (people, cars, doors, furniture) have known sizes. Check relative proportions.
-5. TRANSPARENCY & REFRACTION: Glass distorts what's behind it. Water refracts objects below the surface. Transparency should be consistent with material thickness.
-6. CONTACT & INTERACTION: Objects touching surfaces have contact shadows. Weight deforms soft surfaces. Reflections on surfaces show correct geometry.
-7. MOTION CONSISTENCY: If motion blur is present, it should be consistent with object velocity and direction. Frozen motion should show physically plausible pose.
-8. DEPTH ORDERING: Objects closer should occlude objects farther. No impossible overlaps."""
-USR_PHYSICS = """Analyze this image for violations of physical laws across 8 domains:
-1. Material BRDF consistency — are surface reflections physically correct?
-2. Perspective geometry — do parallel lines converge correctly?
-3. Gravity and mechanics — do objects obey gravity?
-4. Scale consistency — are objects proportional?
-5. Transparency/refraction — do transparent objects distort correctly?
-6. Contact and interaction — correct shadows and deformation?
-7. Motion consistency — is blur/motion physically plausible?
-8. Depth ordering — correct occlusion?
 Respond in JSON:
 {
-    "brdf_consistent": true/false,
     "perspective_correct": true/false,
     "gravity_ok": true/false,
     "scale_consistent": true/false,
     "transparency_ok": true/false/null,
-    "contact_correct": true/false,
     "motion_ok": true/false/null,
     "depth_ordering_ok": true/false,
-    "anomalies": ["specific physics violations"],
     "confidence": 0.0-1.0,
     "verdict": "AUTHENTIC" or "SUSPICIOUS" or "MANIPULATED",
-    "explanation": "detailed reasoning"
 }"""
-SYS_CONTEXT = """You are a forensic scene analyst who evaluates whether an image's content is contextually plausible. AI-generated images often combine elements that shouldn't coexist.
-Your analysis:
-1. TEMPORAL CONSISTENCY: Season (foliage, clothing), time of day (sky, shadows, lighting), era (technology, fashion).
-2. GEOGRAPHIC CONSISTENCY: Architecture style matches vegetation. Road markings match country. Signs are in expected language.
-3. WEATHER CONSISTENCY: Sky matches ground conditions. Wet ground → overcast or recent rain. Snow → cold-weather attire.
-4. SOCIAL PLAUSIBILITY: People's attire matches setting. Group interactions are natural. No impossible crowd configurations.
-5. OBJECT RELATIONSHIPS: Furniture is functional. Appliances are connected. Tools are held correctly."""
-USR_CONTEXT = """Analyze contextual plausibility across 5 domains:
-1. Temporal — season, time of day, era consistency
-2. Geographic — architecture, vegetation, signage consistency
-3. Weather — sky vs ground conditions
-4. Social — attire, interactions, crowd plausibility
-5. Object relationships — functional arrangement
 Respond in JSON:
 {
-    "temporal_consistent": true/false,
     "geographic_consistent": true/false,
     "weather_consistent": true/false,
-    "social_plausible": true/false,
     "objects_functional": true/false,
-    "anomalies": ["specific contextual violations"],
     "confidence": 0.0-1.0,
     "verdict": "AUTHENTIC" or "SUSPICIOUS" or "MANIPULATED",
-    "explanation": "reasoning"
 }"""
 def run_semantic_agent(img):
-    findings,scores=[],[]
-    vlm_ok=True
-    for sys_p,usr_p,name,features in [
-        (SYS_LIGHTING, USR_LIGHTING, "Lighting Physics", ["Shadow Convergence","Inverse Square Law","Specular Consistency","Ambient Occlusion","Color Temperature","Subsurface Scattering","Caustics","Inter-reflections"]),
-        (SYS_ANATOMY, USR_ANATOMY, "Anatomical Analysis", ["Hand Anatomy","Facial Symmetry","Body Proportions","Skin Texture","Hair","Eye Details","Clothing Physics"]),
-        (SYS_PHYSICS, USR_PHYSICS, "Physical Plausibility", ["Material BRDF","Perspective Geometry","Gravity","Scale","Transparency","Contact","Motion","Depth Ordering"]),
     ]:
         try:
-            resp=_vlm(img,sys_p,usr_p)
             if resp and not resp.startswith("VLM_ERROR"):
-                parsed=_parse(resp)
-                sc=_score(parsed)
-                if name=="Anatomical Analysis" and not parsed.get("contains_people",True):
-                    sc=0.0
-                # Create sub-findings for each feature
-                anomalies=parsed.get("anomalies",[])
                 for feat in features:
-                    findings.append({"test":feat,"score":sc/len(features),"note":parsed.get("explanation","")[:100],"parent":name})
-                    scores.append(sc/len(features))
-                findings.append({"test":name,"vlm_analysis":parsed,"anomalies":anomalies,
-                                "score":sc,"confidence":parsed.get("confidence",0.5),
-                                "note":parsed.get("explanation","")[:200]})
                 scores.append(sc)
             else:
-                vlm_ok=False
                 for feat in features:
-                    findings.append({"test":feat,"score":0.0,"note":"VLM unavailable","vlm_error":True})
                     scores.append(0.0)
         except Exception as e:
-            findings.append({"test":name,"error":str(e),"score":0})
-    # Context plausibility (separate call)
     try:
-        resp=_vlm(img,SYS_CONTEXT,USR_CONTEXT)
         if resp and not resp.startswith("VLM_ERROR"):
-            parsed=_parse(resp); sc=_score(parsed)
-            for feat in ["Temporal","Geographic","Weather","Social","Object Relations"]:
-                findings.append({"test":feat+" Plausibility","score":sc/5,"note":parsed.get("explanation","")[:100]})
-                scores.append(sc/5)
-        else: vlm_ok=False
-    except: pass
-    avg=float(np.mean(scores)) if scores else 0.0
-    conf=min(1.0,0.4+0.5*abs(avg))
-    if not vlm_ok: conf*=0.3
-    viol=[f["test"] for f in findings if f.get("score",0)>0.15 and "parent" not in f]
-    comp=[f["test"] for f in findings if f.get("score",0)<-0.1 and "parent" not in f]
-    rat=f"Semantic violations: {', '.join(viol[:5])}." if viol else f"Semantically consistent: {', '.join(comp[:5])}." if comp else "Semantic inconclusive."
     for f in findings:
-        if f.get("note") and "parent" not in f: rat+=f" [{f['test']}]: {f['note'][:100]}."
-    return AgentEvidence("Semantic Consistency Agent",np.clip(avg,-1,1),conf,
-                         0.0 if vlm_ok else 0.8, rat, [f for f in findings if "parent" not in f])

+"""FORENSIQ — Semantic Consistency Agent (31 features via VLM)
+Uses Qwen2.5-VL-72B with calibrated forensic prompts.
+Design principles applied from review:
+- Qualitative inconsistency detection, NOT metric estimation from 2D images
+- Explicit phenomenon ownership: Lighting owns illumination, Physics owns geometry/materials
+- Confidence calibration instructions in every prompt
+- Expanded Context prompt (5→8 sub-features)
 """
 import os, base64, io, json, re, numpy as np
 from PIL import Image
@@ -44,211 +50,326 @@ def _score(parsed):
     if v=="AUTHENTIC": return -0.4
     return 0.0
+# ── Shared calibration instruction appended to every prompt ──────────
+CONFIDENCE_CALIBRATION = """
+CONFIDENCE CALIBRATION — CRITICAL:
+Your confidence score MUST follow these rules:
+- Default to 0.5 if you are uncertain or the evidence is ambiguous.
+- Only use 0.7+ if you observe an UNAMBIGUOUS, SPECIFIC violation (e.g., a hand with 6 clearly countable fingers, shadows pointing in opposite directions from the same light source).
+- Only use 0.3 or below if the image is clearly, unambiguously consistent with reality and you can articulate exactly why.
+- Use 0.4-0.6 for most images. Most images are ambiguous. Do NOT inflate confidence.
+- If a sub-analysis is not applicable (no people, no text, no transparent objects), set that field to null and do NOT let it affect your overall confidence.
+VLMs systematically overstate confidence. Resist this bias. When in doubt, stay near 0.5."""
+# ═══════════════════════════════════════════════════════════════════════
+# PROMPT 1: LIGHTING (8 features)
+# Owns: ALL illumination phenomena — shadows, highlights, light color,
+#        light transport (SSS, caustics, inter-reflections)
+# Does NOT own: material reflectance (that's Physics), geometry (Physics)
+# ═══════════════════════════════════════════════════════════════════════
+SYS_LIGHTING = """You are a forensic lighting analyst. You detect QUALITATIVE inconsistencies in illumination that indicate AI generation or manipulation. You work from visual appearance, not metric measurement.
+IMPORTANT: You are analyzing a 2D image. You CANNOT compute exact distances, angles, or irradiance values. Instead, you look for VISIBLE INCONSISTENCIES that would be obvious to a trained observer:
+Your 8 analysis domains (you OWN these — no other agent covers them):
+1. SHADOW DIRECTION: Do shadows from different objects in the scene appear to point toward consistent light source position(s)? Look for shadows that diverge when they should converge, or shadows pointing in incompatible directions. You do NOT need to compute exact angles — just assess whether the overall shadow pattern is self-consistent.
+2. SHADOW QUALITY: Are shadow edges (penumbra) consistent with the apparent light source? A small bright light produces hard shadows; overcast sky produces soft shadows. Do ALL shadows in the scene share the same hardness/softness? Mixed hard and soft shadows without explanation (e.g., multiple lights) is suspicious.
+3. SPECULAR HIGHLIGHTS: Bright reflections on shiny surfaces encode the light direction. If multiple shiny objects are visible, do their highlights appear to come from the same direction? If a person has catchlights in their eyes, do both eyes show highlights in the same position?
+4. AMBIENT OCCLUSION: Where objects meet surfaces (feet on floor, cup on table, book on shelf), there should be subtle darkening at the contact line. AI images frequently omit contact shadows or place them incorrectly. Check: are contact shadows present where objects touch?
+5. COLOR TEMPERATURE: Light from a single source should tint all surfaces the same hue. Look for: one side of a face warm-toned while the other is cool-toned without a motivating second light source. Indoor scenes with mixed warm/cool illumination should have visible light sources to explain it.
+6. SUBSURFACE SCATTERING: If you can see thin body parts (ears, nostrils, fingers) backlit by a strong light source, they should glow warm/red from blood beneath the skin. If present, is it consistent with the light direction? If absent when expected, flag it.
+7. CAUSTICS: If glass, water, or transparent objects are present near a surface, look for projected light patterns. Their absence in a brightly lit scene with transparent objects is mildly suspicious. If caustics ARE visible, do they match the shape and position of the transparent object?
+8. INTER-REFLECTIONS: Strongly colored surfaces near neutral surfaces should tint them. A red blanket next to a white wall should cast a subtle red tint. Look for color bleeding that's present OR suspiciously absent.""" + CONFIDENCE_CALIBRATION
+USR_LIGHTING = """Analyze this image for lighting inconsistencies across all 8 domains.
+For each, give a QUALITATIVE assessment based on what you can visually observe — do NOT attempt to compute metric values like exact angles or irradiance.
 Respond in JSON:
 {
+    "shadow_direction_consistent": true/false,
+    "shadow_quality_consistent": true/false,
     "specular_consistent": true/false,
+    "ambient_occlusion_present": true/false,
     "color_temp_consistent": true/false,
     "sss_correct": true/false/null,
     "caustics_correct": true/false/null,
     "interreflections_ok": true/false/null,
+    "anomalies": ["specific anomaly descriptions with image region references"],
     "confidence": 0.0-1.0,
     "verdict": "AUTHENTIC" or "SUSPICIOUS" or "MANIPULATED",
+    "explanation": "detailed reasoning citing what you observe, not what you compute"
 }"""
+# ═══════════════════════════════════════════════════════════════════════
+# PROMPT 2: ANATOMY (7 features)
+# ═══════════════════════════════════════════════════════════════════════
+SYS_ANATOMY = """You are a forensic anatomist. You detect anatomical errors in images that indicate AI generation.
+DETECTION PROTOCOL:
+1. HANDS — This is your highest-priority check. Procedure:
+   a) Locate every visible hand in the image.
+   b) For each hand, COUNT fingers individually: thumb, index, middle, ring, pinky. State the count explicitly.
+   c) Verify each finger has correct joint count (thumb: 2 joints, others: 3 joints).
+   d) Check that joints bend only in anatomically possible directions.
+   e) Verify nails are on the correct (dorsal) side of each finger.
+   f) If hands are partially occluded, note what's visible vs. hidden.
+2. FACIAL SYMMETRY — Flag asymmetry ONLY if it would be noticeable to a casual observer at normal viewing distance. Natural faces have subtle asymmetry; AI faces often have GROSS asymmetry (one ear significantly higher/larger, one eye noticeably different shape, jawline shifted). Do NOT flag sub-pixel or barely perceptible differences.
+3. BODY PROPORTIONS — Check against standard human ratios: head ≈ 1/7.5 of height, elbow at waist, fingertips at mid-thigh. Flag only OBVIOUS violations (forearm twice the length of upper arm, head clearly too large).
+4. SKIN TEXTURE — Look for abrupt texture changes: one patch of skin with visible pores adjacent to a smooth patch. Check for texture that transitions unnaturally between face regions.
+5. HAIR — Look for: strands that float disconnected from the scalp, hairline that dissolves into skin without natural transition, inconsistent hair direction (some strands defy gravity without explanation).
+6. EYE DETAILS — Catchlight reflections must appear in the same relative position in both eyes (same light source). Both irises should have the same color. Eyelashes should radiate outward from the lid margin.
+7. CLOTHING — Fabric must drape under gravity. Seams must be continuous (not disappearing/reappearing). Buttons must have buttonholes. Jewelry must connect to the body.""" + CONFIDENCE_CALIBRATION
+USR_ANATOMY = """Perform anatomical forensic analysis.
+MANDATORY: If hands are visible, explicitly count each finger on each hand. State your count clearly (e.g., "Left hand: thumb, index, middle, ring, pinky = 5 fingers").
+If NO people are visible, set contains_people=false and skip all other fields.
 Respond in JSON:
 {
     "contains_people": true/false,
     "hands_correct": true/false/null,
+    "finger_count": "explicit count per hand, e.g. 'Left: 5 (thumb,index,middle,ring,pinky), Right: not visible'",
     "face_symmetric": true/false/null,
     "proportions_ok": true/false/null,
     "skin_natural": true/false/null,
     "hair_natural": true/false/null,
     "eyes_consistent": true/false/null,
     "clothing_ok": true/false/null,
+    "anomalies": ["specific anatomical errors with locations"],
     "confidence": 0.0-1.0,
     "verdict": "AUTHENTIC" or "SUSPICIOUS" or "MANIPULATED",
+    "explanation": "reasoning with specific observations — for hands, cite your finger count"
 }"""
+# ═══════════════════════════════════════════════════════════════════════
+# PROMPT 3: PHYSICAL PLAUSIBILITY (8 features)
+# Owns: geometry, material appearance, structural mechanics, object interaction
+# Does NOT own: illumination/shadows (that's Lighting), anatomy (that's Anatomy)
+# Explicit partition from Lighting: this agent checks materials, perspective, and
+#   structural physics. It does NOT re-analyze shadows, highlights, or light color.
+# ═══════════════════════════════════════════════════════════════════════
+SYS_PHYSICS = """You are a forensic physicist. You detect violations of geometry, material properties, and structural mechanics in images.
+SCOPE — You analyze these 8 domains. You do NOT analyze lighting/shadows/specular highlights (a separate Lighting Agent handles those). Focus ONLY on:
+1. MATERIAL APPEARANCE: Does each material look like what it claims to be? Metals should show environment reflections. Wood should have grain. Fabric should have texture. The SAME material across an image should have consistent appearance. Look for: a "metal" railing that looks like plastic, or glass that doesn't distort the background.
+2. PERSPECTIVE GEOMETRY: Parallel lines in the real world (edges of buildings, railroad tracks, road markings) must converge to consistent vanishing points. Check for: lines that should be parallel but converge to different points, vertical lines that lean inconsistently.
+3. GRAVITY & STRUCTURE: Everything must obey gravity. Objects rest on surfaces, don't float. Liquids have flat surfaces. Cantilevered structures need support. Fabric hangs down. Hair falls down (unless wind/motion is depicted). Look for: floating objects, impossible structural loads, upward-flowing fabric.
+4. SCALE & PROPORTION: Objects with known real-world sizes (people ~1.7m, doors ~2m, cars ~4.5m, chairs ~0.45m seat height) should be proportional to each other. Check for: a person who would be 3m tall next to a door, or a cup the size of a head.
+5. TRANSPARENCY: Glass transmits and distorts. Water refracts. Transparent objects should show what's behind them, distorted appropriately. Frosted glass blurs. Thick glass distorts more. Check for: glass that's perfectly clear with no distortion, or opaque "glass."
+6. CONTACT PHYSICS: Where objects rest on soft surfaces, there should be deformation (cushion under person, mattress under object). Where heavy objects rest on surfaces, the surface should show appropriate response.
+7. MOTION COHERENCE: If motion blur is present, its direction and magnitude should be consistent with the depicted motion. A moving car should have horizontal blur. A falling object should have vertical blur. An image with one object blurred and everything else sharp needs a fast-moving object OR selective focus.
+8. DEPTH & OCCLUSION: Nearer objects must occlude farther ones consistently. No object should appear to be simultaneously in front of AND behind another object. Occlusion boundaries should be clean (no "melting" edges).""" + CONFIDENCE_CALIBRATION
+USR_PHYSICS = """Analyze this image for physics violations.
+SCOPE REMINDER: Do NOT analyze lighting, shadows, or specular highlights — that is handled by a separate agent. Focus on materials, geometry, gravity, scale, transparency, contact, motion, and depth.
 Respond in JSON:
 {
+    "material_consistent": true/false,
     "perspective_correct": true/false,
     "gravity_ok": true/false,
     "scale_consistent": true/false,
     "transparency_ok": true/false/null,
+    "contact_ok": true/false,
     "motion_ok": true/false/null,
     "depth_ordering_ok": true/false,
+    "anomalies": ["specific physics violations — not lighting"],
     "confidence": 0.0-1.0,
     "verdict": "AUTHENTIC" or "SUSPICIOUS" or "MANIPULATED",
+    "explanation": "reasoning focused on geometry and material physics"
 }"""
+# ═══════════════════════════════════════════════════════════════════════
+# PROMPT 4: CONTEXT PLAUSIBILITY (8 features — expanded from 5)
+# ═══════════════════════════════════════════════════════════════════════
+SYS_CONTEXT = """You are a forensic scene analyst specializing in contextual coherence. AI-generated images often combine elements that could not physically coexist in the same real photograph.
+Your 8 analysis domains:
+1. TEMPORAL SEASON: Vegetation, foliage color, and flower blooming must match. Snow on the ground requires bare or evergreen trees. Green deciduous leaves + snow is a contradiction. Clothing should match the apparent season.
+2. TIME OF DAY: Sky color/brightness must match shadow lengths and lighting direction. A bright blue sky requires short shadows (midday) or long shadows from a specific direction. Stars visible + brightly lit ground is contradictory.
+3. ERA / TECHNOLOGY ANACHRONISM: Visible technology (phones, cars, screens, signage style) should match the apparent era. A scene with 1950s architecture containing modern smartphones is suspicious. Fashion should match the apparent era of other objects.
+4. GEOGRAPHIC COHERENCE: Architecture style must match vegetation and climate. Tropical palm trees next to Northern European half-timbered houses is impossible. Road markings should match the apparent country (right-hand vs left-hand traffic, line styles). Visible text/signs should be in the expected language for the geography.
+5. WEATHER COHERENCE: Sky conditions must match ground conditions. Wet pavement requires recent rain or overcast sky. Dry dust in the air contradicts standing water. Snow requires freezing conditions (visible breath, winter clothing). Fog obscures distant objects.
+6. ATTIRE-SETTING MATCH: Beach clothing at a business meeting is impossible (unless clearly a party/casual scene). Winter coats in a tropical setting. Formal wear in a construction zone. Analyze whether clothing choices are plausible for the depicted location and activity.
+7. SIGN & LABEL COHERENCE: Visible signs, labels, and text should be appropriate for the scene type. A restaurant should show food-related signage. A hospital should show medical signage. Signs in a residential area should show house numbers, street names. Complete absence of expected signage in a commercial area is mildly suspicious.
+8. OBJECT FUNCTION & ARRANGEMENT: Furniture should be arranged for use (chairs face tables). Appliances should be connected (lamps plugged in, or at least near outlets). Tools should be held or stored correctly. Kitchen items should be in kitchens. Check for: objects that serve no function, impossible arrangements, or items placed where they'd be impractical.""" + CONFIDENCE_CALIBRATION
+USR_CONTEXT = """Analyze contextual plausibility across all 8 domains:
+1. Temporal/Season — vegetation vs clothing vs weather
+2. Time of Day — sky vs shadows vs lighting
+3. Era/Technology — anachronistic objects
+4. Geographic — architecture vs vegetation vs signage language
+5. Weather — sky vs ground conditions vs attire
+6. Attire-Setting — clothing appropriate for location/activity
+7. Sign/Label Coherence — signage matches scene type
+8. Object Arrangement — functional, plausible placement
 Respond in JSON:
 {
+    "season_consistent": true/false,
+    "time_of_day_consistent": true/false,
+    "era_consistent": true/false,
     "geographic_consistent": true/false,
     "weather_consistent": true/false,
+    "attire_setting_match": true/false,
+    "signage_coherent": true/false,
     "objects_functional": true/false,
+    "anomalies": ["specific contextual violations with reasoning"],
     "confidence": 0.0-1.0,
     "verdict": "AUTHENTIC" or "SUSPICIOUS" or "MANIPULATED",
+    "explanation": "detailed reasoning per domain"
 }"""
+# ═══════════════════════════════════════════════════════════════════════
+# AGENT RUNNER
+# ═══════════════════════════════════════════════════════════════════════
+# VLM confidence temperature — applied before feeding into Bayesian Eq.1
+# VLMs systematically overstate confidence; this compresses toward 0.5
+VLM_CONFIDENCE_TEMPERATURE = 2.0
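+def _calibrate_vlm_confidence(raw_conf) -> float:
+    """Post-process VLM confidence with temperature scaling.
+    Compresses extreme values toward 0.5 to counter VLM overconfidence."""
+    try:
+        p = float(raw_conf)
+    except (TypeError, ValueError):
+        return 0.5  # missing or non-numeric confidence: treat as uninformative
+    if not np.isfinite(p) or p < 0 or p > 1:
+        return 0.5  # out-of-range value: treat as uninformative
+    # Clamp exact 0 and 1 so the logit stays finite and 1.0 is not reset to 0.5
+    p = min(max(p, 1e-6), 1 - 1e-6)
+    logit = np.log(p / (1 - p))
+    scaled = logit / VLM_CONFIDENCE_TEMPERATURE
+    return float(1.0 / (1.0 + np.exp(-scaled)))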
 def run_semantic_agent(img):
+    findings, scores = [], []
+    vlm_ok = True
+    for sys_p, usr_p, name, features in [
+        (SYS_LIGHTING, USR_LIGHTING, "Lighting Physics",
+         ["Shadow Direction","Shadow Quality","Specular Consistency","Ambient Occlusion",
+          "Color Temperature","Subsurface Scattering","Caustics","Inter-reflections"]),
+        (SYS_ANATOMY, USR_ANATOMY, "Anatomical Analysis",
+         ["Hand Anatomy","Facial Symmetry","Body Proportions","Skin Texture",
+          "Hair Consistency","Eye Details","Clothing Physics"]),
+        (SYS_PHYSICS, USR_PHYSICS, "Physical Plausibility",
+         ["Material Appearance","Perspective Geometry","Gravity & Structure",
+          "Scale & Proportion","Transparency","Contact Physics","Motion Coherence","Depth & Occlusion"]),
     ]:
         try:
+            resp = _vlm(img, sys_p, usr_p)
             if resp and not resp.startswith("VLM_ERROR"):
+                parsed = _parse(resp)
+                sc = _score(parsed)
+                # Calibrate VLM confidence before storing
+                raw_conf = parsed.get("confidence", 0.5)
+                cal_conf = _calibrate_vlm_confidence(raw_conf)
+                if name == "Anatomical Analysis" and not parsed.get("contains_people", True):
+                    sc = 0.0
+                anomalies = parsed.get("anomalies", [])
                 for feat in features:
+                    findings.append({"test": feat, "score": sc / len(features),
+                                   "note": parsed.get("explanation", "")[:100], "parent": name})
+                    scores.append(sc / len(features))
+                findings.append({"test": name, "vlm_analysis": parsed, "anomalies": anomalies,
+                               "score": sc, "confidence": cal_conf,
+                               "raw_vlm_confidence": raw_conf,
+                               "calibrated_confidence": cal_conf,
+                               "note": parsed.get("explanation", "")[:200]})
                 scores.append(sc)
             else:
+                vlm_ok = False
                 for feat in features:
+                    findings.append({"test": feat, "score": 0.0, "note": "VLM unavailable", "vlm_error": True})
                     scores.append(0.0)
         except Exception as e:
+            findings.append({"test": name, "error": str(e), "score": 0})
+    # Context plausibility (expanded to 8 sub-features)
     try:
+        resp = _vlm(img, SYS_CONTEXT, USR_CONTEXT)
         if resp and not resp.startswith("VLM_ERROR"):
+            parsed = _parse(resp)
+            sc = _score(parsed)
+            raw_conf = parsed.get("confidence", 0.5)
+            cal_conf = _calibrate_vlm_confidence(raw_conf)
+            context_features = ["Season Consistency","Time-of-Day","Era/Technology",
+                              "Geographic Coherence","Weather Coherence",
+                              "Attire-Setting Match","Sign/Label Coherence","Object Arrangement"]
+            for feat in context_features:
+                findings.append({"test": feat, "score": sc / len(context_features),
+                               "note": parsed.get("explanation", "")[:100], "parent": "Context"})
+                scores.append(sc / len(context_features))
+            findings.append({"test": "Context Plausibility", "vlm_analysis": parsed,
+                           "score": sc, "confidence": cal_conf,
+                           "note": parsed.get("explanation", "")[:200]})
+            scores.append(sc)
+        else:
+            vlm_ok = False
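+    except Exception as e:
+        # Record the failure instead of silently discarding it, as the loop above does
+        findings.append({"test": "Context Plausibility", "error": str(e), "score": 0})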
+    avg = float(np.mean(scores)) if scores else 0.0
+    conf = min(1.0, 0.4 + 0.5 * abs(avg))
+    if not vlm_ok:
+        conf *= 0.3
+    viol = [f["test"] for f in findings if f.get("score", 0) > 0.15 and "parent" not in f]
+    comp = [f["test"] for f in findings if f.get("score", 0) < -0.1 and "parent" not in f]
+    rat = f"Semantic violations: {', '.join(viol[:5])}." if viol else \
+          f"Semantically consistent: {', '.join(comp[:5])}." if comp else "Semantic inconclusive."
     for f in findings:
+        if f.get("note") and "parent" not in f:
+            rat += f" [{f['test']}]: {f['note'][:100]}."
+    return AgentEvidence("Semantic Consistency Agent", np.clip(avg, -1, 1), conf,
+                         0.0 if vlm_ok else 0.8, rat,
+                         [f for f in findings if "parent" not in f])
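
The temperature scaling added in this commit may be easier to judge with concrete numbers. The following is a minimal standalone sketch, not part of the commit: it applies the same logit-divide-sigmoid formula as `_calibrate_vlm_confidence` with a temperature of 2.0, using only the standard library. The names `TEMPERATURE` and `temperature_scale` are hypothetical and chosen for illustration; only inputs strictly between 0 and 1 are covered.

```python
import math

# Assumed to mirror VLM_CONFIDENCE_TEMPERATURE = 2.0 from the diff.
TEMPERATURE = 2.0


def temperature_scale(p: float, temperature: float = TEMPERATURE) -> float:
    """Divide the logit of p by `temperature`, then map back through the sigmoid.

    Valid only for 0 < p < 1; the endpoints have no finite logit.
    """
    logit = math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-logit / temperature))


if __name__ == "__main__":
    # Print how a range of raw confidences is compressed toward 0.5.
    for raw in (0.1, 0.3, 0.5, 0.7, 0.9, 0.99):
        print(f"raw={raw:.2f} -> calibrated={temperature_scale(raw):.3f}")
```

With a temperature of 2, a raw confidence of 0.9 maps to 0.75 and 0.5 stays at 0.5, so the ordering of confidences is preserved while extreme values are pulled toward the middle. A larger temperature would compress more strongly; a temperature of 1 leaves the value unchanged.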