Yuhao committed
Commit 52a881a · 1 Parent(s): f7f33b5

Restructure inference and add INT4 serving
Files changed (47)
  1. LICENSE +9 -0
  2. README.md +109 -56
  3. inference/.ipynb_checkpoints/deepseek_service-checkpoint.py +0 -384
  4. inference/.ipynb_checkpoints/demo-checkpoint.py +0 -76
  5. inference/.ipynb_checkpoints/inference-checkpoint.py +0 -43
  6. inference/.ipynb_checkpoints/model_utils-checkpoint.py +0 -120
  7. inference/README.md +11 -0
  8. inference/__init__.py +1 -0
  9. inference/__pycache__/app.cpython-311.pyc +0 -0
  10. inference/__pycache__/deepseek_service.cpython-311.pyc +0 -0
  11. inference/__pycache__/model_utils.cpython-311.pyc +0 -0
  12. inference/demo.py +0 -79
  13. inference/full_precision/__init__.py +1 -0
  14. inference/full_precision/__pycache__/app.cpython-311.pyc +0 -0
  15. inference/full_precision/__pycache__/chat.cpython-311.pyc +0 -0
  16. inference/full_precision/__pycache__/deepseek_service.cpython-311.pyc +0 -0
  17. inference/full_precision/__pycache__/demo.cpython-311.pyc +0 -0
  18. inference/full_precision/__pycache__/infer.cpython-311.pyc +0 -0
  19. inference/full_precision/__pycache__/model_utils.cpython-311.pyc +0 -0
  20. inference/{app.py → full_precision/app.py} +162 -256
  21. inference/{chat.py → full_precision/chat.py} +38 -35
  22. inference/{deepseek_service.py → full_precision/deepseek_service.py} +86 -199
  23. inference/full_precision/demo.py +41 -0
  24. inference/full_precision/infer.py +54 -0
  25. inference/{model_utils.py → full_precision/model_utils.py} +103 -57
  26. inference/full_precision/run_api.sh +6 -0
  27. inference/full_precision/run_chat.sh +6 -0
  28. inference/full_precision/run_infer.sh +6 -0
  29. inference/inference.py +0 -43
  30. inference/int4_quantized/__init__.py +1 -0
  31. inference/int4_quantized/__pycache__/app.cpython-311.pyc +0 -0
  32. inference/int4_quantized/__pycache__/chat.cpython-311.pyc +0 -0
  33. inference/int4_quantized/__pycache__/infer.cpython-311.pyc +0 -0
  34. inference/int4_quantized/__pycache__/model_utils.cpython-311.pyc +0 -0
  35. inference/{.ipynb_checkpoints/app-checkpoint.py → int4_quantized/app.py} +181 -260
  36. inference/{.ipynb_checkpoints/chat-checkpoint.py → int4_quantized/chat.py} +37 -38
  37. inference/int4_quantized/infer.py +82 -0
  38. inference/int4_quantized/model_utils.py +538 -0
  39. inference/int4_quantized/run_api.sh +6 -0
  40. inference/int4_quantized/run_chat.sh +6 -0
  41. inference/int4_quantized/run_infer.sh +6 -0
  42. inference/int4_quantized/test_single.sh +6 -0
  43. inference/temp_uploads/.ipynb_checkpoints/temp_d2b1c6f9a43940d2812f10a8cc8bc3ef-checkpoint.jpg +0 -0
  44. inference/temp_uploads/.ipynb_checkpoints/user_1769671453128_43ccc61bfcb64c6bbbabbadfa887591c-checkpoint.jpg +0 -0
  45. inference/temp_uploads/temp_d2b1c6f9a43940d2812f10a8cc8bc3ef.jpg +0 -0
  46. inference/temp_uploads/user_1769671453128_43ccc61bfcb64c6bbbabbadfa887591c.jpg +0 -0
  47. requirements.txt +5 -1
LICENSE ADDED
@@ -0,0 +1,9 @@
+ SkinGPT-R1 is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
+
+ License summary:
+ - Attribution required
+ - Non-commercial use only
+ - Share adaptations under the same license
+
+ Full license text:
+ https://creativecommons.org/licenses/by-nc-sa/4.0/
README.md CHANGED
@@ -10,94 +10,147 @@ tags:

  # SkinGPT-R1

- **SkinGPT-R1** is a dermatological reasoning vision Language model (VLM).

- ## ⚠️ Disclaimer

- This model is **for research and educational use only**. It is **NOT a substitute for professional medical advice, diagnosis, or treatment**.

- ## 🛠️ Environment Setup

- To ensure compatibility, we strongly recommend creating a fresh Conda environment.

- ### 1. Create Conda Environment

- Create a new environment named skingpt-r1 with Python 3.10:

  ```bash
  conda create -n skingpt-r1 python=3.10 -y
  conda activate skingpt-r1
  ```

- ### 2. Install Dependencies

  ```bash
- pip install -r requirements.txt
  ```

- ### (Optional) For faster inference on NVIDIA GPUs:

  ```bash
- pip install flash-attn --no-build-isolation
  ```

- ## 🚀 Usage

- ### Quick Start

- If you just installed the environment and want to check if it works:

- Open ***demo.py*** and Change the ***IMAGE_PATH*** variable to your image file.

  ```bash
- python demo.py
  ```

- ### Interactive Chat

- To have a multi-turn conversation (e.g., asking follow-up questions about the diagnosis) in your terminal:
  ```bash
- python chat.py --image ./test_images/lesion.jpg
  ```
- ### FastAPI Backend Deployment

- To deploy the model as a backend service (supporting image uploads and session management):

- #### Start the Server

  ```bash
- python app.py
  ```
- #### API Workflow
- Manage sessions via state_id to support multi-user history.
-
- Upload: POST /v1/upload/{state_id} — Uploads an image for the session.
-
- Chat: POST /v1/predict/{state_id} — Sends text (JSON: {"message": "..."}) and gets a response.
-
- Reset: POST /v1/reset/{state_id} — Clears session history and images.
- #### Client Example
- ```python
- import requests
-
- API_URL = "http://localhost:5900"
- STATE_ID = "patient_001"
-
- # 1. Upload Image
- with open("skin_image.jpg", "rb") as f:
-     requests.post(f"{API_URL}/v1/upload/{STATE_ID}", files={"file": f})
-
- # 2. Ask for Diagnosis
- response = requests.post(
-     f"{API_URL}/v1/predict/{STATE_ID}",
-     json={"message": "Please analyze this image."}
- )
- print("AI:", response.json()["message"])
-
- # 3. Ask Follow-up
- response = requests.post(
-     f"{API_URL}/v1/predict/{STATE_ID}",
-     json={"message": "What treatment do you recommend?"}
- )
- print("AI:", response.json()["message"])
- ```

  # SkinGPT-R1

+ **Update:** We will soon release the **SkinGPT-R1-7B** weights.
+
+ SkinGPT-R1 is a dermatological reasoning vision language model for research and education.
+
+ From **The Chinese University of Hong Kong, Shenzhen (CUHKSZ)**.
+
+ ## Disclaimer
+
+ This project is for **research and educational use only**. It is **not** a substitute for professional medical advice, diagnosis, or treatment.
+
+ ## License
+
+ This repository is released under **CC BY-NC-SA 4.0**.
+ See [LICENSE](./LICENSE) for details.
+
+ ## Structure
+
+ ```text
+ SkinGPT-R1/
+ ├── checkpoints/
+ ├── inference/
+ │   ├── full_precision/
+ │   └── int4_quantized/
+ ├── requirements.txt
+ └── README.md
+ ```
+
+ Checkpoint paths:
+
+ - Full precision: `./checkpoints/full_precision`
+ - INT4 quantized: `./checkpoints/int4`
+
+ ## Install

  ```bash
  conda create -n skingpt-r1 python=3.10 -y
  conda activate skingpt-r1
+ pip install -r requirements.txt
  ```

+ ## Attention Backend Notes
+
+ This repo uses two attention acceleration paths:
+
+ - `flash_attention_2`: external package, optional
+ - `sdpa`: PyTorch-native scaled dot product attention
+
+ Recommended choice for this repo:
+
+ - RTX 50 series: use `sdpa`
+ - A100 / RTX 3090 / RTX 4090 / H100 and other GPUs explicitly listed by the FlashAttention project: you can try `flash_attention_2`
+
+ Practical notes:
+
+ - The current repo pins `torch==2.4.0`; SDPA is built into PyTorch in this version.
+ - FlashAttention's official README currently lists Ampere, Ada, and Hopper support for FlashAttention-2. It does not list RTX 50 / Blackwell consumer GPUs in that section, so this repo defaults to `sdpa` for that path.
+ - PyTorch 2.5 added a newer cuDNN SDPA backend for H100-class or newer GPUs, but this repo is pinned to PyTorch 2.4, so you should not assume those 2.5-specific gains here.
+
+ If you are on an RTX 5090 and `flash-attn` is unavailable or unstable in your environment, use the INT4 path in this repo, which is already configured with `attn_implementation="sdpa"`.
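
The recommendation above can be sketched as a small helper. This function, its name, and its GPU-name matching rules are illustrative assumptions for this README, not repo code: it returns `flash_attention_2` only for GPU families the FlashAttention project lists, and falls back to `sdpa` everywhere else.

```python
def pick_attn_implementation(gpu_name: str, flash_attn_installed: bool) -> str:
    # Families the FlashAttention README lists for FlashAttention-2
    # (illustrative subset; check the upstream README for the full list).
    flash_supported_families = ("A100", "RTX 3090", "RTX 4090", "H100")
    if flash_attn_installed and any(f in gpu_name for f in flash_supported_families):
        return "flash_attention_2"
    # PyTorch-native SDPA is always available on the pinned torch==2.4.0.
    return "sdpa"

print(pick_attn_implementation("NVIDIA GeForce RTX 5090", True))   # → sdpa
print(pick_attn_implementation("NVIDIA A100-SXM4-80GB", True))     # → flash_attention_2
```

The returned string can be passed as `attn_implementation=` when loading the model.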
+
+ ## Usage
+
+ ### Full Precision
+
+ Single image:

  ```bash
+ bash inference/full_precision/run_infer.sh --image ./test_images/lesion.jpg
  ```

+ Multi-turn chat:

  ```bash
+ bash inference/full_precision/run_chat.sh --image ./test_images/lesion.jpg
  ```

+ API service:

+ ```bash
+ bash inference/full_precision/run_api.sh
+ ```
+
+ Default API port: `5900`

+ ### INT4 Quantized

+ Single image:

  ```bash
+ bash inference/int4_quantized/run_infer.sh --image_path ./test_images/lesion.jpg
  ```

+ Multi-turn chat:

  ```bash
+ bash inference/int4_quantized/run_chat.sh --image ./test_images/lesion.jpg
  ```

+ API service:
+
+ ```bash
+ bash inference/int4_quantized/run_api.sh
+ ```
+
+ Default API port: `5901`
+
+ The INT4 path uses:
+
+ - `bitsandbytes` 4-bit quantization
+ - `attn_implementation="sdpa"`
+ - the adapter-aware quantized model implementation in `inference/int4_quantized/`

+ ## GPU Selection
+
+ You do not need `CUDA_VISIBLE_DEVICES=0` when the machine has only one visible GPU or the default CUDA device is acceptable.
+
+ Use it only when you want to pin the process to a specific GPU, for example on a multi-GPU server:

  ```bash
+ CUDA_VISIBLE_DEVICES=0 bash inference/int4_quantized/run_infer.sh --image_path ./test_images/lesion.jpg
  ```
+
+ The same pattern also works for:
+
+ - `inference/full_precision/run_infer.sh`
+ - `inference/full_precision/run_chat.sh`
+ - `inference/full_precision/run_api.sh`
+ - `inference/int4_quantized/run_chat.sh`
+ - `inference/int4_quantized/run_api.sh`
+
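
The same pinning can be done from Python when launching a script programmatically. The `launch_on_gpu` helper below is an illustration, not part of the repo; it only shows that the child process sees whatever `CUDA_VISIBLE_DEVICES` value the parent sets.

```python
import os
import subprocess
import sys

def launch_on_gpu(cmd, gpu_id):
    # Copy the current environment and pin the child process to one GPU.
    # Inside the child, that GPU then appears as CUDA device 0.
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return subprocess.run(cmd, env=env, capture_output=True, text=True)

# Demo with a child that just echoes the variable it received.
result = launch_on_gpu(
    [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
    gpu_id=1,
)
print(result.stdout.strip())  # → 1
```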
+ ## API Endpoints
+
+ Both API services expose the same endpoints:
+
+ - `POST /v1/upload/{state_id}`
+ - `POST /v1/predict/{state_id}`
+ - `POST /v1/reset/{state_id}`
+ - `POST /diagnose/stream`
+ - `GET /health`
+
+ ## Which One To Use
+
+ - Use `full_precision` when you want the original model path and best fidelity.
+ - Use `int4_quantized` when GPU memory is tight or when you are in an environment where `flash-attn` is not practical.
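
The Python client example from the previous README still matches these endpoints. A minimal sketch follows; port `5900` is the full-precision default (`5901` for INT4), the `endpoint` helper and the image filename are illustrative placeholders.

```python
API_URL = "http://localhost:5900"  # use 5901 for the INT4 service
STATE_ID = "patient_001"

def endpoint(kind: str, state_id: str, base: str = API_URL) -> str:
    # kind is one of: "upload", "predict", "reset"
    return f"{base}/v1/{kind}/{state_id}"

if __name__ == "__main__":
    import requests  # third-party: pip install requests

    # 1. Upload an image for this session ("skin_image.jpg" is a placeholder)
    with open("skin_image.jpg", "rb") as f:
        requests.post(endpoint("upload", STATE_ID), files={"file": f})

    # 2. Ask for a diagnosis, then a follow-up; history is kept per state_id
    for message in ("Please analyze this image.", "What treatment do you recommend?"):
        r = requests.post(endpoint("predict", STATE_ID), json={"message": message})
        print("AI:", r.json()["message"])
```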
 
 
 
 
 
 
 
 
 
 
inference/.ipynb_checkpoints/deepseek_service-checkpoint.py DELETED
@@ -1,384 +0,0 @@
- """
- DeepSeek API Service
- Used to optimize and organize SkinGPT model output results
- """
-
- import os
- import re
- from typing import Optional
- from openai import AsyncOpenAI
-
-
- class DeepSeekService:
-     """DeepSeek API Service Class"""
-
-     def __init__(self, api_key: Optional[str] = None):
-         """
-         Initialize DeepSeek service
-
-         Parameters:
-             api_key: DeepSeek API key, reads from environment variable if not provided
-         """
-         self.api_key = api_key or os.environ.get("DEEPSEEK_API_KEY")
-         self.base_url = "https://api.deepseek.com"
-         self.model = "deepseek-chat"  # Using deepseek-chat model
-
-         self.client = None
-         self.is_loaded = False
-
-         print(f"DeepSeek API service initializing...")
-         print(f"API Base URL: {self.base_url}")
-
-     async def load(self):
-         """Initialize DeepSeek API client"""
-         try:
-             if not self.api_key:
-                 print("DeepSeek API key not provided")
-                 self.is_loaded = False
-                 return
-
-             # Initialize OpenAI compatible client
-             self.client = AsyncOpenAI(
-                 api_key=self.api_key,
-                 base_url=self.base_url
-             )
-
-             self.is_loaded = True
-             print("DeepSeek API service is ready!")
-
-         except Exception as e:
-             print(f"DeepSeek API service initialization failed: {e}")
-             self.is_loaded = False
-
-     async def refine_diagnosis(
-         self,
-         raw_answer: str,
-         raw_thinking: Optional[str] = None,
-         language: str = "zh"
-     ) -> dict:
-         """
-         Use DeepSeek API to optimize and organize diagnosis results
-
-         Parameters:
-             raw_answer: Original diagnosis result
-             raw_thinking: AI thinking process
-             language: Language option
-
-         Returns:
-             Dictionary containing "description", "analysis_process" and "diagnosis_result"
-         """
-
-         if not self.is_loaded or self.client is None:
-             error_msg = "API not initialized, cannot generate analysis" if language == "en" else "API未初始化,无法生成分析过程"
-             print("DeepSeek API not initialized, returning original result")
-             return {
-                 "success": False,
-                 "description": "",
-                 "analysis_process": raw_thinking or error_msg,
-                 "diagnosis_result": raw_answer,
-                 "original_diagnosis": raw_answer,
-                 "error": "DeepSeek API not initialized"
-             }
-
-         try:
-             # Build prompt
-             prompt = self._build_refine_prompt(raw_answer, raw_thinking, language)
-
-             # Select system prompt based on language
-             if language == "en":
-                 system_content = "You are a professional medical text editor. Your task is to polish and organize medical diagnostic text to make it flow smoothly while preserving the original meaning. Output ONLY the formatted result. Do NOT add any explanations, comments, or thoughts. Just follow the format exactly."
-             else:
-                 system_content = "你是医学文本整理专家,按照用户要求将用户输入的文本整理成用户想要的格式,不要改写或总结。"
-
-             # Call DeepSeek API
-             response = await self.client.chat.completions.create(
-                 model=self.model,
-                 messages=[
-                     {"role": "system", "content": system_content},
-                     {"role": "user", "content": prompt}
-                 ],
-                 temperature=0.1,
-                 max_tokens=2048,
-                 top_p=0.8,
-             )
-
-             # Extract generated text
-             generated_text = response.choices[0].message.content
-
-             # Parse output
-             parsed = self._parse_refined_output(generated_text, raw_answer, raw_thinking, language)
-
-             return {
-                 "success": True,
-                 "description": parsed["description"],
-                 "analysis_process": parsed["analysis_process"],
-                 "diagnosis_result": parsed["diagnosis_result"],
-                 "original_diagnosis": raw_answer,
-                 "raw_refined": generated_text
-             }
-
-         except Exception as e:
-             print(f"DeepSeek API call failed: {e}")
-             error_msg = "API call failed, cannot generate analysis" if language == "en" else "API调用失败,无法生成分析过程"
-             return {
-                 "success": False,
-                 "description": "",
-                 "analysis_process": raw_thinking or error_msg,
-                 "diagnosis_result": raw_answer,
-                 "original_diagnosis": raw_answer,
-                 "error": str(e)
-             }
-
-     def _build_refine_prompt(self, raw_answer: str, raw_thinking: Optional[str] = None, language: str = "zh") -> str:
-         """
-         Build optimization prompt
-
-         Parameters:
-             raw_answer: Original diagnosis result
-             raw_thinking: AI thinking process
-             language: Language option, "zh" for Chinese, "en" for English
-
-         Returns:
-             Built prompt
-         """
-         if language == "en":
-             # English prompt - organize and polish while preserving meaning
-             thinking_text = raw_thinking if raw_thinking else "No analysis process available."
-             prompt = f"""You are a text organization expert. There are two texts that need to be organized. Text 1 is the thinking process of the SkinGPT model, and Text 2 is the diagnosis result given by SkinGPT.
-
- 【Requirements】
- - Preserve the original tone and expression style
- - Text 1 contains the thinking process, Text 2 contains the diagnosis result
- - Extract the image observation part from the thinking process as Description. This should include all factual observations about what was seen in the image, not just a brief summary.
- - For Diagnostic Reasoning: refine and condense the remaining thinking content. Remove redundancies, self-doubt, circular reasoning, and unnecessary repetition. Keep it concise and not too long. Keep the logical chain clear and enhance readability. IMPORTANT: DO NOT include any image description or visual observations in Diagnostic Reasoning. Only include reasoning, analysis, and diagnostic thought process.
- - If [Text 1] content is NOT: No analysis process available. Then organize [Text 1] content accordingly, DO NOT confuse [Text 1] and [Text 2]
- - If [Text 1] content IS: No analysis process available. Then extract the analysis process and description from [Text 2]
- - DO NOT infer or add new medical information, DO NOT output any meta-commentary
- - You may adjust unreasonable statements or remove redundant content to improve clarity
-
- [Text 1]
- {thinking_text}
-
- [Text 2]
- {raw_answer}
-
- 【Output】Only output three sections, do not output anything else:
- ## Description
- (Extract all image observation content from the thinking process - include all factual descriptions of what was seen)
-
- ## Analysis Process
- (Refined and condensed diagnostic reasoning: remove self-doubt, circular logic, and redundancies. Keep it concise and not too long. Keep logical flow clear. Do NOT include image observations)
-
- ## Diagnosis Result
- (The organized diagnosis result from Text 2)
-
- 【Example】:
- ## Description
- The image shows red inflamed patches on the skin with pustules and darker colored spots. The lesions appear as papules and pustules distributed across the affected area, with some showing signs of inflammation and possible post-inflammatory hyperpigmentation.
-
- ## Analysis Process
- These findings are consistent with acne vulgaris, commonly seen during adolescence. The user's age aligns with typical onset for this condition. Treatment recommendations: over-the-counter medications such as benzoyl peroxide or topical antibiotics, avoiding picking at the skin, and consulting a dermatologist if severe. The goal is to control inflammation and prevent scarring.
-
- ## Diagnosis Result
- Possible diagnosis: Acne (pimples) Explanation: Acne is a common skin condition, especially during adolescence, when hormonal changes cause overactive sebaceous glands, which can easily clog pores and form acne. Pathological care recommendations: 1. Keep face clean, wash face 2-3 times daily, use gentle cleansing products. 2. Avoid squeezing acne with hands to prevent worsening inflammation or leaving scars. 3. Avoid using irritating cosmetics and skincare products. 4. Can use topical medications containing salicylic acid, benzoyl peroxide, etc. 5. If necessary, can use oral antibiotics or other treatment methods under doctor's guidance. Precautions: 1. Avoid rubbing or damaging the affected area to prevent infection. 2. Eat less oily and spicy foods, eat more vegetables and fruits. 3. Maintain good rest habits, avoid staying up late. 4. If acne symptoms persist without improvement or show signs of worsening, seek medical attention promptly.
- """
-         else:
-             # Chinese prompt - translate to Simplified Chinese AND organize/polish
-             thinking_text = raw_thinking if raw_thinking else "No analysis process available."
-             prompt = f"""你是一个文本整理专家。有两段文本需要整理,文本1是SkinGPT模型的思考过程的文本,文本2是SkinGPT给出的诊断结果的文本。
-
- 【要求】
- - 保留原文的语气和表达方式
- - 文本1是思考过程,文本2是诊断结果
- - 从思考过程中提取图像观察部分作为图像描述。需要包含所有关于图片中观察到的事实内容,不要简化或缩短。
- - 对于分析过程:提炼并精简剩余的思考内容,去除冗余、自我怀疑、兜圈子的内容。保持简洁,不要太长。保持逻辑链条清晰,增强可读性。重要:分析过程中不应包含任何图像描述或视觉观察内容,只包含推理、分析和诊断思考过程。
- - 如果【文本1】内容不是:No analysis process available.那么按要求整理【文本1】的内容,不要混淆【文本1】和【文本2】。
- - 如果【文本1】内容是:No analysis process available.那么从【文本2】提炼分析过程和描述。
- - 【文本1】和【文本2】需要翻译成简体中文
- - 禁止推断或添加新的医学信息,禁止输出任何元评论
- - 可以调整不合理的语句或去除冗余内容以提高清晰度
-
-
- 【文本1】
- {thinking_text}
-
- 【文本2】
- {raw_answer}
-
- 【输出】只输出三个部分,不要输出其他任何内容:
- ## 图像描述
- (从思考过程中提取所有图像观察内容,包含所有关于图片的事实描述)
-
- ## 分析过程
- (提炼并精简后的诊断推理:去除自我怀疑、兜圈逻辑和冗余内容。保持简洁,不要太长。保持逻辑流畅。不包含图像观察)
-
- ## 诊断结果
- (整理后的诊断结果)
-
- 【样例】:
- ## 图像描述
- 图片显示皮肤上有红色发炎的斑块,伴有脓疱和颜色较深的斑点。病变表现为分布在受影响区域的丘疹和脓疱,部分显示出炎症迹象和可能的炎症后色素沉着。
-
- ## 分析过程
- 这些表现符合寻常痤疮的特征,青春期常见。用户的年龄与该病症的典型发病年龄相符。治疗建议:使用非处方药物如过氧化苯甲酰或外用抗生素,避免抠抓皮肤,病情严重时咨询皮肤科医生。目标是控制炎症并防止疤痕形成。
-
- ## 诊断结果
- 可能的诊断:痤疮(青春痘) 解释:痤疮是一种常见的皮肤病,特别是在青少年期间,由于激素水平的变化导致皮脂腺过度活跃,容易堵塞毛孔,形成痤疮。 病理护理建议:1.保持面部清洁,每天洗脸2-3次,使用温和的洁面产品。 2.避免用手挤压痤疮,以免加重炎症或留下疤痕。 3.避免使用刺激性的化妆品和护肤品。 4.可以使用含有水杨酸、苯氧醇等成分的外用药物治疗。 5.如有需要,可以在医生指导下使用抗生素口服药或其他治疗方法。 注意事项:1. 避免摩擦或损伤患处,以免引起感染。 2. 饮食上应少吃油腻、辛辣食物,多吃蔬菜水果。 3. 保持良好的作息习惯,避免熬夜。 4. 如果痤疮症状持续不见好转或有恶化的趋势,应及时就医。
- """
-
-         return prompt
-
-     def _parse_refined_output(
-         self,
-         generated_text: str,
-         raw_answer: str,
-         raw_thinking: Optional[str] = None,
-         language: str = "zh"
-     ) -> dict:
-         """
-         Parse DeepSeek generated output
-
-         Parameters:
-             generated_text: DeepSeek generated text
-             raw_answer: Original diagnosis (as fallback)
-             raw_thinking: Original thinking process (as fallback)
-             language: Language option
-
-         Returns:
-             Dictionary containing description, analysis_process and diagnosis_result
-         """
-         description = ""
-         analysis_process = None
-         diagnosis_result = None
-
-         if language == "en":
-             # English patterns
-             desc_match = re.search(
-                 r'##\s*Description\s*\n([\s\S]*?)(?=##\s*Analysis\s*Process|$)',
-                 generated_text,
-                 re.IGNORECASE
-             )
-             analysis_match = re.search(
-                 r'##\s*Analysis\s*Process\s*\n([\s\S]*?)(?=##\s*Diagnosis\s*Result|$)',
-                 generated_text,
-                 re.IGNORECASE
-             )
-             result_match = re.search(
-                 r'##\s*Diagnosis\s*Result\s*\n([\s\S]*?)$',
-                 generated_text,
-                 re.IGNORECASE
-             )
-
-             desc_header = "## Description"
-             analysis_header = "## Analysis Process"
-             result_header = "## Diagnosis Result"
-         else:
-             # Chinese patterns
-             desc_match = re.search(
-                 r'##\s*图像描述\s*\n([\s\S]*?)(?=##\s*分析过程|$)',
-                 generated_text
-             )
-             analysis_match = re.search(
-                 r'##\s*分析过程\s*\n([\s\S]*?)(?=##\s*诊断结果|$)',
-                 generated_text
-             )
-             result_match = re.search(
-                 r'##\s*诊断结果\s*\n([\s\S]*?)$',
-                 generated_text
-             )
-
-             desc_header = "## 图像描述"
-             analysis_header = "## 分析过程"
-             result_header = "## 诊断结果"
-
-         # Extract description
-         if desc_match:
-             description = desc_match.group(1).strip()
-             print(f"Successfully parsed description")
-         else:
-             print(f"Description parsing failed")
-             description = ""
-
-         # Extract analysis process
-         if analysis_match:
-             analysis_process = analysis_match.group(1).strip()
-             print(f"Successfully parsed analysis process")
-         else:
-             print(f"Analysis process parsing failed, trying other methods")
-             # Try to extract from generated text
-             result_pos = generated_text.find(result_header)
-             if result_pos > 0:
-                 # Get content before diagnosis result
-                 analysis_process = generated_text[:result_pos].strip()
-                 # Remove possible headers
-                 for header in [desc_header, analysis_header]:
-                     header_escaped = re.escape(header)
-                     analysis_process = re.sub(f'{header_escaped}\\s*\\n?', '', analysis_process).strip()
-             else:
-                 # If no format at all, try to get first half
-                 mid_point = len(generated_text) // 2
-                 analysis_process = generated_text[:mid_point].strip()
-
-         # If still empty, use original content (final fallback)
-         if not analysis_process and raw_thinking:
-             print(f"Using original raw_thinking as fallback")
-             analysis_process = raw_thinking
-
-         # Extract diagnosis result
-         if result_match:
-             diagnosis_result = result_match.group(1).strip()
-             print(f"Successfully parsed diagnosis result")
-         else:
-             print(f"Diagnosis result parsing failed, trying other methods")
-             # Try to extract from generated text
-             result_pos = generated_text.find(result_header)
-             if result_pos > 0:
-                 diagnosis_result = generated_text[result_pos:].strip()
-                 # Remove possible header
-                 result_header_escaped = re.escape(result_header)
-                 diagnosis_result = re.sub(f'^{result_header_escaped}\\s*\\n?', '', diagnosis_result).strip()
-             else:
-                 # If no format at all, get second half
-                 mid_point = len(generated_text) // 2
-                 diagnosis_result = generated_text[mid_point:].strip()
-
-         # If still empty, use original content (final fallback)
-         if not diagnosis_result:
-             print(f"Using original raw_answer as fallback")
-             diagnosis_result = raw_answer
-
-         return {
-             "description": description,
-             "analysis_process": analysis_process,
-             "diagnosis_result": diagnosis_result
-         }
-
-
- # Global DeepSeek service instance (lazy loading)
- _deepseek_service: Optional[DeepSeekService] = None
-
-
- async def get_deepseek_service(api_key: Optional[str] = None) -> Optional[DeepSeekService]:
-     """
-     Get DeepSeek service instance (singleton pattern)
-
-     Parameters:
-         api_key: Optional API key to use
-
-     Returns:
-         DeepSeekService instance, or None if API initialization fails
-     """
-     global _deepseek_service
-
-     if _deepseek_service is None:
-         try:
-             _deepseek_service = DeepSeekService(api_key=api_key)
-             await _deepseek_service.load()
-             if not _deepseek_service.is_loaded:
-                 print("DeepSeek API service initialization failed, will use fallback mode")
-                 return _deepseek_service  # Return instance but marked as not loaded
-         except Exception as e:
-             print(f"DeepSeek service initialization failed: {e}")
-             return None
-
-     return _deepseek_service
inference/.ipynb_checkpoints/demo-checkpoint.py DELETED
@@ -1,76 +0,0 @@
- import torch
- from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
- from qwen_vl_utils import process_vision_info
- from PIL import Image
-
- # === Configuration ===
- MODEL_PATH = "../checkpoint"
- IMAGE_PATH = "test_image.jpg"  # Please replace with your actual image path
- PROMPT = "You are a professional AI dermatology assistant. Please analyze this skin image and provide a diagnosis."
-
- def main():
-     print(f"Loading model from {MODEL_PATH}...")
-
-     # 1. Load Model
-     try:
-         model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
-             MODEL_PATH,
-             torch_dtype=torch.bfloat16,
-             device_map="auto",
-             trust_remote_code=True
-         )
-         processor = AutoProcessor.from_pretrained(MODEL_PATH, trust_remote_code=True)
-     except Exception as e:
-         print(f"Error loading model: {e}")
-         return
-
-     # 2. Check Image
-     import os
-     if not os.path.exists(IMAGE_PATH):
-         print(f"Warning: Image not found at '{IMAGE_PATH}'. Please edit IMAGE_PATH in demo.py")
-         # Create a dummy image for code demonstration purposes if needed, or just return
-         return
-
-     # 3. Prepare Inputs
-     messages = [
-         {
-             "role": "user",
-             "content": [
-                 {"type": "image", "image": IMAGE_PATH},
-                 {"type": "text", "text": PROMPT},
-             ],
-         }
-     ]
-
-     print("Processing...")
-     text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-     image_inputs, video_inputs = process_vision_info(messages)
-
-     inputs = processor(
-         text=[text],
-         images=image_inputs,
-         videos=video_inputs,
-         padding=True,
-         return_tensors="pt",
-     ).to(model.device)
-
-     # 4. Generate
-     with torch.no_grad():
-         generated_ids = model.generate(
-             **inputs,
-             max_new_tokens=1024,
-             temperature=0.7,
-             top_p=0.9
-         )
-
-     # 5. Decode
-     output_text = processor.batch_decode(
-         generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
-     )
-
-     print("\n=== Diagnosis Result ===")
-     print(output_text[0])
-     print("========================")
-
- if __name__ == "__main__":
-     main()
inference/.ipynb_checkpoints/inference-checkpoint.py DELETED
@@ -1,43 +0,0 @@
-
- import argparse
- from model_utils import SkinGPTModel
- import os
-
- def main():
-     parser = argparse.ArgumentParser(description="SkinGPT-R1 Single Inference")
-     parser.add_argument("--image", type=str, required=True, help="Path to the image")
-     parser.add_argument("--model_path", type=str, default="../checkpoint")
-     parser.add_argument("--prompt", type=str, default="Please analyze this skin image and provide a diagnosis.")
-     args = parser.parse_args()
-
-     if not os.path.exists(args.image):
-         print(f"Error: Image not found at {args.image}")
-         return
-
-     # 1. Load the model (reuses model_utils,
-     # so the transformers loading code is not duplicated here)
-     bot = SkinGPTModel(args.model_path)
-
-     # 2. Build a single-turn message
-     system_prompt = "You are a professional AI dermatology assistant."
-     messages = [
-         {
-             "role": "user",
-             "content": [
-                 {"type": "image", "image": args.image},
-                 {"type": "text", "text": f"{system_prompt}\n\n{args.prompt}"}
-             ]
-         }
-     ]
-
-     # 3. Run inference
-     print(f"\nAnalyzing {args.image}...")
-     response = bot.generate_response(messages)
-
-     print("-" * 40)
-     print("Result:")
-     print(response)
-     print("-" * 40)
-
- if __name__ == "__main__":
-     main()
inference/.ipynb_checkpoints/model_utils-checkpoint.py DELETED
@@ -1,120 +0,0 @@
1
- # model_utils.py
2
- import torch
3
- from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor, TextIteratorStreamer
4
- from qwen_vl_utils import process_vision_info
5
- from PIL import Image
6
- import os
7
- from threading import Thread
8
-
9
- class SkinGPTModel:
10
- def __init__(self, model_path, device=None):
11
- self.model_path = model_path
12
- self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
13
- print(f"Loading model from {model_path} on {self.device}...")
14
-
15
- self.model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
16
- model_path,
17
- torch_dtype=torch.bfloat16 if self.device != "cpu" else torch.float32,
18
- attn_implementation="flash_attention_2" if self.device == "cuda" else None,
19
- device_map="auto" if self.device != "mps" else None,
20
- trust_remote_code=True
21
- )
22
-
23
- if self.device == "mps":
24
- self.model = self.model.to(self.device)
25
-
26
- self.processor = AutoProcessor.from_pretrained(
27
- model_path,
28
- trust_remote_code=True,
29
- min_pixels=256*28*28,
30
- max_pixels=1280*28*28
31
- )
32
- print("Model loaded successfully.")
33
-
34
- def generate_response(self, messages, max_new_tokens=1024, temperature=0.7):
35
- """
36
- Generate a reply from a multi-turn message history.
37
- messages format:
38
- [
39
- {'role': 'user', 'content': [{'type': 'image', 'image': 'path...'}, {'type': 'text', 'text': '...'}]},
40
- {'role': 'assistant', 'content': [{'type': 'text', 'text': '...'}]}
41
- ]
42
- """
43
- # Apply the chat template to the text
44
- text = self.processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
45
-
46
- # Preprocess the vision inputs
47
- image_inputs, video_inputs = process_vision_info(messages)
48
-
49
- inputs = self.processor(
50
- text=[text],
51
- images=image_inputs,
52
- videos=video_inputs,
53
- padding=True,
54
- return_tensors="pt",
55
- ).to(self.model.device)
56
-
57
- with torch.no_grad():
58
- generated_ids = self.model.generate(
59
- **inputs,
60
- max_new_tokens=max_new_tokens,
61
- temperature=temperature,
62
- top_p=0.9,
63
- do_sample=True
64
- )
65
-
66
- # Decode the output (strip the input tokens)
67
- generated_ids_trimmed = [
68
- out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
69
- ]
70
- output_text = self.processor.batch_decode(
71
- generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
72
- )
73
-
74
- return output_text[0]
75
-
76
- def generate_response_stream(self, messages, max_new_tokens=2048, temperature=0.7):
77
- """
78
- Stream the generated response.
79
- Returns a generator that yields text chunks as they are produced.
80
- """
81
- # Apply the chat template to the text
82
- text = self.processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
83
-
84
- # Preprocess the vision inputs
85
- image_inputs, video_inputs = process_vision_info(messages)
86
-
87
- inputs = self.processor(
88
- text=[text],
89
- images=image_inputs,
90
- videos=video_inputs,
91
- padding=True,
92
- return_tensors="pt",
93
- ).to(self.model.device)
94
-
95
- # Create a TextIteratorStreamer for streaming output
96
- streamer = TextIteratorStreamer(
97
- self.processor.tokenizer,
98
- skip_prompt=True,
99
- skip_special_tokens=True
100
- )
101
-
102
- # Assemble the generation kwargs
103
- generation_kwargs = {
104
- **inputs,
105
- "max_new_tokens": max_new_tokens,
106
- "temperature": temperature,
107
- "top_p": 0.9,
108
- "do_sample": True,
109
- "streamer": streamer,
110
- }
111
-
112
- # Run generation in a separate thread
113
- thread = Thread(target=self.model.generate, kwargs=generation_kwargs)
114
- thread.start()
115
-
116
- # Yield the generated text chunks one by one
117
- for text_chunk in streamer:
118
- yield text_chunk
119
-
120
- thread.join()
 
 
inference/README.md ADDED
@@ -0,0 +1,11 @@
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Inference
2
+
3
+ Two runtime tracks are provided:
4
+
5
+ - `full_precision/`: single-image inference, multi-turn chat, and FastAPI service
6
+ - `int4_quantized/`: the same entrypoints (inference, chat, FastAPI service) for the INT4-quantized checkpoint
7
+
8
+ Checkpoint paths:
9
+
10
+ - `./checkpoints/full_precision`
11
+ - `./checkpoints/int4`
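
Both tracks feed the model Qwen2.5-VL-style message lists. A minimal sketch of that message shape (the helper name mirrors `build_single_turn_messages` in `full_precision/model_utils.py`; the signature here is an illustrative assumption, not the exact one in the repo):

```python
def build_single_turn_messages(image_path, user_text, system_prompt=None):
    """Build a one-turn Qwen2.5-VL message list with an image and a text prompt."""
    text = f"{system_prompt}\n\n{user_text}" if system_prompt else user_text
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": text},
            ],
        }
    ]

# Hypothetical usage; the resulting list is what SkinGPTModel.generate_response expects.
messages = build_single_turn_messages(
    "lesion.jpg",
    "Please analyze this image.",
    system_prompt="You are a professional AI dermatology assistant.",
)
```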
inference/__init__.py ADDED
@@ -0,0 +1 @@
 
 
1
+ """Inference entrypoints for SkinGPT-R1."""
inference/__pycache__/app.cpython-311.pyc DELETED
Binary file (17.8 kB)
 
inference/__pycache__/deepseek_service.cpython-311.pyc DELETED
Binary file (18.3 kB)
 
inference/__pycache__/model_utils.cpython-311.pyc DELETED
Binary file (5.39 kB)
 
inference/demo.py DELETED
@@ -1,79 +0,0 @@
1
- import torch
2
- from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
3
- from qwen_vl_utils import process_vision_info
4
- from PIL import Image
5
-
6
- # === Configuration ===
7
- MODEL_PATH = "../checkpoint"
8
- IMAGE_PATH = "test_image.jpg" # Please replace with your actual image path
9
- PROMPT = "You are a professional AI dermatology assistant. Please analyze this skin image and provide a diagnosis."
10
-
11
- def main():
12
- print(f"Loading model from {MODEL_PATH}...")
13
-
14
- # 1. Load Model
15
- try:
16
- model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
17
- MODEL_PATH,
18
- torch_dtype=torch.bfloat16,
19
- device_map="auto",
20
- trust_remote_code=True
21
- )
22
- processor = AutoProcessor.from_pretrained(MODEL_PATH, trust_remote_code=True)
23
- except Exception as e:
24
- print(f"Error loading model: {e}")
25
- return
26
-
27
- # 2. Check Image
28
- import os
29
- if not os.path.exists(IMAGE_PATH):
30
- print(f"Warning: Image not found at '{IMAGE_PATH}'. Please edit IMAGE_PATH in demo.py")
31
- # Create a dummy image for code demonstration purposes if needed, or just return
32
- return
33
-
34
- # 3. Prepare Inputs
35
- messages = [
36
- {
37
- "role": "user",
38
- "content": [
39
- {"type": "image", "image": IMAGE_PATH},
40
- {"type": "text", "text": PROMPT},
41
- ],
42
- }
43
- ]
44
-
45
- print("Processing...")
46
- text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
47
- image_inputs, video_inputs = process_vision_info(messages)
48
-
49
- inputs = processor(
50
- text=[text],
51
- images=image_inputs,
52
- videos=video_inputs,
53
- padding=True,
54
- return_tensors="pt",
55
- ).to(model.device)
56
-
57
- # 4. Generate
58
- with torch.no_grad():
59
- generated_ids = model.generate(
60
- **inputs,
61
- max_new_tokens=1024,
62
- temperature=0.7,
63
- repetition_penalty=1.2,
64
- no_repeat_ngram_size=3,
65
- top_p=0.9,
66
- do_sample=True
67
- )
68
-
69
- # 5. Decode
70
- output_text = processor.batch_decode(
71
- generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
72
- )
73
-
74
- print("\n=== Diagnosis Result ===")
75
- print(output_text[0])
76
- print("========================")
77
-
78
- if __name__ == "__main__":
79
- main()
 
 
inference/full_precision/__init__.py ADDED
@@ -0,0 +1 @@
 
 
1
+ """Full-precision inference package for SkinGPT-R1."""
inference/full_precision/__pycache__/app.cpython-311.pyc ADDED
Binary file (17.4 kB). View file
 
inference/full_precision/__pycache__/chat.cpython-311.pyc ADDED
Binary file (3.58 kB). View file
 
inference/full_precision/__pycache__/deepseek_service.cpython-311.pyc ADDED
Binary file (12.7 kB). View file
 
inference/full_precision/__pycache__/demo.cpython-311.pyc ADDED
Binary file (1.8 kB). View file
 
inference/full_precision/__pycache__/infer.cpython-311.pyc ADDED
Binary file (2.63 kB). View file
 
inference/full_precision/__pycache__/model_utils.cpython-311.pyc ADDED
Binary file (7.37 kB). View file
 
inference/{app.py → full_precision/app.py} RENAMED
@@ -1,133 +1,81 @@
1
- # app.py
2
- import uvicorn
 
 
3
  import os
4
  import shutil
5
  import uuid
6
- import json
7
- import re
8
- import asyncio
9
- from typing import Optional
10
- from io import BytesIO
11
  from contextlib import asynccontextmanager
12
- from PIL import Image
13
- from fastapi import FastAPI, UploadFile, File, Form, HTTPException, Request
 
 
 
 
 
 
 
14
  from fastapi.middleware.cors import CORSMiddleware
15
  from fastapi.responses import StreamingResponse
16
- from fastapi.concurrency import run_in_threadpool
17
- from model_utils import SkinGPTModel
18
- from deepseek_service import get_deepseek_service, DeepSeekService
19
 
20
- # === Configuration ===
21
- MODEL_PATH = "../checkpoint"
22
- TEMP_DIR = "./temp_uploads"
23
- os.makedirs(TEMP_DIR, exist_ok=True)
 
 
24
 
25
- # DeepSeek API Key
26
- DEEPSEEK_API_KEY = os.environ.get("DEEPSEEK_API_KEY", "")
 
 
27
 
28
- # Global DeepSeek service instance
29
  deepseek_service: Optional[DeepSeekService] = None
30
 
31
- @asynccontextmanager
32
- async def lifespan(app: FastAPI):
33
- """应用生命周期管理"""
34
- # 启动时初始化 DeepSeek 服务
35
- await init_deepseek()
36
- yield
37
- print("\nShutting down service...")
38
 
39
- app = FastAPI(
40
- title="SkinGPT-R1 皮肤诊断系统",
41
- description="智能皮肤诊断助手",
42
- version="1.0.0",
43
- lifespan=lifespan
44
- )
45
 
46
- # CORS configuration: allow frontend access
47
- app.add_middleware(
48
- CORSMiddleware,
49
- allow_origins=["http://localhost:3000", "http://localhost:5173", "http://127.0.0.1:5173", "*"],
50
- allow_credentials=True,
51
- allow_methods=["*"],
52
- allow_headers=["*"],
53
- )
54
 
55
- # Global state containers
56
- # chat_states: conversation history (list of messages for Qwen)
57
- # pending_images: uploaded image paths not yet sent to the LLM (state ID -> image path)
58
- chat_states = {}
59
- pending_images = {}
60
 
61
- def parse_diagnosis_result(raw_text: str) -> dict:
62
- """
63
- Parse the <think> and <answer> tags from the diagnosis output.
64
-
65
- Args:
66
- - raw_text: raw diagnosis text
67
-
68
- Returns:
69
- - dict with thinking, answer, and raw fields
70
- """
71
- import re
72
-
73
- # Try to match complete tags
74
- think_match = re.search(r'<think>([\s\S]*?)</think>', raw_text)
75
- answer_match = re.search(r'<answer>([\s\S]*?)</answer>', raw_text)
76
-
77
- thinking = None
78
- answer = None
79
-
80
- # Handle the <think> tag
81
- if think_match:
82
- thinking = think_match.group(1).strip()
83
- else:
84
- # Try to match an unclosed <think> tag (truncated output)
85
- unclosed_think = re.search(r'<think>([\s\S]*?)(?=<answer>|$)', raw_text)
86
  if unclosed_think:
87
  thinking = unclosed_think.group(1).strip()
88
-
89
- # Handle the <answer> tag
90
- if answer_match:
91
- answer = answer_match.group(1).strip()
92
- else:
93
- # Try to match an unclosed <answer> tag
94
- unclosed_answer = re.search(r'<answer>([\s\S]*?)$', raw_text)
95
  if unclosed_answer:
96
  answer = unclosed_answer.group(1).strip()
97
-
98
- # If no answer was found, fall back to the cleaned raw text
99
  if not answer:
100
- # Remove all tags and their contents
101
- cleaned = re.sub(r'<think>[\s\S]*?</think>', '', raw_text)
102
- cleaned = re.sub(r'<think>[\s\S]*', '', cleaned) # strip any unclosed <think>
103
- cleaned = re.sub(r'</?answer>', '', cleaned) # strip <answer> tags
104
- cleaned = cleaned.strip()
105
- answer = cleaned if cleaned else raw_text
106
-
107
- # Strip any leftover tags
108
- if answer:
109
- answer = re.sub(r'</?think>|</?answer>', '', answer).strip()
110
- if thinking:
111
- thinking = re.sub(r'</?think>|</?answer>', '', thinking).strip()
112
-
113
- # 处理 "Final Answer:" 格式,提取其后的内容
114
  if answer:
115
- final_answer_match = re.search(r'Final Answer:\s*([\s\S]*)', answer, re.IGNORECASE)
 
116
  if final_answer_match:
117
  answer = final_answer_match.group(1).strip()
118
-
119
- return {
120
- "thinking": thinking if thinking else None,
121
- "answer": answer,
122
- "raw": raw_text
123
- }
124
 
125
  print("Initializing Model Service...")
126
- # Load the model globally
127
  gpt_model = SkinGPTModel(MODEL_PATH)
128
  print("Service Ready.")
129
 
130
- # Initialize the DeepSeek service (async)
131
  async def init_deepseek():
132
  global deepseek_service
133
  print("\nInitializing DeepSeek service...")
@@ -137,120 +85,116 @@ async def init_deepseek():
137
  else:
138
  print("DeepSeek service not available, will return raw results")
139
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
140
  @app.post("/v1/upload/{state_id}")
141
  async def upload_file(state_id: str, file: UploadFile = File(...), survey: str = Form(None)):
142
- """
143
- Accept an image upload.
144
- Save the image to a local temp directory and mark this state_id as having a pending image.
145
- """
146
  try:
147
- # 1. Save the image to a local temp file
148
  file_extension = file.filename.split(".")[-1] if "." in file.filename else "jpg"
149
  unique_name = f"{state_id}_{uuid.uuid4().hex}.{file_extension}"
150
- file_path = os.path.join(TEMP_DIR, unique_name)
151
-
152
- with open(file_path, "wb") as buffer:
153
  shutil.copyfileobj(file.file, buffer)
154
-
155
- # 2. Record the image path for the next predict call
156
- # For multi-image mode this could become a list; for now a single image is overwritten
157
- pending_images[state_id] = file_path
158
-
159
- # 3. Initialize the conversation state (for a new session)
160
  if state_id not in chat_states:
161
  chat_states[state_id] = []
162
-
163
- return {"message": "Image uploaded successfully", "path": file_path}
164
-
165
- except Exception as e:
166
- raise HTTPException(status_code=500, detail=f"Upload failed: {str(e)}")
167
 
168
  @app.post("/v1/predict/{state_id}")
169
  async def v1_predict(request: Request, state_id: str):
170
- """
171
- Accept text and run inference.
172
- If a pending image exists, combine it with the text into a multimodal message.
173
- """
174
  try:
175
  data = await request.json()
176
- except:
177
- raise HTTPException(status_code=400, detail="Invalid JSON")
178
-
179
  user_message = data.get("message", "")
180
  if not user_message:
181
  raise HTTPException(status_code=400, detail="Missing 'message' field")
182
 
183
- # Get or initialize the history
184
  history = chat_states.get(state_id, [])
185
-
186
- # Build this turn's user content
187
  current_content = []
188
-
189
- # 1. Check for a freshly uploaded image
190
  if state_id in pending_images:
191
- img_path = pending_images.pop(state_id) # take and remove
192
  current_content.append({"type": "image", "image": img_path})
193
-
194
- # On the first turn, prepend the system prompt
195
  if not history:
196
- system_prompt = "You are a professional AI dermatology assistant. "
197
- user_message = f"{system_prompt}\n\n{user_message}"
198
 
199
- # 2. Append the text
200
  current_content.append({"type": "text", "text": user_message})
201
-
202
- # 3. Update the history
203
  history.append({"role": "user", "content": current_content})
204
  chat_states[state_id] = history
205
 
206
- # 4. Run inference (in a thread pool to avoid blocking the event loop)
207
  try:
208
- response_text = await run_in_threadpool(
209
- gpt_model.generate_response,
210
- messages=history
211
- )
212
- except Exception as e:
213
- # Roll back the history (drop the failed user turn)
214
  chat_states[state_id].pop()
215
- raise HTTPException(status_code=500, detail=f"Inference error: {str(e)}")
216
 
217
- # 5. Append the reply to the history
218
  history.append({"role": "assistant", "content": [{"type": "text", "text": response_text}]})
219
  chat_states[state_id] = history
220
-
221
  return {"message": response_text}
222
 
 
223
  @app.post("/v1/reset/{state_id}")
224
  async def reset_chat(state_id: str):
225
- """清除会话状态"""
226
  if state_id in chat_states:
227
  del chat_states[state_id]
228
  if state_id in pending_images:
229
- # Optional: delete the temp file
230
  try:
231
- os.remove(pending_images[state_id])
232
- except:
233
  pass
234
  del pending_images[state_id]
235
  return {"message": "Chat history reset"}
236
 
 
237
  @app.get("/")
238
  async def root():
239
- """根路径"""
240
  return {
241
- "name": "SkinGPT-R1 皮肤诊断系统",
242
- "version": "1.0.0",
243
  "status": "running",
244
- "description": "智能皮肤诊断助手"
245
  }
246
 
 
247
  @app.get("/health")
248
  async def health_check():
249
- """健康检查"""
250
- return {
251
- "status": "healthy",
252
- "model_loaded": True
253
- }
254
 
255
  @app.post("/diagnose/stream")
256
  async def diagnose_stream(
@@ -258,126 +202,89 @@ async def diagnose_stream(
258
  text: str = Form(...),
259
  language: str = Form("zh"),
260
  ):
261
- """
262
- SSE streaming diagnosis endpoint (for the frontend).
263
- Supports image upload and text input with true streaming responses.
264
- Uses the DeepSeek API to refine the output format.
265
- """
266
- from queue import Queue, Empty
267
- from threading import Thread
268
-
269
  language = language if language in ("zh", "en") else "zh"
270
-
271
- # Handle the image
272
  pil_image = None
273
- temp_image_path = None
274
-
275
  if image:
276
  contents = await image.read()
277
  pil_image = Image.open(BytesIO(contents)).convert("RGB")
278
-
279
- # Queue for inter-thread communication
280
  result_queue = Queue()
281
- # Holds the full response and the parsed result
282
  generation_result = {"full_response": [], "parsed": None, "temp_image_path": None}
283
-
284
  def run_generation():
285
- """在后台线程中运行流式生成"""
286
  full_response = []
287
-
288
  try:
289
- # Build the messages
290
  messages = []
291
  current_content = []
292
-
293
- # Add the system prompt
294
- system_prompt = "You are a professional AI dermatology assistant." if language == "en" else "你是一个专业的AI皮肤科助手。"
295
-
296
- # If there is an image, save it to a temp file
 
297
  if pil_image:
298
- generation_result["temp_image_path"] = os.path.join(TEMP_DIR, f"temp_{uuid.uuid4().hex}.jpg")
299
- pil_image.save(generation_result["temp_image_path"])
300
- current_content.append({"type": "image", "image": generation_result["temp_image_path"]})
301
-
302
- # Append the text
303
- prompt = f"{system_prompt}\n\n{text}"
304
- current_content.append({"type": "text", "text": prompt})
305
  messages.append({"role": "user", "content": current_content})
306
-
307
- # Streaming generation: each chunk is queued immediately
308
  for chunk in gpt_model.generate_response_stream(
309
  messages=messages,
310
  max_new_tokens=2048,
311
- temperature=0.7
312
  ):
313
  full_response.append(chunk)
314
  result_queue.put(("delta", chunk))
315
-
316
- # Parse the result
317
  response_text = "".join(full_response)
318
- parsed = parse_diagnosis_result(response_text)
319
  generation_result["full_response"] = full_response
320
- generation_result["parsed"] = parsed
321
-
322
- # Mark generation as complete
323
  result_queue.put(("generation_done", None))
324
-
325
- except Exception as e:
326
- result_queue.put(("error", str(e)))
327
-
328
  async def event_generator():
329
- """异步生成SSE事件"""
330
- # 在后台线程启动生成(非阻塞)
331
  gen_thread = Thread(target=run_generation)
332
  gen_thread.start()
333
-
334
  loop = asyncio.get_event_loop()
335
-
336
- # Read from the queue and stream the content
337
  while True:
338
  try:
339
- # Non-blocking fetch
340
  msg_type, data = await loop.run_in_executor(
341
- None,
342
- lambda: result_queue.get(timeout=0.1)
343
  )
344
-
345
  if msg_type == "generation_done":
346
- # Streaming finished; prepare the final result
347
  break
348
- elif msg_type == "delta":
349
- yield_chunk = json.dumps({"type": "delta", "text": data}, ensure_ascii=False)
350
- yield f"data: {yield_chunk}\n\n"
351
  elif msg_type == "error":
352
  yield f"data: {json.dumps({'type': 'error', 'message': data}, ensure_ascii=False)}\n\n"
353
  gen_thread.join()
354
  return
355
-
356
  except Empty:
357
- # Queue temporarily empty; keep waiting
358
  await asyncio.sleep(0.01)
359
- continue
360
-
361
  gen_thread.join()
362
-
363
- # Fetch the parsed result
364
  parsed = generation_result["parsed"]
365
  if not parsed:
366
- yield f"data: {json.dumps({'type': 'error', 'message': 'Failed to parse response'}, ensure_ascii=False)}\n\n"
367
  return
368
-
369
  raw_thinking = parsed["thinking"]
370
  raw_answer = parsed["answer"]
371
-
372
- # Refine the result with DeepSeek
373
  refined_by_deepseek = False
374
  description = None
375
  thinking = raw_thinking
376
  answer = raw_answer
377
-
378
  if deepseek_service and deepseek_service.is_loaded:
379
  try:
380
- print(f"Calling DeepSeek to refine diagnosis (language={language})...")
381
  refined = await deepseek_service.refine_diagnosis(
382
  raw_answer=raw_answer,
383
  raw_thinking=raw_thinking,
@@ -388,36 +295,35 @@ async def diagnose_stream(
388
  thinking = refined["analysis_process"]
389
  answer = refined["diagnosis_result"]
390
  refined_by_deepseek = True
391
- print(f"DeepSeek refinement completed successfully")
392
- except Exception as e:
393
- print(f"DeepSeek refinement failed, using original: {e}")
394
  else:
395
  print("DeepSeek service not available, using raw results")
396
-
397
- success_msg = "Diagnosis completed" if language == "en" else "诊断完成"
398
-
399
- # Keep the response format consistent with the reference project
400
  final_payload = {
401
- "description": description, # 图片描述(从 thinking 中提取)
402
- "thinking": thinking, # 分析过程(DeepSeek 优化后)
403
- "answer": answer, # 诊断结果(DeepSeek 优化后)
404
- "raw": parsed["raw"], # 原始响应
405
- "refined_by_deepseek": refined_by_deepseek, # 是否被 DeepSeek 优化
406
  "success": True,
407
- "message": success_msg
408
  }
409
- yield_final = json.dumps({"type": "final", "result": final_payload}, ensure_ascii=False)
410
- yield f"data: {yield_final}\n\n"
411
-
412
- # Clean up the temp image
413
  temp_path = generation_result.get("temp_image_path")
414
- if temp_path and os.path.exists(temp_path):
415
  try:
416
- os.remove(temp_path)
417
- except:
418
  pass
419
-
420
  return StreamingResponse(event_generator(), media_type="text/event-stream")
421
 
422
- if __name__ == '__main__':
423
- uvicorn.run("app:app", host="0.0.0.0", port=5900, reload=False)
 
 
 
 
 
 
1
+ from __future__ import annotations
2
+
3
+ import asyncio
4
+ import json
5
  import os
6
  import shutil
7
  import uuid
 
 
 
 
 
8
  from contextlib import asynccontextmanager
9
+ from io import BytesIO
10
+ from pathlib import Path
11
+ from queue import Empty, Queue
12
+ from threading import Thread
13
+ from typing import Optional
14
+
15
+ import uvicorn
16
+ from fastapi import FastAPI, File, Form, HTTPException, Request, UploadFile
17
+ from fastapi.concurrency import run_in_threadpool
18
  from fastapi.middleware.cors import CORSMiddleware
19
  from fastapi.responses import StreamingResponse
20
+ from PIL import Image
 
 
21
 
22
+ try:
23
+ from .deepseek_service import DeepSeekService, get_deepseek_service
24
+ from .model_utils import DEFAULT_MODEL_PATH, SkinGPTModel, resolve_model_path
25
+ except ImportError:
26
+ from deepseek_service import DeepSeekService, get_deepseek_service
27
+ from model_utils import DEFAULT_MODEL_PATH, SkinGPTModel, resolve_model_path
28
 
29
+ MODEL_PATH = resolve_model_path(DEFAULT_MODEL_PATH)
30
+ TEMP_DIR = Path(__file__).resolve().parents[1] / "temp_uploads"
31
+ TEMP_DIR.mkdir(parents=True, exist_ok=True)
32
+ DEEPSEEK_API_KEY = os.environ.get("DEEPSEEK_API_KEY")
33
 
 
34
  deepseek_service: Optional[DeepSeekService] = None
35
 
 
 
 
 
 
 
 
36
 
37
+ def parse_diagnosis_result(raw_text: str) -> dict:
38
+ import re
 
 
 
 
39
 
40
+ think_match = re.search(r"<think>([\s\S]*?)</think>", raw_text)
41
+ answer_match = re.search(r"<answer>([\s\S]*?)</answer>", raw_text)
 
 
 
 
 
 
42
 
43
+ thinking = think_match.group(1).strip() if think_match else None
44
+ answer = answer_match.group(1).strip() if answer_match else None
 
 
 
45
 
46
+ if not thinking:
47
+ unclosed_think = re.search(r"<think>([\s\S]*?)(?=<answer>|$)", raw_text)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
48
  if unclosed_think:
49
  thinking = unclosed_think.group(1).strip()
50
+
51
+ if not answer:
52
+ unclosed_answer = re.search(r"<answer>([\s\S]*?)$", raw_text)
 
 
 
 
53
  if unclosed_answer:
54
  answer = unclosed_answer.group(1).strip()
55
+
 
56
  if not answer:
57
+ cleaned = re.sub(r"<think>[\s\S]*?</think>", "", raw_text)
58
+ cleaned = re.sub(r"<think>[\s\S]*", "", cleaned)
59
+ cleaned = re.sub(r"</?answer>", "", cleaned)
60
+ answer = cleaned.strip() or raw_text
61
+
 
 
 
 
 
 
 
 
 
62
  if answer:
63
+ answer = re.sub(r"</?think>|</?answer>", "", answer).strip()
64
+ final_answer_match = re.search(r"Final Answer:\s*([\s\S]*)", answer, re.IGNORECASE)
65
  if final_answer_match:
66
  answer = final_answer_match.group(1).strip()
67
+
68
+ if thinking:
69
+ thinking = re.sub(r"</?think>|</?answer>", "", thinking).strip()
70
+
71
+ return {"thinking": thinking or None, "answer": answer, "raw": raw_text}
72
+
73
 
74
  print("Initializing Model Service...")
 
75
  gpt_model = SkinGPTModel(MODEL_PATH)
76
  print("Service Ready.")
77
 
78
+
79
  async def init_deepseek():
80
  global deepseek_service
81
  print("\nInitializing DeepSeek service...")
 
85
  else:
86
  print("DeepSeek service not available, will return raw results")
87
 
88
+
89
+ @asynccontextmanager
90
+ async def lifespan(app: FastAPI):
91
+ await init_deepseek()
92
+ yield
93
+ print("\nShutting down service...")
94
+
95
+
96
+ app = FastAPI(
97
+ title="SkinGPT-R1 Full Precision API",
98
+ description="Full-precision dermatology assistant backend",
99
+ version="1.1.0",
100
+ lifespan=lifespan,
101
+ )
102
+
103
+ app.add_middleware(
104
+ CORSMiddleware,
105
+ allow_origins=["http://localhost:3000", "http://localhost:5173", "http://127.0.0.1:5173", "*"],
106
+ allow_credentials=True,
107
+ allow_methods=["*"],
108
+ allow_headers=["*"],
109
+ )
110
+
111
+ chat_states = {}
112
+ pending_images = {}
113
+
114
+
115
  @app.post("/v1/upload/{state_id}")
116
  async def upload_file(state_id: str, file: UploadFile = File(...), survey: str = Form(None)):
117
+ del survey
 
 
 
118
  try:
 
119
  file_extension = file.filename.split(".")[-1] if "." in file.filename else "jpg"
120
  unique_name = f"{state_id}_{uuid.uuid4().hex}.{file_extension}"
121
+ file_path = TEMP_DIR / unique_name
122
+
123
+ with file_path.open("wb") as buffer:
124
  shutil.copyfileobj(file.file, buffer)
125
+
126
+ pending_images[state_id] = str(file_path)
127
+
 
 
 
128
  if state_id not in chat_states:
129
  chat_states[state_id] = []
130
+
131
+ return {"message": "Image uploaded successfully", "path": str(file_path)}
132
+ except Exception as exc:
133
+ raise HTTPException(status_code=500, detail=f"Upload failed: {exc}") from exc
134
+
135
 
136
  @app.post("/v1/predict/{state_id}")
137
  async def v1_predict(request: Request, state_id: str):
 
 
 
 
138
  try:
139
  data = await request.json()
140
+ except Exception as exc:
141
+ raise HTTPException(status_code=400, detail="Invalid JSON") from exc
142
+
143
  user_message = data.get("message", "")
144
  if not user_message:
145
  raise HTTPException(status_code=400, detail="Missing 'message' field")
146
 
 
147
  history = chat_states.get(state_id, [])
 
 
148
  current_content = []
149
+
 
150
  if state_id in pending_images:
151
+ img_path = pending_images.pop(state_id)
152
  current_content.append({"type": "image", "image": img_path})
 
 
153
  if not history:
154
+ user_message = f"You are a professional AI dermatology assistant.\n\n{user_message}"
 
155
 
 
156
  current_content.append({"type": "text", "text": user_message})
 
 
157
  history.append({"role": "user", "content": current_content})
158
  chat_states[state_id] = history
159
 
 
160
  try:
161
+ response_text = await run_in_threadpool(gpt_model.generate_response, messages=history)
162
+ except Exception as exc:
 
 
 
 
163
  chat_states[state_id].pop()
164
+ raise HTTPException(status_code=500, detail=f"Inference error: {exc}") from exc
165
 
 
166
  history.append({"role": "assistant", "content": [{"type": "text", "text": response_text}]})
167
  chat_states[state_id] = history
 
168
  return {"message": response_text}
169
 
170
+
171
  @app.post("/v1/reset/{state_id}")
172
  async def reset_chat(state_id: str):
 
173
  if state_id in chat_states:
174
  del chat_states[state_id]
175
  if state_id in pending_images:
 
176
  try:
177
+ Path(pending_images[state_id]).unlink(missing_ok=True)
178
+ except Exception:
179
  pass
180
  del pending_images[state_id]
181
  return {"message": "Chat history reset"}
182
 
183
+
184
  @app.get("/")
185
  async def root():
 
186
  return {
187
+ "name": "SkinGPT-R1 Full Precision API",
188
+ "version": "1.1.0",
189
  "status": "running",
190
+ "description": "Full-precision dermatology assistant",
191
  }
192
 
193
+
194
  @app.get("/health")
195
  async def health_check():
196
+ return {"status": "healthy", "model_loaded": True}
197
+
 
 
 
198
 
199
  @app.post("/diagnose/stream")
200
  async def diagnose_stream(
 
202
  text: str = Form(...),
203
  language: str = Form("zh"),
204
  ):
 
 
 
 
 
 
 
 
205
  language = language if language in ("zh", "en") else "zh"
 
 
206
  pil_image = None
207
+
 
208
  if image:
209
  contents = await image.read()
210
  pil_image = Image.open(BytesIO(contents)).convert("RGB")
211
+
 
212
  result_queue = Queue()
 
213
  generation_result = {"full_response": [], "parsed": None, "temp_image_path": None}
214
+
215
  def run_generation():
 
216
  full_response = []
 
217
  try:
 
218
  messages = []
219
  current_content = []
220
+ system_prompt = (
221
+ "You are a professional AI dermatology assistant."
222
+ if language == "en"
223
+ else "你是一个专业的AI皮肤科助手。"
224
+ )
225
+
226
  if pil_image:
227
+ temp_image_path = TEMP_DIR / f"temp_{uuid.uuid4().hex}.jpg"
228
+ pil_image.save(temp_image_path)
229
+ generation_result["temp_image_path"] = str(temp_image_path)
230
+ current_content.append({"type": "image", "image": str(temp_image_path)})
231
+
232
+ current_content.append({"type": "text", "text": f"{system_prompt}\n\n{text}"})
 
233
  messages.append({"role": "user", "content": current_content})
234
+
 
235
  for chunk in gpt_model.generate_response_stream(
236
  messages=messages,
237
  max_new_tokens=2048,
238
+ temperature=0.7,
239
  ):
240
  full_response.append(chunk)
241
  result_queue.put(("delta", chunk))
242
+
 
243
  response_text = "".join(full_response)
 
244
  generation_result["full_response"] = full_response
245
+ generation_result["parsed"] = parse_diagnosis_result(response_text)
 
 
246
  result_queue.put(("generation_done", None))
247
+ except Exception as exc:
248
+ result_queue.put(("error", str(exc)))
249
+
 
250
  async def event_generator():
 
 
251
  gen_thread = Thread(target=run_generation)
252
  gen_thread.start()
253
+
254
  loop = asyncio.get_event_loop()
 
 
255
  while True:
256
  try:
 
257
  msg_type, data = await loop.run_in_executor(
258
+ None,
259
+ lambda: result_queue.get(timeout=0.1),
260
  )
 
261
  if msg_type == "generation_done":
 
262
  break
263
+ if msg_type == "delta":
264
+ yield f"data: {json.dumps({'type': 'delta', 'text': data}, ensure_ascii=False)}\n\n"
 
265
  elif msg_type == "error":
266
  yield f"data: {json.dumps({'type': 'error', 'message': data}, ensure_ascii=False)}\n\n"
267
  gen_thread.join()
268
  return
 
269
  except Empty:
 
270
  await asyncio.sleep(0.01)
271
+
 
272
  gen_thread.join()
273
+
 
274
  parsed = generation_result["parsed"]
275
  if not parsed:
276
+ yield "data: {\"type\": \"error\", \"message\": \"Failed to parse response\"}\n\n"
277
  return
278
+
279
  raw_thinking = parsed["thinking"]
280
  raw_answer = parsed["answer"]
 
 
281
  refined_by_deepseek = False
282
  description = None
283
  thinking = raw_thinking
284
  answer = raw_answer
285
+
286
  if deepseek_service and deepseek_service.is_loaded:
287
  try:
 
288
  refined = await deepseek_service.refine_diagnosis(
289
  raw_answer=raw_answer,
290
  raw_thinking=raw_thinking,
 
295
  thinking = refined["analysis_process"]
296
  answer = refined["diagnosis_result"]
297
  refined_by_deepseek = True
298
+ except Exception as exc:
299
+ print(f"DeepSeek refinement failed, using original: {exc}")
 
300
  else:
301
  print("DeepSeek service not available, using raw results")
302
+
 
 
 
303
  final_payload = {
304
+ "description": description,
305
+ "thinking": thinking,
306
+ "answer": answer,
307
+ "raw": parsed["raw"],
308
+ "refined_by_deepseek": refined_by_deepseek,
309
  "success": True,
310
+ "message": "Diagnosis completed" if language == "en" else "诊断完成",
311
  }
312
+ yield f"data: {json.dumps({'type': 'final', 'result': final_payload}, ensure_ascii=False)}\n\n"
313
+
 
 
314
  temp_path = generation_result.get("temp_image_path")
315
+ if temp_path:
316
  try:
317
+ Path(temp_path).unlink(missing_ok=True)
318
+ except Exception:
319
  pass
320
+
321
  return StreamingResponse(event_generator(), media_type="text/event-stream")
322
 
323
+
324
+ def main() -> None:
325
+ uvicorn.run("app:app", host="0.0.0.0", port=5900, reload=False)
326
+
327
+
328
+ if __name__ == "__main__":
329
+ main()
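
The `/diagnose/stream` endpoint in the new `app.py` emits Server-Sent Events where each `data:` line carries a JSON payload of type `delta`, `final`, or `error`. A minimal client-side sketch for collecting those events (the sample payloads below are illustrative, not captured output):

```python
import json

def parse_sse_events(body: str) -> list:
    """Extract the JSON payload from each 'data: ...' line of an SSE body."""
    events = []
    for line in body.splitlines():
        if line.startswith("data: "):
            events.append(json.loads(line[len("data: "):]))
    return events

# Illustrative sample of the stream shape produced by /diagnose/stream.
sample = (
    'data: {"type": "delta", "text": "Erythematous plaque"}\n\n'
    'data: {"type": "final", "result": {"success": true, "answer": "..."}}\n\n'
)
events = parse_sse_events(sample)
streamed_text = "".join(e["text"] for e in events if e["type"] == "delta")
```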
inference/{chat.py → full_precision/chat.py} RENAMED
@@ -1,48 +1,53 @@
1
- # chat.py
 
2
  import argparse
3
- import os
4
- from model_utils import SkinGPTModel
 
 
 
 
 
 
 
 
 
 
 
 
 
 
5
 
6
- def main():
7
- parser = argparse.ArgumentParser(description="SkinGPT-R1 Multi-turn Chat")
8
- parser.add_argument("--model_path", type=str, default="../checkpoint")
 
9
  parser.add_argument("--image", type=str, required=True, help="Path to initial image")
10
- args = parser.parse_args()
 
11
 
12
- # Initialize the model
13
- bot = SkinGPTModel(args.model_path)
14
 
15
- # Initialize the conversation history
16
- # System prompt
17
- system_prompt = "You are a professional AI dermatology assistant. Analyze the skin condition carefully."
18
-
19
- # Build the first message, which contains the image
20
- if not os.path.exists(args.image):
21
  print(f"Error: Image {args.image} not found.")
22
  return
23
 
24
- history = [
25
- {
26
- "role": "user",
27
- "content": [
28
- {"type": "image", "image": args.image},
29
- {"type": "text", "text": f"{system_prompt}\n\nPlease analyze this image."}
30
- ]
31
- }
32
- ]
33
 
34
  print("\n=== SkinGPT-R1 Chat (Type 'exit' to quit) ===")
35
  print(f"Image loaded: {args.image}")
36
-
37
- # 获取第一轮诊断
38
  print("\nModel is thinking...", end="", flush=True)
39
- response = bot.generate_response(history)
40
  print(f"\rAssistant: {response}\n")
41
-
42
- # 将助手的回复加入历史
43
  history.append({"role": "assistant", "content": [{"type": "text", "text": response}]})
44
 
45
- # Enter the multi-turn chat loop
46
  while True:
47
  try:
48
  user_input = input("User: ")
@@ -51,18 +56,16 @@ def main():
51
  if not user_input.strip():
52
  continue
53
 
54
- # Append the user's new question
55
  history.append({"role": "user", "content": [{"type": "text", "text": user_input}]})
56
 
57
  print("Model is thinking...", end="", flush=True)
58
- response = bot.generate_response(history)
59
  print(f"\rAssistant: {response}\n")
60
 
61
- # Append the assistant's new reply
62
  history.append({"role": "assistant", "content": [{"type": "text", "text": response}]})
63
-
64
  except KeyboardInterrupt:
65
  break
66
 
 
67
  if __name__ == "__main__":
68
- main()
 
1
+ from __future__ import annotations
2
+
3
  import argparse
4
+ from pathlib import Path
5
+
6
+ try:
7
+ from .model_utils import (
8
+ DEFAULT_MODEL_PATH,
9
+ SkinGPTModel,
10
+ build_single_turn_messages,
11
+ resolve_model_path,
12
+ )
13
+ except ImportError:
14
+ from model_utils import (
15
+ DEFAULT_MODEL_PATH,
16
+ SkinGPTModel,
17
+ build_single_turn_messages,
18
+ resolve_model_path,
19
+ )
20
 
21
+
22
+ def build_parser() -> argparse.ArgumentParser:
23
+ parser = argparse.ArgumentParser(description="SkinGPT-R1 full-precision multi-turn chat")
24
+ parser.add_argument("--model_path", type=str, default=DEFAULT_MODEL_PATH)
25
  parser.add_argument("--image", type=str, required=True, help="Path to initial image")
26
+ return parser
27
+
28
 
29
+ def main() -> None:
30
+ args = build_parser().parse_args()
31
 
32
+ if not Path(args.image).exists():
 
 
 
 
 
33
  print(f"Error: Image {args.image} not found.")
34
  return
35
 
36
+ model = SkinGPTModel(resolve_model_path(args.model_path))
37
+ history = build_single_turn_messages(
38
+ args.image,
39
+ "Please analyze this image.",
40
+ system_prompt="You are a professional AI dermatology assistant. Analyze the skin condition carefully.",
41
+ )
 
 
 
42
 
43
  print("\n=== SkinGPT-R1 Chat (Type 'exit' to quit) ===")
44
  print(f"Image loaded: {args.image}")
45
+
 
46
  print("\nModel is thinking...", end="", flush=True)
47
+ response = model.generate_response(history)
48
  print(f"\rAssistant: {response}\n")
 
 
49
  history.append({"role": "assistant", "content": [{"type": "text", "text": response}]})
50
 
 
51
  while True:
52
  try:
53
  user_input = input("User: ")
 
56
  if not user_input.strip():
57
  continue
58
 
 
59
  history.append({"role": "user", "content": [{"type": "text", "text": user_input}]})
60
 
61
  print("Model is thinking...", end="", flush=True)
62
+ response = model.generate_response(history)
63
  print(f"\rAssistant: {response}\n")
64
 
 
65
  history.append({"role": "assistant", "content": [{"type": "text", "text": response}]})
 
66
  except KeyboardInterrupt:
67
  break
68
 
69
+
70
  if __name__ == "__main__":
71
+ main()
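For reference, the conversation state the chat loop maintains is a plain list of role/content dicts in the Qwen2.5-VL message format, matching what `build_single_turn_messages` returns plus the appended turns; the image path and texts below are placeholders:

```python
# Initial single-turn message: one user entry with an image part and a text part.
history = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "lesion.jpg"},
            {"type": "text", "text": "Please analyze this image."},
        ],
    }
]

# Each turn appends one assistant reply, then the next user question.
history.append({"role": "assistant", "content": [{"type": "text", "text": "Findings: ..."}]})
history.append({"role": "user", "content": [{"type": "text", "text": "Is it contagious?"}]})

print([m["role"] for m in history])  # ['user', 'assistant', 'user']
```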
inference/{deepseek_service.py → full_precision/deepseek_service.py} RENAMED
@@ -1,75 +1,51 @@
1
- """
2
- DeepSeek API Service
3
- Used to optimize and organize SkinGPT model output results
4
- """
5
 
6
  import os
7
  import re
8
  from typing import Optional
 
9
  from openai import AsyncOpenAI
10
 
11
 
12
  class DeepSeekService:
13
- """DeepSeek API Service Class"""
14
-
15
  def __init__(self, api_key: Optional[str] = None):
16
- """
17
- Initialize DeepSeek service
18
-
19
- Parameters:
20
- api_key: DeepSeek API key, reads from environment variable if not provided
21
- """
22
  self.api_key = api_key or os.environ.get("DEEPSEEK_API_KEY")
23
  self.base_url = "https://api.deepseek.com"
24
- self.model = "deepseek-chat" # Using deepseek-chat model
25
-
26
  self.client = None
27
  self.is_loaded = False
28
-
29
- print(f"DeepSeek API service initializing...")
30
  print(f"API Base URL: {self.base_url}")
31
-
32
  async def load(self):
33
- """Initialize DeepSeek API client"""
34
  try:
35
  if not self.api_key:
36
  print("DeepSeek API key not provided")
37
  self.is_loaded = False
38
  return
39
-
40
- # Initialize OpenAI compatible client
41
- self.client = AsyncOpenAI(
42
- api_key=self.api_key,
43
- base_url=self.base_url
44
- )
45
-
46
  self.is_loaded = True
47
  print("DeepSeek API service is ready!")
48
-
49
- except Exception as e:
50
- print(f"DeepSeek API service initialization failed: {e}")
51
  self.is_loaded = False
52
-
53
  async def refine_diagnosis(
54
- self,
55
  raw_answer: str,
56
  raw_thinking: Optional[str] = None,
57
- language: str = "zh"
58
  ) -> dict:
59
- """
60
- Use DeepSeek API to optimize and organize diagnosis results
61
-
62
- Parameters:
63
- raw_answer: Original diagnosis result
64
- raw_thinking: AI thinking process
65
- language: Language option
66
-
67
- Returns:
68
- Dictionary containing "description", "analysis_process" and "diagnosis_result"
69
- """
70
-
71
  if not self.is_loaded or self.client is None:
72
- error_msg = "API not initialized, cannot generate analysis" if language == "en" else "API未初始化,无法生成分析过程"
 
 
 
 
73
  print("DeepSeek API not initialized, returning original result")
74
  return {
75
  "success": False,
@@ -77,74 +53,67 @@ class DeepSeekService:
77
  "analysis_process": raw_thinking or error_msg,
78
  "diagnosis_result": raw_answer,
79
  "original_diagnosis": raw_answer,
80
- "error": "DeepSeek API not initialized"
81
  }
82
-
83
  try:
84
- # Build prompt
85
  prompt = self._build_refine_prompt(raw_answer, raw_thinking, language)
86
-
87
- # Select system prompt based on language
88
- if language == "en":
89
- system_content = "You are a professional medical text editor. Your task is to polish and organize medical diagnostic text to make it flow smoothly while preserving the original meaning. Output ONLY the formatted result. Do NOT add any explanations, comments, or thoughts. Just follow the format exactly."
90
- else:
91
- system_content = "你是医学文本整理专家,按照用户要求将用户输入的文本整理成用户想要的格式,不要改写或总结。"
92
-
93
- # Call DeepSeek API
 
94
  response = await self.client.chat.completions.create(
95
  model=self.model,
96
  messages=[
97
  {"role": "system", "content": system_content},
98
- {"role": "user", "content": prompt}
99
  ],
100
  temperature=0.1,
101
  max_tokens=2048,
102
  top_p=0.8,
103
  )
104
-
105
- # Extract generated text
106
  generated_text = response.choices[0].message.content
107
-
108
- # Parse output
109
  parsed = self._parse_refined_output(generated_text, raw_answer, raw_thinking, language)
110
-
111
  return {
112
  "success": True,
113
  "description": parsed["description"],
114
  "analysis_process": parsed["analysis_process"],
115
  "diagnosis_result": parsed["diagnosis_result"],
116
  "original_diagnosis": raw_answer,
117
- "raw_refined": generated_text
118
  }
119
-
120
- except Exception as e:
121
- print(f"DeepSeek API call failed: {e}")
122
- error_msg = "API call failed, cannot generate analysis" if language == "en" else "API调用失败,无法生成分析过程"
 
 
 
123
  return {
124
  "success": False,
125
  "description": "",
126
  "analysis_process": raw_thinking or error_msg,
127
  "diagnosis_result": raw_answer,
128
  "original_diagnosis": raw_answer,
129
- "error": str(e)
130
  }
131
-
132
- def _build_refine_prompt(self, raw_answer: str, raw_thinking: Optional[str] = None, language: str = "zh") -> str:
133
- """
134
- Build optimization prompt
135
-
136
- Parameters:
137
- raw_answer: Original diagnosis result
138
- raw_thinking: AI thinking process
139
- language: Language option, "zh" for Chinese, "en" for English
140
-
141
- Returns:
142
- Built prompt
143
- """
144
  if language == "en":
145
- # English prompt - organize and polish while preserving meaning
146
- thinking_text = raw_thinking if raw_thinking else "No analysis process available."
147
- prompt = f"""You are a text organization expert. There are two texts that need to be organized. Text 1 is the thinking process of the SkinGPT model, and Text 2 is the diagnosis result given by SkinGPT.
148
 
149
  【Requirements】
150
  - Preserve the original tone and expression style
@@ -171,21 +140,9 @@ class DeepSeekService:
171
 
172
  ## Diagnosis Result
173
  (The organized diagnosis result from Text 2)
174
-
175
- 【Example】:
176
- ## Description
177
- The image shows red inflamed patches on the skin with pustules and darker colored spots. The lesions appear as papules and pustules distributed across the affected area, with some showing signs of inflammation and possible post-inflammatory hyperpigmentation.
178
-
179
- ## Analysis Process
180
- These findings are consistent with acne vulgaris, commonly seen during adolescence. The user's age aligns with typical onset for this condition. Treatment recommendations: over-the-counter medications such as benzoyl peroxide or topical antibiotics, avoiding picking at the skin, and consulting a dermatologist if severe. The goal is to control inflammation and prevent scarring.
181
-
182
- ## Diagnosis Result
183
- Possible diagnosis: Acne (pimples) Explanation: Acne is a common skin condition, especially during adolescence, when hormonal changes cause overactive sebaceous glands, which can easily clog pores and form acne. Pathological care recommendations: 1. Keep face clean, wash face 2-3 times daily, use gentle cleansing products. 2. Avoid squeezing acne with hands to prevent worsening inflammation or leaving scars. 3. Avoid using irritating cosmetics and skincare products. 4. Can use topical medications containing salicylic acid, benzoyl peroxide, etc. 5. If necessary, can use oral antibiotics or other treatment methods under doctor's guidance. Precautions: 1. Avoid rubbing or damaging the affected area to prevent infection. 2. Eat less oily and spicy foods, eat more vegetables and fruits. 3. Maintain good rest habits, avoid staying up late. 4. If acne symptoms persist without improvement or show signs of worsening, seek medical attention promptly.
184
  """
185
- else:
186
- # Chinese prompt - translate to Simplified Chinese AND organize/polish
187
- thinking_text = raw_thinking if raw_thinking else "No analysis process available."
188
- prompt = f"""你是一个文本整理专家。有两段文本需要整理,文本1是SkinGPT模型的思考过程的文本,文本2是SkinGPT给出的诊断结果的文本。
189
 
190
  【要求】
191
  - 保留原文的语气和表达方式
@@ -198,7 +155,6 @@ Possible diagnosis: Acne (pimples) Explanation: Acne is a common skin condition,
198
  - 禁止推断或添加新的医学信息,禁止输出任何元评论
199
  - 可以调整不合理的语句或去除冗余内容以提高清晰度
200
 
201
-
202
  【文本1】
203
  {thinking_text}
204
 
@@ -214,171 +170,102 @@ Possible diagnosis: Acne (pimples) Explanation: Acne is a common skin condition,
214
 
215
  ## 诊断结果
216
  (整理后的诊断结果)
217
-
218
- 【样例】:
219
- ## 图像描述
220
- 图片显示皮肤上有红色发炎的斑块,伴有脓疱和颜色较深的斑点。病变表现为分布在受影响区域的丘疹和脓疱,部分显示出炎症迹象和可能的炎症后色素沉着。
221
-
222
- ## 分析过程
223
- 这些表现符合寻常痤疮的特征,青春期常见。用户的年龄与该病症的典型发病年龄相符。治疗建议:使用非处方药物如过氧化苯甲酰或外用抗生素,避免抠抓皮肤,病情严重时咨询皮肤科医生。目标是控制炎症并防止疤痕形成。
224
-
225
- ## 诊断结果
226
- 可能的诊断:痤疮(青春痘) 解释:痤疮是一种常见的皮肤病,特别是在青少年期间,由于激素水平的变化导致皮脂腺过度活跃,容易堵塞毛孔,形成痤疮。 病理护理建议:1.保持面部清洁,每天洗脸2-3次,使用温和的洁面产品。 2.避免用手挤压痤疮,以免加重炎症或留下疤痕。 3.避免使用刺激性的化妆品和护肤品。 4.可以使用含有水杨酸、苯氧醇等成分的外用药物治疗。 5.如有需要,可以在医生指导下使用抗生素口服药或其他治疗方法。 注意事项:1. 避免摩擦或损伤患处,以免引起感染。 2. 饮食上应少吃油腻、辛辣食物,多吃蔬菜水果。 3. 保持良好的作息习惯,避免熬夜。 4. 如果痤疮症状持续不见好转或有恶化的趋势,应及时就医。
227
  """
228
-
229
- return prompt
230
-
231
  def _parse_refined_output(
232
- self,
233
- generated_text: str,
234
  raw_answer: str,
235
  raw_thinking: Optional[str] = None,
236
- language: str = "zh"
237
  ) -> dict:
238
- """
239
- Parse DeepSeek generated output
240
-
241
- Parameters:
242
- generated_text: DeepSeek generated text
243
- raw_answer: Original diagnosis (as fallback)
244
- raw_thinking: Original thinking process (as fallback)
245
- language: Language option
246
-
247
- Returns:
248
- Dictionary containing description, analysis_process and diagnosis_result
249
- """
250
  description = ""
251
  analysis_process = None
252
  diagnosis_result = None
253
-
254
  if language == "en":
255
- # English patterns
256
  desc_match = re.search(
257
- r'##\s*Description\s*\n([\s\S]*?)(?=##\s*Analysis\s*Process|$)',
258
  generated_text,
259
- re.IGNORECASE
260
  )
261
  analysis_match = re.search(
262
- r'##\s*Analysis\s*Process\s*\n([\s\S]*?)(?=##\s*Diagnosis\s*Result|$)',
263
  generated_text,
264
- re.IGNORECASE
265
  )
266
  result_match = re.search(
267
- r'##\s*Diagnosis\s*Result\s*\n([\s\S]*?)$',
268
  generated_text,
269
- re.IGNORECASE
270
  )
271
-
272
  desc_header = "## Description"
273
  analysis_header = "## Analysis Process"
274
  result_header = "## Diagnosis Result"
275
  else:
276
- # Chinese patterns
277
- desc_match = re.search(
278
- r'##\s*图像描述\s*\n([\s\S]*?)(?=##\s*分析过程|$)',
279
- generated_text
280
- )
281
- analysis_match = re.search(
282
- r'##\s*分析过程\s*\n([\s\S]*?)(?=##\s*诊断结果|$)',
283
- generated_text
284
- )
285
- result_match = re.search(
286
- r'##\s*诊断结果\s*\n([\s\S]*?)$',
287
- generated_text
288
- )
289
-
290
  desc_header = "## 图像描述"
291
  analysis_header = "## 分析过程"
292
  result_header = "## 诊断结果"
293
-
294
- # Extract description
295
  if desc_match:
296
  description = desc_match.group(1).strip()
297
- print(f"Successfully parsed description")
298
  else:
299
- print(f"Description parsing failed")
300
  description = ""
301
-
302
- # Extract analysis process
303
  if analysis_match:
304
  analysis_process = analysis_match.group(1).strip()
305
- print(f"Successfully parsed analysis process")
306
  else:
307
- print(f"Analysis process parsing failed, trying other methods")
308
- # Try to extract from generated text
309
  result_pos = generated_text.find(result_header)
310
  if result_pos > 0:
311
- # Get content before diagnosis result
312
  analysis_process = generated_text[:result_pos].strip()
313
- # Remove possible headers
314
  for header in [desc_header, analysis_header]:
315
- header_escaped = re.escape(header)
316
- analysis_process = re.sub(f'{header_escaped}\\s*\\n?', '', analysis_process).strip()
317
  else:
318
- # If no format at all, try to get first half
319
- mid_point = len(generated_text) // 2
320
- analysis_process = generated_text[:mid_point].strip()
321
-
322
- # If still empty, use original content (final fallback)
323
  if not analysis_process and raw_thinking:
324
- print(f"Using original raw_thinking as fallback")
325
  analysis_process = raw_thinking
326
-
327
- # Extract diagnosis result
328
  if result_match:
329
  diagnosis_result = result_match.group(1).strip()
330
- print(f"Successfully parsed diagnosis result")
331
  else:
332
- print(f"Diagnosis result parsing failed, trying other methods")
333
- # Try to extract from generated text
334
  result_pos = generated_text.find(result_header)
335
  if result_pos > 0:
336
  diagnosis_result = generated_text[result_pos:].strip()
337
- # Remove possible header
338
- result_header_escaped = re.escape(result_header)
339
- diagnosis_result = re.sub(f'^{result_header_escaped}\\s*\\n?', '', diagnosis_result).strip()
 
 
340
  else:
341
- # If no format at all, get second half
342
- mid_point = len(generated_text) // 2
343
- diagnosis_result = generated_text[mid_point:].strip()
344
-
345
- # If still empty, use original content (final fallback)
346
  if not diagnosis_result:
347
- print(f"Using original raw_answer as fallback")
348
  diagnosis_result = raw_answer
349
-
350
  return {
351
  "description": description,
352
  "analysis_process": analysis_process,
353
- "diagnosis_result": diagnosis_result
354
  }
355
 
356
 
357
- # Global DeepSeek service instance (lazy loading)
358
  _deepseek_service: Optional[DeepSeekService] = None
359
 
360
 
361
  async def get_deepseek_service(api_key: Optional[str] = None) -> Optional[DeepSeekService]:
362
- """
363
- Get DeepSeek service instance (singleton pattern)
364
-
365
- Parameters:
366
- api_key: Optional API key to use
367
-
368
- Returns:
369
- DeepSeekService instance, or None if API initialization fails
370
- """
371
  global _deepseek_service
372
-
373
  if _deepseek_service is None:
374
  try:
375
  _deepseek_service = DeepSeekService(api_key=api_key)
376
  await _deepseek_service.load()
377
  if not _deepseek_service.is_loaded:
378
  print("DeepSeek API service initialization failed, will use fallback mode")
379
- return _deepseek_service # Return instance but marked as not loaded
380
- except Exception as e:
381
- print(f"DeepSeek service initialization failed: {e}")
382
  return None
383
-
384
  return _deepseek_service
 
1
+ from __future__ import annotations
 
 
 
2
 
3
  import os
4
  import re
5
  from typing import Optional
6
+
7
  from openai import AsyncOpenAI
8
 
9
 
10
  class DeepSeekService:
11
+ """OpenAI-compatible DeepSeek refinement service."""
12
+
13
  def __init__(self, api_key: Optional[str] = None):
 
 
 
 
 
 
14
  self.api_key = api_key or os.environ.get("DEEPSEEK_API_KEY")
15
  self.base_url = "https://api.deepseek.com"
16
+ self.model = "deepseek-chat"
 
17
  self.client = None
18
  self.is_loaded = False
19
+
20
+ print("DeepSeek API service initializing...")
21
  print(f"API Base URL: {self.base_url}")
22
+
23
  async def load(self):
 
24
  try:
25
  if not self.api_key:
26
  print("DeepSeek API key not provided")
27
  self.is_loaded = False
28
  return
29
+
30
+ self.client = AsyncOpenAI(api_key=self.api_key, base_url=self.base_url)
 
 
 
 
 
31
  self.is_loaded = True
32
  print("DeepSeek API service is ready!")
33
+ except Exception as exc:
34
+ print(f"DeepSeek API service initialization failed: {exc}")
 
35
  self.is_loaded = False
36
+
37
  async def refine_diagnosis(
38
+ self,
39
  raw_answer: str,
40
  raw_thinking: Optional[str] = None,
41
+ language: str = "zh",
42
  ) -> dict:
 
 
 
 
 
 
 
 
 
 
 
 
43
  if not self.is_loaded or self.client is None:
44
+ error_msg = (
45
+ "API not initialized, cannot generate analysis"
46
+ if language == "en"
47
+ else "API未初始化,无法生成分析过程"
48
+ )
49
  print("DeepSeek API not initialized, returning original result")
50
  return {
51
  "success": False,
 
53
  "analysis_process": raw_thinking or error_msg,
54
  "diagnosis_result": raw_answer,
55
  "original_diagnosis": raw_answer,
56
+ "error": "DeepSeek API not initialized",
57
  }
58
+
59
  try:
 
60
  prompt = self._build_refine_prompt(raw_answer, raw_thinking, language)
61
+ system_content = (
62
+ "You are a professional medical text editor. Your task is to polish and organize "
63
+ "medical diagnostic text to make it flow smoothly while preserving the original "
64
+ "meaning. Output ONLY the formatted result. Do NOT add any explanations, comments, "
65
+ "or thoughts. Just follow the format exactly."
66
+ if language == "en"
67
+ else "你是医学文本整理专家,按照用户要求将用户输入的文本整理成用户想要的格式,不要改写或总结。"
68
+ )
69
+
70
  response = await self.client.chat.completions.create(
71
  model=self.model,
72
  messages=[
73
  {"role": "system", "content": system_content},
74
+ {"role": "user", "content": prompt},
75
  ],
76
  temperature=0.1,
77
  max_tokens=2048,
78
  top_p=0.8,
79
  )
80
+
 
81
  generated_text = response.choices[0].message.content
 
 
82
  parsed = self._parse_refined_output(generated_text, raw_answer, raw_thinking, language)
83
+
84
  return {
85
  "success": True,
86
  "description": parsed["description"],
87
  "analysis_process": parsed["analysis_process"],
88
  "diagnosis_result": parsed["diagnosis_result"],
89
  "original_diagnosis": raw_answer,
90
+ "raw_refined": generated_text,
91
  }
92
+ except Exception as exc:
93
+ print(f"DeepSeek API call failed: {exc}")
94
+ error_msg = (
95
+ "API call failed, cannot generate analysis"
96
+ if language == "en"
97
+ else "API调用失败,无法生成分析过程"
98
+ )
99
  return {
100
  "success": False,
101
  "description": "",
102
  "analysis_process": raw_thinking or error_msg,
103
  "diagnosis_result": raw_answer,
104
  "original_diagnosis": raw_answer,
105
+ "error": str(exc),
106
  }
107
+
108
+ def _build_refine_prompt(
109
+ self,
110
+ raw_answer: str,
111
+ raw_thinking: Optional[str] = None,
112
+ language: str = "zh",
113
+ ) -> str:
114
+ thinking_text = raw_thinking if raw_thinking else "No analysis process available."
 
 
 
 
 
115
  if language == "en":
116
+ return f"""You are a text organization expert. There are two texts that need to be organized. Text 1 is the thinking process of the SkinGPT model, and Text 2 is the diagnosis result given by SkinGPT.
 
 
117
 
118
  【Requirements】
119
  - Preserve the original tone and expression style
 
140
 
141
  ## Diagnosis Result
142
  (The organized diagnosis result from Text 2)
 
 
 
 
 
 
 
 
 
 
143
  """
144
+
145
+ return f"""你是一个文本整理专家。有两段文本需要整理,文本1是SkinGPT模型的思考过程的文本,文本2是SkinGPT给出的诊断结果的文本。
 
 
146
 
147
  【要求】
148
  - 保留原文的语气和表达方式
 
155
  - 禁止推断或添加新的医学信息,禁止输出任何元评论
156
  - 可以调整不合理的语句或去除冗余内容以提高清晰度
157
 
 
158
  【文本1】
159
  {thinking_text}
160
 
 
170
 
171
  ## 诊断结果
172
  (整理后的诊断结果)
 
 
 
 
 
 
 
 
 
 
173
  """
174
+
 
 
175
  def _parse_refined_output(
176
+ self,
177
+ generated_text: str,
178
  raw_answer: str,
179
  raw_thinking: Optional[str] = None,
180
+ language: str = "zh",
181
  ) -> dict:
 
 
 
 
 
 
 
 
 
 
 
 
182
  description = ""
183
  analysis_process = None
184
  diagnosis_result = None
185
+
186
  if language == "en":
 
187
  desc_match = re.search(
188
+ r"##\s*Description\s*\n([\s\S]*?)(?=##\s*Analysis\s*Process|$)",
189
  generated_text,
190
+ re.IGNORECASE,
191
  )
192
  analysis_match = re.search(
193
+ r"##\s*Analysis\s*Process\s*\n([\s\S]*?)(?=##\s*Diagnosis\s*Result|$)",
194
  generated_text,
195
+ re.IGNORECASE,
196
  )
197
  result_match = re.search(
198
+ r"##\s*Diagnosis\s*Result\s*\n([\s\S]*?)$",
199
  generated_text,
200
+ re.IGNORECASE,
201
  )
 
202
  desc_header = "## Description"
203
  analysis_header = "## Analysis Process"
204
  result_header = "## Diagnosis Result"
205
  else:
206
+ desc_match = re.search(r"##\s*图像描述\s*\n([\s\S]*?)(?=##\s*分析过程|$)", generated_text)
207
+ analysis_match = re.search(r"##\s*分析过程\s*\n([\s\S]*?)(?=##\s*诊断结果|$)", generated_text)
208
+ result_match = re.search(r"##\s*诊断结果\s*\n([\s\S]*?)$", generated_text)
 
 
 
 
 
 
 
 
 
 
 
209
  desc_header = "## 图像描述"
210
  analysis_header = "## 分析过程"
211
  result_header = "## 诊断结果"
212
+
 
213
  if desc_match:
214
  description = desc_match.group(1).strip()
 
215
  else:
 
216
  description = ""
217
+
 
218
  if analysis_match:
219
  analysis_process = analysis_match.group(1).strip()
 
220
  else:
 
 
221
  result_pos = generated_text.find(result_header)
222
  if result_pos > 0:
 
223
  analysis_process = generated_text[:result_pos].strip()
 
224
  for header in [desc_header, analysis_header]:
225
+ analysis_process = re.sub(f"{re.escape(header)}\\s*\\n?", "", analysis_process).strip()
 
226
  else:
227
+ analysis_process = generated_text[: len(generated_text) // 2].strip()
 
 
 
 
228
  if not analysis_process and raw_thinking:
 
229
  analysis_process = raw_thinking
230
+
 
231
  if result_match:
232
  diagnosis_result = result_match.group(1).strip()
 
233
  else:
 
 
234
  result_pos = generated_text.find(result_header)
235
  if result_pos > 0:
236
  diagnosis_result = generated_text[result_pos:].strip()
237
+ diagnosis_result = re.sub(
238
+ f"^{re.escape(result_header)}\\s*\\n?",
239
+ "",
240
+ diagnosis_result,
241
+ ).strip()
242
  else:
243
+ diagnosis_result = generated_text[len(generated_text) // 2 :].strip()
 
 
 
 
244
  if not diagnosis_result:
 
245
  diagnosis_result = raw_answer
246
+
247
  return {
248
  "description": description,
249
  "analysis_process": analysis_process,
250
+ "diagnosis_result": diagnosis_result,
251
  }
252
 
253
 
 
254
  _deepseek_service: Optional[DeepSeekService] = None
255
 
256
 
257
  async def get_deepseek_service(api_key: Optional[str] = None) -> Optional[DeepSeekService]:
 
 
 
 
 
 
 
 
 
258
  global _deepseek_service
259
+
260
  if _deepseek_service is None:
261
  try:
262
  _deepseek_service = DeepSeekService(api_key=api_key)
263
  await _deepseek_service.load()
264
  if not _deepseek_service.is_loaded:
265
  print("DeepSeek API service initialization failed, will use fallback mode")
266
+ return _deepseek_service
267
+ except Exception as exc:
268
+ print(f"DeepSeek service initialization failed: {exc}")
269
  return None
270
+
271
  return _deepseek_service
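The three English section regexes in `_parse_refined_output` above can be exercised in isolation; `split_sections` below is a standalone restatement of those patterns for testing, not a function in the service:

```python
import re

SAMPLE = """## Description
Red inflamed patches with pustules.

## Analysis Process
Consistent with acne vulgaris.

## Diagnosis Result
Possible diagnosis: acne."""


def split_sections(text: str) -> dict:
    """Split refined output into the three sections using the same lazy-match + lookahead patterns."""
    patterns = {
        "description": r"##\s*Description\s*\n([\s\S]*?)(?=##\s*Analysis\s*Process|$)",
        "analysis_process": r"##\s*Analysis\s*Process\s*\n([\s\S]*?)(?=##\s*Diagnosis\s*Result|$)",
        "diagnosis_result": r"##\s*Diagnosis\s*Result\s*\n([\s\S]*?)$",
    }
    return {
        key: (m.group(1).strip() if (m := re.search(pat, text, re.IGNORECASE)) else "")
        for key, pat in patterns.items()
    }


sections = split_sections(SAMPLE)
print(sections["diagnosis_result"])  # Possible diagnosis: acne.
```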
inference/full_precision/demo.py ADDED
@@ -0,0 +1,41 @@
1
+ from __future__ import annotations
2
+
3
+ from pathlib import Path
4
+
5
+ try:
6
+ from .model_utils import (
7
+ DEFAULT_MODEL_PATH,
8
+ SkinGPTModel,
9
+ build_single_turn_messages,
10
+ resolve_model_path,
11
+ )
12
+ except ImportError:
13
+ from model_utils import (
14
+ DEFAULT_MODEL_PATH,
15
+ SkinGPTModel,
16
+ build_single_turn_messages,
17
+ resolve_model_path,
18
+ )
19
+
20
+ IMAGE_PATH = "test_image.jpg"
21
+ PROMPT = "Please analyze this skin image and provide a diagnosis."
22
+
23
+
24
+ def main() -> None:
25
+ if not Path(IMAGE_PATH).exists():
26
+ print(f"Warning: Image not found at '{IMAGE_PATH}'. Please edit IMAGE_PATH in demo.py")
27
+ return
28
+
29
+ model = SkinGPTModel(resolve_model_path(DEFAULT_MODEL_PATH))
30
+ messages = build_single_turn_messages(IMAGE_PATH, PROMPT)
31
+
32
+ print("Processing...")
33
+ output_text = model.generate_response(messages)
34
+
35
+ print("\n=== Diagnosis Result ===")
36
+ print(output_text)
37
+ print("========================")
38
+
39
+
40
+ if __name__ == "__main__":
41
+ main()
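The `resolve_model_path` helper that `demo.py` imports tries the path as given, then CWD-relative, then repo-root-relative. Its candidate ordering can be restated as a pure function; `repo_root` and `cwd` are explicit parameters here only so the sketch is deterministic (the real helper derives them from `__file__` and the current directory):

```python
from pathlib import Path


def candidate_paths(model_path: str, repo_root: Path, cwd: Path) -> list:
    """Candidate order mirroring resolve_model_path: as given, CWD-relative, repo-root-relative."""
    raw = Path(model_path).expanduser()
    candidates = [raw]
    if not raw.is_absolute():
        candidates += [cwd / raw, repo_root / raw]
        # If the path already starts with the repo directory name, also try stripping it.
        if raw.parts and raw.parts[0] == repo_root.name:
            candidates.append(repo_root.joinpath(*raw.parts[1:]))
    return candidates


print(candidate_paths("./checkpoints/full_precision", Path("/srv/SkinGPT-R1"), Path("/home/u")))
```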
inference/full_precision/infer.py ADDED
@@ -0,0 +1,54 @@
1
+ from __future__ import annotations
2
+
3
+ import argparse
4
+ from pathlib import Path
5
+
6
+ try:
7
+ from .model_utils import (
8
+ DEFAULT_MODEL_PATH,
9
+ SkinGPTModel,
10
+ build_single_turn_messages,
11
+ resolve_model_path,
12
+ )
13
+ except ImportError:
14
+ from model_utils import (
15
+ DEFAULT_MODEL_PATH,
16
+ SkinGPTModel,
17
+ build_single_turn_messages,
18
+ resolve_model_path,
19
+ )
20
+
21
+
22
+ def build_parser() -> argparse.ArgumentParser:
23
+ parser = argparse.ArgumentParser(description="SkinGPT-R1 full-precision single inference")
24
+ parser.add_argument("--image", type=str, required=True, help="Path to the image")
25
+ parser.add_argument("--model_path", type=str, default=DEFAULT_MODEL_PATH)
26
+ parser.add_argument(
27
+ "--prompt",
28
+ type=str,
29
+ default="Please analyze this skin image and provide a diagnosis.",
30
+ )
31
+ return parser
32
+
33
+
34
+ def main() -> None:
35
+ args = build_parser().parse_args()
36
+
37
+ if not Path(args.image).exists():
38
+ print(f"Error: Image not found at {args.image}")
39
+ return
40
+
41
+ model = SkinGPTModel(resolve_model_path(args.model_path))
42
+ messages = build_single_turn_messages(args.image, args.prompt)
43
+
44
+ print(f"\nAnalyzing {args.image}...")
45
+ response = model.generate_response(messages)
46
+
47
+ print("-" * 40)
48
+ print("Result:")
49
+ print(response)
50
+ print("-" * 40)
51
+
52
+
53
+ if __name__ == "__main__":
54
+ main()
inference/{model_utils.py → full_precision/model_utils.py} RENAMED
@@ -1,51 +1,96 @@
1
- # model_utils.py
 
 
 
 
 
2
  import torch
3
- from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor, TextIteratorStreamer
4
  from qwen_vl_utils import process_vision_info
5
- from PIL import Image
6
- import os
7
- from threading import Thread
8
 
9
  class SkinGPTModel:
10
- def __init__(self, model_path, device=None):
11
- self.model_path = model_path
 
12
  self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
13
- print(f"Loading model from {model_path} on {self.device}...")
14
-
15
  self.model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
16
- model_path,
17
  torch_dtype=torch.bfloat16 if self.device != "cpu" else torch.float32,
18
  attn_implementation="flash_attention_2" if self.device == "cuda" else None,
19
  device_map="auto" if self.device != "mps" else None,
20
- trust_remote_code=True
21
  )
22
-
23
  if self.device == "mps":
24
  self.model = self.model.to(self.device)
25
 
26
  self.processor = AutoProcessor.from_pretrained(
27
- model_path,
28
- trust_remote_code=True,
29
- min_pixels=256*28*28,
30
- max_pixels=1280*28*28
31
  )
32
  print("Model loaded successfully.")
33
 
34
- def generate_response(self, messages, max_new_tokens=1024, temperature=0.7, repetition_penalty=1.2, no_repeat_ngram_size=3):
35
- """
36
- Process a multi-turn conversation history and generate a reply
37
- messages format:
38
- [
39
- {'role': 'user', 'content': [{'type': 'image', 'image': 'path...'}, {'type': 'text', 'text': '...'}]},
40
- {'role': 'assistant', 'content': [{'type': 'text', 'text': '...'}]}
41
- ]
42
- """
43
- # Preprocess the text template
44
- text = self.processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
45
-
46
- # Preprocess the vision inputs
47
  image_inputs, video_inputs = process_vision_info(messages)
48
-
49
  inputs = self.processor(
50
  text=[text],
51
  images=image_inputs,
@@ -62,30 +107,35 @@ class SkinGPTModel:
62
  repetition_penalty=repetition_penalty,
63
  no_repeat_ngram_size=no_repeat_ngram_size,
64
  top_p=0.9,
65
- do_sample=True
66
  )
67
 
68
- # Decode the output (strip the input tokens)
69
  generated_ids_trimmed = [
70
- out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
71
  ]
72
  output_text = self.processor.batch_decode(
73
- generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
 
 
74
  )
75
-
76
  return output_text[0]
77
-
78
- def generate_response_stream(self, messages, max_new_tokens=1024, temperature=0.7, repetition_penalty=1.2, no_repeat_ngram_size=3):
79
- """
80
- Generate a response as a stream
81
- Returns a generator that yields generated text chunks one at a time
82
- """
83
- # Preprocess the text template
84
- text = self.processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
85
-
86
- # Preprocess the vision inputs
 
 
 
 
87
  image_inputs, video_inputs = process_vision_info(messages)
88
-
89
  inputs = self.processor(
90
  text=[text],
91
  images=image_inputs,
@@ -93,15 +143,13 @@ class SkinGPTModel:
93
  padding=True,
94
  return_tensors="pt",
95
  ).to(self.model.device)
96
-
97
- # Create a TextIteratorStreamer for streaming output
98
  streamer = TextIteratorStreamer(
99
  self.processor.tokenizer,
100
  skip_prompt=True,
101
- skip_special_tokens=True
102
  )
103
-
104
- # Prepare the generation kwargs
105
  generation_kwargs = {
106
  **inputs,
107
  "max_new_tokens": max_new_tokens,
@@ -112,13 +160,11 @@ class SkinGPTModel:
112
  "do_sample": True,
113
  "streamer": streamer,
114
  }
115
-
116
- # Run generation in a separate thread
117
  thread = Thread(target=self.model.generate, kwargs=generation_kwargs)
118
  thread.start()
119
-
120
- # Yield the generated text chunk by chunk
121
  for text_chunk in streamer:
122
  yield text_chunk
123
-
124
- thread.join()
 
1
+ from __future__ import annotations
2
+
3
+ from pathlib import Path
4
+ from threading import Thread
5
+ from typing import List
6
+
7
  import torch
 
8
  from qwen_vl_utils import process_vision_info
9
+ from transformers import (
10
+ AutoProcessor,
11
+ Qwen2_5_VLForConditionalGeneration,
12
+ TextIteratorStreamer,
13
+ )
14
+
15
+ DEFAULT_MODEL_PATH = "./checkpoints/full_precision"
16
+ DEFAULT_SYSTEM_PROMPT = "You are a professional AI dermatology assistant."
17
+
18
+
19
+ def resolve_model_path(model_path: str = DEFAULT_MODEL_PATH) -> str:
20
+ """Resolve a model path for both cloned-repo and local-dev layouts."""
21
+ raw_path = Path(model_path).expanduser()
22
+ repo_root = Path(__file__).resolve().parents[2]
23
+ candidates = [raw_path]
24
+
25
+ if not raw_path.is_absolute():
26
+ candidates.append(Path.cwd() / raw_path)
27
+ candidates.append(repo_root / raw_path)
28
+ if raw_path.parts and raw_path.parts[0] == repo_root.name:
29
+ candidates.append(repo_root.joinpath(*raw_path.parts[1:]))
30
+
31
+ for candidate in candidates:
32
+ if candidate.exists():
33
+ return str(candidate)
34
+ return str(raw_path)
35
+
36
+
37
+ def build_single_turn_messages(
38
+ image_path: str,
39
+ prompt: str,
40
+ system_prompt: str = DEFAULT_SYSTEM_PROMPT,
41
+ ) -> List[dict]:
42
+ return [
43
+ {
44
+ "role": "user",
45
+ "content": [
46
+ {"type": "image", "image": image_path},
47
+ {"type": "text", "text": f"{system_prompt}\n\n{prompt}"},
48
+ ],
49
+ }
50
+ ]
51
+
52
 
53
  class SkinGPTModel:
54
+ def __init__(self, model_path: str = DEFAULT_MODEL_PATH, device: str | None = None):
55
+ resolved_model_path = resolve_model_path(model_path)
56
+ self.model_path = resolved_model_path
57
  self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
58
+ print(f"Loading model from {resolved_model_path} on {self.device}...")
59
+
60
  self.model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
61
+ resolved_model_path,
62
  torch_dtype=torch.bfloat16 if self.device != "cpu" else torch.float32,
63
  attn_implementation="flash_attention_2" if self.device == "cuda" else None,
64
  device_map="auto" if self.device != "mps" else None,
65
+ trust_remote_code=True,
66
  )
67
+
68
  if self.device == "mps":
69
  self.model = self.model.to(self.device)
70
 
71
  self.processor = AutoProcessor.from_pretrained(
72
+ resolved_model_path,
73
+ trust_remote_code=True,
74
+ min_pixels=256 * 28 * 28,
75
+ max_pixels=1280 * 28 * 28,
76
  )
77
  print("Model loaded successfully.")
78
 
79
+ def generate_response(
80
+ self,
81
+ messages,
82
+ max_new_tokens: int = 1024,
83
+ temperature: float = 0.7,
84
+ repetition_penalty: float = 1.2,
85
+ no_repeat_ngram_size: int = 3,
86
+ ) -> str:
87
+ text = self.processor.apply_chat_template(
88
+ messages,
89
+ tokenize=False,
90
+ add_generation_prompt=True,
91
+ )
92
  image_inputs, video_inputs = process_vision_info(messages)
93
+
94
  inputs = self.processor(
95
  text=[text],
96
  images=image_inputs,
 
107
  repetition_penalty=repetition_penalty,
108
  no_repeat_ngram_size=no_repeat_ngram_size,
109
  top_p=0.9,
110
+ do_sample=True,
111
  )
112
 
 
113
  generated_ids_trimmed = [
114
+ out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
115
  ]
116
  output_text = self.processor.batch_decode(
117
+ generated_ids_trimmed,
118
+ skip_special_tokens=True,
119
+ clean_up_tokenization_spaces=False,
120
  )
121
+
122
  return output_text[0]
123
+
124
+ def generate_response_stream(
125
+ self,
126
+ messages,
127
+ max_new_tokens: int = 1024,
128
+ temperature: float = 0.7,
129
+ repetition_penalty: float = 1.2,
130
+ no_repeat_ngram_size: int = 3,
131
+ ):
132
+ text = self.processor.apply_chat_template(
133
+ messages,
134
+ tokenize=False,
135
+ add_generation_prompt=True,
136
+ )
137
  image_inputs, video_inputs = process_vision_info(messages)
138
+
139
  inputs = self.processor(
140
  text=[text],
141
  images=image_inputs,
 
143
  padding=True,
144
  return_tensors="pt",
145
  ).to(self.model.device)
146
+
 
147
  streamer = TextIteratorStreamer(
148
  self.processor.tokenizer,
149
  skip_prompt=True,
150
+ skip_special_tokens=True,
151
  )
152
+
 
153
  generation_kwargs = {
154
  **inputs,
155
  "max_new_tokens": max_new_tokens,
 
160
  "do_sample": True,
161
  "streamer": streamer,
162
  }
163
+
 
164
  thread = Thread(target=self.model.generate, kwargs=generation_kwargs)
165
  thread.start()
166
+
 
167
  for text_chunk in streamer:
168
  yield text_chunk
169
+
170
+ thread.join()
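The streaming path above hands `model.generate` to a background thread and consumes a `TextIteratorStreamer` on the caller's side. The same producer/consumer shape can be sketched with only the standard library; the `fake_generate` producer below is a stand-in for the real model, not part of the repo:

```python
from queue import Queue
from threading import Thread

SENTINEL = object()

def stream_tokens(producer, *args):
    """Run `producer` in a background thread and yield its items as they
    arrive, mirroring how generate() feeds a TextIteratorStreamer."""
    q = Queue()

    def run():
        try:
            for item in producer(*args):
                q.put(item)
        finally:
            q.put(SENTINEL)  # signal end of stream

    t = Thread(target=run)
    t.start()
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        yield item
    t.join()

def fake_generate(prompt):
    # Stand-in producer: emits one "token" per word.
    for word in prompt.split():
        yield word + " "

chunks = list(stream_tokens(fake_generate, "hello streaming world"))
print("".join(chunks).strip())  # hello streaming world
```

The sentinel plus `thread.join()` ensures the consumer exits cleanly even if the producer raises, which is the same reason the real `generate_response_stream` joins its thread after draining the streamer.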
inference/full_precision/run_api.sh ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+ set -euo pipefail
3
+
4
+ PYTHON_EXE="${PYTHON_EXE:-python}"
5
+ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
6
+ "${PYTHON_EXE}" "${SCRIPT_DIR}/app.py"
inference/full_precision/run_chat.sh ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+ set -euo pipefail
3
+
4
+ PYTHON_EXE="${PYTHON_EXE:-python}"
5
+ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
6
+ "${PYTHON_EXE}" "${SCRIPT_DIR}/chat.py" "$@"
inference/full_precision/run_infer.sh ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+ set -euo pipefail
3
+
4
+ PYTHON_EXE="${PYTHON_EXE:-python}"
5
+ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
6
+ "${PYTHON_EXE}" "${SCRIPT_DIR}/infer.py" "$@"
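The three wrappers share two idioms: `SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"` resolves the script's own directory so the wrapper works from any CWD, and `"${PYTHON_EXE:-python}"` uses the env var if set, falling back to `python` otherwise. A quick self-contained illustration of the fallback (the value `python3.11` is only an example):

```shell
# Demonstrates the "${VAR:-default}" fallback used by the run_*.sh wrappers.
unset PYTHON_EXE
resolved="${PYTHON_EXE:-python}"
echo "unset -> $resolved"    # unset -> python

PYTHON_EXE=python3.11
resolved="${PYTHON_EXE:-python}"
echo "set   -> $resolved"    # set   -> python3.11
```

Note that `:-` also applies the default when the variable is set but empty, which is usually what you want for an interpreter override.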
inference/inference.py DELETED
@@ -1,43 +0,0 @@
1
-
2
- import argparse
3
- from model_utils import SkinGPTModel
4
- import os
5
-
6
- def main():
7
- parser = argparse.ArgumentParser(description="SkinGPT-R1 Single Inference")
8
- parser.add_argument("--image", type=str, required=True, help="Path to the image")
9
- parser.add_argument("--model_path", type=str, default="../checkpoint")
10
- parser.add_argument("--prompt", type=str, default="Please analyze this skin image and provide a diagnosis.")
11
- args = parser.parse_args()
12
-
13
- if not os.path.exists(args.image):
14
- print(f"Error: Image not found at {args.image}")
15
- return
16
-
17
- # 1. Load the model (reusing model_utils)
18
- # so the transformers loading code is not duplicated here
19
- bot = SkinGPTModel(args.model_path)
20
-
21
- # 2. Build the single-turn message
22
- system_prompt = "You are a professional AI dermatology assistant."
23
- messages = [
24
- {
25
- "role": "user",
26
- "content": [
27
- {"type": "image", "image": args.image},
28
- {"type": "text", "text": f"{system_prompt}\n\n{args.prompt}"}
29
- ]
30
- }
31
- ]
32
-
33
- # 3. Run inference
34
- print(f"\nAnalyzing {args.image}...")
35
- response = bot.generate_response(messages)
36
-
37
- print("-" * 40)
38
- print("Result:")
39
- print(response)
40
- print("-" * 40)
41
-
42
- if __name__ == "__main__":
43
- main()
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
inference/int4_quantized/__init__.py ADDED
@@ -0,0 +1 @@
 
 
1
+ """INT4 quantized inference package for SkinGPT-R1."""
inference/int4_quantized/__pycache__/app.cpython-311.pyc ADDED
Binary file (18 kB). View file
 
inference/int4_quantized/__pycache__/chat.cpython-311.pyc ADDED
Binary file (3.49 kB). View file
 
inference/int4_quantized/__pycache__/infer.cpython-311.pyc ADDED
Binary file (4.48 kB). View file
 
inference/int4_quantized/__pycache__/model_utils.cpython-311.pyc ADDED
Binary file (28.9 kB). View file
 
inference/{.ipynb_checkpoints/app-checkpoint.py → int4_quantized/app.py} RENAMED
@@ -1,133 +1,97 @@
1
- # app.py
2
- import uvicorn
 
 
3
  import os
4
  import shutil
 
5
  import uuid
6
- import json
7
- import re
8
- import asyncio
9
- from typing import Optional
10
- from io import BytesIO
11
  from contextlib import asynccontextmanager
12
- from PIL import Image
13
- from fastapi import FastAPI, UploadFile, File, Form, HTTPException, Request
 
 
 
 
 
 
 
14
  from fastapi.middleware.cors import CORSMiddleware
15
  from fastapi.responses import StreamingResponse
16
- from fastapi.concurrency import run_in_threadpool
17
- from model_utils import SkinGPTModel
18
- from deepseek_service import get_deepseek_service, DeepSeekService
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
19
 
20
- # === Configuration ===
21
- MODEL_PATH = "../checkpoint"
22
- TEMP_DIR = "./temp_uploads"
23
- os.makedirs(TEMP_DIR, exist_ok=True)
 
24
 
25
- # DeepSeek API Key
26
- DEEPSEEK_API_KEY = os.environ.get("DEEPSEEK_API_KEY", "sk-b221f29be052460f9e0fe12d88dd343c")
 
27
 
28
- # Global DeepSeek service instance
29
  deepseek_service: Optional[DeepSeekService] = None
30
 
31
- @asynccontextmanager
32
- async def lifespan(app: FastAPI):
33
- """Application lifespan management"""
34
- # Initialize the DeepSeek service at startup
35
- await init_deepseek()
36
- yield
37
- print("\nShutting down service...")
38
 
39
- app = FastAPI(
40
- title="SkinGPT-R1 皮肤诊断系统",
41
- description="智能皮肤诊断助手",
42
- version="1.0.0",
43
- lifespan=lifespan
44
- )
45
 
46
- # CORS configuration - allow frontend access
47
- app.add_middleware(
48
- CORSMiddleware,
49
- allow_origins=["http://localhost:3000", "http://localhost:5173", "http://127.0.0.1:5173", "*"],
50
- allow_credentials=True,
51
- allow_methods=["*"],
52
- allow_headers=["*"],
53
- )
54
 
55
- # Global state
56
- # chat_states: conversation history (list of messages for Qwen)
57
- # pending_images: uploaded image paths not yet sent to the LLM (state ID -> image path)
58
- chat_states = {}
59
- pending_images = {}
60
 
61
- def parse_diagnosis_result(raw_text: str) -> dict:
62
- """
63
- Parse the <think> and <answer> tags in a diagnosis result.
64
-
65
- Args:
66
- - raw_text: raw diagnosis text
67
-
68
- Returns:
69
- - dict: dictionary with thinking, answer, and raw fields
70
- """
71
- import re
72
-
73
- # Try to match complete tags first
74
- think_match = re.search(r'<think>([\s\S]*?)</think>', raw_text)
75
- answer_match = re.search(r'<answer>([\s\S]*?)</answer>', raw_text)
76
-
77
- thinking = None
78
- answer = None
79
-
80
- # Handle the think tag
81
- if think_match:
82
- thinking = think_match.group(1).strip()
83
- else:
84
- # Try to match an unclosed think tag (output was truncated)
85
- unclosed_think = re.search(r'<think>([\s\S]*?)(?=<answer>|$)', raw_text)
86
  if unclosed_think:
87
  thinking = unclosed_think.group(1).strip()
88
-
89
- # Handle the answer tag
90
- if answer_match:
91
- answer = answer_match.group(1).strip()
92
- else:
93
- # Try to match an unclosed answer tag
94
- unclosed_answer = re.search(r'<answer>([\s\S]*?)$', raw_text)
95
  if unclosed_answer:
96
  answer = unclosed_answer.group(1).strip()
97
-
98
- # If there is still no answer, clean the raw text and use it as the answer
99
  if not answer:
100
- # Remove all tags and their content
101
- cleaned = re.sub(r'<think>[\s\S]*?</think>', '', raw_text)
102
- cleaned = re.sub(r'<think>[\s\S]*', '', cleaned) # remove an unclosed think
103
- cleaned = re.sub(r'</?answer>', '', cleaned) # remove answer tags
104
- cleaned = cleaned.strip()
105
- answer = cleaned if cleaned else raw_text
106
-
107
- # Clean up any leftover tags
108
- if answer:
109
- answer = re.sub(r'</?think>|</?answer>', '', answer).strip()
110
- if thinking:
111
- thinking = re.sub(r'</?think>|</?answer>', '', thinking).strip()
112
-
113
- # Handle the "Final Answer:" format and extract what follows
114
  if answer:
115
- final_answer_match = re.search(r'Final Answer:\s*([\s\S]*)', answer, re.IGNORECASE)
 
116
  if final_answer_match:
117
  answer = final_answer_match.group(1).strip()
118
-
119
- return {
120
- "thinking": thinking if thinking else None,
121
- "answer": answer,
122
- "raw": raw_text
123
- }
124
 
125
- print("Initializing Model Service...")
126
- # Load the model globally
127
- gpt_model = SkinGPTModel(MODEL_PATH)
128
- print("Service Ready.")
 
 
 
 
 
 
129
 
130
- # Initialize the DeepSeek service (async)
131
  async def init_deepseek():
132
  global deepseek_service
133
  print("\nInitializing DeepSeek service...")
@@ -137,120 +101,115 @@ async def init_deepseek():
137
  else:
138
  print("DeepSeek service not available, will return raw results")
139
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
140
  @app.post("/v1/upload/{state_id}")
141
  async def upload_file(state_id: str, file: UploadFile = File(...), survey: str = Form(None)):
142
- """
143
- Receive an image upload.
144
- Logic: save the image to a local temp directory and mark this state_id as having a pending image.
145
- """
146
  try:
147
- # 1. Save the image to a local temp file
148
  file_extension = file.filename.split(".")[-1] if "." in file.filename else "jpg"
149
  unique_name = f"{state_id}_{uuid.uuid4().hex}.{file_extension}"
150
- file_path = os.path.join(TEMP_DIR, unique_name)
151
-
152
- with open(file_path, "wb") as buffer:
153
  shutil.copyfileobj(file.file, buffer)
154
-
155
- # 2. Record the image path for use on the next predict call
156
- # For multi-image mode this could become a list; for now a single image is kept and overwritten
157
- pending_images[state_id] = file_path
158
-
159
- # 3. Initialize the chat state (if this is a new session)
160
  if state_id not in chat_states:
161
  chat_states[state_id] = []
162
-
163
- return {"message": "Image uploaded successfully", "path": file_path}
164
-
165
- except Exception as e:
166
- raise HTTPException(status_code=500, detail=f"Upload failed: {str(e)}")
167
 
168
  @app.post("/v1/predict/{state_id}")
169
  async def v1_predict(request: Request, state_id: str):
170
- """
171
- Receive text and run inference.
172
- Logic: check for a pending image; if present, combine it with the text into a multimodal message.
173
- """
174
  try:
175
  data = await request.json()
176
- except:
177
- raise HTTPException(status_code=400, detail="Invalid JSON")
178
-
179
  user_message = data.get("message", "")
180
  if not user_message:
181
  raise HTTPException(status_code=400, detail="Missing 'message' field")
182
 
183
- # Get or initialize the history
184
  history = chat_states.get(state_id, [])
185
-
186
- # Build the user content for the current turn
187
  current_content = []
188
-
189
- # 1. Check for a freshly uploaded image
190
  if state_id in pending_images:
191
- img_path = pending_images.pop(state_id) # take it and remove it
192
  current_content.append({"type": "image", "image": img_path})
193
-
194
- # If this is the first turn, prepend the system prompt
195
  if not history:
196
- system_prompt = "You are a professional AI dermatology assistant. "
197
- user_message = f"{system_prompt}\n\n{user_message}"
198
 
199
- # 2. Add the text
200
  current_content.append({"type": "text", "text": user_message})
201
-
202
- # 3. Update the history
203
  history.append({"role": "user", "content": current_content})
204
  chat_states[state_id] = history
205
 
206
- # 4. Run inference (in a thread pool to avoid blocking)
207
  try:
208
- response_text = await run_in_threadpool(
209
- gpt_model.generate_response,
210
- messages=history
211
- )
212
- except Exception as e:
213
- # Roll back the history (drop the user message that just failed)
214
  chat_states[state_id].pop()
215
- raise HTTPException(status_code=500, detail=f"Inference error: {str(e)}")
216
 
217
- # 5. Append the reply to the history
218
  history.append({"role": "assistant", "content": [{"type": "text", "text": response_text}]})
219
  chat_states[state_id] = history
220
-
221
  return {"message": response_text}
222
 
 
223
  @app.post("/v1/reset/{state_id}")
224
  async def reset_chat(state_id: str):
225
- """Clear the session state"""
226
  if state_id in chat_states:
227
  del chat_states[state_id]
228
  if state_id in pending_images:
229
- # Optional: delete the temp file
230
  try:
231
- os.remove(pending_images[state_id])
232
- except:
233
  pass
234
  del pending_images[state_id]
235
  return {"message": "Chat history reset"}
236
 
 
237
  @app.get("/")
238
  async def root():
239
- """Root endpoint"""
240
  return {
241
- "name": "SkinGPT-R1 皮肤诊断系统",
242
- "version": "1.0.0",
243
  "status": "running",
244
- "description": "智能皮肤诊断助手"
245
  }
246
 
 
247
  @app.get("/health")
248
  async def health_check():
249
- """Health check"""
250
- return {
251
- "status": "healthy",
252
- "model_loaded": True
253
- }
254
 
255
  @app.post("/diagnose/stream")
256
  async def diagnose_stream(
@@ -258,126 +217,89 @@ async def diagnose_stream(
258
  text: str = Form(...),
259
  language: str = Form("zh"),
260
  ):
261
- """
262
- SSE streaming diagnosis endpoint (for the frontend).
263
- Supports image upload and text input and returns a true streaming response.
264
- Uses the DeepSeek API to refine the output format.
265
- """
266
- from queue import Queue, Empty
267
- from threading import Thread
268
-
269
  language = language if language in ("zh", "en") else "zh"
270
-
271
- # Handle the image
272
  pil_image = None
273
- temp_image_path = None
274
-
275
  if image:
276
  contents = await image.read()
277
  pil_image = Image.open(BytesIO(contents)).convert("RGB")
278
-
279
- # Create a queue for inter-thread communication
280
  result_queue = Queue()
281
- # Holds the full response and the parsed result
282
  generation_result = {"full_response": [], "parsed": None, "temp_image_path": None}
283
-
284
  def run_generation():
285
- """Run streaming generation in a background thread"""
286
  full_response = []
287
-
288
  try:
289
- # Build the messages
290
  messages = []
291
  current_content = []
292
-
293
- # Add the system prompt
294
- system_prompt = "You are a professional AI dermatology assistant." if language == "en" else "你是一个专业的AI皮肤科助手。"
295
-
296
- # If an image was provided, save it to a temp file
 
297
  if pil_image:
298
- generation_result["temp_image_path"] = os.path.join(TEMP_DIR, f"temp_{uuid.uuid4().hex}.jpg")
299
- pil_image.save(generation_result["temp_image_path"])
300
- current_content.append({"type": "image", "image": generation_result["temp_image_path"]})
301
-
302
- # Add the text
303
- prompt = f"{system_prompt}\n\n{text}"
304
- current_content.append({"type": "text", "text": prompt})
305
  messages.append({"role": "user", "content": current_content})
306
-
307
- # Streaming generation - put each chunk into the queue immediately
308
  for chunk in gpt_model.generate_response_stream(
309
  messages=messages,
310
- max_new_tokens=2048,
311
- temperature=0.7
 
312
  ):
313
  full_response.append(chunk)
314
  result_queue.put(("delta", chunk))
315
-
316
- # Parse the result
317
  response_text = "".join(full_response)
318
- parsed = parse_diagnosis_result(response_text)
319
  generation_result["full_response"] = full_response
320
- generation_result["parsed"] = parsed
321
-
322
- # Mark generation as complete
323
  result_queue.put(("generation_done", None))
324
-
325
- except Exception as e:
326
- result_queue.put(("error", str(e)))
327
-
328
  async def event_generator():
329
- """Asynchronously generate SSE events"""
330
- # Start generation in a background thread (non-blocking)
331
  gen_thread = Thread(target=run_generation)
332
  gen_thread.start()
333
-
334
  loop = asyncio.get_event_loop()
335
-
336
- # Read from the queue and send streamed content
337
  while True:
338
  try:
339
- # Non-blocking get
340
  msg_type, data = await loop.run_in_executor(
341
- None,
342
- lambda: result_queue.get(timeout=0.1)
343
  )
344
-
345
  if msg_type == "generation_done":
346
- # Streaming finished; prepare the final result
347
  break
348
- elif msg_type == "delta":
349
- yield_chunk = json.dumps({"type": "delta", "text": data}, ensure_ascii=False)
350
- yield f"data: {yield_chunk}\n\n"
351
  elif msg_type == "error":
352
  yield f"data: {json.dumps({'type': 'error', 'message': data}, ensure_ascii=False)}\n\n"
353
  gen_thread.join()
354
  return
355
-
356
  except Empty:
357
- # Queue temporarily empty; keep waiting
358
  await asyncio.sleep(0.01)
359
- continue
360
-
361
  gen_thread.join()
362
-
363
- # Fetch the parsed result
364
  parsed = generation_result["parsed"]
365
  if not parsed:
366
- yield f"data: {json.dumps({'type': 'error', 'message': 'Failed to parse response'}, ensure_ascii=False)}\n\n"
367
  return
368
-
369
  raw_thinking = parsed["thinking"]
370
  raw_answer = parsed["answer"]
371
-
372
- # Refine the result with DeepSeek
373
  refined_by_deepseek = False
374
  description = None
375
  thinking = raw_thinking
376
  answer = raw_answer
377
-
378
  if deepseek_service and deepseek_service.is_loaded:
379
  try:
380
- print(f"Calling DeepSeek to refine diagnosis (language={language})...")
381
  refined = await deepseek_service.refine_diagnosis(
382
  raw_answer=raw_answer,
383
  raw_thinking=raw_thinking,
@@ -388,36 +310,35 @@ async def diagnose_stream(
388
  thinking = refined["analysis_process"]
389
  answer = refined["diagnosis_result"]
390
  refined_by_deepseek = True
391
- print(f"DeepSeek refinement completed successfully")
392
- except Exception as e:
393
- print(f"DeepSeek refinement failed, using original: {e}")
394
  else:
395
  print("DeepSeek service not available, using raw results")
396
-
397
- success_msg = "Diagnosis completed" if language == "en" else "诊断完成"
398
-
399
- # Keep the response format consistent with the reference project
400
  final_payload = {
401
- "description": description, # image description (extracted from thinking)
402
- "thinking": thinking, # analysis process (after DeepSeek refinement)
403
- "answer": answer, # diagnosis result (after DeepSeek refinement)
404
- "raw": parsed["raw"], # raw response
405
- "refined_by_deepseek": refined_by_deepseek, # whether DeepSeek refined the output
406
  "success": True,
407
- "message": success_msg
408
  }
409
- yield_final = json.dumps({"type": "final", "result": final_payload}, ensure_ascii=False)
410
- yield f"data: {yield_final}\n\n"
411
-
412
- # Clean up the temp image
413
  temp_path = generation_result.get("temp_image_path")
414
- if temp_path and os.path.exists(temp_path):
415
  try:
416
- os.remove(temp_path)
417
- except:
418
  pass
419
-
420
  return StreamingResponse(event_generator(), media_type="text/event-stream")
421
 
422
- if __name__ == '__main__':
423
- uvicorn.run("app:app", host="0.0.0.0", port=5900, reload=False)
 
 
 
 
 
 
1
+ from __future__ import annotations
2
+
3
+ import asyncio
4
+ import json
5
  import os
6
  import shutil
7
+ import sys
8
  import uuid
 
 
 
 
 
9
  from contextlib import asynccontextmanager
10
+ from io import BytesIO
11
+ from pathlib import Path
12
+ from queue import Empty, Queue
13
+ from threading import Thread
14
+ from typing import Optional
15
+
16
+ import uvicorn
17
+ from fastapi import FastAPI, File, Form, HTTPException, Request, UploadFile
18
+ from fastapi.concurrency import run_in_threadpool
19
  from fastapi.middleware.cors import CORSMiddleware
20
  from fastapi.responses import StreamingResponse
21
+ from PIL import Image
22
+
23
+ try:
24
+ from .model_utils import (
25
+ DEFAULT_DO_SAMPLE,
26
+ DEFAULT_MAX_NEW_TOKENS,
27
+ DEFAULT_MODEL_PATH,
28
+ DEFAULT_REPETITION_PENALTY,
29
+ QuantizedSkinGPTModel,
30
+ )
31
+ except ImportError:
32
+ from model_utils import (
33
+ DEFAULT_DO_SAMPLE,
34
+ DEFAULT_MAX_NEW_TOKENS,
35
+ DEFAULT_MODEL_PATH,
36
+ DEFAULT_REPETITION_PENALTY,
37
+ QuantizedSkinGPTModel,
38
+ )
39
 
40
+ try:
41
+ from inference.full_precision.deepseek_service import DeepSeekService, get_deepseek_service
42
+ except ImportError:
43
+ sys.path.insert(0, str(Path(__file__).resolve().parents[2]))
44
+ from inference.full_precision.deepseek_service import DeepSeekService, get_deepseek_service
45
 
46
+ TEMP_DIR = Path(__file__).resolve().parents[1] / "temp_uploads"
47
+ TEMP_DIR.mkdir(parents=True, exist_ok=True)
48
+ DEEPSEEK_API_KEY = os.environ.get("DEEPSEEK_API_KEY")
49
 
 
50
  deepseek_service: Optional[DeepSeekService] = None
51
 
 
 
 
 
 
 
 
52
 
53
+ def parse_diagnosis_result(raw_text: str) -> dict:
54
+ import re
 
 
 
 
55
 
56
+ think_match = re.search(r"<think>([\s\S]*?)</think>", raw_text)
57
+ answer_match = re.search(r"<answer>([\s\S]*?)</answer>", raw_text)
 
 
 
 
 
 
58
 
59
+ thinking = think_match.group(1).strip() if think_match else None
60
+ answer = answer_match.group(1).strip() if answer_match else None
 
 
 
61
 
62
+ if not thinking:
63
+ unclosed_think = re.search(r"<think>([\s\S]*?)(?=<answer>|$)", raw_text)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
64
  if unclosed_think:
65
  thinking = unclosed_think.group(1).strip()
66
+
67
+ if not answer:
68
+ unclosed_answer = re.search(r"<answer>([\s\S]*?)$", raw_text)
 
 
 
 
69
  if unclosed_answer:
70
  answer = unclosed_answer.group(1).strip()
71
+
 
72
  if not answer:
73
+ cleaned = re.sub(r"<think>[\s\S]*?</think>", "", raw_text)
74
+ cleaned = re.sub(r"<think>[\s\S]*", "", cleaned)
75
+ cleaned = re.sub(r"</?answer>", "", cleaned)
76
+ answer = cleaned.strip() or raw_text
77
+
 
 
 
 
 
 
 
 
 
78
  if answer:
79
+ answer = re.sub(r"</?think>|</?answer>", "", answer).strip()
80
+ final_answer_match = re.search(r"Final Answer:\s*([\s\S]*)", answer, re.IGNORECASE)
81
  if final_answer_match:
82
  answer = final_answer_match.group(1).strip()
 
 
 
 
 
 
83
 
84
+ if thinking:
85
+ thinking = re.sub(r"</?think>|</?answer>", "", thinking).strip()
86
+
87
+ return {"thinking": thinking or None, "answer": answer, "raw": raw_text}
88
+
89
+
90
+ print("Initializing INT4 Model Service...")
91
+ gpt_model = QuantizedSkinGPTModel(DEFAULT_MODEL_PATH)
92
+ print("INT4 service ready.")
93
+
94
 
 
95
  async def init_deepseek():
96
  global deepseek_service
97
  print("\nInitializing DeepSeek service...")
 
101
  else:
102
  print("DeepSeek service not available, will return raw results")
103
 
104
+
105
+ @asynccontextmanager
106
+ async def lifespan(app: FastAPI):
107
+ await init_deepseek()
108
+ yield
109
+ print("\nShutting down INT4 service...")
110
+
111
+
112
+ app = FastAPI(
113
+ title="SkinGPT-R1 INT4 API",
114
+ description="INT4 quantized dermatology assistant backend",
115
+ version="1.1.0",
116
+ lifespan=lifespan,
117
+ )
118
+
119
+ app.add_middleware(
120
+ CORSMiddleware,
121
+ allow_origins=["http://localhost:3000", "http://localhost:5173", "http://127.0.0.1:5173", "*"],
122
+ allow_credentials=True,
123
+ allow_methods=["*"],
124
+ allow_headers=["*"],
125
+ )
126
+
127
+ chat_states = {}
128
+ pending_images = {}
129
+
130
+
131
  @app.post("/v1/upload/{state_id}")
132
  async def upload_file(state_id: str, file: UploadFile = File(...), survey: str = Form(None)):
133
+ del survey
 
 
 
134
  try:
 
135
  file_extension = file.filename.split(".")[-1] if "." in file.filename else "jpg"
136
  unique_name = f"{state_id}_{uuid.uuid4().hex}.{file_extension}"
137
+ file_path = TEMP_DIR / unique_name
138
+
139
+ with file_path.open("wb") as buffer:
140
  shutil.copyfileobj(file.file, buffer)
141
+
142
+ pending_images[state_id] = str(file_path)
 
 
 
 
143
  if state_id not in chat_states:
144
  chat_states[state_id] = []
145
+
146
+ return {"message": "Image uploaded successfully", "path": str(file_path)}
147
+ except Exception as exc:
148
+ raise HTTPException(status_code=500, detail=f"Upload failed: {exc}") from exc
149
+
150
 
151
  @app.post("/v1/predict/{state_id}")
152
  async def v1_predict(request: Request, state_id: str):
 
 
 
 
153
  try:
154
  data = await request.json()
155
+ except Exception as exc:
156
+ raise HTTPException(status_code=400, detail="Invalid JSON") from exc
157
+
158
  user_message = data.get("message", "")
159
  if not user_message:
160
  raise HTTPException(status_code=400, detail="Missing 'message' field")
161
 
 
162
  history = chat_states.get(state_id, [])
 
 
163
  current_content = []
164
+
 
165
  if state_id in pending_images:
166
+ img_path = pending_images.pop(state_id)
167
  current_content.append({"type": "image", "image": img_path})
 
 
168
  if not history:
169
+ user_message = f"You are a professional AI dermatology assistant.\n\n{user_message}"
 
170
 
 
171
  current_content.append({"type": "text", "text": user_message})
 
 
172
  history.append({"role": "user", "content": current_content})
173
  chat_states[state_id] = history
174
 
 
175
  try:
176
+ response_text = await run_in_threadpool(gpt_model.generate_response, messages=history)
177
+ except Exception as exc:
 
 
 
 
178
  chat_states[state_id].pop()
179
+ raise HTTPException(status_code=500, detail=f"Inference error: {exc}") from exc
180
 
 
181
  history.append({"role": "assistant", "content": [{"type": "text", "text": response_text}]})
182
  chat_states[state_id] = history
 
183
  return {"message": response_text}
184
 
185
+
186
  @app.post("/v1/reset/{state_id}")
187
  async def reset_chat(state_id: str):
 
188
  if state_id in chat_states:
189
  del chat_states[state_id]
190
  if state_id in pending_images:
 
191
  try:
192
+ Path(pending_images[state_id]).unlink(missing_ok=True)
193
+ except Exception:
194
  pass
195
  del pending_images[state_id]
196
  return {"message": "Chat history reset"}
197
 
198
+
199
  @app.get("/")
200
  async def root():
 
201
  return {
202
+ "name": "SkinGPT-R1 INT4 API",
203
+ "version": "1.1.0",
204
  "status": "running",
205
+ "description": "INT4 quantized dermatology assistant",
206
  }
207
 
208
+
209
  @app.get("/health")
210
  async def health_check():
211
+ return {"status": "healthy", "model_loaded": True}
212
+
 
 
 
213
 
214
  @app.post("/diagnose/stream")
215
  async def diagnose_stream(
 
217
  text: str = Form(...),
218
  language: str = Form("zh"),
219
  ):
 
 
 
 
 
 
 
 
220
  language = language if language in ("zh", "en") else "zh"
 
 
221
  pil_image = None
222
+
 
223
  if image:
224
  contents = await image.read()
225
  pil_image = Image.open(BytesIO(contents)).convert("RGB")
226
+
 
227
  result_queue = Queue()
 
228
  generation_result = {"full_response": [], "parsed": None, "temp_image_path": None}
229
+
230
  def run_generation():
 
231
  full_response = []
 
232
  try:
 
233
  messages = []
234
  current_content = []
235
+ system_prompt = (
236
+ "You are a professional AI dermatology assistant."
237
+ if language == "en"
238
+ else "你是一个专业的AI皮肤科助手。"
239
+ )
240
+
241
  if pil_image:
242
+ temp_image_path = TEMP_DIR / f"temp_{uuid.uuid4().hex}.jpg"
243
+ pil_image.save(temp_image_path)
244
+ generation_result["temp_image_path"] = str(temp_image_path)
245
+ current_content.append({"type": "image", "image": str(temp_image_path)})
246
+
247
+ current_content.append({"type": "text", "text": f"{system_prompt}\n\n{text}"})
 
248
  messages.append({"role": "user", "content": current_content})
249
+
 
250
  for chunk in gpt_model.generate_response_stream(
251
  messages=messages,
252
+ max_new_tokens=DEFAULT_MAX_NEW_TOKENS,
253
+ do_sample=DEFAULT_DO_SAMPLE,
254
+ repetition_penalty=DEFAULT_REPETITION_PENALTY,
255
  ):
256
  full_response.append(chunk)
257
  result_queue.put(("delta", chunk))
258
+
 
259
  response_text = "".join(full_response)
 
260
  generation_result["full_response"] = full_response
261
+ generation_result["parsed"] = parse_diagnosis_result(response_text)
 
 
262
  result_queue.put(("generation_done", None))
263
+ except Exception as exc:
264
+ result_queue.put(("error", str(exc)))
265
+
 
266
  async def event_generator():
 
 
267
  gen_thread = Thread(target=run_generation)
268
  gen_thread.start()
269
+
270
  loop = asyncio.get_event_loop()
 
 
271
  while True:
272
  try:
 
273
  msg_type, data = await loop.run_in_executor(
274
+ None,
275
+ lambda: result_queue.get(timeout=0.1),
276
  )
 
277
  if msg_type == "generation_done":
 
278
  break
279
+ if msg_type == "delta":
280
+ yield f"data: {json.dumps({'type': 'delta', 'text': data}, ensure_ascii=False)}\n\n"
 
281
  elif msg_type == "error":
282
  yield f"data: {json.dumps({'type': 'error', 'message': data}, ensure_ascii=False)}\n\n"
283
  gen_thread.join()
284
  return
 
285
  except Empty:
 
286
  await asyncio.sleep(0.01)
287
+
 
288
  gen_thread.join()
 
 
289
  parsed = generation_result["parsed"]
290
  if not parsed:
291
+ yield "data: {\"type\": \"error\", \"message\": \"Failed to parse response\"}\n\n"
292
  return
293
+
294
  raw_thinking = parsed["thinking"]
295
  raw_answer = parsed["answer"]
 
 
296
  refined_by_deepseek = False
297
  description = None
298
  thinking = raw_thinking
299
  answer = raw_answer
300
+
301
  if deepseek_service and deepseek_service.is_loaded:
302
  try:
 
303
  refined = await deepseek_service.refine_diagnosis(
304
  raw_answer=raw_answer,
305
  raw_thinking=raw_thinking,
 
310
  thinking = refined["analysis_process"]
311
  answer = refined["diagnosis_result"]
312
  refined_by_deepseek = True
313
+ except Exception as exc:
314
+ print(f"DeepSeek refinement failed, using original: {exc}")
 
315
  else:
316
  print("DeepSeek service not available, using raw results")
317
+
 
 
 
318
  final_payload = {
319
+ "description": description,
320
+ "thinking": thinking,
321
+ "answer": answer,
322
+ "raw": parsed["raw"],
323
+ "refined_by_deepseek": refined_by_deepseek,
324
  "success": True,
325
+ "message": "Diagnosis completed" if language == "en" else "诊断完成",
326
  }
327
+ yield f"data: {json.dumps({'type': 'final', 'result': final_payload}, ensure_ascii=False)}\n\n"
328
+
 
 
329
  temp_path = generation_result.get("temp_image_path")
330
+ if temp_path:
331
  try:
332
+ Path(temp_path).unlink(missing_ok=True)
333
+ except Exception:
334
  pass
335
+
336
  return StreamingResponse(event_generator(), media_type="text/event-stream")
337
 
338
+
339
+ def main() -> None:
340
+ uvicorn.run("app:app", host="0.0.0.0", port=5901, reload=False)
341
+
342
+
343
+ if __name__ == "__main__":
344
+ main()
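The `<think>`/`<answer>` handling that both backends rely on reduces to a pair of regexes plus a fallback for unclosed tags (truncated generations). A trimmed, runnable sketch of that core logic; the sample strings are illustrative only:

```python
import re

def parse_tags(raw_text: str) -> dict:
    """Extract <think>/<answer> spans, tolerating an unclosed <answer>
    the way parse_diagnosis_result in app.py does."""
    think = re.search(r"<think>([\s\S]*?)</think>", raw_text)
    answer = re.search(r"<answer>([\s\S]*?)</answer>", raw_text)
    thinking = think.group(1).strip() if think else None
    ans = answer.group(1).strip() if answer else None
    if not ans:
        # Generation may have been cut off before </answer>.
        unclosed = re.search(r"<answer>([\s\S]*?)$", raw_text)
        if unclosed:
            ans = unclosed.group(1).strip()
    return {"thinking": thinking, "answer": ans, "raw": raw_text}

result = parse_tags("<think>Erythema noted</think><answer>Eczema")
print(result["thinking"])  # Erythema noted
print(result["answer"])    # Eczema
```

The full version in `app.py` additionally strips leftover tags and honors a trailing `Final Answer:` prefix; this sketch covers only the tag-extraction core.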
inference/{.ipynb_checkpoints/chat-checkpoint.py → int4_quantized/chat.py} RENAMED
@@ -1,48 +1,51 @@
1
- # chat.py
 
2
  import argparse
3
- import os
4
- from model_utils import SkinGPTModel
 
 
 
 
 
 
 
 
 
 
 
 
 
5
 
6
- def main():
7
- parser = argparse.ArgumentParser(description="SkinGPT-R1 Multi-turn Chat")
8
- parser.add_argument("--model_path", type=str, default="../checkpoint")
9
  parser.add_argument("--image", type=str, required=True, help="Path to initial image")
10
- args = parser.parse_args()
11
 
12
- # Initialize the model
13
- bot = SkinGPTModel(args.model_path)
14
 
15
- # Initialize the conversation history
16
- # System prompt
17
- system_prompt = "You are a professional AI dermatology assistant. Analyze the skin condition carefully."
18
-
19
- # Build the first message containing the image
20
- if not os.path.exists(args.image):
21
  print(f"Error: Image {args.image} not found.")
22
  return
23
 
24
- history = [
25
- {
26
- "role": "user",
27
- "content": [
28
- {"type": "image", "image": args.image},
29
- {"type": "text", "text": f"{system_prompt}\n\nPlease analyze this image."}
30
- ]
31
- }
32
- ]
33
 
34
- print("\n=== SkinGPT-R1 Chat (Type 'exit' to quit) ===")
35
  print(f"Image loaded: {args.image}")
36
-
37
- # Get the first-round diagnosis
38
  print("\nModel is thinking...", end="", flush=True)
39
- response = bot.generate_response(history)
40
  print(f"\rAssistant: {response}\n")
41
-
42
- # Append the assistant's reply to the history
43
  history.append({"role": "assistant", "content": [{"type": "text", "text": response}]})
44
 
45
- # Enter the multi-turn chat loop
46
  while True:
47
  try:
48
  user_input = input("User: ")
@@ -51,18 +54,14 @@ def main():
51
  if not user_input.strip():
52
  continue
53
 
54
- # Append the user's new question
55
  history.append({"role": "user", "content": [{"type": "text", "text": user_input}]})
56
-
57
  print("Model is thinking...", end="", flush=True)
58
- response = bot.generate_response(history)
59
  print(f"\rAssistant: {response}\n")
60
-
61
- # Append the assistant's new reply
62
  history.append({"role": "assistant", "content": [{"type": "text", "text": response}]})
63
-
64
  except KeyboardInterrupt:
65
  break
66
 
 
67
  if __name__ == "__main__":
68
- main()
 
1
+ from __future__ import annotations
2
+
3
  import argparse
4
+ from pathlib import Path
5
+
6
+ try:
7
+ from .model_utils import (
8
+ DEFAULT_MODEL_PATH,
9
+ QuantizedSkinGPTModel,
10
+ build_single_turn_messages,
11
+ )
12
+ except ImportError:
13
+ from model_utils import (
14
+ DEFAULT_MODEL_PATH,
15
+ QuantizedSkinGPTModel,
16
+ build_single_turn_messages,
17
+ )
18
+
19
 
20
+ def build_parser() -> argparse.ArgumentParser:
21
+ parser = argparse.ArgumentParser(description="SkinGPT-R1 INT4 multi-turn chat")
22
+ parser.add_argument("--model_path", type=str, default=DEFAULT_MODEL_PATH)
23
  parser.add_argument("--image", type=str, required=True, help="Path to initial image")
24
+ return parser
25
 
 
 
26
 
27
+ def main() -> None:
28
+ args = build_parser().parse_args()
29
+
30
+ if not Path(args.image).exists():
 
 
31
  print(f"Error: Image {args.image} not found.")
32
  return
33
 
34
+ model = QuantizedSkinGPTModel(args.model_path)
35
+ history = build_single_turn_messages(
36
+ args.image,
37
+ "Please analyze this image.",
38
+ system_prompt="You are a professional AI dermatology assistant. Analyze the skin condition carefully.",
39
+ )
 
 
 
40
 
41
+ print("\n=== SkinGPT-R1 INT4 Chat (Type 'exit' to quit) ===")
42
  print(f"Image loaded: {args.image}")
43
+
 
44
  print("\nModel is thinking...", end="", flush=True)
45
+ response = model.generate_response(history)
46
  print(f"\rAssistant: {response}\n")
 
 
47
  history.append({"role": "assistant", "content": [{"type": "text", "text": response}]})
48
 
 
49
  while True:
50
  try:
51
  user_input = input("User: ")
 
54
  if not user_input.strip():
55
  continue
56
 
 
57
  history.append({"role": "user", "content": [{"type": "text", "text": user_input}]})
 
58
  print("Model is thinking...", end="", flush=True)
59
+ response = model.generate_response(history)
60
  print(f"\rAssistant: {response}\n")
 
 
61
  history.append({"role": "assistant", "content": [{"type": "text", "text": response}]})
 
62
  except KeyboardInterrupt:
63
  break
64
 
65
+
66
  if __name__ == "__main__":
67
+ main()
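For readers skimming the diff, the message structure this chat loop maintains can be shown standalone. A minimal sketch mirroring `build_single_turn_messages` from `model_utils.py`; the image path and reply texts below are placeholders, not real outputs:

```python
# Standalone sketch of the history the INT4 chat loop builds.
# Mirrors build_single_turn_messages; values here are placeholders.

def build_single_turn_messages(image_path, prompt, system_prompt):
    # The first user turn carries the image plus the combined prompts.
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": f"{system_prompt}\n\n{prompt}"},
            ],
        }
    ]

history = build_single_turn_messages(
    "lesion.jpg",
    "Please analyze this image.",
    system_prompt="You are a professional AI dermatology assistant.",
)
# Later turns append plain text blocks, alternating roles.
history.append({"role": "assistant",
                "content": [{"type": "text", "text": "<answer>eczema</answer>"}]})
history.append({"role": "user",
                "content": [{"type": "text", "text": "What treatments are appropriate?"}]})
```

Keeping the image only in the first turn means every later call re-sends the full history, so the model re-attends to the image on each round without re-uploading it.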
inference/int4_quantized/infer.py ADDED
@@ -0,0 +1,82 @@
1
+ from __future__ import annotations
2
+
3
+ import argparse
4
+ import time
5
+ from pathlib import Path
6
+
7
+ try:
8
+ from .model_utils import (
9
+ DEFAULT_DO_SAMPLE,
10
+ DEFAULT_MODEL_PATH,
11
+ DEFAULT_MAX_NEW_TOKENS,
12
+ DEFAULT_PROMPT,
13
+ DEFAULT_REPETITION_PENALTY,
14
+ DEFAULT_TEMPERATURE,
15
+ DEFAULT_TOP_P,
16
+ QuantizedSkinGPTModel,
17
+ build_single_turn_messages,
18
+ )
19
+ except ImportError:
20
+ from model_utils import (
21
+ DEFAULT_DO_SAMPLE,
22
+ DEFAULT_MODEL_PATH,
23
+ DEFAULT_MAX_NEW_TOKENS,
24
+ DEFAULT_PROMPT,
25
+ DEFAULT_REPETITION_PENALTY,
26
+ DEFAULT_TEMPERATURE,
27
+ DEFAULT_TOP_P,
28
+ QuantizedSkinGPTModel,
29
+ build_single_turn_messages,
30
+ )
31
+
32
+
33
+ def build_parser() -> argparse.ArgumentParser:
34
+ parser = argparse.ArgumentParser(description="SkinGPT-R1 INT4 inference")
35
+ parser.add_argument("--model_path", type=str, default=DEFAULT_MODEL_PATH)
36
+ parser.add_argument("--image_path", type=str, required=True, help="Path to the test image")
37
+ parser.add_argument("--prompt", type=str, default=DEFAULT_PROMPT, help="Prompt for diagnosis")
38
+ parser.add_argument("--max_new_tokens", type=int, default=DEFAULT_MAX_NEW_TOKENS)
39
+ parser.add_argument("--do_sample", action="store_true", default=DEFAULT_DO_SAMPLE)
40
+ parser.add_argument("--temperature", type=float, default=DEFAULT_TEMPERATURE)
41
+ parser.add_argument("--top_p", type=float, default=DEFAULT_TOP_P)
42
+ parser.add_argument("--repetition_penalty", type=float, default=DEFAULT_REPETITION_PENALTY)
43
+ return parser
44
+
45
+
46
+ def main() -> None:
47
+ args = build_parser().parse_args()
48
+
49
+ if not Path(args.image_path).exists():
50
+ print(f"Error: Image not found at {args.image_path}")
51
+ return
52
+
53
+ print("=== [1] Initializing INT4 Quantization ===")
54
+ print("BitsAndBytesConfig will be applied during model loading.")
55
+
56
+ print("=== [2] Loading Model and Processor ===")
57
+ start_load = time.time()
58
+ model = QuantizedSkinGPTModel(args.model_path)
59
+ print(f"Model loaded in {time.time() - start_load:.2f} seconds.")
60
+
61
+ print("=== [3] Preparing Input ===")
62
+ messages = build_single_turn_messages(args.image_path, args.prompt)
63
+
64
+ print("=== [4] Generating Response ===")
65
+ start_infer = time.time()
66
+ output_text = model.generate_response(
67
+ messages,
68
+ max_new_tokens=args.max_new_tokens,
69
+ do_sample=args.do_sample,
70
+ temperature=args.temperature,
71
+ top_p=args.top_p,
72
+ repetition_penalty=args.repetition_penalty,
73
+ )
74
+
75
+ print(f"Inference completed in {time.time() - start_infer:.2f} seconds.")
76
+ print("\n================ MODEL OUTPUT ================\n")
77
+ print(output_text)
78
+ print("\n==============================================\n")
79
+
80
+
81
+ if __name__ == "__main__":
82
+ main()
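The "[1] Initializing INT4 Quantization" step resolves to the NF4 setup in `build_quantization_config` (defined in `model_utils.py` below). As a hedged config sketch, assuming `bitsandbytes` is installed and a CUDA GPU is available at load time:

```python
import torch
from transformers import BitsAndBytesConfig

# NF4 double quantization with bfloat16 compute, matching
# build_quantization_config in model_utils.py.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit
    bnb_4bit_quant_type="nf4",              # normal-float 4-bit buckets
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)
```

The config is passed to `from_pretrained`, so quantization happens while the checkpoint loads rather than as a separate conversion step.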
inference/int4_quantized/model_utils.py ADDED
@@ -0,0 +1,538 @@
1
+ from __future__ import annotations
2
+
3
+ from pathlib import Path
4
+ from threading import Thread
5
+ from typing import Optional, Tuple
6
+
7
+ import torch
8
+ import torch.nn as nn
9
+ import torch.nn.functional as F
10
+ from qwen_vl_utils import process_vision_info
11
+ from transformers import (
12
+ AutoProcessor,
13
+ BitsAndBytesConfig,
14
+ StoppingCriteria,
15
+ StoppingCriteriaList,
16
+ TextIteratorStreamer,
17
+ )
18
+ from transformers.models.qwen2_5_vl.modeling_qwen2_5_vl import (
19
+ Qwen2_5_VLForConditionalGeneration,
20
+ )
21
+
22
+ DEFAULT_MODEL_PATH = "./checkpoints/int4"
23
+ DEFAULT_SYSTEM_PROMPT = (
24
+ "You are a professional AI dermatology assistant. "
25
+ "Reason step by step, keep the reasoning concise, avoid repetition, "
26
+ "and always finish with <answer>...</answer>."
27
+ )
28
+ DEFAULT_MAX_NEW_TOKENS = 768
29
+ DEFAULT_CONTINUE_TOKENS = 256
30
+ DEFAULT_DO_SAMPLE = False
31
+ DEFAULT_TEMPERATURE = 0.2
32
+ DEFAULT_TOP_P = 0.9
33
+ DEFAULT_REPETITION_PENALTY = 1.15
34
+ DEFAULT_NO_REPEAT_NGRAM_SIZE = 3
35
+ DEFAULT_PROMPT = (
36
+ "Act as a dermatologist. Analyze the visual features of this skin lesion "
37
+ "step by step, and provide a final diagnosis."
38
+ )
39
+
40
+
41
+ def resolve_model_path(model_path: str = DEFAULT_MODEL_PATH) -> str:
42
+ raw_path = Path(model_path).expanduser()
43
+ repo_root = Path(__file__).resolve().parents[2]
44
+ candidates = [raw_path]
45
+
46
+ if not raw_path.is_absolute():
47
+ candidates.append(Path.cwd() / raw_path)
48
+ candidates.append(repo_root / raw_path)
49
+ if raw_path.parts and raw_path.parts[0] == repo_root.name:
50
+ candidates.append(repo_root.joinpath(*raw_path.parts[1:]))
51
+
52
+ for candidate in candidates:
53
+ if candidate.exists():
54
+ return str(candidate)
55
+ return str(raw_path)
56
+
57
+
58
+ def build_single_turn_messages(
59
+ image_path: str,
60
+ prompt: str,
61
+ system_prompt: str = DEFAULT_SYSTEM_PROMPT,
62
+ ) -> list[dict]:
63
+ return [
64
+ {
65
+ "role": "user",
66
+ "content": [
67
+ {"type": "image", "image": image_path},
68
+ {"type": "text", "text": f"{system_prompt}\n\n{prompt}"},
69
+ ],
70
+ }
71
+ ]
72
+
73
+
74
+ def build_quantization_config() -> BitsAndBytesConfig:
75
+ return BitsAndBytesConfig(
76
+ load_in_4bit=True,
77
+ bnb_4bit_quant_type="nf4",
78
+ bnb_4bit_compute_dtype=torch.bfloat16,
79
+ bnb_4bit_use_double_quant=True,
80
+ )
81
+
82
+
83
+ def resolve_quantized_device_map():
84
+ if not torch.cuda.is_available():
85
+ raise RuntimeError("INT4 quantized inference requires a CUDA GPU.")
86
+ return {"": f"cuda:{torch.cuda.current_device()}"}
87
+
88
+
89
+ class StopOnTokenSequence(StoppingCriteria):
90
+ def __init__(self, stop_ids: list[int]):
91
+ super().__init__()
92
+ self.stop_ids = stop_ids
93
+ self.stop_length = len(stop_ids)
94
+
95
+ def __call__(self, input_ids, scores, **kwargs) -> bool:
96
+ if self.stop_length == 0 or input_ids.shape[1] < self.stop_length:
97
+ return False
98
+ return input_ids[0, -self.stop_length :].tolist() == self.stop_ids
99
+
100
+
101
+ class ExpertBlock(nn.Module):
102
+ def __init__(self, hidden_dim, bottleneck_dim=64):
103
+ super().__init__()
104
+ self.net = nn.Sequential(
105
+ nn.Linear(hidden_dim, bottleneck_dim),
106
+ nn.ReLU(),
107
+ nn.Linear(bottleneck_dim, hidden_dim),
108
+ )
109
+
110
+ def forward(self, x):
111
+ return self.net(x)
112
+
113
+
114
+ class SkinAwareMoEAdapter(nn.Module):
115
+ def __init__(self, hidden_dim, num_experts=8, top_k=2, bottleneck_dim=64):
116
+ super().__init__()
117
+ self.num_experts = num_experts
118
+ self.top_k = top_k
119
+ self.router_img = nn.Linear(hidden_dim, num_experts, bias=False)
120
+ self.router_skin = nn.Linear(3, num_experts, bias=False)
121
+ self.experts = nn.ModuleList(
122
+ [ExpertBlock(hidden_dim, bottleneck_dim) for _ in range(num_experts)]
123
+ )
124
+
125
+ def forward(self, x: torch.Tensor, skin_probs: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
126
+ img_logits = self.router_img(x)
127
+ skin_bias = self.router_skin(skin_probs)
128
+ router_logits = img_logits + skin_bias
129
+ router_probs = F.softmax(router_logits, dim=-1)
130
+
131
+ top_k_probs, top_k_indices = torch.topk(router_probs, self.top_k, dim=-1)
132
+ top_k_probs = top_k_probs / (top_k_probs.sum(dim=-1, keepdim=True) + 1e-6)
133
+
134
+ final_output = torch.zeros_like(x)
135
+ for expert_idx, expert in enumerate(self.experts):
136
+ expert_mask = top_k_indices == expert_idx
137
+ if expert_mask.any():
138
+ rows, k_indices = torch.where(expert_mask)
139
+ inp = x[rows]
140
+ out = expert(inp)
141
+ weights = top_k_probs[rows, k_indices].unsqueeze(-1)
142
+ final_output.index_add_(0, rows, (out * weights).to(final_output.dtype))
143
+
144
+ mean_prob = router_probs.mean(0)
145
+ mask_all = torch.zeros_like(router_probs)
146
+ mask_all.scatter_(1, top_k_indices, 1.0)
147
+ mean_freq = mask_all.mean(0)
148
+ aux_loss = (mean_prob * mean_freq).sum() * self.num_experts
149
+
150
+ return x + final_output, aux_loss
151
+
152
+
153
+ class PatchDistillHead(nn.Module):
154
+ def __init__(
155
+ self,
156
+ embed_dim: int = 1024,
157
+ adapter_layers: int = 4,
158
+ in_dim: Optional[int] = None,
159
+ out_dim: Optional[int] = None,
160
+ num_experts: int = 8,
161
+ top_k: int = 2,
162
+ ):
163
+ super().__init__()
164
+ self.embed_dim = embed_dim
165
+ self.in_proj = None if in_dim is None else nn.Linear(in_dim, embed_dim, bias=False)
166
+ self.skin_classifier = nn.Sequential(
167
+ nn.Linear(embed_dim, 64),
168
+ nn.ReLU(),
169
+ nn.Linear(64, 3),
170
+ )
171
+ self.adapters = nn.ModuleList(
172
+ [
173
+ SkinAwareMoEAdapter(embed_dim, num_experts=num_experts, top_k=top_k)
174
+ for _ in range(adapter_layers)
175
+ ]
176
+ )
177
+ self.out_proj: nn.Module = (
178
+ nn.Identity() if out_dim is None else nn.Linear(embed_dim, out_dim)
179
+ )
180
+
181
+ def _ensure_in_proj(self, din: int, device, dtype):
182
+ if self.in_proj is None:
183
+ self.in_proj = nn.Linear(din, self.embed_dim, bias=False).to(device=device, dtype=dtype)
184
+
185
+ def forward(self, pixel_values: torch.Tensor, image_grid_thw: torch.Tensor) -> dict:
186
+ _, din = pixel_values.shape
187
+ counts = (image_grid_thw[:, 0] * image_grid_thw[:, 1] * image_grid_thw[:, 2]).tolist()
188
+ device, dtype = pixel_values.device, pixel_values.dtype
189
+ self._ensure_in_proj(din, device, dtype)
190
+ chunks = torch.split(pixel_values, counts, dim=0)
191
+
192
+ pooled, all_skin_logits = [], []
193
+ total_aux_loss = torch.tensor(0.0, device=device, dtype=dtype)
194
+
195
+ for x in chunks:
196
+ h = self.in_proj(x)
197
+ global_feat = h.mean(dim=0, keepdim=True)
198
+ skin_logits = self.skin_classifier(global_feat)
199
+ skin_probs = F.softmax(skin_logits, dim=-1)
200
+ all_skin_logits.append(skin_logits)
201
+ skin_probs_expanded = skin_probs.expand(h.size(0), -1)
202
+
203
+ for adapter in self.adapters:
204
+ h, layer_loss = adapter(h, skin_probs_expanded)
205
+ total_aux_loss += layer_loss
206
+ pooled.append(h.mean(dim=0))
207
+
208
+ vision_embed = torch.stack(pooled, dim=0)
209
+ vision_proj = self.out_proj(vision_embed)
210
+ return {
211
+ "vision_embed": vision_embed,
212
+ "vision_proj": vision_proj,
213
+ "aux_loss": total_aux_loss,
214
+ "skin_logits": torch.cat(all_skin_logits, dim=0),
215
+ }
216
+
217
+ def configure_out_dim(self, out_dim: int):
218
+ if isinstance(self.out_proj, nn.Linear) and self.out_proj.out_features == out_dim:
219
+ return
220
+ self.out_proj = (
221
+ nn.Linear(self.embed_dim, out_dim, bias=False)
222
+ if out_dim != self.embed_dim
223
+ else nn.Identity()
224
+ )
225
+ try:
226
+ params = next(self.parameters())
227
+ self.out_proj.to(device=params.device, dtype=params.dtype)
228
+ except StopIteration:
229
+ pass
230
+
231
+
232
+ class SkinVLModelWithAdapter(Qwen2_5_VLForConditionalGeneration):
233
+ def __init__(self, config):
234
+ super().__init__(config)
235
+ self.distill_head = PatchDistillHead(
236
+ embed_dim=1024,
237
+ adapter_layers=4,
238
+ num_experts=8,
239
+ top_k=2,
240
+ in_dim=1176,
241
+ )
242
+ bottleneck = 64
243
+ self.text_bias = nn.Sequential(
244
+ nn.Linear(1024, bottleneck, bias=False),
245
+ nn.Tanh(),
246
+ nn.Linear(bottleneck, config.hidden_size, bias=False),
247
+ )
248
+ self.logit_bias_scale = nn.Parameter(torch.tensor(2.5, dtype=torch.bfloat16))
249
+
250
+ def forward(self, *args, **kwargs):
251
+ skin_vocab_mask = kwargs.pop("skin_vocab_mask", None)
252
+ skin_labels = kwargs.get("skin_labels", None)
253
+ pixel_values = kwargs.get("pixel_values", None)
254
+ image_grid_thw = kwargs.get("image_grid_thw", None)
255
+
256
+ if isinstance(pixel_values, list):
257
+ try:
258
+ pixel_values = torch.stack(pixel_values)
259
+ kwargs["pixel_values"] = pixel_values
260
+ except Exception:
261
+ pass
262
+
263
+ outputs = super().forward(*args, **kwargs)
264
+
265
+ vision_embed = None
266
+ loss_skin = torch.tensor(0.0, device=outputs.logits.device)
267
+ aux_loss = torch.tensor(0.0, device=outputs.logits.device)
+ # Default so the unconditional side[...] reads below are safe when no image is passed.
+ side = {"vision_proj": None, "skin_logits": None}
268
+
269
+ if pixel_values is not None and image_grid_thw is not None:
270
+ if not isinstance(pixel_values, torch.Tensor):
271
+ if isinstance(pixel_values, list):
272
+ pixel_values = torch.stack(pixel_values)
273
+ else:
274
+ pixel_values = torch.tensor(pixel_values)
275
+
276
+ image_grid_thw = image_grid_thw.to(pixel_values.device)
277
+ side = self.distill_head(pixel_values=pixel_values, image_grid_thw=image_grid_thw)
278
+ vision_embed = side["vision_embed"]
279
+ aux_loss = side["aux_loss"]
280
+
281
+ if skin_labels is not None:
282
+ skin_labels = skin_labels.to(side["skin_logits"].device)
283
+ loss_skin = nn.CrossEntropyLoss()(side["skin_logits"], skin_labels)
284
+
285
+ setattr(outputs, "vision_embed", vision_embed)
286
+ setattr(outputs, "vision_proj", side["vision_proj"])
287
+ setattr(outputs, "loss_skin", loss_skin)
288
+ setattr(outputs, "aux_loss", aux_loss)
289
+ setattr(outputs, "skin_logits", side["skin_logits"])
290
+
291
+ pack_vision_proj = (
292
+ side["vision_proj"]
293
+ if side["vision_proj"] is not None
294
+ else torch.tensor(0.0, device=aux_loss.device)
295
+ )
296
+ pack_skin_logits = (
297
+ side["skin_logits"]
298
+ if side["skin_logits"] is not None
299
+ else torch.tensor(0.0, device=aux_loss.device)
300
+ )
301
+ outputs.attentions = (pack_vision_proj, aux_loss, pack_skin_logits)
302
+
303
+ self.latest_side_output = {
304
+ "vision_proj": side["vision_proj"],
305
+ "aux_loss": aux_loss,
306
+ "skin_logits": side["skin_logits"],
307
+ }
308
+
309
+ if hasattr(outputs, "logits") and vision_embed is not None and skin_vocab_mask is not None:
310
+ bias_features = self.text_bias(vision_embed.to(self.logit_bias_scale.dtype))
311
+ lm_weight = self.lm_head.weight.to(bias_features.dtype)
312
+ vocab_bias = F.linear(bias_features, lm_weight)
313
+ scale = self.logit_bias_scale.to(outputs.logits.dtype)
314
+ outputs.logits = outputs.logits + (scale * vocab_bias[:, None, :] * skin_vocab_mask)
315
+
316
+ if outputs.loss is not None:
317
+ outputs.loss = outputs.loss + loss_skin + (0.01 * aux_loss)
318
+
319
+ return outputs
320
+
321
+ def freeze_all_but_distill(self):
322
+ self.requires_grad_(False)
323
+ for params in self.distill_head.parameters():
324
+ params.requires_grad_(True)
325
+ for params in self.text_bias.parameters():
326
+ params.requires_grad_(True)
327
+ self.logit_bias_scale.requires_grad_(True)
328
+
329
+ def configure_out_dim(self, out_dim: int):
330
+ self.distill_head.configure_out_dim(out_dim)
331
+
332
+ def project_only(self, vision_embed: torch.Tensor) -> torch.Tensor:
333
+ return self.distill_head.out_proj(vision_embed)
334
+
335
+
336
+ def load_quantized_model_and_processor(model_path: str = DEFAULT_MODEL_PATH):
337
+ resolved_model_path = resolve_model_path(model_path)
338
+ quantization_config = build_quantization_config()
339
+ model = SkinVLModelWithAdapter.from_pretrained(
340
+ resolved_model_path,
341
+ device_map=resolve_quantized_device_map(),
342
+ quantization_config=quantization_config,
343
+ attn_implementation="sdpa",
344
+ )
345
+ model.eval()
346
+ processor = AutoProcessor.from_pretrained(
347
+ resolved_model_path,
348
+ min_pixels=256 * 28 * 28,
349
+ max_pixels=1280 * 28 * 28,
350
+ )
351
+ return model, processor
352
+
353
+
354
+ def get_model_device(model) -> torch.device:
355
+ try:
356
+ return model.device
357
+ except AttributeError:
358
+ return next(model.parameters()).device
359
+
360
+
361
+ def prepare_inputs(processor, model, messages: list[dict]):
362
+ text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
363
+ image_inputs, video_inputs = process_vision_info(messages)
364
+ inputs = processor(
365
+ text=[text],
366
+ images=image_inputs,
367
+ videos=video_inputs,
368
+ padding=True,
369
+ return_tensors="pt",
370
+ ).to(get_model_device(model))
371
+ inputs.pop("mm_token_type_ids", None)
372
+ return inputs
373
+
374
+
375
+ class QuantizedSkinGPTModel:
376
+ def __init__(self, model_path: str = DEFAULT_MODEL_PATH):
377
+ resolved_model_path = resolve_model_path(model_path)
378
+ print(f"Loading INT4 model from {resolved_model_path}...")
379
+ self.model, self.processor = load_quantized_model_and_processor(resolved_model_path)
380
+ self.model_path = resolved_model_path
381
+ self.device = get_model_device(self.model)
382
+ self.stop_ids = self.processor.tokenizer.encode("</answer>", add_special_tokens=False)
383
+ print(f"Model loaded successfully on {self.device}.")
384
+
385
+ @staticmethod
386
+ def has_complete_answer(text: str) -> bool:
387
+ return "<answer>" in text and "</answer>" in text
388
+
389
+ def _build_generation_kwargs(
390
+ self,
391
+ inputs,
392
+ max_new_tokens: int,
393
+ do_sample: bool,
394
+ temperature: float,
395
+ repetition_penalty: float,
396
+ top_p: float,
397
+ no_repeat_ngram_size: int,
398
+ streamer=None,
399
+ ) -> dict:
400
+ generation_kwargs = {
401
+ **inputs,
402
+ "max_new_tokens": max_new_tokens,
403
+ "do_sample": do_sample,
404
+ "repetition_penalty": repetition_penalty,
405
+ "no_repeat_ngram_size": no_repeat_ngram_size,
406
+ "use_cache": True,
407
+ "stopping_criteria": StoppingCriteriaList([StopOnTokenSequence(self.stop_ids)]),
408
+ }
409
+ if streamer is not None:
410
+ generation_kwargs["streamer"] = streamer
411
+ if do_sample:
412
+ generation_kwargs["temperature"] = temperature
413
+ generation_kwargs["top_p"] = top_p
414
+ return generation_kwargs
415
+
416
+ def _generate_text(
417
+ self,
418
+ messages,
419
+ max_new_tokens: int,
420
+ do_sample: bool,
421
+ temperature: float,
422
+ repetition_penalty: float,
423
+ top_p: float,
424
+ no_repeat_ngram_size: int,
425
+ ) -> str:
426
+ inputs = prepare_inputs(self.processor, self.model, messages)
427
+ generation_kwargs = self._build_generation_kwargs(
428
+ inputs=inputs,
429
+ max_new_tokens=max_new_tokens,
430
+ do_sample=do_sample,
431
+ temperature=temperature,
432
+ repetition_penalty=repetition_penalty,
433
+ top_p=top_p,
434
+ no_repeat_ngram_size=no_repeat_ngram_size,
435
+ )
436
+
437
+ with torch.inference_mode():
438
+ generated_ids = self.model.generate(**generation_kwargs)
439
+
440
+ generated_ids_trimmed = [
441
+ out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
442
+ ]
443
+ output_text = self.processor.batch_decode(
444
+ generated_ids_trimmed,
445
+ skip_special_tokens=True,
446
+ clean_up_tokenization_spaces=False,
447
+ )
448
+ return output_text[0]
449
+
450
+ def generate_response(
451
+ self,
452
+ messages,
453
+ max_new_tokens: int = DEFAULT_MAX_NEW_TOKENS,
454
+ continue_tokens: int = DEFAULT_CONTINUE_TOKENS,
455
+ do_sample: bool = DEFAULT_DO_SAMPLE,
456
+ temperature: float = DEFAULT_TEMPERATURE,
457
+ repetition_penalty: float = DEFAULT_REPETITION_PENALTY,
458
+ top_p: float = DEFAULT_TOP_P,
459
+ no_repeat_ngram_size: int = DEFAULT_NO_REPEAT_NGRAM_SIZE,
460
+ ) -> str:
461
+ output_text = self._generate_text(
462
+ messages=messages,
463
+ max_new_tokens=max_new_tokens,
464
+ do_sample=do_sample,
465
+ temperature=temperature,
466
+ repetition_penalty=repetition_penalty,
467
+ top_p=top_p,
468
+ no_repeat_ngram_size=no_repeat_ngram_size,
469
+ )
470
+ if not self.has_complete_answer(output_text) and continue_tokens > 0:
471
+ output_text = self._generate_text(
472
+ messages=messages,
473
+ max_new_tokens=max_new_tokens + continue_tokens,
474
+ do_sample=do_sample,
475
+ temperature=temperature,
476
+ repetition_penalty=repetition_penalty,
477
+ top_p=top_p,
478
+ no_repeat_ngram_size=no_repeat_ngram_size,
479
+ )
480
+ return output_text
481
+
482
+ def generate_response_stream(
483
+ self,
484
+ messages,
485
+ max_new_tokens: int = DEFAULT_MAX_NEW_TOKENS,
486
+ continue_tokens: int = DEFAULT_CONTINUE_TOKENS,
487
+ do_sample: bool = DEFAULT_DO_SAMPLE,
488
+ temperature: float = DEFAULT_TEMPERATURE,
489
+ repetition_penalty: float = DEFAULT_REPETITION_PENALTY,
490
+ top_p: float = DEFAULT_TOP_P,
491
+ no_repeat_ngram_size: int = DEFAULT_NO_REPEAT_NGRAM_SIZE,
492
+ ):
493
+ inputs = prepare_inputs(self.processor, self.model, messages)
494
+ streamer = TextIteratorStreamer(
495
+ self.processor.tokenizer,
496
+ skip_prompt=True,
497
+ skip_special_tokens=True,
498
+ )
499
+ generation_kwargs = self._build_generation_kwargs(
500
+ inputs=inputs,
501
+ max_new_tokens=max_new_tokens,
502
+ do_sample=do_sample,
503
+ temperature=temperature,
504
+ repetition_penalty=repetition_penalty,
505
+ top_p=top_p,
506
+ no_repeat_ngram_size=no_repeat_ngram_size,
507
+ streamer=streamer,
508
+ )
509
+
510
+ def _generate():
511
+ with torch.inference_mode():
512
+ self.model.generate(**generation_kwargs)
513
+
514
+ thread = Thread(target=_generate)
515
+ thread.start()
516
+
517
+ partial_chunks = []
518
+ for text_chunk in streamer:
519
+ partial_chunks.append(text_chunk)
520
+ yield text_chunk
521
+
522
+ thread.join()
523
+
524
+ partial_text = "".join(partial_chunks)
525
+ if not self.has_complete_answer(partial_text) and continue_tokens > 0:
526
+ completed_text = self._generate_text(
527
+ messages=messages,
528
+ max_new_tokens=max_new_tokens + continue_tokens,
529
+ do_sample=do_sample,
530
+ temperature=temperature,
531
+ repetition_penalty=repetition_penalty,
532
+ top_p=top_p,
533
+ no_repeat_ngram_size=no_repeat_ngram_size,
534
+ )
535
+ if completed_text.startswith(partial_text):
536
+ tail_text = completed_text[len(partial_text) :]
537
+ if tail_text:
538
+ yield tail_text
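The fallback in `generate_response` (and at the tail of `generate_response_stream`) reduces to a simple policy: if the first pass never closes the `<answer>` tag, rerun once with a larger token budget. A minimal standalone sketch, with a stand-in generator in place of the real model:

```python
def has_complete_answer(text: str) -> bool:
    # Same check as QuantizedSkinGPTModel.has_complete_answer.
    return "<answer>" in text and "</answer>" in text

def generate_with_retry(generate_fn, max_new_tokens=768, continue_tokens=256):
    # First pass with the default budget.
    text = generate_fn(max_new_tokens)
    if not has_complete_answer(text) and continue_tokens > 0:
        # One retry with an extended budget; generation is restarted
        # from the prompt, not resumed from the truncated output.
        text = generate_fn(max_new_tokens + continue_tokens)
    return text

# Stand-in for model inference: only closes the tag given a larger budget.
def fake_generate(budget: int) -> str:
    return "<answer>melanoma</answer>" if budget > 768 else "<answer>melan"

result = generate_with_retry(fake_generate)  # retries once with budget 1024
```

Note the cost trade-off: a truncated first pass is discarded and the whole generation is paid for again, which is why `continue_tokens` is kept modest.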
inference/int4_quantized/run_api.sh ADDED
@@ -0,0 +1,6 @@
 
1
+ #!/bin/bash
2
+ set -euo pipefail
3
+
4
+ PYTHON_EXE="${PYTHON_EXE:-python}"
5
+ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
6
+ "${PYTHON_EXE}" "${SCRIPT_DIR}/app.py"
inference/int4_quantized/run_chat.sh ADDED
@@ -0,0 +1,6 @@
 
1
+ #!/bin/bash
2
+ set -euo pipefail
3
+
4
+ PYTHON_EXE="${PYTHON_EXE:-python}"
5
+ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
6
+ "${PYTHON_EXE}" "${SCRIPT_DIR}/chat.py" "$@"
inference/int4_quantized/run_infer.sh ADDED
@@ -0,0 +1,6 @@
 
1
+ #!/bin/bash
2
+ set -euo pipefail
3
+
4
+ PYTHON_EXE="${PYTHON_EXE:-python}"
5
+ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
6
+ "${PYTHON_EXE}" "${SCRIPT_DIR}/infer.py" "$@"
inference/int4_quantized/test_single.sh ADDED
@@ -0,0 +1,6 @@
 
1
+ #!/bin/bash
2
+ set -euo pipefail
3
+
4
+ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
5
+
6
+ "${SCRIPT_DIR}/run_infer.sh" "$@"
inference/temp_uploads/.ipynb_checkpoints/temp_d2b1c6f9a43940d2812f10a8cc8bc3ef-checkpoint.jpg DELETED
Binary file (47.9 kB)
 
inference/temp_uploads/.ipynb_checkpoints/user_1769671453128_43ccc61bfcb64c6bbbabbadfa887591c-checkpoint.jpg DELETED
Binary file (80.7 kB)
 
inference/temp_uploads/temp_d2b1c6f9a43940d2812f10a8cc8bc3ef.jpg DELETED
Binary file (47.9 kB)
 
inference/temp_uploads/user_1769671453128_43ccc61bfcb64c6bbbabbadfa887591c.jpg DELETED
Binary file (80.7 kB)
 
requirements.txt CHANGED
@@ -14,6 +14,10 @@ fastapi>=0.100.0
14
  uvicorn>=0.20.0
15
  python-multipart>=0.0.6
16
  openai>=1.0.0 # For DeepSeek API (OpenAI-compatible)
 
 
 
 
17
 
18
  # Install latest transformers from source (Required for Qwen2.5-VL/Vision-R1)
19
  git+https://github.com/huggingface/transformers.git
@@ -23,4 +27,4 @@ git+https://github.com/huggingface/transformers.git
23
 
24
  # For potential future demo usage
25
  gradio==5.4.0
26
- gradio_client==1.4.2
 
14
  uvicorn>=0.20.0
15
  python-multipart>=0.0.6
16
  openai>=1.0.0 # For DeepSeek API (OpenAI-compatible)
17
+ bitsandbytes>=0.43.0 # Required for INT4 quantized inference
18
+ # Attention notes:
19
+ # - SDPA is built into PyTorch 2.x
20
+ # - flash-attn is optional and only useful on GPU architectures it officially supports
21
 
22
  # Install latest transformers from source (Required for Qwen2.5-VL/Vision-R1)
23
  git+https://github.com/huggingface/transformers.git
 
27
 
28
  # For potential future demo usage
29
  gradio==5.4.0
30
+ gradio_client==1.4.2