haijunlv committed 7 days ago

Commit 87ce0b4 (verified) · 1 Parent(s): f66c559

Upload README.md

Files changed (1)

README.md +118 -0

@@ -36,6 +36,8 @@ By extending professional scientific tasks into a full-chain training pipeline f
 - **Efficient RL reasoning with MTP and CoT compression.** During RL, Intern-S2-Preview adopts shared-weight MTP with KL loss to reduce the mismatch between training and inference behavior, substantially improving MTP accept rate and token generation speed. It also introduces CoT compression techniques to shorten responses while preserving strong reasoning capability, achieving improvements in both performance and efficiency.
+- **Upgraded time-series modeling** for better physical signal representation; supports long, heterogeneous time-series (10^0–10^6 points).
 <figure>
   <img src="./figs/efficiency.jpg" alt="efficient RL reasoning with MTP and CoT compression">
   <figcaption>Fig1: Reasoning Efficiency on Complex Math Benchmarks. Accuracy vs. Average Response Length. Intern-S2-Preview (red star) significantly outperforms the trillion-scale Intern-S1-Pro (red circle) and achieves higher accuracy with better token efficiency among medium-size models.</figcaption>
@@ -290,6 +292,122 @@ print(json.dumps(response.model_dump(), indent=2, ensure_ascii=False))
 > Note: We do not recommend disabling thinking mode for agentic tasks.
+### Time Series Demo
+Time series inference is currently only supported in LMDeploy. To get started, download and deploy Intern-S2-Preview with LMDeploy by following the [Model Deployment Guide](./deployment_guide.md).
+Below is an example of detecting earthquake events from a time series signal file. Additional data types and functionalities are also supported.
+```
+from openai import OpenAI
+from lmdeploy.vl.time_series_utils import encode_time_series_base64
+openai_api_key = "EMPTY"
+openai_api_base = "http://0.0.0.0:8000/v1"
+client = OpenAI(
+    api_key=openai_api_key,
+    base_url=openai_api_base,
+)
+model_name = client.models.list().data[0].id
+def send_base64(file_path: str, sampling_rate: int = 100):
+    """Send the time series inline as base64-encoded data."""
+    # encode_time_series_base64 accepts local file paths and http urls,
+    # encoding time-series data (.npy, .csv, .wav, .mp3, .flac, etc.) into base64 strings.
+    base64_ts = encode_time_series_base64(file_path)
+    messages = [
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": "Please determine whether an Earthquake event has occurred in the provided time-series data. If so, please specify the starting time point indices of the P-wave and S-wave in the event."
+                },
+                {
+                    "type": "time_series_url",
+                    "time_series_url": {
+                        "url": f"data:time_series/npy;base64,{base64_ts}",
+                        "sampling_rate": sampling_rate
+                    },
+                },
+            ],
+        }
+    ]
+    return client.chat.completions.create(
+        model=model_name,
+        messages=messages,
+        temperature=0,
+        max_tokens=200,
+    )
+def send_http_url(url: str, sampling_rate: int = 100):
+    """Send an http(s) URL pointing to the time-series data."""
+    messages = [
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": "Please determine whether an Earthquake event has occurred in the provided time-series data. If so, please specify the starting time point indices of the P-wave and S-wave in the event."
+                },
+                {
+                    "type": "time_series_url",
+                    "time_series_url": {
+                        "url": url,
+                        "sampling_rate": sampling_rate
+                    },
+                },
+            ],
+        }
+    ]
+    return client.chat.completions.create(
+        model=model_name,
+        messages=messages,
+        temperature=0,
+        max_tokens=200,
+    )
+def send_file_url(file_path: str, sampling_rate: int = 100):
+    """Send a file URL pointing to the time-series data on the server host."""
+    messages = [
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": "Please determine whether an Earthquake event has occurred in the provided time-series data. If so, please specify the starting time point indices of the P-wave and S-wave in the event."
+                },
+                {
+                    "type": "time_series_url",
+                    "time_series_url": {
+                        "url": f"file://{file_path}",
+                        "sampling_rate": sampling_rate
+                    },
+                },
+            ],
+        }
+    ]
+    return client.chat.completions.create(
+        model=model_name,
+        messages=messages,
+        temperature=0,
+        max_tokens=200,
+    )
+response = send_base64("./0092638_seism.npy")
+# response = send_http_url("https://huggingface.co/internlm/Intern-S1-Pro/raw/main/0092638_seism.npy")
+# response = send_file_url("./0092638_seism.npy")
+print(response.choices[0].message)
+```
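
The three `send_*` functions in the demo above differ only in the `url` value they place in the `time_series_url` part. A minimal sketch of one way to factor that out follows. The helper names (`build_messages`, `base64_data_url`, `file_url`) are hypothetical and not part of LMDeploy or the README; the payload shape is copied from the demo. `file_url` resolves to an absolute path on the assumption that the server expects a well-formed `file://` URL, which a relative path such as `./0092638_seism.npy` would not produce.

```python
from pathlib import Path


def build_messages(prompt: str, url: str, sampling_rate: int = 100) -> list[dict]:
    """Build the chat messages for one text prompt plus one time-series input.

    Hypothetical helper: the structure mirrors the demo's `messages` lists.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "time_series_url",
                    "time_series_url": {
                        "url": url,
                        "sampling_rate": sampling_rate,
                    },
                },
            ],
        }
    ]


def base64_data_url(base64_ts: str) -> str:
    """Wrap an already base64-encoded string in the data URL prefix used by the demo."""
    return f"data:time_series/npy;base64,{base64_ts}"


def file_url(file_path: str) -> str:
    """Turn a local path (relative or absolute) into an absolute file:// URL."""
    return Path(file_path).resolve().as_uri()
```

With these helpers, each demo call reduces to `client.chat.completions.create(model=model_name, messages=build_messages(prompt, url), temperature=0, max_tokens=200)`, where `url` is `base64_data_url(encode_time_series_base64(path))`, an http(s) URL, or `file_url(path)`.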
 ## Agent Integration
 Intern-S2-Preview can be plugged into agent frameworks in two ways: connecting to a **self-hosted deployment**, or calling the **official InternLM API**. Below we cover both, with examples for agent frameworks (OpenClaw, Hermes, etc.) and for Claude Code.