# AI Provider Test Report
**Generated:** 2026-04-05 02:23:10
**Test Duration:** 23.50s
## Summary
- Total Tests: 10
- Passed: 9
- Failed: 1
- Success Rate: 90.0%
## Test Results
### 1. Code Generation PASS
**Task Type:** code
**Provider:** nvidia
**Model:** llama-3.3-70b
**Duration:** 5.41s
**Prompt:**
Write a Python function to calculate fibonacci numbers recursively.
**Response:**
### Recursive Fibonacci Function in Python
#### Overview
The Fibonacci sequence is a series of numbers where a number is the sum of the two preceding ones, usually starting with 0 and 1. This function calculates the nth Fibonacci number using recursion.
#### Code
```python
def fibonacci(n):
    """
    Calculate the nth Fibonacci number recursively.
    Args:
        n (int): The position of the Fibonacci number to calculate.
    Returns:
        int: The nth Fibonacci number.
    Raises:
...
```
**Metadata:**
- model_used: llama-3.3-70b
- provider_used: nvidia
- tokens: 429
---
### 2. Data Extraction PASS
**Task Type:** extraction
**Provider:** groq
**Model:** llama-3.3-70b-versatile
**Duration:** 0.78s
**Prompt:**
Extract the key information from this text: 'John Doe, age 35, lives in New York and works as a software engineer at Tech Corp since 2020.'
**Response:**
The key information extracted from the text is:
1. **Name**: John Doe
2. **Age**: 35
3. **Location**: New York
4. **Occupation**: Software Engineer
5. **Employer**: Tech Corp
6. **Employment Start Date**: 2020
**Metadata:**
- model_used: llama-3.3-70b-versatile
- provider_used: groq
- tokens: 132
---
### 3. Reasoning Task PASS
**Task Type:** reasoning
**Provider:** nvidia
**Model:** devstral-2-123b
**Duration:** 5.25s
**Prompt:**
If a train travels 120 miles in 2 hours, and another train travels 180 miles in 3 hours, which train is faster and by how much?
**Response:**
To determine which train is faster and by how much, we'll calculate the speed of each train using the formula:
\[
\text{Speed} = \frac{\text{Distance}}{\text{Time}}
\]
### First Train
- **Distance:** 120 miles
- **Time:** 2 hours
\[
\text{Speed}_1 = \frac{120 \text{ miles}}{2 \text{ hours}} = 60 \text{ mph}
\]
### Second Train
- **Distance:** 180 miles
- **Time:** 3 hours
\[
\text{Speed}_2 = \frac{180 \text{ miles}}{3 \text{ hours}} = 60 \text{ mph}
\]
### Comparison
Both tr...
**Metadata:**
- model_used: devstral-2-123b
- provider_used: nvidia
- tokens: 251
---
### 4. General Question PASS
**Task Type:** general
**Provider:** groq
**Model:** llama-3.3-70b-versatile
**Duration:** 0.92s
**Prompt:**
What are the three primary colors?
**Response:**
The three primary colors are:
1. Red
2. Blue
3. Yellow
These colors cannot be created by mixing other colors together, and they are the base for creating all other colors.
**Metadata:**
- model_used: llama-3.3-70b-versatile
- provider_used: groq
- tokens: 83
---
### 5. JSON Generation PASS
**Task Type:** code
**Provider:** nvidia
**Model:** devstral-2-123b
**Duration:** 4.44s
**Prompt:**
Generate a JSON object representing a user profile with name, email, age, and interests (array).
**Response:**
Here's a JSON object representing a user profile with the specified fields:
```json
{
  "userProfile": {
    "name": "John Doe",
    "email": "john.doe@example.com",
    "age": 30,
    "interests": ["hiking", "photography", "reading", "cooking"]
  }
}
```
### Explanation
- "name": A string representing the user's full name.
- "email": A string representing the user's email address.
- "age": A number representing the user's age.
- "interests": An array of strings listing the us...
**Metadata:**
- model_used: devstral-2-123b
- provider_used: nvidia
- tokens: 187
---
### 6. Text Summarization PASS
**Task Type:** general
**Provider:** groq
**Model:** llama-3.3-70b-versatile
**Duration:** 0.33s
**Prompt:**
Summarize in one sentence: Artificial intelligence is transforming industries by automating tasks, improving decision-making, and creating new opportunities for innovation.
**Response:**
Artificial intelligence is revolutionizing various industries by automating tasks, enhancing decision-making, and driving innovation, ultimately transforming the way businesses operate and creating new opportunities.
**Metadata:**
- model_used: llama-3.3-70b-versatile
- provider_used: groq
- tokens: 96
---
### 7. Math Problem PASS
**Task Type:** reasoning
**Provider:** nvidia
**Model:** llama-3.3-70b
**Duration:** 2.71s
**Prompt:**
Solve: 2x + 5 = 15. Show your work.
**Response:**
To solve for x, we need to isolate x on one side of the equation.
2x + 5 = 15
First, subtract 5 from both sides of the equation:
2x + 5 - 5 = 15 - 5
2x = 10
Next, divide both sides of the equation by 2:
2x / 2 = 10 / 2
x = 5
Therefore, the value of x is 5.
**Metadata:**
- model_used: llama-3.3-70b
- provider_used: nvidia
- tokens: 152
---
### 8. Creative Writing PASS
**Task Type:** general
**Provider:** nvidia
**Model:** llama-3.3-70b
**Duration:** 1.33s
**Prompt:**
Write a haiku about coding at night.
**Response:**
Moonlit code unfurls
Midnight keyboard symphony
Glowing screen delight
**Metadata:**
- model_used: llama-3.3-70b
- provider_used: nvidia
- tokens: 62
---
### 9. Code Debug PASS
**Task Type:** code
**Provider:** groq
**Model:** llama-3.3-70b-versatile
**Duration:** 0.76s
**Prompt:**
Find the bug in this code: def add(a, b): return a + b + 1
**Response:**
Bug Identification
The bug in the given code is that the function add(a, b) is supposed to return the sum of two numbers a and b, but it is currently adding an extra 1 to the result.
Corrected Code
def add(a, b):
    """Returns the sum of two numbers."""
    return a + b
Explanation
The original code return a + b + 1 is incrementing the sum by 1, which is not the expected behavior of an add function. The corrected code simply returns the sum of a an...
**Metadata:**
- model_used: llama-3.3-70b-versatile
- provider_used: groq
- tokens: 219
---
### 10. Complex Reasoning FAIL
**Task Type:** reasoning
**Provider:** nvidia
**Model:** devstral-2-123b
**Duration:** 1.56s
**Prompt:**
If all roses are flowers, and some flowers fade quickly, can we conclude that some roses fade quickly?
**Error:**
[router] All models failed. Last error: [nvidia] NVIDIA API error (500): {"error":{"message":"EngineCore encountered an issue. See stack trace (above) for the root cause.","type":"Internal Server Error","param":null,"code":500}}
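The error text suggests the router tries one or more models and reports the last failure once none succeed. The following is only a minimal sketch of that pattern, not the router's actual code: `route_with_fallback`, the `ProviderCall` type, and the candidate list are hypothetical names, and this report does not state whether the real router retries or falls back across providers.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: a provider call takes a prompt and returns text,
# or raises an exception on failure.
ProviderCall = Callable[[str], str]


def route_with_fallback(
    prompt: str,
    candidates: Sequence[Tuple[str, ProviderCall]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order and return the first success.

    Returns (provider_name, response_text). If every candidate raises,
    a RuntimeError carrying the last error is raised, mirroring the
    shape of the message in the report above.
    """
    last_error: Optional[str] = None
    for name, call in candidates:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            last_error = f"[{name}] {exc}"
    raise RuntimeError(f"[router] All models failed. Last error: {last_error}")
```

With a single failing candidate, as appears to have happened in this test, the sketch raises immediately; adding a second candidate would let the request succeed when the first one returns a 500.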
---
## Provider Performance
| Provider | Tests | Passed | Failed | Success Rate | Avg Duration |
|----------|-------|--------|--------|--------------|-------------|
| groq | 4 | 4 | 0 | 100.0% | 0.70s |
| nvidia | 6 | 5 | 1 | 83.3% | 3.45s |
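
The table figures can be re-derived from the per-test results. The sketch below is one way to do that, not the report generator's own code: the `RESULTS` list is transcribed by hand from the ten entries above, and `summarize` is a hypothetical helper name.

```python
from collections import defaultdict

# (provider, passed, duration_seconds) for tests 1-10, transcribed from this report.
RESULTS = [
    ("nvidia", True, 5.41),
    ("groq", True, 0.78),
    ("nvidia", True, 5.25),
    ("groq", True, 0.92),
    ("nvidia", True, 4.44),
    ("groq", True, 0.33),
    ("nvidia", True, 2.71),
    ("nvidia", True, 1.33),
    ("groq", True, 0.76),
    ("nvidia", False, 1.56),
]


def summarize(results):
    """Group results by provider and compute counts, success rate, and mean duration."""
    buckets = defaultdict(list)
    for provider, passed, duration in results:
        buckets[provider].append((passed, duration))

    summary = {}
    for provider, rows in buckets.items():
        tests = len(rows)
        passed = sum(1 for ok, _ in rows if ok)
        summary[provider] = {
            "tests": tests,
            "passed": passed,
            "failed": tests - passed,
            "success_rate": round(100.0 * passed / tests, 1),  # percent
            "avg_duration": round(sum(d for _, d in rows) / tests, 2),  # seconds
        }
    return summary


if __name__ == "__main__":
    for provider, stats in sorted(summarize(RESULTS).items()):
        print(provider, stats)
```

Under these inputs the groq durations sum to 2.79s over 4 tests and the nvidia durations to 20.70s over 6 tests, which agrees with the averages in the table.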
## Document Flow
```mermaid
flowchart TD
    A[document] --> B[key-sections]
    B --> C[implementation]
    B --> D[operations]
    B --> E[validation]
```
## Related API Reference
| item | value |
|---|---|
| api-reference | api-reference.md |