# AI Provider Test Report

**Generated:** 2026-04-05 02:23:10  
**Test Duration:** 23.50s  

## Summary

- **Total Tests:** 10
- **Passed:** 9
- **Failed:** 1
- **Success Rate:** 90.0%

## Test Results

### 1. Code Generation PASS

**Task Type:** code  
**Provider:** nvidia  
**Model:** llama-3.3-70b  
**Duration:** 5.41s  

**Prompt:**

Write a Python function to calculate fibonacci numbers recursively.

**Response:**

### Recursive Fibonacci Function in Python
#### Overview

The Fibonacci sequence is a series of numbers where a number is the sum of the two preceding ones, usually starting with 0 and 1. This function calculates the nth Fibonacci number using recursion.

#### Code

```python
def fibonacci(n):
    """
    Calculate the nth Fibonacci number recursively.

    Args:
        n (int): The position of the Fibonacci number to calculate.

    Returns:
        int: The nth Fibonacci number.

    Raises:
...
```

**Metadata:**
- model_used: llama-3.3-70b
- provider_used: nvidia
- tokens: 429

---

### 2. Data Extraction PASS

**Task Type:** extraction  
**Provider:** groq  
**Model:** llama-3.3-70b-versatile  
**Duration:** 0.78s  

**Prompt:**

Extract the key information from this text: 'John Doe, age 35, lives in New York and works as a software engineer at Tech Corp since 2020.'

**Response:**

The key information extracted from the text is:

1. **Name**: John Doe
2. **Age**: 35
3. **Location**: New York
4. **Occupation**: Software Engineer
5. **Employer**: Tech Corp
6. **Employment Start Date**: 2020

**Metadata:**
- model_used: llama-3.3-70b-versatile
- provider_used: groq
- tokens: 132

---

### 3. Reasoning Task PASS

**Task Type:** reasoning  
**Provider:** nvidia  
**Model:** devstral-2-123b  
**Duration:** 5.25s  

**Prompt:**

If a train travels 120 miles in 2 hours, and another train travels 180 miles in 3 hours, which train is faster and by how much?

**Response:**

To determine which train is faster and by how much, we'll calculate the speed of each train using the formula:

\[
\text{Speed} = \frac{\text{Distance}}{\text{Time}}
\]

### First Train
- **Distance:** 120 miles
- **Time:** 2 hours

\[
\text{Speed}_1 = \frac{120 \text{ miles}}{2 \text{ hours}} = 60 \text{ mph}
\]

### Second Train
- **Distance:** 180 miles
- **Time:** 3 hours

\[
\text{Speed}_2 = \frac{180 \text{ miles}}{3 \text{ hours}} = 60 \text{ mph}
\]

### Comparison
Both tr...

**Metadata:**
- model_used: devstral-2-123b
- provider_used: nvidia
- tokens: 251

---

### 4. General Question PASS

**Task Type:** general  
**Provider:** groq  
**Model:** llama-3.3-70b-versatile  
**Duration:** 0.92s  

**Prompt:**

What are the three primary colors?

**Response:**

The three primary colors are:

1. Red
2. Blue
3. Yellow

These colors cannot be created by mixing other colors together, and they are the base for creating all other colors.

**Metadata:**
- model_used: llama-3.3-70b-versatile
- provider_used: groq
- tokens: 83

---

### 5. JSON Generation PASS

**Task Type:** code  
**Provider:** nvidia  
**Model:** devstral-2-123b  
**Duration:** 4.44s  

**Prompt:**

Generate a JSON object representing a user profile with name, email, age, and interests (array).

**Response:**

Here's a JSON object representing a user profile with the specified fields:

```json
{
  "userProfile": {
    "name": "John Doe",
    "email": "john.doe@example.com",
    "age": 30,
    "interests": ["hiking", "photography", "reading", "cooking"]
  }
}
```

Explanation

- `"name"`: A string representing the user's full name.
- `"email"`: A string representing the user's email address.
- `"age"`: A number representing the user's age.
- `"interests"`: An array of strings listing the us...


**Metadata:**
- model_used: devstral-2-123b
- provider_used: nvidia
- tokens: 187
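
A response like the one above could be checked automatically against the prompt's requirements. The following is a minimal sketch, not the harness's actual pass criterion (the report does not show it): the helper name `validate_user_profile` is hypothetical, and the optional `userProfile` wrapper is taken from the quoted response.

```python
import json

# The JSON object quoted in the response above.
RESPONSE_JSON = """
{
  "userProfile": {
    "name": "John Doe",
    "email": "john.doe@example.com",
    "age": 30,
    "interests": ["hiking", "photography", "reading", "cooking"]
  }
}
"""


def validate_user_profile(raw: str) -> bool:
    """Hypothetical check: parse the JSON and verify the four requested fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    # Accept the fields either at the top level or under "userProfile".
    profile = data.get("userProfile", data)
    if not isinstance(profile, dict):
        return False
    age = profile.get("age")
    interests = profile.get("interests")
    return (
        isinstance(profile.get("name"), str)
        and isinstance(profile.get("email"), str)
        and isinstance(age, int)
        and not isinstance(age, bool)  # bool is a subclass of int in Python
        and isinstance(interests, list)
        and all(isinstance(item, str) for item in interests)
    )


print(validate_user_profile(RESPONSE_JSON))
```

A structural check of this kind only confirms that the fields exist with plausible types; it says nothing about whether the values are sensible.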

---

### 6. Text Summarization  PASS

**Task Type:** general  
**Provider:** groq  
**Model:** llama-3.3-70b-versatile  
**Duration:** 0.33s  

**Prompt:**

Summarize in one sentence: Artificial intelligence is transforming industries by automating tasks, improving decision-making, and creating new opportunities for innovation.


**Response:**

Artificial intelligence is revolutionizing various industries by automating tasks, enhancing decision-making, and driving innovation, ultimately transforming the way businesses operate and creating new opportunities.


**Metadata:**
- model_used: llama-3.3-70b-versatile
- provider_used: groq
- tokens: 96

---

### 7. Math Problem  PASS

**Task Type:** reasoning  
**Provider:** nvidia  
**Model:** llama-3.3-70b  
**Duration:** 2.71s  

**Prompt:**

Solve: 2x + 5 = 15. Show your work.


**Response:**

To solve for x, we need to isolate x on one side of the equation.

2x + 5 = 15

First, subtract 5 from both sides of the equation:  
2x + 5 - 5 = 15 - 5  
2x = 10

Next, divide both sides of the equation by 2:  
2x / 2 = 10 / 2  
x = 5

Therefore, the value of x is 5.


**Metadata:**
- model_used: llama-3.3-70b
- provider_used: nvidia
- tokens: 152

---

### 8. Creative Writing  PASS

**Task Type:** general  
**Provider:** nvidia  
**Model:** llama-3.3-70b  
**Duration:** 1.33s  

**Prompt:**

Write a haiku about coding at night.


**Response:**

Moonlit code unfurls  
Midnight keyboard symphony  
Glowing screen delight


**Metadata:**
- model_used: llama-3.3-70b
- provider_used: nvidia
- tokens: 62

---

### 9. Code Debug  PASS

**Task Type:** code  
**Provider:** groq  
**Model:** llama-3.3-70b-versatile  
**Duration:** 0.76s  

**Prompt:**

Find the bug in this code: def add(a, b): return a + b + 1


**Response:**

Bug Identification

The bug in the given code is that the function `add(a, b)` is supposed to return the sum of two numbers `a` and `b`, but it is currently adding an extra 1 to the result.

Corrected Code

def add(a, b):
    """Returns the sum of two numbers."""
    return a + b

Explanation

The original code `return a + b + 1` is incrementing the sum by 1, which is not the expected behavior of an `add` function. The corrected code simply returns the sum of a an...


**Metadata:**
- model_used: llama-3.3-70b-versatile
- provider_used: groq
- tokens: 219

---

### 10. Complex Reasoning  FAIL

**Task Type:** reasoning  
**Provider:** nvidia  
**Model:** devstral-2-123b  
**Duration:** 1.56s  

**Prompt:**

If all roses are flowers, and some flowers fade quickly, can we conclude that some roses fade quickly?


**Error:**

[router] All models failed. Last error: [nvidia] NVIDIA API error (500): {"error":{"message":"EngineCore encountered an issue. See stack trace (above) for the root cause.","type":"Internal Server Error","param":null,"code":500}}
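
The error shows the router giving up after the NVIDIA endpoint returned HTTP 500. A fallback loop of this kind can be sketched as follows. This is an illustration under assumptions, not scrapeRL's router: `ProviderError`, `route_with_fallback`, and the two stand-in provider functions are hypothetical names, and the retry policy (retry server errors, skip ahead on client errors) is a guess.

```python
import time
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical error type carrying the provider name and HTTP status."""

    def __init__(self, provider: str, status: int, message: str):
        super().__init__(f"[{provider}] API error ({status}): {message}")
        self.provider = provider
        self.status = status


def route_with_fallback(
    prompt: str,
    candidates: Sequence[Callable[[str], str]],
    retries_per_candidate: int = 1,
    backoff_s: float = 0.0,
) -> str:
    """Try each candidate in order; retry 5xx errors, then move to the next."""
    last_error: Optional[ProviderError] = None
    for call in candidates:
        for attempt in range(retries_per_candidate + 1):
            try:
                return call(prompt)
            except ProviderError as exc:
                last_error = exc
                if exc.status < 500:
                    break  # client errors are unlikely to succeed on retry
                time.sleep(backoff_s * (attempt + 1))
    raise RuntimeError(f"[router] All models failed. Last error: {last_error}")


# Stand-in providers for illustration only; no network calls are made.
def failing_nvidia(prompt: str) -> str:
    raise ProviderError("nvidia", 500, "Internal Server Error")


def working_groq(prompt: str) -> str:
    return "stub answer"


print(route_with_fallback("test prompt", [failing_nvidia, working_groq]))
```

Whether a second provider was configured for the reasoning task in this run is not visible in the report, so the sketch only shows the general shape of such a fallback.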


---

## Provider Performance

| Provider | Tests | Passed | Failed | Success Rate | Avg Duration |
|----------|-------|--------|--------|--------------|-------------|
| groq | 4 | 4 | 0 | 100.0% | 0.70s |
| nvidia | 6 | 5 | 1 | 83.3% | 3.45s |
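
The table can be recomputed from the per-test records in this report. The sketch below does so using the provider, status, and duration of each of the ten tests as listed above; the tuple layout and the `summarize` helper are illustrative and are not the report generator's code.

```python
from collections import defaultdict

# (test number, provider, passed, duration in seconds), copied from the report.
RESULTS = [
    (1, "nvidia", True, 5.41),
    (2, "groq", True, 0.78),
    (3, "nvidia", True, 5.25),
    (4, "groq", True, 0.92),
    (5, "nvidia", True, 4.44),
    (6, "groq", True, 0.33),
    (7, "nvidia", True, 2.71),
    (8, "nvidia", True, 1.33),
    (9, "groq", True, 0.76),
    (10, "nvidia", False, 1.56),
]


def summarize(results):
    """Group results by provider and compute counts, success rate, and mean duration."""
    grouped = defaultdict(list)
    for _, provider, passed, duration in results:
        grouped[provider].append((passed, duration))

    summary = {}
    for provider, rows in grouped.items():
        tests = len(rows)
        passed = sum(1 for ok, _ in rows if ok)
        summary[provider] = {
            "tests": tests,
            "passed": passed,
            "failed": tests - passed,
            "success_rate": round(100 * passed / tests, 1),
            "avg_duration": round(sum(d for _, d in rows) / tests, 2),
        }
    return summary


for provider, stats in sorted(summarize(RESULTS).items()):
    print(provider, stats)
```

Run against these records, the helper reproduces the counts, success rates, and average durations shown in the table.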

## Document Flow

```mermaid
flowchart TD
    A[document] --> B[key-sections]
    B --> C[implementation]
    B --> D[operations]
    B --> E[validation]
```

## Related API Reference

| Item | Value |
|------|-------|
| api-reference | `api-reference.md` |