---
title: regression_evaluate
datasets:
- GeoBenchmark
tags:
- evaluate
- metric
description: 'Evaluate regression tasks performed by LMs.'
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
---

# Metric Card for regression_evaluate

## Metric Description

This metric evaluates regression tasks performed by LMs. It expects the model to generate a list of numerical values, which is compared against a gold list of numerical values.

## How to Use

This metric takes two mandatory arguments: `generations` (a list of strings) and `golds` (a list of lists of floats).

```python
import evaluate

metric = evaluate.load("rfr2003/regression_evaluate")
results = metric.compute(
    generations=['[150, 0]'],
    golds=[[183, 177, 146, 85, 70, 78, 55, 17, 0, -1, -1]],
)
print(results)
{'precision': [4.0], 'recall': [344.0], 'macro-mean': [174.0], 'median macro-mean': 174.0}
```

This metric accepts one optional argument:

`d`: the function used to compute the distance between a generated value and a gold one. By default, it computes the absolute difference between two numbers.

### Output Values

This metric outputs a dictionary with the following values:

`precision`: Sum of the minimum distances between each predicted value and the set of gold values, computed for each question.

`recall`: Sum of the minimum distances between each gold value and the set of generated values, computed for each question.

`macro-mean`: Average of precision and recall, computed for each question.

`median macro-mean`: Median across the macro-mean values.

#### Values from Popular Papers

### Examples

```python
import evaluate

metric = evaluate.load("rfr2003/regression_evaluate")
results = metric.compute(
    generations=['[150, 0]'],
    golds=[[183, 177, 146, 85, 70, 78, 55, 17, 0, -1, -1]],
)
print(results)
{'precision': [4.0], 'recall': [344.0], 'macro-mean': [174.0], 'median macro-mean': 174.0}
```

## Limitations and Bias

## Citation

## Further References
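The scoring described above can be sketched as plain Python. This is a minimal illustration, not the metric's actual implementation: it assumes already-parsed lists of numbers, the default absolute-difference distance, and that values are treated as sets (collapsing duplicates), which reproduces the numbers in the example; the function name `regression_scores` is hypothetical.

```python
import statistics


def regression_scores(generations, golds, d=lambda a, b: abs(a - b)):
    """Sketch of the metric: per-question precision/recall as sums of
    minimum distances, plus their macro-mean and the median macro-mean.

    generations: list of lists of predicted numbers (one list per question)
    golds: list of lists of gold numbers (one list per question)
    d: distance function between two numbers (default: absolute difference)
    """
    precision, recall, macro = [], [], []
    for pred, gold in zip(generations, golds):
        pred_set, gold_set = set(pred), set(gold)
        # precision: each predicted value matched to its nearest gold value
        p = sum(min(d(x, g) for g in gold_set) for x in pred_set)
        # recall: each gold value matched to its nearest predicted value
        r = sum(min(d(g, x) for x in pred_set) for g in gold_set)
        precision.append(p)
        recall.append(r)
        macro.append((p + r) / 2)
    return {
        "precision": precision,
        "recall": recall,
        "macro-mean": macro,
        "median macro-mean": statistics.median(macro),
    }


print(regression_scores([[150, 0]], [[183, 177, 146, 85, 70, 78, 55, 17, 0, -1, -1]]))
```

Lower values are better on all four outputs, since every score is a sum or average of distances to the nearest counterpart.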