Martin Dočekal committed
Commit · d8248d0 · 1 Parent(s): 6aed907

description update

- README.md +15 -15
- rouge_raw.py +13 -11
README.md CHANGED

@@ -34,7 +34,7 @@ predictions = ["the cat is on the mat", "hello there"]
 references = ["the cat is on the mat", "hello there"]
 results = rougeraw.compute(predictions=predictions, references=references)
 print(results)
-{'rougeraw1_precision': 1.0, 'rougeraw1_recall': 1.0, 'rougeraw1_fmeasure': 1.0, 'rougeraw2_precision': 1.0, 'rougeraw2_recall': 1.0, 'rougeraw2_fmeasure': 1.0, 'rougerawl_precision': 1.0, 'rougerawl_recall': 1.0, 'rougerawl_fmeasure': 1.0}
+{'1_precision': 1.0, '1_recall': 1.0, '1_fmeasure': 1.0, '2_precision': 1.0, '2_recall': 1.0, '2_fmeasure': 1.0, 'l_precision': 1.0, 'l_recall': 1.0, 'l_fmeasure': 1.0}
 ```
 
 
@@ -43,22 +43,22 @@ predictions: list of predictions to evaluate. Each prediction should be a string
 references: list of references, one for each prediction. Each reference should be a string with tokens separated by spaces
 
 ### Output Values
-
-Output Example(s):
-```python
-{'rougeraw1_precision': 1.0, 'rougeraw1_recall': 1.0, 'rougeraw1_fmeasure': 1.0, 'rougeraw2_precision': 1.0, 'rougeraw2_recall': 1.0, 'rougeraw2_fmeasure': 1.0, 'rougerawl_precision': 1.0, 'rougerawl_recall': 1.0, 'rougerawl_fmeasure': 1.0}
-```
+This metric outputs a dictionary containing the scores.
+
+There are precision, recall, and F1 values for rougeraw-1, rougeraw-2, and rougeraw-l. By default, bootstrapped confidence intervals are calculated, meaning that for each metric there are low, mid, and high values specifying the confidence interval.
+
+Key format:
+```
+{1|2|l}_{low|mid|high}_{precision|recall|fmeasure}
+e.g.: 1_low_precision
+```
+
+If aggregate is False, the format is:
+```
+{1|2|l}_{precision|recall|fmeasure}
+e.g.: 1_precision
+```
 
 ## Citation(s)
 ```bibtex
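Taken together, the updated README describes two key layouts. A minimal usage sketch (illustrative only, not part of the commit: it assumes `aggregate` is a keyword argument of `compute`, as the wording above implies, and that the returned keys follow the documented patterns):

```python
import evaluate

# Illustrative only: key names below follow the formats documented above.
rougeraw = evaluate.load("CZLC/rouge_raw")
predictions = ["the cat is on the mat", "hello there"]
references = ["the cat is on the mat", "hello there"]

# Default: bootstrapped confidence intervals, with keys like
# {1|2|l}_{low|mid|high}_{precision|recall|fmeasure}.
aggregated = rougeraw.compute(predictions=predictions, references=references)
print(aggregated["1_mid_fmeasure"])  # mid point of the ROUGE-RAW-1 F1 interval

# Assumed kwarg (per "If aggregate is False" above): plain keys
# {1|2|l}_{precision|recall|fmeasure}, matching the README's example output.
scores = rougeraw.compute(predictions=predictions, references=references, aggregate=False)
print(scores["1_precision"])
```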
rouge_raw.py CHANGED

@@ -324,18 +324,20 @@ Args:
     select: (Optional) string. The name of the metric to return. One of: 'rougeraw1_precision', 'rougeraw1_recall', 'rougeraw1_fmeasure', 'rougeraw2_precision', 'rougeraw2_recall', 'rougeraw2_fmeasure', 'rougerawl_precision', 'rougerawl_recall', 'rougerawl_fmeasure'.
         If None, all metrics are returned as a dictionary.
 Returns:
-    1_precision
-    1_recall
-    1_fmeasure
-    2_precision
-    2_recall
-    2_fmeasure
-    l_precision
-    l_recall
-    l_fmeasure
+    This metric outputs a dictionary containing the scores.
+    There are precision, recall, and F1 values for rougeraw-1, rougeraw-2, and rougeraw-l. By default, bootstrapped confidence intervals are calculated, meaning that for each metric there are low, mid, and high values specifying the confidence interval.
 
+    Key format:
+    ```
+    {1|2|l}_{low|mid|high}_{precision|recall|fmeasure}
+    e.g.: 1_low_precision
+    ```
+
+    If aggregate is False, the format is:
+    ```
+    {1|2|l}_{precision|recall|fmeasure}
+    e.g.: 1_precision
+    ```
 Examples:
     >>> rougeraw = evaluate.load('CZLC/rouge_raw')
     >>> predictions = ["the cat is on the mat", "hello there"]
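The docstring's unchanged `select` argument implies a single-score shortcut. A hedged sketch, assuming `select` is accepted by `compute` and using a metric name copied from the docstring's list (whether those names track the new key scheme is not confirmed by this commit):

```python
import evaluate

rougeraw = evaluate.load("CZLC/rouge_raw")

# Hypothetical call: per the docstring, `select` names a single metric to
# return instead of the full dictionary. The name below is copied verbatim
# from the docstring's list of allowed values.
score = rougeraw.compute(
    predictions=["the cat is on the mat", "hello there"],
    references=["the cat is on the mat", "hello there"],
    select="rougeraw1_fmeasure",
)
print(score)
```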