
Commit a6f76fe

Feat/quoted spans metric (#2311)
## Issue Link / Problem Description

- contd #2237

Co-authored-by: Mohd Ibrahim A. <106313018+mohdibrahimai@users.noreply.github.com>
1 parent 96284d7 commit a6f76fe

File tree

- docs/quoted_spans_metric.md
- src/ragas/metrics/quoted_spans.py
- tests/test_quoted_spans.py

3 files changed: +207 -0 lines changed

docs/quoted_spans_metric.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
## `citation_alignment_quoted_spans`

**What:** A metric that measures the fraction of quoted spans in a model’s answer
that appear verbatim in the retrieved sources. The score is in the range
[0, 1], where 1.0 indicates every quoted span is supported by evidence and 0.0
indicates no quoted spans are found in the sources.

**Why:** Users place extra trust in exact quotes. When a model quotes facts
that aren’t present in its evidence, it undermines reliability. This metric
helps catch cases of citation drift where quoted phrases in the answer are
unsupported.

**Input shape:**

- `answers: List[str]` – list of model answers (length N)
- `sources: List[List[str]]` – list (length N) of lists of source passages

**Output:** A dictionary containing:

```python
{
    "citation_alignment_quoted_spans": float,  # score in [0,1]
    "matched": float,  # number of spans found in sources
    "total": float     # total number of spans considered
}
```

**Notes:**

- The implementation normalizes text by collapsing whitespace and lower-casing.
- Spans shorter than three words are ignored by default; adjust `min_len` to change this.
- If no quoted spans are found across all answers, the score is defined as 0.0 with
  `total = 0`.
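For orientation, here is a minimal usage sketch of `quoted_spans_alignment`, the function this commit adds in `src/ragas/metrics/quoted_spans.py`; the answer and source strings below are illustrative, not taken from the repository:

```python
from ragas.metrics.quoted_spans import quoted_spans_alignment

# One answer containing a single quoted span that its source supports verbatim.
answers = ['The report notes that "growth slowed in the second quarter".']
sources = [["Analysts agreed that growth slowed in the second quarter of the year."]]

result = quoted_spans_alignment(answers, sources)
# result == {"citation_alignment_quoted_spans": 1.0, "matched": 1.0, "total": 1.0}
```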

src/ragas/metrics/quoted_spans.py

Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
"""
Quoted Spans Alignment Metric
=============================

This module provides a simple metric to measure citation alignment for quoted spans
in model-generated answers. The idea is to compute the fraction of quoted spans
appearing verbatim in any of the provided source passages. If an answer quotes
facts that cannot be found in the sources, the metric will reflect that drift.

The metric function is designed to be plug-and-play in existing evaluation
pipelines. It returns a score in the range [0, 1] along with the raw counts for
matched and total quoted spans. It performs light normalization by collapsing
whitespace and lower-casing strings. You can adjust the minimum length of a
quoted span and choose to disable case folding if desired.
"""

from __future__ import annotations

import re
from typing import Dict, Sequence

# Regular expression to extract both straight and curly quoted spans. Matches
# pairs of quotes and captures the inner text.
_QUOTE_RE = re.compile(r"[\"“”'‘’`´](.*?)[\"“”'‘’`´]")
def _normalize(text: str) -> str:
    """Normalize text by collapsing whitespace and lower-casing it."""
    return re.sub(r"\s+", " ", text).strip().lower()


def _extract_quoted_spans(answer: str, *, min_len: int = 3) -> Sequence[str]:
    """
    Extract quoted spans from an answer.

    Parameters
    ----------
    answer: str
        The model answer to search for quoted spans.
    min_len: int, optional
        Minimum number of words required for a span to be considered. Shorter
        spans are ignored to avoid spurious matches.

    Returns
    -------
    Sequence[str]
        A list of quoted spans (strings) that meet the minimum length
        requirement.
    """
    spans: list[str] = []
    for match in _QUOTE_RE.finditer(answer):
        span = (match.group(1) or "").strip()
        # filter out spans shorter than min_len words
        if len(span.split()) >= min_len:
            spans.append(span)
    return spans


def quoted_spans_alignment(
    answers: Sequence[str],
    sources: Sequence[Sequence[str]],
    *,
    casefold: bool = True,
    min_len: int = 3,
) -> Dict[str, float]:
    """
    Compute the citation alignment score for quoted spans in model answers.

    Parameters
    ----------
    answers: Sequence[str]
        List of model answers (length N).
    sources: Sequence[Sequence[str]]
        List of lists (length N) containing passages for each answer.
    casefold: bool, optional
        Whether to normalize text by lower-casing before matching. Defaults
        to True.
    min_len: int, optional
        Minimum number of words in a quoted span. Defaults to 3.

    Returns
    -------
    Dict[str, float]
        A dictionary containing:
        - "citation_alignment_quoted_spans": the fraction of quoted
          spans found verbatim in the provided sources.
        - "matched": number of spans that were matched
        - "total": total number of spans considered

    Notes
    -----
    If no quoted spans are found across the dataset, the score is defined as
    0.0, with matched=0 and total=0. Matching is substring matching on
    normalized text.
    """
    if len(answers) != len(sources):
        raise ValueError("answers and sources must have the same length")
    matched = 0
    total = 0

    for answer, src_list in zip(answers, sources):
        spans = _extract_quoted_spans(answer, min_len=min_len)
        if not spans:
            continue
        # join all sources for this answer into one string
        joined_sources = " ".join(src_list)
        if casefold:
            normalized_sources = _normalize(joined_sources)
        else:
            normalized_sources = joined_sources

        for span in spans:
            total += 1
            span_norm = _normalize(span) if casefold else span
            # check if the normalized span appears in the normalized sources
            if span_norm and span_norm in normalized_sources:
                matched += 1

    score = (matched / total) if total else 0.0
    return {
        "citation_alignment_quoted_spans": float(score),
        "matched": float(matched),
        "total": float(total),
    }
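To illustrate the `casefold` flag described in the docstring above, here is a small sketch; the example strings are made up for illustration and are not part of the commit:

```python
from ragas.metrics.quoted_spans import quoted_spans_alignment

# Illustrative data: the quoted span matches its source except for casing.
answers = ['The memo says "Revenue Grew Slightly" this quarter.']
sources = [["Internal notes: revenue grew slightly in Q3."]]

# Default behaviour lower-cases and whitespace-collapses both sides before matching.
quoted_spans_alignment(answers, sources)
# -> {"citation_alignment_quoted_spans": 1.0, "matched": 1.0, "total": 1.0}

# With casefold=False the quoted span must appear in the sources exactly as written.
quoted_spans_alignment(answers, sources, casefold=False)
# -> {"citation_alignment_quoted_spans": 0.0, "matched": 0.0, "total": 1.0}
```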

tests/test_quoted_spans.py

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
"""
Unit tests for the quoted spans alignment metric.

These tests are written using pytest and cover several common cases:
- A perfect match where the quoted span appears in the sources.
- A mismatch where the quoted span does not appear in the sources.
- Case and whitespace variations to verify normalization logic.
- Answers with no quoted spans to ensure the score is zero and total is zero.

To run these tests, install pytest and run `pytest` in the repository root.
"""

from ragas.metrics.quoted_spans import quoted_spans_alignment


def test_perfect_match():
    """Quoted span matches exactly in the source."""
    answers = ['Paris is "the capital of France".']
    sources = [["The capital of France is Paris."]]
    result = quoted_spans_alignment(answers, sources)
    assert result["citation_alignment_quoted_spans"] == 1.0
    assert result["matched"] == 1.0
    assert result["total"] == 1.0


def test_mismatch_detected():
    """Quoted span does not appear in the sources."""
    answers = ['GDP was "$2.9T" in 2023.']
    sources = [["…GDP was $2.7T in 2023 per WB…"]]
    result = quoted_spans_alignment(answers, sources, min_len=1)
    assert result["citation_alignment_quoted_spans"] == 0.0
    assert result["matched"] == 0.0
    assert result["total"] == 1.0


def test_mixed_case_and_whitespace():
    """Matching should be case-insensitive and handle extra whitespace."""
    answers = ['Result: "Delta E = mc ^ 2".']
    sources = [["…delta e = mc ^ 2 holds…"]]
    result = quoted_spans_alignment(answers, sources)
    assert result["citation_alignment_quoted_spans"] == 1.0


def test_no_quotes_returns_zero_with_zero_denominator():
    """An answer with no quoted spans should yield score 0.0 and total 0."""
    answers = ["No quotes here."]
    sources = [["Irrelevant."]]
    result = quoted_spans_alignment(answers, sources)
    assert result["citation_alignment_quoted_spans"] == 0.0
    assert result["total"] == 0.0
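One further case these tests do not cover is the fractional score when only some of an answer's quoted spans are supported; a hypothetical extra test (not part of this commit) could look like this:

```python
def test_partial_match_gives_fractional_score():
    """Hypothetical extra case: one of two quoted spans is supported, so the score is 0.5."""
    answers = ['It said "the sky is blue" and "the grass is purple".']
    sources = [["Everyone agrees the sky is blue."]]
    result = quoted_spans_alignment(answers, sources)
    assert result["citation_alignment_quoted_spans"] == 0.5
    assert result["matched"] == 1.0
    assert result["total"] == 2.0
```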
