Commit 40ab287

fix(utils): Improve URL normalization by avoiding quote/unquote to preserve '+' signs. ref unclecode#1332
1 parent 90af453 commit 40ab287

1 file changed: +4 -2 lines changed

crawl4ai/utils.py

Lines changed: 4 additions & 2 deletions
@@ -2184,8 +2184,10 @@ def normalize_url(
     netloc = parsed.netloc.lower()

     # ── path ──
-    # Strip duplicate slashes and trailing “/” (except root)
-    path = quote(unquote(parsed.path))
+    # Strip duplicate slashes and trailing "/" (except root)
+    # IMPORTANT: Don't use quote(unquote()) as it mangles + signs in URLs
+    # The path from urlparse is already properly encoded
+    path = parsed.path
     if path.endswith('/') and path != '/':
         path = path.rstrip('/')
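
The effect the commit message describes can be reproduced with the standard library alone. The sketch below is not code from crawl4ai: `roundtrip_path` and `raw_path` are hypothetical helper names, and the example URL is made up. It compares the old `quote(unquote(...))` round trip with the new pass-through of `parsed.path`.

```python
from urllib.parse import quote, unquote, urlparse


def roundtrip_path(url: str) -> str:
    """Old behaviour (hypothetical helper): decode the path, then re-encode it."""
    return quote(unquote(urlparse(url).path))


def raw_path(url: str) -> str:
    """New behaviour (hypothetical helper): keep the path as urlparse returns it."""
    return urlparse(url).path


# Made-up URL with literal '+' characters in the path.
url = "https://example.com/search/c++/a+b"

# quote() treats '+' as unsafe by default, so the round trip rewrites it.
print(roundtrip_path(url))  # /search/c%2B%2B/a%2Bb

# The pass-through leaves the path untouched.
print(raw_path(url))        # /search/c++/a+b
```

One trade-off of the pass-through, assuming the rest of `normalize_url` does no further encoding: a path that arrives with unencoded characters such as spaces is no longer percent-encoded at this step.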
