From be989a0b0fce6161d76f41513fbbd8bd57d2fe36 Mon Sep 17 00:00:00 2001
From: "Piotr P. Karwasz"
Date: Fri, 26 Sep 2025 11:28:36 +0200
Subject: [PATCH] fix: Clarify handling of `/` in the specification
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This change clarifies the meaning of “excessive slashes” (`/`) in `namespace`, `name`, and `subpath` by adding explicit requirements for parsers:

* **Namespace**: Parsers must ignore all empty segments. Leading and trailing slashes are just special cases of empty segments (e.g., `/foo/` → `["", "foo", ""]`).
* **Name**: Parsers must ignore trailing slashes. To avoid ambiguity when multiple leading slashes appear (which some parsers might interpret as part of the `name` and others as part of the `namespace`), parsers should remove them in either case. For example:
  * `pkg:type/namespace//name` → the intended namespace is `namespace/` and the name is `name`, but some parsers might incorrectly treat the name as `/name`.
  * `pkg:type//name` → the intended namespace is empty, and the name is `name`, but some parsers might incorrectly treat the name as `/name`.
* **Subpath**: Parsers must apply the same rule and ignore empty segments.

These changes aim to resolve ambiguities in the current specification. For example, under the existing wording:

* Leading and trailing slashes `/` “should be stripped in the canonical form,” but it is unclear whether this refers to the canonical form of the entire PURL or only the encoded `namespace`.
* At the same time, those slashes are described as not being part of the `namespace`.

The revision also extends the specification by requiring parsers to strip consecutive slashes that appear within the encoded `namespace` and `subpath`.

This change addresses part of the ambiguities described in #584 by eliminating the category of “invalid but tolerated” PURLs.
Instead, it shifts the responsibility to parsers, which must leniently normalize PURLs with excessive slashes into valid ones. Importantly, this normalization can be applied without requiring a full parser.
---
 docs/standard/components.md | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/docs/standard/components.md b/docs/standard/components.md
index 661d252b..e994c037 100644
--- a/docs/standard/components.md
+++ b/docs/standard/components.md
@@ -36,27 +36,29 @@ The rules for each component are:
     definition.
   - If present, the `namespace` may contain one or more segments, separated
     by a single unencoded slash '/' character.
-  - All leading and trailing slashes '/' are not significant and should be
-    stripped in the canonical form. They are not part of the `namespace`.
   - Each `namespace` segment must be a percent-encoded string.
   - When percent-decoded, a segment:
     - Must not contain any slash '/' characters
-    - Must not be empty
+    - Must not be empty. In particular, leading and trailing empty segments
+      are not allowed
     - Must contain any Unicode character other than '/' unless the package's
       `type` definition provides otherwise.
   - A URL host or Authority must not be used as a `namespace`. Use instead a
     `repository_url` qualifier. Note however, that for some types, the
     `namespace` may look like a host.
-
+  - PURL parsers must accept URLs where `namespace` contains empty segments,
+    and must remove them.
 - **name**:
   - The `name` is prefixed by a single slash '/' separator when the
     `namespace` is not empty.
-  - All leading and trailing slashes '/' are not significant and should be
-    stripped in the canonical form. They are not part of the `name`.
+  - PURL parsers must accept inputs where `name` is prefixed with multiple `/` separators
+    and normalize them to one.
+  - PURL parsers must accept URLs where `name` is suffixed with one or more `/` separators
+    and remove them.
   - A `name` must be a percent-encoded string.
   - When percent-decoded, a `name` may contain any Unicode character unless
     the package's `type` definition provides otherwise.
@@ -103,15 +105,16 @@ The rules for each component are:
   - The `subpath` string is prefixed by a '#' separator when not empty
   - The '#' is not part of the `subpath`
   - The `subpath` contains zero or more segments, separated by slash '/'
-  - Leading and trailing slashes '/' are not significant and should be
-    stripped in the canonical form
   - Each `subpath` segment must be a percent-encoded string
   - When percent-decoded, a segment:
     - Must not contain any slash '/' characters
-    - Must not be empty
+    - Must not be empty. In particular, leading and trailing empty segments
+      are not allowed
     - Must not be any of '..' or '.'
     - May contain any Unicode character other than '/' unless the package's
       `type` definition provides otherwise.
   - The `subpath` must be interpreted as relative to the root of the package
+  - PURL parsers must accept URLs where `subpath` contains empty segments,
+    and must remove them.
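
The parser requirements added by this patch can be illustrated with a small Python sketch. It is not part of the patch and not a reference implementation: the helper names `drop_empty_segments` and `split_namespace_and_name` are hypothetical, and the sketch assumes the input is the still percent-encoded text between the `type` separator and any `@`, `?`, or `#` (for `split_namespace_and_name`), or the still-encoded `namespace` or `subpath` (for `drop_empty_segments`).

```python
def drop_empty_segments(component: str) -> str:
    """Remove empty segments caused by leading, trailing, or doubled '/'.

    Intended for an encoded `namespace` or `subpath` (assumption: the
    component has not been percent-decoded yet, so '%2F' is untouched).
    """
    return "/".join(segment for segment in component.split("/") if segment)


def split_namespace_and_name(path: str) -> tuple[str, str]:
    """Split an encoded 'namespace/name' path leniently.

    Trailing slashes are removed first, the name is whatever follows the
    last remaining '/', and empty segments are dropped from the namespace.
    """
    trimmed = path.rstrip("/")
    namespace, _, name = trimmed.rpartition("/")
    return drop_empty_segments(namespace), name


# The two examples from the commit message, minus the 'pkg:type/' prefix.
print(split_namespace_and_name("namespace//name"))
print(split_namespace_and_name("/name"))
print(drop_empty_segments("/foo/"))
```

Because both helpers work on plain strings, the normalization can be applied before handing the input to a stricter parser, which is in line with the commit message's remark that no full parser is required.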