-
Notifications
You must be signed in to change notification settings - Fork 166
Add tests for parsing of inline HTML when rendering markdown as HTML #1379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
d-ronnqvist
wants to merge
2
commits into
swiftlang:main
Choose a base branch
from
d-ronnqvist:output-html-parse-inline
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+91
−32
Open
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -675,58 +675,69 @@ package struct MarkdownRenderer<Provider: LinkProvider> { | |||||
| // - An empty element like `<br />` or `<hr />` that's complete on its own. | ||||||
| // - An element with children like `<span style="color: red;">Something</span>` that needs to be created out of multiple markup elements. | ||||||
| // | ||||||
| // FIXME: See if this can be extracted into 2 private functions to make the code easier to read. | ||||||
| // Because it may take multiple markdown elements to create an HTML element, we pop elements rather than iterating | ||||||
| var elements = Array(container) | ||||||
| outer: while !elements.isEmpty { | ||||||
| let element = elements.removeFirst() | ||||||
|
|
||||||
| guard let start = element as? InlineHTML else { | ||||||
| var remainder = Array(container)[...] | ||||||
| while let element = remainder.popFirst() { | ||||||
| guard let openingHTML = element as? InlineHTML else { | ||||||
| // If the markup _isn't_ inline HTML we can simply visit it to transform it. | ||||||
| children.append(visit(element)) | ||||||
| continue | ||||||
| } | ||||||
|
|
||||||
| // Otherwise, we need to determine how long this markdown element it. | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
| var rawHTML = start.rawHTML | ||||||
| let rawHTML = openingHTML.rawHTML | ||||||
| // Simply skip any HTML/XML comments. | ||||||
| guard !rawHTML.hasPrefix("<!--") else { | ||||||
| // If it's a basic link, simply skip it. | ||||||
| continue | ||||||
| } | ||||||
|
|
||||||
| // Next, check if its empty element (for example `<br />` or `<hr />`) that's complete on its own. | ||||||
| if let parsed = try? XMLElement(xmlString: rawHTML) { | ||||||
|
|
||||||
| // On non-Darwin platforms, `XMLElement(xmlString:)` sometimes crashes for certain invalid / incomplete XML strings. | ||||||
| // To minimize the risk of this happening, don't try to parse the XML string as an empty HTML element unless it ends with "/>" | ||||||
| if rawHTML.hasSuffix("/>"), let parsed = try? XMLElement(xmlString: rawHTML) { | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Did you run into some crashes while running your new tests in Linux CI? |
||||||
| children.append(parsed) | ||||||
| continue | ||||||
| } | ||||||
|
|
||||||
| // This could be an HTML element with content or it could be invalid HTML. | ||||||
| // Don't modify `elements` until we know that we've parsed a valid HTML element. | ||||||
| var copy = elements | ||||||
|
|
||||||
| // Gradually check a longer and longer series of markup elements to see if they form a valid HTML element. | ||||||
| inner: while !copy.isEmpty, let next = copy.first as? any InlineMarkup { | ||||||
| _ = copy.removeFirst() | ||||||
|
|
||||||
| // Skip any HTML/XML comments _inside_ this HTML tag | ||||||
| if let html = next as? InlineHTML, html.rawHTML.hasPrefix("<!--") { | ||||||
| continue inner | ||||||
| } | ||||||
|
|
||||||
| rawHTML += next.format() | ||||||
| if let parsed = try? XMLElement(xmlString: rawHTML) { | ||||||
| children.append(parsed) // Include the valid HTML element in the output. | ||||||
| elements = copy // Skip over all the elements that were used to create that HTML element. | ||||||
| continue outer | ||||||
| } | ||||||
| // Lastly, check if this is the start of an HTML element that needs to be constructed out of more than one markup element | ||||||
| else if let parsed = _findMultiMarkupHTMLElement(in: &remainder, openingRawHTML: rawHTML) { | ||||||
| children.append(parsed) | ||||||
| } | ||||||
| // If we reached the end of the inline elements without parsing a valid HTML element, skip that first InlineHTML markup and continue from there. | ||||||
| continue | ||||||
| } | ||||||
|
|
||||||
| return children | ||||||
| } | ||||||
|
|
||||||
| private func _findMultiMarkupHTMLElement(in remainder: inout ArraySlice<any Markup>, openingRawHTML: String) -> XMLNode? { | ||||||
| // Don't modify `remainder` until we know that we've parsed a valid HTML element. | ||||||
| var copy = remainder | ||||||
|
|
||||||
| var rawHTML = openingRawHTML | ||||||
| let tagName = rawHTML.dropFirst(/* the opening "<" */).prefix(while: \.isLetter) | ||||||
| let expectedClosingTag = "</\(tagName)>" | ||||||
|
|
||||||
| // Only iterate as long the markup is _inline_ markup. | ||||||
| while let next = copy.first as? any InlineMarkup { | ||||||
| _ = copy.removeFirst() | ||||||
| let html = next as? InlineHTML | ||||||
|
|
||||||
| // Skip any HTML/XML comments _inside_ this HTML tag | ||||||
| if let html, html.rawHTML.hasPrefix("<!--") { | ||||||
| continue | ||||||
| } | ||||||
|
|
||||||
| // If this wasn't a comment, accumulate more raw HTML to try and parse | ||||||
| rawHTML += next.format() | ||||||
| // On non-Darwin platforms, `XMLElement(xmlString:)` sometimes crashes for certain invalid / incomplete XML strings. | ||||||
| // To minimize the risk of this happening, don't try to parse the XML string as an empty HTML element unless it ends with "/>" | ||||||
| if html?.rawHTML == expectedClosingTag, let parsed = try? XMLElement(xmlString: rawHTML) { | ||||||
| remainder = copy // Skip over all the elements that were used to create that HTML element. | ||||||
| return parsed // Include the valid HTML element in the output. | ||||||
| } | ||||||
| } | ||||||
| // If we reached the end of the _inline_ markup without parsing a valid HTML element, skip just that opening markup without updating `remainder` | ||||||
| return nil | ||||||
| } | ||||||
|
|
||||||
| // MARK: Directives | ||||||
|
|
||||||
| func visit(_: BlockDirective) -> XMLNode { | ||||||
|
|
||||||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're using a slice here, I imagine this is quite efficient, even though we are mutating
remainder.