-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Capture code snippet from diagnostic compiler output #9362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
* Built tentative test class SwiftBuildSystemOutputParser to handle the compiler output specifically * Added a handleDiagnostic method to possibly substitute the emitEvent local scope implementation of handling a SwiftBuildMessage diagnostic
* the flag `appendToOutputStream` helps us to determine whether a diagnostic is to be emitted or whether we'll be emitting the compiler output via OutputInfo * separate the emitEvent method into the SwiftBuildSystemMessageHandler
a20f020 to
f3aaabf
Compare
1dcaaeb to
c48e606
Compare
|
@swift-ci please test |
owenv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thie generally lgtm but I have some concerns about the regex-based parsing when we emit the textual compiler output.
- Perf - It's important this is fast so that it doesn't block the end of the build if a command produces huge quantities of output. It's hard to say if this will be a real issue without some testing
- We're re-parsing information which we're already getting from the compiler in structured form. I see the appeal of not reporting a diagnostic twice if multiple compile jobs report it though
|
|
||
| /// Split compiler output into individual diagnostic segments | ||
| /// Format: /path/to/file.swift:line:column: severity: message | ||
| private func splitIntoDiagnostics(_ output: String) -> [ParsedDiagnostic] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be doing any output parsing at this level, Swift Build already has an entire subsystem dedicated to doing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakepetroules I would agree, however the events we're receiving from SwiftBuild accumulates every diagnostic message into a singular data buffer -- when emitting this as-is, it's possible that we'd be coupling some info-level diagnostics with error-level diagnostics with no way to separate the severity. Splitting the string on a per-diagnostic basis is the only way to achieve this in this way.
The observability scope that we use to emit these diagnostics will capture the entire string blob, and we have to decide with what severity to emit the entire message. I'm not sure if there's an alternative path here that would give us the same ergonomics on the user front, but IMO it's preferred to separate each diagnostic and ensure that we're emitting them with the appropriate severity, rather than sending the entire string of possibly many diagnostics with varying severities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me try to clarify: Swift Build already parses the individual diagnostic messages into structured data objects independently of the singular output buffer, which is for the textual output of the tool. So Swift Build is already doing the equivalent of what splitIntoDiagnostics is doing, and those messages are given to you in the diagnostics message. They include rich information including severity, line number, and so on.
Similar to what I mentioned elsewhere about the taskStarted/taskEnded events, you want to capture those diagnostics' association to the specific task (e.g. unprocessedDiagnostics should be a dictionary rather than an array, and belongs in BuildState rather than in the top level object), and then replay them at task completion time, without attempting to do your own parsing.
I also pointed this out inline, though I'm not sure why the re-parsing relates to deduplication? We should be able to deduplicate diagnostics whether we parse them again or use the existing ones. |
(@jakepetroules @owenv -- tagging for visibility, GitHub notifications can be weird) Re-parsing doesn't affect the de-duplication -- when tracking the data buffer per task, I also track whether we've emitted the associated output for a given task (using its signature) and guard against this before emitting for that task again. We only go down this path (emitting for a given task) once we've received the task completed event. I mentioned this inline as well but for visibility: the re-parsing just addresses the fact that for a given task signature, we have an accumulated data buffer that contains all possible diagnostic messages coming from the compiler. I find that the user ergonomics aren't great when simply emitting the entire string blob through the observability scope, since these diagnostics can have varying severities and we'll have to decide up-front which severity to choose to emit the entire string of all diagnostics. I do also maintain a list of the Perhaps some more discussion is needed here. 😄 |
* Remove taskStarted outputStream emissions * Propagate exception if unable to find path
|
@swift-ci please test |
|
@swift-ci please test windows |
The TaskDataBuffer introduces some extra complexity in its handling of various buffers we'd actually like to track and emit, rather than ignore. We will keep multiple buffers depending on the information we have available i.e. whether there's an existing task signature buffer, taskID buffer, targetID buffer, or simply a global buffer.
If there is no command line display strings available for a failed task, we should demote this message to info-level to avoid cluttering the output stream with messages that may not be incredibly helpful. To consider here: if this is the only error, we should be able to expose this as an error and perhaps omit "<no command line>".
|
@swift-ci please test |
|
The failing tests seem to be command plugins that do things like this Not sure if we're failing to print the command line in the log, or the observabilityscope isn't adding it to logText |
Sources/Basics/Observability.swift
Outdated
| public protocol DiagnosticsHandler: Sendable { | ||
| func handleDiagnostic(scope: ObservabilityScope, diagnostic: Diagnostic) | ||
|
|
||
| func printToOutput(message: String) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call this func print(_ output: String, verbose: Bool) instead? Since we already have a viable implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's the DiagnosticHandlers responsibility to print the diagnostic here. We should instead have a "LoggingHandler" which would know where to emit the "message", whether it's to stdout, stderr, to a file, or some other place.
| buildSystem.delegate?.buildSystem(buildSystem, didUpdateTaskProgress: message) | ||
| case .diagnostic(let info): | ||
| if info.appendToOutputStream { | ||
| emitInfoAsDiagnostic(info: info) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to capture task-level diagnostics and defer emission of those until the taskCompleted event.
Global diagnostics and target diagnostics can be emitted immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a diagnostic buffer to handle these cases; it will track whether a diagnostic event is related to a taskID, and if so it will append it to the buffer and wait for the associated taskComplete event to emit.
If this is a global/target diagnostic, then I emit it as-is here.
bkhouri
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work, but we should also include automated tests as part of this change to ensure we won't regress the behaviour.
Sources/Basics/Observability.swift
Outdated
| public protocol DiagnosticsHandler: Sendable { | ||
| func handleDiagnostic(scope: ObservabilityScope, diagnostic: Diagnostic) | ||
|
|
||
| func printToOutput(message: String) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: the function name is a bit misleading. Where if the output going to? to a file, stdout, stderr, somewhere else? can we please update the function name to make this more explicit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is meant to be a temporary workaround given that we are emitting diagnostics from the compiler output accumulated as one string blob, and observability scope usually demands a severity along with this despite the fact that these diagnostics could very possibly have differing severities.
It's a little hacky but I'm hoping that future discussion surrounding how we'd like to architect logging will improve this :)
Sources/Basics/Observability.swift
Outdated
| public protocol DiagnosticsHandler: Sendable { | ||
| func handleDiagnostic(scope: ObservabilityScope, diagnostic: Diagnostic) | ||
|
|
||
| func printToOutput(message: String) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's the DiagnosticHandlers responsibility to print the diagnostic here. We should instead have a "LoggingHandler" which would know where to emit the "message", whether it's to stdout, stderr, to a file, or some other place.
| } | ||
|
|
||
| /// Convenience extensions to extract taskID and targetID from the LocationContext. | ||
| extension SwiftBuildMessage.LocationContext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Should these extension belong in SwiftBuild?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They certainly could, though in this context I only use them within this file. There are other ways in which we extract the taskID and targetID from LocationContext, however the ergonomics surrounding a if case let <...> is not my favourite 😄
If folks prefer it, I can post a PR on the swiftbuild end.
| /// Tracks the task IDs for failed tasks. | ||
| private var failedTasks: [Int] = [] | ||
| /// Tracks the tasks by their signature for which we have already emitted output. | ||
| private var tasksEmitted: Set<String> = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question (possibly-blocking): what is the difference between tasksEmitted and taskIDsEmitted? If they are tracking the same thing, can we remove one? If we need both, could you please provide an explanation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many taskIDs can map to a singular taskSignature, and so I initially thought it could be helpful to track both separately. However, I think simply tracking by taskSignature may be sufficient here, so I removed taskIDsEmitted in the latest change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many taskIDs can map to a singular taskSignature
I think that shouldn't be true within a single build operation. Are you seeing that to not be the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have seen this behaviour locally, usually between two tasks that have an execution description of Emit Swift module (arm64) and Compile <filename>.swift (arm64) respectively; they'll share a task signature but have differing task IDs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a bug. The signatures are supposed to be globally unique.
cc @owenv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we should avoid using the signatures at all right now, and use only the numeric task IDs for safety.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, these two tasks also produce the same output (particularly the diagnostics) which helped to reduce duplicate logs. I'll try some alternative approaches locally but I may have to track both ID and signature for the time being.
* Defer emission of diagnostics to completed task event * Remove global data buffer from TaskDataBuffer * Remove emits to observabilityScope for taskStarted events * Move SwiftBuildSystemMessageHandler to its own file * Rename printToOutput to print(_ output:verbose:) * Remove unprocessedDiagnostics parameter from message handler * Remove taskIDsEmitted in favour of using tasksEmitted
Modified the map to track these by targetID rather than name; since the `BuildResult` is expecting a map by targetName, another computed property representing the same map by targetName has also been added
|
@swift-ci please test |
|
@swift-ci please test windows |
|
|
||
| let progressAnimation: ProgressAnimationProtocol | ||
| var serializedDiagnosticPathsByTargetID: [Int: [Basics.AbsolutePath]] = [:] | ||
| var serializedDiagnosticPathsByTargetName: [String: [Basics.AbsolutePath]] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to end up mixing diagnostics from potentially completely unrelated targets.
I see the existing BuildResult structure is forcing you into providing this, and it's only used by the Migrate command that was added recently. That should be refactored to use a globally unique identifier for the target instead (ResolvedModule.ID?), but that can be done separately from this PR.
For now I would add a FIXME comment that effectively repeats what I said above, though.
PhantomInTheWire
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @bripeticca just a quick one do you think your changes might fix #8877?
|
@PhantomInTheWire I'll take a look and see if this addresses the issue, thanks for the heads up! |
This PR refactors diagnostic handling in the Swift build system by introducing a dedicated message handler and per-task output buffering to properly parse and emit compiler diagnostics individually.
Key Changes
SwiftBuildSystemMessageHandler
SwiftBuildMessageevents from the build operationPer-Task Data Buffering
taskDataBufferstruct inBuildStateto capture compiler output per task signatureTaskDataBufferstruct allows for usingLocationContextorLocationContext2as a subscript key to fetch the appropriate data buffer for a task, defaulting to the global buffer if no associated task or target can be identified..outputmessages arrive