-
Notifications
You must be signed in to change notification settings - Fork 444
Implement fetching JS sources from browser via WebChannel #5506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #5506 +/- ##
========================================
Coverage 85.74% 85.75%
========================================
Files 310 311 +1
Lines 30482 30696 +214
Branches 8396 8439 +43
========================================
+ Hits 26137 26323 +186
- Misses 3922 3950 +28
Partials 423 423 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@mstange This is ready for review now! Please let me know if you have any questions. |
mstange
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
I have some concerns about the global ID. I would like to decrease the degree to which we rely on the PID. On Windows, in theory, it is possible to have the profiler running for a long enough time frame that you can have profiles from different content processes with the same (recycled) PID. I know we already have existing assumptions about PIDs being unique, but it would be nice to eliminate that assumption eventually, so we shouldn't bake it into new web channel methods.
Could we instead have a single ID which identifies the right source in the right process?
For example, Firefox could generate a UUIDv4 for every source, and put an index-to-(UUID, URL) mapping into a new table in the Gecko format. Then it could also remove the URL from the location string and save some space. And then we can use that UUID when we query the source via the web channel.
I'm not quite sure how to store this information in the processed format. We could have a new sources table in profile.shared.sources, also with UUID-or-null + URL-or-path, and it would allow us to have just a single index for the source which is global. We could then also put that index into the URL, rather than the source path + pid + index. But then the question is whether this only applies to JS sources, or to existing C++ sources as well. In the upgrader, would we need to look through the funcTable and gather up all source paths to create entries in profile.shared.sources for them? And then we'd need a URL upgrader too...
What are your thoughts? From my side the highest priority is to avoid putting the PID into the webchannel message. What we put in the profile and the URL is something we can always change with an upgrader later, but the webchannel "protocol" is more long-lived.
Some other thoughts I had:
- For extension JS URLs, we currently create one resource per extension, not one resource per source. So the mapping won't work correctly there.
- If we change the string format we use in the Gecko format's "location" string, does this also affect what we put into the Jitdump when profiling with samply?
- When thinking about processed profile format changes, let's keep in mind what we would like it to look like once we put all the tables into
profile.shared. - We should also keep in mind that we want to have the ability to embed sources into the profile. The
profile.shared.sourcesidea was with this in mind - we could put the actual source into that same table.
I'm going to mark this PR as "Request Changes" only due to the PID in the webchannel message and due to the getDownloadRecipe issue mentioned in a separate comment.
This is needed for the following patches to make sure that we don't have a circular dependency. Previously this class was only used for our own profile processing from the gecko profile data. But in the next commits I'm changing the profile importers to use this as well for easy profile shared data collection.
e75754e to
4776a15
Compare
|
Thanks for the review @mstange! I've updated the PR after your review comments and it should be ready for another review. I didn't update my backend patches yet but I will update them soon too. We've discussed it before sync, but here's an overview of the things I did:
The diff says a lot of addition/removal in github, but they are mostly from snapshot updates. I tried to extract them into different commits so they are easier to review. |
mstange
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've only reviewed the first two commits so far and they look fine. But I fear that the other commits will have a lot of changes in them that won't be necessary if we put the sources into the derived thread... what do you think?
src/profile-logic/process-profile.ts
Outdated
| for (const data of geckoProfile.sources.data) { | ||
| const uuid = data[schema.uuid]; | ||
| const filename = data[schema.filename]; | ||
| globalDataCollector.indexForSource(uuid, filename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks a bit odd to ignore the return value here, but I suspect this will change in upcoming commits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, currently I'm preadding all the sources. Then I will get the source indexes in the place where I'm processing the location JS string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then why is this upfront processing needed? Won't those later calls make sure the right entries are present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I was pretty sure there was a reason when I was doing it, but I can't find it now. Thinking more about it, I think it's better to remove this _processSourcesTable. Done.
c670ec3 to
380193c
Compare
Oh, I always thought about the shared data as outside of the thread mentally, never thought about putting it inside the derived thread. But it makes sense, and it removes a lot of annoying code where I pass the sources around. Changed it! |
mstange
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good now! My main question is about the response format for the GET_JS_SOURCES message. Also, have you thought about how this will interact with source maps?
| stackTable: StackTable, | ||
| stringTable: StringTable | ||
| stringTable: StringTable, | ||
| sources: SourceTable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, should we call it sourceTable instead? No strong opinion and this is something we can always change later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, to be honest I think I changed my mind twice while I was writing this code 😄 I had the same question both for the name in the profile json and code when I pass it around in argument. I currently like the shorter sources version because:
- It's more concise.
- It's easier to type
- For the profile format: All of our tables are kinda mixed in naming and there is no single rule we follow, which makes it harder to decide which one to pick. I currently prefer the naming without table 😄
But I feel like it's all very subjective and even I changed my mind twice so far 😄 I'll keep it as is for now, but we can always change it later.
| }; | ||
| } | ||
|
|
||
| function createMockCache(): Map<IndexIntoSourceTable, SourceCodeStatus> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather have no test than a test that uses mocking and depends on implementation details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you in general. But I don't see it that way for testing the source code cache. This is because:
- Source code cache itself is an implementation detail, so it's very difficult to test without mocking.
- Even though this cache is an implementation detail itself, I wanted to write a test for it to make sure that it works correctly. Because I broke it while working on it :) So I didn't someone else doing the same mistake I did.
- This is a unit test. Even though I agree about mocking for the component tests, unit tests usually depend on mocks.
Let me know how you feel. I guess I can remove the whole test if you prefer. I don't feel strongly about it. Just wanted to make sure it works correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I just removed the mock cache and switched it to use the actions that we have (finishLoadingSourceCode etc.). That makes the test work without the explicit mock in the state.
src/app-logic/web-channel.ts
Outdated
| type QuerySymbolicationApiResponse = string; | ||
| type GetPageFaviconsResponse = Array<FaviconData | null>; | ||
| type OpenScriptInTabDebuggerResponse = void; | ||
| type GetJSSourcesResponse = Array<string | null>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, should we make this an object, so that it can give us a per-source error message? Making it an object would also let us add more fields in the future, e.g. sourceMapUrl and maybe sourceMapUuid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I was thinking about having a different webchannel request for the sourcemap but it also makes sense to combine them. Changed it.
|
I'm looking into https://typescript-eslint.io/getting-started/typed-linting/ now which I think may allow us to catch unnecessary null checks. |
It would work but we'll need some changes to make it pass, see #5620 |
This patch adds the sources table but it doesn't use that yet. The following commits will be changing the funcTable and then start using it. Note that this gathers all sources, including the JS and native sources. This way, we have a single source of truth for them. But this will happen in the following patches. This patch: - Adds a `geckoProfile.sources` table to the gecko profile format. - Handles the processing of gecko profile -> processed profile format.
Note that this gathers all sources, including the JS and native sources. This way, we have a single source of truth for them. This patch: - Changes the `thread.funcTable` to have a `source` array instead of a `fileName` array. - Changes the rest of the codebase to make sure that everything uses the new format.
Add an upgrader for frameTable.location changes that include JS sourceIds in brackets.
566a813 to
446b90b
Compare
canova
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the thorough review @mstange! I've updated the PR.
Oh welp, thanks for catching all these unnecessary if checks. All of the if (sources) checks come from my previous implementation where I made the sources optional in the processed format first. I then decided that it doesn't make sense and removed that optional case and thought I removed these checks. Apparently a lot of them slipped through the cracks. I think the filenameIndex checks were there because before this PR these filenames were in the funcTable that was nullable.
So I've amended the very easy changes to their commits directly (like the variable renaming/removing if checks/comment fixes etc.). And decided to put the logical changes to their own commits so they are easier to review. Last 4 commits are new (the ones after "Add tests for JavaScript source fetching")
I'm looking into https://typescript-eslint.io/getting-started/typed-linting/ now which I think may allow us to catch unnecessary null checks.
It would work but we'll need some changes to make it pass, see #5620
Ah, yeah, that looks nice. It would be good to use it. It looks like the issue you created was closed in favor of microsoft/TypeScript#15838 which is also closed. Does it mean that it's fixed in the main branch? (it looks very old, so I'm not so sure what's going on)
src/profile-logic/symbolication.ts
Outdated
| // Find or create source entry | ||
| let sourceIndex = null; | ||
| for (let i = 0; i < sources.filename.length; i++) { | ||
| if (sources.filename[i] === fileNameStringIndex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think it's very unlikely to have the same filename with a UUID to collide with this. Before this patch, we were treating fileName as a unique identifier (filename was kept in the url). With the JS sources coming, still I can't think of a way that they could clash. But on the other hand, it's a small/harmless check that's not worth thinking too much. I just added still, just so we are more defensive about it.
Adding that check makes it not possible to use indexOf or findIndex btw. It's better to use a normal for loop for that case, so I kept it.
| } | ||
| ); | ||
|
|
||
| const getLib: Selector<string> = createSelector( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh yeah, it's a bit strange 😄
| stackTable: StackTable, | ||
| stringTable: StringTable | ||
| stringTable: StringTable, | ||
| sources: SourceTable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, to be honest I think I changed my mind twice while I was writing this code 😄 I had the same question both for the name in the profile json and code when I pass it around in argument. I currently like the shorter sources version because:
- It's more concise.
- It's easier to type
- For the profile format: All of our tables are kinda mixed in naming and there is no single rule we follow, which makes it harder to decide which one to pick. I currently prefer the naming without table 😄
But I feel like it's all very subjective and even I changed my mind twice so far 😄 I'll keep it as is for now, but we can always change it later.
| for (let i = 0; i < sources.filename.length; i++) { | ||
| if (sources.filename[i] === urlStringIndex && sources.uuid[i] === uuid) { | ||
| return i; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have two arrays to iterate there not a single array (filename and uuid). We usually go with a for loop in cases like that since it's simpler.
src/profile-logic/process-profile.ts
Outdated
| extensions: ExtensionTable = getEmptyExtensions(), | ||
| globalDataCollector: GlobalDataCollector | ||
| globalDataCollector: GlobalDataCollector, | ||
| processSourceTable: GeckoSourceTable | undefined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, currently in the backend this feature is behind the "jssources" flag.
src/app-logic/web-channel.ts
Outdated
| type QuerySymbolicationApiResponse = string; | ||
| type GetPageFaviconsResponse = Array<FaviconData | null>; | ||
| type OpenScriptInTabDebuggerResponse = void; | ||
| type GetJSSourcesResponse = Array<string | null>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I was thinking about having a different webchannel request for the sourcemap but it also makes sense to combine them. Changed it.
| // Note: source indexes don't need translation as they point to sources table, not strings | ||
| // Source table will be handled separately in _createSourcesTableWithTranslatedStringIndexes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I didn't do this initially thinking that probably it's not going to be very big. I was thinking about doing it in a follow-up (even though the comment here doesn't state that). But let's do it here. It shouldn't be too difficult. Updated the code.
src/profile-logic/process-profile.ts
Outdated
| for (const data of geckoProfile.sources.data) { | ||
| const uuid = data[schema.uuid]; | ||
| const filename = data[schema.filename]; | ||
| globalDataCollector.indexForSource(uuid, filename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I was pretty sure there was a reason when I was doing it, but I can't find it now. Thinking more about it, I think it's better to remove this _processSourcesTable. Done.
| }; | ||
| } | ||
|
|
||
| function createMockCache(): Map<IndexIntoSourceTable, SourceCodeStatus> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you in general. But I don't see it that way for testing the source code cache. This is because:
- Source code cache itself is an implementation detail, so it's very difficult to test without mocking.
- Even though this cache is an implementation detail itself, I wanted to write a test for it to make sure that it works correctly. Because I broke it while working on it :) So I didn't someone else doing the same mistake I did.
- This is a unit test. Even though I agree about mocking for the component tests, unit tests usually depend on mocks.
Let me know how you feel. I guess I can remove the whole test if you prefer. I don't feel strongly about it. Just wanted to make sure it works correctly.
| }; | ||
| } | ||
|
|
||
| function createMockCache(): Map<IndexIntoSourceTable, SourceCodeStatus> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I just removed the mock cache and switched it to use the actions that we have (finishLoadingSourceCode etc.). That makes the test work without the explicit mock in the state.
mstange
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
| for (let i = 0; i < sources.filename.length; i++) { | ||
| if (sources.filename[i] === urlStringIndex && sources.uuid[i] === uuid) { | ||
| return i; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, of course
src/app-logic/url-handling.ts
Outdated
| const { sources, stringArray } = profile.shared; | ||
|
|
||
| // Find the filename string index | ||
| const filenameStringIndex = stringArray.indexOf(filename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth using StringTable here - we'll likely already have a cached lookup map for this stringArray, or if not, we'll need it pretty soon.
Oh I had actually missed that it was closed. No it's not fixed - I guess they closed it as a duplicate of the more general issue which is still open, microsoft/TypeScript#9998 . |
…filename Now that we changed the place where we send information to this place, we can simplify it.
Implement browser communication to fetch JS source code using GlobalJSSourceId. This enables the profiler to retrieve original source content directly from the browser's JavaScript engine. Backend bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1916785
…user properly This commit changes the response type to a object, so it will be easier to expand later for the sourcemap support.
…essing the JS functions
446b90b to
9f0aefb
Compare
|
Thanks for the reviews! |
Fixes #5110.
Backend bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1916785
(but patches are from multiple bugs, see https://phabricator.services.mozilla.com/D259273 for the whole stack)
This PR adds support for fetching the JS source code via WebChannel and displaying them in the source view.
You should be able to try it yourself if you can compile the backend patches yourself too, which is not super convenient, but currently there is no upload of JS sources, that's why it's only possible to see the sources when we capture the profile.
Currently:
Next steps as follow-ups: