-
Notifications
You must be signed in to change notification settings - Fork 69
feat: filter log types using include/exclude patterns #3838
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: filter log types using include/exclude patterns #3838
Conversation
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! 🚀 New features to boost your workflow:
|
Robot Results
|
2db56ee to
0bba586
Compare
0bba586 to
53925a2
Compare
docs/src/extend/log-management.md
Outdated
| [[plugins.journald]] | ||
| include = "tedge-*" | ||
|
|
||
| [[plugins.journald]] | ||
| exclude = "systemd*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the filtering rules, this include clause is useless. I would be good to come with an example that better demonstrates the usefulness of include / exclude clauses.
The point here is only to show something sensible, the details being explained in the following sections.
It can be "exclude all systemd services except systemd-logind".
Or better, something simpler with two plugins:
| [[plugins.journald]] | |
| include = "tedge-*" | |
| [[plugins.journald]] | |
| exclude = "systemd*" | |
| [[plugins.docker]] | |
| include = "tedge-*" | |
| [[plugins.journald]] | |
| exclude = "systemd*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or better, something simpler with two plugins:
Yup, I'll do that since this is just the introduction section
It can be "exclude all systemd services except systemd-logind".
Have included that example in the examples section below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved by 2d9f927
| if let Ok(compiled) = glob::Pattern::new(pattern) { | ||
| include_patterns.push(compiled); | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would extract this compilation stage out of the filter_log_types method. Can this be done when the file is read?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be done when the file is read?
Yes. We can keep these patterns compiled when the config file is read/reloaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved by 2d9f927
| pub struct PluginFilterEntry { | ||
| #[serde(default)] | ||
| pub include: Option<String>, | ||
| #[serde(default)] | ||
| pub exclude: Option<String>, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking how this is used, an enum seems more appropriate.
| pub struct PluginFilterEntry { | |
| #[serde(default)] | |
| pub include: Option<String>, | |
| #[serde(default)] | |
| pub exclude: Option<String>, | |
| } | |
| pub enum PluginFilterEntry { | |
| Include(String), | |
| Exclude(String), | |
| } |
I would even go a step further with:
| pub struct PluginFilterEntry { | |
| #[serde(default)] | |
| pub include: Option<String>, | |
| #[serde(default)] | |
| pub exclude: Option<String>, | |
| } | |
| pub enum PluginFilterEntry { | |
| Include(glob::Pattern), | |
| Exclude(glob::Pattern), | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this comment from @reubenmiller I'm having second thoughts on converting that struct definition into an enum as it prevents users from defining both include and exclude filters under the same plugin entry as follows:
[[plugins.journald]]
exclude = "systemd-*"
include = "systemd-logind"
The struct might be more future-proof as well if we want to support more fields beyond include and exclude in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion is not to prevent a combination of include and exclude rules for a given plugin, but to simplify the representation and the implementation logic. Note that currently a Vec<PluginFilterEntry>> is attached to each log type, making the enum enough and clearer. Two different structs might be required, though. A first one for the TOML file, aka PluginConfig, a second one to control the extent of log types to manage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have done the compilation to regex pattern in 2d9f927. The enum couldn't be done due to the possibility of both include and exclude filters under the [[plugins.journald]] entry, unless we use an enum like:
enum Filter {
Include(Regex)
Exclude(Regex)
Both(Regex)
}
which would make little difference in the filtering logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay let's keep the struct.
However, I disagree with the rationale. The in-memory representation of the filters doesn't have to match the external schema of the configuration file. We could have a vector of enum values with two cases or two vectors (one for the included regex the other for the excluded ones)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have a vector of enum values with two cases or two vectors (one for the included regex the other for the excluded ones)
Aah yes. For the internal representation, we can absolutely do that. I was only thinking of the external struct used by serde.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved that as well in 9d4bbb6 (squashed commit) with a TomlPluginConfig struct used for parsing and a compiled PluginConfig struct that maintains include_patterns and exclude_patterns separately.
| } | ||
| }, | ||
| Err(_) => { | ||
| // File doesn't exist or not readable - this is OK, just use defaults |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, but it would be good to warn the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good idea. I would be careful with the wording if this is potentially not actually an issue, just highlighting that we didn't read the configuration from the specified file. If the default behaviour of thin-edge is that this file doesn't exist, this warning will appear often, and we should make it clear that this is not an issue. I may also be misassuming how things work in this case, and it may be the case that we can cope but this is a weird edge case that the user should fix, in which case a normal warning would be entirely justified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved by 2d9f927.
docs/src/extend/log-management.md
Outdated
| ### Configuration Format | ||
|
|
||
| Use the `[[plugins.<plugin-name>]]` sections to define filters. | ||
| Both `include` and `exclude` support [glob patterns](https://en.wikipedia.org/wiki/Glob_(programming)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth using regex instead of glob patterns to be consistent with other parts of tedge (e.g. the sm plugin filtering)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we are using glob patterns for file paths in config/log management. However, I also agree with using regex patterns for filtering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to regex patterns in 2d9f927.
docs/src/extend/log-management.md
Outdated
| [[plugins.journald]] | ||
| include = "systemd-logind" | ||
|
|
||
| [[plugins.journald]] | ||
| exclude = "systemd-*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the chosen data format here, as it involves arrays of objects, which makes it a bit difficult to manipulate. Could we just have a flat object structure and change the "include" and "exclude" keys to be an array?
e.g.
[plugin.journald]
include = ["systemd-logind"]
exclude = ["systemd-*"]There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I've had much deliberation on this format. And I agree that the very first format that came to my mind was exactly what you've proposed, which is fairly intuitive as well. But I went with the array of objects for easier manipulation and extensibility.
I agree that all include patterns in a single array would be easier for manual manipulation, but if a user wants to programatically add a new include pattern, it'd be a lot more difficult to append to an existing array, right?
With the proposed array of objects patterns, it can be a simple append of the following block, without having to know the existing values in the pattern:
[[plugin.journald]]
include = "new-pattern-*"
Programmatically removing an existing pattern looks equally difficult with both formats, but I'm expecting more dynamic appends than removals.
This format would also come in handy in future, if/when we add support for drop-in extension files for tedge-log-plugin.toml. In that case also, a new include/exclude pattern for a plugin can be defined in an extension file the exact same way, even if include/exclude patterns for the same plugin are defined in the main tedge-log-plugin.toml` or any other extension file. But if we go with the simple format that you've proposed, a new pattern cannot be added to an extension file if the include patterns for that same plugin is defined elsewhere.
These were the reasons why I went with the array of objects format, although I agree that it is not the most intuitive format for someone new to TOML.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just wondering if the data model is essentially an array of objects, rather than an array of strings, that this makes it harder to manipulate the data in the future.
Though is there anything stopping people also providing both the include and exclude properties within one array item, e.g.
[[plugins.journald]]
include = "systemd-logind"
exclude = "systemd-*"There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though is there anything stopping people also providing both the include and exclude properties within one array item, e.g.
[[plugins.journald]] include = "systemd-logind" exclude = "systemd-*"
That's perfectly valid as the keys are different (include and exclude).
Or did you mean to ask if multiple include filters are allowed under the same object array as follows:
[[plugins.journald]]
include = "abc"
include = "xyz"
OR
[[plugins.journald]]
include = ["abc", "def"]
include = ["xyz"]
and expect all the include patterns to be combined into a single include array?
That's an invalid toml due to key duplication under the same table. The second include pattern must be placed in a separate [[plugins.journald]] block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I just meant the example that I posted. I just wanted to make sure that it also worked, as adding two array items seemed strange when you want to just define an include and an exclude.
After the demo, I'm ok with the overall data structure (as long as the user can also provided include and exclude on the same array item).
| pub fn from_file(path: &std::path::Path) -> Self { | ||
| match std::fs::read_to_string(path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| pub fn from_file(path: &std::path::Path) -> Self { | |
| match std::fs::read_to_string(path) { | |
| pub async fn from_file(path: &std::path::Path) -> Self { | |
| match tokio::fs::read_to_string(path).await { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved by 2d9f927.
| let matches_include = include_patterns.iter().any(|p| p.matches(log_type)); | ||
| let matches_exclude = exclude_patterns.iter().any(|p| p.matches(log_type)); | ||
|
|
||
| // When both include and exclude patterns exist, use OR logic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm slightly confused by this logic. Essentially this means if I have an include filter and I add an exclude filter, that will potentially cause more items to match than before we added the second filter rule. I think my natural expectation is for the two to be ANDed together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I agree with you that the natural expectation would be an AND. But this OR combination was implemented because the sm-plugin filtering also does the same and we wanted to keep it consistent. Actually there are use-cases for both.
If it was AND I could define rules like:
[[plugin.journald]]
include = "tedge-*"
exclude = "tedge-watcher"
where we want to include everything matching tedge-* but selectively exclude tedge-watcher from it.
Now, the ORcan also be used to define the reverse rule:
[[plugin.journald]]
exclude = "systemd-*"
include = "systemd-logind"
where we exclude everything systemd-* but would like to selectively override that blanket exclusion by including systemd-logind.
Both are useful in my opinion and for some reason we chose the latter for sm-plugins and we're still sticking to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What confuses me is the current implementation is the default values.
- When no
excluderules are given it means exclude everything, i.e. take only what is explicitly included. - When no
includerules are given it means include everything, unless there is an exclude rule in which case it means include nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What confuses me is the current implementation is the default values.
- When no
excluderules are given it means exclude everything, i.e. take only what is explicitly included.- When no
includerules are given it means include everything, unless there is an exclude rule in which case it means include nothing.
I totally agree that it is difficult to easily put the meaning of that combined case into words as can be done for the other cases as follows:
- Neither
includeorexclude: Include everything includeonly: Include just what is specifiedexcludeonly: Include everything that is not excluded
But the combined case is really difficult to put in a sentence:
includeandexclude: Include everything that is not excluded + what an be selectively included from the excluded list
If we had done the AND case, it was a lot easier: Include everything matching include, but further filter that with those specified in exclude.
For my brain, the AND is easier to comprehend than the OR.
But, I believe the reasoning for that choice was that users are more likely to provide exclusion filters for what they don't want, rather than providing inclusion filters for everything that they want. Once an exclude filter is provided, people would mostly want to selectively include a few out of those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, I believe the reasoning for that choice was that users are more likely to provide
exclusionfilters for what they don't want, rather than providinginclusionfilters for everything that they want. Once anexcludefilter is provided, people would mostly want to selectively include a few out of those.
If so, the doc must highlight it and sounds natural.
One way to do so might be to a layman AND ALSO as a substitute for the formal OR.
As done with a + in the following sentence:
Include everything that is not excluded + what an be selectively included from the excluded list
Are retained all the log types that are not excluded and also those explicitly included.
docs/src/extend/log-management.md
Outdated
| ### Configuration Format | ||
|
|
||
| Use the `[[plugins.<plugin-name>]]` sections to define filters. | ||
| Both `include` and `exclude` support [glob patterns](https://en.wikipedia.org/wiki/Glob_(programming)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we are using glob patterns for file paths in config/log management. However, I also agree with using regex patterns for filtering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved
This PR has to be rebased on top of #3846, to see the system tests fixed.
rina23q
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I just have minor system test improvement suggestions.
| Library ThinEdgeIO | ||
|
|
||
| Suite Setup Custom Setup | ||
| Test Setup Custom Setup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you changed Suite Setup to Test Setup, I think Set Suite Variable should also be changed to Set Test Variable in the Custom Setup.
Custom Setup
${DEVICE_SN}= Setup
Set Suite Variable $DEVICE_SN # this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved in the squashed commit bd7c2a9
| # Clean up any existing plugins for test isolation | ||
| Execute Command rm /usr/share/tedge/log-plugins/file | ||
| Execute Command rm /usr/share/tedge/log-plugins/journald | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why dmsg is not removed in this test?
Execute Command rm /usr/share/tedge/log-plugins/dmesg
and if it's forgotten, probably it's better to create a key, like Cleanup existing plugins since these removals repeat in several tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved in the squashed commit bd7c2a9
34480cf to
9d4bbb6
Compare
Allow users to filter log types reported by log plugins using include and exclude patterns in the plugin configuration.
9d4bbb6 to
bd7c2a9
Compare
Allow users to filter log types reported by log plugins using include and exclude patterns in the plugin configuration.
Types of changes
Paste Link to the issue
Checklist
just prepare-devonce)just formatas mentioned in CODING_GUIDELINESjust checkas mentioned in CODING_GUIDELINESFurther comments