-
Notifications
You must be signed in to change notification settings - Fork 69
fix: flaky tedge flows tests #3846
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: flaky tedge flows tests #3846
Conversation
| ${start} Get Unix Timestamp | ||
| Install Flow append-to-file.toml | ||
| Should Have MQTT Messages | ||
| ... topic=te/device/main/service/tedge-flows/status/flows | ||
| ... minimum=1 | ||
| ... message_contains=append-to-file | ||
| ... date_from=${start} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not working, but why? The messages are correctly published by tedge flows:
[te/device/main/service/tedge-flows/status/flows] {"flow":"/etc/tedge/flows/append-to-file.toml","status":"updated","time":1761916482}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also failed to improve Install Flow with the following:
Install Flow
[Arguments] ${definition_file}
${start} Get Unix Timestamp
ThinEdgeIO.Transfer To Device ${CURDIR}/input-ext/${definition_file} /etc/tedge/flows/
Should Have MQTT Messages
... topic=te/device/main/service/tedge-flows/status/flows
... minimum=1
... message_contains=${definition_file}
... date_from=${start}
Any clue about what I'm missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the main error is here:
ThinEdgeIO.Transfer To Device ${CURDIR}/input-ext/${definition_file} /etc/tedge/flows/
${CURDIR}/input-extdoesn't exist. The correct directory is${CURDIR}/flow/input-ext- In such a case
ThinEdgeIO.Transfer To Devicedoesn't fail, making the error less visible - Also
ThinEdgeIO.Transfer To Devicecopies nothing when the second arg doesn't have a trailing/.
But why things were sometimes working?
- All the flows under
${CURDIR}/flow/input-extwere actually installed byCopy Configuration Filesand loaded by tedge flows (because being under a sub-directory) - In other words, the changes introduced here were changing nothing from a test perspective: all the flows were loading on start :-(.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
=> A fix has been pushed (and squashed as the previous commit was fully incorrect):
- The input flows examples have been out the main directory of samples (so no more silently installed)
Copy Configuration Fileshas been fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: A PR has been created to update the robotframework-devicelibrary which adds a check to the Transfer to Device keyword where at least 1 file must be found, otherwise an error is thrown. This should help identifier simple mistakes like using the wrong source path more quickly.
9df9eee to
d53c456
Compare
| } | ||
|
|
||
| fn flow_status(flow: &str, status: &str, time: &OffsetDateTime) -> MqttMessage { | ||
| let topic = Topic::new_unchecked("te/device/main/service/tedge-flows/status/flows"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard-coded for now. This will be done properly in #3843
Robot Results
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! 🚀 New features to boost your workflow:
|
rina23q
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The root cause descriptions and the test code changes totally make sense.
In addition, the suite execution gets faster than before. I also ran 50 iterations in my local and all passed.
albinsuresh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix is working as expected. I just have one question about the topic used for publishing the status. Happy to approve once that's resolved.
| } | ||
|
|
||
| fn flow_status(flow: &str, status: &str, time: &OffsetDateTime) -> MqttMessage { | ||
| let topic = Topic::new_unchecked("te/device/main/service/tedge-flows/status/flows"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| let topic = Topic::new_unchecked("te/device/main/service/tedge-flows/status/flows"); | |
| let topic = Topic::new_unchecked("te/device/main/service/tedge-flows/status/flow"); |
Since the payload is the status of a single flow and not an aggregate stat across all flows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but the feature is tedge flows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. We can take a final call on this based on whether flow names would be included in the topic or not. In the current state where all flow stats are published to the same topic, this seems appropriate.
| fn flow_status(flow: &str, status: &str, time: &OffsetDateTime) -> MqttMessage { | ||
| let topic = Topic::new_unchecked("te/device/main/service/tedge-flows/status/flows"); | ||
| let payload = json!({ | ||
| "flow": flow, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if it'd be better to include the flow name in the topic instead of the payload so that these status messages can be published as retained messages per flow. That'll help the user get an overview of the status of all the flows by subscribing to status/flow/# topic.
@reubenmiller Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're going to address the exact topic/s which the stats are published to in a follow up PR...so I assume a single topic was just chosen for this PR, and the follow-up PR will publish them on individual topics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include the
flowname in the topic instead of the payload so that these status messages can be published as retained messages per flow.
I considered that but decided against it for now. Indeed this raises other questions, such as "what about a flow which definition is removed while the mapper was down?". The goal of this PR was more to quickly come with a fix.
| Copy Configuration Files | ||
| ThinEdgeIO.Transfer To Device ${CURDIR}/flows/* /etc/tedge/flows/ | ||
| ThinEdgeIO.Transfer To Device ${CURDIR}/flows/*.js /etc/tedge/flows/ | ||
| ThinEdgeIO.Transfer To Device ${CURDIR}/flows/*.toml /etc/tedge/flows/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side comment unrelated to this PR: We could consider installing the relevant flow at the test level for older tests as well in an independent PR.
|
|
||
| async fn run(mut self) -> Result<(), RuntimeError> { | ||
| self.send_updated_subscriptions().await?; | ||
| self.notify_flows_status().await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a deal breaker or anything, but I see duplicate status/flow messages whenever flows are dynamically installed. I believe it's because of the fs-notify issue where a newly created file emits both Modified and Updated events. Wondering if we can react to Modified events only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we have some generic logic which debounces inotify notifications?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we have some generic logic which debounces inotify notifications?
Yes we have. But there are still some surprising event sequences.
I believe it's because of the fs-notify issue where a newly created file emits both Modified and Updated events. Wondering if we can react to Modified events only.
For now, we are on the safe side - with duplicated events. I will need some time to investigate if only reacting to Modified is correct.
albinsuresh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving as the goal of this PR was to quickly fix the flaky tests and the topic schemes would be finalised separately in #3843.
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
2570950 to
13a2c59
Compare
Proposed changes
Types of changes
Paste Link to the issue
Checklist
just prepare-devonce)just formatas mentioned in CODING_GUIDELINESjust checkas mentioned in CODING_GUIDELINESFurther comments