Skip to content

Conversation

@didier-wenzek
Copy link
Contributor

@didier-wenzek didier-wenzek commented Oct 31, 2025

Proposed changes

  • Tedge flows publish enabled/updated/removed status
  • Fix flaky tedge flows tests needing to wait a flow be activated before sending input messages

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue


Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s. You can activate automatic signing by running just prepare-dev once)
  • I ran just format as mentioned in CODING_GUIDELINES
  • I used just check as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

Comment on lines 142 to 148
${start} Get Unix Timestamp
Install Flow append-to-file.toml
Should Have MQTT Messages
... topic=te/device/main/service/tedge-flows/status/flows
... minimum=1
... message_contains=append-to-file
... date_from=${start}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not working, but why? The messages are correctly published by tedge flows:

[te/device/main/service/tedge-flows/status/flows] {"flow":"/etc/tedge/flows/append-to-file.toml","status":"updated","time":1761916482}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also failed to improve Install Flow with the following:

Install Flow
    [Arguments]    ${definition_file}
    ${start}    Get Unix Timestamp
    ThinEdgeIO.Transfer To Device    ${CURDIR}/input-ext/${definition_file}    /etc/tedge/flows/
    Should Have MQTT Messages
    ...    topic=te/device/main/service/tedge-flows/status/flows
    ...    minimum=1
    ...    message_contains=${definition_file}
    ...    date_from=${start}

Any clue about what I'm missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the main error is here:

ThinEdgeIO.Transfer To Device    ${CURDIR}/input-ext/${definition_file}    /etc/tedge/flows/
  • ${CURDIR}/input-ext doesn't exist. The correct directory is ${CURDIR}/flow/input-ext
  • In such a case ThinEdgeIO.Transfer To Device doesn't fail, making the error less visible
  • Also ThinEdgeIO.Transfer To Device copies nothing when the second arg doesn't have a trailing /.

But why things were sometimes working?

  • All the flows under ${CURDIR}/flow/input-ext were actually installed by Copy Configuration Files and loaded by tedge flows (because being under a sub-directory)
  • In other words, the changes introduced here were changing nothing from a test perspective: all the flows were loading on start :-(.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

=> A fix has been pushed (and squashed as the previous commit was fully incorrect):

  • The input flows examples have been out the main directory of samples (so no more silently installed)
  • Copy Configuration Files has been fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: A PR has been created to update the robotframework-devicelibrary which adds a check to the Transfer to Device keyword where at least 1 file must be found, otherwise an error is thrown. This should help identifier simple mistakes like using the wrong source path more quickly.

}

fn flow_status(flow: &str, status: &str, time: &OffsetDateTime) -> MqttMessage {
let topic = Topic::new_unchecked("te/device/main/service/tedge-flows/status/flows");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard-coded for now. This will be done properly in #3843

@github-actions
Copy link
Contributor

github-actions bot commented Oct 31, 2025

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass % ⏱️ Duration
731 0 3 731 100 2h19m9.888622s

@codecov
Copy link

codecov bot commented Oct 31, 2025

Codecov Report

❌ Patch coverage is 46.87500% with 17 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
crates/extensions/tedge_flows/src/actor.rs 46.87% 15 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@didier-wenzek didier-wenzek marked this pull request as ready for review October 31, 2025 20:13
Copy link
Member

@rina23q rina23q left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The root cause descriptions and the test code changes totally make sense.
In addition, the suite execution gets faster than before. I also ran 50 iterations in my local and all passed.

@reubenmiller reubenmiller changed the title fix: flacky tedge flows tests fix: flaky tedge flows tests Nov 3, 2025
@reubenmiller reubenmiller added skip-release-notes Don't include the ticket in the auto generated release notes flaky test Label given to flaky or otherwise faulty tests labels Nov 3, 2025
Copy link
Contributor

@albinsuresh albinsuresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix is working as expected. I just have one question about the topic used for publishing the status. Happy to approve once that's resolved.

}

fn flow_status(flow: &str, status: &str, time: &OffsetDateTime) -> MqttMessage {
let topic = Topic::new_unchecked("te/device/main/service/tedge-flows/status/flows");
Copy link
Contributor

@albinsuresh albinsuresh Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let topic = Topic::new_unchecked("te/device/main/service/tedge-flows/status/flows");
let topic = Topic::new_unchecked("te/device/main/service/tedge-flows/status/flow");

Since the payload is the status of a single flow and not an aggregate stat across all flows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but the feature is tedge flows

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. We can take a final call on this based on whether flow names would be included in the topic or not. In the current state where all flow stats are published to the same topic, this seems appropriate.

fn flow_status(flow: &str, status: &str, time: &OffsetDateTime) -> MqttMessage {
let topic = Topic::new_unchecked("te/device/main/service/tedge-flows/status/flows");
let payload = json!({
"flow": flow,
Copy link
Contributor

@albinsuresh albinsuresh Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it'd be better to include the flow name in the topic instead of the payload so that these status messages can be published as retained messages per flow. That'll help the user get an overview of the status of all the flows by subscribing to status/flow/# topic.

@reubenmiller Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're going to address the exact topic/s which the stats are published to in a follow up PR...so I assume a single topic was just chosen for this PR, and the follow-up PR will publish them on individual topics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include the flow name in the topic instead of the payload so that these status messages can be published as retained messages per flow.

I considered that but decided against it for now. Indeed this raises other questions, such as "what about a flow which definition is removed while the mapper was down?". The goal of this PR was more to quickly come with a fix.

Copy Configuration Files
ThinEdgeIO.Transfer To Device ${CURDIR}/flows/* /etc/tedge/flows/
ThinEdgeIO.Transfer To Device ${CURDIR}/flows/*.js /etc/tedge/flows/
ThinEdgeIO.Transfer To Device ${CURDIR}/flows/*.toml /etc/tedge/flows/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side comment unrelated to this PR: We could consider installing the relevant flow at the test level for older tests as well in an independent PR.


async fn run(mut self) -> Result<(), RuntimeError> {
self.send_updated_subscriptions().await?;
self.notify_flows_status().await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a deal breaker or anything, but I see duplicate status/flow messages whenever flows are dynamically installed. I believe it's because of the fs-notify issue where a newly created file emits both Modified and Updated events. Wondering if we can react to Modified events only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have some generic logic which debounces inotify notifications?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have some generic logic which debounces inotify notifications?

Yes we have. But there are still some surprising event sequences.

I believe it's because of the fs-notify issue where a newly created file emits both Modified and Updated events. Wondering if we can react to Modified events only.

For now, we are on the safe side - with duplicated events. I will need some time to investigate if only reacting to Modified is correct.

Copy link
Contributor

@albinsuresh albinsuresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving as the goal of this PR was to quickly fix the flaky tests and the topic schemes would be finalised separately in #3843.

Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
@didier-wenzek didier-wenzek force-pushed the fix/flacky-tedge-flows-tests branch from 2570950 to 13a2c59 Compare November 5, 2025 08:43
@didier-wenzek didier-wenzek added this pull request to the merge queue Nov 5, 2025
Merged via the queue into thin-edge:main with commit 969dcee Nov 5, 2025
51 of 52 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

flaky test Label given to flaky or otherwise faulty tests skip-release-notes Don't include the ticket in the auto generated release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants