Conversation

@jarhodes314 jarhodes314 commented Sep 12, 2025

Proposed changes

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

#510

Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s. You can activate automatic signing by running just prepare-dev once)
  • I ran just format as mentioned in CODING_GUIDELINES
  • I used just check as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

@codecov

codecov bot commented Sep 12, 2025

@jarhodes314
Contributor Author

Database comparison

I've been comparing two database implementations. The first is fjall, an embedded key-value store written in Rust (what we settled on for the original hackathon implementation). The other is SQLite, which is advantageous if we want to expose the database to other users.

Performance

To measure the performance of the database APIs tedge-flows is using, I created a simple benchmark that inserts batches of data into three different series in a loop, then drains one of the series entirely. In practice, database IO will also come with the unavoidable cost of invoking a JavaScript-based flow, but since that executes identically with either database backend in use, I didn't attempt to measure its impact.
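The benchmark itself isn't shown in this thread; a minimal sketch of its shape, using a hypothetical `SeriesDb` trait standing in for the fjall/SQLite backends and a toy in-memory backend so it runs standalone, might look like:

```rust
use std::collections::HashMap;
use std::time::Instant;

// Hypothetical abstraction over the two backends (fjall / SQLite).
trait SeriesDb {
    fn insert_batch(&mut self, series: &str, values: &[(u64, f64)]);
    fn drain(&mut self, series: &str) -> Vec<(u64, f64)>;
}

// Toy in-memory backend so the sketch is runnable; the real benchmark
// would implement SeriesDb for the fjall and SQLite backends instead.
struct MemDb(HashMap<String, Vec<(u64, f64)>>);

impl SeriesDb for MemDb {
    fn insert_batch(&mut self, series: &str, values: &[(u64, f64)]) {
        self.0.entry(series.to_string()).or_default().extend_from_slice(values);
    }
    fn drain(&mut self, series: &str) -> Vec<(u64, f64)> {
        self.0.remove(series).unwrap_or_default()
    }
}

// Insert `batches` batches of `batch_size` values into three series,
// then drain one series entirely, timing both phases.
fn bench(db: &mut impl SeriesDb, batches: usize, batch_size: usize) -> usize {
    let series = ["temperature", "pressure", "humidity"];
    let start = Instant::now();
    for batch in 0..batches {
        for s in &series {
            let values: Vec<(u64, f64)> = (0..batch_size)
                .map(|i| ((batch * batch_size + i) as u64, 0.0))
                .collect();
            db.insert_batch(s, &values);
        }
    }
    let total = batches * batch_size * series.len();
    println!("Inserted {total} items (across {batches} batches of {batch_size}) in {:?}", start.elapsed());

    let start = Instant::now();
    let drained = db.drain("temperature");
    println!("Drained {} items in {:?}", drained.len(), start.elapsed());
    drained.len()
}

fn main() {
    let mut db = MemDb(HashMap::new());
    assert_eq!(bench(&mut db, 15, 1000), 15000);
}
```

The series names and the `SeriesDb` trait are illustrative assumptions, not the actual tedge-flows API.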

In terms of code changes, I changed very little of my original implementation before measuring the performance. The only change I made was to use transactions when inserting multiple values, as this has a hugely significant impact on SQLite insertion performance. I used the equivalent "batch" feature of fjall as well, though in that case the impact on insert performance was more modest.

In terms of performance, at least with my current implementation, fjall is significantly quicker (>10x) to insert into and somewhat quicker to drain from. SQLite takes up a bit less disk space, but not massively so.

> ./target/release/db_bench fjall bench2
Benchmarking fjall backend with 15000 inserts and 5000 drains from bench.fjall
Inserted 15000 items (across 15 batches of 1000) in 18.550153ms
Drained 5000 items in 9.597644ms

> ./target/release/db_bench sqlite
Benchmarking sqlite backend with 15000 inserts and 5000 drains from bench.sqlite
Inserted 15000 items (across 15 batches of 1000) in 300.421601ms
Drained 5000 items in 26.938409ms

> du -sh bench.*
4.2M    bench.fjall
3.3M    bench.sqlite

With larger insertions, the IO performance is much the same as before (though SQLite is quicker at draining the larger number of records). The disk usage of fjall, however, is significantly lower (~3x) than the disk usage of SQLite.

> rm -rf bench.*

> ./target/release/db_bench sqlite 100000
Benchmarking sqlite backend with 1500000 inserts and 500000 drains from bench.sqlite
Inserted 1500000 items (across 15 batches of 100000) in 16.10093498s
Drained 500000 items in 1.294142794s

> ./target/release/db_bench fjall 100000
Benchmarking fjall backend with 1500000 inserts and 500000 drains from bench.fjall
Inserted 1500000 items (across 15 batches of 100000) in 1.890891907s
Drained 500000 items in 931.588034ms

> du -sh bench.*
58M     bench.fjall
193M    bench.sqlite

Impact on binary size

Adding the fjall-based database to the flows feature in the tedge binary results in a 3.4% increase in binary size. The impact of (statically-linked) SQLite is over twice that (8.1%). Because we use musl builds, we cannot dynamically link to SQLite, so I didn't include that option in the comparison.

=== tedge Binary Size Comparison ===
Configuration        | Size (bytes) | Size (MB) | Overhead
---------------------|--------------|-----------|----------------
No Database:         | 18696688     | 17 MB     | baseline
Fjall Database:      | 19327584     | 18 MB     | +616.109375 KB
SQLite Database:     | 20212200     | 19 MB     | +1479.992188 KB

=== Database Overhead Analysis ===
Fjall overhead:  616.109375 KB (3.4%)
SQLite overhead: 1479.992188 KB (8.1%)
SQLite vs Fjall: +863.882812 KB (4.6%)


@didier-wenzek didier-wenzek left a comment


The current design must be revisited before adding new kinds of message sources.

tokio::select! {
    _ = interval.tick() => {
        let drained_messages = self.drain_db().await?;
        self.on_messages(MessageSource::MeaDB, drained_messages).await?;


This design is a legacy of the POC and is actually not so neat.

What is wrong is that messages drained from the db (in self.drain_db()) are then delivered to all the flows (in self.on_messages()), while at most one flow is interested in each: the flow whose source produced it! This is:

  • inefficient: messages drained from the DB are delivered even to flows unrelated to the MEA DB
  • complicated: each message must be tagged with a source such as MessageSource::MeaDB
  • not extensible: a new method is required for each kind of message source (similar to self.drain_db()).

A better approach would be to add an on_interval() method on the sources themselves. Only the messages produced by a flow's own source would then be processed by that flow. This could even be improved with a MessageSource trait. But the key point is that the flow actor would no longer have to be updated for each new source kind.

Comment on lines +88 to +102
pub fn accept(&self, source: MessageSource, message_topic: &str) -> bool {
    match &self.input {
        FlowInput::Mqtt {
            topics: input_topics,
        } => source == MessageSource::Mqtt && input_topics.accept_topic_name(message_topic),
        FlowInput::MeaDB { .. } => source == MessageSource::MeaDB,
    }
}

With the proposed design, a flow is no longer asked to accept or reject a message that was produced by an arbitrary source. A flow only has to process messages produced by its own source.

}
}

impl FlowInput {

The main idea of the proposed design is to let the source generate messages on some configured interval.

impl FlowInput {
    pub async fn on_interval(
        &mut self,
        _timestamp: &DateTime,
    ) -> Result<Vec<Message>, FlowError> {
        match self {
            FlowInput::Mqtt { .. } => Ok(vec![]),
            FlowInput::MeaDB { .. } => drain_db().await,
        }
    }
}
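The "MessageSource trait" improvement mentioned above could look roughly like the following sketch. The `Message` type, the source structs, and the polling loop are stand-ins (the real code is async and holds a database handle); the point is only that the flow actor polls a uniform interface and needs no per-source-kind methods:

```rust
// Sketch of a MessageSource trait: each source kind implements on_interval,
// and the flow actor polls whichever source a flow owns.
type Message = String; // stand-in for the real message type

trait MessageSource {
    fn on_interval(&mut self) -> Vec<Message>;
}

// MQTT messages arrive via subscription, so the interval tick yields nothing.
struct MqttSource;

impl MessageSource for MqttSource {
    fn on_interval(&mut self) -> Vec<Message> {
        Vec::new()
    }
}

// Stand-in for the MEA DB source: draining empties the pending buffer.
struct MeaDbSource {
    pending: Vec<Message>,
}

impl MessageSource for MeaDbSource {
    fn on_interval(&mut self) -> Vec<Message> {
        std::mem::take(&mut self.pending)
    }
}

fn main() {
    let mut sources: Vec<Box<dyn MessageSource>> = vec![
        Box::new(MqttSource),
        Box::new(MeaDbSource { pending: vec!["m1".into(), "m2".into()] }),
    ];
    // On each tick the actor polls every source; each source's messages
    // are routed only to its own flow, with no MessageSource enum tag needed.
    let drained: Vec<Message> = sources.iter_mut().flat_map(|s| s.on_interval()).collect();
    assert_eq!(drained, vec!["m1".to_string(), "m2".to_string()]);
}
```

Adding a new source kind then means implementing the trait, with no change to the flow actor itself.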

didier-wenzek and others added 8 commits October 13, 2025 12:00
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: James Rhodes <jarhodes314@gmail.com>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: James Rhodes <jarhodes314@gmail.com>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: James Rhodes <jarhodes314@gmail.com>