-
Notifications
You must be signed in to change notification settings - Fork 70
poc: rebase tedge mea db onto main #3779
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! 🚀 New features to boost your workflow:
|
be7a950 to
a8a4dd9
Compare
Database comparisonI've been comparing two database implementations. The first is fjall, an embedded key-value store written in Rust (what we settled on for the original hackathon implementation). The other is sqlite. Sqlite is advantageous if we want to expose the database for other users. PerformanceTo measure the perfomance of the database APIs tedge-flows is using, I created a simple benchmark that inserts batches of data to three different series in a loop, then drains one of the series entirely. In practice, database IO will also have a necessary cost of invoking a javascript-based flow, but since that will execute identically with either database backend in use, I didn't attempt to measure the impact of this. In terms of code-changes, I changed very little of my original implementation before measuring the performance. The only change I made was to use transactions when inserting multiple values, as this has a hugely significant impact on sqlite insertion performance. I used the equivalent "batch" feature of fjall as well, though in this case, this had a more modest impact on insert performance. In terms of performance, at least with my current implementation, fjall is significantly quicker (>10x) to insert to and somewhat quicker to drain from. Sqlite takes up a bit less disk space, but not massively so. > ./target/release/db_bench fjall bench2
Benchmarking fjall backend with 15000 inserts and 5000 drains from bench.fjall
Inserted 15000 items (across 15 batches of 1000) in 18.550153ms
Drained 5000 items in 9.597644ms
> ./target/release/db_bench sqlite
Benchmarking sqlite backend with 15000 inserts and 5000 drains from bench.sqlite
Inserted 15000 items (across 15 batches of 1000) in 300.421601ms
Drained 5000 items in 26.938409ms
> du -sh bench.*
4.2M bench.fjall
3.3M bench.sqliteWith larger insertions, the IO performance is pretty similar to before (though sqlite is quicker for draining the large number of records). It appears the disk usage of fjall is quite significantly lower (~3x) than the disk usage of sqlite. > rm -rf bench.*
> ./target/release/db_bench sqlite 100000
Benchmarking sqlite backend with 1500000 inserts and 500000 drains from bench.sqlite
Inserted 1500000 items (across 15 batches of 100000) in 16.10093498s
Drained 500000 items in 1.294142794s
> ./target/release/db_bench fjall 100000
Benchmarking fjall backend with 1500000 inserts and 500000 drains from bench.fjall
Inserted 1500000 items (across 15 batches of 100000) in 1.890891907s
Drained 500000 items in 931.588034ms
> du -sh bench.*
58M bench.fjall
193M bench.sqliteImpact on binary sizeAdding the fjall based database to the flows feature in the |
e0142da to
fd336ad
Compare
didier-wenzek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current design must be revisited before adding new kind of message sources.
| tokio::select! { | ||
| _ = interval.tick() => { | ||
| let drained_messages = self.drain_db().await?; | ||
| self.on_messages(MessageSource::MeaDB, drained_messages).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This design is a legacy of the POC and is actually not so neat.
What is wrong, is that all the flows are requested for messages drained from the db (in self.drain_db()), and then these messages are produced to all the flows (in self.on_messages()) while at most one flow is interested by each: the former source flow! This is:
- inefficient: messages drained from the DB are even produced to flows unrelated to the MEA DB
- complicated: each message must be attached a source such as
MessageSource::MeaDB - not extensible: a new method is required for each kind of message sources (similar to
self.drain_db()).
A better approach would be to add a on_interval() method on the sources themselves. Only the messages produced by the source of a flow will then be processed by the flow. This can even be improved with MessageSource trait. But the key point is that the flow actor has no more to be updated for each new source kind.
| pub fn accept(&self, source: MessageSource, message_topic: &str) -> bool { | ||
| match &self.input { | ||
| FlowInput::Mqtt { | ||
| topics: input_topics, | ||
| } => source == MessageSource::Mqtt && input_topics.accept_topic_name(message_topic), | ||
| FlowInput::MeaDB { .. } => source == MessageSource::MeaDB, | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the proposed design, a flow is no more requested to accept or not a message that has been produced by a random source. A flow will only have to process messages produced by its own source.
| } | ||
| } | ||
|
|
||
| impl FlowInput { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main idea of the proposed design is to let the source generates messages on some configured interval.
impl FlowInput {
pub async fn on_interval(
&mut self,
timestamp: &DateTime,
) -> Result<Vec<Message>, FlowError> {
match self {
FlowInput::Mqtt { topics } => Ok(vec![]),
FlowInput::MeaDB { .. } => {
drain_db().await
}
}
}
}
8749b8e to
22a4149
Compare
449c5ce to
95c2d7a
Compare
95c2d7a to
776faa3
Compare
776faa3 to
593725e
Compare
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: James Rhodes <jarhodes314@gmail.com>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: James Rhodes <jarhodes314@gmail.com>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: James Rhodes <jarhodes314@gmail.com>
633c167 to
4d3322f
Compare
Proposed changes
Types of changes
Paste Link to the issue
#510
Checklist
just prepare-devonce)just formatas mentioned in CODING_GUIDELINESjust checkas mentioned in CODING_GUIDELINESFurther comments