Comments about design document #34

agustinmista · 2025-11-07T08:54:43Z

Hey 👋

For lack of a better way, I will try to add comments about the design document here.

agustinmista · 2025-11-07T09:45:01Z

cardano-conformance-testing-of-consensus/docs/design.md

Lines 25 to 36 in 72c4c13

    The tests we'd like to expose to alternative implementations are in the "Node
    vs Environment" style. In effect, while we are ultimately interested in the
    behavior of multiple nodes agreeing on the "right" chain, we can more easily
    test by taking advantage of two insights:
    1. The logic of identifying the honest chain locally is tricky, but it is very
       easy to identify globally. Since we are only interested in cases where there
       is a global best chain, we have a very simple judgment rule as to whether
       a node has selected the correct one.
    2. Once we have an easily identified honest chain, we no longer need to
       simulate multiple nodes and look for agreement - instead, we run only
       a single node and judge the correctness of its responses to stimuli.

While it's important to acknowledge that most interesting properties cannot be expressed in terms of a single isolated node (which is a good argument for a more sophisticated testing approach involving point schedules), there exist, to the best of my limited knowledge, some properties that always hold in a vacuum. E.g.:

- Honest nodes should always choose the longest of two competing chains (modulo details I'm probably not aware of). For Peras: s/longest/heaviest/.
- Honest nodes should always disconnect from nodes sending them invalid data (i.e., treat them as adversarial).
- (Specific to Peras) Honest nodes that haven't voted for a block in two consecutive rounds must not vote again for at least R rounds (i.e., they must effectively enter a cooldown period and stay in it until it ends).
- Probably many more...

I think what makes these interesting is that they usually admit a much simpler testing strategy (maybe even unit tests). They are also often easier to debug when something goes wrong.
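
To illustrate how small such a check can be, here is a minimal Python sketch of the first property written as a randomized unit test. `select_chain` and `prop_prefers_better_chain` are hypothetical names. The selector is a toy stand-in: a chain is a list of block IDs, and only a strictly better chain replaces the current one. A real test would swap it for the implementation's own chain-selection function. The `weight` parameter is where s/longest/heaviest/ would go for Peras.

```python
import random
from typing import Callable, List

Chain = List[int]  # hypothetical: a chain is just a list of block IDs


def select_chain(current: Chain, candidate: Chain,
                 weight: Callable[[Chain], int] = len) -> Chain:
    """Hypothetical chain-selection rule: switch only to a strictly better chain.

    With weight=len this is "longest chain wins"; for Peras one would pass a
    weight function accounting for Peras-specific weight (not modeled here).
    """
    return candidate if weight(candidate) > weight(current) else current


def prop_prefers_better_chain(select: Callable[[Chain, Chain], Chain] = select_chain,
                              trials: int = 1000, seed: int = 0) -> bool:
    """Property: of two chains with different lengths, the longer one is selected."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.randrange(100) for _ in range(rng.randrange(20))]
        b = [rng.randrange(100) for _ in range(rng.randrange(20))]
        if len(a) == len(b):
            continue  # tie-breaking rules are out of scope for this sketch
        if len(select(a, b)) != max(len(a), len(b)):
            return False
    return True
```

Passing a deliberately broken selector (e.g. `lambda cur, cand: cur`) makes the property return `False`. This is the kind of quick, easy-to-debug failure described above.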

Maybe it makes sense to mention (and I might be very wrong here) that the goal of this project is to provide an effective testing strategy to be used after all these simpler smoke tests have passed?

agustinmista · 2025-11-07T10:03:19Z

cardano-conformance-testing-of-consensus/docs/design.md

Lines 51 to 55 in 72c4c13

    Whilst the point schedule currently is implemented inside the Haskell node, its
    declarative nature makes it possible to export this testing method and make it
    usable across diverse node implementations. To ensure this, we will look only
    at the messages sent over the network, to ensure we are performing black-box
    testing.

If I understand correctly, the abstract consensus protocol is relatively generic. In particular, I'm wondering whether one could instantiate it with a node-to-node communication mechanism other than the network stack (e.g., IPC via message queues).

In other words, is "we will look only at the messages sent over the network" a constraint derived from the SUT or a design choice in the conformance test suite?
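
To make the question concrete, here is a minimal Python sketch. It assumes the conformance harness only talks to the node through some message-passing interface. `Transport`, `QueueTransport`, `queue_pair`, and `echo_peer` are hypothetical names. The point is that a harness observing only what crosses `send`/`recv` cannot tell whether the bytes went over TCP or through in-process queues standing in for IPC.

```python
import queue
from typing import Protocol


class Transport(Protocol):
    """Hypothetical transport interface: the harness only ever sees messages."""

    def send(self, msg: bytes) -> None: ...

    def recv(self, timeout: float) -> bytes: ...


class QueueTransport:
    """In-process transport backed by a pair of queues (a stand-in for IPC)."""

    def __init__(self, outbox: "queue.Queue[bytes]", inbox: "queue.Queue[bytes]") -> None:
        self._outbox = outbox
        self._inbox = inbox

    def send(self, msg: bytes) -> None:
        self._outbox.put(msg)

    def recv(self, timeout: float) -> bytes:
        return self._inbox.get(timeout=timeout)  # raises queue.Empty on timeout


def queue_pair() -> "tuple[QueueTransport, QueueTransport]":
    """Two connected endpoints: what one sends, the other receives."""
    a_to_b: "queue.Queue[bytes]" = queue.Queue()
    b_to_a: "queue.Queue[bytes]" = queue.Queue()
    return QueueTransport(a_to_b, b_to_a), QueueTransport(b_to_a, a_to_b)


def echo_peer(t: Transport) -> None:
    """Toy 'node': answers one message by echoing it back."""
    t.send(t.recv(timeout=1.0))
```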

ninioArtillero Nov 7, 2025
Collaborator

FTR, it would seem to be both: we want to minimize the implementation burden on client nodes, so in this sense it is a constraint we derive from (for the benefit of) the SUT. On the other hand, it is also a design choice, but admittedly one whose limitations we still need to weigh against the complete testing scope (to be defined, but including most of the existing tests).

agustinmista · 2025-11-07T10:20:52Z

cardano-conformance-testing-of-consensus/docs/design.md

Lines 68 to 70 in 72c4c13

    2. **Reusablity:** whenever possible, we should reuse existing machinery.
       If direct reuse is not possible, we will surgically modify existing code to
       support our new use cases.

I think the consensus implementation does a good job of being test-friendly. One notable quirk is the pervasive use of explicit records of functions to implement interfaces. This is rather boring Haskell-wise, but it makes testing implementations against mocked dependencies relatively easy.
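
For readers less familiar with the pattern, here is a rough Python analogue of a record of functions used as an interface. Haskell code would use a record whose fields are functions; a dataclass with callable fields plays the same role here. `BlockStore`, `extend_tip`, and `mock_block_store` are hypothetical names, not the consensus API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BlockStore:
    """Hypothetical interface as a record of functions (a Haskell-style 'handle')."""

    get_tip: Callable[[], int]
    add_block: Callable[[int], None]


def extend_tip(store: BlockStore) -> int:
    """Code under test: only talks to the store through the record's fields."""
    store.add_block(store.get_tip() + 1)
    return store.get_tip()


def mock_block_store() -> BlockStore:
    """Mock implementation backed by a plain list; no real database needed."""
    blocks: List[int] = []
    return BlockStore(
        get_tip=lambda: blocks[-1] if blocks else 0,
        add_block=blocks.append,
    )
```

Because the code under test receives its dependencies as plain values, swapping in the mock is just passing a different record.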

agustinmista · 2025-11-07T10:30:02Z

cardano-conformance-testing-of-consensus/docs/design.md

Lines 71 to 73 in 72c4c13

    3. **Congruence:** when given the same inputs, the system will return the same
       output (subject to the robustness of the NUT.) One particularly salient
       corollary of this is that the system must be *stateless.*

Does this assume that the SUT makes no non-deterministic internal choices?

Or are you perhaps planning to control its sources of non-determinism (RNG, scheduler, etc.) somehow, along the lines of what Antithesis* does?

[*] I think this is already partly covered by io-sim and the IOLike interface, but that's Haskell-specific.
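
Here is a minimal Python sketch of one way to control a single source of non-determinism. It assumes the SUT lets the harness inject a seed. Randomness flows through an explicit `random.Random` instance instead of global state, so the same inputs and seed give the same output. `pick_peers` and `run_node` are hypothetical. This does not cover scheduler non-determinism, which would need heavier machinery (io-sim in Haskell, or what Antithesis does).

```python
import random
from typing import List


def pick_peers(peers: List[str], k: int, rng: random.Random) -> List[str]:
    """Hypothetical node logic with an internal random choice.

    Taking the RNG as an explicit argument (instead of using the global
    `random` module) lets a test harness fix the seed and replay the choice.
    """
    return rng.sample(peers, k)


def run_node(seed: int) -> List[str]:
    """Hypothetical entry point: all randomness derives from `seed`."""
    rng = random.Random(seed)
    peers = [f"peer-{i}" for i in range(10)]
    return pick_peers(peers, 3, rng)
```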

agustinmista · 2025-11-07T10:51:32Z

cardano-conformance-testing-of-consensus/docs/design.md

Lines 193 to 195 in 72c4c13

    These operations provide the primitives needed to orchestrate a QuickCheck-like
    workflow. For example, users are free to run the entire test suite by looping
    over `testgen list-classes`.

One detail that would be missing from such a loop is the bias on test size. By default, QuickCheck tries smaller inputs first and progressively increases* the generation size over time. This is generally seen as a good thing, as small tests are often faster to run and can be shrunk in fewer steps.

Perhaps this is something your generic interface could also expose, either by encoding the notion of size in the order of the results of `list-classes`, or by making it an explicit input and passing it down to the underlying generator.

This would allow users to write something with the same size bias, like:

    test_classes = testgen list-classes
    max_size = 10

    # sizes 1..max_size inclusive, smallest first
    for size in range(1, max_size + 1):
      for test_class in test_classes:
        test = generate test_class --size=size
        result = run test
        ...
      end
    end

[*] This is not entirely true; terms and conditions apply.
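
Here is a rough Python rendering of the loop above. It uses hypothetical stand-ins for `testgen list-classes` and for a size-aware generator; the size input is the one proposed above. Sizes run from 1 up to `max_size` inclusive, and the loop stops at the first failure, so failures tend to be reported on small inputs.

```python
import random
from typing import Callable, List, Optional, Tuple


def list_classes() -> List[str]:
    """Hypothetical stand-in for `testgen list-classes`."""
    return ["chain-selection", "rollback"]


def generate(test_class: str, size: int, rng: random.Random) -> List[int]:
    """Hypothetical size-aware generator: a test is a sorted list of at most
    `size` distinct slot numbers (test_class is ignored in this toy version)."""
    return sorted(rng.sample(range(1000), rng.randint(0, size)))


def run_suite(run: Callable[[str, List[int]], bool], max_size: int = 10,
              seed: int = 0) -> Optional[Tuple[int, str, List[int]]]:
    """Run every class at sizes 1..max_size, smallest first.

    Returns the first failure as (size, test_class, test), or None if all pass.
    """
    rng = random.Random(seed)
    for size in range(1, max_size + 1):
        for test_class in list_classes():
            test = generate(test_class, size, rng)
            if not run(test_class, test):
                return (size, test_class, test)
    return None
```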

agustinmista · 2025-11-07T10:59:47Z

cardano-conformance-testing-of-consensus/docs/design.md

Lines 242 to 244 in 72c4c13

    * Do we need a separate peer to act as our state observer? Maybe not, but it's
      conceptually clearer to have a peer whose sole job is to collect data. As
      a counterexample, what happens when the peer schedule is empty?

Wouldn't having an isolated observer node severely limit what one could observe from the NUT?

I imagine there are other observable effects of the NUT around which it could be useful to define properties, e.g., certain error codes in logs, the number of open file handles, or even memory utilization (especially its derivative).

These are probably not needed to validate the correctness of a consensus implementation, but one could also argue that any sensible implementation should also stay within certain margins so as not to destabilize its peers (e.g., by failing to react to stimuli quickly enough).

Maybe I'm trying to look too far ahead into the fog. Feel free to disregard 😂
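
As a sketch of the kind of non-protocol property mentioned above, here is a Python check on memory growth rate. It assumes some external process has already sampled the NUT's memory usage as `(timestamp, bytes)` pairs; how to collect these is out of scope here. `max_growth_rate`, `memory_within_margin`, and the threshold are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (timestamp in seconds, memory in bytes)


def max_growth_rate(samples: List[Sample]) -> float:
    """Largest memory growth rate (bytes/second) between consecutive samples."""
    rates = [
        (m1 - m0) / (t1 - t0)
        for (t0, m0), (t1, m1) in zip(samples, samples[1:])
        if t1 > t0  # skip samples with non-increasing timestamps
    ]
    return max(rates, default=0.0)


def memory_within_margin(samples: List[Sample], max_rate: float) -> bool:
    """Hypothetical property: memory may grow, but never faster than max_rate."""
    return max_growth_rate(samples) <= max_rate
```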

agustinmista · 2025-11-07T11:14:04Z

cardano-conformance-testing-of-consensus/docs/design.md

Lines 301 to 302 in 72c4c13

    As a possible exception, we will assume that we can run `cardano-node` over
    `TestBlock`s. This might require a fork.

I would try to mention the tradeoffs of this somewhere.

IIUC:

- not being able to validate some properties around cryptographic verification
- possibly not representative of a real node w.r.t. memory footprint, latency, etc.

agustinmista · 2025-11-07T11:30:56Z

cardano-conformance-testing-of-consensus/docs/design.md

Lines 411 to 413 in 72c4c13

    As a stretch goal: it would be very rewarding to actually find a test failure in
    another implementation. Care must be taken to ensure that this is, however, a
    failure in the node, and not some artifact of our testing procedure.

This would be excellent, but it can't be the only evidence in favor of your new testing strategy, especially not at this late stage in the project.

Have you considered injecting bugs into Cardano's consensus implementation and making sure the tool finds them before moving on to other implementations? Note that this can bite you in the ass if used too heavily as feedback when developing the tool: it's very tempting both to bias the generation strategy towards the bugs you can come up with, and to forget that other kinds of bugs exist that you just haven't thought of yet. Still, it would be an early indication of whether the tool can find bugs at all.
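
This is essentially mutation testing. Here is a toy Python sketch in which everything is hypothetical. `reference_select` stands in for the real chain-selection logic, and `MUTANTS` are hand-injected bugs. `tool_finds_bug` stands in for the testing tool; it compares the behavior under test against the reference on random inputs. The mutation score is the fraction of mutants the tool detects.

```python
import random
from typing import Callable, Dict, List

Chain = List[int]
Selector = Callable[[Chain, Chain], Chain]


def reference_select(current: Chain, candidate: Chain) -> Chain:
    """Stand-in for the real chain-selection logic: strictly longer chain wins."""
    return candidate if len(candidate) > len(current) else current


# Hand-written mutants, each injecting one plausible bug.
MUTANTS: Dict[str, Selector] = {
    "never_switches": lambda cur, cand: cur,
    "always_switches": lambda cur, cand: cand,
    "prefers_shorter": lambda cur, cand: cand if len(cand) < len(cur) else cur,
}


def tool_finds_bug(select: Selector, trials: int = 200, seed: int = 0) -> bool:
    """Toy 'testing tool': random chain pairs, checked against the reference."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.randrange(100) for _ in range(rng.randrange(10))]
        b = [rng.randrange(100) for _ in range(rng.randrange(10))]
        if select(a, b) != reference_select(a, b):
            return True
    return False


def mutation_score() -> float:
    """Fraction of injected bugs that the tool detects."""
    killed = sum(tool_finds_bug(m) for m in MUTANTS.values())
    return killed / len(MUTANTS)
```

A score below 1.0 points at a blind spot in the generator. As noted above, though, tuning the generator until every hand-written mutant is killed risks overfitting to the bugs you happened to imagine.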

agustinmista · 2025-11-07T11:45:25Z

cardano-conformance-testing-of-consensus/docs/design.md

Lines 444 to 445 in 72c4c13

    Do we need to simulate time? This might be related to configuration access to
    node timeouts (as Network delays would be irrelevant in this setting).

This is a nugget I learnt from people working at Antithesis: you can do all kinds of fancy shenanigans by intercepting* the time(), sleep(), alarm(), etc. calls of an arbitrary binary. This allows them to run tests at 2x or 0.5x speed, or even to vary the pace of time over time.

Not sure this is helpful in any way, but the idea of doing something like that sounds fun as hell 😈

[*] Via something like ptrace, or LD_PRELOAD (the latter only if the binary is dynamically linked and calls these functions through libc's wrappers).
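
Here is a Python-level analogue of the trick, as a minimal sketch. Instead of ptrace or LD_PRELOAD, it uses `unittest.mock.patch` to redirect `time.monotonic` and `time.sleep` for code running in the same interpreter. It therefore only affects Python code that looks these functions up through the `time` module at call time. Time here is fully virtual (`sleep` returns immediately), which is the extreme case of speeding a test up; scaling by a fixed factor would be a small variation. `VirtualClock`, `wait_for_timeout`, and `run_with_virtual_time` are hypothetical names.

```python
import time
from typing import Any, Callable
from unittest import mock


class VirtualClock:
    """Virtual time: sleep() advances the clock instantly instead of blocking."""

    def __init__(self, start: float = 0.0) -> None:
        self._now = start

    def monotonic(self) -> float:
        return self._now

    def sleep(self, seconds: float) -> None:
        if seconds < 0:
            raise ValueError("sleep length must be non-negative")
        self._now += seconds


def wait_for_timeout(timeout: float, poll: float = 1.0) -> float:
    """Code under test: polls until `timeout` seconds have elapsed."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        time.sleep(poll)
    return time.monotonic() - start


def run_with_virtual_time(fn: Callable[..., Any], *args: Any) -> Any:
    """Run fn with time.monotonic/time.sleep redirected to a VirtualClock."""
    clock = VirtualClock()
    with mock.patch("time.monotonic", clock.monotonic), \
            mock.patch("time.sleep", clock.sleep):
        return fn(*args)
```

With this, a 90-second timeout in `wait_for_timeout` completes almost instantly in wall-clock time, while the code under test still observes 90 seconds passing.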
