zachs18 (Contributor) commented Nov 9, 2025

These are probably not super useful optimizations, but they improve the performance of `vec![expr; LARGE_LENGTH]` for some `expr`s, e.g.:

  • array of length zero in debug mode
  • tuple containing () and zero-valued integers in debug and release mode
  • array of () or other zero-sized IsZero type in debug mode
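All of the cases listed above involve element types that are either zero-sized or whose values consist of zero bytes. That property is what allows `vec!` to use a zeroed allocation instead of cloning the element `n - 1` times. The following small check of the sizes involved uses only `std::mem::size_of`. The helper names `is_zst` and `element_sizes` are illustrative, and `NonCopyZst` mirrors the benchmark's type:

```rust
use std::mem::size_of;

/// Mirrors `NonCopyZst` from the benchmark; only its size matters here.
#[allow(dead_code)]
struct NonCopyZst;

/// Returns `true` if values of type `T` occupy no memory.
fn is_zst<T>() -> bool {
    size_of::<T>() == 0
}

/// Sizes (in bytes) of the element types listed above.
fn element_sizes() -> [(&'static str, usize); 4] {
    [
        ("[String; 0]", size_of::<[String; 0]>()),
        ("[(); 37]", size_of::<[(); 37]>()),
        ("[NonCopyZst; 0]", size_of::<[NonCopyZst; 0]>()),
        // Not zero-sized (a `u8`, a `u16`, and padding), but for the value
        // `(0u8, (), 0u16)` every data byte is zero.
        ("(u8, (), u16)", size_of::<(u8, (), u16)>()),
    ]
}
```

The first three types need no allocation at all, so the only remaining cost of `vec![elem; n]` is the clone loop. The tuple needs real memory, but a zeroed allocation can provide it without writing each element.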
very rough benchmarks
```rust
use std::time::Instant;
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

// A zero-sized type whose `Clone` impl has an observable side effect
// (it counts calls), so `vec!` must keep cloning it.
struct NonCopyZst;
static COUNTER: AtomicUsize = AtomicUsize::new(0);

impl Clone for NonCopyZst {
    fn clone(&self) -> Self {
        COUNTER.fetch_add(1, Relaxed);
        Self
    }
}

// Evaluates `$e`, drops the result, and prints the elapsed time.
macro_rules! timeit {
    ($e:expr) => {
        let start = Instant::now();
        _ = $e;
        println!("{:56}: {:?}", stringify!($e), start.elapsed());
    };
}

fn main() {
    timeit!(vec![[String::from("hello"); 0]; 1_000_000_000]); // gets a lot better in debug mode
    timeit!(vec![(0u8, (), 0u16); 1_000_000_000]); // gets a lot better in debug *and* release mode
    timeit!(vec![[[(); 37]; 1_000_000_000]; 1_000_000_000]); // gets a lot better in debug mode
    timeit!(vec![[NonCopyZst; 0]; 1_000_000_000]); // gets a lot better in debug mode
    timeit!(vec![[[1u8; 0]; 1_000_000]; 1_000_000]); // gets a little bit better in debug mode
    timeit!(vec![[[(); 37]; 1_000_000]; 1_000_000]); // gets a little bit better in debug mode
    timeit!(vec![[[1u128; 0]; 1_000_000]; 1_000_000]); // gets a little bit better in debug mode

    // check that we don't regress existing optimizations
    timeit!(vec![(0u8, 0u16); 1_000_000_000]); // about the same time
    timeit!(vec![0u32; 1_000_000_000]); // about the same time

    // check that we still call clone for non-IsZero ZSTs.
    // `vec![elem; n]` clones `elem` n - 1 times and moves `elem` into the last slot,
    // so this is 999 array clones of 2 element clones each: 1998.
    timeit!(vec![[const { NonCopyZst }; 2]; 1_000]); // about the same time
    assert_eq!(COUNTER.load(Relaxed), 1998);
    // 10_000 - 1 = 9_999 more clones
    timeit!(vec![NonCopyZst; 10_000]); // about the same time
    assert_eq!(COUNTER.load(Relaxed), 1998 + 9_999);
}
```
```
$ cargo +nightly run
// ...
vec![[String::from("hello"); 0]; 1_000_000_000]         : 11.13999724s
vec![(0u8, (), 0u16); 1_000_000_000]                    : 5.254646651s
vec![[[(); 37]; 1_000_000_000]; 1_000_000_000]          : 2.738062531s
vec![[NonCopyZst; 0]; 1_000_000_000]                    : 9.483690922s
vec![[[1u8; 0]; 1_000_000]; 1_000_000]                  : 2.919236ms
vec![[[(); 37]; 1_000_000]; 1_000_000]                  : 2.927755ms
vec![[[1u128; 0]; 1_000_000]; 1_000_000]                : 2.931486ms
vec![(0u8, 0u16); 1_000_000_000]                        : 19.46µs
vec![0u32; 1_000_000_000]                               : 9.34µs
vec![[const { NonCopyZst }; 2]; 1_000]                  : 31.88µs
vec![NonCopyZst; 10_000]                                : 36.519µs
$ cargo +dev run
// ...
vec![[String::from("hello"); 0]; 1_000_000_000]         : 4.12µs
vec![(0u8, (), 0u16); 1_000_000_000]                    : 16.299µs
vec![[[(); 37]; 1_000_000_000]; 1_000_000_000]          : 210ns
vec![[NonCopyZst; 0]; 1_000_000_000]                    : 210ns
vec![[[1u8; 0]; 1_000_000]; 1_000_000]                  : 170ns
vec![[[(); 37]; 1_000_000]; 1_000_000]                  : 110ns
vec![[[1u128; 0]; 1_000_000]; 1_000_000]                : 140ns
vec![(0u8, 0u16); 1_000_000_000]                        : 11.56µs
vec![0u32; 1_000_000_000]                               : 10.71µs
vec![[const { NonCopyZst }; 2]; 1_000]                  : 36.08µs
vec![NonCopyZst; 10_000]                                : 73.21µs
```

(checking release mode to make sure this doesn't regress perf there)

```
$ cargo +nightly run --release
// ...
vec![[String::from("hello"); 0]; 1_000_000_000]         : 70ns
vec![(0u8, (), 0u16); 1_000_000_000]                    : 1.269457501s
vec![[[(); 37]; 1_000_000_000]; 1_000_000_000]          : 10ns
vec![[NonCopyZst; 0]; 1_000_000_000]                    : 20ns
vec![[[1u8; 0]; 1_000_000]; 1_000_000]                  : 10ns
vec![[[(); 37]; 1_000_000]; 1_000_000]                  : 20ns
vec![[[1u128; 0]; 1_000_000]; 1_000_000]                : 20ns
vec![(0u8, 0u16); 1_000_000_000]                        : 20ns
vec![0u32; 1_000_000_000]                               : 20ns
vec![[const { NonCopyZst }; 2]; 1_000]                  : 2.66µs
vec![NonCopyZst; 10_000]                                : 13.39µs
$ cargo +dev run --release
vec![[String::from("hello"); 0]; 1_000_000_000]         : 90ns
vec![(0u8, (), 0u16); 1_000_000_000]                    : 30ns
vec![[[(); 37]; 1_000_000_000]; 1_000_000_000]          : 20ns
vec![[NonCopyZst; 0]; 1_000_000_000]                    : 30ns
vec![[[1u8; 0]; 1_000_000]; 1_000_000]                  : 20ns
vec![[[(); 37]; 1_000_000]; 1_000_000]                  : 20ns
vec![[[1u128; 0]; 1_000_000]; 1_000_000]                : 20ns
vec![(0u8, 0u16); 1_000_000_000]                        : 30ns
vec![0u32; 1_000_000_000]                               : 20ns
vec![[const { NonCopyZst }; 2]; 1_000]                  : 3.52µs
vec![NonCopyZst; 10_000]                                : 17.13µs
```

The specific expression whose performance issue led to this PR is `vec![[(); LARGE]; LARGE]`. I was trying to demonstrate `Vec::into_flattened` panicking on length overflow in the playground, but got a timeout error instead, because `vec![[(); LARGE]; LARGE]` took so long to run in debug mode (it runs fine on the playground in release mode).
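`Vec::into_flattened` turns a `Vec<[T; N]>` into a `Vec<T>` of length `len * N`. Its documentation says it panics if that length would overflow `usize`, which is only possible for zero-sized `T`. The following is a minimal sketch of that kind of demonstration, with lengths chosen so the `vec!` call stays cheap even in debug mode. The helpers `flattened_len` and `flatten_checked` are hypothetical:

```rust
use std::panic;

/// The length `Vec::into_flattened` has to produce, or `None` on overflow.
fn flattened_len<T, const N: usize>(v: &[[T; N]]) -> Option<usize> {
    v.len().checked_mul(N)
}

/// Flattens `v`, turning the overflow panic from `into_flattened` into `None`.
fn flatten_checked<const N: usize>(v: Vec<[(); N]>) -> Option<Vec<()>> {
    panic::catch_unwind(move || v.into_flattened()).ok()
}
```

Two outer elements are enough to overflow with `N = usize::MAX`. Because `vec![x; 2]` clones `x` only once, building such a vector is fast regardless of this PR. A large outer length is what makes the clone loop slow in debug mode.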

rustbot added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-libs (Relevant to the library team, which will review and decide on the PR/issue.) labels Nov 9, 2025
rustbot (Collaborator) commented Nov 9, 2025

r? @joboet

rustbot has assigned @joboet.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Comment on lines 66 to 68

```rust
// We could probably just return `true` here, since implementing
// `IsZero` for a zero-sized type such that `self.is_zero()` returns
// `false` would be useless, but to be safe we check anyway.
```
Member

I thought so too at first, but that would conflict with the above implementation: any `[T; N]` now implements `IsZero`, so e.g. `[NonTrivialCloneButZST; 5]` would hit this code path but mustn't be zero-initialised.

Contributor Author

zachs18 Nov 9, 2025

This is in the specialization where T: IsZero, so this would not be run for T = NonTrivialCloneButZST, right?

Contributor Author

Oh, I see what you mean, if T = [NonTrivialCloneButZST; 5], it would be wrong to return true unconditionally.
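The following stable-Rust sketch makes the reviewer's point concrete. `IsZeroSketch` is a hypothetical stand-in for `alloc`'s private `IsZero` trait, since the real one relies on specialization. `NonTrivialCloneButZst` is a hypothetical zero-sized type that implements the trait but always answers `false`, the way `[NonTrivialCloneButZST; 5]` does under the new default array impl. The zero-sized branch checks only the first element. That is one plausible shape for the optimization, not necessarily the PR's exact code.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for `alloc`'s private `IsZero` trait: `is_zero`
/// returns `true` only if `vec![value; n]` may use zeroed memory instead of
/// cloning `value`.
trait IsZeroSketch {
    fn is_zero(&self) -> bool;
}

impl IsZeroSketch for () {
    fn is_zero(&self) -> bool {
        true
    }
}

impl IsZeroSketch for u8 {
    fn is_zero(&self) -> bool {
        *self == 0
    }
}

/// Hypothetical zero-sized type whose `Clone` is assumed to have side effects,
/// so it implements the trait but never reports itself as zero.
struct NonTrivialCloneButZst;

impl IsZeroSketch for NonTrivialCloneButZst {
    fn is_zero(&self) -> bool {
        false
    }
}

impl<T: IsZeroSketch, const N: usize> IsZeroSketch for [T; N] {
    fn is_zero(&self) -> bool {
        if size_of::<T>() == 0 {
            // A zero-sized element carries no data, so one element answers for
            // all of them. Returning `true` here unconditionally would wrongly
            // zero-initialize `[NonTrivialCloneButZst; N]` and skip its clones.
            self.first().map_or(true, T::is_zero)
        } else {
            // Assumed length cutoff, loosely mirroring the existing impl's
            // limit on how long an array it scans at runtime.
            N <= 16 && self.iter().all(T::is_zero)
        }
    }
}
```

Under this sketch, `[[(); 37]; 1_000]` is recognized as zero without scanning every element, while `[NonTrivialCloneButZst; 2]` is not, so `vec!` would still call its `clone`.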

Contributor Author

I updated the comment to explain why we can't just return true unconditionally.

@rustbot ready

rustbot added the S-waiting-on-author label (awaiting some action, such as code changes or more information, from the author) and removed the S-waiting-on-review label Nov 9, 2025
Implement IsZero for ().

Implement default `IsZero` for all arrays, only returning true if the array is empty
(making the existing array impl for `IsZero` elements a specialization).

Optimize `IsZero::is_zero` for arrays of zero-sized `IsZero` elements.
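The first two changes listed above can be sketched on stable Rust under the following assumptions. `IsZeroSketch` is a hypothetical stand-in for the private `IsZero` trait. Tuples are shown only at arity 3. The new default array impl, which relies on specialization in the real code, is modeled as a plain function, `default_array_is_zero`, which is also hypothetical.

```rust
/// Hypothetical stand-in for `alloc`'s private `IsZero` trait.
trait IsZeroSketch {
    fn is_zero(&self) -> bool;
}

// Integers count as zero when their value is 0.
impl IsZeroSketch for u8 {
    fn is_zero(&self) -> bool {
        *self == 0
    }
}

impl IsZeroSketch for u16 {
    fn is_zero(&self) -> bool {
        *self == 0
    }
}

// First change: `()` carries no data, so it is always zero.
impl IsZeroSketch for () {
    fn is_zero(&self) -> bool {
        true
    }
}

// A tuple is zero when every field is (shown for arity 3 only). Without the
// `()` impl above, `(0u8, (), 0u16)` would not implement the trait at all.
impl<A: IsZeroSketch, B: IsZeroSketch, C: IsZeroSketch> IsZeroSketch for (A, B, C) {
    fn is_zero(&self) -> bool {
        let (a, b, c) = self;
        a.is_zero() && b.is_zero() && c.is_zero()
    }
}

// Second change, modeled as a free function because stable Rust has no
// specialization: for arrays of *any* element type the default answer is
// "zero only if empty", since an empty array has no elements to clone.
fn default_array_is_zero<T, const N: usize>(_array: &[T; N]) -> bool {
    N == 0
}
```

This is why `[String::from("hello"); 0]` and `[NonCopyZst; 0]` can take the fast path even though neither element type is `IsZero`. Skipping clones of an empty array is unobservable.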
rustbot added the S-waiting-on-review label and removed the S-waiting-on-author label Nov 9, 2025