Missed optimization: redundant calls to test around jumps/moves #169437

@tgross35

Input:

define { i1, i8 } @demo(ptr noalias noundef nonnull readonly align 1 captures(none) %s.0, i64 noundef %s.1) unnamed_addr {
start:
  %_23 = icmp samesign ne i64 %s.1, 0
  br i1 %_23, label %bb3, label %bb1

bb3:
  %v = load i8, ptr %s.0, align 1
  br label %bb1

bb1:
  %_0.sroa.3.0 = phi i8 [ %v, %bb3 ], [ undef, %start ]
  %0 = insertvalue { i1, i8 } poison, i1 %_23, 0
  %1 = insertvalue { i1, i8 } %0, i8 %_0.sroa.3.0, 1
  ret { i1, i8 } %1
}

This creates the following output on x86:

demo2:
        test    rsi, rsi
        je      .LBB0_1
        movzx   edx, byte ptr [rdi]
        test    rsi, rsi
        setne   al
        ret
.LBB0_1:
        test    rsi, rsi
        setne   al
        ret

Note the two possible control flows:

  • test rsi, rsi -> je .LBB0_1 (taken) -> test rsi, rsi, and
  • test rsi, rsi -> je .LBB0_1 (not taken) -> movzx edx, byte ptr [rdi] -> test rsi, rsi

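Since je and movzx do not affect flags, the second test in each flow is redundant. The branch also already determines the result, so each setne could be replaced with a constant: al is 1 on the fall-through path and 0 at .LBB0_1.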
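For illustration, here is a hand-written sketch of what the x86 output could look like with the redundant tests removed. This is not compiler output; using mov al, 1 and xor eax, eax to materialize the constants is an assumption about what a backend might choose.

```asm
# Hand-written sketch, not produced by LLVM.
demo2:
        test    rsi, rsi
        je      .LBB0_1
        movzx   edx, byte ptr [rdi]     # rsi != 0: load the i8 field
        mov     al, 1                   # i1 field is known to be true here
        ret
.LBB0_1:
        xor     eax, eax                # rsi == 0: i1 field is known to be false
        ret                             # dl is left unset (undef in the IR)
```

This keeps a single test and drops both setne instructions.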

AArch64 has something similar: both cmp x1, #0 instructions could be removed and each cset w0, ne replaced with a constant, since the cbz x1 already establishes whether x1 is zero on each path.

demo2:
        cbz     x1, .LBB0_2
        ldrb    w8, [x0]
        cmp     x1, #0
        mov     w1, w8
        cset    w0, ne
        ret
.LBB0_2:
        cmp     x1, #0
        mov     w1, w8
        cset    w0, ne
        ret

(Also, is the mov w1, w8 in .LBB0_2 actually copying a poison/undef value? w8 is never written on that path.)
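As with x86, a hand-written sketch of the AArch64 output one might hope for is shown below. It is not compiler output. Loading directly into w1 and using mov w0, #1 / mov w0, wzr for the constants are assumptions for illustration only.

```asm
// Hand-written sketch, not produced by LLVM.
demo2:
        cbz     x1, .LBB0_2
        ldrb    w1, [x0]        // x1 != 0: load the i8 field into the second return register
        mov     w0, #1          // i1 field is known to be true here
        ret
.LBB0_2:
        mov     w0, wzr         // x1 == 0: i1 field is known to be false
        ret                     // w1 is left as-is (undef in the IR)
```

In this form there are no cmp or cset instructions, and the zero path never reads w8.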

https://llvm.godbolt.org/z/cdxK7qfGz
