Excessive allocations in multithreaded Julia 1.12.0 and later if ChainRulesCore loaded #60447

@jaakkor2

After loading ChainRulesCore on Julia 1.12.0 and later, I see a performance regression (up to 4x slower on Linux with 64 threads) caused by excessive allocations:

  1.777681 seconds (10.79 k allocations: 767.009 MiB, 0.40% gc time, 11 lock conflicts)

vs

 3.181472 seconds (495.00 M allocations: 9.615 GiB, 18.66% gc time, 15 lock conflicts)

  • Reproduces on Windows and Linux
  • Julia 1.10.10 and 1.11.8 are fine; 1.12.0 and later are affected
  • Requires multiple threads (-t auto)
  • Happens after using ChainRulesCore

PS C:\temp> julia +1.12-nightly --project=debug --startup-file=no --threads auto -- foo.jl
Julia Version 1.12.3
Commit b32276809d (2025-12-20 23:30 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 20 × 13th Gen Intel(R) Core(TM) i7-1370P
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, alderlake)
  GC: Built with stock GC
Threads: 20 default, 1 interactive, 20 GC (on 20 virtual cores)
Environment:
  JULIA_DEPOT_PATH = C:\temp\depo4
   Resolving package versions...
     Project No packages added to or removed from `C:\temp\debug\Project.toml`
    Manifest No packages added to or removed from `C:\temp\debug\Manifest.toml`
Status `C:\temp\debug\Project.toml`
  [336ed68f] CSV v0.10.15
  [d360d2e6] ChainRulesCore v1.26.0
  [a93c6f00] DataFrames v1.8.1

  2.558131 seconds (3.78 M allocations: 941.129 MiB, 3.18% gc time, 13 lock conflicts, 392.11% compilation time: 37% of which was recompilation)
  1.777681 seconds (10.79 k allocations: 767.009 MiB, 0.40% gc time, 11 lock conflicts)
  1.927009 seconds (10.79 k allocations: 767.009 MiB, 3.63% gc time, 9 lock conflicts)
  1.828774 seconds (10.79 k allocations: 767.009 MiB, 0.81% gc time, 15 lock conflicts)

  3.526123 seconds (497.47 M allocations: 9.734 GiB, 14.51% gc time, 10 lock conflicts, 202.13% compilation time: 99% of which was recompilation)
  3.181472 seconds (495.00 M allocations: 9.615 GiB, 18.66% gc time, 15 lock conflicts)
  3.251814 seconds (495.00 M allocations: 9.615 GiB, 18.53% gc time, 13 lock conflicts)
  3.301931 seconds (495.00 M allocations: 9.615 GiB, 18.35% gc time, 3 lock conflicts)

foo.jl

using InteractiveUtils
versioninfo()
using Pkg
Pkg.add(["CSV", "ChainRulesCore"])
Pkg.status()

# write a test file
open("foo.txt", "w") do io
    for i in 1:100_000_000
        println(io, "foo")
    end
end

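# baseline: only CSV is loaded (the first run includes compilation)
using CSV
println()
for i in 1:4
    @time df = CSV.read("foo.txt", NamedTuple);
end

# same read again after loading ChainRulesCore
using ChainRulesCore
println()
for i in 1:4
    @time df = CSV.read("foo.txt", NamedTuple);
end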
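
For a before/after comparison that does not rely on reading the @time output by eye, the same numbers can be collected programmatically. This is only a sketch: `measure` and `workload` are hypothetical names that are not part of foo.jl, and `Base.gc_alloc_count` is an internal, non-exported function (the one `@time` itself uses to count allocations), so it may change between Julia versions.

```julia
# Hypothetical helper (not part of foo.jl): run `f` several times and record
# wall time, allocated bytes, and number of allocations for each run.
function measure(f; runs::Integer = 4)
    return map(1:runs) do _
        stats = @timed f()
        # Assumption: Base.gc_alloc_count is internal API and may change.
        allocs = Base.gc_alloc_count(stats.gcstats)
        (time = stats.time, bytes = stats.bytes, allocs = allocs)
    end
end

# Stand-in workload; in the report this would be
# CSV.read("foo.txt", NamedTuple)
workload() = collect(1:1000)

println("threads: ", Threads.nthreads())
for r in measure(workload)
    println(r)
end
```

Calling `measure` once before and once after `using ChainRulesCore` gives two vectors whose `allocs` fields can be compared directly.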

ChainRulesCore is pulled in as a transitive dependency by many popular packages, for example:

  • GLMakie → Makie → KernelDensity → Interpolations → ChainRulesCore
  • NonlinearSolve → LinearSolve → ChainRulesCore
  • OrdinaryDiffEqDefault → LinearSolve → ChainRulesCore
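
Because ChainRulesCore can arrive through chains like these without being loaded explicitly, it may help to check whether it is present in a running session. A minimal sketch, assuming the internal `Base.loaded_modules` table (keyed by `Base.PkgId`) keeps its current shape; `is_loaded` is a hypothetical helper name:

```julia
# Hypothetical helper: true if a package with this name is loaded in the session.
# Assumption: Base.loaded_modules is internal and keyed by Base.PkgId.
is_loaded(name::AbstractString) =
    any(id -> id.name == name, keys(Base.loaded_modules))

println("ChainRulesCore loaded: ", is_loaded("ChainRulesCore"))
```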

Edit: removed DataFrames.jl from the example; using NamedTuple as the sink is enough to reproduce the problem.
