Commit 2aa088d
authored
Thrust → CUDA → PTX → SASS (#25)
* Add: Thrust, CUB, CUDA sorting
This is a draft. It still lacks manual
timing and async scheduling.
* Make: Options for CUDA & TBB in CMake
* Make: Switch to CUDA Toolkit for GPU libs
* Fix: Ranges require `constexpr` on NVCC
* Make: Upgrade `fmt` for NVCC builds
fmtlib/fmt#4297
* Fix: NVCC compilation issues
* Make: Silence NVCC warnings
* Add: Sorting with `thrust` and `cub`
* Add: PTX and `.cuh` kernels
* Make: Don't compile PTX
* Add: Using CUDA Driver API to JIT `.ptx`
* Add: Precompiled CUDA C++ kernels
* Add: cuBLAS benchmarks
* Fix: Compiling `cuBLAS` calls
* Fix: Avoid optimizing-out SASS code
Unless we put an impossible condition with
a `wmma::store_matrix_sync` the result of
fragment multiplication is optimized out.
* Add: Tensor Core intrinsic benchmarks
Targeting `f16`, `bf16`, `tf32`, `f32`, `f64`
on Volta, Turing, and Ampere.
* Make: Build CUDA for multiple platforms
Currently covering Volta, Turing, Ampere,
Ada Lovelace, and Hopper.
* Add: Binary BMMA kernels for GPU
XOR variant for Turing+.
AND variant for Ampere+.
* Docs: Introduce Warp-Group-MMA on Hopper
* Fix: Working PTX kernel
* Fix: Lower PTX version for JIT
* Fix: Use `f16` MMA
* Make: Drop OpenBLAS
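
The "Avoid optimizing-out SASS code" note above can be illustrated with a minimal sketch. It is not the kernel from this commit: the names `tensor_core_loop`, `launch_tensor_core_loop`, and `never_true` are hypothetical, and the guard here is a runtime flag that the caller always passes as zero, which is only one assumed way to form a condition the compiler cannot prove false. It needs a Volta-or-newer target (e.g. `nvcc -arch=sm_70`).

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <mma.h>

using namespace nvcuda;

// Hypothetical benchmark kernel: repeats a 16x16x16 f16 MMA entirely in registers.
__global__ void tensor_core_loop(half *sink, int never_true) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, half> c;
    wmma::fill_fragment(a, __float2half(1.0f));
    wmma::fill_fragment(b, __float2half(1.0f));
    wmma::fill_fragment(c, __float2half(0.0f));

    for (int i = 0; i != 128; ++i) wmma::mma_sync(c, a, b, c);

    // Assumption: `never_true` is always 0 at run time, so the store never runs.
    // Because the compiler cannot prove that, `c` stays observable and the
    // MMA instructions above are kept in the generated SASS.
    if (never_true) wmma::store_matrix_sync(sink, c, 16, wmma::mem_row_major);
}

// Hypothetical host-side launcher: one warp, one 16x16 output tile.
cudaError_t launch_tensor_core_loop(int never_true) {
    half *sink = nullptr;
    cudaError_t status = cudaMalloc(&sink, 16 * 16 * sizeof(half));
    if (status != cudaSuccess) return status;
    tensor_core_loop<<<1, 32>>>(sink, never_true);
    status = cudaGetLastError();                              // launch errors
    if (status == cudaSuccess) status = cudaDeviceSynchronize(); // execution errors
    cudaFree(sink);
    return status;
}
```

Removing the guarded `store_matrix_sync` leaves `c` unused, so the compiler is free to delete the whole loop. Dumping the SASS with `cuobjdump -sass` is one way to check whether the MMA instructions survived.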
6 files changed: +1015 -80 lines changed