Commit 54fd2eb
fix: Allow split page logic to process files concurrently (#175)
# The Issue
`split_pdf_hook.py` does not support processing multiple files
concurrently. This is because we store the split request tasks in
`self.coroutines_to_execute[operation_id]`, where `operation_id` is just
the string "partition". Therefore, if we send two docs concurrently using
the same SDK instance, they'll both try to await the same list of
coroutines. This could result in interleaved results, but mostly it breaks
with `RuntimeError: coroutine is being awaited already` as soon as the
second request starts awaiting coroutines that the first request is
already awaiting. This blocks anyone trying to use the new
`partition_async` to fan out their PDFs.
Note that the js/ts client also has this issue.
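
The failure mode can be reproduced outside the SDK. The following is a minimal sketch, not the hook's actual code: `split_request`, `consume`, and `demo` are hypothetical names, and a single shared coroutine object stands in for the list stored under the constant "partition" key.

```python
import asyncio


async def split_request() -> str:
    # Hypothetical stand-in for one split-page request; the sleep
    # forces a suspension point so a second caller can collide with it.
    await asyncio.sleep(0.01)
    return "elements"


async def consume(shared) -> str:
    # Both callers await the *same* coroutine object, like two requests
    # reading the entry stored under the shared key.
    return await shared


async def demo() -> list:
    shared = split_request()
    # return_exceptions=True returns the failure instead of raising it.
    return await asyncio.gather(
        consume(shared), consume(shared), return_exceptions=True
    )


results = asyncio.run(demo())
print(results)
```

The first caller gets its result, while the second receives a `RuntimeError` because the coroutine is already suspended inside the first caller's `await`.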
# The Fix
We need to use an actual id to index into `coroutines_to_execute`. In
`before_request`, let's make a uuid and build up the list of coroutines
for this doc. We need to pass this id to `after_success` in order to
retrieve the results, so we can set it as a header on our "dummy"
request that's returned to the SDK.
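
As a rough illustration of this approach, here is a simplified sketch. It is not the code in `split_pdf_hook.py`: the class `SplitHookSketch`, the helper `_send_page`, the simplified method signatures, and the header name `operation-id` are all assumptions made for the example.

```python
import asyncio
import uuid
from typing import Any, Coroutine, Dict, List


class SplitHookSketch:
    """Hypothetical stand-in for the hook; not the SDK's real class."""

    def __init__(self) -> None:
        # One list of pending coroutines per request, keyed by a unique id.
        self.coroutines_to_execute: Dict[str, List[Coroutine[Any, Any, str]]] = {}

    def before_request(self, doc_name: str, num_pages: int) -> Dict[str, str]:
        operation_id = str(uuid.uuid4())
        self.coroutines_to_execute[operation_id] = [
            self._send_page(doc_name, page) for page in range(num_pages)
        ]
        # Headers of the "dummy" request carry the id to after_success.
        # The header name is an assumption for this sketch.
        return {"operation-id": operation_id}

    async def after_success(self, headers: Dict[str, str]) -> List[str]:
        # Look up only this request's coroutines and drop the entry.
        coroutines = self.coroutines_to_execute.pop(headers["operation-id"])
        return list(await asyncio.gather(*coroutines))

    async def _send_page(self, doc_name: str, page: int) -> str:
        await asyncio.sleep(0.01)  # simulated network call
        return f"{doc_name}:page{page}"


async def partition(hook: SplitHookSketch, doc_name: str, num_pages: int) -> List[str]:
    headers = hook.before_request(doc_name, num_pages)
    return await hook.after_success(headers)


async def run_concurrently() -> List[List[str]]:
    hook = SplitHookSketch()  # one shared hook, as with one SDK instance
    return list(
        await asyncio.gather(
            partition(hook, "a.pdf", 2), partition(hook, "b.pdf", 3)
        )
    )


results = asyncio.run(run_concurrently())
```

Because each request reads only the entry for its own id, the two documents no longer share coroutine objects even though they share one hook instance.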
# Testing
See the new integration test. We can verify this by sending two docs
serially, then sending them with `asyncio.gather`, and confirming that the
results are the same.

1 parent 4ac3b2d commit 54fd2eb
File tree
3 files changed: +91 -8 lines changed
- _test_unstructured_client/integration
- src/unstructured_client/_hooks/custom
Lines changed: 65 additions & 0 deletions