Can decoding be sped up using concurrency when multiple segments are available (i.e. samples/px > 1 or bytes/px > 1)?
There may also be gains from offering a multi-frame decoder in addition to the single-frame decoder currently available, though memory use would have to be watched. Support for it would also need to be added to pylibjpeg/pydicom.
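A rough sketch of both ideas, for discussion only. It assumes each frame has already been split into its RLE segments (one PackBits-encoded byte plane per segment, so samples/px × bytes/px segments per frame). The names `decode_segment` and `iter_decoded_frames` are hypothetical and not part of any existing API. Note that in standard CPython the GIL stops threads from speeding up CPU-bound pure-Python code, so this only shows the shape of the approach; a real gain would need the per-segment work done in native code that runs outside the GIL.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_segment(data: bytes) -> bytes:
    """Decode a single PackBits-encoded RLE segment (hypothetical helper)."""
    out = bytearray()
    i = 0
    n = len(data)
    while i < n:
        header = data[i]
        i += 1
        if header < 128:
            # Literal run: copy the next (header + 1) bytes unchanged
            out += data[i:i + header + 1]
            i += header + 1
        elif header > 128:
            # Replicate run: repeat the next byte (257 - header) times
            out += data[i:i + 1] * (257 - header)
            i += 1
        # header == 128 is a no-op
    return bytes(out)


def iter_decoded_frames(frames, max_workers=None):
    """Yield the decoded segments of each frame, one frame at a time.

    `frames` is assumed to be an iterable where each item is a list of
    encoded segments for one frame. Segments within a frame are decoded
    concurrently; frames are yielded lazily so that only one decoded
    frame needs to be held in memory at once.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for segments in frames:
            # pool.map preserves the input order of the segments
            yield list(pool.map(decode_segment, segments))
```

Yielding frames from a generator is one way to keep memory use bounded for multi-frame data, since the caller can write each frame into a preallocated output buffer and discard it before the next is decoded.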