This is a joint DataSets and Parquet issue: the root of issue #59 is really that Parquet.jl is currently entirely file-based (cf. JuliaIO/Parquet.jl#145). I would love to have a more seamless (and efficient!) way to work with Parquet files directly, by streaming them and/or grabbing only the parts I need.