
I'm not an expert in the nuts and bolts of Arrow, but I think you have two options:

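- Save to Feather format. Feather (V2) is essentially the Arrow in-memory format written to disk as an Arrow IPC file. If you write it uncompressed and have super fast IO, it'll read back to memory faster, or at least with minimal CPU usage. Note that some writers compress by default (pyarrow's `write_feather` uses LZ4 when it's available), so set the compression option explicitly if you want it uncompressed.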

- Save to compressed Parquet format. Because reads are often IO bound rather than CPU bound, the smaller file may read back to memory faster, at the expense of the CPU time spent decompressing and decoding.

On a modern machine with a fast SSD, I'm not sure which would be faster. If you're saving to remote blob storage e.g. S3, parquet will almost certainly be faster.

See also https://news.ycombinator.com/item?id=34324649




Thanks! Exactly what I was looking for. I'll do some benchmarking of these two options for my workload.



