There are lots of fancy file formats to choose from when building a #datalake, but we still went with gzipped JSON. Why? Because we prioritize moving data into purpose-built systems rather than querying it directly. This basic shift in approach has made a world of difference! Here's a thing I wrote about that:
https://opendatascience.com/choosing-a-data-lake-format-what-to-actually-look-for/