New project alert! Comparqter, a tool that compacts Parquet files and optimises file sizes.

"Centralize Your Data Lake: Apache Polaris Supports Apache Iceberg and Now Delta Lake"
BTW, 'Polaris' used to be the name of the UK nuclear deterrent before 1996.
https://snowflake.com/en/engineering-blog/apache-polaris-supports-iceberg-delta-lake/
First I thought I'd found the Loch Ness Monster...turns out to be Nessie instead.
Project Nessie: Transactional Catalog for Data Lakes with Git-like semantics
"Nessie is to Data Lakes what Git is to source code repositories..."
Ah, the $10/month Lakehouses: because who wouldn't want a bargain-basement data lake with all the charm of a timeshare in purgatory? Just add a sprinkle of buzzwords like "DuckLake" and "time travel" and voilà, you've got a tech article that feels like a 2-hour #infomercial for something you'll never use.
https://tobilg.com/the-age-of-10-dollar-a-month-lakehouses #Lakehouses #DuckLake #DataLake #TechTrends #HackerNews #ngated
Apache Iceberg Deep Dive | Part 1 | Crash Course
Lakehouse #iceberg #Apache_Iceberg #datalake #data ... source
https://quadexcel.com/wp/apache-iceberg-deep-dive-part-1-crash-course/
#TBT... to an entire week ago at #RSAC where Seth Goldhammer had the chance to demo Graylog's data telemetry pipeline management!
Join Seth as he talks about data lakes, data lake previews, getting your data back when you need it, and more.
Wanna learn more about this topic? Here you go: https://graylog.org/post/security-data-lake-strategy/ #RSA #RSAC2025 #datalake #datamanagement #datapipeline
Shifting Left isn’t just a buzzword - it’s the foundation for efficiency in your organization!
By making clean, reliable, and accessible data available across your organization, you reduce complexity and unlock time to focus on higher-value work.
Data products are the foundation of this #ShiftLeft, enabling healthy, scalable data communication.
Dive into the details in the #InfoQ article: https://bit.ly/3WHjxsf
Attended the "Brewing Data with Snowflake" event yesterday in Vilnius.
Some of the key insights:
Full text of one of the slides presented:
Strategic Architecture Outlook
- Agility & Future-Proofing - Open, portable data means you can adopt new technologies or switch platforms with minimal friction. No single vendor can hold your data hostage, so you can evolve your architecture as needed.
- Multi-Cloud and Hybrid - An open data layer can span clouds and on-prem seamlessly. You avoid cloud vendor lock-in and leverage best-of-breed services on different clouds using the same data. This flexibility is key for resilience and optimization.
- Accelerating Innovation - When any team can access data with the tools of their choice, experimentation flourishes. Open data fosters AI/ML and cross-domain analytics since data isn't locked in silos - more innovation and insights from the same data.
- Vendor Leverage - Strategically, using open standards increases your leverage in vendor negotiations. You can opt in or out of services more freely, pushing vendors to provide value (since you're not irreversibly locked to them).
A data lake, in the software world, is essentially where raw data is taken in and turned into something tangible such as reports, often using AI/machine learning, and then put into the data warehouse. #software #datalake #datawarehouse
Demo: SAP Business Data Cloud | SAP Business Unleashed https://youtu.be/OkwQimWDeos?si=UNGdcAVobyMNCkUm via @YouTube
(and find related Videos in the SAP channel - see below)
#SAP #SAPBDC #GenAI #LLM #DataCloud #DataLake #SAPChampions #SAPBW #SAPDatasphere @sap
There is no need to move data. Data latency is minimised. Data can be transformed and analysed within a single platform.
Let me know what you know about Zero-ETL
"Why ETL-Zero? Understanding the shift in Data Integration" by Sarah Lea on Medium: https://medium.com/towards-data-science/why-etl-zero-understanding-the-shift-in-data-integration-as-a-beginner-d0cefa244154
A #ShiftLeft approach to #DataProcessing relies on data products, which form the basis of data communication across the business.
This addresses many flaws in traditional data processing and makes data more relevant, complete, and trustworthy.
#InfoQ article: https://bit.ly/3WHjxsf
The house at the lake, Teil 3 - The Dashboard Diaries: https://blog.sogeo.services/blog/2025/01/26/house-at-the-lake-03.html #Trino #SQL #datalake #datalakehouse #lakehouse #duckdb #apacheiceberg
#ApacheHudi 1.0 is now generally available!
The release introduces new features aimed at transforming data lakehouses into what the project community considers a fully-fledged "Data Lakehouse Management System" (DLMS).
Details on #InfoQ https://bit.ly/3E5AXZi
All in one.
Massively scalable, software-defined storage (#SDS) for modern workloads with support for file-, block- and object-based applications: https://ibm.com/products/ceph
#IBM #RedHat
#IBMStorage
#IBMStorageCeph #DataLake
#IBMtechnology #technology
#IBMStorageRocks
The house at the lake, Teil 2 - Start your engines: https://blog.sogeo.services/blog/2025/01/12/house-at-the-lake-02.html #Spark #ApacheIceberg #SQL #Datalake #Lakehouse #DuckDB
One of the most highlighted parts: "There is no need to move data. Data latency is minimised. Data can be transformed and analysed within a single platform."
This is one of the reasons for 'Why ETL-Zero'
It's December, and that means lightning talk time for our user group! Join us online for some short, powerful insights from both known and new speakers!
Follow the link to sign up, and see you next Tuesday.
#Meetup
#Community
#LightningTalk
#Microsoft
#DataPlatform
#AI
#DataLake
#Azure
#PowerBI
#SQLServer
#Clarity
#DataDriven
https://www.meetup.com/groningen-microsoft-data-meetup-groep/events/298279290/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link