Create a git-like experience for Data Lake analytics

Day 4


The first problem big data posed was feasibility: could data be processed at such a scale at all? Solving that scale problem produced the technologies we know today, such as Kafka, Spark, Presto, and Snowflake.

Now the problem people face is one of manageability. They no longer ask whether they can handle a dataset, but rather: how can I move faster when developing data-intensive applications? How do I use all of my data and ensure it is high quality?

Learn how lakeFS simplifies the management of a data lake by enabling git-like operations over files in object storage. See how common processes such as experimentation, reproducing data, and ensuring data quality are simplified with workflows centered on branching, committing, and merging data. Finally, we'll explain how the innovative Graveler data model used by lakeFS makes these operations not only possible but efficient at any scale.
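
To make the branch, commit, and merge workflow concrete, here is a minimal in-memory sketch in Python. It is a toy model written for illustration only: it is not the lakeFS API and does not reproduce the Graveler data model. The class ToyRepo, its methods, and the object addresses are all hypothetical. It assumes a commit is an immutable mapping from paths to object addresses and a branch is just a pointer to a commit, which is why creating a branch copies no data.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Commit:
    parent: Optional[str]                  # id of the previous commit, None for the root
    message: str
    snapshot: Tuple[Tuple[str, str], ...]  # sorted (path, object address) pairs


class ToyRepo:
    """Hypothetical toy model of git-like versioning over object paths."""

    def __init__(self):
        self.commits = {}
        self.staging = {}  # uncommitted writes, per branch
        self.branches = {"main": self._store(Commit(None, "init", ()))}

    def _store(self, commit):
        # Content-derived id, so identical commits share one entry.
        cid = hashlib.sha256(repr(commit).encode()).hexdigest()[:12]
        self.commits[cid] = commit
        return cid

    def _ancestors(self, cid):
        while cid is not None:
            yield cid
            cid = self.commits[cid].parent

    def snapshot(self, branch):
        return dict(self.commits[self.branches[branch]].snapshot)

    def create_branch(self, name, source):
        # Zero-copy: the new branch points at the same commit as the source.
        self.branches[name] = self.branches[source]

    def put(self, branch, path, address):
        self.staging.setdefault(branch, {})[path] = address

    def commit(self, branch, message, snap=None):
        if snap is None:
            snap = self.snapshot(branch)
            snap.update(self.staging.pop(branch, {}))
        cid = self._store(
            Commit(self.branches[branch], message, tuple(sorted(snap.items())))
        )
        self.branches[branch] = cid
        return cid

    def merge(self, source, dest):
        # Three-way merge against the nearest commit shared by both histories.
        dest_history = set(self._ancestors(self.branches[dest]))
        base_id = next(
            c for c in self._ancestors(self.branches[source]) if c in dest_history
        )
        base = dict(self.commits[base_id].snapshot)
        src, dst = self.snapshot(source), self.snapshot(dest)
        merged = {}
        for path in set(base) | set(src) | set(dst):
            b, s, d = base.get(path), src.get(path), dst.get(path)
            if s == b or s == d:
                result = d          # source left the path unchanged, or both agree
            elif d == b:
                result = s          # only the source changed the path
            else:
                raise ValueError(f"conflict on {path}")
            if result is not None:  # None means the path was deleted
                merged[path] = result
        return self.commit(dest, f"merge {source} into {dest}", merged)
```

In this sketch, an experiment would run on its own branch, so main keeps serving the original objects until the merge moves its pointer to a commit containing the changed paths.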

  • #datavirtualisation
  • #tooling


Invited experts