Reynold Xin

Summit 2021 Keynotes: Data Science and Machine Learning

May 27, 2021 08:30 AM PT

The pursuit of AI is one of the biggest priorities in data today. The Thursday morning keynote will be led by Databricks Co-founder and CEO Ali Ghodsi and will cover advances in data science, machine learning, MLOps and more, in both open source and the Databricks Lakehouse Platform.

We’ll also be joined by data leaders from McDonald’s and Microsoft, as well as the legendary Bill Nye, a scientist, engineer, comedian and author.

Summit 2021 Keynotes: Lakehouse Data Architecture, Data Engineering, and Analytics

May 26, 2021 08:00 AM PT

Join the Wednesday morning keynote to hear from the Databricks co-founders, the original creators of the popular open source projects Apache Spark, Delta Lake, and MLflow, about how the open source community is tackling the biggest challenges in data.

Stay tuned as they reveal some of the latest innovations in data engineering and data analytics to simplify and scale your work.

Wednesday Morning Keynote

November 17, 2020 04:00 PM PT

Welcome from Ali Ghodsi, Databricks

Project Zen: Making Spark Pythonic

Reynold Xin
Co-founder & Chief Architect, Databricks

In this keynote, Reynold Xin, the top contributor to Apache Spark and a PMC member, will review the state of the project and highlight major community developments in the 10th anniversary release and beyond. Reynold will explain how the recent Spark 3.0 release focused on making Spark easier to use, faster, and more compliant with the ANSI SQL standard. With Python representing nearly 70% of notebook commands, he’ll focus on the development of Project Zen, the community effort to make Spark more Pythonic. This includes improvements in development tooling, API design, error handling, and more, to make data scientists and engineers more productive with data.

Demo: Pythonic Spark with Real Koalas

Caryl Yuhas
Sr. Manager, Field Engineering, Databricks

The Rise of the Lakehouse

Ali Ghodsi
Co-founder & CEO
Original Creator of Apache Spark, Databricks

Data warehouses have a long history in decision support and business intelligence applications. But data warehouses were not well suited to dealing with the unstructured, semi-structured, and streaming data common in modern enterprises. This led organizations to build data lakes of raw data about a decade ago. But data lakes, too, lacked important capabilities. The need for a better solution has given rise to the lakehouse architecture, which implements data structures and data management features similar to those of a data warehouse directly on the kind of low-cost storage used for data lakes.

This keynote by Databricks CEO Ali Ghodsi explains how the open source Delta Lake project allows the industry to realize the full potential of the lakehouse architecture. Additionally, Ali will discuss the newly announced SQL Analytics service, which allows users to run traditional analytics on their data lake, instead of moving data out to data warehouses, without sacrificing performance, security, or quality. This service completes the vision of the lakehouse architecture, allowing the data lake to be the single source of truth for all data workloads.

Discussion with Tableau Software

Francois Ajenstat
Chief Product Officer, Tableau Software

Demo: SQL Analytics and the Lakehouse Architecture

Brooke Wenig
Machine Learning Practice Lead, Databricks

How SQL Analytics Makes Lakehouse Fast

Reynold Xin
Co-founder & Chief Architect, Databricks

In this keynote, Reynold Xin, Co-founder and Chief Architect at Databricks, will explore how SQL Analytics brings a new level of performance to data lakes for analytics workloads. Traditionally, data lakes have struggled with analytics because they could not deliver fast query performance with low latency at high user concurrency. Reynold will provide a technical deep dive into how Databricks has addressed these challenges. First, Delta Engine, Databricks' polymorphic vectorized execution engine, delivers extremely fast single-query throughput. Second, the new auto-scaling, SQL-optimized clusters in SQL Analytics make it easy to match compute capacity to user load. And third, optimizations in the new SQL Analytics Endpoints reduce the time required to get query results by up to 6x. Altogether, SQL Analytics provides users with data warehousing performance at data lake economics for their analytics workloads.

Discussion with Peter Boncz

Professor, CWI & Vrije Universiteit Amsterdam

Discussion with Unilever

Phinean Woodward
Head of Architecture, Information and Analytics, Unilever

In this talk, we’ll discuss how the lakehouse architecture has become a critical part of Unilever’s information management infrastructure, limiting traditional enterprise data silos and enabling agile access to the upstream and downstream data needed for faster decision making. As a result, IT is helping Unilever deliver higher-quality predictions in many areas of the business, thereby building trust in AI throughout the company.

Why Data Should Drive the Next Pandemic Response

Malcolm Gladwell
Best-selling author, journalist, and podcast host

Imagine what a data-driven response to the COVID-19 pandemic would have looked like if we could set aside politics and ego. Award-winning author and journalist Malcolm Gladwell discusses the lessons we can learn from the current crisis, and how data and data teams will be critical in solving the world’s toughest problems, including future pandemic outbreaks. He also reveals the essential role that data teams play in his own work every day.

Close

Ali Ghodsi

Summit 2020 Spark + AI Summit 2020: Wednesday Morning Keynotes

June 23, 2020 05:00 PM PT

Ali Ghodsi - Intro to Lakehouse, Delta Lake (Databricks) - 46:40
Matei Zaharia - Spark 3.0, Koalas 1.0 (Databricks) - 17:03
Brooke Wenig - DEMO: Koalas 1.0, Spark 3.0 (Databricks) - 35:46
Reynold Xin - Introducing Delta Engine (Databricks) - 1:01:50
Arik Fraimovich - Redash Overview & DEMO (Databricks) - 1:27:25
Vish Subramanian - Brewing Data at Scale (Starbucks) - 1:39:50

Realizing the Vision of the Data Lakehouse
Ali Ghodsi

Data warehouses have a long history in decision support and business intelligence applications. But data warehouses were not well suited to dealing with the unstructured, semi-structured, and streaming data common in modern enterprises. This led organizations to build data lakes of raw data about a decade ago. But data lakes, too, lacked important capabilities. The need for a better solution has given rise to the data lakehouse, which implements data structures and data management features similar to those of a data warehouse directly on the kind of low-cost storage used for data lakes.

This keynote by Databricks CEO Ali Ghodsi explains why the open source Delta Lake project takes the industry closer to realizing the full potential of the data lakehouse, including new capabilities within the Databricks Unified Data Analytics Platform that significantly accelerate performance. In addition, Ali will announce new open source capabilities to collaboratively run SQL queries against your data lake, build live dashboards, and alert on important changes, making it easier for all data teams to analyze and understand their data.

Introducing Apache Spark 3.0: A Retrospective of the Last 10 Years and a Look Forward to the Next 10 Years
Matei Zaharia and Brooke Wenig

In this keynote from Matei Zaharia, the original creator of Apache Spark, we will highlight major community developments in the release of Apache Spark 3.0 that make Spark easier to use, faster, and compatible with more data sources and runtime environments. Apache Spark 3.0 continues the project’s original goal of making data processing more accessible, through major improvements to the SQL and Python APIs and through automatic tuning and optimization features that minimize manual configuration. This year is also the 10-year anniversary of Spark’s initial open source release, and we’ll reflect on how the project and its user base have grown, as well as how the ecosystem around Spark (e.g., Koalas, Delta Lake and visualization tools) is evolving to make large-scale data processing simpler and more powerful.

Delta Engine: High Performance Query Engine for Delta Lake
Reynold Xin

How Starbucks is Achieving its 'Enterprise Data Mission' to Enable Data and ML at Scale and Provide World-Class Customer Experiences
Vish Subramanian

At Starbucks, we make sure that everything we do is seen through the lens of humanity, from our commitment to the highest-quality coffee in the world to the way we engage with our customers and communities to do business responsibly. A key aspect of ensuring those world-class customer experiences is data. This talk highlights the Enterprise Data Analytics mission at Starbucks, which helps the company make decisions powered by data at tremendous scale. This includes everything from processing data at petabyte scale with governed processes to deploying platforms at the speed of business and enabling ML across the enterprise. This session will detail how Starbucks has built world-class enterprise data platforms to drive world-class customer experiences.

Summit 2019 Official Announcement of Koalas Open Source Project

April 23, 2019 05:00 PM PT

Keynote from Spark + AI Summit 2019: Reynold Xin (Databricks) and Brooke Wenig (Databricks)

Summit Europe 2018 Unifying State-of-the-Art AI and Big Data in Apache Spark

December 10, 2021 06:06 PM PT

Summit 2018 Project Hydrogen: Unifying State-of-the-art AI and Big Data in Apache Spark

June 5, 2018 05:00 PM PT

Big data and AI are joined at the hip: the best AI applications require massive amounts of constantly updated training data to build state-of-the-art models. AI has always been one of the most exciting applications of big data and Apache Spark. Increasingly, Spark users want to integrate Spark with distributed deep learning and machine learning frameworks built for state-of-the-art training. This talk introduces a new project that substantially improves the performance and fault recovery of distributed deep learning and machine learning frameworks on Spark.

Summit 2015 From DataFrames to Tungsten: A Peek into Spark’s Future

June 15, 2015 05:00 PM PT

Summit Europe 2015 A Look Ahead at Spark’s Development

October 28, 2015 05:00 PM PT

Summit East 2016 The Future of Real-Time in Spark

February 17, 2016 04:00 PM PT

Co-founder & Chief Architect, Databricks