Agenda subject to change.
This course covers the new features and changes introduced to Apache Spark and the surrounding ecosystem during the past 12 months. It focuses on Spark 2.4 and 3.0, updates to performance, monitoring, usability, stability, extensibility, PySpark, SparkR, Delta Lake, Pandas, and MLflow. Students will also learn about backwards compatibility with 2.x and the considerations required for...
Ali Ghodsi – Intro to Lakehouse, Delta Lake (Databricks) – 46:40
Matei Zaharia – Spark 3.0, Koalas 1.0 (Databricks) – 17:03
Brooke Wenig – DEMO: Koalas 1.0, Spark 3.0 (Databricks) – 35:46
Reynold Xin – Introducing Delta Engine (Databricks) – 1:01:50
Arik Fraimovich – Redash Overview & DEMO (Databricks) – 1:27:25
Vish Subramanian – Brewing...
Modern enterprises depend on trusted data for AI, analytics, and data science to drive deeper insights and business value. With intelligent and automated data management, you can take advantage of Databricks Delta to gain efficiencies, cost savings, and scale to succeed. In this session we will discuss how customers can modernize their on-premises Data Lake...
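A minimal sketch of the Delta Lake pattern such a migration lands on; the storage paths and DataFrame names are hypothetical, and the cluster is assumed to have the delta-spark package configured:

```python
# Minimal sketch: converting landed Parquet data to a Delta table.
# Paths are hypothetical; assumes delta-spark is configured on the cluster.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-migration").getOrCreate()

# Read the on-premises extract that has been landed in cloud storage.
raw = spark.read.parquet("/mnt/landing/transactions")

# Rewrite it as a Delta table to gain ACID transactions and time travel.
raw.write.format("delta").mode("overwrite").save("/mnt/lake/transactions")

# Downstream readers query the Delta table like any other Spark source.
df = spark.read.format("delta").load("/mnt/lake/transactions")
```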
Machine learning model fairness and interpretability are critical for data scientists, researchers and developers to explain their models and understand the value and accuracy of their findings. Interpretability is also important to debug machine learning models and make informed decisions about how to improve them. In this session, Francesca will go over a few methods...
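As one illustration of the kind of method such a session surveys, here is a hedged sketch using the SHAP library; SHAP is a stand-in choice on a toy dataset, not necessarily the speaker's tooling:

```python
# Sketch of one common interpretability method: SHAP values for a tree model.
# Illustrative only; the session's actual libraries and data may differ.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(n_estimators=100).fit(X, y)

# TreeExplainer attributes each prediction to the input features.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global view: which features drive the model's predictions overall.
shap.summary_plot(shap_values, X)
```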
You’re moving more data and workloads to Databricks – but what do you do with the application data inside your mainframes or IBM i systems? You cannot ignore this critical data – but making it usable in Databricks is easier said than done. That is why you need an approach to data integration that helps...
Enterprises have used ETL tools for decades for higher productivity and standardization. Data engineers find these tools no longer work for them and have moved to code. However, this brings us back to ad hoc scripts and frameworks, reminding us of the world before ETL tools. We show how a new generation of tools can be built for Spark...
When combined with scale-out cloud infrastructure, modern hyperparameter optimization (HPO) libraries allow data scientists to deploy more compute power to improve model accuracy, running hundreds or thousands of model variants with minimal code changes. HPO has traditionally run into two barriers – complexity of model management and computational cost. In this talk, we walk through...
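A hedged sketch of that scale-out pattern using Hyperopt's SparkTrials, one common way to fan trials across a cluster; the search space and objective below are hypothetical:

```python
# Sketch: distributed hyperparameter search with Hyperopt + SparkTrials.
# The model, search space, and parallelism are illustrative assumptions.
from hyperopt import STATUS_OK, SparkTrials, fmin, hp, tpe
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

def objective(params):
    # Each evaluation trains one model variant; loss is negated accuracy.
    acc = cross_val_score(SVC(C=params["C"], gamma=params["gamma"]), X, y).mean()
    return {"loss": -acc, "status": STATUS_OK}

space = {"C": hp.loguniform("C", -3, 3), "gamma": hp.loguniform("gamma", -3, 3)}

# SparkTrials fans the trials out across the cluster's executors.
best = fmin(objective, space, algo=tpe.suggest, max_evals=100,
            trials=SparkTrials(parallelism=8))
```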
How are customers building enterprise data lakes on AWS with Databricks? Learn how Databricks complements the AWS data lake strategy and how HP has succeeded in transforming business with this approach.
A Feature Store enables machine learning (ML) features to be registered, discovered, and used as part of ML pipelines, thus making it easier to transform and validate the training data that is fed into machine learning systems. Feature stores can also enable consistent engineering of features between training and inference, but to do so, they...
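A minimal, purely hypothetical sketch of that register/discover/reuse idea; real feature stores add persistence, versioning, and point-in-time correctness on top of this pattern:

```python
# Hypothetical in-memory feature registry illustrating the register /
# discover / reuse pattern; real feature stores persist and version features.
from typing import Callable, Dict
from pyspark.sql import DataFrame

class FeatureRegistry:
    def __init__(self) -> None:
        self._features: Dict[str, Callable[[DataFrame], DataFrame]] = {}

    def register(self, name: str, fn: Callable[[DataFrame], DataFrame]) -> None:
        self._features[name] = fn

    def apply(self, name: str, df: DataFrame) -> DataFrame:
        # The same transformation is reused for training and inference,
        # which is what keeps the two code paths consistent.
        return self._features[name](df)

registry = FeatureRegistry()
registry.register("amount_log",
                  lambda df: df.selectExpr("*", "log(amount + 1) AS amount_log"))
```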
As the usage of Apache Spark continues to ramp up within the industry, a major challenge has been scaling our development. Too often we find that developers are re-implementing a similar set of cross-cutting concerns, sprinkled with some variance of use-case-specific business logic, as a concrete Spark app. The consequences of this anti-pattern are...
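One remedy is to factor the cross-cutting concerns into a reusable job template; the class and hook names below are hypothetical:

```python
# Hypothetical template: cross-cutting concerns (session lifecycle, logging,
# error handling) live in the base class; each app implements only run().
import logging
from abc import ABC, abstractmethod
from pyspark.sql import SparkSession

class SparkJob(ABC):
    def __init__(self, app_name: str):
        self.log = logging.getLogger(app_name)
        self.spark = SparkSession.builder.appName(app_name).getOrCreate()

    @abstractmethod
    def run(self) -> None: ...

    def main(self) -> None:
        try:
            self.run()
        except Exception:
            self.log.exception("job failed")
            raise
        finally:
            self.spark.stop()

class DailyAggregation(SparkJob):
    def run(self) -> None:
        # Only the business logic lives here; everything else is inherited.
        self.spark.read.parquet("/data/events").groupBy("day").count() \
            .write.mode("overwrite").parquet("/data/daily_counts")
```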
Continuing with the objectives to make Spark faster, easier, and smarter, Apache Spark 3.0 extends its scope with more than 3,000 resolved JIRAs. We will talk about the exciting new developments in Spark 3.0 as well as some other major initiatives that are coming in the future. In this talk, we want to share...
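One concrete example among those developments is adaptive query execution. The configuration flags below are real Spark 3.0 settings; the orders/customers DataFrames are illustrative:

```python
# Spark 3.0's adaptive query execution re-optimizes plans at runtime using
# shuffle statistics; it is enabled with a single configuration flag.
spark.conf.set("spark.sql.adaptive.enabled", "true")

# Skew-join handling is a related 3.0 feature, also behind a flag.
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

# Joins in subsequent queries can now switch strategies mid-execution,
# e.g. from sort-merge to broadcast when one side turns out to be small.
orders.join(customers, "customer_id").groupBy("country").count().show()
```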
Take a walk through the daily struggles of a data engineer in this presentation as we cover what is truly needed to create robust, end-to-end big data solutions.
In financial markets, credit card companies always need to measure risk optimally and understand the performance of products before investing and making strategic decisions. At Capital One we are leveraging technologies to provide end-to-end analytical experiences for modelers and enable self-service solutions for analysts...
Data processing and deep learning are often split into two pipelines, one for ETL processing, the second for model training. Enabling deep learning frameworks to integrate seamlessly with ETL jobs allows for more streamlined production jobs, with faster iteration between feature engineering and model training. The newly introduced Horovod Spark Estimator API enables TensorFlow and...
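A hedged sketch of the Horovod Spark Estimator pattern the abstract describes; the model, column names, and store path are hypothetical, and exact constructor arguments vary by Horovod version:

```python
# Sketch: training a Keras model directly on a Spark DataFrame via Horovod's
# Spark Estimator API. Columns, store path, and the model are hypothetical.
import tensorflow as tf
import horovod.spark.keras as hvd_keras
from horovod.spark.common.store import Store

# Intermediate training data and checkpoints are staged through a store.
store = Store.create("/tmp/horovod")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

estimator = hvd_keras.KerasEstimator(
    num_proc=4,                      # parallel training processes on the cluster
    store=store,
    model=model,
    optimizer=tf.keras.optimizers.Adam(),
    loss="binary_crossentropy",
    feature_cols=["features"],
    label_cols=["label"],
    batch_size=64,
    epochs=5,
)

# fit() runs distributed training over the DataFrame and returns a Spark
# Transformer that can score new DataFrames via keras_model.transform(df).
keras_model = estimator.fit(train_df)
```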
Optimize the performance of Databricks infrastructure by adding Privacera security and compliance workflows. Learn how Privacera’s Apache Ranger-based architecture in the cloud integrates with Databricks Delta Lake to enable a secure multi-tenant framework to efficiently find and easily manage sensitive data with centralized, fine-grained access control.
Tired of spending too much time on manually intensive tasks to onboard and prepare data for your AI and ML projects? A proliferation of loosely integrated point tools and the lack of automation results in a great deal of time spent writing glue code and coordinating tooling, instead of training and operationalizing your ML models....
Leading enterprises continue to drive digital transformation and are modernizing their data architecture to take advantage of the many economic and functional benefits enabled by the cloud. While the move to the cloud is making companies more competitive, lean, and nimble, many technical teams are concerned about the complexities and business risks associated with...
Running a stream in a development environment is relatively easy. However, some topics, if not addressed properly, can cause serious issues for streams in production. In this presentation we cover four of them. The first topic considers what happens if input...
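One such production concern is failure recovery. Here is a minimal Structured Streaming sketch with a checkpoint location; the broker, topic, and paths are hypothetical:

```python
# Minimal sketch: a production stream needs a checkpoint location so state and
# progress survive restarts. Broker, topic, and paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("prod-stream").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load())

query = (events.selectExpr("CAST(value AS STRING) AS body")
         .writeStream
         .format("delta")
         .option("checkpointLocation", "/mnt/checkpoints/events")  # recovery point
         .outputMode("append")
         .start("/mnt/lake/events"))
```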
Python notebooks are great for communicating data analysis and research, but how do you port these data visualizations between the many available platforms (Jupyter, Databricks, Zeppelin, Colab, …)? Also learn how to scale up your visualizations using Spark. This talk will address:
6-8 strategies to render Matplotlib that generalize well
Reviewing the landscape of Python...
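One portability strategy, sketched under the assumption that every target platform can show a saved image or a returned Figure object:

```python
# One strategy that ports across notebook platforms: build an explicit Figure
# and return/display the object instead of relying on global pyplot state.
import matplotlib
matplotlib.use("Agg")  # headless backend renders anywhere, including drivers
import matplotlib.pyplot as plt

def make_plot(xs, ys):
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(xs, ys, marker="o")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    return fig

fig = make_plot([1, 2, 3], [2, 4, 9])
fig.savefig("plot.png")   # file output works on every platform
# In Jupyter/Colab the returned fig renders inline; Databricks offers display(fig).
```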
Text or image classification done using deep neural networks presents us with a unique way to identify each trained image/word via something known as an 'embedding'. Embeddings are fixed-size vectors learned during the training of a neural network, but it is very difficult to make sense of these raw values. However,...
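A hedged sketch of pulling the learned embedding vectors out of a trained Keras model for inspection; the model architecture and layer name are hypothetical:

```python
# Sketch: extracting learned embeddings from a Keras text model so the
# fixed-size vectors can be inspected; the model itself is hypothetical.
import numpy as np
import tensorflow as tf

vocab_size, embed_dim = 10_000, 64
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim, name="embedding"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
# ... model.fit(...) would go here in a real pipeline ...

# The embedding matrix is just the layer's weights: one 64-dim row per token.
vectors = model.get_layer("embedding").get_weights()[0]
print(vectors.shape)  # (10000, 64)

# Cosine nearest neighbors reveal which tokens the network groups together.
def neighbors(token_id, k=5):
    sims = vectors @ vectors[token_id] / (
        np.linalg.norm(vectors, axis=1) * np.linalg.norm(vectors[token_id]))
    return np.argsort(-sims)[1:k + 1]
```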
Organized by Databricks
If you have questions, or would like information on sponsoring a Spark + AI Summit, please contact organizers@spark-summit.org.
Apache, Apache Spark, Spark, and the Spark logo are trademarks of the Apache Software Foundation. The Apache Software Foundation has no affiliation with and does not endorse the materials provided at this event.