Training
Data + AI Summit 2021 training will be held on May 24-25, with an expanded curriculum of half-day and full-day classes. Most training classes include both lecture and hands-on exercises. An Apache Spark™ 3.x certification exam is also offered.
Pricing: Full-day $400; Half-day $200
Databricks Lakehouse Overview – Complimentary
Role: All audiences
Duration: Half-day
Labs: None
In this course, you’ll discover how the Databricks Lakehouse Platform can help you compete in the world of big data and artificial intelligence. In the first half of the course, we’ll introduce you to foundational concepts in big data, explain key roles and abilities to look for when building data teams, and familiarize you with all parts of a complete data landscape. In the second half, we’ll review how the Databricks Lakehouse Platform can help your organization streamline workflows, break down silos, and make the most out of your data.
Please note: This course provides a high-level overview of big data concepts and the Databricks Lakehouse platform. It does not contain hands-on labs or technical deep dives into Databricks functionality.
Prerequisites:
- No programming experience required
- No experience with Databricks required
Lakehouse with Delta Lake Deep Dive – Complimentary
Role: Technical
Duration: Half-day
Labs: None
In this course, we will provide a brief overview of data architecture concepts, an introduction to the Lakehouse paradigm, and an in-depth look at Delta Lake features and functionality. You will learn how to apply software engineering principles with Databricks as we demonstrate building end-to-end OLAP data pipelines using Delta Lake for batch and streaming data. The course also discusses serving data to end users through aggregate tables and Databricks SQL Analytics.
By the end of the course, students will understand how to apply data engineering best practices within Databricks.
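To give a taste of the material, here is a minimal PySpark sketch of the batch portion of such a pipeline. It assumes a cluster with the Delta Lake library available (e.g., Databricks); the paths, table layout, and column names are illustrative, not taken from the course:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Bronze: append raw JSON events as a Delta table.
    raw = spark.read.json("/data/raw/events/")
    raw.write.format("delta").mode("append").save("/delta/bronze/events")

    # Silver: deduplicate on a hypothetical key and persist a refined table.
    bronze = spark.read.format("delta").load("/delta/bronze/events")
    (bronze.dropDuplicates(["event_id"])
        .write.format("delta").mode("overwrite").save("/delta/silver/events"))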
Prerequisites:
- Familiarity with data engineering concepts
- Basic knowledge of Delta Lake core features and use cases
Introduction to SQL Analytics on Lakehouse
Role: Data Analyst
Duration: Half-day
Labs: Yes
Meet Databricks SQL Analytics and find out how you can achieve high performance while querying directly on your organization’s data lake. Using Databricks SQL Analytics, learners will practice writing and visualizing queries.
Students will leave this course having created a personal dashboard, complete with parameterized queries and automated alerts.
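As a taste of the exercises, a dashboard tile might be backed by a query of the following shape. The table and columns are hypothetical, and the SQL is shown through PySpark only so the example is self-contained; in SQL Analytics you would type the SQL directly into the query editor:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    daily_sales = spark.sql("""
        SELECT order_date, region, SUM(amount) AS total_sales
        FROM sales                                   -- hypothetical table
        WHERE order_date >= date_sub(current_date(), 30)
        GROUP BY order_date, region
        ORDER BY order_date
    """)
    daily_sales.show()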
Prerequisites:
- No experience with Apache Spark is required
- Basic familiarity with ANSI SQL
Data Engineering with Lakehouse
Role: Data Engineer
Duration: Full-day
Labs: Yes
Review data architecture concepts during this introduction to the Lakehouse paradigm and an in-depth look at Delta Lake features and functionality. Learn to build end-to-end OLAP data pipelines using Delta Lake.
Emphasis will be placed on using data engineering best practices within Databricks and exploring considerations around:
- Normalization
- Change data capture (illustrated in the sketch after this list)
- Slowly changing dimensions
- Regulatory compliance
- Serving end-user data through aggregate tables and SQL Analytics
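As an illustration of the change data capture item above, upserts against a Delta table are typically expressed as a MERGE. A minimal sketch, with hypothetical paths and a hypothetical `id` key:

    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical tables: `updates` holds incoming change records keyed by `id`.
    target = DeltaTable.forPath(spark, "/delta/silver/customers")
    updates = spark.read.format("delta").load("/delta/bronze/customer_updates")

    # Upsert: update matching rows, insert new ones.
    (target.alias("t")
        .merge(updates.alias("u"), "t.id = u.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())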
Prerequisites:
- Intermediate to advanced programming in Python/Scala
- Beginning experience using the Spark DataFrame API
- Intermediate to advanced SQL skills
- Awareness of general data engineering concepts
- Understanding of the core features and use cases of Delta Lake
Structured Streaming with Lakehouse
Role: Data Engineer
Duration: Full-day
Labs: Yes
Learn how to quickly build, deploy, and maintain an enterprise Lakehouse platform by combining Delta Lake and Spark Structured Streaming on Databricks. This course highlights the use of cutting-edge Databricks features that simplify scaling streaming analytics.
At the end of this training, you’ll be able to:
- Describe the basic programming model used by Structured Streaming
- Use Databricks Auto Loader to write JSON ingestion logic that is robust to schema changes (sketched after this list)
- Propagate streaming inserts, updates, and deletes through a series of Delta tables using Delta Change Stream
- Join streaming and static data to generate near real-time analytic views
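The Auto Loader bullet above, sketched minimally; `cloudFiles` is the Databricks-only Auto Loader source, and the paths are illustrative:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Incrementally ingest JSON files as they land, tracking the inferred
    # schema so new fields can be picked up across restarts.
    stream = (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/delta/_schemas/events")
        .load("/data/landing/events/"))

    (stream.writeStream
        .format("delta")
        .option("checkpointLocation", "/delta/_checkpoints/events")
        .start("/delta/bronze/events"))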
Prerequisites:
- 6+ months experience working with the Spark DataFrame API is recommended
- Intermediate programming experience
- Conceptual familiarity with the Delta Lake multi-hop architecture (technical experience preferred)
Certification Prep: Databricks Certified Associate Developer for Apache Spark
Role: Data Engineer, ML Engineer
Duration: Half-day
Labs: None
Prepare for the Databricks Certified Associate Developer for Apache Spark exam. This course will describe the certification program and the exam details, format, and structure. The course will also examine the types of questions that will be asked during the exam and strategies for answering them. While this course will provide students with an outline of topics assessed by the exam, it will not teach the topics.
Prerequisites:
- Beginner-level understanding of Spark's architecture
- Intermediate experience with the Spark DataFrame API in Python or Scala
Apache Spark™ Programming
Role: Data Engineer, ML Engineer
Duration: Full-day
Labs: Yes
Learn the fundamentals of Spark programming in a case-study-driven course that explores the core components of the DataFrame API. During this Python/Scala-based course, you will:
- Read and write data to various sources
- Preprocess data by correcting schemas and parsing different data types
- Apply multiple DataFrame transformations and actions to answer business questions
By the end of the course, students will have the essential concepts and skills needed to navigate the Spark documentation and start programming immediately.
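A minimal example of the read-transform-act pattern the course drills, shown in Python with a hypothetical file and schema:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Read a hypothetical orders file and correct a column type.
    orders = (spark.read
        .option("header", True)
        .csv("/data/orders.csv")
        .withColumn("amount", F.col("amount").cast("double")))

    # Transformations are lazy; show() is the action that triggers the job.
    (orders.groupBy("region")
        .agg(F.sum("amount").alias("revenue"))
        .orderBy(F.desc("revenue"))
        .show(5))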
Prerequisites:
- No experience with Apache Spark is required
- Basic familiarity with programming in Python or Scala
Scaling Machine Learning Pipelines with Apache Spark
Role: ML Engineer, Data Scientist
Duration: Half-day
Labs: Yes
Integrate machine learning solutions with scalable production pipelines backed by Apache Spark through:
- Investigating common inefficiencies in machine learning
- Scaling development and tuning of machine learning models using Spark MLlib and Hyperopt
- Parallelizing model training and inference with Pandas UDFs and the Pandas Function APIs (see the sketch after this list)
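A minimal sketch of that Pandas UDF pattern, using a toy scikit-learn model; all names here are illustrative:

    import numpy as np
    import pandas as pd
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.functions import pandas_udf
    from sklearn.linear_model import LinearRegression

    spark = SparkSession.builder.getOrCreate()

    # Toy model standing in for any fitted estimator; Spark serializes it
    # out to the workers along with the UDF.
    model = LinearRegression().fit(np.array([[0.0], [1.0]]), [0.0, 2.0])

    @pandas_udf("double")
    def predict(feature: pd.Series) -> pd.Series:
        # Each executor scores a whole batch of rows at once.
        return pd.Series(model.predict(feature.to_numpy().reshape(-1, 1)))

    df = spark.range(4).withColumn("feature", F.col("id").cast("double"))
    df.withColumn("prediction", predict("feature")).show()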
Prerequisites:
- Beginning experience with the PySpark DataFrame API
- Intermediate experience with Python
- Working knowledge of machine learning and data science, including experience building models
Foundations of Scalable Natural Language Processing
Role: Data Scientist
Duration: Half-day
Labs: Yes
Learn the fundamentals of natural language processing (NLP) and how to scale it as you solve classification, sentiment analysis, and text wrangling tasks. We will cover how to perform distributed model inference and hyperparameter search, as well as build distributed NLP models. Working through code examples and assignments in Python, students will learn how to use dimensionality reduction techniques, apply pre-trained word embeddings, leverage MLflow for experiment tracking and model management, and apply NLP models to massive datasets with Pandas UDFs.
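For flavor, one standard distributed text-featurization pattern is a tokenize/TF-IDF pipeline in Spark MLlib; a minimal sketch with a toy two-document corpus (the course works with far larger datasets):

    from pyspark.ml import Pipeline
    from pyspark.ml.feature import HashingTF, IDF, Tokenizer
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    docs = spark.createDataFrame(
        [("great product, loved it",), ("terrible, would not buy",)],
        ["text"])

    # Tokenize, hash term frequencies, then reweight by inverse document frequency.
    pipeline = Pipeline(stages=[
        Tokenizer(inputCol="text", outputCol="tokens"),
        HashingTF(inputCol="tokens", outputCol="tf", numFeatures=1 << 12),
        IDF(inputCol="tf", outputCol="features"),
    ])
    pipeline.fit(docs).transform(docs).select("text", "features").show(truncate=False)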
Prerequisites:
- Experience programming in Python
- Beginning experience with the PySpark DataFrame API
Performance Tuning on Apache Spark
Role: Data Engineer, ML Engineer, Data Scientist
Duration: Full-day
Labs: Yes
Complete guided challenges as you learn to diagnose and fix poorly performing queries. Working in Python/Scala, participants will review performance problems to uncover solutions and best practices they can apply to their own queries.
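One representative pattern, offered as an illustration rather than course material: compare the physical plans of a shuffle join and a broadcast join on synthetic data:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Synthetic fact and dimension tables.
    fact = spark.range(1_000_000).withColumn("key", F.col("id") % 100)
    dim = spark.range(100).withColumnRenamed("id", "key")

    # Inspect the physical plan to spot the shuffle on both sides.
    fact.join(dim, "key").explain()

    # Broadcasting the small side replaces the shuffle with a map-side join.
    fact.join(F.broadcast(dim), "key").explain()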
Prerequisites:
- 6+ months experience working with the Spark DataFrame API is recommended
- Intermediate programming experience in Python or Scala
Machine Learning in Production: MLflow and Model Deployment
Role: ML Engineer
Duration: Full-day
Labs: Yes
Learn best practices for managing experiments, projects, models, and model serving with MLflow. Students will build a pipeline to both log and deploy machine learning models. By the end of this course, data scientists and engineers will also be familiar with common production issues and how to monitor models once they are in production.
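A minimal MLflow logging sketch of the kind such a pipeline builds on, using a toy scikit-learn model; the run name, parameter, and metric are illustrative:

    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    # Track a training run: a parameter, a metric, and the fitted model artifact.
    X, y = load_iris(return_X_y=True)
    with mlflow.start_run(run_name="demo"):
        model = LogisticRegression(max_iter=200).fit(X, y)
        mlflow.log_param("max_iter", 200)
        mlflow.log_metric("train_accuracy", model.score(X, y))
        mlflow.sklearn.log_model(model, "model")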
Prerequisites:
- Experience programming in Python and pandas
- Working knowledge of ML concepts
- Familiarity with Apache Spark
- Basic understanding of object storage, databases, and networking
Databricks Platform Administration
Role: Platform Engineer
Duration: Half-day
Labs: None
Learn the fundamentals of Databricks administration and best practices for management and security. This course will guide you through essential concepts and configurations to consider, including:
- Platform architecture and deployment guidelines for network security
- Workspace provisioning and identity management
- Methods to access and protect data in Databricks
- Cluster provisioning strategies and cost management
Prerequisites:
- No programming experience is required
Scaling Deep Learning with TensorFlow and Apache Spark™
Role: ML Engineer, Data Scientist
Duration: Half-day
Labs: Yes
This course covers how to scale your deep learning models with Apache Spark, including distributed model training, hyperparameter tuning, and inference. Taught entirely in Python, this course will guide students to:
- Build deep learning models with TensorFlow
- Perform distributed inference with Spark UDFs via MLflow, as well as distributed hyperparameter tuning with Hyperopt
- Train a distributed model across a cluster using Horovod (sketched after this list)
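The Horovod bullet above, sketched minimally with Keras. In practice the training function is launched across a cluster (for example with horovodrun); the model and data here are toys:

    import horovod.tensorflow.keras as hvd
    import numpy as np
    import tensorflow as tf

    hvd.init()

    # Toy model; the learning rate is scaled by the number of workers.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(0.001 * hvd.size()))
    model.compile(optimizer=opt, loss="mse")

    x = np.random.rand(256, 4).astype("float32")
    y = np.random.rand(256, 1).astype("float32")

    # Broadcast initial weights from rank 0 so all workers start in sync.
    callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
    model.fit(x, y, epochs=1, callbacks=callbacks,
              verbose=1 if hvd.rank() == 0 else 0)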
Prerequisites:
- Experience programming in Python and PySpark
- Basic understanding of Machine Learning concepts
- Prior experience with Keras/TensorFlow is highly encouraged
Security Threat Detection with Databricks
Role: Data Engineer, Security Engineer, Threat Hunter, Security Analyst, Security Data Scientist, Detection Engineer
Duration: Half-day
Labs: Yes
Are you a security practitioner looking for better approaches to threat detection, contextualization, and threat hunting? Are you a data scientist who wants to work more closely with security operations teams? In this hands-on course, you will learn how easy it is to ingest data into Delta Lake, analyze DNS data, enrich it with threat intelligence, and build detections with ML models to catch cybercriminals. You will use Databricks notebooks to collaborate and MLflow to deploy your models for automated, future use. Not familiar with data science or Databricks? Not to worry. The course's live-support staff has decades of security operations and data science experience.
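As a flavor of the labs, a hedged sketch of one enrichment step: join parsed DNS logs against a threat-intel domain list. All paths and column names are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical inputs: parsed DNS logs and a threat-intel domain feed.
    dns = spark.read.format("delta").load("/delta/silver/dns_logs")
    intel = spark.read.csv("/data/intel/bad_domains.csv", header=True)

    # Flag queries whose domain appears in the threat-intel feed.
    hits = dns.join(intel, dns.query_domain == intel.domain, "inner")
    hits.groupBy("query_domain").count().orderBy("count", ascending=False).show()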
Prerequisites:
- Experience programming in Python
- Familiarity with SQL
- Familiarity with security concepts such as threat intelligence, ransomware, and phishing
- Familiarity with security operations concepts such as threat detection and threat hunting