Training
Data + AI Summit 2021 training will be held on May 24-25, with an expanded curriculum of half-day and full-day classes. Most training classes include both lecture and hands-on exercises. An Apache Spark™ 3.x certification exam is also offered.
Pricing: Full-day $400; Half-day $200
Databricks Lakehouse Overview – Complimentary
Role: All audiences
Duration: Half-day
Labs: None
In this course, you’ll discover how the Databricks Lakehouse Platform can help you compete in the world of big data and artificial intelligence. In the first half of the course, we’ll introduce you to foundational concepts in big data, explain key roles and abilities to look for when building data teams, and familiarize you with all parts of a complete data landscape. In the second half, we’ll review how the Databricks Lakehouse Platform can help your organization streamline workflows, break down silos, and make the most out of your data.
Please note: This course provides a high-level overview of big data concepts and the Databricks Lakehouse platform. It does not contain hands-on labs or technical deep dives into Databricks functionality.
Prerequisites:
- No programming experience required
- No experience with Databricks required
Lakehouse with Delta Lake Deep Dive – Complimentary
Role: Technical
Duration: Half-day
Labs: None
In this course, we will provide a brief overview of data architecture concepts, an introduction to the Lakehouse paradigm, and an in-depth look at Delta Lake features and functionality. You will learn how to apply software engineering principles with Databricks as we demonstrate building end-to-end OLAP data pipelines using Delta Lake for batch and streaming data. The course also discusses serving data to end users through aggregate tables and Databricks SQL Analytics.
By the end of the course, students will understand how to apply data engineering best practices within Databricks.
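To give a taste of the material, here is a minimal PySpark sketch of the batch portion of such a pipeline. It assumes a cluster with the Delta Lake library available (e.g., Databricks); the paths, table layout, and column names are illustrative, not taken from the course:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Bronze: append raw JSON events as a Delta table.
    raw = spark.read.json("/data/raw/events/")
    raw.write.format("delta").mode("append").save("/delta/bronze/events")

    # Silver: deduplicate on a hypothetical key and persist a refined table.
    bronze = spark.read.format("delta").load("/delta/bronze/events")
    (bronze.dropDuplicates(["event_id"])
        .write.format("delta").mode("overwrite").save("/delta/silver/events"))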
Prerequisites:
- Familiarity with data engineering concepts
- Basic knowledge of Delta Lake core features and use cases
Introduction to SQL Analytics on Lakehouse
Role: Data Analyst
Duration: Half-day
Labs: Yes
Meet Databricks SQL Analytics and find out how you can achieve high performance while querying directly on your organization’s data lake. Using Databricks SQL Analytics, learners will practice writing and visualizing queries.
Students will leave this course having created a personal dashboard, complete with parameterized queries and automated alerts.
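As a taste of the exercises, a dashboard tile might be backed by a query of the following shape. The table and columns are hypothetical, and the SQL is shown through PySpark only so the example is self-contained; in SQL Analytics you would type the SQL directly into the query editor:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    daily_sales = spark.sql("""
        SELECT order_date, region, SUM(amount) AS total_sales
        FROM sales                                   -- hypothetical table
        WHERE order_date >= date_sub(current_date(), 30)
        GROUP BY order_date, region
        ORDER BY order_date
    """)
    daily_sales.show()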
Prerequisites:
- No experience with Apache Spark is required
- Basic familiarity with ANSI SQL
Data Engineering with Lakehouse
Role: Data Engineer
Duration: Full-day
Labs: Yes
Review data architecture concepts during this introduction to the Lakehouse paradigm and an in-depth look at Delta Lake features and functionality. Learn to build end-to-end OLAP data pipelines using Delta Lake.
Emphasis will be placed on using data engineering best practices within Databricks and exploring considerations around:
- Normalization
- Change data capture (illustrated in the sketch after this list)
- Slowly changing dimensions
- Regulatory compliance
- Serving end-user data through aggregate tables and SQL Analytics
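As an illustration of the change data capture item above, upserts against a Delta table are typically expressed as a MERGE. A minimal sketch, with hypothetical paths and a hypothetical `id` key:

    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical tables: `updates` holds incoming change records keyed by `id`.
    target = DeltaTable.forPath(spark, "/delta/silver/customers")
    updates = spark.read.format("delta").load("/delta/bronze/customer_updates")

    # Upsert: update matching rows, insert new ones.
    (target.alias("t")
        .merge(updates.alias("u"), "t.id = u.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())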
Prerequisites:
- Intermediate to advanced programming in Python/Scala
- Beginning experience using the Spark DataFrame API
- Intermediate to advanced SQL skills
- Awareness of general data engineering concepts
- Understanding of the core features and use cases of Delta Lake
Structured Streaming with Lakehouse
Role: Data Engineer
Duration: Full-day
Labs: Yes
Learn how to quickly build, deploy, and maintain an enterprise Lakehouse platform by combining Delta Lake and Spark Structured Streaming on Databricks. This course highlights the use of cutting-edge Databricks features that simplify scaling streaming analytics.
At the end of this training, you’ll be able to:
- Describe the basic programming model used by Structured Streaming
- Use Databricks Auto Loader to write JSON ingestion logic that is robust to schema changes (sketched after this list)
- Propagate streaming inserts, updates, and deletes through a series of Delta tables using Delta Change Stream
- Join streaming and static data to generate near real-time analytic views
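The Auto Loader bullet above, sketched minimally; `cloudFiles` is the Databricks-only Auto Loader source, and the paths are illustrative:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Incrementally ingest JSON files as they land, tracking the inferred
    # schema so new fields can be picked up across restarts.
    stream = (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/delta/_schemas/events")
        .load("/data/landing/events/"))

    (stream.writeStream
        .format("delta")
        .option("checkpointLocation", "/delta/_checkpoints/events")
        .start("/delta/bronze/events"))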
Prerequisites:
- 6+ months experience working with the Spark DataFrame API is recommended
- Intermediate programming experience
- Conceptual familiarity with the Delta Lake multi-hop architecture (technical experience preferred)
Certification Prep: Databricks Certified Associate Developer for Apache Spark
Role: Data Engineer, ML Engineer
Duration: Half-day
Labs: None
Prepare for the Databricks Certified Associate Developer for Apache Spark exam. This course will describe the certification program and the exam details, format, and structure. The course will also examine the types of questions that will be asked during the exam and strategies for answering them. While this course will provide students with an outline of topics assessed by the exam, it will not teach the topics.
Prerequisites:
- Beginner-level understanding of Spark's architecture
- Intermediate experience with the Spark DataFrame API in Python or Scala
Apache Spark™ Programming
Role: Data Engineer, ML Engineer
Duration: Full-day
Labs: Yes
Learn the fundamentals of Spark programming in a case-study-driven course that explores the core components of the DataFrame API. During this Python/Scala-based course, you will:
- Read and write data to various sources
- Preprocess data by correcting schemas and parsing different data types
- Apply multiple DataFrame transformations and actions to answer business questions
By the end of the course, students will have the essential concepts and skills needed to navigate the Spark documentation and start programming immediately.
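A minimal example of the read-transform-act pattern the course drills, shown in Python with a hypothetical file and schema:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Read a hypothetical orders file and correct a column type.
    orders = (spark.read
        .option("header", True)
        .csv("/data/orders.csv")
        .withColumn("amount", F.col("amount").cast("double")))

    # Transformations are lazy; show() is the action that triggers the job.
    (orders.groupBy("region")
        .agg(F.sum("amount").alias("revenue"))
        .orderBy(F.desc("revenue"))
        .show(5))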
Prerequisites:
- No experience with Apache Spark is required
- Basic familiarity with programming in Python or Scala
Scaling Machine Learning Pipelines with Apache Spark
Role: ML Engineer, Data Scientist
Duration: Half-day
Labs: Yes
Integrate machine learning solutions with scalable production pipelines backed by Apache Spark through:
- Investigating common inefficiencies in machine learning
- Scaling development and tuning of machine learning models using Spark MLlib and Hyperopt
- Parallelizing model training and inference with Pandas UDFs and the Pandas Function APIs (see the sketch after this list)
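A minimal sketch of that Pandas UDF pattern, using a toy scikit-learn model; all names here are illustrative:

    import numpy as np
    import pandas as pd
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.functions import pandas_udf
    from sklearn.linear_model import LinearRegression

    spark = SparkSession.builder.getOrCreate()

    # Toy model standing in for any fitted estimator; Spark serializes it
    # out to the workers along with the UDF.
    model = LinearRegression().fit(np.array([[0.0], [1.0]]), [0.0, 2.0])

    @pandas_udf("double")
    def predict(feature: pd.Series) -> pd.Series:
        # Each executor scores a whole batch of rows at once.
        return pd.Series(model.predict(feature.to_numpy().reshape(-1, 1)))

    df = spark.range(4).withColumn("feature", F.col("id").cast("double"))
    df.withColumn("prediction", predict("feature")).show()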
Prerequisites:
- Beginning experience with the PySpark DataFrame API
- Intermediate experience with Python
- Working knowledge of machine learning and data science, including experience building models
Foundations of Scalable Natural Language Processing
Role: Data Scientist
Duration: Half-day
Labs: Yes
Learn the fundamentals of natural language processing (NLP) and how to scale it as you solve classification, sentiment analysis, and text wrangling tasks. We will cover how to perform distributed model inference and hyperparameter search, as well as build distributed NLP models. Working through code examples and assignments in Python, students will learn how to use dimensionality reduction techniques, apply pre-trained word embeddings, leverage MLflow for experiment tracking and model management, and apply NLP models to massive datasets with Pandas UDFs.
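For flavor, one standard distributed text-featurization pattern is a tokenize/TF-IDF pipeline in Spark MLlib; a minimal sketch with a toy two-document corpus (the course works with far larger datasets):

    from pyspark.ml import Pipeline
    from pyspark.ml.feature import HashingTF, IDF, Tokenizer
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    docs = spark.createDataFrame(
        [("great product, loved it",), ("terrible, would not buy",)],
        ["text"])

    # Tokenize, hash term frequencies, then reweight by inverse document frequency.
    pipeline = Pipeline(stages=[
        Tokenizer(inputCol="text", outputCol="tokens"),
        HashingTF(inputCol="tokens", outputCol="tf", numFeatures=1 << 12),
        IDF(inputCol="tf", outputCol="features"),
    ])
    pipeline.fit(docs).transform(docs).select("text", "features").show(truncate=False)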
Prerequisites:
- Experience programming in Python
- Beginning experience with the PySpark DataFrame API
Performance Tuning on Apache Spark
Role: Data Engineer, ML Engineer, Data Scientist
Duration: Full-day
Labs: Yes
Complete guided challenges as you learn to diagnose and fix poorly performing queries. Working in Python/Scala, participants will review performance problems to uncover solutions and best practices they can apply to their own queries.
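One representative pattern, offered as an illustration rather than course material: compare the physical plans of a shuffle join and a broadcast join on synthetic data:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Synthetic fact and dimension tables.
    fact = spark.range(1_000_000).withColumn("key", F.col("id") % 100)
    dim = spark.range(100).withColumnRenamed("id", "key")

    # Inspect the physical plan to spot the shuffle on both sides.
    fact.join(dim, "key").explain()

    # Broadcasting the small side replaces the shuffle with a map-side join.
    fact.join(F.broadcast(dim), "key").explain()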
Prerequisites:
- 6+ months experience working with the Spark DataFrame API is recommended
- Intermediate programming experience in Python or Scala
Machine Learning in Production: MLflow and Model Deployment
Role: ML Engineer
Duration: Full-day
Labs: Yes
Learn best practices for managing experiments, projects, models, and model serving with MLflow. Students will build a pipeline to both log and deploy machine learning models. By the end of this course, data scientists and engineers will also be familiar with common production issues and how to monitor models once they are in production.
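A minimal MLflow logging sketch of the kind such a pipeline builds on, using a toy scikit-learn model; the run name, parameter, and metric are illustrative:

    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    # Track a training run: a parameter, a metric, and the fitted model artifact.
    X, y = load_iris(return_X_y=True)
    with mlflow.start_run(run_name="demo"):
        model = LogisticRegression(max_iter=200).fit(X, y)
        mlflow.log_param("max_iter", 200)
        mlflow.log_metric("train_accuracy", model.score(X, y))
        mlflow.sklearn.log_model(model, "model")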
Prerequisites:
- Experience programming in Python and pandas
- Working knowledge of ML concepts
- Familiarity with Apache Spark
- Basic understanding of object storage, databases, and networking
Databricks Platform Administration
Role: Platform Engineer
Duration: Half-day
Labs: None
Learn the fundamentals of Databricks administration and best practices for management and security. This course will guide you through essential concepts and configurations to consider, including:
- Platform architecture and deployment guidelines for network security
- Workspace provisioning and identity management
- Methods to access and protect data in Databricks
- Cluster provisioning strategies and cost management
Prerequisites:
- No programming experience is required
Scaling Deep Learning with TensorFlow and Apache Spark™
Role: ML Engineer, Data Scientist
Duration: Half-day
Labs: Yes
This course covers how to scale your deep learning models with Apache Spark, including distributed model training, hyperparameter tuning, and inference. Taught entirely in Python, this course will guide students to:
- Build deep learning models with TensorFlow
- Perform distributed inference with Spark UDFs via MLflow, as well as distributed hyperparameter tuning with Hyperopt
- Train a distributed model across a cluster using Horovod (sketched after this list)
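The Horovod bullet above, sketched minimally with Keras. In practice the training function is launched across a cluster (for example with horovodrun); the model and data here are toys:

    import horovod.tensorflow.keras as hvd
    import numpy as np
    import tensorflow as tf

    hvd.init()

    # Toy model; the learning rate is scaled by the number of workers.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(0.001 * hvd.size()))
    model.compile(optimizer=opt, loss="mse")

    x = np.random.rand(256, 4).astype("float32")
    y = np.random.rand(256, 1).astype("float32")

    # Broadcast initial weights from rank 0 so all workers start in sync.
    callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
    model.fit(x, y, epochs=1, callbacks=callbacks,
              verbose=1 if hvd.rank() == 0 else 0)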
Prerequisites:
- Experience programming in Python and PySpark
- Basic understanding of Machine Learning concepts
- Prior experience with Keras/TensorFlow is highly encouraged
Security Threat Detection with Databricks
Role: Data Engineer, Security Engineer, Threat Hunter, Security Analyst, Security Data Scientist, Detection Engineer
Duration: Half-day
Labs: Yes
Are you a security practitioner looking for better approaches to threat detection, contextualization, and threat hunting? Are you a data scientist who wants to work more closely with security operations teams? In this hands-on course, you will learn how easy it is to ingest data into Delta Lake, analyze DNS data, enrich it with threat intelligence, and build detections with ML models to catch cybercriminals. You will use Databricks notebooks to collaborate and MLflow to deploy your models for automated, future use. Not familiar with data science or Databricks? Not to worry. The course's live-support staff has decades of security operations and data science experience.
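As a flavor of the labs, a hedged sketch of one enrichment step: join parsed DNS logs against a threat-intel domain list. All paths and column names are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical inputs: parsed DNS logs and a threat-intel domain feed.
    dns = spark.read.format("delta").load("/delta/silver/dns_logs")
    intel = spark.read.csv("/data/intel/bad_domains.csv", header=True)

    # Flag queries whose domain appears in the threat-intel feed.
    hits = dns.join(intel, dns.query_domain == intel.domain, "inner")
    hits.groupBy("query_domain").count().orderBy("count", ascending=False).show()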
Prerequisites:
- Experience programming in Python
- Familiarity with SQL
- Familiarity with security concepts such as threat intelligence, ransomware, and phishing
- Familiarity with security operations concepts such as threat detection and threat hunting