Open source

Databricks engineers are the original creators of some of the world’s most popular open source data technologies.

Join a meetup
Our most popular open source projects

Apache Spark™
Apache Spark is a unified engine for executing data engineering, data science and ML workloads.

What is Apache Spark? →

Comparing Spark and Databricks →

Visit spark.apache.org →

Delta Lake
Delta Lake lets you build a lakehouse architecture on top of storage systems such as AWS S3, ADLS, GCS and HDFS.

Learn more about Delta Lake →

Visit delta.io →

Tech Talks: Getting Started with Delta Lake →

MLflow
MLflow manages the ML lifecycle, including experimentation, reproducibility, deployment and a central model registry.

Managed MLflow on Databricks →

Visit mlflow.org →

Tech Talks: Managing the ML Lifecycle →

Redash
Redash lets anyone use SQL to explore, query, visualize and share data from both big and small data sources.

Visit Redash on GitHub →

Delta Sharing
Delta Sharing is the industry’s first open protocol for secure data sharing, making it simple to share data with other organizations.

Visit Delta Sharing →

Databricks supports these additional popular open source technologies

TensorFlow
Databricks supports TensorFlow, a library for deep learning and general computation on clusters.

TensorFlow on Databricks →

PyTorch™
Facebook, the creator of PyTorch, and Databricks have collaborated on integrations.

PyTorch on Databricks →

Keras™
Deep learning API written in Python, running on top of TensorFlow. Available in Databricks Runtime for ML.

Keras on Databricks →

RStudio
An open source suite of tools for collaborative data science using R.

R programming on big data →

scikit-learn
Widely used Python package for machine learning built on top of NumPy, SciPy and Matplotlib.

Scikit-learn on Databricks →

XGBoost
A distributed gradient boosting library with bindings for languages such as Python, R and C++.

XGBoost on Databricks →

Ready to get started?