Open source
Databricks engineers are the original creators of some of the world’s most popular open source data technologies.
Our most popular open source projects
Apache Spark™
Apache Spark is a unified engine for executing data engineering, data science and ML workloads.
Delta Lake
Delta Lake lets you build a lakehouse architecture on top of storage systems such as AWS S3, ADLS, GCS and HDFS.
MLflow
MLflow manages the ML lifecycle, including experimentation, reproducibility, deployment and a central model registry.
Redash
Redash enables anyone to leverage SQL to explore, query, visualize, and share data from both big and small data sources.
Delta Sharing
Delta Sharing is the industry’s first open protocol for secure data sharing, making it simple to share data with other organizations.
Databricks supports these additional popular open source technologies
TensorFlow
Databricks supports TensorFlow, a library for deep learning and general computation on clusters.
PyTorch™
Facebook, the creator of PyTorch, and Databricks have collaborated on integrations.

Keras™
Deep learning API written in Python, running on top of TensorFlow. Available in Databricks Runtime for ML.
RStudio
An open source suite of tools for collaborative data science using R
scikit-learn
Widely used Python package for machine learning built on top of NumPy, SciPy and Matplotlib
XGBoost
A distributed gradient boosting library that has bindings in languages such as Python, R and C++