Databricks on Alibaba
Databricks DataInsight
Databricks DataInsight is a fully managed data and analytics platform based on Apache Spark™. DataInsight is built on the Databricks Runtime and Delta Lake. Integrated with Alibaba Cloud services, it keeps data secure and lets you configure monitoring and alert policies as well as dynamic cluster scaling. It meets the analytics needs of data analysts, data engineers, and data scientists.
Better performance
Databricks Runtime provides a 50x performance improvement over open-source Apache Spark™.
Streaming & Batch integration
Databricks Delta Lake provides ACID transactions for data lake analytics and processes both batch and streaming datasets.
Collaborative analysis
Databricks DataInsight meets the analytics needs of data scientists, data engineers, and business analysts, and provides an interactive, collaborative notebook environment.
Real-time Data Insight
Separating compute from storage reduces data redundancy, lets multiple audiences access the same data, lowers storage costs, and allows compute and storage to scale independently.
A fully managed analytics platform
Start fully managed clusters quickly with simple operations, and pay only for what you use.
Cluster size
Set the number of nodes to match job requirements; high-availability clusters are supported.
Instance selection
Supports three ECS instance type families: general-purpose, compute-optimized, and memory-optimized.
Interactive collaborative work
Multiple user roles share data and collaborate interactively.
Notebook
A collaborative workspace that provides an interactive job execution mode, supports Apache Spark, PySpark, SparkR, and Spark SQL jobs, and visualizes analytics results.
Unified metadata
Database and table metadata can be shared between clusters without duplication.
Fully compatible with Apache Spark ecosystem
100% compatible with open-source Apache Spark.
Databricks Runtime
A performance-optimized Databricks Runtime based on Apache Spark, with I/O optimized for Alibaba Cloud Object Storage Service (OSS), providing a faster, more efficient analytics engine.
Databricks Delta Lake
An optimized version of Delta Lake integrated with Alibaba Cloud Services.
Enterprise security
Integrated with Alibaba Cloud Resource Access Management (RAM) to control permissions by user and role, keeping data secure.
Big Data Analysis Engine That Unifies Batch and Stream Processing
Deeply integrated with Alibaba Cloud services and features, such as the data governance and data lineage capabilities of DataWorks, and Machine Learning Platform for AI (PAI), to provide a more comprehensive data solution.
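The idea behind unifying batch and stream processing on one table can be illustrated with a toy model. The sketch below is a hypothetical, in-memory stand-in for a versioned transaction log; the class `ToyTransactionLog` and its methods are invented for illustration and are not the Delta Lake or Databricks API. Each commit is applied atomically, a batch reader sees a consistent snapshot at any version, and a streaming-style reader keeps a checkpoint so it picks up only the commits it has not yet seen.

```python
class ToyTransactionLog:
    """Hypothetical in-memory versioned log; NOT the Delta Lake API.

    commits[v] holds the rows added by version v. A commit either adds
    all of its rows or none of them.
    """

    def __init__(self):
        self.commits = []

    def commit(self, rows):
        """Atomically append a batch of rows as a new version and return its number."""
        rows = list(rows)  # materialize first so validation sees every row
        if not all(isinstance(row, dict) for row in rows):
            # Reject the whole batch: readers never observe a partial commit.
            raise ValueError("every row must be a dict; nothing was committed")
        self.commits.append(rows)
        return len(self.commits) - 1

    def snapshot(self, version=None):
        """Batch read: every row visible as of `version` (latest if None)."""
        if version is None:
            version = len(self.commits) - 1
        return [row for batch in self.commits[: version + 1] for row in batch]

    def read_since(self, checkpoint):
        """Streaming-style read: rows committed after `checkpoint`, plus the new checkpoint."""
        rows = [row for batch in self.commits[checkpoint + 1 :] for row in batch]
        return rows, len(self.commits) - 1


if __name__ == "__main__":
    log = ToyTransactionLog()
    checkpoint = -1                                # the stream has read nothing yet
    log.commit([{"id": 1}, {"id": 2}])             # version 0, written by a batch job
    new_rows, checkpoint = log.read_since(checkpoint)
    log.commit([{"id": 3}])                        # version 1
    new_rows, checkpoint = log.read_since(checkpoint)
    print(new_rows, checkpoint)                    # [{'id': 3}] 1
    print(len(log.snapshot()), len(log.snapshot(0)))  # 3 2
```

A real lakehouse format such as Delta Lake records file-level add and remove actions in its log and uses optimistic concurrency control between writers; this sketch keeps only the versioning idea.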
Beijing Jizhi Technology Co., Ltd. (北京基智科技有限公司)
Watch this 10-minute video to hear how Beijing Jizhi Technology uses Databricks DataInsight (DDI) for customer acquisition and management.
Databricks DataInsight typical architecture
Integrate deeply with Alibaba Cloud products to build real-time and offline data warehouses.
Key Roles
- Data collection: Ingest real-time streaming data, and batch data stored on external cloud storage.
- Data ETL: Continuously and efficiently process incremental data, support data rollback and deletion, and provide ACID transactional guarantees.
- BI data analysis: Support ad hoc queries, with seamless integration with a variety of BI analysis tools.
- AI data exploration: Provide a complete machine learning platform.
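The Data ETL role mentions incremental processing with rollback and deletion. As a rough illustration only, the hypothetical `ToyVersionedTable` below (not part of any Databricks or Alibaba Cloud API) records every upsert or delete as a new table version, so rolling back amounts to re-committing an earlier version's state. Storing a full copy of the table per version is a simplification chosen for readability; real table formats record file-level changes instead.

```python
class ToyVersionedTable:
    """Hypothetical in-memory table keyed by primary key; NOT a real table-format API.

    versions[v] is the full key -> row state after version v. Writes never
    modify an existing version, so any earlier state can be restored.
    """

    def __init__(self):
        self.versions = [{}]  # version 0 is the empty table

    @property
    def current(self):
        return self.versions[-1]

    def _commit(self, new_state):
        self.versions.append(new_state)
        return len(self.versions) - 1

    def upsert(self, rows, key="id"):
        """Insert or update rows as one new version (an incremental ETL step)."""
        state = dict(self.current)  # shallow copy; the previous version stays intact
        for row in rows:
            state[row[key]] = row   # a row missing `key` raises before anything is committed
        return self._commit(state)

    def delete(self, predicate):
        """Delete every row for which `predicate(row)` is true, as one new version."""
        state = {k: v for k, v in self.current.items() if not predicate(v)}
        return self._commit(state)

    def rollback(self, version):
        """Restore an earlier version by committing a copy of its state as the newest version."""
        return self._commit(dict(self.versions[version]))


if __name__ == "__main__":
    table = ToyVersionedTable()
    v1 = table.upsert([{"id": 1, "amount": 10}, {"id": 2, "amount": 20}])
    v2 = table.delete(lambda row: row["amount"] > 15)  # removes id 2
    v3 = table.rollback(v1)                            # id 2 is back
    print(sorted(table.current))  # [1, 2]
    print(v1, v2, v3)             # 1 2 3
```

Because rollback appends a new version instead of discarding later ones, the deletion at `v2` remains in the history and could itself be restored.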