Skills at a glance
Set up and configure an Azure Databricks environment (15–20%)
Secure and govern Unity Catalog objects (15–20%)
Prepare and process data (30–35%)
Deploy and maintain data pipelines and workloads (30–35%)
1. Set up and configure an Azure Databricks environment (15–20%)
Select and configure compute in a workspace
Choose an appropriate compute type, including job compute, serverless, warehouse, classic compute, and shared compute
Configure compute performance settings, including CPU, node count, autoscaling, termination, node type, cluster size, and pooling
Configure compute feature settings, including Photon acceleration, Azure Databricks runtime/Spark version, and machine learning
Install libraries for a compute resource
Configure access permissions to a compute resource
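Many of the performance and feature settings above correspond to fields in a cluster definition. A minimal sketch of a cluster spec in the JSON shape accepted by the Databricks Clusters API — all names and values here are illustrative, and pooled clusters would reference an instance pool instead of a node type:

```json
{
  "cluster_name": "etl-cluster",
  "spark_version": "15.4.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "autoscale": { "min_workers": 2, "max_workers": 8 },
  "autotermination_minutes": 30,
  "runtime_engine": "PHOTON"
}
```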
Create and organize objects in Unity Catalog
Apply naming conventions based on requirements
Create a catalog based on requirements
Create a schema based on requirements
Create volumes based on requirements
Create tables, views, and materialized views
Implement a foreign catalog by configuring connections
Implement DDL operations on managed and external tables
Configure AI/BI Genie instructions for data discovery
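The object hierarchy above follows Unity Catalog's three-level namespace (catalog → schema → table/view/volume). A minimal SQL sketch, with illustrative names:

```sql
-- Three-level namespace: catalog.schema.object
CREATE CATALOG IF NOT EXISTS sales;
CREATE SCHEMA IF NOT EXISTS sales.bronze;

-- A managed volume for non-tabular files
CREATE VOLUME IF NOT EXISTS sales.bronze.landing;

-- A managed Delta table and a view over it
CREATE TABLE IF NOT EXISTS sales.bronze.orders (
  order_id BIGINT,
  amount   DECIMAL(10, 2),
  ts       TIMESTAMP
);
CREATE VIEW IF NOT EXISTS sales.bronze.recent_orders AS
SELECT * FROM sales.bronze.orders
WHERE ts > current_timestamp() - INTERVAL 7 DAYS;
```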
2. Secure and govern Unity Catalog objects (15–20%)
Grant privileges to a principal for securable objects in Unity Catalog
Implement table- and column-level access control and row-level security
Access Azure Key Vault secrets from within Azure Databricks
Authenticate data access by using service principals
Authenticate resource access by using managed identities
Create, implement, and preserve table and column definitions and descriptions
Configure attribute-based access control (ABAC) by using tags and policies
Configure row filters and column masks
Apply data retention policies
Set up and manage data lineage tracking by using Catalog Explorer
Configure audit logging
Design and implement a secure strategy for Delta Sharing
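Grants, row filters, and column masks from the list above can all be expressed in Databricks SQL. A sketch assuming an illustrative `hr.core.employees` table with `region` and `email` columns:

```sql
-- Grant privileges to principals on securable objects
GRANT USE CATALOG ON CATALOG hr TO `data_engineers`;
GRANT SELECT ON TABLE hr.core.employees TO `analysts`;

-- Row-level security: a filter function bound to the table
CREATE OR REPLACE FUNCTION hr.core.region_filter(region STRING)
  RETURN is_account_group_member('admins') OR region = 'EMEA';
ALTER TABLE hr.core.employees
  SET ROW FILTER hr.core.region_filter ON (region);

-- Column mask: redact a column for non-members of a group
CREATE OR REPLACE FUNCTION hr.core.mask_email(email STRING)
  RETURN CASE WHEN is_account_group_member('pii_readers')
              THEN email ELSE '***' END;
ALTER TABLE hr.core.employees
  ALTER COLUMN email SET MASK hr.core.mask_email;
```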
3. Prepare and process data (30–35%)
Design logic for data ingestion and data source configuration
Choose an appropriate data ingestion tool
Choose a data loading method, including batch and streaming
Choose a data table format, such as Parquet, Delta, CSV, JSON, or Iceberg
Design and implement a data partitioning scheme
Choose a slowly changing dimension (SCD) type
Choose granularity on a column or table based on requirements
Design and implement a temporal (history) table
Design and implement a clustering strategy
Choose between managed and unmanaged tables
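The partitioning, clustering, and managed-versus-external choices above show up directly in table DDL. An illustrative sketch — the storage path is a placeholder:

```sql
-- Classic partitioning on a low-cardinality column
CREATE TABLE events_partitioned (
  event_id BIGINT, event_date DATE, payload STRING
) PARTITIONED BY (event_date);

-- Liquid clustering: co-locates data without rigid partition boundaries,
-- safer for high-cardinality keys
CREATE TABLE events_clustered (
  event_id BIGINT, event_date DATE, payload STRING
) CLUSTER BY (event_id);

-- External (unmanaged) table: data lives at a path you manage
CREATE TABLE events_external
LOCATION 'abfss://data@mystorageaccount.dfs.core.windows.net/events';
```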
Ingest data by using Lakeflow Connect
Ingest data by using notebooks
Ingest data by using SQL methods
Ingest data by using a change data capture (CDC) feed
Ingest data by using Spark Structured Streaming
Ingest streaming data from Azure Event Hubs
Ingest data by using Lakeflow Spark Declarative Pipelines
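Two of the SQL ingestion methods above, sketched with placeholder storage paths: `COPY INTO` for idempotent batch loads, and a streaming table as used in Lakeflow Spark Declarative Pipelines:

```sql
-- Batch: COPY INTO skips files it has already loaded
COPY INTO bronze.raw_orders
FROM 'abfss://landing@mystorageaccount.dfs.core.windows.net/orders/'
FILEFORMAT = JSON;

-- Streaming: a declarative streaming table over the same landing path
CREATE OR REFRESH STREAMING TABLE raw_orders_stream
AS SELECT * FROM STREAM read_files(
  'abfss://landing@mystorageaccount.dfs.core.windows.net/orders/',
  format => 'json');
```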
Profile data to generate summary statistics
Choose appropriate column data types
Identify and resolve duplicate, missing, and null values
Transform data, including filtering, grouping, and aggregating
Transform data by using join, union, intersect, and except
Transform data by denormalizing, pivoting, and unpivoting
Load data by using merge, insert, and append operations
Implement validation checks
Implement data type checks
Implement schema enforcement and manage schema drift
Manage data quality with pipeline expectations
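The merge load pattern and pipeline expectations above can be sketched in SQL; table names are illustrative:

```sql
-- Upsert (merge) change rows into a Delta target
MERGE INTO silver.orders AS t
USING updates AS s
  ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

-- Declarative pipeline expectation: drop rows that fail the check
CREATE OR REFRESH STREAMING TABLE silver_orders (
  CONSTRAINT valid_order EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT * FROM STREAM(bronze.raw_orders);
```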
4. Deploy and maintain data pipelines and workloads (30–35%)
Design order of operations for a data pipeline
Choose between notebook and Lakeflow Spark Declarative Pipelines
Design task logic for Lakeflow Jobs
Design and implement error handling
Create a data pipeline by using a notebook
Create a data pipeline by using Lakeflow Spark Declarative Pipelines
Create and configure a job
Configure job triggers
Schedule a job
Configure alerts for a job
Configure automatic restarts
Apply version control best practices using Git
Manage branching, pull requests, and conflict resolution
Implement a testing strategy
Configure and package Databricks Asset Bundles
Deploy a bundle by using the Databricks CLI
Deploy a bundle by using REST APIs
Monitor and manage cluster consumption
Troubleshoot and repair issues in Lakeflow Jobs
Troubleshoot and repair issues in Apache Spark jobs and notebooks
Investigate and resolve caching, skewing, spilling, and shuffle issues
Optimize Delta tables using OPTIMIZE and VACUUM
Implement log streaming using Azure Monitor
Configure alerts using Azure Monitor
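The Delta maintenance item above names `OPTIMIZE` and `VACUUM` explicitly; a minimal sketch on an illustrative table:

```sql
-- Compact small files and co-locate data; ZORDER is the classic option
-- (liquid-clustered tables just run OPTIMIZE with no ZORDER clause)
OPTIMIZE silver.orders ZORDER BY (order_id);

-- Remove files no longer referenced by the table and older than the
-- retention window (168 hours = 7 days, the default)
VACUUM silver.orders RETAIN 168 HOURS;
```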