Increase your chances of passing the Databricks Databricks-Certified-Professional-Data-Engineer exam questions on your first try. Practice with our free online Databricks-Certified-Professional-Data-Engineer exam mock test designed to help you prepare effectively and confidently.
A junior data engineer seeks to leverage Delta Lake's Change Data Feed functionality to create a Type 1 table representing all of the values that have ever been valid for all rows in a bronze table created with the property delta.enableChangeDataFeed = true. They plan to execute the following code as a daily job:

A DLT pipeline includes the following streaming tables:
• raw_iot ingests raw device measurement data from a heart rate tracking device.
• bpm_stats incrementally computes user statistics based on BPM measurements from raw_iot.
How can the data engineer configure this pipeline to be able to retain manually deleted or updated records in the raw_iot table, while recomputing the downstream table bpm_stats table when a pipeline update is run?
Which of the following statements best describes the use of Python wheels in Databricks ?
A junior data engineer is using the following code to de-duplicate raw streaming data and insert them in a target Delta table
1. spark.readStream
2. .table("orders_raw")
3. .dropDuplicates(["order_id", "order_timestamp"])
4. .writeStream
5. .option("checkpointLocation", "dbfs:/checkpoints")
6. .table("orders_unique")
A senior data engineer pointed out that this approach is not enough for having distinct records in the target table when there are late-arriving, duplicate records.
Which of the following could explain the senior data engineer’s remark?
The data engineering team has a Silver table called ‘sales_cleaned’ where new sales data is appended in near real-time.
They want to create a new Gold-layer entity against the ‘sales_cleaned’ table to calculate the year-to-date (YTD) of the sales amount. The new entity will have the following schema:
country_code STRING, category STRING, ytd_total_sales FLOAT, updated TIMESTAMP
It’s enough for these metrics to be recalculated once daily. But since they will be queried very frequently by several business teams, the data engineering team wants to cut down the potential costs and latency associated with materializing the results.
Which of the following solutions meets these requirements?
© Copyrights FreeMockExams 2026. All Rights Reserved
We use cookies to ensure that we give you the best experience on our website (FreeMockExams). If you continue without changing your settings, we'll assume that you are happy to receive all cookies on the FreeMockExams.