HTML Dropdown

Tuesday, 9 June 2026

Medallion Architecture in Microsoft Fabric (Bronze, Silver, Gold)


🚀 Introduction

To ensure data quality and scalability, Microsoft Fabric uses the Medallion Architecture.

This organizes data into layers of increasing quality. 




🥉 Bronze Layer (Raw Data)

  • Stores raw, unprocessed data
  • No transformations
  • Acts as source of truth

🥈 Silver Layer (Cleaned Data)

  • Data is cleaned and validated
  • Removes duplicates
  • Standardizes formats

🥇 Gold Layer (Business Data)

  • Aggregated and optimized
  • Used for reporting and dashboards
  • Supports ML and analytics


🔄 Data Flow

Raw Data → Bronze → Silver → Gold → Analytics

✅ Benefits

  • Improved data quality
  • Easy debugging (trace to raw data)
  • Better performance
  • Reprocessing capability

🎯 Conclusion

Medallion architecture ensures: 👉 Reliable, scalable, and maintainable data pipelines


Starter code – Bronze → Silver → Gold in a Fabric notebook

Bronze – ingest raw JSON with metadata

from pyspark.sql.functions import current_timestamp

df_bronze = (spark.read.format('json')
    .load('Files/raw/*.json')
    .withColumn('ingestion_time', current_timestamp()))

df_bronze.write.format('delta').mode('append').save('Tables/bronze_sales')

Silver – clean and deduplicate

from pyspark.sql.functions import col

df_silver = (spark.read.format('delta').load('Tables/bronze_sales')
    .dropDuplicates(['transaction_id'])
    .filter(col('amount').isNotNull()))

df_silver.write.format('delta').mode('overwrite').save('Tables/silver_sales')

Gold – aggregate for analytics

df_gold = spark.sql("""
SELECT customer_id, SUM(amount) AS total_spend
FROM silver_sales
GROUP BY customer_id
""")

df_gold.write.format('delta').mode('overwrite').save('Tables/gold_sales')


: Medallion Architecture Code

 

# Bronze

from pyspark.sql.functions import current_timestamp

 df_bronze = spark.read.format("json").load("Files/raw/*.json")

df_bronze.write.format("delta").save("Tables/bronze")

 

# Silver

 df_silver = spark.read.format("delta").load("Tables/bronze").dropDuplicates()

df_silver.write.format("delta").save("Tables/silver")

 

# Gold

 df_gold = spark.sql("SELECT category, COUNT(*) FROM silver GROUP BY category")

df_gold.write.format("delta").save("Tables/gold")

No comments:

Post a Comment