🚀 Introduction
Traditionally, organizations used:
- Data lakes → Scalable, but with little structure or schema enforcement
- Data warehouses → Structured, but rigid
Microsoft Fabric adopts the lakehouse architecture, which combines the strengths of both in one system.
🧠 What is a Lakehouse in Fabric?
A Lakehouse:
- Stores all data in one place
- Supports analytics, AI, and streaming workloads
- Provides reliability through the Delta Lake table format
🧱 Key Layers in Fabric Lakehouse
✅ Storage Layer – OneLake
- Unified data lake
- Stores all data types
- Eliminates duplication
✅ Delta Layer
- Provides ACID transactions
- Ensures data reliability
- Supports time travel
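Time travel works because Delta Lake records every change to a table as an ordered commit in a transaction log (the `_delta_log` folder stored next to the data files), so any earlier version can be rebuilt. In Spark you would read an older version with `spark.read.format("delta").option("versionAsOf", 0).load(...)`. The sketch below is only a toy, in-memory model of that idea, not Delta Lake's real log format or API: `ToyDeltaTable`, `commit`, and `read` are hypothetical names, and validating a whole batch before logging it is a simplified stand-in for atomic (all-or-nothing) commits.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyDeltaTable:
    """Toy model of a versioned table (hypothetical; NOT the Delta Lake implementation).

    Every write appends one commit to an ordered log. Version N is obtained by
    replaying commits 0..N, which is the idea behind time travel.
    """

    log: List[tuple] = field(default_factory=list)  # ordered (operation, rows) commits

    def commit(self, operation: str, rows: List[dict]) -> int:
        # Validate the whole batch first, so a bad batch leaves no partial commit.
        if operation not in ("append", "overwrite"):
            raise ValueError(f"unsupported operation: {operation}")
        if any("amount" not in row for row in rows):
            raise ValueError("every row needs an 'amount' field")
        self.log.append((operation, [dict(r) for r in rows]))
        return len(self.log) - 1  # number of the newly created version

    def read(self, version_as_of: Optional[int] = None) -> List[dict]:
        # Replay the log up to the requested version (latest by default).
        last = len(self.log) - 1 if version_as_of is None else version_as_of
        if not 0 <= last < len(self.log):
            raise ValueError(f"version {last} does not exist")
        rows: List[dict] = []
        for operation, batch in self.log[: last + 1]:
            if operation == "overwrite":
                rows = []  # an overwrite hides earlier rows from this version on
            rows.extend(batch)
        return [dict(r) for r in rows]  # return copies so callers cannot alter the log


table = ToyDeltaTable()
table.commit("append", [{"amount": 50}, {"amount": 150}])  # version 0
table.commit("append", [{"amount": 300}])                  # version 1
table.commit("overwrite", [{"amount": 120}])               # version 2
print(table.read())                 # [{'amount': 120}]
print(table.read(version_as_of=1))  # [{'amount': 50}, {'amount': 150}, {'amount': 300}]
```

Note that the overwrite at version 2 does not delete anything from the log, which is why version 1 is still readable afterwards.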
✅ Compute Layer
- Spark engines for large-scale processing
- SQL engines for analytics
✅ Consumption Layer
- Power BI dashboards
- AI models
- SQL queries
🔄 Unified Approach
Unlike traditional systems:
- No data copying
- No separate tools
- No siloed pipelines
👉 Everything runs on a single dataset.
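As a loose illustration of one stored dataset serving two kinds of compute, the sketch below keeps a single in-memory SQLite table (standing in for a Lakehouse table; this is not Fabric code) and aggregates it once with a declarative SQL query and once with ordinary Python code. The table name, columns, and rows are hypothetical.

```python
import sqlite3
from collections import defaultdict

# One shared store: a single in-memory table stands in for a Lakehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 80.0), ("north", 250.0), ("south", 120.0), ("south", 40.0)],
)

# Path 1: a declarative SQL aggregate, as a SQL engine would run it.
sql_result = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)

# Path 2: the same stored rows processed in code, as a Spark-style job might.
code_result = defaultdict(float)
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    code_result[region] += amount

print(sql_result == dict(code_result))  # True
print(sorted(sql_result.items()))       # [('north', 330.0), ('south', 160.0)]
```

Both paths read the same stored rows, so neither needs its own copy of the data.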
🎯 Conclusion
The Fabric Lakehouse:
👉 Eliminates silos
👉 Reduces cost
👉 Enables real-time analytics
✅ This makes it well suited to modern AI-driven systems.
Starter code – build a Lakehouse table and transform it
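PySpark – load a CSV and create a Delta table
# Read the raw CSV from the Lakehouse Files area; inferSchema makes 'amount' numeric
df = (spark.read.format('csv')
      .option('header', 'true')
      .option('inferSchema', 'true')
      .load('Files/sales.csv'))
# Write it as a Delta table under Tables/, replacing the current contents on rerun
df.write.format('delta').mode('overwrite').save('Tables/sales')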
PySpark – filter and write a refined table
from pyspark.sql.functions import col

# Read the Delta table back and keep only sales above 100
df = spark.read.format('delta').load('Tables/sales')
df_filtered = df.filter(col('amount') > 100)
# Save the result as a Lakehouse table named sales_filtered
df_filtered.write.format('delta').mode('overwrite').saveAsTable('sales_filtered')
SQL – aggregate business metrics
-- Total value of the filtered (amount > 100) sales
SELECT SUM(amount) AS total_sales
FROM sales_filtered;
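To sanity-check what the filter and aggregate steps should return, the same logic can be run in plain Python on a small sample. The sample rows and the `total_sales_over` helper are hypothetical; only the `amount > 100` filter and the `SUM(amount)` aggregate come from the code above.

```python
import csv
import io

# Hypothetical stand-in for Files/sales.csv (the real file's contents are not shown).
SAMPLE_CSV = """order_id,amount
1,80
2,250
3,120
4,40
"""


def total_sales_over(csv_text: str, threshold: float = 100.0) -> float:
    """Filter rows with amount > threshold, then sum amount (filter + SUM)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # CSV fields arrive as text, so cast before comparing; a Spark CSV read
    # likewise returns strings unless a schema is given or inferSchema is enabled.
    amounts = (float(row["amount"]) for row in reader)
    return sum(a for a in amounts if a > threshold)


print(total_sales_over(SAMPLE_CSV))  # 370.0
```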
Lakehouse Code
from pyspark.sql.functions import col

# Create Delta table from the raw CSV
df = (spark.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("Files/sales.csv"))
df.write.format("delta").mode("overwrite").save("Tables/sales")

# Transform data: keep sales above 100 and save them as a new table
df = spark.read.format("delta").load("Tables/sales")
df_filtered = df.filter(col("amount") > 100)
df_filtered.write.format("delta").mode("overwrite").saveAsTable("sales_filtered")
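# Variant: the same pipeline with absolute mount paths (/mnt/...) instead of Files/ and Tables/
from pyspark.sql.functions import col

# Load CSV
df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/mnt/raw/sales.csv")
# Save as Delta
df.write.format("delta").mode("overwrite").save("/mnt/delta/sales")
# Transform: keep sales above 100 and save them as Delta
df_filtered = df.filter(col("amount") > 100)
df_filtered.write.format("delta").mode("overwrite").save("/mnt/delta/sales_filtered")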
SQL – query the Delta table directly by its path
SELECT SUM(amount) AS total_sales
FROM delta.`/mnt/delta/sales`;