Data Warehouse vs Data Lake: The Case for Lakehouse

Traditional data warehouses are rigid and expensive. Here's why a lakehouse with a serverless query layer is the smarter choice for most businesses.

Travis Sansome
14 min read

You've heard the terms. Data warehouse. Data lake. Data lakehouse. Modern data stack. Data mesh. Each vendor has a different opinion, and each opinion comes with a sales pitch.

Meanwhile, you just want to answer basic questions about your business without exporting to Excel.

Here's our take: traditional data warehouses have had their day. They're rigid, expensive, and lock you into vendor-specific ecosystems. The smarter path for most businesses is a data lakehouse—a lake foundation with a serverless warehouse layer on top for BI and analytics.

Here's a practical breakdown of what these terms mean, why we've landed on this position, and how to avoid expensive mistakes.

The Simple Explanation

Let's cut through the jargon.

Data Warehouse

A data warehouse is a database optimised for analysis and reporting. It takes data from your operational systems—ERP, CRM, ecommerce—transforms it into a structure that makes sense for business questions, and serves it to BI tools and analysts.

Key characteristics:

  • Structured data with defined schemas
  • Transformed and cleaned before loading
  • Optimised for queries, not transactions
  • Business logic built in (what counts as revenue, how to calculate margin)

The problems:

  • Rigid schemas mean changes are painful and slow
  • Proprietary formats lock you into a vendor
  • Always-on compute means you pay whether you're querying or not
  • Expensive at scale: as data volumes grow, costs compound

Think of it as a well-organised library with strict rules. Everything has a place, but reorganising the shelves requires a renovation project—and the rent is high whether anyone's reading or not.

Data Lake

A data lake stores raw data in its original format. Everything gets dumped in—structured tables, unstructured files, logs, documents, images—and you figure out how to use it later.

Key characteristics:

  • Raw data in original formats
  • Schema on read (structure applied when querying, not when storing)
  • Handles any data type including unstructured
  • Cheaper storage for large volumes

Think of it as a storage unit. Everything goes in, nothing is organised, and you need to know what you're looking for to find anything useful.
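
That said, "schema on read" is worth making concrete. The sketch below uses Athena-style SQL over raw JSON files sitting in S3: nothing is loaded or transformed, and the table definition just tells the engine how to interpret the files when a query runs. The bucket path and column names are placeholders.

```sql
-- Schema on read: the raw JSON files in S3 are untouched; this DDL only
-- describes how to interpret them when a query runs.
-- (Illustrative names; the s3:// path and columns are placeholders.)
CREATE EXTERNAL TABLE raw_orders (
  order_id    string,
  customer_id string,
  order_total double,
  created_at  string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://your-data-lake/raw/orders/';

-- Structure is applied at query time, not at load time:
SELECT customer_id, SUM(order_total) AS revenue
FROM raw_orders
GROUP BY customer_id;
```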

Data Lakehouse

A lakehouse combines both approaches. Raw data storage with warehouse-like query performance and governance. You get the flexibility of a lake with the usability of a warehouse.

Key characteristics:

  • Raw and transformed data in one platform
  • Open formats (Parquet, Iceberg, Delta) that multiple tools can access
  • Separation of storage and compute so you control costs
  • Serverless query layers (Athena, Trino, Databricks SQL, Microsoft Fabric) for BI tools
  • No vendor lock-in because your data stays in open formats you own

Why this matters:

  • Pay for what you use with serverless compute instead of always-on warehouses
  • Evolve without migration because open formats work with any tool
  • Scale cost-effectively as cheap object storage (S3, GCS, Azure Blob) handles growth
  • Future-proof your architecture for AI/ML workloads that need raw data access

Think of it as owning your warehouse instead of renting it. Your data lives in open formats you control, and you plug in whatever query engine makes sense—swapping tools without moving data.

What Most Australian Businesses Actually Need

Here's the truth: most mid-sized Australian businesses have been sold data warehouses they didn't need—or needed, but paid too much for.

Your problem is that your ERP has all the answers but you can't get them out in a useful format. Traditional warehouses solve this, but at a cost: rigid schemas, vendor lock-in, and cloud bills that grow faster than your data.

There's a better way.

The lakehouse approach: lake foundation, serverless warehouse layer

Instead of paying for an always-on warehouse, build on a lake with a serverless query layer for BI:

  • Store data in open formats (Parquet, Iceberg, Delta) on cheap object storage
  • Transform with modern tools like dbt, applying your business logic
  • Query with serverless engines (Microsoft Fabric, Athena, Trino, Databricks SQL Serverless) that scale with demand
  • Connect your BI tools (Power BI, Tableau, Metabase) to the serverless layer

This gives you everything a warehouse does—clean, modelled data for business users—without the rigidity and runaway costs.

Why this works for mid-market:

  • You're not locked into traditional warehouse pricing models
  • You pay for queries you run, not compute sitting idle overnight
  • You can swap query engines or BI tools without migrating data
  • You're ready for AI/ML workloads when (if) you need them
  • Your data stays in formats you control, not proprietary vendor formats

This describes what most wholesalers, distributors, manufacturers, and professional services firms in Australia actually need. You don't have a big data problem—you have an accessible data problem. And you shouldn't have to pay warehouse premiums to solve it.

If you've already tried a traditional warehouse that didn't deliver, there are specific reasons why that happens—and a lakehouse approach often fixes them.

When a traditional warehouse still makes sense

To be fair, traditional warehouses aren't always wrong:

  • Very small data volumes (under 50GB) where simplicity trumps everything
  • Teams with zero technical capability who need a fully managed, opinionated solution
  • Existing warehouse investments where migration cost exceeds the savings
  • Tight timelines where a proven warehouse pattern ships faster than designing a lakehouse

But even in these cases, consider whether you're trading short-term convenience for long-term lock-in and cost.

When a raw lake makes sense

You might need a pure data lake (without the warehouse layer) if:

  • You're primarily doing machine learning on unstructured data
  • You have data science teams who need raw data access, not BI dashboards
  • Compliance requires retaining raw data exactly as received
  • You're in media, IoT, or adtech with extreme data volumes

Most mid-market businesses don't fit this profile. But the beauty of a lakehouse is that you get both—the lake for raw storage and flexibility, the warehouse layer for business users.

The Expensive Mistakes

Mistake 1: Buying a traditional warehouse without understanding the cost model

A vendor sells you a traditional warehouse. The demo looks great. The initial pricing seems reasonable.

Twelve months later, your cloud bill has tripled. Compute costs creep up as usage grows. You're paying for warehouses running overnight when nobody's querying. And migrating away would mean rewriting everything.

The fix: Understand what you're signing up for. Traditional warehouses charge for always-on compute. A lakehouse with serverless queries charges for what you actually use. For most mid-market workloads, the lakehouse model is significantly cheaper.

Mistake 2: Building a lake without the warehouse layer

This is the opposite mistake. A vendor pitches a "modern" data lake on AWS or Azure. You dump data into S3 buckets. It sounds impressive.

Six months later, you have raw data and no way for business users to access it. The data science team you don't have can't build the models you don't need. And you still can't answer basic questions about customer profitability.

The fix: A lake alone isn't a solution. You need the warehouse layer on top—transformation, business logic, data modelling, and a serverless query engine that BI tools can connect to. Lake + serverless warehouse layer = lakehouse that actually works.

Mistake 3: Treating a lakehouse like a lake

You bought Databricks or set up Delta Lake. The platform can do lakehouse things. So you dump raw data in and assume you're done.

But a lakehouse only works like a warehouse if you build the warehouse semantics on top. That means transformation, business logic, data modelling, governance. Without that layer, you have an expensive lake with a fancy name.

The fix: A lakehouse is infrastructure, not a solution. Budget for the transformation layer—built with tools like dbt—that turns raw data into business-ready datasets. That's where the value lives.

Mistake 4: Locking yourself into proprietary formats

You choose a warehouse and load all your data in proprietary formats. Two years later, you want to try a different query engine, add a machine learning workload, or switch BI tools. But your data is locked in.

Migration becomes a project. Your switching costs become the vendor's pricing leverage. You're stuck.

The fix: Store data in open formats from day one. Parquet, Iceberg, and Delta Lake are open standards that work with multiple query engines. Your data should outlive your current tooling choices.

Mistake 5: Ignoring the transformation layer

Whether you choose warehouse, lake, or lakehouse, raw data isn't useful. It needs transformation: cleaning, joining, calculating, structuring.

Many implementations skip this step, or do it badly, and end up with a data platform that doesn't deliver. The technology is fine; the implementation is the problem.

The fix: Budget for transformation as much as (or more than) storage. The value is in the modelling, not the infrastructure.

The Modern Data Stack for Australian Mid-Market

If you're a wholesaler, distributor, manufacturer, or service business in Australia doing $10M-$200M in revenue, here's what a sensible lakehouse architecture looks like:

Data sources

  • ERP (NetSuite, MYOB, SAP, Pronto, DEAR, Cin7)
  • CRM (Salesforce, HubSpot, Zoho)
  • Ecommerce (Shopify, WooCommerce, your B2B portal)
  • Other operational systems (WMS, 3PL, banking)

Extraction

Tools that pull data from your sources on a schedule. Fivetran, Airbyte, Stitch, or custom integrations. The goal is reliable, automated data movement—no manual exports.

Lake storage (the foundation)

Data lands in cloud object storage—S3, Google Cloud Storage, Azure Blob, or Microsoft Fabric's OneLake—in open formats like Parquet. This is cheap, durable, and vendor-neutral. Table formats like Apache Iceberg or Delta Lake add structure, versioning, and ACID transactions.

This is your data foundation. It belongs to you, not a vendor.
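
As a sketch of what that table-format layer looks like in practice, here's an Iceberg table created over S3 with Athena-style SQL (the database name, columns, and path are placeholders). The files underneath are still open Parquet; Iceberg adds the transactional metadata, and the same table can be read by Athena, Trino, or Spark.

```sql
-- An Iceberg table in the lake: still open Parquet files on S3 underneath,
-- but with ACID transactions, snapshots, and schema evolution on top.
-- (Athena syntax; names and the s3:// location are placeholders.)
CREATE TABLE lake.orders (
  order_id    bigint,
  customer_id bigint,
  order_total decimal(12,2),
  order_date  date
)
PARTITIONED BY (month(order_date))
LOCATION 's3://your-data-lake/curated/orders/'
TBLPROPERTIES ('table_type' = 'ICEBERG');

-- Schema changes are metadata operations, not a migration project:
ALTER TABLE lake.orders ADD COLUMNS (sales_channel string);
```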

Transformation layer

The transformation layer—built with dbt or similar tools—applies your business logic on top of the lake. What counts as revenue. How to calculate COGS. Which customers are active. This creates the "warehouse semantics" that business users need.

This is where a data platform starts to actually work. The transformation layer is the real investment—not the storage.
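
To make that concrete, a dbt model is just a version-controlled SQL file. This hypothetical one encodes a revenue definition once, so every dashboard inherits it (table and column names are invented for the example):

```sql
-- models/marts/fct_revenue.sql (a hypothetical dbt model)
-- Encodes a business rule once, in one place: revenue excludes
-- cancelled orders and is net of credit notes.
{{ config(materialized='table') }}

with credits as (
    select order_id, sum(credit_amount) as credit_amount
    from {{ ref('stg_credit_notes') }}
    group by 1
)

select
    o.order_date,
    o.customer_id,
    sum(o.order_total - coalesce(c.credit_amount, 0)) as net_revenue
from {{ ref('stg_orders') }} o
left join credits c
    on c.order_id = o.order_id
where o.status != 'cancelled'
group by 1, 2
```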

Serverless query layer

Instead of paying for an always-on warehouse, use a serverless query engine:

  • Microsoft Fabric if you're a Microsoft shop (more on this below)
  • AWS Athena for straightforward S3 querying
  • Trino/Starburst for more complex federated queries
  • Databricks SQL Serverless if you're in the Databricks ecosystem

You pay for what you use, not for idle compute. For most mid-market workloads, this is dramatically cheaper than traditional warehouse pricing.
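
Connecting BI is then plain SQL against the serverless endpoint. A common pattern is to expose modelled tables through views so report authors never touch raw files; this sketch continues the illustrative names from the examples above.

```sql
-- A reporting view on the serverless query layer: BI tools connect to
-- this, never to raw files. You pay only for the data each query scans,
-- so partition filters keep scans small and cheap.
CREATE VIEW reporting.monthly_revenue AS
SELECT
    date_trunc('month', order_date) AS month,
    customer_id,
    SUM(net_revenue) AS net_revenue
FROM lake.fct_revenue
GROUP BY 1, 2;
```

Power BI or Tableau connects to this view like any other database table; the engine spins up per query rather than running around the clock.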

A note on Microsoft Fabric

If your organisation runs on Microsoft—Office 365, Azure, Power BI—Fabric deserves serious consideration.

Fabric is Microsoft's unified analytics platform, and it's genuinely lakehouse-native. Under the hood, it's built on OneLake, which stores everything in Delta Lake format (open Parquet files). You get:

  • One platform for data engineering, data science, and BI
  • Native Power BI integration so your existing reports and dashboards just work
  • Serverless SQL endpoints for querying transformed data without provisioning warehouses
  • Open formats by default so you're not locked in (your data is still Delta/Parquet on Azure storage)
  • Capacity-based pricing that can work out cheaper than per-query models at scale

For Microsoft-centric organisations, Fabric removes a lot of integration complexity. Your lake, transformation layer, and BI tool all live in one ecosystem—without sacrificing the open format benefits of a lakehouse approach.

The caveat: Fabric is relatively new, and pricing can be complex. Make sure you model your expected workloads before committing. But for teams already invested in Power BI and Azure, it's often the fastest path to a working lakehouse.

BI and visualisation

The tools your team actually uses. Power BI, Tableau, Looker, Metabase. Connected to your serverless query layer, reading clean, transformed data at interactive speeds.

The key principles

Separate concerns. Extraction, storage, transformation, querying, and visualisation are different problems. Don't conflate them.

Own your data. Open formats mean you're not locked in. You can swap query engines, BI tools, or transformation frameworks without migrating data.

Pay for what you use. Serverless compute means you're not paying for idle warehouses. Cheap object storage means growth doesn't break the budget.
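
As a back-of-envelope illustration of the pay-per-use model, assuming an Athena-style rate of roughly US$5 per terabyte scanned (check current regional pricing; columnar Parquet also sharply reduces the bytes scanned):

```latex
% Illustrative pay-per-query cost model (assumed rate: US$5 per TB scanned)
\text{monthly cost} \approx \text{TB scanned per month} \times \$5
% Worked example: 200 queries/day, each scanning ~2 GB of Parquet:
% 200 \times 2\,\text{GB} \times 30\,\text{days} = 12\,\text{TB/month} \approx \$60\text{/month}
```

An always-on warehouse bills for every hour it runs, queried or not; under this model, an idle weekend costs nothing.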

How to Choose: A Decision Framework

Our default recommendation is lakehouse with a serverless query layer. But here's how to think through the decision:

1. What are your data sources?

  • Mostly transactional systems → Lakehouse (you still get warehouse semantics, but with flexibility)
  • Mix of structured and unstructured → Lakehouse (built for this)
  • Primarily logs, documents, sensor data → Lake (you may not need the warehouse layer)

2. Who are your users?

  • Business users who need dashboards → Lakehouse with serverless SQL layer
  • Analysts who write SQL → Lakehouse (they'll appreciate the flexibility)
  • Data scientists who need raw data → Lakehouse (they get lake access AND can query transformed data)

3. What's your data volume?

  • Gigabytes → Lakehouse is still the right call (costs are negligible at this scale)
  • Terabytes → Lakehouse (this is where cost savings really show)
  • Petabytes → Lakehouse (traditional warehouses become prohibitively expensive)

4. What's your budget?

  • Lower budget → Lakehouse with serverless (pay-per-query beats always-on compute)
  • Higher budget → Lakehouse (invest in the transformation layer, not warehouse licensing)

5. What's your team capability?

  • No dedicated data team → Consider Microsoft Fabric or other managed lakehouse services
  • Small data team → Lakehouse (modern tooling like dbt makes it manageable)
  • Mature data team → Lakehouse (they'll thank you for the flexibility)

6. What's your existing ecosystem?

  • Heavy Microsoft (Office 365, Azure, Power BI) → Microsoft Fabric is the natural fit
  • AWS-native → Athena + S3 + dbt is a proven pattern
  • Multi-cloud or cloud-agnostic → Databricks or Trino/Starburst for portability

When to consider a traditional warehouse anyway

  • You have an existing warehouse that's working well and migration isn't worth it
  • You need to ship in weeks, not months, and a managed warehouse gets you there faster
  • Your data volumes are tiny (under 50GB) and simplicity genuinely matters more than cost

For most Australian mid-market businesses, a lakehouse with a serverless query layer delivers what you actually need: clean, transformed, accessible data for business users—without the rigidity and cost of traditional warehouses.

Getting Started

If you're evaluating data architecture:

Start with the problem, not the technology. What questions can't you answer today? What reports take too long? Where do numbers not match? That's what you're solving. The lakehouse approach solves these problems—just don't let infrastructure distract from the actual goal.

Question the warehouse sales pitch. When a vendor quotes you for a traditional warehouse, ask about the long-term compute costs. Ask what happens when your data volume doubles. Ask how you'd migrate if you needed to. The answers are often uncomfortable.

Plan for the transformation layer. The infrastructure is 30% of the work; the business logic, data modelling, and transformation are the other 70%. This is where real value gets created, so don't underspend here to afford warehouse licensing.

Start with open formats. Even if you're not ready for a full lakehouse, store your raw data in Parquet on object storage. You're building optionality for the future.
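
Even that first step can be a single statement. With Athena-style SQL, for instance, a CTAS query converts an exported CSV table into partitioned Parquet (paths and table names are placeholders):

```sql
-- One-off conversion of raw CSV exports into open, columnar Parquet.
-- (Athena CTAS syntax; the s3:// path and names are placeholders.
-- Partition columns must come last in the SELECT list.)
CREATE TABLE lake.orders_parquet
WITH (
    format = 'PARQUET',
    external_location = 's3://your-data-lake/parquet/orders/',
    partitioned_by = ARRAY['order_year']
) AS
SELECT order_id, customer_id, order_total, order_year
FROM raw.orders_csv;
```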

Find the right partner. A lakehouse requires more architectural thinking than a plug-and-play warehouse. Work with people who've done it before and understand the trade-offs.

The goal isn't the most modern architecture for its own sake. The goal is reliable access to trusted data that helps your business make better decisions—without paying warehouse premiums or getting locked into rigid systems.

A lakehouse with a serverless query layer gives you that. Open formats, cost control, and the flexibility to evolve. It's the architecture we recommend for most mid-market businesses.


Want to explore a lakehouse approach for your business? Book a call with our team. We'll assess your current state, understand your questions, and design an architecture that gives you flexibility and cost control—not vendor lock-in.

Travis Sansome

Founder of Artigence. Helping businesses build better technology and unlock value from their data.

Connect on LinkedIn →

Related Articles

Your Data Warehouse Isn't Working (Here's Why)
You invested in a data warehouse. So why is your team still exporting to Excel? Here's what went wrong—and how to fix it.
9 min read

The Real Cost of Bad Data
Bad data isn't just annoying—it's expensive. Here's how to quantify the damage and make the case for fixing it.
8 min read

Self-Service BI: Why It Never Works (Until It Does)
Every organisation wants self-service analytics. Most fail. Here's why dashboards collect dust—and how to build BI that actually gets used.
8 min read
