What is LiquidCache?

Distributed caching with efficient compute pushdown.

Xiangpeng Hao

August 2025

InfluxData, SpiralDB, and Bauplan are independent sponsors and are unaffiliated with LiquidCache; logos shown for acknowledgment only.

Self link: https://what-is-liquid-cache.xiangpeng.systems

Who are we?

Research project led by Xiangpeng Hao from UW-Madison ADSL
Paper co-authors: Andrew Lamb, Yibo Wu, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau
The project began with support from InfluxData, with subsequent support from SpiralDB and Bauplan
Public-benefit project in appreciation of taxpayers, gifts, and open-source communities

LiquidCache at a Glance

As a cache service

All object stores (S3, GCS, Azure Blob, ...)
All compute platforms (K8s, Lambda, ...)
All data services (anomaly detection, knowledge base, dashboards, ...)
Built on industry standards (Parquet, Arrow Flight, DataFusion)

LiquidCache overview — LiquidCache supports all storage, all compute, and all workloads.

LiquidCache at a Glance

As a pushdown engine

Filter and aggregation pushdown
Compressed execution
Network-efficient transfer
Efficient storage access

Why LiquidCache?

We like S3

Simple durability: 11 nines — you never worry about data loss
Simple scalability: virtually unlimited space and throughput

Why LiquidCache?

...but S3 is slow and expensive

100 ms first-byte latency plus transfer latency, paid again on every round-trip
Complex pricing model: storage, request, and egress costs
Storage price unchanged for 10 years despite 20x cheaper hardware*

S3 price trend: S3 prices for the last 10 years (credit: Andrew Lamb)

*Meanwhile AWS revenue increased by 30x.
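
A back-of-the-envelope model makes the round-trip cost concrete. The sketch below assumes each round-trip must finish before the next one starts (for example, reading a Parquet footer before fetching column chunks). The function name, request size, and bandwidth are illustrative assumptions; only the 100 ms first-byte latency comes from the slide.

```rust
/// Simplified latency model for dependent (sequential) object-store reads.
/// All inputs are illustrative assumptions, not measurements.
fn sequential_read_latency_ms(
    round_trips: u32,
    first_byte_ms: f64,
    bytes_per_trip: f64,
    bandwidth_bytes_per_ms: f64,
) -> f64 {
    // Each round-trip pays the first-byte latency plus the time to move its bytes.
    let transfer_ms = bytes_per_trip / bandwidth_bytes_per_ms;
    f64::from(round_trips) * (first_byte_ms + transfer_ms)
}
```

Under these assumptions, three dependent 1 MB reads at 100 MB/s cost 3 x (100 ms + 10 ms) = 330 ms, most of it first-byte latency.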

Why LiquidCache?

LiquidCache: foundation of diskless architectures

Caches are everywhere: compute-local, shared-nothing, and cache services
DLC trilemma: among Durability, low Latency, and low Cost, pick two

How LiquidCache Works

1

Pushdown

Execute filters and aggregations at cache

Send filtered data to compute

+

2

Liquid data

Cache-only format for efficient pushdown operations

First-class support for Parquet

LiquidCache - Pushdown

User side change: add a DataFusion physical optimizer rule
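
To illustrate what pushdown buys, here is a minimal sketch in plain Rust, independent of DataFusion. It is not LiquidCache's implementation: `Row`, `scan_without_pushdown`, `scan_with_pushdown`, and `count_with_pushdown` are hypothetical names, and a slice stands in for the cache. The point is only that evaluating the filter or aggregate at the cache shrinks what crosses the network.

```rust
/// A row in a hypothetical cached table.
#[derive(Clone, Debug, PartialEq)]
struct Row {
    user_id: u64,
    url: String,
}

/// Without pushdown: the cache ships every row and compute filters later.
fn scan_without_pushdown(cache: &[Row]) -> Vec<Row> {
    cache.to_vec()
}

/// Filter pushdown: the cache evaluates the predicate and ships only matches.
fn scan_with_pushdown<F: Fn(&Row) -> bool>(cache: &[Row], predicate: F) -> Vec<Row> {
    cache.iter().filter(|r| predicate(*r)).cloned().collect()
}

/// Aggregation pushdown: the cache ships a single number instead of rows.
fn count_with_pushdown<F: Fn(&Row) -> bool>(cache: &[Row], predicate: F) -> usize {
    cache.iter().filter(|r| predicate(*r)).count()
}
```

With a selective predicate such as `|r| r.user_id == 1`, the pushdown variants return only the matching rows (or just their count), while the baseline transfers the whole table.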

LiquidCache - Liquid Data

Parquet is industry standard

Supported by all major engines
Battle-tested and evolving
Open governance under ASF

More performance needed

Better encodings exist
Parquet evolves cautiously — it cannot break your data

Vortex

Nimble

FastLanes

LiquidCache - Liquid Data

Best encodings from decades of research*
Cache-only format: freely evolves without breaking user code
Transparent, progressive, selective transcoding from Parquet
Designed for efficient pushdown to save compute and network

*We use building blocks (e.g., FSST, FastLanes, ALP) from Vortex made by SpiralDB and CWI.
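
As a rough illustration of why an encoding can make pushdown cheaper, the sketch below evaluates an equality filter directly on a dictionary-encoded column. Dictionary encoding is used here only as a simple stand-in; it is not the Liquid data format, which builds on blocks such as FSST, FastLanes, and ALP. `DictColumn` and its methods are hypothetical names.

```rust
use std::collections::HashMap;

/// A dictionary-encoded string column: each row stores a small code into `dict`.
struct DictColumn {
    dict: Vec<String>,
    codes: Vec<u32>,
}

impl DictColumn {
    /// Encode values, assigning codes in order of first appearance.
    fn encode(values: &[&str]) -> Self {
        let mut dict: Vec<String> = Vec::new();
        let mut index: HashMap<String, u32> = HashMap::new();
        let mut codes = Vec::with_capacity(values.len());
        for &v in values {
            let code = *index.entry(v.to_string()).or_insert_with(|| {
                dict.push(v.to_string());
                (dict.len() - 1) as u32
            });
            codes.push(code);
        }
        DictColumn { dict, codes }
    }

    /// Evaluate `column == needle` without decoding any row: compare strings
    /// once against the dictionary, then compare integer codes per row.
    fn eq_filter(&self, needle: &str) -> Vec<bool> {
        match self.dict.iter().position(|d| d.as_str() == needle) {
            Some(pos) => {
                let code = pos as u32;
                self.codes.iter().map(|&c| c == code).collect()
            }
            // The value is not in the dictionary, so no row can match.
            None => vec![false; self.codes.len()],
        }
    }
}
```

The filter touches each distinct string once and each row only as an integer, which is the general idea behind executing on compressed data.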

How much benefit?

Time breakdown of ClickBench Query 21's data scan.

- Arrow is the "decoded" Parquet.

- Theoretical shows the CPU time spent on filtering.

How much overhead?

ClickBench Query 20: the x-axis shows the "storage" device on a cache miss, and the y-axis shows query latency.

- LiquidCache (blocking) disables async transcoding and blocks until transcoding is done.

- As long as IO time exceeds the transcoding time (2.2 s), IO hides the transcoding overhead.

- S3-far: cross-continent S3 (Oregon to Euro-central)

- S3: nearby S3 (Oregon to Utah)

- MinIO: same cluster
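
The overlap argument can be written as a simple idealized model, assuming transcoding runs fully in parallel with IO in the async case and strictly after it in the blocking case. The function names are hypothetical and the IO times in the usage note are made up; only the 2.2 s transcoding time comes from the slide.

```rust
/// Blocking mode: transcoding starts only after IO completes.
fn blocking_miss_latency_s(io_s: f64, transcode_s: f64) -> f64 {
    io_s + transcode_s
}

/// Idealized async mode: transcoding fully overlaps with IO,
/// so the slower of the two determines the latency.
fn overlapped_miss_latency_s(io_s: f64, transcode_s: f64) -> f64 {
    io_s.max(transcode_s)
}
```

With a hypothetical 5 s of IO, the blocking model gives 7.2 s while the overlapped model stays at 5 s. With 1 s of IO, transcoding (2.2 s) becomes the bottleneck in both models.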

How to use LiquidCache?

Existing DataFusion

#[tokio::main]
pub async fn main() -> Result<()> {
    let ctx = SessionContext::new();

    ctx.register_table(table_name, ...)
        .await?;
    ctx.sql(&sql).await?.show().await?;
    Ok(())
}

With LiquidCache

#[tokio::main]
pub async fn main() -> Result<()> {
    let cache_server = "http://localhost:8080";
    let ctx = LiquidCacheBuilder::new(cache_server)
        .with_object_store(...)
        .with_cache_mode(CacheMode::Liquid)
        .build(SessionConfig::from_env()?)?;

    ctx.register_table(table_name, ...)
        .await?;
    ctx.sql(&sql).await?.show().await?;
    Ok(())
}

Conclusions: What is LiquidCache?

1

Pushdown

- Execute filters and aggregations at cache

- Send filtered data to compute

+

2

Liquid data

- Cache-only format for efficient pushdown

- First-class support for Parquet