
What is LiquidCache?

Distributed caching with efficient compute pushdown.

August 2025
InfluxData, SpiralDB, and Bauplan are independent sponsors and are unaffiliated with LiquidCache; logos shown for acknowledgment only.

Who are we?

Xiangpeng Hao

LiquidCache at a Glance

As a cache service

  • All object stores (S3, GCS, Azure Blob, ...)
  • All compute platforms (K8s, Lambda, ...)
  • All data services (anomaly detection, knowledge base, dashboards, ...)
  • Built on industry standards (Parquet, Arrow Flight, DataFusion)
Figure: LiquidCache overview. LiquidCache supports all storage, all compute, and all workloads.

LiquidCache at a Glance

As a pushdown engine

  • Filter and aggregation pushdown
  • Compressed execution
  • Network-efficient transfer
  • Efficient storage access
Figure: LiquidCache overview. LiquidCache is optimized for pushdown.

Why LiquidCache?

We like S3

  1. Simple durability: 11 nines (99.999999999%), so data loss is practically never a concern
  2. Simple scalability: virtually unlimited space and throughput
Figure: AWS revenue.
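What 11 nines means in practice can be checked with a quick calculation. This is a minimal sketch assuming a designed annual durability of 99.999999999% per object and independent losses; the 10 million object count is an illustrative assumption. It reproduces AWS's own rule of thumb of roughly one lost object every 10,000 years at that scale.

```rust
/// Expected number of objects lost per year when each object independently
/// survives a year with probability 1 - 10^-nines.
fn expected_annual_losses(objects: f64, nines: i32) -> f64 {
    objects * 10f64.powi(-nines)
}

/// Average number of years between single-object losses.
fn years_per_loss(objects: f64, nines: i32) -> f64 {
    1.0 / expected_annual_losses(objects, nines)
}

fn main() {
    // Illustrative assumption: 10 million objects stored at 11 nines.
    let years = years_per_loss(10_000_000.0, 11);
    println!("about one lost object every {years:.0} years");
}
```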

Why LiquidCache?

...but S3 is slow and expensive

  1. ~100 ms first-byte latency plus transfer time, paid again on every sequential round-trip
  2. Complex pricing model: storage, request, and egress costs
  3. Storage price unchanged for 10 years despite 20x cheaper hardware*
Figure: S3 price trend. S3 prices over the last 10 years (credit: Andrew Lamb).
*Meanwhile AWS revenue increased by 30x.
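The round-trip multiplication in point 1 is easy to quantify. The sketch below assumes a flat 100 ms first-byte latency per request and made-up transfer times for a hypothetical chain of three dependent reads of one Parquet file (footer length, footer, one column chunk); the numbers are illustrative, not measurements.

```rust
/// Total latency of S3 requests issued one after another: every request
/// pays the first-byte latency again before its own transfer time.
fn sequential_latency_ms(first_byte_ms: f64, transfer_ms: &[f64]) -> f64 {
    transfer_ms.iter().map(|t| first_byte_ms + t).sum()
}

fn main() {
    // Hypothetical dependent reads: footer length, footer, one column chunk.
    let transfers_ms = [1.0, 5.0, 40.0];
    println!("total: {} ms", sequential_latency_ms(100.0, &transfers_ms));
}
```

Each request waits on the previous one, so the first-byte cost is paid three times before any data is processed.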

Why LiquidCache?

LiquidCache: foundation of diskless architectures

  1. Caches are everywhere: compute-local, shared-nothing, and cache services
  2. DLC trilemma: among Durability, low Latency, and low Cost, pick two
Figure: DLC trilemma.

How LiquidCache Works

  1. Pushdown: execute filters and aggregations at the cache, and send only the filtered data to compute.
  2. Liquid data: a cache-only format for efficient pushdown operations, with first-class support for Parquet.
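To make the pushdown idea concrete, here is a toy, in-memory sketch. The hypothetical `Batch` type stands in for a cached column batch, and the functions compare shipping every row with evaluating a filter or a count at the cache. It illustrates only the data-volume argument; LiquidCache itself works on DataFusion plans and transfers results over Arrow Flight.

```rust
/// A toy columnar batch held by the cache (hypothetical, for illustration).
struct Batch {
    user_id: Vec<u64>,
    latency_ms: Vec<u32>,
}

/// Without pushdown: the cache ships every row and compute filters locally.
fn rows_shipped_without_pushdown(batch: &Batch) -> usize {
    batch.user_id.len()
}

/// Filter pushdown: the cache evaluates `latency_ms > threshold` itself and
/// returns only the matching (user_id, latency_ms) pairs.
fn filter_at_cache(batch: &Batch, threshold: u32) -> Vec<(u64, u32)> {
    batch
        .user_id
        .iter()
        .zip(batch.latency_ms.iter())
        .filter(|pair| *pair.1 > threshold)
        .map(|(user, latency)| (*user, *latency))
        .collect()
}

/// Aggregation pushdown: the cache returns a single count instead of rows.
fn count_at_cache(batch: &Batch, threshold: u32) -> usize {
    batch.latency_ms.iter().filter(|&&l| l > threshold).count()
}

fn main() {
    let batch = Batch {
        user_id: vec![1, 2, 3, 4],
        latency_ms: vec![10, 250, 30, 400],
    };
    println!("without pushdown: {} rows shipped", rows_shipped_without_pushdown(&batch));
    println!("filter pushdown: {:?}", filter_at_cache(&batch, 100));
    println!("aggregation pushdown: count = {}", count_at_cache(&batch, 100));
}
```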

LiquidCache - Pushdown

Figure: LiquidCache pushdown.

User-side change: add one DataFusion physical optimizer rule.
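Conceptually, such a rule rewrites the physical plan so that scans go to the cache server and a filter sitting directly above a scan travels with it. The sketch below uses a hypothetical, simplified `Plan` enum rather than DataFusion's real `ExecutionPlan` trait objects, and `push_to_cache` is an illustrative name, not LiquidCache's rule. In DataFusion itself, a physical rewrite like this implements the `PhysicalOptimizerRule` trait.

```rust
/// A toy physical plan (hypothetical; DataFusion's real plans are
/// `Arc<dyn ExecutionPlan>` trees).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    ParquetScan { path: String },
    Filter { predicate: String, input: Box<Plan> },
    /// A scan served by the cache server, with the predicate evaluated there.
    CacheScan { server: String, path: String, predicate: Option<String> },
}

/// Rewrite rule: replace local Parquet scans with cache scans, folding a
/// directly enclosing filter into the cache-side scan.
fn push_to_cache(plan: Plan, server: &str) -> Plan {
    match plan {
        Plan::Filter { predicate, input } => match *input {
            // Filter directly over a scan: push the predicate to the cache.
            Plan::ParquetScan { path } => Plan::CacheScan {
                server: server.to_string(),
                path,
                predicate: Some(predicate),
            },
            // Otherwise keep the filter and keep rewriting below it.
            other => Plan::Filter {
                predicate,
                input: Box::new(push_to_cache(other, server)),
            },
        },
        // A bare scan still goes through the cache, just without a predicate.
        Plan::ParquetScan { path } => Plan::CacheScan {
            server: server.to_string(),
            path,
            predicate: None,
        },
        other => other,
    }
}

fn main() {
    let plan = Plan::Filter {
        predicate: "URL LIKE '%google%'".to_string(),
        input: Box::new(Plan::ParquetScan { path: "hits.parquet".to_string() }),
    };
    println!("{:?}", push_to_cache(plan, "http://localhost:8080"));
}
```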

LiquidCache - Liquid Data

Parquet is industry standard

  • Supported by all major engines
  • Battle-tested and evolving
  • Open governance under ASF

More performance needed

  • Better encodings exist
  • Parquet evolves cautiously, since it must never break existing data
  • Newer formats: Vortex (SpiralDB), Nimble (Meta), FastLanes (CWI)
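As a taste of what better encodings buy, here is frame-of-reference (FOR) encoding, one of the classic lightweight integer encodings, with a predicate evaluated directly on the encoded offsets. This is a simplified sketch that assumes the value range fits in 16 bits; it is not the FastLanes bit-packing kernels or LiquidCache's actual array layout.

```rust
/// Frame-of-reference encoding: store the minimum once, and every value as
/// a small offset from it (assumes max - min fits in 16 bits).
struct ForEncoded {
    base: u64,
    offsets: Vec<u16>,
}

/// Returns `None` for empty input or when the value range exceeds u16.
fn for_encode(values: &[u64]) -> Option<ForEncoded> {
    let base = *values.iter().min()?;
    let offsets = values
        .iter()
        .map(|&v| u16::try_from(v - base).ok())
        .collect::<Option<Vec<u16>>>()?;
    Some(ForEncoded { base, offsets })
}

/// Evaluate `value > threshold` without decoding: translate the threshold
/// into offset space once, then compare the narrow offsets directly.
fn greater_than(enc: &ForEncoded, threshold: u64) -> Vec<bool> {
    if threshold < enc.base {
        // Every value is at least `base`, so every value passes.
        return vec![true; enc.offsets.len()];
    }
    let t = threshold - enc.base;
    enc.offsets.iter().map(|&o| u64::from(o) > t).collect()
}

fn main() {
    let enc = for_encode(&[1000, 1003, 1010, 1001]).expect("range fits in u16");
    println!("base = {}, offsets = {:?}", enc.base, enc.offsets);
    println!("value > 1002: {:?}", greater_than(&enc, 1002));
}
```

Offsets take 2 bytes instead of 8, and the comparison runs on the narrow offsets, which is the kind of compressed execution listed among LiquidCache's pushdown features.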
Yet another file format? No! A cache-only, ephemeral format.

LiquidCache - Liquid Data

  1. Best encodings from decades of research*
  2. Cache-only format: freely evolves without breaking user code
  3. Transparent, progressive, selective transcoding from Parquet
  4. Designed for efficient pushdown to save compute and network
Figure: Liquid data.
*We use building blocks (e.g., FSST, FastLanes, ALP) from Vortex made by SpiralDB and CWI.
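Point 3 can be sketched as a per-column cache whose entries move from Parquet to decoded Arrow to Liquid. The `ColumnCache` type, the `Form` states, and the explicit `transcode_pending` step (standing in for background transcoding) are hypothetical simplifications, not LiquidCache's actual API or policy.

```rust
use std::collections::HashMap;

/// In-cache representation of a column (hypothetical states).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Form {
    /// Decoded Parquet, available right after a miss.
    Arrow,
    /// Transcoded into the cache-only Liquid format.
    Liquid,
}

#[derive(Default)]
struct ColumnCache {
    entries: HashMap<String, Form>,
}

impl ColumnCache {
    /// Serve a read of `column`. Returns `None` on a miss (the column is read
    /// from Parquet in object storage and cached as Arrow), or the cached form.
    fn read(&mut self, column: &str) -> Option<Form> {
        if let Some(form) = self.entries.get(column).copied() {
            return Some(form);
        }
        self.entries.insert(column.to_string(), Form::Arrow);
        None
    }

    /// Stand-in for background work: transcode every Arrow entry to Liquid.
    /// Returns how many columns were transcoded.
    fn transcode_pending(&mut self) -> usize {
        let mut transcoded = 0;
        for form in self.entries.values_mut() {
            if *form == Form::Arrow {
                *form = Form::Liquid;
                transcoded += 1;
            }
        }
        transcoded
    }
}

fn main() {
    let mut cache = ColumnCache::default();
    println!("{:?}", cache.read("URL"));
    println!("{:?}", cache.read("URL"));
    cache.transcode_pending();
    println!("{:?}", cache.read("URL"));
}
```

Callers only call `read` (transparent), a column reaches Liquid form in steps (progressive), and columns no query touches are never cached or transcoded (selective).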

How much benefit?

Time breakdown of ClickBench Query 21's data scan.
- Arrow is the "decoded" Parquet.
- "Theoretical" shows the CPU time spent on filtering.

How much overhead?

ClickBench Query 20: the x-axis shows the storage device read on a cache miss; the y-axis shows query latency.
- LiquidCache (blocking) disables async transcoding and blocks until transcoding is done.
- As long as IO time exceeds transcoding time (2.2 s), the transcoding overhead is hidden behind IO.
- S3-far: cross-continent S3 (Oregon to Euro-central)
- S3: nearby S3 (Oregon to Utah)
- MinIO: same cluster
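The intuition that IO can hide transcoding can be sketched with a textbook two-stage pipeline model. This is a simplified model assumed for illustration (fixed per-chunk IO and transcoding times, one chunk in flight per stage), not how the measurements above were produced.

```rust
/// Blocking transcoding: each chunk is fetched, then transcoded, and the
/// next fetch starts only afterwards.
fn blocking_time(io: f64, transcode: f64, chunks: u32) -> f64 {
    f64::from(chunks) * (io + transcode)
}

/// Overlapped transcoding as a two-stage pipeline: chunk i is transcoded
/// while chunk i + 1 is fetched, so after the first fetch each step costs the
/// slower stage, plus one final transcode to drain the pipeline.
fn overlapped_time(io: f64, transcode: f64, chunks: u32) -> f64 {
    if chunks == 0 {
        return 0.0;
    }
    io + f64::from(chunks - 1) * io.max(transcode) + transcode
}

fn main() {
    // Illustrative per-chunk times in arbitrary units, not measured values.
    let (io, transcode, chunks) = (4.0, 1.0, 10);
    println!(
        "blocking: {}, overlapped: {}, IO alone: {}",
        blocking_time(io, transcode, chunks),
        overlapped_time(io, transcode, chunks),
        f64::from(chunks) * io
    );
}
```

When IO dominates, the overlapped time is the IO time plus a single chunk's transcoding; when transcoding dominates, it becomes the bottleneck instead.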

How to use LiquidCache?

Existing DataFusion
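use datafusion::error::Result;
use datafusion::prelude::*;

#[tokio::main]
pub async fn main() -> Result<()> {
    // Plain DataFusion session; every scan reads object storage directly.
    let ctx = SessionContext::new();

    // Register the table (arguments elided), then run and print the query.
    ctx.register_table(table_name, ...)?;
    ctx.sql(&sql).await?.show().await?;
    Ok(())
}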
With LiquidCache
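#[tokio::main]
pub async fn main() -> Result<()> {
    // Address of a running LiquidCache server.
    let cache_server = "http://localhost:8080";
    // Only the context construction changes. LiquidCacheBuilder and CacheMode
    // come from the LiquidCache client crates.
    let ctx = LiquidCacheBuilder::new(cache_server)
        .with_object_store(...)
        .with_cache_mode(CacheMode::Liquid)
        .build(SessionConfig::from_env()?)?;

    // Registration and querying are the same as with plain DataFusion.
    ctx.register_table(table_name, ...)?;
    ctx.sql(&sql).await?.show().await?;
    Ok(())
}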

Conclusions: What is LiquidCache?

  1. Pushdown: execute filters and aggregations at the cache, and send only the filtered data to compute.
  2. Liquid data: a cache-only format for efficient pushdown, with first-class support for Parquet.