I recently contributed to delta-kernel-rs, a Rust library that implements the Delta Lake protocol for connectors. The contribution was a set of integration tests for a protocol feature called row tracking. Before jumping into how the tests work, I want to spend some time explaining what row tracking solves and how it fits into the Delta protocol, because the design is genuinely elegant once you see the full picture.
Delta Tables Are Directories, Not Files
A Delta table is not a single file you open and read. It is a directory of
Parquet data files governed by a transaction log — the _delta_log/
folder. Every time data is committed, Delta writes new Parquet files and
appends a JSON entry to the log declaring which files were added (and
possibly which were removed).
This architecture gives you transactional guarantees. Readers always see a consistent snapshot. Writers never corrupt existing data because they never modify files in place — they only append new ones. The log is the source of truth, and the Parquet files are the payload it points to.
But this design also means that the physical files underneath a table are not stable. A compaction job might rewrite three small files into one large file. An update might logically remove an old file and add a new one containing modified rows. The files are transient artifacts; the table’s identity lives in the log.
This raises a fundamental question: if files come and go, how do you refer to a specific row?
The Problem of Row Identity
In a traditional row-oriented database, each row has a physical address — a page number and a slot offset. You can point at a row unambiguously. In a Delta table, you cannot. The Parquet file that held your row might have been rewritten during compaction. The row still exists logically, but its physical container changed.
This matters most during operations like merge. When you say “update every
row where id = 42,” the engine needs to locate those rows, modify them,
and write them back. Without stable identifiers, the engine must
re-derive everything from the raw data each time. With stable identifiers,
it can track which rows were touched and which weren’t, enabling
optimizations like incremental merge, targeted change data feed, and
efficient row-level lineage.
Row tracking addresses this by assigning every row a permanent, globally unique integer ID at write time. That ID follows the row through compaction, checkpointing, and log replay. Once assigned, it never changes.
The Mechanics of Row Tracking
Row tracking operates at the file level, not the row level. When a commit
adds Parquet files to the table, each file receives a starting integer
called baseRowId. The rows inside that file are implicitly numbered
sequentially from the base. A file with baseRowId = 300 and 100 rows
contains rows 300 through 399. The per-row ID is never stored explicitly
in the Parquet data — it is derived at read time from the file’s
baseRowId and each row’s position within the file.
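The derivation can be sketched in a few lines of Rust. This is a toy model of the read-time computation, not the kernel's API; the helper name is mine:

```rust
/// Derive the row IDs for one file at read time. `base_row_id` comes from
/// the file's add action; rows are numbered sequentially from it by their
/// position within the file. Nothing is stored per row.
fn derive_row_ids(base_row_id: i64, num_rows: usize) -> Vec<i64> {
    (0..num_rows as i64).map(|pos| base_row_id + pos).collect()
}

fn main() {
    // A file with baseRowId = 300 and 100 rows holds rows 300 through 399.
    let ids = derive_row_ids(300, 100);
    assert_eq!(ids.first(), Some(&300));
    assert_eq!(ids.last(), Some(&399));
}
```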
Coordination between commits is handled by a single counter called
rowIdHighWaterMark, stored as domain metadata in the commit log. It
records the highest row ID assigned so far. When a new commit arrives, it
reads the current watermark, assigns contiguous baseRowId ranges to its
files starting from watermark + 1, and writes an updated watermark back
to the log.
There is a third field: defaultRowCommitVersion. It tags each file with
the version of the commit that created it. This is not about identity but
about lineage — it records when these rows entered the table, which is
useful for change tracking and time travel queries.
These three values — baseRowId, rowIdHighWaterMark, and
defaultRowCommitVersion — live in the commit JSON alongside the
familiar add and remove actions. No changes to the Parquet files
themselves, no additional storage overhead per row. The entire mechanism
is metadata-only.
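For illustration, the relevant actions in a commit might look roughly like this. The layout follows the Delta protocol, but this is simplified: real add actions carry paths, sizes, stats, partition values, and more, and the values here are made up:

```json
{"add": {"path": "part-00001.parquet", "baseRowId": 0, "defaultRowCommitVersion": 1}}
{"add": {"path": "part-00002.parquet", "baseRowId": 3, "defaultRowCommitVersion": 1}}
{"domainMetadata": {"domain": "delta.rowTracking",
                    "configuration": "{\"rowIdHighWaterMark\":5}",
                    "removed": false}}
```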
Walking Through Two Commits
Consider an empty table with row tracking enabled. The watermark starts at -1.
Commit 1 adds two Parquet files. The first contains 3 rows, the second
contains 3 rows. The engine reads the watermark (-1), assigns
baseRowId = 0 to the first file (rows 0, 1, 2) and baseRowId = 3 to
the second file (rows 3, 4, 5), then writes the updated watermark as 5.
Commit 2 adds one file with 2 rows. It reads the watermark (5),
assigns baseRowId = 6 (rows 6, 7), and writes the watermark as 7.
The result is a clean, gapless sequence of IDs across both commits. Each file owns a contiguous range, and the ranges never overlap. If commit 1’s files are later compacted into a single file, the compacted file carries forward the same base IDs so the row-level identifiers remain stable.
This monotonic, counter-based design is what makes row tracking both simple and reliable. There is no complex coordination protocol, no distributed ID generation. A single watermark, read-then-increment, is all it takes.
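The handoff described above can be modeled in a few lines. This is a toy simulation of the counter logic, not the kernel's implementation:

```rust
/// Toy model of the commit log's row tracking state.
struct Log {
    row_id_high_water_mark: i64,
}

impl Log {
    /// Assign contiguous baseRowId ranges to files with the given row
    /// counts, starting from watermark + 1, then advance the watermark
    /// past the last assigned ID. Returns each file's baseRowId.
    fn commit(&mut self, file_row_counts: &[i64]) -> Vec<i64> {
        let mut next = self.row_id_high_water_mark + 1;
        let mut base_row_ids = Vec::new();
        for &rows in file_row_counts {
            base_row_ids.push(next);
            next += rows;
        }
        self.row_id_high_water_mark = next - 1;
        base_row_ids
    }
}

fn main() {
    let mut log = Log { row_id_high_water_mark: -1 };
    // Commit 1: two files with 3 rows each.
    assert_eq!(log.commit(&[3, 3]), vec![0, 3]);
    assert_eq!(log.row_id_high_water_mark, 5);
    // Commit 2: one file with 2 rows.
    assert_eq!(log.commit(&[2]), vec![6]);
    assert_eq!(log.row_id_high_water_mark, 7);
}
```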
Row ID vs. Row Index
Delta kernel exposes two metadata columns that are related but distinct.
Row index is file-local. It counts rows within a single Parquet file starting from 0. Every file’s row index starts at 0 regardless of where that file sits in the table.
Row ID is table-global. It equals baseRowId + row_index. Because
each file has a unique baseRowId and row index counts from 0 within the
file, this formula naturally produces globally unique, non-overlapping IDs
without any per-row storage.
When you request both columns in a scan, the distinction becomes concrete: row index resets for every file boundary, while row ID climbs monotonically across the entire table.
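A tiny standalone model (again my own sketch, not kernel code) makes the contrast visible:

```rust
/// For each (baseRowId, row count) pair, emit (row_index, row_id) pairs.
/// Row index is file-local; row ID is baseRowId + row_index.
fn index_and_id(files: &[(i64, usize)]) -> Vec<(i64, i64)> {
    files
        .iter()
        .flat_map(|&(base, rows)| (0..rows as i64).map(move |idx| (idx, base + idx)))
        .collect()
}

fn main() {
    // Hypothetical layout: baseRowId 0 with 3 rows, baseRowId 3 with 4 rows.
    let pairs = index_and_id(&[(0, 3), (3, 4)]);
    // Row index resets at each file boundary: 0, 1, 2, then 0, 1, 2, 3.
    let indexes: Vec<i64> = pairs.iter().map(|&(i, _)| i).collect();
    assert_eq!(indexes, vec![0, 1, 2, 0, 1, 2, 3]);
    // Row ID climbs monotonically across the whole table: 0 through 6.
    let ids: Vec<i64> = pairs.iter().map(|&(_, id)| id).collect();
    assert_eq!(ids, (0..7).collect::<Vec<_>>());
}
```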
Opting In at the Protocol Level
Row tracking is not enabled by default. It is a protocol-level table
feature declared in writerFeatures at table creation time. Once declared,
the Delta protocol does not allow removing it — features are append-only.
Enabling row tracking also requires the domainMetadata writer feature,
because the watermark is stored as a domain metadata action under the
delta.rowTracking domain. Domain metadata is Delta’s namespacing
mechanism for feature-specific state in the commit log, preventing
different features from colliding with each other.
In practice, creating a row-tracking table means declaring two writer
features (rowTracking and domainMetadata) and setting configuration
entries like delta.enableRowTracking and
delta.rowTracking.materializedRowIdColumnName. Get any of these wrong
and the feature is silently inactive — which, as it turned out, was
exactly the bug I found in the shared test utilities.
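Sketched as commit-log actions, the declarations look roughly like this. It is simplified (table features require writer version 7, and the materialized column names here are placeholders, not real values):

```json
{"protocol": {"minReaderVersion": 1, "minWriterVersion": 7,
              "writerFeatures": ["rowTracking", "domainMetadata"]}}
{"metaData": {"configuration": {
    "delta.enableRowTracking": "true",
    "delta.rowTracking.materializedRowIdColumnName": "_row_id",
    "delta.rowTracking.materializedRowCommitVersionColumnName": "_row_commit_version"
}}}
```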
Durability Across Checkpoints and Compaction
Delta’s transaction log can grow large over time. To keep reads fast, Delta periodically writes a checkpoint — a Parquet file that captures the full table state at a given version. After a checkpoint exists, readers can load it directly instead of replaying every JSON commit from the beginning.
For row tracking, checkpoints pose a specific requirement: the baseRowId
on every add action and the rowIdHighWaterMark in domain metadata must
be faithfully preserved in the checkpoint. If they aren't, a reader that
loads from the checkpoint would reconstruct different row IDs than one
that replayed the full log, violating the core guarantee of stable
identifiers.
The same principle applies to log compaction, a mechanism that squashes multiple commit files into a single compacted JSON file. The compacted file must carry forward all row tracking metadata exactly as it appeared in the original commits. Any loss or corruption during compaction would cause ID drift.
These durability requirements are not optional niceties — they are protocol invariants. A correct implementation must preserve row tracking state through every log lifecycle event: normal commits, checkpointing, compaction, and snapshot loading.
Testing the Read Path
With the concepts in place, here is where my contribution fits in. The
test file kernel/tests/row_tracking.rs already had write-path tests
that inspect raw commit JSON to verify baseRowId,
defaultRowCommitVersion, and rowIdHighWaterMark are written correctly.
My work adds six integration tests for the read path — verifying that
when you scan a table with MetadataColumnSpec::RowId in the schema, the
engine returns the correct row IDs to the caller.
Two helpers keep the tests focused on scenarios rather than scan mechanics.
read_row_id_scan builds a scan that appends a row_id metadata column
to the snapshot’s schema and executes it:
fn read_row_id_scan(
    snapshot: Arc<Snapshot>,
    engine: Arc<dyn Engine>,
) -> DeltaResult<Vec<RecordBatch>> {
    let scan_schema = Arc::new(
        snapshot
            .schema()
            .add_metadata_column("row_id", MetadataColumnSpec::RowId)?,
    );
    let scan = snapshot
        .scan_builder()
        .with_schema(scan_schema)
        .build()?;
    read_scan(&scan, engine)
}
collect_row_ids flattens the row ID column from all returned batches
into a single vector:
fn collect_row_ids(batches: &[RecordBatch]) -> Vec<i64> {
    batches
        .iter()
        .flat_map(|b| {
            b.column_by_name("row_id")
                .expect("row_id column not found")
                .as_primitive::<Int64Type>()
                .values()
                .to_vec()
        })
        .collect()
}
With these in hand, each test writes data, scans it back, and asserts on the collected IDs.
Basic Sequential IDs
The simplest scenario: write one file with three rows, read them back,
verify IDs are [0, 1, 2].
let snapshot = Snapshot::builder_for(table_url.clone())
    .build(engine.as_ref())?;
let batches = read_row_id_scan(snapshot, engine)?;
let mut row_ids = collect_row_ids(&batches);
row_ids.sort_unstable();
assert_eq!(row_ids, vec![0, 1, 2]);
If this fails, nothing downstream will hold. It establishes that the scan
engine correctly derives row IDs from baseRowId at all.
Non-Overlapping IDs Across Files
Two files written in a single commit — one with 3 rows, one with 4. The
test asserts two properties: the global ID set covers 0..7 with no
duplicates, and each file’s IDs form a contiguous block from its base:
assert_eq!(all_ids, (0i64..7).collect::<Vec<_>>());
for batch in &batches {
    let ids: Vec<i64> = /* extract row IDs from batch */;
    let min = *ids.iter().min().unwrap();
    let expected = (min..min + ids.len() as i64).collect::<Vec<_>>();
    assert_eq!(ids, expected, "IDs within a file must be contiguous");
}
This catches a class of bugs where the engine might assign overlapping
baseRowId values to files within the same commit.
Global Uniqueness Across Commits
Two separate transactions: commit 1 writes 3 rows, commit 2 writes 2.
The assertion is simple — all five IDs should be [0, 1, 2, 3, 4] with
no gaps or duplicates. This validates that the watermark handoff between
commits works correctly on the read side.
Surviving a Checkpoint
This test writes data, creates a checkpoint, loads a fresh snapshot from the checkpoint, and verifies the same row IDs come back. Then it writes more data and verifies the new IDs continue from the watermark:
snapshot.checkpoint(mt_engine.as_ref())?;
let fresh_snapshot = Snapshot::builder_for(table_url.clone())
    .build(mt_engine.as_ref())?;
let batches = read_row_id_scan(fresh_snapshot, mt_engine.clone())?;
let mut ids_after_ckpt = collect_row_ids(&batches);
ids_after_ckpt.sort_unstable();
assert_eq!(ids_after_ckpt, vec![0, 1, 2]);

// Write 2 more rows after checkpoint
// ...
assert_eq!(all_ids, vec![0, 1, 2, 3, 4]);
One implementation detail worth noting: this test runs on a multi-threaded
Tokio runtime (#[tokio::test(flavor = "multi_thread")]). Delta’s
checkpoint code internally nests blocking calls in a way that deadlocks on
a single-threaded executor. The test uses TokioMultiThreadExecutor for
the checkpoint step while keeping the standard executor for writes,
matching the rest of the test suite.
Row ID and Row Index Together
Both metadata columns requested in a single scan. The test verifies their
relationship directly: row index resets to 0 for each file, row ID equals
baseRowId + row_index, and the combined set of row IDs is globally
unique:
assert_eq!(
    row_indexes,
    (0..n).collect::<Vec<_>>(),
    "Row index must reset to 0 for each file"
);
let base = *row_ids.iter().min().unwrap();
assert_eq!(
    row_ids,
    (base..base + n).collect::<Vec<_>>(),
    "Row IDs within a file must equal baseRowId + row_index"
);
This is the test that makes the relationship between the two columns explicit and verifiable.
Log Compaction (Pending)
The final test verifies row ID preservation through log compaction. It is
written and ready but marked #[ignore] because log compaction support in
delta-kernel-rs is still in progress, tracked in
issue #2337.
When compaction lands, the test will be there to catch regressions from
day one.
Refactoring the Existing Tests
Beyond the new read-path tests, I cleaned up the existing write-path
tests. Seven of them created a table with a single INTEGER column named
number by repeating the same schema construction inline. I extracted a
setup_number_table helper that does this in one call, cutting around 60
lines of repeated setup and making each test's intent visible at a glance.
I also fixed a pair of configuration keys in the shared test utility
(test-utils/src/lib.rs). The keys were written as
delta.materializedRowIdColumnName and
delta.materializedRowCommitVersionColumnName, missing the rowTracking.
namespace prefix that the protocol requires. Without the correct prefix,
the configuration was silently ignored, meaning test tables weren’t
actually configured the way the protocol specifies. A small diff, but the
kind of thing that undermines every test that depends on it.
What I Took Away
What made this contribution valuable to me was not the Rust code itself —
the tests are straightforward once you understand the protocol. What was
valuable was that writing them forced me to internalize the protocol.
You cannot assert on a rowIdHighWaterMark without understanding why it
exists. You cannot test checkpoint survival without understanding what
checkpoints are required to preserve.
Writing tests turned out to be one of the most effective ways to learn a system’s invariants from the inside. You start by reading the spec, but you finish by encoding it.
The full change is at delta-io/delta-kernel-rs#2316.