<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joichiro Mitaka</title>
    <description>The latest articles on DEV Community by Joichiro Mitaka (@coldstorage).</description>
    <link>https://dev.clauneck.workers.dev/coldstorage</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3991890%2F8d80d521-aaef-4748-9485-eabc2f04b9ed.png</url>
      <title>DEV Community: Joichiro Mitaka</title>
      <link>https://dev.clauneck.workers.dev/coldstorage</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.clauneck.workers.dev/feed/coldstorage"/>
    <language>en</language>
    <item>
      <title>Using Zstd Frames to Egress Partial Parquet Files</title>
      <dc:creator>Joichiro Mitaka</dc:creator>
      <pubDate>Wed, 24 Jun 2026 15:26:22 +0000</pubDate>
      <link>https://dev.clauneck.workers.dev/coldstorage/using-zstd-frames-to-egress-partial-parquet-files-1fdd</link>
      <guid>https://dev.clauneck.workers.dev/coldstorage/using-zstd-frames-to-egress-partial-parquet-files-1fdd</guid>
      <description>&lt;h2&gt;
  
  
  Jump Tables, TLV Footers, and the Real Cost of Reading What You Don't Need
&lt;/h2&gt;

&lt;p&gt;You're paying for bytes you never read.&lt;/p&gt;

&lt;p&gt;A data engineer on a busy pipeline touches dozens of Parquet files a day: schema discovery, predicate pushdown, column pruning, metadata scrapes for a data catalog sync. In each case, the application needs maybe 200 KB of context from a file that is 4 GB on disk. Without a seekable archive format and a jump table to find the right frame, your HTTP client fetches the whole thing, and your cloud egress invoice reflects every unnecessary gigabyte.&lt;/p&gt;

&lt;p&gt;This post quantifies the problem, then walks through how &lt;a href="https://github.com/HuskHoard/HuskHoard" rel="noopener noreferrer"&gt;HuskHoard&lt;/a&gt; uses seekable Zstd frames, a per-volume jump table, and TLV-encoded footer metadata to make partial egress a first-class citizen across multi-volume archives — disk, cloud, and LTO tape alike.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem, In Dollars
&lt;/h2&gt;

&lt;p&gt;S3 standard egress runs $0.09/GB. GCS is $0.08/GB. Even Cloudflare R2, which is free for egress &lt;em&gt;from R2 to the internet&lt;/em&gt;, still costs you in latency and API call count when you cannot bound the range of bytes you need.&lt;/p&gt;

&lt;p&gt;Here is a representative read pattern for a cold analytics archive:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Bytes Needed&lt;/th&gt;
&lt;th&gt;Bytes Fetched (naïve)&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Schema discovery&lt;/td&gt;
&lt;td&gt;~50 KB (Parquet footer)&lt;/td&gt;
&lt;td&gt;1–8 GB (full file)&lt;/td&gt;
&lt;td&gt;~1:20,000 to 1:160,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single column scan&lt;/td&gt;
&lt;td&gt;~200 MB (one column chunk)&lt;/td&gt;
&lt;td&gt;4 GB (full file)&lt;/td&gt;
&lt;td&gt;1:20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data catalog sync (1M files)&lt;/td&gt;
&lt;td&gt;~50 GB (footers only)&lt;/td&gt;
&lt;td&gt;~4 PB (full files)&lt;/td&gt;
&lt;td&gt;1:80,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Selective restore (1 row group)&lt;/td&gt;
&lt;td&gt;~128 MB&lt;/td&gt;
&lt;td&gt;4 GB&lt;/td&gt;
&lt;td&gt;1:32&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;On 100 TB of cold Parquet data with $0.09/GB egress:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full read for schema sync&lt;/strong&gt;: 100 TB × $0.09 = &lt;strong&gt;$9,216&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partial read (footers only, avg 100 KB/file, 1M files)&lt;/strong&gt;: ~100 GB × $0.09 = &lt;strong&gt;$9.00&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Savings per catalog sync: $9,207 — 99.9% reduction&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even a conservative column-scan scenario (pulling 15% of each file's bytes) cuts a $9,216 monthly read bill to &lt;strong&gt;$1,382&lt;/strong&gt;. The ceiling on savings is determined entirely by how precisely you can address the bytes you actually need.&lt;/p&gt;
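
&lt;p&gt;The arithmetic above is easy to replay. The following Python sketch recomputes the three figures. The helper name &lt;code&gt;egress_cost&lt;/code&gt; is hypothetical, the $0.09/GB price is the illustrative rate quoted above rather than a live price list, and 1 TB is treated as 1,024 GB because that is what the totals imply.&lt;/p&gt;

```python
# Hypothetical helper that replays the article's cost arithmetic.
GB_PER_TB = 1024          # the totals above imply binary units
PRICE_PER_GB = 0.09       # USD per GB, the illustrative rate quoted above


def egress_cost(gigabytes, price_per_gb=PRICE_PER_GB):
    """Return the egress charge in USD for a transfer of `gigabytes`."""
    return gigabytes * price_per_gb


full_read = egress_cost(100 * GB_PER_TB)             # every byte of 100 TB
footers_only = egress_cost(100)                      # ~100 GB of footers
column_scan = egress_cost(100 * GB_PER_TB * 0.15)    # 15% of each file

print(f"full read:    ${full_read:,.2f}")
print(f"footers only: ${footers_only:,.2f}")
print(f"15% scan:     ${column_scan:,.2f}")
print(f"reduction:    {1 - footers_only / full_read:.1%}")
```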

&lt;p&gt;That precision is what frames and jump tables buy you.&lt;/p&gt;




&lt;h2&gt;
  
  
  Zstd Frames: What They Are and Why They Matter
&lt;/h2&gt;

&lt;p&gt;A single &lt;code&gt;.zst&lt;/code&gt; file produced by the standard &lt;code&gt;zstd&lt;/code&gt; CLI is one frame. Everything inside is a single compressed stream. You have to start decompression at byte 0 to reach any byte inside.&lt;/p&gt;

&lt;p&gt;But the Zstd spec allows a concatenation of independent frames. Each frame is a complete, self-contained unit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Frame 0][Frame 1][Frame 2]...[Frame N]
 ^         ^         ^           ^
 16 MB     16 MB     16 MB       partial
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every frame has a known &lt;code&gt;compressed_size&lt;/code&gt; and &lt;code&gt;decompressed_size&lt;/code&gt;. If you know those sizes in advance (stored in a jump table), you can seek directly to Frame N by summing the compressed sizes of frames 0 through N-1. You never decompress anything you don't need. Frame N is fetched with a single HTTP Range request, decompressed independently, and the relevant bytes are piped downstream.&lt;/p&gt;
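
&lt;p&gt;As a minimal sketch of that prefix-sum seek, the Python below turns a list of compressed frame sizes into an HTTP Range header for a single frame. The sizes and the function name &lt;code&gt;range_header&lt;/code&gt; are made up for illustration and are not taken from HuskHoard.&lt;/p&gt;

```python
from itertools import accumulate

# Hypothetical compressed sizes (bytes) for four frames, in volume order.
compressed_sizes = [6_990_507, 7_120_344, 6_801_912, 2_048_000]

# start_offsets[n] is the sum of the compressed sizes of frames 0..n-1.
start_offsets = [0] + list(accumulate(compressed_sizes))[:-1]


def range_header(frame_index):
    """Build an HTTP Range header value that covers exactly one frame."""
    first = start_offsets[frame_index]
    last = first + compressed_sizes[frame_index] - 1   # Range ends are inclusive
    return f"bytes={first}-{last}"


print(range_header(2))
```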

&lt;p&gt;This is the architectural core of HuskHoard's egress model, and it maps cleanly onto how the Parquet format itself carves up a file.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Parquet Parallel: Row Groups as Frames
&lt;/h2&gt;

&lt;p&gt;Parquet is deliberately designed for partial reads. A Parquet file contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Row groups&lt;/strong&gt; — horizontal partitions of the data, each independently readable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Column chunks&lt;/strong&gt; — vertical slices within a row group&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Page headers&lt;/strong&gt; — per-page metadata within each column chunk&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Footer&lt;/strong&gt; — the &lt;code&gt;FileMetaData&lt;/code&gt; Thrift struct at the end of the file, containing the schema, row group offsets, column statistics, and key-value metadata. It is followed by a 4-byte little-endian footer length and the closing magic bytes &lt;code&gt;PAR1&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A reader that wants only the footer performs two range requests: one to get the last 8 bytes (magic + footer length), and one to get the footer itself. Everything else stays on the remote. A reader that wants one column from one row group consults the footer to find the column chunk's byte offset and length, then fires a single range request.&lt;/p&gt;
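
&lt;p&gt;The two-request footer read can be sketched in a few lines of Python. Here an in-memory byte string stands in for the remote object and &lt;code&gt;read_range&lt;/code&gt; is a hypothetical substitute for an HTTP Range request. The sketch also assumes the object size is already known (for example from a HEAD request), and the "metadata" is placeholder text rather than real Thrift.&lt;/p&gt;

```python
# A stand-in for a remote object; real code would issue HTTP Range requests.
FAKE_METADATA = b"thrift-encoded FileMetaData would go here"
REMOTE = (
    b"PAR1"
    + b"\x00" * 1000                                  # column data (never fetched)
    + FAKE_METADATA
    + len(FAKE_METADATA).to_bytes(4, "little")        # 4-byte footer length
    + b"PAR1"                                         # trailing magic
)


def read_range(first, last):
    """Simulate `Range: bytes=first-last` (inclusive) against REMOTE."""
    return REMOTE[first:last + 1]


def read_footer(object_size):
    # Request 1: the final 8 bytes hold the footer length and the magic.
    tail = read_range(object_size - 8, object_size - 1)
    if tail[4:] != b"PAR1":
        raise ValueError("not a Parquet file")
    footer_len = int.from_bytes(tail[:4], "little")
    # Request 2: the FileMetaData sits immediately before those 8 bytes.
    footer_start = object_size - 8 - footer_len
    return read_range(footer_start, object_size - 9)


footer = read_footer(len(REMOTE))
print(len(footer), "footer bytes fetched out of", len(REMOTE))
```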

&lt;p&gt;HuskHoard's frame model mirrors this exactly, but at the archive level rather than within a single Parquet file.&lt;/p&gt;




&lt;h2&gt;
  
  
  HuskHoard's Implementation: Frames, the Catalog, and the Jump Table
&lt;/h2&gt;

&lt;p&gt;When HuskHoard archives a file to any backend — a flat image file acting as a tape volume, a physical LTO cartridge, or an rclone cloud remote — it writes the payload as a sequence of 16 MB Zstd frames. For each frame, it records the mapping between uncompressed byte position and compressed byte position on the volume.&lt;/p&gt;

&lt;p&gt;That mapping is the &lt;code&gt;object_frames&lt;/code&gt; table in &lt;code&gt;husk_catalog.db&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;object_frames&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;file_path&lt;/span&gt;           &lt;span class="nb"&gt;TEXT&lt;/span&gt;    &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;version&lt;/span&gt;             &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;uncompressed_offset&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;-- where this frame starts in the original file&lt;/span&gt;
    &lt;span class="n"&gt;compressed_offset&lt;/span&gt;   &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;-- where this frame starts on the storage volume&lt;/span&gt;
    &lt;span class="n"&gt;compressed_size&lt;/span&gt;     &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;    &lt;span class="c1"&gt;-- how many bytes to fetch from the volume&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;idx_frames&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;object_frames&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;version&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the jump table. Given a byte range request for &lt;code&gt;bytes=2147483648-2281701375&lt;/code&gt; (a 128 MB window starting at the 2 GB mark; HTTP range ends are inclusive), the gateway runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;compressed_offset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;compressed_size&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;   &lt;span class="n"&gt;object_frames&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;  &lt;span class="n"&gt;file_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'/warehouse/events/2024-01-01.parquet'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt;  &lt;span class="k"&gt;version&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;version&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;object_frames&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;file_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt;  &lt;span class="n"&gt;uncompressed_offset&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;2147483648&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt;  &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;uncompressed_offset&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt;  &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One row. One seek. One range request against the volume. Everything else stays dark.&lt;/p&gt;
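
&lt;p&gt;The same "largest offset at or before X" lookup can be expressed outside SQL as a binary search. This Python sketch is only an analogue of the query above, not HuskHoard's code: the jump table rows are synthetic, every frame is assumed to compress to the same hypothetical 7,000,000 bytes, and &lt;code&gt;frame_at_or_before&lt;/code&gt; is an invented name.&lt;/p&gt;

```python
from bisect import bisect_right

FRAME = 16 * 1024 * 1024  # 16 MB of uncompressed data per frame

# Synthetic rows: (uncompressed_offset, compressed_offset, compressed_size)
jump_table = [(i * FRAME, i * 7_000_000, 7_000_000) for i in range(256)]
starts = [row[0] for row in jump_table]


def frame_at_or_before(offset):
    """Return the row with the largest uncompressed_offset not exceeding `offset`."""
    index = bisect_right(starts, offset) - 1
    if index == -1:
        raise ValueError("offset precedes the first frame")
    return jump_table[index]


row = frame_at_or_before(2_147_483_648 + 5)   # a few bytes past the 2 GB mark
print(row)
```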

&lt;p&gt;The HTTP gateway loop in StreamGate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HTTP Range request arrives (bytes=X-Y)
        │
        ▼
Query object_frames → nearest frame boundary ≤ X
        │
        ▼
Seek to compressed_offset on volume (tape block, S3 range, local seek)
        │
        ▼
Decompress forward to exact byte X, stream through Y
        │
        ▼
Client receives exactly what it asked for
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a 4K video file seeking to the 2-hour mark, this is why &lt;code&gt;mpv&lt;/code&gt; can start playing from tape or S3 in under a second instead of waiting for a multi-gigabyte download.&lt;/p&gt;
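
&lt;p&gt;To make the loop concrete, here is a small end-to-end sketch in Python. It is an illustration under stated assumptions, not StreamGate itself: &lt;code&gt;zlib&lt;/code&gt; stands in for Zstd so the example needs only the standard library, frames are 1 KB so the data stays tiny, and the names &lt;code&gt;table&lt;/code&gt; and &lt;code&gt;serve_range&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import zlib
from bisect import bisect_right

FRAME = 1024  # tiny frames for the demo; the article uses 16 MB

payload = bytes(range(256)) * 20            # 5,120 bytes of sample data

# Archive step: compress each slice independently and record a jump table.
volume = b""
table = []                                  # (uncompressed_offset, compressed_offset, compressed_size)
for start in range(0, len(payload), FRAME):
    blob = zlib.compress(payload[start:start + FRAME])   # zlib stands in for Zstd
    table.append((start, len(volume), len(blob)))
    volume += blob

starts = [row[0] for row in table]


def serve_range(first, last):
    """Return payload[first..last] (inclusive), touching only overlapping frames."""
    lo = bisect_right(starts, first) - 1    # frame that contains `first`
    hi = bisect_right(starts, last) - 1     # frame that contains `last`
    out = b""
    for _, c_off, c_size in table[lo:hi + 1]:
        out += zlib.decompress(volume[c_off:c_off + c_size])
    skip = first - starts[lo]               # bytes to discard before `first`
    return out[skip:skip + (last - first + 1)]


print(serve_range(1500, 3100) == payload[1500:3101])
```

&lt;p&gt;Only frames 1 through 3 are decompressed for that request; frames 0 and 4 are never read from the volume.&lt;/p&gt;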




&lt;h2&gt;
  
  
  TLV Footers: Turning the Frame Header Into a Parquet-Style Catalog Entry
&lt;/h2&gt;

&lt;p&gt;Every file archived by HuskHoard is preceded on the storage volume by a strict &lt;strong&gt;4,096-byte ObjectHeader&lt;/strong&gt;. The first 136 bytes carry the fixed-width mechanics: a magic string (&lt;code&gt;USTDHUSK&lt;/code&gt;), the file's UUID, POSIX permissions, BLAKE3 hash, compressed and uncompressed sizes, and a CRC32 of the header itself.&lt;/p&gt;

&lt;p&gt;The remaining &lt;strong&gt;3,960 bytes&lt;/strong&gt; are dedicated to &lt;strong&gt;TLV (Type-Length-Value)&lt;/strong&gt; encoded metadata. This is the same binary framing used in X.509 certificates, SNMP, and dozens of wire protocols, chosen here because a forward-compatible parser can safely skip any type code it does not recognize.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Byte  0 –  7: Magic "USTDHUSK"
Byte  8 – 23: Volume UUID (16 bytes)
Byte 24 – 55: BLAKE3 hash (32 bytes)
Byte 56 – 63: Uncompressed payload size
Byte 64 – 71: Compressed payload size
Byte 72 – 79: Original mtime
Byte 80 – 83: POSIX mode
Byte 84 – 87: Header CRC32
Byte 88 –135: File path (null-terminated, 48 bytes max inline)
Byte 136–4095: TLV region (3,960 bytes)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A TLV tag entry looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Type: u8][Key-Length: u16][Key: bytes][Value-Length: u32][Value: bytes]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
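
&lt;p&gt;A short Python sketch shows how such entries can be written and how a parser hops over types it does not know. Two details are assumptions, since the layout above does not state them: the length fields are treated as little-endian, and a &lt;code&gt;0x00&lt;/code&gt; type byte is treated as the start of zero padding. The function names are hypothetical.&lt;/p&gt;

```python
KNOWN_TYPES = {0x01, 0x02}     # the type codes used in this article


def encode_tlv(type_code, key, value):
    key_bytes = key.encode("utf-8")
    return (
        bytes([type_code])
        + len(key_bytes).to_bytes(2, "little")     # Key-Length: u16 (byte order assumed)
        + key_bytes
        + len(value).to_bytes(4, "little")         # Value-Length: u32 (byte order assumed)
        + value
    )


def parse_tlv(region):
    """Yield (type, key, value) for known types, skipping unknown ones."""
    pos = 0
    # Assumption: a 0x00 type byte marks the zero padding after the last entry.
    while pos != len(region) and region[pos] != 0x00:
        type_code = region[pos]
        key_len = int.from_bytes(region[pos + 1:pos + 3], "little")
        key_end = pos + 3 + key_len
        value_len = int.from_bytes(region[key_end:key_end + 4], "little")
        value_end = key_end + 4 + value_len
        if type_code in KNOWN_TYPES:
            yield type_code, region[pos + 3:key_end].decode("utf-8"), region[key_end + 4:value_end]
        pos = value_end        # the two lengths are enough to skip an unknown type


region = (
    encode_tlv(0x02, "parquet.row_count", (1_000_000).to_bytes(8, "little"))
    + encode_tlv(0x7F, "future.tag", b"ignored by this parser")
    + encode_tlv(0x01, "workflow.owner", b"data-eng-team")
).ljust(3960, b"\x00")

for type_code, key, value in parse_tlv(region):
    print(hex(type_code), key, value)
```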



&lt;p&gt;This is where the Parquet footer analogy becomes structural rather than metaphorical. For a Parquet file being archived, HuskHoard can embed the Parquet &lt;code&gt;FileMetaData&lt;/code&gt; statistics directly into this TLV region:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;TLV Type&lt;/th&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0x02&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;parquet.schema&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Serialized Thrift schema (JSON or binary)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0x02&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;parquet.row_count&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Total row count as little-endian u64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0x02&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;parquet.col.event_ts.min&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Minimum value of &lt;code&gt;event_ts&lt;/code&gt; column&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0x02&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;parquet.col.event_ts.max&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Maximum value of &lt;code&gt;event_ts&lt;/code&gt; column&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0x02&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;parquet.col.user_id.null_count&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Null count for &lt;code&gt;user_id&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0x02&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;parquet.row_group.count&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Number of row groups&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0x01&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;workflow.pipeline&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;"ingest_v3"&lt;/code&gt; — POSIX xattr from source&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;0x01&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;workflow.owner&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"data-eng-team"&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These statistics travel &lt;strong&gt;physically bonded to the data&lt;/strong&gt; on every storage medium — disk image, tape cartridge, S3 object. If the SQLite catalog is lost, &lt;code&gt;husk rebuild&lt;/code&gt; walks the volume, reads every 4 KB header, and reconstructs the catalog complete with all column statistics. The tape is entirely self-describing.&lt;/p&gt;

&lt;p&gt;But the real payoff is what this enables while the catalog &lt;em&gt;is&lt;/em&gt; present.&lt;/p&gt;




&lt;h2&gt;
  
  
  Multi-Volume Catalog Queries: Pruning at the Volume Level
&lt;/h2&gt;

&lt;p&gt;A production HuskHoard deployment might span several volumes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Volume A (tape, 12 TB) — archive 2022–2023
Volume B (tape, 12 TB) — archive 2023–2024
Volume C (NVMe image, 2 TB) — archive 2024–present
Volume D (S3:us-east-1, 50 TB) — cloud replica
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;catalog&lt;/code&gt; table records which volume holds each archived version of each file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tape_uuid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="c1"&gt;-- identifies the volume&lt;/span&gt;
    &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tape_offset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;-- byte offset of the ObjectHeader on that volume&lt;/span&gt;
    &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compressed_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;custom_metadata&lt;/span&gt;     &lt;span class="c1"&gt;-- mirrors the TLV tags as JSON&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;   &lt;span class="k"&gt;catalog&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;  &lt;span class="n"&gt;json_extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;custom_metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.parquet.col.event_ts.min'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt;  &lt;span class="n"&gt;json_extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;custom_metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.parquet.col.event_ts.max'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="s1"&gt;'2024-03-31'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt;  &lt;span class="n"&gt;json_extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;custom_metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$.parquet.row_count'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query executes in milliseconds against the SQLite catalog on your SSD. The tape drives stay spun down. S3 is never contacted. You get back a list of &lt;code&gt;(file_path, tape_uuid, tape_offset)&lt;/code&gt; tuples — the exact volumes and positions to touch.&lt;/p&gt;
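
&lt;p&gt;The pruning logic itself is just a min/max containment test. The Python sketch below applies it to a handful of invented catalog rows to show how the set of volumes shrinks before any I/O happens. The rows, the flattened metadata keys, and the &lt;code&gt;matches&lt;/code&gt; helper are all hypothetical; ISO-8601 date strings are compared as text, as in the SQL above.&lt;/p&gt;

```python
# Invented catalog rows; "meta" mirrors the TLV statistics for each file.
catalog = [
    {"file_path": "/warehouse/events/2023-11-02.parquet", "tape_uuid": "volume-b",
     "meta": {"event_ts.min": "2023-11-02", "event_ts.max": "2023-11-02", "row_count": 9000}},
    {"file_path": "/warehouse/events/2024-01-15.parquet", "tape_uuid": "volume-c",
     "meta": {"event_ts.min": "2024-01-15", "event_ts.max": "2024-01-15", "row_count": 12000}},
    {"file_path": "/warehouse/events/2024-02-20.parquet", "tape_uuid": "volume-c",
     "meta": {"event_ts.min": "2024-02-20", "event_ts.max": "2024-02-21", "row_count": 8000}},
    {"file_path": "/warehouse/events/2024-03-01.parquet", "tape_uuid": "volume-a",
     "meta": {"event_ts.min": "2024-03-01", "event_ts.max": "2024-03-01", "row_count": 0}},
    {"file_path": "/warehouse/events/2024-05-01.parquet", "tape_uuid": "volume-d",
     "meta": {"event_ts.min": "2024-05-01", "event_ts.max": "2024-05-01", "row_count": 7000}},
]


def matches(meta, window_start, window_end):
    """True when the file is non-empty and lies entirely inside the window."""
    return (
        meta["event_ts.min"] >= window_start
        and window_end >= meta["event_ts.max"]
        and meta["row_count"] > 0
    )


hits = [row for row in catalog if matches(row["meta"], "2024-01-01", "2024-03-31")]
volumes = sorted({row["tape_uuid"] for row in hits})
print(len(hits), "files on", volumes)
```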

&lt;p&gt;Then, per file, for each column you actually need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;compressed_offset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;compressed_size&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;   &lt;span class="n"&gt;object_frames&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;  &lt;span class="n"&gt;file_path&lt;/span&gt;           &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'/warehouse/events/2024-01-15.parquet'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt;  &lt;span class="k"&gt;version&lt;/span&gt;             &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt;  &lt;span class="n"&gt;uncompressed_offset&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;col_chunk_start&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;col_chunk_end&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt;  &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;uncompressed_offset&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You issue a range request for only those frames. For a 4 GB Parquet file where you need one 200 MB column chunk:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Data Transferred&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Catalog query (SQLite, local)&lt;/td&gt;
&lt;td&gt;0 bytes egress&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;object_frames lookup (SQLite, local)&lt;/td&gt;
&lt;td&gt;0 bytes egress&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTTP Range to S3 (compressed frame bytes)&lt;/td&gt;
&lt;td&gt;~85 MB (at 2.4:1 Zstd ratio)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total vs naïve full-file fetch&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;85 MB vs 1.7 GB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That is a &lt;strong&gt;95% reduction&lt;/strong&gt; on a per-query basis.&lt;/p&gt;
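
&lt;p&gt;Because a column chunk rarely starts on a frame boundary, the fetch covers every frame the chunk overlaps. The Python sketch below estimates that for fixed-size frames. The chunk position is invented, the 2.4:1 ratio is the illustrative figure from the table, and &lt;code&gt;frames_for_chunk&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
FRAME = 16 * 1024 * 1024          # uncompressed bytes per frame
ZSTD_RATIO = 2.4                  # illustrative compression ratio from the table


def frames_for_chunk(chunk_start, chunk_len):
    """Return the indices of every fixed-size frame that overlaps the chunk."""
    first = chunk_start // FRAME
    last = (chunk_start + chunk_len - 1) // FRAME
    return range(first, last + 1)


# A 200 MB column chunk that starts 5 MB past a frame boundary.
MB = 1024 * 1024
needed = frames_for_chunk(1024 * MB + 5 * MB, 200 * MB)
fetched_mb = len(needed) * FRAME / ZSTD_RATIO / MB
print(len(needed), "frames,", round(fetched_mb, 1), "MB compressed")
```

&lt;p&gt;The straddling frames at each end add a few megabytes over the ideal 200 MB / 2.4, which is why smaller frames tighten the bound.&lt;/p&gt;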




&lt;h2&gt;
  
  
  Putting Numbers to the Savings
&lt;/h2&gt;

&lt;p&gt;Let's use a concrete scenario: a data team maintains a 10 TB cold Parquet archive on S3, with an average file size of 4 GB. They run three workloads:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workload A — Monthly catalog sync (schema + statistics only)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Files: 2,500 Parquet files&lt;/li&gt;
&lt;li&gt;Data needed per file: footer only (~150 KB each)&lt;/li&gt;
&lt;li&gt;Total needed: ~375 MB&lt;/li&gt;
&lt;li&gt;Full-file cost: 10 TB × $0.09 = &lt;strong&gt;$921.60/month&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Partial-frame cost: 375 MB × $0.09 = &lt;strong&gt;$0.03/month&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monthly savings: $921.57&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Workload B — Ad-hoc column scan (one column across 20% of files)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Files queried: 500 (selected by TLV statistics predicate)&lt;/li&gt;
&lt;li&gt;Column chunk per file: ~200 MB uncompressed → ~85 MB compressed frames&lt;/li&gt;
&lt;li&gt;Total fetched: ~42.5 GB&lt;/li&gt;
&lt;li&gt;Full-file cost: 500 × 4 GB × $0.09 = &lt;strong&gt;$180.00/query&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Partial-frame cost: 42.5 GB × $0.09 = &lt;strong&gt;$3.83/query&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Per-query savings: $176.17 (97.9%)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Workload C — Point-in-time restore of a single row group&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 file × 1 row group = 128 MB uncompressed → ~54 MB compressed&lt;/li&gt;
&lt;li&gt;Full-file cost: 4 GB × $0.09 = &lt;strong&gt;$0.36&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Partial-frame cost: 54 MB × $0.09 = &lt;strong&gt;$0.005&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Per-restore savings: $0.355 (98.6%)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At scale, Workload A alone (a catalog sync that most teams run without thinking about the bill) generates &lt;strong&gt;~$11,000/year in unnecessary egress&lt;/strong&gt; on a 10 TB archive at one full pass per month ($921.60 × 12). The frame-indexed approach reduces that to under $1/year, and a more frequent schedule widens the gap proportionally.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Self-Describing Volume: Your Catalog Backup Is on the Tape
&lt;/h2&gt;

&lt;p&gt;One underappreciated consequence of storing TLV column statistics in every ObjectHeader is that the volume itself becomes a data catalog. After a complete disaster recovery (catalog database lost, fresh server), &lt;code&gt;husk rebuild&lt;/code&gt; walks the storage volume 4 KB at a time, reads every ObjectHeader, validates the CRC32, and inserts a new catalog row including all TLV-encoded metadata — column statistics, schema, pipeline tags, everything.&lt;/p&gt;

&lt;p&gt;The catalog is not a separate system that the archive depends on. The catalog is a cache that accelerates access to information already encoded in the archive itself. This is the same philosophical commitment Parquet makes: the footer is not a separate sidecar file; it is part of the format.&lt;/p&gt;

&lt;p&gt;For teams integrating with external data catalogs (Apache Atlas, Hive Metastore, Unity Catalog), this means HuskHoard can emit catalog events on &lt;code&gt;husk rebuild&lt;/code&gt; just as well as on initial archive — the metadata survives the worst failure scenario, format-native.&lt;/p&gt;
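
&lt;p&gt;The rebuild walk can be sketched in Python as follows. This is a toy model rather than the &lt;code&gt;husk rebuild&lt;/code&gt; implementation, and it leans on several assumptions the article does not confirm: headers start on 4 KB boundaries, payloads are padded to 4 KB, the CRC32 is stored little-endian and computed over the header with the CRC field zeroed, and only the magic, CRC, and inline path fields are populated.&lt;/p&gt;

```python
import zlib

HEADER_SIZE = 4096
MAGIC = b"USTDHUSK"


def make_header(path):
    """Build a toy header: magic, CRC32 at bytes 84-87, inline path at bytes 88-135."""
    header = bytearray(HEADER_SIZE)
    header[0:8] = MAGIC
    encoded = path.encode("utf-8")[:47]            # leave room for the null terminator
    header[88:88 + len(encoded)] = encoded
    # Assumption: the CRC covers the whole header with the CRC field zeroed.
    header[84:88] = zlib.crc32(bytes(header)).to_bytes(4, "little")
    return bytes(header)


def rebuild(volume):
    """Walk the volume in 4 KB steps and return (offset, path) for each valid header."""
    found = []
    for offset in range(0, len(volume), HEADER_SIZE):
        block = bytearray(volume[offset:offset + HEADER_SIZE])
        if len(block) != HEADER_SIZE or block[0:8] != MAGIC:
            continue                                # payload data or a short tail
        stored = int.from_bytes(block[84:88], "little")
        block[84:88] = b"\x00" * 4
        if zlib.crc32(bytes(block)) == stored:      # reject corrupt or coincidental matches
            path = bytes(block[88:136]).split(b"\x00", 1)[0].decode("utf-8")
            found.append((offset, path))
    return found


volume = make_header("/a.parquet") + b"\x11" * 8192 + make_header("/b.parquet") + b"\x22" * 4096
print(rebuild(volume))
```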




&lt;h2&gt;
  
  
  Wiring It Up: What a Partial Read Looks Like End-to-End
&lt;/h2&gt;

&lt;p&gt;A request from a data engineer's dbt model arrives at the StreamGate HTTP gateway with &lt;code&gt;Range: bytes=536870912-671088639&lt;/code&gt; (512 MB – 640 MB, pulling a specific row group from a 4 GB Parquet file on S3):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;1. GET http://localhost:8080/v1/stream/warehouse/events/2024-01-15.parquet
   Range: bytes=536870912-671088639

2. Gateway queries object_frames:
   → nearest frame boundary ≤ 536870912 is at uncompressed_offset=536870912
   → compressed_offset=225,978,112 on Volume D (S3:us-east-1)
   → 8 frames needed (128 MB / 16 MB), compressed total = 56.3 MB

3. Gateway fires:
   GET s3://huskhoard-cold/volume-d.img
   Range: bytes=225978112-285884415

4. Gateway decompresses frames on the fly, streams bytes 536870912–671088639
   to the client.

5. Total egress from S3: 56.3 MB
   Total egress if client had fetched the full file: 1.71 GB
   Savings: 96.7%
   Time to first byte (LAN): ~180ms vs ~14s for full-file download
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The client — dbt, Spark, DuckDB, &lt;code&gt;curl&lt;/code&gt;, whatever — receives a standard HTTP 206 Partial Content response. No special client library. No SDK. Just the HTTP Range spec, universally supported.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Takeaways for Data Engineers
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Frame size is a tuning knob.&lt;/strong&gt; HuskHoard defaults to 16 MB frames, optimized for cloud PUT cost (fewer, larger requests) and Zstd compression ratio. For workloads with very fine-grained access patterns (column-level reads in narrow schemas), smaller frames (1–4 MB) reduce the minimum fetch size at the cost of more catalog rows and higher PUT count. Benchmark against your actual access patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. TLV statistics are opt-in per file type.&lt;/strong&gt; For video files you probably don't store column min/max values. For Parquet, CSV, and Arrow IPC files it's worth paying the archiver CPU time to extract and embed statistics at archive time — you pay once and recoup every time a catalog query avoids a volume read.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The catalog query is your explain plan.&lt;/strong&gt; Before a restore or a scan, &lt;code&gt;husk catalog query --path "/warehouse/events/*.parquet" --filter "parquet.col.event_ts.min &amp;gt;= 2024-01-01"&lt;/code&gt; shows you which volumes and frame ranges will be touched. Run it first. If the egress estimate is unexpected, the TLV coverage on those files probably needs improving.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Multi-volume means cross-volume pruning is free.&lt;/strong&gt; A query that touches two volumes and skips three is doing volume-level predicate pushdown before any I/O. The catalog does this automatically based on the &lt;code&gt;tape_uuid&lt;/code&gt; in each matching row.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Egress savings compound with replication.&lt;/strong&gt; HuskHoard replicates to multiple volumes simultaneously. If your primary volume is on S3 ($0.09/GB egress) and your replica is on Cloudflare R2 ($0.00 egress), the gateway can route the range request to whichever backend minimizes cost. Partial reads from R2 are free. You still benefit from the jump table because API call count and latency still matter.&lt;/p&gt;
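
&lt;p&gt;The frame-size trade-off from takeaway 1 can be estimated with a few lines of Python. This is a rough model under stated assumptions: a 4 GB file, a hypothetical 200 KB read, and a worst case in which the read straddles one frame boundary. The &lt;code&gt;tradeoff&lt;/code&gt; helper is an invented name, and real numbers depend on your data and compression ratio.&lt;/p&gt;

```python
import math

FILE_SIZE = 4 * 1024**3      # a 4 GB file, as in the examples above
READ_SIZE = 200 * 1024       # hypothetical fine-grained read of 200 KB


def tradeoff(frame_mb):
    """Return (object_frames rows per file, worst-case uncompressed MB touched per read)."""
    frame = frame_mb * 1024**2
    rows = math.ceil(FILE_SIZE / frame)
    # Worst case: the read straddles a frame boundary, so one extra frame is touched.
    frames_touched = math.ceil(READ_SIZE / frame) + 1
    return rows, frames_touched * frame_mb


for frame_mb in (1, 4, 16):
    rows, worst_mb = tradeoff(frame_mb)
    print(f"{frame_mb:2d} MB frames: {rows:4d} rows per file, up to {worst_mb} MB per 200 KB read")
```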




&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://huskhoard.com/blog-post-cat.html" rel="noopener noreferrer"&gt;The Catalog Is the Archive: Inside HuskHoard's Ground Truth Engine&lt;/a&gt; — deep dive on the SQLite schema, WAL mode, and catalog rebuild&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://huskhoard.com/blog-post-tag.html" rel="noopener noreferrer"&gt;Data Without Context is Entropy: The Architecture of Tagging&lt;/a&gt; — TLV byte packing, POSIX xattrs, and the self-healing archive&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/HuskHoard/HuskHoard" rel="noopener noreferrer"&gt;HuskHoard on GitHub&lt;/a&gt; — source, README, and the &lt;code&gt;object_frames&lt;/code&gt; implementation in &lt;code&gt;src/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/facebook/zstd/blob/dev/contrib/seekable_format/zstd_seekable_compression_format.md" rel="noopener noreferrer"&gt;Zstd Seekable Format spec&lt;/a&gt; — the upstream spec HuskHoard's frame model is compatible with&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://huskhoard.com/blog-post-parquet.html" rel="noopener noreferrer"&gt;Apache Parquet with an Archive&lt;/a&gt; — Building a Zero-Impact Data Lake&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;HuskHoard is open-source under AGPL v3. If you're running cold data tiers on Linux and want to stop paying for bytes you never read, contributions and issues are welcome at &lt;a href="https://github.com/HuskHoard/HuskHoard" rel="noopener noreferrer"&gt;github.com/HuskHoard/HuskHoard&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>rust</category>
      <category>aws</category>
      <category>cloudnative</category>
    </item>
    <item>
      <title>Why I Bypassed FUSE: Building a Transparent Data-Tiering Engine in Rust</title>
      <dc:creator>Joichiro Mitaka</dc:creator>
      <pubDate>Fri, 19 Jun 2026 05:51:33 +0000</pubDate>
      <link>https://dev.clauneck.workers.dev/coldstorage/-why-i-bypassed-fuse-building-a-transparent-datatiering-engine-in-rust-4b8d</link>
      <guid>https://dev.clauneck.workers.dev/coldstorage/-why-i-bypassed-fuse-building-a-transparent-datatiering-engine-in-rust-4b8d</guid>
      <description>&lt;p&gt;If you run a home lab or manage large datasets, you’ve hit this wall: NVMe drives are fast but too expensive to hoard data on. Hard drives or cloud buckets are cheap, but they are slow and a pain to manage manually.&lt;/p&gt;

&lt;p&gt;The enterprise world solves this with &lt;strong&gt;Hierarchical Storage Management (HSM)&lt;/strong&gt;: automatically shuffling cold data to slow storage while keeping a transparent stub on the fast drive. But enterprise HSMs cost thousands of dollars and lock your data in proprietary black boxes.&lt;/p&gt;

&lt;p&gt;I wanted this for Linux, for free. So I started building &lt;strong&gt;HuskHoard&lt;/strong&gt;, an open-source data tiering engine.&lt;/p&gt;

&lt;p&gt;My first thought, like almost every Linux developer building a virtual filesystem, was to use FUSE. But I quickly realized FUSE was the wrong tool for the job. Here is why I abandoned it, and how I used the Linux fanotify API and Rust to build a transparent archiving engine that adds zero overhead to hot data.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with FUSE
&lt;/h2&gt;

&lt;p&gt;FUSE is fantastic for creating custom filesystems like SSHFS or mounting an S3 bucket. But for an HSM, it creates a massive bottleneck.&lt;/p&gt;

&lt;p&gt;When you use FUSE, &lt;em&gt;every single read and write&lt;/em&gt; has to detour through a userspace daemon, paying for extra context switches along the way:&lt;br&gt;
Application &amp;gt; Kernel &amp;gt; FUSE Daemon (userspace) &amp;gt; Kernel &amp;gt; Physical Drive.&lt;/p&gt;

&lt;p&gt;If 90% of your data is Hot (actively being used on your fast NVMe), forcing it through FUSE overhead defeats the purpose of buying expensive NVMe drives in the first place. You sacrifice native I/O performance just to manage the 10% of Cold data.&lt;/p&gt;

&lt;p&gt;I needed a solution where the Hot data ran at native speed, touching nothing but the XFS/ext4 kernel drivers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Enter fanotify
&lt;/h2&gt;

&lt;p&gt;Instead of intercepting every transaction via FUSE, I realized I only needed to intervene in one specific scenario: &lt;strong&gt;When a user tries to open a file that has been archived.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Linux has a kernel API called fanotify, originally designed for antivirus scanners. It allows a userspace program to monitor a mount point and, crucially, &lt;em&gt;block&lt;/em&gt; an application from opening a file until the daemon says it’s okay.&lt;/p&gt;

&lt;p&gt;Here is how HuskHoard uses fanotify to create transparent tiering:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;The Janitor:&lt;/strong&gt; A background Rust thread scans my NVMe drive. When it finds a file that hasn’t been touched in 30 days, it compresses it with Zstd and moves the payload to a cheap HDD, LTO tape, or S3 bucket.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Husk Stub:&lt;/strong&gt; It leaves the original file on the NVMe drive but truncates its allocated size to 0 bytes, creating a sparse file. To the OS and the user, the file still looks like it’s 50 GB and sits in &lt;code&gt;/home/movies&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Interceptor:&lt;/strong&gt; This is where fanotify shines. The HuskHoard daemon listens for &lt;code&gt;FAN_ACCESS_PERM&lt;/code&gt; events. If VLC media player opens that Husk file and tries to read it, fanotify pauses VLC’s execution in the kernel.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Recall:&lt;/strong&gt; HuskHoard intercepts the request, streams the 50 GB payload from the tape or S3 bucket back into the sparse file on the NVMe, and then tells fanotify to allow VLC to proceed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;VLC thinks it just opened a local file. It has no idea the data was fetched from an S3 bucket 50 milliseconds ago. &lt;/p&gt;
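
&lt;p&gt;Step 2 (the Husk stub) can be sketched with nothing but the Rust standard library. This is an illustration, not HuskHoard's actual code: &lt;code&gt;make_husk&lt;/code&gt; is a hypothetical helper, it panics on I/O errors to stay short, and it assumes a filesystem with sparse-file support such as XFS or ext4. Truncating to zero frees the data blocks, and extending back to the original length leaves a hole that reads as zeros while the reported size stays the same.&lt;/p&gt;

```rust
use std::fs::{self, OpenOptions};
use std::path::PathBuf;

/// Hypothetical helper: turn the file at `path` into a stub that keeps its
/// apparent size but holds no payload. Returns the apparent size in bytes.
/// A real tiering engine would only do this after the payload is safely archived.
fn make_husk(path: PathBuf) -> u64 {
    let apparent_len = fs::metadata(path.clone()).expect("stat failed").len();
    let file = OpenOptions::new()
        .write(true)
        .open(path)
        .expect("open failed");
    // Shrinking to zero releases the data blocks on the hot tier.
    file.set_len(0).expect("truncate failed");
    // Growing back creates a hole: same reported size, no data behind it.
    file.set_len(apparent_len).expect("extend failed");
    apparent_len
}

fn main() {
    // Stand-in for an archived file: 1 MiB of non-zero bytes in the temp dir.
    let path = std::env::temp_dir().join("husk_demo.bin");
    fs::write(path.clone(), vec![7u8; 1024 * 1024]).expect("write failed");

    let size = make_husk(path.clone());
    let bytes = fs::read(path.clone()).expect("read failed");

    println!("apparent size after stubbing: {} bytes", size);
    println!("length seen by readers: {} bytes", bytes.len());
    println!("payload is all zeros: {}", bytes.iter().all(|b| *b == 0));

    fs::remove_file(path).expect("cleanup failed");
}
```

&lt;p&gt;Without the interceptor, a reader of the stub would simply see zeros, which is why the permission event has to block the read until the recall finishes.&lt;/p&gt;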
&lt;h2&gt;
  
  
  The Rust Implementation
&lt;/h2&gt;

&lt;p&gt;Rust was the obvious choice for this. When you are blocking kernel-level I/O requests, memory safety and predictable latency are non-negotiable.&lt;/p&gt;

&lt;p&gt;Handling the fanotify loop requires elevated Linux capabilities (specifically &lt;code&gt;CAP_SYS_ADMIN&lt;/code&gt;), but Rust lets us safely manage the multithreaded heavy lifting of the Archive Worker.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pub fn run_interceptor&lt;span class="o"&gt;(&lt;/span&gt;config: Arc&amp;lt;HuskConfig&amp;gt;, use_direct_io: bool&lt;span class="o"&gt;)&lt;/span&gt; -&amp;gt; std::io::Result&amp;lt;&lt;span class="o"&gt;()&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;let &lt;/span&gt;watch_dir &lt;span class="o"&gt;=&lt;/span&gt; &amp;amp;config.hot_tier&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nb"&gt;let &lt;/span&gt;db_path &lt;span class="o"&gt;=&lt;/span&gt; &amp;amp;config.db_path&lt;span class="p"&gt;;&lt;/span&gt;
    info!&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;[Daemon] Starting fanotify interceptor on '{}'..."&lt;/span&gt;, watch_dir&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nb"&gt;let &lt;/span&gt;abs_dir &lt;span class="o"&gt;=&lt;/span&gt; std::fs::canonicalize&lt;span class="o"&gt;(&lt;/span&gt;watch_dir&lt;span class="o"&gt;)&lt;/span&gt;?&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nb"&gt;let &lt;/span&gt;fan_fd &lt;span class="o"&gt;=&lt;/span&gt; unsafe &lt;span class="o"&gt;{&lt;/span&gt;
        libc::fanotify_init&lt;span class="o"&gt;(&lt;/span&gt;libc::FAN_CLASS_PRE_CONTENT, libc::O_RDWR as u32&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;fan_fd &amp;lt; 0 &lt;span class="o"&gt;{&lt;/span&gt; 
        &lt;span class="nb"&gt;let &lt;/span&gt;err &lt;span class="o"&gt;=&lt;/span&gt; std::io::Error::last_os_error&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        error!&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;" fanotify_init failed: {}. Missing Root or Capabilities!"&lt;/span&gt;, err&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;return &lt;/span&gt;Err&lt;span class="o"&gt;(&lt;/span&gt;err&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
    &lt;span class="o"&gt;}&lt;/span&gt;


        &lt;span class="nb"&gt;let &lt;/span&gt;mark_mask &lt;span class="o"&gt;=&lt;/span&gt; libc::FAN_ACCESS_PERM | libc::FAN_CLOSE_WRITE | libc::FAN_EVENT_ON_CHILD&lt;span class="p"&gt;;&lt;/span&gt;

        // 1. Recursively mark the root watch directory and all current subdirectories
        info!&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"[Daemon]  Scanning and attaching listeners to all subdirectories..."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        mark_directory_recursive&lt;span class="o"&gt;(&lt;/span&gt;fan_fd, &amp;amp;abs_dir, mark_mask, &amp;amp;config&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Escaping Vendor Lock-in: The Easy Exit Promise
&lt;/h2&gt;

&lt;p&gt;One of the biggest issues with commercial HSMs is that if the daemon dies, your data is gone, trapped in proprietary metadata.&lt;/p&gt;

&lt;p&gt;Because I was building this for the open-source community, I enforced a strict Easy Exit architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Payload data is stored in standard &lt;strong&gt;Zstd&lt;/strong&gt; streams verified by &lt;strong&gt;BLAKE3&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  The catalog metadata (the "Brain" that tracks where the cold bytes live) is a SQLite database.&lt;/li&gt;
&lt;li&gt;  You can natively export the entire catalog to &lt;strong&gt;Apache Parquet&lt;/strong&gt;. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means if you decide to stop using HuskHoard, you don’t need my software to get your data back. You can query your catalog with DuckDB or Python and manually extract your Zstd archives.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Next?
&lt;/h2&gt;

&lt;p&gt;Building HuskHoard has been a massive deep dive into Linux kernel APIs and SCSI tape drivers. Yes, it natively supports physical LTO drives via &lt;code&gt;/dev/nstX&lt;/code&gt; to prevent tape shoe-shining.&lt;/p&gt;

&lt;p&gt;The engine currently supports automated replication across local drives, tapes, and rclone-supported cloud buckets.&lt;/p&gt;

&lt;p&gt;If you are a Rust developer, a home-lab data hoarder, or just interested in Linux storage architecture, I’d love your feedback or code reviews. It is fully AGPL v3 licensed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check out the repo here:&lt;/strong&gt; &lt;a href="https://github.com/huskhoard/huskhoard" rel="noopener noreferrer"&gt;GitHub: HuskHoard&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;More architecture details:&lt;/strong&gt; &lt;a href="https://www.huskhoard.com/blog.html" rel="noopener noreferrer"&gt;HuskHoard Blog&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>rust</category>
      <category>infrastructure</category>
      <category>backend</category>
    </item>
  </channel>
</rss>
