PayloadHeader

Every rsteg-written carrier starts its embedded bitstream with a fixed 32-byte frame. The frame answers four questions for the extractor: "is this an rsteg file?", "which scheme was used?", "how many body bytes follow?", and "was it encrypted?". Without those answers, extract would have to guess.

The 32-byte wire format

offset  size  field            role
 0      4     magic            b"RSTG" — false-positive filter on extract
 4      1     version          wire version, current = 1
 5      1     flags            bit0 encrypted, bit1 compressed, bit2 permuted
 6      4     crypto_fourcc    AEAD scheme id (e.g. b"XCA1"); all-zero for plaintext
10      4     scheme_fourcc    carrier + walk (BLSL, BLSP, WLSL, …)
14      1     density          bits written per sample, 1..=4
15      1     reserved         must be zero
16      4     body_len         u32 big-endian length of body bytes that follow
20      4     body_crc32       CRC32-IEEE of plaintext body; zero when encrypted
24      8     reserved         all zero, preserves 32-byte frame for future extensions

Source of truth: crates/rsteg-core/src/header.rs. All multi-byte integers are big-endian. Big-endian integers read in natural left-to-right order in a hex dump, just like the RSTG magic, and network byte order is the sensible default when there's no good reason to pick otherwise.
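A small illustration of the byte-order choice, using only the standard u32::to_be_bytes and u32::from_be_bytes; the helper names are hypothetical, not taken from header.rs. A body length of 1000 (0x3E8) lands at offsets 16..20 as 00 00 03 E8, the same digit order you would write by hand, where little-endian would give E8 03 00 00.

```rust
/// Bytes for the body_len field (offsets 16..20): u32, big-endian.
/// Hypothetical helper, not taken from header.rs.
fn body_len_bytes(len: u32) -> [u8; 4] {
    len.to_be_bytes()
}

/// Read body_len back out of a 32-byte header buffer.
fn body_len_from(header: &[u8; 32]) -> u32 {
    let mut field = [0u8; 4];
    field.copy_from_slice(&header[16..20]);
    u32::from_be_bytes(field)
}
```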

Try it — encode a header

The interactive version of this page builds a header from a form and posts it to POST /api/algo/header-encode, which calls PayloadHeader::encode() on the server and returns the 32 bytes plus per-field metadata. Each field is color-coded in both the hex dump and the field table so the two can be lined up visually.


Why each field is there

magic b"RSTG"

A four-byte tag checked first on every extract. It doesn't provide any security — an attacker who knows the format can reproduce the magic — but it cheaply rules out the 99.99% of carrier files that aren't rsteg files. Without it, every extract would have to try the whole rest of the decode pipeline and probably fail deep inside, producing confusing errors.
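A sketch of that first-pass filter, assuming the extractor already has the decoded header bytes in a slice; has_rsteg_magic is a hypothetical name. It only answers "could this be an rsteg file?"; every field after the magic still has to be validated.

```rust
/// The four magic bytes at offset 0.
const MAGIC: [u8; 4] = *b"RSTG";

/// Cheap first-pass filter: does the decoded bitstream start with the magic?
/// Anything shorter than four bytes is rejected too. Hypothetical helper.
fn has_rsteg_magic(bitstream: &[u8]) -> bool {
    bitstream.starts_with(&MAGIC)
}
```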

version

A 1-byte wire version. Currently 0x01. decode() rejects anything else with Error::HeaderBadVersion. This gives us a clean upgrade path: version 2 can add fields by co-opting the reserved bytes, and old readers will refuse to misinterpret them.
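A minimal sketch of the version gate. The Error enum is a hypothetical stand-in: the variant name HeaderBadVersion comes from the text, but carrying the rejected byte as a payload is an assumption.

```rust
/// Hypothetical stand-in for rsteg's error type.
#[derive(Debug, PartialEq)]
enum Error {
    HeaderBadVersion(u8),
}

/// The only wire version this sketch accepts.
const WIRE_VERSION: u8 = 1;

/// Check byte 4 of the header against the current wire version.
fn check_version(header: &[u8; 32]) -> Result<u8, Error> {
    match header[4] {
        WIRE_VERSION => Ok(WIRE_VERSION),
        other => Err(Error::HeaderBadVersion(other)),
    }
}
```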

flags

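A 1-byte bitfield:

bit    meaning
0      encrypted: the body is sealed with the AEAD scheme named by crypto_fourcc
1      compressed: the body is compressed
2      permuted: samples are walked in a permuted rather than linear order
3..7   reserved: must be zero; decode() rejects a header with any of them set

Rejecting unknown flags on read is deliberate. A future writer that adds bit3 must also bump the version. The zero policy on reserved bits has a second benefit: it catches accidental corruption that flips one of those bits.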
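The zero policy reduces to a single mask test. A sketch with hypothetical constant names and bit positions taken from the field table; it returns the offending bits so a caller could report them.

```rust
// Bit positions from the field table; the constant names are hypothetical.
const FLAG_ENCRYPTED: u8 = 1 << 0;
const FLAG_COMPRESSED: u8 = 1 << 1;
const FLAG_PERMUTED: u8 = 1 << 2;
const KNOWN_FLAGS: u8 = FLAG_ENCRYPTED | FLAG_COMPRESSED | FLAG_PERMUTED;

/// Ok(flags) if only known bits are set, otherwise Err(unknown bits).
fn check_flags(flags: u8) -> Result<u8, u8> {
    let unknown = flags & !KNOWN_FLAGS;
    if unknown == 0 {
        Ok(flags)
    } else {
        Err(unknown)
    }
}
```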

crypto_fourcc + scheme_fourcc

Two orthogonal identifiers. scheme_fourcc tells the format adapter how to walk samples (which carrier and linear vs permuted — e.g. BLSL = BMP LSB linear). crypto_fourcc tells the crypto registry which seal/open to run; it's all-zero for plaintext payloads. Separating them lets every carrier × walk × crypto combination compose freely.
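One way the two identifiers might be sanity-checked together. This assumes, without confirmation from header.rs, that the encrypted flag (bit0) is set exactly when crypto_fourcc is non-zero; both function names are hypothetical, and fourcc_str only exists to print a fourcc such as BLSL in diagnostics.

```rust
/// True if the encrypted flag and crypto_fourcc agree, under the assumed
/// invariant "bit0 set <=> crypto_fourcc is not all-zero".
fn crypto_fields_agree(flags: u8, crypto_fourcc: [u8; 4]) -> bool {
    let encrypted = (flags & 0b0000_0001) != 0;
    let has_scheme = crypto_fourcc != [0u8; 4];
    encrypted == has_scheme
}

/// Render a fourcc for logs; bytes that aren't printable ASCII become '?'.
fn fourcc_str(code: [u8; 4]) -> String {
    code.iter()
        .map(|&b| if b.is_ascii_graphic() { b as char } else { '?' })
        .collect()
}
```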

density

One byte in 1..=4. Determines how many low bits per sample are overwritten, which in turn determines capacity and perceptual delta. The decoder enforces the range and returns Error::DensityOutOfRange for anything else.
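Capacity follows directly from density: each writable sample carries density bits, so a carrier holds samples × density / 8 bytes, minus the 32-byte header. A sketch under simplifying assumptions: the header is taken to be written at the same density as the body, per-format overhead is ignored, and body_capacity is a hypothetical name. For a 640×480 24-bit BMP, counted as 921,600 one-byte channel samples, that gives 115,168 body bytes at density 1 and 460,768 at density 4.

```rust
/// Size of the fixed header frame in bytes.
const HEADER_LEN: u64 = 32;

/// Body bytes that fit in `samples` writable samples at `density` bits per
/// sample, after reserving room for the header. None if density is outside
/// 1..=4, mirroring the range the decoder enforces.
fn body_capacity(samples: u64, density: u8) -> Option<u64> {
    if !(1..=4).contains(&density) {
        return None;
    }
    let raw_bytes = samples * u64::from(density) / 8;
    Some(raw_bytes.saturating_sub(HEADER_LEN))
}
```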

body_len + body_crc32

body_len is the big-endian u32 length of the payload that follows. It caps how many post-header bytes the extractor reads and therefore bounds the walk. For unencrypted payloads, body_crc32 is a CRC32-IEEE of the plaintext body: a cheap integrity hint that catches a miscounted walk or a wrong density. For encrypted payloads it's zero. The Poly1305 tag inside the AEAD envelope already authenticates the ciphertext, and a second integrity check on the plaintext would force the decoder to leak timing info about decryption success.
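For reference, CRC32-IEEE is the reflected CRC-32 with polynomial 0xEDB88320 and an initial value and final XOR of 0xFFFFFFFF; its standard check value for the ASCII string "123456789" is 0xCBF43926. Below is a bit-at-a-time sketch plus a hypothetical helper that applies the zero-when-encrypted rule. The text doesn't say which CRC implementation rsteg uses; a table-driven one computes the same value faster.

```rust
/// Bitwise CRC-32/IEEE (reflected poly 0xEDB88320, init and xorout 0xFFFFFFFF).
fn crc32_ieee(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All ones if the low bit is set, else zero: folds in the
            // polynomial only when a 1 is shifted out.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Value for the body_crc32 field: plaintext CRC, or zero when encrypted.
fn body_crc32_field(body: &[u8], encrypted: bool) -> u32 {
    if encrypted { 0 } else { crc32_ieee(body) }
}
```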

reserved bytes

One byte at offset 15, eight bytes at offset 24..32. All must be zero on write and are checked on read. They reserve space for forward-compatible fields — e.g. an explicit seed_hint or a content_type hint — without rewriting the frame.
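The reserved-byte rule is a one-line check. A sketch over a 32-byte header buffer; reserved_bytes_clear is a hypothetical name.

```rust
/// True if byte 15 and bytes 24..32 are all zero, as the reserved rule requires.
fn reserved_bytes_clear(header: &[u8; 32]) -> bool {
    header[15] == 0 && header[24..32].iter().all(|&b| b == 0)
}
```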

Detection on extract

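The extractor starts by reading 32 bytes' worth of embedding units from the carrier: 256 header bits, which take ⌈256 / density⌉ samples at density 1..4. For a 24-bit BMP, where each pixel contributes three 8-bit channel samples, that's 256 samples (86 pixels) at density 1, 128 samples (43 pixels) at density 2, 86 samples (29 pixels) at density 3, and 64 samples (22 pixels) at density 4.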

The first 32 bytes of that bitstream are decoded as a PayloadHeader. If the magic isn't RSTG, decode() returns Error::HeaderMissing and the extractor stops immediately. The cost of a miss is roughly 256 bytes read (density 1 on a 24-bit BMP) from a potentially multi-MB carrier. On random data the magic matches by chance with probability 2⁻³² per trial, and a chance match still has to pass the version, flags, density, and reserved-byte checks, so a non-rsteg file is practically never mistaken for an rsteg one.
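The sample arithmetic from this section as a sketch. Because 256 is not a multiple of 3, the count rounds up. The pixel helper assumes a 24-bit BMP contributes three one-byte channel samples per pixel, which is an assumption about how rsteg counts samples; both function names are hypothetical.

```rust
/// Header size in bits: 32 bytes x 8.
const HEADER_BITS: u32 = 32 * 8;

/// Samples needed to carry the header bits at `density` bits per sample,
/// rounded up. Panics outside 1..=4. Hypothetical helper.
fn header_samples(density: u32) -> u32 {
    assert!((1..=4).contains(&density), "density must be 1..=4");
    (HEADER_BITS + density - 1) / density
}

/// Pixels spanned by `samples` in a 24-bit BMP, assuming three one-byte
/// channel samples (B, G, R) per pixel, rounded up. Hypothetical helper.
fn bmp24_pixels(samples: u32) -> u32 {
    (samples + 2) / 3
}
```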

The whole thing in 20-ish lines of Rust

pub fn encode(&self) -> [u8; 32] {
    let mut out = [0u8; 32];
    out[0..4].copy_from_slice(b"RSTG");
    out[4] = self.version;
    out[5] = self.flags;
    out[6..10].copy_from_slice(&self.crypto_fourcc);
    out[10..14].copy_from_slice(&self.scheme_fourcc.0);
    out[14] = self.density;
    // 15 stays zero (reserved)
    out[16..20].copy_from_slice(&self.body_len.to_be_bytes());
    out[20..24].copy_from_slice(&self.body_crc32.to_be_bytes());
    // 24..32 stay zero (reserved)
    out
}
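The excerpt above is a method body without its type. Here is a self-contained sketch that compiles on its own. The PayloadHeader and SchemeFourcc definitions are hypothetical reconstructions inferred from the fields encode() touches; the real ones live in crates/rsteg-core/src/header.rs and may differ. decode_fields is a hypothetical inverse that checks only the magic, not the full validation decode() performs.

```rust
/// Hypothetical newtype matching the `.0` access in the excerpt.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SchemeFourcc([u8; 4]);

/// Hypothetical struct with the field types the excerpt implies.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PayloadHeader {
    version: u8,
    flags: u8,
    crypto_fourcc: [u8; 4],
    scheme_fourcc: SchemeFourcc,
    density: u8,
    body_len: u32,
    body_crc32: u32,
}

impl PayloadHeader {
    /// Same byte placement as the encode() excerpt; reserved bytes stay zero.
    fn encode(&self) -> [u8; 32] {
        let mut out = [0u8; 32];
        out[0..4].copy_from_slice(b"RSTG");
        out[4] = self.version;
        out[5] = self.flags;
        out[6..10].copy_from_slice(&self.crypto_fourcc);
        out[10..14].copy_from_slice(&self.scheme_fourcc.0);
        out[14] = self.density;
        out[16..20].copy_from_slice(&self.body_len.to_be_bytes());
        out[20..24].copy_from_slice(&self.body_crc32.to_be_bytes());
        out
    }

    /// Read the fields back from fixed offsets; None if the magic is absent.
    fn decode_fields(bytes: &[u8; 32]) -> Option<Self> {
        if !bytes.starts_with(b"RSTG") {
            return None;
        }
        // Copy the four bytes starting at `at` into an array.
        let four = |at: usize| [bytes[at], bytes[at + 1], bytes[at + 2], bytes[at + 3]];
        Some(PayloadHeader {
            version: bytes[4],
            flags: bytes[5],
            crypto_fourcc: four(6),
            scheme_fourcc: SchemeFourcc(four(10)),
            density: bytes[14],
            body_len: u32::from_be_bytes(four(16)),
            body_crc32: u32::from_be_bytes(four(20)),
        })
    }
}
```

Because every field sits at a fixed offset, the round trip needs nothing beyond slicing and the standard to_be_bytes / from_be_bytes conversions.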

No serde, no proc-macros, no framework. Fixed layout, hand-written, trivial to audit. This is what rsteg means by "supply-chain minimization" — every byte of this wire format is in the open and reproducible from the spec.