# Bit Packing Libraries: API & Interface Research

Research across: dgryski/go-bitstream (Go), Prometheus chunkenc, InfluxDB tsm1, Rust bitstream-io, Python bitstring, Java BitSet.

---

## Common API Pattern

Every serious implementation converges on the same shape:

- **Separate `Reader` and `Writer` types**, not a single bidirectional object
- **Single bit**: `ReadBit() (bool, error)` / `WriteBit(bool)`
- **N bits**: `ReadBits(n) (uint64, error)` / `WriteBits(v uint64, n)`
- **Finalise writes**: `Flush()`
- **Errors returned as values**, not stored in the struct (the InfluxDB approach of `r.Err()` is the outlier and less idiomatic)

---

## Library-by-Library Breakdown

### Go: dgryski/go-bitstream

```go
type Bit bool // the library's own bit type, with Zero and One constants

func NewReader(r io.Reader) *BitReader
func NewWriter(w io.Writer) *BitWriter

func (b *BitReader) ReadBit() (Bit, error)
func (b *BitReader) ReadBits(nbits int) (uint64, error)
func (b *BitWriter) WriteBit(bit Bit) error
func (b *BitWriter) WriteBits(u uint64, nbits int) error
func (b *BitWriter) Flush(bit Bit) error
```

The reference Go implementation. Wraps `io.Reader`/`io.Writer`, so it is stream-oriented. `Flush` pads the final byte with the supplied fill bit.

---

### Prometheus: chunkenc/xor.go (internal `bstream`)

```go
type bstream struct {
	stream []byte
	count  uint8
}

func (b *bstream) writeBit(v bit)
func (b *bstream) writeByte(byt byte)
func (b *bstream) writeBits(u uint64, nbits int)

func (b *bstreamReader) readBit() (bit, error)
func (b *bstreamReader) readBits(nbits int) (uint64, error)
func (b *bstreamReader) readByte() (byte, error)
```

Internal (unexported) but arguably the most battle-tested Go implementation. Same fundamental shape as dgryski. `bit` is just a `bool` typedef.

---

### InfluxDB: tsm1 BitReader/BitWriter

```go
type BitReader struct {
	buf [8]byte
	// ...
	err error // ← stored in struct, not returned
}

func (r *BitReader) ReadBit() bool // error via r.Err()
func (r *BitReader) ReadBits(nbits int) uint64
func (w *BitWriter) WriteBit(v bool)
func (w *BitWriter) WriteBits(u uint64, nbits int)
func (w *BitWriter) Flush()
```

Stores the error in the struct; callers check `r.Err()` after reads. Less idiomatic Go. Functional, but the error-handling model is harder to compose.

---

### Rust: bitstream-io

```rust
// The BitRead/BitWrite traits must be in scope for the methods below.
use bitstream_io::{BigEndian, BitRead, BitReader, BitWrite, BitWriter};

let mut reader = BitReader::endian(cursor, BigEndian);
let bit: bool = reader.read_bit()?;
let value: u32 = reader.read(8)?; // type inferred from context

let mut writer = BitWriter::endian(vec, BigEndian);
writer.write_bit(true)?;
writer.write(8, 255u32)?;
writer.byte_align()?;
```

Same conceptual shape. Endianness is explicit (constructor param). `byte_aligned()` check available. Type-safe via generics, so no casting to `uint64` is needed.

---

### Python: bitstring

```python
from bitstring import BitStream, ConstBitStream

s = ConstBitStream(bytes=data)
value = s.read('uint:8')  # format string
bool_val = s.read('bool')

bs = BitStream()
bs.append('uint:8=255')
bs.append('bool=True')
```

Format-string based. High-level but heavyweight. Not suitable as a model for a low-level library.

---

### Java: java.util.BitSet

```java
import java.util.BitSet;

BitSet bs = new BitSet(64);
bs.set(3); // set bit at index 3
bs.get(3); // read bit at index 3
```

Index-based, not stream-oriented. Not comparable: it targets a different use case (bit flags addressed by index, not packed encoding).
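The two error-handling models in the breakdown above (errors returned from each call, versus an error stored on the reader and checked afterwards through `Err()`) can be contrasted with a minimal sketch. Both reader types here are hypothetical and written only for illustration; neither is code from any of the libraries listed, and the MSB-first bit order is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

var errOutOfData = errors.New("bitreader: out of data")

// returningReader (hypothetical) reports failures through its return values.
type returningReader struct {
	data []byte
	pos  int // index of the next bit to read, MSB-first (assumed order)
}

func (r *returningReader) ReadBit() (bool, error) {
	if r.pos >= len(r.data)*8 {
		return false, errOutOfData
	}
	bit := r.data[r.pos/8]>>(7-uint(r.pos%8))&1 == 1
	r.pos++
	return bit, nil
}

// stickyReader (hypothetical) records the first failure and returns
// zero values from then on; callers must remember to check Err().
type stickyReader struct {
	data []byte
	pos  int
	err  error
}

func (r *stickyReader) ReadBit() bool {
	if r.err != nil {
		return false
	}
	if r.pos >= len(r.data)*8 {
		r.err = errOutOfData
		return false
	}
	bit := r.data[r.pos/8]>>(7-uint(r.pos%8))&1 == 1
	r.pos++
	return bit
}

func (r *stickyReader) Err() error { return r.err }

func main() {
	data := []byte{0x80} // one byte: a 1 bit followed by seven 0 bits

	rr := &returningReader{data: data}
	for i := 0; i < 9; i++ {
		if _, err := rr.ReadBit(); err != nil {
			fmt.Println("returned error at bit", i, ":", err)
			break
		}
	}

	sr := &stickyReader{data: data}
	for i := 0; i < 9; i++ {
		sr.ReadBit() // a failed read looks exactly like a 0 bit here
	}
	fmt.Println("sticky error after loop:", sr.Err())
}
```

With the returning style the failure surfaces at the call that caused it. With the sticky style the loop keeps consuming zero values until the caller checks `Err()`, which is the composition cost noted for the stored-error model.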
---

## Your API vs the Field

| Method | Your API | dgryski | Prometheus | InfluxDB |
|--------|----------|---------|------------|----------|
| Write single bit | `WriteBit(bool)` | `WriteBit(Bit) error` | `writeBit(bit)` | `WriteBit(bool)` |
| Write n bits | `WriteBits(uint64, uint8)` | `WriteBits(uint64, int) error` | `writeBits(uint64, int)` | `WriteBits(uint64, int)` |
| Flush + get bytes | `Flush() []byte` | `Flush(Bit) error` (writes to io.Writer) | n/a (access `.stream`) | `Flush()` (writes to io.Writer) |
| Non-destructive peek | `Snapshot() []byte` | n/a | n/a | n/a |
| Read single bit | `ReadBit() (bool, error)` | `ReadBit() (Bit, error)` | `readBit() (bit, error)` | `ReadBit() bool` |
| Read n bits | `ReadBits(uint8) (uint64, error)` | `ReadBits(int) (uint64, error)` | `readBits(int) (uint64, error)` | `ReadBits(int) uint64` |

**Where your API differs:**

1. `uint8` for bit count (vs `int` in the other Go libraries): closer to the real constraint (at most 64 bits) and a tighter contract, although values from 65 to 255 still need a runtime check.
2. `Flush() []byte` returns bytes directly, which is more convenient than requiring an `io.Writer`. No equivalent found elsewhere; it's a clean ergonomic improvement.
3. `Snapshot() []byte`: no equivalent found. Non-destructive mid-stream peek at current bytes. Useful for the TychoDB use case.

---

## What "Hiding Complexity" Means in Practice

The libraries that feel simple share two traits:

**1. No configuration on construction**

`NewWriter()` takes no options. You write bits, call `Flush`, get bytes. That's it. Complexity (endianness, padding, buffering) is handled internally with sensible defaults.

**2. uint64 as the universal value type**

Callers don't think about int sizes. Pass a `uint64`, specify how many bits. Casting is the caller's problem if they need a smaller type, not the library's.

The libraries that feel complex require either format strings (Python bitstring) or index-based access (Java BitSet); neither matches the mental model of a "stream of bits".
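The two traits above, combined with the `Flush() []byte` and `Snapshot() []byte` signatures from the comparison table, might look like the following in practice. This is a hypothetical sketch, not the implementation under review: the MSB-first bit order, zero padding of the final byte, the panic on `n > 64`, and `Flush` resetting the writer are all assumptions made for illustration.

```go
package main

import "fmt"

// Writer packs bits into an in-memory buffer.
// Hypothetical sketch: MSB-first order and zero padding are assumptions.
type Writer struct {
	buf  []byte // completed bytes
	cur  byte   // partially filled byte, filled from the top bit down
	used uint8  // number of bits already written into cur (0..7)
}

// NewWriter takes no options; every default is internal.
func NewWriter() *Writer { return &Writer{} }

// WriteBit appends a single bit.
func (w *Writer) WriteBit(bit bool) {
	if bit {
		w.cur |= 1 << (7 - w.used)
	}
	w.used++
	if w.used == 8 {
		w.buf = append(w.buf, w.cur)
		w.cur, w.used = 0, 0
	}
}

// WriteBits appends the low n bits of v, most significant bit first.
func (w *Writer) WriteBits(v uint64, n uint8) {
	if n > 64 {
		// uint8 narrows the range but does not rule out 65..255.
		panic("bitpack: n must be <= 64")
	}
	for i := int(n) - 1; i >= 0; i-- {
		w.WriteBit(v>>uint(i)&1 == 1)
	}
}

// Snapshot returns a copy of the bytes written so far, zero-padding any
// partial byte, without changing the writer's state.
func (w *Writer) Snapshot() []byte {
	out := append([]byte(nil), w.buf...)
	if w.used > 0 {
		out = append(out, w.cur)
	}
	return out
}

// Flush returns the padded bytes and resets the writer (assumed behaviour).
func (w *Writer) Flush() []byte {
	out := w.Snapshot()
	*w = Writer{}
	return out
}

func main() {
	w := NewWriter()
	w.WriteBit(true)
	w.WriteBits(0b0101, 4)
	fmt.Printf("snapshot: %x\n", w.Snapshot())
	w.WriteBits(0xFF, 8)
	fmt.Printf("flushed:  %x\n", w.Flush())
}
```

The caller never chooses an endianness, a padding bit, or a destination `io.Writer`; those decisions live inside the type, which is the "no configuration" property described above. Because `Snapshot` copies rather than aliases the buffer, later writes cannot mutate a snapshot already handed out.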
---

## Verdict

Your existing API is already at the right level of abstraction. It matches Prometheus's internal bstream (arguably the most battle-tested Go implementation) almost line for line, with three ergonomic improvements:

- `Flush()` returns `[]byte` directly instead of writing to an `io.Writer`
- `uint8` for bit count is more honest than `int`
- `Snapshot()` is a genuinely useful addition with no equivalent elsewhere

**No changes needed to the API design.** It is simple, it hides complexity, and it aligns with every serious implementation in the field.