Bit Packing Libraries — API & Interface Research
Research across: dgryski/go-bitstream (Go), Prometheus chunkenc, InfluxDB tsm1, Rust bitstream-io, Python bitstring, Java BitSet.
Common API Pattern
Every serious implementation converges on the same shape:
- Separate Reader and Writer types, not a single bidirectional object
- Single bit: ReadBit() (bool, error) / WriteBit(bool)
- N bits: ReadBits(n) (uint64, error) / WriteBits(v uint64, n)
- Finalise writes: Flush()
- Errors returned as values, not stored in the struct (the InfluxDB approach of r.Err() is the outlier and is less idiomatic)
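The shape listed above can be sketched in a few dozen lines of Go. This is a minimal in-memory illustration, not code from any of the surveyed libraries: the MSB-first bit order, the []byte-backed Writer, the ErrShortRead sentinel, and the bit-at-a-time loops are all assumptions chosen for brevity rather than speed.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrShortRead is a hypothetical sentinel returned when reading past the data.
var ErrShortRead = errors.New("bitpack: read past end of data")

// Writer appends bits MSB-first to an in-memory buffer.
type Writer struct {
	buf   []byte
	nfree uint8 // unused low bits in the last byte; 0 means "start a new byte"
}

func NewWriter() *Writer { return &Writer{} }

// WriteBit appends a single bit.
func (w *Writer) WriteBit(bit bool) {
	if w.nfree == 0 {
		w.buf = append(w.buf, 0)
		w.nfree = 8
	}
	w.nfree--
	if bit {
		w.buf[len(w.buf)-1] |= 1 << w.nfree
	}
}

// WriteBits appends the low n bits of v, most significant bit first.
func (w *Writer) WriteBits(v uint64, n uint8) {
	for i := int(n) - 1; i >= 0; i-- {
		w.WriteBit(v>>uint(i)&1 == 1)
	}
}

// Flush returns the packed bytes; the final byte is zero-padded.
func (w *Writer) Flush() []byte { return w.buf }

// Reader consumes bits MSB-first from a byte slice.
type Reader struct {
	data []byte
	pos  int // index of the next bit to read
}

func NewReader(data []byte) *Reader { return &Reader{data: data} }

// ReadBit returns the next bit, or ErrShortRead when the data is exhausted.
func (r *Reader) ReadBit() (bool, error) {
	if r.pos >= len(r.data)*8 {
		return false, ErrShortRead
	}
	b := r.data[r.pos/8] >> (7 - uint(r.pos%8)) & 1
	r.pos++
	return b == 1, nil
}

// ReadBits returns the next n bits as the low bits of a uint64.
func (r *Reader) ReadBits(n uint8) (uint64, error) {
	var v uint64
	for i := uint8(0); i < n; i++ {
		bit, err := r.ReadBit()
		if err != nil {
			return 0, err
		}
		v <<= 1
		if bit {
			v |= 1
		}
	}
	return v, nil
}

func main() {
	w := NewWriter()
	w.WriteBit(true)
	w.WriteBits(5, 3) // binary 101
	packed := w.Flush()

	r := NewReader(packed)
	bit, _ := r.ReadBit()
	val, _ := r.ReadBits(3)
	fmt.Println(bit, val)
}
```

The reader returns an error from every call, which is the convention most of the surveyed Go libraries follow.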
Library-by-Library Breakdown
Go: dgryski/go-bitstream
func NewReader(r io.Reader) *BitReader
func NewWriter(w io.Writer) *BitWriter
type Bit bool
func (b *BitReader) ReadBit() (Bit, error)
func (b *BitReader) ReadBits(nbits int) (uint64, error)
func (b *BitWriter) WriteBit(bit Bit) error
func (b *BitWriter) WriteBits(u uint64, nbits int) error
func (b *BitWriter) Flush(bit Bit) error
The reference Go implementation. Wraps io.Reader/io.Writer — stream-oriented. Flush pads the final byte with the supplied fill bit.
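To make the fill-bit padding concrete, here is a small sketch of what padding a partially written final byte looks like. The helper name padFinalByte and its parameters are hypothetical and are not part of go-bitstream; the sketch assumes MSB-first packing, so the unused bits sit in the low positions.

```go
package main

import "fmt"

// padFinalByte is a hypothetical helper, not code from go-bitstream.
// partial holds `used` meaningful bits in its high positions (MSB-first)
// with zeros below them; the unused low bits are set to the fill bit.
func padFinalByte(partial byte, used uint8, fill bool) byte {
	if !fill || used == 0 || used >= 8 {
		return partial // zero fill is a no-op; empty or full bytes need no padding
	}
	return partial | byte(0xFF)>>used
}

func main() {
	// Three bits written (101), five bits of padding.
	fmt.Printf("%08b\n", padFinalByte(0xA0, 3, false)) // 10100000
	fmt.Printf("%08b\n", padFinalByte(0xA0, 3, true))  // 10111111
}
```

The choice of fill bit matters when the decoder cannot tell padding from data by length alone, since the padding bits will be read back as real values.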
Prometheus: chunkenc/xor.go (internal bstream)
type bstream struct {
    stream []byte
    count uint8
}
func (b *bstream) writeBit(v bit)
func (b *bstream) writeByte(byt byte)
func (b *bstream) writeBits(u uint64, nbits int)
func (b *bstreamReader) readBit() (bit, error)
func (b *bstreamReader) readBits(nbits int) (uint64, error)
func (b *bstreamReader) readByte() (byte, error)
Internal (unexported) but arguably the most battle-tested Go implementation. Same fundamental shape as dgryski. bit is just a bool typedef.
InfluxDB: tsm1 BitReader/BitWriter
type BitReader struct {
    buf [8]byte
    // ...
    err error // ← stored in struct, not returned
}
func (r *BitReader) ReadBit() bool // error via r.Err()
func (r *BitReader) ReadBits(nbits int) uint64
func (w *BitWriter) WriteBit(v bool)
func (w *BitWriter) WriteBits(u uint64, nbits int)
func (w *BitWriter) Flush()
Stores error in struct — callers check r.Err() after reads. Less idiomatic Go. Functional but the error-handling model is harder to compose.
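The difference between the two error models is easier to see side by side. The sketch below uses two hypothetical types, stickyReader and returningReader, written only for this comparison; neither is InfluxDB's actual code, and the errEOF sentinel is likewise an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

var errEOF = errors.New("out of bits")

// stickyReader is a hypothetical reader that records the first error
// in the struct instead of returning it from each call.
type stickyReader struct {
	data []byte
	pos  int
	err  error
}

// ReadBit returns false once an error has been recorded.
func (r *stickyReader) ReadBit() bool {
	if r.err != nil {
		return false
	}
	if r.pos >= len(r.data)*8 {
		r.err = errEOF
		return false
	}
	b := r.data[r.pos/8]>>(7-uint(r.pos%8))&1 == 1
	r.pos++
	return b
}

func (r *stickyReader) Err() error { return r.err }

// returningReader wraps the same logic but surfaces the error per call.
type returningReader struct{ s stickyReader }

func (r *returningReader) ReadBit() (bool, error) {
	b := r.s.ReadBit()
	return b, r.s.Err()
}

func main() {
	// Sticky style: without an Err() check, the ninth read looks like a 0 bit.
	s := &stickyReader{data: []byte{0x80}}
	for i := 0; i < 9; i++ {
		s.ReadBit()
	}
	fmt.Println("sticky error, visible only on request:", s.Err())

	// Returned style: the failing call is identified at the call site.
	r := &returningReader{s: stickyReader{data: []byte{0x80}}}
	for i := 0; ; i++ {
		if _, err := r.ReadBit(); err != nil {
			fmt.Println("read", i, "failed:", err)
			break
		}
	}
}
```

With the sticky model, a false return is ambiguous until Err() is consulted, which is what makes it harder to compose with code that expects the usual (value, error) pair.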
Rust: bitstream-io
let mut reader = BitReader::endian(cursor, BigEndian);
let bit: bool = reader.read_bit()?;
let value: u32 = reader.read(8)?; // type inferred from context
let mut writer = BitWriter::endian(vec, BigEndian);
writer.write_bit(true)?;
writer.write(8, 255u32)?;
writer.byte_align()?;
Same conceptual shape. Endianness is explicit (constructor param). byte_aligned() check available. Type-safe via generics — no casting to uint64 needed.
Python: bitstring
s = ConstBitStream(bytes=data)
value = s.read('uint:8') # format string
bool_val = s.read('bool')
bs = BitStream()
bs.append('uint:8=255')
bs.append('bool=True')
Format-string based. High-level but heavyweight. Not suitable as a model for a low-level library.
Java: java.util.BitSet
BitSet bs = new BitSet(64);
bs.set(3); // set bit at index 3
bs.get(3); // read bit at index 3
Index-based, not stream-oriented. Not comparable — different use case (sparse bit flags, not packed encoding).
Your API vs the Field
| Method | Your API | dgryski | Prometheus | InfluxDB |
|---|---|---|---|---|
| Write single bit | WriteBit(bool) | WriteBit(Bit) error | writeBit(bit) | WriteBit(bool) |
| Write n bits | WriteBits(uint64, uint8) | WriteBits(uint64, int) error | writeBits(uint64, int) | WriteBits(uint64, int) |
| Flush + get bytes | Flush() []byte | Flush(Bit) error (writes to io.Writer) | n/a (access .stream) | Flush() (writes to io.Writer) |
| Non-destructive peek | Snapshot() []byte | none | none | none |
| Read single bit | ReadBit() (bool, error) | ReadBit() (Bit, error) | readBit() (bit, error) | ReadBit() bool |
| Read n bits | ReadBits(uint8) (uint64, error) | ReadBits(int) (uint64, error) | readBits(int) (uint64, error) | ReadBits(int) uint64 |
Where your API differs:
- uint8 for bit count (vs the word-sized integer types used elsewhere): more honest about the constraint (max 64 bits). Tighter contract.
- Flush() []byte returns bytes directly: more convenient than requiring an io.Writer. No equivalent found elsewhere; it's a clean ergonomic improvement.
- Snapshot() []byte: no equivalent found. Non-destructive mid-stream peek at current bytes. Useful for the TychoDB use case.
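A rough sketch of how Flush() []byte and Snapshot() []byte might differ in practice follows. The semantics here are assumptions, since the notes above only give the signatures: Snapshot is taken to return a copy that leaves the writer untouched, and Flush is taken to hand back the bytes and reset the writer. The Writer internals are also hypothetical.

```go
package main

import "fmt"

// Writer is a hypothetical in-memory bit writer (MSB-first).
type Writer struct {
	buf   []byte
	nfree uint8 // unused low bits in the last byte
}

func NewWriter() *Writer { return &Writer{} }

// WriteBits appends the low n bits of v, most significant bit first.
func (w *Writer) WriteBits(v uint64, n uint8) {
	for i := int(n) - 1; i >= 0; i-- {
		if w.nfree == 0 {
			w.buf = append(w.buf, 0)
			w.nfree = 8
		}
		w.nfree--
		w.buf[len(w.buf)-1] |= byte(v>>uint(i)&1) << w.nfree
	}
}

// Snapshot returns a copy of the bytes written so far, with the partial
// final byte zero-padded. The writer is left untouched (assumed semantics).
func (w *Writer) Snapshot() []byte {
	return append([]byte(nil), w.buf...)
}

// Flush returns the packed bytes and resets the writer (assumed semantics).
func (w *Writer) Flush() []byte {
	out := w.buf
	w.buf, w.nfree = nil, 0
	return out
}

func main() {
	w := NewWriter()
	w.WriteBits(5, 3)    // binary 101
	snap := w.Snapshot() // peek mid-stream
	w.WriteBits(3, 2)    // binary 11, continues in the same byte
	fmt.Printf("snapshot=%08b final=%08b\n", snap[0], w.Flush()[0])
}
```

Because Snapshot copies, later writes into the same partial byte do not alter a snapshot that a caller is still holding.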
What "Hiding Complexity" Means in Practice
The libraries that feel simple share two traits:
1. No configuration on construction
NewWriter() takes no options. You write bits, call Flush, get bytes. That's it. Complexity (endianness, padding, buffering) is handled internally with sensible defaults.
2. uint64 as the universal value type
Callers don't think about int sizes. Pass a uint64, specify how many bits. Casting is the caller's problem if they need a smaller type, not the library's.
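The caller-side casting described here might look like the sketch below. The appendBits helper is a hypothetical stand-in for a WriteBits call that packs into a single uint64 word, and the field names and widths (4, 12, and 8 bits) are invented for illustration.

```go
package main

import "fmt"

// appendBits shifts the low n bits of v into acc. It is a hypothetical
// stand-in for WriteBits(v uint64, n uint8) operating on a single word.
func appendBits(acc, v uint64, n uint8) uint64 {
	mask := uint64(1)<<n - 1
	return acc<<n | v&mask
}

// pack widens each field to uint64 at the call site; the packing code
// never needs to know the original types.
func pack(version uint8, count uint16, delta int8) uint64 {
	var word uint64
	word = appendBits(word, uint64(version), 4)
	word = appendBits(word, uint64(count), 12)
	word = appendBits(word, uint64(uint8(delta)), 8) // reinterpret as unsigned first
	return word
}

// unpack narrows each field back to the type the caller wants.
func unpack(word uint64) (version uint8, count uint16, delta int8) {
	delta = int8(word & 0xFF)
	count = uint16(word >> 8 & 0xFFF)
	version = uint8(word >> 20 & 0xF)
	return
}

func main() {
	word := pack(3, 1000, -2)
	fmt.Printf("%#x\n", word)
	fmt.Println(unpack(word))
}
```

The library-side function only ever sees a uint64 and a width; every decision about signedness and narrowing stays with the caller.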
The libraries that feel complex require either format strings (Python bitstring) or index-based access (Java BitSet) — neither matches the mental model of "stream of bits".
Verdict
Your existing API is already at the right level of abstraction. It matches Prometheus's internal bstream (arguably the most battle-tested Go implementation) almost line for line, with three ergonomic improvements:
- Flush() returns []byte directly instead of writing to an io.Writer
- uint8 for bit count is more honest than int
- Snapshot() is a genuinely useful addition with no equivalent elsewhere
No changes needed to the API design. It is simple, it hides complexity, and it aligns with every serious implementation in the field.