bitpack: Build a Go Bit-Packing Library

A test-driven guide to building github.com/tychodb/bitpack — a small, zero-dependency library for packing arbitrary numbers of bits into a byte slice and reading them back.

By the end you will have a Writer and a Reader that TychoDB imports as its first external dependency, replacing the inline BitWriter/BitReader types that lived in bits.go.

What we are building

bitpack/
  bitpack.go       — Writer, Reader
  bitpack_test.go  — all tests

Two types:

Writer  — accumulates bits MSB-first into a uint64 cache; emits whole bytes as the cache fills
Reader  — walks a byte slice bit by bit using an integer cursor

Three things that make this non-trivial:

MSB-first packing — the first bit written lands in the most-significant position of the first byte. Decoders read left to right just like humans read binary.
uint64 cache in the writer — WriteBits(v, n) fills or spans a cache word with a single OR+shift pair instead of a per-bit loop. Important when you are writing thousands of variable-length values per second.
Integer cursor in the reader — a simpler design would use a uint64 read-cache with a valid counter. That approach has a silent overflow bug when fewer than 8 bits remain; the bit-position cursor avoids it entirely.

Setup

mkdir bitpack && cd bitpack
go mod init github.com/tychodb/bitpack
touch bitpack.go bitpack_test.go

bitpack.go

package bitpack

bitpack_test.go

package bitpack

$ go test
[no test files]

Step 1 — Writer struct and MSB-first layout

Before writing any logic, establish how the bit layout works.

MSB-first means the first bit you write ends up as the most-significant bit (bit 7) of the first output byte. Write 1 then 0 and the first byte starts 10...... — exactly how binary is written on paper.

The Writer keeps an internal uint64 cache. Bits are packed into the high end of the cache: the first WriteBit call lands in bit 63, the second in bit 62, and so on. When the cache fills completely (all 64 bits used), those 8 bytes are appended to buf and the cache is cleared.

bitpack_test.go

import "testing"

func TestWriteSingleOne(t *testing.T) {
	var w Writer
	w.WriteBit(true)
	data := w.Flush()

	// A single 1 bit, MSB-first, zero-padded to one byte → 1000_0000 = 0x80
	if len(data) != 1 {
		t.Fatalf("expected 1 byte, got %d", len(data))
	}
	if data[0] != 0x80 {
		t.Fatalf("expected 0x80, got 0x%02x", data[0])
	}
}

$ go test
./bitpack_test.go: undefined: Writer

bitpack.go

package bitpack

// Writer packs bits into a uint64 cache and emits whole bytes MSB-first.
// The zero value is ready to use — no constructor required.
type Writer struct {
	cache uint64
	valid uint8  // number of bits currently occupied in cache (0–64)
	buf   []byte
}

func (w *Writer) WriteBit(v bool) {}

func (w *Writer) Flush() []byte { return nil }

$ go test
TestWriteSingleOne: expected 1 byte, got 0

Good — it compiles, fails with the right message. Now implement.

WriteBit places the bit at position 63 - valid in the cache. When valid == 0 (empty), 63 - 0 = 63 — the MSB. When valid == 1, it goes to bit 62, and so on. Each call increments valid.

Flush drains whatever is in the cache one byte at a time by reading the top byte (cache >> 56) and shifting the cache left by 8. Any bits below a full byte are zero-padded automatically because the unfilled low bits of the cache are already zero.

func (w *Writer) WriteBit(v bool) {
	if v {
		w.cache |= 1 << (63 - w.valid)
	}
	w.valid++
	if w.valid == 64 {
		w.flush64()
	}
}

func (w *Writer) Flush() []byte {
	for w.valid > 0 {
		w.buf = append(w.buf, byte(w.cache>>56))
		w.cache <<= 8
		if w.valid >= 8 {
			w.valid -= 8
		} else {
			w.valid = 0
		}
	}
	out := w.buf
	w.buf = nil
	w.cache = 0
	w.valid = 0
	return out
}

func (w *Writer) flush64() {
	w.buf = append(w.buf,
		byte(w.cache>>56),
		byte(w.cache>>48),
		byte(w.cache>>40),
		byte(w.cache>>32),
		byte(w.cache>>24),
		byte(w.cache>>16),
		byte(w.cache>>8),
		byte(w.cache),
	)
	w.cache = 0
	w.valid = 0
}

flush64 is a separate function for one reason: when valid reaches exactly 64 we know the cache holds exactly 8 bytes, so we can append them in one statement without the byte-by-byte loop in Flush. WriteBit calls it on the hot path.

$ go test
PASS

Step 2 — WriteBit in detail

Eight bits produce exactly one byte.

func TestWriteBit(t *testing.T) {
	var w Writer
	// 1 0 1 1 0 0 1 0 = 0b10110010 = 0xb2
	for _, b := range []bool{true, false, true, true, false, false, true, false} {
		w.WriteBit(b)
	}
	data := w.Flush()
	if len(data) != 1 {
		t.Fatalf("expected 1 byte, got %d", len(data))
	}
	if data[0] != 0b10110010 {
		t.Fatalf("expected 0xb2, got 0x%02x", data[0])
	}
}

$ go test
PASS

Walk through what happens bit by bit:

valid=0: write 1 → cache |= 1<<63  → cache = 1000...0
valid=1: write 0 → no-op           → cache = 1000...0
valid=2: write 1 → cache |= 1<<61  → cache = 1010...0
valid=3: write 1 → cache |= 1<<60  → cache = 1011...0
...
valid=7: write 0 → no-op           → cache = 10110010 000...0

Flush: cache >> 56 = 0b10110010 = 0xb2. ✓

Step 3 — Flush resets the Writer

Flush must return the accumulated bytes and reset all state so the Writer can be used again.

func TestFlushResetsWriter(t *testing.T) {
	var w Writer
	w.WriteBits(0xff, 8)
	w.Flush()

	// Second use after flush — should behave as if fresh.
	w.WriteBits(0xab, 8)
	data := w.Flush()
	if len(data) != 1 || data[0] != 0xab {
		t.Fatalf("after reset: got % x", data)
	}
}

This test uses WriteBits which doesn't exist yet. Add a stub so it compiles:

func (w *Writer) WriteBits(v uint64, n uint8) {}

$ go test
TestFlushResetsWriter: after reset: got []

Flush already resets cache and buf. The stub WriteBits writes nothing, so the second Flush returns empty. Implement WriteBits in step 4 and this test will pass automatically.

Step 4 — WriteBits: packing multiple bits at once

WriteBits(v, n) writes the low n bits of v, MSB first, without a per-bit loop.

The trick: align v to the top of a uint64 before doing anything else.

v = 0b101  n = 3
v <<= (64 - 3) = v <<= 61
v is now: 101 followed by 61 zero bits

Now v >> valid drops those top bits into exactly the right position in the cache.

func TestWriteBitsRoundTrip(t *testing.T) {
	cases := []struct {
		bits uint8
		val  uint64
	}{
		{1, 0}, {1, 1},
		{7, 0x5a},
		{8, 0xff},
		{9, 0x1ab},
		{16, 0xdead},
		{32, 0xdeadbeef},
		{64, 0xdeadbeefcafebabe},
	}
	for _, tc := range cases {
		var w Writer
		w.WriteBits(tc.val, tc.bits)
		data := w.Flush()

		r := NewReader(data)
		got, err := r.ReadBits(tc.bits)
		if err != nil {
			t.Fatalf("bits=%d val=%d: %v", tc.bits, tc.val, err)
		}
		if got != tc.val {
			t.Fatalf("bits=%d: wrote %d, read %d", tc.bits, tc.val, got)
		}
	}
}

This test also uses NewReader and ReadBits (step 5). Add stubs for now:

type Reader struct{}
func NewReader(data []byte) *Reader { return &Reader{} }
func (r *Reader) ReadBit() (bool, error) { return false, nil }
func (r *Reader) ReadBits(n uint8) (uint64, error) { return 0, nil }

$ go test
TestWriteBitsRoundTrip: bits=1 val=1: wrote 1, read 0

Implement WriteBits:

func (w *Writer) WriteBits(v uint64, n uint8) {
	v <<= (64 - n) // align the low n bits to the top of a uint64

	avail := 64 - w.valid
	if n <= avail {
		// All n bits fit in the current cache slot.
		w.cache |= v >> w.valid
		w.valid += n
		if w.valid == 64 {
			w.flush64()
		}
		return
	}

	// The bits span a cache boundary.
	// Fill the remaining cache space with the top `avail` bits of v,
	// flush the full cache, then place the leftover bits in the fresh cache.
	w.cache |= v >> w.valid
	w.valid = 64
	w.flush64()
	remaining := n - avail
	w.cache = v << avail
	w.valid = remaining
}

The cache-boundary case is worth reading carefully:

cache is 60 bits full (valid=60), avail=4.
Write 8 bits: 10110101

v after align: 10110101 followed by 56 zeroes
v >> 60 = 0000 1011  → fills the last 4 slots: cache is now full → flush64
remaining = 8-4 = 4 bits
v << 4 = 0101 followed by 60 zeroes → sits at top of fresh cache
valid = 4

Step 5 — multi-value write and byte-boundary alignment

Values of different widths written back to back must round-trip correctly, including when they straddle byte boundaries.

func TestMultipleValues(t *testing.T) {
	var w Writer
	w.WriteBits(0b101, 3)
	w.WriteBits(0b1010, 4)
	w.WriteBits(0xff, 8)
	w.WriteBits(0b1, 1)
	data := w.Flush()

	r := NewReader(data)
	v1, _ := r.ReadBits(3)
	v2, _ := r.ReadBits(4)
	v3, _ := r.ReadBits(8)
	v4, _ := r.ReadBits(1)

	if v1 != 0b101 || v2 != 0b1010 || v3 != 0xff || v4 != 1 {
		t.Fatalf("got %d %d %d %d", v1, v2, v3, v4)
	}
}

func TestByteBoundaryAlignment(t *testing.T) {
	// 3 + 5 + 8 = 16 bits — straddles two bytes.
	var w Writer
	w.WriteBits(0b110, 3)
	w.WriteBits(0b10101, 5)
	w.WriteBits(0xab, 8)
	data := w.Flush()

	if len(data) != 2 {
		t.Fatalf("expected 2 bytes, got %d", len(data))
	}
	r := NewReader(data)
	v1, _ := r.ReadBits(3)
	v2, _ := r.ReadBits(5)
	v3, _ := r.ReadBits(8)
	if v1 != 0b110 || v2 != 0b10101 || v3 != 0xab {
		t.Fatalf("mismatch: %d %d %d", v1, v2, v3)
	}
}

These will pass once Reader is implemented. Move on.

Step 6 — the 64-bit cache boundary

A value that straddles the 64-bit cache boundary (e.g., 4 bits remaining in the cache, writing 8 bits) exercises flush64 on the write side.

func TestStraddle64BitBoundary(t *testing.T) {
	// Write 60 bits then 8 bits — the 8-bit value straddles the cache boundary.
	var w Writer
	w.WriteBits(0, 60)
	w.WriteBits(0b10110101, 8)
	data := w.Flush()

	r := NewReader(data)
	r.ReadBits(60)
	v, err := r.ReadBits(8)
	if err != nil {
		t.Fatal(err)
	}
	if v != 0b10110101 {
		t.Fatalf("expected 181, got %d", v)
	}
}

This test will pass once ReadBits is implemented. Leave it for now.

Step 7 — Reader: why a bit-position cursor

Before writing the Reader, it is worth understanding why the design uses a plain integer cursor (bitPos) rather than the more obvious uint64 read-cache approach.

A cache-based Reader would look like:

cache uint64  — holds the next bits to read, left-aligned
valid uint8   — how many bits are loaded in cache

To read from a cache, you would shift: shift = 56 - valid. But valid is a uint8. If valid is 7, shift = 56 - 7 = 49. Fine. If valid is 3, shift = 56 - 3 = 53. Fine. But if the cache refill logic ever lets valid reach 0 before you check, 56 - 0 = 56 — but as a uint8, 56 - 64 = 248. You are now reading bits from the wrong position, silently.

The bit-position cursor has no subtraction that can underflow. bitPos advances by n each call. byteIdx = bitPos >> 3. bitOff = bitPos & 7. Both are plain positive integers. There is no opportunity for wrap-around.

func TestReadBit(t *testing.T) {
	// 0xb2 = 10110010
	r := NewReader([]byte{0xb2})
	expected := []bool{true, false, true, true, false, false, true, false}
	for i, want := range expected {
		got, err := r.ReadBit()
		if err != nil {
			t.Fatalf("bit %d: %v", i, err)
		}
		if got != want {
			t.Fatalf("bit %d: got %v, want %v", i, got, want)
		}
	}
}

$ go test
TestReadBit: bit 0: got false, want true

Replace the stub Reader with the real implementation:

// Reader reads bits MSB-first from a byte slice.
type Reader struct {
	data   []byte
	bitPos int // MSB-first bit offset into data (0 = MSB of data[0])
}

func NewReader(data []byte) *Reader {
	return &Reader{data: data}
}

func (r *Reader) ReadBit() (bool, error) {
	if r.bitPos >= len(r.data)*8 {
		return false, io.ErrUnexpectedEOF
	}
	// Within a byte, bit 0 is the MSB (bit 7 in Go's notation) and bit 7 is the LSB.
	// bitPos>>3 = which byte; bitPos&7 = which bit within that byte (0=MSB).
	bit := (r.data[r.bitPos>>3] >> (7 - uint(r.bitPos&7))) & 1
	r.bitPos++
	return bit != 0, nil
}

Add "io" to the import:

import "io"

$ go test
PASS  (TestReadBit passes; round-trip tests still failing due to ReadBits stub)

Step 8 — ReadBits

ReadBits(n) reads n bits and returns them right-aligned in a uint64. It works byte by byte, taking only as many bits from each byte as needed.

data = [0b10110010, 0xab]
bitPos = 3, n = 9

Iteration 1: byteIdx=0, bitOff=3, avail=5, take=5
  extract bits 3-7 of 0b10110010: 0b10010 = 18
  v = 18, pos=8, rem=4

Iteration 2: byteIdx=1, bitOff=0, avail=8, take=4
  extract bits 0-3 of 0xab (=0b10101011): 0b1010 = 10
  v = (18 << 4) | 10 = 298, pos=12, rem=0

bitPos = 12

func (r *Reader) ReadBits(n uint8) (uint64, error) {
	end := r.bitPos + int(n)
	if end > len(r.data)*8 {
		return 0, io.ErrUnexpectedEOF
	}
	var v uint64
	pos := r.bitPos
	rem := int(n)
	for rem > 0 {
		byteIdx := pos >> 3        // which byte
		bitOff  := pos & 7         // bit offset within byte (0 = MSB)
		avail   := 8 - bitOff      // remaining bits in this byte
		take    := rem
		if take > avail {
			take = avail
		}
		// Shift right to remove trailing bits, mask to `take` bits.
		bits := uint64((r.data[byteIdx] >> uint(avail-take)) & byte((1<<take)-1))
		v = (v << uint(take)) | bits
		pos += take
		rem -= take
	}
	r.bitPos = end
	return v, nil
}

$ go test
PASS

All the earlier round-trip and boundary tests now pass too.

Step 9 — EOF errors

Reading past the end of the buffer must return io.ErrUnexpectedEOF, not a panic or wrong data.

func TestReadBitEOF(t *testing.T) {
	r := NewReader([]byte{0xff})
	r.ReadBits(8) // consume all 8 bits
	_, err := r.ReadBit()
	if err != io.ErrUnexpectedEOF {
		t.Fatalf("expected ErrUnexpectedEOF, got %v", err)
	}
}

func TestReadBitsPartialEOF(t *testing.T) {
	r := NewReader([]byte{0xab})
	_, err := r.ReadBits(9) // only 8 bits available
	if err != io.ErrUnexpectedEOF {
		t.Fatalf("expected ErrUnexpectedEOF, got %v", err)
	}
}

$ go test
PASS

Both are already handled: ReadBit checks bitPos >= len(data)*8 and ReadBits checks end > len(data)*8 before doing any work.

Step 10 — Snapshot: non-destructive read

Flush is destructive — it drains the cache, resets all state, and returns the bytes. Sometimes you need the current bytes without resetting the writer (for example, to take a live snapshot of an in-progress encoding). That is what Snapshot is for.

The implementation clones the writer and calls Flush on the clone:

func TestSnapshot(t *testing.T) {
	var w Writer
	w.WriteBits(0xde, 8)
	w.WriteBits(0xad, 8)

	snap1 := w.Snapshot() // 0xde 0xad

	w.WriteBits(0xbe, 8)

	snap2 := w.Snapshot() // 0xde 0xad 0xbe

	// Original writer is unaffected — can keep writing.
	w.WriteBits(0xef, 8)
	final := w.Flush() // 0xde 0xad 0xbe 0xef

	if len(snap1) != 2 || snap1[0] != 0xde || snap1[1] != 0xad {
		t.Fatalf("snap1 wrong: % x", snap1)
	}
	if len(snap2) != 3 || snap2[2] != 0xbe {
		t.Fatalf("snap2 wrong: % x", snap2)
	}
	if len(final) != 4 || final[3] != 0xef {
		t.Fatalf("final wrong: % x", final)
	}
}

$ go test
./bitpack_test.go: undefined: w.Snapshot

// Snapshot returns the current accumulated bytes without resetting the Writer.
// Any partial byte is zero-padded to a boundary. The Writer remains valid for
// further WriteBit/WriteBits calls after Snapshot returns.
func (w *Writer) Snapshot() []byte {
	clone := Writer{cache: w.cache, valid: w.valid}
	clone.buf = make([]byte, len(w.buf))
	copy(clone.buf, w.buf)
	return clone.Flush()
}

$ go test
PASS

Complete file listing

bitpack.go

package bitpack

import "io"

// Writer packs bits into a uint64 cache and emits whole bytes MSB-first.
// The zero value is ready to use.
type Writer struct {
	cache uint64
	valid uint8
	buf   []byte
}

func (w *Writer) WriteBit(v bool) {
	if v {
		w.cache |= 1 << (63 - w.valid)
	}
	w.valid++
	if w.valid == 64 {
		w.flush64()
	}
}

func (w *Writer) WriteBits(v uint64, n uint8) {
	v <<= (64 - n)
	avail := 64 - w.valid
	if n <= avail {
		w.cache |= v >> w.valid
		w.valid += n
		if w.valid == 64 {
			w.flush64()
		}
		return
	}
	w.cache |= v >> w.valid
	w.valid = 64
	w.flush64()
	remaining := n - avail
	w.cache = v << avail
	w.valid = remaining
}

func (w *Writer) Flush() []byte {
	for w.valid > 0 {
		w.buf = append(w.buf, byte(w.cache>>56))
		w.cache <<= 8
		if w.valid >= 8 {
			w.valid -= 8
		} else {
			w.valid = 0
		}
	}
	out := w.buf
	w.buf = nil
	w.cache = 0
	w.valid = 0
	return out
}

func (w *Writer) Snapshot() []byte {
	clone := Writer{cache: w.cache, valid: w.valid}
	clone.buf = make([]byte, len(w.buf))
	copy(clone.buf, w.buf)
	return clone.Flush()
}

func (w *Writer) flush64() {
	w.buf = append(w.buf,
		byte(w.cache>>56),
		byte(w.cache>>48),
		byte(w.cache>>40),
		byte(w.cache>>32),
		byte(w.cache>>24),
		byte(w.cache>>16),
		byte(w.cache>>8),
		byte(w.cache),
	)
	w.cache = 0
	w.valid = 0
}

// Reader reads bits MSB-first from a byte slice.
type Reader struct {
	data   []byte
	bitPos int
}

func NewReader(data []byte) *Reader {
	return &Reader{data: data}
}

func (r *Reader) ReadBit() (bool, error) {
	if r.bitPos >= len(r.data)*8 {
		return false, io.ErrUnexpectedEOF
	}
	bit := (r.data[r.bitPos>>3] >> (7 - uint(r.bitPos&7))) & 1
	r.bitPos++
	return bit != 0, nil
}

func (r *Reader) ReadBits(n uint8) (uint64, error) {
	end := r.bitPos + int(n)
	if end > len(r.data)*8 {
		return 0, io.ErrUnexpectedEOF
	}
	var v uint64
	pos := r.bitPos
	rem := int(n)
	for rem > 0 {
		byteIdx := pos >> 3
		bitOff  := pos & 7
		avail   := 8 - bitOff
		take    := rem
		if take > avail {
			take = avail
		}
		bits := uint64((r.data[byteIdx] >> uint(avail-take)) & byte((1<<take)-1))
		v = (v << uint(take)) | bits
		pos += take
		rem -= take
	}
	r.bitPos = end
	return v, nil
}

What you built

github.com/tychodb/bitpack — a generic bit-packing library:

Type	Method	Purpose
`Writer`	`WriteBit(bool)`	Pack a single bit
`Writer`	`WriteBits(uint64, uint8)`	Pack n bits, no per-bit loop
`Writer`	`Flush() []byte`	Return bytes, reset writer
`Writer`	`Snapshot() []byte`	Return bytes, keep writer live
`Reader`	`ReadBit() (bool, error)`	Read one bit
`Reader`	`ReadBits(uint8) (uint64, error)`	Read n bits

Push to github.com/tychodb/bitpack, tag v0.1.0, then add to TychoDB:

go get github.com/tychodb/bitpack@v0.1.0

TychoDB's chunk.go imports bitpack.Writer and bitpack.Reader in place of its former inline BitWriter/BitReader types. The Snapshot method replaces the manual clone-and-flush pattern in Chunk.Bytes().