bitpack: Build a Go Bit-Packing Library
A test-driven guide to building github.com/tychodb/bitpack — a small, zero-dependency library for packing arbitrary numbers of bits into a byte slice and reading them back.
By the end you will have a Writer and a Reader that TychoDB imports as its first external dependency, replacing the inline BitWriter/BitReader types that lived in bits.go.
What we are building
bitpack/
bitpack.go — Writer, Reader
bitpack_test.go — all tests
Two types:
Writer — accumulates bits MSB-first into a uint64 cache; emits whole bytes as the cache fills
Reader — walks a byte slice bit by bit using an integer cursor
Three things that make this non-trivial:
- MSB-first packing — the first bit written lands in the most-significant position of the first byte. Decoders read left to right just like humans read binary.
- uint64 cache in the writer —
WriteBits(v, n)fills or spans a cache word with a single OR+shift pair instead of a per-bit loop. Important when you are writing thousands of variable-length values per second. - Integer cursor in the reader — a simpler design would use a uint64 read-cache with a
validcounter. That approach has a silent overflow bug when fewer than 8 bits remain; the bit-position cursor avoids it entirely.
Setup
mkdir bitpack && cd bitpack
go mod init github.com/tychodb/bitpack
touch bitpack.go bitpack_test.go
bitpack.go
package bitpack
bitpack_test.go
package bitpack
$ go test
[no test files]
Step 1 — Writer struct and MSB-first layout
Before writing any logic, establish how the bit layout works.
MSB-first means the first bit you write ends up as the most-significant bit (bit 7) of the first output byte. Write 1 then 0 and the first byte starts 10...... — exactly how binary is written on paper.
The Writer keeps an internal uint64 cache. Bits are packed into the high end of the cache: the first WriteBit call lands in bit 63, the second in bit 62, and so on. When the cache fills completely (all 64 bits used), those 8 bytes are appended to buf and the cache is cleared.
bitpack_test.go
import "testing"
func TestWriteSingleOne(t *testing.T) {
var w Writer
w.WriteBit(true)
data := w.Flush()
// A single 1 bit, MSB-first, zero-padded to one byte → 1000_0000 = 0x80
if len(data) != 1 {
t.Fatalf("expected 1 byte, got %d", len(data))
}
if data[0] != 0x80 {
t.Fatalf("expected 0x80, got 0x%02x", data[0])
}
}
$ go test
./bitpack_test.go: undefined: Writer
bitpack.go
package bitpack
// Writer packs bits into a uint64 cache and emits whole bytes MSB-first.
// The zero value is ready to use — no constructor required.
type Writer struct {
cache uint64
valid uint8 // number of bits currently occupied in cache (0–64)
buf []byte
}
func (w *Writer) WriteBit(v bool) {}
func (w *Writer) Flush() []byte { return nil }
$ go test
TestWriteSingleOne: expected 1 byte, got 0
Good — it compiles, fails with the right message. Now implement.
WriteBit places the bit at position 63 - valid in the cache. When valid == 0 (empty), 63 - 0 = 63 — the MSB. When valid == 1, it goes to bit 62, and so on. Each call increments valid.
Flush drains whatever is in the cache one byte at a time by reading the top byte (cache >> 56) and shifting the cache left by 8. Any bits below a full byte are zero-padded automatically because the unfilled low bits of the cache are already zero.
func (w *Writer) WriteBit(v bool) {
if v {
w.cache |= 1 << (63 - w.valid)
}
w.valid++
if w.valid == 64 {
w.flush64()
}
}
func (w *Writer) Flush() []byte {
for w.valid > 0 {
w.buf = append(w.buf, byte(w.cache>>56))
w.cache <<= 8
if w.valid >= 8 {
w.valid -= 8
} else {
w.valid = 0
}
}
out := w.buf
w.buf = nil
w.cache = 0
w.valid = 0
return out
}
func (w *Writer) flush64() {
w.buf = append(w.buf,
byte(w.cache>>56),
byte(w.cache>>48),
byte(w.cache>>40),
byte(w.cache>>32),
byte(w.cache>>24),
byte(w.cache>>16),
byte(w.cache>>8),
byte(w.cache),
)
w.cache = 0
w.valid = 0
}
flush64 is a separate function for one reason: when valid reaches exactly 64 we know the cache holds exactly 8 bytes, so we can append them in one statement without the byte-by-byte loop in Flush. WriteBit calls it on the hot path.
$ go test
PASS
Step 2 — WriteBit in detail
Eight bits produce exactly one byte.
func TestWriteBit(t *testing.T) {
var w Writer
// 1 0 1 1 0 0 1 0 = 0b10110010 = 0xb2
for _, b := range []bool{true, false, true, true, false, false, true, false} {
w.WriteBit(b)
}
data := w.Flush()
if len(data) != 1 {
t.Fatalf("expected 1 byte, got %d", len(data))
}
if data[0] != 0b10110010 {
t.Fatalf("expected 0xb2, got 0x%02x", data[0])
}
}
$ go test
PASS
Walk through what happens bit by bit:
valid=0: write 1 → cache |= 1<<63 → cache = 1000...0
valid=1: write 0 → no-op → cache = 1000...0
valid=2: write 1 → cache |= 1<<61 → cache = 1010...0
valid=3: write 1 → cache |= 1<<60 → cache = 1011...0
...
valid=7: write 0 → no-op → cache = 10110010 000...0
Flush: cache >> 56 = 0b10110010 = 0xb2. ✓
Step 3 — Flush resets the Writer
Flush must return the accumulated bytes and reset all state so the Writer can be used again.
func TestFlushResetsWriter(t *testing.T) {
var w Writer
w.WriteBits(0xff, 8)
w.Flush()
// Second use after flush — should behave as if fresh.
w.WriteBits(0xab, 8)
data := w.Flush()
if len(data) != 1 || data[0] != 0xab {
t.Fatalf("after reset: got % x", data)
}
}
This test uses WriteBits which doesn't exist yet. Add a stub so it compiles:
func (w *Writer) WriteBits(v uint64, n uint8) {}
$ go test
TestFlushResetsWriter: after reset: got []
Flush already resets cache and buf. The stub WriteBits writes nothing, so the second Flush returns empty. Implement WriteBits in step 4 and this test will pass automatically.
Step 4 — WriteBits: packing multiple bits at once
WriteBits(v, n) writes the low n bits of v, MSB first, without a per-bit loop.
The trick: align v to the top of a uint64 before doing anything else.
v = 0b101 n = 3
v <<= (64 - 3) = v <<= 61
v is now: 101 followed by 61 zero bits
Now v >> valid drops those top bits into exactly the right position in the cache.
func TestWriteBitsRoundTrip(t *testing.T) {
cases := []struct {
bits uint8
val uint64
}{
{1, 0}, {1, 1},
{7, 0x5a},
{8, 0xff},
{9, 0x1ab},
{16, 0xdead},
{32, 0xdeadbeef},
{64, 0xdeadbeefcafebabe},
}
for _, tc := range cases {
var w Writer
w.WriteBits(tc.val, tc.bits)
data := w.Flush()
r := NewReader(data)
got, err := r.ReadBits(tc.bits)
if err != nil {
t.Fatalf("bits=%d val=%d: %v", tc.bits, tc.val, err)
}
if got != tc.val {
t.Fatalf("bits=%d: wrote %d, read %d", tc.bits, tc.val, got)
}
}
}
This test also uses NewReader and ReadBits (step 5). Add stubs for now:
type Reader struct{}
func NewReader(data []byte) *Reader { return &Reader{} }
func (r *Reader) ReadBit() (bool, error) { return false, nil }
func (r *Reader) ReadBits(n uint8) (uint64, error) { return 0, nil }
$ go test
TestWriteBitsRoundTrip: bits=1 val=1: wrote 1, read 0
Implement WriteBits:
func (w *Writer) WriteBits(v uint64, n uint8) {
v <<= (64 - n) // align the low n bits to the top of a uint64
avail := 64 - w.valid
if n <= avail {
// All n bits fit in the current cache slot.
w.cache |= v >> w.valid
w.valid += n
if w.valid == 64 {
w.flush64()
}
return
}
// The bits span a cache boundary.
// Fill the remaining cache space with the top `avail` bits of v,
// flush the full cache, then place the leftover bits in the fresh cache.
w.cache |= v >> w.valid
w.valid = 64
w.flush64()
remaining := n - avail
w.cache = v << avail
w.valid = remaining
}
The cache-boundary case is worth reading carefully:
cache is 60 bits full (valid=60), avail=4.
Write 8 bits: 10110101
v after align: 10110101 followed by 56 zeroes
v >> 60 = 0000 1011 → fills the last 4 slots: cache is now full → flush64
remaining = 8-4 = 4 bits
v << 4 = 0101 followed by 60 zeroes → sits at top of fresh cache
valid = 4
Step 5 — multi-value write and byte-boundary alignment
Values of different widths written back to back must round-trip correctly, including when they straddle byte boundaries.
func TestMultipleValues(t *testing.T) {
var w Writer
w.WriteBits(0b101, 3)
w.WriteBits(0b1010, 4)
w.WriteBits(0xff, 8)
w.WriteBits(0b1, 1)
data := w.Flush()
r := NewReader(data)
v1, _ := r.ReadBits(3)
v2, _ := r.ReadBits(4)
v3, _ := r.ReadBits(8)
v4, _ := r.ReadBits(1)
if v1 != 0b101 || v2 != 0b1010 || v3 != 0xff || v4 != 1 {
t.Fatalf("got %d %d %d %d", v1, v2, v3, v4)
}
}
func TestByteBoundaryAlignment(t *testing.T) {
// 3 + 5 + 8 = 16 bits — straddles two bytes.
var w Writer
w.WriteBits(0b110, 3)
w.WriteBits(0b10101, 5)
w.WriteBits(0xab, 8)
data := w.Flush()
if len(data) != 2 {
t.Fatalf("expected 2 bytes, got %d", len(data))
}
r := NewReader(data)
v1, _ := r.ReadBits(3)
v2, _ := r.ReadBits(5)
v3, _ := r.ReadBits(8)
if v1 != 0b110 || v2 != 0b10101 || v3 != 0xab {
t.Fatalf("mismatch: %d %d %d", v1, v2, v3)
}
}
These will pass once Reader is implemented. Move on.
Step 6 — the 64-bit cache boundary
A value that straddles the 64-bit cache boundary (e.g., 4 bits remaining in the cache, writing 8 bits) exercises flush64 on the write side.
func TestStraddle64BitBoundary(t *testing.T) {
// Write 60 bits then 8 bits — the 8-bit value straddles the cache boundary.
var w Writer
w.WriteBits(0, 60)
w.WriteBits(0b10110101, 8)
data := w.Flush()
r := NewReader(data)
r.ReadBits(60)
v, err := r.ReadBits(8)
if err != nil {
t.Fatal(err)
}
if v != 0b10110101 {
t.Fatalf("expected 181, got %d", v)
}
}
This test will pass once ReadBits is implemented. Leave it for now.
Step 7 — Reader: why a bit-position cursor
Before writing the Reader, it is worth understanding why the design uses a plain integer cursor (bitPos) rather than the more obvious uint64 read-cache approach.
A cache-based Reader would look like:
cache uint64 — holds the next bits to read, left-aligned
valid uint8 — how many bits are loaded in cache
To read from a cache, you would shift: shift = 56 - valid. But valid is a uint8. If valid is 7, shift = 56 - 7 = 49. Fine. If valid is 3, shift = 56 - 3 = 53. Fine. But if the cache refill logic ever lets valid reach 0 before you check, 56 - 0 = 56 — but as a uint8, 56 - 64 = 248. You are now reading bits from the wrong position, silently.
The bit-position cursor has no subtraction that can underflow. bitPos advances by n each call. byteIdx = bitPos >> 3. bitOff = bitPos & 7. Both are plain positive integers. There is no opportunity for wrap-around.
func TestReadBit(t *testing.T) {
// 0xb2 = 10110010
r := NewReader([]byte{0xb2})
expected := []bool{true, false, true, true, false, false, true, false}
for i, want := range expected {
got, err := r.ReadBit()
if err != nil {
t.Fatalf("bit %d: %v", i, err)
}
if got != want {
t.Fatalf("bit %d: got %v, want %v", i, got, want)
}
}
}
$ go test
TestReadBit: bit 0: got false, want true
Replace the stub Reader with the real implementation:
// Reader reads bits MSB-first from a byte slice.
type Reader struct {
data []byte
bitPos int // MSB-first bit offset into data (0 = MSB of data[0])
}
func NewReader(data []byte) *Reader {
return &Reader{data: data}
}
func (r *Reader) ReadBit() (bool, error) {
if r.bitPos >= len(r.data)*8 {
return false, io.ErrUnexpectedEOF
}
// Within a byte, bit 0 is the MSB (bit 7 in Go's notation) and bit 7 is the LSB.
// bitPos>>3 = which byte; bitPos&7 = which bit within that byte (0=MSB).
bit := (r.data[r.bitPos>>3] >> (7 - uint(r.bitPos&7))) & 1
r.bitPos++
return bit != 0, nil
}
Add "io" to the import:
import "io"
$ go test
PASS (TestReadBit passes; round-trip tests still failing due to ReadBits stub)
Step 8 — ReadBits
ReadBits(n) reads n bits and returns them right-aligned in a uint64. It works byte by byte, taking only as many bits from each byte as needed.
data = [0b10110010, 0xab]
bitPos = 3, n = 9
Iteration 1: byteIdx=0, bitOff=3, avail=5, take=5
extract bits 3-7 of 0b10110010: 0b10010 = 18
v = 18, pos=8, rem=4
Iteration 2: byteIdx=1, bitOff=0, avail=8, take=4
extract bits 0-3 of 0xab (=0b10101011): 0b1010 = 10
v = (18 << 4) | 10 = 298, pos=12, rem=0
bitPos = 12
func (r *Reader) ReadBits(n uint8) (uint64, error) {
end := r.bitPos + int(n)
if end > len(r.data)*8 {
return 0, io.ErrUnexpectedEOF
}
var v uint64
pos := r.bitPos
rem := int(n)
for rem > 0 {
byteIdx := pos >> 3 // which byte
bitOff := pos & 7 // bit offset within byte (0 = MSB)
avail := 8 - bitOff // remaining bits in this byte
take := rem
if take > avail {
take = avail
}
// Shift right to remove trailing bits, mask to `take` bits.
bits := uint64((r.data[byteIdx] >> uint(avail-take)) & byte((1<<take)-1))
v = (v << uint(take)) | bits
pos += take
rem -= take
}
r.bitPos = end
return v, nil
}
$ go test
PASS
All the earlier round-trip and boundary tests now pass too.
Step 9 — EOF errors
Reading past the end of the buffer must return io.ErrUnexpectedEOF, not a panic or wrong data.
func TestReadBitEOF(t *testing.T) {
r := NewReader([]byte{0xff})
r.ReadBits(8) // consume all 8 bits
_, err := r.ReadBit()
if err != io.ErrUnexpectedEOF {
t.Fatalf("expected ErrUnexpectedEOF, got %v", err)
}
}
func TestReadBitsPartialEOF(t *testing.T) {
r := NewReader([]byte{0xab})
_, err := r.ReadBits(9) // only 8 bits available
if err != io.ErrUnexpectedEOF {
t.Fatalf("expected ErrUnexpectedEOF, got %v", err)
}
}
$ go test
PASS
Both are already handled: ReadBit checks bitPos >= len(data)*8 and ReadBits checks end > len(data)*8 before doing any work.
Step 10 — Snapshot: non-destructive read
Flush is destructive — it drains the cache, resets all state, and returns the bytes. Sometimes you need the current bytes without resetting the writer (for example, to take a live snapshot of an in-progress encoding). That is what Snapshot is for.
The implementation clones the writer and calls Flush on the clone:
func TestSnapshot(t *testing.T) {
var w Writer
w.WriteBits(0xde, 8)
w.WriteBits(0xad, 8)
snap1 := w.Snapshot() // 0xde 0xad
w.WriteBits(0xbe, 8)
snap2 := w.Snapshot() // 0xde 0xad 0xbe
// Original writer is unaffected — can keep writing.
w.WriteBits(0xef, 8)
final := w.Flush() // 0xde 0xad 0xbe 0xef
if len(snap1) != 2 || snap1[0] != 0xde || snap1[1] != 0xad {
t.Fatalf("snap1 wrong: % x", snap1)
}
if len(snap2) != 3 || snap2[2] != 0xbe {
t.Fatalf("snap2 wrong: % x", snap2)
}
if len(final) != 4 || final[3] != 0xef {
t.Fatalf("final wrong: % x", final)
}
}
$ go test
./bitpack_test.go: undefined: w.Snapshot
// Snapshot returns the current accumulated bytes without resetting the Writer.
// Any partial byte is zero-padded to a boundary. The Writer remains valid for
// further WriteBit/WriteBits calls after Snapshot returns.
func (w *Writer) Snapshot() []byte {
clone := Writer{cache: w.cache, valid: w.valid}
clone.buf = make([]byte, len(w.buf))
copy(clone.buf, w.buf)
return clone.Flush()
}
$ go test
PASS
Complete file listing
bitpack.go
package bitpack
import "io"
// Writer packs bits into a uint64 cache and emits whole bytes MSB-first.
// The zero value is ready to use.
type Writer struct {
cache uint64
valid uint8
buf []byte
}
func (w *Writer) WriteBit(v bool) {
if v {
w.cache |= 1 << (63 - w.valid)
}
w.valid++
if w.valid == 64 {
w.flush64()
}
}
func (w *Writer) WriteBits(v uint64, n uint8) {
v <<= (64 - n)
avail := 64 - w.valid
if n <= avail {
w.cache |= v >> w.valid
w.valid += n
if w.valid == 64 {
w.flush64()
}
return
}
w.cache |= v >> w.valid
w.valid = 64
w.flush64()
remaining := n - avail
w.cache = v << avail
w.valid = remaining
}
func (w *Writer) Flush() []byte {
for w.valid > 0 {
w.buf = append(w.buf, byte(w.cache>>56))
w.cache <<= 8
if w.valid >= 8 {
w.valid -= 8
} else {
w.valid = 0
}
}
out := w.buf
w.buf = nil
w.cache = 0
w.valid = 0
return out
}
func (w *Writer) Snapshot() []byte {
clone := Writer{cache: w.cache, valid: w.valid}
clone.buf = make([]byte, len(w.buf))
copy(clone.buf, w.buf)
return clone.Flush()
}
func (w *Writer) flush64() {
w.buf = append(w.buf,
byte(w.cache>>56),
byte(w.cache>>48),
byte(w.cache>>40),
byte(w.cache>>32),
byte(w.cache>>24),
byte(w.cache>>16),
byte(w.cache>>8),
byte(w.cache),
)
w.cache = 0
w.valid = 0
}
// Reader reads bits MSB-first from a byte slice.
type Reader struct {
data []byte
bitPos int
}
func NewReader(data []byte) *Reader {
return &Reader{data: data}
}
func (r *Reader) ReadBit() (bool, error) {
if r.bitPos >= len(r.data)*8 {
return false, io.ErrUnexpectedEOF
}
bit := (r.data[r.bitPos>>3] >> (7 - uint(r.bitPos&7))) & 1
r.bitPos++
return bit != 0, nil
}
func (r *Reader) ReadBits(n uint8) (uint64, error) {
end := r.bitPos + int(n)
if end > len(r.data)*8 {
return 0, io.ErrUnexpectedEOF
}
var v uint64
pos := r.bitPos
rem := int(n)
for rem > 0 {
byteIdx := pos >> 3
bitOff := pos & 7
avail := 8 - bitOff
take := rem
if take > avail {
take = avail
}
bits := uint64((r.data[byteIdx] >> uint(avail-take)) & byte((1<<take)-1))
v = (v << uint(take)) | bits
pos += take
rem -= take
}
r.bitPos = end
return v, nil
}
What you built
github.com/tychodb/bitpack — a generic bit-packing library:
| Type | Method | Purpose |
|---|---|---|
Writer |
WriteBit(bool) |
Pack a single bit |
Writer |
WriteBits(uint64, uint8) |
Pack n bits, no per-bit loop |
Writer |
Flush() []byte |
Return bytes, reset writer |
Writer |
Snapshot() []byte |
Return bytes, keep writer live |
Reader |
ReadBit() (bool, error) |
Read one bit |
Reader |
ReadBits(uint8) (uint64, error) |
Read n bits |
Push to github.com/tychodb/bitpack, tag v0.1.0, then add to TychoDB:
go get github.com/tychodb/bitpack@v0.1.0
TychoDB's chunk.go imports bitpack.Writer and bitpack.Reader in place of its former inline BitWriter/BitReader types. The Snapshot method replaces the manual clone-and-flush pattern in Chunk.Bytes().