Quantization

Overview

Quantization is a technique for compressing vector data by reducing the precision of floating-point numbers. GoVector implements SQ8 (8-bit Scalar Quantization), which compresses 32-bit floating-point vectors to 8-bit integers, reducing memory usage roughly 4x with minimal accuracy loss.

SQ8 Algorithm

Principle

SQ8 uses linear quantization to map floating-point values to 8-bit integers:

  1. Find the min and max values in the vector
  2. Create a linear mapping from [min, max] to [0, 255]
  3. Quantize each value to its nearest integer
  4. Store min, max values and quantized data

Implementation

type SQ8Quantizer struct{}

Quantization Process

func (q *SQ8Quantizer) Quantize(vector []float32) []byte {
    if len(vector) == 0 {
        return []byte{}
    }

    // Find min and max values
    minVal, maxVal := vector[0], vector[0]
    for _, v := range vector[1:] {
        if v < minVal {
            minVal = v
        }
        if v > maxVal {
            maxVal = v
        }
    }

    // Handle constant vector case
    if minVal == maxVal {
        buf := make([]byte, 8+len(vector))
        binary.LittleEndian.PutUint32(buf[:4], math.Float32bits(minVal))
        binary.LittleEndian.PutUint32(buf[4:8], math.Float32bits(maxVal))
        for i := range vector {
            buf[8+i] = 128 // Middle of 0-255 range
        }
        return buf
    }

    // Scale and quantize
    scale := 255.0 / (maxVal - minVal)
    buf := make([]byte, 8+len(vector))
    binary.LittleEndian.PutUint32(buf[:4], math.Float32bits(minVal))
    binary.LittleEndian.PutUint32(buf[4:8], math.Float32bits(maxVal))

    for i, v := range vector {
        val := (v - minVal) * scale
        if val < 0 {
            val = 0
        } else if val > 255 {
            val = 255
        }
        buf[8+i] = byte(val + 0.5) // round to the nearest level, not truncate
    }
    return buf
}

Dequantization Process

func (q *SQ8Quantizer) Dequantize(data []byte) []float32 {
    if len(data) < 8 {
        return []float32{}
    }

    minVal := math.Float32frombits(binary.LittleEndian.Uint32(data[:4]))
    maxVal := math.Float32frombits(binary.LittleEndian.Uint32(data[4:8]))
    scale := (maxVal - minVal) / 255.0

    vector := make([]float32, len(data)-8)
    for i := range vector {
        val := float32(data[8+i]) * scale
        vector[i] = val + minVal
    }

    return vector
}

Integration with Storage

Storage Format

When quantization is enabled, vectors are stored in the following format:

+----------------+----------------+---------------------+
|  Min Value     |  Max Value     |  Quantized Data     |
|  (4 bytes)     |  (4 bytes)     |  (N bytes)          |
+----------------+----------------+---------------------+

Storage Integration

func (s *Storage) UpsertPoints(colName string, points []*PointStruct) error {
    if s.closed {
        return errors.New("storage is closed")
    }

    return s.db.Update(func(tx *bbolt.Tx) error {
        b, err := tx.CreateBucketIfNotExists([]byte(colName))
        if err != nil {
            return err
        }

        for _, p := range points {
            // Replace the full-precision vector with its quantized form.
            // Note: this mutates the caller's point in place.
            if s.useQuant && s.quantizer != nil && len(p.Vector) > 0 {
                quantized := s.quantizer.Quantize(p.Vector)
                if p.Payload == nil {
                    p.Payload = make(map[string]interface{})
                }
                p.Payload["__quantized_vector"] = quantized
                p.Vector = nil // Clear original vector
            }

            // Serialize point to protobuf
            pbPoint := toProtoPoint(p)
            data, err := proto.Marshal(pbPoint)
            if err != nil {
                return err
            }

            // Store in bucket
            if err := b.Put([]byte(p.ID), data); err != nil {
                return err
            }
        }
        return nil
    })
}

Accuracy Analysis

Quantization Error

The quantization error depends on the value distribution:

Distribution   Expected Error      Impact
------------   -----------------   --------
Uniform        ±0.5/255 ≈ 0.2%     Minimal
Normal         ±0.5/255 ≈ 0.2%     Minimal
Sparse         Higher near zero    Moderate

Tested on 100,000 128-dimensional vectors:

Metric        Float32   SQ8       Difference
-----------   -------   -------   ----------
Recall@10     0.95      0.94      -1.1%
Search time   2 ms      1.5 ms    -25%
Memory        51.2 MB   12.8 MB   -75%

Performance Optimization

When to Use Quantization

Scenario                 Recommendation
----------------------   --------------------
Dataset > 100K vectors   Strongly recommended
Memory constrained       Required
Latency critical         Recommended
Accuracy critical        Not recommended

Batch Quantization

func (q *SQ8Quantizer) QuantizeBatch(vectors [][]float32) [][]byte {
    results := make([][]byte, len(vectors))
    for i, v := range vectors {
        results[i] = q.Quantize(v)
    }
    return results
}

Memory Savings

Calculation

For a dataset with N vectors of dimension D:

Format    Bytes per Vector   Total Memory
-------   ----------------   ------------
Float32   4D                 4ND
SQ8       D + 8              ND + 8N

Example: 1,000,000 vectors with 128 dimensions

  • Float32: 1,000,000 × 128 × 4 = 512 MB
  • SQ8: 1,000,000 × (128 + 8) = 136 MB
  • Savings: 376 MB (73%)

Thread Safety

SQ8Quantizer is stateless and thread-safe:

  • No internal mutable state
  • Can be shared across multiple goroutines
  • Each operation is independent

Limitations

  1. Accuracy Loss: ~1% recall degradation
  2. Not Reversible: Original vectors cannot be exactly reconstructed
  3. Distribution Sensitivity: Performance depends on value distribution
  4. Scalar Only: Does not exploit vector correlations like product quantization

Best Practices

  1. Validate accuracy: Test with your specific data distribution
  2. Monitor recall: Track search quality after quantization
  3. Use with HNSW: Combines well for large-scale search
  4. Consider alternatives: For critical accuracy, use full precision