Performance Optimization¶
This guide covers GoVector performance optimization: HNSW index parameter tuning (M, efConstruction, efSearch), memory usage optimization (vector quantization, caching, and garbage collection), disk I/O optimization (batch writing, index rebuild strategy, storage format), and benchmarking and performance analysis methods. It also offers guidance on large-scale data processing and scalability, grounded in the benchmark tools and implementation details in the repository.
Project Structure¶
GoVector uses a layered organization of core library, benchmark tool, and storage engine:
- Core library: collection management, HNSW/Flat indexes, distance metrics, filters, quantizer, storage interface
- Benchmark tool: command-line benchmark suite supporting builds and queries at different scales and dimensions
- Storage engine: local persistence based on bbolt, combined with Protobuf serialization
graph TB
subgraph "Application Layer"
App["Application/Service"]
end
subgraph "Core Library (core)"
Col["Collection
Collection management"]
HNSW["HNSWIndex
Graph index"]
Flat["FlatIndex
Linear index"]
Dist["Distance
Distance metrics"]
Filt["Filter
Filters"]
Q["Quantizer
Quantizer"]
Store["Storage
Persistence"]
end
subgraph "External Dependencies"
BB["bbolt
BoltDB"]
PB["Protobuf"]
HnswLib["coder/hnsw
HNSW graph library"]
end
App --> Col
Col --> HNSW
Col --> Flat
Col --> Store
HNSW --> Dist
HNSW --> Filt
Store --> BB
Store --> PB
HNSW -.-> HnswLib
Core Components¶
- Collection: Manages vector dimension, distance metric, and index type (HNSW/Flat), and keeps the storage layer and the in-memory index consistent
- HNSWIndex: Encapsulates the underlying HNSW graph library; supports the Cosine, Euclid, and Dot distance metrics with configurable M, EfConstruction, EfSearch, and TopK
- Storage: Local persistence based on bbolt, supports Protobuf serialization and optional vector quantization compression
- SQ8Quantizer: Compresses 32-bit floating-point vectors to 8-bit integers, reducing disk and memory usage
- Filter: Supports exact match, range, prefix, contains, regex and other conditions, used for post-filtering during retrieval
Architecture Overview¶
The diagram below shows the overall flow from application to index to storage, and the position of key parameters in the flow.
sequenceDiagram
participant App as "Application"
participant Col as "Collection"
participant Idx as "VectorIndex(HNSW/Flat)"
participant St as "Storage(BoltDB)"
App->>Col : "Upsert(points)"
Col->>St : "EnsureCollection + SaveMetadata"
Col->>St : "UpsertPoints(optional quantization)"
Col->>Idx : "Upsert(points)"
Note over Col,Idx : "In-memory index and storage remain consistent"
App->>Col : "Search(query, filter, topK)"
Col->>Idx : "Search(query, filter, topK)"
Idx-->>Col : "ScoredPoint list"
Col-->>App : "Return results"
Detailed Component Analysis¶
HNSW Parameters and Search Flow¶
- Key parameters
- M: Maximum connections per node, affecting graph density and query/build complexity
- EfConstruction: Dynamic candidate list size during the build phase; larger values yield a higher-quality graph but slow the build
- EfSearch: Dynamic candidate list size during the query phase; larger values raise recall but increase latency
- K: Number of TopK results to return
- Search flow key points
- Before the query, determine fetchK based on whether filtering is needed (post-filter strategy), so that filtering does not discard too many valid results
- First oversample on the HNSW graph, then apply filtering and exact scoring against the in-memory map, and finally take the TopK
flowchart TD
Start(["Start Search"]) --> NeedFilter{"Filter needed?"}
NeedFilter -- No --> FetchK1["fetchK = topK"]
NeedFilter -- Yes --> FetchK2["fetchK = min(topK * 10, total)"]
FetchK1 --> HNSW["HNSW graph search
return neighbors"]
FetchK2 --> HNSW
HNSW --> PostFilter["Filter by Payload"]
PostFilter --> Recalc["Recalculate exact score for hit points"]
Recalc --> TopK["Take TopK and return"]
TopK --> End(["End"])
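The oversampling step above can be sketched as a small Go function; the 10x multiplier and cap follow the flow described here, while the function name itself is illustrative:

```go
package main

import "fmt"

// fetchK decides how many neighbors to pull from the HNSW graph.
// When a filter is active, oversample by 10x (capped at the total
// number of points) so that post-filtering still leaves enough
// candidates to fill topK.
func fetchK(topK, total int, hasFilter bool) int {
	k := topK
	if hasFilter {
		k = topK * 10
	}
	if k > total {
		k = total
	}
	return k
}

func main() {
	fmt.Println(fetchK(10, 1000000, false)) // no filter: fetch exactly topK -> 10
	fmt.Println(fetchK(10, 1000000, true))  // filter active: oversample 10x -> 100
	fmt.Println(fetchK(10, 50, true))       // capped at total point count -> 50
}
```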
Quantizer SQ8 Implementation and Usage¶
- Compression strategy: Each vector independently computes its min/max, maps elements into the [0,255] range, and stores 8 bytes of metadata (min/max) plus 1 byte per dimension
- Usage scenario: When quantization is enabled, compress before writing to storage, decompress on load; can significantly reduce disk and memory usage
- Notes: After decompression, compressed fields are removed from payload to save memory
classDiagram
class Quantizer {
+Quantize(vector []float32) []byte
+Dequantize(data []byte) []float32
+GetCompressedSize(dim int) int
}
class SQ8Quantizer {
+Quantize(vector []float32) []byte
+Dequantize(data []byte) []float32
+GetCompressedSize(dim int) int
}
Quantizer <|.. SQ8Quantizer
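A minimal Go sketch of the SQ8 scheme described above (per-vector min/max stored as two float32 values, then one byte per dimension); the real implementation lives in the repository, so the exact signatures here are illustrative:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// Quantize maps each element of v into [0,255] using the vector's own
// min/max, storing min and max as two float32 values (8 bytes) followed
// by one byte per dimension.
func Quantize(v []float32) []byte {
	lo, hi := v[0], v[0]
	for _, x := range v {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	out := make([]byte, 8+len(v))
	binary.LittleEndian.PutUint32(out[0:4], math.Float32bits(lo))
	binary.LittleEndian.PutUint32(out[4:8], math.Float32bits(hi))
	scale := hi - lo
	if scale == 0 {
		scale = 1 // constant vector: every element maps to 0
	}
	for i, x := range v {
		q := (x - lo) / scale * 255
		if q > 255 { // guard against float rounding past the byte range
			q = 255
		}
		out[8+i] = byte(q)
	}
	return out
}

// Dequantize reverses the mapping; values are recovered with some
// precision loss except at the stored min/max endpoints.
func Dequantize(data []byte) []float32 {
	lo := math.Float32frombits(binary.LittleEndian.Uint32(data[0:4]))
	hi := math.Float32frombits(binary.LittleEndian.Uint32(data[4:8]))
	v := make([]float32, len(data)-8)
	for i, b := range data[8:] {
		v[i] = lo + float32(b)/255*(hi-lo)
	}
	return v
}

func main() {
	v := []float32{0, 0.5, 1, -1}
	q := Quantize(v)
	fmt.Println(len(q)) // 8 bytes metadata + 4 dims * 1 byte = 12
	fmt.Println(Dequantize(q))
}
```

This is where the compression ratio quoted later comes from: a 128-dimensional float32 vector shrinks from 512 bytes to 8 + 128 = 136 bytes.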
Storage and Serialization¶
- Storage engine: bbolt (BoltDB) as local KV storage, collections isolated by bucket namespace
- Serialization: Uses Protobuf for point structure serialization, supports multiple numeric and boolean types
- Quantization integration: When enabled, writes compressed bytes to special key in payload, automatically decompresses and cleans the key on read
sequenceDiagram
participant Col as "Collection"
participant St as "Storage"
participant BB as "bbolt"
participant PB as "Protobuf"
Col->>St : "EnsureCollection(name)"
Col->>St : "SaveCollectionMeta(name, meta)"
Col->>St : "UpsertPoints(name, points)"
St->>PB : "Marshal(PointStruct)"
St->>BB : "Put(key=id, value=bytes)"
Note over St,BB : "Optional: write __quantized_vector"
Distance Metrics and Filters¶
- Distance metrics: Supports three metrics, Cosine, Euclid, and Dot, each suited to different scenarios
- Filters: Supports Must/MustNot condition combinations, covering exact, range, prefix, contains, regex, etc.
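The three metrics can be sketched as plain Go functions; the repository's actual signatures may differ, so treat these as reference implementations of the math only:

```go
package main

import (
	"fmt"
	"math"
)

// Dot is the raw inner product; larger means more similar.
func Dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// Cosine normalizes the dot product by both vector lengths,
// making it insensitive to vector magnitude.
func Cosine(a, b []float32) float32 {
	var na, nb float32
	for i := range a {
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return Dot(a, b) / float32(math.Sqrt(float64(na))*math.Sqrt(float64(nb)))
}

// Euclid is the L2 distance; smaller means more similar.
func Euclid(a, b []float32) float32 {
	var s float32
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return float32(math.Sqrt(float64(s)))
}

func main() {
	a := []float32{1, 0}
	b := []float32{0, 1}
	// Orthogonal unit vectors: dot and cosine are 0, L2 distance is sqrt(2).
	fmt.Println(Dot(a, b), Cosine(a, b), Euclid(a, b))
}
```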
Dependency Analysis¶
- External libraries
- coder/hnsw: HNSW graph library, provides efficient approximate nearest neighbor search
- go.etcd.io/bbolt: Embedded key-value database, provides local persistence capability
- google.golang.org/protobuf: High-performance serialization framework
- Internal coupling
- Collection uniformly orchestrates index and storage
- HNSWIndex encapsulates coder/hnsw and exposes unified interface
- Storage is tightly coupled with Protobuf and bbolt, responsible for serialization and persistence
graph LR
Col["core/collection.go"] --> HNSW["core/hnsw_index.go"]
Col --> Store["core/storage.go"]
HNSW --> HnswLib["github.com/coder/hnsw"]
Store --> BB["go.etcd.io/bbolt"]
Store --> PB["google.golang.org/protobuf"]
Performance Considerations and Optimization Strategies¶
HNSW Parameter Tuning Strategy¶
- M (maximum connections per node)
- Impact: Larger M results in denser graph, improved recall but increased memory and build time
- Suggestion: Default 16; can reduce to 8-12 for small scale or memory-constrained scenarios; can try 24-32 for large-scale high-recall scenarios
- efConstruction (build phase candidate list)
- Impact: Larger values produce a more refined build and higher index quality, but build time increases significantly
- Suggestion: Default 200; can reduce to 100-150 for build-latency-sensitive scenarios; can increase to 300-500 for recall-sensitive scenarios
- efSearch (query phase candidate list)
- Impact: Larger values raise recall but increase average latency; cost also scales with TopK
- Suggestion: Default 64; can reduce to 32-64 for low-latency scenarios; can increase to 128-256 for high-recall scenarios
- TopK (return count)
- Impact: Directly affects post-filter fetchK scale and final sorting cost
- Suggestion: Set according to business needs; if filtering is heavy, appropriately increase fetchK to ensure recall
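The defaults and tuning ranges above can be collected into a small config struct; the field names mirror the parameters discussed but are illustrative, not GoVector's actual API:

```go
package main

import "fmt"

// HNSWParams collects the tunable index parameters discussed above.
type HNSWParams struct {
	M              int // max connections per node; 8-12 small/memory-constrained, 24-32 large-scale high recall
	EfConstruction int // build-time candidate list; 100-150 for fast builds, 300-500 for recall-sensitive builds
	EfSearch       int // query-time candidate list; 32-64 for low latency, 128-256 for high recall
}

// DefaultHNSWParams returns the defaults suggested in this guide.
func DefaultHNSWParams() HNSWParams {
	return HNSWParams{M: 16, EfConstruction: 200, EfSearch: 64}
}

func main() {
	p := DefaultHNSWParams()
	// High-recall profile: raise EfSearch first, since unlike M and
	// EfConstruction it takes effect without rebuilding the index.
	p.EfSearch = 128
	fmt.Printf("%+v\n", p)
}
```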
Memory Usage Optimization¶
- Vector quantization (SQ8)
- Compression ratio: 1 byte per dimension + 8 bytes metadata; typical 128-dimensional vector reduces from 512 bytes to about 136 bytes
- Usage suggestion: Enabling quantization can significantly reduce memory and disk usage; automatically decompresses and cleans compressed keys on load
- Caching strategy
- Index layer: The HNSW graph itself is an in-memory structure; balance recall and latency through a reasonable efSearch
- Application layer: Cache hot query results in an LRU (can be implemented above the library)
- Garbage collection tuning
- The benchmark tool demonstrates actively triggering GC after large-scale runs, which helps stabilize subsequent measurements
- Suggestion: Manually trigger GC after batch imports, before query peaks, or during idle periods, and verify that memory is actually released
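A sketch of the manual-GC practice using only the standard library; the `afterBulkImport` helper is illustrative, not a GoVector API:

```go
package main

import (
	"fmt"
	"runtime"
)

// afterBulkImport forces a collection at a controlled point (e.g. right
// after a batch import) and reports how much heap remains live, so the
// memory retreat can be observed.
func afterBulkImport() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

func main() {
	// Simulate a bulk import that allocates ~64 MB of 256-dim vectors.
	vectors := make([][]float32, 0, 1<<16)
	for i := 0; i < 1<<16; i++ {
		vectors = append(vectors, make([]float32, 256))
	}
	vectors = nil // drop the reference so GC can reclaim the import buffer
	_ = vectors
	fmt.Println("live heap after GC (bytes):", afterBulkImport())
}
```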
Disk I/O Optimization¶
- Batch writing
- The benchmark tool performs batched Upserts with a fixed batch size, reducing transaction overhead and I/O frequency
- Suggestion: Choose a batch size suited to available memory (e.g., 10,000 points) and avoid loading too much data at once
- Index rebuild strategy
- When parameters or the data layout change, clear the old index and bulk import rather than applying frequent incremental updates
- Storage format selection
- Protobuf serialization is compact and fast to parse; combined with quantization it compresses further
- Data lifecycle
- Regularly clean invalid/expired data to release storage space and index nodes
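The batch-writing pattern above is independent of the storage backend; a generic sketch, where the `upsert` callback stands in for the storage layer's batched write:

```go
package main

import "fmt"

// upsertInBatches feeds points to the storage layer in fixed-size
// chunks so that each transaction stays small, cutting transaction
// overhead and peak memory during import.
func upsertInBatches[T any](points []T, batchSize int, upsert func(batch []T) error) error {
	for start := 0; start < len(points); start += batchSize {
		end := start + batchSize
		if end > len(points) {
			end = len(points)
		}
		if err := upsert(points[start:end]); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	points := make([]int, 25000) // stand-in for 25,000 vector points
	batches := 0
	_ = upsertInBatches(points, 10000, func(batch []int) error {
		batches++
		return nil
	})
	fmt.Println("batches:", batches) // 10000 + 10000 + 5000 -> 3 batches
}
```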
Benchmark Testing and Performance Analysis¶
- Benchmark tool capabilities
- Supports building and querying at different scales (10K/100K/1M) and dimensions (default 128)