Performance Optimization¶
This guide covers GoVector performance optimization: HNSW index parameter tuning (M, efConstruction, efSearch), memory usage optimization (vector quantization, caching, and garbage collection), disk I/O optimization (batch writing, index rebuild strategy, storage format), and benchmarking and performance analysis methods. It also offers guidance on large-scale data processing and scalability, grounded in the benchmark tools and implementation details in the repository.
Project Structure¶
GoVector uses a layered organization of core library, benchmark tool, and storage engine:
- Core library: collection management, HNSW/Flat indexes, distance metrics, filters, quantizer, storage interface
- Benchmark tool: command-line benchmark suite supporting builds and queries at different scales and dimensions
- Storage engine: local persistence based on bbolt, combined with Protobuf serialization
graph TB
subgraph "Application Layer"
App["Application/Service"]
end
subgraph "Core Library (core)"
Col["Collection
Collection management"]
HNSW["HNSWIndex
Graph index"]
Flat["FlatIndex
Linear index"]
Dist["Distance
Distance metrics"]
Filt["Filter
Filters"]
Q["Quantizer
Quantizer"]
Store["Storage
Persistence"]
end
subgraph "External Dependencies"
BB["bbolt
BoltDB"]
PB["Protobuf"]
HnswLib["coder/hnsw
HNSW graph library"]
end
App --> Col
Col --> HNSW
Col --> Flat
Col --> Store
HNSW --> Dist
HNSW --> Filt
Store --> BB
Store --> PB
HNSW -.-> HnswLib
Core Components¶
- Collection: Manages vector dimension, distance metric, and index type (HNSW/Flat), and keeps the storage layer and the in-memory index consistent
- HNSWIndex: Encapsulates the underlying HNSW graph library; supports the Cosine, Euclid, and Dot distance metrics with configurable M, EfConstruction, EfSearch, and TopK
- Storage: Local persistence based on bbolt, supports Protobuf serialization and optional vector quantization compression
- SQ8Quantizer: Compresses 32-bit floating-point vectors to 8-bit integers, reducing disk and memory usage
- Filter: Supports exact match, range, prefix, contains, regex and other conditions, used for post-filtering during retrieval
Architecture Overview¶
The diagram below shows the overall flow from application to index to storage, and the position of key parameters in the flow.
sequenceDiagram
participant App as "Application"
participant Col as "Collection"
participant Idx as "VectorIndex(HNSW/Flat)"
participant St as "Storage(BoltDB)"
App->>Col : "Upsert(points)"
Col->>St : "EnsureCollection + SaveMetadata"
Col->>St : "UpsertPoints(optional quantization)"
Col->>Idx : "Upsert(points)"
Note over Col,Idx : "In-memory index and storage remain consistent"
App->>Col : "Search(query, filter, topK)"
Col->>Idx : "Search(query, filter, topK)"
Idx-->>Col : "ScoredPoint list"
Col-->>App : "Return results"
Detailed Component Analysis¶
HNSW Parameters and Search Flow¶
- Key parameters
- M: Maximum connections per node, affecting graph density and query/build complexity
- EfConstruction: Dynamic candidate list size during the build phase; larger values yield a higher-quality graph but slow the build
- EfSearch: Dynamic candidate list size during the query phase; larger values raise recall but increase latency
- K: Number of TopK results to return
- Search flow key points
- Before the query, determine fetchK based on whether filtering is needed (post-filter strategy), so that filtering does not discard too many valid results
- First oversample on the HNSW graph, then apply filtering and exact scoring against the in-memory map, and finally take the TopK
flowchart TD
Start(["Start Search"]) --> NeedFilter{"Filter needed?"}
NeedFilter -- No --> FetchK1["fetchK = topK"]
NeedFilter -- Yes --> FetchK2["fetchK = min(topK * 10, total)"]
FetchK1 --> HNSW["HNSW graph search
return neighbors"]
FetchK2 --> HNSW
HNSW --> PostFilter["Filter by Payload"]
PostFilter --> Recalc["Recalculate exact score for hit points"]
Recalc --> TopK["Take TopK and return"]
TopK --> End(["End"])
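The oversampling step above can be sketched as a small Go function; the 10x multiplier and cap follow the flow described here, while the function name itself is illustrative:

```go
package main

import "fmt"

// fetchK decides how many neighbors to pull from the HNSW graph.
// When a filter is active, oversample by 10x (capped at the total
// number of points) so that post-filtering still leaves enough
// candidates to fill topK.
func fetchK(topK, total int, hasFilter bool) int {
	k := topK
	if hasFilter {
		k = topK * 10
	}
	if k > total {
		k = total
	}
	return k
}

func main() {
	fmt.Println(fetchK(10, 1000000, false)) // no filter: fetch exactly topK -> 10
	fmt.Println(fetchK(10, 1000000, true))  // filter active: oversample 10x -> 100
	fmt.Println(fetchK(10, 50, true))       // capped at total point count -> 50
}
```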
Quantizer SQ8 Implementation and Usage¶
- Compression strategy: Each vector independently computes its min/max, maps elements into the [0,255] range, and stores 8 bytes of metadata (min/max) plus 1 byte per dimension
- Usage scenario: When quantization is enabled, compress before writing to storage, decompress on load; can significantly reduce disk and memory usage
- Notes: After decompression, compressed fields are removed from payload to save memory
classDiagram
class Quantizer {
+Quantize(vector []float32) []byte
+Dequantize(data []byte) []float32
+GetCompressedSize(dim int) int
}
class SQ8Quantizer {
+Quantize(vector []float32) []byte
+Dequantize(data []byte) []float32
+GetCompressedSize(dim int) int
}
Quantizer <|.. SQ8Quantizer
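A minimal Go sketch of the SQ8 scheme described above (per-vector min/max stored as two float32 values, then one byte per dimension); the real implementation lives in the repository, so the exact signatures here are illustrative:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// Quantize maps each element of v into [0,255] using the vector's own
// min/max, storing min and max as two float32 values (8 bytes) followed
// by one byte per dimension.
func Quantize(v []float32) []byte {
	lo, hi := v[0], v[0]
	for _, x := range v {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	out := make([]byte, 8+len(v))
	binary.LittleEndian.PutUint32(out[0:4], math.Float32bits(lo))
	binary.LittleEndian.PutUint32(out[4:8], math.Float32bits(hi))
	scale := hi - lo
	if scale == 0 {
		scale = 1 // constant vector: every element maps to 0
	}
	for i, x := range v {
		q := (x - lo) / scale * 255
		if q > 255 { // guard against float rounding past the byte range
			q = 255
		}
		out[8+i] = byte(q)
	}
	return out
}

// Dequantize reverses the mapping; values are recovered with some
// precision loss except at the stored min/max endpoints.
func Dequantize(data []byte) []float32 {
	lo := math.Float32frombits(binary.LittleEndian.Uint32(data[0:4]))
	hi := math.Float32frombits(binary.LittleEndian.Uint32(data[4:8]))
	v := make([]float32, len(data)-8)
	for i, b := range data[8:] {
		v[i] = lo + float32(b)/255*(hi-lo)
	}
	return v
}

func main() {
	v := []float32{0, 0.5, 1, -1}
	q := Quantize(v)
	fmt.Println(len(q)) // 8 bytes metadata + 4 dims * 1 byte = 12
	fmt.Println(Dequantize(q))
}
```

This is where the compression ratio quoted later comes from: a 128-dimensional float32 vector shrinks from 512 bytes to 8 + 128 = 136 bytes.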
Storage and Serialization¶
- Storage engine: bbolt (BoltDB) as local KV storage, collections isolated by bucket namespace
- Serialization: Uses Protobuf for point structure serialization, supports multiple numeric and boolean types
- Quantization integration: When enabled, writes compressed bytes to special key in payload, automatically decompresses and cleans the key on read
sequenceDiagram
participant Col as "Collection"
participant St as "Storage"
participant BB as "bbolt"
participant PB as "Protobuf"
Col->>St : "EnsureCollection(name)"
Col->>St : "SaveCollectionMeta(name, meta)"
Col->>St : "UpsertPoints(name, points)"
St->>PB : "Marshal(PointStruct)"
St->>BB : "Put(key=id, value=bytes)"
Note over St,BB : "Optional: write __quantized_vector"
Distance Metrics and Filters¶
- Distance metrics: Supports three metrics, Cosine, Euclid, and Dot, each suited to different scenarios
- Filters: Supports Must/MustNot condition combinations, covering exact, range, prefix, contains, regex, etc.
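The three metrics can be sketched as plain Go functions; the repository's actual signatures may differ, so treat these as reference implementations of the math only:

```go
package main

import (
	"fmt"
	"math"
)

// Dot is the raw inner product; larger means more similar.
func Dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// Cosine normalizes the dot product by both vector lengths,
// making it insensitive to vector magnitude.
func Cosine(a, b []float32) float32 {
	var na, nb float32
	for i := range a {
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return Dot(a, b) / float32(math.Sqrt(float64(na))*math.Sqrt(float64(nb)))
}

// Euclid is the L2 distance; smaller means more similar.
func Euclid(a, b []float32) float32 {
	var s float32
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return float32(math.Sqrt(float64(s)))
}

func main() {
	a := []float32{1, 0}
	b := []float32{0, 1}
	// Orthogonal unit vectors: dot and cosine are 0, L2 distance is sqrt(2).
	fmt.Println(Dot(a, b), Cosine(a, b), Euclid(a, b))
}
```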
Dependency Analysis¶
- External libraries
- coder/hnsw: HNSW graph library, provides efficient approximate nearest neighbor search
- go.etcd.io/bbolt: Embedded key-value database, provides local persistence capability
- google.golang.org/protobuf: High-performance serialization framework
- Internal coupling
- Collection uniformly orchestrates index and storage
- HNSWIndex encapsulates coder/hnsw and exposes unified interface
- Storage is tightly coupled with Protobuf and bbolt, responsible for serialization and persistence
graph LR
Col["core/collection.go"] --> HNSW["core/hnsw_index.go"]
Col --> Store["core/storage.go"]
HNSW --> HnswLib["github.com/coder/hnsw"]
Store --> BB["go.etcd.io/bbolt"]
Store --> PB["google.golang.org/protobuf"]
Performance Considerations and Optimization Strategies¶
HNSW Parameter Tuning Strategy¶
- M (maximum connections per node)
- Impact: Larger M results in denser graph, improved recall but increased memory and build time
- Suggestion: Default 16; can reduce to 8-12 for small scale or memory-constrained scenarios; can try 24-32 for large-scale high-recall scenarios
- efConstruction (build phase candidate list)
- Impact: Larger values produce a more refined build and higher index quality, but build time increases significantly
- Suggestion: Default 200; can reduce to 100-150 for build-latency-sensitive scenarios; can increase to 300-500 for recall-sensitive scenarios
- efSearch (query phase candidate list)
- Impact: Larger values raise recall but increase average latency; cost also scales with TopK
- Suggestion: Default 64; can reduce to 32-64 for low-latency scenarios; can increase to 128-256 for high-recall scenarios
- TopK (return count)
- Impact: Directly affects post-filter fetchK scale and final sorting cost
- Suggestion: Set according to business needs; if filtering is heavy, appropriately increase fetchK to ensure recall
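The defaults and tuning ranges above can be collected into a small config struct; the field names mirror the parameters discussed but are illustrative, not GoVector's actual API:

```go
package main

import "fmt"

// HNSWParams collects the tunable index parameters discussed above.
type HNSWParams struct {
	M              int // max connections per node; 8-12 small/memory-constrained, 24-32 large-scale high recall
	EfConstruction int // build-time candidate list; 100-150 for fast builds, 300-500 for recall-sensitive builds
	EfSearch       int // query-time candidate list; 32-64 for low latency, 128-256 for high recall
}

// DefaultHNSWParams returns the defaults suggested in this guide.
func DefaultHNSWParams() HNSWParams {
	return HNSWParams{M: 16, EfConstruction: 200, EfSearch: 64}
}

func main() {
	p := DefaultHNSWParams()
	// High-recall profile: raise EfSearch first, since unlike M and
	// EfConstruction it takes effect without rebuilding the index.
	p.EfSearch = 128
	fmt.Printf("%+v\n", p)
}
```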
Memory Usage Optimization¶
- Vector quantization (SQ8)
- Compression ratio: 1 byte per dimension + 8 bytes metadata; typical 128-dimensional vector reduces from 512 bytes to about 136 bytes
- Usage suggestion: Enabling quantization can significantly reduce memory and disk usage; automatically decompresses and cleans compressed keys on load
- Caching strategy
- Index layer: The HNSW graph itself is an in-memory structure; balance recall and latency through a reasonable efSearch
- Application layer: Cache hot query results in an LRU (can be implemented above the library)
- Garbage collection tuning
- The benchmark tool demonstrates actively triggering GC after large-scale runs, which helps stabilize subsequent measurements
- Suggestion: Manually trigger GC after batch imports, before query peaks, or during idle periods, and verify that memory is actually released
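A sketch of the manual-GC practice using only the standard library; the `afterBulkImport` helper is illustrative, not a GoVector API:

```go
package main

import (
	"fmt"
	"runtime"
)

// afterBulkImport forces a collection at a controlled point (e.g. right
// after a batch import) and reports how much heap remains live, so the
// memory retreat can be observed.
func afterBulkImport() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

func main() {
	// Simulate a bulk import that allocates ~64 MB of 256-dim vectors.
	vectors := make([][]float32, 0, 1<<16)
	for i := 0; i < 1<<16; i++ {
		vectors = append(vectors, make([]float32, 256))
	}
	vectors = nil // drop the reference so GC can reclaim the import buffer
	_ = vectors
	fmt.Println("live heap after GC (bytes):", afterBulkImport())
}
```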
Disk I/O Optimization¶
- Batch writing
- The benchmark tool performs batched Upserts with a fixed batch size, reducing transaction overhead and I/O frequency
- Suggestion: Choose a batch size suited to available memory (e.g., 10,000 points) and avoid loading too much data at once
- Index rebuild strategy
- When parameters or the data layout change, clear the old index and bulk import rather than applying frequent incremental updates
- Storage format selection
- Protobuf serialization is compact and fast to parse; combined with quantization it compresses further
- Data lifecycle
- Regularly clean invalid/expired data to release storage space and index nodes
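The batch-writing pattern above is independent of the storage backend; a generic sketch, where the `upsert` callback stands in for the storage layer's batched write:

```go
package main

import "fmt"

// upsertInBatches feeds points to the storage layer in fixed-size
// chunks so that each transaction stays small, cutting transaction
// overhead and peak memory during import.
func upsertInBatches[T any](points []T, batchSize int, upsert func(batch []T) error) error {
	for start := 0; start < len(points); start += batchSize {
		end := start + batchSize
		if end > len(points) {
			end = len(points)
		}
		if err := upsert(points[start:end]); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	points := make([]int, 25000) // stand-in for 25,000 vector points
	batches := 0
	_ = upsertInBatches(points, 10000, func(batch []int) error {
		batches++
		return nil
	})
	fmt.Println("batches:", batches) // 10000 + 10000 + 5000 -> 3 batches
}
```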
Benchmark Testing and Performance Analysis¶
- Benchmark tool capabilities
- Supports building and querying at different scales (10K/100K/1M) and dimensions (default 128)