Performance Issues¶
This guide covers systematic troubleshooting of GoVector performance issues. It provides diagnosis and resolution paths for problems such as excessive memory usage, disk I/O latency, and slow query response, along with guidance on interpreting performance monitoring metrics, tuning HNSW parameters, optimizing storage configuration and queries, using the benchmark tool to identify bottlenecks, and optimization suggestions for datasets of different scales.
Project Structure¶
GoVector follows an "embedded vector database" design, with a core composed of the following modules:
- Storage layer: persistence based on bbolt (BoltDB); reads and writes collection metadata and point data.
- Index layer: supports a flat index and an HNSW graph index; choose per workload to balance build cost against query latency.
- Model and filtering: a unified data model and filters supporting multiple match types and range conditions.
- Quantization: built-in SQ8 8-bit quantization reduces disk usage and memory pressure.
- Service layer: a Qdrant-compatible REST API for collection management, point writes, search, and deletion.
```mermaid
graph TB
    subgraph "Application Layer"
        API["HTTP API Server<br/>api/server.go"]
        CLI["Command-line entry<br/>cmd/govector/main.go"]
    end
    subgraph "Core Engine"
        COL["Collection<br/>core/collection.go"]
        IDX_HNSW["HNSWIndex<br/>core/hnsw_index.go"]
        IDX_FLAT["FlatIndex<br/>core/flat_index_test.go"]
        STORE["Storage(Bbolt)<br/>core/storage.go"]
        QUANT["SQ8 Quantization<br/>core/quantization.go"]
        MODELS["Model and filtering<br/>core/models.go"]
    end
    API --> COL
    CLI --> COL
    COL --> IDX_HNSW
    COL --> IDX_FLAT
    COL --> STORE
    STORE --> QUANT
    COL --> MODELS
```
Core Components¶
- Collection: encapsulates a collection's Upsert/Search/Delete operations, coordinates the in-memory index with persistent storage, and keeps them consistent.
- HNSWIndex: an approximate nearest neighbor index built on an external graph library; supports Cosine/Euclidean/Dot distance metrics and adjustable parameters.
- Storage: a bbolt wrapper responsible for collection buckets, metadata, point serialization and loading, batch deletion, and so on.
- SQ8 Quantization: compresses floating-point vectors to an 8-bit representation, reducing disk and memory usage.
- Filters and models: a unified Payload structure and multi-type match conditions supporting exact, range, prefix, contains, and regex matching.
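As a concrete illustration of the SQ8 idea, the sketch below quantizes a float32 vector to 8-bit codes with a per-vector min/scale pair. This is a common scalar-quantization formulation written for this guide; the exact scheme in `core/quantization.go` may differ.

```go
package main

import "fmt"

// Quantize compresses a float32 vector into uint8 codes plus the
// min/scale needed for reconstruction (a typical SQ8 formulation).
func Quantize(v []float32) (codes []uint8, min, scale float32) {
	min, max := v[0], v[0]
	for _, x := range v {
		if x < min {
			min = x
		}
		if x > max {
			max = x
		}
	}
	scale = (max - min) / 255
	if scale == 0 {
		scale = 1 // constant vector: avoid division by zero
	}
	codes = make([]uint8, len(v))
	for i, x := range v {
		codes[i] = uint8((x-min)/scale + 0.5) // round to the nearest code
	}
	return codes, min, scale
}

// Dequantize reconstructs an approximate float32 vector from the codes.
func Dequantize(codes []uint8, min, scale float32) []float32 {
	out := make([]float32, len(codes))
	for i, c := range codes {
		out[i] = min + float32(c)*scale
	}
	return out
}

func main() {
	v := []float32{-1.0, 0.0, 0.5, 1.0}
	codes, min, scale := Quantize(v)
	fmt.Println(codes, Dequantize(codes, min, scale))
}
```

Each dimension shrinks from 4 bytes to 1, at the cost of a reconstruction error bounded by roughly half the scale step.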
Architecture Overview¶
The diagram below shows the key interaction paths from an API request through the index and storage layers, including the post-filter strategy applied during queries.
```mermaid
sequenceDiagram
    participant Client as "Client"
    participant API as "API Server<br/>api/server.go"
    participant Col as "Collection<br/>core/collection.go"
    participant Idx as "Index(HNSW/Flat)<br/>hnsw_index.go"
    participant Store as "Storage(Bbolt)<br/>storage.go"
    Client->>API : "POST /collections/{name}/points/search"
    API->>Col : "Search(vector, filter, limit)"
    Col->>Idx : "Search(query, filter, topK)"
    alt Needs filtering
        Idx->>Idx : "Oversample(fetchK=topK*10)"
        Idx-->>Col : "Candidate results(with Payload)"
        Col->>Col : "Apply filter(MatchFilter)"
    else No filter
        Idx-->>Col : "TopK results"
    end
    Col-->>API : "ScoredPoint list"
    API-->>Client : "JSON response"
    Note over Col,Store : "Upsert/delete operations persist to disk first, then update in-memory index"
```
Detailed Component Analysis¶
HNSW Index Parameters and Behavior¶
- Key parameters
  - M: maximum number of connections per node; governs graph density and the accuracy/speed trade-off.
  - EfConstruction: dynamic candidate list size during the build phase; larger values are more accurate but slow down builds.
  - EfSearch: dynamic candidate list size during queries; larger values raise recall but increase latency.
  - K: number of TopK results to return; it determines the post-filter fetch multiplier.
- Query strategy
  - When a filter is present, the index uses an "oversample + post-filter" strategy: it fetches a multiple of the topK quantity, applies the filter in memory, and keeps the first K.
- Concurrency control
  - Read-write locks protect the graph and point map, preventing inconsistency under concurrent reads and writes.
```mermaid
flowchart TD
    Start(["Start query"]) --> NeedFilter{"Filter exists?"}
    NeedFilter --> |Yes| FetchMore["fetchK = topK * 10<br/>upper bound is graph size"]
    NeedFilter --> |No| UseTopK["fetchK = topK"]
    FetchMore --> Search["HNSW Graph.Search(query, fetchK)"]
    UseTopK --> Search
    Search --> PostFilter["Iterate neighbors and apply MatchFilter"]
    PostFilter --> CalcScore["Recalculate score by metric"]
    CalcScore --> Collect["Collect first K results"]
    Collect --> End(["Return results"])
```
Storage and Persistence¶
- Write flow
  - Points are written to the bbolt collection bucket first: the key is the point ID, the value a Protobuf-serialized point. If quantization is enabled, the original vector is replaced with compressed bytes stored in the Payload.
  - On success, the in-memory index is updated; on failure, the points just written are deleted as a rollback.
- Read flow
  - All points are iterated from the collection bucket and deserialized into memory; if quantization is enabled, the compressed bytes are read from the Payload and decompressed back to floating-point vectors.
- Metadata
  - Collection metadata lives in a dedicated bucket and is loaded automatically on restart, rebuilding each Collection and its index.
```mermaid
sequenceDiagram
    participant Col as "Collection"
    participant Store as "Storage"
    participant BB as "bbolt bucket"
    participant Idx as "In-memory index"
    Col->>Store : "UpsertPoints(batch)"
    Store->>BB : "Put(ID, Protobuf)"
    alt Quantization enabled
        Store->>Store : "Quantize(vector) -> Payload['__quantized_vector']"
    end
    Store-->>Col : "Success"
    Col->>Idx : "Upsert(points)"
    Idx-->>Col : "Done"
```
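The "persist first, then update the index, roll back on failure" strategy can be sketched with hypothetical `Store` and `Index` interfaces (the real types live in `core/storage.go` and the index implementations; these names are illustrative):

```go
package main

import (
	"errors"
	"fmt"
)

// Point is a minimal stand-in for GoVector's point model.
type Point struct {
	ID     string
	Vector []float32
}

// Store and Index are hypothetical interfaces for the bbolt-backed
// persistence layer and the in-memory index.
type Store interface {
	Put(points []Point) error
	Delete(ids []string) error
}

type Index interface {
	Upsert(points []Point) error
}

// UpsertPoints mirrors the consistency strategy in the diagram:
// persist first, then update the index; if the index update fails,
// roll back by deleting the points that were just written.
func UpsertPoints(s Store, idx Index, points []Point) error {
	if err := s.Put(points); err != nil {
		return err
	}
	if err := idx.Upsert(points); err != nil {
		ids := make([]string, len(points))
		for i, p := range points {
			ids[i] = p.ID
		}
		if delErr := s.Delete(ids); delErr != nil {
			return errors.Join(err, delErr)
		}
		return err
	}
	return nil
}

// mapStore is an in-memory Store used to demonstrate the rollback.
type mapStore struct{ m map[string]Point }

func (s *mapStore) Put(ps []Point) error {
	for _, p := range ps {
		s.m[p.ID] = p
	}
	return nil
}

func (s *mapStore) Delete(ids []string) error {
	for _, id := range ids {
		delete(s.m, id)
	}
	return nil
}

// failingIndex always rejects upserts, forcing the rollback path.
type failingIndex struct{}

func (failingIndex) Upsert([]Point) error { return errors.New("index update failed") }

func main() {
	s := &mapStore{m: map[string]Point{}}
	err := UpsertPoints(s, failingIndex{}, []Point{{ID: "a", Vector: []float32{1}}})
	fmt.Println("err:", err, "points left in store:", len(s.m))
}
```

Persisting before indexing means a crash between the two steps leaves durable data that can rebuild the index on restart, whereas the reverse order could leave index entries pointing at nothing.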
Query Optimization Strategies¶
- Reduce post-filter overhead
  - In high-filter-rate scenarios, raise EfSearch moderately to improve recall, while keeping memory and CPU in check with a sensible TopK and fetch multiplier.
  - Design the Payload around high-frequency filter fields; avoid complex regexes and large range scans.
- Batch writes
  - Use larger Upsert batches to reduce the bbolt transaction count and serialization overhead.
- Quantization and memory
  - Enable SQ8 quantization when accuracy requirements allow; it significantly reduces memory and disk usage and speeds up cold start and large-scale loading.
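The batch-write advice above amounts to chunking a large write so each transaction covers many points. A minimal sketch, where `upsert` is a stand-in for the collection's batch upsert call:

```go
package main

import "fmt"

// upsertInBatches splits IDs into chunks so each underlying
// transaction covers many points instead of one, reducing commit
// and serialization overhead.
func upsertInBatches(ids []string, batchSize int, upsert func(batch []string) error) error {
	for start := 0; start < len(ids); start += batchSize {
		end := start + batchSize
		if end > len(ids) {
			end = len(ids)
		}
		if err := upsert(ids[start:end]); err != nil {
			return fmt.Errorf("batch %d-%d: %w", start, end, err)
		}
	}
	return nil
}

func main() {
	ids := make([]string, 10)
	for i := range ids {
		ids[i] = fmt.Sprintf("p%d", i)
	}
	calls := 0
	upsertInBatches(ids, 4, func(batch []string) error {
		calls++
		fmt.Println("upsert batch:", batch)
		return nil
	})
	fmt.Println("transactions:", calls) // 3 instead of 10
}
```

Batch sizes in the hundreds to low thousands are a common starting point; very large batches trade throughput for peak memory.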
Dependency Analysis¶
- External dependencies
  - HNSW graph library: builds and queries the HNSW index.
  - bbolt: a lightweight key-value database providing collection buckets and metadata storage.
  - Protobuf: serializes points and metadata.
- Internal coupling
  - Collection depends on the Storage and Index interfaces and exposes a unified Upsert/Search/Delete upward.
  - HNSWIndex maintains the graph and point map and requires concurrency-safe access during Search.
  - Storage and Collection jointly ensure data consistency.
```mermaid
graph LR
    API["api/server.go"] --> COL["core/collection.go"]
    CLI["cmd/govector/main.go"] --> API
    COL --> HNSW["core/hnsw_index.go"]
    COL --> FLAT["core/flat_index_test.go"]
    COL --> STORE["core/storage.go"]
    STORE --> QUANT["core/quantization.go"]
    STORE --> MODELS["core/models.go"]
```
Performance Considerations and Optimization Suggestions¶
Excessive Memory Usage¶
- Symptoms
  - Rising GC count, growing Alloc/Sys, query latency jitter.
- Possible causes
  - Oversampling inflates intermediate result sets; with quantization disabled, full-precision vectors dominate memory; small Upsert batches cause frequent allocation.
- Optimization methods
  - Enable SQ8 quantization (supported on both the storage and loading paths) to reduce memory and disk usage.
  - Adjust HNSW parameters: tune EfSearch and M to balance accuracy against memory.
  - Batch writes: increase the Upsert batch size to reduce temporary objects and copies.
  - Control TopK and the fetch multiplier: keep fetchK as low as recall requirements allow.
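The Alloc/Sys/GC symptoms above can be observed directly with Go's standard `runtime` package (this is plain stdlib usage, not a GoVector API):

```go
package main

import (
	"fmt"
	"runtime"
)

// memSnapshot reports the metrics discussed above: live heap bytes
// (Alloc), bytes obtained from the OS (Sys), and completed GC cycles.
func memSnapshot() (alloc, sys uint64, numGC uint32) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.Alloc, m.Sys, m.NumGC
}

func main() {
	alloc, sys, gcs := memSnapshot()
	fmt.Printf("Alloc=%d Sys=%d NumGC=%d\n", alloc, sys, gcs)

	// Allocate to show the counters move; a real service would sample
	// these periodically and alert on sustained growth.
	buf := make([][]byte, 0, 64)
	for i := 0; i < 64; i++ {
		buf = append(buf, make([]byte, 1<<20)) // 1 MiB each
	}
	runtime.GC()
	alloc2, _, gcs2 := memSnapshot()
	fmt.Printf("after alloc: Alloc=%d NumGC=%d (holding %d MiB)\n", alloc2, gcs2, len(buf))
}
```

Sustained growth of Alloc across GC cycles, rather than a high absolute value, is the signal to look for.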
Disk I/O Latency¶
- Symptoms
  - High Upsert latency, long collection load times, stalling delete operations.
- Possible causes
  - Write amplification: each record is serialized multiple times, and small-batch transactions cause frequent I/O commits.
  - Quantization disabled: vectors are stored as full floating-point arrays, inflating record size.
- Optimization methods
  - Enable SQ8 quantization to shrink per-record size and reduce write amplification.
  - Batch Upsert: follow the batching pattern used by the benchmark tool to reduce transaction count.
  - Choose an appropriate disk: SSD over HDD; make sure filesystem cache and read-ahead settings are sensible.
Slow Query Response¶
- Symptoms
  - Rising average latency, markedly higher P95/P99, declining throughput.
- Possible causes
  - A high filter rate makes post-filtering expensive; EfSearch too low hurts recall; a large index running with unsuitable parameters.
- Optimization methods
  - Increase EfSearch: enlarge the query candidate list within an acceptable latency budget.
  - Set M sensibly: a larger M increases graph density and recall, but also memory and build time.
  - Optimize filters: avoid full-collection regex scans; filter on high-selectivity fields first.
  - Evaluate whether HNSW is needed: at million-point scale HNSW delivers sub-millisecond latency, while smaller datasets may be served well enough by the flat index.
HNSW Parameter Tuning¶
- M
  - Higher: better recall, more memory. Lower: less memory, slightly lower recall.
- EfConstruction
  - Higher: more accurate builds at the cost of longer build times; the default is usually reasonable.
- EfSearch
  - Higher: better recall at higher latency; adjust to your SLA.
- K
  - Drives the fetch multiplier and post-filter cost; keep it consistent with TopK.
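These knobs can be collected in a small config type; the field names and default values below are illustrative starting points, not GoVector's documented defaults:

```go
package main

import "fmt"

// HNSWParams captures the tuning knobs discussed above.
type HNSWParams struct {
	M              int // max connections per node
	EfConstruction int // build-time candidate list size
	EfSearch       int // query-time candidate list size
}

// withDefaults fills unset fields with commonly used starting points
// (assumed values; tune from here against your SLA).
func (p HNSWParams) withDefaults() HNSWParams {
	if p.M <= 0 {
		p.M = 16
	}
	if p.EfConstruction <= 0 {
		p.EfConstruction = 200
	}
	if p.EfSearch <= 0 {
		p.EfSearch = 64
	}
	return p
}

// effectiveEf keeps the query-time candidate list at least as large
// as topK, since an EfSearch below K would truncate results; this is
// the "keep K consistent" advice above.
func effectiveEf(p HNSWParams, topK int) int {
	if p.EfSearch < topK {
		return topK
	}
	return p.EfSearch
}

func main() {
	p := HNSWParams{EfSearch: 32}.withDefaults()
	fmt.Printf("M=%d efConstruction=%d efSearch=%d effective ef for topK=100: %d\n",
		p.M, p.EfConstruction, p.EfSearch, effectiveEf(p, 100))
}
```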
Storage Configuration Optimization¶
- Quantization switch
  - Enable quantization when constructing the Storage, or pass the corresponding parameter when creating a collection via the API.
- Metadata and collection buckets
  - Collections and their metadata are discovered automatically and restored after a restart.