Configuration Issues¶
This guide explains how to troubleshoot and fix GoVector configuration issues. It covers:
- Configuration file format errors (e.g., invalid JSON)
- Improper parameter settings (e.g., vector dimensions, distance metrics, HNSW parameters)
- Port conflicts and service startup failures
- Server configuration, collection configuration, index configuration verification methods
- Configuration syntax checking and parameter validity verification
- Log diagnostics and configuration rollback and repair strategies
- Configuration templates and best practices
Project Structure¶
GoVector provides two usage modes: an embedded library and a standalone microservice. The microservice is started from the command-line entry point, loads the storage engine and the default collection, and exposes a Qdrant-compatible HTTP API.
graph TB
subgraph "Command-line Entry"
MAIN["cmd/govector/main.go
Parse command-line arguments and start service"]
end
subgraph "API Layer"
API["api/server.go
HTTP server and route handling"]
end
subgraph "Core Layer"
COL["core/collection.go
Collection management and consistency guarantee"]
IDX["core/index.go
Index interface abstraction"]
HNSW["core/hnsw_index.go
HNSW implementation and parameters"]
STOR["core/storage.go
BoltDB persistence and metadata"]
MODELS["core/models.go
Data model and filters"]
end
MAIN --> API
API --> COL
COL --> IDX
COL --> STOR
IDX --> HNSW
API --> MODELS
Core Components¶
- The command-line entry parses parameters such as the port, the database path, and whether to enable HNSW, then initializes storage, the default collection, and the HTTP service.
- The API layer loads persisted collection metadata, registers collections, and provides REST interfaces for collection and point operations.
- The storage layer is built on BoltDB and provides per-collection buckets, a metadata bucket, point serialization, and quantization support.
- The collection layer handles dimension validation and keeps writes to the in-memory index and to storage consistent.
- The HNSW index layer exposes tunable parameters (M, EfConstruction, EfSearch, K) and adapts to the different distance metrics.
Architecture Overview¶
The diagram below shows the overall flow from command-line to API, then to storage and index, as well as key error points and log locations.
sequenceDiagram
participant CLI as "Command-line entry"
participant API as "API Server"
participant COL as "Collection"
participant STOR as "Storage(BoltDB)"
participant IDX as "Index(HNSW/Flat)"
CLI->>STOR : Initialize storage(open database)
STOR-->>CLI : Success/failure
CLI->>COL : Create/load default collection(dimension/metric/HNSW)
COL->>STOR : Save collection metadata
COL->>IDX : Initialize in-memory index
CLI->>API : Start HTTP service(listen on port)
API->>STOR : Load collection metadata on startup
STOR-->>API : Return collection metadata list
API->>COL : Rebuild collection instance for each metadata
API-->>CLI : Service ready/error
Detailed Component Analysis¶
Server Configuration and Startup Flow¶
- Command-line parameters:
- Port: The code does not explicitly restrict the valid range, so check that the port is free and that the process has permission to bind it.
- Database path: Used as the bbolt file path; make sure the directory exists and is readable and writable.
- Whether to enable HNSW: Determines the index type of the default collection.
- Startup sequence:
- Initialize storage -> Load/create default collection -> Register collection to API -> Start HTTP service -> Listen for signals for graceful shutdown.
flowchart TD
Start(["Startup entry"]) --> Parse["Parse command-line arguments"]
Parse --> InitStore["Initialize storage engine"]
InitStore --> CreateCol["Create/load default collection"]
CreateCol --> RegAPI["Register collection to API"]
RegAPI --> Listen["Start HTTP listen"]
Listen --> Wait["Wait for signals or errors"]
Wait --> Graceful["Graceful shutdown(timeout)"]
Graceful --> End(["End"])
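Because the port and database path are only exercised when the service starts, it can help to check them up front. The following is a minimal pre-flight sketch, not GoVector's own startup code: the flag names `port` and `db`, their default values, and the helper functions `checkPort` and `checkDBDir` are all hypothetical and chosen only for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// checkPort reports an error if the TCP port is out of range or cannot be
// bound (already in use, or privileged without sufficient permission).
func checkPort(port int) error {
	if port < 0 || port > 65535 {
		return fmt.Errorf("port %d is outside the range 0-65535", port)
	}
	ln, err := net.Listen("tcp", fmt.Sprintf(":%d", port))
	if err != nil {
		return fmt.Errorf("port %d unavailable: %w", port, err)
	}
	return ln.Close()
}

// checkDBDir reports an error if the directory that would hold the
// database file does not exist or is not a directory.
func checkDBDir(dbPath string) error {
	dir := filepath.Dir(dbPath)
	info, err := os.Stat(dir)
	if err != nil {
		return fmt.Errorf("database directory %q: %w", dir, err)
	}
	if !info.IsDir() {
		return fmt.Errorf("%q is not a directory", dir)
	}
	return nil
}

func main() {
	// Hypothetical flag names and defaults; check the real CLI for its own.
	port := flag.Int("port", 18080, "HTTP listen port")
	dbPath := flag.String("db", "govector.db", "path to the database file")
	flag.Parse()

	if err := checkDBDir(*dbPath); err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	if err := checkPort(*port); err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Println("pre-flight checks passed")
}
```

A check like this only narrows down the cause of a startup failure. Another process can still take the port between the check and the real bind, so the service's own startup error remains the authoritative signal.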
Collection Configuration and Parameter Validation¶
- Required fields and constraints:
- Name: Unique identifier; creation returns a conflict if the name already exists.
- Vector dimension: Must be positive; dimension consistency is also validated on every query and insert.
- Distance metric: Supports Euclidean, Dot, and Cosine; unknown values return an error.
- HNSW switch: Boolean; an optional parameter object carries M, EfConstruction, EfSearch, and K.
- Metadata persistence:
- Collection metadata is stored in a dedicated bucket; on restart it is loaded automatically and the collection instances are rebuilt.
flowchart TD
Req["Create collection request(JSON)"] --> Decode["Decode JSON"]
Decode --> Exists{"Name already exists?"}
Exists -- Yes --> Err409["Return 409 conflict"]
Exists -- No --> DimCheck{"Vector dimension > 0?"}
DimCheck -- No --> Err400A["Return 400 invalid dimension"]
DimCheck -- Yes --> Metric["Parse distance metric"]
Metric --> MetricOK{"Metric valid?"}
MetricOK -- No --> Err400B["Return 400 invalid metric"]
MetricOK -- Yes --> Params["Parse HNSW parameters(optional)"]
Params --> Create["Create collection(memory+storage)"]
Create --> Register["Register to server"]
Register --> OK["Return 200 success"]
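The order of checks in the flow above can be sketched as a single validation function. This is an illustrative reconstruction rather than the server's actual handler: the `createRequest` type, its field names, and `validateCreate` are hypothetical, and the case-insensitive metric match follows the behavior described in the troubleshooting section.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// createRequest holds the fields discussed above.
// The type and its field names are hypothetical.
type createRequest struct {
	Name       string
	VectorSize int
	Distance   string
}

// validateCreate returns the HTTP status the flow above would produce,
// together with an error describing the first failed check.
func validateCreate(req createRequest, existing map[string]bool) (int, error) {
	if existing[req.Name] {
		return http.StatusConflict, fmt.Errorf("collection %q already exists", req.Name)
	}
	if req.VectorSize <= 0 {
		return http.StatusBadRequest, fmt.Errorf("vector dimension must be positive, got %d", req.VectorSize)
	}
	switch strings.ToLower(req.Distance) {
	case "euclidean", "dot", "cosine":
		// Supported metric.
	default:
		return http.StatusBadRequest, fmt.Errorf("unknown distance metric %q", req.Distance)
	}
	return http.StatusOK, nil
}

func main() {
	existing := map[string]bool{"docs": true}
	requests := []createRequest{
		{Name: "docs", VectorSize: 128, Distance: "cosine"},
		{Name: "images", VectorSize: 0, Distance: "cosine"},
		{Name: "images", VectorSize: 512, Distance: "manhattan"},
		{Name: "images", VectorSize: 512, Distance: "Cosine"},
	}
	for _, req := range requests {
		status, err := validateCreate(req, existing)
		fmt.Println(req.Name, status, err)
	}
}
```

Running the same checks client-side before sending a request makes it easier to tell a 409 (name conflict) from the two distinct 400 cases.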
Index Configuration and Parameters¶
- HNSW parameters:
- M: Maximum connections per node, default 16.
- EfConstruction: Candidate list size during build phase, default 200.
- EfSearch: Candidate list size during search phase, default 64.
- K: Number of neighbors to return, default 10.
- Distance metrics:
- Euclidean, Dot, and Cosine are supported; the Dot score is negated because the underlying library assumes that smaller distances mean closer neighbors.
- Parameter application path:
- Parameters are passed through the collection creation interface; if they are omitted, the defaults above apply.
classDiagram
class HNSWParams {
+int M
+int EfConstruction
+int EfSearch
+int K
}
class HNSWIndex {
+Upsert(points)
+Search(query, filter, topK)
+Delete(id)
+Count()
+GetIDsByFilter(filter)
+DeleteByFilter(filter)
}
HNSWIndex --> HNSWParams : "uses"
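As a rough illustration of how the defaults listed above could be applied, the sketch below fills in any unset parameter. The struct mirrors the `HNSWParams` fields from the diagram, but `withDefaults` and the rule "zero or negative means use the default" are assumptions made for this example, not confirmed GoVector behavior.

```go
package main

import "fmt"

// HNSWParams mirrors the fields shown in the class diagram.
type HNSWParams struct {
	M              int
	EfConstruction int
	EfSearch       int
	K              int
}

// withDefaults replaces non-positive fields with the documented defaults
// (M=16, EfConstruction=200, EfSearch=64, K=10). Treating non-positive
// values as "unset" is an assumption of this sketch.
func withDefaults(p HNSWParams) HNSWParams {
	if p.M <= 0 {
		p.M = 16
	}
	if p.EfConstruction <= 0 {
		p.EfConstruction = 200
	}
	if p.EfSearch <= 0 {
		p.EfSearch = 64
	}
	if p.K <= 0 {
		p.K = 10
	}
	return p
}

func main() {
	// Only EfSearch is overridden; the other fields fall back to defaults.
	p := withDefaults(HNSWParams{EfSearch: 128})
	fmt.Printf("%+v\n", p)
}
```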
Data Model and Filters¶
- A point contains an ID, a version, a vector, and an optional payload.
- Filters support exact match, range, prefix, contains, regex, and other condition types.
- Matching runs on the server side. A regex that fails to compile is treated as a non-match rather than reported as an error.
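The regex behavior is easy to overlook, because a mistyped pattern silently returns no results. A minimal sketch of that behavior, using Go's standard `regexp` package, might look like the following; `matchRegex` is a hypothetical helper, not the function GoVector uses.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchRegex reports whether value matches pattern. A pattern that does
// not compile yields false instead of an error, mirroring the
// "compilation failure means non-match" behavior described above.
func matchRegex(pattern, value string) bool {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false
	}
	return re.MatchString(value)
}

func main() {
	fmt.Println(matchRegex("^inv-[0-9]+$", "inv-42")) // valid pattern
	fmt.Println(matchRegex("inv-(", "inv-42"))        // unbalanced parenthesis
}
```

When a regex filter unexpectedly matches nothing, compiling the pattern locally with `regexp.Compile` is a quick way to rule out a syntax error.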
Dependency Analysis¶
- External dependencies:
- bbolt: Local key-value storage.
- hnsw: HNSW graph search library.
- protobuf: Point structure serialization.
- Internal modules:
- api: HTTP service and routing.
- core: Collection, index, storage, model.
graph LR
GOVEC["github.com/DotNetAge/govector"] --> BBOLT["go.etcd.io/bbolt"]
GOVEC --> HNSWLIB["github.com/coder/hnsw"]
GOVEC --> PROTO["google.golang.org/protobuf"]
Performance Considerations¶
- The HNSW defaults suit most scenarios; for large, latency-sensitive datasets, tune EfConstruction, EfSearch, and K.
- The Dot score is negated inside HNSW so that it fits the underlying library's assumption that smaller distances are better.
- The storage layer supports optional quantization, which reduces disk usage but may cost some accuracy.
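To make the negation concrete: a larger dot product means a closer match, while a distance-minimizing index prefers smaller values, so flipping the sign preserves the ranking. The sketch below shows the idea in isolation; `dot` and `dotDistance` are hypothetical names, and the actual adapter inside GoVector may be written differently.

```go
package main

import "fmt"

// dot returns the inner product of two equal-length vectors.
func dot(a, b []float32) float32 {
	var sum float32
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// dotDistance turns the similarity into a "distance" by negating it,
// so the most similar vector has the smallest value.
func dotDistance(a, b []float32) float32 {
	return -dot(a, b)
}

func main() {
	query := []float32{1, 2}
	near := []float32{3, 4}
	far := []float32{0, 1}
	// The more similar vector yields the smaller (more negative) distance.
	fmt.Println(dotDistance(query, near) < dotDistance(query, far))
}
```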
Troubleshooting Guide¶
1. Configuration File Format Errors¶
- Symptoms
- The API returns 400 when creating a collection or writing points, with a message indicating that the JSON is invalid.
- Cause identification
- The request body is not valid JSON; the API layer returns 400 as soon as decoding fails.
- Troubleshooting steps
- Send a minimal valid request body with curl or Postman, then add fields one at a time until the failure reappears.
- Compare the request against the JSON field definitions and examples for the collection creation interface.
- Fix suggestions
- Ensure the JSON structure is complete, key names are correct, and values have the expected types (for example, integer parameters must be JSON numbers, not strings).
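The type mismatch mentioned above is a common cause of a 400 that looks like valid JSON at first glance. The sketch below reproduces it with Go's standard `encoding/json` package. The `createRequest` struct and the `name` key are assumptions for illustration; only `vector_size` is taken from this guide.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// createRequest is a hypothetical request shape used only for this example.
type createRequest struct {
	Name       string `json:"name"`
	VectorSize int    `json:"vector_size"`
}

// decode parses a request body and returns the decoding error, if any.
func decode(body string) (createRequest, error) {
	var req createRequest
	err := json.Unmarshal([]byte(body), &req)
	return req, err
}

func main() {
	bodies := []string{
		`{"name": "docs", "vector_size": 128}`,   // valid
		`{"name": "docs", "vector_size": "128"}`, // number sent as a string
		`{"name": "docs", "vector_size": 128,}`,  // trailing comma
	}
	for _, b := range bodies {
		_, err := decode(b)
		fmt.Printf("%s -> err: %v\n", b, err)
	}
}
```

The second body is syntactically valid JSON but fails because a string cannot be decoded into an integer field; the third fails at the syntax level.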
2. Improper Parameter Settings¶
- Invalid vector dimension
- Symptoms: Creating a collection returns 400 with a message that the dimension must be positive, or writing points fails with a dimension mismatch.
- Troubleshooting: Verify that vector_size in the request matches the actual vector length.
- Fix: Use one dimension throughout, and make sure every point's vector length matches the collection.
- Invalid distance metric
- Symptoms: Creating a collection returns 400 with a message that the metric is invalid.
- Troubleshooting: Only euclidean, dot, and cosine are accepted (case-insensitive).
- Fix: Use supported metric names.
- HNSW parameter out of range or type error