Monitoring and Alerting

This guide provides operations and platform engineering teams with actionable configuration recommendations for monitoring and alerting on GoVector. It covers:

  • Collection and observation methods for key performance indicators (query latency, memory usage, disk I/O, network throughput)
  • Prometheus monitoring integration and Grafana dashboard template recommendations
  • Alert rule design (thresholds and levels)
  • Log collection and analysis (access logs, error logs, performance logs)
  • Fault detection and automatic recovery mechanism configuration
  • Monitoring data visualization and report generation

Since the current repository does not include built-in Prometheus/Grafana configuration or logging-framework integration, this guide presents general best practices and incremental integration paths.

Project Structure

GoVector uses a layered organization of "command-line entry + HTTP API server + core engine":

  • Command-line entry: Responsible for parameter parsing, storage initialization, collection loading and HTTP service startup/graceful shutdown
  • API layer: Provides Qdrant-compatible REST interfaces, encapsulates collection management and vector operations
  • Storage layer: Local persistence based on bbolt, supports metadata and point read/write
  • Model and filtering: Defines point structure, scored point, filter and multiple matching types
  • Benchmark tool: Provides memory statistics and measurement of query throughput/latency
```mermaid
graph TB
    subgraph "Process"
        CLI["Command-line entry<br/>cmd/govector/main.go"]
        API["HTTP API Server<br/>api/server.go"]
        CORE["Core engine and model<br/>core/*"]
    end
    CLI --> API
    API --> CORE
```

Core Components

  • Command-line entry and lifecycle
      • Parses parameters such as port, database path, and index type
      • Initializes the storage engine and default collection
      • Starts the HTTP service and registers signal handling for graceful shutdown
  • HTTP API server
      • Provides collection-management and point-operation endpoints
      • Maintains the collection map internally with concurrency safety
  • Storage engine
      • Bucket-based persistence on bbolt; supports collection metadata and point read/write
      • Supports optional vector quantization compression
  • Model and filtering
      • Defines the point structure, scored point, filter, and multiple match types
  • Benchmark tool
      • Outputs memory statistics, index build time, average query latency, and QPS

Architecture Overview

The diagram below shows the call chain from client to storage and key observation points:

```mermaid
sequenceDiagram
    participant Client as "Client"
    participant HTTP as "HTTP API Server"
    participant Coll as "Collection"
    participant Store as "Storage (bbolt)"
    Client->>HTTP: "PUT /collections/{name}/points"
    HTTP->>Coll: "Upsert(batch points)"
    Coll->>Store: "UpsertPoints (serialize + persist)"
    Store-->>Coll: "Write result"
    Coll-->>HTTP: "Status code/results"
    HTTP-->>Client: "Response"
    Client->>HTTP: "POST /collections/{name}/points/search"
    HTTP->>Coll: "Search(query)"
    Coll->>Store: "Read/deserialize as needed"
    Store-->>Coll: "Point set"
    Coll-->>HTTP: "TopK results"
    HTTP-->>Client: "Response"
```

Detailed Component Analysis

Component 1: HTTP API Server (Monitoring Focus)

  • Observation dimensions
      • Total request volume, success rate, and error-code distribution
      • Per-route latency (broken down by endpoint)
      • Concurrent connections and queue wait time
  • Recommended instrumentation locations
      • Record timestamps and status codes at request entry and return
      • Keep separate statistics for collection-management and point-operation endpoints
  • Observability output
      • Export metrics in the Prometheus text format
      • Combine with existing logs to form access logs
```mermaid
flowchart TD
    Start(["Request entry"]) --> Pre["Record start time/tags"]
    Pre --> Route{"Route match"}
    Route -->|Upsert| Upsert["Handle Upsert"]
    Route -->|Search| Search["Handle Search"]
    Route -->|Delete| Delete["Handle Delete"]
    Route -->|Management| Manage["Handle collection management"]
    Upsert --> Post["Record end time/status code"]
    Search --> Post
    Delete --> Post
    Manage --> Post
    Post --> Export["Export metrics/write logs"]
    Export --> End(["Done"])
```

Component 2: Storage Engine (Disk I/O and Memory)

  • Observation dimensions
      • Upsert/Load/Delete latency and throughput
      • bbolt transaction commit count and failure rate
      • Serialization/deserialization overhead (Protobuf)
      • CPU and memory changes when vector quantization is enabled
  • Recommended instrumentation locations
      • Instrument before and after UpsertPoints/LoadCollection/DeletePoints
      • Record batch size, point count, and byte length
  • Observability output
      • Metrics: per-batch write latency, read latency, failure count
      • Logs: exception stack traces and context information
```mermaid
flowchart TD
    UStart["UpsertPoints start"] --> Serialize["Serialize points (Protobuf)"]
    Serialize --> Tx["bbolt transaction write"]
    Tx --> UEnd["UpsertPoints end"]
    LStart["LoadCollection start"] --> Tx2["bbolt transaction read"]
    Tx2 --> Deserialize["Deserialize points (Protobuf)"]
    Deserialize --> LEnd["LoadCollection end"]
    DStart["DeletePoints start"] --> Tx3["bbolt transaction delete"]
    Tx3 --> DEnd["DeletePoints end"]
```

Component 3: Collection and Filtering (Query Latency)

  • Observation dimensions
      • Average Search latency, plus P95/P99 latency
      • Impact of filter-condition complexity on latency
      • Comparison of different distance metrics and index types (HNSW/Flat)
  • Recommended instrumentation locations
      • Instrument at Search entry and return
      • Distinguish whether a filter is used and the TopK size
  • Observability output
      • Latency histograms and summaries
      • Error-classification statistics
```mermaid
flowchart TD
    SStart["Search entry"] --> Filter["Apply filter conditions"]
    Filter --> Index["Index search (HNSW/Flat)"]
    Index --> Sort["Sort/take TopK"]
    Sort --> SEpilogue["Record latency/results"]
    SEpilogue --> SEnd["Return"]
```

Component 4: Benchmark Tool (Performance Baseline)

  • Observation dimensions
      • Index build time and average time per point
      • Average query latency and QPS
      • Runtime memory allocation and GC count
  • Recommended instrumentation locations
      • Print key metrics in the benchmark script
  • Observability output
      • Structured logs (JSON) for easy downstream analysis
      • Comparisons across scales, dimensions, and index types
```mermaid
flowchart TD
    BStart["Start benchmark"] --> Build["Batch Upsert"]
    Build --> Warm["Warm-up (optional)"]
    Warm --> Queries["Random queries, N times"]
    Queries --> Metrics["Calculate latency/QPS"]
    Metrics --> BEnd["Output report"]
```

Dependency Analysis

  • External dependencies
      • bbolt: local key-value storage
      • Protobuf: point-structure serialization
      • HNSW: approximate nearest-neighbor index
  • Internal coupling
      • The API server depends on collections and storage
      • Storage depends on Protobuf and bbolt
      • The benchmark tool is independent of the runtime and only reuses core models
```mermaid
graph LR
    API["api/server.go"] --> CORE["core/*"]
    CORE --> BBOLT["bbolt"]
    CORE --> PROTO["Protobuf"]
    CORE --> HNSW["HNSW index"]
    BENCH["cmd/bench/main.go"] --> CORE
```

Performance Considerations

  • Query latency
      • HNSW significantly reduces query latency; Flat suits small-scale collections or scenarios with relaxed latency requirements
      • More complex filter conditions increase query cost
  • Memory usage
      • The benchmark tool reports memory allocation and GC counts at different scales, which is useful for capacity planning
  • Disk I/O
      • Batch Upsert/Load/Delete operations each run inside a bbolt transaction; avoid overly small batches, which cause frequent I/O
  • Network throughput
      • The API layer is a single-node HTTP service, so the bottleneck is usually the network stack and CPU; throughput can be improved through concurrency tuning and connection reuse
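
To avoid frequent small transactions, callers can group points into larger batches before calling Upsert. A minimal sketch with an illustrative batch size (`chunk` is a hypothetical helper, not part of GoVector):

```go
package main

import "fmt"

// chunk splits a slice of point IDs into batches of at most batchSize,
// so each bbolt transaction carries one reasonably large write instead
// of many tiny ones.
func chunk(ids []int, batchSize int) [][]int {
	var batches [][]int
	for start := 0; start < len(ids); start += batchSize {
		end := start + batchSize
		if end > len(ids) {
			end = len(ids)
		}
		batches = append(batches, ids[start:end])
	}
	return batches
}

func main() {
	ids := make([]int, 1000)
	for i := range ids {
		ids[i] = i
	}
	batches := chunk(ids, 256) // 256 is an illustrative batch size
	fmt.Println("batches:", len(batches), "last size:", len(batches[len(batches)-1]))
}
```

The right batch size is workload-dependent; the benchmark tool above can be used to compare write latency across candidate sizes.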

Troubleshooting Guide