Monitoring and Alerting

This guide provides operations and platform engineering teams with actionable configuration recommendations for monitoring and alerting on GoVector. It covers:

  • Collection and observation methods for key performance indicators (query latency, memory usage, disk I/O, network throughput)
  • Prometheus monitoring integration and Grafana dashboard template recommendations
  • Alert rule design (thresholds and levels)
  • Log collection and analysis (access logs, error logs, performance logs)
  • Fault detection and automatic recovery mechanism configuration
  • Monitoring data visualization and report generation

Since the current repository does not include built-in Prometheus/Grafana configuration or logging-framework integration, this guide presents general best practices and incremental integration paths.

Project Structure

GoVector uses a layered organization of "command-line entry + HTTP API server + core engine":

  • Command-line entry: Responsible for parameter parsing, storage initialization, collection loading and HTTP service startup/graceful shutdown
  • API layer: Provides Qdrant-compatible REST interfaces, encapsulates collection management and vector operations
  • Storage layer: Local persistence based on bbolt, supports metadata and point read/write
  • Model and filtering: Defines point structure, scored point, filter and multiple matching types
  • Benchmark tool: Provides memory statistics and measurement of query throughput/latency
```mermaid
graph TB
    subgraph "Process"
        CLI["Command-line entry<br/>cmd/govector/main.go"]
        API["HTTP API Server<br/>api/server.go"]
        CORE["Core engine and model<br/>core/*"]
    end
    CLI --> API
    API --> CORE
```

Core Components

  • Command-line entry and lifecycle
      • Parses parameters such as port, database path, and index type
      • Initializes the storage engine and default collection
      • Starts the HTTP service and registers signal handling for graceful shutdown
  • HTTP API server
      • Provides collection-management and point-operation endpoints
      • Maintains the collection map internally with concurrency safety
  • Storage engine
      • Bucket-based persistence on bbolt; supports collection metadata and point read/write
      • Supports optional vector quantization compression
  • Model and filtering
      • Defines the point structure, scored point, filter, and multiple match types
  • Benchmark tool
      • Outputs memory statistics, index build time, average query latency, and QPS

Architecture Overview

The diagram below shows the call chain from client to storage and key observation points:

```mermaid
sequenceDiagram
    participant Client as "Client"
    participant HTTP as "HTTP API Server"
    participant Coll as "Collection"
    participant Store as "Storage (bbolt)"
    Client->>HTTP: "PUT /collections/{name}/points"
    HTTP->>Coll: "Upsert(batch points)"
    Coll->>Store: "UpsertPoints (serialize + persist)"
    Store-->>Coll: "Write result"
    Coll-->>HTTP: "Status code/results"
    HTTP-->>Client: "Response"
    Client->>HTTP: "POST /collections/{name}/points/search"
    HTTP->>Coll: "Search(query)"
    Coll->>Store: "Read/deserialize as needed"
    Store-->>Coll: "Point set"
    Coll-->>HTTP: "TopK results"
    HTTP-->>Client: "Response"
```

Detailed Component Analysis

Component 1: HTTP API Server (Monitoring Focus)

  • Observation dimensions
      • Total request volume, success rate, and error-code distribution
      • Per-route latency (broken down by endpoint)
      • Concurrent connections and queue wait time
  • Recommended instrumentation locations
      • Record timestamps and status codes at request entry and return
      • Keep separate statistics for collection-management and point-operation endpoints
  • Observability output
      • Export metrics in the Prometheus text format
      • Combine with existing logs to form access logs
```mermaid
flowchart TD
    Start(["Request entry"]) --> Pre["Record start time/tags"]
    Pre --> Route{"Route match"}
    Route -->|Upsert| Upsert["Handle Upsert"]
    Route -->|Search| Search["Handle Search"]
    Route -->|Delete| Delete["Handle Delete"]
    Route -->|Management| Manage["Handle collection management"]
    Upsert --> Post["Record end time/status code"]
    Search --> Post
    Delete --> Post
    Manage --> Post
    Post --> Export["Export metrics/write logs"]
    Export --> End(["Done"])
```

Component 2: Storage Engine (Disk I/O and Memory)

  • Observation dimensions
      • Upsert/Load/Delete latency and throughput
      • bbolt transaction commit count and failure rate
      • Serialization/deserialization overhead (Protobuf)
      • CPU and memory changes when vector quantization is enabled
  • Recommended instrumentation locations
      • Instrument before and after UpsertPoints/LoadCollection/DeletePoints
      • Record batch size, point count, and byte length
  • Observability output
      • Metrics: per-batch write latency, read latency, failure count
      • Logs: exception stack traces and context information
```mermaid
flowchart TD
    UStart["UpsertPoints start"] --> Serialize["Serialize points (Protobuf)"]
    Serialize --> Tx["bbolt transaction write"]
    Tx --> UEnd["UpsertPoints end"]
    LStart["LoadCollection start"] --> Tx2["bbolt transaction read"]
    Tx2 --> Deserialize["Deserialize points (Protobuf)"]
    Deserialize --> LEnd["LoadCollection end"]
    DStart["DeletePoints start"] --> Tx3["bbolt transaction delete"]
    Tx3 --> DEnd["DeletePoints end"]
```

Component 3: Collection and Filtering (Query Latency)

  • Observation dimensions
      • Average Search latency, plus P95/P99 latency
      • Impact of filter-condition complexity on latency
      • Comparison of different distance metrics and index types (HNSW/Flat)
  • Recommended instrumentation locations
      • Instrument at Search entry and return
      • Distinguish whether a filter is used and the TopK size
  • Observability output
      • Latency histograms and summaries
      • Error-classification statistics
```mermaid
flowchart TD
    SStart["Search entry"] --> Filter["Apply filter conditions"]
    Filter --> Index["Index search (HNSW/Flat)"]
    Index --> Sort["Sort/take TopK"]
    Sort --> SEpilogue["Record latency/results"]
    SEpilogue --> SEnd["Return"]
```

Component 4: Benchmark Tool (Performance Baseline)

  • Observation dimensions
      • Index build time and average time per point
      • Average query latency and QPS
      • Runtime memory allocation and GC count
  • Recommended instrumentation locations
      • Print key metrics in the benchmark script
  • Observability output
      • Structured logs (JSON) for easy downstream analysis
      • Comparisons across scales, dimensions, and index types
```mermaid
flowchart TD
    BStart["Start benchmark"] --> Build["Batch Upsert"]
    Build --> Warm["Warm-up (optional)"]
    Warm --> Queries["Random queries, N times"]
    Queries --> Metrics["Calculate latency/QPS"]
    Metrics --> BEnd["Output report"]
```

Dependency Analysis

  • External dependencies
      • bbolt: local key-value storage
      • Protobuf: point-structure serialization
      • HNSW: approximate nearest-neighbor index
  • Internal coupling
      • The API server depends on collections and storage
      • Storage depends on Protobuf and bbolt
      • The benchmark tool is independent of the runtime and only reuses core models
```mermaid
graph LR
    API["api/server.go"] --> CORE["core/*"]
    CORE --> BBOLT["bbolt"]
    CORE --> PROTO["Protobuf"]
    CORE --> HNSW["HNSW index"]
    BENCH["cmd/bench/main.go"] --> CORE
```

Performance Considerations

  • Query latency
      • HNSW significantly reduces query latency; Flat suits small-scale collections or scenarios with relaxed latency requirements
      • More complex filter conditions increase query cost
  • Memory usage
      • The benchmark tool reports memory allocation and GC counts at different scales, which is useful for capacity planning
  • Disk I/O
      • Batch Upsert/Load/Delete operations each run inside a bbolt transaction; avoid overly small batches, which cause frequent I/O
  • Network throughput
      • The API layer is a single-node HTTP service, so the bottleneck is usually the network stack and CPU; throughput can be improved through concurrency tuning and connection reuse
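
To avoid frequent small transactions, callers can group points into larger batches before calling Upsert. A minimal sketch with an illustrative batch size (`chunk` is a hypothetical helper, not part of GoVector):

```go
package main

import "fmt"

// chunk splits a slice of point IDs into batches of at most batchSize,
// so each bbolt transaction carries one reasonably large write instead
// of many tiny ones.
func chunk(ids []int, batchSize int) [][]int {
	var batches [][]int
	for start := 0; start < len(ids); start += batchSize {
		end := start + batchSize
		if end > len(ids) {
			end = len(ids)
		}
		batches = append(batches, ids[start:end])
	}
	return batches
}

func main() {
	ids := make([]int, 1000)
	for i := range ids {
		ids[i] = i
	}
	batches := chunk(ids, 256) // 256 is an illustrative batch size
	fmt.Println("batches:", len(batches), "last size:", len(batches[len(batches)-1]))
}
```

The right batch size is workload-dependent; the benchmark tool above can be used to compare write latency across candidate sizes.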

Troubleshooting Guide