Production Deployment

This guide describes a complete production deployment of GoVector. It covers hardware sizing, system dependencies, deployment methods (Homebrew, SystemD, Docker, Kubernetes), network and security hardening, performance tuning, and deployment verification with health checks. GoVector is an embedded vector database written in pure Go. It exposes a Qdrant-compatible REST API and supports HNSW approximate nearest neighbor search with persistent storage.

Project Structure

  • The top-level module definition and Go version requirement are in go.mod.
  • The service entry point is cmd/govector/main.go, which parses command-line parameters, initializes storage and collections, and starts the HTTP API server.
  • The API layer is in api/server.go and provides /collections, /points, and other endpoints for create, write, query, and delete operations.
  • Core storage and indexing live in the core package: bbolt as the local persistence engine, the Flat and HNSW index implementations, payload filtering logic, and vector quantization (SQ8).
  • Build and release scripts are in scripts/, including the Homebrew Formula, the SystemD service template, and the cross-platform release packaging scripts.
  • The performance benchmark and demo scripts are cmd/bench/main.go and demo.sh.
graph TB
subgraph "Application Layer"
S["HTTP API Server<br/>api/server.go"]
E["Command-line entry<br/>cmd/govector/main.go"]
end
subgraph "Core Layer"
ST["Storage Engine<br/>core/storage.go"]
F["Flat Index<br/>core/flat_index.go"]
H["HNSW Index<br/>core/hnsw_index.go"]
M["Data model and filtering<br/>core/models.go"]
Q["Vector Quantization(SQ8)<br/>core/quantization.go"]
end
subgraph "External Dependencies"
BB["bbolt(BoltDB)"]
HNSW["coder/hnsw graph library"]
PB["Protocol Buffers"]
end
E --> S
S --> ST
ST --> BB
ST --> PB
S --> F
S --> H
H --> HNSW
S --> M
ST --> Q

Core Components

  • Storage Engine (Storage)
      • Local persistence based on bbolt; point data is serialized with Protocol Buffers.
      • Supports optional vector quantization (SQ8) to reduce disk usage and memory pressure.
      • Provides collection metadata management, batch write, load, delete, and related capabilities.
  • Index Engine
      • FlatIndex: brute-force search, suitable for small datasets or when exact results are required.
      • HNSWIndex: graph-based approximate search; supports Cosine/Euclidean/Dot distance metrics and is suitable for large-scale, high-dimensional vectors.
  • API Server
      • Provides Qdrant-compatible REST interfaces: collection management, point write, search, delete.
      • Automatically loads collection metadata and data from storage on startup; supports graceful shutdown.
  • Data Model and Filtering
      • Payload values can be of multiple types; Filter supports must/must_not, exact match, range, prefix, contains, regex, etc.
  • Build and Release
      • The Makefile provides build, run, cleanup, and benchmark targets.
      • scripts/build_release.sh supports cross-platform compilation and keeps the Homebrew Formula in sync.
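
As an illustration of the Qdrant-compatible REST interface mentioned under API Server, the sketch below builds a search request in Go. It assumes the path layout follows Qdrant's own convention (`POST /collections/{name}/points/search` with a `vector` and `limit` in the body); whether GoVector accepts exactly this shape should be checked against api/server.go. The base URL, port, and collection name are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// searchRequest mirrors the body of Qdrant's points/search call.
// Assumption: GoVector accepts the same field names.
type searchRequest struct {
	Vector      []float32 `json:"vector"`
	Limit       int       `json:"limit"`
	WithPayload bool      `json:"with_payload"`
}

// newSearchRequest builds (but does not send) a search request.
// The path layout is assumed to follow Qdrant's REST conventions.
func newSearchRequest(baseURL, collection string, vec []float32, limit int) (*http.Request, error) {
	body, err := json.Marshal(searchRequest{Vector: vec, Limit: limit, WithPayload: true})
	if err != nil {
		return nil, err
	}
	url := fmt.Sprintf("%s/collections/%s/points/search", baseURL, collection)
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Hypothetical address and collection name.
	req, err := newSearchRequest("http://localhost:8080", "docs", []float32{0.1, 0.2, 0.3}, 5)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it against a running server: resp, err := http.DefaultClient.Do(req)
}
```

Building the request separately from sending it keeps the sketch runnable without a live server.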

Architecture Overview

The diagram below shows how the components interact in a production deployment: clients access the service through the HTTP API, the service reads from and writes to bbolt storage, and the index layer uses Flat or HNSW depending on configuration. SQ8 quantization can optionally be enabled to save space.

graph TB
C["Client/SDK"]
A["API Server<br/>api/server.go"]
P["Persistent Storage<br/>bbolt + Protobuf"]
IDX["Index Engine<br/>Flat/HNSW"]
QZ["Vector Quantization<br/>SQ8"]
C --> A
A --> IDX
A --> P
P --> QZ
QZ --> P
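
To make the space saving from SQ8 concrete, here is a minimal sketch of 8-bit scalar quantization in Go. The source does not describe how core/quantization.go works, so this assumes one common variant: each vector is mapped linearly from its own [min, max] range onto the codes 0..255, shrinking every component from 4 bytes to 1. The type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// quantized holds one vector compressed to 8 bits per dimension, plus the
// value range needed to reconstruct approximate floats.
type quantized struct {
	Min, Max float32
	Codes    []uint8
}

// quantizeSQ8 maps each component linearly from [min, max] onto 0..255.
func quantizeSQ8(v []float32) quantized {
	if len(v) == 0 {
		return quantized{}
	}
	lo, hi := v[0], v[0]
	for _, x := range v[1:] {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	q := quantized{Min: lo, Max: hi, Codes: make([]uint8, len(v))}
	span := hi - lo
	if span == 0 {
		return q // constant vector: every code stays 0
	}
	for i, x := range v {
		q.Codes[i] = uint8(math.Round(float64((x - lo) / span * 255)))
	}
	return q
}

// dequantize reconstructs an approximation of the original vector.
func (q quantized) dequantize() []float32 {
	out := make([]float32, len(q.Codes))
	step := (q.Max - q.Min) / 255
	for i, c := range q.Codes {
		out[i] = q.Min + float32(c)*step
	}
	return out
}

func main() {
	v := []float32{-1, 0, 0.5, 1}
	q := quantizeSQ8(v)
	fmt.Println("codes:", q.Codes)
	fmt.Println("approx:", q.dequantize())
	fmt.Printf("bytes: %d as float32, %d as SQ8 codes\n", len(v)*4, len(q.Codes))
}
```

The reconstruction error per component is at most half a quantization step, which is the accuracy traded for the 4x reduction in vector storage.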

Detailed Component Analysis

API Server (HTTP)

  • Function highlights
      • Parse command-line parameters (port, database path, whether to enable HNSW).
      • Initialize storage and the default collection, then register the collection with the API server.
      • Provide endpoints for collection management, point write, search, and delete.
      • Support graceful shutdown and signal handling.
  • Key flows (startup and request processing)
sequenceDiagram
participant U as "User process"
participant M as "main.go"
participant S as "API Server"
participant ST as "Storage Engine"
participant IDX as "Index Engine"
U->>M : Start process(parameters: port/DB/HNSW)
M->>ST : Initialize storage
M->>IDX : Create/load collection(Flat/HNSW)
M->>S : Register collection and start listening
U->>S : Request /collections /points
S->>IDX : Write/search/delete
S-->>U : Return results(JSON)

Storage and Persistence (bbolt + Protobuf)

  • Capabilities
      • Each collection is isolated in its own bucket; a collection metadata bucket stores configuration and parameters.
      • Point data is serialized with Protobuf; SQ8 quantization is optional.
      • Batch write, load, and delete; list collections and their metadata.
  • Complexity and Performance
      • Writes are batched Put calls within a single transaction, so write throughput is bounded by disk throughput.
      • Loading iterates over the bucket and deserializes each point, so memory usage grows linearly with data volume.
flowchart TD
Start(["Write entry"]) --> CheckClosed{"Storage closed?"}
CheckClosed --> |Yes| Err["Return error"]
CheckClosed --> |No| Tx["Start bbolt transaction"]
Tx --> ForEach["Iterate point array"]
ForEach --> QuantCheck{"Quantization enabled?"}
QuantCheck --> |Yes| Quant["Generate quantized data and write to payload"]
QuantCheck --> |No| Marshal["Serialize point(Protobuf)"]
Quant --> Put["Write to bucket(key=ID)"]
Marshal --> Put
Put --> Next{"More points?"}
Next --> |Yes| ForEach
Next --> |No| Commit["Commit transaction"]
Commit --> Done(["Done"])
Err --> Done
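
The write flow above can be approximated in a few lines of Go. To stay self-contained, this sketch replaces bbolt with an in-memory map and Protobuf with `encoding/json`; both are stand-ins, not what core/storage.go does. The quantization branch is left out, and the `store` and `point` types are hypothetical. What it preserves is the shape of the flow: check for a closed store, stage every point, and commit all of them or none.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"sync"
)

// point is a simplified record; the real model lives in core/models.go.
type point struct {
	ID      string         `json:"id"`
	Vector  []float32      `json:"vector"`
	Payload map[string]any `json:"payload,omitempty"`
}

// store is an in-memory stand-in for one bbolt bucket keyed by point ID.
type store struct {
	mu     sync.Mutex
	closed bool
	data   map[string][]byte
}

var errClosed = errors.New("storage is closed")

// upsertBatch writes all points atomically: either every point is stored
// or, on the first error, none of them is.
func (s *store) upsertBatch(points []point) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.closed {
		return errClosed
	}
	// Stage the writes first; this plays the role of the open transaction.
	staged := make(map[string][]byte, len(points))
	for _, p := range points {
		if p.ID == "" {
			return fmt.Errorf("point with empty ID") // nothing staged is applied
		}
		raw, err := json.Marshal(p) // JSON stands in for Protobuf here
		if err != nil {
			return err
		}
		staged[p.ID] = raw
	}
	// Commit: make every staged write visible.
	for k, v := range staged {
		s.data[k] = v
	}
	return nil
}

func main() {
	s := &store{data: make(map[string][]byte)}
	err := s.upsertBatch([]point{
		{ID: "1", Vector: []float32{0.1, 0.2}},
		{ID: "2", Vector: []float32{0.3, 0.4}, Payload: map[string]any{"tag": "demo"}},
	})
	fmt.Println("stored:", len(s.data), "err:", err)
}
```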

Index Engine (Flat and HNSW)

  • FlatIndex
      • Brute-force comparison with O(N) query complexity; suitable for small to medium datasets or scenarios that require exact search.
  • HNSWIndex
      • Graph-based search with configurable parameters (M, EfConstruction, EfSearch, K).
      • Search uses a post-filter strategy (oversample, then filter) to ensure correct results under filter conditions.
  • Parameter recommendations
      • The default parameters suit general workloads; adjust EfConstruction/EfSearch/K based on throughput and latency targets.
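
The post-filter strategy can be sketched as follows. The source only states "oversampling then filtering"; the oversampling factor of 4 and the retry loop that doubles the candidate count are assumptions made for illustration, and `candidateFunc` is a hypothetical stand-in for the HNSW graph search.

```go
package main

import "fmt"

type hit struct {
	ID    string
	Score float32
}

// candidateFunc returns up to n nearest candidates, best first.
// In a real index this would be the approximate graph search.
type candidateFunc func(n int) []hit

// searchPostFilter fetches more candidates than requested, drops those that
// fail keep, and widens the fetch until topK survive or the index is exhausted.
func searchPostFilter(search candidateFunc, keep func(id string) bool, topK, total int) []hit {
	if topK <= 0 {
		return nil
	}
	const oversample = 4 // hypothetical factor
	n := topK * oversample
	for {
		if n > total {
			n = total
		}
		cands := search(n)
		out := make([]hit, 0, topK)
		for _, c := range cands {
			if keep(c.ID) {
				out = append(out, c)
				if len(out) == topK {
					return out
				}
			}
		}
		if n >= total || len(cands) < n {
			return out // nothing more to fetch
		}
		n *= 2
	}
}

func main() {
	ranked := []hit{{"a", 0.9}, {"b", 0.8}, {"c", 0.7}, {"d", 0.6}, {"e", 0.5}, {"f", 0.4}}
	search := func(n int) []hit {
		if n > len(ranked) {
			n = len(ranked)
		}
		return ranked[:n]
	}
	keep := func(id string) bool { return id == "b" || id == "f" }
	fmt.Println(searchPostFilter(search, keep, 2, len(ranked)))
}
```

The trade-off is that very selective filters force repeated, wider searches, which is why oversampling is a latency concern when tuning EfSearch and K.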
classDiagram
class HNSWParams {
    +int M
    +int EfConstruction
    +int EfSearch
    +int K
}
class HNSWIndex {
    -Graph
    -map[string,PointStruct] points
    -Distance metric
    -HNSWParams params
    +Upsert(points)
    +Search(query, filter, topK)
    +Delete(id)
    +Count()
}
class FlatIndex {
    -map[string,PointStruct] points
    -Distance metric
    +Upsert(points)
    +Search(query, filter, topK)
    +Delete(id)
    +Count()
}
HNSWIndex --> HNSWParams : "uses"
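
For comparison, a brute-force index along the lines of FlatIndex fits in a short Go sketch. It follows the method names in the class diagram (Upsert, Search, Delete, Count) but simplifies the signatures: points are bare vectors, the filter argument is omitted, and only cosine similarity is implemented. None of this is taken from core/flat_index.go.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type scored struct {
	ID    string
	Score float64
}

// flatIndex keeps every vector in a map and scans all of them per query.
type flatIndex struct {
	points map[string][]float32
}

func newFlatIndex() *flatIndex { return &flatIndex{points: make(map[string][]float32)} }

func (f *flatIndex) Upsert(id string, v []float32) { f.points[id] = v }
func (f *flatIndex) Delete(id string)              { delete(f.points, id) }
func (f *flatIndex) Count() int                    { return len(f.points) }

// cosine returns the cosine similarity of a and b, or 0 if the lengths
// differ or either vector has zero norm.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Search scores every stored vector (O(N) comparisons) and returns the
// topK most similar, best first.
func (f *flatIndex) Search(query []float32, topK int) []scored {
	if topK < 0 {
		topK = 0
	}
	out := make([]scored, 0, len(f.points))
	for id, v := range f.points {
		out = append(out, scored{ID: id, Score: cosine(query, v)})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].ID < out[j].ID // deterministic tie-break
	})
	if topK < len(out) {
		out = out[:topK]
	}
	return out
}

func main() {
	idx := newFlatIndex()
	idx.Upsert("x", []float32{1, 0})
	idx.Upsert("y", []float32{0, 1})
	idx.Upsert("z", []float32{1, 1})
	fmt.Println(idx.Search([]float32{1, 0}, 2), idx.Count())
}
```

Sorting all N scores is the simplest approach; a bounded heap would avoid the full sort when topK is much smaller than N.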

Data Model and Filtering

  • Payload supports multi-type values; Filter supports must/must_not condition combinations.
  • Filter types cover exact, range, prefix, contains, regex, meeting common business filtering needs.
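
A payload filter with must/must_not semantics might be evaluated as in the sketch below. The `condition` and `filter` layouts are hypothetical, since the source names the filter types but not their Go or JSON representation in core/models.go.

```go
package main

import (
	"fmt"
	"reflect"
	"regexp"
	"strings"
)

type payload map[string]any

// condition is one predicate over a payload field. The field names are
// hypothetical; unset fields are ignored.
type condition struct {
	Key      string
	Match    any      // exact match when non-nil
	Gte, Lte *float64 // inclusive range bounds
	Prefix   string
	Contains string
	Regex    string
}

type filter struct {
	Must    []condition
	MustNot []condition
}

func toFloat(v any) (float64, bool) {
	switch n := v.(type) {
	case float64:
		return n, true
	case float32:
		return float64(n), true
	case int:
		return float64(n), true
	case int64:
		return float64(n), true
	}
	return 0, false
}

// eval reports whether the payload satisfies every check set on c.
// A missing key never matches.
func (c condition) eval(p payload) bool {
	v, ok := p[c.Key]
	if !ok {
		return false
	}
	if c.Match != nil && !reflect.DeepEqual(v, c.Match) {
		return false
	}
	if c.Gte != nil || c.Lte != nil {
		n, ok := toFloat(v)
		if !ok || (c.Gte != nil && n < *c.Gte) || (c.Lte != nil && n > *c.Lte) {
			return false
		}
	}
	if c.Prefix != "" || c.Contains != "" || c.Regex != "" {
		s, ok := v.(string)
		if !ok || !strings.HasPrefix(s, c.Prefix) || !strings.Contains(s, c.Contains) {
			return false
		}
		if c.Regex != "" {
			if matched, err := regexp.MatchString(c.Regex, s); err != nil || !matched {
				return false
			}
		}
	}
	return true
}

// matches requires every Must condition to hold and no MustNot condition to hold.
func (f filter) matches(p payload) bool {
	for _, c := range f.Must {
		if !c.eval(p) {
			return false
		}
	}
	for _, c := range f.MustNot {
		if c.eval(p) {
			return false
		}
	}
	return true
}

func main() {
	minStars := 10.0
	f := filter{
		Must:    []condition{{Key: "lang", Match: "go"}, {Key: "stars", Gte: &minStars}},
		MustNot: []condition{{Key: "name", Prefix: "tmp-"}},
	}
	fmt.Println(f.matches(payload{"lang": "go", "stars": 42.0, "name": "govector"}))
}
```

Exact matching uses `reflect.DeepEqual`, so a payload number decoded from JSON (a `float64`) will not equal an `int` match value; a real implementation would need to normalize numeric types.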

Dependency Analysis

  • Module dependencies
      • go.mod declares bbolt, coder/hnsw, protobuf, and other dependencies.
  • Component coupling
      • The API server depends on storage and the index; storage depends on bbolt and Protobuf; HNSW depends on coder/hnsw.
  • External interfaces
      • The service is exposed externally through the HTTP REST API; internally, components interact through Protobuf and bbolt.
graph LR
A["api/server.go"] --> C["core/models.go"]
A --> S["core/storage.go"]
S --> B["bbolt"]
S --> P["Protobuf"]
A --> F["core/flat_index.go"]
A --> H["core/hnsw_index.go"]
H --> G["coder/hnsw"]

Performance Considerations and Tuning

  • Hardware recommendations (rules of thumb based on experience)
      • Small scale (under one million vectors): 16-32 GB memory, SSD, 4-8 CPU cores.
      • Large scale (10 million vectors and above): 64+ GB memory, NVMe SSD, 16+ CPU cores.
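
A rough way to sanity-check memory figures like these is to compute the raw vector footprint: number of vectors times dimension times bytes per component. The sketch below does that arithmetic in Go for float32 and SQ8 storage. The dimension of 768 is a hypothetical embedding size, and the estimate covers vectors only, not HNSW graph links, payloads, or runtime overhead, so actual usage will be higher.

```go
package main

import "fmt"

// rawVectorBytes returns the bytes needed to hold n vectors of dimension dim
// at bytesPerComponent each (4 for float32, 1 for SQ8 codes).
func rawVectorBytes(n, dim, bytesPerComponent int64) int64 {
	return n * dim * bytesPerComponent
}

// gib converts a byte count to gibibytes.
func gib(b int64) float64 { return float64(b) / (1 << 30) }

func main() {
	const dim = 768 // hypothetical embedding size
	for _, n := range []int64{1_000_000, 10_000_000} {
		f32 := rawVectorBytes(n, dim, 4)
		sq8 := rawVectorBytes(n, dim, 1)
		fmt.Printf("%d vectors: float32 %.2f GiB, SQ8 %.2f GiB\n", n, gib(f32), gib(sq8))
	}
}
```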