
Development Guide

This guide is for contributors and maintainers. It documents the complete development process for the GoVector project: environment setup, the code contribution workflow, testing strategies, debugging techniques, building and releasing, and CI/CD configuration. GoVector is an embedded vector database implemented in pure Go. It supports HNSW approximate nearest neighbor search, a Qdrant-compatible API, BoltDB persistence, and 8-bit scalar quantization.

Project Structure

The repository is organized by functional module. The core modules are:

  • Core engine and model definitions: core package
  • HTTP API server: api package
  • Command-line entry: cmd/govector
  • Benchmarks: cmd/bench
  • Examples and demos: example, demo.sh
  • Build and release scripts: scripts, Makefile
  • CI/CD: .github/workflows/go.yml
  • Documentation and license: README, LICENSE
graph TB
subgraph "Application Entry"
CLI["Command-line entry<br/>cmd/govector/main.go"]
Bench["Benchmark<br/>cmd/bench/main.go"]
Demo["Demo script<br/>demo.sh"]
end
subgraph "API Layer"
APIServer["HTTP API server<br/>api/server.go"]
end
subgraph "Core Engine"
CoreModels["Models and filters<br/>core/models.go"]
CoreCollection["Collection and index<br/>core/collection_test.go"]
end
subgraph "Persistence and Utilities"
Storage["Storage engine (BoltDB)<br/>core/storage.go"]
BuildScript["Release script<br/>scripts/build_release.sh"]
BrewFormula["Homebrew Formula<br/>scripts/release/govector.rb"]
ServiceUnit["SystemD service template<br/>scripts/release/govector.service"]
end
CLI --> APIServer
Bench --> CoreCollection
Demo --> APIServer
APIServer --> CoreCollection
APIServer --> Storage
CoreCollection --> Storage
BuildScript --> BrewFormula
BuildScript --> ServiceUnit

Core Components

  • Storage Layer (BoltDB): Responsible for persisting collection metadata and vector points, supporting automatic discovery and reload of collections.
  • Collection and Index: Provides vector insert, delete, query interfaces; supports Flat and HNSW index modes.
  • Filters and Payload: Supports exact match, range, prefix, contains, regex and other filter conditions.
  • HTTP API: Provides Qdrant-compatible REST interface, supporting collection management, point write, search, and delete.
  • Benchmarks: Provides large-scale vector construction and query performance evaluation tools.
  • Release and Installation: Supports multi-platform cross-compilation, Homebrew Formula, and SystemD service templates.

Architecture Overview

The diagram shows the overall interaction from the command line to the API, and then to the core engine and storage:

sequenceDiagram
participant Dev as "Developer"
participant CLI as "Command-line entry<br/>cmd/govector/main.go"
participant API as "HTTP API server<br/>api/server.go"
participant Core as "Core engine<br/>core/*"
participant Store as "Storage engine (BoltDB)"
Dev->>CLI : Start service (-port/-db)
CLI->>Store : Initialize local storage
CLI->>Core : Create/load collections
CLI->>API : Register collections and start listening
API->>Core : Handle requests (write/search/delete)
Core->>Store : Read/write collection metadata and vector points
API-->>Dev : Return response (JSON)

Detailed Component Analysis

HTTP API Server

  • Provides endpoints for collection management, point writes, search, and deletion.
  • Loads collections from storage automatically and supports graceful shutdown.
  • Thread-safe: a mutex protects collection registration and the HTTP server instance.
classDiagram
class Server {

    +string addr
    +map~string,*Collection~ collections
    +*Storage store
    +*http.Server httpServer
    +mutex serverMu
    +AddCollection(col)
    +Start() error
    +Stop(ctx) error
    -loadCollections() error

}
class Collection {

    +string Name
    +int VectorLen
    +Distance Metric
    +bool UseHNSW
    +Upsert(points) error
    +Search(vector, filter, limit) []ScoredPoint
    +Delete(ids, filter) int
    +Count() int

}
class Storage {

    +EnsureCollection(name)
    +LoadCollection(name) map[string]PointStruct
    +UpsertPoints(name, points)
    +ListCollectionMetas() []CollectionMeta

}
Server --> Collection : "register/query"
Server --> Storage : "load/persist"
Collection --> Storage : "read/write points"
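
The locking and graceful shutdown described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the actual api/server.go: the field and method names follow the class diagram, but the method bodies, the reduced Collection type, and the Lookup helper are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"sync"
)

// Collection is a stand-in for the real core collection type (assumption).
type Collection struct{ Name string }

// Server mirrors the fields in the class diagram; the bodies are illustrative.
type Server struct {
	addr        string
	serverMu    sync.Mutex // guards collections and httpServer
	collections map[string]*Collection
	httpServer  *http.Server
}

func NewServer(addr string) *Server {
	return &Server{addr: addr, collections: make(map[string]*Collection)}
}

// AddCollection registers a collection while holding the mutex.
func (s *Server) AddCollection(col *Collection) {
	s.serverMu.Lock()
	defer s.serverMu.Unlock()
	s.collections[col.Name] = col
}

// Lookup is a hypothetical helper that reads the map under the same mutex.
func (s *Server) Lookup(name string) (*Collection, bool) {
	s.serverMu.Lock()
	defer s.serverMu.Unlock()
	col, ok := s.collections[name]
	return col, ok
}

// Start publishes the http.Server under the lock, then blocks while serving.
// A graceful Stop makes ListenAndServe return http.ErrServerClosed, which is
// not treated as a failure.
func (s *Server) Start() error {
	srv := &http.Server{Addr: s.addr, Handler: http.NewServeMux()}
	s.serverMu.Lock()
	s.httpServer = srv
	s.serverMu.Unlock()
	if err := srv.ListenAndServe(); !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

// Stop shuts the server down gracefully; it is a no-op if Start never ran.
func (s *Server) Stop(ctx context.Context) error {
	s.serverMu.Lock()
	srv := s.httpServer
	s.serverMu.Unlock()
	if srv == nil {
		return nil
	}
	return srv.Shutdown(ctx)
}

func main() {
	s := NewServer("127.0.0.1:0")
	s.AddCollection(&Collection{Name: "docs"})
	_, ok := s.Lookup("docs")
	fmt.Println("docs registered:", ok)
	fmt.Println("stop error:", s.Stop(context.Background()))
}
```

The mutex is released before ListenAndServe is called, so a concurrent Stop can read the server pointer without waiting for the listener to exit.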

Filters and Payload Matching

  • Supports combinations of Must and MustNot conditions, covering exact, range, prefix, contains, and regex match types.
  • Regex compilation and substring (contains) checks are built into the filter implementation.
flowchart TD
Start(["Start matching"]) --> NilCheck{"Is filter empty?"}
NilCheck --> |Yes| True["Return true"]
NilCheck --> |No| MustAll["Iterate Must conditions"]
MustAll --> MustFail{"Any unsatisfied?"}
MustFail --> |Yes| False["Return false"]
MustFail --> |No| MustNotAny["Iterate MustNot conditions"]
MustNotAny --> MustNotFail{"Any satisfied?"}
MustNotFail --> |Yes| False
MustNotFail --> |No| True
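
The flow above can be sketched in Go as follows. The Condition and Filter types here are hypothetical simplifications (string payloads only, no range conditions) and are not the definitions in core/models.go; only the Must/MustNot control flow is taken from the diagram.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Condition is a simplified, hypothetical filter condition on a string field.
type Condition struct {
	Key   string
	Kind  string // "exact", "prefix", "contains", or "regex"
	Value string
}

// Filter combines conditions that must all hold with conditions that must not.
type Filter struct {
	Must    []Condition
	MustNot []Condition
}

// matches reports whether a single condition holds for the payload.
// A missing key never matches.
func (c Condition) matches(payload map[string]string) bool {
	v, ok := payload[c.Key]
	if !ok {
		return false
	}
	switch c.Kind {
	case "exact":
		return v == c.Value
	case "prefix":
		return strings.HasPrefix(v, c.Value)
	case "contains":
		return strings.Contains(v, c.Value)
	case "regex":
		// Compiled per call for brevity; a real matcher would likely cache it.
		re, err := regexp.Compile(c.Value)
		return err == nil && re.MatchString(v)
	}
	return false
}

// Match follows the flowchart: a nil filter matches everything, every Must
// condition has to hold, and no MustNot condition may hold.
func Match(f *Filter, payload map[string]string) bool {
	if f == nil {
		return true
	}
	for _, c := range f.Must {
		if !c.matches(payload) {
			return false
		}
	}
	for _, c := range f.MustNot {
		if c.matches(payload) {
			return false
		}
	}
	return true
}

func main() {
	f := &Filter{
		Must:    []Condition{{Key: "path", Kind: "prefix", Value: "core/"}},
		MustNot: []Condition{{Key: "path", Kind: "regex", Value: `_test\.go$`}},
	}
	fmt.Println(Match(f, map[string]string{"path": "core/storage.go"}))
	fmt.Println(Match(f, map[string]string{"path": "core/collection_test.go"}))
}
```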

Benchmark Flow

  • Generates random vectors, inserts them in batches, and measures build time and memory usage.
  • Runs random queries to measure average latency and QPS, with explicit garbage collection control.
sequenceDiagram
participant Runner as "Benchmark entry<br/>cmd/bench/main.go"
participant Col as "Collection (memory/persistent)"
participant Engine as "Index (HNSW/Flat)"
Runner->>Col : NewCollection(dim, metric, storage, useHNSW)
Runner->>Col : Batch Upsert (paginated)
Col->>Engine : Build index
Runner->>Col : Multiple random query Search
Col->>Engine : Query TopK
Runner-->>Runner : Calculate build time/query latency/QPS
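
The latency and QPS figures in the last step reduce to simple arithmetic over the elapsed wall-clock time. The sketch below shows one way to compute them; Measure and Stats are hypothetical names, and calling runtime.GC() before timing is an assumption about what "garbage collection control" means, not a description of cmd/bench/main.go.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// Stats derives average latency and queries per second from a query count
// and the total elapsed time. It returns zeros for degenerate input.
func Stats(n int, elapsed time.Duration) (avg time.Duration, qps float64) {
	if n <= 0 || elapsed <= 0 {
		return 0, 0
	}
	return elapsed / time.Duration(n), float64(n) / elapsed.Seconds()
}

// Measure runs query n times and reports the derived statistics.
// Forcing a collection first keeps garbage from earlier phases (for example
// index construction) from being charged to the query loop.
func Measure(n int, query func()) (time.Duration, float64) {
	runtime.GC()
	start := time.Now()
	for i := 0; i < n; i++ {
		query()
	}
	return Stats(n, time.Since(start))
}

func main() {
	// A stand-in workload; a real benchmark would call Collection.Search here.
	data := make([]float32, 1<<12)
	var sink float32
	avg, qps := Measure(1000, func() {
		for _, v := range data {
			sink += v
		}
	})
	fmt.Printf("avg latency: %v, QPS: %.0f (sink=%v)\n", avg, qps, sink)
}
```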

Examples and Demos

  • Embedded example: Operates on collections directly as Go structs, with no network overhead.
  • Demo script: Starts the service and performs writes, a global search, and a filtered search via the HTTP API.

Dependency Analysis

  • Go Version: 1.25.1
  • Main external dependencies: HNSW graph library, BoltDB, Protocol Buffers
  • Indirect dependencies: math libraries, renameio, vek, exp, etc.
graph LR
App["GoVector application (go 1.25.1)"]
HNSW["coder/hnsw"]
BBolt["go.etcd.io/bbolt"]
PBuf["google.golang.org/protobuf"]
App --> HNSW
App --> BBolt
App --> PBuf

Performance and Testing Strategy

  • Unit tests: Cover collection creation, Upsert, Search, Delete, error scenarios, and HNSW parameter persistence.
  • Integration tests: End-to-end verification of service via HTTP API (create collection, write, search, delete).
  • Performance tests: Benchmark suite supports different scales and index modes, outputs build time, average latency, and QPS.
  • Concurrency and race: CI uses race detection; coverage reports uploaded to Codecov.
  • Coverage threshold: Project target is 85%, ignoring cmd, example, proto-generated files, and third-party paths.

Debugging and Development Best Practices

  • Environment Preparation
      • Use Go 1.25.1 or above.
      • Download and verify dependencies via go mod.
  • Local Running
      • Use the command-line entry to start the service, specifying the port, database path, and whether to enable HNSW.
      • Refer to the demo script for quick verification.
  • Debugging Tips
      • Startup parameters: port, database file path, index switch.
      • Logs: standard output and error logs help locate issues.
      • Concurrency safety: the API server locks collection registration and the HTTP server instance to avoid concurrent conflicts.
  • Code Quality
      • Use go test -race for race detection.
      • Use go test -coverprofile to generate coverage reports.
      • Follow the minimal change principle: prefer extending capabilities within the core package and keep the API layer stable.
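
The startup parameters mentioned above could be parsed along the following lines. This is a sketch, not the code in cmd/govector/main.go: the -port and -db flags appear in this guide, while the -hnsw flag name, the Config type, and all default values are assumptions made for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Config holds the three startup parameters named in this guide.
type Config struct {
	Port    int
	DBPath  string
	UseHNSW bool
}

// ParseFlags parses the given arguments into a Config.
// Only -port and -db are confirmed by this guide; -hnsw and every default
// below are hypothetical.
func ParseFlags(args []string) (Config, error) {
	var cfg Config
	fs := flag.NewFlagSet("govector", flag.ContinueOnError)
	fs.IntVar(&cfg.Port, "port", 8080, "HTTP listen port")
	fs.StringVar(&cfg.DBPath, "db", "govector.db", "path to the BoltDB file")
	fs.BoolVar(&cfg.UseHNSW, "hnsw", false, "enable the HNSW index")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := ParseFlags(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("port=%d db=%s hnsw=%t\n", cfg.Port, cfg.DBPath, cfg.UseHNSW)
}
```

Using a dedicated FlagSet rather than the global flag package keeps parsing testable, because the function can be called with any argument slice.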

Build System and Release Process

  • Local Build
      • The Makefile provides build, run, release, test, and clean targets.
      • build writes the binary to bin/, run starts the service directly, test runs the benchmarks, and release calls the release script to produce multi-platform packages.
  • Release Script
      • Cross-compiles for four platforms: linux/amd64, linux/arm64, darwin/amd64, and darwin/arm64.
      • Generates compressed packages, calculates their SHA256 checksums, and updates the version and checksum in the Homebrew Formula.
  • Homebrew and SystemD
      • The Homebrew Formula defines the download URL and checksum, and provides a service template.
      • The SystemD service template defines the working directory, restart policy, and log path.
flowchart TD
Dev["Developer"] --> Make["Makefile target"]
Make --> Build["build (generate binary)"]
Make --> Run["run (start service)"]
Make --> Test["test (run benchmarks)"]
Make --> Release["release (call release script)"]
Release --> Cross["Cross-compile (multi-platform)"]
Cross --> Pack["Package (.tar.gz/.zip)"]
Pack --> Checksum["Calculate SHA256"]
Checksum --> Brew["Update Homebrew Formula"]
Brew --> Dist["Output dist/ files"]
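
The checksum step is performed by the release script, but the same digest can be reproduced in Go when verifying a downloaded package. The sketch below is illustrative only; the function names are hypothetical and nothing here is taken from scripts/build_release.sh.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// Checksum streams r through SHA-256 and returns the lowercase hex digest,
// which is the format a Homebrew formula's sha256 field expects.
func Checksum(r io.Reader) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// ChecksumBytes is a convenience wrapper for in-memory data.
func ChecksumBytes(data []byte) string {
	sum, _ := Checksum(bytes.NewReader(data)) // reading from memory cannot fail
	return sum
}

// ChecksumFile hashes a file without loading it fully into memory.
func ChecksumFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	return Checksum(f)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println(ChecksumBytes([]byte("abc")))
		return
	}
	sum, err := ChecksumFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%s  %s\n", sum, os.Args[1])
}
```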

Troubleshooting Guide

  • Service fails to start
      • Check port occupation and permissions; confirm the database path exists and is writable.
      • Check standard output and error logs, focusing on the storage initialization and collection loading stages.
  • API returns 404/400/500
      • 404: Collection does not exist; confirm the collection name and creation process.
      • 400: JSON decode failure or invalid parameters; check the request body format and fields.
      • 500: Internal error; check the server logs to locate the specific stage that failed.
  • Performance anomalies
      • Check whether HNSW is enabled; compare Flat and HNSW performance at different scales.
      • Monitor peak memory and GC frequency, and adjust batch size and query topK if necessary.
  • Coverage threshold not met
      • Focus on ignored paths and test case coverage; prioritize supplementing core logic branches.
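
When tracing a 404, 400, or 500 back to its cause, it helps to picture how a handler might translate errors into status codes. The following is a sketch of one such mapping, assuming a hypothetical ErrCollectionNotFound sentinel and a hypothetical request shape; the real server may classify errors differently.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
)

// ErrCollectionNotFound is a hypothetical sentinel error for this sketch.
var ErrCollectionNotFound = errors.New("collection not found")

// StatusFor maps an error to the status codes listed above:
// unknown collection -> 404, JSON decode problems -> 400, anything else -> 500.
func StatusFor(err error) int {
	var syntaxErr *json.SyntaxError
	var typeErr *json.UnmarshalTypeError
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrCollectionNotFound):
		return http.StatusNotFound
	case errors.As(err, &syntaxErr), errors.As(err, &typeErr):
		return http.StatusBadRequest
	default:
		return http.StatusInternalServerError
	}
}

// DecodeStatus decodes a hypothetical search request body and returns the
// status a handler would send based on the decode result alone.
func DecodeStatus(body string) int {
	var req struct {
		Limit int `json:"limit"`
	}
	return StatusFor(json.Unmarshal([]byte(body), &req))
}

func main() {
	fmt.Println(StatusFor(fmt.Errorf("search: %w", ErrCollectionNotFound)))
	fmt.Println(DecodeStatus(`{"limit": "ten"}`))
	fmt.Println(StatusFor(errors.New("disk full")))
}
```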

Conclusion

This guide provides end-to-end practical guidance covering environment setup, the development workflow, testing and performance evaluation, and building and releasing. Contributors are advised to follow the minimal change principle, prioritize tests for core logic and boundary conditions, and attach performance comparisons and regression verification results to PRs to keep GoVector reliable and maintainable.

Appendix

  • Common Commands Reference
      • Build: make build
      • Run: make run
      • Test: make test
      • Clean: make clean
      • Release: make release
  • API Endpoint Overview (from server implementation)
      • POST /collections: Create collection
      • DELETE /collections/{name}: Delete collection
      • GET /collections: List collections
      • GET /collections/{name}: Get collection info
      • PUT /collections/{name}/points: Write points
      • POST /collections/{name}/points/search: Search
      • POST /collections/{name}/points/delete: Delete
