Embedded Mode¶
Introduction¶
This guide is for developers who want to directly integrate GoVector into Go applications as an "embedded library". You will learn:
- Initialize local storage (BoltDB) and collection (Collection)
- Insert vector data into collection (Upsert)
- Query and filter using native Go structs
- Achieve zero-network-overhead high-performance vector retrieval without starting HTTP service
- Best practices for concurrency safety, memory management and graceful shutdown
- Practical cases and common problem solving
Project Structure¶
GoVector provides two usage modes: embedded library and independent microservice. In embedded mode, the application directly imports the core package and builds a local vector database through Storage and Collection; in independent mode, it exposes Qdrant-compatible interfaces through HTTP API.
graph TB
subgraph "Application Process"
APP["Your Go Application"]
end
subgraph "GoVector Core"
ST["Storage<br/>Local persistence(BoltDB)"]
COL["Collection<br/>Collection/Index/Filter"]
IDX["VectorIndex Interface<br/>Flat/HNSW Implementation"]
end
APP --> ST
APP --> COL
COL --> IDX
ST -.read/write.-> DB["BoltDB File"]
Core Components¶
- Storage (storage engine): Responsible for local persistence (BoltDB); provides collection creation, point read/write, and saving and listing of metadata.
- Collection: Encapsulates collection dimension, distance metric, index strategy and concurrency control, providing operations like Upsert/Search/Delete.
- VectorIndex: Abstract interface, supports Flat (brute-force) and HNSW (approximate) implementations.
- Data Models: PointStruct, ScoredPoint, Filter/Condition, etc., used to describe vectors, scored results and filter conditions.
Architecture Overview¶
In embedded mode, the application calls the core package directly with no network layer in between. All operations complete within the process, which offers lower latency and higher throughput potential.
sequenceDiagram
participant App as "Application"
participant Store as "Storage"
participant Col as "Collection"
participant Idx as "VectorIndex(HNSW/Flat)"
participant DB as "BoltDB"
App->>Store : Initialize storage(NewStorage)
App->>Col : Create collection(NewCollection)
App->>Col : Upsert(batch write)
Col->>Store : Write to disk(UpsertPoints)
Store->>DB : Persist
Col->>Idx : Update in-memory index(Upsert)
App->>Col : Search(query)
Col->>Idx : Search(post-filter as needed)
Idx-->>Col : Return TopK results
Col-->>App : Native Go struct results
Detailed Component Analysis¶
Storage Engine Storage¶
- Responsibilities
- Open/close BoltDB file
- Ensure collection bucket exists
- Batch Upsert/delete points
- Load collection and metadata
- Optional vector quantization (SQ8)
- Key Points
- Transactional reads and writes through bbolt transactions (Update/View)
- Point data is serialized with Protobuf
- When quantization is enabled, the compressed vector is stored in the Payload and decompressed on load
- Metadata stored in special bucket collections_meta
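The optional SQ8 step can be pictured as scalar quantization of each vector component to 8 bits. The following is a generic sketch of that technique, not GoVector's actual encoding: the `sq8` type and the `quantizeSQ8`/`dequantizeSQ8` functions are hypothetical names, and the per-vector min/max scaling is an assumption.

```go
package main

import "fmt"

// sq8 holds a vector compressed to one byte per component, plus the
// value range needed to reconstruct approximate float32 values.
// Hypothetical type for illustration only.
type sq8 struct {
	Min, Max float32
	Codes    []uint8
}

// quantizeSQ8 maps each component linearly from [min, max] to [0, 255].
func quantizeSQ8(v []float32) sq8 {
	if len(v) == 0 {
		return sq8{}
	}
	lo, hi := v[0], v[0]
	for _, x := range v[1:] {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	q := sq8{Min: lo, Max: hi, Codes: make([]uint8, len(v))}
	span := hi - lo
	if span == 0 {
		return q // all components are equal; every code stays 0
	}
	for i, x := range v {
		// Adding 0.5 before truncation rounds to the nearest code.
		q.Codes[i] = uint8((x-lo)/span*255 + 0.5)
	}
	return q
}

// dequantizeSQ8 reconstructs approximate float32 values from the codes.
func dequantizeSQ8(q sq8) []float32 {
	out := make([]float32, len(q.Codes))
	step := (q.Max - q.Min) / 255
	for i, c := range q.Codes {
		out[i] = q.Min + float32(c)*step
	}
	return out
}

func main() {
	v := []float32{-1, 0, 0.5, 1}
	q := quantizeSQ8(v)
	fmt.Println("codes:", q.Codes)
	fmt.Println("restored:", dequantizeSQ8(q))
}
```

Each component shrinks from 4 bytes to 1, at the cost of a reconstruction error of at most half a quantization step.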
Collection¶
- Responsibilities
- Maintain collection metadata and concurrency control
- Map Upsert/Search/Delete to index and storage
- Consistency guarantee: persist to disk first, then update in-memory index; best-effort rollback on failure
- Key Points
- RWMutex protects read/write
- Version number generated based on nanosecond timestamp, used for idempotency and consistency
- Supports Flat/HNSW transparent switching
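The "persist first, then update the index, roll back on failure" ordering can be sketched as follows. This is a simplified illustration under stated assumptions rather than GoVector's implementation: the `store` and `index` interfaces, the `collection` type and the in-memory fakes are all hypothetical stand-ins.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type point struct {
	ID     string
	Vector []float32
}

// store and index are hypothetical stand-ins for the persistence
// layer and the in-memory vector index.
type store interface {
	Put(pts []point) error
	Delete(ids []string) error
}

type index interface {
	Upsert(pts []point) error
}

type collection struct {
	mu  sync.RWMutex
	st  store
	idx index
}

// Upsert persists the points first and only then updates the index.
// If the index update fails, it tries to delete the points it just wrote.
func (c *collection) Upsert(pts []point) error {
	c.mu.Lock()
	defer c.mu.Unlock()

	if err := c.st.Put(pts); err != nil {
		return fmt.Errorf("persist: %w", err)
	}
	if err := c.idx.Upsert(pts); err != nil {
		ids := make([]string, len(pts))
		for i, p := range pts {
			ids[i] = p.ID
		}
		_ = c.st.Delete(ids) // best effort: a rollback error is ignored here
		return fmt.Errorf("index: %w", err)
	}
	return nil
}

// memStore is an in-memory fake used only for this demo.
type memStore struct{ data map[string]point }

func (m *memStore) Put(pts []point) error {
	for _, p := range pts {
		m.data[p.ID] = p
	}
	return nil
}

func (m *memStore) Delete(ids []string) error {
	for _, id := range ids {
		delete(m.data, id)
	}
	return nil
}

// failingIndex always fails, to exercise the rollback path.
type failingIndex struct{}

func (failingIndex) Upsert([]point) error { return errors.New("index unavailable") }

func main() {
	st := &memStore{data: map[string]point{}}
	c := &collection{st: st, idx: failingIndex{}}
	err := c.Upsert([]point{{ID: "a", Vector: []float32{1, 2}}})
	fmt.Println("error:", err, "stored points:", len(st.data))
}
```

Note that a delete-based rollback is only "best effort": if the failed Upsert overwrote an existing point, the previous version is not restored.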
VectorIndex and Implementation¶
- Interface definition: Upsert/Search/Delete/Count/GetIDsByFilter/DeleteByFilter
- FlatIndex: In-memory brute-force search, suitable for small scale or scenarios requiring exact results
- HNSWIndex: Graph structure approximate search, supports custom parameters (M/EfConstruction/EfSearch/K)
classDiagram
class VectorIndex {
+Upsert(points) error
+Search(query, filter, topK) []ScoredPoint
+Delete(id) error
+Count() int
+GetIDsByFilter(filter) []string
+DeleteByFilter(filter) ([]string, error)
}
class FlatIndex {
-points map[string]*PointStruct
-metric Distance
+Upsert(points) error
+Search(query, filter, topK) []ScoredPoint
+Delete(id) error
+Count() int
+GetIDsByFilter(filter) []string
+DeleteByFilter(filter) ([]string, error)
}
class HNSWIndex {
-graph *Graph
-points map[string]*PointStruct
-metric Distance
-params HNSWParams
+Upsert(points) error
+Search(query, filter, topK) []ScoredPoint
+Delete(id) error
+Count() int
+GetIDsByFilter(filter) []string
+DeleteByFilter(filter) ([]string, error)
}
VectorIndex <|.. FlatIndex
VectorIndex <|.. HNSWIndex
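To make the brute-force strategy concrete, here is a minimal sketch of a flat search: score every stored vector against the query and keep the best `topK`. It is not the FlatIndex source; `flatSearch`, `cosine` and `scored` are hypothetical names, cosine similarity is just one possible metric, and filtering is left out.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// scored pairs a point ID with its similarity to the query.
type scored struct {
	ID    string
	Score float32
}

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 if either vector has zero norm.
func cosine(a, b []float32) float32 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
}

// flatSearch scores every point and returns the topK highest scores.
// Ties are broken by ID so the result order is deterministic.
func flatSearch(points map[string][]float32, query []float32, topK int) []scored {
	if topK < 0 {
		topK = 0
	}
	out := make([]scored, 0, len(points))
	for id, v := range points {
		if len(v) != len(query) {
			continue // skip vectors with a mismatched dimension
		}
		out = append(out, scored{ID: id, Score: cosine(query, v)})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].ID < out[j].ID
	})
	if topK < len(out) {
		out = out[:topK]
	}
	return out
}

func main() {
	points := map[string][]float32{
		"a": {1, 0},
		"b": {0, 1},
		"c": {1, 1},
	}
	for _, r := range flatSearch(points, []float32{1, 0}, 2) {
		fmt.Printf("%s %.3f\n", r.ID, r.Score)
	}
}
```

The cost is linear in the number of points per query, which is why this approach suits small collections or cases where exact results matter more than speed.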
Data Model and Filtering¶
- PointStruct/ScoredPoint: Carry vector, version and metadata
- Filter/Condition: Supports Must/MustNot, condition types include exact/range/prefix/contains/regex
- Filter matching logic: MatchFilter/matchCondition/matchRange/matchPrefix/matchContains/matchRegex
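The Must/MustNot semantics can be illustrated with a small matcher: a payload passes when every Must condition matches and no MustNot condition matches. The `filter` and `condition` types below are hypothetical and much simpler than the real Filter/Condition; only exact, prefix and range conditions are sketched, and the field names are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// condition is a hypothetical, simplified filter condition.
type condition struct {
	Key      string
	Kind     string // "exact", "prefix" or "range"
	Value    string // used by "exact" and "prefix"
	Min, Max float64 // inclusive bounds used by "range"
}

// filter combines conditions: all of Must and none of MustNot must match.
type filter struct {
	Must    []condition
	MustNot []condition
}

// matchCondition reports whether one condition holds for the payload.
// A missing key or a value of the wrong type never matches.
func matchCondition(c condition, payload map[string]any) bool {
	v, ok := payload[c.Key]
	if !ok {
		return false
	}
	switch c.Kind {
	case "exact":
		s, ok := v.(string)
		return ok && s == c.Value
	case "prefix":
		s, ok := v.(string)
		return ok && strings.HasPrefix(s, c.Value)
	case "range":
		f, ok := v.(float64)
		return ok && f >= c.Min && f <= c.Max
	}
	return false
}

// matchFilter applies the Must and MustNot lists to the payload.
func matchFilter(f filter, payload map[string]any) bool {
	for _, c := range f.Must {
		if !matchCondition(c, payload) {
			return false
		}
	}
	for _, c := range f.MustNot {
		if matchCondition(c, payload) {
			return false
		}
	}
	return true
}

func main() {
	payload := map[string]any{"lang": "go", "year": 2023.0}
	f := filter{
		Must: []condition{
			{Key: "lang", Kind: "exact", Value: "go"},
			{Key: "year", Kind: "range", Min: 2020, Max: 2024},
		},
		MustNot: []condition{{Key: "lang", Kind: "prefix", Value: "py"}},
	}
	fmt.Println(matchFilter(f, payload))
}
```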
Dependency Analysis¶
- go.mod shows core dependencies: bbolt (local storage), hnsw (graph index), protobuf (serialization)
- Embedded mode only uses core package, does not depend on HTTP layer
graph LR
GOV["github.com/DotNetAge/govector/core"]
BB["go.etcd.io/bbolt"]
HNSW["github.com/coder/hnsw"]
PB["google.golang.org/protobuf"]
GOV --> BB
GOV --> HNSW
GOV --> PB
Performance and Advantages¶
- No HTTP overhead: Direct local function calls, avoiding network round-trip and serialization/deserialization costs
- Direct Go struct access: Returns native Go objects, convenient for secondary processing
- HNSW approximate search: Aims to keep query latency sub-millisecond at million-scale vector volume; actual latency depends on vector dimension, HNSW parameters and hardware
- Persistence and automatic discovery: Collections and points can be recovered after restart, reducing cold start cost
Concurrency Safety and Memory Management¶
- Concurrency control
- Collection uses RWMutex to protect Upsert/Search/Delete
- HNSWIndex/FlatIndex also use internal mutexes
- Consistency guarantee
- Upsert writes to storage first, then updates in-memory index; on failure, best-effort rollback (delete written points)
- Memory management
- HNSWIndex maintains points map and graph structure
- FlatIndex only maintains in-memory map
- Optional SQ8 quantization: Compress on write, decompress on read, reducing memory usage
- Graceful shutdown
- Storage.Close closes bbolt
- Server mode triggers graceful shutdown via signals
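In embedded mode the host application owns the shutdown sequence. One common pattern, sketched below with the standard library only, is to wait for an interrupt or termination signal and then close the storage. The `closeOnDone` helper and `fakeStore` type are hypothetical; in a real application the closer would be the storage handle whose `Close` method is mentioned above.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"os"
	"os/signal"
	"syscall"
)

// closeOnDone blocks until done is closed, then closes c.
func closeOnDone(done <-chan struct{}, c io.Closer) error {
	<-done
	return c.Close()
}

// fakeStore stands in for the real storage handle in this demo.
type fakeStore struct{ closed bool }

func (f *fakeStore) Close() error {
	f.closed = true
	return nil
}

func main() {
	// ctx is cancelled when SIGINT or SIGTERM arrives, or when stop is called.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	store := &fakeStore{}

	// Demo only: cancel right away so the example terminates without
	// waiting for a real signal. A real application would not do this.
	stop()

	if err := closeOnDone(ctx.Done(), store); err != nil {
		fmt.Println("close failed:", err)
		return
	}
	fmt.Println("store closed:", store.closed)
}
```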
Integration and Usage Guide¶
Quick Start (Embedded)¶
- Initialize storage: Create local BoltDB file
- Create collection: Specify dimension, distance metric, whether to enable HNSW
- Insert data: Upsert batch write
- Query: Construct Filter conditions, call Search to get TopK results
- Close: Ensure Storage.Close completes normally
Reference example paths:
- README.md:74-105
- example/embedded/main.go:10-62
Complete Flow (From Initialization to Complex Query)¶
flowchart TD
S["Initialize storage<br/>NewStorage(dbPath)"] --> C["Create collection<br/>NewCollection(name,dim,metric,store,useHNSW)"]
C --> U["Batch insert<br/>Upsert(points)"]
U --> Q["Construct query<br/>queryVector + Filter"]
Q --> R["Execute search<br/>Search(query, filter, topK)"]
R --> O["Process results<br/>Iterate ScoredPoint"]
O --> E["Graceful shutdown<br/>store.Close()"]
Concurrency Safety Best Practices¶
- Multiple goroutines may share one Collection: Upsert/Search/Delete are guarded by its RWMutex, so expect a write to block other callers on that Collection until it finishes
- Split large Upserts into smaller batches so that a single oversized request does not hold the lock for a long time
- If cross-process sharing is needed, run GoVector as a standalone HTTP service instead; a BoltDB file opened for writing is locked by the process that opened it
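The batching advice can be sketched with a small helper that splits a slice into fixed-size chunks and submits them one by one. `chunk` and `upsertInBatches` are hypothetical helpers, the `upsert` callback stands in for a call to the collection's Upsert, and the batch size of 2 is arbitrary.

```go
package main

import "fmt"

// chunk splits items into consecutive batches of at most size elements.
// A size below 1 is treated as 1.
func chunk[T any](items []T, size int) [][]T {
	if size < 1 {
		size = 1
	}
	var batches [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		batches = append(batches, items[start:end])
	}
	return batches
}

// upsertInBatches calls upsert once per batch and stops at the first error,
// so each call does a bounded amount of work while holding the lock.
func upsertInBatches[T any](items []T, size int, upsert func([]T) error) error {
	for i, b := range chunk(items, size) {
		if err := upsert(b); err != nil {
			return fmt.Errorf("batch %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	ids := []string{"a", "b", "c", "d", "e"}
	err := upsertInBatches(ids, 2, func(batch []string) error {
		fmt.Println("upsert", batch) // a real callback would write the batch
		return nil
	})
	fmt.Println("error:", err)
}
```

Smaller batches let searches interleave between writes; the right size depends on vector dimension and workload, so it is worth measuring.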