Backup and Recovery¶
This document is intended for GoVector operations and development teams, providing a systematic overview of data backup and recovery strategies. It covers full and incremental backups, automatic and manual backups, storage location and security, recovery procedures (full and partial), disaster recovery and business continuity, verification and testing methods, and backup considerations during data migration and version upgrades. GoVector uses an embedded storage engine based on bbolt (BoltDB) for local persistence, and serializes point data using Protobuf to ensure data can be recovered after process restart.
Project Structure and Data Persistence Overview¶
- Storage Engine: bbolt (BoltDB), stores collection data in key-value format; collection metadata is stored in special buckets.
- Serialization: Point data uses Protobuf encoding/decoding; collection metadata uses JSON.
- Service Mode: Supports running as an independent microservice or embedded as a library in applications.
- Key Paths:
- Service Entry: Database file path and port specified via command-line arguments.
- API Layer: Provides collection management and point operation interfaces.
- Storage Layer: Responsible for collection bucket creation, point write/read, deletion, metadata read/write, etc.
```mermaid
graph TB
    subgraph "Service Layer"
        S["API Server<br/>api/server.go"]
    end
    subgraph "Application Layer"
        C["Collection<br/>core/collection.go"]
    end
    subgraph "Storage Layer"
        ST["Storage Engine<br/>core/storage.go"]
        BB["bbolt Database File<br/>*.db"]
    end
    S --> C
    C --> ST
    ST --> BB
```
Core Components and Backup Relationships¶
- Storage: Responsible for collection bucket creation, point Upsert/Load/Delete, collection metadata save/load.
- Collection: Encapsulates collection dimensions, metrics, index types and in-memory indexes, provides Upsert/Search/Delete operations; Upsert persists to disk before updating in-memory index.
- API Server: Exposes REST interfaces externally, internally holds Storage and Collection, responsible for graceful shutdown.
- Service Entry: Parses command-line arguments, initializes Storage and Collection, starts API service and listens for signals for graceful shutdown.
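As a minimal sketch of the service entry described above, the snippet below parses a database path and port from command-line style arguments. The flag names (`-db`, `-port`) and defaults are illustrative assumptions, not GoVector's actual flags:

```go
package main

import (
	"flag"
	"fmt"
)

// serviceConfig mirrors the start-up parameters described above; the exact
// flag names and defaults are assumptions for illustration.
type serviceConfig struct {
	DBPath string
	Port   int
}

// parseArgs parses command-line style arguments into a serviceConfig.
func parseArgs(args []string) (serviceConfig, error) {
	fs := flag.NewFlagSet("govector", flag.ContinueOnError)
	cfg := serviceConfig{}
	fs.StringVar(&cfg.DBPath, "db", "govector.db", "path to the bbolt database file")
	fs.IntVar(&cfg.Port, "port", 8080, "HTTP listen port")
	if err := fs.Parse(args); err != nil {
		return serviceConfig{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := parseArgs([]string{"-db", "/var/lib/govector/data.db", "-port", "9000"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("db=%s port=%d\n", cfg.DBPath, cfg.Port)
}
```

Keeping the database path a plain argument is what makes backup tooling simple: every script below only needs to know that one file path.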
Architecture Overview¶
The diagram below shows the key call chain from API request to storage persistence, and the impact of graceful shutdown on data consistency.
```mermaid
sequenceDiagram
    participant Client as "Client"
    participant API as "API Server"
    participant Col as "Collection"
    participant Store as "Storage"
    participant DB as "bbolt Database"
    Client->>API : "PUT /collections/{name}/points"
    API->>Col : "Upsert(points)"
    Col->>Store : "UpsertPoints(collection, points)"
    Store->>DB : "Transaction write point data"
    DB-->>Store : "Write success"
    Store-->>Col : "Return"
    Col->>Col : "Update in-memory index"
    Col-->>API : "Return success"
    API-->>Client : "200 OK"
    Note over API,DB : "During graceful shutdown, the service waits for connections to close; Storage.Close() ensures data is flushed"
```
Detailed Component Analysis¶
Storage Component (Persistence Core)¶
- Responsibilities: Collection bucket management, point batch write/delete, collection metadata save/load, collection list query.
- Key Points:
- Each collection's bucket is named after the collection; collection metadata lives in a dedicated bucket so collections can be rebuilt automatically on startup.
- Writes run inside bbolt transactions to guarantee atomicity; point data is serialized with Protobuf, metadata with JSON.
- Optional vector quantization (SQ8) compresses vectors before storage and decompresses them on load.
```mermaid
classDiagram
    class Storage {
        -db : bbolt.DB
        -closed : bool
        -quantizer : Quantizer
        -useQuant : bool
        +EnsureCollection(name) error
        +UpsertPoints(colName, points) error
        +LoadCollection(colName) map[string]*PointStruct
        +DeletePoints(colName, ids) error
        +SaveCollectionMeta(name, meta) error
        +LoadCollectionMeta(name) *CollectionMeta
        +ListCollectionMetas() []CollectionMeta
        +Close() error
    }
```
Collection Component (Collection and Index)¶
- Responsibilities: Encapsulates collection metadata and in-memory index, provides Upsert/Search/Delete.
- Key Points:
- Upsert persists to storage first, then updates in-memory index; on failure, attempts to rollback storage-side changes to maintain consistency.
- Supports both Flat and HNSW index types, selected as needed.
```mermaid
classDiagram
    class Collection {
        +Name : string
        +VectorLen : int
        +Metric : Distance
        -index : VectorIndex
        -storage : *Storage
        +Upsert(points) error
        +Search(query, filter, topK) []ScoredPoint
        +Delete(points, filter) (int, error)
        +Count() int
    }
```
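The persist-first rule described above (write to storage, then update the in-memory index, and roll the storage write back if indexing fails) can be sketched with toy in-memory stand-ins. The types `memStore`, `memIndex`, and `point` are hypothetical simplifications of the real `Storage`, `VectorIndex`, and `PointStruct`:

```go
package main

import (
	"errors"
	"fmt"
)

// point is a minimal stand-in for the real PointStruct.
type point struct {
	ID     string
	Vector []float32
}

// memStore and memIndex are toy in-memory stand-ins for Storage and VectorIndex.
type memStore struct{ data map[string][]float32 }

func (s *memStore) UpsertPoints(pts []point) error {
	for _, p := range pts {
		s.data[p.ID] = p.Vector
	}
	return nil
}

func (s *memStore) DeletePoints(ids []string) error {
	for _, id := range ids {
		delete(s.data, id)
	}
	return nil
}

type memIndex struct {
	entries map[string][]float32
	failOn  string // inject an indexing failure for demonstration
}

func (ix *memIndex) Add(p point) error {
	if p.ID == ix.failOn {
		return errors.New("index add failed")
	}
	ix.entries[p.ID] = p.Vector
	return nil
}

// upsert persists first, then updates the index; on index failure it deletes
// the just-persisted points so disk and memory stay consistent.
func upsert(s *memStore, ix *memIndex, pts []point) error {
	if err := s.UpsertPoints(pts); err != nil {
		return err // storage write failed atomically, nothing to roll back
	}
	for _, p := range pts {
		if err := ix.Add(p); err != nil {
			ids := make([]string, 0, len(pts))
			for _, q := range pts {
				ids = append(ids, q.ID)
			}
			if delErr := s.DeletePoints(ids); delErr != nil {
				return errors.Join(err, delErr)
			}
			return err
		}
	}
	return nil
}

func main() {
	s := &memStore{data: map[string][]float32{}}
	ix := &memIndex{entries: map[string][]float32{}, failOn: "b"}
	err := upsert(s, ix, []point{{ID: "a"}, {ID: "b"}})
	fmt.Println(err != nil, len(s.data)) // rollback leaves storage empty
}
```

The ordering matters for backups: because disk is always at least as up to date as memory, a backup of the database file never misses points that search could see.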
API Server Component (REST Interface and Graceful Shutdown)¶
- Responsibilities: Registers collection management and point operation routes, starts/stops HTTP service, loads collection metadata.
- Key Points:
- On startup, loads collection metadata from storage and rebuilds collections.
- Graceful shutdown uses context for timeout control, ensuring requests are completed before closing.
```mermaid
sequenceDiagram
    participant Main as "Main Program"
    participant API as "API Server"
    participant Store as "Storage"
    Main->>API : "NewServer(addr, store)"
    Main->>API : "AddCollection(col)"
    Main->>API : "Start()"
    API->>Store : "ListCollectionMetas()"
    Store-->>API : "metas"
    API->>API : "Create Collection for each metadata and register"
    Main->>Main : "Listen for OS signals"
    Main->>API : "Stop(ctx)"
    API-->>Main : "HTTP server shutdown"
```
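The graceful-shutdown pattern above can be sketched with the standard library alone: `http.Server.Shutdown` stops accepting new connections and waits, up to the context deadline, for in-flight requests to finish. This is a generic sketch of the pattern, not GoVector's actual `Stop` implementation:

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// run starts an HTTP server on a free local port, then shuts it down
// gracefully with a context timeout, mirroring the sequence above.
func run() error {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	srv := &http.Server{Handler: mux}

	ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0: any free port
	if err != nil {
		return err
	}
	errCh := make(chan error, 1)
	go func() { errCh <- srv.Serve(ln) }()

	// The real service would wait here for SIGINT/SIGTERM (e.g. via
	// signal.NotifyContext); for the sketch we shut down immediately.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return err
	}
	// Serve returns http.ErrServerClosed after a successful Shutdown.
	if err := <-errCh; err != http.ErrServerClosed {
		return err
	}
	fmt.Println("graceful shutdown complete")
	return nil
}

func main() {
	if err := run(); err != nil {
		panic(err)
	}
}
```

Only after `Shutdown` returns should `Storage.Close()` be called, so that no request can write to a closed database.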
Dependency Analysis¶
- go.mod shows core dependencies: bbolt (local database), Protobuf (serialization), HNSW (indexing).
- Service entry controls database file path and port via command-line parameters for easy deployment and backup in different environments.
```mermaid
graph LR
    M["go.mod"] --> BB["go.etcd.io/bbolt"]
    M --> PB["google.golang.org/protobuf"]
    M --> HNSW["github.com/coder/hnsw"]
    MAIN["cmd/govector/main.go"] --> API["api/server.go"]
    API --> CORE["core/collection.go"]
    CORE --> STORE["core/storage.go"]
```
Performance and Reliability Considerations¶
- bbolt transactions make each Upsert atomic; because bbolt allows only one write transaction at a time, batching many points into a single transaction is the main lever for write throughput.
- Quantization (SQ8) can significantly reduce disk usage but increases CPU overhead; decide whether to enable based on data scale and hardware capability.
- Graceful shutdown and Close() calls ensure data flush, avoiding data loss risk from process abnormal exit.
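To make the SQ8 trade-off concrete, the sketch below shows a generic min/max scalar quantizer that maps each float32 (4 bytes) to one byte, a 4x reduction per dimension. This is an illustrative scheme under common SQ8 conventions, not necessarily GoVector's exact quantizer:

```go
package main

import "fmt"

// sq8Encode maps each float32 to one byte using per-vector min/max scaling.
// Decoding is lossy: values are recovered only up to scale/2 precision.
func sq8Encode(v []float32) (codes []byte, min, scale float32) {
	min, max := v[0], v[0]
	for _, x := range v {
		if x < min {
			min = x
		}
		if x > max {
			max = x
		}
	}
	scale = (max - min) / 255
	if scale == 0 {
		scale = 1 // constant vector: every code becomes 0
	}
	codes = make([]byte, len(v))
	for i, x := range v {
		q := (x - min) / scale
		if q < 0 {
			q = 0
		}
		if q > 255 {
			q = 255
		}
		codes[i] = byte(q + 0.5) // round to nearest code
	}
	return codes, min, scale
}

// sq8Decode reconstructs approximate float32 values from the byte codes.
func sq8Decode(codes []byte, min, scale float32) []float32 {
	out := make([]float32, len(codes))
	for i, c := range codes {
		out[i] = min + float32(c)*scale
	}
	return out
}

func main() {
	v := []float32{0, 0.5, 1}
	codes, min, scale := sq8Encode(v)
	fmt.Println(codes, sq8Decode(codes, min, scale))
}
```

The CPU cost mentioned above comes from the encode step on every write and the decode step on every load, which is why enabling quantization should depend on data scale and hardware.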
Backup Strategy and Implementation Steps¶
Backup Timing and Frequency Recommendations¶
- Full Backup
- Before launch: Perform a full backup immediately after initial deployment.
- Change window: Perform full backup during business off-peak hours (e.g., early morning).
- Rule: Recommend once-daily full backup, or immediately after significant data changes.
- Incremental Backup
- Recommendation: In combination with business logs and API operation records, trigger an incremental backup after each large-scale Upsert/Delete.
- Note: bbolt is a single-file database with no native incremental backup, so in practice each "incremental" backup is a copy of the whole database file (see next section).
Automatic Backup Script¶
- Recommended script flow (example steps, not code to paste verbatim):
    - Stop the service or switch it to read-only mode (optional).
    - Copy the database file to the backup directory (hard links on the same filesystem reduce IO).
    - Compress and verify the backup file (optional).
    - Upload to object storage or archive to local/remote media.
    - Clean up expired backups (retain N days/weeks/months).
    - Log backup results and raise alerts on failure.
- Reference scripts and commands in the repository:
    - Service start and stop: command-line arguments specify the database path and port; shutdown is graceful.
    - Build and cleanup: the Makefile provides build, run, and clean targets that can be used in automated pipelines.
Manual Backup Operation Steps¶
- Confirm the database file path (the one passed via command-line arguments at startup).
- Stop service or switch to read-only mode (optional, reduce consistency risk).
- Copy database file to safe location (recommend using hard links on the same filesystem).
- Verify backup file integrity (can use checksum tools).
- Archiving and retention policy: Archive by day/week/month, retention period follows compliance requirements.
Backup Data Storage Location and Security Protection¶
- Storage Location
- Local: Database file and backup file should be placed on separate mount points, avoid sharing disk with logs.
- Remote: Recommend uploading to object storage (e.g., S3, NAS, cloud drive) or off-site machine room.
- Security Protection
- File permissions: Restrict database file and backup file access permissions (e.g., only allow running user to read/write).
- Encryption: Transport and at-rest encryption (e.g., enable object storage encryption or local disk encryption).
- Audit: Log backup operations and access logs, audit regularly.
Recovery Process and Drills¶
Full Recovery¶
- Preparation Phase
- Select the most recent successful full backup as baseline.
- Confirm backup file is available and has not been tampered with.
- Recovery Steps
- Stop service.