Debugging Tools and Techniques¶
This guide is for GoVector users and maintainers. It explains how to use common Go debugging tools (dlv, pprof) for performance analysis and memory-leak detection across development, testing, and production environments; how to locate problems through logs and error codes; how to debug unit and integration tests with mock objects and test helpers; and how to trace and analyze network requests and API calls. It also covers safety precautions and best practices for debugging in production.
Project Structure¶
GoVector uses modular layered organization:
- cmd/govector: Service entry, responsible for parsing parameters, initializing storage and collections, starting HTTP server and graceful shutdown.
- api: HTTP API layer, providing collection management and vector operation interfaces, internally holds collection map and storage instance.
- core: Core engine, containing index interface, data models, storage and quantization capabilities.
- cmd/bench: Benchmark script, used to evaluate build and query performance at different scales.
- .github/workflows/go.yml: CI pipeline, containing tests, race detection and coverage reporting.
- go.mod: Module and dependency declarations.
- Makefile: Common build, run, cleanup and benchmark commands.
graph TB
subgraph "Application Entry"
MAIN["cmd/govector/main.go"]
end
subgraph "API Layer"
API_SRV["api/server.go"]
end
subgraph "Core Engine"
CORE_IDX["core/index.go"]
CORE_MODELS["core/models.go"]
CORE_STORAGE["core/storage.go"]
end
subgraph "Testing and Benchmark"
TEST_API["api/server_test.go"]
TEST_CORE["core/collection_test.go"]
BENCH["cmd/bench/main.go"]
end
subgraph "CI/Build"
CI["go.yml"]
MOD["go.mod"]
MK["Makefile"]
end
MAIN --> API_SRV
API_SRV --> CORE_STORAGE
API_SRV --> CORE_MODELS
API_SRV --> CORE_IDX
TEST_API --> API_SRV
TEST_API --> CORE_STORAGE
TEST_CORE --> CORE_STORAGE
BENCH --> CORE_IDX
CI --> TEST_API
CI --> TEST_CORE
MK --> MAIN
MK --> BENCH
MOD --> API_SRV
MOD --> CORE_STORAGE
Core Components¶
- Server: Encapsulates the HTTP server, collection registration, loading of persisted collections, and graceful shutdown.
- Storage: Local persistence based on bbolt; supports reading and writing collection metadata and point data, with an optional quantization switch.
- Data Models: Point structure, filters, match conditions, range conditions, scored points, etc.
- Index Interface: Unified vector index interface supporting insert, search, delete, statistics, etc.
- Benchmark: Generates random vectors, performs batch inserts and queries, and reports build time, average latency, and QPS.
- Tests (api/server_test.go, core/collection_test.go): Cover the collection lifecycle, Upsert/Search/Delete, error paths, and rollback behavior.
Architecture Overview¶
The diagram below shows the overall call chain and responsibility boundaries from command-line entry to API layer, then to core storage and index.
sequenceDiagram
participant CLI as "Command-line(main.go)"
participant API as "API Server(server.go)"
participant COL as "Collection(Collection)"
participant IDX as "Index(VectorIndex)"
participant ST as "Storage(Storage)"
CLI->>API : Initialize and start HTTP service
CLI->>COL : Create/load collection
COL->>ST : Read/write collection metadata and point data
COL->>IDX : Insert/search/delete
API-->>CLI : Provide /collections /points and other endpoints
Detailed Component Analysis¶
Server (Server) Debugging Points¶
- Concurrency safety: Access to the collection map and http.Server is protected by a mutex to avoid race conditions.
- Start/stop flow: Start blocks on ListenAndServe; Stop uses a context to control graceful shutdown.
- Log locations: Startup, shutdown, collection loading, and error returns all emit clear logs, making problems easy to locate.
- Key paths: Input validation and error-code returns in handleCreateCollection/handleUpsert/handleSearch/handleDelete.
flowchart TD
Start(["Enter Start"]) --> LoadMeta["Load collection metadata"]
LoadMeta --> SetupRoutes["Register route handlers"]
SetupRoutes --> Listen["ListenAndServe listen"]
Listen --> Shutdown["Receive signal or Stop call"]
Shutdown --> Graceful["Shutdown(ctx) graceful shutdown"]
Graceful --> End(["Exit"])
Storage (Storage) Debugging Points¶
- Transactions and buckets: Uses bbolt View/Update transactions; each collection maps to its own bucket, and metadata is stored in a dedicated bucket.
- Serialization: Point data is serialized with Protobuf; when quantization is enabled, the compressed vector is stored in the payload.
- Error propagation: Failures to open the database, write, delete, or deserialize all return errors with context.
- Close semantics: Close is idempotent, preventing double-close errors.
flowchart TD
Upsert(["UpsertPoints"]) --> TxBegin["Start update transaction"]
TxBegin --> GetBucket["Get collection bucket"]
GetBucket --> ForEach["Iterate points to write"]
ForEach --> Quantize{"Quantization enabled?"}
Quantize -- Yes --> StoreQuant["Write quantized vector to payload"]
Quantize -- No --> SkipQuant["Use original vector directly"]
StoreQuant --> Marshal["Protobuf encode"]
SkipQuant --> Marshal
Marshal --> Put["Write to bucket"]
Put --> TxCommit["Commit transaction"]
TxCommit --> Done(["Done"])
Data Model and Filter Debugging Points¶
- Payload structure: Key-value pairs supporting multiple value types; matching branches on the value type.
- Condition types: Exact match, range, prefix, contains, and regex; note that unrecognized types fall back to exact match.
- Regex compilation: If compilation fails, the condition is treated as a non-match so the error does not propagate.
flowchart TD
MatchFilter["MatchFilter(payload, filter)"] --> NilFilter{"Filter nil?"}
NilFilter -- Yes --> True["Return true"]
NilFilter -- No --> MustLoop["Iterate must conditions"]
MustLoop --> MustEval["Evaluate matchCondition one by one"]
MustEval --> MustFail{"Any not satisfied?"}
MustFail -- Yes --> False["Return false"]
MustFail -- No --> MustNotLoop["Iterate must_not conditions"]
MustNotLoop --> MustNotEval["Evaluate matchCondition one by one"]
MustNotEval --> MustNotFail{"Any satisfied?"}
MustNotFail -- Yes --> False
MustNotFail -- No --> True
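The flowchart above translates almost directly into code. The sketch below implements only exact and regex conditions (the real model also has range, prefix, and contains), and `Condition`/`Filter` field names are assumptions; the control flow, including the fail-safe handling of a regex that does not compile, matches the description.

```go
package main

import (
	"fmt"
	"regexp"
)

// Condition is an illustrative exact/regex condition; GoVector's real
// model carries more condition types (range, prefix, contains).
type Condition struct {
	Key   string
	Match interface{} // exact-match value
	Regex string      // optional regex pattern
}

type Filter struct {
	Must    []Condition
	MustNot []Condition
}

func matchCondition(payload map[string]interface{}, c Condition) bool {
	v, ok := payload[c.Key]
	if !ok {
		return false
	}
	if c.Regex != "" {
		s, ok := v.(string)
		if !ok {
			return false
		}
		re, err := regexp.Compile(c.Regex)
		if err != nil {
			return false // failed compilation is treated as non-match
		}
		return re.MatchString(s)
	}
	return v == c.Match
}

// MatchFilter follows the flowchart: a nil filter matches everything,
// every must condition has to hold, and no must_not condition may hold.
func MatchFilter(payload map[string]interface{}, f *Filter) bool {
	if f == nil {
		return true
	}
	for _, c := range f.Must {
		if !matchCondition(payload, c) {
			return false
		}
	}
	for _, c := range f.MustNot {
		if matchCondition(payload, c) {
			return false
		}
	}
	return true
}

func main() {
	p := map[string]interface{}{"city": "berlin", "tier": "gold"}
	f := &Filter{
		Must:    []Condition{{Key: "city", Regex: "^ber"}},
		MustNot: []Condition{{Key: "tier", Match: "silver"}},
	}
	fmt.Println(MatchFilter(p, f)) // true
}
```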
Index Interface Debugging Points¶
- Unified interface: Upsert/Search/Delete/Count/GetIDsByFilter/DeleteByFilter.
- Implementation switching: Flat and HNSW are interchangeable behind the interface, making it easy to compare performance and correctness.
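An interface of the shape described above, with a brute-force implementation behind it, is the usual correctness baseline when debugging HNSW results. The method names below follow the list, but the signatures are assumptions, and `flatIndex` is an illustrative stand-in, not GoVector's code:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// VectorIndex approximates the unified interface described above;
// signatures are assumed for illustration.
type VectorIndex interface {
	Upsert(id string, vector []float32) error
	Search(query []float32, limit int) []string
	Delete(id string) error
	Count() int
}

// flatIndex is a brute-force (exact) implementation, useful as a
// correctness baseline when comparing against HNSW.
type flatIndex struct {
	vectors map[string][]float32
}

func newFlatIndex() *flatIndex { return &flatIndex{vectors: map[string][]float32{}} }

func (f *flatIndex) Upsert(id string, v []float32) error { f.vectors[id] = v; return nil }
func (f *flatIndex) Delete(id string) error              { delete(f.vectors, id); return nil }
func (f *flatIndex) Count() int                          { return len(f.vectors) }

// Search scans every vector, sorts by Euclidean distance, and returns
// the closest IDs up to limit.
func (f *flatIndex) Search(q []float32, limit int) []string {
	type scored struct {
		id   string
		dist float64
	}
	var all []scored
	for id, v := range f.vectors {
		var sum float64
		for i := range q {
			d := float64(q[i] - v[i])
			sum += d * d
		}
		all = append(all, scored{id, math.Sqrt(sum)})
	}
	sort.Slice(all, func(i, j int) bool { return all[i].dist < all[j].dist })
	ids := make([]string, 0, limit)
	for i := 0; i < len(all) && i < limit; i++ {
		ids = append(ids, all[i].id)
	}
	return ids
}

func main() {
	var idx VectorIndex = newFlatIndex()
	idx.Upsert("a", []float32{0, 0})
	idx.Upsert("b", []float32{1, 1})
	fmt.Println(idx.Search([]float32{0.1, 0.1}, 1)) // [a]
}
```

Because both implementations sit behind the same interface, a suspected HNSW recall problem can be confirmed by rerunning the same query against the flat index and diffing the result sets.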
Benchmark (bench) Debugging Points¶
- Memory statistics: Prints Alloc/TotalAlloc/Sys/NumGC, making memory growth and GC behavior observable.
- Batch insert: Vectors are generated and inserted in batches to reduce peak memory usage.
- Query statistics: Computes average latency and QPS for side-by-side comparison of Flat and HNSW at different scales.
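The measurements listed above boil down to two small helpers. This is a generic sketch of the technique, not the bench tool's actual code; `printMemStats` and `measure` are illustrative names:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// printMemStats reports the same fields the benchmark prints: Alloc,
// TotalAlloc, Sys, and NumGC, which together show growth and GC activity.
func printMemStats() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("Alloc=%d KB TotalAlloc=%d KB Sys=%d KB NumGC=%d\n",
		m.Alloc/1024, m.TotalAlloc/1024, m.Sys/1024, m.NumGC)
}

// measure runs fn n times and reports average latency and QPS, the two
// figures used to compare Flat and HNSW query performance.
func measure(n int, fn func()) (avg time.Duration, qps float64) {
	start := time.Now()
	for i := 0; i < n; i++ {
		fn()
	}
	total := time.Since(start)
	return total / time.Duration(n), float64(n) / total.Seconds()
}

func main() {
	avg, qps := measure(1000, func() { _ = make([]float32, 128) })
	fmt.Printf("avg=%v qps=%.0f\n", avg, qps)
	printMemStats()
}
```

Printing MemStats before and after a batch insert makes a leak (TotalAlloc growing while Alloc never falls back after GC) visible without attaching a profiler.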
Dependency Analysis¶
- External dependencies: bbolt (persistence), protobuf (serialization), hnsw (indexing).
- Module relationships: api depends on core; cmd/govector depends on api and core; tests depend on api and core.
graph LR
GOV["github.com/DotNetAge/govector"] --> API_PKG["api package"]
GOV --> CORE_PKG["core package"]
API_PKG --> CORE_PKG
CMD_MAIN["cmd/govector/main.go"] --> API_PKG
CMD_MAIN --> CORE_PKG
TEST_API["api/server_test.go"] --> API_PKG
TEST_API --> CORE_PKG
TEST_CORE["core/collection_test.go"] --> CORE_PKG
Performance and Memory Debugging¶
- Using dlv for breakpoint and step-through debugging
  - Set breakpoints at service startup to step through collection loading, index building, and query paths.
  - Set breakpoints on the Upsert/Search/Delete key paths to inspect input parameters and return status.
  - Set breakpoints in the storage layer to verify serialization/deserialization and bbolt transaction execution.
- Using pprof for CPU/memory analysis
  - After starting the service, use go tool pprof to collect CPU and heap profiles and locate hot functions and allocation sites.
  - Combine with the benchmark script to compare different scales and index strategies.
- Memory leak detection
  - Run tests and the service with the -race flag, and use pprof heap profiles to hunt for unreleased references and resident large objects.
  - Pay attention to the lifecycle of point-data loading/unloading and quantization buffers in the storage layer.
Log and Error Analysis¶
- Server logs
  - Startup/shutdown, collection loading, and error returns (e.g., 404/400/500) all have clear log output for quick problem location.
- Error code interpretation
  - 400: Invalid JSON or invalid parameters (e.g., distance metric, vector dimension).
  - 404: Collection does not exist.
  - 409: Collection already exists (on create).
  - 500: Internal error (e.g., index or storage failure).
- Common error scenarios
  - Collection metadata missing or corrupted, causing load failure.
  - Writing after the storage has been closed.
  - Query or insert vector dimension inconsistent with the collection configuration.
Unit Testing and Integration Testing Debugging¶
- Test helpers
  - Use httptest to quickly construct requests and responses, covering create/delete/list/query/delete and other endpoints.
  - Safely inspect collection state through the GetCollection/GetCollectionsMapSize helpers provided by test_helpers.
- Mock objects
  - MockIndex forces collection Upsert to fail, triggering the rollback logic and verifying storage consistency.
- Key debugging points
  - Input validation: invalid JSON, non-existent collection, illegal parameters.
  - Business flow: Count increases after Upsert, Search returns the expected number of results, Delete removes the specified ID or the points matching a filter.
  - Error rollback: when the index layer fails, ensure the storage layer is left with no half-written data.
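The rollback check described above can be sketched with stand-in types. `mockIndex` imitates the role of MockIndex (every Upsert fails), while `collection` here is an illustrative model of the write-then-rollback pattern, not GoVector's actual implementation:

```go
package main

import (
	"errors"
	"fmt"
)

// mockIndex fails every Upsert, imitating the MockIndex used in the
// project's tests to force the error path.
type mockIndex struct{}

func (mockIndex) Upsert(id string, v []float32) error {
	return errors.New("mock index failure")
}

// collection is an illustrative stand-in pairing an index with storage.
type collection struct {
	index   interface{ Upsert(string, []float32) error }
	storage map[string][]float32
}

// Upsert writes to storage first, then rolls the write back if the
// index rejects the point, so a failure leaves no half-written data.
func (c *collection) Upsert(id string, v []float32) error {
	c.storage[id] = v
	if err := c.index.Upsert(id, v); err != nil {
		delete(c.storage, id) // rollback
		return err
	}
	return nil
}

func main() {
	c := &collection{index: mockIndex{}, storage: map[string][]float32{}}
	err := c.Upsert("a", []float32{1, 2})
	fmt.Println(err != nil, len(c.storage)) // true 0
}
```

A test asserting "error returned AND storage count unchanged" after a forced index failure is exactly the consistency check the mock exists for.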