Collection Management
Collections are the primary organizational unit in GoVector, similar to tables in traditional databases. This document explains how to create, configure, and manage collections effectively.
Collection Overview
A collection is a container for vectors with specific configuration parameters. Each collection has its own:
- Vector configuration (dimension, distance metric)
- Indexing strategy (HNSW or Flat)
- Quantization settings (SQ8 compression)
- Storage location (local directory)
Creating a Collection
Embedded Mode
```go
package main

import (
	"log"

	"github.com/yourusername/govector/core"
)

func main() {
	// Create a new collection with a custom configuration
	collection, err := core.NewCollection(core.CollectionConfig{
		Name:           "my-collection",
		VectorLen:      768,         // Vector dimension
		Metric:         core.Cosine, // Distance metric (Cosine, Euclidean, Dot)
		IndexType:      core.HNSW,   // Index type (HNSW or Flat)
		Quantize:       false,       // Enable/disable SQ8 quantization
		M:              16,          // HNSW M parameter (connections per node)
		EfConstruction: 200,         // HNSW candidate-list size during construction
		EfSearch:       10,          // HNSW candidate-list size during search
	})
	if err != nil {
		log.Fatalf("Failed to create collection: %v", err)
	}
	_ = collection // used by the operations in the following sections
}
```
Microservice Mode
```bash
curl -X POST http://localhost:6333/collections \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-collection",
    "vectors": {
      "size": 768,
      "distance": "Cosine"
    },
    "hnsw_config": {
      "m": 16,
      "ef_construction": 200,
      "ef": 10
    },
    "quantization_config": {
      "enabled": false
    }
  }'
```
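The same request can be sent from Go. The sketch below builds the JSON body with encoding/json; the struct names (CreateCollectionRequest, VectorParams, HNSWConfig, QuantizationConfig) and the buildCreateBody helper are hypothetical and simply mirror the keys in the curl example, so check them against the microservice's actual schema before relying on them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical request types mirroring the JSON body of the curl example.
type VectorParams struct {
	Size     int    `json:"size"`
	Distance string `json:"distance"`
}

type HNSWConfig struct {
	M              int `json:"m"`
	EfConstruction int `json:"ef_construction"`
	Ef             int `json:"ef"`
}

type QuantizationConfig struct {
	Enabled bool `json:"enabled"`
}

type CreateCollectionRequest struct {
	Name               string             `json:"name"`
	Vectors            VectorParams       `json:"vectors"`
	HNSWConfig         HNSWConfig         `json:"hnsw_config"`
	QuantizationConfig QuantizationConfig `json:"quantization_config"`
}

// buildCreateBody serializes the create-collection request to JSON
// (hypothetical helper, not part of GoVector).
func buildCreateBody(req CreateCollectionRequest) ([]byte, error) {
	return json.Marshal(req)
}

func main() {
	body, err := buildCreateBody(CreateCollectionRequest{
		Name:               "my-collection",
		Vectors:            VectorParams{Size: 768, Distance: "Cosine"},
		HNSWConfig:         HNSWConfig{M: 16, EfConstruction: 200, Ef: 10},
		QuantizationConfig: QuantizationConfig{Enabled: false},
	})
	if err != nil {
		panic(err)
	}
	// POST this body to http://localhost:6333/collections with
	// Content-Type: application/json (for example with http.Post).
	fmt.Println(string(body))
}
```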
Collection Configuration
Core Parameters
| Parameter | Description | Default | Allowed values |
|---|---|---|---|
| VectorLen | Vector dimension | none | ≥ 1 |
| Metric | Distance metric | Cosine | Cosine, Euclidean, Dot |
| IndexType | Index type | HNSW | HNSW, Flat |
| Quantize | Enable SQ8 quantization | false | true, false |
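For reference, the three Metric values correspond to the standard vector comparisons sketched below. This is an illustration, not GoVector's implementation: whether GoVector reports these values as similarities (higher is better) or converts them to distances is an open assumption here, and the helper names dot, euclidean, and cosine are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// dot returns the dot product of a and b (hypothetical helper; assumes equal length).
func dot(a, b []float32) float64 {
	var s float64
	for i := range a {
		s += float64(a[i]) * float64(b[i])
	}
	return s
}

// euclidean returns the L2 (straight-line) distance between a and b.
func euclidean(a, b []float32) float64 {
	var s float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		s += d * d
	}
	return math.Sqrt(s)
}

// cosine returns the cosine of the angle between a and b,
// or 0 if either vector is all zeros.
func cosine(a, b []float32) float64 {
	na, nb := math.Sqrt(dot(a, a)), math.Sqrt(dot(b, b))
	if na == 0 || nb == 0 {
		return 0
	}
	return dot(a, b) / (na * nb)
}

func main() {
	a := []float32{1, 0}
	b := []float32{3, 4}
	fmt.Println(dot(a, b))       // 3
	fmt.Println(euclidean(a, b)) // about 4.472 (sqrt of 20)
	fmt.Println(cosine(a, b))    // 0.6
}
```

Cosine ignores vector length and compares direction only; Dot and Euclidean are both sensitive to magnitude.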
HNSW Index Parameters
| Parameter | Description | Default | Allowed values |
|---|---|---|---|
| M | Number of connections per node | 16 | 4-64 |
| EfConstruction | Size of the dynamic candidate list during index construction | 200 | ≥ 10 |
| EfSearch | Size of the dynamic candidate list during search | 10 | ≥ 10 |
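A configuration can be checked against these ranges before the collection is created. The hnswParams type and validate function below are hypothetical stand-ins for the corresponding core.CollectionConfig fields; GoVector may perform its own, possibly different, validation.

```go
package main

import (
	"errors"
	"fmt"
)

// hnswParams is a hypothetical stand-in for the relevant core.CollectionConfig fields.
type hnswParams struct {
	VectorLen      int
	M              int
	EfConstruction int
	EfSearch       int
}

// validate checks each field against the documented range and reports every
// violation at once. errors.Join (Go 1.20+) returns nil when errs is empty.
func validate(p hnswParams) error {
	var errs []error
	if p.VectorLen < 1 {
		errs = append(errs, fmt.Errorf("VectorLen must be >= 1, got %d", p.VectorLen))
	}
	if p.M < 4 || p.M > 64 {
		errs = append(errs, fmt.Errorf("M must be between 4 and 64, got %d", p.M))
	}
	if p.EfConstruction < 10 {
		errs = append(errs, fmt.Errorf("EfConstruction must be >= 10, got %d", p.EfConstruction))
	}
	if p.EfSearch < 10 {
		errs = append(errs, fmt.Errorf("EfSearch must be >= 10, got %d", p.EfSearch))
	}
	return errors.Join(errs...)
}

func main() {
	fmt.Println(validate(hnswParams{VectorLen: 768, M: 16, EfConstruction: 200, EfSearch: 10})) // <nil>
	fmt.Println(validate(hnswParams{VectorLen: 0, M: 2, EfConstruction: 200, EfSearch: 10}))
}
```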
Persistence
Saving a Collection
```go
// Save the collection to disk
if err := collection.Save("/path/to/collection"); err != nil {
	log.Fatalf("Failed to save collection: %v", err)
}
```
Loading a Collection
```go
// Load the collection from disk
loadedCollection, err := core.LoadCollection("/path/to/collection")
if err != nil {
	log.Fatalf("Failed to load collection: %v", err)
}
```
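The on-disk format used by Save and LoadCollection is not described on this page (see Storage Engine). To illustrate the save/load contract only, the sketch below round-trips a hypothetical snapshot type through encoding/gob. It writes to a temporary file and renames it into place so that an interrupted save does not leave a half-written file behind; whether GoVector does the same is an assumption, and save, load, and roundTrip are hypothetical names.

```go
package main

import (
	"encoding/gob"
	"fmt"
	"os"
	"path/filepath"
)

// snapshot is a hypothetical on-disk representation, not GoVector's format.
type snapshot struct {
	Name      string
	VectorLen int
	Vectors   map[string][]float32
}

// save writes s to dir/points.gob via a temp file plus rename, so readers
// never see a partially written snapshot.
func save(dir string, s snapshot) error {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	tmp, err := os.CreateTemp(dir, "points-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // cleans up on failure; its error is ignored after a successful rename
	if err := gob.NewEncoder(tmp).Encode(s); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), filepath.Join(dir, "points.gob"))
}

// load reads a snapshot previously written by save.
func load(dir string) (snapshot, error) {
	var s snapshot
	f, err := os.Open(filepath.Join(dir, "points.gob"))
	if err != nil {
		return s, err
	}
	defer f.Close()
	err = gob.NewDecoder(f).Decode(&s)
	return s, err
}

// roundTrip saves s into a fresh temporary directory and loads it back.
func roundTrip(s snapshot) (snapshot, error) {
	dir, err := os.MkdirTemp("", "govector-demo")
	if err != nil {
		return snapshot{}, err
	}
	defer os.RemoveAll(dir)
	if err := save(dir, s); err != nil {
		return snapshot{}, err
	}
	return load(dir)
}

func main() {
	out, err := roundTrip(snapshot{
		Name:      "my-collection",
		VectorLen: 2,
		Vectors:   map[string][]float32{"doc_1": {0.1, 0.2}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out.Name, len(out.Vectors)) // my-collection 1
}
```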
Collection Operations
Upsert Points
```go
// Upsert a single point. The vector must have exactly VectorLen (here 768) components.
point := core.PointStruct{
	ID:     "doc_1",
	Vector: []float32{0.1, 0.2, 0.3 /* ... */},
	Payload: core.Payload{
		"category": "tech",
	},
}
if err := collection.Upsert([]core.PointStruct{point}); err != nil {
	log.Fatalf("Failed to upsert point: %v", err)
}

// Upsert multiple points in one call
points := []core.PointStruct{
	// ... multiple points ...
}
if err := collection.Upsert(points); err != nil {
	log.Fatalf("Failed to upsert points: %v", err)
}
```
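Upsert means insert-or-replace: a point whose ID already exists is overwritten rather than duplicated. The toy in-memory store below (point, store, newStore, and upsert are hypothetical names, not GoVector types) shows that behavior together with the dimension check behind the "dimension mismatch" error listed under Common Issues. Validating the whole batch before writing anything is an assumed all-or-nothing policy; this page does not say how GoVector handles a batch containing one bad vector.

```go
package main

import "fmt"

// point is a hypothetical, simplified stand-in for core.PointStruct.
type point struct {
	ID      string
	Vector  []float32
	Payload map[string]any
}

// store is a toy collection keyed by point ID (not a GoVector type).
type store struct {
	dim    int
	points map[string]point
}

func newStore(dim int) *store {
	return &store{dim: dim, points: make(map[string]point)}
}

// upsert validates every vector first, then inserts new IDs and replaces
// existing ones. Validating up front means a bad batch changes nothing.
func (s *store) upsert(batch []point) error {
	for _, p := range batch {
		if len(p.Vector) != s.dim {
			return fmt.Errorf("dimension mismatch for %q: expected %d, got %d", p.ID, s.dim, len(p.Vector))
		}
	}
	for _, p := range batch {
		s.points[p.ID] = p
	}
	return nil
}

func main() {
	s := newStore(3)
	_ = s.upsert([]point{{ID: "doc_1", Vector: []float32{0.1, 0.2, 0.3}, Payload: map[string]any{"category": "tech"}}})
	// Same ID again: the existing point is replaced, not duplicated.
	_ = s.upsert([]point{{ID: "doc_1", Vector: []float32{0.4, 0.5, 0.6}}})
	fmt.Println(len(s.points), s.points["doc_1"].Vector) // 1 [0.4 0.5 0.6]
	fmt.Println(s.upsert([]point{{ID: "doc_2", Vector: []float32{1, 2}}}))
}
```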
Search Points
```go
// Search with a query vector (nil: no filter; 10: number of results to return, topK)
queryVector := []float32{0.1, 0.2, 0.3 /* ... */}
results, err := collection.Search(queryVector, nil, 10)
if err != nil {
	log.Fatalf("Failed to search: %v", err)
}

// Search with a filter: only points whose "category" payload is exactly "tech"
filter := &core.Filter{
	Must: []core.Condition{
		{
			Key:   "category",
			Type:  core.MatchTypeExact,
			Match: core.MatchValue{Value: "tech"},
		},
	},
}
results, err = collection.Search(queryVector, filter, 10)
if err != nil {
	log.Fatalf("Failed to search with filter: %v", err)
}
```
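Conceptually, a search with the Flat index is an exhaustive scan: score every point that passes the filter and keep the topK best. The sketch below shows this for a set of exact-match conditions that must all hold (like a Must clause with MatchTypeExact) and cosine similarity. Payload values are simplified to strings, filtering before scoring is assumed, and point, hit, matches, and flatSearch are hypothetical names; an HNSW index replaces the full scan with a graph traversal.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// point and hit are hypothetical, simplified types (not GoVector's).
type point struct {
	ID      string
	Vector  []float32
	Payload map[string]string
}

type hit struct {
	ID    string
	Score float64
}

// cosine returns the cosine similarity of a and b (0 if either is all zeros).
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// matches reports whether every key/value pair in filter appears in payload,
// i.e. all exact-match conditions of a Must clause hold.
func matches(payload, filter map[string]string) bool {
	for k, v := range filter {
		if payload[k] != v {
			return false
		}
	}
	return true
}

// flatSearch scores every point that passes the filter (nil matches all)
// and returns at most topK hits, best first.
func flatSearch(points []point, query []float32, filter map[string]string, topK int) []hit {
	var hits []hit
	for _, p := range points {
		if matches(p.Payload, filter) {
			hits = append(hits, hit{ID: p.ID, Score: cosine(query, p.Vector)})
		}
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].Score > hits[j].Score })
	if topK >= 0 && len(hits) > topK {
		hits = hits[:topK]
	}
	return hits
}

func main() {
	pts := []point{
		{ID: "doc_1", Vector: []float32{1, 0}, Payload: map[string]string{"category": "tech"}},
		{ID: "doc_2", Vector: []float32{0, 1}, Payload: map[string]string{"category": "tech"}},
		{ID: "doc_3", Vector: []float32{1, 0.1}, Payload: map[string]string{"category": "sports"}},
	}
	query := []float32{1, 0}
	fmt.Println(flatSearch(pts, query, nil, 2))                                   // doc_1, then doc_3
	fmt.Println(flatSearch(pts, query, map[string]string{"category": "tech"}, 2)) // doc_1, then doc_2
}
```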
Delete Points
```go
// Delete by IDs
ids := []string{"doc_1", "doc_2"}
deleted, err := collection.Delete(ids, nil)
if err != nil {
	log.Fatalf("Failed to delete points: %v", err)
}
fmt.Printf("Deleted %d points\n", deleted)

// Delete by filter (reuses the filter defined in the search example)
deleted, err = collection.Delete(nil, filter)
if err != nil {
	log.Fatalf("Failed to delete points by filter: %v", err)
}
fmt.Printf("Deleted %d points by filter\n", deleted)
```
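The sketch below models the two deletion modes on a plain map. Three behaviors are assumptions rather than documented GoVector semantics: unknown IDs are skipped instead of causing an error, passing both IDs and a filter deletes the union of the two, and an empty filter deletes nothing (a deliberate guard against wiping the whole collection). The point type and the matches and deletePoints functions are hypothetical.

```go
package main

import "fmt"

// point is a hypothetical stand-in; only the payload matters for deletion.
type point struct {
	Payload map[string]string
}

// matches reports whether p's payload contains every key/value pair in filter.
func matches(p point, filter map[string]string) bool {
	for k, v := range filter {
		if p.Payload[k] != v {
			return false
		}
	}
	return true
}

// deletePoints removes the points listed in ids plus, if filter is non-empty,
// every point whose payload matches it. Unknown IDs are skipped, not errors.
// It returns the number of points actually removed.
func deletePoints(points map[string]point, ids []string, filter map[string]string) int {
	deleted := 0
	for _, id := range ids {
		if _, ok := points[id]; ok {
			delete(points, id)
			deleted++
		}
	}
	if len(filter) > 0 { // an empty filter matches nothing here, never "everything"
		for id, p := range points {
			if matches(p, filter) {
				delete(points, id) // deleting during range is allowed in Go
				deleted++
			}
		}
	}
	return deleted
}

func main() {
	pts := map[string]point{
		"doc_1": {Payload: map[string]string{"category": "tech"}},
		"doc_2": {Payload: map[string]string{"category": "tech"}},
		"doc_3": {Payload: map[string]string{"category": "sports"}},
	}
	fmt.Println(deletePoints(pts, []string{"doc_1", "missing"}, nil))          // 1
	fmt.Println(deletePoints(pts, nil, map[string]string{"category": "tech"})) // 1
	fmt.Println(len(pts))                                                      // 1
}
```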
Get Points
```go
// Get points by IDs
ids := []string{"doc_1", "doc_2"}
points, err := collection.Get(ids)
if err != nil {
	log.Fatalf("Failed to get points: %v", err)
}
for _, point := range points {
	fmt.Printf("ID: %s, Vector: %v\n", point.ID, point.Vector)
}
```
Collection Statistics
Get Collection Info
```go
// Get collection information
info, err := collection.Info()
if err != nil {
	log.Fatalf("Failed to get collection info: %v", err)
}
fmt.Printf("Collection: %s\n", info.Name)
fmt.Printf("Vector length: %d\n", info.VectorLen)
fmt.Printf("Metric: %v\n", info.Metric)       // %v works whether Metric is a string or an enum type
fmt.Printf("Index type: %v\n", info.IndexType) // likewise for IndexType
fmt.Printf("Point count: %d\n", info.PointCount)
```
Microservice Mode
```bash
# Get collection information
curl http://localhost:6333/collections/my-collection

# List all collections
curl http://localhost:6333/collections
```
Deleting a Collection
Embedded Mode
Deleting a collection removes it from memory but not from disk. To remove it completely, delete its directory after closing the collection.
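A minimal sketch of that last step, assuming the collection was saved with collection.Save to a directory of its own and is no longer open. removeCollectionDir is a hypothetical helper that wraps os.RemoveAll and refuses obviously dangerous paths such as the current directory or a filesystem root.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeCollectionDir deletes a saved collection directory and everything in it
// (hypothetical helper). Call it only after the collection has been closed.
func removeCollectionDir(dir string) error {
	c := filepath.Clean(dir) // "" cleans to "."
	if filepath.Dir(c) == c { // true for ".", "/" and volume roots such as C:\
		return fmt.Errorf("refusing to remove %q", dir)
	}
	return os.RemoveAll(c) // returns nil if the directory is already gone
}

func main() {
	dir, err := os.MkdirTemp("", "my-collection")
	if err != nil {
		panic(err)
	}
	if err := os.WriteFile(filepath.Join(dir, "data.bin"), []byte("x"), 0o644); err != nil {
		panic(err)
	}
	if err := removeCollectionDir(dir); err != nil {
		panic(err)
	}
	_, err = os.Stat(dir)
	fmt.Println(os.IsNotExist(err)) // true
}
```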
Microservice Mode
Best Practices
Collection Design
- Vector Dimension: Choose the appropriate dimension based on your embedding model (e.g., 768 for BERT-based models)
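- Distance Metric: Use Cosine when only the direction of a vector matters, as is typical for text embeddings; for vectors already normalized to unit length, Dot gives the same ranking as Cosine. Use Euclidean when the absolute distance between vectors matters.
- Index Selection: Use HNSW (approximate search) for large datasets and Flat (exact, exhaustive search) for small datasets (< 10,000 points)
- Quantization: Enable SQ8 for large datasets to reduce memory and storage usage, at the cost of some precision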
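SQ8 (8-bit scalar quantization) generally stores each float32 component as a single byte instead of four, shrinking vector data roughly 4x at the cost of a small approximation error. The sketch below assumes one common variant, a per-vector minimum and step size; GoVector's exact scheme (per-vector or per-dimension ranges, and whether full-precision vectors are kept for rescoring) is not documented here, and the sq8, quantize, and dequantize names are hypothetical. Each component is reconstructed to within about half a step of its original value.

```go
package main

import "fmt"

// sq8 is a hypothetical container for one 8-bit quantized vector (not a GoVector type).
type sq8 struct {
	Min   float32 // smallest component of the original vector
	Scale float32 // value of one code step: (max - min) / 255
	Codes []uint8 // one byte per component
}

// quantize maps each component linearly from [min, max] onto the codes 0..255.
func quantize(v []float32) sq8 {
	if len(v) == 0 {
		return sq8{}
	}
	lo, hi := v[0], v[0]
	for _, x := range v[1:] {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	scale := (hi - lo) / 255
	codes := make([]uint8, len(v))
	if scale > 0 { // a constant vector keeps all codes at 0
		for i, x := range v {
			codes[i] = uint8((x-lo)/scale + 0.5) // round to the nearest code
		}
	}
	return sq8{Min: lo, Scale: scale, Codes: codes}
}

// dequantize reconstructs approximate float32 values from the codes.
func (q sq8) dequantize() []float32 {
	out := make([]float32, len(q.Codes))
	for i, c := range q.Codes {
		out[i] = q.Min + float32(c)*q.Scale
	}
	return out
}

func main() {
	v := []float32{-1, 0, 0.5, 1}
	q := quantize(v)
	fmt.Println(q.Codes)
	fmt.Println(q.dequantize())
	fmt.Printf("%d bytes as float32, %d bytes as codes (+8 for Min and Scale)\n", len(v)*4, len(q.Codes))
}
```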
Performance Optimization
- Batch Operations: Upsert many points per call instead of one point at a time to improve performance
- Index Parameters: Adjust HNSW parameters based on your dataset size:
  - Small datasets (10,000-100,000 points): M=12, EfConstruction=100
  - Medium datasets (100,000-1,000,000 points): M=16, EfConstruction=200
  - Large datasets (1,000,000+ points): M=24, EfConstruction=400
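The dataset-size guidance above, together with the Flat recommendation for datasets under 10,000 points, can be encoded as a small helper. recommend and indexChoice are hypothetical names, and because the listed ranges share their endpoints, the sketch assigns exactly 100,000 and 1,000,000 points to the larger tier; treat the output as a starting point to tune, not a rule.

```go
package main

import "fmt"

// indexChoice is a hypothetical summary of the sizing guidance (not a GoVector type).
type indexChoice struct {
	Index          string // "Flat" or "HNSW"
	M              int    // 0 when Index is "Flat"
	EfConstruction int    // 0 when Index is "Flat"
}

// recommend maps an expected point count to the starting values listed above.
func recommend(points int) indexChoice {
	switch {
	case points < 10_000:
		return indexChoice{Index: "Flat"}
	case points < 100_000:
		return indexChoice{Index: "HNSW", M: 12, EfConstruction: 100}
	case points < 1_000_000:
		return indexChoice{Index: "HNSW", M: 16, EfConstruction: 200}
	default:
		return indexChoice{Index: "HNSW", M: 24, EfConstruction: 400}
	}
}

func main() {
	for _, n := range []int{5_000, 50_000, 500_000, 5_000_000} {
		fmt.Printf("%d points: %+v\n", n, recommend(n))
	}
}
```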
Storage Management
- Backup: Regularly back up collection directories
- Disk Space: Monitor disk usage, especially for large collections
- Loading Time: Large collections may take time to load due to index rebuilding
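One way to back up a collection is to copy its saved directory while no save is in progress. The sketch below uses only the standard library (filepath.WalkDir and io.Copy) and streams each file instead of reading it into memory; backupDir is a hypothetical helper, and the directory layout created in demo is made up for illustration, since this page does not describe which files a collection directory contains.

```go
package main

import (
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// backupDir copies every file under src into dst, preserving relative paths
// (hypothetical helper). Contents are streamed, so large files stay off the heap.
func backupDir(src, dst string) error {
	return filepath.WalkDir(src, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(src, path)
		if err != nil {
			return err
		}
		target := filepath.Join(dst, rel)
		if d.IsDir() {
			return os.MkdirAll(target, 0o755)
		}
		in, err := os.Open(path)
		if err != nil {
			return err
		}
		defer in.Close()
		out, err := os.Create(target)
		if err != nil {
			return err
		}
		if _, err := io.Copy(out, in); err != nil {
			out.Close()
			return err
		}
		return out.Close()
	})
}

// demo builds a throwaway "collection" directory (made-up layout), backs it
// up, and returns the content of the copied file.
func demo() (string, error) {
	root, err := os.MkdirTemp("", "govector-backup")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)
	src := filepath.Join(root, "my-collection")
	if err := os.MkdirAll(filepath.Join(src, "index"), 0o755); err != nil {
		return "", err
	}
	if err := os.WriteFile(filepath.Join(src, "index", "graph.bin"), []byte("hnsw"), 0o644); err != nil {
		return "", err
	}
	dst := filepath.Join(root, "backup")
	if err := backupDir(src, dst); err != nil {
		return "", err
	}
	b, err := os.ReadFile(filepath.Join(dst, "index", "graph.bin"))
	return string(b), err
}

func main() {
	got, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // hnsw
}
```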
Common Issues
Dimension Mismatch
- Error: "dimension mismatch"
- Solution: Ensure all vectors have the same dimension as the collection configuration
Out of Memory
- Error: "out of memory"
- Solution: Enable SQ8 quantization, use a Flat index for small datasets (it keeps no graph links in memory), or reduce the collection size
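To judge whether SQ8 will help, a back-of-envelope estimate is often enough. The sketch below is not GoVector's own accounting: it assumes 4 bytes per float32 component (1 byte with SQ8) plus about 2*M neighbor IDs of 4 bytes each on the HNSW base layer, and it ignores payloads, upper graph layers, and allocator overhead. estimateBytes is a hypothetical helper. For 1,000,000 vectors of dimension 768 with M=16, it gives about 3.2 GB without quantization and about 0.9 GB with it.

```go
package main

import "fmt"

// estimateBytes is a hypothetical rough memory estimate for n vectors of
// dimension dim in an HNSW index with parameter m. Assumptions:
//   - float32 components take 4 bytes; SQ8 codes take 1 byte
//   - the base graph layer stores about 2*m neighbor IDs of 4 bytes each
//   - payloads, upper graph layers, and per-vector metadata are ignored
func estimateBytes(n, dim, m int, sq8 bool) int64 {
	perComponent := int64(4)
	if sq8 {
		perComponent = 1
	}
	perVector := int64(dim)*perComponent + int64(2*m)*4
	return int64(n) * perVector
}

func main() {
	const n, dim, m = 1_000_000, 768, 16
	fmt.Printf("float32: %.2f GB\n", float64(estimateBytes(n, dim, m, false))/1e9) // float32: 3.20 GB
	fmt.Printf("SQ8:     %.2f GB\n", float64(estimateBytes(n, dim, m, true))/1e9)  // SQ8:     0.90 GB
}
```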
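Slow Search Performance
- Cause: Inappropriate HNSW parameters or large result sets
- Solution: Lower EfSearch to trade some recall for speed (keep it at least as large as topK, since HNSW search generally needs that many candidates), reduce topK, or use more selective filters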
Related Documentation
- Data Model - Core data structures
- HNSW Index - HNSW index implementation
- Storage Engine - Persistence and serialization
- Usage Modes - Different ways to use GoVector