Collection Management

Collections are the primary organizational unit in GoVector, similar to tables in traditional databases. This document explains how to create, configure, and manage collections effectively.

📋 Collection Overview

A collection is a container for vectors with specific configuration parameters. Each collection has its own:

  • Vector configuration (dimension, distance metric)
  • Indexing strategy (HNSW or Flat)
  • Quantization settings (SQ8 compression)
  • Storage location (local directory)

🚀 Creating a Collection

Embedded Mode

import (
    "log"

    "github.com/yourusername/govector/core"
)

// Create a new collection with custom configuration
collection, err := core.NewCollection(core.CollectionConfig{
    Name:       "my-collection",
    VectorLen:  768,             // Vector dimension
    Metric:     core.Cosine,      // Distance metric (Cosine, Euclidean, Dot)
    IndexType:  core.HNSW,        // Index type (HNSW or Flat)
    Quantize:   false,            // Enable/disable SQ8 quantization
    M:          16,               // HNSW M parameter (number of connections per node)
    EfConstruction: 200,          // HNSW construction parameter
    EfSearch:   10,               // HNSW search parameter
})
if err != nil {
    log.Fatalf("Failed to create collection: %v", err)
}

Microservice Mode

curl -X POST http://localhost:6333/collections \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-collection",
    "vectors": {
      "size": 768,
      "distance": "Cosine"
    },
    "hnsw_config": {
      "m": 16,
      "ef_construction": 200,
      "ef": 10
    },
    "quantization_config": {
      "enabled": false
    }
  }'
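
For services written in Go, the same request can be built with the standard library instead of curl. The sketch below only mirrors the JSON body shown above; the Go type and function names are illustrative assumptions, not part of GoVector, and the response format is not documented here, so the example stops at building the request.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// These types mirror the JSON body of the curl example. The Go names are
// illustrative assumptions and not part of GoVector.
type vectorsConfig struct {
	Size     int    `json:"size"`
	Distance string `json:"distance"`
}

type hnswConfig struct {
	M              int `json:"m"`
	EfConstruction int `json:"ef_construction"`
	Ef             int `json:"ef"`
}

type quantizationConfig struct {
	Enabled bool `json:"enabled"`
}

type createCollectionRequest struct {
	Name         string             `json:"name"`
	Vectors      vectorsConfig      `json:"vectors"`
	HNSW         hnswConfig         `json:"hnsw_config"`
	Quantization quantizationConfig `json:"quantization_config"`
}

// newCreateRequest builds (but does not send) POST {baseURL}/collections.
func newCreateRequest(baseURL string, body createCollectionRequest) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, fmt.Errorf("encode body: %w", err)
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/collections", bytes.NewReader(payload))
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newCreateRequest("http://localhost:6333", createCollectionRequest{
		Name:    "my-collection",
		Vectors: vectorsConfig{Size: 768, Distance: "Cosine"},
		HNSW:    hnswConfig{M: 16, EfConstruction: 200, Ef: 10},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	// Send with http.DefaultClient.Do(req) once a server is listening,
	// then close resp.Body and check resp.StatusCode.
}
```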

🔧 Collection Configuration

Core Parameters

| Parameter | Description | Default | Range |
|-----------|-------------|---------|-------|
| VectorLen | Vector dimension | - | 1-∞ |
| Metric | Distance metric | Cosine | Cosine, Euclidean, Dot |
| IndexType | Index type | HNSW | HNSW, Flat |
| Quantize | Enable SQ8 quantization | false | true, false |
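
To give a feel for what SQ8 quantization trades away, here is a minimal sketch of 8-bit scalar quantization. It assumes a simple per-vector min/max scheme; GoVector's actual SQ8 layout is not described in this document and may differ. Each float32 component (4 bytes) becomes one uint8 code (1 byte), at the cost of a small rounding error.

```go
package main

import "fmt"

// quantized holds one vector compressed to 8 bits per component, plus the
// value range needed to map codes back to floats. Hypothetical layout.
type quantized struct {
	Min, Max float32
	Codes    []uint8
}

// quantize maps each component linearly from [min, max] onto 0..255.
func quantize(v []float32) quantized {
	if len(v) == 0 {
		return quantized{}
	}
	lo, hi := v[0], v[0]
	for _, x := range v[1:] {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	q := quantized{Min: lo, Max: hi, Codes: make([]uint8, len(v))}
	if hi == lo {
		return q // constant vector: every code is 0 and decodes to lo
	}
	scale := 255 / (hi - lo)
	for i, x := range v {
		c := (x-lo)*scale + 0.5 // +0.5 rounds to the nearest code
		if c > 255 {
			c = 255
		}
		q.Codes[i] = uint8(c)
	}
	return q
}

// dequantize reconstructs approximate float values from the codes.
func dequantize(q quantized) []float32 {
	out := make([]float32, len(q.Codes))
	step := (q.Max - q.Min) / 255
	for i, c := range q.Codes {
		out[i] = q.Min + float32(c)*step
	}
	return out
}

func main() {
	v := []float32{-1, 0, 0.5, 1}
	q := quantize(v)
	fmt.Println("codes:", q.Codes)
	fmt.Println("restored:", dequantize(q))
}
```

The reconstruction error per component is at most half a quantization step, (max - min) / 510, which is why quantized search usually loses a little accuracy in exchange for the smaller footprint.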

HNSW Index Parameters

| Parameter | Description | Default | Range |
|-----------|-------------|---------|-------|
| M | Number of connections per node | 16 | 4-64 |
| EfConstruction | Size of the dynamic candidate list during construction | 200 | 10-∞ |
| EfSearch | Size of the dynamic candidate list during search | 10 | 10-∞ |
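
The ranges in the two tables above can be checked before a collection is created, so that a bad value fails early with a readable message. The sketch below uses a hypothetical local `config` struct that copies only the field names from `core.CollectionConfig`; whether GoVector performs the same checks internally is not stated here.

```go
package main

import "fmt"

// config is a hypothetical stand-in for core.CollectionConfig, reduced to
// the fields whose ranges are listed in the tables above.
type config struct {
	VectorLen      int
	IndexType      string // "HNSW" or "Flat"
	M              int
	EfConstruction int
	EfSearch       int
}

// validate returns the first range violation it finds, or nil.
func validate(c config) error {
	if c.VectorLen < 1 {
		return fmt.Errorf("VectorLen must be >= 1, got %d", c.VectorLen)
	}
	switch c.IndexType {
	case "Flat":
		return nil // HNSW parameters do not apply to a Flat index
	case "HNSW":
		if c.M < 4 || c.M > 64 {
			return fmt.Errorf("M must be in 4-64, got %d", c.M)
		}
		if c.EfConstruction < 10 {
			return fmt.Errorf("EfConstruction must be >= 10, got %d", c.EfConstruction)
		}
		if c.EfSearch < 10 {
			return fmt.Errorf("EfSearch must be >= 10, got %d", c.EfSearch)
		}
		return nil
	default:
		return fmt.Errorf("unknown IndexType %q", c.IndexType)
	}
}

func main() {
	good := config{VectorLen: 768, IndexType: "HNSW", M: 16, EfConstruction: 200, EfSearch: 10}
	bad := config{VectorLen: 768, IndexType: "HNSW", M: 2, EfConstruction: 200, EfSearch: 10}
	fmt.Println(validate(good))
	fmt.Println(validate(bad))
}
```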

πŸ“ Persistence

Saving a Collection

// Save collection to disk
if err := collection.Save("/path/to/collection"); err != nil {
    log.Fatalf("Failed to save collection: %v", err)
}

Loading a Collection

// Load collection from disk
loadedCollection, err := core.LoadCollection("/path/to/collection")
if err != nil {
    log.Fatalf("Failed to load collection: %v", err)
}

📊 Collection Operations

Upsert Points

// Upsert single point
point := core.PointStruct{
    ID:     "doc_1",
    Vector: []float32{0.1, 0.2, 0.3, /* ... */},
    Payload: core.Payload{
        "category": "tech",
    },
}

if err := collection.Upsert([]core.PointStruct{point}); err != nil {
    log.Fatalf("Failed to upsert point: %v", err)
}

// Upsert multiple points
points := []core.PointStruct{
    // ... multiple points ...
}

if err := collection.Upsert(points); err != nil {
    log.Fatalf("Failed to upsert points: %v", err)
}

Search Points

// Search with query vector
queryVector := []float32{0.1, 0.2, 0.3, /* ... */}
results, err := collection.Search(queryVector, nil, 10)
if err != nil {
    log.Fatalf("Failed to search: %v", err)
}

// Search with filter
filter := &core.Filter{
    Must: []core.Condition{
        {
            Key:   "category",
            Type:  core.MatchTypeExact,
            Match: core.MatchValue{Value: "tech"},
        },
    },
}

results, err = collection.Search(queryVector, filter, 10)
if err != nil {
    log.Fatalf("Failed to search with filter: %v", err)
}
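
To make the filter semantics concrete, the sketch below evaluates a Must filter against a payload, assuming that Must means every condition has to hold and that an exact match compares the stored value for equality. The types are local, simplified stand-ins; they are not GoVector's `core.Filter` or `core.Condition`, and the library's real evaluation logic is not shown in this document.

```go
package main

import (
	"fmt"
	"reflect"
)

// condition and filter are simplified, hypothetical stand-ins for the
// filter types used in the search example above.
type condition struct {
	Key   string
	Value any
}

type filter struct {
	Must []condition
}

// matches reports whether the payload satisfies every Must condition.
// A nil filter matches everything, like passing nil to Search.
func matches(payload map[string]any, f *filter) bool {
	if f == nil {
		return true
	}
	for _, c := range f.Must {
		v, ok := payload[c.Key]
		if !ok || !reflect.DeepEqual(v, c.Value) {
			return false
		}
	}
	return true
}

func main() {
	payload := map[string]any{"category": "tech", "lang": "go"}
	tech := &filter{Must: []condition{{Key: "category", Value: "tech"}}}
	rust := &filter{Must: []condition{
		{Key: "category", Value: "tech"},
		{Key: "lang", Value: "rust"},
	}}
	fmt.Println(matches(payload, tech))
	fmt.Println(matches(payload, rust))
}
```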

Delete Points

// Delete by IDs
ids := []string{"doc_1", "doc_2"}
deleted, err := collection.Delete(ids, nil)
if err != nil {
    log.Fatalf("Failed to delete points: %v", err)
}
fmt.Printf("Deleted %d points\n", deleted)

// Delete by filter
deleted, err = collection.Delete(nil, filter)
if err != nil {
    log.Fatalf("Failed to delete points by filter: %v", err)
}
fmt.Printf("Deleted %d points by filter\n", deleted)

Get Points

// Get points by IDs
ids := []string{"doc_1", "doc_2"}
points, err := collection.Get(ids)
if err != nil {
    log.Fatalf("Failed to get points: %v", err)
}

for _, point := range points {
    fmt.Printf("ID: %s, Vector: %v\n", point.ID, point.Vector)
}

📈 Collection Statistics

Get Collection Info

// Get collection information
info, err := collection.Info()
if err != nil {
    log.Fatalf("Failed to get collection info: %v", err)
}

fmt.Printf("Collection: %s\n", info.Name)
fmt.Printf("Vector length: %d\n", info.VectorLen)
fmt.Printf("Metric: %s\n", info.Metric)
fmt.Printf("Index type: %s\n", info.IndexType)
fmt.Printf("Point count: %d\n", info.PointCount)

Microservice Mode

# Get collection information
curl http://localhost:6333/collections/my-collection

# List all collections
curl http://localhost:6333/collections

πŸ—‘οΈ Deleting a Collection

Embedded Mode

// Deleting a collection in embedded mode removes it from memory only; its files stay on disk.
// To remove it completely, delete the collection's directory after closing the collection.
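
Removing the directory is plain filesystem work, so the standard library is enough. The helper below is a cautious sketch: `removeCollectionDir` is a hypothetical name, it assumes the collection is no longer in use by the process, and it refuses obviously dangerous paths before calling `os.RemoveAll`.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// removeCollectionDir deletes a collection's on-disk directory.
// It is a hypothetical helper, not part of GoVector.
func removeCollectionDir(dir string) error {
	clean := filepath.Clean(dir)
	// Guard against an empty path or the filesystem root.
	if clean == "." || clean == string(filepath.Separator) {
		return fmt.Errorf("refusing to remove %q", dir)
	}
	info, err := os.Stat(clean)
	if errors.Is(err, fs.ErrNotExist) {
		return nil // already gone
	}
	if err != nil {
		return err
	}
	if !info.IsDir() {
		return fmt.Errorf("%q is not a directory", clean)
	}
	return os.RemoveAll(clean)
}

// demo creates a throwaway directory, removes it, and reports whether it is gone.
func demo() (bool, error) {
	dir, err := os.MkdirTemp("", "govector-demo-")
	if err != nil {
		return false, err
	}
	if err := os.WriteFile(filepath.Join(dir, "data.bin"), []byte("x"), 0o600); err != nil {
		os.RemoveAll(dir)
		return false, err
	}
	if err := removeCollectionDir(dir); err != nil {
		return false, err
	}
	_, err = os.Stat(dir)
	return errors.Is(err, fs.ErrNotExist), nil
}

func main() {
	gone, err := demo()
	fmt.Println("removed:", gone, "error:", err)
}
```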

Microservice Mode

# Delete collection
curl -X DELETE http://localhost:6333/collections/my-collection

💡 Best Practices

Collection Design

  • Vector Dimension: Choose the appropriate dimension based on your embedding model (e.g., 768 for BERT-based models)
  • Distance Metric: Use Cosine when only vector direction matters and Euclidean when magnitude is meaningful. On normalized vectors, Cosine, Dot, and Euclidean produce the same ranking
  • Index Selection: Use HNSW for large datasets, Flat for small datasets (< 10,000 points)
  • Quantization: Enable SQ8 for large datasets to reduce memory and storage usage
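
The relationship between the three metrics is easy to verify numerically. The sketch below implements them from their textbook definitions (it is independent of GoVector's internals) and shows that scaling a vector leaves cosine similarity unchanged but changes Euclidean distance, and that for unit vectors the squared Euclidean distance equals 2 - 2·cos.

```go
package main

import (
	"fmt"
	"math"
)

// dot assumes a and b have the same length.
func dot(a, b []float32) float64 {
	var s float64
	for i := range a {
		s += float64(a[i]) * float64(b[i])
	}
	return s
}

func norm(a []float32) float64 { return math.Sqrt(dot(a, a)) }

// cosine returns 0 for zero-length vectors to avoid dividing by zero.
func cosine(a, b []float32) float64 {
	na, nb := norm(a), norm(b)
	if na == 0 || nb == 0 {
		return 0
	}
	return dot(a, b) / (na * nb)
}

func euclidean(a, b []float32) float64 {
	var s float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		s += d * d
	}
	return math.Sqrt(s)
}

// normalize returns a unit-length copy of a (or an unchanged copy if a is zero).
func normalize(a []float32) []float32 {
	out := make([]float32, len(a))
	copy(out, a)
	n := norm(a)
	if n == 0 {
		return out
	}
	for i := range out {
		out[i] = float32(float64(out[i]) / n)
	}
	return out
}

// approx reports whether x and y differ by at most tol.
func approx(x, y, tol float64) bool { return math.Abs(x-y) <= tol }

func main() {
	a, b := []float32{3, 4}, []float32{4, 3}
	scaled := []float32{6, 8} // same direction as a, twice the length

	fmt.Println("cosine(a, scaled):   ", cosine(a, scaled))
	fmt.Println("euclidean(a, scaled):", euclidean(a, scaled))

	ua, ub := normalize(a), normalize(b)
	d := euclidean(ua, ub)
	fmt.Println("unit vectors: dot =", dot(ua, ub), "cosine =", cosine(ua, ub))
	fmt.Println("d*d =", d*d, "2-2cos =", 2-2*cosine(ua, ub))
}
```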

Performance Optimization

  • Batch Operations: Use batch upserts for multiple points to improve performance
  • Index Parameters: Adjust HNSW parameters based on your dataset size:
      • Small datasets (10,000-100,000 points): M=12, EfConstruction=100
      • Medium datasets (100,000-1,000,000 points): M=16, EfConstruction=200
      • Large datasets (1,000,000+ points): M=24, EfConstruction=400
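
Batching usually comes down to splitting a large slice into fixed-size chunks and passing each chunk to one Upsert call. A minimal generic helper might look like the sketch below; the `chunk` name and the batch size of 10 are illustrative assumptions, and a suitable batch size for GoVector is not specified in this document.

```go
package main

import "fmt"

// chunk splits items into consecutive batches of at most size elements.
// The batches share memory with items; it returns nil if size <= 0.
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	var out [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}

func main() {
	ids := make([]int, 25)
	for i := range ids {
		ids[i] = i
	}
	for i, batch := range chunk(ids, 10) {
		// In real code each batch would go to one collection.Upsert call.
		fmt.Printf("batch %d: %d items\n", i, len(batch))
	}
}
```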

Storage Management

  • Backup: Regularly backup collection directories
  • Disk Space: Monitor disk usage, especially for large collections
  • Loading Time: Large collections may take time to load due to index rebuilding
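
Disk usage of a collection directory can be monitored with a short walk over its files. The sketch below sums the sizes of regular files under a root directory; the file names used in the demo are made up and say nothing about GoVector's actual on-disk layout.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// dirSize returns the total size in bytes of all regular files under root.
func dirSize(root string) (int64, error) {
	var total int64
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type().IsRegular() {
			info, err := d.Info()
			if err != nil {
				return err
			}
			total += info.Size()
		}
		return nil
	})
	return total, err
}

// demoDirSize builds a throwaway directory with 15 bytes of data and measures it.
func demoDirSize() (int64, error) {
	dir, err := os.MkdirTemp("", "govector-size-")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	// Hypothetical file names, chosen only for the demo.
	if err := os.WriteFile(filepath.Join(dir, "index.bin"), make([]byte, 10), 0o600); err != nil {
		return 0, err
	}
	sub := filepath.Join(dir, "segments")
	if err := os.MkdirAll(sub, 0o755); err != nil {
		return 0, err
	}
	if err := os.WriteFile(filepath.Join(sub, "points.bin"), make([]byte, 5), 0o600); err != nil {
		return 0, err
	}
	return dirSize(dir)
}

func main() {
	n, err := demoDirSize()
	fmt.Println("bytes:", n, "error:", err)
}
```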

🚩 Common Issues

Dimension Mismatch

  • Error: "dimension mismatch"
  • Solution: Ensure all vectors have the same dimension as the collection configuration
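
A cheap guard against this error is to check vector lengths before calling Upsert. The helper below is a hypothetical pre-check (it is not a GoVector API) that reports the index of the first offending vector.

```go
package main

import "fmt"

// checkDimensions returns an error naming the first vector whose length
// differs from want, or nil if all vectors match.
func checkDimensions(vectors [][]float32, want int) error {
	for i, v := range vectors {
		if len(v) != want {
			return fmt.Errorf("vector %d: dimension mismatch: got %d, want %d", i, len(v), want)
		}
	}
	return nil
}

func main() {
	vectors := [][]float32{
		{0.1, 0.2, 0.3},
		{0.4, 0.5}, // too short for a 3-dimensional collection
	}
	if err := checkDimensions(vectors, 3); err != nil {
		fmt.Println("rejecting batch:", err)
	}
}
```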

Out of Memory

  • Error: "out of memory"
  • Solution: Enable SQ8 quantization, use Flat index for small datasets, or reduce collection size

Slow Search Performance

  • Cause: Inappropriate HNSW parameters or large result sets
  • Solution: Lower EfSearch (a larger candidate list improves recall but makes each query slower), reduce topK, or use more selective filters