
Typical Application Scenarios

AI Retrieval, Recommendation Systems, Semantic Search

5.1 Application Scenarios Overview

mindmap
  root((Vector Database Applications))
    RAG Retrieval Augmentation
      Enterprise Knowledge Base
      Intelligent Q&A
      Document Summarization
    Recommendation Systems
      E-commerce Recommendations
      Content Recommendations
      Music and Video
    Semantic Search
      Search Engines
      Intelligent Customer Service
      Code Search
    Multi-Modal Retrieval
      Search by Image
      Video Understanding
      Audio/Video Retrieval
    Anomaly Detection
      Financial Risk Control
      Cybersecurity
      Quality Control
    AI Agent
      Memory Storage
      Tool Calling
      Context Management

5.2 RAG (Retrieval-Augmented Generation)

What is RAG

RAG (Retrieval-Augmented Generation) is a technical architecture that combines retrieval systems with large language models. It enables AI to "consult" external knowledge bases and then generate answers based on the retrieved content.

Why RAG is Needed:

| Problem | Description | RAG Solution |
|---------|-------------|--------------|
| Outdated knowledge | LLM training data has a cutoff date | Retrieve the latest knowledge in real time |
| Hallucinations | LLMs may generate incorrect information | Answer based on real documents |
| Private knowledge | LLMs don't know enterprise private data | Retrieve from the enterprise knowledge base |
| Traceability | Cannot verify the source of answers | Provide citation sources |

RAG System Architecture

flowchart TD
    subgraph User Query
        Q["User Question:
'What is the company's annual leave policy?'"]
    end

    subgraph Retrieval Process
        Q -->|1| EQ["Query Vectorization"]
        EQ -->|2| SN["Vector Database Retrieval"]
        SN -->|3| RD["Return Relevant Documents"]
        RD -->|4| CT["Build Prompt"]
    end

    subgraph Generation Process
        CT -->|5| LLM["Large Language Model"]
        LLM -->|6| A["Generate Answer"]
    end

    subgraph Knowledge Base
        SN -.->|"Query"| KB["Document Collection"]
        KB -->|"Store"| CH["Chunked Storage"]
        CH -->|"Vectorize"| KV["Vector Index"]
    end

    style Q fill:#e1f5fe
    style SN fill:#fff3e0
    style LLM fill:#c8e6c9
    style A fill:#c8e6c9
    style KB fill:#ffccbc

RAG Workflow Detailed Explanation

# RAG system implementation example
from sentence_transformers import SentenceTransformer
import vector_db  # Hypothetical vector database client

# 1. Initialize components
model = SentenceTransformer('all-MiniLM-L6-v2')
db = vector_db.Client("localhost:6333")
collection = db.get_collection("company_docs")

# 2. User query
user_query = "What is the company's annual leave policy?"

# 3. Vectorize the query
query_embedding = model.encode(user_query)

# 4. Retrieve relevant documents
results = collection.search(
    vector=query_embedding,
    top_k=5,
    filter={"department": "HR"}  # Optional filter
)

# 5. Build context
context = "\n\n".join([r.content for r in results])

# 6. Build prompt
prompt = f"""Answer the user's question based on the following information. If the information is insufficient, please say you don't know.

Context:
{context}

User Question: {user_query}

Answer:"""

# 7. Call the LLM to generate the answer
# (`llm` is any client exposing a generate() method; a sketch follows below)
response = llm.generate(prompt)
print(response)
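
The `llm` object above is left abstract. As a minimal sketch of what could back it, here is a hypothetical wrapper around the official OpenAI Python client (the SimpleLLM class and the model name are illustrative assumptions, not part of the original example):

# Hypothetical LLM wrapper providing the llm.generate(prompt) interface used above
from openai import OpenAI

class SimpleLLM:
    def __init__(self, model_name="gpt-4o-mini"):  # model name is illustrative
        self.client = OpenAI()  # reads OPENAI_API_KEY from the environment
        self.model_name = model_name

    def generate(self, prompt):
        # Send the RAG prompt as a single user message
        response = self.client.chat.completions.create(
            model=self.model_name,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

llm = SimpleLLM()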

RAG Query Flow Sequence Diagram

sequenceDiagram
    participant User as User
    participant Embed as Embedding Model
    participant VDB as Vector Database
    participant LLM as Large Language Model

    User->>Embed: 1. Send query text
    Embed-->>User: 2. Return query vector

    User->>VDB: 3. Send query vector
    VDB-->>User: 4. Return Top-K similar documents

    User->>LLM: 5. Send documents + question
    LLM-->>User: 6. Return generated answer

Document Chunking Strategies

The effectiveness of RAG largely depends on how documents are chunked:

# Common chunking strategies

# 1. Fixed-size chunking (simplest)
def chunk_by_size(text, chunk_size=500, overlap=50):
    chunks = []
    # Step forward by chunk_size - overlap so consecutive chunks share context
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i:i + chunk_size])
    return chunks

# 2. Chunking by paragraph (maintains semantic integrity)
def chunk_by_paragraph(text):
    paragraphs = text.split('\n\n')
    return [p for p in paragraphs if p.strip()]

# 3. Recursive chunking (more intelligent)
def chunk_recursively(text, max_size=500, delimiters=['\n\n', '\n', '.', '!']):
    # Short enough (or nothing left to split on): keep as a single chunk
    if len(text) <= max_size or not delimiters:
        return [text]
    # Split on the largest delimiter first, then refine oversized pieces
    head, rest = delimiters[0], delimiters[1:]
    chunks = []
    for piece in text.split(head):
        if len(piece) <= max_size:
            if piece.strip():
                chunks.append(piece)
        else:
            chunks.extend(chunk_recursively(piece, max_size, rest))
    return chunks

| Chunking Strategy | Pros | Cons | Use Case |
|-------------------|------|------|----------|
| Fixed size | Simple, controllable | May cut sentences | General scenarios |
| By paragraph | Maintains semantics | Uneven chunk sizes | Structured documents |
| Recursive | Balanced | Complex implementation | Best practice |
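
Once documents are chunked, each chunk is embedded and written to the vector database, matching the "Store -> Vectorize -> Vector Index" path in the architecture diagram above. A minimal ingestion sketch, reusing chunk_by_size from above and the same hypothetical vector_db client as in the RAG example (the insert() payload shape is an assumption):

# Indexing pipeline: chunk -> embed -> store
from sentence_transformers import SentenceTransformer
import vector_db  # hypothetical vector database client, as above

model = SentenceTransformer('all-MiniLM-L6-v2')
collection = vector_db.Client("localhost:6333").get_collection("company_docs")

def index_document(text, source):
    """Chunk a document, embed each chunk, and store it with metadata."""
    for i, chunk in enumerate(chunk_by_size(text, chunk_size=500, overlap=50)):
        collection.insert({
            "vector": model.encode(chunk),
            "content": chunk,
            "metadata": {"source": source, "chunk_id": i},
        })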

5.3 Recommendation Systems

Vector Representation in Recommendation Systems

In recommendation systems, both users and items can be represented as vectors:

# Recommendation system vector representation (conceptual sketch)
user_vector = model.encode("User A's interest features")
item_vectors = model.encode([item["features"] for item in items])  # one vector per item

# Score every item for this user: one dot product per item
scores = np.dot(item_vectors, user_vector)

# Keep the indices of the Top-10 highest-scoring items
top_items = np.argsort(scores)[::-1][:10]

Recommendation System Architecture

flowchart TD
    subgraph User Profile
        U["User A"] -->|Behavior Data| UH["User Vector
[0.8, 0.3, 0.7, ...]"]
    end

    subgraph Item Database
        I1["Product 1"] -->|Features| IV1["[0.9, 0.1, 0.6, ...]"]
        I2["Product 2"] -->|Features| IV2["[0.2, 0.8, 0.4, ...]"]
        I3["Product 3"] -->|Features| IV3["[0.7, 0.4, 0.9, ...]"]
    end

    subgraph Matching Engine
        UH -->|Calculate similarity| MATCH["Vector Dot Product"]
        IV1 --> MATCH
        IV2 --> MATCH
        IV3 --> MATCH
        MATCH -->|Sort| TOP["Top-K Recommendations"]
    end

    style UH fill:#e1f5fe
    style MATCH fill:#fff3e0
    style TOP fill:#c8e6c9

Recommendation System Code Example

# Movie recommendation system example
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer('all-MiniLM-L6-v2')

# Movie database
movies = [
    {"id": 1, "title": "Interstellar", "genre": "Sci-Fi"},
    {"id": 2, "title": "Titanic", "genre": "Romance"},
    {"id": 3, "title": "Inception", "genre": "Sci-Fi"},
    {"id": 4, "title": "Forrest Gump", "genre": "Drama"},
]

# Build movie vectors from title + genre so genre information is captured
movie_vectors = model.encode([f"{m['title']} {m['genre']}" for m in movies])

# User preferences
user_liked = "Sci-fi movies space time travel"
user_vector = model.encode(user_liked)

# Calculate similarity
similarities = np.dot(user_vector, movie_vectors.T)

# Sort and return recommendations
top_indices = np.argsort(similarities)[::-1][:2]
recommendations = [movies[i]["title"] for i in top_indices]

print(f"Recommended movies: {recommendations}")
# Output: ['Interstellar', 'Inception']

5.4 Semantic Search

Keyword Search vs Semantic Search

Keyword search matches the literal query string, while semantic search matches the meaning behind it:

graph LR
    subgraph Keyword Search
        KQ["Query: 'apple'"] --> KM["Match documents containing 'apple'"]
        KM --> KR["'iPhone price drop'
'Apple pie recipe'
'Yantai apple wholesale'"]
    end

    subgraph Semantic Search
        SQ["Query: 'apple'"] --> SE["Understand query intent: fruit? company?"]
        SE --> SR["'iPhone'
'Apple Inc.'
'Red Fuji apple'"]
    end

    style KQ fill:#ffccbc
    style SQ fill:#c8e6c9
    style KR fill:#ffccbc
    style SR fill:#c8e6c9

Semantic Search Code Example

# Semantic search code example
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer('all-MiniLM-L6-v2')

# Document corpus
documents = [
    "Artificial intelligence will change our way of life",
    "Machine learning is an important branch of AI",
    "The weather is really nice today, good for going out",
    "Deep learning has made breakthroughs in image recognition",
    "The latest iPhone uses the A17 chip",
]

# Vectorize documents
doc_vectors = model.encode(documents)

# Semantic search query
query = "What is deep learning?"
query_vector = model.encode(query)

# Calculate similarity
from sklearn.metrics.pairwise import cosine_similarity
similarities = cosine_similarity([query_vector], doc_vectors)[0]

# Return most relevant document
top_idx = np.argsort(similarities)[::-1][0]
print(f"Most relevant document: {documents[top_idx]}")
# Output: 'Deep learning has made breakthroughs in image recognition'

# Cross-language semantic search
# Note: 'all-MiniLM-L6-v2' is English-focused; cross-language retrieval requires
# a multilingual model such as 'paraphrase-multilingual-MiniLM-L12-v2'
multilingual_model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

query_vector = multilingual_model.encode("How to cook pasta?")  # English query

# Chinese documents (shown here in English translation) can also be retrieved
chinese_docs = [
    "How to make spaghetti",
    "What's the weather like in Beijing",
    "Learn Python programming",
]

doc_vectors = multilingual_model.encode(chinese_docs)
# With a multilingual model, semantically similar content scores high across languages

5.5 Multi-Modal Retrieval

What is Multi-Modal Retrieval

Multi-modal retrieval can simultaneously process and associate different types of data (text, images, audio, video), enabling "search by text for images", "search by image for images", and other capabilities.

graph TD
    subgraph Input
        T1["Text: 'An orange cat'"]
        I1["Image: Cat photo"]
        A1["Audio: Cat meowing"]
    end

    subgraph Vector Space
        T2["Text Vector"]
        I2["Image Vector"]
        A2["Audio Vector"]
    end

    subgraph Cross-Modal Retrieval
        T1 -->|CLIP Encode| T2
        I1 -->|CLIP Encode| I2
        A1 -->|"Audio Encoder (e.g. CLAP)"| A2

        T2 -.->|"Similarity"| I2
        T2 -.->|"Similarity"| A2
    end

    style T2 fill:#e1f5fe
    style I2 fill:#fff3e0
    style A2 fill:#c8e6c9

Multi-Modal Retrieval Code Example

# Search by image example
import clip
import numpy as np
import torch
from PIL import Image
from sklearn.metrics.pairwise import cosine_similarity

# Load the CLIP model
model, preprocess = clip.load("ViT-B/32", device="cpu")

# Vectorize an image file into a feature vector
def encode_image(img_path):
    image = preprocess(Image.open(img_path)).unsqueeze(0)
    with torch.no_grad():  # inference only, no gradients needed
        image_features = model.encode_image(image)
    return image_features.numpy()

# Image collection
image_paths = ["cat1.jpg", "cat2.jpg", "dog.jpg"]
image_vectors = np.vstack([encode_image(p) for p in image_paths])  # shape: (3, 512)

# Query image
query_vector = encode_image("query_cat.jpg")  # shape: (1, 512)

# Find the most similar image
similarities = cosine_similarity(query_vector, image_vectors)[0]
most_similar_idx = np.argmax(similarities)
print(f"Most similar image: {image_paths[most_similar_idx]}")

5.6 Anomaly Detection

Vector Databases in Anomaly Detection

flowchart TD
    subgraph Normal Pattern Learning
        D1["Normal data samples"] -->|Cluster| C1["Normal pattern center"]
        D1 -->|Vector representation| V1["Normal vector space"]
    end

    subgraph Anomaly Detection
        N["New data"] -->|Vectorize| NV
        NV -->|Calculate distance| DIST["Distance to normal center"]
        DIST -->|Judge| RESULT{Distance > Threshold?}
        RESULT -->|Yes| ANOMALY["Anomaly"]
        RESULT -->|No| NORMAL["Normal"]
    end

    style C1 fill:#c8e6c9
    style ANOMALY fill:#ffccbc
    style NORMAL fill:#c8e6c9

Anomaly Detection Code Example

# Financial fraud detection example
from sklearn.cluster import KMeans
import numpy as np

# Normal transaction feature vectors
normal_transactions = np.random.rand(1000, 20)  # 1000 normal transactions, 20 dimensions

# Learn normal patterns
kmeans = KMeans(n_clusters=10, random_state=42)
kmeans.fit(normal_transactions)

# Detect new transaction
new_transaction = np.random.rand(1, 20)

# Calculate distance to nearest normal center
distance = np.min(np.linalg.norm(
    new_transaction - kmeans.cluster_centers_,
    axis=1
))

# Judge whether the transaction is anomalous
threshold = 2.5  # illustrative value; see the data-driven alternative below
is_fraud = distance > threshold
print(f"Transaction anomalous: {is_fraud}, distance: {distance:.2f}")

5.7 AI Agent Memory Storage

Agent System Architecture

flowchart LR
    subgraph Agent Core
        P["Planning"]
        M["Memory"]
        A["Action"]
    end

    subgraph Memory Types
        M -->|Short-term| STM["Short-term Memory
Current conversation context"]
        M -->|Long-term| LTM["Long-term Memory
Stored in vector database"]
    end

    subgraph Retrieval Process
        Q["Current task"] -->|Query| VDB["Vector Database"]
        VDB -->|Return relevant memories| A
    end

    style M fill:#e1f5fe
    style VDB fill:#fff3e0
    style P fill:#c8e6c9
    style A fill:#c8e6c9

Agent Memory System Code Example

# AI Agent memory system
class AgentMemory:
    def __init__(self, vector_db, embed_model, llm=None):
        self.db = vector_db
        self.embed = embed_model
        self.llm = llm  # optional LLM client, used by reflect()

    def add_memory(self, content, memory_type="experience"):
        """Add memory"""
        vector = self.embed.encode(content)
        self.db.insert({
            "vector": vector,
            "content": content,
            "type": memory_type
        })

    def retrieve(self, query, top_k=5):
        """Retrieve relevant memories"""
        query_vector = self.embed.encode(query)
        results = self.db.search(query_vector, top_k=top_k)
        return [r["content"] for r in results]

    def reflect(self, recent_experiences):
        """Reflection and summarization: convert short-term experiences into long-term memory"""
        # Assumes self.llm exposes a summarize() method (a hypothetical interface)
        summary = self.llm.summarize(recent_experiences)
        self.add_memory(summary, memory_type="reflection")
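
A short usage sketch wiring the class together, assuming the same hypothetical vector_db client as in the RAG example (the insert()/search() record shapes are assumptions):

# Wiring the memory system together
from sentence_transformers import SentenceTransformer
import vector_db  # hypothetical vector database client, as above

memory = AgentMemory(
    vector_db=vector_db.Client("localhost:6333").get_collection("agent_memory"),
    embed_model=SentenceTransformer('all-MiniLM-L6-v2'),
)

memory.add_memory("User prefers concise answers with code examples")
relevant = memory.retrieve("How should I format my response?", top_k=3)
print(relevant)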