Typical Application Scenarios¶
AI Retrieval, Recommendation Systems, Semantic Search
5.1 Application Scenarios Overview¶
mindmap
  root((Vector Database Applications))
    RAG Retrieval Augmentation
      Enterprise Knowledge Base
      Intelligent Q&A
      Document Summarization
    Recommendation Systems
      E-commerce Recommendations
      Content Recommendations
      Music and Video
    Semantic Search
      Search Engines
      Intelligent Customer Service
      Code Search
    Multi-Modal Retrieval
      Search by Image
      Video Understanding
      Audio/Video Retrieval
    Anomaly Detection
      Financial Risk Control
      Cybersecurity
      Quality Control
    AI Agent
      Memory Storage
      Tool Calling
      Context Management
5.2 RAG (Retrieval-Augmented Generation)¶
What is RAG¶
RAG (Retrieval-Augmented Generation) is a technical architecture that combines retrieval systems with large language models. It enables AI to "consult" external knowledge bases and then generate answers based on the retrieved content.
Why RAG is Needed:
| Problem | Description | RAG Solution |
|---|---|---|
| Outdated knowledge | LLM training data has a cutoff date | Retrieve latest knowledge in real-time |
| Hallucinations | LLMs may generate incorrect information | Answer based on real documents |
| Private knowledge | LLMs don't know enterprise private data | Retrieve from enterprise knowledge base |
| Traceability | Cannot verify source of answers | Provide citation sources |
RAG System Architecture¶
flowchart TD
    subgraph User Query
        Q["User Question:
        'What is the company's annual leave policy?'"]
    end
    subgraph Retrieval Process
        Q -->|1| EQ["Query Vectorization"]
        EQ -->|2| SN["Vector Database Retrieval"]
        SN -->|3| RD["Return Relevant Documents"]
        RD -->|4| CT["Build Prompt"]
    end
    subgraph Generation Process
        CT -->|5| LLM["Large Language Model"]
        LLM -->|6| A["Generate Answer"]
    end
    subgraph Knowledge Base
        SN -.->|"Query"| KB["Document Collection"]
        KB -->|"Store"| CH["Chunked Storage"]
        CH -->|"Vectorize"| KV["Vector Index"]
    end
    style Q fill:#e1f5fe
    style SN fill:#fff3e0
    style LLM fill:#c8e6c9
    style A fill:#c8e6c9
    style KB fill:#ffccbc
RAG Workflow in Detail¶
# RAG system implementation example
from sentence_transformers import SentenceTransformer
import vector_db   # Hypothetical vector database client
import llm_client  # Hypothetical LLM client

# 1. Initialize components
model = SentenceTransformer('all-MiniLM-L6-v2')
db = vector_db.Client("localhost:6333")
collection = db.get_collection("company_docs")
llm = llm_client.Client()

# 2. User query
user_query = "What is the company's annual leave policy?"

# 3. Vectorize the query
query_embedding = model.encode(user_query)

# 4. Retrieve relevant documents
results = collection.search(
    vector=query_embedding,
    top_k=5,
    filter={"department": "HR"}  # Optional filter
)

# 5. Build the context
context = "\n\n".join([r.content for r in results])

# 6. Build the prompt
prompt = f"""Answer the user's question based on the following information. If the information is insufficient, please say you don't know.

Context:
{context}

User Question: {user_query}

Answer:"""

# 7. Call the LLM to generate an answer
response = llm.generate(prompt)
print(response)
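The traceability benefit noted in the table above comes from carrying document metadata through to the prompt, so the model can cite its sources. A minimal sketch, assuming each retrieval result exposes hypothetical `content` and `source` fields:

```python
from dataclasses import dataclass

@dataclass
class RetrievedDoc:
    content: str
    source: str  # e.g. file name or section of the original document

def build_prompt_with_citations(results, user_query):
    """Number each retrieved chunk so the LLM can cite it as [1], [2], ..."""
    context = "\n\n".join(
        f"[{i + 1}] ({r.source}) {r.content}" for i, r in enumerate(results)
    )
    return (
        "Answer the user's question based on the following information. "
        "Cite sources by number, e.g. [1]. "
        "If the information is insufficient, say you don't know.\n\n"
        f"Context:\n{context}\n\nUser Question: {user_query}\nAnswer:"
    )

# Toy results standing in for a real vector database response
results = [
    RetrievedDoc("Employees receive 10 days of annual leave.", "hr_handbook.pdf#leave"),
    RetrievedDoc("Leave requests are filed in the HR portal.", "hr_faq.md"),
]
prompt = build_prompt_with_citations(results, "What is the company's annual leave policy?")
print(prompt)
```

The answer the LLM returns can then be mapped back to the numbered chunks, giving users a way to verify each claim.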
RAG Query Flow Sequence Diagram¶
sequenceDiagram
    participant User as User
    participant Embed as Embedding Model
    participant VDB as Vector Database
    participant LLM as Large Language Model

    User->>Embed: 1. Send query text
    Embed-->>User: 2. Return query vector
    User->>VDB: 3. Send query vector
    VDB-->>User: 4. Return Top-K similar documents
    User->>LLM: 5. Send documents + question
    LLM-->>User: 6. Return generated answer
Document Chunking Strategies¶
The effectiveness of RAG largely depends on how documents are chunked:
# Common chunking strategies

# 1. Fixed-size chunking (simplest)
def chunk_by_size(text, chunk_size=500, overlap=50):
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i:i + chunk_size])
    return chunks

# 2. Chunk by paragraph (maintains semantic integrity)
def chunk_by_paragraph(text):
    paragraphs = text.split('\n\n')
    return [p for p in paragraphs if p.strip()]

# 3. Recursive chunking (more intelligent)
def chunk_recursively(text, chunk_size=500, delimiters=('\n\n', '\n', '.', '!')):
    # Try larger delimiters first, refine oversized pieces progressively
    if len(text) <= chunk_size or not delimiters:
        return [text]
    head, rest = delimiters[0], delimiters[1:]
    chunks = []
    for piece in text.split(head):
        if len(piece) <= chunk_size:
            chunks.append(piece)
        else:
            chunks.extend(chunk_recursively(piece, chunk_size, rest))
    return [c for c in chunks if c.strip()]
| Chunking Strategy | Pros | Cons | Use Case |
|---|---|---|---|
| Fixed size | Simple, controllable | May cut sentences | General scenarios |
| By paragraph | Maintains semantics | Uneven chunk sizes | Structured documents |
| Recursive | Balanced | Complex implementation | Best practice |
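To see how the fixed-size strategy behaves, `chunk_by_size` from above can be exercised on a short string; note how the tail of each chunk reappears at the head of the next because of the overlap:

```python
def chunk_by_size(text, chunk_size=500, overlap=50):
    # Step by chunk_size - overlap so consecutive chunks share `overlap` characters
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i:i + chunk_size])
    return chunks

text = "abcdefghij" * 10  # 100 characters
chunks = chunk_by_size(text, chunk_size=30, overlap=10)

print(len(chunks))                         # 5 chunks (step size 20 over 100 chars)
print(chunks[0][-10:] == chunks[1][:10])   # True: 10-character overlap
```

The overlap is what protects against a sentence being cut exactly at a chunk boundary: any span shorter than the overlap appears intact in at least one chunk.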
5.3 Recommendation Systems¶
Vector Representation in Recommendation Systems¶
In recommendation systems, both users and items can be represented as vectors:
# Recommendation system vector representation (conceptual sketch)
user_vector = model.encode("User A's interest features")
item_vectors = model.encode([item["features"] for item in items])  # one vector per candidate item

# Score every item by its dot product with the user vector,
# then keep the 10 highest-scoring items
scores = np.dot(item_vectors, user_vector)
top_items = np.argsort(scores)[::-1][:10]
Recommendation System Architecture¶
flowchart TD
    subgraph User Profile
        U["User A"] -->|Behavior Data| UH["User Vector
        [0.8, 0.3, 0.7, ...]"]
    end
    subgraph Item Database
        I1["Product 1"] -->|Features| IV1["[0.9, 0.1, 0.6, ...]"]
        I2["Product 2"] -->|Features| IV2["[0.2, 0.8, 0.4, ...]"]
        I3["Product 3"] -->|Features| IV3["[0.7, 0.4, 0.9, ...]"]
    end
    subgraph Matching Engine
        UH -->|Calculate similarity| MATCH["Vector Dot Product"]
        IV1 --> MATCH
        IV2 --> MATCH
        IV3 --> MATCH
        MATCH -->|Sort| TOP["Top-K Recommendations"]
    end
    style UH fill:#e1f5fe
    style MATCH fill:#fff3e0
    style TOP fill:#c8e6c9
Recommendation System Code Example¶
# Movie recommendation system example
from sentence_transformers import SentenceTransformer
import numpy as np
model = SentenceTransformer('all-MiniLM-L6-v2')
# Movie database
movies = [
    {"id": 1, "title": "Interstellar", "genre": "Sci-Fi"},
    {"id": 2, "title": "Titanic", "genre": "Romance"},
    {"id": 3, "title": "Inception", "genre": "Sci-Fi"},
    {"id": 4, "title": "Forrest Gump", "genre": "Drama"},
]
# Build movie vectors
movie_vectors = model.encode([m["title"] for m in movies])
# User preferences
user_liked = "Sci-fi movies space time travel"
user_vector = model.encode(user_liked)
# Calculate similarity
similarities = np.dot(user_vector, movie_vectors.T)
# Sort and return recommendations
top_indices = np.argsort(similarities)[::-1][:2]
recommendations = [movies[i]["title"] for i in top_indices]
print(f"Recommended movies: {recommendations}")
# Output: ['Interstellar', 'Inception']
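One detail worth noting: the dot product used above only equals cosine similarity when the embeddings are unit-normalized (sentence-transformers can do this via `encode(..., normalize_embeddings=True)`). The relationship is easy to check with plain numpy on random vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
user = rng.normal(size=8)        # stand-in user vector
items = rng.normal(size=(4, 8))  # stand-in item vectors

def normalize(v):
    # Divide by the L2 norm so every vector has unit length
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

dot_scores = items @ user                        # magnitude-sensitive
cos_scores = normalize(items) @ normalize(user)  # bounded in [-1, 1]

print(cos_scores)
```

Without normalization, items with larger-magnitude embeddings get inflated dot-product scores; after normalization, only the angle between vectors matters, which is usually what "similar interests" should mean.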
5.4 Semantic Search¶
Semantic Search vs Keyword Search¶
graph LR
    subgraph Keyword Search
        KQ["Query: 'apple'"] --> KM["Match documents containing 'apple'"]
        KM --> KR["'Apple phone price drop'
        'Apple pie recipe'
        'Yantai apple wholesale'"]
    end
    subgraph Semantic Search
        SQ["Query: 'apple'"] --> SE["Understand query intent: fruit? company?"]
        SE --> SR["'iPhone'
        'Apple Inc.'
        'Red Fuji apple'"]
    end
    style KQ fill:#ffccbc
    style SQ fill:#c8e6c9
    style KR fill:#ffccbc
    style SR fill:#c8e6c9
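The keyword side of this comparison is easy to reproduce with naive substring matching, which returns every document containing the literal token regardless of which sense of "apple" the user meant (the documents below are hypothetical):

```python
documents = [
    "Apple phone price drop",
    "Classic apple pie recipe",
    "Yantai apple wholesale prices",
    "Learn Python programming",
]

def keyword_search(query, docs):
    # Literal, case-insensitive substring matching: no notion of meaning
    return [d for d in docs if query.lower() in d.lower()]

hits = keyword_search("apple", documents)
print(hits)
```

All three "apple" documents match, fruit and company alike, while a paraphrase with no shared token would match nothing; semantic search, shown next, addresses both failure modes.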
Semantic Search Code Example¶
# Semantic search code example
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer('all-MiniLM-L6-v2')
# Document corpus
documents = [
    "Artificial intelligence will change our way of life",
    "Machine learning is an important branch of AI",
    "The weather is really nice today, good for going out",
    "Deep learning has made breakthroughs in image recognition",
    "The latest iPhone uses the A17 chip",
]
# Vectorize documents
doc_vectors = model.encode(documents)
# Semantic search query
query = "What is deep learning?"
query_vector = model.encode(query)
# Calculate similarity
from sklearn.metrics.pairwise import cosine_similarity
similarities = cosine_similarity([query_vector], doc_vectors)[0]
# Return most relevant document
top_idx = np.argsort(similarities)[::-1][0]
print(f"Most relevant document: {documents[top_idx]}")
# Output: 'Deep learning has made breakthroughs in image recognition'
Multi-Language Search¶
# Cross-language semantic search
# Note: this requires a multilingual embedding model, e.g.
# model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')
from sklearn.metrics.pairwise import cosine_similarity

query_vector = model.encode("How to cook pasta?")  # English query

# Documents in other languages (shown here translated from Chinese) can also be retrieved
chinese_docs = [
    "How to make spaghetti",
    "What's the weather like in Beijing",
    "Learn Python programming",
]
doc_vectors = model.encode(chinese_docs)
similarities = cosine_similarity([query_vector], doc_vectors)[0]

# With a multilingual model, semantically similar content scores high
# across languages: "How to make spaghetti" ranks first here
5.5 Multi-Modal Retrieval¶
What is Multi-Modal Retrieval¶
Multi-modal retrieval can simultaneously process and associate different types of data (text, images, audio, video), enabling "search by text for images", "search by image for images", and other capabilities.
graph TD
    subgraph Input
        T1["Text: 'An orange cat'"]
        I1["Image: Cat photo"]
        A1["Audio: Cat meowing"]
    end
    subgraph Vector Space
        T2["Text Vector"]
        I2["Image Vector"]
        A2["Audio Vector"]
    end
    subgraph Cross-Modal Retrieval
        T1 -->|CLIP Encode| T2
        I1 -->|CLIP Encode| I2
        A1 -->|"Audio Encode (e.g. CLAP)"| A2
        T2 -.->|"Similarity"| I2
        T2 -.->|"Similarity"| A2
    end
    style T2 fill:#e1f5fe
    style I2 fill:#fff3e0
    style A2 fill:#c8e6c9
Multi-Modal Retrieval Code Example¶
# Search by image example
import clip
import torch
import numpy as np
from PIL import Image
from sklearn.metrics.pairwise import cosine_similarity

# Load CLIP model
model, preprocess = clip.load("ViT-B/32", device="cpu")

# Vectorize an image (inference only, so gradients are disabled)
def encode_image(img_path):
    image = preprocess(Image.open(img_path)).unsqueeze(0)
    with torch.no_grad():
        image_features = model.encode_image(image)
    return image_features.numpy()

# Image collection
image_paths = ["cat1.jpg", "cat2.jpg", "dog.jpg"]
image_vectors = np.vstack([encode_image(p) for p in image_paths])

# Query image
query_vector = encode_image("query_cat.jpg")

# Find the most similar image
similarities = cosine_similarity(query_vector, image_vectors)
most_similar_idx = np.argmax(similarities)
print(f"Most similar image: {image_paths[most_similar_idx]}")

# Search by text for images: encode a text query into the same vector space
with torch.no_grad():
    text_vector = model.encode_text(clip.tokenize(["an orange cat"])).numpy()
text_similarities = cosine_similarity(text_vector, image_vectors)
5.6 Anomaly Detection¶
Vector Databases in Anomaly Detection¶
flowchart TD
    subgraph Normal Pattern Learning
        D1["Normal data samples"] -->|Cluster| C1["Normal pattern center"]
        D1 -->|Vector representation| V1["Normal vector space"]
    end
    subgraph Anomaly Detection
        N["New data"] -->|Vectorize| NV["New data vector"]
        NV -->|Calculate distance| DIST["Distance to normal center"]
        DIST -->|Judge| RESULT{Distance > Threshold?}
        RESULT -->|Yes| ANOMALY["Anomaly"]
        RESULT -->|No| NORMAL["Normal"]
    end
    style C1 fill:#c8e6c9
    style ANOMALY fill:#ffccbc
    style NORMAL fill:#c8e6c9
Anomaly Detection Code Example¶
# Financial fraud detection example
from sklearn.cluster import KMeans
import numpy as np
# Normal transaction feature vectors
normal_transactions = np.random.rand(1000, 20) # 1000 normal transactions, 20 dimensions
# Learn normal patterns
kmeans = KMeans(n_clusters=10, random_state=42)
kmeans.fit(normal_transactions)
# Detect new transaction
new_transaction = np.random.rand(1, 20)
# Calculate distance to nearest normal center
distance = np.min(np.linalg.norm(
    new_transaction - kmeans.cluster_centers_,
    axis=1
))
# Judge if anomalous
threshold = 2.5
is_fraud = distance > threshold
print(f"Transaction anomalous: {is_fraud}, distance: {distance:.2f}")
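The fixed `threshold = 2.5` above is arbitrary; in practice a threshold can be derived from the distribution of the training data's own distances, for example the 99th percentile. A numpy-only sketch using a single mean vector as the "normal center":

```python
import numpy as np

rng = np.random.default_rng(42)
normal_transactions = rng.random((1000, 20))  # 1000 normal transactions, 20 features

# One "normal pattern center": the mean of the training data
center = normal_transactions.mean(axis=0)

# Distance of every training sample to that center
train_distances = np.linalg.norm(normal_transactions - center, axis=1)

# Flag anything farther from the center than 99% of the training samples
threshold = np.quantile(train_distances, 0.99)

new_transaction = rng.random(20)
distance = np.linalg.norm(new_transaction - center)
print(f"Anomalous: {distance > threshold}, distance: {distance:.2f}, threshold: {threshold:.2f}")
```

The same percentile idea carries over to the KMeans version: compute each training sample's distance to its nearest cluster center and take a high quantile of those distances as the threshold.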
5.7 AI Agent Memory Storage¶
Agent System Architecture¶
flowchart LR
    subgraph Agent Core
        P["Planning"]
        M["Memory"]
        A["Action"]
    end
    subgraph Memory Types
        M -->|Short-term| STM["Short-term Memory
        Current conversation context"]
        M -->|Long-term| LTM["Long-term Memory
        Stored in vector database"]
    end
    subgraph Retrieval Process
        Q["Current task"] -->|Query| VDB["Vector Database"]
        VDB -->|Return relevant memories| A
    end
    style M fill:#e1f5fe
    style VDB fill:#fff3e0
    style P fill:#c8e6c9
    style A fill:#c8e6c9
Agent Memory System Code Example¶
# AI Agent memory system
class AgentMemory:
    def __init__(self, vector_db, embed_model, llm=None):
        self.db = vector_db
        self.embed = embed_model
        self.llm = llm  # Needed by reflect() to summarize experiences

    def add_memory(self, content, memory_type="experience"):
        """Add a memory"""
        vector = self.embed.encode(content)
        self.db.insert({
            "vector": vector,
            "content": content,
            "type": memory_type
        })

    def retrieve(self, query, top_k=5):
        """Retrieve relevant memories"""
        query_vector = self.embed.encode(query)
        results = self.db.search(query_vector, top_k=top_k)
        return [r["content"] for r in results]

    def reflect(self, recent_experiences):
        """Reflection and summarization: convert short-term experiences into long-term memory"""
        summary = self.llm.summarize(recent_experiences)
        self.add_memory(summary, memory_type="reflection")
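To exercise this interface without a real vector database or embedding model, hypothetical stand-ins can be plugged in: a toy character-hashing "embedder" (not semantic, illustrative only) and a numpy-backed in-memory store exposing the same `insert`/`search` shape the class expects:

```python
import numpy as np

class ToyEmbedder:
    """Illustrative stand-in for a real embedding model:
    hashes characters into a small fixed-size unit vector."""
    def encode(self, text):
        vec = np.zeros(16)
        for ch in text.lower():
            vec[ord(ch) % 16] += 1
        return vec / (np.linalg.norm(vec) or 1)

class InMemoryStore:
    """Minimal stand-in for a vector database client."""
    def __init__(self):
        self.records = []

    def insert(self, record):
        self.records.append(record)

    def search(self, query_vector, top_k=5):
        # Rank stored records by dot product with the query vector
        scored = sorted(
            self.records,
            key=lambda r: -float(np.dot(r["vector"], query_vector)),
        )
        return scored[:top_k]

embed = ToyEmbedder()
db = InMemoryStore()
db.insert({"vector": embed.encode("user prefers dark mode"),
           "content": "user prefers dark mode", "type": "experience"})
db.insert({"vector": embed.encode("meeting at 3pm"),
           "content": "meeting at 3pm", "type": "experience"})

results = db.search(embed.encode("user prefers dark mode"), top_k=1)
print(results[0]["content"])
```

Swapping the stand-ins for a real embedding model and vector database client is all that is needed to use the same `AgentMemory` code path in production.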