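Project Overview
Memvid takes a different approach to AI memory management: it encodes text data as video, enabling fast semantic search across millions of text chunks with sub-second retrieval. Unlike traditional vector databases, which consume large amounts of RAM and storage, Memvid compresses a knowledge base into a compact video file while keeping every piece of information instantly accessible.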
🎥 Demo
https://github.com/user-attachments/assets/ec550e93-e9c4-459f-a8a1-46e122b5851e
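✨ Key Features
- 🎥 Video as database: store millions of text chunks in a single MP4 file
- 🔍 Semantic search: find relevant content with natural-language queries
- 💬 Built-in chat: context-aware conversational interface
- 📚 PDF support: import and index PDF documents directly
- 🚀 Fast retrieval: sub-second search across large datasets
- 💾 Efficient storage: 10x compression compared with traditional databases
- 🔌 Pluggable LLMs: works with OpenAI, Anthropic, or local models
- 🌐 Offline-first: no internet connection needed once the video is generated
- 🔧 Simple API: get started in just 3 lines of code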
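🎯 Use Cases
- 📖 Digital libraries: index thousands of books in a single video file
- 🎓 Educational content: build searchable video memories of course materials
- 📰 News archives: compress years of articles into a manageable video database
- 💼 Corporate knowledge: build a company-wide searchable knowledge base
- 🔬 Research papers: run fast semantic search over scientific literature
- 📝 Personal notes: turn your notes into a searchable AI assistant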
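🚀 Why Memvid?
Game-Changing Innovation
- Video as database: store millions of text chunks in a single MP4 file
- Instant retrieval: sub-second semantic search across large datasets
- 10x storage efficiency: video compression significantly reduces the memory footprint
- Zero infrastructure: no database server, just files you can copy anywhere
- Offline-first: works fully offline once the video is generated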
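The "video as database" idea can be pictured as two parts: a sequence of frames, each holding one text chunk, and an index that maps chunk embeddings to frame numbers. The sketch below is a toy illustration only, not Memvid's implementation. It replaces the video with a plain list and the sentence embedding with a bag-of-words vector; `embed`, `cosine`, and `search` are hypothetical names.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real sentence embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Stand-in for the video: position in the list = frame number.
frames = [
    "The Eiffel Tower is in Paris",
    "Python is a programming language",
    "QR codes store text as images",
]

# Stand-in for the index file: (frame number, embedding) pairs.
index = [(frame_no, embed(chunk)) for frame_no, chunk in enumerate(frames)]

def search(query, top_k=1):
    """Rank frames by similarity to the query, then read back the chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [(frame_no, frames[frame_no]) for frame_no, _ in ranked[:top_k]]

print(search("programming language python"))
```

Only the index is consulted during ranking; the frames themselves are touched only for the final hits, which is what keeps lookups cheap in this kind of design.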
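Lightweight Architecture
- Minimal dependencies: the core functionality is only about 1,000 lines of Python
- CPU-friendly: runs efficiently without a GPU
- Portable: a single video file contains the entire knowledge base
- Streamable: videos can be streamed from cloud storage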
📦 Installation
Quick Install
pip install memvid
For PDF Support
pip install memvid PyPDF2
Recommended Setup (Virtual Environment)
# Create a new project directory
mkdir my-memvid-project
cd my-memvid-project
# Create a virtual environment
python -m venv venv
# Activate it
# macOS/Linux:
source venv/bin/activate
# Windows:
venv\Scripts\activate
# Install memvid
pip install memvid
# For PDF support:
pip install PyPDF2
🎯 Quick Start
Basic Usage
from memvid import MemvidEncoder, MemvidChat
# Create a video memory from text chunks
chunks = ["Important fact 1", "Important fact 2", "Historical event details", ...]
encoder = MemvidEncoder()
encoder.add_chunks(chunks)
encoder.build_video("memory.mp4", "memory_index.json")
# Chat with your memory
chat = MemvidChat("memory.mp4", "memory_index.json")
chat.start_session()
response = chat.chat("What do you know about historical events?")
print(response)
Building Memory from Documents
from memvid import MemvidEncoder
import os
# Configure chunking before loading documents
encoder = MemvidEncoder(chunk_size=512, overlap=50)
# Add every text file in the documents/ directory
for file in os.listdir("documents"):
    with open(f"documents/{file}", "r", encoding="utf-8") as f:
        encoder.add_text(f.read(), metadata={"source": file})
# Build an optimized video
encoder.build_video(
    "knowledge_base.mp4",
    "knowledge_index.json",
    fps=30,          # higher FPS = more chunks per second
    frame_size=512   # larger frames = more data per frame
)
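The comment "higher FPS = more chunks per second" implies a simple relationship between chunk count, frame rate, and video length. A small worked example, assuming each chunk occupies exactly one frame (an assumption drawn from that comment, not a documented guarantee); `video_duration_seconds` is a hypothetical helper:

```python
def video_duration_seconds(num_chunks, fps):
    """Playback length if every chunk takes exactly one frame (assumption)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_chunks / fps

# The same 9,000 chunks at two frame rates.
for fps in (30, 60):
    print(f"{fps} fps -> {video_duration_seconds(9000, fps):.0f} s")
```

Under this assumption, doubling the frame rate halves the playback length but does not change the number of frames that have to be stored.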
Advanced Search & Retrieval
from memvid import MemvidRetriever
# Initialize the retriever
retriever = MemvidRetriever("knowledge_base.mp4", "knowledge_index.json")
# Semantic search
results = retriever.search("machine learning algorithms", top_k=5)
for chunk, score in results:
    print(f"Score: {score:.3f} | {chunk[:100]}...")
# Get a context window
context = retriever.get_context("explain neural networks", max_tokens=2000)
print(context)
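A call like `get_context(..., max_tokens=2000)` has to fit ranked chunks into a fixed budget. One plausible way to do that is sketched below; it is an illustration under stated assumptions, not the library's actual logic. `build_context` is a hypothetical helper, and tokens are approximated by whitespace-separated words.

```python
def build_context(ranked_chunks, max_tokens=2000):
    """Concatenate chunks in rank order until the token budget is used up."""
    selected = []
    used = 0
    for chunk in ranked_chunks:
        cost = len(chunk.split())  # crude token estimate (assumption)
        if used + cost > max_tokens:
            break  # the next chunk would exceed the budget
        selected.append(chunk)
        used += cost
    return "\n\n".join(selected)

print(build_context(["a b c", "d e", "f g h i"], max_tokens=5))
```

Stopping at the first chunk that does not fit keeps the highest-ranked material intact instead of truncating a chunk mid-sentence.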
Interactive Chat UI
from memvid import MemvidInteractive
# Launch the interactive chat UI
interactive = MemvidInteractive("knowledge_base.mp4", "knowledge_index.json")
interactive.run()  # opens a web interface at http://localhost:7860
Complete Example: Chat with a PDF Book
# 1. Create a new directory and set up the environment
mkdir book-chat-demo
cd book-chat-demo
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
# 2. Install dependencies
pip install memvid PyPDF2
# 3. Create book_chat.py
cat > book_chat.py << 'EOF'
from memvid import MemvidEncoder, chat_with_memory
import os
# Your PDF file
book_pdf = "book.pdf"  # replace with the path to your PDF
# Build the video memory
encoder = MemvidEncoder()
encoder.add_pdf(book_pdf)
encoder.build_video("book_memory.mp4", "book_index.json")
# Chat with the book
api_key = os.getenv("OPENAI_API_KEY")  # optional: used for AI responses
chat_with_memory("book_memory.mp4", "book_index.json", api_key=api_key)
EOF
# 4. Run it
export OPENAI_API_KEY="your-api-key"  # optional
python book_chat.py
🔧 API Reference
MemvidEncoder
encoder = MemvidEncoder(
    chunk_size=512,                # characters per chunk
    overlap=50,                    # characters of overlap between chunks
    model_name='all-MiniLM-L6-v2'  # sentence-transformer model
)
# Methods
encoder.add_chunks(chunks: List[str], metadata: List[dict] = None)
encoder.add_text(text: str, metadata: dict = None)
encoder.build_video(video_path: str, index_path: str, fps: int = 30, qr_size: int = 512)
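The `chunk_size` and `overlap` parameters suggest a sliding-window split over the input text. A minimal sketch of such a split, assuming character-based windows; `chunk_text` is a hypothetical helper and may differ from how the encoder actually chunks text:

```python
def chunk_text(text, chunk_size=512, overlap=50):
    """Split text into windows of chunk_size characters that share
    `overlap` characters with the previous window."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks

print(chunk_text("0123456789", chunk_size=4, overlap=1))
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, as long as it is shorter than the overlap.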
MemvidRetriever
retriever = MemvidRetriever(
    video_path: str,
    index_path: str,
    cache_size: int = 100  # number of frames to cache
)
# Methods
results = retriever.search(query: str, top_k: int = 5)
context = retriever.get_context(query: str, max_tokens: int = 2000)
chunks = retriever.get_chunks_by_ids(chunk_ids: List[int])
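The `cache_size` parameter implies that decoded frames are kept in memory so repeated lookups skip the decode step. The sketch below shows one common way to build such a cache, with least-recently-used eviction. Whether the retriever uses LRU is an assumption, and `FrameCache` and `decode_frame` are hypothetical names.

```python
from collections import OrderedDict

class FrameCache:
    """Keep up to cache_size decoded frames, evicting the least recently used."""

    def __init__(self, decode_frame, cache_size=100):
        self._decode = decode_frame   # callable: frame number -> decoded text
        self._size = cache_size
        self._cache = OrderedDict()
        self.misses = 0

    def get(self, frame_no):
        if frame_no in self._cache:
            self._cache.move_to_end(frame_no)  # mark as most recently used
            return self._cache[frame_no]
        self.misses += 1
        value = self._decode(frame_no)
        self._cache[frame_no] = value
        if len(self._cache) > self._size:
            self._cache.popitem(last=False)    # drop the least recently used
        return value

cache = FrameCache(lambda n: f"chunk-{n}", cache_size=2)
for n in (1, 2, 1, 3, 2):
    cache.get(n)
print("decodes performed:", cache.misses)
```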
MemvidChat
chat = MemvidChat(
    video_path: str,
    index_path: str,
    llm_backend: str = 'openai',  # 'openai', 'anthropic', 'local'
    model: str = 'gpt-4'
)
# Methods
chat.start_session(system_prompt: str = None)
response = chat.chat(message: str, stream: bool = False)
chat.clear_history()
chat.export_conversation(path: str)
🛠️ Advanced Configuration
Custom Embeddings
from memvid import MemvidEncoder
from sentence_transformers import SentenceTransformer
# Use a custom embedding model
custom_model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
encoder = MemvidEncoder(embedding_model=custom_model)
Video Optimization
# Maximum compression
encoder.build_video(
    "compressed.mp4",
    "index.json",
    fps=60,              # higher frame rate
    frame_size=256,      # smaller frames
    video_codec='h265',  # better compression
    crf=28               # compression quality (lower = better quality)
)
Distributed Processing
# Process large datasets in parallel
encoder = MemvidEncoder(n_workers=8)
encoder.add_chunks_parallel(massive_chunk_list)
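The internals of `add_chunks_parallel` are not shown here. As a general illustration of the pattern, the sketch below fans per-chunk work out to a pool of workers with the standard library and collects the results in input order. `normalize` and `process_parallel` are hypothetical names. Threads are used for simplicity; CPU-bound work would more typically use `ProcessPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor

def normalize(chunk):
    """Example per-chunk work: collapse runs of whitespace."""
    return " ".join(chunk.split())

def process_parallel(chunks, n_workers=8):
    """Apply normalize to every chunk using a pool of n_workers threads.
    Executor.map returns results in the same order as the inputs."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(normalize, chunks))

print(process_parallel(["a  b", " c\n d "], n_workers=2))
```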
🐛 Troubleshooting
Common Issues
ModuleNotFoundError: No module named 'memvid'
# Make sure you are using the right Python
which python  # should show the virtual environment path
# If the virtual environment is not active:
source venv/bin/activate  # Windows: venv\Scripts\activate
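The same check can be done from inside Python, which helps on systems where `which` is unavailable. This sketch uses only the standard library; `in_virtualenv` and `is_installed` are hypothetical helper names.

```python
import importlib.util
import sys

def in_virtualenv():
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

def is_installed(module_name):
    """True when the top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None

print("interpreter:", sys.executable)
print("virtual environment active:", in_virtualenv())
print("memvid importable:", is_installed("memvid"))
```

If the interpreter path points outside your project's venv directory, the package was most likely installed into a different environment.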
ImportError: PyPDF2 is required for PDF support
pip install PyPDF2
OpenAI API key issues
# Set your API key (get one at https://platform.openai.com)
export OPENAI_API_KEY="sk-..."  # macOS/Linux
# Windows:
set OPENAI_API_KEY=sk-...
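To confirm that the variable is actually visible to Python before starting a chat session, a quick check like the following can help. `get_api_key` is a hypothetical helper, not part of Memvid.

```python
import os

def get_api_key(env=os.environ):
    """Return the OpenAI key from the environment, or None if unset or blank."""
    key = env.get("OPENAI_API_KEY", "").strip()
    return key or None

if get_api_key() is None:
    print("OPENAI_API_KEY is not set in this shell")
else:
    print("OPENAI_API_KEY found")
```

A key exported in one terminal is not visible in another, so run the check in the same shell you use to launch your script.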
Large PDF processing
# For very large PDFs, use a smaller chunk size
encoder = MemvidEncoder()
encoder.add_pdf("large_book.pdf", chunk_size=400, overlap=50)
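Smaller chunks mean more chunks. A rough estimate of how many to expect for a given amount of extracted text is worked out below, assuming character-based sliding windows with the stated overlap; `estimate_chunks` is a hypothetical helper, and the real count depends on how the library splits text.

```python
def estimate_chunks(num_chars, chunk_size=400, overlap=50):
    """Number of sliding windows needed to cover num_chars characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    if num_chars <= 0:
        return 0
    if num_chars <= chunk_size:
        return 1
    step = chunk_size - overlap
    remaining = num_chars - chunk_size
    # One full first window, plus ceil(remaining / step) further windows.
    return 1 + (remaining + step - 1) // step

# A book with about one million characters of extracted text.
for size in (512, 400):
    print(f"chunk_size={size}: {estimate_chunks(1_000_000, size, 50)} chunks")
```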
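📚 Examples
See the examples/ directory for:
- Building memory from Wikipedia dumps
- Creating a personal knowledge base
- Multi-language support
- Real-time memory updates
- Integration with popular LLMs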
🔗 Project Link
- https://github.com/Olow304/memvid/blob/main/README.md
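🙏 Acknowledgments
Created by Olow304 and the Memvid community.
Built with:
- sentence-transformers: state-of-the-art embeddings for semantic search
- OpenCV: computer vision and video processing
- qrcode: QR code generation
- FAISS: efficient similarity search
- PyPDF2: PDF text extraction
Special thanks to all the contributors who help make Memvid better!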
(Text: GitHubStore)