Building a Local Multimodal Video RAG System - Python PoC and Memory Bottlenecks
Introduction
Recently, I’ve been exploring how to build a fully localized, privacy-preserving Multimodal Retrieval-Augmented Generation (RAG) system. My specific use case was indexing non-standard animated videos (like Chiikawa) that don’t have embedded subtitles. While existing solutions rely heavily on cloud APIs, I wanted to prove that a completely local pipeline is possible on edge devices. Thus, this Python-based Proof-of-Concept (PoC) was born.
System Architecture
The system relies on ...
How to Determine Whether a Point Is Inside a Box
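When working with autonomous-driving datasets such as KITTI, we often need to clean the data. One of the most basic yet most important operations is determining whether a 3D point lies inside a 3D bounding box. For example, before training PointNet++ for classification or counting, I need to use the labels to extract the foreground points (points inside vehicle, pedestrian, and cyclist boxes). This post records the two implementations I used in my DistantCount project.
Method 1: Vector Projection
The idea comes from vector projection in mathematics. Once we know the 8 corners of the box, we can use the dot product to project the target point onto the box's three principal axes and check whether each projection length falls within the length of that axis.
This method is fairly general and works for boxes of arbitrary orientation, provided you have already computed the 8 corners of the box.
Code implementation:
import numpy as np
def points_in...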
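Because the excerpt cuts off before the full listing, here is a minimal sketch of the vector projection idea described above. It is not the DistantCount implementation. The function name `points_in_box_by_projection` is hypothetical, and the corner ordering is an assumption: `corners[1]`, `corners[3]`, and `corners[4]` are taken to be the three corners that share an edge with `corners[0]`. If your corner layout differs, the indices need to change.

```python
import numpy as np


def points_in_box_by_projection(points, corners):
    """Return a boolean mask marking which points lie inside an oriented 3D box.

    ASSUMPTION (not from the original post): `corners` has shape (8, 3) and
    corners[1], corners[3], corners[4] are the neighbors of corners[0],
    i.e. each of them shares one box edge with corners[0].
    """
    points = np.asarray(points, dtype=float)
    corners = np.asarray(corners, dtype=float)

    origin = corners[0]
    # Three mutually perpendicular edge vectors that span the box.
    edges = corners[[1, 3, 4]] - origin            # shape (3, 3)

    # Dot product of every point (relative to the origin corner) with every edge.
    proj = (points - origin) @ edges.T             # shape (N, 3)

    # Squared length of each edge: a point is within an axis when
    # 0 <= (p - origin) . edge <= edge . edge.
    sq_len = np.einsum("ij,ij->i", edges, edges)   # shape (3,)

    return np.all((proj >= 0.0) & (proj <= sq_len), axis=1)


# Illustrative box of size 2 x 1 x 3 with one corner at the world origin.
corners = np.array([
    [0, 0, 0], [2, 0, 0], [2, 1, 0], [0, 1, 0],   # bottom face
    [0, 0, 3], [2, 0, 3], [2, 1, 3], [0, 1, 3],   # top face
], dtype=float)

pts = np.array([
    [1.0, 0.5, 1.5],    # inside
    [3.0, 0.5, 1.5],    # outside along the first axis
    [1.0, 0.5, -0.1],   # just below the bottom face
])

mask = points_in_box_by_projection(pts, corners)
foreground = pts[mask]  # points kept as foreground
```

Comparing the raw dot product against the squared edge length avoids normalizing the edge vectors, and the same mask works unchanged for a rotated box because only the corners carry the orientation.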
Hello World
Welcome to Hexo! This is your very first post. Check documentation for more info. If you get any problems when using Hexo, you can find the answer in troubleshooting or you can ask me on GitHub.
Quick Start
Create a new post
$ hexo new "My New Post"
More info: Writing
Run server
$ hexo server
More info: Server
Generate static files
$ hexo generate
More info: Generating
Deploy to remote sites
$ hexo deploy
More info: Deployment