VectorVFS is a lightweight Python package that transforms your Linux filesystem into a vector database by leveraging the native VFS (Virtual File System) extended attributes. Rather than maintaining a separate index or external database, VectorVFS stores vector embeddings directly alongside each file—turning your existing directory structure into an efficient and semantically searchable embedding store.
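To make the "vectors stored alongside each file" idea concrete, here is a minimal sketch using Python's Linux-only `os.setxattr` / `os.getxattr`. It assumes an embedding is just a list of floats packed as float32 bytes. The attribute name `user.vectorvfs.embedding` and the helper functions are hypothetical illustrations, not VectorVFS's actual key or API.

```python
import os
import struct

# Hypothetical attribute name; VectorVFS's real key may differ.
ATTR_NAME = "user.vectorvfs.embedding"


def encode_vector(vec):
    """Pack a list of floats into little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)


def decode_vector(raw):
    """Unpack float32 bytes back into a list of floats."""
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


def store_embedding(path, vec):
    """Attach the embedding to the file itself as an extended attribute (Linux only)."""
    os.setxattr(path, ATTR_NAME, encode_vector(vec))


def load_embedding(path):
    """Read the embedding back, or return None if it is missing or xattrs are unsupported."""
    try:
        return decode_vector(os.getxattr(path, ATTR_NAME))
    except OSError:
        return None
```

Usage would look like `store_embedding("photo.jpg", vec)` followed later by `load_embedding("photo.jpg")`. The vector travels with the file's metadata rather than living in a separate index. Filesystems cap how large attribute values can be, so the length of the vector matters.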
VectorVFS supports Meta’s Perception Encoders (PE) [arxiv], which include image/video encoders for vision-language understanding and outperform InternVL3, Qwen2.5VL and SigLIP2 on zero-shot image tasks. We support both CPU and GPU, but if you have a large collection of images and are not using a GPU, embedding all items the first time might take a while.
↫ Christian S. Perone
It won’t surprise many of you that this goes a bit above my paygrade, but according to my limited understanding, VectorVFS stores information about files inside the xattr part of inodes. The information being stored is converted into vectors first, and this is the part that breaks my brain a bit, because vectors in this context are far too complex for me to understand.
I vaguely understand the end result here – making files searchable using vector magic without a dedicated database or separate files, by using extended attributes in inodes – but the process behind it is a lot harder to wrap my head around. It still seems like a very interesting approach, though, and I’d love for people smarter than me to take VectorVFS apart and explain it in easier terms for those of us who don’t fully grasp it.
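In simpler terms: an encoder model (PE, in VectorVFS's case) turns each file into a list of numbers, and similar content ends up with lists that point in similar directions. Searching then means encoding the query the same way and ranking files by how closely their vectors line up, commonly measured with cosine similarity. The sketch below is a toy illustration under stated assumptions: the 3-number "embeddings" are hand-made and hypothetical, and it does a simple brute-force scan. The source does not say how VectorVFS actually ranks or indexes results.

```python
import math


def cosine_similarity(a, b):
    """1.0 means the vectors point the same way; values near 0 mean unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, file_vectors, top_k=3):
    """Rank files by how close their stored vector is to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), path)
              for path, vec in file_vectors.items()]
    scored.sort(reverse=True)
    return [path for _, path in scored[:top_k]]


# Hypothetical 3-number "embeddings"; real encoder outputs are far longer.
# For illustration, read the positions loosely as (cat, beach, car);
# real embedding dimensions have no such neat meaning.
file_vectors = {
    "cat_on_sofa.jpg": [0.9, 0.1, 0.0],
    "kitten.png": [0.8, 0.2, 0.1],
    "beach_sunset.jpg": [0.1, 0.9, 0.1],
    "red_car.jpg": [0.0, 0.1, 0.95],
}

query = [1.0, 0.0, 0.0]  # stand-in for the encoded text "a photo of a cat"
print(search(query, file_vectors, top_k=2))  # ['cat_on_sofa.jpg', 'kitten.png']
```

In a VectorVFS-style setup, `file_vectors` would be filled by reading each file's extended attribute instead of a hard-coded dict. A plain scan like this costs time proportional to the number of files, which is fine for a home directory but is exactly what dedicated vector databases try to speed up.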
Makes me think of BeOS / Haiku’s data attributes in the filesystem. Quite a lot of power there, ready and waiting to be tapped.
Came here to say this. I’d love to play around with it and see how it compares to BFS, both in a classic BeOS installation and in Haiku on modern hardware.
Thom, look at this: https://code.google.com/archive/p/word2vec/
I find it interesting, and a bit amusing, that this news comes right before the one in which Thom writes «This is the real impact of “AI”: streams of digital trash real humans have to clean up.»
A vector database is something we’ve come up with in the context of AI, and an AI model is needed to fill the vector database with semantic information and then make use of it.