FAISSDocumentStore
| API reference | FAISS |
| GitHub link | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/faiss |
FAISSDocumentStore is a local Document Store backed by FAISS for vector similarity search.
It keeps vectors in a FAISS index and stores document data in memory, with optional persistence to disk.
FAISSDocumentStore is a good fit for local development and small to medium-sized datasets where you want a lightweight setup without running an external database service.
Installation
Install the FAISS integration:
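The exact package name depends on how the integration is published; assuming it follows the standard name-haystack convention used by the other Haystack core integrations:

```shell
# Assumed package name, following the usual <name>-haystack convention
pip install faiss-haystack
```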
Initialization
Create a FAISSDocumentStore instance and write embedded documents:
from haystack import Document
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.document_stores.faiss import FAISSDocumentStore

document_store = FAISSDocumentStore(
    index_path="my_faiss_index",  # Optional: enables persistence on disk
    index_string="Flat",
    embedding_dim=768,
)

document_store.write_documents(
    [
        Document(content="This is first", embedding=[0.1] * 768),
        Document(content="This is second", embedding=[0.2] * 768),
    ],
    policy=DuplicatePolicy.OVERWRITE,
)

print(document_store.count_documents())

# Persist index and metadata files (`.faiss` and `.json`)
document_store.save("my_faiss_index")
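The index_string="Flat" setting above selects FAISS's exhaustive (brute-force) index. Conceptually, a flat L2 search is equivalent to the following NumPy sketch; this is illustrative only, not the document store's actual code path, and flat_l2_search is a hypothetical helper:

```python
import numpy as np

def flat_l2_search(vectors: np.ndarray, query: np.ndarray, top_k: int = 2):
    """Brute-force nearest neighbors by squared L2 distance,
    which is what a flat FAISS index computes over all vectors."""
    dists = np.sum((vectors - query) ** 2, axis=1)  # distance to every vector
    order = np.argsort(dists)[:top_k]               # smallest distances first
    return order, dists[order]

# Three 4-dim vectors standing in for 768-dim document embeddings
vectors = np.array([[0.1] * 4, [0.2] * 4, [0.9] * 4], dtype="float32")
ids, dists = flat_l2_search(vectors, np.array([0.15] * 4, dtype="float32"))
print(ids)  # indices of the nearest documents, closest first
```

Because a flat index scans every stored vector per query, search cost grows linearly with the collection, which is why this store targets small to medium-sized datasets.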
Persistence
If you provide index_path when initializing FAISSDocumentStore, it tries to load existing persisted files (.faiss and .json) from that path.
You can also explicitly call:
save(index_path) to write the index and metadata to disk.
load(index_path) to load them later.
Example of loading from a previously saved folder/path:
from haystack_integrations.document_stores.faiss import FAISSDocumentStore
# This loads `my_faiss_index.faiss` and `my_faiss_index.json` if they exist
document_store = FAISSDocumentStore(index_path="my_faiss_index")
# Alternatively, initialize first and then load explicitly
another_store = FAISSDocumentStore(embedding_dim=768)
another_store.load("my_faiss_index")
Supported Retrievers
FAISSEmbeddingRetriever: Retrieves documents from FAISSDocumentStore based on query embeddings.
Fixing OpenMP Runtime Conflicts on macOS
Symptoms
You may encounter one or both of the following errors at runtime: a hard crash (segmentation fault) during vector search, or the duplicate-runtime warning OMP: Error #15: Initializing libomp.dylib, but found libomp.dylib already initialized.
If setting OMP_NUM_THREADS=1 prevents the crash, the root cause is multiple OpenMP runtimes loaded into the same process. Each runtime maintains its own thread pool and thread-local storage (TLS). When two runtimes spin up worker threads at the same time, they corrupt each other's memory, causing segfaults at N > 1 threads.
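To confirm the diagnosis, you can force a single OpenMP thread for one run; app.py is a placeholder for your own entry point:

```shell
# Workaround, not a fix: with one thread, the duplicated runtimes never
# race on worker-thread state, so the crash disappears.
OMP_NUM_THREADS=1 python app.py
```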
Diagnosis
First, find how many copies of libomp.dylib exist in your virtual environment:
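For example, with find (the .venv path is a placeholder; use your actual environment path):

```shell
# List every copy of libomp.dylib bundled inside the virtual environment;
# piping through sort keeps the command quiet if the path does not exist.
find "${VIRTUAL_ENV:-.venv}" -name 'libomp.dylib' 2>/dev/null | sort
```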
If you see more than one, e.g.:
.venv/lib/pythonX.Y/site-packages/torch/lib/libomp.dylib
.venv/lib/pythonX.Y/site-packages/sklearn/.dylibs/libomp.dylib
.venv/lib/pythonX.Y/site-packages/faiss/.dylibs/libomp.dylib
you need to consolidate them into a single runtime.
Fix
The solution is to pick one canonical libomp.dylib (torch's is a good choice) and replace all other copies with symlinks pointing to it.
For each duplicate, delete the copy and replace it with a symlink:
# Delete the duplicate
rm /path/to/.venv/lib/pythonX.Y/site-packages/<package>/.dylibs/libomp.dylib
# Replace with a symlink to the canonical copy
ln -s /path/to/.venv/lib/pythonX.Y/site-packages/torch/lib/libomp.dylib \
/path/to/.venv/lib/pythonX.Y/site-packages/<package>/.dylibs/libomp.dylib
Repeat for every duplicate found. Because these packages use @loader_path-relative references to load libomp.dylib, the symlink will be transparently resolved to the single canonical runtime at load time.
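The per-duplicate steps above can be wrapped in a small shell function; this is a sketch, and consolidate_libomp is a hypothetical helper, not part of any tool:

```shell
# Replace every libomp.dylib under $1 with a symlink to the
# canonical copy at $2 (the canonical copy itself is left untouched).
consolidate_libomp() {
  root="$1"
  canonical="$2"
  find "$root" -name 'libomp.dylib' ! -path "$canonical" | while read -r dup; do
    rm "$dup"                  # delete the duplicate
    ln -s "$canonical" "$dup"  # point it at the canonical runtime
  done
}

# Example invocation (paths are placeholders):
# consolidate_libomp .venv/lib/pythonX.Y/site-packages \
#     .venv/lib/pythonX.Y/site-packages/torch/lib/libomp.dylib
```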
Verify
After applying the fix, confirm only one unique libomp.dylib is being referenced:
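One way to check is to resolve each copy through its symlinks and print the unique targets; note that readlink -f requires macOS 12.3+ (or GNU coreutils' greadlink on older systems):

```shell
# Resolve every libomp.dylib (following symlinks) and deduplicate;
# after the fix this should print exactly one path.
find "${VIRTUAL_ENV:-.venv}" -name 'libomp.dylib' -exec readlink -f {} \; 2>/dev/null | sort -u
```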
All entries should resolve to the same canonical path. You should now be able to run without OMP_NUM_THREADS=1.