Start Here
Most early AI search features do not fail because the vector index is missing a trick.
They fail because the chunks are poor, the data is stale, permissions are wrong, or the search result is not tied back to the real record.
pgvector lets you test semantic search inside Postgres, where the records, permissions, and update paths behind those problems already live.
What To Build First
Store embeddings beside the thing they represent, or in a table that points back to it.
create extension if not exists vector;
create table document_chunks (
    id bigserial primary key,
    document_id bigint not null,
    body text not null,
    embedding vector(1536) not null
);
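One way to make the "points back to it" part concrete is a foreign key. The sketch below assumes a hypothetical documents table; its name and columns (account_id, title, updated_at) are illustrative and should be swapped for whatever your schema already has.

```sql
-- Hypothetical parent table; column names are assumptions for illustration.
create table if not exists documents (
    id bigserial primary key,
    account_id bigint not null,
    title text not null,
    updated_at timestamptz not null default now()
);

-- Tie each chunk to its source record so that deleting a document
-- also removes its chunks and their embeddings.
alter table document_chunks
    add constraint document_chunks_document_id_fkey
    foreign key (document_id) references documents (id)
    on delete cascade;

-- Index the foreign key so joins and cascading deletes stay cheap.
create index if not exists document_chunks_document_id_idx
    on document_chunks (document_id);
```

With the constraint in place, a search hit can always be joined back to a live record, and orphaned vectors cannot accumulate.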
Start with exact search, which scans every row and needs no vector index, while the dataset is small enough to meet your latency target.
select id, document_id, body
from document_chunks
order by embedding <=> $1
limit 10;
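While evaluating result quality, it can help to return the score next to each row. In pgvector, <=> is cosine distance, so one minus that value gives cosine similarity. This is a small variation on the query above, with $1 still standing for the query embedding.

```sql
-- $1 is the query embedding; depending on your driver you may need
-- to cast it explicitly, for example $1::vector.
select id,
       document_id,
       body,
       1 - (embedding <=> $1) as cosine_similarity
from document_chunks
order by embedding <=> $1
limit 10;
```

Seeing the scores makes it easier to spot when the tenth result is barely related to the query, which is usually a chunking or data problem rather than an index problem.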
Add an approximate index when exact search gets too slow. It trades some recall for speed, so results can differ slightly from the exact query.
create index document_chunks_embedding_hnsw
    on document_chunks
    using hnsw (embedding vector_cosine_ops);
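Once the index exists, the speed versus recall trade-off can be adjusted at query time. The sketch below uses pgvector's hnsw.ef_search setting; the value 100 is an arbitrary starting point to tune against your own data, not a recommendation.

```sql
-- Raise the HNSW candidate list size for this session (pgvector's default is 40).
-- Larger values usually improve recall at the cost of slower queries.
set hnsw.ef_search = 100;

-- Or scope the setting to a single transaction.
begin;
set local hnsw.ef_search = 100;
select id, document_id, body
from document_chunks
order by embedding <=> $1
limit 10;
commit;
```

Comparing these results against the exact query on a sample of real searches is a simple way to estimate how much recall the index is giving up.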
Good First Uses
- Semantic search over docs already stored in Postgres.
- RAG prototypes.
- Support article search.
- Search with account filters or permissions.
- Small recommendation features.
pgvector is especially useful when metadata filters matter as much as vector distance, because both can be expressed in the same SQL query.
The Part People Skip
Chunking matters. Freshness matters. Permissions matter.
If users only have access to documents in their account, that filter must be in the query. Do not search all vectors and filter permissions in application code after the fact.
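A minimal sketch of putting the permission filter in the query itself, assuming a hypothetical documents table with an account_id column that document_chunks.document_id points to:

```sql
-- $1 is the query embedding, $2 is the caller's account id.
-- documents.account_id is a hypothetical column used for illustration.
select c.id, c.document_id, c.body
from document_chunks c
join documents d on d.id = c.document_id
where d.account_id = $2
order by c.embedding <=> $1
limit 10;
```

One caveat: with an approximate index, pgvector applies the filter after the index scan, so a selective filter can return fewer than 10 rows. Raising hnsw.ef_search, or falling back to exact search for small accounts, are two ways to compensate.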
Also store the embedding model version. You do not want a table full of vectors from mixed models with no way to tell them apart.
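One possible shape for this, sketched with hypothetical column names (embedding_model, embedded_at) and a hypothetical documents.updated_at column for the freshness check:

```sql
-- Hypothetical columns: which model produced the vector, and when.
alter table document_chunks
    add column embedding_model text not null default 'unknown',
    add column embedded_at timestamptz not null default now();

-- Only compare vectors produced by the same model as the query embedding.
-- $2 is the model identifier your application uses (an assumed convention).
select id, document_id, body
from document_chunks
where embedding_model = $2
order by embedding <=> $1
limit 10;

-- Chunks whose source document changed after they were embedded.
-- Assumes documents has an updated_at column.
select c.id
from document_chunks c
join documents d on d.id = c.document_id
where d.updated_at > c.embedded_at;
```

The last query gives a re-embedding worklist, which covers both a model upgrade and ordinary content edits.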
What This Does Not Replace
pgvector is not automatically the best home for every search workload.
If vector search is the product, you may need dedicated search infrastructure, deeper ranking controls, larger index operations, or a team focused only on search quality.
Move To A Vector Database When
- Vector search is the main workload.
- Latency and recall tuning need dedicated infrastructure.
- Index builds and updates are too heavy for your Postgres setup.
- Search ranking needs its own team and release cycle.
- You need product features your Postgres setup cannot provide cleanly.
Until then, pgvector is a good way to learn what the feature really needs before splitting the data model.
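To get a rough sense of whether index builds are becoming too heavy for your Postgres setup, the built-in progress view and size functions are a reasonable place to start. The index name below is the one created earlier in this article.

```sql
-- Progress of any index build currently running (built-in Postgres view).
select phase,
       round(100.0 * blocks_done / nullif(blocks_total, 0), 1) as pct_done
from pg_stat_progress_create_index;

-- On-disk size of the HNSW index.
select pg_size_pretty(pg_relation_size('document_chunks_embedding_hnsw'));
```

If builds regularly run long or the index no longer fits comfortably in memory, that is concrete evidence for the "too heavy" item in the list above.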