The rise of semantic search: a comparison of vector databases
Jul 12, 2023
In a world of ever more complex data and AI-powered applications, databases have taken on a new challenge: storing and querying the outputs of deep learning models. One of the latest innovations in this space is the vector database.
What exactly is a vector database?
A vector database is a specialised type of database designed to store high-dimensional vectors (also called embeddings), which are mathematical representations of an AI model's “understanding” of the data it’s received.
The video below, and any image like this, simply helps us visualise embeddings in a 3D space that the human mind can interpret. In reality these embeddings have hundreds of dimensions (e.g. 512), which makes them impossible to visualise in our 3D world.
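To make the idea concrete, here is a minimal sketch of how "closeness" between embeddings is typically measured: cosine similarity. The vectors below are hand-made toy examples (a real model would produce them, at far higher dimension):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: near 1.0 means the vectors point the same way
    # (similar meaning); near 0.0 means they are unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings"; real models produce hundreds of dimensions.
shirt = [0.9, 0.1, 0.0, 0.3]
tshirt = [0.8, 0.2, 0.1, 0.4]
laptop = [0.0, 0.9, 0.8, 0.1]

print(cosine_similarity(shirt, tshirt))  # high: similar items
print(cosine_similarity(shirt, laptop))  # low: unrelated items
```

Every vector database ultimately ranks items by a distance like this (cosine, dot product, or Euclidean); the differences between products lie in how fast and at what scale they can do it.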
What do vector databases enable?
Fundamentally, vector databases enable "semantic retrieval": instead of traditional keyword-based searches, queries are matched against the actual content of the product and the context of the query.
Imagine searching for a product not just by its name or tag, but by the essence of its features (which may or may not be accurately described by the retailer, but can be understood by AI) or even by a related feeling or mood. Instead of looking up a "blue cotton shirt," users might seek a product that feels "summery" or "cosy", and the system would understand and match this request semantically.
Vector databases significantly boost the speed of such complex searches.
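Stripped of the indexing tricks that make it fast, semantic search is just a nearest-neighbour lookup. Here is a brute-force sketch over a hypothetical three-item catalogue with hand-picked embeddings; a real system would embed the query ("summery") with a model and use an approximate index rather than scanning every vector:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical catalogue: each product paired with a pre-computed embedding.
catalogue = {
    "linen beach shirt":  [0.9, 0.8, 0.1],
    "wool winter jumper": [0.1, 0.2, 0.9],
    "cotton sun dress":   [0.8, 0.9, 0.2],
}

def search(query_vec, top_k=2):
    # Rank every product by similarity to the query vector.
    ranked = sorted(catalogue, key=lambda name: cosine(catalogue[name], query_vec), reverse=True)
    return ranked[:top_k]

# In practice this vector would come from embedding the word "summery";
# here it is hand-picked to sit near the summer items.
summery = [0.85, 0.85, 0.15]
print(search(summery))
```

A vector database's job is to return the same answer as this linear scan, but over millions of products in milliseconds.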
What does this mean in e-commerce?
In e-commerce, this means that customers can find products that match their desires more intuitively and quickly, leading to a smoother and more personalised shopping experience.
In the vast world of e-commerce, we think of vectors as a unique language that gives voice to the distinctiveness of product descriptions and images.
What to Look for in a Vector Database?
These are some of our priorities when deciding on a vector database.
Performance ⚡: Can it efficiently query millions of products in a fraction of a second?
Maturity ⏳: Given the investor hype around many new vector databases, businesses would ideally want something proven and reliable.
Ease of Use 🤷‍♂️: We're trying to avoid an overly complex setup or steep learning curve.
Native modelling support 🧩: An abstraction layer between the vector db and the model helps simplify code.
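To illustrate the last point, here is a toy sketch of what "native modelling support" buys you: callers hand the store raw text, and a pluggable model does the embedding behind the scenes. The `VectorStore` class and `toy_embed` function are hypothetical names, and the character-frequency "model" is a stand-in for a real embedding model:

```python
import math
from typing import Callable, List

def toy_embed(text: str) -> List[float]:
    # Stand-in for a real model: crude character-frequency features.
    # A production system would call an actual embedding model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

class VectorStore:
    """Toy store with 'native modelling support': callers pass raw text,
    and the store embeds it with whatever model it was configured with."""

    def __init__(self, embed: Callable[[str], List[float]]):
        self.embed = embed
        self.items = {}  # name -> vector

    def add(self, name: str, text: str) -> None:
        self.items[name] = self.embed(text)

    def query(self, text: str, top_k: int = 1) -> List[str]:
        q = self.embed(text)

        def cos(v):
            dot = sum(a * b for a, b in zip(q, v))
            nq = math.sqrt(sum(a * a for a in q))
            nv = math.sqrt(sum(a * a for a in v))
            return dot / (nq * nv) if nq and nv else 0.0

        return sorted(self.items, key=lambda n: cos(self.items[n]), reverse=True)[:top_k]

store = VectorStore(toy_embed)
store.add("shirt", "light blue cotton shirt")
store.add("boots", "heavy leather hiking boots")
print(store.query("cotton summer shirt"))
```

Without this abstraction layer, application code has to embed every document and query itself before talking to the database, which is exactly the boilerplate that databases with native model integrations remove.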
The contenders
Pinecone
✅ Pros
Relatively new but has large backing from investors.
Well-documented
Offers ultra-low query latency even with billions of items.
Provides a managed solution easing scaling and replication tasks.
Cloud-native with integrations in platforms like GCP.
Solid integration with programming languages like Python.
❌ Cons
Managed services come with a cost; for large volumes of data, Pinecone might be a costly option.
Closed source.
✅ Pros
New but financially backed
Fully open-source.
Minimal setup required to get started.
❌ Cons
Less mature documentation and support.
Less robust support for different types of AI models.
Deployment may be challenging.
Lacks comprehensive support for multi-modal models.
✅ Pros
Available in both open-source and managed versions.
Mature with advanced monitoring and replication capabilities.
Comprehensive documentation and strong support for popular LLMs and multi-modal models like CLIP.
Offers unique search features, such as steering results towards or away from semantic concepts.
❌ Cons
Resource intensive if a lot of data needs to be stored.
✅ Pros
Scalable to billions of embeddings
Seamless GCP integration
High flexibility if configured correctly
First class support for new Google Models (PaLM)
❌ Cons
Complex setup
Lacks inherent modelling support.
More suited to enterprise needs which might make it expensive for startups / small businesses.