From Keywords to Context: How AI is set to revolutionise Product Search
Aug 19, 2023
The primary function of the internet is to allow us to find information that satisfies our queries, enlightens our understanding, and connects us to a vast world of knowledge.
In the early days of the World Wide Web, this was achieved through a rather simplistic mechanism: keyword-based search. Users would type in specific words or phrases, and search engines would match these keywords to the text within web pages. While effective to a degree, this approach had its limitations.
In this post, we discuss how advances in Machine Learning and Artificial Intelligence are making it possible to find information not just by matching keywords but by understanding the context and intent behind user queries.
Where does keyword-based Search miss the mark?
While keyword-based search engines laid the foundation for online search, these traditional search engines have found it difficult to keep pace with the evolving needs and expectations of users.
Keyword-based search indexes content by the terms it contains and matches each query against that index based on keyword occurrence. Although it has been a staple for years, it remains a basic method for sorting online information.
BM25 - a reminder of the origins of search
An example is the BM25 algorithm, a traditional ranking function that scores a document by how often the query's keywords appear in it, weighted by how rare those keywords are across the whole document collection and adjusted for document length.
This provides a basic measure of a document's relevance to a user's query, making it a step up from mere keyword counting but still quite rudimentary in understanding the larger context. In the evolving landscape of digital search, where understanding context and intent is increasingly important, BM25 serves as a reminder of the origins of search and its early attempts at refining the keyword-based approach.
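To make this concrete, here is a minimal sketch of BM25 scoring in Python. It assumes simple whitespace tokenisation, commonly used default parameters (k1 = 1.5, b = 0.75) and one widely used variant of the rarity (IDF) term; the product descriptions are made up for illustration. Note how a query for "couch" scores nothing against a near-identical product described as a "sofa".

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for a real analyser.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a basic BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no exact keyword match, no contribution
            # Rarer terms across the collection get a higher weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalised by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical product descriptions.
docs = [
    "red leather sofa with wooden legs",
    "red leather couch with wooden legs",
    "blue cotton t-shirt",
]
scores = bm25_scores("couch", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

Only the second description receives a non-zero score, because BM25 has no notion that "sofa" and "couch" mean the same thing.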
Now let's delve into the specific areas where keyword-centric approaches may not fully meet today's search demands.
Where keyword-based search still struggles
Surface-Level Matching 🔍
Overlooks synonyms or related terms, which can lead to missed content.
May not detect variations in phrasing or terminology, limiting search scope.
Struggles with 'meaning', 'sentiment' and 'context' 🤖
May fail to discern between multiple meanings of a word (homonyms).
Struggles with queries that require a grasp of nuance or deeper context.
Can't determine user sentiment or purpose behind a query, leading to mismatched results.
Lack of Image Recognition and Understanding 🖼
Struggles with correlating textual content with relevant images.
Prioritises text over visual content, leading to a narrower search output.
Lack of NLP/NLU puts the onus on users 🗣
Users need to think of "search-friendly" terms rather than what they naturally want to ask.
May misinterpret user queries that rely on contextual or cultural knowledge.
Struggles with questions framed in passive voice or complex sentence structures.
May not understand or correctly process conversational queries or colloquialisms.
Redundancy 🔁
Proliferation of similar content across various sites can reduce search diversity.
Often leads to dominant or popular websites overshadowing niche, yet relevant, content.
Users may find themselves sifting through repetitive or near-identical results.
Vulnerability to Manipulation 🚫
SEO tactics can and do artificially boost irrelevant or low-quality content.
Authenticity and quality of content are frequently compromised by gaming the keyword system.
Sites and product descriptions can misuse meta tags, leading search engines astray.
Can be gamed by networks of sites interlinking to artificially boost relevance.
Static Algorithm 🔄
Less responsive to evolving user needs or shifts in content relevance.
Struggles with updating its approach based on real-time user interaction patterns.
Queries are not just complex; terminology and trends also change constantly.
What's the Solution?
Semantic Understanding
Addressing the complexity of human queries requires a paradigm shift from traditional keyword matching. We believe the answer lies in 'semantic understanding'. Instead of merely matching words, semantic understanding delves deeper into the intent and contextual meaning behind a user's query. At the heart of this revolution is the recent wave of Artificial Intelligence (AI) advances. Semantic understanding harnesses the capabilities of AI to process, relate, and evaluate information similarly to how humans do.
What now makes this possible?
The advent of the transformer architecture marked a pivotal shift in the realm of deep learning, especially in how we process vast amounts of data. One of the distinguishing features of transformers is their scalability.
Unlike earlier sequential architectures such as recurrent neural networks, transformers can process all the words or symbols in a sequence simultaneously thanks to their parallelisable nature. This not only speeds up training but also makes it feasible to handle very large datasets. In the context of semantic understanding, this means that transformers can efficiently process and understand large corpora of text, leading to more accurate and contextually relevant search results.
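As a rough illustration of why this parallelism is possible, here is a heavily simplified sketch of self-attention, the core operation inside a transformer. It is an assumption-laden toy: it omits the learned query, key and value projections, multiple heads and positional information that real models use, and the two-dimensional token vectors are made up. The point is only that each position's output is computed from the input sequence alone, so no position has to wait for another.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(position, vectors):
    """Output for one position: a weighted mix of every vector in the sequence."""
    query = vectors[position]
    dim = len(query)
    # Scaled dot product between this position and every position.
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in vectors
    ]
    weights = softmax(scores)
    return [sum(w * vec[j] for w, vec in zip(weights, vectors)) for j in range(dim)]

def self_attention(vectors):
    # Each call depends only on the inputs, never on another position's
    # output, so in practice all positions are computed at once.
    return [attend(i, vectors) for i in range(len(vectors))]

tokens = ["red", "high-heeled", "shoes"]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical token vectors
contextual = self_attention(vectors)
for token, vec in zip(tokens, contextual):
    print(token, [round(v, 3) for v in vec])
```

Real implementations express the same computation as a handful of matrix multiplications, which is what lets GPUs process a whole sequence in one pass.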
Once trained, these models can represent words, phrases, or even entire sentences as vectors, known as embeddings, in a high-dimensional space. When a user submits a query, it is converted into a vector, and the system then finds the most relevant documents by calculating how close or similar their vectors are to the query's. This is known as vector search. Embedding models capture subtle nuances and deeper meanings of words, allowing for a richer understanding of content beyond just the literal definitions.
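The ranking step of vector search can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any production system: the three-dimensional vectors below are hand-written stand-ins for what a trained embedding model would produce (real embeddings typically have hundreds of dimensions), the product names are hypothetical, and cosine similarity is just one common choice of closeness measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def vector_search(query_vec, doc_vecs, top_k=2):
    """Return the names of the top_k documents closest to the query vector."""
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in doc_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]

# Hypothetical embeddings; a real system would get these from a model.
doc_vecs = {
    "green cocktail dress": [0.9, 0.1, 0.2],
    "lime print summer dress": [0.8, 0.3, 0.1],
    "black office trousers": [0.1, 0.9, 0.7],
}
query_vec = [0.85, 0.2, 0.15]  # stand-in for an embedded user query

results = vector_search(query_vec, doc_vecs)
print(results)
```

With these toy numbers the two dresses rank above the trousers even though no keywords are compared at all; only the positions of the vectors matter. At scale, systems usually replace this exhaustive comparison with an approximate nearest-neighbour index.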
By bridging the gap between human language and machine understanding, semantic understanding offers transformative advantages in how information is sought and delivered. Here are the distinct advantages that we believe semantic understanding offers in this domain:
Advantages of semantic understanding
Deep Contextual Understanding 🧠
Beyond mere keywords, semantic searches understand context, intent, and nuanced relationships, offering results that are more aligned with user intent.
Natural Language Processing 🗣
These engines can process conversational phrases, complex questions, or even colloquial language, eliminating the need for "search-engine-friendly" queries.
Multi-modal: Image and Text Recognition 🏞+💬
Semantic search engines excel in multi-modal scenarios by understanding and correlating both text and images, providing comprehensive search results.
For example, in e-commerce, a user might input "red high-heeled shoes." While a traditional search might focus on the text, a semantic system can align the query with product images, ensuring the visuals in the search results truly match the user's intent.
Dynamic Adaptability 📈
Powered by AI and machine learning, semantic searches constantly evolve, refining their algorithms based on user interactions and feedback.
The more data and feedback the AI processes, the better its results can become, which allows it to keep adapting to changes in trends and language.
Language and Culture Sensitivity 🌍
Semantic searches often recognise cultural nuances, idioms, and regional language variations, making them more inclusive and globally applicable.
What does this mean for Moonsift?
In today's digital age, consumers expect not just answers, but the right answers, and fast. Every now and then, radical innovations enable new solutions that leapfrog seemingly immovable incumbents.
Semantic understanding and groundbreaking research in Machine Learning and Artificial Intelligence put Moonsift in a unique position to redefine how people sift the ginormous universe of products available online. While platforms like Google Shopping have set industry standards, innovation doesn't stand still.
With one of the largest proprietary cross-retail databases in the world, help from the open source community and AI experts from leading universities, Moonsift has been researching what is now possible.
Do the recent advances in AI enable new approaches and solutions that can leapfrog the standards set by Google and others?
Semantic understanding R&D: To gauge the performance of early Moonsift outputs in our research, we've been comparing them against Google Shopping, Shop, Bing and others, using deliberately challenging queries designed to stretch each platform's limits. Below we share some examples of Moonsift vs Google.
Moonsift vs. Google (some fun examples)
As we build Moonsift's Shopping Copilot for the entire internet, we are constantly experimenting with and tweaking multi-modal AI against existing engines to evaluate progress.
Below are some fun examples that begin to illustrate what is possible with semantic search.
🔍 “A dress that looks like a Caipirinha”
For context, if you don't know it, a caipirinha is a Brazilian cocktail made with cachaça, sugar and lime, typically served over ice with pale green lime wedges.
Semantic outputs from Moonsift's AI vs Google Shopping
🔍 “Skirt with a pattern inspired by ocean waves”
People have learnt to stick within the limitations of existing search technology, but with the increasing uptake of LLM-enabled chat experiences, this is changing. Why should users limit their queries to what search engines can understand? With Moonsift, we're pushing these boundaries, aiming to meet users where their imagination and needs take them.
These early results are certainly exciting. We first set out on our mission to solve online shoppers' 'product discovery' problem in 2018, when transformers were barely on anyone's radar. At that time it would have cost us tens if not hundreds of millions of dollars to build models with this capability.
We look forward to sharing more about how AI is going to revolutionise the product research and discovery process for shoppers as we build the world's first Shopping Copilot for the entire internet.