In the previous post, we discussed how to translate a variety of questions on semantic similarity into a single, tractable mathematical problem: which vector is closest to a query vector?
One way to address the nearest-vector question is, of course, to compare the query vector with each data vector. For a 2-dimensional Euclidean space this would translate to applying the distance formula to each resulting pair of vectors; for most modern embeddings, cosine similarity is preferred as a more accurate representation of semantic proximity (on vectors normalized to unit length the two measures produce the same ranking, because the squared Euclidean distance equals 2 minus twice the cosine similarity). Regardless of the similarity measure used, this method is slow and doesn’t scale well: we need to iterate over each point in the dataset, calculate the distance or similarity between the query point and that data point, and then identify the closest data point. The cost of every query therefore grows linearly with the size of the dataset. Can we do better?
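To make the linear scan concrete, here is a minimal sketch in plain Python. The function names and the toy 2-dimensional data are made up for illustration, and the code assumes every vector is non-zero and has the same length as the query.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_brute_force(query, vectors):
    """Return (position, similarity) of the vector most similar to query.

    Every vector is visited exactly once, so the work per query grows
    linearly with len(vectors).
    """
    best_index, best_score = -1, -math.inf
    for i, v in enumerate(vectors):
        score = cosine_similarity(query, v)
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score

# Toy data, invented for this example.
data = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
index, score = nearest_brute_force((2.0, 1.9), data)
print(index, round(score, 4))
```

With three vectors this is instant; with a billion, each query would repeat the loop body a billion times, which is the scaling problem described above.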
The key to solving this problem is realizing that we can build a structure ahead of time, well before the query comes in, that sits with the database and allows us to answer the query quickly. Such a structure is called an index. Indexes are by no means specific to semantic search: they have been used for decades to speed up database queries by orders of magnitude. If you have a simple database table that maps your friends’ names to their addresses and phone numbers, you may build an index on the name column to allow queries (e.g. “What is Fred’s address?”) to be answered quickly. A typical index keeps its keys in sorted order, and in the same way that a sorted list can be searched in logarithmic time via binary search, a database index allows the database engine to perform a log-time search rather than a linear scan through all records. Most database indexes are B-trees: hierarchical, rapidly navigable data structures that allow the database engine to quickly locate the row corresponding to a query.
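The idea of an index on the name column can be sketched with a sorted list and Python's standard `bisect` module. This is a simplified stand-in for a B-tree, not how any particular database engine implements its indexes, and the rows are invented for illustration.

```python
import bisect

# Hypothetical table rows: (name, address). Invented for this example.
rows = [
    ("Wilma", "12 Granite Way"),
    ("Fred", "301 Cobblestone Way"),
    ("Barney", "303 Cobblestone Way"),
]

# Build the index once, ahead of any query: (name, row position) pairs
# sorted by name. The table itself stays in its original order.
name_index = sorted((name, pos) for pos, (name, _) in enumerate(rows))
sorted_names = [name for name, _ in name_index]

def lookup_address(name):
    """Binary-search the sorted names; return the address or None."""
    i = bisect.bisect_left(sorted_names, name)
    if i < len(sorted_names) and sorted_names[i] == name:
        row_position = name_index[i][1]
        return rows[row_position][1]
    return None

print(lookup_address("Fred"))
print(lookup_address("Betty"))
```

The sorting cost is paid once when the index is built. Each lookup afterwards needs only a logarithmic number of comparisons, at the price of extra storage and of keeping the index up to date when rows change.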
While traditional B-tree indexes are well suited to exact-match queries (“find the address corresponding to the name Fred”), they do not address similarity or closeness. But semantic similarity search, by definition, requires similar matches rather than exact ones. So, how can we build an index on a billion-vector dataset to efficiently answer the “find the closest vector” query? Or, more generally, how can we efficiently find the k closest vectors?
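As a baseline for the k-closest question, the brute-force scan generalizes directly: score every vector and keep the k best. The sketch below is only that unindexed baseline, not an index. It uses `heapq.nlargest` from the standard library, and the function names and toy data are assumptions made for illustration.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def k_nearest_brute_force(query, vectors, k):
    """Return the k most similar vectors as (position, similarity) pairs,
    most similar first. Still scores every vector in the dataset."""
    scored = ((cosine_similarity(query, v), i) for i, v in enumerate(vectors))
    return [(i, s) for s, i in heapq.nlargest(k, scored)]

# Toy data, invented for this example.
data = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (-1.0, 0.0)]
top_two = k_nearest_brute_force((2.0, 1.9), data, k=2)
print([position for position, _ in top_two])
```

An index for similarity search has to return the same answer as this scan, or a close approximation of it, without scoring every vector.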
In the final part of this post, we’ll talk about some approaches to building the right index for semantic similarity search.
At Quilt.AI, we use machine learning models to analyze semantic relationships between text, images, and ideas. Reach out to us at [email protected] for more information.
synthesizing vast data into actionable insights that reflect each market's unique cultural and economic backdrop
grasping the distinct consumer perspectives that these diverse regions offer
Curated digital profiles:
-Instagram, Twitter, and TikTok (US)
-Weibo and Douyin (China)
Pulled 400 million unique searches to estimate the growth of each segment
Used Quilt.AI’s Sphere language and image capabilities to categorise lifestyle areas into specific segments
These consumers are confident, bold, and comfortable with modern masculinity. They also often turn to social media to express their personal style and interests.
Actionable Insight: Collaborate with high-profile fashion influencers to create vibrant, trend-setting campaigns that resonate with this segment's desire for attention and admiration.
Highly image-driven, these individuals often seek validation through their appearance and are likely to engage heavily with both grooming and fashion products.
Actionable Insight: Leverage digital marketing strategies that feature before-and-after visuals and testimonials that showcase the transformative power of the products.
These men aim to be recognized as modern, open-minded, and sensitive – embodying the image of "the woke good guy" in today's society by actively participating in movements related to activism and gender equality.
Actionable Insight: Design marketing campaigns that highlight their participation in these movements, showcasing products that enable them to express and amplify their desired social identities.
They value beauty while still maintaining traditional masculine ideals of what it means to be good-looking. These men also tend to seek out methods of maintaining their youthful appearances.
Actionable Insight: Market products that boost physical appeal and suit active lifestyles, and focus on dynamic marketing that highlights masculine elegance.
Despite seeing gender in traditionally binary terms, these men aren’t afraid of behaving in more feminine manners. They own their uniqueness and tend to be deeply loyal to brands that affirm their identity.
Actionable Insight: Focusing on brand narratives that celebrate individuality and personal expression will better engage this segment. Brands can also offer personalized services to maintain their commitment.