Multivariate Similarity Search - A Call for a New Breed of Similarity Search Algorithms
Published in ICDE, 2024
The similarity search task involves identifying pairs of similar vectors, e.g., time series. For example, given a query q, the user might wish to find all vectors in a dataset whose cosine similarity with q exceeds a threshold t, or to find the top-k vectors most similar to q under Euclidean distance. The task has been widely studied in different domains, ranging from data science, where detected correlations help the analyst extract insights from the data, to e-commerce, where users are recommended additional purchases based on their shopping behavior. Accordingly, many similarity search algorithms and indices have been proposed in the literature, focusing on efficiency, scalability to big datasets, and support for different distance measures. However, the majority of past work considers only pairwise similarity/distance measures. In this talk, we will revisit similarity search through the lens of multivariate similarity measures.
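
To make the two pairwise query types in the abstract concrete, here is a minimal brute-force sketch in Python. It is an illustration only and not the method discussed in the talk: the function names, the toy dataset, and the linear scan are all assumptions made for this example, and the code assumes non-zero vectors so that the cosine similarity is defined.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def threshold_query(dataset, q, t):
    """Indices of all vectors whose cosine similarity with q is higher than t."""
    return [i for i, v in enumerate(dataset) if cosine_similarity(v, q) > t]


def top_k_query(dataset, q, k):
    """Indices of the k vectors closest to q under Euclidean distance."""
    return heapq.nsmallest(
        k, range(len(dataset)), key=lambda i: math.dist(dataset[i], q)
    )


# Hypothetical toy data: four 2-dimensional vectors and a query.
dataset = [
    [1.0, 0.0],
    [2.0, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
]
q = [1.0, 0.0]

similar = threshold_query(dataset, q, 0.9)  # threshold query with t = 0.9
nearest = top_k_query(dataset, q, 2)        # top-k query with k = 2
```

Both functions compare q against one dataset vector at a time, which is exactly the pairwise setting the abstract describes. The linear scan costs one comparison per vector; the algorithms and indices mentioned in the abstract exist to avoid that cost on large datasets.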
