ANN Benchmarks with Etienne Dilocker — Weaviate Podcast #16

Recap: 16th Episode of the Weaviate Podcast!
  • Dataset Sizes (e.g. 1 million versus 1 billion vectors)
  • Vector Dimensionality (e.g. 384-d versus 2048-d)
  • Vector Distributions (e.g. clustering properties / distance histograms)
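The three properties above can be made concrete with a small experiment. The following is a minimal sketch, assuming toy sizes (200 vectors, 64 dimensions) and synthetic data rather than any real ANN benchmark dataset. All function names here are hypothetical helpers written for illustration. It compares the pairwise-distance histogram of unstructured random vectors against vectors grouped around a few centers.

```python
import math
import random
from itertools import combinations


def random_vectors(n, dim, rng):
    """Draw n vectors with independent standard-normal coordinates."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


def clustered_vectors(n, dim, n_clusters, spread, rng):
    """Draw n vectors scattered tightly around a few random centers."""
    centers = random_vectors(n_clusters, dim, rng)
    return [
        [c + rng.gauss(0.0, spread) for c in centers[i % n_clusters]]
        for i in range(n)
    ]


def pairwise_distances(vectors):
    """Euclidean distance for every unordered pair of vectors."""
    return [math.dist(a, b) for a, b in combinations(vectors, 2)]


def distance_histogram(distances, n_bins=10):
    """Count distances in equal-width bins spanning [min, max]."""
    lo, hi = min(distances), max(distances)
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for d in distances:
        counts[min(int((d - lo) / width), n_bins - 1)] += 1
    return counts


def relative_spread(distances):
    """Standard deviation of the distances divided by their mean."""
    mean = sum(distances) / len(distances)
    var = sum((d - mean) ** 2 for d in distances) / len(distances)
    return math.sqrt(var) / mean


rng = random.Random(0)
n, dim = 200, 64  # toy values, far smaller than a real benchmark
random_d = pairwise_distances(random_vectors(n, dim, rng))
clustered_d = pairwise_distances(clustered_vectors(n, dim, 4, 0.1, rng))

print("random    relative spread:", round(relative_spread(random_d), 3))
print("clustered relative spread:", round(relative_spread(clustered_d), 3))
print("random    histogram:", distance_histogram(random_d))
print("clustered histogram:", distance_histogram(clustered_d))
```

In this sketch the random vectors produce distances packed into a narrow band, while the clustered vectors separate into short within-cluster and long between-cluster distances. Changing `n` and `dim` is a cheap way to see how dataset size and dimensionality shift the same statistics.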

ANN Benchmarks on Weaviate

What makes each ANN Dataset Different?

How do these relate to my Dataset?

Problems with Random Vector Benchmarks

The Role of Clustering in ANN and Deep Learning

Recommendation as Language Modeling

Re-Ranking

Class-Property Schema in Weaviate

Weaviate Class-Property Schema Example
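As an illustration of the class-property structure, here is a minimal sketch of a schema written as a plain Python dictionary. The `Article` and `Author` classes and all of their properties are hypothetical, chosen only for this example, and the sketch only builds and prints the JSON rather than sending it to a running Weaviate instance.

```python
import json

# Hypothetical classes and properties, chosen only for illustration.
author_class = {
    "class": "Author",
    "description": "A person who writes articles",
    "properties": [
        {"name": "name", "dataType": ["text"]},
    ],
}

article_class = {
    "class": "Article",
    "description": "A written article",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "wordCount", "dataType": ["int"]},
        # A property whose dataType names another class is a cross-reference.
        {"name": "hasAuthor", "dataType": ["Author"]},
    ],
}

# Author is listed first so the class that Article refers to already exists.
schema = {"classes": [author_class, article_class]}
print(json.dumps(schema, indent=2))
```

Each class holds a list of properties, and each property declares a name and a data type, which is either a primitive such as `text` or `int` or the name of another class.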

Weaviate Core and Python Inference Containers

The Weaviate Podcast

Follow our previous podcast guests on Twitter!

References

  1. ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms. Martin Aumüller, Erik Bernhardsson, Alexander Faithfull; 2018. arXiv:1807.05614.
  2. https://github.com/spotify/annoy. Accessed May 2022.
  3. Nearest neighbors and vector models — part 2 — algorithms and data structures. https://erikbern.com/2015/10/01/nearest-neighbors-and-vector-models-part-2-how-to-search-in-high-dimensional-spaces.html. Accessed May 2022.
  4. Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. Yury A. Malkov, Dmitry A. Yashunin; 2016. arXiv:1603.09320.

Cover Image

Thank you for reading! Please subscribe to SeMI Technologies on YouTube!
