Release Notes — Nils Reimers on Cohere Search AI, Weaviate Podcast #63

Connor Shorten
5 min read · Aug 16


Hey everyone! Thank you so much for watching the Weaviate Podcast! I am aiming to make more content around these podcasts to hopefully capture more of the value in the conversation. Here is a short description of each of the chapters discussed and their timestamps if you would like to skip ahead to what is relevant to your interests!

0:00 Introduction - Re-introducing Nils Reimers to the Weaviate Podcast! In our first podcast, we discussed Nils’ evolution from creating Sentence Transformers to Cohere and the integration of new multilingual embedding models from Cohere into Weaviate! This podcast similarly discusses a new model from Cohere… 🥁… Rerankers! 🎉 We also discuss several academic topics about AI in Search technology!

1:30 Cohere Rerankers - Rerankers are regression models that take each <query, candidate document> pair as input and output a relevance score. This differs from embedding models, which score the vector distance between the query embedding and each candidate document embedding. Rerankers are much slower than pure retrieval, but they can be more accurate in many cases. Nils further describes how reranking can be applied to legacy systems that only use BM25 scoring, and we discuss how they can be used for complex queries, multi-discourse candidate documents, and in recommendation systems.
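To make the two-stage idea concrete, here is a minimal sketch of retrieve-then-rerank in plain Python. Everything in it is an assumption for illustration: the toy vectors, the documents, and especially `toy_pair_score`, which is a simple token-overlap stand-in for a real reranking model and not Cohere's reranker or its API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, doc_vecs, k):
    """First stage: rank document ids by embedding similarity and keep the top k."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


def toy_pair_score(query, document):
    """Hypothetical stand-in for a reranker: share of query tokens found in the document."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(document.lower().split())
    return len(query_tokens & doc_tokens) / len(query_tokens)


def rerank(query, candidate_ids, documents, score_pair):
    """Second stage: score every <query, candidate document> pair and re-sort."""
    return sorted(
        candidate_ids,
        key=lambda doc_id: score_pair(query, documents[doc_id]),
        reverse=True,
    )


# Toy data (made up for this sketch).
documents = {
    "a": "how to reset a forgotten password",
    "b": "password strength guidelines",
    "c": "cooking pasta at home",
}
doc_vecs = {"a": [0.8, 0.6], "b": [0.9, 0.4], "c": [0.0, 1.0]}
query = "reset password"
query_vec = [1.0, 0.0]

candidates = retrieve(query_vec, doc_vecs, k=2)
reranked = rerank(query, candidates, documents, toy_pair_score)
print(candidates, reranked)
```

The pair scorer only runs on the k candidates that survive the first stage, which is why the slower reranker stays affordable.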

7:02 Dataset Curation at Cohere - What are the key challenges of curating datasets to develop commercial models? Nils describes the problem of outdated knowledge and the challenge this presents for training vector embedding and reranking models.

10:30 New Rerankers and XGBoost - XGBoost has long held the crown 👑 as the most widely applicable Machine Learning modeling algorithm, especially for personalized ranking and the integration of symbolic product features such as price or availability. Can text-based rerankers borrow anything from the symbolic features used in pioneering XGBoost? How do these worlds collide in new search systems?

14:35 Temporal Queries - Imagine the query “What time is my flight leaving?”. Embedding models trained to capture semantic similarity are likely to return emails about flight departures from, say, 10 years ago! Can we encode this kind of information into embeddings?

17:55 Metadata Extraction from Unstructured Text Chunks - Recent startups such as Unstructured or Docugami are showing how LLMs can be used to extract structured, symbolic data from unstructured text. Here is an example of this that I used in our Weaviate blog post “LLMs and Search”:

The next application of LLMs in Search Index Construction is in extracting structured data from unstructured text chunks. Suppose we have a text chunk extracted from the Wikipedia page of Jimmy Butler from the Miami Heat:

Now we prompt the Large Language Model as follows:

Given the following text chunk: `{text_chunk}`
Please identify any potential symbolic data contained in the passage. Please format this data as a json dictionary with the key equal to the name of the variable and the value equal to the value this variable holds as evidenced by the text chunk. The values for these variables should either have `int`, `boolean`, or `categorical` values.

For this example, text-davinci-003 with temperature = 0 produces the following output:

{ "Draft Pick": 30, "Most Improved Player": true, "Teams": ["Chicago Bulls", "Minnesota Timberwolves", "Philadelphia 76ers", "Miami Heat"], "NBA Finals": 2, "Lead League in Steals": true }

We can apply this prompt to each chunk of text in the Wikipedia article and then aggregate the symbolic properties discovered. We can then answer symbolic questions about our data such as, “How many NBA players have played for more than 3 distinct NBA teams?”. We can also use these filters for vector search as described above with respect to the LlamaIndex Query Engine.
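As a rough sketch of that aggregation step, the snippet below parses the JSON string produced for each chunk, merges the properties per player, and answers the team-count question. The first entry reuses the output shown above; the “Hypothetical Player” entries and the function names are placeholders I made up for illustration, not real data or an existing API.

```python
import json
from collections import defaultdict

# (player, raw LLM output) pairs, one per text chunk.
chunk_outputs = [
    (
        "Jimmy Butler",
        '{"Draft Pick": 30, "Most Improved Player": true, '
        '"Teams": ["Chicago Bulls", "Minnesota Timberwolves", '
        '"Philadelphia 76ers", "Miami Heat"], '
        '"NBA Finals": 2, "Lead League in Steals": true}',
    ),
    # Placeholder entries, invented for this sketch.
    ("Hypothetical Player", '{"Teams": ["Team A", "Team B"]}'),
    ("Hypothetical Player", '{"Teams": ["Team B"], "Draft Pick": 12}'),
]


def aggregate(outputs):
    """Merge the properties extracted from each chunk into one dict per player."""
    players = defaultdict(dict)
    for player, raw in outputs:
        try:
            extracted = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip chunks where the model did not return valid JSON
        for key, value in extracted.items():
            if isinstance(value, list):
                # Union list-valued properties across chunks, keeping order.
                merged = players[player].setdefault(key, [])
                for item in value:
                    if item not in merged:
                        merged.append(item)
            else:
                players[player][key] = value
    return dict(players)


def players_with_more_than(players, n_teams):
    """Names of players whose aggregated "Teams" list has more than n_teams entries."""
    return [
        name
        for name, props in players.items()
        if len(props.get("Teams", [])) > n_teams
    ]


players = aggregate(chunk_outputs)
answer = players_with_more_than(players, 3)
print(answer)
```

In practice the keys returned by the LLM will vary from chunk to chunk, so some normalization of property names would likely be needed before aggregating.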

21:52 Soft Filters - Filtered Vector Search uses a hard filter, where every result must satisfy a condition such as “Price < $500”. In this chapter, Nils and I return to our discussion of ranking models and soft filters. For example, if an iPhone is $520, that may still be ok if it matches other criteria. What is the best way to achieve this? With symbolic rankers such as XGBoost that take features and vector distances as input? LLMs that take the ranking logic as a prompt input? Or can we bake this deeper into the HNSW graph traversal?
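Here is one very simple way to picture a soft filter, as a hand-written scoring function rather than any of the learned approaches discussed in the episode. The penalty shape, the `tolerance` and `weight` parameters, and the product data are all assumptions for illustration.

```python
def soft_price_penalty(price, limit, tolerance):
    """0.0 at or below the limit, growing linearly to 1.0 at limit + tolerance."""
    if price <= limit:
        return 0.0
    return min((price - limit) / tolerance, 1.0)


def soft_filter_score(similarity, price, limit=500.0, tolerance=100.0, weight=0.5):
    """Combine vector similarity with a price penalty instead of a hard cutoff."""
    return similarity - weight * soft_price_penalty(price, limit, tolerance)


# Made-up products with made-up similarity scores.
products = [
    {"name": "iPhone", "price": 520.0, "similarity": 0.90},
    {"name": "budget phone", "price": 300.0, "similarity": 0.60},
    {"name": "flagship phone", "price": 900.0, "similarity": 0.95},
]

# Hard filter: anything at or above $500 is dropped outright.
hard_filtered = [p["name"] for p in products if p["price"] < 500.0]

# Soft filter: everything stays, but going over the limit costs score.
soft_ranked = [
    p["name"]
    for p in sorted(
        products,
        key=lambda p: soft_filter_score(p["similarity"], p["price"]),
        reverse=True,
    )
]
print(hard_filtered, soft_ranked)
```

With these numbers the $520 iPhone is excluded by the hard filter but comes out on top of the soft ranking, since it is only slightly over the limit and matches the query well.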

24:58 Chunking and Long Document Representation - Text Chunking is typically step 1 in Vector Database / LLM pipelines. There are all sorts of nuances to this, but our conversation focuses particularly on disambiguation in long texts. As Nils describes, the Apple annual report will transition to self-references of the type “our company did this” or “we have decided this”. Can we apply LLMs, Generative Feedback Loop style, to disambiguate these references and create better search indexes? I also took Nils’ temperature on ALiBi for long document embeddings.
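For readers new to chunking, here is a minimal sketch of word-based chunking with overlap. The `add_context` helper prepends a document title to every chunk, which is a much simpler stand-in for the LLM-based disambiguation discussed in the episode, not an implementation of it. The sample sentence, the title string, and the function names are assumptions for illustration.

```python
def chunk_words(text, chunk_size, overlap):
    """Split text into chunks of chunk_size words, sharing overlap words between neighbors."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def add_context(chunks, title):
    """Prefix each chunk with the document title so "we" / "our company" has a referent."""
    return [f"{title}: {chunk}" for chunk in chunks]


# Made-up sample sentence in the style of an annual report.
text = "Our company opened new stores this year and we have decided to expand further"
chunks = chunk_words(text, chunk_size=6, overlap=2)
contextualized = add_context(chunks, "Apple annual report")
for chunk in contextualized:
    print(chunk)
```

Without the prefix, the last chunk (“we have decided to expand further”) gives an embedding model no hint about who “we” is.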

38:00 Retrieval-Augmented Generation - Beginning with Tool Use and the Gorilla 🦍 LLMs, we then discuss Cohere’s Coral system. We cover the hallucination problem (“We communicated a 30% discount to this client, is it ok to use this here?”), hallucination and citing sources, and handling multiple types of queries such as SQL or Vector Search. Do we need specialized models for Query Decomposition or Routing? So many exciting topics here!

45:40 Retrieval-Aware Training to solve Hallucinations - Discussing the idea of training LLMs with retrieval, particularly to solve hallucinations. This is a fairly new idea led by works such as Gorilla!

49:50 Learning to Search and End-to-End RAG - One idea for the future of RAG is to propagate gradients from the loss of the decoder / LLM back into the embedding model. What are the unique challenges in achieving this? What is Nils’ sentiment on the effectiveness of this approach?

54:35 RETRO - Another cutting-edge technique in RAG is to apply embeddings to the middle of the transformer rather than at the input layer. Nils describes a really exciting perspective on how we should think about open- and closed-book pretraining of LLMs!

59:25 Foundation Model for Search - Nils previews a new foundation model for embeddings!

Thank you so much for reading! If interested, you can find the full podcast here!