{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Rerankers\n", "\n", "In this notebook, we will show how to use RedisVL to rerank search results\n", "(documents, chunks, or records) based on the input query. RedisVL currently\n", "supports reranking through:\n", "\n", "- A reranker that uses pre-trained [Cross-Encoders](https://sbert.net/examples/applications/cross-encoder/README.html), which can use models from [Hugging Face cross encoder models](https://huggingface.co/cross-encoder) or other Hugging Face models that implement a cross-encoder function ([example: BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base)).\n", "- The [Cohere /rerank API](https://docs.cohere.com/docs/rerank-2).\n", "\n", "Before running this notebook, be sure to:\n", "1. Install ``redisvl`` and activate that environment for this notebook.\n", "2. Have a running Redis Stack instance with RediSearch > 2.4 active.\n", "\n", "For example, you can run Redis Stack locally with Docker:\n", "\n", "```bash\n", "docker run -d -p 6379:6379 -p 8001:8001 redis/redis-stack:latest\n", "```\n", "\n", "This will run Redis on port 6379 and RedisInsight at http://localhost:8001." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "metadata": {} }, "outputs": [], "source": [ "# import necessary modules\n", "import os" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simple Reranking\n", "\n", "Reranking provides a relevance boost to search results generated by\n", "traditional (lexical) or semantic search strategies.\n", "\n", "As a simple demonstration, take the passages and user query below:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "metadata": {} }, "outputs": [], "source": [ "query = \"What is the capital of the United States?\"\n", "docs = [\n", " \"Carson City is the capital city of the American state of Nevada. 
At the 2010 United States Census, Carson City had a population of 55,274.\",\n", " \"The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.\",\n", " \"Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.\",\n", " \"Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.\",\n", " \"Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment.\"\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of reranking is to provide a more fine-grained quality improvement to\n", "initial search results. With RedisVL, these would typically be the results of a\n", "search operation such as full-text or vector search." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using the Cross-Encoder Reranker\n", "\n", "To use the cross-encoder reranker, initialize an instance of `HFCrossEncoderReranker` with a suitable model. If no model is provided, the `cross-encoder/ms-marco-MiniLM-L-6-v2` model is used:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "metadata": {} }, "outputs": [], "source": [ "from redisvl.utils.rerank import HFCrossEncoderReranker\n", "\n", "cross_encoder_reranker = HFCrossEncoderReranker(\"BAAI/bge-reranker-base\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Rerank documents with HFCrossEncoderReranker\n", "\n", "With the reranker instance, we can rerank and truncate the list of\n", "documents based on relevance to the initial query." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "metadata": {} }, "outputs": [], "source": [ "results, scores = cross_encoder_reranker.rank(query=query, docs=docs)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "metadata": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.07461125403642654 -- {'content': 'Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.'}\n", "0.05220315232872963 -- {'content': 'Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.'}\n", "0.3802368640899658 -- {'content': 'Carson City is the capital city of the American state of Nevada. 
At the 2010 United States Census, Carson City had a population of 55,274.'}\n" ] } ], "source": [ "for result, score in zip(results, scores):\n", " print(score, \" -- \", result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using the Cohere Reranker\n", "\n", "To initialize the Cohere reranker, you'll need to install the `cohere` library and provide a valid Cohere API key." ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "metadata": {} }, "outputs": [], "source": [ "#!pip install cohere" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "metadata": {} }, "outputs": [], "source": [ "import getpass\n", "\n", "# read the API key from the environment, or prompt for it\n", "api_key = os.environ.get(\"COHERE_API_KEY\") or getpass.getpass(\"Enter your Cohere API key: \")" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "metadata": {} }, "outputs": [], "source": [ "from redisvl.utils.rerank import CohereReranker\n", "\n", "cohere_reranker = CohereReranker(limit=3, api_config={\"api_key\": api_key})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Rerank documents with CohereReranker\n", "\n", "Below, we use the `CohereReranker` to rerank and truncate the list of\n", "documents above based on relevance to the initial query." ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "metadata": {} }, "outputs": [], "source": [ "results, scores = cohere_reranker.rank(query=query, docs=docs)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "metadata": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.9990564 -- Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. 
This makes it the political center of the United States of America.\n", "0.7516481 -- Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment.\n", "0.08882029 -- The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.\n" ] } ], "source": [ "for result, score in zip(results, scores):\n", " print(score, \" -- \", result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Working with semi-structured documents\n", "\n", "Often, the initial result set includes metadata fields that can help steer reranking relevance. To use them, provide documents that contain those additional fields and set the `rank_by` argument to the list of fields to rank by." ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "metadata": {} }, "outputs": [], "source": [ "docs = [\n", " {\n", " \"source\": \"wiki\",\n", " \"passage\": \"Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.\"\n", " },\n", " {\n", " \"source\": \"encyclopedia\",\n", " \"passage\": \"The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.\"\n", " },\n", " {\n", " \"source\": \"textbook\",\n", " \"passage\": \"Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.\"\n", " },\n", " {\n", " \"source\": \"textbook\",\n", " \"passage\": \"Washington, D.C. 
(also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.\"\n", " },\n", " {\n", " \"source\": \"wiki\",\n", " \"passage\": \"Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment.\"\n", " }\n", "]" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "metadata": {} }, "outputs": [], "source": [ "results, scores = cohere_reranker.rank(query=query, docs=docs, rank_by=[\"passage\", \"source\"])" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "metadata": {} }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.9988121 -- {'source': 'textbook', 'passage': 'Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.'}\n", "0.5974905 -- {'source': 'wiki', 'passage': 'Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment.'}\n", "0.059101548 -- {'source': 'encyclopedia', 'passage': 'The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. 
Its capital is Saipan.'}\n" ] } ], "source": [ "for result, score in zip(results, scores):\n", " print(score, \" -- \", result)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.13 ('redisvl2')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "9b1e6e9c2967143209c2f955cb869d1d3234f92dc4787f49f155f3abbdfb1316" } } }, "nbformat": 4, "nbformat_minor": 2 }