{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Semantic Routing\n", "\n", "RedisVL provides a `SemanticRouter` interface to utilize Redis' built-in search & aggregation in order to perform\n", "KNN-style classification over a set of `Route` references to determine the best match.\n", "\n", "This notebook will go over how to use Redis as a Semantic Router for your applications" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define the Routes\n", "\n", "Below we define 3 different routes. One for `technology`, one for `sports`, and\n", "another for `entertainment`. Now for this example, the goal here is\n", "surely topic \"classification\". But you can create routes and references for\n", "almost anything.\n", "\n", "Each route has a set of references that cover the \"semantic surface area\" of the\n", "route. The incoming query from a user needs to be semantically similar to one or\n", "more of the references in order to \"match\" on the route." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from redisvl.extensions.router import Route\n", "\n", "\n", "# Define routes for the semantic router\n", "technology = Route(\n", " name=\"technology\",\n", " references=[\n", " \"what are the latest advancements in AI?\",\n", " \"tell me about the newest gadgets\",\n", " \"what's trending in tech?\"\n", " ],\n", " metadata={\"category\": \"tech\", \"priority\": 1}\n", ")\n", "\n", "sports = Route(\n", " name=\"sports\",\n", " references=[\n", " \"who won the game last night?\",\n", " \"tell me about the upcoming sports events\",\n", " \"what's the latest in the world of sports?\",\n", " \"sports\",\n", " \"basketball and football\"\n", " ],\n", " metadata={\"category\": \"sports\", \"priority\": 2}\n", ")\n", "\n", "entertainment = Route(\n", " name=\"entertainment\",\n", " references=[\n", " \"what are the top movies right now?\",\n", " \"who won the best actor award?\",\n", " \"what's new in the entertainment industry?\"\n", " ],\n", " metadata={\"category\": \"entertainment\", \"priority\": 3}\n", ")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialize the SemanticRouter\n", "\n", "``SemanticRouter`` will automatically create an index within Redis upon initialization for the route references. By default, it uses the `HFTextVectorizer` to \n", "generate embeddings for each route reference." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import os\n", "from redisvl.extensions.router import SemanticRouter\n", "from redisvl.utils.vectorize import HFTextVectorizer\n", "\n", "os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n", "\n", "# Initialize the SemanticRouter\n", "router = SemanticRouter(\n", " name=\"topic-router\",\n", " vectorizer=HFTextVectorizer(),\n", " routes=[technology, sports, entertainment],\n", " redis_url=\"redis://localhost:6379\",\n", " overwrite=True # Blow away any other routing index with this name\n", ")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "HFTextVectorizer(model='sentence-transformers/all-mpnet-base-v2', dims=768)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "router.vectorizer" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "Index Information:\n", "╭──────────────┬────────────────┬──────────────────┬─────────────────┬────────────╮\n", "│ Index Name │ Storage Type │ Prefixes │ Index Options │ Indexing │\n", "├──────────────┼────────────────┼──────────────────┼─────────────────┼────────────┤\n", "│ topic-router │ HASH │ ['topic-router'] │ [] │ 0 │\n", "╰──────────────┴────────────────┴──────────────────┴─────────────────┴────────────╯\n", "Index Fields:\n", "╭────────────┬─────────────┬────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬─────────────────┬────────────────╮\n", "│ Name │ Attribute │ Type │ Field Option │ Option Value │ Field Option │ Option Value │ Field Option │ Option Value │ Field Option │ Option Value │\n", "├────────────┼─────────────┼────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────────────────┼────────────────┤\n", "│ route_name │ route_name │ TAG │ SEPARATOR │ , │ │ │ │ │ │ │\n", "│ reference │ reference │ TEXT │ WEIGHT │ 1 │ │ │ │ │ │ │\n", "│ vector │ vector │ VECTOR │ algorithm │ FLAT │ data_type │ FLOAT32 │ dim │ 768 │ distance_metric │ COSINE │\n", "╰────────────┴─────────────┴────────┴────────────────┴────────────────┴────────────────┴────────────────┴────────────────┴────────────────┴─────────────────┴────────────────╯\n" ] } ], "source": [ "# look at the index specification created for the semantic router\n", "!rvl index info -i topic-router" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simple routing" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RouteMatch(name='technology', distance=0.119614243507)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Query the router with a statement\n", "route_match = router(\"Can you tell me about the latest in artificial intelligence?\")\n", "route_match" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RouteMatch(name=None, distance=None)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Query the router with a statement and return a miss\n", "route_match = router(\"are aliens real?\")\n", "route_match" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RouteMatch(name='sports', distance=0.554210186005)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Toggle the runtime distance threshold\n", "route_match = router(\"Which basketball team will win the NBA finals?\", distance_threshold=0.7)\n", "route_match" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also route a statement to many routes and order them by distance:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[RouteMatch(name='sports', distance=0.758580708504),\n", " RouteMatch(name='entertainment', distance=0.812423825264),\n", " RouteMatch(name='technology', distance=0.88423516353)]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Perform multi-class classification with route_many() -- toggle the max_k and the distance_threshold\n", "route_matches = router.route_many(\"Lebron James\", distance_threshold=1.0, max_k=3)\n", "route_matches" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[RouteMatch(name='sports', distance=0.663253903389),\n", " RouteMatch(name='entertainment', distance=0.712985396385),\n", " RouteMatch(name='technology', distance=0.832674384117)]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Toggle the aggregation method -- note the different distances in the result\n", "from redisvl.extensions.router.schema import DistanceAggregationMethod\n", "\n", "route_matches = router.route_many(\"Lebron James\", aggregation_method=DistanceAggregationMethod.min, distance_threshold=1.0, max_k=3)\n", "route_matches" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note the different route match distances. This is because we used the `min` aggregation method instead of the default `avg` approach." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Update the routing config" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "from redisvl.extensions.router import RoutingConfig\n", "\n", "router.update_routing_config(\n", " RoutingConfig(distance_threshold=1.0, aggregation_method=DistanceAggregationMethod.min, max_k=3)\n", ")" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[RouteMatch(name='sports', distance=0.663253903389),\n", " RouteMatch(name='entertainment', distance=0.712985396385),\n", " RouteMatch(name='technology', distance=0.832674384117)]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "route_matches = router.route_many(\"Lebron James\")\n", "route_matches" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Router serialization" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'name': 'topic-router',\n", " 'routes': [{'name': 'technology',\n", " 'references': ['what are the latest advancements in AI?',\n", " 'tell me about the newest gadgets',\n", " \"what's trending in tech?\"],\n", " 'metadata': {'category': 'tech', 'priority': '1'}},\n", " {'name': 'sports',\n", " 'references': ['who won the game last night?',\n", " 'tell me about the upcoming sports events',\n", " \"what's the latest in the world of sports?\",\n", " 'sports',\n", " 'basketball and football'],\n", " 'metadata': {'category': 'sports', 'priority': '2'}},\n", " {'name': 'entertainment',\n", " 'references': ['what are the top movies right now?',\n", " 'who won the best actor award?',\n", " \"what's new in the entertainment industry?\"],\n", " 'metadata': {'category': 'entertainment', 'priority': '3'}}],\n", " 'vectorizer': {'type': 'hf',\n", " 'model': 'sentence-transformers/all-mpnet-base-v2'},\n", " 'routing_config': {'distance_threshold': 1.0,\n", " 'max_k': 3,\n", " 'aggregation_method': 'min'}}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "router.to_dict()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "15:16:28 redisvl.index.index INFO Index already exists, not overwriting.\n" ] } ], "source": [ "router2 = SemanticRouter.from_dict(router.to_dict(), redis_url=\"redis://localhost:6379\")\n", "\n", "assert router2 == router" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "router.to_yaml(\"router.yaml\", overwrite=True)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "15:17:42 redisvl.index.index INFO Index already exists, not overwriting.\n" ] } ], "source": [ "router3 = SemanticRouter.from_yaml(\"router.yaml\", redis_url=\"redis://localhost:6379\")\n", "\n", "assert router3 == router2 == router" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean up the router" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# Use clear to flush all routes from the index\n", "router.clear()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# Use delete to clear the index and remove it completely\n", "router.delete()" ] } ], "metadata": { "kernelspec": { "display_name": "rvl", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }