
Evaluate Hybrid Search for Legal Question-answering Using Qdrant & Bm25/mxbai

By Jenny (@mrscoopers) on n8n.io

This is the second part of "Hybrid Search with Qdrant & n8n, Legal AI." The first part, "Indexing", covers preparing and uploading the dataset to Qdrant.

Category: AI & RAG · Trigger: Manual · Nodes: 17 · Complexity: ★★★★☆ · Integrations: HTTP Request, Qdrant (n8n-nodes-qdrant community node)

This workflow corresponds to n8n.io template #7946 — we link there as the canonical source.

The workflow JSON

Copy the full n8n JSON below. Paste it into a new n8n workflow, add your credentials, and run it from the manual trigger.

{
  "id": "h81ddl7uooV3eLBq",
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "name": "Hybrid Search with Qdrant & n8n, Legal AI: Retrieval",
  "tags": [],
  "nodes": [
    {
      "id": "eb8d4dd7-f40b-4524-a9de-f9ef9eef0eca",
      "name": "Index Dataset from HuggingFace",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        -256,
        400
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "78030f66-5331-463f-ad22-9d09f477e3f9",
      "name": "Split Them All Out",
      "type": "n8n-nodes-base.splitOut",
      "position": [
        176,
        400
      ],
      "parameters": {
        "options": {},
        "fieldToSplitOut": "splits"
      },
      "typeVersion": 1
    },
    {
      "id": "e6d1e789-1293-480b-a163-992b0c7a2ae8",
      "name": "Get Dataset Splits",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        -32,
        400
      ],
      "parameters": {
        "url": "https://datasets-server.huggingface.co/splits",
        "options": {},
        "sendQuery": true,
        "queryParameters": {
          "parameters": [
            {
              "name": "dataset",
              "value": "={{ $json.dataset || 'isaacus/LegalQAEval' }}"
            }
          ]
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "cb58241e-4579-4c5b-bd65-4a20f6cf3698",
      "name": "Divide Per Row",
      "type": "n8n-nodes-base.splitOut",
      "position": [
        816,
        400
      ],
      "parameters": {
        "options": {},
        "fieldToSplitOut": "rows"
      },
      "typeVersion": 1
    },
    {
      "id": "2ee71c28-71fd-4cba-87c7-c9886fb403c7",
      "name": "Keep Test Split",
      "type": "n8n-nodes-base.filter",
      "position": [
        384,
        400
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "52e3d8e2-825f-4e43-9d5f-e275d196b442",
              "operator": {
                "name": "filter.operator.equals",
                "type": "string",
                "operation": "equals"
              },
              "leftValue": "={{ $json.split }}",
              "rightValue": "test"
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "484a82fe-93d8-439b-bbb5-e96a4b5d7861",
      "name": "Get Test Queries",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        592,
        400
      ],
      "parameters": {
        "url": "=https://datasets-server.huggingface.co/rows",
        "options": {},
        "sendQuery": true,
        "queryParameters": {
          "parameters": [
            {
              "name": "dataset",
              "value": "={{ $json.dataset }}"
            },
            {
              "name": "config",
              "value": "={{ $json.config }}"
            },
            {
              "name": "split",
              "value": "={{ $json.split }}"
            },
            {
              "name": "length",
              "value": "=100"
            }
          ]
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "20f67ae7-6631-4602-aa20-42a382db12ae",
      "name": "Query Points",
      "type": "n8n-nodes-qdrant.qdrant",
      "position": [
        2144,
        416
      ],
      "parameters": {
        "limit": 1,
        "query": "{\n  \"fusion\": \"rrf\"\n}",
        "prefetch": "=[\n  {\n    \"query\": {\n      \"text\": {{ JSON.stringify($json.question) }},\n      \"model\": \"mixedbread-ai/mxbai-embed-large-v1\"\n    },\n    \"using\": \"mxbai_large\",\n    \"limit\": 25\n  },\n  {\n    \"query\": {\n      \"text\": {{ JSON.stringify($json.question) }},\n      \"model\": \"qdrant/bm25\"\n    },\n    \"using\": \"bm25\",\n    \"limit\": 25\n  }\n]",
        "resource": "search",
        "operation": "queryPoints",
        "collectionName": {
          "__rl": true,
          "mode": "list",
          "value": "legalQA_test",
          "cachedResultName": "legalQA_test"
        },
        "requestOptions": {}
      },
      "credentials": {
        "qdrantApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "445ace25-f900-4bcf-9f7d-9bc1db662867",
      "name": "Merge",
      "type": "n8n-nodes-base.merge",
      "position": [
        2320,
        608
      ],
      "parameters": {
        "mode": "combine",
        "options": {},
        "combineBy": "combineAll"
      },
      "typeVersion": 3.2
    },
    {
      "id": "c631ce99-a672-499f-bbf3-e740ef431884",
      "name": "Loop Over Items",
      "type": "n8n-nodes-base.splitInBatches",
      "position": [
        1776,
        400
      ],
      "parameters": {
        "options": {
          "reset": false
        }
      },
      "typeVersion": 3
    },
    {
      "id": "c075a745-ed50-459f-87dd-101a559e4523",
      "name": "Keep Questions with Answers in the Dataset",
      "type": "n8n-nodes-base.filter",
      "position": [
        1056,
        400
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "d1120153-1852-42c0-8b0a-084e8c3190d3",
              "operator": {
                "type": "number",
                "operation": "gt"
              },
              "leftValue": "={{ $json.row.answers.length }}",
              "rightValue": 0
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "94f78e9b-f9eb-4179-9844-d6e23bc79751",
      "name": "Keep Questions & IDs",
      "type": "n8n-nodes-base.set",
      "position": [
        1280,
        400
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "961c95d9-c803-404b-b4b6-cb66a8a33928",
              "name": "id_qa",
              "type": "string",
              "value": "={{ $json.row.id }}"
            },
            {
              "id": "0fefba06-4567-479c-9eb5-efbb3e13e743",
              "name": "question",
              "type": "string",
              "value": "={{ $json.row.question }}"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "08f67e31-ba8f-47fd-bc78-f352a160d4fd",
      "name": "Aggregate Evals",
      "type": "n8n-nodes-base.aggregate",
      "position": [
        2032,
        224
      ],
      "parameters": {
        "options": {},
        "aggregate": "aggregateAllItemData",
        "destinationFieldName": "eval"
      },
      "typeVersion": 1
    },
    {
      "id": "413ade44-d27d-4c49-862f-afb9d4e18bf6",
      "name": "Percentage of isHits in Evals",
      "type": "n8n-nodes-base.set",
      "position": [
        2256,
        224
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "5bca1a50-3e41-4f50-8362-cb7b185b50f6",
              "name": "Hits percentage",
              "type": "number",
              "value": "={{ ($json.eval.filter(item => item.isHit).length * 100) / $json.eval.length}}"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "c0840c22-8954-4937-80d9-f32741b81e1e",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -96,
        144
      ],
      "parameters": {
        "color": 5,
        "width": 1520,
        "height": 464,
        "content": "## Get Questions to Eval Retrieval from Hugging Face Dataset (Already Indexed to Qdrant)\n\nFetching questions from a sample Q&A dataset on Hugging Face using the [Dataset Viewer API](https://huggingface.co/docs/dataset-viewer/quick_start).  \n**Dataset:** [LegalQAEval (isaacus)](https://huggingface.co/datasets/isaacus/LegalQAEval)\n\n1. **Retrieve dataset splits**.  \n2. **Get a small subsample of questions from the `test` split**.  \n   To fetch the full split, apply [pagination in HTTP node](https://docs.n8n.io/code/cookbook/http-node/pagination/#enable-pagination), as shown in Part 1.  \n3. **Keep only questions that have a paired text chunk answering them**, so evaluation remains fair.  \n"
      },
      "typeVersion": 1
    },
    {
      "id": "742e68ae-0013-4dda-a818-4485ff80a986",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1696,
        -256
      ],
      "parameters": {
        "color": 5,
        "width": 1088,
        "height": 1120,
        "content": "## Check Quality of Simple Hybrid Search on Legal Q&A Dataset\nFor each question in the evaluation set, using the Qdrant collection created and indexed in Part 1:\n1. **Perform a Hybrid Search in Qdrant**  \n   - Get 25 results with [**BM25-based keyword retrieval**](https://en.wikipedia.org/wiki/Okapi_BM25) (exact word matches).  \n     - Sparse representations for BM25 are created automatically by Qdrant.  \n   - Get 25 results with [**mxbai-embed-large-v1**](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) semantic search (meaning-based matches).  \n     - Here we use [**Qdrant Cloud Inference**](https://qdrant.tech/documentation/cloud/inference/), so conversion of questions to vectors and searching is handled by the Qdrant node.  \n     - To use an external provider (e.g. OpenAI), see Part 1 for an example of how to adapt this template.  \n   - Fuse both result lists with **Reciprocal Rank Fusion (RRF)**.  \n   - Select the **top-1 result**.  \n2. **Check the top-1 result**  \n   - Verify whether the text chunk contains the correct answer. This is done by checking whether the question ID is in the list of question IDs linked to the text chunk (created in Part 1).  \n3. **Aggregate results**  \n   - Calculate **hits@1**: the percentage of evaluation questions where the top-1 retrieved chunk contained the answer.  \n\n- If results are good \u2192 you can reuse the **Qdrant Query Points** node as a tool for an **agentic legal AI RAG** system.  \n- If results are poor \u2192 don\u2019t worry. This is the *simplest* hybrid query setup. You can improve quality with [various tooling for hybrid search in Qdrant](https://qdrant.tech/documentation/concepts/hybrid-queries/):  \n  - Reranking  \n  - Score boosting  \n  - Tuning vector index parameters  \n  - \u2026  \n\n\nExperiment! \ud83d\ude42\n\n"
      },
      "typeVersion": 1
    },
    {
      "id": "d4a32298-02ef-4f9e-b22d-3f30e9b74eb2",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1344,
        -128
      ],
      "parameters": {
        "width": 1008,
        "height": 960,
        "content": "## Evaluate Hybrid Search on Legal Dataset\n*This is the second part of **\"Hybrid Search with Qdrant & n8n, Legal AI.\"**\nThe first part, **\"Indexing,\"** covers preparing and uploading the dataset to Qdrant.*\n\n### Overview\nThis pipeline demonstrates how to perform **Hybrid Search** on a [Qdrant collection](https://qdrant.tech/documentation/concepts/collections/#collections) using `question`s and `text` chunks (containing answers) from the  \n[LegalQAEval dataset (isaacus)](https://huggingface.co/datasets/isaacus/LegalQAEval).\n\nOn a small subset of questions, it shows:  \n- How to set up hybrid retrieval in Qdrant with:  \n  - [BM25](https://en.wikipedia.org/wiki/Okapi_BM25)-based keyword retrieval;\n  - [mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) semantic retrieval;  \n  - **Reciprocal Rank Fusion (RRF)**, a simple zero-shot fusion of the two searches;\n- How to run a basic evaluation:  \n  - Calculate **hits@1**: the percentage of evaluation questions where the top-1 retrieved text chunk contains the correct answer  \n\n\nAfter running this pipeline, you will have a quality estimate of a simple hybrid retrieval setup.  \nFrom there, you can reuse Qdrant\u2019s **Query Points** node to build a **legal RAG chatbot**.  \n\n### Embedding Inference\n- By default, this pipeline uses [**Qdrant Cloud Inference**](https://qdrant.tech/documentation/cloud/inference/) to convert questions to embeddings.  \n- You can also use an **external embedding provider** (e.g. OpenAI).  \n  - In that case, minimally update the pipeline, similar to the adjustments shown in **Part 1: Indexing**.  \n\n### Prerequisites\n- **Completed Part 1 pipeline**, *\"Hybrid Search with Qdrant & n8n, Legal AI: Indexing\"*, and the collection created in it;\n- All the requirements of **Part 1 pipeline**;\n\n### Hybrid Search\nThe example here is a **basic hybrid query**.  \nYou can extend/enhance it with:\n- Reranking strategies;  \n- Different fusion techniques;\n- Score boosting based on metadata;\n- ...  \n\nMore details: [Hybrid Queries in Qdrant](https://qdrant.tech/documentation/concepts/hybrid-queries/).  \n\n#### P.S.\n- To ask questions about retrieval in Qdrant, join the [Qdrant Discord](https://discord.gg/ArVgNHV6).  \n- Star the [Qdrant n8n community node repo](https://github.com/qdrant/n8n-nodes-qdrant) <3\n"
      },
      "typeVersion": 1
    },
    {
      "id": "56a5efd8-ed3f-46f7-85c8-966536f24a13",
      "name": "isHit = If we Found the Correct Answer",
      "type": "n8n-nodes-base.set",
      "position": [
        2512,
        608
      ],
      "parameters": {
        "include": "selected",
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "80089820-cc55-4b74-966e-b50a3f4b6e36",
              "name": "isHit",
              "type": "boolean",
              "value": "={{ $json.result.points[0].payload.ids_qa.includes($json.id_qa) }}"
            }
          ]
        },
        "includeFields": "id_qa,question",
        "includeOtherFields": true
      },
      "typeVersion": 3.4
    }
  ],
  "active": false,
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "20b38566-7985-4139-98a3-6b275e85a9cb",
  "connections": {
    "Merge": {
      "main": [
        [
          {
            "node": "isHit = If we Found the Correct Answer",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Query Points": {
      "main": [
        [
          {
            "node": "Merge",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Divide Per Row": {
      "main": [
        [
          {
            "node": "Keep Questions with Answers in the Dataset",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Aggregate Evals": {
      "main": [
        [
          {
            "node": "Percentage of isHits in Evals",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Keep Test Split": {
      "main": [
        [
          {
            "node": "Get Test Queries",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Loop Over Items": {
      "main": [
        [
          {
            "node": "Aggregate Evals",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Merge",
            "type": "main",
            "index": 1
          },
          {
            "node": "Query Points",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Get Test Queries": {
      "main": [
        [
          {
            "node": "Divide Per Row",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Get Dataset Splits": {
      "main": [
        [
          {
            "node": "Split Them All Out",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Split Them All Out": {
      "main": [
        [
          {
            "node": "Keep Test Split",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Keep Questions & IDs": {
      "main": [
        [
          {
            "node": "Loop Over Items",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Index Dataset from HuggingFace": {
      "main": [
        [
          {
            "node": "Get Dataset Splits",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "isHit = If we Found the Correct Answer": {
      "main": [
        [
          {
            "node": "Loop Over Items",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Keep Questions with Answers in the Dataset": {
      "main": [
        [
          {
            "node": "Keep Questions & IDs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
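The workflow delegates fusion to Qdrant ("fusion": "rrf") and scores results with two expression nodes. To make the evaluation logic concrete, here is a minimal offline sketch in JavaScript (the language of n8n expressions) that fuses two ranked lists with RRF, keeps the top-1 chunk, applies the same ids_qa check as the isHit node, and computes hits@1. The chunk IDs, rankings, and helper names (rrfFuse, evaluate, and so on) are hypothetical, and the RRF constant k = 60 is an assumption borrowed from the original RRF paper; Qdrant's built-in constant may differ, so fused scores will not match Qdrant's exactly.

```javascript
// Offline sketch of this workflow's evaluation loop, on hypothetical data.
// Assumption: each question carries two precomputed ranked lists of chunk IDs,
// standing in for the mxbai (dense) and BM25 (sparse) prefetch results.

// RRF constant. 60 comes from the original RRF paper; Qdrant's internal
// constant may differ, so treat this value as an assumption.
const RRF_K = 60;

// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank), rank starting at 1.
function rrfFuse(rankedLists, k = RRF_K) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Mirrors the "isHit" Set node: the top-1 chunk is a hit if its
// payload.ids_qa lists the question's id_qa. A missing result counts as a miss.
function isHit(topChunk, idQa) {
  return Boolean(topChunk && topChunk.payload.ids_qa.includes(idQa));
}

// Mirrors "Percentage of isHits in Evals": share of hits, in percent.
function hitsAt1Percentage(evals) {
  if (evals.length === 0) return 0;
  return (evals.filter((e) => e.isHit).length * 100) / evals.length;
}

// Fuse both rankings per question, keep the top-1 chunk, and score it.
function evaluate(questions, chunksById) {
  return questions.map((q) => {
    const fused = rrfFuse([q.denseRanking, q.bm25Ranking]);
    const top = fused.length > 0 ? chunksById[fused[0].id] : undefined;
    return { id_qa: q.id_qa, isHit: isHit(top, q.id_qa) };
  });
}

// Hypothetical chunks and questions, for illustration only.
const chunksById = {
  c1: { payload: { ids_qa: ["q1"] } },
  c2: { payload: { ids_qa: ["q2"] } },
  c3: { payload: { ids_qa: ["q3"] } },
};
const questions = [
  { id_qa: "q1", denseRanking: ["c2", "c1", "c3"], bm25Ranking: ["c1", "c3", "c2"] },
  { id_qa: "q2", denseRanking: ["c3", "c2"], bm25Ranking: ["c3", "c1"] },
];

const evals = evaluate(questions, chunksById);
console.log(evals);
console.log(`hits@1: ${hitsAt1Percentage(evals)}%`); // hits@1: 50%
```

For q1, c1 ranks second in one list and first in the other, which beats c2 (first and third), so fusion rewards chunks that both retrievers agree on. Unlike the workflow's isHit expression, which reads points[0] directly and would fail if Qdrant returned no points, the sketch counts a missing result as a miss.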

Credentials you'll need

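Only the Query Points node (from the n8n-nodes-qdrant community package) needs a credential: a Qdrant API credential (qdrantApi). The HTTP Request nodes call the public Hugging Face Dataset Viewer API and carry no credentials. Credential IDs are stripped before publishing, so you'll add your own.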
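The Qdrant credential supplies the cluster URL and API key. To check the credential and the Part 1 collection outside n8n, the hybrid query configured in the Query Points node can be reproduced against Qdrant's Query API. The sketch below approximates the node's request rather than copying it: it assumes Qdrant Cloud with Cloud Inference enabled (so text/model objects are embedded server-side), the named vectors mxbai_large and bm25 created in Part 1, and a runtime with a global fetch (Node.js 18 or later). The function names are hypothetical.

```javascript
// Sketch of the hybrid query from the "Query Points" node, sent directly to
// Qdrant's Query API (POST /collections/{collection}/points/query).
// Assumptions: Cloud Inference is enabled, the collection has named vectors
// "mxbai_large" (dense) and "bm25" (sparse), and fetch is globally available.

function buildHybridQuery(question, prefetchLimit = 25) {
  return {
    prefetch: [
      {
        // Dense semantic candidates, embedded server-side by Cloud Inference.
        query: { text: question, model: "mixedbread-ai/mxbai-embed-large-v1" },
        using: "mxbai_large",
        limit: prefetchLimit,
      },
      {
        // Sparse BM25 keyword candidates.
        query: { text: question, model: "qdrant/bm25" },
        using: "bm25",
        limit: prefetchLimit,
      },
    ],
    query: { fusion: "rrf" }, // fuse both candidate lists with Reciprocal Rank Fusion
    limit: 1, // keep only the top-1 result, as in the workflow
    with_payload: true, // payload.ids_qa is needed for the isHit check
  };
}

async function queryHybrid(baseUrl, apiKey, collection, question) {
  const url = `${baseUrl}/collections/${encodeURIComponent(collection)}/points/query`;
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", "api-key": apiKey },
    // JSON.stringify escapes quotes in the question, keeping the body valid JSON.
    body: JSON.stringify(buildHybridQuery(question)),
  });
  if (!response.ok) {
    throw new Error(`Qdrant returned HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.result.points; // scored points, best first
}
```

For example, queryHybrid("https://<your-cluster-url>:6333", process.env.QDRANT_API_KEY, "legalQA_test", question) returns at most one point; its payload.ids_qa is what the workflow's isHit node inspects. An HTTP 404 here usually points to a missing collection, which means Part 1 has not been run against this cluster.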


About this workflow


Source: https://n8n.io/workflows/7946/ (original creator credit).
