
Build a PDF Search System with Mistral OCR and Weaviate DB

By Dietmar (@docd) on n8n.io

A comprehensive RAG (Retrieval-Augmented Generation) workflow that transforms PDF documents into searchable vector embeddings using Mistral OCR, Cohere embeddings, and a Weaviate vector store.
PDF Document Processing: Upload and extract text from PDF files using Mistral's OCR capabilities.
Vector Database Storage:…
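The workflow's "Extract Text from PDF" node returns the OCR result as `extractedText`, and "Prepare Document Data" wraps it with the filename, a fixed `source` tag, and an upload timestamp. A minimal Python sketch of those two steps follows. It assumes a Mistral-OCR-style response shaped like `{"pages": [{"index": ..., "markdown": ...}]}`; joining pages with a blank line is an assumption about how the n8n node builds `extractedText`, and `join_ocr_pages` and `prepare_document_data` are hypothetical names.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any


def join_ocr_pages(ocr_response: dict[str, Any]) -> str:
    """Concatenate per-page Markdown from an OCR response, in page order.

    Assumption: payload shaped like {"pages": [{"index": int, "markdown": str}, ...]}.
    The blank-line separator is also an assumption, not the n8n node's documented behaviour.
    """
    pages = sorted(ocr_response.get("pages", []), key=lambda p: p.get("index", 0))
    return "\n\n".join(p.get("markdown", "") for p in pages)


def prepare_document_data(filename: str, extracted_text: str) -> dict[str, str]:
    """Mirror the four fields assigned by the 'Prepare Document Data' Set node."""
    return {
        "filename": filename,
        "content": extracted_text,
        "source": "uploaded_pdf",
        # Same shape as JavaScript's new Date().toISOString(): UTC, milliseconds, trailing "Z".
        "upload_timestamp": datetime.now(timezone.utc)
        .isoformat(timespec="milliseconds")
        .replace("+00:00", "Z"),
    }


if __name__ == "__main__":
    sample = {"pages": [{"index": 1, "markdown": "Page two"},
                        {"index": 0, "markdown": "# Title"}]}
    record = prepare_document_data("report.pdf", join_ocr_pages(sample))
    print(repr(record["content"]))  # '# Title\n\nPage two'
```

Sorting by `index` guards against pages arriving out of order; keeping the OCR output as Markdown is what lets the Text Splitter's Markdown mode split on headings later.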

Category: AI & RAG · Trigger: Event · Nodes: 13 · Complexity: ★★★☆☆ · AI-powered: yes
Integrations: Cohere Embeddings, Document Default Data Loader, Reranker Cohere, MCP Trigger, Weaviate Vector Store, Recursive Character Text Splitter, Form Trigger, Mistral AI

This workflow corresponds to n8n.io template #7339, which we link to as the canonical source.

This workflow follows the Document Default Data Loader → Form Trigger recipe pattern, one of several workflows that pair these two integrations.
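The Document Default Data Loader hands its text to a Text Splitter configured with `chunkSize` 600 and `chunkOverlap` 200 (measured in characters). The sketch below shows only what those two numbers mean: fixed 600-character windows, each starting 400 characters after the previous one, so neighbours share 200 characters. It is deliberately simpler than LangChain's recursive splitter, which first tries a list of separators (for Markdown, things such as headings and blank lines) and only then cuts hard, so its real chunks vary in length. `sliding_window_chunks` is a hypothetical name.

```python
from __future__ import annotations


def sliding_window_chunks(text: str, chunk_size: int = 600, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # distance each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    chunks = sliding_window_chunks("x" * 1000)
    print(len(chunks), [len(c) for c in chunks])  # 2 [600, 600]
```

In this sketch any passage of up to 200 characters lands whole inside at least one chunk, at the cost of embedding roughly 1.5 times as many characters as the source text contains.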

The workflow JSON

Copy or download the full n8n JSON below, paste it into a new n8n workflow, add your credentials (Mistral AI, Cohere, Weaviate, and a header-auth credential for the MCP trigger), and activate it.
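Before pasting, a quick structural check of the export can catch problems that n8n would otherwise only surface at run time. The following is a minimal sketch under our own assumptions: `check_workflow` is a hypothetical helper (not an n8n tool) that verifies every connection points at a node present in `nodes` and flags Weaviate vector-store nodes with no collection selected.

```python
from __future__ import annotations

import json
import sys
from typing import Any


def check_workflow(workflow: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in an n8n workflow export."""
    problems: list[str] = []
    nodes = workflow.get("nodes", [])
    names = {node["name"] for node in nodes}

    # Every connection must start and end at a node listed in "nodes".
    for source, outputs in workflow.get("connections", {}).items():
        if source not in names:
            problems.append(f"connection source not found: {source}")
        for output_type, branches in outputs.items():
            for branch in branches or []:
                for link in branch or []:
                    if link.get("node") not in names:
                        problems.append(
                            f"{source} -> unknown node {link.get('node')} ({output_type})"
                        )

    # Heuristic (assumption): a Weaviate vector-store node needs a collection value.
    for node in nodes:
        if node.get("type", "").endswith(".vectorStoreWeaviate"):
            collection = node.get("parameters", {}).get("weaviateCollection", {})
            if not collection.get("value"):
                problems.append(f"{node['name']}: no Weaviate collection selected")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        issues = check_workflow(json.load(fh))
    print("\n".join(issues) or "no structural problems found")
    sys.exit(1 if issues else 0)
```

Saved as, say, `check_workflow.py` (a hypothetical file name), it runs as `python check_workflow.py workflow.json`; a non-zero exit status means something was flagged.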

{
  "id": "rV1w47cZn1rsk7MP",
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "name": "PDF to Vector RAG System: Mistral OCR, Weaviate Database and MCP Server",
  "tags": [],
  "nodes": [
    {
      "id": "d9e90589-d6b6-4601-bac8-5009b765fa78",
      "name": "Cohere Embeddings",
      "type": "@n8n/n8n-nodes-langchain.embeddingsCohere",
      "position": [
        160,
        336
      ],
      "parameters": {
        "modelName": "embed-multilingual-v3.0"
      },
      "typeVersion": 1
    },
    {
      "id": "5e7c6668-64a4-4cc2-b519-ab75f07ecab5",
      "name": "Document Loader",
      "type": "@n8n/n8n-nodes-langchain.documentDefaultDataLoader",
      "position": [
        -144,
        336
      ],
      "parameters": {
        "options": {},
        "textSplittingMode": "custom"
      },
      "typeVersion": 1.1
    },
    {
      "id": "b808993d-a6b9-497f-88b5-271c16abc185",
      "name": "Cohere Reranker",
      "type": "@n8n/n8n-nodes-langchain.rerankerCohere",
      "position": [
        304,
        336
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "117948ee-4be7-4869-b35b-d0c58a66fcd5",
      "name": "MCP Knowledge Server",
      "type": "@n8n/n8n-nodes-langchain.mcpTrigger",
      "position": [
        192,
        -80
      ],
      "parameters": {
        "path": "c74c97f5-0197-45e3-b4dd-f3efbd4bab22",
        "authentication": "headerAuth"
      },
      "typeVersion": 2
    },
    {
      "id": "700afe32-2bcc-4f31-a680-cdce710861e2",
      "name": "Search Knowledge Base",
      "type": "@n8n/n8n-nodes-langchain.vectorStoreWeaviate",
      "position": [
        256,
        128
      ],
      "parameters": {
        "mode": "retrieve-as-tool",
        "options": {},
        "useReranker": true,
        "toolDescription": "Use this tool to search and retrieve information from the knowledge base containing various documents and resources",
        "weaviateCollection": {
          "__rl": true,
          "mode": "list",
          "value": "KnowledgeDocuments",
          "cachedResultName": "KnowledgeDocuments"
        },
        "includeDocumentMetadata": false
      },
      "typeVersion": 1.3
    },
    {
      "id": "4075cf1b-e9f6-44cc-b827-29fa5eb1ee97",
      "name": "Upload Instructions",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -880,
        -16
      ],
      "parameters": {
        "color": 5,
        "width": 688,
        "height": 304,
        "content": "## Manual Document (PDF) Upload Section\nThis section allows users to upload PDF files to the knowledge base. The files will be processed by Mistral's OCR and stored in the vector database for later retrieval and search."
      },
      "typeVersion": 1
    },
    {
      "id": "bf763fa9-03f4-4010-a6a9-63b6bb104fa1",
      "name": "Text Splitter",
      "type": "@n8n/n8n-nodes-langchain.textSplitterRecursiveCharacterTextSplitter",
      "position": [
        -144,
        480
      ],
      "parameters": {
        "options": {
          "splitCode": "markdown"
        },
        "chunkSize": 600,
        "chunkOverlap": 200
      },
      "typeVersion": 1
    },
    {
      "id": "d1fe1c74-d9bc-4040-8446-47e90365c5f7",
      "name": "Upload PDF",
      "type": "n8n-nodes-base.formTrigger",
      "position": [
        -832,
        112
      ],
      "parameters": {
        "options": {
          "ignoreBots": true,
          "buttonLabel": "Upload Document",
          "appendAttribution": true
        },
        "formTitle": "Upload Documents to Knowledge Base",
        "formFields": {
          "values": [
            {
              "fieldType": "file",
              "fieldLabel": "PDF File",
              "multipleFiles": false,
              "requiredField": true,
              "acceptFileTypes": ".pdf"
            }
          ]
        },
        "responseMode": "lastNode",
        "formDescription": "Upload PDF files to the knowledge base for AI-powered search and retrieval"
      },
      "typeVersion": 2.2
    },
    {
      "id": "1f04ecce-0277-4a88-9514-fec6b45ba1cf",
      "name": "Extract Text from PDF",
      "type": "n8n-nodes-base.mistralAi",
      "position": [
        -608,
        112
      ],
      "parameters": {
        "options": {},
        "binaryProperty": "file"
      },
      "retryOnFail": true,
      "typeVersion": 1
    },
    {
      "id": "53c82051-ce9b-432e-b90e-d5eb83483e49",
      "name": "Prepare Document Data",
      "type": "n8n-nodes-base.set",
      "position": [
        -384,
        112
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "518ae17b-b486-4438-8151-f49afb3b68eb",
              "name": "filename",
              "type": "string",
              "value": "={{ $('Upload PDF').item.json.file.filename }}"
            },
            {
              "id": "a574ee4d-6341-4fd5-ac8e-9452eff70aa1",
              "name": "content",
              "type": "string",
              "value": "={{ $json.extractedText }}"
            },
            {
              "id": "metadata-source",
              "name": "source",
              "type": "string",
              "value": "uploaded_pdf"
            },
            {
              "id": "metadata-timestamp",
              "name": "upload_timestamp",
              "type": "string",
              "value": "={{ new Date().toISOString() }}"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "d465c50a-87e9-4824-83ca-d5662630590c",
      "name": "Store in Vector Database",
      "type": "@n8n/n8n-nodes-langchain.vectorStoreWeaviate",
      "position": [
        -112,
        112
      ],
      "parameters": {
        "mode": "insert",
        "options": {},
        "weaviateCollection": {
          "__rl": true,
          "mode": "list",
          "value": "KnowledgeDocuments",
          "cachedResultName": "KnowledgeDocuments"
        }
      },
      "typeVersion": 1.3
    },
    {
      "id": "5a5e4028-3764-4bb2-8900-df77c0c47bde",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        112,
        304
      ],
      "parameters": {
        "color": 4,
        "width": 320,
        "height": 288,
        "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n## Embedding and Rerank\nYou can swap the models, but you **must** use the same embedding model for storing and for retrieving documents, and you **must not switch** it later on."
      },
      "typeVersion": 1
    },
    {
      "id": "f0ecfbb5-e9ce-49fd-a921-672b12b7ef13",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        112,
        -240
      ],
      "parameters": {
        "width": 336,
        "height": 288,
        "content": "## MCP Server Trigger\nYou can call this MCP Server as a tool in your AI Workflow"
      },
      "typeVersion": 1
    }
  ],
  "active": false,
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "8b3d2e49-d84f-4fca-bdda-298f3da3c15b",
  "connections": {
    "Upload PDF": {
      "main": [
        [
          {
            "node": "Extract Text from PDF",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Text Splitter": {
      "ai_textSplitter": [
        [
          {
            "node": "Document Loader",
            "type": "ai_textSplitter",
            "index": 0
          }
        ]
      ]
    },
    "Cohere Reranker": {
      "ai_reranker": [
        [
          {
            "node": "Search Knowledge Base",
            "type": "ai_reranker",
            "index": 0
          }
        ]
      ]
    },
    "Document Loader": {
      "ai_document": [
        [
          {
            "node": "Store in Vector Database",
            "type": "ai_document",
            "index": 0
          }
        ]
      ]
    },
    "Cohere Embeddings": {
      "ai_embedding": [
        [
          {
            "node": "Search Knowledge Base",
            "type": "ai_embedding",
            "index": 0
          },
          {
            "node": "Store in Vector Database",
            "type": "ai_embedding",
            "index": 0
          }
        ]
      ]
    },
    "Extract Text from PDF": {
      "main": [
        [
          {
            "node": "Prepare Document Data",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Prepare Document Data": {
      "main": [
        [
          {
            "node": "Store in Vector Database",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Search Knowledge Base": {
      "ai_tool": [
        [
          {
            "node": "MCP Knowledge Server",
            "type": "ai_tool",
            "index": 0
          }
        ]
      ]
    }
  }
}
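The "Search Knowledge Base" node embeds the incoming query with the same Cohere Embeddings node used at ingestion, fetches the nearest chunks from Weaviate, and passes them through the Cohere Reranker. The sketch below imitates that two-stage shape in plain Python, using toy 2-dimensional vectors and a word-overlap scorer as a stand-in for Cohere's reranker (both hypothetical). It is a brute-force scan, whereas Weaviate normally searches an approximate nearest-neighbour index (HNSW by default), and `cosine`, `retrieve`, and `word_overlap` are illustrative names, not n8n or Weaviate APIs.

```python
from __future__ import annotations

import math
from typing import Callable, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; refuses vectors of different dimension."""
    if len(a) != len(b):
        # Vectors from two different embedding models often differ in length.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_text: str,
    query_vec: Sequence[float],
    store: list[tuple[str, list[float]]],
    k: int = 4,
    rerank: Optional[Callable[[str, str], float]] = None,
) -> list[str]:
    """Top-k vector search over (text, vector) pairs, optionally reranked."""
    # Stage 1: rank every stored chunk by similarity to the query embedding.
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    candidates = [text for text, _ in ranked[:k]]
    if rerank is None:
        return candidates
    # Stage 2: reorder only the k candidates with a (query, document) relevance scorer.
    return sorted(candidates, key=lambda doc: rerank(query_text, doc), reverse=True)


if __name__ == "__main__":
    store = [
        ("cats purr", [1.0, 0.0]),
        ("dogs bark", [0.0, 1.0]),
        ("kittens are young cats", [0.9, 0.1]),
    ]

    def word_overlap(query: str, doc: str) -> float:
        # Toy stand-in for a cross-encoder reranker such as Cohere's.
        return float(len(set(query.split()) & set(doc.split())))

    print(retrieve("young cats", [1.0, 0.05], store, k=2))
    # ['cats purr', 'kittens are young cats']
    print(retrieve("young cats", [1.0, 0.05], store, k=2, rerank=word_overlap))
    # ['kittens are young cats', 'cats purr']
```

The dimension check makes one symptom of switching embedding models visible. If a replacement model happens to produce vectors of the same length, nothing errors, but query and stored vectors are no longer comparable, which is why the workflow's note insists on a single embedding model for both storing and searching.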

About this workflow


Source: https://n8n.io/workflows/7339/ (original creator credit).

