AutomationFlowsAI & RAG › Build a RAG Knowledge Base From Pdfs with Gemini, Supabase and Google Sheets

Build a RAG Knowledge Base From Pdfs with Gemini, Supabase and Google Sheets

BySalman Mehboob @salmanmehboob on n8n.io

This workflow ingests educational PDF URLs from Google Sheets, extracts and chunks their text, generates embeddings with Google Gemini, and stores them in a Supabase pgvector table for retrieval, while also exposing a public chat webhook that answers questions using Gemini and…

Cron / scheduled trigger★★★★★ complexityAI-powered36 nodesGoogle SheetsHTTP RequestSupabase Vector StoreDocument Default Data LoaderText Splitter Recursive Character Text SplitterGoogle DriveGoogle Gemini EmbeddingsAgent
AI & RAG Trigger: Cron / scheduled Nodes: 36 Complexity: ★★★★★ AI nodes: yes Added:

This workflow corresponds to n8n.io template #16258 — we link there as the canonical source.

This workflow follows the Agent → Chat Trigger recipe pattern — see all workflows that pair these two integrations.

The workflow JSON

Copy or download the full n8n JSON below. Paste it into a new n8n workflow, add your credentials, activate. Full import guide →

Download .json
{
  "id": "bFLpWsjfondxPPou",
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "name": "Build a RAG Knowledge Base from Educational PDFs with Gemini and Supabase",
  "tags": [],
  "nodes": [
    {
      "id": "0c63efd3-f17f-4c35-9a9d-af56076117e4",
      "name": "Extract from File",
      "type": "n8n-nodes-base.extractFromFile",
      "position": [
        1968,
        400
      ],
      "parameters": {
        "options": {},
        "operation": "pdf",
        "binaryPropertyName": "pdfBinary"
      },
      "typeVersion": 1.1
    },
    {
      "id": "602feb5e-d02a-4b83-aa95-c90e8855312a",
      "name": "Code in JavaScript",
      "type": "n8n-nodes-base.code",
      "position": [
        2224,
        464
      ],
      "parameters": {
        "jsCode": "const allItems = $input.all();\nconst results = [];\n\nconst source_url = $('Loop Over Items').first().json.source_url || '';\nconst isSeraj = source_url.includes('seraj-uae.com');\n\nlet fileId = '';\nlet title = '';\n\nif (isSeraj) {\n  fileId = $('Extract PDF URL + Title').first().json.fileId || '';\n  title = $('Extract PDF URL + Title').first().json.title || '';\n} else {\n  fileId = $('Extract Drive File ID + Title').first().json.driveFileId || '';\n  title = $('Extract Drive File ID + Title').first().json.title || '';\n}\n\nfor (const item of allItems) {\n  const raw = item.json.text || '';\n\n  const cleaned = raw\n    .replace(/\\n([^\\n])/g, '$1')\n    .replace(/\\n+/g, ' ')\n    .replace(/\\s+/g, ' ')\n    .trim();\n\n  const meaningful = cleaned.replace(/[\\s.\u060c,\u061f!]/g, '');\n  if (meaningful.length < 50) continue;\n\n  results.push({\n    json: {\n      text: cleaned,\n      fileId: fileId,\n      title: title,\n      source_url: source_url\n    }\n  });\n}\n\nreturn results;"
      },
      "typeVersion": 2,
      "alwaysOutputData": true
    },
    {
      "id": "856fc2a7-1d80-4b37-a4b3-8a88dd8acc83",
      "name": "Get row(s) in sheet",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        -688,
        480
      ],
      "parameters": {
        "options": {},
        "filtersUI": {
          "values": []
        },
        "sheetName": {
          "__rl": true,
          "mode": "list",
          "value": "gid=0",
          "cachedResultName": "Sheet1"
        },
        "documentId": {
          "__rl": true,
          "mode": "list",
          "value": "1ZF-9eOgWA0PMX_LQ0uNneUuoJ6Qm1sz7EOwRUqxkBeY",
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1ZF-9eOgWA0PMX_LQ0uNneUuoJ6Qm1sz7EOwRUqxkBeY/edit?usp=drivesdk",
          "cachedResultName": "Serja-UAE"
        }
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.7
    },
    {
      "id": "d08602f4-519f-4c5a-a4bd-da3ff741fd2a",
      "name": "Filter",
      "type": "n8n-nodes-base.filter",
      "position": [
        -464,
        480
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 3,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "7b2f824a-f1ef-439f-a9c0-dbc1fcc049ed",
              "operator": {
                "type": "string",
                "operation": "empty",
                "singleValue": true
              },
              "leftValue": "={{ $json.status }}",
              "rightValue": ""
            }
          ]
        }
      },
      "typeVersion": 2.3
    },
    {
      "id": "692dd884-b543-44f0-8d10-0d5e621b6c4f",
      "name": "All Done",
      "type": "n8n-nodes-base.set",
      "position": [
        0,
        -224
      ],
      "parameters": {
        "mode": "raw",
        "options": {},
        "jsonOutput": "{ \"status\": \"All files embedded\" }"
      },
      "typeVersion": 3.4
    },
    {
      "id": "1bef5473-f6bb-4613-941c-2e1cd6b5d23e",
      "name": "Seraj or Drive?",
      "type": "n8n-nodes-base.if",
      "position": [
        144,
        480
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "operator": {
                "type": "string",
                "operation": "contains"
              },
              "leftValue": "={{ $json.source_url }}",
              "rightValue": "seraj-uae.com"
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "cc743ecf-c435-4a0d-923a-893f4de13f4b",
      "name": "Fetch File Page",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        432,
        288
      ],
      "parameters": {
        "url": "={{ $json.source_url }}",
        "options": {
          "response": {
            "response": {
              "responseFormat": "text"
            }
          }
        }
      },
      "retryOnFail": true,
      "typeVersion": 4.4
    },
    {
      "id": "5c8e14c2-6413-42ce-871e-978dc9f1c47a",
      "name": "Extract PDF URL + Title",
      "type": "n8n-nodes-base.code",
      "position": [
        672,
        288
      ],
      "parameters": {
        "jsCode": "const allItems = $input.all();\nconst html = allItems[0].json.data || '';\nconst source_url = $('Loop Over Items').item.json.source_url || '';\nconst fileId = (source_url.match(/\\/file\\/(\\d+)\\//) || [])[1] || '';\n\nconst titleMatch = html.match(/<title>([^<]+)<\\/title>/);\nconst title = titleMatch\n  ? titleMatch[1].replace(/\\s*-\\s*\u0633\u0631\u0627\u062c\\s*$/, '').trim()\n  : 'File ' + fileId;\n\n// Pattern 1 \u2014 njah.online (newer files)\nconst njahMatch = html.match(/https:\\/\\/njah\\.online\\/download\\/[a-f0-9]+\\.pdf\\/\\?read=1/);\nif (njahMatch) {\n  const pdfUrl = njahMatch[0].replace('?read=1', '?direct=1');\n  return [{ json: { fileId, source_url, pdfUrl, driveFileId: null, title } }];\n}\n\n// Pattern 2 \u2014 Google Drive iframe (older files)\nconst driveMatch = html.match(/https:\\/\\/drive\\.google\\.com\\/file\\/d\\/([a-zA-Z0-9_-]+)\\/preview/);\nif (driveMatch) {\n  const driveFileId = driveMatch[1];\n  return [{ json: { fileId, source_url, pdfUrl: null, driveFileId, title } }];\n}\n\nthrow new Error('No PDF URL found. fileId: ' + fileId + ' | HTML length: ' + html.length);"
      },
      "retryOnFail": true,
      "typeVersion": 2
    },
    {
      "id": "9347da7f-d2a5-4517-b384-e415d1b2cde9",
      "name": "Download PDF (Seraj)",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        1264,
        256
      ],
      "parameters": {
        "url": "={{ $json.pdfUrl }}",
        "options": {
          "response": {
            "response": {
              "responseFormat": "file",
              "outputPropertyName": "pdfBinary"
            }
          }
        }
      },
      "retryOnFail": true,
      "typeVersion": 4.4
    },
    {
      "id": "c7a6fe12-17b7-49cf-b94f-a7cbff001071",
      "name": "Insert to Supabase pgvector",
      "type": "@n8n/n8n-nodes-langchain.vectorStoreSupabase",
      "position": [
        2784,
        64
      ],
      "parameters": {
        "mode": "insert",
        "options": {
          "queryName": "match_seraj_documents"
        },
        "tableName": {
          "__rl": true,
          "mode": "id",
          "value": "seraj_documents"
        },
        "embeddingBatchSize": 1
      },
      "credentials": {
        "supabaseApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1.3
    },
    {
      "id": "686c8daa-bb9a-4d3b-8a8f-46ca1d0203e5",
      "name": "PDF Binary Loader",
      "type": "@n8n/n8n-nodes-langchain.documentDefaultDataLoader",
      "position": [
        2880,
        288
      ],
      "parameters": {
        "options": {
          "metadata": {
            "metadataValues": [
              {
                "name": "file_id",
                "value": "={{ $json.fileId || $json.driveFileId }}"
              },
              {
                "name": "title",
                "value": "={{ $json.title || $json.title }}"
              },
              {
                "name": "source_url",
                "value": "={{ $('Loop Over Items').item.json.source_url }}"
              }
            ]
          },
          "splitPages": false
        },
        "jsonData": "={{ $json.text }}",
        "jsonMode": "expressionData",
        "textSplittingMode": "custom"
      },
      "typeVersion": 1.1
    },
    {
      "id": "9501a7b4-8156-4053-94e7-bcd3ab44e104",
      "name": "Recursive Text Splitter",
      "type": "@n8n/n8n-nodes-langchain.textSplitterRecursiveCharacterTextSplitter",
      "position": [
        2960,
        496
      ],
      "parameters": {
        "options": {},
        "chunkOverlap": 180
      },
      "typeVersion": 1
    },
    {
      "id": "4fa5abd7-aa63-4241-80a7-f8820b3ca83b",
      "name": "Update row in sheet",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        3248,
        544
      ],
      "parameters": {
        "columns": {
          "value": {
            "status": "Embedded",
            "row_number": "={{ $('Loop Over Items').item.json.row_number }}"
          },
          "schema": [
            {
              "id": "source_url",
              "type": "string",
              "display": true,
              "removed": true,
              "required": false,
              "displayName": "source_url",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "status",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "status",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "row_number",
              "type": "number",
              "display": true,
              "removed": false,
              "readOnly": true,
              "required": false,
              "displayName": "row_number",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            }
          ],
          "mappingMode": "defineBelow",
          "matchingColumns": [
            "row_number"
          ],
          "attemptToConvertTypes": false,
          "convertFieldsToString": false
        },
        "options": {},
        "operation": "update",
        "sheetName": {
          "__rl": true,
          "mode": "list",
          "value": "gid=0",
          "cachedResultName": "Sheet1"
        },
        "documentId": {
          "__rl": true,
          "mode": "list",
          "value": "1ZF-9eOgWA0PMX_LQ0uNneUuoJ6Qm1sz7EOwRUqxkBeY",
          "cachedResultName": "Serja-UAE"
        }
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "executeOnce": true,
      "typeVersion": 4.7
    },
    {
      "id": "623bdd65-5a32-48fd-a84d-be12956109c1",
      "name": "Loop Over Items",
      "type": "n8n-nodes-base.splitInBatches",
      "position": [
        -240,
        480
      ],
      "parameters": {
        "options": {}
      },
      "typeVersion": 3
    },
    {
      "id": "af490cd6-a4a4-4bcf-b8a3-82a5c30602d2",
      "name": "Extract Drive File ID + Title",
      "type": "n8n-nodes-base.code",
      "position": [
        1216,
        544
      ],
      "parameters": {
        "jsCode": "const source_url = $('Loop Over Items').item.json.source_url || '';\nconst title = $('Loop Over Items').item.json.title || '';\n\nconst driveMatch = source_url.match(/\\/file\\/d\\/([a-zA-Z0-9_-]+)/);\nif (!driveMatch) {\n  throw new Error('Could not extract Drive file ID from: ' + source_url);\n}\nconst driveFileId = driveMatch[1];\n\nreturn [{ json: { driveFileId, source_url, title: title || 'Drive File ' + driveFileId } }];"
      },
      "typeVersion": 2
    },
    {
      "id": "58af97d7-3690-4c6d-a2f1-0c60b6ebc378",
      "name": "Download PDF (Drive)",
      "type": "n8n-nodes-base.googleDrive",
      "position": [
        1504,
        512
      ],
      "parameters": {
        "fileId": {
          "__rl": true,
          "mode": "id",
          "value": "={{ $json.driveFileId }}"
        },
        "options": {
          "binaryPropertyName": "pdfBinary"
        },
        "operation": "download"
      },
      "credentials": {
        "googleDriveOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 3
    },
    {
      "id": "d0a36089-ab8b-4193-b2f4-9bee0c83244d",
      "name": "Drive Link?",
      "type": "n8n-nodes-base.if",
      "position": [
        880,
        288
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 3,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "ee4b01b4-495f-4689-981b-dc70af0f04a6",
              "operator": {
                "type": "string",
                "operation": "notEmpty",
                "singleValue": true
              },
              "leftValue": "={{ $json.driveFileId }}",
              "rightValue": ""
            }
          ]
        }
      },
      "typeVersion": 2.3
    },
    {
      "id": "f53cea0e-a9a3-43ee-93cb-af2acb62c594",
      "name": "Wait",
      "type": "n8n-nodes-base.wait",
      "position": [
        1664,
        368
      ],
      "parameters": {},
      "typeVersion": 1.1
    },
    {
      "id": "04f1c145-4e4e-4eb5-839e-82c1fdd6c063",
      "name": "Embeddings Google Gemini",
      "type": "@n8n/n8n-nodes-langchain.embeddingsGoogleGemini",
      "position": [
        2752,
        288
      ],
      "parameters": {},
      "credentials": {
        "googlePalmApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "42523d39-2124-4bf8-907f-855aaeb86403",
      "name": "text pdf?",
      "type": "n8n-nodes-base.if",
      "position": [
        2432,
        448
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 3,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "76f73fac-59bc-400b-a3f9-41dc96c404c2",
              "operator": {
                "type": "string",
                "operation": "exists",
                "singleValue": true
              },
              "leftValue": "={{ $json.text }}",
              "rightValue": ""
            }
          ]
        }
      },
      "typeVersion": 2.3
    },
    {
      "id": "bea0deb9-d5a8-4628-acfb-858568ef12c0",
      "name": "Update row in sheet1",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        3200,
        848
      ],
      "parameters": {
        "columns": {
          "value": {
            "status": "Not Text Based PDF",
            "row_number": "={{ $('Loop Over Items').item.json.row_number }}"
          },
          "schema": [
            {
              "id": "source_url",
              "type": "string",
              "display": true,
              "removed": true,
              "required": false,
              "displayName": "source_url",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "status",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "status",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "row_number",
              "type": "number",
              "display": true,
              "removed": false,
              "readOnly": true,
              "required": false,
              "displayName": "row_number",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            }
          ],
          "mappingMode": "defineBelow",
          "matchingColumns": [
            "row_number"
          ],
          "attemptToConvertTypes": false,
          "convertFieldsToString": false
        },
        "options": {},
        "operation": "update",
        "sheetName": {
          "__rl": true,
          "mode": "list",
          "value": "gid=0",
          "cachedResultName": "Sheet1"
        },
        "documentId": {
          "__rl": true,
          "mode": "list",
          "value": "1ZF-9eOgWA0PMX_LQ0uNneUuoJ6Qm1sz7EOwRUqxkBeY",
          "cachedResultName": "Serja-UAE"
        }
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "executeOnce": true,
      "typeVersion": 4.7
    },
    {
      "id": "6130f338-25c6-4aba-87a7-e06304d55489",
      "name": "Schedule Trigger",
      "type": "n8n-nodes-base.scheduleTrigger",
      "position": [
        -912,
        480
      ],
      "parameters": {
        "rule": {
          "interval": [
            {
              "field": "hours"
            }
          ]
        }
      },
      "typeVersion": 1.3
    },
    {
      "id": "6bf5dd15-89e4-49b2-a603-2b9e71f292d2",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1840,
        160
      ],
      "parameters": {
        "width": 832,
        "height": 560,
        "content": "# \ud83e\udde0 RAG Ingestion + AI Q&A Agent for Educational Platforms\n\n## What this workflow does\nTurns a library of educational PDFs hosted on any website or Google Drive into a \nfully searchable AI knowledge base \u2014 then lets students query it in plain language \nvia a chat widget on your site.\n\nNo manual uploads. No copy-pasting. Just URLs in a Google Sheet and a running pipeline.\n\n## How it works\n1. Reads unprocessed PDF URLs from a Google Sheet queue every hour\n2. Scrapes each page to detect whether the PDF is on the host website or Google Drive\n3. Downloads the PDF and validates it is text-based (not a scanned image)\n4. Chunks the text, embeds it with Google Gemini, and stores vectors in Supabase pgvector\n5. The RAG Agent answers student questions by retrieving the most relevant chunks and \n   generating a precise answer with Gemini Flash\n\n## Prerequisites\n- [ ] Google Sheet with columns: `source_url`, `status`, `row_number`\n- [ ] Supabase project with pgvector enabled (SQL in the Embedding sticky)\n- [ ] Google Gemini API key\n- [ ] Google Sheets + Google Drive OAuth2 credentials"
      },
      "typeVersion": 1
    },
    {
      "id": "d3174307-914a-4f3a-b6d0-6910a2c28a3e",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -976,
        208
      ],
      "parameters": {
        "color": 7,
        "width": 864,
        "height": 528,
        "content": "## \u23f0 Schedule & queue\n\nRuns every hour. Reads all rows from the Google Sheet and filters to only rows where `status` is empty (not yet processed). Loops through them one at a time.\n\n> \u2699\ufe0f **Change here:** Open \"Get row(s) in sheet\" and update the Google Sheet ID to your own sheet.\n> \u2699\ufe0f Change the Schedule Trigger interval if you want to run more or less frequently."
      },
      "typeVersion": 1
    },
    {
      "id": "71b90a65-1f68-453d-a51c-131430d8630d",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        48,
        0
      ],
      "parameters": {
        "color": 7,
        "width": 1040,
        "height": 688,
        "content": "## \ud83d\udd0d Source detection\n\nChecks the URL to decide the download path:\n- **Seraj website URLs** \u2192 fetches the HTML page, parses it for either a direct PDF link (njah.online) or a Google Drive embed iframe\n- **Google Drive URLs** \u2192 goes directly to the Drive download path\n\nThe \"Extract PDF URL + Title\" node also reads the page `<title>` tag to get the document name automatically."
      },
      "typeVersion": 1
    },
    {
      "id": "9011c38e-726c-45ed-a5a8-45d044fdd405",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1120,
        0
      ],
      "parameters": {
        "color": 7,
        "width": 752,
        "height": 688,
        "content": "## \u2b07\ufe0f PDF download\n\nDownloads the PDF as binary data. A short Wait node is included between download and extraction to avoid rate-limiting issues with back-to-back requests.\n\n> \u2699\ufe0f If you are hitting rate limits on the PDF host, increase the Wait node duration."
      },
      "typeVersion": 1
    },
    {
      "id": "63ca8aea-9891-4e8e-865f-9692a3c5a1a0",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1904,
        16
      ],
      "parameters": {
        "color": 7,
        "width": 752,
        "height": 688,
        "content": "## \ud83d\udcc4 Text extraction & validation\n\nExtracts raw text from the PDF binary. The JavaScript node cleans the text (collapses line breaks, removes whitespace) and attaches `fileId`, `title`, and `source_url` as metadata.\n\nThe \"Is Text Based PDF?\" node then checks whether any meaningful text was extracted. Scanned/image-only PDFs produce no text and are routed to the \"Not Text Based\" branch instead of being embedded.\n\n> \u2139\ufe0f A PDF is considered meaningful only if it contains at least 50 non-whitespace characters after cleaning."
      },
      "typeVersion": 1
    },
    {
      "id": "90db12fe-f633-458b-affe-753c446a65b1",
      "name": "Sticky Note5",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2704,
        -880
      ],
      "parameters": {
        "color": 7,
        "width": 752,
        "height": 1616,
        "content": "## \ud83e\udde0 Embedding & storage\n\nSplits the cleaned text into overlapping chunks (180-character overlap), generates vector embeddings using Google Gemini, and inserts them into the `seraj_documents` table in Supabase with metadata: `file_id`, `title`, `source_url`.\n\nAfter successful insertion, updates the Google Sheet row status to **\"Embedded\"**.\n\n> \u2699\ufe0f **Supabase setup \u2014 run this once in your Supabase SQL editor:**\n\n```sql\n-- 1. Enable pgvector\ncreate extension if not exists vector;\n\n-- 2. Create the table (3072 dims for gemini-embedding-2)\ncreate table seraj_documents (\n  id bigserial primary key,\n  content text,\n  metadata jsonb,\n  embedding vector(3072)\n);\n\n-- 3. Match function (alias-fixed, no ambiguity error)\ncreate or replace function match_seraj_documents (\n  query_embedding vector(3072),\n  match_count int default 5,\n  filter jsonb default '{}'\n)\nreturns table (\n  id bigint,\n  content text,\n  metadata jsonb,\n  similarity float\n)\nlanguage plpgsql\nas $$\nbegin\n  return query\n  select\n    d.id,\n    d.content,\n    d.metadata,\n    1 - (d.embedding <=> query_embedding) as similarity\n  from seraj_documents d\n  where d.metadata @> filter\n  order by d.embedding <=> query_embedding\n  limit match_count;\nend;\n$$;\n\n-- 4. Index for fast cosine search\ncreate index on seraj_documents using hnsw (embedding vector_cosine_ops);\n```\n\n> \u2139\ufe0f The 3072 dimension size matches `gemini-embedding-2`. If you switch embedding models, update the vector size in both the table definition and the function signature."
      },
      "typeVersion": 1
    },
    {
      "id": "054d815c-d163-4efa-8fe3-069fa4664a68",
      "name": "Sticky Note6",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2976,
        720
      ],
      "parameters": {
        "color": 7,
        "width": 464,
        "height": 256,
        "content": "## \ud83d\udeab Non-text PDF handler\n\nIf the PDF produced no extractable text (scanned/image-only), this branch marks the row in the Google Sheet as **\"Not Text Based PDF\"** so it is skipped on future runs and can be reviewed manually."
      },
      "typeVersion": 1
    },
    {
      "id": "67c4187a-0c94-4b40-aee6-20c0263cb36c",
      "name": "Sticky Note7",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1792,
        1008
      ],
      "parameters": {
        "width": 1568,
        "height": 912,
        "content": "## \ud83d\udcac RAG Q&A agent \u2014 student-facing chat\n\nReceives questions from the chat widget embedded on your website. \nFor each question the agent:\n\n1. Embeds the query using Gemini embeddings (same model as ingestion \u2014 \n   critical for vector space alignment)\n2. Retrieves the top 5 most semantically relevant chunks from Supabase\n3. Generates a structured, direct answer using Gemini Flash with no \n   hallucination beyond what the knowledge base contains\n\nSupports multiple questions in a single message (numbered list input). \nReturns answers only \u2014 no reasoning steps, no filler text.\n\n> \u2699\ufe0f Add your Gemini API key to both \"Gemini Flash (RAG)\" and \n>    \"Gemini Embeddings (Query)\" nodes.\n> \u2699\ufe0f Table name (`seraj_documents`) and function name \n>    (`match_seraj_documents`) must match the SQL you ran in the \n>    ingestion pipeline \u2014 do not rename them independently.\n> \u2699\ufe0f Activate this workflow to get the public chat webhook URL, \n>    then embed it in your website using the n8n chat widget script tag."
      },
      "typeVersion": 1
    },
    {
      "id": "19c45a3c-eefe-4e28-bdc7-98fd2e4df70b",
      "name": "SerajUAE RAG Agent",
      "type": "@n8n/n8n-nodes-langchain.agent",
      "position": [
        -896,
        1280
      ],
      "parameters": {
        "text": "={{ $json.chatInput }}",
        "options": {
          "maxIterations": 10,
          "systemMessage": "# \ud83e\udde0 AI ROLE\nYou are an advanced educational AI assistant inside an n8n AI Agent workflow for the SerajUAE platform.\n\nYour main task is to solve MULTIPLE educational questions provided in a single input and return structured answers.\n\n---\n\n# \ud83d\udd27 TOOL USAGE POLICY (IMPORTANT)\n\n- You MUST use the SerajUAE Knowledge Base tool FIRST for any academic or curriculum-related question.\n- Use retrieved context ONLY if it is relevant to the current question.\n- If no relevant context is found, use your own knowledge.\n- NEVER mention tool usage in the final output.\n\n---\n\n# \ud83d\udce5 INPUT FORMAT\n\nYou will receive multiple questions in a single text block:\n\nQuestion 1:\n...\n\nQuestion 2:\n...\n\nEach question may include:\n- MCQ options\n- math expressions\n- short answers\n- true/false questions\n\nYou MUST process each question separately.\n\n---\n\n# \u2699\ufe0f PROCESSING RULES\n\n- Split input into individual questions.\n- Solve each question independently.\n- Keep original order.\n- Do NOT skip any question.\n\n---\n\n# \ud83d\udce4 OUTPUT FORMAT (STRICT)\n\nReturn ONLY in this format:\n\nQuestion 1:\n<original question>\n\nAnswer:\n<final answer only>\n\nQuestion 2:\n<original question>\n\nAnswer:\n<final answer only>\n\n\ud83d\udeab No extra text is allowed.\n\n---\n\n# \ud83d\udccc ANSWER RULES BY TYPE\n\n## MCQ\n- Return ONLY correct option text (or letter + text if available)\n\n## True / False\nReturn only:\nTrue  \nor  \nFalse  \n\n## Fill in the blanks\nReturn ONLY missing word or value\n\n## Math questions\nReturn ONLY final result or equation\n\n## Short answer\nReturn direct concise answer only\n\n## Translation\nReturn only translated text\n\n## Graph/image questions\nReturn best possible final answer only\n\n---\n\n# \ud83d\udeab STRICT RULES\n\n- Do NOT explain answers\n- Do NOT show reasoning\n- Do NOT add introductions or conclusions\n- Do NOT output JSON or markdown formatting in answers\n- Do NOT say \u201cAnswer:\u201d\n- Do NOT say \u201cHere is the solution\u201d\n- NEVER say \u201cI don\u2019t know\u201d \u2014 always attempt best possible answer"
        },
        "promptType": "define"
      },
      "retryOnFail": true,
      "typeVersion": 3.1
    },
    {
      "id": "91664981-a7e9-486f-89d5-218c0773e656",
      "name": "Gemini Flash (RAG)",
      "type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini",
      "position": [
        -896,
        1520
      ],
      "parameters": {
        "options": {
          "temperature": 0.3
        },
        "modelName": "models/gemini-2.5-flash"
      },
      "credentials": {
        "googlePalmApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1.1
    },
    {
      "id": "6f390c0c-d5ff-4b9d-8e5e-d5d7feccaf11",
      "name": "SerajUAE Knowledge Base",
      "type": "@n8n/n8n-nodes-langchain.vectorStoreSupabase",
      "position": [
        -736,
        1552
      ],
      "parameters": {
        "mode": "retrieve-as-tool",
        "topK": 5,
        "options": {
          "queryName": "match_seraj_documents"
        },
        "tableName": {
          "__rl": true,
          "mode": "id",
          "value": "seraj_documents"
        },
        "toolDescription": "Search the SerajUAE educational materials database. Use this tool to find UAE curriculum documents, worksheets, tests, and study guides in Arabic and English. Always use this tool before answering any question about school subjects, grades, or curriculum."
      },
      "credentials": {
        "supabaseApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1.3
    },
    {
      "id": "81468aa9-a890-4607-aa93-e1d080f5e5b8",
      "name": "Gemini Embeddings (Query)",
      "type": "@n8n/n8n-nodes-langchain.embeddingsGoogleGemini",
      "position": [
        -640,
        1776
      ],
      "parameters": {},
      "credentials": {
        "googlePalmApi": {
          "name": "<your credential>"
        }
      },
      "retryOnFail": true,
      "typeVersion": 1
    },
    {
      "id": "e10a993b-f12b-4de3-ae35-1ca46b6e37ef",
      "name": "When chat message received",
      "type": "@n8n/n8n-nodes-langchain.chatTrigger",
      "position": [
        -1168,
        1296
      ],
      "parameters": {
        "mode": "webhook",
        "public": true,
        "options": {
          "responseMode": "lastNode",
          "allowedOrigins": "*"
        }
      },
      "typeVersion": 1.4
    },
    {
      "id": "5f63847d-00c7-4ecd-84e0-42362c38de03",
      "name": "Sticky Note8",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1968,
        -928
      ],
      "parameters": {
        "color": "#0F0A0A",
        "width": 5600,
        "height": 2944,
        "content": "# Build a RAG Knowledge Base from Educational PDFs with Gemini and Supabase"
      },
      "typeVersion": 1
    }
  ],
  "active": false,
  "settings": {
    "binaryMode": "separate",
    "executionOrder": "v1"
  },
  "versionId": "6d22af15-8430-4961-bb17-36cf4c3d1e94",
  "connections": {
    "Wait": {
      "main": [
        [
          {
            "node": "Extract from File",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Filter": {
      "main": [
        [
          {
            "node": "Loop Over Items",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "text pdf?": {
      "main": [
        [
          {
            "node": "Insert to Supabase pgvector",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Update row in sheet1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Drive Link?": {
      "main": [
        [
          {
            "node": "Download PDF (Drive)",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Download PDF (Seraj)",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Fetch File Page": {
      "main": [
        [
          {
            "node": "Extract PDF URL + Title",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Loop Over Items": {
      "main": [
        [
          {
            "node": "All Done",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Seraj or Drive?",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Seraj or Drive?": {
      "main": [
        [
          {
            "node": "Fetch File Page",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Extract Drive File ID + Title",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Schedule Trigger": {
      "main": [
        [
          {
            "node": "Get row(s) in sheet",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract from File": {
      "main": [
        [
          {
            "node": "Code in JavaScript",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "PDF Binary Loader": {
      "ai_document": [
        [
          {
            "node": "Insert to Supabase pgvector",
            "type": "ai_document",
            "index": 0
          }
        ]
      ]
    },
    "Code in JavaScript": {
      "main": [
        [
          {
            "node": "text pdf?",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Gemini Flash (RAG)": {
      "ai_languageModel": [
        [
          {
            "node": "SerajUAE RAG Agent",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    },
    "Get row(s) in sheet": {
      "main": [
        [
          {
            "node": "Filter",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Update row in sheet": {
      "main": [
        [
          {
            "node": "Loop Over Items",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Download PDF (Drive)": {
      "main": [
        [
          {
            "node": "Wait",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Download PDF (Seraj)": {
      "main": [
        [
          {
            "node": "Wait",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Update row in sheet1": {
      "main": [
        [
          {
            "node": "Loop Over Items",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract PDF URL + Title": {
      "main": [
        [
          {
            "node": "Drive Link?",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Recursive Text Splitter": {
      "ai_textSplitter": [
        [
          {
            "node": "PDF Binary Loader",
            "type": "ai_textSplitter",
            "index": 0
          }
        ]
      ]
    },
    "SerajUAE Knowledge Base": {
      "ai_tool": [
        [
          {
            "node": "SerajUAE RAG Agent",
            "type": "ai_tool",
            "index": 0
          }
        ]
      ]
    },
    "Embeddings Google Gemini": {
      "ai_embedding": [
        [
          {
            "node": "Insert to Supabase pgvector",
            "type": "ai_embedding",
            "index": 0
          }
        ]
      ]
    },
    "Gemini Embeddings (Query)": {
      "ai_embedding": [
        [
          {
            "node": "SerajUAE Knowledge Base",
            "type": "ai_embedding",
            "index": 0
          }
        ]
      ]
    },
    "When chat message received": {
      "main": [
        [
          {
            "node": "SerajUAE RAG Agent",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Insert to Supabase pgvector": {
      "main": [
        [
          {
            "node": "Update row in sheet",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract Drive File ID + Title": {
      "main": [
        [
          {
            "node": "Download PDF (Drive)",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}

Credentials you'll need

Each integration node will prompt for credentials when you import. We strip credential IDs before publishing — you'll add your own.

Pro

For the full experience including quality scoring and batch install features for each workflow upgrade to Pro

About this workflow

This workflow ingests educational PDF URLs from Google Sheets, extracts and chunks their text, generates embeddings with Google Gemini, and stores them in a Supabase pgvector table for retrieval, while also exposing a public chat webhook that answers questions using Gemini and…

Source: https://n8n.io/workflows/16258/ — original creator credit. Request a take-down →

More AI & RAG workflows → · Browse all categories →

Related workflows

Workflows that share integrations, category, or trigger type with this one. All free to copy and import.

AI & RAG

My Workflow. Uses outputParserStructured, httpRequest, lmChatGoogleGemini, chainLlm. Scheduled trigger; 82 nodes.

Output Parser Structured, HTTP Request, Google Gemini Chat +15
AI & RAG

WooriFisa 최종. Uses memoryMongoDbChat, agent, httpRequest, documentDefaultDataLoader. Scheduled trigger; 68 nodes.

Memory Mongo Db Chat, Agent, HTTP Request +14
AI & RAG

This template is a complete, hands-on tutorial for building a RAG (Retrieval-Augmented Generation) pipeline. In simple terms, you'll teach an AI to become an expert on a specific topic—in this case, t

Memory Buffer Window, Supabase Vector Store, Document Default Data Loader +8
AI & RAG

Tech Radar. Uses googleDrive, documentDefaultDataLoader, stickyNote, mySql. Scheduled trigger; 53 nodes.

Google Drive, Document Default Data Loader, MySQL +15
AI & RAG

This project is built on top of the famous open source ThoughtWorks Tech Radar.

Google Drive, Document Default Data Loader, MySQL +15