Scrape Website and Store Embeddings in Supabase

Original n8n title: Scrape and Ingest Web Content Into Supabase Pgvector with Firecrawl

By Firecrawl (@firecrawl) on n8n.io

What this does

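A webhook receives a URL, Firecrawl scrapes the page, OpenAI embeddings of the content are stored in Supabase pgvector, and a chat agent answers questions over the stored content, with Cohere reranking the retrieved documents.

Category: AI & RAG · Trigger: Webhook · Nodes: 20 · Complexity: ★★★★☆ · AI nodes: yes
Integrations: Supabase, @Mendable/N8N Nodes Firecrawl, Document Default Data Loader, OpenAI Embeddings, Chat Trigger, Agent, OpenRouter Chat, Memory Buffer Window, Cohere Reranker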

This workflow corresponds to n8n.io template #13911, which we link to as the canonical source.

This workflow follows the Agent → Chat Trigger recipe pattern: a Chat Trigger passes user messages to an AI Agent.

The workflow JSON

Copy or download the full n8n JSON below, paste it into a new n8n workflow, add your credentials, and activate it.

{
  "id": "7GvXseCzfctnyRwo",
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "name": "Scrape and ingest web content into Supabase pgvector with Firecrawl",
  "tags": [],
  "nodes": [
    {
      "id": "1221fcc2-23e9-40c8-8d25-5e8de01bf3fb",
      "name": "Receive company URL",
      "type": "n8n-nodes-base.webhook",
      "position": [
        -1136,
        336
      ],
      "parameters": {
        "path": "dedaa64a-3dc9-43ea-82ac-7fac034af0b2",
        "options": {},
        "httpMethod": "POST",
        "responseMode": "responseNode"
      },
      "typeVersion": 2.1
    },
    {
      "id": "12509942-87e2-422d-b97a-3502bb7b4f2a",
      "name": "Validate and normalize URL",
      "type": "n8n-nodes-base.code",
      "onError": "continueErrorOutput",
      "position": [
        -912,
        336
      ],
      "parameters": {
        "jsCode": "const body = $input.first().json.body;\nconst raw = body?.url?.trim();\n\n// Throw so a missing URL is also routed to the error output (422 response)\nif (!raw) {\n  throw new Error(\"Missing 'url' field in request body.\");\n}\n\n// Strip protocol and path to get clean domain\nconst domain = raw.replace(/^https?:\\/\\//i, \"\").replace(/\\/.*$/, \"\");\n\n// Validate domain format\nconst isValid = /^[a-zA-Z0-9]([a-zA-Z0-9\\-]{0,61}[a-zA-Z0-9])?(\\.[a-zA-Z]{2,})+$/.test(domain);\n\nif (!isValid) {\n  throw new Error(`Invalid URL: \"${raw}\" is not a valid domain or URL.`);\n}\nreturn [{\n  json: {\n    status: 200,\n    domain: domain,\n    url: `https://${domain}`\n  }\n}];"
      },
      "typeVersion": 2
    },
    {
      "id": "82d40c90-d166-4f8b-8224-489322aa007c",
      "name": "Check for duplicate in Supabase",
      "type": "n8n-nodes-base.supabase",
      "position": [
        -688,
        240
      ],
      "parameters": {
        "filters": {
          "conditions": [
            {
              "keyName": "metadata",
              "keyValue": "={\n  \"loc\": {\n    \"lines\": {\n      \"to\": 48,\n      \"from\": 1\n    }\n  },\n  \"url\": \"{{ $json.url }}\",\n  \"line\": 1,\n  \"source\": \"blob\",\n  \"blobType\": \"application/json\"\n}"
            }
          ]
        },
        "tableId": "documents",
        "operation": "get"
      },
      "credentials": {
        "supabaseApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1,
      "alwaysOutputData": true
    },
    {
      "id": "85705c58-aaf9-487d-b358-f0d87f49afef",
      "name": "Return duplicate notice",
      "type": "n8n-nodes-base.respondToWebhook",
      "position": [
        -240,
        144
      ],
      "parameters": {
        "options": {
          "responseCode": 200
        },
        "respondWith": "json",
        "responseBody": "{\n  \"message\": \"Already in the database\"\n}"
      },
      "typeVersion": 1.5
    },
    {
      "id": "28f6fcb9-581a-4249-9a95-d5d6d3716873",
      "name": "Scrape company website with Firecrawl",
      "type": "@mendable/n8n-nodes-firecrawl.firecrawl",
      "position": [
        -240,
        336
      ],
      "parameters": {
        "url": "={{ $('Validate and normalize URL').item.json.url }}",
        "operation": "scrape",
        "scrapeOptions": {
          "options": {
            "formats": {
              "format": [
                {}
              ]
            },
            "headers": {}
          }
        },
        "requestOptions": {}
      },
      "credentials": {
        "firecrawlApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "34256e66-c794-40ba-a579-b975dd2d2e82",
      "name": "Return URL validation error",
      "type": "n8n-nodes-base.respondToWebhook",
      "position": [
        -688,
        432
      ],
      "parameters": {
        "options": {
          "responseKey": "={{ $json.error }}",
          "responseCode": 422
        }
      },
      "typeVersion": 1.5
    },
    {
      "id": "2b060cc6-1291-4913-ba4b-02d74917a945",
      "name": "Load scraped content",
      "type": "@n8n/n8n-nodes-langchain.documentDefaultDataLoader",
      "position": [
        112,
        560
      ],
      "parameters": {
        "options": {
          "metadata": {
            "metadataValues": [
              {
                "name": "url",
                "value": "={{ $('Validate and normalize URL').item.json.url }}"
              }
            ]
          }
        }
      },
      "typeVersion": 1.1
    },
    {
      "id": "95bc5d9a-a4bb-4f61-a051-511f811f7757",
      "name": "Generate OpenAI embeddings",
      "type": "@n8n/n8n-nodes-langchain.embeddingsOpenAi",
      "position": [
        -64,
        560
      ],
      "parameters": {
        "options": {}
      },
      "credentials": {
        "openAiApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "0f82ceff-00a2-4486-89bd-18c0aa789d75",
      "name": "Receive chat message",
      "type": "@n8n/n8n-nodes-langchain.chatTrigger",
      "position": [
        -1136,
        704
      ],
      "parameters": {
        "options": {}
      },
      "typeVersion": 1.4
    },
    {
      "id": "63d0275d-573b-4308-87a4-15510e8455a2",
      "name": "Answer query from enriched leads",
      "type": "@n8n/n8n-nodes-langchain.agent",
      "position": [
        -912,
        704
      ],
      "parameters": {
        "options": {}
      },
      "typeVersion": 3.1
    },
    {
      "id": "434db832-19d9-4a2a-8937-eb505c1e9371",
      "name": "OpenRouter LLM",
      "type": "@n8n/n8n-nodes-langchain.lmChatOpenRouter",
      "position": [
        -1008,
        896
      ],
      "parameters": {
        "model": "anthropic/claude-sonnet-4.6",
        "options": {}
      },
      "credentials": {
        "openRouterApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "f2f212d5-d6df-4a25-828b-d1447ff05712",
      "name": "Chat memory",
      "type": "@n8n/n8n-nodes-langchain.memoryBufferWindow",
      "position": [
        -848,
        896
      ],
      "parameters": {},
      "typeVersion": 1.3
    },
    {
      "id": "666976cb-4d2f-4e29-8fe2-be220c02c26e",
      "name": "Generate OpenAI embeddings1",
      "type": "@n8n/n8n-nodes-langchain.embeddingsOpenAi",
      "position": [
        -688,
        1056
      ],
      "parameters": {
        "options": {}
      },
      "credentials": {
        "openAiApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "ef08cbc2-9cb9-4ca4-8698-1748140d8fc7",
      "name": "Rerank results with Cohere",
      "type": "@n8n/n8n-nodes-langchain.rerankerCohere",
      "position": [
        -528,
        1056
      ],
      "parameters": {},
      "credentials": {
        "cohereApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "a92fe745-93f2-471b-a140-e9e0367bb5cd",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1776,
        336
      ],
      "parameters": {
        "width": 512,
        "height": 448,
        "content": "### How it works\n1. A webhook receives a URL via POST request\n2. The URL is validated, normalized, and checked for duplicates in Supabase\n3. Firecrawl scrapes the page and converts it to clean markdown\n4. OpenAI generates vector embeddings from the scraped content\n5. The content and embeddings are stored in Supabase pgvector\n6. A built-in RAG chat agent lets you query the knowledge base using natural language, with Cohere reranking for better retrieval\n\n### Setup\n1. Create a Supabase project and run the SQL from the README to create the `documents` table with pgvector enabled\n2. Add your Firecrawl API key\n3. Add your OpenAI API key (for embeddings)\n4. Add your OpenRouter API key (for the chat agent)\n5. Add your Cohere API key (for reranking)\n6. Activate the workflow and send a POST request with `{\"url\": \"https://example.com\"}` to the webhook"
      },
      "typeVersion": 1
    },
    {
      "id": "43f6a78c-69d8-4aa0-8fc7-9bcd44e8f2a4",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1776,
        800
      ],
      "parameters": {
        "color": 7,
        "width": 512,
        "height": 96,
        "content": "## Supabase setup\nRun the SQL migration from the workflow README to create the `documents` table with pgvector enabled."
      },
      "typeVersion": 1
    },
    {
      "id": "be2a6f00-ee44-4eb8-bb19-e361f5b3577f",
      "name": "Skip if already ingested",
      "type": "n8n-nodes-base.if",
      "position": [
        -464,
        240
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 3,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "fc4c27fb-9647-4457-a19b-2737a50dfb9f",
              "operator": {
                "type": "object",
                "operation": "notEmpty",
                "singleValue": true
              },
              "leftValue": "={{$input.all()[0].json}}",
              "rightValue": "1"
            }
          ]
        }
      },
      "typeVersion": 2.3
    },
    {
      "id": "b8c12e9c-9a86-408e-a015-bf6587605a32",
      "name": "Store embeddings in Supabase",
      "type": "@n8n/n8n-nodes-langchain.vectorStoreSupabase",
      "position": [
        -16,
        336
      ],
      "parameters": {
        "mode": "insert",
        "options": {},
        "tableName": {
          "__rl": true,
          "mode": "list",
          "value": "documents",
          "cachedResultName": "documents"
        }
      },
      "credentials": {
        "supabaseApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1.3
    },
    {
      "id": "e46086f3-028a-44b2-8035-af14f2067b64",
      "name": "Return ingestion result",
      "type": "n8n-nodes-base.respondToWebhook",
      "position": [
        336,
        336
      ],
      "parameters": {
        "options": {
          "responseCode": 200
        },
        "respondWith": "json",
        "responseBody": "={\n  \"message\": \"Added {{$input.all().length}} items to Supabase\"\n}"
      },
      "executeOnce": true,
      "typeVersion": 1.5
    },
    {
      "id": "d36fe25d-386d-4262-9faa-8d89f6a3abfd",
      "name": "Retrieve documents from Supabase",
      "type": "@n8n/n8n-nodes-langchain.vectorStoreSupabase",
      "position": [
        -624,
        864
      ],
      "parameters": {
        "mode": "retrieve-as-tool",
        "options": {
          "metadata": {
            "metadataValues": [
              {
                "name": "url",
                "value": "={{ $fromAI('url', 'to filter by URL add the specific URL here.', \"string\") }}"
              }
            ]
          }
        },
        "tableName": {
          "__rl": true,
          "mode": "list",
          "value": "documents",
          "cachedResultName": "documents"
        },
        "useReranker": true,
        "toolDescription": "Retrieve data for the AI Agent."
      },
      "credentials": {
        "supabaseApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1.3
    }
  ],
  "active": false,
  "settings": {
    "binaryMode": "separate",
    "availableInMCP": false,
    "executionOrder": "v1"
  },
  "versionId": "ec7ba37c-70dd-4ae0-b367-b4dce7095212",
  "connections": {
    "Chat memory": {
      "ai_memory": [
        [
          {
            "node": "Answer query from enriched leads",
            "type": "ai_memory",
            "index": 0
          }
        ]
      ]
    },
    "OpenRouter LLM": {
      "ai_languageModel": [
        [
          {
            "node": "Answer query from enriched leads",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    },
    "Receive company URL": {
      "main": [
        [
          {
            "node": "Validate and normalize URL",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Load scraped content": {
      "ai_document": [
        [
          {
            "node": "Store embeddings in Supabase",
            "type": "ai_document",
            "index": 0
          }
        ]
      ]
    },
    "Receive chat message": {
      "main": [
        [
          {
            "node": "Answer query from enriched leads",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Skip if already ingested": {
      "main": [
        [
          {
            "node": "Return duplicate notice",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Scrape company website with Firecrawl",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Generate OpenAI embeddings": {
      "ai_embedding": [
        [
          {
            "node": "Store embeddings in Supabase",
            "type": "ai_embedding",
            "index": 0
          }
        ]
      ]
    },
    "Rerank results with Cohere": {
      "ai_reranker": [
        [
          {
            "node": "Retrieve documents from Supabase",
            "type": "ai_reranker",
            "index": 0
          }
        ]
      ]
    },
    "Validate and normalize URL": {
      "main": [
        [
          {
            "node": "Check for duplicate in Supabase",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Return URL validation error",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Generate OpenAI embeddings1": {
      "ai_embedding": [
        [
          {
            "node": "Retrieve documents from Supabase",
            "type": "ai_embedding",
            "index": 0
          }
        ]
      ]
    },
    "Store embeddings in Supabase": {
      "main": [
        [
          {
            "node": "Return ingestion result",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Check for duplicate in Supabase": {
      "main": [
        [
          {
            "node": "Skip if already ingested",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Retrieve documents from Supabase": {
      "ai_tool": [
        [
          {
            "node": "Answer query from enriched leads",
            "type": "ai_tool",
            "index": 0
          }
        ]
      ]
    },
    "Scrape company website with Firecrawl": {
      "main": [
        [
          {
            "node": "Store embeddings in Supabase",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
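The ingestion branch starts with the "Validate and normalize URL" Code node. Its logic can be lifted into a plain function and tested outside n8n. The following is a minimal sketch that reproduces the node's regex-based normalization: strip the protocol and any path, validate the remaining domain, and rebuild an https:// URL. The function name normalizeUrl and its body argument are hypothetical stand-ins for $input.first().json.body, and this sketch throws for a missing or non-string url as well as for an invalid one, so both cases would land on the node's error output.

```javascript
// Sketch of the "Validate and normalize URL" Code node as a pure function.
// normalizeUrl is a hypothetical name; in n8n the body comes from
// $input.first().json.body rather than a function argument.

// Same domain pattern as the Code node: labels of letters/digits/hyphens,
// ending in a TLD of at least two letters.
const DOMAIN_RE = /^[a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?(\.[a-zA-Z]{2,})+$/;

function normalizeUrl(body) {
  // Assumption: treat a non-string url the same as a missing one
  const raw = typeof body?.url === "string" ? body.url.trim() : "";
  if (!raw) {
    // Throwing sends the item to the error output in the workflow
    throw new Error("Missing 'url' field in request body.");
  }

  // Strip the protocol, then everything from the first "/" onward
  const domain = raw.replace(/^https?:\/\//i, "").replace(/\/.*$/, "");

  if (!DOMAIN_RE.test(domain)) {
    throw new Error(`Invalid URL: "${raw}" is not a valid domain or URL.`);
  }
  return { status: 200, domain, url: `https://${domain}` };
}

// Example usage
console.log(normalizeUrl({ url: "https://www.example.com/about" }));
```

Because the path is discarded, a request for "https://example.com/pricing" is normalized to "https://example.com", so Firecrawl scrapes the site root rather than the specific page.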

Credentials you'll need

Each integration node will prompt for credentials when you import the workflow. Credential IDs are stripped before publishing, so you'll need to add your own for Supabase, Firecrawl, OpenAI, OpenRouter, and Cohere.
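Once credentials are in place and the workflow is active, ingestion is triggered by POSTing {"url": "https://example.com"} to the webhook, as the setup sticky note in the JSON describes. A minimal client sketch for Node.js 18+ (which provides a global fetch), assuming n8n's default production URL layout of <your n8n base URL>/webhook/<path>; while testing in the editor, n8n serves the same path under /webhook-test/ instead. The environment variable N8N_BASE_URL and the function names are hypothetical.

```javascript
// Sketch of a client for the "Receive company URL" webhook (Node.js 18+).
// N8N_BASE_URL is a hypothetical environment variable; WEBHOOK_PATH is the
// "path" parameter of the webhook node in the workflow JSON.
const WEBHOOK_PATH = "dedaa64a-3dc9-43ea-82ac-7fac034af0b2";

// Build the POST request the workflow expects: a JSON body with a "url" field
function buildIngestRequest(targetUrl) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url: targetUrl }),
  };
}

// Send the request to the production webhook and return status plus body
async function ingest(baseUrl, targetUrl) {
  const endpoint = `${baseUrl.replace(/\/$/, "")}/webhook/${WEBHOOK_PATH}`;
  const res = await fetch(endpoint, buildIngestRequest(targetUrl));
  // The workflow answers 200 (ingested or already stored) or 422 (bad URL)
  const text = await res.text();
  let data;
  try {
    data = JSON.parse(text);
  } catch {
    data = text; // fall back to raw text if the body is not JSON
  }
  return { status: res.status, data };
}

// Example usage: only runs when N8N_BASE_URL is set
if (process.env.N8N_BASE_URL) {
  ingest(process.env.N8N_BASE_URL, "https://example.com")
    .then((r) => console.log(r.status, r.data))
    .catch((err) => console.error("Request failed:", err));
}
```

On success the JSON body carries either "Already in the database" or "Added N items to Supabase", matching the two Respond to Webhook nodes on the 200 paths.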

About this workflow

Source: https://n8n.io/workflows/13911/ (original creator credit).

Related workflows

Workflows that share integrations, category, or trigger type with this one. All free to copy and import.

AI & RAG

YouTube Agent. Uses supabase, agent, lmChatAnthropic, outputParserStructured. Webhook trigger; 56 nodes.

Supabase, Agent, Anthropic Chat +10
AI & RAG

Indoor Farming Agent. Uses lmChatOpenAi, documentDefaultDataLoader, embeddingsOpenAi, toolVectorStore. Webhook trigger; 36 nodes.

OpenAI Chat, Document Default Data Loader, OpenAI Embeddings +16
AI & RAG

Fluxo-N8N. Uses googleSheetsTool, dataTable, dataTableTool, informationExtractor. Webhook trigger; 30 nodes.

Google Sheets Tool, Data Table, Data Table Tool +11
AI & RAG

@Mendable/N8N Nodes Firecrawl, Pinecone Vector Store, OpenAI Embeddings +6
AI & RAG

Supercharge your trading decisions with this end-to-end AI automation that connects market intelligence, technical analysis, and automated trade execution — all without manual intervention.

Tool Think, Supabase Vector Store, OpenAI Embeddings +14