Fetch All Page Content From Website and Store with Gemini Embedding in Pinecone

By Zain Khan (@zain) on n8n.io

Use cases are many: Populate a custom chatbot's knowledge base, create a powerful search index for your website, or build a comprehensive repository of information for internal tools!

Category: AI & RAG | Trigger: Event | Nodes: 16 | Complexity: ★★★★☆ | AI nodes: yes
Integrations: XML, HTTP Request, Document Default Data Loader, Google Gemini Embeddings, Pinecone Vector Store, Form Trigger

This workflow corresponds to n8n.io template #6526, which we link to as the canonical source.

This workflow follows the Document Default Data Loader → Google Gemini Embeddings recipe pattern.

The workflow JSON

Copy the full n8n JSON below, paste it into a new n8n workflow, add your credentials, and activate it.

{
  "nodes": [
    {
      "id": "5ad6a510-3c4a-47e4-b8ff-c0e565e25d25",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        368,
        944
      ],
      "parameters": {
        "content": ""
      },
      "typeVersion": 1
    },
    {
      "id": "3ff777b7-24bd-420c-af38-62a395f52a1a",
      "name": "Extract Page URLs",
      "type": "n8n-nodes-base.code",
      "position": [
        1936,
        1392
      ],
      "parameters": {},
      "typeVersion": 2
    },
    {
      "id": "6176e651-cef5-44e8-abed-0f6f6b81517b",
      "name": "XML Conversion",
      "type": "n8n-nodes-base.xml",
      "position": [
        1792,
        1392
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "cca1e7e7-32f6-42fd-b23c-3c2586344a50",
      "name": "Fetch Sitemap",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        1632,
        1392
      ],
      "parameters": {},
      "typeVersion": 4.2
    },
    {
      "id": "520e131d-b5f2-4857-aebd-5724da2a8083",
      "name": "Split Pages URL",
      "type": "n8n-nodes-base.code",
      "position": [
        1792,
        1216
      ],
      "parameters": {},
      "typeVersion": 2
    },
    {
      "id": "7e7fe528-8748-470b-b627-a0c79b5aface",
      "name": "Merge URLs",
      "type": "n8n-nodes-base.merge",
      "position": [
        2128,
        1232
      ],
      "parameters": {},
      "typeVersion": 3.2
    },
    {
      "id": "a0517aaf-6ccd-481d-b97e-b183d305451b",
      "name": "Remove Duplicate URLs",
      "type": "n8n-nodes-base.removeDuplicates",
      "position": [
        2272,
        1232
      ],
      "parameters": {},
      "typeVersion": 2
    },
    {
      "id": "72c85ccf-a9d6-42b1-85a7-76800ba831e5",
      "name": "Loop Over Page URLs",
      "type": "n8n-nodes-base.splitInBatches",
      "position": [
        2480,
        1232
      ],
      "parameters": {},
      "typeVersion": 3
    },
    {
      "id": "73aebd19-60ae-40d1-a747-0b9537d9d67c",
      "name": "Extract Content",
      "type": "n8n-nodes-base.html",
      "position": [
        2672,
        1136
      ],
      "parameters": {},
      "typeVersion": 1.2
    },
    {
      "id": "0dbf70c1-cb57-4691-916f-2a2aa9a4cec0",
      "name": "Fetch Page HTML For content",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        2672,
        1328
      ],
      "parameters": {},
      "typeVersion": 4.2
    },
    {
      "id": "fa1c18c6-6c29-4e71-905e-0945909af99b",
      "name": "Wait 5 sec",
      "type": "n8n-nodes-base.wait",
      "position": [
        2832,
        1328
      ],
      "parameters": {},
      "typeVersion": 1.1
    },
    {
      "id": "2bf3ad7f-a2fd-44f9-b6af-5a500ef80591",
      "name": "Data Loader",
      "type": "@n8n/n8n-nodes-langchain.documentDefaultDataLoader",
      "position": [
        3264,
        1344
      ],
      "parameters": {},
      "typeVersion": 1.1
    },
    {
      "id": "a86d4c2e-559c-4942-ac0d-2ddcc7eb7f39",
      "name": "Gemini Embeddings",
      "type": "@n8n/n8n-nodes-langchain.embeddingsGoogleGemini",
      "position": [
        3072,
        1344
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "f46188bd-c0a2-4d49-9b67-0937f891ae36",
      "name": "Pinecone KnowledgeBase",
      "type": "@n8n/n8n-nodes-langchain.vectorStorePinecone",
      "position": [
        3072,
        1136
      ],
      "parameters": {},
      "typeVersion": 1.3
    },
    {
      "id": "4f5dc6e3-8f75-46ab-b3e1-49deb7695469",
      "name": "Input Sitemap or page urls",
      "type": "n8n-nodes-base.formTrigger",
      "position": [
        1296,
        1376
      ],
      "parameters": {},
      "typeVersion": 2.2
    },
    {
      "id": "67f6e98a-946c-4460-93d4-707511deb4f5",
      "name": "Switch",
      "type": "n8n-nodes-base.switch",
      "position": [
        1440,
        1376
      ],
      "parameters": {},
      "typeVersion": 3.2
    }
  ],
  "connections": {
    "Switch": {
      "main": [
        [
          {
            "node": "Split Pages URL",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Fetch Sitemap",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Merge URLs": {
      "main": [
        [
          {
            "node": "Remove Duplicate URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Wait 5 sec": {
      "main": [
        [
          {
            "node": "Loop Over Page URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Data Loader": {
      "ai_document": [
        [
          {
            "node": "Pinecone KnowledgeBase",
            "type": "ai_document",
            "index": 0
          }
        ]
      ]
    },
    "Fetch Sitemap": {
      "main": [
        [
          {
            "node": "XML Conversion",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "XML Conversion": {
      "main": [
        [
          {
            "node": "Extract Page URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract Content": {
      "main": [
        [
          {
            "node": "Pinecone KnowledgeBase",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Split Pages URL": {
      "main": [
        [
          {
            "node": "Merge URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract Page URLs": {
      "main": [
        [
          {
            "node": "Merge URLs",
            "type": "main",
            "index": 1
          }
        ]
      ]
    },
    "Gemini Embeddings": {
      "ai_embedding": [
        [
          {
            "node": "Pinecone KnowledgeBase",
            "type": "ai_embedding",
            "index": 0
          }
        ]
      ]
    },
    "Loop Over Page URLs": {
      "main": [
        [
          {
            "node": "Extract Content",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Fetch Page HTML For content",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Remove Duplicate URLs": {
      "main": [
        [
          {
            "node": "Loop Over Page URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Input Sitemap or page urls": {
      "main": [
        [
          {
            "node": "Switch",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Fetch Page HTML For content": {
      "main": [
        [
          {
            "node": "Wait 5 sec",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
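
To make the wiring in the "connections" object easier to follow, here is a minimal Python sketch that walks the "main" links breadth-first from the trigger node. The helper names (main_edges, reachable) and the EXCERPT constant are illustrative and not part of n8n; EXCERPT is a trimmed copy of the JSON above that covers only the nodes between the form trigger and "Merge URLs".

```python
import json
from collections import deque

# Trimmed copy of the workflow JSON above: only the "main" links between the
# form trigger and the "Merge URLs" node are kept (illustrative excerpt).
EXCERPT = json.loads("""
{
  "connections": {
    "Input Sitemap or page urls": {"main": [[{"node": "Switch", "type": "main", "index": 0}]]},
    "Switch": {"main": [
      [{"node": "Split Pages URL", "type": "main", "index": 0}],
      [{"node": "Fetch Sitemap", "type": "main", "index": 0}]
    ]},
    "Split Pages URL": {"main": [[{"node": "Merge URLs", "type": "main", "index": 0}]]},
    "Fetch Sitemap": {"main": [[{"node": "XML Conversion", "type": "main", "index": 0}]]},
    "XML Conversion": {"main": [[{"node": "Extract Page URLs", "type": "main", "index": 0}]]},
    "Extract Page URLs": {"main": [[{"node": "Merge URLs", "type": "main", "index": 1}]]}
  }
}
""")


def main_edges(workflow):
    """Return (source, output_index, target) for every "main" connection."""
    edges = []
    for source, by_type in workflow.get("connections", {}).items():
        for output_index, targets in enumerate(by_type.get("main", [])):
            for target in targets:
                edges.append((source, output_index, target["node"]))
    return edges


def reachable(workflow, start):
    """List node names reachable from start over "main" links, in BFS order."""
    neighbours = {}
    for source, _, target in main_edges(workflow):
        neighbours.setdefault(source, []).append(target)
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


if __name__ == "__main__":
    for name in reachable(EXCERPT, "Input Sitemap or page urls"):
        print(name)
```

Run against the full JSON, the same walk would continue through "Remove Duplicate URLs" and "Loop Over Page URLs". The "ai_document" and "ai_embedding" links into "Pinecone KnowledgeBase" are a different connection type and are deliberately not followed by this sketch. Note also that every node's "parameters" object is empty in this listing, so the graph shows how nodes are wired but not how each one is configured.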
About this workflow

Source: https://n8n.io/workflows/6526/ (original creator credit).
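
The URL-gathering stage wired up in the workflow JSON (Fetch Sitemap → XML Conversion → Extract Page URLs on one branch, Split Pages URL on the other, both feeding Merge URLs and Remove Duplicate URLs) can be approximated outside n8n. The sketch below is an assumption-laden illustration, not the template's actual code: the Code nodes' contents are not included in this listing, so the function names, the comma/whitespace-separated input format, and the example.com URLs are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def urls_from_sitemap(xml_text):
    """Collect the text of every <loc> element, whatever its XML namespace."""
    root = ET.fromstring(xml_text.strip())
    urls = []
    for element in root.iter():
        # Namespaced tags look like "{namespace}loc"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text and element.text.strip():
            urls.append(element.text.strip())
    return urls


def split_page_urls(raw):
    """Split a form field into URLs (assumed comma- or whitespace-separated)."""
    return [part for part in re.split(r"[\s,]+", raw) if part]


def merge_unique(*url_lists):
    """Concatenate URL lists and drop repeats, keeping first-seen order."""
    seen = set()
    merged = []
    for urls in url_lists:
        for url in urls:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


if __name__ == "__main__":
    # Hypothetical inputs standing in for a fetched sitemap and a form field.
    sitemap = """
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://example.com/</loc></url>
      <url><loc>https://example.com/about</loc></url>
    </urlset>
    """
    typed = "https://example.com/about, https://example.com/pricing"
    for url in merge_unique(split_page_urls(typed), urls_from_sitemap(sitemap)):
        print(url)
```

Keeping first-seen order while de-duplicating means pages are later fetched in a predictable sequence, which makes a partial run easier to resume or debug.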

