
Turn Websites Into RAG Chatbot Knowledge Bases with Apify, OpenAI and Pinecone

By Paul Abraham (@hellopaul) on n8n.io

Use cases:
- Convert documentation sites into intelligent support chatbots
- Build product knowledge bases from marketing websites
- Create internal search tools from company intranets
- Power customer support agents with scraped competitor analysis
- Generate training data for fine-tuning…

AI & RAG · Trigger: Event · Complexity: ★★★★☆ · AI nodes: yes · Nodes: 10 · Integrations: OpenAI, N8N Nodes Ai Training Scraper, Pinecone Vector Store

This workflow corresponds to n8n.io template #13248 — we link there as the canonical source.

The workflow JSON

Copy the full n8n JSON below. Paste it into a new n8n workflow, add your credentials, and activate it.

{
  "nodes": [
    {
      "id": "92907f4d-8072-41f7-8518-8fb536760a71",
      "name": "Process Chunks",
      "type": "n8n-nodes-base.code",
      "position": [
        3600,
        1616
      ],
      "parameters": {
        "jsCode": "// Extract and flatten chunks from Apify scraper output\nconst items = [];\n\nfor (const item of $input.all()) {\n  const data = item.json;\n  \n  // Handle both single page and batch outputs\n  const chunks = data.chunks || [];\n  \n  for (const chunk of chunks) {\n    items.push({\n      // Vector database fields\n      id: chunk.id,\n      text: chunk.text,\n      \n      // Page metadata\n      pageUrl: chunk.metadata.source_url,\n      pageTitle: chunk.metadata.page_title,\n      websiteDomain: new URL(chunk.metadata.source_url).hostname,\n      \n      // Chunk metadata\n      chunkIndex: chunk.metadata.chunk_index,\n      tokenCount: chunk.metadata.token_count,\n      startPosition: chunk.metadata.start_position,\n      endPosition: chunk.metadata.end_position,\n      \n      // Content classification\n      sectionTitle: chunk.metadata.section_title || 'No Section',\n      headingLevel: chunk.metadata.heading_level,\n      contentType: chunk.metadata.content_type,\n      hasCode: chunk.metadata.has_code,\n      language: chunk.metadata.language,\n      keywords: chunk.metadata.keywords.join(', '),\n      \n      // Crawl information\n      crawledAt: data.crawl_info.crawled_at,\n      crawlDepth: data.crawl_info.crawl_depth\n    });\n  }\n}\n\nreturn items;"
      },
      "typeVersion": 2
    },
    {
      "id": "ad7eb840-af2e-4e79-a2b9-64b2b1a41a10",
      "name": "Split In Batches",
      "type": "n8n-nodes-base.splitInBatches",
      "position": [
        3856,
        1616
      ],
      "parameters": {
        "options": {},
        "batchSize": 50
      },
      "typeVersion": 3
    },
    {
      "id": "8a221c5b-9cbe-4390-bc71-087bf964db81",
      "name": "OpenAI - Create Embeddings",
      "type": "@n8n/n8n-nodes-langchain.openAi",
      "position": [
        4160,
        1600
      ],
      "parameters": {
        "resource": "embedding"
      },
      "typeVersion": 1.3
    },
    {
      "id": "017fc572-2373-4f58-b0ec-7fe8ff30cde6",
      "name": "Combine Embedding with Data",
      "type": "n8n-nodes-base.code",
      "position": [
        4416,
        1600
      ],
      "parameters": {
        "jsCode": "// Combine embedding with metadata\nconst items = $input.all();\nconst result = [];\n\nfor (const item of items) {\n  result.push({\n    ...item.json,\n    embedding: item.json.data[0].embedding\n  });\n}\n\nreturn result;"
      },
      "typeVersion": 2
    },
    {
      "id": "93d44390-1890-4810-81f9-faa35974720e",
      "name": "AI Training Scraper",
      "type": "n8n-nodes-ai-training-scraper.aiTrainingScraper",
      "position": [
        3328,
        1616
      ],
      "parameters": {
        "startUrls": "https://docs.n8n.io/workflows/",
        "additionalOptions": {}
      },
      "typeVersion": 1
    },
    {
      "id": "0096a1f0-a838-4f8f-b5c4-2ae6c9d98327",
      "name": "Pinecone Vector Store",
      "type": "@n8n/n8n-nodes-langchain.vectorStorePinecone",
      "position": [
        4688,
        1600
      ],
      "parameters": {
        "mode": "insert",
        "options": {},
        "pineconeIndex": {
          "__rl": true,
          "mode": "list",
          "value": ""
        }
      },
      "credentials": {
        "pineconeApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1.3
    },
    {
      "id": "34152b67-777b-4658-8472-a611aa3bbec6",
      "name": "When clicking \u2018Execute workflow\u2019",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        3040,
        1616
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "4062bc35-76b0-4e8b-93b4-09c4be4cf76c",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2992,
        1488
      ],
      "parameters": {
        "width": 768,
        "height": 448,
        "content": "## Website Capture\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n- This block then uses the AI Training Data Scraper to crawl pages and extract structured content.\n\n### User Action Required:\n- Create an Apify account, get an API token from console.apify.com, and paste it into the Apify credential section in n8n.\n"
      },
      "typeVersion": 1
    },
    {
      "id": "5901c9ba-7565-4d9d-b59c-06043422b6d1",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        4064,
        1488
      ],
      "parameters": {
        "width": 944,
        "height": 528,
        "content": "## Embedding & Vector Storage\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n- Generates semantic embeddings for each chunk and merges them with rich metadata, preserving full context for retrieval and citation.\n- Upserts all embeddings into the vector store, grouped by website domain, making the site searchable via semantic similarity.\n\n### User Action Required:\n- Create an OpenAI API key from platform.openai.com/account/api-keys and add it in the credential section.\n- Create a Pinecone account and index at pinecone.io, generate an API key, and add it to the Pinecone node's credentials."
      },
      "typeVersion": 1
    },
    {
      "id": "a199578c-e2f7-4b4e-9f37-8f39efa4e3de",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2224,
        1472
      ],
      "parameters": {
        "color": 4,
        "width": 480,
        "height": 448,
        "content": "## Website RAGFlow\nThis workflow turns any website into a RAG\u2011ready knowledge base for AI chatbots.\nIt crawls the site, cleans and chunks content, generates embeddings, and stores them in a vector database for fast, semantic retrieval.\n\nIt uses a custom community node to power the AI Training Data Scraper. To enable it, go to Settings \u2192 Community Nodes in n8n and add the package name n8n-nodes-ai-training-scraper. After installing, open the node and configure your credentials.\n\nSetup is quick and runs on autopilot once configured, transforming your documentation or marketing sites into a chat\u2011ready knowledge base.\n\n\n\n\n\n\n\nCreated by Blukaze Automations | blukaze.com"
      },
      "typeVersion": 1
    }
  ],
  "connections": {
    "Process Chunks": {
      "main": [
        [
          {
            "node": "Split In Batches",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Split In Batches": {
      "main": [
        [
          {
            "node": "OpenAI - Create Embeddings",
            "type": "main",
            "index": 0
          }
        ],
        []
      ]
    },
    "AI Training Scraper": {
      "main": [
        [
          {
            "node": "Process Chunks",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "OpenAI - Create Embeddings": {
      "main": [
        [
          {
            "node": "Combine Embedding with Data",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Combine Embedding with Data": {
      "main": [
        [
          {
            "node": "Pinecone Vector Store",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "When clicking \u2018Execute workflow\u2019": {
      "main": [
        [
          {
            "node": "AI Training Scraper",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
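
The "Process Chunks" node in the JSON above reads several nested fields without guards (for example `chunk.metadata.keywords.join(...)` and `data.crawl_info.crawled_at`), so a scraped page that lacks one of them would likely make the node throw. Below is a minimal sketch of the same flattening step as a plain function that runs outside n8n and falls back to defaults for missing fields. The input shape is inferred only from the field names the node accesses, not from the scraper's documentation; `flattenChunks`, `samplePages`, and the sample values are hypothetical, and only a subset of the node's output fields is shown.

```javascript
// Hypothetical standalone version of the "Process Chunks" logic.
// The input shape is an assumption inferred from the fields the node reads.
function flattenChunks(pages) {
  const rows = [];
  for (const page of pages) {
    const crawl = page.crawl_info || {};
    for (const chunk of page.chunks || []) {
      const meta = chunk.metadata || {};
      let websiteDomain = null;
      try {
        websiteDomain = new URL(meta.source_url).hostname;
      } catch (err) {
        // Missing or malformed URL: leave the domain unset instead of throwing.
      }
      rows.push({
        id: chunk.id,
        text: chunk.text,
        pageUrl: meta.source_url,
        pageTitle: meta.page_title,
        websiteDomain,
        chunkIndex: meta.chunk_index,
        sectionTitle: meta.section_title || 'No Section',
        keywords: (meta.keywords || []).join(', '),
        crawledAt: crawl.crawled_at,
      });
    }
  }
  return rows;
}

// Made-up sample input: one complete chunk, one sparse chunk, one empty page.
const samplePages = [
  {
    chunks: [
      {
        id: 'chunk-1',
        text: 'Workflows connect nodes.',
        metadata: {
          source_url: 'https://docs.n8n.io/workflows/',
          page_title: 'Workflows',
          chunk_index: 0,
          keywords: ['workflow', 'nodes'],
        },
      },
      { id: 'chunk-2', text: 'Orphan text.', metadata: {} },
    ],
    crawl_info: { crawled_at: '2024-01-01T00:00:00Z' },
  },
  {},
];

const rows = flattenChunks(samplePages);
console.log(rows.map((row) => [row.id, row.websiteDomain, row.keywords]));
```

Inside an n8n Code node the same guards (`|| []`, `|| {}`, and the `try`/`catch` around `new URL`) could be applied to the loop over `$input.all()`; whether they are needed depends on what the scraper actually returns.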

Credentials you'll need

Each integration node will prompt for credentials when you import. We strip credential IDs before publishing — you'll add your own.
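
Before importing, it can help to see which nodes carry a credential placeholder and whether every connection points at a node that exists. The sketch below does both on a parsed workflow object. `auditWorkflow` is a hypothetical helper, and the inline `workflow` is a trimmed stand-in for the JSON above rather than the full export. Nodes whose `credentials` block was stripped entirely will not be listed, so treat the result as a lower bound.

```javascript
// Hypothetical pre-import check on a parsed n8n workflow object.
function auditWorkflow(workflow) {
  const names = new Set(workflow.nodes.map((node) => node.name));

  // Nodes that carry a "credentials" block, with the credential types they name.
  const needsCredentials = workflow.nodes
    .filter((node) => node.credentials)
    .map((node) => ({ node: node.name, types: Object.keys(node.credentials) }));

  // Connection sources or targets that do not match any node name.
  const dangling = [];
  for (const [source, outputs] of Object.entries(workflow.connections || {})) {
    if (!names.has(source)) dangling.push(source);
    for (const branch of outputs.main || []) {
      for (const link of branch || []) {
        if (!names.has(link.node)) dangling.push(link.node);
      }
    }
  }
  return { needsCredentials, dangling };
}

// Trimmed stand-in for the workflow JSON: two nodes and one connection.
const workflow = {
  nodes: [
    { name: 'Combine Embedding with Data', type: 'n8n-nodes-base.code' },
    {
      name: 'Pinecone Vector Store',
      type: '@n8n/n8n-nodes-langchain.vectorStorePinecone',
      credentials: { pineconeApi: { name: '<your credential>' } },
    },
  ],
  connections: {
    'Combine Embedding with Data': {
      main: [[{ node: 'Pinecone Vector Store', type: 'main', index: 0 }]],
    },
  },
};

const report = auditWorkflow(workflow);
console.log(JSON.stringify(report, null, 2));
```

To run it against the real export, replace the stand-in with the result of `JSON.parse` on the downloaded file's contents.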


About this workflow


Source: https://n8n.io/workflows/13248/ (original creator credit).
