{
  "nodes": [
    {
      "id": "92907f4d-8072-41f7-8518-8fb536760a71",
      "name": "Process Chunks",
      "type": "n8n-nodes-base.code",
      "position": [
        3600,
        1616
      ],
      "parameters": {
        "jsCode": "// Extract and flatten chunks from Apify scraper output\nconst items = [];\n\nfor (const item of $input.all()) {\n  const data = item.json;\n\n  // Items without a chunks array contribute nothing\n  const chunks = data.chunks || [];\n  // Guard against a missing crawl_info block\n  const crawlInfo = data.crawl_info || {};\n\n  for (const chunk of chunks) {\n    const meta = chunk.metadata || {};\n\n    items.push({\n      json: {\n        // Vector database fields\n        id: chunk.id,\n        text: chunk.text,\n\n        // Page metadata\n        pageUrl: meta.source_url,\n        pageTitle: meta.page_title,\n        websiteDomain: meta.source_url ? new URL(meta.source_url).hostname : null,\n\n        // Chunk metadata\n        chunkIndex: meta.chunk_index,\n        tokenCount: meta.token_count,\n        startPosition: meta.start_position,\n        endPosition: meta.end_position,\n\n        // Content classification\n        sectionTitle: meta.section_title || 'No Section',\n        headingLevel: meta.heading_level,\n        contentType: meta.content_type,\n        hasCode: meta.has_code,\n        language: meta.language,\n        keywords: (meta.keywords || []).join(', '),\n\n        // Crawl information\n        crawledAt: crawlInfo.crawled_at,\n        crawlDepth: crawlInfo.crawl_depth\n      }\n    });\n  }\n}\n\nreturn items;"
      },
      "typeVersion": 2
    },
    {
      "id": "ad7eb840-af2e-4e79-a2b9-64b2b1a41a10",
      "name": "Split In Batches",
      "type": "n8n-nodes-base.splitInBatches",
      "position": [
        3856,
        1616
      ],
      "parameters": {
        "options": {},
        "batchSize": 50
      },
      "typeVersion": 3
    },
    {
      "id": "8a221c5b-9cbe-4390-bc71-087bf964db81",
      "name": "OpenAI - Create Embeddings",
      "type": "@n8n/n8n-nodes-langchain.openAi",
      "position": [
        4160,
        1600
      ],
      "parameters": {
        "resource": "embedding"
      },
      "typeVersion": 1.3
    },
    {
      "id": "017fc572-2373-4f58-b0ec-7fe8ff30cde6",
      "name": "Combine Embedding with Data",
      "type": "n8n-nodes-base.code",
      "position": [
        4416,
        1600
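      ],
      "parameters": {
        "jsCode": "// Attach each embedding vector to its item\nconst result = [];\n\nfor (const item of $input.all()) {\n  // Expects the response shape { data: [{ embedding: [...] }] }\n  const embedding = item.json.data?.[0]?.embedding;\n  if (!embedding) {\n    throw new Error('No embedding found in the OpenAI response');\n  }\n\n  result.push({\n    json: { ...item.json, embedding }\n  });\n}\n\nreturn result;"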
      },
      "typeVersion": 2
    },
    {
      "id": "93d44390-1890-4810-81f9-faa35974720e",
      "name": "AI Training Scraper",
      "type": "n8n-nodes-ai-training-scraper.aiTrainingScraper",
      "position": [
        3328,
        1616
      ],
      "parameters": {
        "startUrls": "https://docs.n8n.io/workflows/",
        "additionalOptions": {}
      },
      "typeVersion": 1
    },
    {
      "id": "0096a1f0-a838-4f8f-b5c4-2ae6c9d98327",
      "name": "Pinecone Vector Store",
      "type": "@n8n/n8n-nodes-langchain.vectorStorePinecone",
      "position": [
        4688,
        1600
      ],
      "parameters": {
        "mode": "insert",
        "options": {},
        "pineconeIndex": {
          "__rl": true,
          "mode": "list",
          "value": ""
        }
      },
      "credentials": {
        "pineconeApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1.3
    },
    {
      "id": "34152b67-777b-4658-8472-a611aa3bbec6",
      "name": "When clicking \u2018Execute workflow\u2019",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        3040,
        1616
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "4062bc35-76b0-4e8b-93b4-09c4be4cf76c",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2992,
        1488
      ],
      "parameters": {
        "width": 768,
        "height": 448,
        "content": "## Website Capture\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n- This block uses the AI Training Data Scraper to crawl pages and extract structured content.\n\n### User Action Required:\n- Create an Apify account, get an API token from console.apify.com, and paste it into the Apify credential section in n8n.\n"
      },
      "typeVersion": 1
    },
    {
      "id": "5901c9ba-7565-4d9d-b59c-06043422b6d1",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        4064,
        1488
      ],
      "parameters": {
        "width": 944,
        "height": 528,
        "content": "## Embedding & Vector Storage\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n- Generates a semantic embedding for each chunk and merges it with the chunk's metadata, preserving context for retrieval and citation.\n- Upserts all embeddings into the vector store, grouped by website domain, making the site searchable via semantic similarity.\n\n### User Action Required:\n- Create an OpenAI API key at platform.openai.com/account/api-keys and add it to the OpenAI credential in n8n.\n- Create a Pinecone account and index at pinecone.io, generate an API key, and add it to the Pinecone credential in n8n."
      },
      "typeVersion": 1
    },
    {
      "id": "a199578c-e2f7-4b4e-9f37-8f39efa4e3de",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2224,
        1472
      ],
      "parameters": {
        "color": 4,
        "width": 480,
        "height": 448,
        "content": "## Website RAGFlow\nThis workflow turns any website into a RAG\u2011ready knowledge base for AI chatbots.\nIt crawls the site, cleans and chunks the content, generates embeddings, and stores them in a vector database for fast semantic retrieval.\n\nThe AI Training Data Scraper is provided by a community node. To enable it, go to Settings \u2192 Community Nodes in n8n and install the package n8n-nodes-ai-training-scraper. After installing, open the node and configure your credentials.\n\nSetup is quick. Once configured, the workflow turns your documentation or marketing site into a chat\u2011ready knowledge base.\n\n\n\n\n\n\n\nCreated by Blukaze Automations | blukaze.com"
      },
      "typeVersion": 1
    }
  ],
  "connections": {
    "Process Chunks": {
      "main": [
        [
          {
            "node": "Split In Batches",
            "type": "main",
            "index": 0
          }
        ]
      ]
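    },
    "Split In Batches": {
      "main": [
        [],
        [
          {
            "node": "OpenAI - Create Embeddings",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Pinecone Vector Store": {
      "main": [
        [
          {
            "node": "Split In Batches",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },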
    "AI Training Scraper": {
      "main": [
        [
          {
            "node": "Process Chunks",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "OpenAI - Create Embeddings": {
      "main": [
        [
          {
            "node": "Combine Embedding with Data",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Combine Embedding with Data": {
      "main": [
        [
          {
            "node": "Pinecone Vector Store",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "When clicking \u2018Execute workflow\u2019": {
      "main": [
        [
          {
            "node": "AI Training Scraper",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}