{
  "id": "zqaMsVBh9XGybqUC",
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "name": "LLMs.txt Generator with ScrapeGraph AI",
  "tags": [],
  "nodes": [
    {
      "id": "0b7f74f3-036c-4e62-9f6a-411e322808f6",
      "name": "When clicking \u2018Execute workflow\u2019",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        -224,
        0
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "b46b464e-aa16-4a75-b9c7-e24a7debccfe",
      "name": "Wait",
      "type": "n8n-nodes-base.wait",
      "position": [
        528,
        0
      ],
      "parameters": {
        "amount": 20
      },
      "typeVersion": 1.1
    },
    {
      "id": "ad894092-0925-4abe-9566-bc4755919ba8",
      "name": "Status crawler",
      "type": "n8n-nodes-scrapegraphai.scrapegraphAi",
      "position": [
        752,
        0
      ],
      "parameters": {
        "resource": "smartcrawler"
      },
      "credentials": {
        "scrapegraphAIApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "a9158707-3c47-42ea-8b12-e7380ec025fe",
      "name": "Scraper",
      "type": "n8n-nodes-scrapegraphai.scrapegraphAiTool",
      "position": [
        1776,
        160
      ],
      "parameters": {
        "resource": "smartscraper"
      },
      "credentials": {
        "scrapegraphAIApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "7862687f-8102-4448-8c7d-73aea3b369b3",
      "name": "OpenAI Chat Model",
      "type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
      "position": [
        1536,
        176
      ],
      "parameters": {
        "model": {
          "__rl": true,
          "mode": "list",
          "value": "gpt-5.4-mini",
          "cachedResultName": "gpt-5.4-mini"
        },
        "options": {},
        "builtInTools": {}
      },
      "credentials": {
        "openAiApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1.3
    },
    {
      "id": "a90364b5-671b-4f11-bbbb-fc724987d07d",
      "name": "to Binary",
      "type": "n8n-nodes-base.code",
      "position": [
        1920,
        -16
      ],
      "parameters": {
        "jsCode": "return items.map(item => {\n\n\tconst content = item.json.output || '';\n\n\treturn {\n\t\tjson: {},\n\t\tbinary: {\n\t\t\tdata: {\n\t\t\t\tdata: Buffer.from(content).toString('base64'),\n\t\t\t\tmimeType: 'text/plain',\n\t\t\t\tfileName: 'llms.txt'\n\t\t\t}\n\t\t}\n\t};\n\n});"
      },
      "typeVersion": 2
    },
    {
      "id": "6c0e809b-3aca-4794-88aa-7bc51a88e02b",
      "name": "LLMS.txt Agent",
      "type": "@n8n/n8n-nodes-langchain.agent",
      "position": [
        1568,
        -16
      ],
      "parameters": {
        "text": "={{ JSON.stringify($json.internal_links) }}",
        "options": {
          "systemMessage": "# Role\nYou are an agent specialized in generating `llms.txt` files compliant with the official specification (llmstxt.org). Your task is to analyze a website starting from a list of internal URLs and produce a structured Markdown file that describes the site optimally for LLMs.\n\n# Input\nYou will receive a JSON with this structure:\n{\n  \"internal_links\": [\"https://...\", \"https://...\", ...]\n}\n\n# Available tools\n- **Scraper**: takes a URL as input and returns the page content (title, meta description, headings, main text). You MUST use it for every URL before describing it. Never make up content.\n\n# Operating procedure\n\n## Step 1 \u2014 Homepage analysis\nIdentify the homepage (shortest URL, typically the domain root) and call `Scraper` on it to extract:\n- Site / company name (from title or H1)\n- Mission / brief description (from meta description or first paragraph)\n- Site language (keep it consistent throughout the file)\n\n## Step 2 \u2014 Internal pages analysis\nFor EVERY other URL in the list, call `Scraper` and extract:\n- Page title (H1 or title tag, cleaned of suffixes like \"| Site Name\")\n- Concise description (max 100-150 characters, based on meta description or first paragraph)\n\nIf a page returns an error, empty content, or duplicate, silently exclude it.\n\n## Step 3 \u2014 Categorization\nGroup URLs into logical sections based on URL patterns and content:\n- `/services/*`, `/servizi/*` \u2192 **Services** section\n- `/products/*`, `/shop/*` \u2192 **Products** section\n- `/portfolio/*`, `/case-study/*`, `/work/*` \u2192 **Portfolio** section\n- `/blog/*`, `/news/*`, `/articles/*` \u2192 **Blog** section\n- `/about`, `/about-us`, `/team` \u2192 **Company** section\n- `/contact`, `/contacts` \u2192 **Contact** section\n- Legal pages (`privacy`, `cookie`, `terms`, `gdpr`) \u2192 **Optional** section\n- Homepage and generic pages \u2192 **Main pages** section\n\n## Step 4 \u2014 Output generation\nCompose the file following EXACTLY this structure:\n\n# [Site name]\n\n> [Site summary in 1-2 sentences, from the homepage]\n\n[Optional paragraph with additional context, only if useful]\n\n## Main pages\n\n- [Title](URL): Concise description\n- [Title](URL): Concise description\n\n## Services\n\n- [Title](URL): Concise description\n\n## Portfolio\n\n- [Title](URL): Concise description\n\n## Contact\n\n- [Title](URL): Concise description\n\n## Optional\n\n- [Title](URL): Concise description\n\n# Strict rules\n1. **ALWAYS use the Scraper tool** for every URL before describing it. Never invent titles or descriptions.\n2. **Preserve the original language** of the site (if it's in Italian \u2192 descriptions in Italian).\n3. **Short and informative descriptions**: max 1-2 sentences, avoid generic promotional phrases (\"the best solution\", \"industry-leading\").\n4. **No external links**, only URLs present in the input list.\n5. **\"Optional\" section** always LAST, reserved for legal and secondary pages.\n6. **Skip empty sections**: do not include section headings without links.\n7. **Pure Markdown output**: return ONLY the content of the `llms.txt` file, without opening/closing backticks, without preambles, without final comments. The first character of your response must be `#`.\n8. **Section order** by importance: Main pages \u2192 Services/Products \u2192 Portfolio \u2192 Blog \u2192 Company \u2192 Contact \u2192 Optional."
        },
        "promptType": "define"
      },
      "typeVersion": 3.1
    },
    {
      "id": "e22875c5-a5e2-4fa5-bf71-ca1fa8416069",
      "name": "Internal Links",
      "type": "n8n-nodes-base.set",
      "position": [
        1328,
        -16
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "47a6ad14-cc77-4b1f-84a0-a8ef731cdc86",
              "name": "internal_links",
              "type": "array",
              "value": "={{ $json.result.llm_result.internal_links }}"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "3e3a3143-5426-4f5c-a415-051c4c276691",
      "name": "Upload to FTP",
      "type": "n8n-nodes-base.ftp",
      "position": [
        2160,
        -16
      ],
      "parameters": {
        "path": "=/YOUR_PATH/{{$binary.data.fileName}}",
        "options": {},
        "operation": "upload"
      },
      "credentials": {
        "ftp": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "b11ff464-37f4-4bee-b2b3-79a215f4ea9c",
      "name": "If success",
      "type": "n8n-nodes-base.if",
      "position": [
        1040,
        0
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "ec0239ac-bffb-4187-b7dc-4219536e9f7e",
              "operator": {
                "type": "string",
                "operation": "equals"
              },
              "leftValue": "={{ $json.status }}",
              "rightValue": "success"
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "fa5b6da4-daac-446d-80f3-7f4331c521ee",
      "name": "Crawler",
      "type": "n8n-nodes-scrapegraphai.scrapegraphAi",
      "position": [
        288,
        0
      ],
      "parameters": {
        "resource": "smartcrawler"
      },
      "credentials": {
        "scrapegraphAIApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "8fa6a432-31e4-4fa5-afa6-9ca350c7cffa",
      "name": "Set domain",
      "type": "n8n-nodes-base.set",
      "position": [
        48,
        0
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "bf073095-07f9-493d-be86-8dcd9086aecf",
              "name": "your_domain",
              "type": "string",
              "value": "n3w.it"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "6c658166-4348-4092-afb2-c1b670de17a9",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -64,
        -688
      ],
      "parameters": {
        "width": 656,
        "height": 544,
        "content": "## Auto LLMs.txt Generator for websites with ScrapeGraph AI \nThis workflow automatically generates an `llms.txt` file for any given website. It uses ScrapegraphAI to crawl and scrape pages, an OpenAI chat model to process content, and finally uploads the generated file via FTP.\n\n### How it works\n\nThis workflow starts manually, crawls the target domain with ScrapegraphAI, waits until crawling is complete, then extracts all discovered internal links. An OpenAI-powered AI agent uses ScrapegraphAI\u2019s Scraper tool to visit each URL, analyze real page content, identify the site title, description, language, and organize pages into logical `llms.txt` sections.\n\nThe workflow then generates a clean Markdown `llms.txt` file following the llmstxt org structure, converts it into a binary `.txt` file, and uploads it to the configured FTP/CDN path. The agent must scrape every URL before writing descriptions and is not allowed to invent content.\n\n### Setup steps\n\nConfigure n8n credentials for ScrapegraphAI, OpenAI, and FTP, then update the target domain in the **Set domain** node without including `https://`. Adjust the Wait node if the website is large, and set the correct remote upload directory in the FTP node so the generated file is saved as `llms.txt`.\n\nOptionally customize the AI prompt for different sections, languages, or URL exclusions. Save and activate the workflow, execute it from the Manual Trigger node, then verify the uploaded `llms.txt` file on your FTP server\n"
      },
      "typeVersion": 1
    },
    {
      "id": "7a8c3c7a-6383-470f-973f-396c5f23d36f",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -64,
        -112
      ],
      "parameters": {
        "color": 7,
        "width": 304,
        "height": 288,
        "content": "## STEP 1 - Target domain\nSet your target domain"
      },
      "typeVersion": 1
    },
    {
      "id": "50a00e07-5441-4928-b916-c6c60a548bd9",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        256,
        -112
      ],
      "parameters": {
        "color": 7,
        "width": 992,
        "height": 288,
        "content": "## STEP 2 - Crawling\nStarts a crawl of the specified domain using ScrapegraphAI\u2019s smartcrawler. The crawler extracts all internal links from the domain (acting like a sitemap generator)"
      },
      "typeVersion": 1
    },
    {
      "id": "a449eb8a-f5e3-4b48-83d0-49b577eab037",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1456,
        -112
      ],
      "parameters": {
        "color": 7,
        "width": 400,
        "height": 288,
        "content": "## STEP 3 - LLMS.txt Agent\nGenerate a clean Markdown file (llms.txt) following the official spec."
      },
      "typeVersion": 1
    },
    {
      "id": "9a514eaf-c5d5-473f-bbe0-c692e92224ec",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1872,
        -112
      ],
      "parameters": {
        "color": 7,
        "width": 480,
        "height": 288,
        "content": "## STEP 4 - Upload to website\nConvert to binary file and Upload to an FTP server"
      },
      "typeVersion": 1
    },
    {
      "id": "e61beda5-e672-48a0-9f8a-fb1dda591cf5",
      "name": "Sticky Note8",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        624,
        -880
      ],
      "parameters": {
        "color": 7,
        "width": 736,
        "height": 736,
        "content": "## MY NEW YOUTUBE CHANNEL\n\ud83d\udc49 [Subscribe to my new **YouTube channel**](https://youtube.com/@n3witalia). Here I\u2019ll share videos and Shorts with practical tutorials and **FREE templates for n8n**.\n\n[![image](https://n3wstorage.b-cdn.net/n3witalia/youtube-n8n-cover.jpg)](https://youtube.com/@n3witalia)"
      },
      "typeVersion": 1
    }
  ],
  "active": false,
  "settings": {
    "binaryMode": "separate",
    "executionOrder": "v1"
  },
  "versionId": "2ad22f44-bd11-4bab-8001-d4736f757256",
  "connections": {
    "Wait": {
      "main": [
        [
          {
            "node": "Status crawler",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Crawler": {
      "main": [
        [
          {
            "node": "Wait",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Scraper": {
      "ai_tool": [
        [
          {
            "node": "LLMS.txt Agent",
            "type": "ai_tool",
            "index": 0
          }
        ]
      ]
    },
    "to Binary": {
      "main": [
        [
          {
            "node": "Upload to FTP",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "If success": {
      "main": [
        [
          {
            "node": "Internal Links",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Wait",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Set domain": {
      "main": [
        [
          {
            "node": "Crawler",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Internal Links": {
      "main": [
        [
          {
            "node": "LLMS.txt Agent",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "LLMS.txt Agent": {
      "main": [
        [
          {
            "node": "to Binary",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Status crawler": {
      "main": [
        [
          {
            "node": "If success",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "OpenAI Chat Model": {
      "ai_languageModel": [
        [
          {
            "node": "LLMS.txt Agent",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    },
    "When clicking \u2018Execute workflow\u2019": {
      "main": [
        [
          {
            "node": "Set domain",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}