Extract Website URLs from Sitemap.xml for SEO Analysis

By Le Thua Phu (@lethuaphu) on n8n.io

This n8n workflow automates the process of crawling a website's sitemap to extract URLs, which is particularly useful for SEO analysis, website auditing, or content monitoring. By leveraging n8n's nodes, the workflow fetches the sitemap from a specified URL, processes the XML…

Category: Web Scraping · Trigger: Event · Nodes: 12 · Complexity: ★★★★☆ · Integrations: XML, HTTP Request

This workflow corresponds to n8n.io template #4671, which is the canonical source.

This workflow follows the HTTP Request → XML recipe pattern.
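
The HTTP Request → XML pattern can be sketched outside n8n to make the data flow easier to follow. The Python below is a rough illustration, not the workflow's actual implementation: it assumes, as the workflow's Split Out nodes do, that `sitemap.xml` is a sitemap index whose entries point to sub-sitemaps containing `<url><loc>` elements. The function names `fetch`, `locs`, and `extract_urls` are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List
from urllib.request import urlopen

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a URL as text; 10 s mirrors the timeout set on the HTTP Request nodes."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def locs(xml_text: str, parent: str) -> List[str]:
    """Return the <loc> text of every <parent> element ("sitemap" or "url") under the root."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.findall(f"sm:{parent}/sm:loc", NS) if el.text]


def extract_urls(domain: str, get: Callable[[str], str] = fetch) -> List[str]:
    """Sitemap index -> sub-sitemaps -> page URLs, in the same order as the workflow."""
    index_xml = get(domain + "sitemap.xml")      # "Crawl sitemap"
    urls: List[str] = []
    for sitemap_url in locs(index_xml, "sitemap"):   # "XML" + "Split Out"
        sub_xml = get(sitemap_url)                   # "Crawl sitemap 2"
        urls.extend(locs(sub_xml, "url"))            # "XML 2" + "Split Out 2"
    return urls
```

Passing the downloader in as `get` keeps the parsing logic testable without network access; calling `extract_urls("https://example.com/")` with the default `fetch` would perform real HTTP requests.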

The workflow JSON

Copy the full n8n JSON below, paste it into a new n8n workflow, add any credentials you need, and activate it.

{
  "id": "n2iZmshLmcXubEpo",
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "name": "Extract Website URLs from Sitemap.XML for SEO Analysis",
  "tags": [
    {
      "id": "MePRktFsL1ttwWdT",
      "name": "website",
      "createdAt": "2025-05-12T18:47:34.764Z",
      "updatedAt": "2025-05-12T18:47:34.764Z"
    },
    {
      "id": "xutdortHHmV1yNZB",
      "name": "SEO",
      "createdAt": "2025-03-24T16:18:45.828Z",
      "updatedAt": "2025-03-24T16:18:45.828Z"
    }
  ],
  "nodes": [
    {
      "id": "6d91a84e-bf2b-4118-9e35-5baecda1b14b",
      "name": "XML",
      "type": "n8n-nodes-base.xml",
      "position": [
        340,
        -40
      ],
      "parameters": {
        "options": {}
      },
      "typeVersion": 1
    },
    {
      "id": "65f12b51-d34c-4e87-b581-e29370eb0554",
      "name": "When clicking \u2018Test workflow\u2019",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        -320,
        -40
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "b82a0bce-0dd4-4a64-b60a-64ea4021bee5",
      "name": "Split Out",
      "type": "n8n-nodes-base.splitOut",
      "position": [
        560,
        -40
      ],
      "parameters": {
        "options": {},
        "fieldToSplitOut": "sitemapindex.sitemap"
      },
      "typeVersion": 1
    },
    {
      "id": "0e97abde-ba13-4889-979d-0f0e5b085dcb",
      "name": "Set URL",
      "type": "n8n-nodes-base.set",
      "notes": "Set full URL - not domain",
      "position": [
        -100,
        -40
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "fa078c97-4c7c-4c08-a011-5527661997c6",
              "name": "Domain",
              "type": "string",
              "value": "https://phu.io.vn/"
            }
          ]
        }
      },
      "notesInFlow": true,
      "typeVersion": 3.4
    },
    {
      "id": "146a5e34-d64a-450b-8354-770c90547325",
      "name": "Convert to File",
      "type": "n8n-nodes-base.convertToFile",
      "position": [
        1440,
        -40
      ],
      "parameters": {
        "options": {},
        "binaryPropertyName": "={{ $json.loc }}"
      },
      "typeVersion": 1.1
    },
    {
      "id": "0c1f8d18-ca4b-4996-9928-abbc6d45b227",
      "name": "Crawl sitemap",
      "type": "n8n-nodes-base.httpRequest",
      "notes": "or paste the sitemap URL here",
      "position": [
        120,
        -40
      ],
      "parameters": {
        "url": "={{ $json.Domain }}sitemap.xml",
        "options": {
          "timeout": 10000
        },
        "responseFormat": "string"
      },
      "notesInFlow": true,
      "typeVersion": 1
    },
    {
      "id": "eaa43363-d059-4c66-8851-7e85d4fb5bd3",
      "name": "Crawl sitemap 2",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        780,
        -40
      ],
      "parameters": {
        "url": "={{ $json.loc }}",
        "options": {
          "timeout": 10000
        },
        "responseFormat": "string"
      },
      "typeVersion": 1
    },
    {
      "id": "692efb13-a6ce-4667-842b-614cf9ee8315",
      "name": "XML 2",
      "type": "n8n-nodes-base.xml",
      "position": [
        1000,
        -40
      ],
      "parameters": {
        "options": {}
      },
      "typeVersion": 1
    },
    {
      "id": "88a15568-352f-4997-ae2b-522a2713843d",
      "name": "Split Out 2",
      "type": "n8n-nodes-base.splitOut",
      "position": [
        1220,
        -40
      ],
      "parameters": {
        "options": {},
        "fieldToSplitOut": "urlset.url"
      },
      "typeVersion": 1
    },
    {
      "id": "e0e0fb12-1dc5-4665-9530-6b53ed7dc593",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -160,
        -240
      ],
      "parameters": {
        "width": 440,
        "height": 360,
        "content": "## Set website URL at node 1 (or paste sitemap URL at node 2)"
      },
      "typeVersion": 1
    },
    {
      "id": "fc9d88ce-4ec0-4581-b1ff-7c007bdf5f0b",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1340,
        -200
      ],
      "parameters": {
        "color": 4,
        "width": 300,
        "height": 320,
        "content": "## Download the file here\n(or replace this node with a Google Sheets node)\n"
      },
      "typeVersion": 1
    },
    {
      "id": "39f13360-e94d-4b33-b258-5e8837daab4f",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -980,
        -460
      ],
      "parameters": {
        "color": 6,
        "width": 600,
        "height": 940,
        "content": "# FAQ\n## Q: What happens if the sitemap is large or contains many sub-sitemaps?\n\nA: The workflow handles sitemap indexes by splitting and processing each sub-sitemap individually. For very large sitemaps, ensure your n8n instance has sufficient resources (memory and CPU) to avoid performance issues. See Scaling n8n for optimization tips.\n\n## Q: Can I use this workflow with a specific sitemap URL instead of a domain?\n\nA: Yes, in the Crawl sitemap node, replace the url parameter ({{ $json.Domain }}sitemap.xml) with the direct sitemap URL (e.g., https://example.com/sitemap.xml). Update the node\u2019s notes for clarity.\n\n## Q: Why am I getting a timeout error?\n\nA: The HTTP Request nodes in this workflow are configured with a timeout of 10 seconds (10000 ms). If the target server is slow, increase the timeout value in the options parameter of the Crawl sitemap or Crawl sitemap 2 nodes.\n\n## Q: How can I save the URLs to Google Sheets instead of a file?\n\nA: Replace the Convert to File node with a Google Sheets node. Configure it with your Google Sheets credentials and map the loc field from the Split Out 2 node to the desired spreadsheet column. Refer to the Google Sheets node documentation.\n\n## Q: Is this workflow compatible with older n8n versions?\n\nA: The workflow uses nodes compatible with n8n version 1.0 and later. For older versions, check for deprecated features (e.g., MySQL support) in the n8n v1.0 migration guide."
      },
      "typeVersion": 1
    }
  ],
  "active": false,
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "d2ef17d5-a482-4b0d-b48a-83d5bd146b9f",
  "connections": {
    "XML": {
      "main": [
        [
          {
            "node": "Split Out",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "XML 2": {
      "main": [
        [
          {
            "node": "Split Out 2",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Set URL": {
      "main": [
        [
          {
            "node": "Crawl sitemap",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Split Out": {
      "main": [
        [
          {
            "node": "Crawl sitemap 2",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Split Out 2": {
      "main": [
        [
          {
            "node": "Convert to File",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Crawl sitemap": {
      "main": [
        [
          {
            "node": "XML",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Convert to File": {
      "main": [
        []
      ]
    },
    "Crawl sitemap 2": {
      "main": [
        [
          {
            "node": "XML 2",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "When clicking \u2018Test workflow\u2019": {
      "main": [
        [
          {
            "node": "Set URL",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
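
To check how the nodes are wired before importing, the `connections` object can be walked from the trigger node. The following is a minimal sketch: `CONNECTIONS` is a trimmed copy of the `connections` object from the JSON above, while `link` and `execution_order` are hypothetical helpers written only for this illustration. It follows just the first `main` output of each node, which is sufficient for this linear workflow but would not cover branching ones.

```python
from typing import Dict, List


def link(target: str) -> dict:
    """Hypothetical helper: build one connection entry in the shape used by the export."""
    return {"main": [[{"node": target, "type": "main", "index": 0}]]}


# Trimmed copy of the "connections" object from the workflow JSON.
CONNECTIONS: Dict[str, dict] = {
    "XML": link("Split Out"),
    "XML 2": link("Split Out 2"),
    "Set URL": link("Crawl sitemap"),
    "Split Out": link("Crawl sitemap 2"),
    "Split Out 2": link("Convert to File"),
    "Crawl sitemap": link("XML"),
    "Convert to File": {"main": [[]]},
    "Crawl sitemap 2": link("XML 2"),
    "When clicking ‘Test workflow’": link("Set URL"),
}


def execution_order(connections: Dict[str, dict], start: str) -> List[str]:
    """Follow the first 'main' output of each node until a node has no successor."""
    order = [start]
    current = start
    while True:
        outputs = connections.get(current, {}).get("main", [])
        targets = outputs[0] if outputs else []
        # Stop at the end of the chain, or if a cycle would revisit a node.
        if not targets or targets[0]["node"] in order:
            return order
        current = targets[0]["node"]
        order.append(current)


ORDER = execution_order(CONNECTIONS, "When clicking ‘Test workflow’")
print(" -> ".join(ORDER))
```

The three sticky notes do not appear in the result because they have no connections; they only annotate the canvas.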

Source: https://n8n.io/workflows/4671/ (original creator credit).

Related workflows

Workflows that share integrations, category, or trigger type with this one. All free to copy and import.

Web Scraping
Site owners, SEOs, and marketers who want a single automation to notify Google (Indexing API) and Bing (via IndexNow) whenever site URLs are added or updated. No more need to update it manually. Hours
Integrations: HTTP Request, XML

Web Scraping
This workflow automates the collection of public procurement data from TenderNed (the official Dutch tender platform). It: Fetches the latest tender publications from the TenderNed API Retrieves detai
Integrations: HTTP Request, XML, Data Table

Web Scraping
This workflow fetches reports from Qualys, filters out already processed reports, and creates cases in TheHive for the new reports. It runs every hour to ensure continuous monitoring and up-to-date vu
Integrations: HTTP Request, n8n, XML +1

Web Scraping
This workflow helps SEO professionals and website owners automate the tedious process of monitoring and indexing URLs. It fetches your XML sitemap, filters for recent content, checks the current index
Integrations: HTTP Request, XML

Web Scraping
Google page indexing too slow? Tired of manually clicking through each page in the Google Search Console? 😴 Say goodbye to that tedious process and hello to automation with this n8n workflow! 🎉
Integrations: HTTP Request, XML