AutomationFlowsData & Sheets › Read Sitemap and Filter URLs

Read Sitemap and Filter URLs

Read sitemap and filter URLs. Uses manualTrigger, splitOut, httpRequest, xml. Event-driven trigger; 10 nodes.

Event trigger★★★★☆ complexity10 nodesHTTP RequestXML
Data & Sheets Trigger: Event Nodes: 10 Complexity: ★★★★☆ Added:

This workflow follows the HTTP Request → XML recipe pattern — see all workflows that pair these two integrations.

The workflow JSON

Copy or download the full n8n JSON below. Paste it into a new n8n workflow, add your credentials, activate. Full import guide →

Download .json
{
  "id": "7fdJOvYNILCr24fH",
  "name": "Read sitemap and filter URLs",
  "tags": [],
  "nodes": [
    {
      "id": "38910330-5286-4f3f-b62e-9216acccd503",
      "name": "\u2018Test workflow\u2019 trigger",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        -460,
        -60
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "d4e5991b-62d9-45ca-962f-c1077f3bce19",
      "name": "Set sitemap URL",
      "type": "n8n-nodes-base.set",
      "position": [
        -280,
        -60
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "d6c5ac86-6d67-42fb-96ec-9826caf452e2",
              "name": "sitemapUrl",
              "type": "string",
              "value": "https://duckduckgo.com/sitemap.xml"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "0d957deb-5830-4077-97e4-437dc7c0e527",
      "name": "Split Out",
      "type": "n8n-nodes-base.splitOut",
      "position": [
        260,
        -60
      ],
      "parameters": {
        "options": {},
        "fieldToSplitOut": "urlset.url"
      },
      "typeVersion": 1
    },
    {
      "id": "7021088c-dfa1-4aae-b2e7-15b0ca10a750",
      "name": "Get Sitemap",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        -100,
        -60
      ],
      "parameters": {
        "url": "={{ $json.sitemapUrl }}",
        "options": {}
      },
      "typeVersion": 4.2
    },
    {
      "id": "d3b86577-01fc-40f8-ab65-93ba420187b8",
      "name": "Convert Sitemap to JSON",
      "type": "n8n-nodes-base.xml",
      "position": [
        80,
        -60
      ],
      "parameters": {
        "options": {
          "trim": true,
          "normalize": true,
          "mergeAttrs": true,
          "ignoreAttrs": true,
          "normalizeTags": true
        }
      },
      "typeVersion": 1
    },
    {
      "id": "bc0758ae-06eb-4a29-a91e-414407ec8ade",
      "name": "Filter URLs",
      "type": "n8n-nodes-base.filter",
      "position": [
        440,
        -60
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "0bf8e98c-b6c5-4129-852c-0d3e63f32f9f",
              "operator": {
                "type": "string",
                "operation": "endsWith"
              },
              "leftValue": "={{ $json.loc }}",
              "rightValue": ".pdf"
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "1d3fed97-1e72-426c-a48d-1a9683f40c4c",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -300,
        -140
      ],
      "parameters": {
        "color": 6,
        "width": 150,
        "height": 240,
        "content": "**Set your sitemap.xml\nurl here.**"
      },
      "typeVersion": 1
    },
    {
      "id": "521ec74d-6707-47fd-992d-eecebed415ab",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        420,
        -140
      ],
      "parameters": {
        "color": 6,
        "width": 150,
        "height": 240,
        "content": "**Create your filter here.**"
      },
      "typeVersion": 1
    },
    {
      "id": "07e6c3de-cc72-490d-b614-67034ce04bfb",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -140,
        -180
      ],
      "parameters": {
        "color": 7,
        "width": 540,
        "height": 300,
        "content": "## Fetch and process the sitemap.xml file\nThis part fetches and process the sitemap.xml file from XML data to JSON that we can work with."
      },
      "typeVersion": 1
    },
    {
      "id": "abf5f02d-d2a0-43f1-9a1f-386cc4f9861b",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -780,
        -220
      ],
      "parameters": {
        "width": 280,
        "height": 420,
        "content": "## Sitemap.xml reader\nThis workflow reads an sitemap.xml and filters out the entries you want.\n\nBy default only PDF documents are returned at the end of the workflow.\n\n**SETUP**\n- Edit the **Set sitemap URL** block and add the url to the sitemap you want to read.\n\n- Edit the **Filter URLs** to your needs."
      },
      "typeVersion": 1
    }
  ],
  "active": false,
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "74793599-4c7d-4532-bbd5-a2ce4761fbc8",
  "connections": {
    "Split Out": {
      "main": [
        [
          {
            "node": "Filter URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Get Sitemap": {
      "main": [
        [
          {
            "node": "Convert Sitemap to JSON",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Set sitemap URL": {
      "main": [
        [
          {
            "node": "Get Sitemap",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Convert Sitemap to JSON": {
      "main": [
        [
          {
            "node": "Split Out",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "\u2018Test workflow\u2019 trigger": {
      "main": [
        [
          {
            "node": "Set sitemap URL",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
Pro

For the full experience including quality scoring and batch install features for each workflow upgrade to Pro

How this works

This workflow streamlines the process of extracting and refining URLs from a website's sitemap, saving you hours of manual scraping or coding from scratch. It's ideal for SEO specialists, content auditors, or developers who need to analyse site structures without complex tools. The key step involves fetching the sitemap via HTTP Request, converting it to JSON using the XML node, and then applying filters to isolate relevant URLs based on patterns like file types or paths.

Use this workflow when auditing large sites for broken links, preparing crawl lists, or monitoring content changes, especially for event-driven tasks like post-deployment checks. Avoid it for dynamic sites without static sitemaps or when real-time processing exceeds the 10-node limit. Common variations include adding email notifications for filtered results or integrating with Google Sheets to log URLs for team review.

About this workflow

Read sitemap and filter URLs. Uses manualTrigger, splitOut, httpRequest, xml. Event-driven trigger; 10 nodes.

Source: https://github.com/Zie619/n8n-workflows — original creator credit. Request a take-down →

More Data & Sheets workflows → · Browse all categories →

Related workflows

Workflows that share integrations, category, or trigger type with this one. All free to copy and import.

Data & Sheets

Google Site Index - sitemap.xml example. Uses manualTrigger, splitOut, httpRequest, splitInBatches. Event-driven trigger; 21 nodes.

HTTP Request, XML
Data & Sheets

Splitout Filter. Uses manualTrigger, outputParserStructured, lmChatOpenRouter, informationExtractor. Event-driven trigger; 51 nodes.

Output Parser Structured, OpenRouter Chat, Information Extractor +7
Data & Sheets

Workflow Importer. Uses extractFromFile, executeCommand, readWriteFile, httpRequest. Event-driven trigger; 58 nodes.

Execute Command, Read Write File, HTTP Request +3
Data & Sheets

Retrieves workflows directly from an n8n instance using the n8n API Dynamically generates a form to select which workflows to import Supports both fixed instance configuration and dynamic source/targe

Form Trigger, Form, Notion +2
Data & Sheets

[2/3] Set up medoids (2 types) for anomaly detection (crops dataset). Uses manualTrigger, httpRequest, splitOut, stickyNote. Event-driven trigger; 48 nodes.

HTTP Request