This workflow corresponds to n8n.io template #7758 — we link there as the canonical source.

This workflow follows the Form Trigger → Google Sheets recipe pattern — see all workflows that pair these two integrations.

The workflow JSON

Copy the full n8n JSON below, paste it into a new n8n workflow, add your credentials, and activate it.

{
  "id": "ytgCDUiHYhFkJqlY",
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "name": "Sitemap Page Extractor",
  "tags": [],
  "nodes": [
    {
      "id": "2464b9f1-f0fe-41df-9941-acad5d5dedb9",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -256,
        928
      ],
      "parameters": {
        "color": 2,
        "height": 336,
        "content": "## Build sitemap urls:\nCreates possible sitemap URLs like /sitemap.xml, /sitemap_index.xml, etc., for the domain."
      },
      "typeVersion": 1
    },
    {
      "id": "1b1b00a9-6825-4bae-b74a-6c86d8972299",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        304,
        560
      ],
      "parameters": {
        "color": 2,
        "width": 400,
        "height": 432,
        "content": "## Filter & Extract Sitemap Files\n\nFilter Non-Empty Sitemap Responses:\nFilters out sitemap URLs that are unreachable or returned empty responses to ensure only valid sitemaps are processed.\n\nExtract Sitemap URLs:\nParses sitemap index XML to extract all nested sitemap links (e.g., index-sitemap.xml, page-sitemap.xml) for further crawling."
      },
      "typeVersion": 1
    },
    {
      "id": "e0c1c670-5e7f-4146-a605-e5972d786cf8",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        720,
        560
      ],
      "parameters": {
        "color": 2,
        "width": 432,
        "height": 432,
        "content": "## Fetching Sitemap XML and Extracting Page URLs\n\n\nFetch Sitemap Pages XML:\nSends an HTTP request to each sitemap URL and retrieves the raw XML content for page extraction.\n\nExtract Page URLs from Sitemap:\nParses the XML data to extract individual page URLs listed inside the sitemap for further analysis or processing.\n\n"
      },
      "typeVersion": 1
    },
    {
      "id": "c95c2d0b-2a6b-471a-b168-1f33d4dd6cb4",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1488,
        688
      ],
      "parameters": {
        "color": 2,
        "width": 208,
        "height": 304,
        "content": "Append each crawled page URL into the List_Of_All_URLs sheet, avoiding duplicates by matching existing entries automatically."
      },
      "typeVersion": 1
    },
    {
      "id": "1b2c2bde-4edc-4dee-97c1-28f1b1ee8b94",
      "name": "Sticky Note5",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        288,
        1152
      ],
      "parameters": {
        "color": 2,
        "width": 256,
        "height": 304,
        "content": "Sends an HTTP request to each generated sitemap URL and fetches the raw response to check if the sitemap exists and contains valid data."
      },
      "typeVersion": 1
    },
    {
      "id": "c9f75b84-334a-4926-bdc4-62e59e605b97",
      "name": "Sticky Note6",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1184,
        640
      ],
      "parameters": {
        "color": 2,
        "width": 272,
        "height": 352,
        "content": "## Exclude Sitemap URLs:\n1. This filter checks if the page_url contains the word \"sitemap\".\n2. If it does, the URL is excluded; otherwise, it passes forward."
      },
      "typeVersion": 1
    },
    {
      "id": "5ae6ec48-76f1-44ca-8387-ce5f9a84ac9d",
      "name": "Sticky Note7",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1248,
        784
      ],
      "parameters": {
        "width": 464,
        "height": 640,
        "content": "## \ud83d\udccc Automation Summary:\n\nThis automation is designed to simplify and automate the collection of website page URLs from sitemaps.\n\nIt begins when a user submits the Website URL through a form. The submitted URL is then prepared and standardized to ensure the domain is ready for sitemap discovery. From there, the automation builds possible sitemap URLs (like /sitemap.xml or /sitemap_index.xml) and checks which of them are valid and accessible.\n\nOnce valid sitemaps are found, the automation fetches their XML content and extracts all page URLs listed inside. To keep the data clean, it applies a filter to remove unwanted entries \u2014 specifically, any URL that contains the word \u201csitemap\u201d is excluded. This ensures that only actual content pages (such as product, service, or blog pages) are retained.\n\nIn the final step, these refined URLs are automatically saved into Google Sheets, where they can be used for SEO audits, content analysis, redirects, or other reporting needs.\n\n\u26a1 In essence:\nUser submits website URL \u2192 Sitemap(s) discovered \u2192 Page URLs extracted \u2192 Unwanted sitemap links removed \u2192 Final clean URLs stored in Google Sheets.\n\n\ud83d\udce9 For any questions or support, please contact:\n\u2709\ufe0f info@incrementors.com\nor fill out this form: https://www.incrementors.com/contact-us/"
      },
      "typeVersion": 1
    },
    {
      "id": "342fb312-ba7a-427e-907e-b57a206b6252",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -672,
        928
      ],
      "parameters": {
        "color": 2,
        "width": 400,
        "height": 336,
        "content": "## Form Input & URL Preparation\n\ud83d\udce5 Workflow starts when the user submits a Website URL through the form.\n\ud83c\udf10 The submitted URL is then stored and formatted for further processing."
      },
      "typeVersion": 1
    },
    {
      "id": "e69d955f-2212-4e5b-afa8-998cd18e11f1",
      "name": "Input Website URL",
      "type": "n8n-nodes-base.formTrigger",
      "position": [
        -640,
        1088
      ],
      "parameters": {
        "options": {},
        "formTitle": "Sitemap Page Extractor",
        "formFields": {
          "values": [
            {
              "fieldLabel": "Website URL"
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "4d33e736-fb1d-462f-bad2-46a68bc1b837",
      "name": "Prepare website URL",
      "type": "n8n-nodes-base.set",
      "position": [
        -432,
        1088
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "2a310b45-ec77-41dd-9436-f3b58b7df477",
              "name": "url",
              "type": "string",
              "value": "={{ $json[\"Website URL\"] }}"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "604a9837-436b-4701-9d17-cb59cb2a2099",
      "name": "Build sitemap URLs",
      "type": "n8n-nodes-base.code",
      "position": [
        -192,
        1088
      ],
      "parameters": {
        "jsCode": "const inputData = $input.first().json;\nlet baseUrl = inputData.url || inputData.website_url || '';\n\nif (!baseUrl) {\n  throw new Error(\"No URL provided\");\n}\n\nbaseUrl = baseUrl.replace(/\\/$/, '');\n\nlet domain = '';\nif (baseUrl.includes('://')) {\n  const parts = baseUrl.split('/');\n  domain = parts[0] + '//' + parts[2];\n} else {\n  domain = 'https://' + baseUrl;\n}\n\nconst urls = [\n  `${domain}/robots.txt`,\n  `${domain}/sitemap.xml`,\n  `${domain}/sitemap_index.xml`,\n  `${domain}/sitemap-index.xml`,\n  `${domain}/sitemap1.xml`,\n  `${domain}/sitemap/sitemap.xml`,\n  `${domain}/sitemaps/sitemap.xml`\n];\n\nreturn urls.map(url => ({ sitemap_url: url }));"
      },
      "typeVersion": 2
    },
    {
      "id": "ad781031-ef3e-4378-92d0-837d3e8fbaf2",
      "name": "Sitemap URL Check",
      "type": "n8n-nodes-base.splitInBatches",
      "position": [
        80,
        1088
      ],
      "parameters": {
        "options": {}
      },
      "typeVersion": 3
    },
    {
      "id": "8ca63a55-44bc-4e72-906b-a6e2ed5f9c31",
      "name": "Fetch Sitemap Data",
      "type": "n8n-nodes-base.httpRequest",
      "onError": "continueRegularOutput",
      "position": [
        368,
        1280
      ],
      "parameters": {
        "url": "={{ $json.sitemap_url }}",
        "options": {
          "response": {
            "response": {
              "responseFormat": "text"
            }
          }
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "02698f76-9606-426d-b953-3f2adfd80a6f",
      "name": "Filter Non-Empty Sitemap Responses",
      "type": "n8n-nodes-base.if",
      "position": [
        336,
        832
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "bf0c94e8-a8c6-4419-9b18-cf5d1a01e577",
              "operator": {
                "type": "string",
                "operation": "notEmpty",
                "singleValue": true
              },
              "leftValue": "={{ $json.data }}",
              "rightValue": ""
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "2554e396-8d0b-44c5-ab1e-91a192751acd",
      "name": "Extract Sitemap URLs",
      "type": "n8n-nodes-base.code",
      "position": [
        560,
        816
      ],
      "parameters": {
        "jsCode": "const allUrls = [];\n\n$input.all().forEach(item => {\n  const content = item.json.data || '';\n\n  const robotMatches = [...content.matchAll(/Sitemap:\\s*(\\S+)/gi)];\n  robotMatches.forEach(match => {\n    allUrls.push(match[1]);\n  });\n\n  const locMatches = [...content.matchAll(/<loc>\\s*(.*?)\\s*<\\/loc>/gi)];\n  locMatches.forEach(match => {\n    allUrls.push(match[1]);\n  });\n});\n\nconst uniqueUrls = [...new Set(allUrls)];\n\nreturn uniqueUrls.map(url => ({ json: { sitemap_url: url }}));"
      },
      "typeVersion": 2
    },
    {
      "id": "95f24edc-bef3-481d-954c-f4245af73faa",
      "name": "Fetch Sitemap Pages XML",
      "type": "n8n-nodes-base.httpRequest",
      "onError": "continueRegularOutput",
      "position": [
        784,
        816
      ],
      "parameters": {
        "url": "={{ $json.sitemap_url }}",
        "method": "=GET",
        "options": {
          "response": {
            "response": {
              "responseFormat": "text"
            }
          }
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "134f2ded-0200-44a7-8df1-343d76cf16d0",
      "name": "Extract Page URLs from Sitemap",
      "type": "n8n-nodes-base.code",
      "position": [
        1008,
        816
      ],
      "parameters": {
        "jsCode": "const urls = new Set();\n\n$input.all().forEach(item => {\n  const content = item.json.data || '';\n\n  const xmlMatches = [...content.matchAll(/<loc>(.*?)<\\/loc>/gi)];\n  xmlMatches.forEach(match => {\n    const url = match[1].trim();\n    urls.add(url);\n  });\n\n  const htmlMatches = [...content.matchAll(/<a\\s[^>]*href=[\"']([^\"']+)[\"']/gi)];\n  htmlMatches.forEach(match => {\n    const url = match[1].trim();\n    if (url.startsWith('/') || url.startsWith('http')) {\n      urls.add(url);\n    }\n  });\n});\n\nif (urls.size === 0) {\n  return [{ json: { message: \"No URLs found in XML or HTML content\" }}];\n}\n\nreturn Array.from(urls).map(url => ({\n  json: { page_url: url }\n}));"
      },
      "typeVersion": 2
    },
    {
      "id": "9708b71a-34c3-42fd-9a3c-15ad7d4cd7c3",
      "name": "Exclude the Sitemap URLs",
      "type": "n8n-nodes-base.filter",
      "position": [
        1296,
        816
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "303d0665-ac2d-452b-8678-a05e53a7372b",
              "operator": {
                "type": "string",
                "operation": "notContains"
              },
              "leftValue": "={{ $json.page_url }}",
              "rightValue": "sitemap"
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "535579fc-dcad-4386-920c-6f8857a0bd70",
      "name": "Save Page URLs to Sheet",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        1568,
        816
      ],
      "parameters": {
        "columns": {
          "value": {},
          "schema": [],
          "mappingMode": "defineBelow",
          "matchingColumns": [
            "List URLs"
          ],
          "attemptToConvertTypes": false,
          "convertFieldsToString": false
        },
        "options": {},
        "operation": "appendOrUpdate",
        "sheetName": "List_Of_All_URLs",
        "documentId": "YOUR_GOOGLE_SHEET_URL"
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.6
    }
  ],
  "active": false,
  "settings": {
    "callerPolicy": "any",
    "executionOrder": "v1"
  },
  "versionId": "ff2c3009-f913-4cd2-9042-a36d056b6f90",
  "connections": {
    "Input Website URL": {
      "main": [
        [
          {
            "node": "Prepare website URL",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Sitemap URL Check": {
      "main": [
        [
          {
            "node": "Filter Non-Empty Sitemap Responses",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Fetch Sitemap Data",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Build sitemap URLs": {
      "main": [
        [
          {
            "node": "Sitemap URL Check",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Fetch Sitemap Data": {
      "main": [
        [
          {
            "node": "Sitemap URL Check",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Prepare website URL": {
      "main": [
        [
          {
            "node": "Build sitemap URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract Sitemap URLs": {
      "main": [
        [
          {
            "node": "Fetch Sitemap Pages XML",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Fetch Sitemap Pages XML": {
      "main": [
        [
          {
            "node": "Extract Page URLs from Sitemap",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Exclude the Sitemap URLs": {
      "main": [
        [
          {
            "node": "Save Page URLs to Sheet",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract Page URLs from Sitemap": {
      "main": [
        [
          {
            "node": "Exclude the Sitemap URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Filter Non-Empty Sitemap Responses": {
      "main": [
        [
          {
            "node": "Extract Sitemap URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
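
A note on the Build sitemap URLs node in the JSON above: when the submitted URL has no scheme, the node prefixes https:// to the whole string, so an input such as example.com/blog produces candidates under https://example.com/blog rather than at the site root. The following is a minimal sketch of a stricter variant, not the template author's code. It assumes the global URL class is available in your Code node runtime, and buildSitemapCandidates is a hypothetical helper name.

```javascript
// Hypothetical helper (not part of the template): reduce user input to the
// site origin, then list the same candidate paths the template's node tries.
function buildSitemapCandidates(input) {
  const raw = String(input || '').trim();
  if (!raw) {
    throw new Error('No URL provided');
  }
  // Add a scheme when the user typed a bare host such as "example.com/blog".
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(raw) ? raw : `https://${raw}`;
  // URL#origin keeps scheme and host only; path, query, and fragment are dropped.
  const origin = new URL(withScheme).origin;
  const paths = [
    '/robots.txt',
    '/sitemap.xml',
    '/sitemap_index.xml',
    '/sitemap-index.xml',
    '/sitemap1.xml',
    '/sitemap/sitemap.xml',
    '/sitemaps/sitemap.xml',
  ];
  // One item per candidate, in the { json: ... } shape n8n items use.
  return paths.map((path) => ({ json: { sitemap_url: `${origin}${path}` } }));
}

// Standalone usage; inside a Code node you would instead return the array,
// e.g. return buildSitemapCandidates($input.first().json.url);
console.log(buildSitemapCandidates('example.com/blog/').map((item) => item.json.sitemap_url));
```

Reducing the input to its origin means a pasted deep link still probes the root-level sitemap locations, which is where the candidate paths are normally served.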

Credentials you'll need

Each integration node will prompt for credentials when you import. We strip credential IDs before publishing — you'll add your own.

googleSheetsOAuth2Api

About this workflow

Automatically extracts all page URLs from website sitemaps, filters out unwanted sitemap links, and saves clean URLs to Google Sheets for SEO analysis and reporting.
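
The extract-and-filter behaviour described above can be sketched outside n8n as a single function. This is an illustration under stated assumptions, not the workflow's exact code: extractPageUrls and the sample XML are hypothetical, and only the `<loc>` pattern and the case-sensitive "sitemap" substring rule are taken from the workflow JSON.

```javascript
// Hypothetical standalone version of the extract and exclude steps.
function extractPageUrls(xml) {
  const urls = new Set(); // a Set removes duplicates and keeps first-seen order
  // Capture the text of every <loc> element, ignoring surrounding whitespace.
  for (const match of xml.matchAll(/<loc>\s*(.*?)\s*<\/loc>/gis)) {
    urls.add(match[1]);
  }
  // Same rule as the "Exclude the Sitemap URLs" filter: drop any URL that
  // contains the substring "sitemap" (case-sensitive).
  return [...urls].filter((url) => !url.includes('sitemap'));
}

// Made-up sample data for illustration only.
const sample = `
<urlset>
  <url><loc>https://example.com/</loc></url>
  <url><loc> https://example.com/blog/post-1 </loc></url>
  <url><loc>https://example.com/page-sitemap.xml</loc></url>
  <url><loc>https://example.com/</loc></url>
</urlset>`;

console.log(extractPageUrls(sample));
// prints [ 'https://example.com/', 'https://example.com/blog/post-1' ]
```

Because the rule is a plain substring match, it also drops genuine content pages whose path happens to contain "sitemap", for example an HTML sitemap page. Tighten the condition if such pages matter for your audit.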

Source: https://n8n.io/workflows/7758/ (original creator credit)

Related workflows

Workflows that share integrations, category, or trigger type with this one. All free to copy and import.

Data & Sheets

Fetch Latest RSS Articles and Store Non-duplicates in Google Sheets

This n8n workflow fetches URLs from an RSS feed, checks which URLs have a valid RSS feed and if true, fetches the latest articles from those URLs. It then stores the article details, including the art

RSS Feed Read, Stop And Error, Google Sheets +1

Data & Sheets

Generate and Validate Bulk QR Codes with Google Sheets and Google Drive

This workflow allows you to generate QR codes (Barcodes) in bulk from a Google Sheets file and store the generated QR images automatically in Google Drive. Each QR code contains a unique identifier (i

Google Drive, HTTP Request, Google Sheets

Data & Sheets

Automate Interview Scheduling and Data Cleanup with Cal.com and Google Sheets

Automate your candidate interview pipeline with precision. This powerful integration pulls booking data from Cal.com, extracts interview details (name, email, date & time), and sy

HTTP Request, Google Sheets

Data & Sheets

Validate & Enrich Phone Numbers in Google Sheets with RapidAPI

Validate and enrich phone numbers from Google Sheets using the API at https://rapidapi.com/skdeveloper/api/phone-number-validator11.

Google Sheets, HTTP Request

Data & Sheets

Automate SEO Keyword & SERP Analysis with DataForSEO and Google Sheets

Overview 🌐

Form Trigger, HTTP Request, Google Sheets

Sitemap Page Extractor: Discover, Clean, and Save Website URLs to Google Sheets