Scrape Multi-page Websites Recursively

Original n8n title: Scrape Multi-page Websites Recursively with Google Sheets Storage

By Viktor Klepikovskyi (@vklepikovskyi) on n8n.io

This n8n workflow provides a robust and highly reusable solution for scraping data from paginated websites. Instead of building a complex series of nodes for every new site, you only need to update a simple JSON configuration in the initial Input Node, making your scraping tasks…

Category: Data & Sheets · Trigger: Event · Nodes: 17 · Complexity: ★★★★☆ · Integrations: HTTP Request, Google Sheets

This workflow corresponds to n8n.io template #10173 — we link there as the canonical source.

This workflow follows the Google Sheets → HTTP Request recipe pattern.

The workflow JSON

Copy the full n8n JSON below, paste it into a new n8n workflow, add your credentials, and activate it.

{
  "id": "x8PC9K3CQCTMxKCl",
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "name": "Configurable Multi-Page Web Scraper Template",
  "tags": [],
  "nodes": [
    {
      "id": "d9c48247-9b7d-4ef7-87b3-2a0109d12e77",
      "name": "Start",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        176,
        240
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "4f24aada-20c6-4ee4-b785-512d35e8e540",
      "name": "Input",
      "type": "n8n-nodes-base.set",
      "position": [
        416,
        240
      ],
      "parameters": {
        "mode": "raw",
        "options": {},
        "jsonOutput": "{\n  \"startUrl\": \"https://quotes.toscrape.com/tag/humor/\",\n  \"nextPageSelector\": \"li.next a[href]\",\n  \"fields\": [\n    {\n      \"name\": \"author\",\n      \"selector\": \"span > small.author\",\n      \"value\": \"text\"\n    },\n    {\n      \"name\": \"text\",\n      \"selector\": \"span.text\",\n      \"value\": \"text\"\n    }\n  ]\n}\n"
      },
      "typeVersion": 3.4
    },
    {
      "id": "84f17c31-7bfb-4cc3-b3a2-9483f239a885",
      "name": "Get Start URL",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        656,
        240
      ],
      "parameters": {
        "url": "={{ $json.startUrl }}",
        "options": {
          "response": {
            "response": {
              "responseFormat": "text",
              "outputPropertyName": "content"
            }
          }
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "2f8d0c78-7d85-4a39-b941-2dcc1a36ba9e",
      "name": "Next Page Input",
      "type": "n8n-nodes-base.set",
      "position": [
        1376,
        240
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "e8879b7e-1bda-451f-b83b-68b9d3ed1e2a",
              "name": "startUrl",
              "type": "string",
              "value": "=https://{{ $('Input').item.json.startUrl.extractDomain() }}{{ $json.nextPage }}"
            },
            {
              "id": "d2c403d4-fabb-4961-a202-4690c9f8e990",
              "name": "nextPageSelector",
              "type": "string",
              "value": "={{ $('Input').item.json.nextPageSelector }}"
            },
            {
              "id": "2b2e5ccc-c467-47cb-83b1-f401bb2812f9",
              "name": "fields",
              "type": "array",
              "value": "={{ $('Input').item.json.fields }}"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "732965f7-fdff-421c-8c41-daeb0ec4ffc0",
      "name": "Split Out Fields",
      "type": "n8n-nodes-base.splitOut",
      "position": [
        656,
        48
      ],
      "parameters": {
        "options": {},
        "fieldToSplitOut": "fields"
      },
      "notesInFlow": false,
      "typeVersion": 1
    },
    {
      "id": "e94991c0-8dec-468b-993a-45426fe737b4",
      "name": "Merge HTML and Fields",
      "type": "n8n-nodes-base.merge",
      "position": [
        896,
        48
      ],
      "parameters": {
        "mode": "combine",
        "options": {},
        "combineBy": "combineAll"
      },
      "typeVersion": 3.2
    },
    {
      "id": "7d4b957e-daa7-4017-9235-d107a5ff112d",
      "name": "Scrape Fields",
      "type": "n8n-nodes-base.html",
      "position": [
        1136,
        48
      ],
      "parameters": {
        "options": {},
        "operation": "extractHtmlContent",
        "dataPropertyName": "content",
        "extractionValues": {
          "values": [
            {
              "key": "={{ $json.name }}",
              "cssSelector": "={{ $json.selector }}",
              "returnArray": true,
              "returnValue": "={{ $json.value }}"
            }
          ]
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "804fd9f9-167e-41c2-a023-b95b227d221a",
      "name": "Scrape Next Page Link",
      "type": "n8n-nodes-base.html",
      "position": [
        896,
        240
      ],
      "parameters": {
        "options": {},
        "operation": "extractHtmlContent",
        "dataPropertyName": "content",
        "extractionValues": {
          "values": [
            {
              "key": "=nextPage",
              "attribute": "href",
              "cssSelector": "={{ $('Input').item.json.nextPageSelector }}",
              "returnValue": "attribute"
            }
          ]
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "b7ddc82c-c87b-455b-b629-355daecdd9bb",
      "name": "If Next Page Link",
      "type": "n8n-nodes-base.if",
      "position": [
        1136,
        240
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "a1f84a0d-26a8-417c-99a0-329060ca258b",
              "operator": {
                "type": "string",
                "operation": "exists",
                "singleValue": true
              },
              "leftValue": "={{ $json.nextPage }}",
              "rightValue": ""
            },
            {
              "id": "89dd5fa6-0e12-43bc-a7ed-37844e16d627",
              "operator": {
                "type": "string",
                "operation": "notEmpty",
                "singleValue": true
              },
              "leftValue": "={{ $json.nextPage }}",
              "rightValue": ""
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "98c2a9aa-3673-4852-8ee8-c9cea73c9c99",
      "name": "Aggregate Fields",
      "type": "n8n-nodes-base.aggregate",
      "position": [
        1376,
        48
      ],
      "parameters": {
        "options": {},
        "aggregate": "aggregateAllItemData",
        "destinationFieldName": "fields"
      },
      "typeVersion": 1
    },
    {
      "id": "a65c6d07-1d6f-4ca0-be8a-2ca5cfa7044e",
      "name": "Split Out Items",
      "type": "n8n-nodes-base.splitOut",
      "position": [
        1616,
        48
      ],
      "parameters": {
        "options": {
          "destinationFieldName": "={{ $json.fields.map(item => item.keys()[0]).join() }}"
        },
        "fieldToSplitOut": "={{ $json.fields.map((item, index) => 'fields[' + index + '].' + item.keys()[0]).join() }}"
      },
      "typeVersion": 1
    },
    {
      "id": "26ae3f56-cd65-479a-9b51-91f38cf9766b",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -16,
        -336
      ],
      "parameters": {
        "width": 592,
        "height": 528,
        "content": "## Configurable Multi-Page Web Scraper\n### How it Works\nThis workflow is a dynamic, recursive web scraping template. It uses a single JSON object in the Input Node to define the target `startUrl`, the `nextPageSelector` (for pagination), and all data `fields` to extract.\n\nThe flow operates in two parallel branches after the initial HTTP Request:\n1. **Data Branch:** Cross-joins the HTML content with field configurations (Split Out/Merge), extracts the data using the **HTML Node**, and aggregates it.\n2. **Loop Branch**: Extracts the next page link. If a link is found, the **Set Node** updates the original configuration's `startUrl` and sends the flow back to the **HTTP Request Node**, creating a recursive loop that continues until the final page is reached.\n### Setup Steps\n1. **Input Node:** Update the JSON structure with the correct `startUrl`, the `nextPageSelector` (CSS selector for the next page link), and the `fields` array (CSS selectors for the data points you need).\n2. **Execution:** Run the workflow. It will automatically handle multi-page traversal and aggregate the final output.\n\n\nFor a full explanation of the internal logic and the recursive loop structure, view the original blog post: [Flexible Web Scraping with n8n: A Configurable, Multi-Page Template](https://n8nplaybook.com/post/2025/10/flexible-n8n-scraper-template/)"
      },
      "typeVersion": 1
    },
    {
      "id": "4587fe2f-e1f0-4663-a7c9-451aa613c536",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        592,
        -96
      ],
      "parameters": {
        "color": 7,
        "width": 464,
        "height": 96,
        "content": "The **Split Out** node separates the configured data fields (e.g., author, text). The **Merge** node then efficiently combines the fetched HTML content with every single field definition, preparing the data for the extractor."
      },
      "typeVersion": 1
    },
    {
      "id": "60b22f3b-0c34-4fc2-98a7-3edd48d68378",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1072,
        -96
      ],
      "parameters": {
        "color": 7,
        "width": 464,
        "height": 96,
        "content": "The **HTML Node** uses the specific CSS selectors from the configuration to pull the required content. The **Aggregate Node** collects all extracted data items from the current page before the workflow decides whether to proceed to the next page."
      },
      "typeVersion": 1
    },
    {
      "id": "50a652f4-dbbb-42d0-979f-ef1d1bf787f1",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        352,
        464
      ],
      "parameters": {
        "color": 7,
        "width": 464,
        "height": 80,
        "content": "This section defines the entire job via a single JSON config and performs the first action: fetching the HTML content from the current `startUrl` using the **HTTP Request** node."
      },
      "typeVersion": 1
    },
    {
      "id": "61b20e9f-fd02-4e08-acbc-89a334869147",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        832,
        464
      ],
      "parameters": {
        "color": 7,
        "width": 704,
        "height": 80,
        "content": "This branch checks for the next page link. If found (by the **If Node**), the **Set Node** overwrites the `startUrl` with the new link, routing the flow back to the HTTP Request node to start the next iteration."
      },
      "typeVersion": 1
    },
    {
      "id": "c9ba233f-4d6e-4771-9dbc-3af8df4f2594",
      "name": "Store Scraped Data",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        1856,
        48
      ],
      "parameters": {
        "columns": {
          "value": {},
          "schema": [],
          "mappingMode": "autoMapInputData",
          "matchingColumns": [],
          "attemptToConvertTypes": false,
          "convertFieldsToString": false
        },
        "options": {
          "useAppend": true
        },
        "operation": "append",
        "sheetName": {
          "__rl": true,
          "mode": "list",
          "value": "gid=0",
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1_qgp7BRRHAoEMHjEo5tZ2oddpUVoh5aaGpA5otmT6aQ/edit#gid=0",
          "cachedResultName": "Sheet1"
        },
        "documentId": {
          "__rl": true,
          "mode": "list",
          "value": "1_qgp7BRRHAoEMHjEo5tZ2oddpUVoh5aaGpA5otmT6aQ",
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1_qgp7BRRHAoEMHjEo5tZ2oddpUVoh5aaGpA5otmT6aQ/edit?usp=drivesdk",
          "cachedResultName": "Web Scraper Results"
        },
        "authentication": "serviceAccount"
      },
      "credentials": {
        "googleApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.7
    }
  ],
  "active": false,
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "07e3c6e2-662f-45e5-aa8c-713d5e5790b6",
  "connections": {
    "Input": {
      "main": [
        [
          {
            "node": "Get Start URL",
            "type": "main",
            "index": 0
          },
          {
            "node": "Split Out Fields",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Start": {
      "main": [
        [
          {
            "node": "Input",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Get Start URL": {
      "main": [
        [
          {
            "node": "Merge HTML and Fields",
            "type": "main",
            "index": 1
          },
          {
            "node": "Scrape Next Page Link",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Scrape Fields": {
      "main": [
        [
          {
            "node": "Aggregate Fields",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Next Page Input": {
      "main": [
        [
          {
            "node": "Get Start URL",
            "type": "main",
            "index": 0
          },
          {
            "node": "Split Out Fields",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Split Out Items": {
      "main": [
        [
          {
            "node": "Store Scraped Data",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Aggregate Fields": {
      "main": [
        [
          {
            "node": "Split Out Items",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Split Out Fields": {
      "main": [
        [
          {
            "node": "Merge HTML and Fields",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "If Next Page Link": {
      "main": [
        [
          {
            "node": "Next Page Input",
            "type": "main",
            "index": 0
          }
        ],
        []
      ]
    },
    "Merge HTML and Fields": {
      "main": [
        [
          {
            "node": "Scrape Fields",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Scrape Next Page Link": {
      "main": [
        [
          {
            "node": "If Next Page Link",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
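
The control flow encoded in the JSON above can be easier to follow as ordinary code. The following is a minimal Python sketch of the same idea, not the workflow's actual implementation: `fetch` and `extract` are hypothetical callables standing in for the HTTP Request and HTML nodes, `max_pages` is a safety cap that the workflow itself does not have, and `urlsplit(...).hostname` only approximates n8n's `extractDomain()`. Like the "Next Page Input" expression, it assumes the next-page `href` is root-relative and that the site is served over HTTPS.

```python
from typing import Callable
from urllib.parse import urlsplit

# Hypothetical stand-ins for the HTTP Request and HTML nodes.
Fetch = Callable[[str], str]                    # url -> page content
Extract = Callable[[str, str, str], list[str]]  # (content, selector, value) -> matches


def next_page_url(start_url: str, href: str) -> str:
    """Approximate the "Next Page Input" expression: https:// + domain + href."""
    return f"https://{urlsplit(start_url).hostname}{href}"


def rows_from_columns(columns: dict[str, list[str]]) -> list[dict[str, str]]:
    """Turn per-field value lists into one row per scraped item.

    zip() stops at the shortest list; the Split Out node may treat
    uneven lists differently.
    """
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


def scrape_all(config: dict, fetch: Fetch, extract: Extract,
               max_pages: int = 100) -> list[dict[str, str]]:
    """Follow next-page links from config["startUrl"] and collect rows."""
    rows: list[dict[str, str]] = []
    url = config["startUrl"]
    for _ in range(max_pages):  # assumption: cap added here, absent from the workflow
        content = fetch(url)
        # Data branch: one extraction per configured field, then reshape to rows.
        columns = {
            field["name"]: extract(content, field["selector"], field["value"])
            for field in config["fields"]
        }
        rows.extend(rows_from_columns(columns))
        # Loop branch: stop when no non-empty next-page link is found.
        links = extract(content, config["nextPageSelector"], "href")
        if not links or not links[0]:
            break
        url = next_page_url(config["startUrl"], links[0])
    return rows
```

One difference worth noting: the sketch accumulates all rows and returns them at the end, whereas the workflow appends each page's rows to Google Sheets as that page is processed.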

Credentials you'll need

Each integration node will prompt for credentials when you import. We strip credential IDs before publishing — you'll add your own.
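
Before importing, it can help to see which nodes will ask for credentials. The snippet below is a small illustrative sketch, assuming the workflow JSON has been loaded into a Python dict; `nodes_needing_credentials` is a hypothetical helper name, and `sample` is a trimmed-down excerpt of the JSON above rather than the full file.

```python
import json


def nodes_needing_credentials(workflow: dict) -> dict[str, list[str]]:
    """Map each node name to the credential types declared on that node."""
    return {
        node["name"]: sorted(node["credentials"])
        for node in workflow.get("nodes", [])
        if node.get("credentials")
    }


# Trimmed-down excerpt of the workflow JSON, for illustration only.
sample = json.loads("""
{"nodes": [
  {"name": "Get Start URL", "type": "n8n-nodes-base.httpRequest"},
  {"name": "Store Scraped Data", "type": "n8n-nodes-base.googleSheets",
   "credentials": {"googleApi": {"name": "<your credential>"}}}
]}
""")

print(nodes_needing_credentials(sample))
# {'Store Scraped Data': ['googleApi']}
```

In this template only the Google Sheets node declares credentials (a `googleApi` entry, with `authentication` set to `serviceAccount`). The `documentId` and `sheetName` values in that node also reference the original author's spreadsheet, so they will likely need to be replaced with your own.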

Source: https://n8n.io/workflows/10173/ (original creator credit).
