This workflow corresponds to n8n.io template #9594 — we link there as the canonical source.

This workflow follows the Google Sheets → HTTP Request recipe pattern.

The workflow JSON

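Copy or download the full n8n JSON below, paste it into a new n8n workflow, and add your Google Sheets credential. The workflow starts from a Manual Trigger, so run it from the editor. To run it unattended, replace the Manual Trigger with a Schedule Trigger, as the template's own setup note suggests.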
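Before the first run, the JSON still carries the template's placeholder values: `https://example.com` in Set Website, `your-document-id` and `your-sheet-name` in the three Google Sheets nodes, and `<your credential>` as the credential name. The sketch below is a minimal Python check, assuming you have saved the JSON (or a re-export of your edited workflow) to a local file. It lists which nodes still hold those placeholders. `find_placeholders` and `unconfigured_nodes` are hypothetical helper names, not part of n8n.

```python
import json
import sys

# Placeholder strings that ship in the template JSON. Any node still holding
# one of them has not been configured for your site or spreadsheet yet.
PLACEHOLDERS = {
    "https://example.com",   # Set Website -> website_url
    "your-document-id",      # Google Sheets nodes -> documentId
    "your-sheet-name",       # Google Sheets nodes -> sheetName
    "<your credential>",     # Google Sheets nodes -> credential name
}


def find_placeholders(value, path=""):
    """Recursively yield the JSON path of every string equal to a placeholder."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from find_placeholders(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from find_placeholders(child, f"{path}[{index}]")
    elif isinstance(value, str) and value in PLACEHOLDERS:
        yield path


def unconfigured_nodes(workflow):
    """Map each node name to the placeholder paths it still contains."""
    report = {}
    for node in workflow.get("nodes", []):
        paths = list(find_placeholders(node))
        if paths:
            report[node["name"]] = paths
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_workflow.py workflow.json  (file names are hypothetical)
    with open(sys.argv[1], encoding="utf-8") as handle:
        for name, paths in unconfigured_nodes(json.load(handle)).items():
            print(f"{name}: {', '.join(paths)}")
```

No output means every placeholder has been replaced. The check matches whole strings only, so the example redirect URI inside the Overview Note is not flagged.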


{
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "nodes": [
    {
      "id": "349e50cf-75b8-432c-818e-63f1ff3ead34",
      "name": "Overview Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1696,
        3104
      ],
      "parameters": {
        "color": 4,
        "width": 600,
        "height": 1112,
        "content": "# Automated Website Crawler for AI Knowledge Bases\n\n## \ud83d\udccb What This Template Does\nThis workflow crawls a website's homepage to extract all sublinks, filters images from content pages, scrapes and converts textual content to Markdown, then aggregates everything into Google Sheets\u2014ideal for building AI-ready knowledge bases or company dossiers.\n\n## \ud83d\udd27 Prerequisites\n- Google account with Sheets access\n- n8n instance\n\n## \ud83d\udd11 Required Credentials\n\n### Google Sheets OAuth2 API Setup\n1. Go to console.cloud.google.com \u2192 APIs & Services \u2192 Credentials\n2. Create OAuth client ID for Web application\n3. Add n8n redirect URI: https://your-n8n-instance.com/rest/oauth2-credential/callback\n4. Add to n8n as Google Sheets OAuth2 API and grant Sheets scopes\n\n## \u2699\ufe0f Configuration Steps\n1. Import JSON into n8n\n2. Set target URL in Set Website node\n3. Assign Google credential to Sheet nodes\n4. Update documentId and sheetName to your spreadsheet\n5. Ensure sheet has columns: Website, Links, Scraped Content, Images\n6. Test manually\n\n## \ud83c\udfaf Use Cases\n- Crawl company sites for knowledge base building\n- Extract content for AI agent training datasets\n- Gather competitor intel for market analysis\n- Archive dynamic sites for compliance\n\n## \u26a0\ufe0f Troubleshooting\n- No links: Check homepage <a> tags and test URL\n- Sheet errors: Verify columns and permissions\n- Truncated content: Adjust slice limit or split rows\n- Rate limits: Add Wait node after scraping"
      },
      "typeVersion": 1
    },
    {
      "id": "eb43d67c-01fc-4d83-bb2c-099938a57468",
      "name": "Note: Trigger and Setup",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2512,
        3072
      ],
      "parameters": {
        "color": 6,
        "width": 556,
        "height": 176,
        "content": "## \ud83d\uddb1\ufe0f Trigger & Setup Nodes\n\n**Purpose:** Manual Trigger starts the workflow; Set Website configures the target URL.\n\n**Note:** Update website_url in Set Website for your site; use Schedule Trigger for automation."
      },
      "typeVersion": 1
    },
    {
      "id": "3c8581cb-46cd-4f25-af5a-c52bc2f463c6",
      "name": "Set Website",
      "type": "n8n-nodes-base.set",
      "position": [
        2688,
        3296
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "a652f57e-210e-421e-b20b-781d6f4dc240",
              "name": "website_url",
              "type": "string",
              "value": "https://example.com"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "18201858-7764-4a14-9f6b-12e36eaf158b",
      "name": "Manual Trigger",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        2496,
        3296
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "b7435481-bed3-439f-933c-1c5e0142ad5c",
      "name": "Scrape Homepage",
      "type": "n8n-nodes-base.httpRequest",
      "onError": "continueRegularOutput",
      "position": [
        2880,
        3296
      ],
      "parameters": {
        "url": "={{ $json.website_url }}",
        "options": {
          "redirect": {
            "redirect": {}
          },
          "allowUnauthorizedCerts": false
        }
      },
      "executeOnce": false,
      "typeVersion": 4.2,
      "alwaysOutputData": false
    },
    {
      "id": "ce13710d-24ca-47d4-a25c-8890c1592947",
      "name": "Note: Homepage Scraping",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        3168,
        3488
      ],
      "parameters": {
        "color": 5,
        "width": 396,
        "height": 192,
        "content": "## \ud83c\udf10 Homepage Scraping Nodes\n\n**Purpose:** Scrape Homepage fetches HTML; Extract Links pulls hrefs from <a> tags; Split Links breaks array into items.\n\n**Note:** Handles redirects; targets all links for discovery."
      },
      "typeVersion": 1
    },
    {
      "id": "61a60f2c-f032-4b46-83ba-405df0ce05df",
      "name": "Extract Links from HTML",
      "type": "n8n-nodes-base.html",
      "position": [
        3088,
        3296
      ],
      "parameters": {
        "options": {
          "trimValues": true,
          "cleanUpText": true
        },
        "operation": "extractHtmlContent",
        "extractionValues": {
          "values": [
            {
              "key": "links",
              "attribute": "href",
              "cssSelector": "a",
              "returnArray": true,
              "returnValue": "attribute"
            }
          ]
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "582eeae0-fec0-4548-9c78-7c05ac5aaebc",
      "name": "Split Links",
      "type": "n8n-nodes-base.splitOut",
      "position": [
        3296,
        3296
      ],
      "parameters": {
        "options": {},
        "fieldToSplitOut": "links"
      },
      "typeVersion": 1
    },
    {
      "id": "17d59531-4d51-4494-8ae9-e91b81851a0b",
      "name": "Remove Duplicate Links",
      "type": "n8n-nodes-base.removeDuplicates",
      "position": [
        3520,
        3296
      ],
      "parameters": {
        "options": {}
      },
      "typeVersion": 2
    },
    {
      "id": "d50fa2a9-1a58-4dad-8bd0-cfbd31aeae91",
      "name": "Filter Real Hyperlinks",
      "type": "n8n-nodes-base.filter",
      "position": [
        3696,
        3296
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "bd6c6da6-8af7-4809-b6cd-01a38d71953b",
              "operator": {
                "type": "string",
                "operation": "startsWith"
              },
              "leftValue": "={{ $json.links }}",
              "rightValue": "https://"
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "cb121b70-a14a-4cbd-a54c-e55c6fc235b7",
      "name": "Note: Link Processing",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        3216,
        3056
      ],
      "parameters": {
        "color": 2,
        "width": 556,
        "height": 224,
        "content": "## \ud83d\udd04 Link Processing Nodes\n\n**Purpose:** Remove Duplicate Links cleans list; Filter Real Hyperlinks keeps HTTPS; Separate Images and Links routes via regex.\n\n**Note:** Switch output 0: Images, 1: Content links; adjust regex for custom extensions."
      },
      "typeVersion": 1
    },
    {
      "id": "d69c0dc2-2c4c-474b-ba11-3d79e1390b12",
      "name": "Separate Images and Links",
      "type": "n8n-nodes-base.switch",
      "position": [
        2480,
        3680
      ],
      "parameters": {
        "rules": {
          "values": [
            {
              "outputKey": "Images",
              "conditions": {
                "options": {
                  "version": 2,
                  "leftValue": "",
                  "caseSensitive": true,
                  "typeValidation": "strict"
                },
                "combinator": "and",
                "conditions": [
                  {
                    "id": "16724958-4eea-489d-b494-3d76a3ba2562",
                    "operator": {
                      "type": "string",
                      "operation": "regex"
                    },
                    "leftValue": "={{ $json.links }}",
                    "rightValue": "=^https?:\\/\\/.*\\.(?:png|jpe?g|gif|webp|bmp|svg|ico)(?:\\?.*)?$"
                  }
                ]
              },
              "renameOutput": true
            },
            {
              "outputKey": "Links",
              "conditions": {
                "options": {
                  "version": 2,
                  "leftValue": "",
                  "caseSensitive": true,
                  "typeValidation": "strict"
                },
                "combinator": "and",
                "conditions": [
                  {
                    "id": "816392f0-96db-4134-8bee-4b74688ff929",
                    "operator": {
                      "type": "string",
                      "operation": "notRegex"
                    },
                    "leftValue": "={{ $json.links }}",
                    "rightValue": "=^https?:\\/\\/.*\\.(?:png|jpe?g|gif|webp|bmp|svg|ico)(?:\\?.*)?$"
                  }
                ]
              },
              "renameOutput": true
            }
          ]
        },
        "options": {}
      },
      "typeVersion": 3.2
    },
    {
      "id": "23896343-575e-4956-8e95-3b5e6e4c8ae7",
      "name": "Aggregate Images",
      "type": "n8n-nodes-base.aggregate",
      "position": [
        2736,
        3504
      ],
      "parameters": {
        "options": {},
        "fieldsToAggregate": {
          "fieldToAggregate": [
            {
              "fieldToAggregate": "links"
            }
          ]
        }
      },
      "typeVersion": 1
    },
    {
      "id": "fcad347b-60d7-4fa2-9b02-e96c2f27116d",
      "name": "Aggregate Links",
      "type": "n8n-nodes-base.aggregate",
      "position": [
        2736,
        3696
      ],
      "parameters": {
        "options": {},
        "fieldsToAggregate": {
          "fieldToAggregate": [
            {
              "fieldToAggregate": "links"
            }
          ]
        }
      },
      "typeVersion": 1
    },
    {
      "id": "fc5d6ce1-1765-4768-a9c7-de3677e8109d",
      "name": "Scrape Content Links",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        2736,
        3872
      ],
      "parameters": {
        "url": "={{ $json.links }}",
        "options": {}
      },
      "typeVersion": 4.2
    },
    {
      "id": "0d4b6a4e-b6cb-4e6c-9a22-bd0dc6a72027",
      "name": "Note: Content Scraping",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2320,
        3984
      ],
      "parameters": {
        "color": 5,
        "width": 428,
        "height": 224,
        "content": "## \ud83d\udcc4 Content Scraping & Aggregation Nodes\n\n**Purpose:** Scrape Content Links fetches pages; Convert to Markdown formats HTML; Aggregate Images/Links/Content combines outputs.\n\n**Note:** Markdown preserves structure for AI; slice content if exceeding sheet limits."
      },
      "typeVersion": 1
    },
    {
      "id": "349e5f7c-c81b-467b-a59b-ea40a47226f0",
      "name": "Convert to Markdown",
      "type": "n8n-nodes-base.markdown",
      "position": [
        2944,
        3872
      ],
      "parameters": {
        "html": "={{ $json.data }}",
        "options": {}
      },
      "typeVersion": 1
    },
    {
      "id": "24f22a31-03a3-4faf-81f4-3c38c0956ee4",
      "name": "Aggregate Scraped Content",
      "type": "n8n-nodes-base.aggregate",
      "position": [
        3136,
        3872
      ],
      "parameters": {
        "options": {},
        "fieldsToAggregate": {
          "fieldToAggregate": [
            {
              "fieldToAggregate": "data"
            }
          ]
        }
      },
      "typeVersion": 1
    },
    {
      "id": "a4d34aab-1af2-4196-85f5-1a2d832969dd",
      "name": "Add Images to Sheet",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        2944,
        3504
      ],
      "parameters": {
        "columns": {
          "value": {
            "Images": "={{ $json.links.join('\\n\\n') }}",
            "Website": "={{ $('Set Website').item.json.website_url }}"
          },
          "schema": [
            {
              "id": "Website",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Website",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Links",
              "type": "string",
              "display": true,
              "removed": true,
              "required": false,
              "displayName": "Links",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Scraped Content",
              "type": "string",
              "display": true,
              "removed": true,
              "required": false,
              "displayName": "Scraped Content",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Images",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Images",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            }
          ],
          "mappingMode": "defineBelow",
          "matchingColumns": [
            "Website"
          ],
          "attemptToConvertTypes": false,
          "convertFieldsToString": false
        },
        "options": {},
        "operation": "appendOrUpdate",
        "sheetName": "your-sheet-name",
        "documentId": "your-document-id"
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.7
    },
    {
      "id": "6afbfad8-b80f-4a0d-81b4-9138cc2af46a",
      "name": "Add Links to Sheet",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        2944,
        3696
      ],
      "parameters": {
        "columns": {
          "value": {
            "Links": "={{ $json.links.join('\\n\\n') }}",
            "Website": "={{ $('Set Website').item.json.website_url }}"
          },
          "schema": [
            {
              "id": "Website",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Website",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Links",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Links",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Scraped Content",
              "type": "string",
              "display": true,
              "removed": true,
              "required": false,
              "displayName": "Scraped Content",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Images",
              "type": "string",
              "display": true,
              "removed": true,
              "required": false,
              "displayName": "Images",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            }
          ],
          "mappingMode": "defineBelow",
          "matchingColumns": [
            "Website"
          ],
          "attemptToConvertTypes": false,
          "convertFieldsToString": false
        },
        "options": {},
        "operation": "appendOrUpdate",
        "sheetName": "your-sheet-name",
        "documentId": "your-document-id"
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.7
    },
    {
      "id": "35ae2c30-a93a-4fd2-82b6-07d2f4c56c88",
      "name": "Add Scraped Content to Sheet",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        3344,
        3872
      ],
      "parameters": {
        "columns": {
          "value": {
            "Website": "={{ $('Set Website').item.json.website_url }}",
            "Scraped Content": "={{ $json.data.join('\\n\\n').slice(0, 50000) }}"
          },
          "schema": [
            {
              "id": "Website",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Website",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Links",
              "type": "string",
              "display": true,
              "removed": true,
              "required": false,
              "displayName": "Links",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Scraped Content",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Scraped Content",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Images",
              "type": "string",
              "display": true,
              "removed": true,
              "required": false,
              "displayName": "Images",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            }
          ],
          "mappingMode": "defineBelow",
          "matchingColumns": [
            "Website"
          ],
          "attemptToConvertTypes": false,
          "convertFieldsToString": false
        },
        "options": {},
        "operation": "appendOrUpdate",
        "sheetName": "your-sheet-name",
        "documentId": "your-document-id"
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.7
    },
    {
      "id": "c3f7b022-db11-400c-baaa-77392acfb991",
      "name": "Note: Sheet Integration",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        3232,
        4048
      ],
      "parameters": {
        "color": 3,
        "width": 444,
        "height": 176,
        "content": "## \ud83d\udcca Sheet Integration Nodes\n\n**Purpose:** Add Images/Links/Scraped Content to Sheet appends aggregated data to Google Sheets.\n\n**Note:** Matches on 'Website' column; update documentId/sheetName for your sheet."
      },
      "typeVersion": 1
    }
  ],
  "connections": {
    "Set Website": {
      "main": [
        [
          {
            "node": "Scrape Homepage",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Split Links": {
      "main": [
        [
          {
            "node": "Remove Duplicate Links",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Manual Trigger": {
      "main": [
        [
          {
            "node": "Set Website",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Aggregate Links": {
      "main": [
        [
          {
            "node": "Add Links to Sheet",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Scrape Homepage": {
      "main": [
        [
          {
            "node": "Extract Links from HTML",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Aggregate Images": {
      "main": [
        [
          {
            "node": "Add Images to Sheet",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Convert to Markdown": {
      "main": [
        [
          {
            "node": "Aggregate Scraped Content",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Scrape Content Links": {
      "main": [
        [
          {
            "node": "Convert to Markdown",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Filter Real Hyperlinks": {
      "main": [
        [
          {
            "node": "Separate Images and Links",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Remove Duplicate Links": {
      "main": [
        [
          {
            "node": "Filter Real Hyperlinks",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract Links from HTML": {
      "main": [
        [
          {
            "node": "Split Links",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Aggregate Scraped Content": {
      "main": [
        [
          {
            "node": "Add Scraped Content to Sheet",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Separate Images and Links": {
      "main": [
        [
          {
            "node": "Aggregate Images",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Aggregate Links",
            "type": "main",
            "index": 0
          },
          {
            "node": "Scrape Content Links",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
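The link-processing chain in the JSON (Extract Links from HTML → Split Links → Remove Duplicate Links → Filter Real Hyperlinks → Separate Images and Links) can be approximated in plain Python. This lets you check what a given homepage will yield before running it in n8n. It is a minimal sketch that assumes the standard-library `html.parser` behaves closely enough to the HTML node's `a` selector. `route_links` is a hypothetical helper. The image regex is the Switch node's pattern, unchanged apart from JSON escaping.

```python
import re
from html.parser import HTMLParser

# Same pattern as the "Separate Images and Links" Switch rules (case-sensitive).
IMAGE_RE = re.compile(r"^https?://.*\.(?:png|jpe?g|gif|webp|bmp|svg|ico)(?:\?.*)?$")


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag, like the HTML node's `a` selector."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value.strip())  # trimValues: true


def route_links(html):
    """Return (images, content_links), following the template's node chain."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    # Remove Duplicate Links: drop repeated URLs (this sketch keeps the first one).
    unique = list(dict.fromkeys(parser.links))
    # Filter Real Hyperlinks: only absolute https:// URLs survive,
    # so relative links such as "/about" are dropped here.
    absolute = [link for link in unique if link.startswith("https://")]
    # Separate Images and Links: output 0 = images, output 1 = everything else.
    images = [link for link in absolute if IMAGE_RE.match(link)]
    content = [link for link in absolute if not IMAGE_RE.match(link)]
    return images, content
```

Two consequences are easy to see with this sketch. First, relative links such as `/about` never reach the scraper because of the `https://` filter. Second, the image regex is case-sensitive, so a URL ending in `.JPG` is routed to the content branch. If your site uses relative internal links, one possible adjustment is to resolve them against the homepage URL first, for example with Python's `urllib.parse.urljoin` or an equivalent n8n expression. The template itself does not do this.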

Credentials you'll need

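Only the three Google Sheets nodes need a credential; the HTTP Request nodes fetch public pages without authentication. Each Google Sheets node will prompt for its credential when you import. We strip credential IDs before publishing, so you'll add your own.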

googleSheetsOAuth2Api
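All three Google Sheets nodes that use this credential run `appendOrUpdate` with `Website` as the matching column, and each maps only one data column. Their writes should therefore land in a single row per site rather than three. The sketch below models that behaviour in plain Python over a list of dicts. It illustrates the upsert semantics under that assumption and is not the Google Sheets API. `join_cell`, `append_or_update`, and `write_site` are hypothetical names.

```python
# Google Sheets limits a single cell to 50,000 characters; the template's
# Scraped Content expression applies .slice(0, 50000) for that reason.
CELL_LIMIT = 50_000


def join_cell(values, limit=None):
    """Mirror `values.join('\\n\\n')`, optionally followed by `.slice(0, limit)`."""
    text = "\n\n".join(values)
    return text if limit is None else text[:limit]


def append_or_update(rows, record, key="Website"):
    """Update the row whose `key` column matches, otherwise append a new row.

    Only the columns present in `record` are written, like a Sheets node that
    maps Website plus one other column and marks the rest as removed.
    """
    for row in rows:
        if row.get(key) == record[key]:
            row.update(record)
            return rows
    rows.append(dict(record))
    return rows


def write_site(rows, website, images, links, pages):
    """Apply the three Sheets writes for one site; their order does not change the final row."""
    append_or_update(rows, {"Website": website, "Images": join_cell(images)})
    append_or_update(rows, {"Website": website, "Links": join_cell(links)})
    # Only Scraped Content is truncated in the template; Images and Links are not.
    append_or_update(
        rows, {"Website": website, "Scraped Content": join_cell(pages, CELL_LIMIT)}
    )
    return rows
```

Calling `write_site` again for the same `Website` updates the existing row instead of adding a second one. In the same way, re-running the workflow should refresh a site's row rather than duplicate it. One subtlety: JavaScript's `slice` counts UTF-16 code units while Python slicing counts code points, so the two cut-offs can differ slightly on text containing emoji.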


About this workflow

Turn any website into a structured knowledge repository with this crawler. It extracts hyperlinks from the homepage (one level deep), separates image URLs from content pages, and aggregates each page's content as Markdown, which makes it well suited to fueling AI agents or building…

Source: https://n8n.io/workflows/9594/ (original creator credit).


Related workflows

Workflows that share integrations, category, or trigger type with this one. All free to copy and import.

Web Scraping

Firecrawl URL List Mini-batch to Resilient Analyzer

04 - Firecrawl URL List Mini-Batch to Resilient Analyzer. Uses googleSheets, httpRequest, executeWorkflowTrigger. Event-driven trigger; 42 nodes.

Google Sheets, HTTP Request, Execute Workflow Trigger

Web Scraping

Extract & Enrich Linkedin Comments to Leads with Apify → Google Sheets/csv

Automate LinkedIn lead generation by scraping comments from targeted posts and enriching profiles with detailed data

Form Trigger, HTTP Request, Google Sheets

Web Scraping

Scrape Upwork Job Listings & Generate Daily Email Reports with Apify & Google Sheets

This automated n8n workflow scrapes job listings from Upwork using Apify, processes and cleans the data, and generates daily email reports with job summaries. The system uses Google Sheets for data st…

Google Sheets, HTTP Request, Gmail

Web Scraping

Enrich Lead Profiles From Linkedin Urls with Apify and Google Sheets

Transform LinkedIn profile URLs into comprehensive enriched lead profiles, quickly and automatically.

HTTP Request, Google Sheets

Web Scraping

Find Quality Youtube Videos with Automated Filtering & Relevance Scoring to Google Sheets

Content creators, researchers, educators, and digital marketers who need to discover high-quality YouTube training videos on specific topics. Perfect for building curated learning resource lists, comp…

HTTP Request, Google Sheets

Website to AI-Ready Markdown in Google Sheets
