Scrape Urls From Google Sheets and Save as Markdown to Google Drive

By Allan Vaccarizi @growthai ✓ on n8n.io

📺 Full walkthrough video: https://youtu.be/x3PDYon4qKk

Category: Data & Sheets · Trigger: Chat trigger · Nodes: 15 · Complexity: ★★★★☆ · AI nodes: yes
Integrations: Chat Trigger, @mendable/n8n-nodes-firecrawl, Google Sheets, Google Drive

This workflow corresponds to n8n.io template #7384; we link there as the canonical source.

This workflow follows the Chat Trigger → Google Drive recipe pattern.

The workflow JSON

Copy or download the full n8n JSON below. Paste it into a new n8n workflow, add your credentials, and activate it.

{
  "nodes": [
    {
      "id": "63d513b3-4054-456f-910b-cd4a765df79a",
      "name": "Sticky Note16",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1712,
        -960
      ],
      "parameters": {
        "width": 1024,
        "height": 400,
        "content": "![Logo Growth AI](https://cdn.prod.website-files.com/6825df5b20329ba581df4914/68d413c43f8729fa336568a6_Logo_horizontal.png)"
      },
      "typeVersion": 1
    },
    {
      "id": "15175f62-b287-4455-94ef-2f08914ff656",
      "name": "Sticky Note17",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1712,
        -528
      ],
      "parameters": {
        "color": 7,
        "width": 1024,
        "height": 240,
        "content": "## Need more advanced automation solutions? Contact us for custom enterprise workflows!\n\n# Growth-AI.fr\n\n## https://www.linkedin.com/in/allanvaccarizi/\n## https://www.linkedin.com/in/hugo-marinier-%F0%9F%A7%B2-6537b633/"
      },
      "typeVersion": 1
    },
    {
      "id": "1a314b56-840f-40a1-a148-065c789654cb",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        768,
        -512
      ],
      "parameters": {
        "width": 480,
        "height": 672,
        "content": "## Batch scraping\n\n### How it works\n\n1. The workflow is triggered by a chat message containing the Google Sheet URL, which kicks off the URL scraping pipeline.\n2. URLs are read from the Google Sheet and filtered to keep only rows that have a URL and an empty Scraped column.\n3. Valid URLs are processed in batches using a loop to avoid overloading the scraper.\n4. Each URL is scraped with Firecrawl and the returned Markdown is cleaned by a code node.\n5. The cleaned Markdown is saved as a file to Google Drive, and the corresponding row in Google Sheets is marked as scraped before the loop continues.\n\n### Setup steps\n\n- [ ] Configure Google Sheets credentials and set the correct spreadsheet/sheet containing the URLs to scrape.\n- [ ] Configure Google Drive credentials and specify the destination folder for saving Markdown files.\n- [ ] Add your Firecrawl API credentials to the 'Scrape URL with Firecrawl' node.\n- [ ] Ensure the Google Sheet has a 'Scraped' column to track scraping status (used by 'Update Scraped Status in Sheets').\n- [ ] Set the desired batch size in the 'Loop Over URLs' node to control throughput.\n\n### Customization\n\nYou can adjust the batch size to control scraping speed and API usage. The 'Transform Scraped Content' code node can be modified to reformat, clean, or enrich the scraped Markdown before saving."
      },
      "typeVersion": 1
    },
    {
      "id": "04d083f5-ce2f-4cf8-b949-6df65b9424dc",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1312,
        -224
      ],
      "parameters": {
        "color": 7,
        "width": 640,
        "height": 320,
        "content": "## Trigger and fetch URLs\n\nThe workflow starts when a chat message is received. URLs are then read from a Google Sheet and filtered to remove any empty rows before processing."
      },
      "typeVersion": 1
    },
    {
      "id": "7799c5f1-49a2-4c03-a2df-963ee04d4ea1",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2032,
        -240
      ],
      "parameters": {
        "color": 7,
        "width": 464,
        "height": 352,
        "content": "## Batch loop and scrape\n\nValid URLs are fed into a batch loop that iterates over each item. Each URL is scraped using Firecrawl, with the loop cycling until all URLs are processed."
      },
      "typeVersion": 1
    },
    {
      "id": "a301031e-e3a9-4dfa-96db-73c2fd8cdea9",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2576,
        -224
      ],
      "parameters": {
        "color": 7,
        "width": 688,
        "height": 320,
        "content": "## Process, save, and update status\n\nScraped content is processed and converted to Markdown via a code node, saved as a file in Google Drive, and then the source row in Google Sheets is marked as scraped before the loop resumes."
      },
      "typeVersion": 1
    },
    {
      "id": "096b6879-d342-41d0-a9e4-ec7c0e0cc777",
      "name": "When Chat Message Received",
      "type": "@n8n/n8n-nodes-langchain.chatTrigger",
      "position": [
        1360,
        -64
      ],
      "parameters": {
        "mode": "webhook",
        "public": true,
        "options": {
          "responseMode": "responseNode"
        }
      },
      "typeVersion": 1.1
    },
    {
      "id": "4d20bae7-5cf5-4f44-bb3f-9195c85809f6",
      "name": "Scrape URL with Firecrawl",
      "type": "@mendable/n8n-nodes-firecrawl.firecrawl",
      "onError": "continueErrorOutput",
      "position": [
        2352,
        -48
      ],
      "parameters": {
        "url": "={{ $json.URL }}",
        "operation": "scrape",
        "requestOptions": {}
      },
      "credentials": {
        "firecrawlApi": {
          "name": "<your credential>"
        }
      },
      "retryOnFail": true,
      "typeVersion": 1
    },
    {
      "id": "3fa1b84b-2cb4-4184-9e81-be306cb224b0",
      "name": "Read URLs from Sheets",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        1568,
        -64
      ],
      "parameters": {
        "options": {},
        "sheetName": {
          "__rl": true,
          "mode": "name",
          "value": "Page to doc"
        },
        "documentId": {
          "__rl": true,
          "mode": "url",
          "value": "={{ $json.chatInput }}"
        }
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.5
    },
    {
      "id": "799f9382-38a0-409f-8ab7-d499b35a6931",
      "name": "Filter Non-Empty Rows",
      "type": "n8n-nodes-base.filter",
      "position": [
        1808,
        -64
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "48acd975-5041-455b-8e47-3b7eef32b483",
              "operator": {
                "type": "string",
                "operation": "exists",
                "singleValue": true
              },
              "leftValue": "={{ $json.URL }}",
              "rightValue": ""
            },
            {
              "id": "3d28d877-11fb-455d-b328-572c8492ea03",
              "operator": {
                "type": "string",
                "operation": "empty",
                "singleValue": true
              },
              "leftValue": "={{ $json.Scraped }}",
              "rightValue": ""
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "3d34488e-0eaf-4695-89dd-3c1ed61f67bd",
      "name": "Save Markdown to Google Drive",
      "type": "n8n-nodes-base.googleDrive",
      "position": [
        2896,
        -64
      ],
      "parameters": {
        "name": "={{ $('Scrape URL with Firecrawl').item.json.data.metadata.url }}",
        "content": "={{ $json.markdown_clean }}",
        "driveId": {
          "__rl": true,
          "mode": "list",
          "value": "0ADUfRaRT2rWIUk9PVA",
          "cachedResultUrl": "https://drive.google.com/drive/folders/0ADUfRaRT2rWIUk9PVA",
          "cachedResultName": "Growth AI"
        },
        "options": {},
        "folderId": {
          "__rl": true,
          "mode": "list",
          "value": "18HHNuVxjYGKv3YHnzIrBxwr_a5Sn1B9_",
          "cachedResultUrl": "https://drive.google.com/drive/folders/18HHNuVxjYGKv3YHnzIrBxwr_a5Sn1B9_",
          "cachedResultName": "Contenu scrap\u00e9"
        },
        "operation": "createFromText"
      },
      "credentials": {
        "googleDriveOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 3
    },
    {
      "id": "dbdced1c-ba11-4216-a851-5b3b74c0dd20",
      "name": "Update Scraped Status in Sheets",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        3120,
        -64
      ],
      "parameters": {
        "columns": {
          "value": {
            "URL": "={{ $('Loop Over URLs').item.json.URL }}",
            "Scraped": "OK"
          },
          "schema": [
            {
              "id": "URL",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "URL",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Scraped",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Scraped",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "row_number",
              "type": "number",
              "display": true,
              "removed": true,
              "readOnly": true,
              "required": false,
              "displayName": "row_number",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            }
          ],
          "mappingMode": "defineBelow",
          "matchingColumns": [
            "URL"
          ],
          "attemptToConvertTypes": false,
          "convertFieldsToString": false
        },
        "options": {},
        "operation": "update",
        "sheetName": {
          "__rl": true,
          "mode": "name",
          "value": "Page to doc"
        },
        "documentId": {
          "__rl": true,
          "mode": "url",
          "value": "={{ $('When Chat Message Received').item.json.chatInput }}"
        }
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.6,
      "alwaysOutputData": true
    },
    {
      "id": "e0833704-4d0a-4afc-90ea-1e8740cdbc0e",
      "name": "Transform Scraped Content",
      "type": "n8n-nodes-base.code",
      "onError": "continueRegularOutput",
      "position": [
        2624,
        -64
      ],
      "parameters": {
        "jsCode": "// Code for the n8n \"Code\" node\n// Cleans the markdown by removing links, URLs, and unwanted text\n\n// Get the markdown from the input item\nconst markdown = $input.item.json.data.markdown;\n\n// Function that cleans the markdown\nfunction cleanMarkdown(text) {\n  if (!text) return '';\n  \n  let cleaned = text;\n  \n  // 1. Remove \"Passer au contenu principal\" and \"Aller au contenu\" (case-insensitive)\n  cleaned = cleaned.replace(/passer au contenu principal/gi, '');\n  cleaned = cleaned.replace(/aller au contenu/gi, '');\n  \n  // 2. Convert markdown links [text](url) to plain text\n  // Keeps the text between [], removes the [] and the (url)\n  cleaned = cleaned.replace(/\\[([^\\]]+)\\]\\([^\\)]+\\)/g, '$1');\n  \n  // 3. Remove remaining brackets [] and keep their content\n  cleaned = cleaned.replace(/\\[([^\\]]+)\\]/g, '$1');\n  \n  // 4. Remove standalone URLs (http://, https://, www.)\n  cleaned = cleaned.replace(/https?:\\/\\/[^\\s)]+/g, '');\n  cleaned = cleaned.replace(/www\\.[^\\s)]+/g, '');\n  \n  // 5. Remove parentheses that contain leftover URLs\n  cleaned = cleaned.replace(/\\([^)]*(?:http|www)[^)]*\\)/g, '');\n  \n  // 6. Collapse multiple spaces created by the removals\n  cleaned = cleaned.replace(/  +/g, ' ');\n  \n  // 7. Collapse multiple blank lines\n  cleaned = cleaned.replace(/\\n{3,}/g, '\\n\\n');\n  \n  // 8. Trim whitespace at the start/end of each line\n  cleaned = cleaned.split('\\n').map(line => line.trim()).join('\\n');\n  \n  // 9. Trim whitespace at the start/end of the text\n  cleaned = cleaned.trim();\n  \n  return cleaned;\n}\n\n// Apply the cleaning\nconst cleanedMarkdown = cleanMarkdown(markdown);\n\n// IMPORTANT: return an ARRAY containing the item\n// This preserves the \"pairing\" with previous items\nreturn [{\n  json: {\n    markdown_clean: cleanedMarkdown,\n  }\n}];"
      },
      "typeVersion": 2,
      "alwaysOutputData": false
    },
    {
      "id": "5c6cc015-69bc-4471-bea7-d6d81097729f",
      "name": "Loop Over URLs",
      "type": "n8n-nodes-base.splitInBatches",
      "position": [
        2080,
        -64
      ],
      "parameters": {
        "options": {}
      },
      "typeVersion": 3
    },
    {
      "id": "52c112a0-0e92-4c56-8d24-d8eb6ece74b6",
      "name": "Sticky Note18",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1664,
        208
      ],
      "parameters": {
        "color": 4,
        "width": 1120,
        "height": 144,
        "content": "# Google Sheets template \n\n## https://docs.google.com/spreadsheets/d/1vgNAV6P3cvBtTUax1rKrhzCLmEBAQu5sgeUxbU5_--0"
      },
      "typeVersion": 1
    }
  ],
  "connections": {
    "Loop Over URLs": {
      "main": [
        [],
        [
          {
            "node": "Scrape URL with Firecrawl",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Filter Non-Empty Rows": {
      "main": [
        [
          {
            "node": "Loop Over URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Read URLs from Sheets": {
      "main": [
        [
          {
            "node": "Filter Non-Empty Rows",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Scrape URL with Firecrawl": {
      "main": [
        [
          {
            "node": "Transform Scraped Content",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Loop Over URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Transform Scraped Content": {
      "main": [
        [
          {
            "node": "Save Markdown to Google Drive",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "When Chat Message Received": {
      "main": [
        [
          {
            "node": "Read URLs from Sheets",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Save Markdown to Google Drive": {
      "main": [
        [
          {
            "node": "Update Scraped Status in Sheets",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Update Scraped Status in Sheets": {
      "main": [
        [
          {
            "node": "Loop Over URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
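
To see what the 'Transform Scraped Content' node does to a page before it reaches Google Drive, the cleaning steps can be tried outside n8n. The sketch below copies the node's regular expressions into a plain Node.js function; the sample input is hypothetical and is not taken from a real Firecrawl response.

```javascript
// Standalone copy of the cleaning steps from the "Transform Scraped Content" node.
// Runs in plain Node.js, so it does not use n8n globals such as $input.
function cleanMarkdown(text) {
  if (!text) return '';
  let cleaned = text;
  // Remove French "skip to content" link labels (case-insensitive).
  cleaned = cleaned.replace(/passer au contenu principal/gi, '');
  cleaned = cleaned.replace(/aller au contenu/gi, '');
  // Turn [text](url) into text, then unwrap any leftover [text].
  cleaned = cleaned.replace(/\[([^\]]+)\]\([^\)]+\)/g, '$1');
  cleaned = cleaned.replace(/\[([^\]]+)\]/g, '$1');
  // Drop standalone URLs and parentheses that still contain one.
  cleaned = cleaned.replace(/https?:\/\/[^\s)]+/g, '');
  cleaned = cleaned.replace(/www\.[^\s)]+/g, '');
  cleaned = cleaned.replace(/\([^)]*(?:http|www)[^)]*\)/g, '');
  // Collapse runs of spaces and blank lines, then trim each line and the whole text.
  cleaned = cleaned.replace(/  +/g, ' ');
  cleaned = cleaned.replace(/\n{3,}/g, '\n\n');
  cleaned = cleaned.split('\n').map(line => line.trim()).join('\n');
  return cleaned.trim();
}

// Hypothetical sample input for illustration only.
const sample = [
  'Aller au contenu',
  '',
  '# Pricing',
  '',
  'See [our plans](https://example.com/plans) or visit https://example.com today.',
].join('\n');

console.log(cleanMarkdown(sample));
```

For this sample the function prints the heading "# Pricing", a blank line, and "See our plans or visit today." Note that link targets are discarded entirely, so adjust steps 2 to 5 if you need to keep URLs in the saved files.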

Credentials you'll need

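Each integration node will prompt for credentials when you import. We strip credential IDs before publishing, so you'll add your own. This workflow uses three credential types: Firecrawl API, Google Sheets OAuth2, and Google Drive OAuth2.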
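
If you want a quick checklist of which nodes to open after import, a short script can walk the exported JSON and list every node that carries a `credentials` block. This is a minimal sketch: the helper name `nodesNeedingCredentials` is hypothetical, and the `workflow` object is a trimmed excerpt of the JSON above, not the full export.

```javascript
// Hypothetical helper: list nodes in an exported n8n workflow that reference credentials.
function nodesNeedingCredentials(workflow) {
  return (workflow.nodes || [])
    .filter(node => node.credentials && Object.keys(node.credentials).length > 0)
    .map(node => ({
      node: node.name,
      credentialTypes: Object.keys(node.credentials),
    }));
}

// Trimmed excerpt of the workflow JSON (names and credential keys copied from it).
const workflow = {
  nodes: [
    { name: 'Sticky Note', type: 'n8n-nodes-base.stickyNote' },
    {
      name: 'Scrape URL with Firecrawl',
      credentials: { firecrawlApi: { name: '<your credential>' } },
    },
    {
      name: 'Read URLs from Sheets',
      credentials: { googleSheetsOAuth2Api: { name: '<your credential>' } },
    },
    {
      name: 'Save Markdown to Google Drive',
      credentials: { googleDriveOAuth2Api: { name: '<your credential>' } },
    },
    {
      name: 'Update Scraped Status in Sheets',
      credentials: { googleSheetsOAuth2Api: { name: '<your credential>' } },
    },
  ],
};

for (const entry of nodesNeedingCredentials(workflow)) {
  console.log(`${entry.node}: ${entry.credentialTypes.join(', ')}`);
}
```

To run it against the real file, you could replace the excerpt with `JSON.parse(fs.readFileSync(path, 'utf8'))`, where `path` points to wherever you saved the download.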

About this workflow

Source: https://n8n.io/workflows/7384/ (original creator credit).
