
Extract and email compressed files

Original n8n title: Extract

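Scheduled n8n workflow that downloads a compressed PRF accident CSV, extracts it to disk, and emails an alert when a step fails. Uses compression, httpRequest, readWriteFile, emailSend. Scheduled trigger; 10 nodes.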

Category: Web Scraping · Trigger: Cron / scheduled · Nodes: 10 · Complexity: ★★★★☆ · Integrations: Compression, HTTP Request, Read Write File, Email Send


The workflow JSON

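Copy the full n8n JSON below, paste it into a new n8n workflow, add your SMTP credentials, and activate it. Before activating, replace the values hard-coded by the original author: the output path in Read/Write Files from Disk, the sender and recipient addresses in both Email Send nodes, and the target of Execute Workflow ([Load] PRF), whose workflow ID is unlikely to exist in your instance.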

{
  "nodes": [
    {
      "parameters": {
        "outputPrefix": "=data"
      },
      "type": "n8n-nodes-base.compression",
      "typeVersion": 1.1,
      "position": [
        -1008,
        192
      ],
      "id": "e5da65c0-0303-4cf3-9a25-29d7208babad",
      "name": "Compression",
      "onError": "continueErrorOutput"
    },
    {
      "parameters": {
        "url": "={{ $json.href_download }}",
        "options": {
          "response": {
            "response": {}
          }
        }
      },
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.2,
      "position": [
        -1232,
        192
      ],
      "id": "5328ca5d-dde7-448e-8c8f-c55e170480c5",
      "name": "Download arquivos",
      "onError": "continueErrorOutput"
    },
    {
      "parameters": {
        "jsCode": "// Utility function that normalizes a file name\nfunction sanitizeFileName(name) {\n  return name\n    .normalize('NFD') // decompose accented characters\n    .replace(/[\\u0300-\\u036f]/g, '') // strip the combining diacritical marks\n    .replace(/[^a-zA-Z0-9_-]+/g, '_') // replace spaces and symbols with _\n    .replace(/_+/g, '_') // collapse repeated underscores\n    .replace(/^_+|_+$/g, '') // trim leading and trailing underscores\n    .substring(0, 120); // cap the length (optional)\n}\n\n// Read the HTML from the upstream HTTP Request node (body or data)\nconst html = items[0].json.body || items[0].json.data || '';\n\nif (!html) {\n  return [{ json: { error: 'HTML not found. Connect the HTTP Request node directly to this node.' } }];\n}\n\n// Capture the label and Google Drive link of each accident CSV row\nconst pattern = /<tr[^>]*>\\s*<td[^>]*>([^<]*Documento CSV de Acidentes[^<]*)<\\/td>[\\s\\S]*?<a[^>]*href=\"(https:\\/\\/drive\\.google\\.com\\/file\\/[^\"]+)\"/gi;\n\nconst out = [];\nlet match;\n\nwhile ((match = pattern.exec(html)) !== null) {\n  const name = match[1].trim().replace(/\\s+/g, ' ');\n  const originalHref = match[2].replace(/\\/download$/, '');\n\n  // Rewrite the Drive file link as a direct-download URL when a file ID is present\n  const idMatch = originalHref.match(/\\/d\\/([^/]+)/);\n  const fileId = idMatch ? idMatch[1] : null;\n  const href_download = fileId\n    ? `https://drive.google.com/uc?export=download&id=${fileId}`\n    : originalHref;\n\n  out.push({\n    json: {\n      name_file: sanitizeFileName(name),\n      name_file_filter: name,\n      href_download\n    },\n  });\n}\n\nreturn out;\n"
      },
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [
        -1632,
        192
      ],
      "id": "cf9bcee7-73be-43e6-8293-ad4e6b4b2c42",
      "name": "format html",
      "alwaysOutputData": false
    },
    {
      "parameters": {
        "operation": "write",
        "fileName": "=/home/tcc/seguranca-publica-brasil/data_processing/dataset/prf/{{ $('format html').item.json.name_file }}.csv",
        "dataPropertyName": "data0",
        "options": {}
      },
      "type": "n8n-nodes-base.readWriteFile",
      "typeVersion": 1,
      "position": [
        -800,
        192
      ],
      "id": "6dbbc02b-6c0d-4599-ab68-89106c16f022",
      "name": "Read/Write Files from Disk"
    },
    {
      "parameters": {
        "url": "https://www.gov.br/prf/pt-br/acesso-a-informacao/dados-abertos/dados-abertos-da-prf",
        "options": {}
      },
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.2,
      "position": [
        -1856,
        192
      ],
      "id": "815d105b-fec9-42ec-a7e3-a3996c6ffb3d",
      "name": "HTTP Request",
      "alwaysOutputData": false,
      "onError": "continueErrorOutput"
    },
    {
      "parameters": {
        "conditions": {
          "options": {
            "caseSensitive": true,
            "leftValue": "",
            "typeValidation": "loose",
            "version": 2
          },
          "conditions": [
            {
              "id": "c0e33416-05d2-40b6-882c-6914e06834dc",
              "leftValue": "={{ $json.name_file_filter }}",
              "rightValue": "=Documento CSV de Acidentes 2025 (Agrupados por pessoa - Todas as causas e tipos de acidentes)",
              "operator": {
                "type": "string",
                "operation": "contains"
              }
            }
          ],
          "combinator": "or"
        },
        "looseTypeValidation": true,
        "options": {}
      },
      "type": "n8n-nodes-base.filter",
      "typeVersion": 2.2,
      "position": [
        -1424,
        192
      ],
      "id": "7321d639-d3f3-4559-893c-41c52627d54f",
      "name": "Filter year",
      "alwaysOutputData": false,
      "notesInFlow": false
    },
    {
      "parameters": {
        "workflowId": {
          "__rl": true,
          "value": "QNJCw9LgtRFbTSZW",
          "mode": "list",
          "cachedResultName": "[Load] PRF"
        },
        "workflowInputs": {
          "mappingMode": "defineBelow",
          "value": {},
          "matchingColumns": [],
          "schema": [],
          "attemptToConvertTypes": false,
          "convertFieldsToString": true
        },
        "options": {}
      },
      "type": "n8n-nodes-base.executeWorkflow",
      "typeVersion": 1.2,
      "position": [
        -592,
        192
      ],
      "id": "104ccf5f-1635-442d-a090-eb8ed8a4212e",
      "name": "Execute Workflow"
    },
    {
      "parameters": {
        "fromEmail": "daviandre.junkes@gmail.com",
        "toEmail": "daviandre.junkes@gmail.com",
        "subject": "[PRF] Extract bronze erro",
        "html": "=<!DOCTYPE html>\n<html lang=\"pt-BR\">\n<head>\n  <meta charset=\"UTF-8\">\n  <title>Erro no Workflow</title>\n</head>\n<body style=\"font-family: Arial, sans-serif; background-color: #f8f9fa; color: #333; padding: 20px;\">\n  <div style=\"max-width: 600px; margin: 0 auto; background: #fff; border-radius: 8px; box-shadow: 0 2px 5px rgba(0,0,0,0.1); padding: 20px;\">\n    <h2 style=\"color: #d93025;\">\u26a0\ufe0f Erro no Workflow do n8n</h2>\n\n    <p>Ocorreu um erro durante a execu\u00e7\u00e3o do workflow <strong>{{ $workflow.name }}</strong>.</p>\n\n    <p><strong>N\u00f3:</strong> Carga incremental<br>\n    <strong>ID da Execu\u00e7\u00e3o:</strong> {{ $execution.id }}<br>\n    <strong>Data:</strong> {{ new Date().toLocaleString(\"pt-BR\") }}</p>\n\n    <p><strong>Error:</strong> {{ $json.error }}</p>\n\n    <p>Confira a execu\u00e7\u00e3o completa no painel do n8n para mais detalhes.</p>\n\n    <p style=\"font-size: 12px; color: #777;\">\u2014 Alerta autom\u00e1tico do n8n</p>\n  </div>\n</body>\n</html>\n",
        "options": {}
      },
      "type": "n8n-nodes-base.emailSend",
      "typeVersion": 2.1,
      "position": [
        -1664,
        448
      ],
      "id": "8a28fb9f-296a-4c9d-9935-a536b223954e",
      "name": "Send email URL",
      "credentials": {
        "smtp": {
          "name": "<your credential>"
        }
      }
    },
    {
      "parameters": {
        "fromEmail": "daviandre.junkes@gmail.com",
        "toEmail": "daviandre.junkes@gmail.com",
        "subject": "[PRF] Extract bronze erro",
        "html": "=<!DOCTYPE html>\n<html lang=\"pt-BR\">\n<head>\n  <meta charset=\"UTF-8\">\n  <title>Erro no Workflow</title>\n</head>\n<body style=\"font-family: Arial, sans-serif; background-color: #f8f9fa; color: #333; padding: 20px;\">\n  <div style=\"max-width: 600px; margin: 0 auto; background: #fff; border-radius: 8px; box-shadow: 0 2px 5px rgba(0,0,0,0.1); padding: 20px;\">\n    <h2 style=\"color: #d93025;\">\u26a0\ufe0f Erro no Workflow do n8n</h2>\n\n    <p>Ocorreu um erro durante a execu\u00e7\u00e3o do workflow <strong>{{ $workflow.name }}</strong>.</p>\n\n    <p><strong>N\u00f3:</strong> Carga incremental<br>\n    <strong>ID da Execu\u00e7\u00e3o:</strong> {{ $execution.id }}<br>\n    <strong>Data:</strong> {{ new Date().toLocaleString(\"pt-BR\") }}</p>\n\n    <p><strong>Error:</strong> {{ $json.error }}</p>\n\n    <p>Confira a execu\u00e7\u00e3o completa no painel do n8n para mais detalhes.</p>\n\n    <p style=\"font-size: 12px; color: #777;\">\u2014 Alerta autom\u00e1tico do n8n</p>\n  </div>\n</body>\n</html>\n",
        "options": {}
      },
      "type": "n8n-nodes-base.emailSend",
      "typeVersion": 2.1,
      "position": [
        -816,
        480
      ],
      "id": "d88e5cdf-e87e-419c-bb0e-b9436ee1dceb",
      "name": "Send email DOWNLOAD",
      "credentials": {
        "smtp": {
          "name": "<your credential>"
        }
      }
    },
    {
      "parameters": {
        "rule": {
          "interval": [
            {
              "field": "hours",
              "hoursInterval": 3
            }
          ]
        }
      },
      "type": "n8n-nodes-base.scheduleTrigger",
      "typeVersion": 1.2,
      "position": [
        -2080,
        192
      ],
      "id": "b361ac35-24fd-49ef-a9ba-4eedfeea299d",
      "name": "Schedule Trigger"
    }
  ],
  "connections": {
    "Compression": {
      "main": [
        [
          {
            "node": "Read/Write Files from Disk",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Send email DOWNLOAD",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Download arquivos": {
      "main": [
        [
          {
            "node": "Compression",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Send email DOWNLOAD",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "format html": {
      "main": [
        [
          {
            "node": "Filter year",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Read/Write Files from Disk": {
      "main": [
        [
          {
            "node": "Execute Workflow",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "HTTP Request": {
      "main": [
        [
          {
            "node": "format html",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Send email URL",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Filter year": {
      "main": [
        [
          {
            "node": "Download arquivos",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Schedule Trigger": {
      "main": [
        [
          {
            "node": "HTTP Request",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  },
  "meta": {
    "templateCredsSetupCompleted": true
  }
}
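
The `format html` Code node is hard to read in its JSON-escaped form. The sketch below restates the same logic as plain JavaScript that can be run in Node.js outside n8n. The function name `extractCsvLinks`, the `sampleHtml` markup, and the `FAKE_FILE_ID` value are assumptions made for illustration; the real gov.br page may be structured differently.

```javascript
// Standalone restatement of the "format html" Code node logic (illustrative only).

// Turns a human-readable label into a safe file name.
function sanitizeFileName(name) {
  return name
    .normalize('NFD')                  // decompose accented characters
    .replace(/[\u0300-\u036f]/g, '')   // strip the combining diacritical marks
    .replace(/[^a-zA-Z0-9_-]+/g, '_')  // spaces and symbols become _
    .replace(/_+/g, '_')               // collapse repeated underscores
    .replace(/^_+|_+$/g, '')           // trim underscores at both ends
    .substring(0, 120);                // cap the length
}

// Hypothetical wrapper: finds accident-CSV rows and builds direct-download URLs.
function extractCsvLinks(html) {
  const pattern = /<tr[^>]*>\s*<td[^>]*>([^<]*Documento CSV de Acidentes[^<]*)<\/td>[\s\S]*?<a[^>]*href="(https:\/\/drive\.google\.com\/file\/[^"]+)"/gi;
  const out = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    const name = match[1].trim().replace(/\s+/g, ' ');
    const originalHref = match[2].replace(/\/download$/, '');
    const idMatch = originalHref.match(/\/d\/([^/]+)/);
    const fileId = idMatch ? idMatch[1] : null;
    const href_download = fileId
      ? `https://drive.google.com/uc?export=download&id=${fileId}`
      : originalHref;
    out.push({ name_file: sanitizeFileName(name), name_file_filter: name, href_download });
  }
  return out;
}

// Made-up markup that only mimics the row/link shape the regex expects.
const sampleHtml = `
<table>
  <tr><td>Documento CSV de Acidentes 2025 (Agrupados por pessoa - Todas as causas e tipos de acidentes)</td>
      <td><a href="https://drive.google.com/file/d/FAKE_FILE_ID/view">Baixar</a></td></tr>
</table>`;

const links = extractCsvLinks(sampleHtml);
console.log(links);
```

Because the extraction relies on a regular expression over raw HTML, a change to the page's table markup would likely yield an empty result rather than an error, so an empty `links` array is worth checking for.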

Credentials you'll need

In this workflow only the two Email Send nodes reference credentials (SMTP); n8n will prompt for them when you import. Credential IDs are stripped before publishing, so you add your own.
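
To check which nodes will ask for credentials before importing, the exported JSON can be inspected directly. The following is a minimal sketch; the helper name `nodesNeedingCredentials` is hypothetical, and the `workflow` object is a trimmed stand-in for the full JSON above rather than the complete export.

```javascript
// Lists the nodes in an n8n workflow export that carry a "credentials" entry.
function nodesNeedingCredentials(workflow) {
  return workflow.nodes
    .filter((node) => node.credentials && Object.keys(node.credentials).length > 0)
    .map((node) => ({ name: node.name, types: Object.keys(node.credentials) }));
}

// Trimmed, illustrative subset of the workflow JSON above.
const workflow = {
  nodes: [
    { name: 'HTTP Request', type: 'n8n-nodes-base.httpRequest' },
    {
      name: 'Send email URL',
      type: 'n8n-nodes-base.emailSend',
      credentials: { smtp: { name: '<your credential>' } },
    },
    {
      name: 'Send email DOWNLOAD',
      type: 'n8n-nodes-base.emailSend',
      credentials: { smtp: { name: '<your credential>' } },
    },
  ],
};

const needed = nodesNeedingCredentials(workflow);
console.log(needed);
```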


About this workflow

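Runs every 3 hours. The workflow fetches the PRF open-data page on gov.br, uses a Code node to extract the Google Drive links for rows labelled "Documento CSV de Acidentes", keeps only the 2025 "Agrupados por pessoa" file, downloads it, passes it through the Compression node, writes the resulting CSV to disk, and then calls the [Load] PRF sub-workflow. If the page request, the download, or the Compression step fails, an error alert is sent by email over SMTP. It has 10 nodes: Schedule Trigger, two HTTP Request nodes, Code, Filter, Compression, Read/Write Files from Disk, Execute Workflow, and two Email Send nodes.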
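
The order of the steps can be read from the `connections` object in the JSON. The sketch below walks it from the trigger, assuming (as is the case in this workflow) that output 0 of each node is the success path and that a second output, where present, is the error branch enabled by `onError: continueErrorOutput`. The function name `tracePipeline` is hypothetical, and the `connections` literal is a trimmed copy that omits the `type` and `index` fields.

```javascript
// Trimmed copy of the "connections" object from the workflow JSON above.
const connections = {
  'Schedule Trigger': { main: [[{ node: 'HTTP Request' }]] },
  'HTTP Request': { main: [[{ node: 'format html' }], [{ node: 'Send email URL' }]] },
  'format html': { main: [[{ node: 'Filter year' }]] },
  'Filter year': { main: [[{ node: 'Download arquivos' }]] },
  'Download arquivos': { main: [[{ node: 'Compression' }], [{ node: 'Send email DOWNLOAD' }]] },
  'Compression': { main: [[{ node: 'Read/Write Files from Disk' }], [{ node: 'Send email DOWNLOAD' }]] },
  'Read/Write Files from Disk': { main: [[{ node: 'Execute Workflow' }]] },
};

// Follows output 0 from the start node and records any second-output targets.
function tracePipeline(graph, start) {
  const successPath = [start];
  const errorBranches = {};
  const seen = new Set([start]);
  let current = start;
  while (graph[current]) {
    const [success = [], error = []] = graph[current].main;
    if (error.length > 0) {
      errorBranches[current] = error.map((target) => target.node);
    }
    const next = success[0] && success[0].node;
    if (!next || seen.has(next)) break; // stop at the end of the chain or on a cycle
    seen.add(next);
    successPath.push(next);
    current = next;
  }
  return { successPath, errorBranches };
}

const trace = tracePipeline(connections, 'Schedule Trigger');
console.log(trace.successPath.join(' -> '));
console.log(trace.errorBranches);
```

This only follows the first target of each output, which is enough for a linear workflow like this one but would miss fan-out in workflows where one output feeds several nodes.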

Source: https://github.com/JnksDavu/seguranca-publica-brasil/blob/f75d8f26df86b8a13e15c184297ce9bc03bcd78c/n8n/workflows/prf/extract.json (original creator credit).
