Extract Links and URLs From PDF Documents Using PDF.co

By Mauricio Perera (@rckflr) on n8n.io

This workflow allows you to extract all links (URLs) contained in a PDF file by converting it to HTML via PDF.co and then extracting the URLs present in the resulting HTML.

Category: Web Scraping · Trigger: Event · Nodes: 10 · Complexity: ★★★★☆
Integrations: Form Trigger, n8n-nodes-pdfco (PDF.co), HTTP Request

This workflow corresponds to n8n.io template #7031 — we link there as the canonical source.

This workflow follows the Form Trigger → HTTP Request recipe pattern.

The workflow JSON

Copy the full n8n JSON below, paste it into a new n8n workflow, add your credentials, and activate it.

{
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "nodes": [
    {
      "id": "f6e71b74-1ecb-43e8-baa2-bf05536d01b7",
      "name": "Load PDF",
      "type": "n8n-nodes-base.formTrigger",
      "position": [
        -2224,
        -384
      ],
      "parameters": {
        "options": {},
        "formTitle": "pdf",
        "formFields": {
          "values": [
            {
              "fieldType": "file",
              "fieldLabel": "data",
              "multipleFiles": false,
              "acceptFileTypes": ".pdf"
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "f24dd98b-b3c4-47f1-8345-10097e53803d",
      "name": "Upload",
      "type": "n8n-nodes-pdfco.PDFco Api",
      "position": [
        -2016,
        -384
      ],
      "parameters": {
        "name": "test",
        "operation": "Upload File to PDF.co",
        "binaryData": true
      },
      "credentials": {
        "pdfcoApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "b354cde6-5354-4052-9a6c-d66c328a946f",
      "name": "PDF to HTML",
      "type": "n8n-nodes-pdfco.PDFco Api",
      "position": [
        -1776,
        -384
      ],
      "parameters": {
        "url": "={{ $json.url }}",
        "operation": "Convert from PDF",
        "advancedOptions": {}
      },
      "credentials": {
        "pdfcoApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "e15b5c0d-5a46-4faa-828f-25e56cfce322",
      "name": "Get HTML",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        -1568,
        -384
      ],
      "parameters": {
        "url": "={{ $json.url }}",
        "options": {}
      },
      "typeVersion": 4.2
    },
    {
      "id": "73506c94-6265-4d89-b386-e908285d14e0",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -2288,
        -448
      ],
      "parameters": {
        "width": 208,
        "height": 240,
        "content": "## Load PDF\n"
      },
      "typeVersion": 1
    },
    {
      "id": "6d23ab8a-5bae-4317-b73e-fb1b2ba8ff16",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -2080,
        -448
      ],
      "parameters": {
        "color": 2,
        "height": 240,
        "content": "## Upload to PDF.co\n"
      },
      "typeVersion": 1
    },
    {
      "id": "72be2279-3028-4c24-8973-00879cff375a",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1840,
        -448
      ],
      "parameters": {
        "color": 4,
        "width": 224,
        "height": 240,
        "content": "## PDF to HTML"
      },
      "typeVersion": 1
    },
    {
      "id": "cebf4aeb-549c-4c9e-84eb-41d880834fb5",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1616,
        -448
      ],
      "parameters": {
        "width": 192,
        "height": 240,
        "content": "## Get HTML"
      },
      "typeVersion": 1
    },
    {
      "id": "8f6d9763-dece-45f6-a78b-1b5f6891f2fa",
      "name": "Code1",
      "type": "n8n-nodes-base.code",
      "position": [
        -1360,
        -384
      ],
      "parameters": {
        "jsCode": "// Iterate over all items entering the node\nconst resultados = [];\n\nfor (const item of $input.all()) {\n  const texto = item.json.data || '';\n  // Regex for URLs (http, https, www)\n  const regexUrl = /(https?:\\/\\/[^\\s]+)|(www\\.[^\\s]+)/gi;\n  \n  // Extract URLs; if there are none, the result is []\n  const urls = texto.match(regexUrl) || [];\n  \n  // For each URL found, create a new item containing the URL\n  for (const url of urls) {\n    resultados.push({ json: { url } });\n  }\n}\n\n// Return an array of objects with the extracted URLs\nreturn resultados;\n"
      },
      "typeVersion": 2
    },
    {
      "id": "0c49f98f-0b3c-4c47-ad34-b60b02c5f3a5",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1424,
        -448
      ],
      "parameters": {
        "color": 5,
        "width": 208,
        "height": 240,
        "content": "## Get URLs\n"
      },
      "typeVersion": 1
    }
  ],
  "connections": {
    "Upload": {
      "main": [
        [
          {
            "node": "PDF to HTML",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Get HTML": {
      "main": [
        [
          {
            "node": "Code1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Load PDF": {
      "main": [
        [
          {
            "node": "Upload",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "PDF to HTML": {
      "main": [
        [
          {
            "node": "Get HTML",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
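
The connections map in the JSON above lists nodes in no particular order, so the actual execution sequence is easy to misread. As a rough check before importing, the chain can be walked from the one node that nothing points to. This is a minimal sketch that assumes a single unbranched chain, as in this workflow; executionOrder is a hypothetical helper name, not part of n8n.

```javascript
// Minimal sketch: derive the linear execution order from an n8n
// workflow's `connections` map. Assumes one unbranched chain;
// `executionOrder` is a hypothetical helper, not an n8n API.
function executionOrder(connections) {
  // Collect every node that appears as the target of some connection.
  const targets = new Set();
  for (const outputs of Object.values(connections)) {
    for (const branch of outputs.main || []) {
      for (const link of branch || []) targets.add(link.node);
    }
  }

  // The start node has outgoing links but is never a target.
  const start = Object.keys(connections).find((name) => !targets.has(name));

  const order = [];
  const seen = new Set();
  let current = start;
  while (current !== undefined && !seen.has(current)) {
    order.push(current);
    seen.add(current);
    // Follow the first link of the first output, if there is one.
    const next = connections[current]?.main?.[0]?.[0];
    current = next ? next.node : undefined;
  }
  return order;
}

// Connections copied from the workflow JSON above.
const connections = {
  "Upload": { main: [[{ node: "PDF to HTML", type: "main", index: 0 }]] },
  "Get HTML": { main: [[{ node: "Code1", type: "main", index: 0 }]] },
  "Load PDF": { main: [[{ node: "Upload", type: "main", index: 0 }]] },
  "PDF to HTML": { main: [[{ node: "Get HTML", type: "main", index: 0 }]] },
};

console.log(executionOrder(connections).join(" -> "));
// Load PDF -> Upload -> PDF to HTML -> Get HTML -> Code1
```

The five sticky notes do not appear in connections at all; they only annotate the canvas.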

Credentials you'll need

In this workflow only the two PDF.co nodes (Upload and PDF to HTML) reference a credential, of type pdfcoApi; n8n will prompt for it when you import. Credential IDs are stripped before publishing, so you add your own.
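
To see which nodes will ask for a credential, the exported JSON can be scanned for nodes that carry a credentials key. The sketch below is illustrative only: credentialsByNode is a hypothetical helper, and the workflow object is a trimmed copy of the export above with just the fields the helper reads.

```javascript
// Sketch: map each node name to the credential types it references.
// `credentialsByNode` is a hypothetical helper, not an n8n API.
function credentialsByNode(workflow) {
  const result = {};
  for (const node of workflow.nodes || []) {
    if (node.credentials) {
      result[node.name] = Object.keys(node.credentials);
    }
  }
  return result;
}

// Trimmed copy of the workflow export (sticky notes omitted).
const workflow = {
  nodes: [
    { name: "Load PDF", type: "n8n-nodes-base.formTrigger" },
    {
      name: "Upload",
      type: "n8n-nodes-pdfco.PDFco Api",
      credentials: { pdfcoApi: { name: "<your credential>" } },
    },
    {
      name: "PDF to HTML",
      type: "n8n-nodes-pdfco.PDFco Api",
      credentials: { pdfcoApi: { name: "<your credential>" } },
    },
    { name: "Get HTML", type: "n8n-nodes-base.httpRequest" },
    { name: "Code1", type: "n8n-nodes-base.code" },
  ],
};

// Prints the two PDF.co nodes, each mapped to ["pdfcoApi"].
console.log(credentialsByNode(workflow));
```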

About this workflow

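The workflow runs five steps in sequence. The Load PDF form trigger accepts a single .pdf upload. Upload sends the file to PDF.co, and PDF to HTML converts the uploaded file. Get HTML then downloads the converted document over HTTP. Finally, Code1 scans the returned text with a regular expression and emits one item per URL found (matches starting with http://, https://, or www.).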
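
The pattern in the Code1 node stops a match only at whitespace, so on HTML input a match can run past the URL into the closing quote and the markup that follows it. The sketch below shows one possible stricter variant. It is not the published workflow's code: extractUrls is a hypothetical helper, and the choices to strip trailing punctuation and to drop duplicates are assumptions about what a caller would want.

```javascript
// Sketch of a stricter extractor for HTML input. `extractUrls` is a
// hypothetical helper, not part of the published workflow.
function extractUrls(html) {
  // Stop at whitespace, quotes, and angle brackets so attribute
  // delimiters and following tags are not captured with the URL.
  const pattern = /(?:https?:\/\/|www\.)[^\s"'<>]+/gi;
  const seen = new Set();
  const urls = [];
  for (const match of html.match(pattern) || []) {
    // Assumption: trailing sentence punctuation is not part of the URL.
    const url = match.replace(/[.,;:!?)]+$/, "");
    if (url && !seen.has(url)) {
      seen.add(url);
      urls.push(url);
    }
  }
  return urls;
}

const sample =
  '<p>See <a href="https://example.com/docs">https://example.com/docs</a>, ' +
  "or www.example.org.</p>";

console.log(extractUrls(sample));
// [ 'https://example.com/docs', 'www.example.org' ]
```

Inside an n8n Code node the same function could be applied to item.json.data for each input item, as Code1 does. Stripping a trailing closing parenthesis will truncate URLs that legitimately end in one, so that rule may need adjusting for some documents.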

Source: https://n8n.io/workflows/7031/ (original creator credit).
