
Extract Data from Non-Native Services

Original n8n title: Pulling Data From Services That n8n Doesn't Have a Pre-Built Integration For

Pulling data from services that n8n doesn't have a pre-built integration for. Uses manualTrigger, httpRequest, itemLists, htmlExtract, set, if, and stickyNote. Manual trigger; 14 nodes (including 4 sticky notes).

Web Scraping · Trigger: Manual · Nodes: 14 · Complexity: ★★★★☆ · Integrations: Item Lists, HTML Extract, HTTP Request

This workflow follows the HTTP Request → Item Lists pattern: an HTTP Request node fetches the data, and an Item Lists node splits the response body into individual items.
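The split performed by the "Item Lists - Create Items from Body" node (field to split out: body) can be sketched in plain TypeScript. This is a minimal illustration under stated assumptions, not n8n's implementation:

- `Item`, `FullResponse` and `splitOutField` are hypothetical names.
- The full-response shape is simplified to `body` and `statusCode`.
- The handling of primitive array elements is an assumption of the sketch.

```typescript
// Hypothetical, simplified model of an n8n item: each item wraps one JSON object.
type Item = { json: Record<string, unknown> };

// Simplified shape of an HTTP Request result with "fullResponse" enabled
// (the real response also carries headers and a status message).
type FullResponse = { body: unknown; statusCode: number };

// Turn each item whose `field` holds an array into one item per element,
// similar in spirit to Item Lists' "split out items" operation.
function splitOutField(items: Item[], field: string): Item[] {
  const out: Item[] = [];
  for (const item of items) {
    const value = item.json[field];
    if (!Array.isArray(value)) continue; // this sketch ignores non-array fields
    for (const element of value) {
      // Objects become the new item's JSON; primitives are wrapped under the
      // field name (an assumption of this sketch, not documented node behaviour).
      const json: Record<string, unknown> =
        typeof element === "object" && element !== null && !Array.isArray(element)
          ? (element as Record<string, unknown>)
          : { [field]: element };
      out.push({ json });
    }
  }
  return out;
}

// Usage with mock data shaped like the jsonplaceholder /albums response.
const response: FullResponse = {
  statusCode: 200,
  body: [
    { userId: 1, id: 1, title: "Sample album A" },
    { userId: 1, id: 2, title: "Sample album B" },
  ],
};
const albums = splitOutField([{ json: response }], "body");
console.log(albums.length); // 2
```

Each album ends up as its own item, which is why downstream nodes can then process albums one at a time.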

The workflow JSON

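Copy the full n8n JSON below and paste it into a new n8n workflow. No credentials are needed, since all three requests go to public endpoints. Because the workflow starts from a manual trigger, run it with the Execute Workflow button instead of activating it.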

{
  "nodes": [
    {
      "id": "25ac6cda-31fb-474a-b6b6-083ec03b9273",
      "name": "On clicking 'execute'",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        925,
        285
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "93eaee43-7a39-4c83-aeaa-9ca14d0f4b4b",
      "name": "Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        380,
        240
      ],
      "parameters": {
        "width": 440,
        "height": 200,
        "content": "## HTTP Request\n### This workflow shows the most common use cases of the HTTP Request node and how to handle its output.\n\n\n### Click the `Execute Workflow` button and double-click the nodes to see their input and output items."
      },
      "typeVersion": 1
    },
    {
      "id": "3ccdc45b-aae1-4760-b45e-5b8dca2a9fcf",
      "name": "Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1280,
        480
      ],
      "parameters": {
        "width": 986.3743856726365,
        "height": 460.847917534361,
        "content": "## 3. Handle Pagination\n### Sometimes you need to make the same request multiple times to get all the data you need (pagination).\n\n### The pagination process goes as follows:\n### 1. Loop through the pages of the input source (`HTTP Request` node named \"HTTP Request - Get my Stars\")\n### 2. Increment the page at the end of each loop (done with the `Set` node named \"Set - Increment Page\")\n### 3. Stop looping when there are no pages left (checked at the `If` node named \"If - Are we finished?\")\n\n\n\n"
      },
      "typeVersion": 1
    },
    {
      "id": "af19bb6d-5f0a-41ca-93b2-dbd27c3fd07e",
      "name": "Set",
      "type": "n8n-nodes-base.set",
      "position": [
        1345,
        725
      ],
      "parameters": {
        "values": {
          "number": [
            {
              "name": "page",
              "value": 1
            },
            {
              "name": "perpage",
              "value": 15
            }
          ],
          "string": [
            {
              "name": "githubUser",
              "value": "that-one-tom"
            }
          ]
        },
        "options": {}
      },
      "typeVersion": 1
    },
    {
      "id": "dad6055d-e06b-4f8c-ab90-deb196fce277",
      "name": "Note6",
      "type": "n8n-nodes-base.stickyNote",
      "disabled": true,
      "position": [
        1280,
        180
      ],
      "parameters": {
        "width": 680,
        "height": 280,
        "content": "## 2. Data Scraping\n### In this example we fetch a random Wikipedia page using the `HTTP Request` node and then use the `HTML Extract` node to parse out the article title."
      },
      "typeVersion": 1
    },
    {
      "id": "a7d4b9db-4d38-4b8d-9585-fe65c379e381",
      "name": "Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1280,
        -120
      ],
      "parameters": {
        "width": 500,
        "height": 280,
        "content": "## 1. Split into items\n### In this example, we take the body from an `HTTP Request` node and split it out into items that are easier to manage."
      },
      "typeVersion": 1
    },
    {
      "id": "d8402820-fa72-4957-8cf6-432f928ae799",
      "name": "Item Lists - Create Items from Body",
      "type": "n8n-nodes-base.itemLists",
      "notes": "Create Items from Body",
      "position": [
        1525,
        -15
      ],
      "parameters": {
        "options": {},
        "fieldToSplitOut": "body"
      },
      "notesInFlow": false,
      "typeVersion": 1
    },
    {
      "id": "598939cd-e4c0-4a90-bd1f-f2b13ccbe072",
      "name": "HTML Extract - Extract Article Title",
      "type": "n8n-nodes-base.htmlExtract",
      "position": [
        1505,
        285
      ],
      "parameters": {
        "options": {},
        "sourceData": "binary",
        "extractionValues": {
          "values": [
            {
              "key": "ArticleTitle",
              "cssSelector": "#firstHeading"
            }
          ]
        }
      },
      "typeVersion": 1
    },
    {
      "id": "1c9b609c-5e41-4444-ade7-e1069943c904",
      "name": "Item Lists - Fetch Body",
      "type": "n8n-nodes-base.itemLists",
      "position": [
        1705,
        725
      ],
      "parameters": {
        "options": {},
        "fieldToSplitOut": "body"
      },
      "typeVersion": 1,
      "alwaysOutputData": true
    },
    {
      "id": "15dfab42-440c-4d06-9ba2-b7b17371d009",
      "name": "If - Are we finished?",
      "type": "n8n-nodes-base.if",
      "position": [
        1885,
        725
      ],
      "parameters": {
        "conditions": {
          "string": [
            {
              "value1": "={{$node[\"HTTP Request - Get my Stars\"].json[\"body\"]}}",
              "operation": "isEmpty"
            }
          ]
        }
      },
      "executeOnce": true,
      "typeVersion": 1
    },
    {
      "id": "ba6e6904-6749-4ea2-84c1-8409b795bcf5",
      "name": "Set - Increment Page",
      "type": "n8n-nodes-base.set",
      "position": [
        2105,
        745
      ],
      "parameters": {
        "values": {
          "string": [
            {
              "name": "page",
              "value": "={{$node[\"Set\"].json[\"page\"]++}}"
            }
          ]
        },
        "options": {}
      },
      "executeOnce": true,
      "typeVersion": 1
    },
    {
      "id": "9f0df828-27d7-4994-8934-c8fe88af8566",
      "name": "HTTP Request - Get Mock Albums",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        1345,
        -15
      ],
      "parameters": {
        "url": "https://jsonplaceholder.typicode.com/albums",
        "options": {
          "response": {
            "response": {
              "fullResponse": true
            }
          }
        }
      },
      "typeVersion": 3
    },
    {
      "id": "cbc64010-f6f4-4c35-b4e2-9e1d4a748308",
      "name": "HTTP Request - Get Wikipedia Page",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        1325,
        285
      ],
      "parameters": {
        "url": "https://en.wikipedia.org/wiki/Special:Random",
        "options": {
          "redirect": {
            "redirect": {
              "followRedirects": true
            }
          },
          "response": {
            "response": {
              "responseFormat": "file"
            }
          }
        }
      },
      "typeVersion": 3
    },
    {
      "id": "a1a19268-0be8-4379-99a4-4285c68691b5",
      "name": "HTTP Request - Get my Stars",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        1525,
        725
      ],
      "parameters": {
        "url": "=https://api.github.com/users/{{$node[\"Set\"].json[\"githubUser\"]}}/starred",
        "options": {
          "response": {
            "response": {
              "fullResponse": true
            }
          }
        },
        "sendQuery": true,
        "queryParameters": {
          "parameters": [
            {
              "name": "per_page",
              "value": "={{$node[\"Set\"].json[\"perpage\"]}}"
            },
            {
              "name": "page",
              "value": "={{$node[\"Set\"].json[\"page\"]}}"
            }
          ]
        }
      },
      "typeVersion": 3
    }
  ],
  "connections": {
    "Set": {
      "main": [
        [
          {
            "node": "HTTP Request - Get my Stars",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Set - Increment Page": {
      "main": [
        [
          {
            "node": "HTTP Request - Get my Stars",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "If - Are we finished?": {
      "main": [
        null,
        [
          {
            "node": "Set - Increment Page",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "On clicking 'execute'": {
      "main": [
        [
          {
            "node": "Set",
            "type": "main",
            "index": 0
          },
          {
            "node": "HTTP Request - Get Mock Albums",
            "type": "main",
            "index": 0
          },
          {
            "node": "HTTP Request - Get Wikipedia Page",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Item Lists - Fetch Body": {
      "main": [
        [
          {
            "node": "If - Are we finished?",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "HTTP Request - Get my Stars": {
      "main": [
        [
          {
            "node": "Item Lists - Fetch Body",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "HTTP Request - Get Mock Albums": {
      "main": [
        [
          {
            "node": "Item Lists - Create Items from Body",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "HTTP Request - Get Wikipedia Page": {
      "main": [
        [
          {
            "node": "HTML Extract - Extract Article Title",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
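Section 3 of the workflow loops until GitHub returns an empty page. The chain is: Set, then "HTTP Request - Get my Stars", then "Item Lists - Fetch Body", then "If - Are we finished?", then "Set - Increment Page". The same control flow can be sketched as a plain loop, under these assumptions:

- `fetchAllStars`, `FetchJson` and the `maxPages` safety cap are hypothetical additions.
- The fetcher is injected so the loop can be exercised offline.
- The Item Lists split is folded into working directly on the returned array.
- The loop starts at page 1, GitHub's first page.

```typescript
// Hypothetical fetcher: resolves a URL to the parsed JSON array in the response body.
type FetchJson = (url: string) => Promise<unknown[]>;

// Loop equivalent to the workflow's pagination section.
async function fetchAllStars(
  githubUser: string,
  perPage: number,
  fetchJson: FetchJson,
  maxPages = 100, // safety cap; the workflow itself has no such limit
): Promise<unknown[]> {
  const stars: unknown[] = [];
  for (let page = 1; page <= maxPages; page++) {
    // Same URL and query parameters (per_page, page) as "HTTP Request - Get my Stars".
    const url =
      `https://api.github.com/users/${encodeURIComponent(githubUser)}/starred` +
      `?per_page=${perPage}&page=${page}`;
    const body = await fetchJson(url);
    // "If - Are we finished?": an empty body means there are no pages left.
    if (body.length === 0) break;
    stars.push(...body);
    // "Set - Increment Page" corresponds to the loop's page++.
  }
  return stars;
}
```

With network access, a fetcher such as `async (url) => (await fetch(url)).json()` could be passed in. Unauthenticated GitHub API calls are limited to 60 requests per hour, so long star lists may need an authenticated request.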

How this works

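This workflow shows how to pull data from websites and APIs that have no dedicated n8n node by calling them directly with the HTTP Request node. It bundles three independent examples behind one manual trigger:

(1) It fetches a JSON array of mock albums from jsonplaceholder.typicode.com with the full response enabled, then uses Item Lists to split the body field into one item per album.

(2) It fetches a random Wikipedia article as a file and uses HTML Extract with the CSS selector #firstHeading to pull out the article title.

(3) It requests a GitHub user's starred repositories 15 per page, incrementing the page query parameter until the API returns an empty page.

The same building blocks suit analysts, marketers, and developers who need quick access to content such as article titles, product details, or forum posts without writing a custom integration.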
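The "HTML Extract - Extract Article Title" step selects the element matching the CSS selector #firstHeading and returns its text. In the workflow, the page arrives as binary data (the HTTP Request node's response format is set to file), and the node does real HTML parsing. The sketch below approximates only the id-selector case, on a plain string, using regular expressions. Its assumptions:

- `extractTextById` is a hypothetical name.
- The id attribute is quoted.
- The element contains no nested element with the same tag name.
- HTML entities are left undecoded.

```typescript
// Return the text content of the first element whose id equals `id`
// (roughly what the CSS selector "#<id>" selects), or null if none is found.
function extractTextById(html: string, id: string): string | null {
  // Escape regex metacharacters in the id before embedding it in a pattern.
  const escapedId = id.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const openTag = new RegExp(
    `<([a-zA-Z][\\w-]*)[^>]*\\bid=["']${escapedId}["'][^>]*>`,
    "i",
  );
  const open = openTag.exec(html);
  const tagName = open?.[1];
  if (!open || !tagName) return null;

  // Find the matching closing tag (assumes no nested element with the same tag).
  const start = open.index + open[0].length;
  const closeTag = new RegExp(`</${tagName}\\s*>`, "gi");
  closeTag.lastIndex = start;
  const close = closeTag.exec(html);
  if (!close) return null;

  // Drop inner markup such as <span> wrappers and normalise whitespace.
  return html
    .slice(start, close.index)
    .replace(/<[^>]*>/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Usage on a simplified fragment resembling a Wikipedia article heading.
const sample =
  '<h1 id="firstHeading" class="firstHeading"><span class="mw-page-title-main">Example Article</span></h1>';
console.log(extractTextById(sample, "firstHeading")); // Example Article
```

A regex approach like this is only a stand-in; for anything beyond a single id lookup, a proper HTML parser (as used by the n8n node) is the safer choice.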

Use this approach for one-off data pulls from sources without a dedicated node, such as niche forums or legacy sites, where speed matters more than scalability. It is less suited to high-volume or real-time jobs: the workflow starts from a manual trigger, so it runs only when you click Execute Workflow. For recurring runs, replace the manual trigger with a Schedule Trigger. Common variations include chaining several HTML Extract nodes to scrape linked pages in more depth, or writing the extracted data directly to Google Sheets.
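Chaining two HTML Extract nodes typically works in three steps: extract links from an index page, fetch each linked page, then extract a field from it. Below is a minimal sketch of that flow, with these assumptions:

- A text fetcher is injected, and parsing is regex-based.
- `extractLinks`, `extractTitle`, `resolveHref` and `scrapeTitles` are hypothetical names.
- Only absolute and root-relative links are resolved.
- Each page's `<title>` stands in for whatever field the second extraction would target.

```typescript
// Hypothetical fetcher: resolves a URL to the page's HTML as text.
type FetchText = (url: string) => Promise<string>;

// First extraction: href values of <a> tags whose class list includes `linkClass`.
function extractLinks(html: string, linkClass: string): string[] {
  const links: string[] = [];
  for (const tag of html.match(/<a\b[^>]*>/gi) ?? []) {
    const classes = /\bclass=["']([^"']*)["']/i.exec(tag)?.[1] ?? "";
    const href = /\bhref=["']([^"']*)["']/i.exec(tag)?.[1];
    if (href && classes.split(/\s+/).indexOf(linkClass) !== -1) links.push(href);
  }
  return links;
}

// Second extraction: the trimmed text of the page's <title> element.
function extractTitle(html: string): string | null {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match?.[1]?.trim() ?? null;
}

// Resolve absolute and root-relative hrefs only (a deliberate simplification).
function resolveHref(href: string, base: string): string {
  if (/^https?:\/\//i.test(href)) return href;
  const origin = /^https?:\/\/[^/]+/i.exec(base)?.[0] ?? "";
  return href.startsWith("/") ? origin + href : `${origin}/${href}`;
}

// Index page -> links -> each linked page -> title, fetched one at a time.
async function scrapeTitles(
  indexUrl: string,
  linkClass: string,
  fetchText: FetchText,
): Promise<(string | null)[]> {
  const index = await fetchText(indexUrl);
  const titles: (string | null)[] = [];
  for (const href of extractLinks(index, linkClass)) {
    titles.push(extractTitle(await fetchText(resolveHref(href, indexUrl))));
  }
  return titles;
}
```

Fetching pages one at a time keeps the request rate low. In n8n itself, the linked pages would instead flow through the second HTTP Request and HTML Extract nodes as separate items.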

About this workflow


Source: https://github.com/Zie619/n8n-workflows (original creator credit).


Related workflows

Workflows that share integrations, category, or trigger type with this one. All free to copy and import.

Web Scraping: Workflow 1748. Uses itemLists, htmlExtract, httpRequest. Event-driven trigger; 14 nodes. Integrations: Item Lists, HTML Extract, HTTP Request.

Web Scraping: Create an RSS Feed Based on a Website's Content. Uses manualTrigger, itemLists, htmlExtract, httpRequest. Event-driven trigger; 12 nodes. Integrations: Item Lists, HTML Extract, HTTP Request, +1 more.

Web Scraping: RSS Feed for ARD Audiothek Podcasts. Uses manualTrigger, httpRequest, htmlExtract, itemLists. Event-driven trigger; 11 nodes. Integrations: HTTP Request, HTML Extract, Item Lists.

Web Scraping: extract_swifts. Uses manualTrigger, httpRequest, htmlExtract, splitInBatches. Event-driven trigger; 23 nodes. Integrations: HTTP Request, HTML Extract, MongoDB, +5 more.

Web Scraping: FL Hazards -> OpenWebUI KB (hazards) - FIXED. Uses itemLists, httpRequest. Event-driven trigger; 11 nodes. Integrations: Item Lists, HTTP Request.