Crawl Initial

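crawl_initial crawls SaveTicker news and community posts with HTTP Request nodes and stores new items through a local API. Started by a manual trigger node; 18 nodes.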

Category: Web Scraping · Trigger: Event · Nodes: 18 · Complexity: ★★★★☆ · Uses: HTTP Request

The workflow JSON

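Copy the full n8n JSON below and paste it into a new n8n workflow. The HTTP Request nodes use no stored credentials, but the workflow expects a service at http://host.docker.internal:8001 that answers POST /exists/news and POST /insert/batch/news. Because it starts from a manual trigger, it runs when you execute it from the editor.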

{
  "name": "crawl_initial",
  "nodes": [
    {
      "parameters": {
        "jsCode": "// \uac01 \uc544\uc774\ud15c\uc5d0\uc11c \ud544\uc694\ud55c \ud544\ub4dc\ub9cc \ubf51\uc544\uc11c \uad6c\uc870\ud654\n// items[i].json.hits \ubc30\uc5f4\uc5d0\uc11c \uaebc\ub0c4\n\nconst results = [];\n\nconst hits = items[0].json.hits || [];\n\nfor (const h of hits) {\n  results.push({\n    json: {\n      source: 'hn_stock',            // \ub098\uc911\uc5d0 'naver_news' \ub4f1\uc73c\ub85c \ubcc0\uacbd\n      title: h.title || '',\n      url: h.url || '',\n      created_at: h.created_at || '',\n      crawled_at: new Date().toISOString()\n    }\n  });\n}\n\nreturn results;"
      },
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [
        592,
        320
      ],
      "id": "7e1d6665-6e4a-4c94-bb9b-067c4d641165",
      "name": "Code in JavaScript"
    },
    {
      "parameters": {},
      "type": "n8n-nodes-base.manualTrigger",
      "typeVersion": 1,
      "position": [
        -560,
        -64
      ],
      "id": "a336e52a-d2cd-4256-b2a2-08eba8486731",
      "name": "\ud06c\ub864\ub9c1 \ud2b8\ub9ac\uac70"
    },
    {
      "parameters": {
        "method": "POST",
        "url": "http://host.docker.internal:8001/insert/batch/news",
        "sendHeaders": true,
        "headerParameters": {
          "parameters": [
            {
              "name": "Content-Type",
              "value": "application/json"
            },
            {
              "name": "accept",
              "value": "application/json"
            }
          ]
        },
        "sendBody": true,
        "specifyBody": "json",
        "jsonBody": "={{ JSON.stringify($json) }}",
        "options": {}
      },
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.3,
      "position": [
        2528,
        0
      ],
      "id": "2ad5bdfd-36e9-40b8-804b-59f206f1003e",
      "name": "\ub514\ube44 \uc800\uc7a5"
    },
    {
      "parameters": {
        "jsCode": "// \ud604\uc7ac INPUT = \uc5ec\ub7ec \uac1c\uc758 item\n// \uadf8\uac78 \ubc30\uc5f4\ub85c \ubb36\uc5b4\uc11c items: [] \ud615\ud0dc\ub97c \ub9cc\ub4e0\ub2e4.\n\nconst all = $input.all();  // [ {json: {...}}, {json: {...}}, ... ]\n\n// json\ub9cc \ubf51\uc544\uc11c \ubc30\uc5f4\ub85c \ub9cc\ub4e4\uae30\nconst items = all.map(x => x.json);\n\n// Output\uc744 1\uac1c item\uc73c\ub85c\ub9cc \ubc18\ud658\nreturn [\n  {\n    json: {\n      items\n    }\n  }\n];"
      },
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [
        912,
        336
      ],
      "id": "61b85813-adde-438a-bb8e-0c93000a5032",
      "name": "json \ud569\uce58\uae30"
    },
    {
      "parameters": {
        "jsCode": "const all = $input.all();\n\nreturn [\n  {\n    json: {\n      items: all.map(e => e.json)\n    }\n  }\n];"
      },
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [
        2048,
        -224
      ],
      "id": "596a28f6-d1a9-4714-9676-1df835ead8ec",
      "name": "\ud569\uce58\uae30"
    },
    {
      "parameters": {
        "mode": "runOnceForEachItem",
        "jsCode": "const d = $json.news;\n\n// content \ubc30\uc5f4\uc5d0\uc11c type === \"text\" \uc778 \uac83\ub9cc \uace8\ub77c\uc11c \ud569\uce58\uae30\nlet textBlocks = [];\n\nif (Array.isArray(d.content)) {\n  for (const block of d.content) {\n    if (block.type === \"text\" && block.content.trim().length > 0) {\n      textBlocks.push(block.content.trim());\n    }\n  }\n}\n\nconst finalContent = textBlocks.join(\"\\n\");\n\n// \ud14d\uc2a4\ud2b8\uac00 \uac70\uc758 \uc5c6\uc73c\uba74(\uc774\ubbf8\uc9c0 \uce74\ub4dc \ub274\uc2a4 \ub4f1) \uc2a4\ud0b5\nif (!finalContent || finalContent.length < 20) {\n  return null;   // <-- null \ub85c \ubcc0\uacbd!!\n}\n\nreturn {\n  json: {\n    source: \"saveticker_news\",\n    title: d.title,\n    url: `https://www.saveticker.com/app/news/${d.id}`,\n    content: finalContent,\n    created_at: d.created_at\n      ? new Date(d.created_at).toISOString()\n      : new Date().toISOString(),\n    crawled_at: new Date().toISOString()\n  }\n};"
      },
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [
        1552,
        -336
      ],
      "id": "e86e57c2-99f2-4962-a4b4-77cff2fd523b",
      "name": "json parse"
    },
    {
      "parameters": {
        "url": "https://api.saveticker.com/api/news/list",
        "sendQuery": true,
        "queryParameters": {
          "parameters": [
            {
              "name": "page",
              "value": "1"
            },
            {
              "name": "page_size",
              "value": "100"
            },
            {
              "name": "sort",
              "value": "created_at_desc"
            }
          ]
        },
        "options": {}
      },
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.3,
      "position": [
        -96,
        -336
      ],
      "id": "71d7080e-f273-4576-9dd8-9f4eb6b9f464",
      "name": "\uc138\uc774\ube0c\ud2f0\ucee4(\ub274\uc2a4)"
    },
    {
      "parameters": {
        "url": "https://api.saveticker.com/api/community/list",
        "sendQuery": true,
        "queryParameters": {
          "parameters": [
            {
              "name": "page",
              "value": "1"
            },
            {
              "name": "page_size",
              "value": "100"
            },
            {
              "name": "sort",
              "value": "created_at_desc"
            },
            {
              "name": "category",
              "value": "user_news"
            }
          ]
        },
        "options": {}
      },
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.3,
      "position": [
        -96,
        -112
      ],
      "id": "ce728578-4efb-41d5-81df-6c4d43fb4242",
      "name": "\uc138\uc774\ube0c\ud2f0\ucee4(\ucee4\ubba4\ub2c8\ud2f0)"
    },
    {
      "parameters": {
        "url": "=https://api.saveticker.com/api/news/detail/{{$json.id}}",
        "options": {
          "response": {
            "response": {
              "responseFormat": "json"
            }
          }
        }
      },
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.3,
      "position": [
        1104,
        -336
      ],
      "id": "7b840999-4292-4fbe-a3b5-8952db800615",
      "name": "\uc138\uc774\ube0c\ud2f0\ucee4 \uc0c1\uc138\ub0b4\uc5ed(\ub274\uc2a4)"
    },
    {
      "parameters": {
        "url": "=https://api.saveticker.com/api/community/detail/{{$json.id}}",
        "options": {
          "response": {
            "response": {
              "responseFormat": "json"
            }
          }
        }
      },
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.3,
      "position": [
        1104,
        -112
      ],
      "id": "92a45adb-1096-4046-a84d-d9a9112cfe82",
      "name": "\uc138\uc774\ube0c\ud2f0\ucee4 \uc0c1\uc138\ub0b4\uc5ed(\ucee4\ubba4\ub2c8\ud2f0)"
    },
    {
      "parameters": {
        "jsCode": "const list = items[0].json.news_list || [];\n\nreturn list.map(n => ({\n  json: {\n    id: n.id,\n    title: n.title,    \n    url: `https://www.saveticker.com/app/news/${n.id}`,\n    created_at: n.created_at\n  }\n}));"
      },
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [
        176,
        -336
      ],
      "id": "1fc4b025-7ef1-4abf-845c-3d793ed646b0",
      "name": "\ub9ac\uc2a4\ud2b8 Id\ucd94\ucd9c(\ub274\uc2a4)"
    },
    {
      "parameters": {
        "jsCode": "const list = items[0].json.posts || [];\n\nreturn list.map(n => ({\n  json: {\n    id: n.id,\n    title: n.title,\n    url: `https://www.saveticker.com/app/community/${n.id}`,\n    created_at: n.created_at\n  }\n}));"
      },
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [
        176,
        -112
      ],
      "id": "f7946152-baf7-406f-8e95-331cac0a40ed",
      "name": "\ub9ac\uc2a4\ud2b8 Id\ucd94\ucd9c(\ucee4\ubba4\ub2c8\ud2f0)"
    },
    {
      "parameters": {
        "mode": "runOnceForEachItem",
        "jsCode": "const d = $json.post;\n\n// content \ubc30\uc5f4\uc5d0\uc11c type === \"text\" \uc778 \uac83\ub9cc \uace8\ub77c\uc11c \ud569\uce58\uae30\nlet textBlocks = [];\n\nif (Array.isArray(d.content)) {\n  for (const block of d.content) {\n    if (block.type === \"text\" && block.content.trim().length > 0) {\n      textBlocks.push(block.content.trim());\n    }\n  }\n}\n\nconst finalContent = textBlocks.join(\"\\n\");\n\n// \ud14d\uc2a4\ud2b8\uac00 \uac70\uc758 \uc5c6\uc73c\uba74(\uc774\ubbf8\uc9c0 \uce74\ub4dc \ub274\uc2a4 \ub4f1) \uc2a4\ud0b5\nif (!finalContent || finalContent.length < 20) {\n  return null;   // <-- null \ub85c \ubcc0\uacbd!!\n}\n\nreturn {\n  json: {\n    source: \"saveticker_community\",\n    title: d.title,\n    url: `https://www.saveticker.com/app/community/${d.id}`,\n    content: finalContent,\n    created_at: d.created_at\n      ? new Date(d.created_at).toISOString()\n      : new Date().toISOString(),\n    crawled_at: new Date().toISOString()\n  }\n};"
      },
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [
        1536,
        -112
      ],
      "id": "c0de3782-6ca1-4f23-b30b-a692d86cbeda",
      "name": "json parse1"
    },
    {
      "parameters": {},
      "type": "n8n-nodes-base.merge",
      "typeVersion": 3.2,
      "position": [
        1824,
        -224
      ],
      "id": "18302f08-f5b8-4005-9cc5-806b5b7b5a0a",
      "name": "Merge"
    },
    {
      "parameters": {
        "method": "POST",
        "url": "http://host.docker.internal:8001/exists/news",
        "sendHeaders": true,
        "headerParameters": {
          "parameters": [
            {
              "name": "Content-Type",
              "value": "application/json"
            },
            {
              "name": "accept",
              "value": "application/json"
            }
          ]
        },
        "sendBody": true,
        "specifyBody": "json",
        "jsonBody": "={{ JSON.stringify({\n  items: $input.all().map(item => ({\n    source: 'saveticker_news',\n    find_key: item.json.id\n  }))\n}) }}",
        "options": {
          "response": {
            "response": {
              "responseFormat": "json"
            }
          }
        }
      },
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.3,
      "position": [
        384,
        -336
      ],
      "id": "b157f076-cbde-45c7-ae0e-a7346a79921a",
      "name": "\uc911\ubcf5\uc81c\uac70"
    },
    {
      "parameters": {
        "jsCode": "return $json.not_exists.map(x => ({ json: x }));"
      },
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [
        592,
        -336
      ],
      "id": "6c7ceb65-8d38-41af-ba38-ee070db069fe",
      "name": "\uc911\ubcf5\uc81c\uac70 \ud569\uce58\uae30"
    },
    {
      "parameters": {
        "method": "POST",
        "url": "http://host.docker.internal:8001/exists/news",
        "sendHeaders": true,
        "headerParameters": {
          "parameters": [
            {
              "name": "Content-Type",
              "value": "application/json"
            },
            {
              "name": "accept",
              "value": "application/json"
            }
          ]
        },
        "sendBody": true,
        "specifyBody": "json",
        "jsonBody": "={{ JSON.stringify({\n  items: $input.all().map(item => ({\n    source: 'saveticker_community',\n    find_key: item.json.id\n  }))\n}) }}",
        "options": {
          "response": {
            "response": {
              "responseFormat": "json"
            }
          }
        }
      },
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.3,
      "position": [
        384,
        -112
      ],
      "id": "fb917fb3-8ba6-465e-abde-384d92e9bc16",
      "name": "\uc911\ubcf5\uc81c\uac701"
    },
    {
      "parameters": {
        "jsCode": "return $json.not_exists.map(x => ({ json: x }));"
      },
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [
        608,
        -112
      ],
      "id": "b0793a97-4b63-4aa0-8626-d24bf22bd352",
      "name": "\uc911\ubcf5\uc81c\uac70 \ud569\uce58\uae301"
    }
  ],
  "connections": {
    "Code in JavaScript": {
      "main": [
        [
          {
            "node": "json \ud569\uce58\uae30",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "\ud06c\ub864\ub9c1 \ud2b8\ub9ac\uac70": {
      "main": [
        [
          {
            "node": "\uc138\uc774\ube0c\ud2f0\ucee4(\ub274\uc2a4)",
            "type": "main",
            "index": 0
          },
          {
            "node": "\uc138\uc774\ube0c\ud2f0\ucee4(\ucee4\ubba4\ub2c8\ud2f0)",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "json \ud569\uce58\uae30": {
      "main": [
        []
      ]
    },
    "\ud569\uce58\uae30": {
      "main": [
        [
          {
            "node": "\ub514\ube44 \uc800\uc7a5",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "json parse": {
      "main": [
        [
          {
            "node": "Merge",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "\uc138\uc774\ube0c\ud2f0\ucee4(\ub274\uc2a4)": {
      "main": [
        [
          {
            "node": "\ub9ac\uc2a4\ud2b8 Id\ucd94\ucd9c(\ub274\uc2a4)",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "\uc138\uc774\ube0c\ud2f0\ucee4(\ucee4\ubba4\ub2c8\ud2f0)": {
      "main": [
        [
          {
            "node": "\ub9ac\uc2a4\ud2b8 Id\ucd94\ucd9c(\ucee4\ubba4\ub2c8\ud2f0)",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "\uc138\uc774\ube0c\ud2f0\ucee4 \uc0c1\uc138\ub0b4\uc5ed(\ub274\uc2a4)": {
      "main": [
        [
          {
            "node": "json parse",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "\uc138\uc774\ube0c\ud2f0\ucee4 \uc0c1\uc138\ub0b4\uc5ed(\ucee4\ubba4\ub2c8\ud2f0)": {
      "main": [
        [
          {
            "node": "json parse1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "\ub9ac\uc2a4\ud2b8 Id\ucd94\ucd9c(\ub274\uc2a4)": {
      "main": [
        [
          {
            "node": "\uc911\ubcf5\uc81c\uac70",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "\ub9ac\uc2a4\ud2b8 Id\ucd94\ucd9c(\ucee4\ubba4\ub2c8\ud2f0)": {
      "main": [
        [
          {
            "node": "\uc911\ubcf5\uc81c\uac701",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "json parse1": {
      "main": [
        [
          {
            "node": "Merge",
            "type": "main",
            "index": 1
          }
        ]
      ]
    },
    "Merge": {
      "main": [
        [
          {
            "node": "\ud569\uce58\uae30",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "\uc911\ubcf5\uc81c\uac70": {
      "main": [
        [
          {
            "node": "\uc911\ubcf5\uc81c\uac70 \ud569\uce58\uae30",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "\uc911\ubcf5\uc81c\uac70 \ud569\uce58\uae30": {
      "main": [
        [
          {
            "node": "\uc138\uc774\ube0c\ud2f0\ucee4 \uc0c1\uc138\ub0b4\uc5ed(\ub274\uc2a4)",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "\uc911\ubcf5\uc81c\uac701": {
      "main": [
        [
          {
            "node": "\uc911\ubcf5\uc81c\uac70 \ud569\uce58\uae301",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "\uc911\ubcf5\uc81c\uac70 \ud569\uce58\uae301": {
      "main": [
        [
          {
            "node": "\uc138\uc774\ube0c\ud2f0\ucee4 \uc0c1\uc138\ub0b4\uc5ed(\ucee4\ubba4\ub2c8\ud2f0)",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  },
  "active": false,
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "9e0ceeaf-0a2f-40d0-ba04-b42fe76355f5",
  "id": "PLyHSoO5F0S9HaCK",
  "tags": []
}
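
The two "json parse" Code nodes in the JSON above share the same text-extraction logic. As a minimal sketch, that logic can be pulled into standalone functions and exercised outside n8n. The helper names extractText and toRecord and the baseUrl parameter are hypothetical and not part of the workflow; the 20-character threshold and the output field names come from the workflow's code. Unlike the original, this version also skips blocks whose content is not a string, which is an added assumption about what the API might return.

```javascript
// Sketch only: extractText and toRecord are hypothetical helpers,
// modeled on the workflow's "json parse" Code nodes.

const MIN_LENGTH = 20; // same threshold the workflow uses to skip near-empty posts

// Join the trimmed text of every block whose type is "text".
function extractText(content) {
  if (!Array.isArray(content)) return "";
  return content
    .filter((b) => b && b.type === "text" && typeof b.content === "string")
    .map((b) => b.content.trim())
    .filter((t) => t.length > 0)
    .join("\n");
}

// Build the record that is later sent to the insert endpoint,
// or return null when the entry has too little text (e.g. image-only posts).
function toRecord(detail, source, baseUrl, now = new Date()) {
  const text = extractText(detail && detail.content);
  if (text.length < MIN_LENGTH) return null;
  return {
    source,
    title: detail.title,
    url: `${baseUrl}/${detail.id}`,
    content: text,
    created_at: detail.created_at
      ? new Date(detail.created_at).toISOString()
      : now.toISOString(),
    crawled_at: now.toISOString(),
  };
}
```

In the workflow itself the equivalent of toRecord runs once per item, and returning null drops that item from the output.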
About this workflow

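crawl_initial is an 18-node n8n workflow started by a manual trigger. It runs two parallel branches against the SaveTicker API, one for news (/api/news/list) and one for community posts (/api/community/list, category user_news), each requesting the 100 most recent entries. Each branch extracts the entry IDs, asks a local service (POST /exists/news) which ones are not yet stored, fetches the detail record for those, and keeps only the text blocks, skipping entries with fewer than 20 characters of text. A Merge node combines both branches, and the result is wrapped in a single items array and posted to POST /insert/batch/news. Two Code nodes (the one named Code in JavaScript, which reads a hits array, and the node it feeds) are not connected to the trigger and do not run.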
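
The dedupe and batching steps of the workflow can be sketched in plain JavaScript. The request shape (an items array of source and find_key pairs), the not_exists response field, and the items wrapper for the batch insert are taken from the workflow JSON. The helper names are hypothetical, and the assumption that each not_exists entry carries an id is inferred from the detail request that follows it in the workflow.

```javascript
// Sketch only: these helpers are hypothetical and mirror the workflow's
// dedupe request, its response handling, and the final batch wrapper.

// Body for POST /exists/news: one { source, find_key } pair per listed entry.
function buildExistsPayload(entries, source) {
  return {
    items: entries.map((e) => ({ source, find_key: e.id })),
  };
}

// Turn the service response into n8n-style items, one per entry not yet stored.
// Assumption: a missing or malformed not_exists field means "nothing new".
function unwrapNotExists(response) {
  const list =
    response && Array.isArray(response.not_exists) ? response.not_exists : [];
  return list.map((x) => ({ json: x }));
}

// Body for POST /insert/batch/news: all records wrapped in a single items array.
function buildBatchPayload(n8nItems) {
  return { items: n8nItems.map((i) => i.json) };
}
```

Guarding against a missing not_exists field is a design choice of this sketch; the workflow's own Code node calls map on the field directly and would fail if it were absent.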

Source: https://github.com/01026551290/auto-income-n8n/blob/bfcddcb7da06ef0333a043247ad9941bdc8cd7a3/workflows/crawl_initial.json (original creator credit).