Scrape Website Content & Extract SEO Keywords

Original n8n title: Website Content Scraper & SEO Keyword Extractor with Gpt-5-mini and Airtable

By Abhishek Patoliya (@abhishekpatoliya) on n8n.io

This workflow allows you to scrape website content, clean the HTML, extract structured information using GPT-5-mini, and store the results along with SEO keywords into Airtable. Ideal for building keyword lists and organizing web content for SEO research. n8n Community or Cloud…

Category: AI & RAG · Trigger: Event · Nodes: 16 · Complexity: ★★★★☆ · AI nodes: yes
Integrations: OpenAI Chat, Form Trigger, Airtable, HTTP Request, Agent

This workflow corresponds to n8n.io template #5657 — we link there as the canonical source.

This workflow follows the Agent → Airtable recipe pattern.

The workflow JSON

Copy the full n8n JSON below, paste it into a new n8n workflow, add your credentials, and activate it.
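Before pasting an export into n8n, it can help to confirm that every connection points at a node that exists, since a hand-edited export can silently lose one. The sketch below is a hypothetical helper, not part of n8n or of this template: `findBrokenConnections` and the `sample` object are illustrative names, and the only assumption about the export is the `nodes` / `connections` layout visible in the JSON below.

```javascript
// Hypothetical helper: report connections whose source or target node
// is missing from the workflow's "nodes" array.
function findBrokenConnections(workflow) {
  const names = new Set((workflow.nodes ?? []).map((n) => n.name));
  const problems = [];
  for (const [source, outputs] of Object.entries(workflow.connections ?? {})) {
    if (!names.has(source)) problems.push(`unknown source node: ${source}`);
    // "outputs" is keyed by connection type ("main", "ai_languageModel", ...).
    for (const branches of Object.values(outputs)) {
      for (const branch of branches) {
        for (const target of branch ?? []) {
          if (!names.has(target.node)) {
            problems.push(`${source} -> unknown target node: ${target.node}`);
          }
        }
      }
    }
  }
  return problems;
}

// Tiny stand-in for a workflow export (only the fields the check reads).
const sample = {
  nodes: [{ name: "HTTP" }, { name: "HTML" }],
  connections: {
    HTTP: { main: [[{ node: "HTML", type: "main", index: 0 }]] },
    HTML: { main: [[{ node: "Split Out1", type: "main", index: 0 }]] },
  },
};

console.log(findBrokenConnections(sample));
// [ 'HTML -> unknown target node: Split Out1' ]
```

To check the real export, parse the copied JSON with `JSON.parse` and pass the result in; an empty array means every connection resolves.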

{
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "nodes": [
    {
      "id": "ac45f4fc-3549-4150-bf1a-54c824bf309c",
      "name": "OpenAI Chat Model",
      "type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
      "position": [
        448,
        320
      ],
      "parameters": {
        "model": {
          "__rl": true,
          "mode": "list",
          "value": "gpt-5-mini",
          "cachedResultName": "gpt-5-mini"
        },
        "options": {}
      },
      "credentials": {
        "openAiApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "4779cfff-5642-476b-a613-23af50036bb3",
      "name": "Website Name",
      "type": "n8n-nodes-base.formTrigger",
      "position": [
        -576,
        112
      ],
      "parameters": {
        "options": {
          "buttonLabel": "Submit"
        },
        "formTitle": "Website Name",
        "formFields": {
          "values": [
            {
              "fieldLabel": "Website Name ",
              "requiredField": true
            }
          ]
        },
        "responseMode": "lastNode",
        "formDescription": "=Website Scraper"
      },
      "typeVersion": 2.2
    },
    {
      "id": "2bde19ca-d681-4fb9-a17e-006684eeea41",
      "name": "Wait1",
      "type": "n8n-nodes-base.wait",
      "position": [
        1616,
        112
      ],
      "parameters": {
        "amount": 20
      },
      "typeVersion": 1.1
    },
    {
      "id": "0830519d-a286-46bd-92ae-4ed65685ed6c",
      "name": "Split Out1",
      "type": "n8n-nodes-base.splitOut",
      "position": [
        208,
        112
      ],
      "parameters": {
        "include": "allOtherFields",
        "options": {},
        "fieldToSplitOut": "cleanedData"
      },
      "typeVersion": 1
    },
    {
      "id": "24c2705e-d68b-4ed2-838e-fdbeefa59ed7",
      "name": "Split Out2",
      "type": "n8n-nodes-base.splitOut",
      "position": [
        1456,
        112
      ],
      "parameters": {
        "include": "allOtherFields",
        "options": {},
        "fieldToSplitOut": "cleaned"
      },
      "typeVersion": 1
    },
    {
      "id": "ef09a578-6e34-4450-80b0-9fc8e220bff8",
      "name": "Airtable",
      "type": "n8n-nodes-base.airtable",
      "position": [
        2384,
        176
      ],
      "parameters": {
        "base": {
          "__rl": true,
          "mode": "list",
          "value": "appzInHZw2u0BuI74",
          "cachedResultUrl": "https://airtable.com/appzInHZw2u0BuI74",
          "cachedResultName": "Website"
        },
        "table": {
          "__rl": true,
          "mode": "list",
          "value": "tblmDPwBaTyXXgHJX",
          "cachedResultUrl": "https://airtable.com/appzInHZw2u0BuI74/tblmDPwBaTyXXgHJX",
          "cachedResultName": "Table 1"
        },
        "columns": {
          "value": {
            "Keywords": "={{ $json.output }}",
            "WebSite Name": "={{ $('Website Name').item.json['Website Name '] }}"
          },
          "schema": [
            {
              "id": "id",
              "type": "string",
              "display": true,
              "removed": true,
              "readOnly": true,
              "required": false,
              "displayName": "id",
              "defaultMatch": true
            },
            {
              "id": "WebSite Name",
              "type": "string",
              "display": true,
              "removed": false,
              "readOnly": false,
              "required": false,
              "displayName": "WebSite Name",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Keywords",
              "type": "string",
              "display": true,
              "removed": false,
              "readOnly": false,
              "required": false,
              "displayName": "Keywords",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            }
          ],
          "mappingMode": "defineBelow",
          "matchingColumns": [
            "WebSite Name"
          ],
          "attemptToConvertTypes": false,
          "convertFieldsToString": false
        },
        "options": {},
        "operation": "upsert"
      },
      "credentials": {
        "airtableTokenApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 2.1
    },
    {
      "id": "d7f75db7-bd09-47ab-b6b3-f46300f575ac",
      "name": "OpenAI Chat Model1",
      "type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
      "position": [
        1472,
        544
      ],
      "parameters": {
        "model": {
          "__rl": true,
          "mode": "list",
          "value": "gpt-5-mini",
          "cachedResultName": "gpt-5-mini"
        },
        "options": {}
      },
      "credentials": {
        "openAiApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "e7adb845-0af4-4d88-902b-a22ebf53e28d",
      "name": "Merge",
      "type": "n8n-nodes-base.merge",
      "position": [
        2064,
        176
      ],
      "parameters": {
        "mode": "combineBySql"
      },
      "typeVersion": 3.1
    },
    {
      "id": "aa4b0cc1-c757-40bd-844a-33beb5584aee",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -384,
        -96
      ],
      "parameters": {
        "content": "## READING WEBSITE \nuser input"
      },
      "typeVersion": 1
    },
    {
      "id": "7e573fd8-a758-464c-a584-b29a866068af",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -80,
        -96
      ],
      "parameters": {
        "width": 150,
        "content": "## cleaned HTML code\n"
      },
      "typeVersion": 1
    },
    {
      "id": "e68d95e2-3485-4aff-bea0-d97c23620101",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        448,
        -96
      ],
      "parameters": {
        "content": "## Topic wise information.\nwebsite name."
      },
      "typeVersion": 1
    },
    {
      "id": "59d6a7a2-8276-4b4e-bec8-7af04ee16a59",
      "name": "HTTP",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        -320,
        112
      ],
      "parameters": {
        "url": "={{ $json['Website Name '] }}",
        "options": {}
      },
      "typeVersion": 4.2
    },
    {
      "id": "b211ddee-921d-4397-a599-da0904ae21cb",
      "name": "HTML",
      "type": "n8n-nodes-base.code",
      "position": [
        -32,
        112
      ],
      "parameters": {
        "jsCode": "const data = $(\"HTTP\").all()[0]?.json?.data;\n\nfunction extractTextFromHTML(html) {\n  const cleanedHTML = html\n    .replace(/<style[\\s\\S]*?>[\\s\\S]*?<\\/style>/gi, \"\")\n    .replace(/<[^>]+>/g, \"\")\n    .replace(/\\s+/g, \" \")\n    .trim();\n\n  return cleanedHTML;\n}\n\nconst cleanedData = extractTextFromHTML(data);\n\nreturn { cleanedData };\n"
      },
      "typeVersion": 2
    },
    {
      "id": "8d589808-2355-4221-9e91-30a164a332b8",
      "name": "Topic Wise information.",
      "type": "@n8n/n8n-nodes-langchain.agent",
      "position": [
        448,
        112
      ],
      "parameters": {
        "text": "={{ $json.cleanedData }}",
        "options": {
          "systemMessage": "={{ $json.cleanedData }}\n\nfind it topic wise information.\n"
        },
        "promptType": "define"
      },
      "typeVersion": 1.8
    },
    {
      "id": "79096058-ad96-4b23-a45a-61f7ae50b212",
      "name": "list",
      "type": "@n8n/n8n-nodes-langchain.agent",
      "position": [
        1472,
        320
      ],
      "parameters": {
        "text": "={{ $json.cleaned }}",
        "options": {
          "systemMessage": "=only for list number of 90 keyword data \"\"\"Important Keyword List for SEO\"\"\"\n"
        },
        "promptType": "define"
      },
      "typeVersion": 2
    },
    {
      "id": "ad65ecf2-4192-4b07-8eaa-f1bfde81a374",
      "name": "Cleaned ##",
      "type": "n8n-nodes-base.code",
      "position": [
        896,
        112
      ],
      "parameters": {
        "jsCode": "const input = $json[\"output\"]; // Replace \"text\" with your actual field name\nconst cleaned = input\n  .replace(/\\*\\*/g, '')        // Remove all double asterisks **\n  .replace(/^###\\s?/gm, '')  // Remove all ### at the start of lines\n  .replace(/^##\\s?/gm, '');   // Remove all ## at the start of lines\n\n\nreturn {\n  json: {\n    cleaned\n  }\n};\n"
      },
      "typeVersion": 2
    }
  ],
  "connections": {
    "HTML": {
      "main": [
        [
          {
            "node": "Split Out1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "HTTP": {
      "main": [
        [
          {
            "node": "HTML",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "list": {
      "main": [
        [
          {
            "node": "Merge",
            "type": "main",
            "index": 1
          }
        ]
      ]
    },
    "Merge": {
      "main": [
        [
          {
            "node": "Airtable",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Wait1": {
      "main": [
        [
          {
            "node": "Merge",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Cleaned ##": {
      "main": [
        [
          {
            "node": "Split Out2",
            "type": "main",
            "index": 0
          },
          {
            "node": "list",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Split Out1": {
      "main": [
        [
          {
            "node": "Topic Wise information.",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Split Out2": {
      "main": [
        [
          {
            "node": "Wait1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Website Name": {
      "main": [
        [
          {
            "node": "HTTP",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "OpenAI Chat Model": {
      "ai_languageModel": [
        [
          {
            "node": "Topic Wise information.",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    },
    "OpenAI Chat Model1": {
      "ai_languageModel": [
        [
          {
            "node": "list",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    },
    "Topic Wise information.": {
      "main": [
        [
          {
            "node": "Cleaned ##",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
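The two Code nodes in the export above ("HTML" and "Cleaned ##") do the text cleaning with a few regular expressions. The standalone sketch below mirrors them so the behavior can be tried outside n8n. It departs from the template in three labeled ways, all of which are assumptions rather than the author's method: it returns an empty string for non-string input (the original would throw if the HTTP node returned no `data`), it also strips `<script>` blocks, and it replaces tags with a space instead of nothing so adjacent elements do not run together. The function name `stripMarkdownMarkers` is illustrative.

```javascript
// Mirrors the "HTML" node, with the assumed changes noted inline.
function extractTextFromHTML(html) {
  if (typeof html !== "string") return ""; // assumed guard, not in the template
  return html
    .replace(/<script[\s\S]*?>[\s\S]*?<\/script>/gi, "") // assumed addition
    .replace(/<style[\s\S]*?>[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]+>/g, " ") // the template replaces tags with ""
    .replace(/\s+/g, " ")
    .trim();
}

// Mirrors the "Cleaned ##" node: drop bold markers and leading ##/### headings.
function stripMarkdownMarkers(text) {
  return text
    .replace(/\*\*/g, "")
    .replace(/^###\s?/gm, "")
    .replace(/^##\s?/gm, "");
}

const page =
  "<html><style>p{color:red}</style><script>var x=1;</script>" +
  "<p>Hello</p><p>SEO  world</p></html>";
console.log(extractTextFromHTML(page)); // Hello SEO world

console.log(stripMarkdownMarkers("## Services\n**Web design**\n### Pricing"));
// Services
// Web design
// Pricing
```

Regex-based stripping is a rough approximation and will miss cases such as comments or `>` characters inside attribute values; a real HTML parser is the safer choice when accuracy matters.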

Credentials you'll need

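Each integration node will prompt for credentials when you import. Credential IDs are stripped before publishing, so you will need to add your own: an OpenAI API credential for the two OpenAI Chat Model nodes and an Airtable token credential for the Airtable node.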
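To see which credentials an export expects before importing it, the `credentials` object on each node can be grouped by type. This is a minimal sketch with a hypothetical helper name (`listCredentialTypes`); the sample below copies only the node names and credential keys that appear in the JSON above.

```javascript
// Hypothetical helper: map each credential type to the nodes that use it.
function listCredentialTypes(workflow) {
  const byType = new Map();
  for (const node of workflow.nodes ?? []) {
    for (const type of Object.keys(node.credentials ?? {})) {
      if (!byType.has(type)) byType.set(type, []);
      byType.get(type).push(node.name);
    }
  }
  return Object.fromEntries(byType);
}

// Trimmed-down copy of the relevant nodes from the export.
const exportedNodes = {
  nodes: [
    { name: "OpenAI Chat Model", credentials: { openAiApi: { name: "<your credential>" } } },
    { name: "HTTP" }, // no credentials block
    { name: "Airtable", credentials: { airtableTokenApi: { name: "<your credential>" } } },
    { name: "OpenAI Chat Model1", credentials: { openAiApi: { name: "<your credential>" } } },
  ],
};

console.log(listCredentialTypes(exportedNodes));
// {
//   openAiApi: [ 'OpenAI Chat Model', 'OpenAI Chat Model1' ],
//   airtableTokenApi: [ 'Airtable' ]
// }
```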

Source: https://n8n.io/workflows/5657/ (original creator credit).
