Extract Seed-Funded Startup Data with RSS, GPT-4.1-mini & Bright Data to Excel

By Eumentis (@eumentis) on n8n.io

This workflow contains community nodes that are only compatible with the self-hosted version of n8n.

Category: AI & RAG · Trigger: Event · Nodes: 14 · Complexity: ★★★★☆ · AI nodes: yes
Integrations: OpenAI, RSS Feed Read Trigger, Bright Data (@brightdata/n8n-nodes-brightdata), HTTP Request

This workflow corresponds to n8n.io template #6775 — we link there as the canonical source.

This workflow follows the HTTP Request → OpenAI recipe pattern.

The workflow JSON

Copy or download the full n8n JSON below, paste it into a new n8n workflow, add your credentials, and activate it.

{
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "nodes": [
    {
      "id": "66526413-badf-48cc-b08d-29a87490bf75",
      "name": "Edit Fields",
      "type": "n8n-nodes-base.set",
      "notes": "Filter the",
      "position": [
        1024,
        176
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "28ed03f9-1e17-432f-9438-6484aab19e35",
              "name": "",
              "type": "array",
              "value": "={{ $json.choices.map(choice => choice.message.content) }}"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "c527cac4-fb30-48f0-82f2-f516aa266ce5",
      "name": "Message a model",
      "type": "@n8n/n8n-nodes-langchain.openAi",
      "notes": "Get seed-funded company data",
      "position": [
        640,
        176
      ],
      "parameters": {
        "modelId": {
          "__rl": true,
          "mode": "list",
          "value": "gpt-4.1-mini",
          "cachedResultName": "GPT-4.1-MINI"
        },
        "options": {},
        "messages": {
          "values": [
            {
              "role": "system",
              "content": "You are an AI designed to extract key information from a specified news article related to startup funding. You will receive the link to the article and its content in markdown format. Your task is to meticulously gather relevant data concerning startup funding as outlined below.\n\n### Input:\n- The URL of the news article discussing recent startup funding events.\n- The complete markdown text of the article.\n\n### Tasks:\n1. Review the provided article content and extract necessary information regarding companies that have received seed funding. If the article contains multiple instances of seed funding data, ensure you gather details for each company without addressing any generic explanations of seed funding itself.\n2. Do not use the article URL to extract data; use it only in the output.\n\n3. Extract the following information in JSON format for each company reported in the article:\n\n   - **companyName**: Name of the startup company.\n   - **companyWebsite**: Official website of the company (do not reference any URLs provided in the markdown).\n   - **companyLinkedIn**: URL of the company's LinkedIn page.\n   - **fundingAmount**: The total amount raised in this funding round (e.g., \"\u00a3950,000\" or \"$1.2 million\").\n   - **founderName**: An array containing the full names of all the founders.\n   - **founderLinkedIn**: An array of LinkedIn profile URLs for each founder (set to null if not available).\n   - **articleUrl**: Return the input article URL instead of the article content.\n\n### Output Format:\n- Provide your output strictly in JSON format, ensuring proper structure even if some fields contain null values.\nIf multiple companies are mentioned, return an array of objects, each representing a different company.\n\n### JSON Example:\n```json\n[\n  {\n    \"companyName\": \"Sample Startup 1\",\n    \"companyWebsite\": \"https://www.samplestartup1.com\",\n    \"companyLinkedIn\": \"https://www.linkedin.com/company/sample-startup-1\",\n    \"fundingAmount\": \"$1.5 million\",\n    \"founderName\": [\"John Doe\", \"Jane Smith\"],\n    \"founderLinkedIn\": [\"https://www.linkedin.com/in/johndoe\", null],\n    \"articleUrl\": \"https://www.example.com/sample-article\"\n  },\n  {\n    \"companyName\": \"Sample Startup 2\",\n    \"companyWebsite\": \"https://www.samplestartup2.com\",\n    \"companyLinkedIn\": \"https://www.linkedin.com/company/sample-startup-2\",\n    \"fundingAmount\": \"$950,000\",\n    \"founderName\": [\"Alice Johnson\"],\n    \"founderLinkedIn\": [null],\n    \"articleUrl\": \"https://www.example.com/sample-article\"\n  }\n]\n```\n\n### Guidelines:\n- Utilize only verified information from the article provided.\n- Set any unavailable fields to null.\n- Avoid seeking additional information from external websites.\n- Refrain from providing interpretations; stick strictly to the facts as presented in the markdown content."
            },
            {
              "content": "=\narticle content in markdown format : {{ $json.data }}\narticle link : {{ $json.link }}\n"
            }
          ]
        },
        "simplify": false,
        "jsonOutput": true
      },
      "notesInFlow": true,
      "typeVersion": 1.8
    },
    {
      "id": "4c6baf0d-c407-40cb-8d4c-89f1a716de24",
      "name": "Markdown",
      "type": "n8n-nodes-base.markdown",
      "onError": "continueRegularOutput",
      "position": [
        416,
        176
      ],
      "parameters": {
        "html": "={{ $json.body }}",
        "options": {
          "ignore": "head, script, img",
          "useLinkReferenceDefinitions": true
        }
      },
      "typeVersion": 1
    },
    {
      "id": "a0d497c0-81cb-4835-a632-42035ddc01e8",
      "name": "Refactor article link",
      "type": "n8n-nodes-base.code",
      "notes": "Get the redirect URL",
      "position": [
        -256,
        176
      ],
      "parameters": {
        "jsCode": "/**\n * Extract the actual article URL from each redirect URL.\n */\nfor (const item of $input.all()) {\n  /** Redirect URL */\n  const rawLink = item.json.link;\n\n  let extractedUrl = rawLink;\n\n  /**\n   * The actual URL is the value of the \"url\" query parameter:\n   * everything after \"?url=\" or \"&url=\" up to the next \"&\".\n   * Items without a string link are passed through unchanged.\n   */\n  const match = typeof rawLink === 'string' ? rawLink.match(/[?&]url=([^&]+)/) : null;\n\n  if (match && match[1]) {\n    /** Decode the URL-encoded value */\n    extractedUrl = decodeURIComponent(match[1]);\n  }\n\n  /** Replace the redirect URL with the actual URL */\n  item.json.link = extractedUrl;\n}\nreturn $input.all();"
      },
      "notesInFlow": true,
      "typeVersion": 2
    },
    {
      "id": "8df2524a-ba30-48a0-afe7-059187fd334b",
      "name": "Add article link",
      "type": "n8n-nodes-base.code",
      "position": [
        192,
        176
      ],
      "parameters": {
        "jsCode": "/**\n * Merge the article link from the \"Refactor article link\" node into the output of the \"Get article Page\" node.\n */\n\n/** Input to the Get article Page node */\nconst inputForBrightData = $items(\"Refactor article link\");\n\n/** Output of the Get article Page node (the Bright Data response) */\nconst outputOfBrightData = $input.all();\n\nreturn outputOfBrightData.map((item, index) => {\n  const input = inputForBrightData[index].json;\n  const output = item.json;\n\n  return {\n    json: {\n      ...output,\n      link: input.link // Add the link from the original input\n    }\n  };\n});"
      },
      "typeVersion": 2
    },
    {
      "id": "0d725bba-0d9e-4e18-a6e8-3fb10bb5835a",
      "name": "RSS Feed Trigger",
      "type": "n8n-nodes-base.rssFeedReadTrigger",
      "position": [
        -464,
        176
      ],
      "parameters": {
        "feedUrl": "https://blog.n8n.io/rss/",
        "pollTimes": {
          "item": [
            {
              "mode": "everyX",
              "unit": "minutes",
              "value": 5
            }
          ]
        }
      },
      "typeVersion": 1
    },
    {
      "id": "30fa6ec5-7cb1-45cb-bfce-169dc5e284f6",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -528,
        -32
      ],
      "parameters": {
        "width": 420,
        "height": 380,
        "content": "## Trigger & Article Discovery\n\n1. Automatically triggers the workflow when a new article is detected\n\n2. Extracts and replaces actual article URLs from redirect links by decoding the url query parameter.\n\n"
      },
      "typeVersion": 1
    },
    {
      "id": "c705f45e-9063-4738-85d8-2f67bea53ea5",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -64,
        -32
      ],
      "parameters": {
        "width": 620,
        "height": 380,
        "content": "## Content Scraping & Preparation\n\n1. Bright Data scraper: scrapes the full content of the article, even behind paywalls or dynamic content\n\n2. Markdown formatter: cleans the raw article HTML and converts it into markdown format for better AI processing"
      },
      "typeVersion": 1
    },
    {
      "id": "5cdcd2db-5e38-4245-8da1-281e28f238cc",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        592,
        -32
      ],
      "parameters": {
        "width": 380,
        "height": 380,
        "content": "## Data Extraction with AI\nExtracts structured startup seed funding data including company details, funding amount, and founder information from news articles provided in markdown format."
      },
      "typeVersion": 1
    },
    {
      "id": "ba4045f1-628b-4b6c-aef3-40f00c3f0e4e",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        992,
        -32
      ],
      "parameters": {
        "width": 380,
        "height": 380,
        "content": "## Extract Valid Startup Entries from Nested Data\n\nExtracts unique startup entries from nested input data while removing duplicates based on normalized company names."
      },
      "typeVersion": 1
    },
    {
      "id": "c5a79532-75a6-40b5-b7c1-a911ecb5cf82",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1392,
        -32
      ],
      "parameters": {
        "width": 280,
        "height": 380,
        "content": "## Append data to Excel sheet\n\nPosts the data to the Excel sheet through the Microsoft Graph API"
      },
      "typeVersion": 1
    },
    {
      "id": "edad49fb-c923-4374-b9e2-87eeb1e2630a",
      "name": "Get article Page",
      "type": "@brightdata/n8n-nodes-brightdata.brightData",
      "onError": "continueRegularOutput",
      "position": [
        -32,
        176
      ],
      "parameters": {
        "url": "={{ $json.link }}",
        "zone": {
          "__rl": true,
          "mode": "list",
          "value": "web_unlocker1"
        },
        "format": "json",
        "country": {
          "__rl": true,
          "mode": "list",
          "value": "us"
        },
        "requestOptions": {}
      },
      "retryOnFail": false,
      "typeVersion": 1
    },
    {
      "id": "684530ce-8472-4312-a3c6-6fc0c9c8ff84",
      "name": "Filter company data",
      "type": "n8n-nodes-base.code",
      "position": [
        1232,
        176
      ],
      "parameters": {
        "jsCode": "/**\n * Builds the array of company details from the raw, unstructured data of the previous node.\n * It also removes duplicate entries.\n */\n\nconst results = [];\nconst seenCompanyNames = new Set();\n\nfunction extractValidStartups(obj) {\n  if (Array.isArray(obj)) {\n    for (const item of obj) {\n      extractValidStartups(item);\n    }\n  } else if (typeof obj === 'object' && obj !== null) {\n    // Skip if it's an error object\n    if (obj.error) return;\n\n    // Check if it looks like a startup object\n    if (obj.companyName) {\n      const key = obj.companyName.trim().toLowerCase(); // normalize name\n      if (!seenCompanyNames.has(key)) {\n        seenCompanyNames.add(key);\n        results.push({ json: obj });\n      }\n      return;\n    }\n\n    // Otherwise, recursively search its values\n    for (const key in obj) {\n      extractValidStartups(obj[key]);\n    }\n  }\n}\n\nfor (const item of $input.all()) {\n  const root = item.json[\"\"];\n  if (!Array.isArray(root)) continue;\n\n  for (const entry of root) {\n    extractValidStartups(entry);\n  }\n}\n\nreturn results;\n"
      },
      "typeVersion": 2
    },
    {
      "id": "8c50058f-58c4-424e-a45a-ea27df89a47d",
      "name": "Add data into excel sheet",
      "type": "n8n-nodes-base.httpRequest",
      "onError": "continueErrorOutput",
      "position": [
        1456,
        176
      ],
      "parameters": {
        "url": "https://graph.microsoft.com/v1.0/drives/{{drive-id}}/items/{{file-id}}/workbook/tables/{ {{ sheet-id }} }/rows",
        "method": "POST",
        "options": {
          "batching": {
            "batch": {
              "batchSize": 1,
              "batchInterval": 3000
            }
          }
        },
        "jsonBody": "={\n  \"values\": [\n    {{ $input.all().map((item, index) => \n      `${index > 0 ? ',' : ''}[` +\n      `\"${$now.format('yyyy-MM-dd \\'at\\' T')}\",` +\n      `\"${item.json.companyName || \"-\"}\",` +\n      `\"${item.json.companyWebsite || \"-\"}\",` +\n      `\"${item.json.companyLinkedIn || \"-\"}\",` +\n      `\"${item.json.fundingAmount || \"-\"}\",` +\n      `\"${Array.isArray(item.json.founderName) && item.json.founderName.filter(n => n).length > 0 \n          ? item.json.founderName.filter(n => n).join(', ') \n          : \"-\" }\",` +\n      `\"${Array.isArray(item.json.founderLinkedIn) && item.json.founderLinkedIn.filter(n => n).length > 0 \n          ? item.json.founderLinkedIn.filter(n => n).join(', ') \n          : \"-\" }\",` +\n      `\"${item.json.articleUrl || \"-\"}\"` +\n      `]`\n    ).join('\\n') }}\n  ]\n}",
        "sendBody": true,
        "specifyBody": "json",
        "authentication": "genericCredentialType",
        "genericAuthType": "oAuth2Api"
      },
      "executeOnce": true,
      "retryOnFail": true,
      "typeVersion": 4.2
    }
  ],
  "connections": {
    "Markdown": {
      "main": [
        [
          {
            "node": "Message a model",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Edit Fields": {
      "main": [
        [
          {
            "node": "Filter company data",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Message a model": {
      "main": [
        [
          {
            "node": "Edit Fields",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Add article link": {
      "main": [
        [
          {
            "node": "Markdown",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Get article Page": {
      "main": [
        [
          {
            "node": "Add article link",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "RSS Feed Trigger": {
      "main": [
        [
          {
            "node": "Refactor article link",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Filter company data": {
      "main": [
        [
          {
            "node": "Add data into excel sheet",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Refactor article link": {
      "main": [
        [
          {
            "node": "Get article Page",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
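
The logic of the two Code nodes in the JSON above (`Refactor article link` and `Filter company data`) can be tried outside n8n with plain Node.js. The following is a minimal sketch rather than the workflow's own code: the function names `resolveRedirect` and `collectStartups` and all sample inputs are made up for illustration, and the `String(...)` conversion is an added assumption so that a non-string `companyName` does not throw.

```javascript
// Standalone sketches of the two Code nodes; function names are hypothetical.

/** Return the decoded "url" query parameter of a redirect link, or the link itself. */
function resolveRedirect(rawLink) {
  if (typeof rawLink !== 'string') return rawLink;
  const match = rawLink.match(/[?&]url=([^&]+)/);
  return match ? decodeURIComponent(match[1]) : rawLink;
}

/** Recursively collect objects with a companyName, skipping error objects and duplicates. */
function collectStartups(root) {
  const results = [];
  const seen = new Set();
  const visit = (node) => {
    if (Array.isArray(node)) {
      node.forEach(visit);
    } else if (typeof node === 'object' && node !== null) {
      if (node.error) return; // skip error objects entirely
      if (node.companyName) {
        // Normalize the name so "Acme" and " acme " count as the same company
        const key = String(node.companyName).trim().toLowerCase();
        if (!seen.has(key)) {
          seen.add(key);
          results.push(node);
        }
        return;
      }
      Object.values(node).forEach(visit); // keep searching nested values
    }
  };
  visit(root);
  return results;
}

// Made-up inputs for illustration only.
const link = resolveRedirect(
  'https://redirect.example/track?id=1&url=https%3A%2F%2Fnews.example%2Fseed-round&src=rss',
);
const startups = collectStartups([
  { companies: [{ companyName: 'Acme' }, { companyName: ' acme ' }] },
  { error: 'model failed' },
  [{ companyName: 'Beta Labs' }],
]);
console.log(link, startups.map((s) => s.companyName));
```

Keeping these steps as pure functions makes it easier to check edge cases, such as links without a `url` parameter, before pasting the logic back into a Code node.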

About this workflow

Source: https://n8n.io/workflows/6775/ (original creator credit).
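
The `Add data into excel sheet` node assembles its `jsonBody` by string concatenation, so a value containing a double quote would likely produce invalid JSON. One possible alternative, sketched here under the assumption that the target table has the same eight columns in the same order, is to build each row as an array and serialize the whole body with `JSON.stringify`. The helper names (`joinOrDash`, `toRow`, `buildRowsBody`) and the sample data are hypothetical.

```javascript
// Hypothetical helpers; names are illustrative and not part of the workflow.

/** Join the non-empty entries of an array, or return "-" when there are none. */
function joinOrDash(value) {
  const kept = Array.isArray(value) ? value.filter((v) => v) : [];
  return kept.length > 0 ? kept.join(', ') : '-';
}

/** Map one extracted company object to a row, in the column order used by the node. */
function toRow(company, timestamp) {
  return [
    timestamp,
    company.companyName || '-',
    company.companyWebsite || '-',
    company.companyLinkedIn || '-',
    company.fundingAmount || '-',
    joinOrDash(company.founderName),
    joinOrDash(company.founderLinkedIn),
    company.articleUrl || '-',
  ];
}

/** Build the request body; JSON.stringify takes care of quoting and escaping. */
function buildRowsBody(companies, timestamp) {
  return JSON.stringify({ values: companies.map((c) => toRow(c, timestamp)) });
}

// Usage with made-up data: the quotes in the company name stay valid JSON.
const body = buildRowsBody(
  [{ companyName: 'Acme "AI" Labs', founderName: ['Ada Example', null] }],
  '2025-01-01 at 09:00',
);
console.log(body);
```

Inside n8n, the same idea would presumably be expressed as a single expression that stringifies an object, but that wiring is not shown in the original template and would need testing against the actual table.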

Related workflows

Workflows that share integrations, category, or trigger type with this one. All free to copy and import.

AI & RAG

Domain Outbound Machine is an n8n workflow designed to fully automate the domain sales process: lead generation, email extraction, personalized outreach, and automated email sending. It also stores ex

Google Sheets, HTTP Request, Gmail +1
AI & RAG

Social Media Audio Extractor. Uses telegramTrigger, telegram, openAi, httpRequest. Event-driven trigger; 31 nodes.

Telegram Trigger, Telegram, OpenAI +2
AI & RAG

Baby Chaganti. Uses httpRequest, googleDrive, youTube, openAi. Event-driven trigger; 23 nodes.

HTTP Request, Google Drive, YouTube +1
AI & RAG

Monitor YouTube channels, fetch stats, classify videos as viral (≥ 1000 likes) or normal, and auto‑generate LinkedIn/email summaries with GPT‑4. Deliver via Gmail or SMTP. Clear node names, examples,

RSS Feed Read, HTTP Request, OpenAI +1
AI & RAG

Overview: This workflow automates the full pipeline of preparing scraped leads and loading them into an Instantly campaign for cold outreach. It begins by pulling rows from a Google Sheet that contains

Google Sheets, Telegram, OpenAI +1