This workflow corresponds to n8n.io template #11389 — we link there as the canonical source.

This workflow follows the Agent → Google Sheets recipe pattern.

The workflow JSON

Copy the full n8n JSON below, paste it into a new n8n workflow, add your credentials, and activate it.
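Before pasting, it can help to check that every entry under `connections` points at a node that actually exists in `nodes`, since a renamed or missing node leaves a dangling link after import. The following is a minimal sketch, assuming the export has the `nodes` / `connections` shape shown on this page; the helper name `findDanglingConnections` and the inline `sample` object are hypothetical and not part of n8n.

```javascript
// Hypothetical helper (not an n8n API): report connections whose source or
// target name does not match any entry in workflow.nodes.
function findDanglingConnections(workflow) {
  const nodeNames = new Set((workflow.nodes || []).map((node) => node.name));
  const problems = [];

  for (const [sourceName, outputsByType] of Object.entries(workflow.connections || {})) {
    if (!nodeNames.has(sourceName)) {
      problems.push(`unknown source node: ${sourceName}`);
    }
    // Each connection type ("main", "ai_languageModel", ...) holds one array
    // per output, and each output holds the links leaving it.
    for (const outputs of Object.values(outputsByType)) {
      for (const links of outputs) {
        for (const link of links || []) {
          if (!nodeNames.has(link.node)) {
            problems.push(`${sourceName} -> unknown target node: ${link.node}`);
          }
        }
      }
    }
  }
  return problems;
}

// Tiny illustrative sample with the same shape as the export; "AI Agent" is
// deliberately missing from nodes.
const sample = {
  nodes: [{ name: "Decodo" }, { name: "Code in JavaScript" }],
  connections: {
    Decodo: { main: [[{ node: "Code in JavaScript", type: "main", index: 0 }]] },
    "Code in JavaScript": { main: [[{ node: "AI Agent", type: "main", index: 0 }]] },
  },
};

console.log(findDanglingConnections(sample));
```

An empty result means every link resolves; for the sample above the only reported problem is the link to the missing `AI Agent` node.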


{
  "id": "cN9kMSYH8kj5m2So",
  "name": "AI Research Scraper & Summary Generator",
  "tags": [],
  "nodes": [
    {
      "id": "97a37f53-8d6a-409e-acd5-f2a24545b441",
      "name": "When clicking \u2018Execute workflow\u2019",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        -880,
        352
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "a12bdc4e-0c86-4073-ac97-702fbe385288",
      "name": "Decodo",
      "type": "@decodo/n8n-nodes-preview-decodo.decodo",
      "position": [
        -224,
        352
      ],
      "parameters": {
        "operation": "universal"
      },
      "credentials": {},
      "typeVersion": 1
    },
    {
      "id": "7fe1a395-608e-43b2-a53f-8e7e74d15067",
      "name": "Code in JavaScript",
      "type": "n8n-nodes-base.code",
      "position": [
        0,
        352
      ],
      "parameters": {
        "jsCode": "// This Code node takes the output of the Decodo scraper\n// and returns only what the LLM needs:\n// - url\n// - fuente (source domain)\n// - titulo (title, from <title>)\n// - article_text (plain text of the article)\n\nfunction extractTitle(html) {\n  const match = html.match(/<title[^>]*>([^<]+)<\\/title>/i);\n  if (match && match[1]) {\n    return match[1].trim();\n  }\n  return \"\";\n}\n\nfunction htmlToText(html) {\n  if (!html || typeof html !== \"string\") {\n    return \"\";\n  }\n\n  let text = html;\n\n  // 1) Remove scripts and styles\n  text = text.replace(/<script[\\s\\S]*?<\\/script>/gi, \"\");\n  text = text.replace(/<style[\\s\\S]*?<\\/style>/gi, \"\");\n\n  // 2) Replace some closing tags with line breaks\n  text = text.replace(/<\\/(p|div|section|article|li|h1|h2|h3|h4|h5|h6)>/gi, \"\\n\");\n  text = text.replace(/<br\\s*\\/?>/gi, \"\\n\");\n\n  // 3) Remove the remaining HTML tags\n  text = text.replace(/<\\/?[^>]+>/g, \"\");\n\n  // 4) Decode basic entities (&amp; last, to avoid double-decoding)\n  text = text\n    .replace(/&nbsp;/gi, \" \")\n    .replace(/&quot;/gi, \"\\\"\")\n    .replace(/&#39;/gi, \"'\")\n    .replace(/&lt;/gi, \"<\")\n    .replace(/&gt;/gi, \">\")\n    .replace(/&amp;/gi, \"&\");\n\n  // 5) Trim extra whitespace and drop empty lines\n  text = text\n    .split(\"\\n\")\n    .map(line => line.trim())\n    .filter(line => line.length > 0)\n    .join(\"\\n\");\n\n  return text;\n}\n\nfunction getDomainFromUrl(url) {\n  try {\n    const u = new URL(url);\n    return u.hostname.replace(/^www\\./, \"\");\n  } catch (e) {\n    return \"\";\n  }\n}\n\nconst newItems = [];\n\nfor (const item of items) {\n  const json = item.json || {};\n\n  // If the input has the shape { results: [ { content, url, ... } ] }, use the first result\n  let result = json;\n  if (Array.isArray(json.results) && json.results.length > 0) {\n    result = json.results[0];\n  }\n\n  const html = result.content || \"\";\n  const url = result.url || json.url || \"\";\n\n  const titulo = extractTitle(html);\n  const article_text = htmlToText(html);\n  const fuente = getDomainFromUrl(url);\n\n  // Also include the saved date (today) here\n  const today = new Date().toISOString().slice(0, 10); // YYYY-MM-DD\n\n  newItems.push({\n    json: {\n      url,\n      fuente,\n      titulo,\n      article_text,\n      fecha_guardado: today\n    }\n  });\n}\n\nreturn newItems;\n"
      },
      "typeVersion": 2
    },
    {
      "id": "8cc45657-66e8-43a1-a5de-cf2718105853",
      "name": "AI Agent",
      "type": "@n8n/n8n-nodes-langchain.agent",
      "position": [
        224,
        352
      ],
      "parameters": {
        "text": "=I will give you the scraped content from a webpage, including:\n\n- URL\n- Source domain\n- Extracted title\n- Today's date (saved date)\n- Full cleaned article text\n\nYour task is to analyze this information and respond ONLY with a JSON object using the EXACT keys and structure below:\n\n{\n  \"url\": \"\",\n  \"title\": \"\",\n  \"source\": \"\",\n  \"published_date\": \"\",\n  \"saved_date\": \"\",\n  \"resource_type\": \"\",\n  \"main_topic\": \"\",\n  \"level\": \"\",\n  \"three_key_insights\": \"\",\n  \"short_summary\": \"\",\n  \"content_idea\": \"\",\n  \"language\": \"\"\n}\n\nHere are the detailed instructions for each field:\n\n- \"url\":\n  Copy exactly the URL provided.\n\n- \"title\":\n  Provide a clean, concise title for the article or resource.\n  You may improve formatting if the extracted title is messy.\n\n- \"source\":\n  Convert the domain into a readable name.\n  Examples:\n  - \"openai.com\" \u2192 \"OpenAI\"\n  - \"anthropic.com\" \u2192 \"Anthropic\"\n  - \"arxiv.org\" \u2192 \"arXiv\"\n\n- \"published_date\":\n  - If a clear publication date appears in the article text or metadata, extract it and return it in \"YYYY-MM-DD\" format.\n  - If NO publication date is clearly indicated, return an empty string \"\".\n\n- \"saved_date\":\n  Copy exactly the date I provided (in \"YYYY-MM-DD\" format).\n  Do NOT invent anything here.\n\n- \"resource_type\":\n  Choose ONE of the following (fallback to \"blog\" if unclear):\n  \"blog\", \"paper\", \"docs\", \"video\", \"tweet\", \"thread\", \"repository\", \"documentation\"\n\n- \"main_topic\":\n  Summarize the central topic in a maximum of 2\u20133 words.\n  Examples: \"RAG\", \"fine-tuning\", \"evaluation\", \"LLM agents\", \"robotics + LLMs\", \"MLOps\"\n\n- \"level\":\n  Choose one of:\n  \"beginner\"\n  \"intermediate\"\n  \"advanced\"\n  Base it on complexity of the language and concepts.\n\n- \"three_key_insights\":\n  Write exactly three bullet points, each separated by a newline \"\\n\".\n  Each bullet should capture an important idea, in one or two lines max.\n\n- \"short_summary\":\n  A concise 3\u20134 line paragraph summarizing the article and why it matters.\n\n- \"content_idea\":\n  Suggest a reusable content idea based on the article.\n  Examples:\n  - \"YouTube video explaining the experiment\"\n  - \"LinkedIn post with the key takeaways\"\n  - \"Module for an AI course\"\n  - \"Example for a lesson on agents\"\n\n- \"language\":\n  Return the language code of the article's main content:\n  - \"en\" for English\n  - \"es\" for Spanish\n  - use other ISO codes when relevant\n\n--------------------------------------\n\nHere is the content you must analyze:\n\nURL:\n{{ $json[\"url\"] }}\n\nSource domain:\n{{ $json[\"fuente\"] }}\n\nExtracted title:\n{{ $json[\"titulo\"] }}\n\nSaved date (today):\n{{ $json.fecha_guardado }}\n\nFull article text:\n\"\"\"\n{{ $json[\"article_text\"] }}\n\"\"\"\n\nReturn ONLY a valid JSON object. No explanations, no markdown, no backticks.\n",
        "options": {
          "systemMessage": "You are an AI assistant specialized in analyzing articles, papers, blogs and documentation related to Artificial Intelligence, Machine Learning and Large Language Models (LLMs).\n\nYour task is to read the input content extracted from a webpage and return a structured JSON object containing metadata and insights useful for research and knowledge management.\n\nYou MUST:\n- Follow the JSON schema exactly as requested.\n- Return ONLY a valid JSON object.\n- Avoid any explanation outside of the JSON.\n- Infer missing fields when possible, but NEVER fabricate dates or factual information.\n- Keep fields concise, clean and useful for later indexing.\n\nAlways output strictly and only the final JSON object.\n"
        },
        "promptType": "define"
      },
      "typeVersion": 3
    },
    {
      "id": "89d95a58-f246-4a65-80f0-187a67b5ab3f",
      "name": "Google Gemini Chat Model",
      "type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini",
      "position": [
        304,
        576
      ],
      "parameters": {
        "options": {}
      },
      "typeVersion": 1
    },
    {
      "id": "ad37c2d2-ce1d-4203-9970-d8c267b7564d",
      "name": "Code in JavaScript1",
      "type": "n8n-nodes-base.code",
      "position": [
        576,
        352
      ],
      "parameters": {
        "jsCode": "// Each item has a field \"output\" that is a JSON string.\n// We parse it and return the parsed object as the new item.json\n\nconst newItems = [];\n\nfor (const item of items) {\n  const rawOutput = item.json.output;\n\n  if (typeof rawOutput === 'string') {\n    try {\n      const parsed = JSON.parse(rawOutput);\n\n      newItems.push({\n        json: {\n          url: parsed.url || \"\",\n          title: parsed.title || \"\",\n          source: parsed.source || \"\",\n          published_date: parsed.published_date || \"\",\n          saved_date: parsed.saved_date || \"\",\n          resource_type: parsed.resource_type || \"\",\n          main_topic: parsed.main_topic || \"\",\n          three_key_insights: parsed.three_key_insights || \"\",\n          short_summary: parsed.short_summary || \"\",\n          content_idea: parsed.content_idea || \"\",\n          language: parsed.language || \"\"\n        }\n      });\n    } catch (error) {\n      // If parsing fails, you can decide what to do.\n      // For now, we just keep the original item with an error message.\n      newItems.push({\n        json: {\n          error: 'Failed to parse LLM output',\n          original_output: rawOutput\n        }\n      });\n    }\n  } else {\n    // If output is not a string, just forward the item\n    newItems.push(item);\n  }\n}\n\nreturn newItems;\n"
      },
      "typeVersion": 2
    },
    {
      "id": "59bdf3b7-fb95-43b4-9ef0-e503c3590a25",
      "name": "Get row(s) in sheet",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        -656,
        352
      ],
      "parameters": {
        "options": {},
        "sheetName": {
          "__rl": true,
          "mode": "list",
          "value": "gid=0",
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1HeeycUkLvoP1tzQw9R5n_amoMjNx9azfxJKYPZNU-ek/edit#gid=0",
          "cachedResultName": "input"
        },
        "documentId": {
          "__rl": true,
          "mode": "list",
          "value": "1HeeycUkLvoP1tzQw9R5n_amoMjNx9azfxJKYPZNU-ek",
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1HeeycUkLvoP1tzQw9R5n_amoMjNx9azfxJKYPZNU-ek/edit?usp=drivesdk",
          "cachedResultName": "urls_AI_ML"
        }
      },
      "typeVersion": 4.7
    },
    {
      "id": "9efa2562-3417-41ec-9d78-c68e781343c0",
      "name": "Append row in sheet",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        784,
        352
      ],
      "parameters": {
        "columns": {
          "value": {
            "url": "={{ $json.url }}",
            "title": "={{ $json.title }}",
            "topic": "={{ $json.main_topic }}",
            "source": "={{ $json.source }}",
            "summary": "={{ $json.short_summary }}",
            "key_ideas": "={{ $json.three_key_insights }}",
            "text_type": "={{ $json.resource_type }}",
            "main_topic": "={{ $json.main_topic }}",
            "published_date": "={{ $json.published_date }}"
          },
          "schema": [
            {
              "id": "url",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "url",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "topic",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "topic",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "key_ideas",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "key_ideas",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "summary",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "summary",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "published_date",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "published_date",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "title",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "title",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "source",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "source",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "text_type",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "text_type",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "main_topic",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "main_topic",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            }
          ],
          "mappingMode": "defineBelow",
          "matchingColumns": [],
          "attemptToConvertTypes": false,
          "convertFieldsToString": false
        },
        "options": {},
        "operation": "append",
        "sheetName": {
          "__rl": true,
          "mode": "list",
          "value": 60764768,
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1HeeycUkLvoP1tzQw9R5n_amoMjNx9azfxJKYPZNU-ek/edit#gid=60764768",
          "cachedResultName": "output"
        },
        "documentId": {
          "__rl": true,
          "mode": "list",
          "value": "1HeeycUkLvoP1tzQw9R5n_amoMjNx9azfxJKYPZNU-ek",
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1HeeycUkLvoP1tzQw9R5n_amoMjNx9azfxJKYPZNU-ek/edit?usp=drivesdk",
          "cachedResultName": "urls_AI_ML"
        }
      },
      "typeVersion": 4.7
    },
    {
      "id": "a6e6c98c-42c4-48aa-99c5-361f21cc48cb",
      "name": "Loop Over Items",
      "type": "n8n-nodes-base.splitInBatches",
      "position": [
        -448,
        352
      ],
      "parameters": {
        "options": {}
      },
      "typeVersion": 3
    },
    {
      "id": "8a96f291-630a-4e60-b45c-df66395d5455",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1328,
        -160
      ],
      "parameters": {
        "width": 320,
        "height": 880,
        "content": "###  **AI Research Scraper & Summary Generator**\n\n## How it works  \nThis workflow takes a list of links from Google Sheets, visits each page, extracts the main text using [Decodo](https://visit.decodo.com/raqXGD), and creates a summary with the help of artificial intelligence.  \nIt helps you turn research articles or web pages into clear, structured insights you can reuse for your projects, content ideas, or newsletters.\n\n**Input:** A Google Sheet named `input` with one column called `url`.  \n**Output:** Another Google Sheet named `output`, where all the processed data is stored:  \n- **URL:** original article link  \n- **Title:** article title  \n- **Source:** website or domain  \n- **Published Date:** publication date (if found)  \n- **Main Topic:** main theme of the article  \n- **Key Ideas:** three main takeaways or insights  \n- **Summary:** short text summary  \n- **Text Type:** type of content (e.g., article, blog, research paper)\n\n## Setup steps  \n1. Connect your Google Sheets account.  \n2. Add your links to the `input` sheet.  \n3. In the **[Decodo](https://visit.decodo.com/raqXGD)** node, insert your API key.  \n4. Configure the AI model (for example, Gemini).  \n5. Run the workflow and check the results in the `output` sheet.\n"
      },
      "typeVersion": 1
    },
    {
      "id": "3c3e9c54-3c83-421f-8b9b-2ac4848d3ac7",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -848,
        48
      ],
      "parameters": {
        "color": 7,
        "height": 176,
        "content": "###  **Get Links**  \nReads all the URLs from the `input` sheet and prepares them one by one for processing.  \n**Input:** Google Sheets (`input`)  \n**Output:** A list of URLs to be scraped.\n"
      },
      "typeVersion": 1
    },
    {
      "id": "5d64789c-0e26-4990-8c07-424503cd46f8",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -576,
        32
      ],
      "parameters": {
        "color": 7,
        "height": 176,
        "content": "### **Scrape Content**  \nUses [Decodo](https://visit.decodo.com/raqXGD) to extract the main text and metadata from each webpage.  \n**Input:** URL  \n**Output:** Raw HTML and metadata ready for cleaning.\n"
      },
      "typeVersion": 1
    },
    {
      "id": "768d2f2d-7c8d-4d93-b7ea-98da875c6015",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        560,
        32
      ],
      "parameters": {
        "color": 7,
        "height": 192,
        "content": "###  **Save Results**  \nSaves the final information (title, summary, ideas, etc.) into the `output` Google Sheet.  \n**Input:** Structured data  \n**Output:** A growing list of summarized articles ready to review.\n"
      },
      "typeVersion": 1
    },
    {
      "id": "8dde3144-d797-4df7-9e74-61efd07d38b8",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        240,
        32
      ],
      "parameters": {
        "color": 7,
        "height": 176,
        "content": "### **Parse & Format Data**  \nChecks and formats the JSON output to make sure all fields are complete and readable.  \n**Input:** AI response  \n**Output:** Organized data ready to be added to Google Sheets."
      },
      "typeVersion": 1
    },
    {
      "id": "7f5d31e5-c175-4aaa-9fe9-a2ce2cd475d6",
      "name": "Sticky Note6",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -304,
        16
      ],
      "parameters": {
        "color": 7,
        "height": 256,
        "content": "###  **Clean Text**  \nRemoves unnecessary HTML and keeps only the useful article text (title, source, and date).  \n**Input:** HTML content from Decodo  \n**Output:** Clean, readable text ready for AI analysis."
      },
      "typeVersion": 1
    },
    {
      "id": "3290ee6a-14ad-4c1e-8c54-69d7effd6829",
      "name": "Sticky Note5",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -48,
        32
      ],
      "parameters": {
        "color": 7,
        "width": 256,
        "height": 208,
        "content": "### **Generate Summary (AI)**  \nSends the clean text to the AI model (like Gemini) to create a short summary with key ideas and topics.  \n**Input:** Clean text  \n**Output:** AI-generated summary in JSON format.\n"
      },
      "typeVersion": 1
    }
  ],
  "active": false,
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "10d230f1-e3a1-4c94-a98c-f8df799862af",
  "connections": {
    "Decodo": {
      "main": [
        [
          {
            "node": "Code in JavaScript",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "AI Agent": {
      "main": [
        [
          {
            "node": "Code in JavaScript1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Loop Over Items": {
      "main": [
        [],
        [
          {
            "node": "Decodo",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Code in JavaScript": {
      "main": [
        [
          {
            "node": "AI Agent",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Append row in sheet": {
      "main": [
        [
          {
            "node": "Loop Over Items",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Code in JavaScript1": {
      "main": [
        [
          {
            "node": "Append row in sheet",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Get row(s) in sheet": {
      "main": [
        [
          {
            "node": "Loop Over Items",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Google Gemini Chat Model": {
      "ai_languageModel": [
        [
          {
            "node": "AI Agent",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    },
    "When clicking \u2018Execute workflow\u2019": {
      "main": [
        [
          {
            "node": "Get row(s) in sheet",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
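
The `Code in JavaScript1` node above passes the agent's `output` string straight to `JSON.parse`, so a reply wrapped in a Markdown code fence or padded with extra prose ends up in the error branch even though the prompt asks for bare JSON. The following is a minimal sketch of a more tolerant parser, assuming the reply contains a single JSON object; the helper name `parseAgentOutput` is hypothetical and not part of the original workflow.

```javascript
// Hypothetical helper: extract one JSON object from an LLM reply.
// Returns the parsed object, or null if nothing parseable is found.
function parseAgentOutput(rawOutput) {
  if (typeof rawOutput !== "string") {
    return null;
  }
  let text = rawOutput.trim();

  // Drop a leading Markdown fence (three backticks plus an optional language
  // tag) and a trailing fence, if the model added them.
  text = text.replace(/^`{3}[a-zA-Z]*\s*/, "").replace(/\s*`{3}$/, "");

  // Keep only the outermost {...} span in case prose surrounds the object.
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    return null;
  }

  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch (error) {
    return null;
  }
}

// Example reply wrapped in a fence, built without literal backtick runs.
const fence = "`".repeat(3);
const fencedReply = fence + 'json\n{"title": "RAG basics", "language": "en"}\n' + fence;

console.log(parseAgentOutput(fencedReply));
console.log(parseAgentOutput("Sorry, I cannot help with that."));
```

Inside the Code node, a `null` result could be routed to the same error item the original `catch` block builds, so unparseable replies stay visible rather than being written to the sheet as empty rows.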


About this workflow

This workflow takes a list of links from Google Sheets, visits each page, extracts the main text using Decodo, and creates a summary with the help of artificial intelligence. It helps you turn research articles or web pages into clear, structured insights you can reuse for your…

Source: https://n8n.io/workflows/11389/ (original creator credit).


Related workflows

Workflows that share integrations, category, or trigger type with this one. All free to copy and import.

AI & RAG

Build a Multi-modal Telegram AI Assistant with Gemini, Voice & Image Generation

This workflow creates a multi-talented AI assistant named Simran that interacts with users via Telegram. It can handle text and voice messages, understand the user's intent, and perform various tasks.

MongoDB, Chain Llm, Google Gemini Chat +11

AI & RAG

University Faq & Calendar Assistant with Telegram, Mongodb and Gemini AI

This project is a template for building a complete academic virtual assistant using n8n. It connects to Telegram, answers frequently asked questions by querying MongoDB, keeps the community informed a

Telegram, MongoDB, Telegram Trigger +6

AI & RAG

Convert Emailed Timesheets Into Quickbooks Invoices with Ocr, Ai, Gmail and Sheets

> Note: This workflow uses sticky notes extensively to document each logical section of the automation. Sticky notes are mandatory and already included to explain OCR, AI parsing, folder logic, dup

QuickBooks, Google Sheets, Google Drive +5

AI & RAG

Create an AI Image Remix and Design Bot for Telegram with Browseract and Gemini

This workflow transforms your Telegram bot into an intelligent creative assistant. It can chat conversationally, fetch trending image prompts from PromptHero for inspiration, or perform a deep "remix"

Telegram Trigger, Output Parser Structured, Telegram +6

AI & RAG

Nutrition Tracker & Meal Logger with Telegram, Gemini AI and Google Sheets

> AI-powered nutrition assistant for Telegram — log meals, set goals, and get personalized daily reports with Google Sheets integration.

Telegram, Google Gemini, Google Gemini Chat +7

Web Research Summarizer with Decodo Scraper and Google Gemini AI
