Bright Data Research Papers to Google Sheets

Original n8n title: Automate Research Paper Collection with Bright Data & N8n

By Yaron Been (@yaron-nofluff) on n8n.io

This workflow collects research papers from Google Scholar through the Bright Data API and organizes them in Google Sheets. For a given topic it records each result's title, authors, abstract snippet, and PDF link, so researchers and students no longer have to copy search results by hand and can keep their research material in one place.

Category: Data & Sheets · Trigger: Event · Nodes: 12 · Complexity: ★★★★☆ · Integrations: HTTP Request, Google Sheets

This workflow corresponds to n8n.io template #5221 — we link there as the canonical source.

This workflow follows the Google Sheets → HTTP Request recipe pattern — see all workflows that pair these two integrations.

The workflow JSON

Copy the full n8n JSON below and paste it into a new n8n workflow. Add your credentials, then run it with "Execute Workflow": the workflow starts from a manual trigger, so it runs on demand.
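Before importing, it can help to confirm that every connection in the export points at a node that actually exists, since a renamed node silently breaks the chain. The following is a minimal sketch of such a check, not an n8n feature: `checkConnections` is a hypothetical helper, and the sample object is a hand-made two-node excerpt that reuses node names from this workflow.

```javascript
// Sketch: verify that every connection in an n8n workflow export refers to a node that exists.
// `checkConnections` is a hypothetical helper, not part of n8n.
function checkConnections(workflow) {
  const names = new Set(workflow.nodes.map((node) => node.name));
  const problems = [];

  for (const [source, outputs] of Object.entries(workflow.connections || {})) {
    if (!names.has(source)) {
      problems.push(`unknown source node: ${source}`);
    }
    // Each output type (usually "main") holds an array of output slots,
    // and each slot holds an array of target descriptors.
    for (const slots of Object.values(outputs)) {
      for (const targets of slots) {
        for (const target of targets || []) {
          if (!names.has(target.node)) {
            problems.push(`unknown target node: ${target.node} (from ${source})`);
          }
        }
      }
    }
  }
  return problems;
}

// Hand-made excerpt using two node names from the workflow.
const sample = {
  nodes: [{ name: "Set Research topic" }, { name: "Send Request to Bright Data API" }],
  connections: {
    "Set Research topic": {
      main: [[{ node: "Send Request to Bright Data API", type: "main", index: 0 }]],
    },
  },
};

console.log(checkConnections(sample)); // an empty array means no dangling connections
```

To check the full export, parse the saved file with `JSON.parse` and pass the result to the same function.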

{
  "id": "giq3zqaP4QbY6LgC",
  "name": "Research_Paper_Scraper_to_Google_Sheets",
  "tags": [],
  "nodes": [
    {
      "id": "7d81edf3-6f00-4634-b79f-dbda3f9958e5",
      "name": "Start Scraping (Manual Trigger)",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        -1080,
        580
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "6e172db5-7483-4079-bf8a-785602526bdc",
      "name": "Set Research topic",
      "type": "n8n-nodes-base.set",
      "position": [
        -860,
        580
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "b530a847-0bb2-4039-9ad0-cbc9cc4d909e",
              "name": "Topic",
              "type": "string",
              "value": "machine+learning"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "e65d092a-6854-478c-b33e-2fc309f71ae8",
      "name": "Send Request to Bright Data API",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        -600,
        580
      ],
      "parameters": {
        "url": "https://api.brightdata.com/request",
        "method": "POST",
        "options": {},
        "sendBody": true,
        "sendHeaders": true,
        "bodyParameters": {
          "parameters": [
            {
              "name": "zone",
              "value": "n8n_unblocker"
            },
            {
              "name": "url",
              "value": "=https://scholar.google.com/scholar?q={{ $json.Topic }}"
            },
            {
              "name": "country",
              "value": "us"
            },
            {
              "name": "format",
              "value": "raw"
            }
          ]
        },
        "headerParameters": {
          "parameters": [
            {
              "name": "Authorization",
              "value": "Bearer YOUR_TOKEN_HERE"
            }
          ]
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "211bae33-32c5-44e8-b306-a5e0d520a4a0",
      "name": "Extract Data from HTML (Title, Author, etc.)",
      "type": "n8n-nodes-base.html",
      "position": [
        -400,
        580
      ],
      "parameters": {
        "options": {},
        "operation": "extractHtmlContent",
        "extractionValues": {
          "values": [
            {
              "key": "Title",
              "cssSelector": "h3.gs_rt, a.gs_rt",
              "returnArray": true
            },
            {
              "key": "Author",
              "cssSelector": "div.gs_a",
              "returnArray": true
            },
            {
              "key": "Abstract",
              "cssSelector": "div.gs_rs",
              "returnArray": true
            },
            {
              "key": "PDF Link\t",
              "attribute": "href",
              "cssSelector": "a[href*='pdf']",
              "returnArray": true,
              "returnValue": "attribute"
            }
          ]
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "9ab7ba20-8614-46c5-b57a-3749d6ae04c4",
      "name": "Clean & Structure Extracted Data",
      "type": "n8n-nodes-base.code",
      "position": [
        -200,
        580
      ],
      "parameters": {
        "jsCode": "// Read the parallel arrays produced by the HTML extraction node\nconst titles = items[0].json.Title || [];\nconst authors = items[0].json.Author || [];\nconst abstracts = items[0].json.Abstract || [];\n// The key ends with a tab character because the extraction node names it that way\nconst pdfLinks = items[0].json[\"PDF Link\\t\"] || [];\n\nconst output = [];\n\nfor (let i = 0; i < titles.length; i++) {\n  // Clean title (remove tags like [PDF][B])\n  let title = titles[i].replace(/\\[.*?\\]/g, '').trim();\n\n  // Keep only the author names: cut at the first dash that has whitespace on both sides,\n  // so hyphenated names stay intact\n  let author = authors[i] ? authors[i].replace(/\\s+-\\s+.*/, '').trim() : '';\n\n  // Abstract fallback\n  let abstract = abstracts[i] || '';\n\n  // Get PDF link: a plain string, a single object, or an array of objects\n  let linkObj = pdfLinks[i];\n  let pdfLink = '';\n\n  if (typeof linkObj === 'string') {\n    // Attribute extraction returns the attribute value itself\n    pdfLink = linkObj;\n  } else if (Array.isArray(linkObj)) {\n    // If multiple objects per item\n    pdfLink = linkObj.find(obj => obj.href)?.href || '';\n  } else if (linkObj?.href) {\n    pdfLink = linkObj.href;\n  }\n\n  // Push cleaned object\n  output.push({\n    json: {\n      title,\n      author,\n      abstract,\n      pdfLink\n    }\n  });\n}\n\nreturn output;\n"
      },
      "typeVersion": 2
    },
    {
      "id": "a246f20c-2bb9-4319-8812-e296c87a7df0",
      "name": "Save Results to Google Sheet",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        120,
        580
      ],
      "parameters": {
        "columns": {
          "value": {
            "Topic": "={{ $('Set Research topic').item.json.Topic }}",
            "title": "={{ $json.title }}",
            "author": "={{ $json.author }}",
            "abstract": "={{ $json.abstract }}",
            "pdf link": "={{ $json.pdfLink }}"
          },
          "schema": [
            {
              "id": "Topic",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "Topic",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "title",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "title",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "author",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "author",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "abstract",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "abstract",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "pdf link",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "pdf link",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            }
          ],
          "mappingMode": "defineBelow",
          "matchingColumns": [],
          "attemptToConvertTypes": false,
          "convertFieldsToString": false
        },
        "options": {},
        "operation": "append",
        "sheetName": {
          "__rl": true,
          "mode": "list",
          "value": "gid=0",
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1sOfCFsvHS9-BeE_PQ6J_jtQofCRcOv02XS7hrmFmpxQ/edit#gid=0",
          "cachedResultName": "Sheet1"
        },
        "documentId": {
          "__rl": true,
          "mode": "list",
          "value": "1sOfCFsvHS9-BeE_PQ6J_jtQofCRcOv02XS7hrmFmpxQ",
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1sOfCFsvHS9-BeE_PQ6J_jtQofCRcOv02XS7hrmFmpxQ/edit?usp=drivesdk",
          "cachedResultName": "Research papers from Google Scholar"
        }
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.6
    },
    {
      "id": "1b4a1504-4a4a-4a0d-892b-d0c3e205ed85",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1140,
        60
      ],
      "parameters": {
        "color": 5,
        "width": 420,
        "height": 720,
        "content": "## \ud83d\udd39 **Section 1: User Input & Trigger**\n\n**\ud83e\udde9 Nodes: Start Scraping | Set Topic**\n\ud83d\udccd **Purpose:** Let users easily input the topic they want to scrape \u2014 no need to deal with complex URLs.\n\n| \ud83e\uddf1 Node   | \u2705 New Name                   | \ud83d\udca1 Description                                                                                                                                                                         |\n| --------- | ---------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| \u26a1 Trigger | **Start Scraping (Manual)**  | This node starts the workflow when you click \u201cExecute Workflow.\u201d It's the entry point.                                                                                                 |\n| \u270f\ufe0f Set    | **Set Topic (Manual Input)** | Instead of requiring a URL, the user will enter a topic (like \"machine learning\" or \"digital marketing\"). This topic will be used to automatically generate the URL behind the scenes. |\n\n### \ud83e\udde0 How it helps:\n\n* Great for beginners: Just type the topic, hit run.\n* Keeps the interface clean and user-friendly.\n* Avoids confusion around URLs and formats.\n\n---\n\n"
      },
      "typeVersion": 1
    },
    {
      "id": "bc56f528-6d18-4e05-942f-c06bb6e10b27",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -660,
        80
      ],
      "parameters": {
        "color": 6,
        "width": 600,
        "height": 700,
        "content": "## \ud83d\udd38 **Section 2: Scrape & Parse Website**\n\n**\ud83e\udde9 Nodes: Send Request | Extract HTML | Clean Data**\n\ud83d\udccd **Purpose:** Uses the Bright Data proxy to access the webpage, extract raw HTML content, and clean it up into a readable format (title, author, abstract, etc.).\n\n| \ud83e\uddf1 Node         | \u2705 New Name                            | \ud83d\udca1 Description                                                                                                                                                        |\n| --------------- | ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| \ud83c\udf10 HTTP Request | **Send Topic Request to Bright Data** | This sends a request to the Bright Data API using the topic you set earlier. It uses Bright Data\u2019s network to safely load the actual website and return HTML content. |\n| \ud83e\uddf1 HTML Extract | **Extract Data from Webpage**         | Parses the returned HTML to find relevant data like titles, authors, abstracts, and links.                                                                            |\n| \ud83d\udd23 Code         | **Clean and Format Scraped Data**     | A custom code block that organizes the messy data into neat records. For example: title \u2192 column A, abstract \u2192 column B, etc.                                         |\n\n### \ud83e\udde0 How it helps:\n\n* Makes web scraping safe and reliable by using proxies.\n* Converts unreadable HTML into structured information.\n* Beginner-friendly: No need to write a parser yourself.\n\n---\n\n"
      },
      "typeVersion": 1
    },
    {
      "id": "2c54e5e6-011a-4562-98ac-9cc9834bc284",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        0,
        0
      ],
      "parameters": {
        "color": 3,
        "width": 340,
        "height": 780,
        "content": "## \ud83d\udfe2 **Section 3: Save to Google Sheets**\n\n**\ud83e\udde9 Node: Append to Google Sheets**\n\ud83d\udccd **Purpose:** Automatically sends the clean data into a Google Sheet for easy access, filtering, or sharing.\n\n| \ud83e\uddf1 Node          | \u2705 New Name                            | \ud83d\udca1 Description                                                                                                                      |\n| ---------------- | ------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |\n| \ud83d\udcc4 Google Sheets | **Store Scraped Data in Spreadsheet** | Takes the structured output and appends it to the connected Google Sheet. Each result gets a row with title, author, abstract, etc. |\n\n### \ud83e\udde0 How it helps:\n\n* No manual copy-pasting ever again!\n* Shareable and searchable format.\n* Updates automatically as you scrape more topics.\n\n---\n\n"
      },
      "typeVersion": 1
    },
    {
      "id": "4ce90703-961e-4070-9356-c9dffc23a6c5",
      "name": "Sticky Note9",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -2980,
        80
      ],
      "parameters": {
        "color": 4,
        "width": 1300,
        "height": 320,
        "content": "=======================================\n            WORKFLOW ASSISTANCE\n=======================================\nFor any questions or support, please contact:\n    Yaron@nofluff.online\n\nExplore more tips and tutorials here:\n   - YouTube: https://www.youtube.com/@YaronBeen/videos\n   - LinkedIn: https://www.linkedin.com/in/yaronbeen/\n=======================================\n"
      },
      "typeVersion": 1
    },
    {
      "id": "069ddb89-f7a1-4c4b-b65d-212be3252750",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -2980,
        420
      ],
      "parameters": {
        "color": 4,
        "width": 1289,
        "height": 1878,
        "content": "## \ud83c\udf1f Research Paper Scraper to Google Sheets\n\n**Automate extraction of data from any website based on a topic \u2014 no coding needed!**\n\n---\n\n## \ud83d\udd39 **Section 1: User Input & Trigger**\n\n**\ud83e\udde9 Nodes: Start Scraping | Set Topic**\n\ud83d\udccd **Purpose:** Let users easily input the topic they want to scrape \u2014 no need to deal with complex URLs.\n\n| \ud83e\uddf1 Node | \u2705 New Name | \ud83d\udca1 Description |\n| --- | --- | --- |\n| \u26a1 Trigger | **Start Scraping (Manual)** | This node starts the workflow when you click \u201cExecute Workflow.\u201d It's the entry point. |\n| \u270f\ufe0f Set | **Set Topic (Manual Input)** | Instead of requiring a URL, the user will enter a topic (like \"machine learning\" or \"digital marketing\"). This topic will be used to automatically generate the URL behind the scenes. |\n\n### \ud83e\udde0 How it helps:\n\n* Great for beginners: Just type the topic, hit run.\n* Keeps the interface clean and user-friendly.\n* Avoids confusion around URLs and formats.\n\n---\n\n## \ud83d\udd38 **Section 2: Scrape & Parse Website**\n\n**\ud83e\udde9 Nodes: Send Request | Extract HTML | Clean Data**\n\ud83d\udccd **Purpose:** Uses the Bright Data proxy to access the webpage, extract raw HTML content, and clean it up into a readable format (title, author, abstract, etc.).\n\n| \ud83e\uddf1 Node | \u2705 New Name | \ud83d\udca1 Description |\n| --- | --- | --- |\n| \ud83c\udf10 HTTP Request | **Send Topic Request to Bright Data** | This sends a request to the Bright Data API using the topic you set earlier. It uses Bright Data\u2019s network to safely load the actual website and return HTML content. |\n| \ud83e\uddf1 HTML Extract | **Extract Data from Webpage** | Parses the returned HTML to find relevant data like titles, authors, abstracts, and links. |\n| \ud83d\udd23 Code | **Clean and Format Scraped Data** | A custom code block that organizes the messy data into neat records. For example: title \u2192 column A, abstract \u2192 column B, etc. |\n\n### \ud83e\udde0 How it helps:\n\n* Makes web scraping safe and reliable by using proxies.\n* Converts unreadable HTML into structured information.\n* Beginner-friendly: No need to write a parser yourself.\n\n---\n\n## \ud83d\udfe2 **Section 3: Save to Google Sheets**\n\n**\ud83e\udde9 Node: Append to Google Sheets**\n\ud83d\udccd **Purpose:** Automatically sends the clean data into a Google Sheet for easy access, filtering, or sharing.\n\n| \ud83e\uddf1 Node | \u2705 New Name | \ud83d\udca1 Description |\n| --- | --- | --- |\n| \ud83d\udcc4 Google Sheets | **Store Scraped Data in Spreadsheet** | Takes the structured output and appends it to the connected Google Sheet. Each result gets a row with title, author, abstract, etc. |\n\n### \ud83e\udde0 How it helps:\n\n* No manual copy-pasting ever again!\n* Shareable and searchable format.\n* Updates automatically as you scrape more topics.\n\n---\n\n## \u2705 What a Beginner Gains from This Workflow\n\n| \ud83d\udca1 Feature | \ud83d\ude80 Benefit |\n| --- | --- |\n| Topic-based input | You don\u2019t need to find or understand complex URLs. Just type \u201cAI\u201d or \u201cmarketing.\u201d |\n| Fully automated scraping | You don\u2019t need to open browsers or inspect elements. |\n| Ready-to-use Google Sheet | The final data is clean and saved into a sheet you can use anywhere. |\n| Beautiful, modular workflow | Each step is visual, editable, and reusable without coding skills. |\n\n---\n\n## \ud83c\udfaf Final Result:\n\nYou type a **topic** \u2192 Bright Data scrapes the web \u2192 It extracts content \u2192 Cleans it \u2192 Saves it into **Google Sheets**.\nEverything happens automatically. **No code. No hassle. Just data.**\n\n---\n\n"
      },
      "typeVersion": 1
    },
    {
      "id": "a1a5e609-756a-4757-a026-1349cf388e61",
      "name": "Sticky Note5",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        400,
        0
      ],
      "parameters": {
        "color": 7,
        "width": 380,
        "height": 240,
        "content": "## I\u2019ll receive a tiny commission if you join Bright Data through this link\u2014thanks for fueling more free content!\n\n### https://get.brightdata.com/1tndi4600b25"
      },
      "typeVersion": 1
    }
  ],
  "active": false,
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "f931202a-3c22-495d-b775-71665bdf6c27",
  "connections": {
    "Set Research topic": {
      "main": [
        [
          {
            "node": "Send Request to Bright Data API",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Send Request to Bright Data API": {
      "main": [
        [
          {
            "node": "Extract Data from HTML (Title, Author, etc.)",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Start Scraping (Manual Trigger)": {
      "main": [
        [
          {
            "node": "Set Research topic",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Clean & Structure Extracted Data": {
      "main": [
        [
          {
            "node": "Save Results to Google Sheet",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract Data from HTML (Title, Author, etc.)": {
      "main": [
        [
          {
            "node": "Clean & Structure Extracted Data",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
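
The cleaning step in the "Clean & Structure Extracted Data" node can be tried outside n8n. The following is a minimal sketch rather than the workflow's own code: `cleanScholarResults` is a hypothetical function, the sample input is made up, and it assumes the extraction step yields parallel arrays of plain strings. It cuts the author line only at a dash with whitespace on both sides, so hyphenated surnames survive.

```javascript
// Sketch of the cleaning step as a plain function.
// `cleanScholarResults` and the sample input are illustrative, not part of the workflow.
function cleanScholarResults({ titles = [], authors = [], abstracts = [], pdfLinks = [] }) {
  return titles.map((rawTitle, i) => ({
    // Drop bracketed markers such as [PDF] or [BOOK] from the title
    title: rawTitle.replace(/\[.*?\]/g, "").trim(),
    // Keep only the names: cut at the first dash that has whitespace on both sides
    author: (authors[i] || "").replace(/\s+-\s+.*/, "").trim(),
    abstract: abstracts[i] || "",
    // Assumes each link arrives as a plain string (or is missing)
    pdfLink: typeof pdfLinks[i] === "string" ? pdfLinks[i] : "",
  }));
}

const rows = cleanScholarResults({
  titles: ["[PDF] An Example Paper Title"],
  authors: ["A Author, B Writer-Smith - Example Journal, 2020 - example.org"],
  abstracts: ["A short made-up abstract."],
  pdfLinks: ["https://example.org/paper.pdf"],
});

console.log(rows[0].title);  // "An Example Paper Title"
console.log(rows[0].author); // "A Author, B Writer-Smith"
```

Fields are paired by array index, which assumes every search result contributes one entry to each array. Results without a PDF link can shift later links onto the wrong rows, so the `pdf link` column is worth spot-checking.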

Credentials you'll need

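Each integration node prompts for credentials when you import the workflow. Credential IDs are stripped before publishing, so you add your own: a Google Sheets OAuth2 credential for the "Save Results to Google Sheet" node, and a Bright Data API token in place of YOUR_TOKEN_HERE in the Authorization header of the "Send Request to Bright Data API" node. Also point the Google Sheets node at your own spreadsheet, since the document ID in the JSON belongs to the original author's sheet.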
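
To see where the Bright Data token ends up, the request made by the "Send Request to Bright Data API" node can be rebuilt as a plain object. This is a sketch under stated assumptions: `topicToQuery` and `buildBrightDataRequest` are hypothetical helpers, and the JSON content type is assumed because the node does not set one explicitly. The URL, zone, country, and format values are taken from the workflow JSON.

```javascript
// Sketch: build the request that the HTTP Request node sends to Bright Data.
// `topicToQuery` and `buildBrightDataRequest` are hypothetical helpers.

// Turn a free-text topic into the query form used in the workflow ("machine+learning").
function topicToQuery(topic) {
  return encodeURIComponent(topic.trim()).replace(/%20/g, "+");
}

function buildBrightDataRequest(topic, token, zone = "n8n_unblocker") {
  return {
    url: "https://api.brightdata.com/request",
    method: "POST",
    headers: {
      // The token replaces YOUR_TOKEN_HERE from the workflow JSON
      Authorization: `Bearer ${token}`,
      // Assumption: the body is sent as JSON
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      zone,
      url: `https://scholar.google.com/scholar?q=${topicToQuery(topic)}`,
      country: "us",
      format: "raw",
    }),
  };
}

const request = buildBrightDataRequest("machine learning", "YOUR_TOKEN_HERE");
console.log(JSON.parse(request.body).url);
// https://scholar.google.com/scholar?q=machine+learning
```

The zone name must match a zone that exists in your own Bright Data account; `n8n_unblocker` is simply the name used in this workflow.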

About this workflow

Source: https://n8n.io/workflows/5221/ (original creator credit).
