Scheduled arXiv Paper Fetcher

Original n8n title: Arxiv Fetch

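arxiv_fetch. Every 6 hours, fetches the 20 most recently updated arXiv papers matching LLM, "prompt engineering", RAG, or Agents, and posts them to a backend API. Uses httpRequest. Scheduled trigger; 5 nodes.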

Category: Web Scraping · Trigger: Cron / scheduled · Nodes: 5 · Complexity: ★★★★☆ · Integrations: HTTP Request

The workflow JSON

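Copy the full n8n JSON below and paste it into a new n8n workflow. Replace the CHANGE_ME placeholder in the Set API Key node with your backend API key, adjust the backend URL in the POST Backend node if yours differs from http://127.0.0.1:8000, then activate the workflow.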

{
  "name": "arxiv_fetch",
  "nodes": [
    {
      "parameters": {
        "triggerTimes": {
          "item": [
            {
              "mode": "everyX",
              "value": 6,
              "unit": "hours"
            }
          ]
        }
      },
      "id": "Cron",
      "name": "Cron",
      "type": "n8n-nodes-base.cron",
      "typeVersion": 2,
      "position": [
        260,
        260
      ]
    },
    {
      "parameters": {
        "url": "https://export.arxiv.org/api/query?search_query=all:(LLM%20OR%20%22prompt%20engineering%22%20OR%20RAG%20OR%20Agents)&sortBy=lastUpdatedDate&sortOrder=descending&max_results=20",
        "responseFormat": "string",
        "options": {}
      },
      "id": "FetchArxiv",
      "name": "Fetch arXiv",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4,
      "position": [
        520,
        260
      ]
    },
    {
      "parameters": {
        "functionCode": "// Parse arXiv Atom XML to JSON items\nconst text = typeof items[0].json === \"string\" ? items[0].json : JSON.stringify(items[0].json);\nconst rawEntries = text.split('<entry>').slice(1).map(e => e.split('</entry>')[0]);\nfunction pick(e, tag){ const m = e.match(new RegExp('<' + tag + '>([\\\\s\\\\S]*?)</' + tag + '>')); return m ? m[1].replace(/<[^>]+>/g,'').trim() : ''; }\nfunction pickLink(e){ const m = e.match(/<link[^>]*href=\\\"([^\\\\\"]+)\\\"/); return m ? m[1] : ''; }\nfunction pickAuthors(e){ const arr = e.match(/<author>[\\s\\S]*?<name>([\\s\\S]*?)<\\/name>[\\s\\S]*?<\\/author>/g) || []; return arr.map(a => { const mm = a.match(/<name>([\\s\\S]*?)<\\/name>/); return mm ? mm[1].trim() : ''; }).filter(Boolean).join(', '); }\nconst parsed = rawEntries.map(e => { const idraw = pick(e,'id'); const arxivId = idraw.includes('abs/') ? idraw.split('abs/').pop().replace(/v\\d+$/, '') : undefined; const title = pick(e,'title'); const summary = pick(e,'summary'); const url = pickLink(e); const authors = pickAuthors(e); const published_at = pick(e,'updated') || pick(e,'published'); return { arxiv_id: arxivId, title, authors, summary, url, tags: 'LLM', published_at }; });\nreturn parsed.map(p => ({ json: p }));"
      },
      "id": "Transform",
      "name": "Transform",
      "type": "n8n-nodes-base.function",
      "typeVersion": 2,
      "position": [
        770,
        260
      ]
    },
    {
      "parameters": {
        "options": {},
        "url": "http://127.0.0.1:8000/api/v1/papers/import",
        "method": "POST",
        "jsonParameters": true,
        "sendHeaders": true,
        "headerParametersJson": "={\\\"x-api-key\\\": $json.n8n_api_key}",
        "bodyParametersJson": "={\\\"items\\\": $items.map(i => i.json)}"
      },
      "id": "PostBackend",
      "name": "POST Backend",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4,
      "position": [
        1020,
        260
      ]
    },
    {
      "parameters": {
        "values": {
          "string": [
            {
              "name": "n8n_api_key",
              "value": "CHANGE_ME"
            }
          ]
        },
        "options": {}
      },
      "id": "SetApiKey",
      "name": "Set API Key",
      "type": "n8n-nodes-base.set",
      "typeVersion": 2,
      "position": [
        520,
        380
      ]
    }
  ],
  "connections": {
    "Cron": {
      "main": [
        [
          {
            "node": "Fetch arXiv",
            "type": "main",
            "index": 0
          },
          {
            "node": "Set API Key",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Fetch arXiv": {
      "main": [
        [
          {
            "node": "Transform",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Transform": {
      "main": [
        [
          {
            "node": "POST Backend",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Set API Key": {
      "main": [
        [
          {
            "node": "POST Backend",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
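
The Transform node's parser is packed into a single escaped string, which makes it hard to read or test. The following is a minimal standalone sketch of the same regex-based approach in plain JavaScript, runnable in Node.js outside n8n. The function name parseArxivFeed and the sample entry are made up for illustration, and the sketch assumes the feed arrives as a raw XML string with arXiv's usual Atom layout.

```javascript
// Standalone sketch of the Transform node's parsing logic.
// parseArxivFeed and the sample feed below are illustrative, not part of the workflow.

function pick(entry, tag) {
  // Matches <tag>...</tag> for tags without attributes (id, title, summary, updated).
  const m = entry.match(new RegExp('<' + tag + '>([\\s\\S]*?)</' + tag + '>'));
  return m ? m[1].replace(/<[^>]+>/g, '').trim() : '';
}

function pickLink(entry) {
  // Returns the href of the first <link> element in the entry.
  const m = entry.match(/<link[^>]*href="([^"]+)"/);
  return m ? m[1] : '';
}

function pickAuthors(entry) {
  // Collects every <author><name>...</name></author> into a comma-separated string.
  const names = [];
  const re = /<author>[\s\S]*?<name>([\s\S]*?)<\/name>[\s\S]*?<\/author>/g;
  let m;
  while ((m = re.exec(entry)) !== null) names.push(m[1].trim());
  return names.filter(Boolean).join(', ');
}

function parseArxivFeed(xml) {
  return xml.split('<entry>').slice(1).map(chunk => {
    const entry = chunk.split('</entry>')[0];
    const idRaw = pick(entry, 'id');
    // Strip the trailing version suffix (v1, v2, ...) from the arXiv identifier.
    const arxivId = idRaw.includes('abs/')
      ? idRaw.split('abs/').pop().replace(/v\d+$/, '')
      : undefined;
    return {
      arxiv_id: arxivId,
      title: pick(entry, 'title'),
      authors: pickAuthors(entry),
      summary: pick(entry, 'summary'),
      url: pickLink(entry),
      tags: 'LLM',
      published_at: pick(entry, 'updated') || pick(entry, 'published'),
    };
  });
}

// Made-up sample entry in the shape of an arXiv Atom feed.
const sample = `<feed><entry>
<id>http://arxiv.org/abs/2401.00001v2</id>
<updated>2024-01-02T00:00:00Z</updated>
<published>2024-01-01T00:00:00Z</published>
<title>Example Title</title>
<summary>Example summary.</summary>
<author><name>A. Author</name></author>
<author><name>B. Author</name></author>
<link href="http://arxiv.org/abs/2401.00001v2" rel="alternate" type="text/html"/>
</entry></feed>`;

const papers = parseArxivFeed(sample);
console.log(papers);
```

Regex parsing is kept here only to mirror the workflow. It depends on the feed's exact tag layout, so a proper XML parser would be the more robust choice if the feed format ever changes.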

About this workflow

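A Cron node fires every 6 hours and starts two branches. Fetch arXiv requests the Atom feed from export.arxiv.org for papers matching LLM, "prompt engineering", RAG, or Agents, sorted by last updated date in descending order and limited to 20 results. Set API Key stores the backend key under the name n8n_api_key. Transform then splits the feed into entries and emits one item per paper with the fields arxiv_id, title, authors, summary, url, tags, and published_at. Finally, POST Backend sends the items to http://127.0.0.1:8000/api/v1/papers/import with the key in the x-api-key header.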
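
To point the workflow at other topics, the Fetch arXiv URL can be regenerated instead of hand-editing the percent-encoded string. A minimal sketch in JavaScript follows; buildArxivUrl is a hypothetical helper that is not part of the workflow, and the sort and max_results values are copied from the workflow's URL.

```javascript
// Hypothetical helper: rebuilds the Fetch arXiv URL from a list of search terms.
function buildArxivUrl(terms, maxResults = 20) {
  // Wrap multi-word terms in double quotes so they are sent as phrases.
  const query = terms
    .map(t => (/\s/.test(t) ? `"${t}"` : t))
    .join(' OR ');
  const params = [
    ['search_query', `all:(${query})`],
    ['sortBy', 'lastUpdatedDate'],
    ['sortOrder', 'descending'],
    ['max_results', String(maxResults)],
  ];
  // encodeURIComponent turns spaces into %20 and double quotes into %22.
  return 'https://export.arxiv.org/api/query?' +
    params.map(([k, v]) => `${k}=${encodeURIComponent(v)}`).join('&');
}

const url = buildArxivUrl(['LLM', 'prompt engineering', 'RAG', 'Agents']);
console.log(url);
```

The result can be pasted into the url parameter of the Fetch arXiv node. Note that the colon after "all" is emitted as %3A, which decodes to the same query on the server side.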

Source: https://github.com/charlietrallie/HubTool/blob/7b8b20b4815ce81150dfbe84c4531e827ccc07ab/automations/n8n/arxiv_fetch.json (original creator credit).
