Daily RAG Research Paper Hub with arXiv, Gemini AI, and Notion

By dongou (@dongou) on n8n.io

Fetches research papers on a user-chosen topic from arXiv on a daily schedule, processes and structures the data, and creates or updates entries in a Notion database, with support for message delivery (Gmail and Feishu). Paper topic: a single query keyword. Update frequency: daily, with fewer than 20 entries…
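The daily fetch comes down to one date window and one query URL. The sketch below is a minimal illustration, not the workflow's own code: `utcDayWindow` and `buildArxivQueryUrl` are hypothetical helper names, and it assumes the window should be computed in GMT, which is the timezone the arXiv API uses for `submittedDate`.

```javascript
// Build the [from, to] submittedDate window for a day `daysBack` days before `now`, in GMT.
// Hypothetical helper for illustration; the workflow's Code node uses local time instead.
function utcDayWindow(now, daysBack) {
  // Date.UTC normalizes out-of-range day values, so month and year rollovers are handled.
  const day = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() - daysBack));
  const y = day.getUTCFullYear();
  const m = String(day.getUTCMonth() + 1).padStart(2, '0');
  const d = String(day.getUTCDate()).padStart(2, '0');
  return { from: `${y}${m}${d}0000`, to: `${y}${m}${d}2359` };
}

// Assemble the query URL in the same shape as the workflow's "arXiv API" node.
function buildArxivQueryUrl(keyword, range) {
  return 'https://export.arxiv.org/api/query?search_query=all:' + encodeURIComponent(keyword) +
    `+AND+submittedDate:[${range.from}+TO+${range.to}]`;
}

// Example: a run at 06:00 GMT on 2025-09-14 querying the previous day (T-1).
const range = utcDayWindow(new Date(Date.UTC(2025, 8, 14, 6, 0)), 1);
console.log(buildArxivQueryUrl('RAG', range));
// https://export.arxiv.org/api/query?search_query=all:RAG+AND+submittedDate:[202509130000+TO+202509132359]
```

To track a different topic, change the keyword; the rest of the pipeline only depends on the Atom response.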

Category: AI & RAG | Trigger: Cron / scheduled | Nodes: 22 | Complexity: ★★★★☆ | AI nodes: yes
Integrations: Chain LLM, Google Gemini Chat, HTTP Request, Gmail, Google Gemini, Notion

This workflow corresponds to n8n.io template #8847 — we link there as the canonical source.

This workflow follows the Chain LLM → Gmail recipe pattern.

The workflow JSON

Copy the full n8n JSON below, paste it into a new n8n workflow, add your credentials, and activate it.
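Before activating, it can help to see which nodes still carry the `<your credential>` placeholder. The snippet below is a rough sketch in plain JavaScript: `findCredentialPlaceholders` is a hypothetical helper rather than an n8n feature, and the sample object is a trimmed-down stand-in with the same shape as the export on this page.

```javascript
// List the nodes in an exported workflow whose credentials are still placeholders.
// Hypothetical helper for illustration only.
function findCredentialPlaceholders(workflow) {
  const pending = [];
  for (const node of workflow.nodes || []) {
    // Nodes without a `credentials` object (If, Switch, Code, ...) are skipped.
    for (const [type, cred] of Object.entries(node.credentials || {})) {
      if (cred && cred.name === '<your credential>') {
        pending.push({ node: node.name, credentialType: type });
      }
    }
  }
  return pending;
}

// Trimmed-down sample; in practice, pass the parsed workflow JSON instead.
const sample = {
  nodes: [
    { name: 'Google Gemini Chat Model', credentials: { googlePalmApi: { name: '<your credential>' } } },
    { name: 'If' },
    { name: 'Send a message', credentials: { gmailOAuth2: { name: '<your credential>' } } },
  ],
};
console.log(findCredentialPlaceholders(sample));
```

In this workflow the nodes that need credentials are the two Gemini nodes, the Gmail node, and the two Notion nodes.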

{
  "nodes": [
    {
      "id": "7e9f18f1-edfe-4af6-835b-12fe16a99034",
      "name": "Basic LLM Chain",
      "type": "@n8n/n8n-nodes-langchain.chainLlm",
      "position": [
        272,
        0
      ],
      "parameters": {
        "text": "={{ $json.data }}",
        "batching": {},
        "messages": {
          "messageValues": [
            {
              "message": "You are a paper content analysis assistant. You can analyze and inspect JSON data, accurately identify the content in the `summary` field, make judgments, and enrich the data items. The main tasks are as follows:\n\n1. RAG Relevance and Labeling:\n   - Analyze the `summary` field to determine whether the content is related to RAG (Retrieval-Augmented Generation) and assign labels.\n   - For each data item, add three new fields:\n     - `RAG_TF`: \"T\" if related, \"F\" if not\n     - `RAG_REASON`: if not related, provide the reason in English; otherwise, leave empty\n     - `RAG_Category`: if related, assign a category label based on the `summary` content (e.g., Framework / Application / \u2026); otherwise, leave empty\n\n2. RAG Method Extraction:\n   - Analyze the `summary` and extract the RAG method proposed in the paper.\n   - Store it in the new field `RAG_NAME`.\n\n3. External Link Extraction:\n   - Analyze the `summary` content for `github` or `huggingface` links.\n   - If present, extract the URLs and populate the existing `github` and `huggingface` fields.\n   - If not present, leave them unchanged.\n\nOutput Format: standard JSON\n\nExample:\n\nGiven a data item with the following `summary`:\n\n\"summary\":\"Processing long contexts presents a significant challenge for large language models (LLMs). While recent advancements allow LLMs to handle much longer contexts than before (e.g., 32K or 128K tokens), it is computationally expensive and can still be insufficient for many applications. Retrieval-Augmented Generation (RAG) is considered a promising strategy to address this problem. However, conventional RAG methods face inherent limitations because of two underlying requirements: 1) explicitly stated queries, and 2) well-structured knowledge. These conditions, however, do not hold in general long-context processing tasks. In this work, we propose MemoRAG, a novel RAG framework empowered by global memory-augmented retrieval. 
MemoRAG features a dual-system architecture. First, it employs a light but long-range system to create a global memory of the long context. Once a task is presented, it generates draft answer\n"
            }
          ]
        },
        "promptType": "define"
      },
      "typeVersion": 1.7
    },
    {
      "id": "92d37dc1-aaaf-47ec-987a-e6d23c93e055",
      "name": "Google Gemini Chat Model",
      "type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini",
      "position": [
        272,
        144
      ],
      "parameters": {
        "options": {},
        "modelName": "=models/gemini-2.5-flash"
      },
      "credentials": {
        "googlePalmApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "aaa67776-c308-443e-98f6-e1fe7035cbb5",
      "name": "submittedDate:T-1",
      "type": "n8n-nodes-base.code",
      "position": [
        -1664,
        320
      ],
      "parameters": {
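        "jsCode": "// Code node: build the submittedDate window (YYYYMMDDHHMM) for the arXiv query\nconst now = new Date();\n// NOTE: the offset below is 2 days, although the node name and the sticky note say T-1;\n// set it to 1 to query the previous day.\nconst target = new Date(now);\ntarget.setDate(now.getDate() - 2);\n\n// Date parts use the server's local timezone; arXiv reads the window as GMT\nconst y = target.getFullYear();\nconst m = String(target.getMonth() + 1).padStart(2, '0');\nconst d = String(target.getDate()).padStart(2, '0');\n\nreturn [\n  {\n    json: {\n      from: `${y}${m}${d}0000`,\n      to: `${y}${m}${d}2359`\n    }\n  }\n];\n"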
      },
      "typeVersion": 2
    },
    {
      "id": "c3685631-8bbd-409a-978a-fbb3e9847115",
      "name": "If",
      "type": "n8n-nodes-base.if",
      "position": [
        -160,
        16
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "de0a5a7e-67dd-4dd0-8ccc-3406e17bd09c",
              "operator": {
                "type": "number",
                "operation": "notEquals"
              },
              "leftValue": "={{ $json.paperCount }}",
              "rightValue": 0
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "4dd24343-1872-472d-8d7d-4cd28a9dbabe",
      "name": "Schedule Trigger",
      "type": "n8n-nodes-base.scheduleTrigger",
      "position": [
        -1856,
        320
      ],
      "parameters": {
        "rule": {
          "interval": [
            {
              "triggerAtHour": 6
            }
          ]
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "a38b1b58-a6f6-4c6b-ba6e-f153980a220d",
      "name": "FEISHU",
      "type": "n8n-nodes-base.switch",
      "position": [
        576,
        720
      ],
      "parameters": {
        "rules": {
          "values": [
            {
              "conditions": {
                "options": {
                  "version": 2,
                  "leftValue": "",
                  "caseSensitive": true,
                  "typeValidation": "strict"
                },
                "combinator": "and",
                "conditions": [
                  {
                    "id": "7b804f5e-6702-4d4a-99b9-3f06f8eb20d4",
                    "operator": {
                      "type": "string",
                      "operation": "equals"
                    },
                    "leftValue": "={{ $json.type }}",
                    "rightValue": "feishu"
                  }
                ]
              }
            }
          ]
        },
        "options": {}
      },
      "typeVersion": 3.2
    },
    {
      "id": "ac6b1c0d-b18e-4b42-b49e-8cb4daf0d384",
      "name": "FEISHU POST",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        800,
        720
      ],
      "parameters": {
        "url": "=",
        "method": "POST",
        "options": {},
        "sendBody": true,
        "bodyParameters": {
          "parameters": [
            {
              "name": "msg_type",
              "value": "={{ $json.msg_type }}"
            },
            {
              "name": "content",
              "value": "={{ $json.content }}"
            }
          ]
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "9151ab18-379f-4d3b-8ca2-cf65c547e78d",
      "name": "gmail",
      "type": "n8n-nodes-base.switch",
      "position": [
        576,
        544
      ],
      "parameters": {
        "rules": {
          "values": [
            {
              "conditions": {
                "options": {
                  "version": 2,
                  "leftValue": "",
                  "caseSensitive": true,
                  "typeValidation": "strict"
                },
                "combinator": "and",
                "conditions": [
                  {
                    "id": "3222832c-bbf2-46a2-abd8-2bb14095b7bf",
                    "operator": {
                      "type": "string",
                      "operation": "equals"
                    },
                    "leftValue": "={{ $json.type }}",
                    "rightValue": "gmail"
                  }
                ]
              }
            }
          ]
        },
        "options": {}
      },
      "typeVersion": 3.2
    },
    {
      "id": "869f80ec-c14c-4d1e-ae11-bb6eb4c99e5d",
      "name": "Send a message",
      "type": "n8n-nodes-base.gmail",
      "position": [
        800,
        544
      ],
      "parameters": {
        "sendTo": "user@example.com",
        "message": "={{ $json.message }}",
        "options": {},
        "subject": "={{ $json.subject }}"
      },
      "credentials": {
        "gmailOAuth2": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 2.1
    },
    {
      "id": "3df82b76-e9c8-4b0b-a552-428f2fc12c97",
      "name": "Message a model",
      "type": "@n8n/n8n-nodes-langchain.googleGemini",
      "position": [
        -1040,
        320
      ],
      "parameters": {
        "modelId": {
          "__rl": true,
          "mode": "list",
          "value": "models/gemini-2.5-flash-lite",
          "cachedResultName": "models/gemini-2.5-flash-lite"
        },
        "options": {},
        "messages": {
          "values": [
            {
              "role": "model",
              "content": "You are a daily paper content summarization assistant capable of analyzing XML data. Your main tasks are as follows:\n\n1. Set the daily title field `Title`: {yyyy-mm-dd} paper summary\n2. Set the daily date field `Date`: yyyy-mm-dd\n3. Identify the `<opensearch:totalResults>` tag in the XML and set its numeric value to the field `Number of papers`.\n4. Provide a brief summary of all papers for the day, covering all topics. Set the Chinese summary as `SUMMARY_CN` and the English summary as `SUMMARY_EN`. Ensure that both summaries cover all papers for the day.\n5. Output format: standard JSON. If there are no papers for the day, set `Number of papers` to 0, but still include the `SUMMARY_CN` and `SUMMARY_EN` fields with empty content.\n\nExample: If there are papers:\n{\n  \"Number of papers\":\"2025-09-13 paper summary\",\n  \"Date\":\"2025-09-13\",\n  \"Number of papers\": 2,\n  \"SUMMARY_CN\": \"Today's papers cover the Knowledge Graph (KG) for climate knowledge and the Approximate Graph Propagation (AGP) framework. The first paper introduces a KG based on climate publications to improve access and utilization of climate science literature. The second paper focuses on the AGP framework, proposing a new algorithm AGP-Static++ and enhancing dynamic graph support for better query and update efficiency.\",\n  \"SUMMARY_EN\": \"Today's papers cover the Knowledge Graph (KG) for climate knowledge and the Approximate Graph Propagation (AGP) framework. The first paper introduces a domain-specific KG built from climate publications aimed at improving access and use of climate science literature. The second paper focuses on the AGP framework, proposing a new algorithm, AGP-Static++, and improving dynamic graph support, enhancing query and update efficiency.\"\n}\n\nIf the number of papers is 0, maintain the JSON structure:\n{\n  \"Number of papers\":\"2025-09-13 paper summary\",\n  \"Date\":\"2025-09-13\",\n  \"Number of papers\": 0,\n  \"SUMMARY_CN\": \"\",\n  \"SUMMARY_EN\": \"\"\n}"
            },
            {
              "content": "={{ $json.data }}"
            }
          ]
        },
        "simplify": false
      },
      "credentials": {
        "googlePalmApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "024c6399-857e-45a3-a15d-8b733e16da67",
      "name": "RAG Daily Paper Summary",
      "type": "n8n-nodes-base.notion",
      "position": [
        800,
        320
      ],
      "parameters": {
        "title": "={{ $json.title }}",
        "simple": false,
        "options": {},
        "resource": "databasePage",
        "databaseId": {
          "__rl": true,
          "mode": "list",
          "value": "26fa136d-cee4-8092-8b85-cf9e9cbc424f",
          "cachedResultUrl": "https://www.notion.so/26fa136dcee480928b85cf9e9cbc424f",
          "cachedResultName": "RAG Daily Paper Summary"
        },
        "propertiesUi": {
          "propertyValues": [
            {
              "key": "DATE|date",
              "date": "={{ $json.date }}"
            },
            {
              "key": "Number of papers|number",
              "numberValue": "={{ $json.paperCount }}"
            },
            {
              "key": "SUMMARY_EN|rich_text",
              "textContent": "={{ $json.summaryEN }}"
            },
            {
              "key": "SUMMARY_CN|rich_text",
              "textContent": "={{ $json.summaryCN }}"
            }
          ]
        }
      },
      "credentials": {
        "notionApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "3282f989-a9a4-4d4f-aaf0-097fc0d72e0d",
      "name": "JSON FORMAT",
      "type": "n8n-nodes-base.code",
      "position": [
        -688,
        320
      ],
      "parameters": {
        "jsCode": "const items = $input.all();\nconst response = items[0].json;\n\ntry {\n  // Extract text content from Gemini API response\n  // Note: response is directly an object, not an array\n  const text = response.candidates[0].content.parts[0].text;\n  \n  // Extract the fenced JSON block; fall back to the raw text if the model omitted the fence\n  const jsonMatch = text.match(/```json\\n([\\s\\S]*?)\\n```/);\n  const jsonStr = jsonMatch ? jsonMatch[1] : text.trim();\n  \n  // Parse JSON\n  const data = JSON.parse(jsonStr);\n  \n  // The prompt's example reuses the \"Number of papers\" key for the title, so the parsed\n  // object keeps only the last value; recover both values from the raw string\n  const titleMatch = jsonStr.match(/\"Number of papers\":\\s*\"([^\"]+)\"/);\n  const countMatch = jsonStr.match(/\"Number of papers\":\\s*(\\d+)/);\n  \n  // Construct result (prefer a proper `Title` key when the model emits one)\n  items[0].json = {\n    title: data.Title || (titleMatch ? titleMatch[1] : ''),\n    date: data.Date || '',\n    paperCount: countMatch ? parseInt(countMatch[1], 10) : 0,\n    summaryCN: data.SUMMARY_CN || '',\n    summaryEN: data.SUMMARY_EN || ''\n  };\n  \n} catch (error) {\n  items[0].json = {\n    error: error.message,\n    originalData: response\n  };\n}\n\nreturn items;\n"
      },
      "typeVersion": 2
    },
    {
      "id": "f1a331fa-d830-4656-b108-7e18e7430b04",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1984,
        544
      ],
      "parameters": {
        "width": 736,
        "height": 768,
        "content": "## 1. Data Retrieval\n### arXiv API\n\nThe arXiv provides a public API that allows users to query research papers by topic or by predefined categories.\n\n[arXiv API User Manual](https://info.arxiv.org/help/api/user-manual.html#arxiv-api-users-manual)\n\n**Key Notes:**\n\n1. **Response Format**: The API returns data as a typical *Atom Response*.\n2. **Timezone & Update Frequency**:  \n   - The arXiv submission process operates on a 24-hour cycle.  \n   - Newly submitted articles become available in the API only at midnight *after* they have been processed.  \n   - Feeds are updated daily at midnight Eastern Standard Time (EST).  \n   - Therefore, a single request per day is sufficient.  \n3. **Request Limits**:  \n   - The maximum number of results per call (`max_results`) is **30,000**,  \n   - Results must be retrieved in slices of at most **2,000** at a time, using the `max_results` and `start` query parameters.  \n4. **Time Format**:  \n   - The expected format is `[YYYYMMDDTTTT+TO+YYYYMMDDTTTT]`,  \n   - `TTTT` is provided in 24-hour time to the minute, in GMT.\n\n### Scheduled Task\n\n- **Execution Frequency**: Daily  \n- **Execution Time**: 6:00 AM  \n- **Time Parameter Handling (JS)**:  \n  According to arXiv\u2019s update rules, the scheduled task should query the **previous day\u2019s (T-1)** `submittedDate` data.\n\n"
      },
      "typeVersion": 1
    },
    {
      "id": "ae855e91-2363-4b97-8933-761934b269fe",
      "name": "arXiv API",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        -1440,
        320
      ],
      "parameters": {
        "url": "=https://export.arxiv.org/api/query?search_query=all:RAG+AND+submittedDate:[{{$json[\"from\"]}}+TO+{{$json[\"to\"]}}]",
        "options": {},
        "sendQuery": true,
        "queryParameters": {
          "parameters": [
            {
              "name": "={{ $json.from }}"
            },
            {
              "name": "={{ $json.to }}"
            }
          ]
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "6f3df3be-a376-42e9-b0be-32c4fba5a8e2",
      "name": "Message Construction",
      "type": "n8n-nodes-base.code",
      "position": [
        -128,
        528
      ],
      "parameters": {
        "jsCode": "// Get current date\nconst now = new Date();\nconst year = now.getFullYear();\nconst month = String(now.getMonth() + 1).padStart(2, '0');\nconst day = String(now.getDate()).padStart(2, '0');\nconst date = `${year}-${month}-${day}`;\n\n// Get input data\nconst inputData = $input.first().json;\n\n// Generate message content\nconst messageContent = inputData.SUMMARY_CN;\n\n// Gmail message body\nconst gmailMessage = {\n    subject: inputData.title || `Daily Paper Summary - ${date}`,\n    message: `<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">\n<head>\n    <meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\" />\n    <title> RAG Daily Paper Summary - ${date}</title>\n    <style type=\"text/css\">\n        /* Gmail safe styles */\n        body {\n            font-family: Arial, sans-serif;\n            line-height: 1.4;\n            margin: 0;\n            padding: 0;\n            background-color: #f9f9f9;\n            color: #333333;\n        }\n        \n        table {\n            border-collapse: collapse;\n            mso-table-lspace: 0pt;\n            mso-table-rspace: 0pt;\n        }\n        \n        .email-wrapper {\n            width: 100%;\n            background-color: #f9f9f9;\n            padding: 40px 20px;\n        }\n        \n        .email-container {\n            width: 100%;\n            max-width: 600px;\n            margin: 0 auto;\n            background-color: #ffffff;\n            border-radius: 8px;\n            box-shadow: 0 2px 12px rgba(0, 0, 0, 0.1);\n        }\n        \n        .header {\n            background-color: #2563eb;\n            padding: 24px;\n            text-align: center;\n            border-radius: 8px 8px 0 0;\n        }\n        \n        .header h1 {\n            
margin: 0 0 8px 0;\n            font-size: 24px;\n            font-weight: 600;\n            color: #ffffff;\n        }\n        \n        .date {\n            font-size: 14px;\n            color: #ffffff;\n            opacity: 0.9;\n        }\n        \n        .stats {\n            background-color: #f1f5f9;\n            padding: 16px 24px;\n            font-size: 14px;\n            color: #64748b;\n        }\n        \n        .content {\n            padding: 32px 24px 40px 24px;\n        }\n        \n        .section {\n            margin-bottom: 24px;\n        }\n        \n        .section-title {\n            font-size: 16px;\n            font-weight: 600;\n            color: #1e293b;\n            margin-bottom: 12px;\n            padding-bottom: 8px;\n            border-bottom: 1px solid #e2e8f0;\n        }\n        \n        .flag {\n            display: inline-block;\n            width: 20px;\n            height: 14px;\n            margin-right: 8px;\n            border-radius: 2px;\n            vertical-align: middle;\n        }\n        \n        .flag-cn {\n            background-color: #de2910;\n        }\n        \n        .flag-en {\n            background-color: #012169;\n        }\n        \n        .summary {\n            font-size: 14px;\n            line-height: 1.6;\n            color: #475569;\n            padding: 16px;\n            background-color: #f8fafc;\n            border-radius: 6px;\n            border-left: 3px solid #2563eb;\n        }\n        \n        .divider {\n            height: 1px;\n            background-color: #e2e8f0;\n            margin: 20px 0;\n            border: none;\n        }\n        \n        /* Mobile responsive */\n        @media screen and (max-width: 600px) {\n            .email-wrapper {\n                padding: 20px 10px !important;\n            }\n            \n            .header, .stats {\n                padding: 20px 16px !important;\n            }\n            \n            .content {\n            
    padding: 24px 16px 32px 16px !important;\n            }\n            \n            .email-container {\n                border-radius: 0;\n            }\n        }\n        \n        /* Gmail specific fixes */\n        .gmail-fix {\n            display: none;\n        }\n        \n        /* Outlook specific fixes */\n        .ExternalClass {\n            width: 100%;\n        }\n        \n        .ExternalClass,\n        .ExternalClass p,\n        .ExternalClass span,\n        .ExternalClass font,\n        .ExternalClass td,\n        .ExternalClass div {\n            line-height: 100%;\n        }\n    </style>\n    <!--[if mso]>\n    <style type=\"text/css\">\n        .email-container {\n            width: 600px !important;\n        }\n    </style>\n    <![endif]-->\n</head>\n<body>\n    <table role=\"presentation\" class=\"email-wrapper\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\">\n        <tr>\n            <td align=\"center\">\n                <table role=\"presentation\" class=\"email-container\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\">\n                    <!-- Header -->\n                    <tr>\n                        <td class=\"header\">\n                            <h1>RAG Daily Papers</h1>\n                            <div class=\"date\">${inputData.Date || date}</div>\n                        </td>\n                    </tr>\n                    \n                    <!-- Stats -->\n                    <tr>\n                        <td class=\"stats\">\n                            <strong>${inputData[\"Number of papers\"] || inputData.paperCount || 0} papers</strong> reviewed today\n                        </td>\n                    </tr>\n                    \n                    <!-- Content -->\n                    <tr>\n                        <td class=\"content\">\n                            <!-- Chinese Section -->\n                            <div class=\"section\">\n                                <h2 
class=\"section-title\">\n                                  \ud83c\udde8\ud83c\uddf3 Chinese\n                                </h2>\n                                <div class=\"summary\">\n                                    ${inputData.SUMMARY_CN || inputData.summaryCN || 'No Chinese summary available'}\n                                </div>\n                            </div>\n                            \n                            <!-- Divider -->\n                            <hr class=\"divider\">\n                            \n                            <!-- English Section -->\n                            <div class=\"section\">\n                                <h2 class=\"section-title\">\n                                    \ud83c\uddfa\ud83c\uddf8 English\n                                </h2>\n                                <div class=\"summary\">\n                                    ${inputData.SUMMARY_EN || inputData.summaryEN || 'No English summary available'}\n                                </div>\n                            </div>\n                        </td>\n                    </tr>\n                </table>\n            </td>\n        </tr>\n    </table>\n</body>\n</html>`\n};\n\n// Feishu message body\nconst feishuMessage = {\n    msg_type: \"text\",\n    content: {\n        text: `Today ${$input.first().json.date} ${$input.first().json.paperCount}  papers. ${$input.first().json.summaryEN} ${$input.first().json.summaryCN}`\n    }\n};\n\n// n8n output format\nreturn [\n    { json: { type: \"gmail\", ...gmailMessage } },\n    { json: { type: \"feishu\", ...feishuMessage } }\n];\n"
      },
      "typeVersion": 2
    },
    {
      "id": "2582c7df-9b15-4473-bc47-91cf6f7304e0",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -176,
        896
      ],
      "parameters": {
        "width": 1152,
        "height": 576,
        "content": "## 5. Message Push\n\nSet up two channels for message delivery: **EMAIL** and **IM**, and define the message format and content.\n\n### Email: Gmail\n\n**GMAIL OAuth 2.0 \u2013 Official Documentation**  \n[Configure your OAuth consent screen](https://docs.n8n.io/integrations/builtin/credentials/google/oauth-single-service/?utm_source=n8n_app&utm_medium=credential_settings&utm_campaign=create_new_credentials_modal#configure-your-oauth-consent-screen)\n\n**Steps:**\n- Enable Gmail API  \n- Create OAuth consent screen  \n- Create OAuth client credentials  \n- Audience: Add **Test users** under Testing status  \n\n**Message format**: HTML  \n(Model: OpenAI GPT \u2014 used to design an HTML email template)\n\n### IM: Feishu (LARK)\n\n**Bots in groups**  \n[Use bots in groups](https://www.larksuite.com/hc/en-US/articles/360048487736-use-bots-in-groups)\n"
      },
      "typeVersion": 1
    },
    {
      "id": "f7ba78f8-19cb-492c-840c-3570d2865fb1",
      "name": "RAG Daily papers",
      "type": "n8n-nodes-base.notion",
      "position": [
        800,
        0
      ],
      "parameters": {
        "title": "={{ $json.title }}",
        "simple": false,
        "blockUi": {
          "blockValues": [
            {
              "textContent": "={{ $json.summary }}"
            }
          ]
        },
        "options": {},
        "resource": "databasePage",
        "databaseId": {
          "__rl": true,
          "mode": "list",
          "value": "26ba136d-cee4-8029-ad3d-e0e8ac64993f",
          "cachedResultUrl": "https://www.notion.so/26ba136dcee48029ad3de0e8ac64993f",
          "cachedResultName": "RAG DAILY"
        },
        "propertiesUi": {
          "propertyValues": [
            {
              "key": "published|date",
              "date": "={{ $json.published }}"
            },
            {
              "key": "summary|rich_text",
              "textContent": "={{ $json.summary }}"
            },
            {
              "key": "id|rich_text",
              "textContent": "={{ $json.id }}"
            },
            {
              "key": "html_url|url",
              "urlValue": "={{ $json.html_url }}"
            },
            {
              "key": "pdf_url|url",
              "urlValue": "={{ $json.pdf_url }}"
            },
            {
              "key": "primary_category|rich_text",
              "textContent": "={{ $json.primary_category }}"
            },
            {
              "key": "github|url",
              "urlValue": "={{ $json.github }}",
              "ignoreIfEmpty": true
            },
            {
              "key": "huggingface|url",
              "urlValue": "={{ $json.huggingface }}",
              "ignoreIfEmpty": true
            },
            {
              "key": "RAG_TF|rich_text",
              "textContent": "={{ $json.RAG_TF }}"
            },
            {
              "key": "RAG_REASON|rich_text",
              "textContent": "={{ $json.RAG_REASON }}"
            },
            {
              "key": "RAG_Category|rich_text",
              "textContent": "={{ $json.RAG_Category }}"
            },
            {
              "key": "RAG_NAME|rich_text",
              "textContent": "={{ $json.RAG_NAME }}"
            },
            {
              "key": "updated|date",
              "date": "={{ $json.updated }}"
            },
            {
              "key": "author|multi_select",
              "multiSelectValue": "={{ $json.authors }}"
            },
            {
              "key": "category|multi_select",
              "multiSelectValue": "={{ $json.categories }}"
            }
          ]
        }
      },
      "credentials": {
        "notionApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "5d897d4d-968b-4336-bbee-d1d3b4dcae06",
      "name": "Data Extraction",
      "type": "n8n-nodes-base.code",
      "position": [
        112,
        0
      ],
      "parameters": {
        "jsCode": "// Get input data\nconst xmlData = $('arXiv API').first().json.data;\n\nif (!xmlData) {\n    return [{\n        json: {\n            error: \"XML data not found. Please ensure the input contains XML content\",\n            message: \"Check the field names in the input data\",\n            success: false\n        }\n    }];\n}\n\n// Format an ISO timestamp as 'YYYY-MM-DD HH:MM:SS' in UTC\nfunction formatDateTime(isoString) {\n    if (!isoString) return '';\n    \n    try {\n        const date = new Date(isoString);\n        if (isNaN(date.getTime())) return '';\n        \n        // Use the UTC getters for every part so the date and the time stay consistent\n        const year = date.getUTCFullYear();\n        const month = String(date.getUTCMonth() + 1).padStart(2, '0');\n        const day = String(date.getUTCDate()).padStart(2, '0');\n        const hours = String(date.getUTCHours()).padStart(2, '0');\n        const minutes = String(date.getUTCMinutes()).padStart(2, '0');\n        const seconds = String(date.getUTCSeconds()).padStart(2, '0');\n        \n        return `${year}-${month}-${day} ${hours}:${minutes}:${seconds}`;\n    } catch (error) {\n        return '';\n    }\n}\n\n// General function to extract tag content\nfunction extractTagContent(xml, tagName) {\n    const regex = new RegExp(`<${tagName}[^>]*>([\\\\s\\\\S]*?)<\\\\/${tagName}>`, 'i');\n    const match = xml.match(regex);\n    return match ? 
match[1].trim().replace(/\\s+/g, ' ') : '';\n}\n\n// Extract links\nfunction extractLink(entryXml, linkType) {\n    // Fixed link extraction to fit actual XML format\n    // Format: <link href=\"...\" rel=\"...\" type=\"...\"/>\n    const patterns = [\n        new RegExp(`<link[^>]*href=\"([^\"]*)\"[^>]*type=\"${linkType}\"`, 'i'),\n        new RegExp(`<link[^>]*type=\"${linkType}\"[^>]*href=\"([^\"]*)\"`, 'i')\n    ];\n    \n    for (const pattern of patterns) {\n        const match = entryXml.match(pattern);\n        if (match && match[1]) {\n            return match[1];\n        }\n    }\n    return '';\n}\n\n// Fixed author extraction function - returns array\nfunction extractAuthors(entryXml) {\n    const authorBlocks = entryXml.match(/<author[^>]*>([\\s\\S]*?)<\\/author>/gi) || [];\n    const authors = [];\n    \n    for (const block of authorBlocks) {\n        const nameMatch = block.match(/<name[^>]*>(.*?)<\\/name>/i);\n        if (nameMatch && nameMatch[1].trim()) {\n            authors.push(nameMatch[1].trim());\n        }\n    }\n    \n    return authors; // Return array instead of string\n}\n\n// Extract categories\nfunction extractCategories(entryXml) {\n    const categories = [];\n    const regex = /<category[^>]*term=\"([^\"]*)\"/gi;\n    let match;\n    \n    while ((match = regex.exec(entryXml)) !== null) {\n        if (match[1]) {\n            categories.push(match[1]);\n        }\n    }\n    \n    return categories;\n}\n\n// Extract primary category\nfunction extractPrimaryCategory(entryXml) {\n    // Handle namespace-prefixed primary category extraction\n    const patterns = [\n        /primary_category[^>]*term=\"([^\"]*)\"/i,\n        /arxiv:primary_category[^>]*term=\"([^\"]*)\"/i\n    ];\n    \n    for (const pattern of patterns) {\n        const match = entryXml.match(pattern);\n        if (match && match[1]) {\n            return match[1];\n        }\n    }\n    return '';\n}\n\n// New: extract arxiv comment\nfunction 
extractArxivComment(entryXml) {\n    const commentMatch = entryXml.match(/<arxiv:comment[^>]*>(.*?)<\\/arxiv:comment>/i);\n    return commentMatch ? commentMatch[1].trim() : '';\n}\n\ntry {\n    // Extract all entry blocks\n    const entryRegex = /<entry[^>]*>([\\s\\S]*?)<\\/entry>/gi;\n    const entries = [];\n    let match;\n    \n    while ((match = entryRegex.exec(xmlData)) !== null) {\n        entries.push(match[1]);\n    }\n    \n    if (entries.length === 0) {\n        return [{\n            json: {\n                error: \"No <entry> elements found\",\n                message: \"Please check if the XML data format is correct\",\n                success: false\n            }\n        }];\n    }\n\n    // Process each entry\n    const processedData = [];\n    let processedCount = 0;\n\n    for (let i = 0; i < entries.length; i++) {\n        const entryXml = entries[i];\n        \n        try {\n            const item = {\n                id: extractTagContent(entryXml, 'id'),\n                updated: formatDateTime(extractTagContent(entryXml, 'updated')),\n                published: formatDateTime(extractTagContent(entryXml, 'published')),\n                title: extractTagContent(entryXml, 'title'),\n                summary: extractTagContent(entryXml, 'summary'),\n                authors: extractAuthors(entryXml), // field name changed to authors, returns array\n                html_url: extractLink(entryXml, 'text/html'),\n                pdf_url: extractLink(entryXml, 'application/pdf'),\n                primary_category: extractPrimaryCategory(entryXml),\n                categories: extractCategories(entryXml), // field name changed to categories\n                arxiv_comment: extractArxivComment(entryXml), // new arxiv comment\n                github: '',\n                huggingface: ''\n            };\n\n            // Validate required fields\n            if (item.id && item.title) {\n                processedData.push(item);\n                
processedCount++;\n            }\n            \n        } catch (error) {\n            console.log(`Error processing entry ${i+1}: ${error.message}`);\n            // Continue processing next entry\n        }\n    }\n\n    // Return processed results\n    return [{\n        json: {\n            success: true,\n            message: `Successfully processed ${processedCount} entries`,\n            data: processedData,\n            processing_time: new Date().toISOString()\n        }\n    }];\n\n} catch (error) {\n    // Error handling\n    return [{\n        json: {\n            error: \"An error occurred during processing\",\n            message: error.message,\n            success: false\n        }\n    }];\n}\n"
      },
      "typeVersion": 2
    },
    {
      "id": "ae2d8994-7a52-4f7b-81fd-61c0538ba380",
      "name": "JSON Format",
      "type": "n8n-nodes-base.code",
      "position": [
        592,
        0
      ],
      "parameters": {
        "jsCode": "// Get input data\nconst xmlData = $('arXiv API').first().json.data\n\nif (!xmlData) {\n    return [{\n        json: {\n            error: \"XML data not found. Please ensure the input contains XML content\",\n            message: \"Check the field names in the input data\",\n            success: false\n        }\n    }];\n}\n\n// Function to format date-time\nfunction formatDateTime(isoString) {\n    if (!isoString) return '';\n    \n    try {\n        const date = new Date(isoString);\n        if (isNaN(date.getTime())) return '';\n        \n        const year = date.getFullYear();\n        const month = String(date.getMonth() + 1).padStart(2, '0');\n        const day = String(date.getDate()).padStart(2, '0');\n        const hours = String(date.getUTCHours()).padStart(2, '0');\n        const minutes = String(date.getUTCMinutes()).padStart(2, '0');\n        const seconds = String(date.getUTCSeconds()).padStart(2, '0');\n        \n        return `${year}-${month}-${day} ${hours}:${minutes}:${seconds}`;\n    } catch (error) {\n        return '';\n    }\n}\n\n// General function to extract tag content\nfunction extractTagContent(xml, tagName) {\n    const regex = new RegExp(`<${tagName}[^>]*>([\\\\s\\\\S]*?)<\\\\/${tagName}>`, 'i');\n    const match = xml.match(regex);\n    return match ? 
match[1].trim().replace(/\\s+/g, ' ') : '';\n}\n\n// Extract links\nfunction extractLink(entryXml, linkType) {\n    // Fixed link extraction to fit actual XML format\n    // Format: <link href=\"...\" rel=\"...\" type=\"...\"/>\n    const patterns = [\n        new RegExp(`<link[^>]*href=\"([^\"]*)\"[^>]*type=\"${linkType}\"`, 'i'),\n        new RegExp(`<link[^>]*type=\"${linkType}\"[^>]*href=\"([^\"]*)\"`, 'i')\n    ];\n    \n    for (const pattern of patterns) {\n        const match = entryXml.match(pattern);\n        if (match && match[1]) {\n            return match[1];\n        }\n    }\n    return '';\n}\n\n// Fixed author extraction function - returns array\nfunction extractAuthors(entryXml) {\n    const authorBlocks = entryXml.match(/<author[^>]*>([\\s\\S]*?)<\\/author>/gi) || [];\n    const authors = [];\n    \n    for (const block of authorBlocks) {\n        const nameMatch = block.match(/<name[^>]*>(.*?)<\\/name>/i);\n        if (nameMatch && nameMatch[1].trim()) {\n            authors.push(nameMatch[1].trim());\n        }\n    }\n    \n    return authors; // Return array instead of string\n}\n\n// Extract categories\nfunction extractCategories(entryXml) {\n    const categories = [];\n    const regex = /<category[^>]*term=\"([^\"]*)\"/gi;\n    let match;\n    \n    while ((match = regex.exec(entryXml)) !== null) {\n        if (match[1]) {\n            categories.push(match[1]);\n        }\n    }\n    \n    return categories;\n}\n\n// Extract primary category\nfunction extractPrimaryCategory(entryXml) {\n    // Handle namespace-prefixed primary category extraction\n    const patterns = [\n        /primary_category[^>]*term=\"([^\"]*)\"/i,\n        /arxiv:primary_category[^>]*term=\"([^\"]*)\"/i\n    ];\n    \n    for (const pattern of patterns) {\n        const match = entryXml.match(pattern);\n        if (match && match[1]) {\n            return match[1];\n        }\n    }\n    return '';\n}\n\n// New: extract arxiv comment\nfunction 
extractArxivComment(entryXml) {\n    const commentMatch = entryXml.match(/<arxiv:comment[^>]*>(.*?)<\\/arxiv:comment>/i);\n    return commentMatch ? commentMatch[1].trim() : '';\n}\n\ntry {\n    // Extract all entry blocks\n    const entryRegex = /<entry[^>]*>([\\s\\S]*?)<\\/entry>/gi;\n    const entries = [];\n    let match;\n    \n    while ((match = entryRegex.exec(xmlData)) !== null) {\n        entries.push(match[1]);\n    }\n    \n    if (entries.length === 0) {\n        return [{\n            json: {\n                error: \"No <entry> elements found\",\n                message: \"Please check if the XML data format is correct\",\n                success: false\n            }\n        }];\n    }\n\n    // Process each entry\n    const processedData = [];\n    let processedCount = 0;\n\n    for (let i = 0; i < entries.length; i++) {\n        const entryXml = entries[i];\n        \n        try {\n            const item = {\n                id: extractTagContent(entryXml, 'id'),\n                updated: formatDateTime(extractTagContent(entryXml, 'updated')),\n                published: formatDateTime(extractTagContent(entryXml, 'published')),\n                title: extractTagContent(entryXml, 'title'),\n                summary: extractTagContent(entryXml, 'summary'),\n                authors: extractAuthors(entryXml), // field name changed to authors, returns array\n                html_url: extractLink(entryXml, 'text/html'),\n                pdf_url: extractLink(entryXml, 'application/pdf'),\n                primary_category: extractPrimaryCategory(entryXml),\n                categories: extractCategories(entryXml), // field name changed to categories\n                arxiv_comment: extractArxivComment(entryXml), // new arxiv comment\n                github: '',\n                huggingface: ''\n            };\n\n            // Validate required fields\n            if (item.id && item.title) {\n                processedData.push(item);\n                
processedCount++;\n            }\n            \n        } catch (error) {\n            console.log(`Error processing entry ${i+1}: ${error.message}`);\n            // Continue processing next entry\n        }\n    }\n\n    // Return processed results\n    return [{\n        json: {\n            success: true,\n            message: `Successfully processed ${processedCount} entries`,\n            data: processedData,\n            processing_time: new Date().toISOString()\n        }\n    }];\n\n} catch (error) {\n    // Error handling\n    return [{\n        json: {\n            error: \"An error occurred during processing\",\n            message: error.message,\n            success: false\n        }\n    }];\n}\n"
      },
      "typeVersion": 2
    },
    {
      "id": "8fbefc67-e9f7-4597-b935-d5f5895cf93c",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -160,
        -224
      ],
      "parameters": {
        "width": 656,
        "height": 192,
        "content": "## 3. Data Processing\n\nAnalyze and summarize paper data using AI, then standardize output as JSON.\n\n### Single Paper Basic Information Analysis and Enhancement  \n### Daily Paper Summary and Multilingual Translation"
      },
      "typeVersion": 1
    },
    {
      "id": "884f2c40-4628-4376-a040-709e2db34c48",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1024,
        16
      ],
      "parameters": {
        "width": 624,
        "height": 368,
        "content": "## 4. Data Storage: Notion Database\n\n- Create a corresponding database in Notion with the same predefined field names.  \n- In Notion, create an integration under **Integrations** and grant access to the database. Obtain the corresponding **Secret Key**.  \n- Use the Notion **\"Create a database page\"** node to configure the field mapping and store the data.  \n\n**Notes**  \n- **\"Create a database page\"** only adds new entries; data will not be updated.  \n- The `updated` and `published` timestamps of arXiv papers are in **UTC**.  \n- Notion **single-select** and **multi-select** fields only accept arrays. They do not automatically parse comma-separated strings. You need to format them as proper arrays.  \n- Notion does not accept `null` values, which causes a **400 error**.  \n"
      },
      "typeVersion": 1
    },
    {
      "id": "4991129d-9406-4c52-bd8f-87e2721c4a6f",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1088,
        544
      ],
      "parameters": {
        "width": 624,
        "height": 912,
        "content": "## 2. **Data Extraction**\n\n### Data Cleaning Rules (Convert to Standard JSON)\n\n1. **Remove Header**  \n   - Keep only the `<entry></entry>` blocks representing paper items.\n\n2. **Single Item**  \n   - Each `<entry></entry>` represents a single item.\n\n3. **Field Processing Rules**  \n   - `<id></id>` \u27a1\ufe0f `id`  \n     Extract content.  \n     Example: `<id>http://arxiv.org/abs/2409.06062v1</id>` \u2192 `http://arxiv.org/abs/2409.06062v1`  \n   - `<updated></updated>` \u27a1\ufe0f `updated`  \n     Convert timestamp to `yyyy-mm-dd hh:mm:ss`  \n   - `<published></published>` \u27a1\ufe0f `published`  \n     Convert timestamp to `yyyy-mm-dd hh:mm:ss`  \n   - `<title></title>` \u27a1\ufe0f `title`  \n     Extract text content  \n   - `<summary></summary>` \u27a1\ufe0f `summary`  \n     Keep text, remove line breaks  \n   - `<author></author>` \u27a1\ufe0f `author`  \n     Combine all authors into an array  \n     Example: `[ \"Ernest Pusateri\", \"Anmol Walia\" ]` (for Notion multi-select field)  \n   - `<arxiv:comment></arxiv:comment>` \u27a1\ufe0f Ignore / discard  \n   - `<link type=\"text/html\">` \u27a1\ufe0f `html_url`  \n     Extract URL  \n   - `<link type=\"application/pdf\">` \u27a1\ufe0f `pdf_url`  \n     Extract URL  \n   - `<arxiv:primary_category term=\"cs.CL\">` \u27a1\ufe0f `primary_category`  \n     Extract `term` value  \n   - `<category>` \u27a1\ufe0f `category`  \n     Merge all `<category>` values into an array  \n     Example: `[ \"eess.AS\", \"cs.SD\" ]` (for Notion multi-select field)  \n\n4. **Add Empty Fields**  \n   - `github`  \n   - `huggingface`\n"
      },
      "typeVersion": 1
    }
  ],
  "connections": {
    "If": {
      "main": [
        [
          {
            "node": "Data Extraction",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "gmail": {
      "main": [
        [
          {
            "node": "Send a message",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "FEISHU": {
      "main": [
        [
          {
            "node": "FEISHU POST",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "arXiv API": {
      "main": [
        [
          {
            "node": "Message a model",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "JSON FORMAT": {
      "main": [
        [
          {
            "node": "RAG Daily Paper Summary",
            "type": "main",
            "index": 0
          },
          {
            "node": "If",
            "type": "main",
            "index": 0
          },
          {
            "node": "Message Construction",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "JSON Format": {
      "main": [
        [
          {
            "node": "RAG Daily papers",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Basic LLM Chain": {
      "main": [
        [
          {
            "node": "JSON Format",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Data Extraction": {
      "main": [
        [
          {
            "node": "Basic LLM Chain",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Message a model": {
      "main": [
        [
          {
            "node": "JSON FORMAT",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Schedule Trigger": {
      "main": [
        [
          {
            "node": "submittedDate:T-1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "submittedDate:T-1": {
      "main": [
        [
          {
            "node": "arXiv API",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Message Construction": {
      "main": [
        [
          {
            "node": "gmail",
            "type": "main",
            "index": 0
          },
          {
            "node": "FEISHU",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Google Gemini Chat Model": {
      "ai_languageModel": [
        [
          {
            "node": "Basic LLM Chain",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    }
  }
}
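
The "Data Storage" sticky note in the workflow says that Notion multi-select fields need proper arrays rather than comma-separated strings, and that `null` values trigger a 400 error. A minimal sketch of a clean-up step that could run in a Code node before the Notion node is shown below. The helper names `toArray` and `stripNulls` and the sample `paper` object are hypothetical and are not part of the original workflow.

```javascript
// Hypothetical helper: normalize "a, b" or ["a", "b"] into a clean array of strings.
function toArray(value) {
  if (Array.isArray(value)) {
    return value.map((v) => String(v).trim()).filter(Boolean);
  }
  if (typeof value === 'string') {
    return value.split(',').map((v) => v.trim()).filter(Boolean);
  }
  return []; // null, undefined, or any other type becomes an empty list
}

// Hypothetical helper: replace null/undefined with '' so no null reaches Notion.
function stripNulls(item) {
  const out = {};
  for (const [key, value] of Object.entries(item)) {
    out[key] = value === null || value === undefined ? '' : value;
  }
  return out;
}

// Made-up sample item using the field names produced by the extraction code.
const paper = {
  title: 'Example title',
  authors: 'Ernest Pusateri, Anmol Walia',
  categories: ['eess.AS', 'cs.SD'],
  github: null,
};

const safe = stripNulls({
  ...paper,
  authors: toArray(paper.authors),
  categories: toArray(paper.categories),
});

console.log(safe);
```

Empty strings are used as the fallback here because the extraction code already initializes `github` and `huggingface` to `''`; a different placeholder may suit other property types.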

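The sticky notes state that arXiv's `updated` and `published` timestamps are in UTC and that they should be rendered as `yyyy-mm-dd hh:mm:ss`. One way to keep every component in UTC is to derive the string from `Date.prototype.toISOString()`, which always renders in UTC. This is a minimal sketch; the function name `formatUtcDateTime` and the sample timestamp are illustrative assumptions, not taken from the workflow.

```javascript
// Hypothetical helper: format an ISO 8601 timestamp as "yyyy-mm-dd hh:mm:ss" in UTC.
function formatUtcDateTime(isoString) {
  if (!isoString) return '';
  const date = new Date(isoString);
  if (Number.isNaN(date.getTime())) return ''; // unparseable input
  // toISOString() yields e.g. "2024-09-09T20:50:10.000Z"; keep the first 19 characters.
  return date.toISOString().slice(0, 19).replace('T', ' ');
}

// Made-up sample timestamp for illustration.
console.log(formatUtcDateTime('2024-09-09T20:50:10Z')); // 2024-09-09 20:50:10
```

Because the result does not depend on the server's local time zone, the same input yields the same string wherever n8n runs.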
Credentials you'll need

Each integration node will prompt for credentials when you import. We strip credential IDs before publishing — you'll add your own.

About this workflow

Fetch user-specific research papers from arXiv on a daily schedule, process and structure the data, and create or update entries in a Notion database, with support for data delivery. Paper Topic: single query keyword. Update Frequency: daily updates, with fewer than 20 entries…

Source: https://n8n.io/workflows/8847/ (original creator credit).
