Generate Research Questions From Pdfs Using Infranodus Content Gap Analysis

By InfraNodus (@infranodus) on n8n.io

This template generates research questions from PDF documents (e.g. research papers, market reports) based on the content gaps found in the text, using the InfraNodus GraphRAG knowledge graph representation.

Category: Web Scraping · Trigger: Event · Nodes: 15 · Complexity: ★★★★☆ · Integrations: HTTP Request, Form Trigger, Form

This workflow corresponds to n8n.io template #5744, which we link to as the canonical source.

This workflow follows the Form → Form Trigger recipe pattern.

The workflow JSON

Copy the full n8n JSON below, paste it into a new n8n workflow, add your credentials, and activate it.

{
  "id": "oOxhkss1gOyLvJyf",
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "name": "Generate Research Questions and AI Prompts from PDF Documents based on Content Gaps",
  "tags": [
    {
      "id": "66wgFoDi9Xjl74M3",
      "name": "Support",
      "createdAt": "2025-05-21T17:06:32.355Z",
      "updatedAt": "2025-05-21T17:06:32.355Z"
    },
    {
      "id": "kRM0hQV2zw7VxrON",
      "name": "Research",
      "createdAt": "2025-05-21T19:44:19.136Z",
      "updatedAt": "2025-05-21T19:44:19.136Z"
    },
    {
      "id": "sJk9cUvmMU8FkJXv",
      "name": "AI",
      "createdAt": "2025-05-20T13:16:15.636Z",
      "updatedAt": "2025-05-20T13:16:15.636Z"
    }
  ],
  "nodes": [
    {
      "id": "a2339bb9-abb9-41bf-8064-8c3af94df039",
      "name": "Convert File to PDF",
      "type": "n8n-nodes-base.httpRequest",
      "disabled": true,
      "position": [
        1880,
        180
      ],
      "parameters": {
        "url": "https://v2.convertapi.com/convert/pdf/to/txt",
        "method": "POST",
        "options": {
          "response": {
            "response": {
              "responseFormat": "text"
            }
          }
        },
        "sendBody": true,
        "contentType": "multipart-form-data",
        "sendHeaders": true,
        "authentication": "genericCredentialType",
        "bodyParameters": {
          "parameters": [
            {
              "name": "file",
              "parameterType": "formBinaryData",
              "inputDataFieldName": "data"
            }
          ]
        },
        "genericAuthType": "httpBearerAuth",
        "headerParameters": {
          "parameters": [
            {
              "name": "Accept",
              "value": "application/octet-stream"
            }
          ]
        }
      },
      "credentials": {
        "httpBearerAuth": {
          "name": "<your credential>"
        }
      },
      "notesInFlow": true,
      "typeVersion": 4.2
    },
    {
      "id": "989a0d6c-12a1-45d4-8b6b-206855177df7",
      "name": "Sticky Note5",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1840,
        -400
      ],
      "parameters": {
        "color": 2,
        "width": 360,
        "height": 820,
        "content": "## Optional: Better PDF Conversion\n\n### The standard PDF-to-text node will split your PDF files into very short chunks, which degrades retrieval. \n\nYou can use [ConvertAPI](https://convertapi.com?ref=4l54n), a high-quality converter that respects the layout of the original document and does not cut paragraphs into short chunks. \n\nHere is an HTTP Request node that calls their API to convert the PDF into text. If you have a ConvertAPI account, you can replace the \"Extract text from PDF files\" node in Step 3 with this node. \n\nNote that you will need to map the text output from this node correctly in Step 4.\n"
      },
      "typeVersion": 1
    },
    {
      "id": "4d698efb-6b02-4e95-9a6f-dca8bc1fbea7",
      "name": "On form submission",
      "type": "n8n-nodes-base.formTrigger",
      "position": [
        -380,
        -60
      ],
      "parameters": {
        "options": {
          "appendAttribution": false
        },
        "formTitle": "Find Content Gaps in Your PDF Files",
        "formFields": {
          "values": [
            {
              "fieldType": "file",
              "fieldLabel": "Add Your Files",
              "acceptFileTypes": ".pdf"
            }
          ]
        },
        "formDescription": "Upload the files you'd like to analyze and we will extract content gaps and interesting questions based on them."
      },
      "typeVersion": 2.2
    },
    {
      "id": "5798ca71-eb05-4097-ace1-9457be450e21",
      "name": "Convert binary files to PDF",
      "type": "n8n-nodes-base.code",
      "position": [
        -60,
        -60
      ],
      "parameters": {
        "jsCode": "let results = [];\n\nfor (let item of items) {\n    if (item.binary) {\n        // If there's binary data in the item, process each binary file\n        for (let key in item.binary) {\n            // Use the key as the file name\n            let binaryKey = key.replace(/\\s/g, '_'); // Replace spaces with underscores for the key\n            results.push({\n                json: {\n                    fileName: binaryKey\n                },\n                binary: {\n                    [binaryKey]: item.binary[key] // Use the modified key for the binary data\n                }\n            });\n        }\n    }\n}\n\nreturn results;\n"
      },
      "typeVersion": 2
    },
    {
      "id": "6770b616-db38-48a8-8063-f3f5639d0946",
      "name": "Extract text from PDF files",
      "type": "n8n-nodes-base.extractFromFile",
      "position": [
        280,
        -60
      ],
      "parameters": {
        "options": {},
        "operation": "pdf",
        "binaryPropertyName": "={{ $json.fileName }}"
      },
      "typeVersion": 1
    },
    {
      "id": "2e47ecb8-cb9e-434a-ae9e-aae2ddb5fb54",
      "name": "Prepare for InfraNodus",
      "type": "n8n-nodes-base.code",
      "position": [
        580,
        -60
      ],
      "parameters": {
        "jsCode": "\nlet plainText = '' // we send plain text from all the PDFs to InfraNodus for analysis\n\nconst randomNum = Math.floor(Math.random() * 3); // replace this with a 0 if you'd like to address the biggest gap in the knowledge graph\n\nfor (let item of items) {\n   plainText += item.json.text + '\\n\\n'  \n}\n\n\nreturn {text: plainText, randomNum};"
      },
      "typeVersion": 2
    },
    {
      "id": "422faf1e-1545-4e5e-98aa-75c51a06c863",
      "name": "Display on the Form to the User",
      "type": "n8n-nodes-base.form",
      "position": [
        1380,
        -60
      ],
      "parameters": {
        "operation": "completion",
        "respondWith": "showText",
        "responseText": "=<br>\n<h3>{{ $json.aiAdvice[0].text }}</h3>\n<br>\n"
      },
      "typeVersion": 1
    },
    {
      "id": "820a3e64-1108-4c41-88e2-90d98fbd548f",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -420,
        -400
      ],
      "parameters": {
        "height": 520,
        "content": "## Step 1: User uploads the PDF files for analysis\n\n### You can expose this endpoint and make it publicly available via a URL to your organization."
      },
      "typeVersion": 1
    },
    {
      "id": "8dc2769a-b797-42c4-b531-82ce7f866dac",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -120,
        -400
      ],
      "parameters": {
        "width": 280,
        "height": 520,
        "content": "## Step 2: Convert uploaded binaries into PDF files\n\n### We need to convert the binaries uploaded to the PDF files so we can extract text from them."
      },
      "typeVersion": 1
    },
    {
      "id": "8f8cf47a-726c-4db8-9362-f68c94e75254",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        220,
        -400
      ],
      "parameters": {
        "width": 220,
        "height": 520,
        "content": "## Step 3: Extract plain text from PDF files\n\n### For better quality text extraction, you can use the optional [ConvertAPI](https://convertapi.com?ref=4l54n) node to the right, which respects the files' original formatting."
      },
      "typeVersion": 1
    },
    {
      "id": "f8da708a-ab86-40d7-bef0-dea600e5a032",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        520,
        -400
      ],
      "parameters": {
        "width": 220,
        "height": 520,
        "content": "## Step 4: Combine extracted text into a text string\n\n### Prepare data for InfraNodus: combine all the extracted text into a text string and also tell InfraNodus the gap depth it should use when generating advice"
      },
      "typeVersion": 1
    },
    {
      "id": "d63bb46c-5d0a-4fb9-8f9c-06a3aba63959",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        820,
        -400
      ],
      "parameters": {
        "width": 380,
        "height": 820,
        "content": "## Step 5: Use InfraNodus GraphRAG to build a knowledge graph, find the gap, and generate a research question based on it.\n\n### [InfraNodus](https://infranodus.com) builds a knowledge graph from all the texts, identifies the topical clusters that are least connected, and generates a research question that has a potential to bridge them in a new way.\n\n\ud83d\udea8 PROVIDE YOUR INFRANODUS API KEY HERE"
      },
      "typeVersion": 1
    },
    {
      "id": "0eeaef66-b9f5-4589-b663-9a992913fe1e",
      "name": "Sticky Note6",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1280,
        -400
      ],
      "parameters": {
        "width": 380,
        "height": 820,
        "content": "## Step 6: Show question / prompt to the user\n\n### Optionally, you can feed the response to your other n8n workflow or expose it via a webhook and show it in your own app using an iframe."
      },
      "typeVersion": 1
    },
    {
      "id": "b82fefa5-5882-4ecf-8b68-f884b42411c9",
      "name": "Sticky Note7",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -420,
        180
      ],
      "parameters": {
        "color": 5,
        "width": 1160,
        "height": 1000,
        "content": "# How does InfraNodus GraphRAG generate research questions?\n\n## [InfraNodus](https://infranodus.com) GraphRAG helps avoid generic responses and LLM bias by analyzing your text's structure. Here's how it works:\n\n### 1. It represents your text as a network of concepts and relations, building a knowledge graph.\n\n### 2. It then identifies the clusters of concepts that are furthest apart from each other: they appear in the same context (your texts) but are not well connected.\n\n### 3. InfraNodus then uses AI to generate a question / prompt that bridges this gap, touching upon relevant topics but connecting them in a new way.\n\n![structural gap infranodus](https://infranodus.com/images/front/infranodus-structural-gaps-ideas.jpg)"
      },
      "typeVersion": 1
    },
    {
      "id": "da7cf09a-d3f3-41d8-9ae4-4b4b1bcfc80f",
      "name": "InfraNodus GraphRAG Question Generator",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        960,
        0
      ],
      "parameters": {
        "url": "=https://infranodus.com/api/v1/graphAndAdvice?doNotSave=true&optimize=develop&includeGraph=false&includeGraphSummary=true&gapDepth={{ $json.randomNum }}",
        "method": "POST",
        "options": {},
        "sendBody": true,
        "authentication": "genericCredentialType",
        "bodyParameters": {
          "parameters": [
            {
              "name": "aiTopics",
              "value": "true"
            },
            {
              "name": "requestMode",
              "value": "question"
            },
            {
              "name": "text",
              "value": "={{ $json.text }}"
            }
          ]
        },
        "genericAuthType": "httpBearerAuth"
      },
      "credentials": {
        "httpBearerAuth": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.2
    }
  ],
  "active": false,
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "abd96e27-8999-4490-9c4c-8eda846dfc3b",
  "connections": {
    "On form submission": {
      "main": [
        [
          {
            "node": "Convert binary files to PDF",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Prepare for InfraNodus": {
      "main": [
        [
          {
            "node": "InfraNodus GraphRAG Question Generator",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Convert binary files to PDF": {
      "main": [
        [
          {
            "node": "Extract text from PDF files",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract text from PDF files": {
      "main": [
        [
          {
            "node": "Prepare for InfraNodus",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "InfraNodus GraphRAG Question Generator": {
      "main": [
        [
          {
            "node": "Display on the Form to the User",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
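
The two Code nodes ("Convert binary files to PDF" and "Prepare for InfraNodus") are stored in the JSON as escaped strings, which makes them hard to read. The sketch below restates their logic as plain functions that can be run outside n8n. The function names, the injectable `random` parameter, and the filter that skips items without a text string are assumptions added for illustration; the original node concatenates `item.json.text` unconditionally.

```javascript
// Sketch of the two Code nodes as plain functions (not the workflow's exact code).
// `items` mimics n8n's item shape: [{ json: {...}, binary: {...} }].

// Step 2: emit one item per uploaded binary, keyed by a whitespace-free name.
function splitBinaries(items) {
  const results = [];
  for (const item of items) {
    if (!item.binary) continue; // items without uploads are skipped
    for (const key of Object.keys(item.binary)) {
      const binaryKey = key.replace(/\s/g, '_'); // spaces become underscores
      results.push({
        json: { fileName: binaryKey },
        binary: { [binaryKey]: item.binary[key] },
      });
    }
  }
  return results;
}

// Step 4: join all extracted text and choose a gap depth of 0, 1, or 2.
// `random` is injectable only to make this sketch testable (assumption).
function prepareForInfraNodus(items, random = Math.random) {
  const text = items
    .filter((item) => item.json && typeof item.json.text === 'string')
    .map((item) => item.json.text + '\n\n')
    .join('');
  const randomNum = Math.floor(random() * 3);
  return { text, randomNum };
}
```

Passing `() => 0` as `random` always yields `randomNum` 0, which the comment in the original node describes as addressing the biggest gap in the knowledge graph.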

Credentials you'll need

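Each integration node will prompt for credentials when you import. Both HTTP Request nodes use HTTP Bearer Auth: the InfraNodus node needs your InfraNodus API key, and the optional ConvertAPI node (disabled by default) needs a ConvertAPI credential only if you enable it. Credential IDs are stripped before publishing, so you will add your own.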
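
To make the credential's role concrete, here is a minimal sketch of the request that the "InfraNodus GraphRAG Question Generator" node appears to send, assembled from the URL, query parameters, and body parameters in the JSON above. The helper name `buildInfraNodusRequest` is hypothetical, and the JSON content type is an assumption, since the node does not set one explicitly. The function only builds a plain object and does not contact the API.

```javascript
// Hypothetical helper: assembles the request described by the InfraNodus node.
// Nothing is sent over the network; the result is a plain object.
function buildInfraNodusRequest(text, gapDepth, apiKey) {
  const url = new URL('https://infranodus.com/api/v1/graphAndAdvice');
  // Query parameters copied from the node's URL in the workflow JSON.
  url.searchParams.set('doNotSave', 'true');
  url.searchParams.set('optimize', 'develop');
  url.searchParams.set('includeGraph', 'false');
  url.searchParams.set('includeGraphSummary', 'true');
  url.searchParams.set('gapDepth', String(gapDepth));

  return {
    url: url.toString(),
    method: 'POST',
    headers: {
      // HTTP Bearer Auth places the API key in the Authorization header.
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json', // assumption: body is sent as JSON
    },
    // Body parameters copied from the node's bodyParameters.
    body: JSON.stringify({ aiTopics: 'true', requestMode: 'question', text }),
  };
}
```

In n8n the API key lives in the stored credential rather than in the workflow, which is why the exported JSON only shows a `<your credential>` placeholder.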

About this workflow

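This template generates research questions from PDF documents (e.g. research papers, market reports). PDFs uploaded through an n8n form are converted to text, merged into a single string, and sent to the InfraNodus GraphRAG API, which builds a knowledge graph, finds a content gap between weakly connected topic clusters, and returns a question that bridges it.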
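
The final Form node in the workflow JSON displays the expression `{{ $json.aiAdvice[0].text }}` inside an `<h3>` element. The sketch below shows one way to read that field defensively if you forward the response to your own app instead. The only response field confirmed by the workflow is `aiAdvice[0].text`; the helper names and the fallback message are assumptions.

```javascript
// Hypothetical helper mirroring the expression {{ $json.aiAdvice[0].text }}.
// Returns the fallback when the response has no usable advice entry.
function extractQuestion(response, fallback = 'No question was generated.') {
  const advice =
    response && Array.isArray(response.aiAdvice) ? response.aiAdvice : [];
  const first = advice[0];
  return first && typeof first.text === 'string' && first.text.trim() !== ''
    ? first.text.trim()
    : fallback;
}

// Wraps the question in the same markup the Form node's responseText uses.
// Note: the text is not HTML-escaped here, matching the original template.
function renderCompletionHtml(response) {
  return `<br>\n<h3>${extractQuestion(response)}</h3>\n<br>\n`;
}
```

A fallback keeps the completion page readable when the API returns no advice, a case where the original expression would fail to resolve.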

Source: https://n8n.io/workflows/5744/ (original creator credit).
