Analyze Documents & Web Content with GPT-4o Q&A Assistant

By Aadarsh Jain (@aadarsh-jain) on n8n.io

AI-powered document and web page analysis using n8n and GPT-4o. Ask a question about a local file or a web URL and get a structured, formatted answer based on its content.

Category: AI & RAG · Trigger: Chat trigger · Nodes: 12 · Complexity: ★★★★☆ · AI nodes: yes
Integrations: Chat Trigger, Read Binary File, HTTP Request, OpenAI Chat, Agent

This workflow corresponds to n8n.io template #9651 — we link there as the canonical source.

This workflow follows the Agent → Chat Trigger recipe pattern.

The workflow JSON

Copy the full n8n JSON below, paste it into a new n8n workflow, add your credentials, and activate it.

{
  "id": "NMAA4tOidWSLW3On",
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "name": "Document Analyzer and Q&A",
  "tags": [],
  "nodes": [
    {
      "id": "8ec9d857-a965-47e0-a367-3172a1056232",
      "name": "Document Q&A Chat",
      "type": "@n8n/n8n-nodes-langchain.chatTrigger",
      "position": [
        192,
        208
      ],
      "parameters": {
        "options": {}
      },
      "typeVersion": 1.1
    },
    {
      "id": "aaaefabe-e8fe-45b0-b841-109afc0049b3",
      "name": "Parse Document & Question",
      "type": "n8n-nodes-base.code",
      "position": [
        432,
        208
      ],
      "parameters": {
        "jsCode": "const chatInput = $input.first().json.chatInput;\nconst parts = chatInput.split('|').map(part => part.trim());\n\nif (parts.length < 2) {\n  throw new Error('Please provide input in format: \"document_path_or_url | your_question\"\\nExamples:\\n- \"/Users/docs/readme.md | What is this project about?\"\\n- \"https://docs.example.com/api | What are the endpoints?\"');\n}\n\nconst documentPath = parts[0];\nconst userQuestion = parts.slice(1).join('|').trim();\n\nif (!documentPath || documentPath.length < 3) {\n  throw new Error('Please provide a valid document path or URL');\n}\n\nif (!userQuestion || userQuestion.length < 5) {\n  throw new Error('Please provide a detailed question (minimum 5 characters)');\n}\n\nconst isUrl = documentPath.startsWith('http://') || documentPath.startsWith('https://');\n\nlet fileType = 'unknown';\nlet inputType = 'file';\n\nif (isUrl) {\n  inputType = 'url';\n  fileType = 'html';\n  \n  const urlParts = documentPath.split('.');\n  if (urlParts.length > 1) {\n    const possibleExtension = urlParts[urlParts.length - 1].split(/[?#]/)[0].toLowerCase();\n    const supportedTypes = ['pdf', 'md', 'txt', 'doc', 'docx', 'json', 'yaml', 'yml'];\n    if (supportedTypes.includes(possibleExtension)) {\n      fileType = possibleExtension;\n    }\n  }\n} else {\n  const fileExtension = documentPath.split('.').pop().toLowerCase();\n  const supportedTypes = ['pdf', 'md', 'txt', 'doc', 'docx', 'json', 'yaml', 'yml'];\n  \n  if (!supportedTypes.includes(fileExtension)) {\n    throw new Error(`Unsupported file type: ${fileExtension}. Supported types: ${supportedTypes.join(', ')}`);\n  }\n  \n  fileType = fileExtension;\n}\n\nreturn {\n  documentPath: documentPath,\n  userQuestion: userQuestion,\n  fileType: fileType,\n  inputType: inputType,\n  isUrl: isUrl,\n  timestamp: new Date().toISOString()\n};"
      },
      "typeVersion": 2
    },
    {
      "id": "8ad92e8a-54e5-4896-84aa-a054abc5c7f3",
      "name": "File Path Check",
      "type": "n8n-nodes-base.if",
      "position": [
        672,
        96
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 1,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "file-condition",
              "operator": {
                "type": "boolean",
                "operation": "equals"
              },
              "leftValue": "={{ $json.isUrl }}",
              "rightValue": false
            }
          ]
        }
      },
      "typeVersion": 2
    },
    {
      "id": "afd1bad4-bc51-4f3d-8099-98dd9de1aac4",
      "name": "Read Document File",
      "type": "n8n-nodes-base.readBinaryFile",
      "position": [
        832,
        96
      ],
      "parameters": {
        "filePath": "={{ $json.documentPath }}"
      },
      "typeVersion": 1
    },
    {
      "id": "c7024fab-3b1c-4892-abab-152a91c94b53",
      "name": "URL Check",
      "type": "n8n-nodes-base.if",
      "position": [
        672,
        320
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 1,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "url-condition",
              "operator": {
                "type": "boolean",
                "operation": "equals"
              },
              "leftValue": "={{ $json.isUrl }}",
              "rightValue": true
            }
          ]
        }
      },
      "typeVersion": 2
    },
    {
      "id": "bdb7e2fd-1441-4cac-a66e-dc20d8f04aeb",
      "name": "Fetch Web Content",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        832,
        320
      ],
      "parameters": {
        "url": "={{ $json.documentPath }}",
        "options": {
          "timeout": 30000,
          "redirect": {
            "redirect": {
              "maxRedirects": 5
            }
          },
          "response": {
            "response": {
              "responseFormat": "text"
            }
          }
        },
        "sendHeaders": true,
        "headerParameters": {
          "parameters": [
            {
              "name": "User-Agent",
              "value": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/0.0.0.0 Safari/537.36"
            },
            {
              "name": "Accept",
              "value": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
            },
            {
              "name": "Accept-Language",
              "value": "en-US,en;q=0.9"
            }
          ]
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "9a96077e-2158-42c0-853f-3dd43188c28d",
      "name": "Extract Document Content",
      "type": "n8n-nodes-base.extractFromFile",
      "position": [
        992,
        96
      ],
      "parameters": {
        "options": {},
        "operation": "text"
      },
      "typeVersion": 1
    },
    {
      "id": "053e4752-6062-4641-8119-3d7aacbf6af1",
      "name": "Process Document Content",
      "type": "n8n-nodes-base.code",
      "position": [
        1200,
        208
      ],
      "parameters": {
        "jsCode": "const fileType = $node['Parse Document & Question'].json.fileType;\nconst inputType = $node['Parse Document & Question'].json.inputType;\nconst isUrl = $node['Parse Document & Question'].json.isUrl;\nconst documentPath = $node['Parse Document & Question'].json.documentPath;\n\nlet extractedContent;\nlet processedContent;\n\nconst inputData = $input.first();\n\nif (isUrl) {\n  extractedContent = inputData.json.body || inputData.json.data || inputData.json || '';\n  if (fileType === 'html') {\n    const htmlString = typeof extractedContent === 'object' ? JSON.stringify(extractedContent) : String(extractedContent || '');\n    processedContent = htmlString\n      .replace(/<script[^>]*>[\\s\\S]*?<\\/script>/gi, '')\n      .replace(/<style[^>]*>[\\s\\S]*?<\\/style>/gi, '')\n      .replace(/<[^>]*>/g, ' ')\n      .replace(/&nbsp;/g, ' ')\n      .replace(/&amp;/g, '&')\n      .replace(/&lt;/g, '<')\n      .replace(/&gt;/g, '>')\n      .replace(/&quot;/g, '\"')\n      .replace(/&#39;/g, \"'\")\n      .replace(/\\s+/g, ' ')\n      .trim();\n  } else {\n    processedContent = typeof extractedContent === 'object' ? JSON.stringify(extractedContent) : String(extractedContent || '');\n  }\n} else {\n  extractedContent = inputData.json.data || inputData.json.text || inputData.json || inputData.binary?.data?.toString() || '';\n  processedContent = typeof extractedContent === 'object' ? JSON.stringify(extractedContent) : String(extractedContent || '');\n}\nswitch(fileType) {\n  case 'json':\n    try {\n      const jsonData = JSON.parse(processedContent);\n      processedContent = JSON.stringify(jsonData, null, 2);\n    } catch (e) {\n      processedContent = processedContent;\n    }\n    break;\n    \n  case 'yaml':\n  case 'yml':\n    break;\n    \n  case 'md':\n    processedContent = processedContent\n      .replace(/^\\s*```[^\\n]*\\n/gm, '\\n--- CODE BLOCK ---\\n')\n      .replace(/^\\s*```\\s*$/gm, '\\n--- END CODE BLOCK ---\\n');\n    break;\n    \n  case 'html':\n    break;\n    \n  default:\n    processedContent = processedContent\n      .replace(/\\s+/g, ' ')\n      .trim();\n}\n\nif (!processedContent || processedContent.length < 10) {\n  throw new Error(`Failed to extract meaningful content from ${isUrl ? 'web page' : 'document'}. Content may be empty, corrupted, or unsupported.`);\n}\n\nconst maxLength = 15000;\nconst originalLength = processedContent.length;\nif (processedContent.length > maxLength) {\n  processedContent = processedContent.substring(0, maxLength) + '\\n\\n[Content truncated for processing...]';\n}\n\nreturn {\n  documentContent: processedContent,\n  documentPath: documentPath,\n  fileType: fileType,\n  inputType: inputType,\n  isUrl: isUrl,\n  contentLength: processedContent.length,\n  isContentTruncated: originalLength > maxLength\n};"
      },
      "typeVersion": 2
    },
    {
      "id": "ce10391d-1e7b-4a68-9708-690961633dd1",
      "name": "OpenAI Document Analyzer",
      "type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
      "position": [
        1408,
        368
      ],
      "parameters": {
        "model": {
          "__rl": true,
          "mode": "list",
          "value": "gpt-4o"
        },
        "options": {}
      },
      "credentials": {
        "openAiApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "62a54e68-13ae-468c-acf0-b164138a19c0",
      "name": "Analyze Document & Answer",
      "type": "@n8n/n8n-nodes-langchain.agent",
      "position": [
        1408,
        208
      ],
      "parameters": {
        "text": "=You are an expert document analysis assistant. Your task is to carefully read and understand the provided content, then answer the user's question accurately and comprehensively with well-formatted, human-readable output.\n\n**Content Information:**\n- Source: {{ $node['Parse Document & Question'].json.isUrl ? 'Web URL' : 'Local File' }}\n- Path/URL: {{ $node['Parse Document & Question'].json.documentPath }}\n- Content Type: {{ $node['Process Document Content'].json.fileType }}\n- Input Type: {{ $node['Process Document Content'].json.inputType }}\n- Content Length: {{ $node['Process Document Content'].json.contentLength }} characters\n- Content Truncated: {{ $node['Process Document Content'].json.isContentTruncated }}\n\n**Content:**\n{{ $node['Process Document Content'].json.documentContent }}\n\n**User Question:**\n{{ $node['Parse Document & Question'].json.userQuestion }}\n\n**Instructions:**\n1. Start your response with a header showing the source and question\n2. **Carefully read and analyze** the entire content provided above\n3. **Understand the context** and structure of the content\n4. **Answer the user's question** based on the content\n5. **Format your response** for maximum readability using proper formatting\n6. **Use tables, bullet points, and structured layouts** when appropriate\n7. **If the answer is not in the content**, clearly state that the information is not available\n8. **Provide context** around your answer when helpful\n9. **For web content**, focus on the main content and ignore navigation/footer elements\n\n**Response Formatting Requirements:**\n- Use bullet points (\u2022) for lists and key points\n- Create tables when presenting structured data or comparisons\n- Use headings and subheadings for organization\n- Include numbered steps for processes or procedures\n- Quote relevant sections with proper formatting\n- Use emojis sparingly for visual appeal and clarity\n\n**Response Structure:**\n\n# \ud83d\udcca Document Analysis Report\n\n## \ud83c\udfaf Query Information\n\u2022 **Question Asked:** {{ $node['Parse Document & Question'].json.userQuestion }}\n\u2022 **Source Type:** {{ $node['Parse Document & Question'].json.isUrl ? '\ud83c\udf10 Web Page' : '\ud83d\udcc4 Local File' }}\n\u2022 **Source:** {{ $node['Parse Document & Question'].json.documentPath }}\n\u2022 **Analysis Status:** \u2705 Complete\n\n---\n\n## \ud83d\udccb **Direct Answer**\n[Provide a clear, concise answer to the user's question]\n\n## \ud83d\udcd6 **Key Information**\n\u2022 [Main point 1]\n\u2022 [Main point 2] \n\u2022 [Main point 3]\n\n## \ud83d\udcca **Detailed Analysis**\n[Use tables, bullet points, or structured format as appropriate]\n\n| Aspect | Details |\n|--------|----------|\n| [Key 1] | [Value 1] |\n| [Key 2] | [Value 2] |\n\n## \ud83d\udd0d **Supporting Evidence**\n> \"[Relevant quote from content]\"\n\n\u2022 **Section Reference:** [Where this information was found]\n\u2022 **Context:** [Additional context if needed]\n\n## \u26a0\ufe0f **Important Notes**\n\u2022 [Any limitations, caveats, or important considerations]\n\u2022 [Whether content was truncated and might affect completeness]\n\n## \ud83d\udca1 **Summary**\n[Brief summary if the response was complex]\n\n**Remember:** Format your response to be highly readable with proper structure, bullet points, tables, and clear organization. Base your answer ONLY on the content provided.",
        "options": {
          "systemMessage": "You are a helpful document analysis assistant that provides clear, well-formatted answers based on document content."
        },
        "promptType": "define"
      },
      "typeVersion": 2.1
    },
    {
      "id": "890ac685-4897-46aa-9e1a-e96566891b81",
      "name": "Workflow Info",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        192,
        -352
      ],
      "parameters": {
        "color": 4,
        "width": 416,
        "height": 528,
        "content": "## Hybrid Document & Web Analyzer\n\n### Purpose\nAnalyze documents (PDF, MD, TXT, JSON, YAML) and web pages, then answer user questions about the content.\n\n### Input Format\n\"path_or_url | your_question\"\n\n### Supported Sources\n- **Local Files:** PDF, Markdown, Text, JSON, YAML, Word docs\n- **Web URLs:** Documentation sites, HTML pages, online docs\n\n### Features\n- Content extraction and processing\n- Web page content fetching\n- HTML to text conversion\n- Intelligent Q&A with GPT-4o\n- Clean, focused responses"
      },
      "typeVersion": 1
    },
    {
      "id": "149ed301-6176-4f63-9890-7b69aaf8b264",
      "name": "Pipeline Info",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        672,
        -160
      ],
      "parameters": {
        "color": 5,
        "width": 500,
        "height": 216,
        "content": "## Simplified Processing Pipeline\n\n1. **Parse Input** - Extract document path and question\n2. **Read File** - Load document content\n3. **Extract Content** - Handle different file formats\n4. **Process** - Clean and prepare content\n5. **Analyze** - AI-powered question answering\n6. **Final Response** - Clean, formatted output"
      },
      "typeVersion": 1
    }
  ],
  "active": false,
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "0b3b020f-958d-4e8c-acc8-5648076da1a5",
  "connections": {
    "URL Check": {
      "main": [
        [
          {
            "node": "Fetch Web Content",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "File Path Check": {
      "main": [
        [
          {
            "node": "Read Document File",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Document Q&A Chat": {
      "main": [
        [
          {
            "node": "Parse Document & Question",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Fetch Web Content": {
      "main": [
        [
          {
            "node": "Process Document Content",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Read Document File": {
      "main": [
        [
          {
            "node": "Extract Document Content",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract Document Content": {
      "main": [
        [
          {
            "node": "Process Document Content",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "OpenAI Document Analyzer": {
      "ai_languageModel": [
        [
          {
            "node": "Analyze Document & Answer",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    },
    "Process Document Content": {
      "main": [
        [
          {
            "node": "Analyze Document & Answer",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Parse Document & Question": {
      "main": [
        [
          {
            "node": "File Path Check",
            "type": "main",
            "index": 0
          },
          {
            "node": "URL Check",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
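
The chat trigger expects a single message of the form "path_or_url | your_question". As a rough illustration of how the "Parse Document & Question" node splits that message, here is a condensed standalone sketch in plain JavaScript. The function name `parseChatInput` is hypothetical, the length checks from the original node are omitted, and the n8n-specific `$input` accessor is replaced by an ordinary string argument.

```javascript
// Condensed sketch of the "Parse Document & Question" logic, runnable outside n8n.
// parseChatInput is a hypothetical helper name, not an n8n API.
const SUPPORTED_TYPES = ['pdf', 'md', 'txt', 'doc', 'docx', 'json', 'yaml', 'yml'];

function parseChatInput(chatInput) {
  const parts = chatInput.split('|').map((part) => part.trim());
  if (parts.length < 2) {
    throw new Error('Expected input in the form "document_path_or_url | your_question"');
  }

  const documentPath = parts[0];
  // Re-join the remainder so the question itself may contain "|".
  const userQuestion = parts.slice(1).join('|').trim();
  const isUrl = documentPath.startsWith('http://') || documentPath.startsWith('https://');

  // Text after the last ".", with any query string or fragment dropped.
  const extension = documentPath.split('.').pop().split(/[?#]/)[0].toLowerCase();

  let fileType;
  if (isUrl) {
    // URLs are treated as HTML unless they end in a supported extension.
    fileType = SUPPORTED_TYPES.includes(extension) ? extension : 'html';
  } else if (SUPPORTED_TYPES.includes(extension)) {
    fileType = extension;
  } else {
    throw new Error(`Unsupported file type: ${extension}`);
  }

  return { documentPath, userQuestion, fileType, isUrl };
}

console.log(parseChatInput('/Users/docs/readme.md | What is this project about?'));
console.log(parseChatInput('https://docs.example.com/api | What are the endpoints?'));
```

The two sample inputs are the ones quoted in the node's own error message; the first resolves to a local Markdown file and the second to an HTML page.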

Credentials you'll need

Each integration node will prompt for credentials when you import. We strip credential IDs before publishing — you'll add your own.
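
To see in advance which nodes will ask for credentials, you can scan the exported JSON for nodes that carry a `credentials` object. The following is a minimal sketch under that assumption; `listCredentialNodes` is a hypothetical helper, and `sampleWorkflow` is a trimmed stand-in containing only two of the twelve nodes from the JSON above.

```javascript
// Hypothetical helper: lists every node in an exported workflow that declares credentials.
function listCredentialNodes(workflow) {
  return workflow.nodes
    .filter((node) => node.credentials)
    .flatMap((node) =>
      Object.keys(node.credentials).map((credentialType) => ({
        node: node.name,
        credentialType,
      }))
    );
}

// Trimmed stand-in for the workflow JSON (two of its twelve nodes).
const sampleWorkflow = {
  nodes: [
    { name: 'Fetch Web Content', type: 'n8n-nodes-base.httpRequest' },
    {
      name: 'OpenAI Document Analyzer',
      type: '@n8n/n8n-nodes-langchain.lmChatOpenAi',
      credentials: { openAiApi: { name: '<your credential>' } },
    },
  ],
};

console.log(listCredentialNodes(sampleWorkflow));
```

In this workflow the only node with a `credentials` entry is "OpenAI Document Analyzer", which needs an `openAiApi` credential.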

About this workflow

AI-powered document and web page analysis using n8n and GPT-4o. Ask a question about a local file or a web URL and get a structured, formatted answer based on its content.
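
Before the content reaches the model, the "Process Document Content" node strips HTML down to plain text and caps it at 15,000 characters. The sketch below reproduces those two steps as standalone functions so they can be tried outside n8n. The names `htmlToText` and `truncateContent` are hypothetical, and unlike the original node this sketch decodes `&amp;` last so that text such as `&amp;lt;` is not decoded twice.

```javascript
const MAX_LENGTH = 15000; // same cap as the workflow's Code node

// Hypothetical helper: regex-based HTML-to-text conversion, as in the workflow.
function htmlToText(html) {
  return String(html)
    .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '') // drop inline scripts
    .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, '')   // drop inline styles
    .replace(/<[^>]*>/g, ' ')                         // replace remaining tags with spaces
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&')                           // decoded last (deviation from the node)
    .replace(/\s+/g, ' ')
    .trim();
}

// Hypothetical helper: caps the text and reports whether anything was cut.
function truncateContent(text, maxLength = MAX_LENGTH) {
  const isContentTruncated = text.length > maxLength;
  const documentContent = isContentTruncated
    ? text.substring(0, maxLength) + '\n\n[Content truncated for processing...]'
    : text;
  return { documentContent, isContentTruncated };
}

const samplePage =
  '<html><head><style>p{color:red}</style></head>' +
  '<body><p>Rate limit: 60 &amp; up</p><script>track()</script></body></html>';

console.log(truncateContent(htmlToText(samplePage)));
```

Regex stripping is a deliberately simple approach: it is adequate for feeding readable text to a model, but it is not a full HTML parser and will not handle every malformed page.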

Source: https://n8n.io/workflows/9651/ (original creator credit).
