This workflow corresponds to n8n.io template #4336 — we link there as the canonical source.
This workflow follows the Informationextractor → OpenAI Chat recipe pattern — see all workflows that pair these two integrations.
The workflow JSON
Copy or download the full n8n JSON below. Paste it into a new n8n workflow, add your credentials, activate. Full import guide →
{
"meta": {
"templateCredsSetupCompleted": true
},
"nodes": [
{
"id": "505f25b4-eaa7-4d42-98d9-2154d7d4cc81",
"name": "File Ref",
"type": "n8n-nodes-base.noOp",
"position": [
-1360,
280
],
"parameters": {},
"typeVersion": 1
},
{
"id": "59721669-a3a9-48ab-8cb0-29ea0af84f6d",
"name": "Success1",
"type": "n8n-nodes-base.respondToWebhook",
"onError": "continueRegularOutput",
"position": [
620,
-20
],
"parameters": {
"options": {
"responseCode": 200,
"responseHeaders": {
"entries": [
{
"name": "Content-Type",
"value": "application/json"
}
]
}
},
"respondWith": "json",
"responseBody": "={{\n{\n \"error\": null,\n \"meta\": {\n \"hash\": $('Get File Hash').first().json.data,\n },\n \"result\": $json.response\n}\n}}"
},
"typeVersion": 1.1
},
{
"id": "16479572-c3a6-4127-9bf2-db7380bd734c",
"name": "Is Valid File?",
"type": "n8n-nodes-base.if",
"position": [
-660,
280
],
"parameters": {
"options": {},
"conditions": {
"options": {
"version": 2,
"leftValue": "",
"caseSensitive": true,
"typeValidation": "strict"
},
"combinator": "and",
"conditions": [
{
"id": "a277fc5c-6873-49d5-b67b-39df0efc8a61",
"operator": {
"type": "number",
"operation": "lte"
},
"leftValue": "={{ $('Get Filesize').item.json.filesize_in_bytes }}",
"rightValue": "={{ 10000000 }}"
},
{
"id": "d908d5a9-3550-45ce-8645-19c08d3e5982",
"operator": {
"type": "number",
"operation": "lte"
},
"leftValue": "={{ $json.numpages }}",
"rightValue": 20
}
]
}
},
"typeVersion": 2.2
},
{
"id": "d35868fb-2a0b-4f07-a942-216a8d226b46",
"name": "Get Filesize",
"type": "n8n-nodes-base.code",
"position": [
-980,
280
],
"parameters": {
"mode": "runOnceForEachItem",
"jsCode": "function convertToBytes(size) {\n const units = {\n b: 1,\n bytes: 1,\n kb: 1024,\n mb: 1024 * 1024,\n gb: 1024 * 1024 * 1024,\n };\n\n // Extract the number and unit using regex\n const regex = /([\\d.]+)\\s*(bytes|b|kb|mb|gb)/i;\n const match = size.match(regex);\n\n if (!match) {\n throw new Error(`Invalid size format: \"${size}\"`);\n }\n\n const value = parseFloat(match[1]);\n const unit = match[2].toLowerCase();\n\n // Calculate the size in bytes\n const factor = units[unit];\n if (!factor) {\n throw new Error(`Unsupported unit: \"${unit}\"`);\n }\n\n return Math.round(value * factor);\n}\n\nconst data = $('File Ref').first().binary.data;\n\nreturn {\n json: { filesize_in_bytes: convertToBytes(data.fileSize) },\n binary: { data }\n}"
},
"typeVersion": 2
},
{
"id": "87e6b82c-4099-4435-b47f-647ae0b97d39",
"name": "Extract from File",
"type": "n8n-nodes-base.extractFromFile",
"position": [
-820,
280
],
"parameters": {
"options": {},
"operation": "pdf"
},
"typeVersion": 1
},
{
"id": "e4f623cd-418b-49f1-bc1b-2e804a089617",
"name": "Sticky Note1",
"type": "n8n-nodes-base.stickyNote",
"position": [
-1040,
140
],
"parameters": {
"color": 7,
"width": 720,
"height": 480,
"content": "### 2. Validate PDF File\nWe need to add validation here to meet service limits of our LLM. Here, we need to get for filesize and number of pages. "
},
"typeVersion": 1
},
{
"id": "ece12347-4205-447a-85cd-298bfa7595e0",
"name": "Sticky Note2",
"type": "n8n-nodes-base.stickyNote",
"position": [
-280,
-120
],
"parameters": {
"color": 7,
"width": 600,
"height": 480,
"content": "### 3. Document Parsing with Gemini API \nGoogle Gemini is used to breakdown the PDF into topics and provide insights for each. The number of topics and insights are not set and can be variable."
},
"typeVersion": 1
},
{
"id": "c7197f2f-7ac9-4c97-bd58-859c1d0950f0",
"name": "Sticky Note4",
"type": "n8n-nodes-base.stickyNote",
"position": [
340,
-120
],
"parameters": {
"color": 7,
"width": 480,
"height": 280,
"content": "### 5. Format Response\nFinally, we'll format the output to return the summary as markdown."
},
"typeVersion": 1
},
{
"id": "9ef4b032-de21-48bd-ad3a-fb1cf391fe77",
"name": "Information Extractor",
"type": "@n8n/n8n-nodes-langchain.informationExtractor",
"onError": "continueErrorOutput",
"position": [
-180,
40
],
"parameters": {
"text": "=Document as follows:\n{{ $json.text }}",
"options": {
"systemPromptTemplate": "You are a note-taking agent helping the user to summarise this PDF in note form. Breakdown the document into distinct topics and provide a 3 paragraph summary of each topic discussed in the document."
},
"schemaType": "manual",
"inputSchema": "{\n \"type\": \"array\",\n \"items\": {\n \"type\": \"object\",\n \"properties\": {\n \"topic\": { \"type\": \"string\" },\n \"insights\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"object\",\n \"properties\": {\n \"title\": { \"type\": \"string\" },\n \"body\": { \"type\": \"string\" }\n }\n }\n }\n }\n }\n}"
},
"typeVersion": 1
},
{
"id": "4d9f3117-972b-417b-9dee-4098bbd71102",
"name": "Format Response",
"type": "n8n-nodes-base.set",
"position": [
440,
-20
],
"parameters": {
"options": {},
"assignments": {
"assignments": [
{
"id": "10335d9e-7e0a-4ee1-a275-71d2f75b6bba",
"name": "response",
"type": "string",
"value": "={{\n$json.output.map(item => {\nreturn\n`## ${item.topic}\n${item.insights.map(insight =>\n`- **${insight.title}**: ${insight.body}`\n).join('\\n')}\n`;\n}).join('\\n\\n');\n}}"
}
]
}
},
"typeVersion": 3.4
},
{
"id": "db32ecb9-79d0-45ee-aced-05aca99c9f27",
"name": "400 Error",
"type": "n8n-nodes-base.respondToWebhook",
"position": [
-480,
440
],
"parameters": {
"options": {
"responseCode": 400
},
"respondWith": "json",
"responseBody": "{\n \"error\": \"Unable to parse document. Please ensure file is less than 10mb with maximum of 20 pages.\",\n \"result\": []\n}"
},
"typeVersion": 1.1
},
{
"id": "61d6e5e7-dcd7-4cdc-82be-8fec58f26e5a",
"name": "500 Error",
"type": "n8n-nodes-base.respondToWebhook",
"position": [
140,
120
],
"parameters": {
"options": {
"responseCode": 500
},
"respondWith": "json",
"responseBody": "{\n \"error\": \"Unable to parse document. An unknown error occurred.\",\n \"result\": []\n}"
},
"typeVersion": 1.1
},
{
"id": "732d7d7b-f7f5-4342-93b7-e8b49d381dfb",
"name": "Get File Hash",
"type": "n8n-nodes-base.crypto",
"disabled": true,
"position": [
-1180,
280
],
"parameters": {
"value": "={{ $('File Ref').first().binary }}"
},
"typeVersion": 1
},
{
"id": "e6221a49-d4d8-4aa5-bbd8-69b885ff7254",
"name": "POST /ai_pdf_summariser",
"type": "n8n-nodes-base.webhook",
"position": [
-1520,
280
],
"parameters": {
"path": "ai_pdf_summariser",
"options": {
"ignoreBots": false
},
"httpMethod": "POST",
"responseMode": "responseNode"
},
"typeVersion": 2
},
{
"id": "3d605049-571f-45d2-b1ba-914f0c9d68f4",
"name": "Sticky Note9",
"type": "n8n-nodes-base.stickyNote",
"position": [
-1680,
140
],
"parameters": {
"width": 280,
"height": 100,
"content": "### POST /ai_pdf_summariser\nCall this endpoint to upload a document for summarising."
},
"typeVersion": 1
},
{
"id": "d79d5cf3-2f7c-459c-9977-9e0a77ed8d7b",
"name": "OpenAI Chat Model1",
"type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
"position": [
-100,
220
],
"parameters": {
"model": {
"__rl": true,
"mode": "list",
"value": "gpt-4o-mini"
},
"options": {}
},
"credentials": {
"openAiApi": {
"name": "<your credential>"
}
},
"typeVersion": 1.2
}
],
"connections": {
"File Ref": {
"main": [
[
{
"node": "Get File Hash",
"type": "main",
"index": 0
}
]
]
},
"Get Filesize": {
"main": [
[
{
"node": "Extract from File",
"type": "main",
"index": 0
}
]
]
},
"Get File Hash": {
"main": [
[
{
"node": "Get Filesize",
"type": "main",
"index": 0
}
]
]
},
"Is Valid File?": {
"main": [
[
{
"node": "Information Extractor",
"type": "main",
"index": 0
}
],
[
{
"node": "400 Error",
"type": "main",
"index": 0
}
]
]
},
"Format Response": {
"main": [
[
{
"node": "Success1",
"type": "main",
"index": 0
}
]
]
},
"Extract from File": {
"main": [
[
{
"node": "Is Valid File?",
"type": "main",
"index": 0
}
]
]
},
"OpenAI Chat Model1": {
"ai_languageModel": [
[
{
"node": "Information Extractor",
"type": "ai_languageModel",
"index": 0
}
]
]
},
"Information Extractor": {
"main": [
[
{
"node": "Format Response",
"type": "main",
"index": 0
}
],
[
{
"node": "500 Error",
"type": "main",
"index": 0
}
]
]
},
"POST /ai_pdf_summariser": {
"main": [
[
{
"node": "File Ref",
"type": "main",
"index": 0
}
]
]
}
}
}
Credentials you'll need
Each integration node will prompt for credentials when you import. We strip credential IDs before publishing — you'll add your own.
openAiApi
For the full experience including quality scoring and batch install features for each workflow upgrade to Pro
About this workflow
This template is for researchers, students, professionals, or content creators who need to quickly extract and summarize key insights from PDF documents using AI-powered analysis.
Source: https://n8n.io/workflows/4336/ — original creator credit. Request a take-down →
Related workflows
Workflows that share integrations, category, or trigger type with this one. All free to copy and import.
The Ultimate Scraper for n8n uses Selenium and AI to retrieve any information displayed on a webpage. You can also use session cookies to log in to the targeted webpage for more advanced scraping need
This template automates the extraction of structured data from Thai government letters received via LINE or uploaded to Google Drive. It uses Mistral AI for OCR and OpenAI for information extraction,
🤝🖊️🤖 This workflow automates the process of retrieving meeting transcripts from Fireflies.ai, extracting and summarizing relevant content using Google Gemini, and sending or drafting well-formatted su
I'll be honest, I built this because I was getting lazy in meetings and missing key details. I started with a simple VEXA integration for transcripts, then added AI to pull out summaries and tasks. Bu
Automatically scrape trending TikTok videos, analyze their virality using Gemini AI, and store insights directly into Airtable for creative research or content planning.