This workflow follows the Agent → HTTP Request recipe pattern.
The workflow JSON
Copy or download the full n8n JSON below. Paste it into a new n8n workflow, add your credentials, and activate it.
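Before activating, it may help to check which nodes still carry the `<your credential>` placeholder used in the export. The sketch below is a hypothetical helper, not part of n8n: the function name `findPlaceholderCredentials` and the trimmed-down `sample` object are assumptions made for illustration, and it only relies on the `nodes[].credentials.<type>.name` shape visible in the JSON.

```javascript
// Hypothetical helper (not an n8n API): list nodes whose credentials
// still use the "<your credential>" placeholder from the exported JSON.
function findPlaceholderCredentials(workflow, placeholder = "<your credential>") {
  const pending = [];
  for (const node of workflow.nodes ?? []) {
    // Nodes without a "credentials" key (Switch, Code, Set, ...) are skipped.
    for (const [credType, cred] of Object.entries(node.credentials ?? {})) {
      if (cred && cred.name === placeholder) {
        pending.push({ node: node.name, credential: credType });
      }
    }
  }
  return pending;
}

// Trimmed-down sample built from two node entries of this workflow.
const sample = {
  nodes: [
    { name: "Telegram Trigger", credentials: { telegramApi: { name: "<your credential>" } } },
    { name: "Switch" },
  ],
};

console.log(findPlaceholderCredentials(sample));
```

Passing the parsed workflow JSON instead of `sample` would list every node that still needs a credential assigned after import.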
{
"name": "RAG CHATBOT Main",
"nodes": [
{
"parameters": {
"rules": {
"values": [
{
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"leftValue": "={{ $json.type }}",
"rightValue": "video",
"operator": {
"type": "string",
"operation": "equals"
},
"id": "ee97bdcd-85e9-4a53-9e77-74ff9d8811cd"
}
],
"combinator": "and"
}
},
{
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"id": "57cddf6d-0a6a-43d2-b8a8-d79ee18948cc",
"leftValue": "={{ $json.type }}",
"rightValue": "web",
"operator": {
"type": "string",
"operation": "equals"
}
}
],
"combinator": "and"
}
},
{
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"id": "72df165a-aac1-4b28-a212-d4f0b4e287d6",
"leftValue": "={{ $json.type }}",
"rightValue": "file",
"operator": {
"type": "string",
"operation": "equals",
"name": "filter.operator.equals"
}
}
],
"combinator": "and"
}
}
]
},
"options": {}
},
"type": "n8n-nodes-base.switch",
"typeVersion": 3.2,
"position": [
1780,
-140
],
"id": "b9750e31-4775-498c-865a-2e21f5049980",
"name": "Switch"
},
{
"parameters": {
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"id": "66c450c0-67d3-43b5-af09-58d197191a99",
"leftValue": "={{ $('Telegram Trigger').item.json.message.document.file_name }}",
"rightValue": "",
"operator": {
"type": "string",
"operation": "exists",
"singleValue": true
}
}
],
"combinator": "and"
},
"options": {}
},
"type": "n8n-nodes-base.if",
"typeVersion": 2.2,
"position": [
960,
-80
],
"id": "932dca78-1f92-41c0-9bf3-b13bcec39696",
"name": "file?"
},
{
"parameters": {
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"id": "dba766ff-d7e2-4b0c-bcb5-5433a1cb9042",
"leftValue": "={{ $('Telegram Trigger').item.json.message.document.mime_type }}",
"rightValue": "^(text/plain|application/pdf|application/vnd\\.openxmlformats-officedocument\\.wordprocessingml\\.document)$",
"operator": {
"type": "string",
"operation": "notRegex"
}
}
],
"combinator": "and"
},
"options": {}
},
"type": "n8n-nodes-base.if",
"typeVersion": 2.2,
"position": [
1100,
-280
],
"id": "c5d1005d-e68e-4182-ba67-862807f32265",
"name": "Format ok?"
},
{
"parameters": {
"chatId": "={{ $('Telegram Trigger').item.json.message.from.id }}",
"text": "\u041f\u0440\u0438\u0448\u043b\u0438\u0442\u0435 \u0444\u0430\u0439\u043b\u044b \u0432 \u0444\u043e\u0440\u043c\u0430\u0442\u0430\u0445 txt, docx, pdf.",
"additionalFields": {
"appendAttribution": false
}
},
"type": "n8n-nodes-base.telegram",
"typeVersion": 1.2,
"position": [
1340,
-280
],
"id": "619c90b1-af63-4418-89c3-1d03810b4417",
"name": "Format Error",
"credentials": {
"telegramApi": {
"name": "<your credential>"
}
}
},
{
"parameters": {
"resource": "file",
"fileId": "={{ $json.file_id }}"
},
"type": "n8n-nodes-base.telegram",
"typeVersion": 1.2,
"position": [
2280,
480
],
"id": "79f69f51-9384-43fc-825e-a9490674ee79",
"name": "File",
"credentials": {
"telegramApi": {
"name": "<your credential>"
}
}
},
{
"parameters": {
"updates": [
"message",
"callback_query"
],
"additionalFields": {}
},
"type": "n8n-nodes-base.telegramTrigger",
"typeVersion": 1.2,
"position": [
-580,
520
],
"id": "9232fbfd-e214-407e-9b17-9be50cf2b179",
"name": "Telegram Trigger",
"credentials": {
"telegramApi": {
"name": "<your credential>"
}
}
},
{
"parameters": {
"model": {
"__rl": true,
"mode": "list",
"value": "gpt-4o-mini"
},
"options": {}
},
"type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
"typeVersion": 1.2,
"position": [
2300,
-1080
],
"id": "ce999632-a114-4397-8334-9a151bdf252b",
"name": "OpenAI Chat Model",
"credentials": {
"openAiApi": {
"name": "<your credential>"
}
}
},
{
"parameters": {
"descriptionType": "manual",
"toolDescription": "Use MCP client for video transcription",
"operation": "executeTool",
"toolName": "={{ $fromAI(\"tool\", \"the tool selected\") }}",
"toolParameters": "={\n \"videoUrl\": \"{{ $fromAI('url', 'URL of the page to transcribe') }}\"\n }"
},
"type": "n8n-nodes-mcp.mcpClientTool",
"typeVersion": 1,
"position": [
2600,
-1080
],
"id": "ad11b3a5-d7a8-44d6-94b0-bccd9227538f",
"name": "Transcribe",
"credentials": {
"mcpClientApi": {
"name": "<your credential>"
}
}
},
{
"parameters": {
"options": {
"systemMessage": "You are an AI agent integrated into n8n for YouTube video transcription.\n\n**Task:**\n1. **Process Links:**\n * When you receive one or more YouTube video links, process them **sequentially**, one after the other.\n * For each link, use the `executetool` tool.\n * When calling `executetool`, you **must** use the following JSON structure to specify the tool and its parameters, substituting the actual link into the `videoUrl` field:\n ```json\n {\n \"tool\": \"get-youtube-transcript\",\n \"Tool_Parameters\": {\n \"videoUrl\": \"INSERT_VIDEO_LINK_HERE\"\n }\n }\n ```\n2. **Format Response:**\n * Collect the result from the `executetool` execution (i.e., the output of the `get-youtube-transcript` tool) for each processed link.\n * In the final response, for each link, include the **standard success message and result** (including the transcription text) that is typically generated after using the `get-youtube-transcript` tool.\n * Present these standard outputs for each link sequentially within one message, ensuring they are clearly separated (e.g., by a blank line).\n}"
}
},
"type": "@n8n/n8n-nodes-langchain.agent",
"typeVersion": 1.8,
"position": [
2360,
-1220
],
"id": "5215625f-44a5-45f9-860b-2821f3b560f4",
"name": "Paid transcription"
},
{
"parameters": {
"content": "## VIDEO TRANSCRIPT (PAID, API DUMPLINGAI)\n",
"height": 360,
"width": 520,
"color": 4
},
"type": "n8n-nodes-base.stickyNote",
"typeVersion": 1,
"position": [
2240,
-1300
],
"id": "33598b24-6a86-46b3-a2fc-5e8826d5b46f",
"name": "Sticky Note"
},
{
"parameters": {
"executeOnce": false,
"command": "=yt-dlp --cookies \"/home/node/cookies.txt\" -x --audio-format mp3 -o \"{{$json.dynamicAudioPath}}\" '{{ $json.url }}'\n\n"
},
"type": "n8n-nodes-base.executeCommand",
"typeVersion": 1,
"position": [
2440,
-600
],
"id": "237012f8-3399-45fa-b74c-c40380f879dc",
"name": "Audio extract",
"executeOnce": false
},
{
"parameters": {
"fileSelector": "={{ $('Set Names').item.json.dynamicAudioPath }}",
"options": {}
},
"type": "n8n-nodes-base.readWriteFile",
"typeVersion": 1,
"position": [
2660,
-600
],
"id": "81b0060b-2880-4290-b369-284f2b043f81",
"name": "Upload binary",
"executeOnce": false
},
{
"parameters": {
"assignments": {
"assignments": [
{
"id": "9a5bb0d7-223c-48a4-b12e-8fd86657a43d",
"name": "video_id",
"value": "={{$json.idx_in_batch}}",
"type": "string"
},
{
"id": "0b22482a-49f1-46e3-833b-d3230ceeebc7",
"name": "dynamicAudioPath",
"value": "={{`/home/node/temp_audio/${$execution.id}_${$itemIndex}.mp3`}}",
"type": "string"
}
]
},
"includeOtherFields": true,
"options": {}
},
"type": "n8n-nodes-base.set",
"typeVersion": 3.4,
"position": [
2220,
-600
],
"id": "39f663d0-c735-48bd-a1b0-056321a4c130",
"name": "Set Names"
},
{
"parameters": {
"content": "## VIDEO TRANSCRIPT (FREE, LOCAL)\n",
"height": 380,
"width": 1440,
"color": 7
},
"type": "n8n-nodes-base.stickyNote",
"typeVersion": 1,
"position": [
2040,
-820
],
"id": "a42a0764-b3c5-4f99-ad01-1aea5fad07f9",
"name": "Sticky Note1"
},
{
"parameters": {
"method": "POST",
"url": "http://whisper-asr:9000/asr?output=srt",
"sendBody": true,
"contentType": "multipart-form-data",
"bodyParameters": {
"parameters": [
{
"parameterType": "formBinaryData",
"name": "audio_file",
"inputDataFieldName": "data"
}
]
},
"options": {}
},
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [
2860,
-600
],
"id": "6f8e7897-ca10-4b9d-9de0-30aaba143232",
"name": "Whisper Transcribe",
"executeOnce": false
},
{
"parameters": {
"model": "deepseek/deepseek-chat-v3-0324",
"options": {}
},
"type": "@n8n/n8n-nodes-langchain.lmChatOpenRouter",
"typeVersion": 1,
"position": [
2240,
20
],
"id": "ba1ef08d-e8fb-49ce-8ac2-ca74dd6230ff",
"name": "OpenRouter Chat Model1",
"credentials": {
"openRouterApi": {
"name": "<your credential>"
}
}
},
{
"parameters": {
"operation": "executeTool",
"toolName": "firecrawl_scrape",
"toolParameters": "={\n \"url\": \"{{ $fromAI('url', 'URL of the page to scrape') }}\",\n \"formats\": [\n \"markdown\"\n ],\n \"onlyMainContent\": true,\n \"waitFor\": 1000,\n \"timeout\": 30000,\n \"mobile\": false,\n \"includeTags\": [\n \"article\",\n \"main\",\n \".DocSearch-content\"\n ],\n \"excludeTags\": [\n \"nav\",\n \"footer\",\n \"header\",\n \"script\",\n \"style\"\n ],\n \"skipTlsVerification\": false\n}"
},
"type": "n8n-nodes-mcp.mcpClientTool",
"typeVersion": 1,
"position": [
2740,
0
],
"id": "e78cbe22-111b-4d6b-b9a5-efbd3cecfb16",
"name": "Firecrawl scrape",
"credentials": {
"mcpClientApi": {
"name": "<your credential>"
}
},
"disabled": true
},
{
"parameters": {
"mode": "runOnceForEachItem",
"jsCode": "// 1. \u0443\u0431\u0440\u0430\u0442\u044c ```json \u2026 ```\nlet raw = $json.output ?? '';\nraw = raw.replace(/^```json\\s*/i,'').replace(/```$/i,'').trim();\n\n// 2. parse\nlet obj;\ntry { obj = JSON.parse(raw); } catch { obj = {}; }\n\n// 3. meta \u0438\u0437 Merge\nconst {\n batch_id,\n idx_in_batch,\n type = 'web',\n url,\n expected_map // \u2190 \u0441\u0447\u0451\u0442\u0447\u0438\u043a\u0438 \u043e\u0442 Split & Tag\n} = $json;\n\n// 4. \u0440\u0435\u0437\u0443\u043b\u044c\u0442\u0430\u0442\nreturn {\n json: {\n batch_id,\n idx_in_batch,\n type,\n url: obj.url ?? url,\n language: obj.language ?? null,\n title: obj.title ?? null,\n text: obj.text ?? '',\n expected_map, // \u2190 \u043f\u0440\u043e\u043a\u0438\u0434\u044b\u0432\u0430\u0435\u043c \u0431\u0435\u0437 \u0438\u0437\u043c\u0435\u043d\u0435\u043d\u0438\u0439\n retrieved_at: new Date().toISOString(),\n error: (obj.text ?? '').startsWith('ERROR')\n }\n};\n"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [
2820,
-160
],
"id": "a4c2fb78-2322-4e6f-95ae-1347ef63917a",
"name": "Scrape Parser"
},
{
"parameters": {
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"id": "641b9fae-4349-4893-a217-3688c2133df8",
"leftValue": "={{ $json.error }}",
"rightValue": true,
"operator": {
"type": "boolean",
"operation": "equals"
}
}
],
"combinator": "and"
},
"options": {}
},
"type": "n8n-nodes-base.if",
"typeVersion": 2.2,
"position": [
3000,
-160
],
"id": "f93be6a9-9997-46fc-a469-d14b135ed1a2",
"name": "Error check"
},
{
"parameters": {
"chatId": "={{ $('Telegram Trigger').item.json.message.from.id }}",
"text": "Scrape error",
"additionalFields": {
"appendAttribution": false
}
},
"type": "n8n-nodes-base.telegram",
"typeVersion": 1.2,
"position": [
3200,
-340
],
"id": "127e87cb-6ae6-4d59-8be5-ebef772cdb8e",
"name": "Telegram",
"credentials": {
"telegramApi": {
"name": "<your credential>"
}
}
},
{
"parameters": {
"content": "## SCRAPING (PAID, API FIRECRAWL.DEV)\n",
"height": 520,
"width": 1440,
"color": 7
},
"type": "n8n-nodes-base.stickyNote",
"typeVersion": 1,
"position": [
2040,
-360
],
"id": "40aa9240-4446-4525-ac72-b72eb4ed833c",
"name": "Sticky Note2"
},
{
"parameters": {
"promptType": "define",
"text": "={{ $json.url }}\nor\n{{ $json.chatInput }}",
"hasOutputParser": true,
"options": {
"systemMessage": "You will receive URL/URL's in the user prompt.\nProcess it with scrape tool once for each url and return a single JSON object\u2014no arrays, no extra text. ONLY ON\nUSE EXACTLY ONE SCRAPETOOL CALL PER URL \u2014 NO MORE, NO LESS. \nTHE NUMBER OF SCRAPE CALLS MUST MATCH THE NUMBER OF URLS. \nDO NOT POST-PROCESS SCRAPE RESULTS WITH THE LANGUAGE MODEL \u2014 JUST COLLECT AND RETURN THEM AS RAW JSON.\nIF YOU RECEIVE TEXT FROM URL - DID NOT SEND IT URL AGAIN TO SCRAPE!!!!!\nSteps\n1. Run `scrapetool` with:\n {\n \"url\": <URL>,\n \"format\": \"markdown\",\n \"cleaned\": true,\n \"renderJs\": true\n }\n\n2. If scraping fails \u2192 return:\n {\n \"url\": <URL>,\n \"language\": null,\n \"title\": null,\n \"text\": \"ERROR: Failed to scrape URL\"\n }\n\n3. If scraping succeeds \u2192\n \u2022 Extract the best possible page title (prefer `result.metadata.title`, otherwise use the first <title>) \u2192 `title`.\n \u2022 Take the Markdown body, remove all boilerplate and irrelevant content:\n - navigation menus, cookie banners, polls, \u201cEdit this page\u201d links, \u201cWas this page helpful?\u201d, mailto: links, user feedback forms, share buttons, site footers, and similar UI elements.\n \u2022 Do NOT translate the title or the text\u2014return them in the original language.\n \u2022 Detect the language of the cleaned `text` (ISO 639-1, e.g., \"en\", \"ru\") and return as `language`.\n \u2022 If the cleaned body is empty \u2192 return:\n \"text\": \"INFO: No relevant content found\"\n \u2022 Normalize the output text: remove meaningless characters like repeated dashes or empty lines, use standard `\\n`, `\\n\\n` for paragraph formatting.\n\nDo not delete url in text if it part of a text (and it is important for the meaning)\n\nReturn exactly that JSON object and nothing else.\n"
}
},
"type": "@n8n/n8n-nodes-langchain.agent",
"typeVersion": 1.8,
"position": [
2280,
-140
],
"id": "fb039661-5af7-4b14-aa00-d242c165ae6d",
"name": "Ai Scraping",
"alwaysOutputData": false
},
{
"parameters": {
"content": "## LOCAL EXTRACTOR TXT/PDF/PDF+IMG",
"height": 420,
"width": 1440,
"color": 7
},
"type": "n8n-nodes-base.stickyNote",
"typeVersion": 1,
"position": [
2040,
240
],
"id": "7f6a0213-f157-4e6b-a887-47fcee0d7309",
"name": "Sticky Note3"
},
{
"parameters": {
"content": "# \u041c\u041e\u0414\u0423\u041b\u042c \u0418\u0417\u0412\u041b\u0415\u0427\u0415\u041d\u0418\u042f \u041a\u041e\u041d\u0422\u0415\u041d\u0422\u0410\n",
"height": 1600,
"width": 1620,
"color": 4
},
"type": "n8n-nodes-base.stickyNote",
"typeVersion": 1,
"position": [
1940,
-900
],
"id": "7266d7a1-524a-40ca-b5a7-0e7d111a3452",
"name": "Sticky Note4"
},
{
"parameters": {
"jsCode": "// 0. \u0441\u043e\u0431\u0438\u0440\u0430\u0435\u043c \u0432\u0445\u043e\u0434\nconst rows = [];\nfor (const el of $input.all()) {\n const j = el.json;\n if (Array.isArray(j.items)) rows.push(...j.items);\n else rows.push(j);\n}\nif (!rows.length) return []; // \u043d\u0435\u0447\u0435\u0433\u043e \u043e\u0431\u0440\u0430\u0431\u0430\u0442\u044b\u0432\u0430\u0442\u044c\n\n// 1. \u0441\u0447\u0438\u0442\u0430\u0435\u043c, \u0441\u043a\u043e\u043b\u044c\u043a\u043e \u043e\u0431\u044a\u0435\u043a\u0442\u043e\u0432 \u043a\u0430\u0436\u0434\u043e\u0433\u043e \u0442\u0438\u043f\u0430 \u0432\u0441\u0442\u0440\u0435\u0442\u0438\u043b\u043e\u0441\u044c\nconst expectedMap = {};\nrows.forEach(r => expectedMap[r.type] = (expectedMap[r.type] || 0) + 1);\n\nconst batchId = $execution.id;\nconst fallback = (id, i) => `file_${id || i}.bin`;\n\n// 2. \u0444\u043e\u0440\u043c\u0438\u0440\u0443\u0435\u043c \u0432\u044b\u0445\u043e\u0434\u043d\u044b\u0435 \u044d\u043b\u0435\u043c\u0435\u043d\u0442\u044b\nreturn rows.map((r, i) => {\n const item = { ...r };\n\n if (item.type === 'file') {\n item.file_name = item.file_name || fallback(item.file_id, i);\n }\n\n return {\n json: {\n batch_id: batchId,\n idx_in_batch: i,\n expected_map: expectedMap, // \u2190 { web:3, video:2, file:1 }\n ...item // type, url / file_id / file_name \u2026\n }\n };\n});\n"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [
1560,
-60
],
"id": "ee6c898d-8509-4c1c-abac-2d7f6085a4fc",
"name": "Split & Tag"
},
{
"parameters": {
"aggregate": "aggregateAllItemData",
"destinationFieldName": "items",
"options": {}
},
"type": "n8n-nodes-base.aggregate",
"typeVersion": 1,
"position": [
3960,
-140
],
"id": "bc85b801-7a3f-421b-87d1-d85ef6d4ba3a",
"name": "Aggregate"
},
{
"parameters": {
"jsCode": "const items = $json.items || [];\nif (!items.length) return [];\n\n// \u0441\u043a\u043e\u043b\u044c\u043a\u043e \u0434\u043e\u043b\u0436\u043d\u043e \u0431\u044b\u0442\u044c (\u0431\u0435\u0440\u0451\u043c \u0438\u0437 \u043f\u0435\u0440\u0432\u043e\u0433\u043e \u044d\u043b\u0435\u043c\u0435\u043d\u0442\u0430 \u043f\u0430\u0440\u0442\u0438\u0438)\nconst need = items[0].expected_map || {};\n\n// \u0441\u043a\u043e\u043b\u044c\u043a\u043e \u0444\u0430\u043a\u0442\u0438\u0447\u0435\u0441\u043a\u0438 \u043f\u0440\u0438\u0448\u043b\u043e\nconst got = {};\nitems.forEach(it => got[it.type] = (got[it.type] || 0) + 1);\n\n// \u0433\u043e\u0442\u043e\u0432\u044b \u043b\u0438 \u0432\u0441\u0435 \u0442\u0438\u043f\u044b \u0432 \u043d\u0443\u0436\u043d\u043e\u043c \u043a\u043e\u043b\u0438\u0447\u0435\u0441\u0442\u0432\u0435?\nconst ready = Object.keys(need)\n .every(t => (got[t] || 0) >= need[t]);\n\nif (ready) {\n // \u0432\u044b\u043f\u0443\u0441\u043a\u0430\u0435\u043c \u0432\u0441\u044e \u043f\u0430\u0440\u0442\u0438\u044e \u043f\u043e \u043e\u0434\u043d\u043e\u043c\u0443 item\n return items.map(it => ({ json: it }));\n}\n\n// \u0435\u0449\u0451 \u0436\u0434\u0451\u043c \u2014 \u043d\u0438\u0447\u0435\u0433\u043e \u043d\u0435 \u043e\u0442\u0434\u0430\u0451\u043c\nreturn [];\n\n"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [
4120,
-140
],
"id": "45e13cdc-7398-4c57-999d-4532afd9c4a0",
"name": "Gatekeeper"
},
{
"parameters": {
"includeOtherFields": true,
"options": {}
},
"type": "n8n-nodes-base.set",
"typeVersion": 3.4,
"position": [
2380,
-300
],
"id": "ebb7ccf0-2178-422f-8d36-a355ba81a838",
"name": "Keep meta scraping"
},
{
"parameters": {
"includeOtherFields": true,
"options": {}
},
"type": "n8n-nodes-base.set",
"typeVersion": 3.4,
"position": [
2480,
320
],
"id": "4ba43fb1-b51e-455d-8284-361abfeb4155",
"name": "Keep meta file"
},
{
"parameters": {
"mode": "combine",
"combineBy": "combineByPosition",
"options": {}
},
"type": "n8n-nodes-base.merge",
"typeVersion": 3.1,
"position": [
2640,
-160
],
"id": "be91d867-c5a7-456c-a4f3-44bcc3845b89",
"name": "Merge scraping"
},
{
"parameters": {
"mode": "combine",
"combineBy": "combineByPosition",
"options": {}
},
"type": "n8n-nodes-base.merge",
"typeVersion": 3.1,
"position": [
2960,
340
],
"id": "48fa5fe4-fc76-4317-8d68-3a043d8a918f",
"name": "Merge file"
},
{
"parameters": {
"jsCode": "// ---------- 0) \u0423\u043d\u0438\u0432\u0435\u0440\u0441\u0430\u043b\u044c\u043d\u0430\u044f \u0440\u0430\u0441\u043f\u0430\u043a\u043e\u0432\u043a\u0430 \u0432\u0445\u043e\u0434\u0430 ----------\nfunction unwrapText(input) {\n let cur = input;\n for (let i = 0; i < 3; i++) { // \u0441\u043d\u0438\u043c\u0430\u0435\u043c \u0432\u043b\u043e\u0436\u0435\u043d\u043d\u044b\u0435 JSON-\u043e\u0431\u0451\u0440\u0442\u043a\u0438\n if (typeof cur === 'string') {\n try { cur = JSON.parse(cur); continue; } catch { break; }\n }\n break;\n }\n if (Array.isArray(cur)) {\n if (cur[0] && typeof cur[0].text === 'string') return cur[0].text;\n if (typeof cur[0] === 'string') return cur[0];\n }\n if (cur && typeof cur === 'object' && typeof cur.text === 'string') return cur.text;\n return String(cur ?? '');\n}\n\n// ---------- 1) \u0411\u0435\u0440\u0451\u043c \u0441\u044b\u0440\u043e\u0439 \u0442\u0435\u043a\u0441\u0442 ----------\nconst rawInner = unwrapText($json.text).replace(/\\r/g, '');\n\n// ---------- 2) \u0420\u0435\u0436\u0435\u043c \u043f\u043e \u043c\u0430\u0440\u043a\u0435\u0440\u0430\u043c \u0438 \u0441\u043e\u0431\u0438\u0440\u0430\u0435\u043c \u0441\u043e\u0434\u0435\u0440\u0436\u0438\u043c\u043e\u0435 \u043c\u0435\u0436\u0434\u0443 \u043d\u0438\u043c\u0438 ----------\nconst re = /--- (?:Page \\d+|OCR Text from Image on Page \\d+) ---\\n?([\\s\\S]*?)(?=--- (?:Page \\d+|OCR Text from Image on Page \\d+) ---|$)/g;\nlet m, chunks = [];\nwhile ((m = re.exec(rawInner)) !== null) {\n const t = m[1].trim();\n if (t) chunks.push(t);\n}\n\n// ---------- 3) \u041d\u043e\u0440\u043c\u0430\u043b\u0438\u0437\u0443\u0435\u043c \u0434\u043e \u0430\u0431\u0437\u0430\u0446\u0435\u0432 ----------\nlet text = chunks.join('\\n')\n .replace(/[ \\t]+$/gm, '') // \u0445\u0432\u043e\u0441\u0442\u043e\u0432\u044b\u0435 \u043f\u0440\u043e\u0431\u0435\u043b\u044b\n .replace(/\\n{3,}/g, '\\n\\n'); // \u043c\u0430\u043a\u0441\u0438\u043c\u0443\u043c 
\u0434\u0432\u043e\u0439\u043d\u043e\u0439 \u043f\u0443\u0441\u0442\u043e\u0439\n\nconst paras = text\n .split(/\\n{2,}/)\n .map(p => p\n .replace(/-\\n(?=\\p{L})/gu, '') // \u0441\u043d\u044f\u0442\u044c \u043f\u0435\u0440\u0435\u043d\u043e\u0441\u044b \u0441\u043b\u043e\u0432\n .replace(/\\n+/g, ' ') // \u043e\u0434\u0438\u043d\u043e\u0447\u043d\u044b\u0435 \u043f\u0435\u0440\u0435\u0432\u043e\u0434\u044b -> \u043f\u0440\u043e\u0431\u0435\u043b\n .replace(/[ \\t]+/g, ' ')\n .trim()\n )\n .filter(Boolean);\n\n// ---------- 4) \u0412\u043e\u0437\u0432\u0440\u0430\u0442 \u0432 \u0444\u043e\u0440\u043c\u0430\u0442\u0435 n8n ----------\nreturn [{ json: { text: paras.join('\\n\\n') } }];\n"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [
2660,
480
],
"id": "dcc5d292-e428-4ed2-af1b-836a9793af15",
"name": "Text Structured"
},
{
"parameters": {
"mode": "combine",
"combineBy": "combineByPosition",
"options": {}
},
"type": "n8n-nodes-base.merge",
"typeVersion": 3.1,
"position": [
3160,
-740
],
"id": "b6f0875e-31c4-4551-a038-e1eb17dd594f",
"name": "Merge transcribe"
},
{
"parameters": {
"includeOtherFields": true,
"options": {}
},
"type": "n8n-nodes-base.set",
"typeVersion": 3.4,
"position": [
2220,
-760
],
"id": "de142345-5ce8-4f89-bb58-6936de721ac0",
"name": "Keep meta transcript"
},
{
"parameters": {
"jsCode": "const items = $input.all();\n\nreturn items.map(({ json }) => {\n const {\n batch_id,\n idx_in_batch,\n type = 'video',\n expected_map,\n url,\n data = ''\n } = json;\n\n // 1. \u0420\u0430\u0437\u0431\u043e\u0440 SRT \u2014 \u0443\u0431\u0438\u0440\u0430\u0435\u043c \u043d\u043e\u043c\u0435\u0440 \u0441\u0442\u0440\u043e\u043a\u0438 \u0438 \u0442\u0430\u0439\u043c\u043a\u043e\u0434\u044b\n const lines = data\n .split(/\\n{2,}/)\n .map(block => block.split('\\n').slice(2)) // \u0443\u0431\u0438\u0440\u0430\u0435\u043c \u043d\u043e\u043c\u0435\u0440 \u0438 \u0442\u0430\u0439\u043c\u043a\u043e\u0434\n .flat()\n .filter(Boolean);\n\n const cleanText = lines.join('\\n').trim();\n\n // 2. \u042f\u0437\u044b\u043a\n const lang = /[\u0430-\u044f\u0451]/i.test(cleanText) ? 'ru' : 'en';\n\n // 3. \u0412\u043e\u0437\u0432\u0440\u0430\u0442\n return {\n json: {\n batch_id,\n idx_in_batch,\n type,\n url,\n language: lang,\n text: cleanText,\n expected_map,\n retrieved_at: new Date().toISOString(),\n error: !cleanText\n }\n };\n});\n\n"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [
3340,
-740
],
"id": "646b141c-86a6-4e88-b3de-f0c2f3fda31a",
"name": "Video parser"
},
{
"parameters": {
"executeOnce": false,
"command": "=rm \"{{ $('Set Names').item.json.dynamicAudioPath }}\""
},
"type": "n8n-nodes-base.executeCommand",
"typeVersion": 1,
"position": [
3160,
-600
],
"id": "f1ec86ea-e0ec-49a2-a60e-55894c16e176",
"name": "Temp delete"
},
{
"parameters": {
"method": "POST",
"url": "http://flask-app:5000/process_pdf",
"sendBody": true,
"contentType": "multipart-form-data",
"bodyParameters": {
"parameters": [
{
"parameterType": "formBinaryData",
"name": "pdf",
"inputDataFieldName": "data"
}
]
},
"options": {}
},
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [
2480,
480
],
"id": "f6b6d595-30bf-4527-9311-35f3748c5865",
"name": "Extractor"
},
{
"parameters": {
"mode": "runOnceForEachItem",
"jsCode": "// \u0431\u0430\u0437\u043e\u0432\u044b\u0435 \u043f\u043e\u043b\u044f \u0438\u0437 Merge-\u0443\u0437\u043b\u0430\nconst {\n batch_id,\n idx_in_batch,\n type = 'file',\n expected_map, // \u2190 \u0441\u0447\u0451\u0442\u0447\u0438\u043a\u0438 \u043f\u0440\u0438\u0448\u043b\u0438 \u0438\u0437 Split & Tag\n file_id,\n file_name,\n text\n} = $json;\n\n/* \u043e\u0447\u0435\u043d\u044c \u0433\u0440\u0443\u0431\u043e\u0435 \u043e\u043f\u0440\u0435\u0434\u0435\u043b\u0435\u043d\u0438\u0435 \u044f\u0437\u044b\u043a\u0430 */\nconst lang = /[\u0430-\u044f\u0451]/i.test(text) ? 'ru' : 'en';\n\nreturn {\n json: {\n batch_id,\n idx_in_batch,\n type, // \"file\"\n file_name, // \u0441\u043e\u0445\u0440\u0430\u043d\u044f\u0435\u043c\n language: lang,\n text,\n expected_map, // \u2190 \u043f\u0440\u043e\u043a\u0438\u0434\u044b\u0432\u0430\u0435\u043c \u043a\u0430\u043a \u0435\u0441\u0442\u044c\n retrieved_at: new Date().toISOString(),\n error: !text || text.startsWith('ERROR')\n }\n};\n"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [
3200,
340
],
"id": "87d67225-5fb1-4c33-bc6e-74ada56bbc89",
"name": "File Parser"
},
{
"parameters": {
"content": "## \u041c\u041e\u0414\u0423\u041b\u042c \u041f\u041e\u0414\u0413\u041e\u0422\u041e\u0412\u041a\u0418 \u0418\u041d\u0424\u041e\u0420\u041c\u0410\u0426\u0418\u0418 \u0418 \u042d\u041c\u0411\u0415\u0414\u0414\u0418\u041d\u0413\u0410",
"height": 480,
"width": 1180,
"color": 4
},
"type": "n8n-nodes-base.stickyNote",
"typeVersion": 1,
"position": [
4300,
-320
],
"id": "05f6f8d8-512c-4328-b0e9-467d84301c4e",
"name": "Sticky Note5"
},
{
"parameters": {
"numberInputs": 3
},
"type": "n8n-nodes-base.merge",
"typeVersion": 3.1,
"position": [
3780,
-140
],
"id": "7ec25deb-9d0a-4cb7-8e86-aab542cfb919",
"name": "Merge all"
},
{
"parameters": {
"jsCode": "// --- \u041d\u0410\u0427\u0410\u041b\u041e \u041a\u041e\u0414\u0410 \u0414\u041b\u042f \u0423\u0417\u041b\u0410 n8n Clean & Format ---\n\n// \u0421\u044e\u0434\u0430 \u0434\u043e\u0431\u0430\u0432\u043b\u044f\u0439\u0442\u0435 \u0432\u0430\u0448\u0438 \u0440\u0435\u0433\u0443\u043b\u044f\u0440\u043d\u044b\u0435 \u0432\u044b\u0440\u0430\u0436\u0435\u043d\u0438\u044f \u0434\u043b\u044f \u0443\u0434\u0430\u043b\u0435\u043d\u0438\u044f \"\u043c\u0443\u0441\u043e\u0440\u0430\".\n// \u041e\u043d\u0438 \u0434\u043e\u043b\u0436\u043d\u044b \u0431\u044b\u0442\u044c \u0434\u043e\u0441\u0442\u0430\u0442\u043e\u0447\u043d\u043e \u043e\u0431\u0449\u0438\u043c\u0438, \u0447\u0442\u043e\u0431\u044b \u0440\u0430\u0431\u043e\u0442\u0430\u0442\u044c \u0434\u043b\u044f \u0440\u0430\u0437\u043d\u044b\u0445 \u0434\u043e\u043a\u0443\u043c\u0435\u043d\u0442\u043e\u0432.\n// \u041f\u0440\u0438\u043c\u0435\u0440 \u0438\u0437 \u0432\u0430\u0448\u0435\u0433\u043e \u0434\u043e\u043a\u0443\u043c\u0435\u043d\u0442\u0430, \u043d\u043e \u043b\u0443\u0447\u0448\u0435 \u043e\u0431\u043e\u0431\u0449\u0430\u0442\u044c \u0438\u043b\u0438 \u0438\u043c\u0435\u0442\u044c \u0441\u043f\u0438\u0441\u043e\u043a \u0442\u0430\u043a\u0438\u0445 \u043f\u0430\u0442\u0442\u0435\u0440\u043d\u043e\u0432.\nconst BOILERPLATE_PATTERNS = [\n /^back to top$/i,\n /^was this page helpful\\?$/i,\n /^thanks? 
for your feedback$/i,\n // \u0414\u043e\u0431\u0430\u0432\u044c\u0442\u0435 \u0441\u044e\u0434\u0430 \u0434\u0440\u0443\u0433\u0438\u0435 \u043f\u0430\u0442\u0442\u0435\u0440\u043d\u044b, \u043d\u0430\u043f\u0440\u0438\u043c\u0435\u0440:\n // /^\u0421\u043a\u0430\u0447\u0430\u043d\u043e \u0441 \u0441\u0430\u0439\u0442\u0430 - SuperSliv\\.biz - \u041f\u0440\u0438\u0441\u043e\u0435\u0434\u0438\u043d\u044f\u0439\u0441\u044f!$/gim,\n // /^SuperSliv\\.biz - \u0422\u0432\u043e\u0439 \u043b\u0443\u0447\u0448\u0438\u0439 \u0412\u042b\u0411\u041e\u0420!$/gim,\n // /\\^ \u0423\u0411\u041e\u0419\u041d\u042b\u0415 Oe/g, // \u0415\u0441\u043b\u0438 \u044d\u0442\u043e \u043f\u043e\u0441\u0442\u043e\u044f\u043d\u043d\u044b\u0439 \u0430\u0440\u0442\u0435\u0444\u0430\u043a\u0442\n];\n\nfunction cleanInline(str) {\n // \u0412\u0430\u0448\u0430 \u0441\u0443\u0449\u0435\u0441\u0442\u0432\u0443\u044e\u0449\u0430\u044f \u0444\u0443\u043d\u043a\u0446\u0438\u044f \u0434\u043b\u044f \u043e\u0447\u0438\u0441\u0442\u043a\u0438 inline-\u0440\u0430\u0437\u043c\u0435\u0442\u043a\u0438\n return str\n .replace(/\\[([^\\\\]]+?)\\]\\((https?:\\/\\/[^\\\\)]+?)\\)/gi, '$1 ($2)')\n .replace(/\\[([^\\\\]]+?)\\]/g, '$1')\n .replace(/(\\*\\*|__)(.*?)\\1/g, '$2')\n .replace(/(\\*|_)(.*?)\\1/g, '$2')\n .replace(/`([^`]+?)`/g, '$1');\n}\n\nfunction needsParagraphBreak(prevLineTrimmed, currentLineTrimmed) {\n // \u0412\u0430\u0448\u0430 \u0441\u0443\u0449\u0435\u0441\u0442\u0432\u0443\u044e\u0449\u0430\u044f \u0444\u0443\u043d\u043a\u0446\u0438\u044f \u0434\u043b\u044f \u043e\u043f\u0440\u0435\u0434\u0435\u043b\u0435\u043d\u0438\u044f \u043d\u0435\u043e\u0431\u0445\u043e\u0434\u0438\u043c\u043e\u0441\u0442\u0438 \u0440\u0430\u0437\u0440\u044b\u0432\u0430 \u0430\u0431\u0437\u0430\u0446\u0430\n if (!prevLineTrimmed) return false;\n const lastCharPrev = prevLineTrimmed.slice(-1);\n const firstCharCurr = currentLineTrimmed.charAt(0);\n // \u041f\u0440\u043e\u0432\u0435\u0440\u044f\u0435\u043c, 
whether the previous line ends with sentence-ending punctuation\n // and whether the current line starts with a capital letter (a simple heuristic)\n return '.!?:\uff1a'.includes(lastCharPrev) && firstCharCurr === firstCharCurr.toUpperCase() && /[A-Z\u0410-\u042f\u0401]/.test(firstCharCurr);\n}\n\nfunction cleanText(raw = '') {\n raw = raw.replace(/\\r\\n?/g, '\\n'); // Normalize line endings\n const lines = raw.split('\\n');\n\n const paras = []; // Holds the finished paragraphs\n let buf = ''; // Buffer that accumulates the current paragraph\n\n const flush = () => {\n const trimmedBuf = buf.trim();\n if (trimmedBuf) { // Push to paras only if the buffer is non-empty after trim\n paras.push(trimmedBuf);\n }\n buf = ''; // Clear the 
buffer\n };\n\n let previousLineTrimmed = ''; // Keep the previously processed line for needsParagraphBreak\n\n for (let i = 0; i < lines.length; i++) {\n let currentLineRawTrimmed = lines[i].trim(); // Current line with surrounding whitespace removed\n\n // 1. Skip empty lines and flush the buffer (start of a new paragraph)\n if (!currentLineRawTrimmed) {\n flush();\n previousLineTrimmed = ''; // Reset the previous line because there was a break\n continue;\n }\n\n // 2. 
Skip \"junk\" lines\n if (BOILERPLATE_PATTERNS.some(rx => rx.test(currentLineRawTrimmed)) || currentLineRawTrimmed.startsWith('![')) {\n flush(); // Flush the buffer before skipping the junk line\n previousLineTrimmed = ''; // Reset the previous line\n continue;\n }\n\n const markerRx = /^\\s*(#{1,6}|[-*\u2022]|\\d+\\.)\\s+/; // Heading and list markers\n const isMarkerLine = markerRx.test(currentLineRawTrimmed);\n let textToProcess = currentLineRawTrimmed;\n\n if (isMarkerLine) {\n // If the current line is a new marker (heading or list item)\n // and the buffer already holds something (the previous paragraph or list item), flush the buffer.\n flush();\n textToProcess = currentLineRawTrimmed.replace(markerRx, '').trim(); // Remove the marker\n 
}\n\n textToProcess = cleanInline(textToProcess); // Apply inline cleanup\n\n // 3. Paragraph-building logic\n if (!buf) { // The buffer is empty (start of a new paragraph)\n buf = textToProcess;\n } else {\n // The buffer is not empty, so decide how to add the current text:\n // as a continuation of the current paragraph or as the start of a new one.\n // needsParagraphBreak compares the line most recently added to the buffer (or its end)\n // with the line currently being processed, textToProcess.\n // Important: `previousLineTrimmed` here must be the last *effectively added* 
line.\n if (needsParagraphBreak(previousLineTrimmed, textToProcess) && !isMarkerLine) {\n // A paragraph break is needed AND the current line does not start a new list item\n // (so we do not break right after a marker when the marker itself does not require a break)\n flush();\n buf = textToProcess;\n } else {\n // Continue the current paragraph or list item, adding a space\n buf += ' ' + textToProcess;\n }\n }\n previousLineTrimmed = textToProcess.split('\\n').pop()?.trim() || ''; // Update previousLineTrimmed with the last part of the processed text\n }\n\n flush(); // Flush whatever remains in the buffer after the 
loop ends\n\n return paras.join('\\n\\n'); // Join all paragraphs with a double line break\n}\n\n// \u2500\u2500 n8n mapping (your existing mapping logic) \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nreturn items.map(({ json }) => {\n const {\n idx_in_batch,\n type = 'web',\n text = '',\n url,\n file_name,\n title,\n language = null,\n retrieved_at = new Date().toISOString()\n } = json;\n\n const cleaned = cleanText(text); // Apply the updated cleanup function\n\n const out = {\n type,\n text: cleaned,\n language,\n retrieved_at,\n };\n\n // Keep the remaining fields, as in your original code\n if (idx_in_batch !== undefined) out.idx_in_batch = idx_in_batch;\n if (type === 'file' && file_name) out.file_name = file_name;\n if (type === 'web' && title) out.title = title;\n if ((type === 'web' || type === 'video') && url) out.url = url;\n\n return { json: out };\n});\n// --- END OF CODE FOR THE n8n Clean & Format NODE ---\n"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [
4340,
-140
],
"id": "d33fd7ce-0964-4a07-95f3-4430a7efe15e",
"name": "Clean & Format"
},
{
"parameters": {
"jsCode": "// Read the data from the input\nconst results = Array.isArray($json.results) ? $json.results[0] : $json.results;\n\n// Metadata from Clean & Format\nconst meta = $('Clean & Format').first().json;\n\nconst sourceType = meta.type || '';\nconst title = meta.title || '';\nconst url = meta.url || '';\nconst fileName = meta.file_name || '';\nconst language = meta.language || '';\nconst retrievedAt = meta.retrieved_at || new Date().toISOString();\n\n// Build the objects to send to the database (guarding against a missing results object)\nconst items = (results?.texts || []).map((text, i) => ({\n type: sourceType,\n language: language,\n retrieved_at: retrievedAt,\n paragraph_idx: i,\n paragraph_text: text || '',\n embedding: results.embeddings?.[i] || null,\n title: title,\n url: url,\n file_name: fileName\n}));\n\nreturn { json: { items } };\n\n"
},
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [
5340,
-180
],
"id": "e3ab0a5a-e67d-4ca4-9b72-16634dbe2927",
"name": "Results"
},
{
"parameters": {
"method": "POST",
"url": "https://uhfliwtnkedtzbepfshw.supabase.co/rest/v1/documents_paragraphs",
"authentication": "predefinedCredentialType",
"nodeCredentialType": "supabaseApi",
"sendBody": true,
"specifyBody": "json",
"jsonBody": "={{ $json.items }}",
"options": {}
},
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [
5740,
-180
],
"id": "3f27d20f-c61b-4058-bbcf-1907183c02b5",
"name": "PUSH TO DB",
"alwaysOutputData": true,
"credentials": {
"supabaseApi": {
"name": "<your credential>"
}
}
},
{
"parameters": {
"content": "### COLLECTING INFORMATION AND WAITING FOR ALL DOCUMENTS",
"height": 220,
"width": 540,
"color": 5
},
"type": "n8n-nodes-base.stickyNote",
"typeVersion": 1,
"position": [
3720,
-200
],
"id": "f44ae631-c193-4a3e-bb3f-741bfdc420e7",
"name": "Sticky Note6"
},
{
"parameters": {
"chatId": "={{$json[\"message\"][\"chat\"][\"id\"]}}",
"text": "Operating mode:",
"replyMarkup": "inlineKeyboard",
"inlineKeyboard": {
"rows": [
{
"row": {
"buttons": [
{
"text": "Chat",
"additionalFields": {
"callback_data": "chat"
}
},
{
"text": "Base",
"additionalFields": {
"callback_data": "documents"
}
},
{
"text": "(Clean chat)",
"additionalFields": {
"callback_data": "clean_chat"
}
},
{
"text": "(Clean base)",
"additionalFields": {
"callback_data": "clean_base"
}
}
]
}
}
]
},
"additionalFields": {
"appendAttribution": false
}
},
"type": "n8n-nodes-base.telegram",
"typeVersion": 1.2,
"position": [
1060,
440
],
"id": "fa4bb1b2-3ef1-4d44-b4ff-634cf47fccde",
"name": "Mode menu",
"credentials": {
"telegramApi": {
"name": "<your credential>"
}
}
},
{
"parameters": {
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"id": "468aee87-e709-4059-946c-1ae0f8ce972d",
"leftValue": "={{$json[\"callback_query\"]}}",
"rightValue": "",
"operator": {
"type": "object",
"operation": "notExists",
"singleValue": true
}
}
],
"combinator": "and"
},
"options": {}
},
"type": "n8n-nodes-base.if",
"typeVersion": 2.2,
"position": [
600,
520
],
"id": "33b7b193-fffd-45b5-872e-9d1bfd3b0826",
"name": "Button/Text?"
},
{
"parameters": {
"chatId": "={{ $('Telegram Trigger').item.json.message.chat.id }}",
"text": "Choose an operating mode:",
"replyMarkup": "inlineKeyboard",
"inlineKeyboard": {
"rows": [
{
"row": {
"buttons": [
{
"text": "Chat",
"additionalFields": {
"callback_data": "chat"
}
},
{
"text": "Base",
"additionalFields": {
"callback_data": "documents"
}
}
]
}
}
]
},
"additionalFields": {
"appendAttribution": false
}
},
"type": "n8n-nodes-base.telegram",
"typeVersion": 1.2,
"position": [
1660,
440
],
"id": "1405fcdb-d91a-4da3-a053-21f285229aac",
"name": "Mode menu1",
"credentials": {
"telegramApi": {
"name": "<your credential>"
}
}
},
{
"parameters": {
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"id": "c0b93697-8388-4239-88c5-c116725df95e",
"leftValue": "={{ $json.current_mode }}",
"rightValue": "",
"operator": {
"type": "string",
"operation": "empty",
"singleValue": true
}
}
],
"combinator": "and"
},
"options": {}
},
"type": "n8n-nodes-base.if",
"typeVersion": 2.2,
"position": [
1480,
460
],
"id": "9c429c44-7c3d-4afa-b0bd-d7d9bf69600a",
"name": "Mode selected?"
},
{
"parameters": {
"conditions": {
"options": {
"caseSensitive": true,
"leftValue": "",
"typeValidation": "strict",
"version": 2
},
"conditions": [
{
"id": "6acf73f7-2da5-404c-b810-3949ad71be6e",
"leftValue": "={{ $('Telegram Trigger').item.json.callback_query.data }}",
"rightValue": "chat",
"operator": {
"type": "string",
"operation": "equals",
"name": "filter.operator.equals"
}
}
],
"combinator": "and"
},
"options": {}
},
"type": "n8n-nodes-base.if",
"typeVersion": 2.2,
"position": [
1300,
840
],
"id": "abe6ae2a-caef-459e-b5ea-d3dc81a44242",
"name": "Chat/Base?"
},
{
"parameters": {
"chatId": "={{ $('Telegram Trigger').item.json.callback_query.message.chat.id }}",
"text": "=*\\\"Upload to base\\\" mode selected\\:* \n\u2022 You can send a web link, a video, or a file \n\u2022 Everything is added to the base automatically \n \n_Supported_\\: \n\u2022 Web link \u2014 any page that contains text\\. Up to 3 at a time\\. \n\u2022 Video \u2014 youtube\\.com and youtu\\.be\\. Up to 3 at a time\\. \n\u2022 Files \u2014 pdf \\/ docx \\/ txt\\, text is also extracted from images inside the files\\. 1 at a time\\.",
"additionalFields": {
"parse_mode": "MarkdownV2"
}
},
"type": "n8n-nodes-base.telegram",
"typeVersion": 1.2,
"position": [
1560,
980
],
"id": "7ec4c3f9-34b7-407b-9e8d-58eb76c13d27",
"name": "Mode Base",
"executeOnce": false,
"credentials": {
"telegramApi": {
"name": "<your credential>"
}
}
},
{
"parameters": {
"chatId": "={{ $('Telegram Trigger').item.json.callback_query.message.chat.id }}",
"text": "=*\"Chat\" mode selected:* \n\u2022 Memory limit \u2014 30 messages \n\u2022 Queries to your _RAG_ base \u2014 automated",
"additionalFields": {
"parse_mode": "MarkdownV2"
}
},
"type": "n8n-nodes-base.telegram",
"typeVersion": 1.2,
"position": [
1560,
720
],
"id": "19c4ac86-35f9-42e7-995f-1f2a97d10475",
"name": "Mode Chat",
"executeOnce": false,
"credentials": {
"telegramApi": {
"name": "<your credential>"
}
}
},
{
"parameters": {
"method": "POST",
"url": "https://uhfliwtnkedtzbepfshw.supabase.co/rest/v1/user_state",
"authentication": "predefinedCredentialType",
"nodeCredentialType": "supabaseApi",
"sendHeaders": true,
"specifyHeaders": "json",
"jsonHeaders": "{\n \"Prefer\": \"resolution=merge-duplicates\"\n}",
"sendBody": true,
"contentType": "raw",
"rawContentType": "application/json",
"body": "={\n \"user_id\": \"{{ $json.callback_query.from.id }}\",\n \"current_mode\": \"{{ $json.callback_query.data }}\",\n \"updated_at\": \"{{ $now }}\"\n}",
"options": {}
},
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [
1080,
840
],
"id": "091de72d-eb78-434f-ae32-7a7937128ba7",
"name": "Mode set",
"credentials": {
"supabaseApi": {
"name": "<your
Credentials you'll need
Each integration node will prompt for credentials when you import. We strip credential IDs before publishing — you'll add your own.
mcpClientApi, openAiApi, openRouterApi, supabaseApi, telegramApi
How this works
This RAG chatbot answers Telegram users with context-aware replies: it retrieves relevant information from connected data sources before generating each response. It suits teams and individuals who need a conversational interface for internal knowledge or customer support without building a full custom application. The core step is the retrieval-augmented generation loop, which combines OpenAI’s language model with the MCP tool so that replies are grounded in the stored documents.
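The ingestion side of that loop can be sketched in JavaScript, the language of the workflow's Code nodes. The sketch below is modeled on the "Results" node in the JSON above: it pairs each paragraph with its embedding and the source metadata to form rows for the `documents_paragraphs` table. The function name `buildRows` and the sample input are illustrative assumptions, not part of the workflow.

```javascript
// Sketch modeled on the workflow's "Results" Code node.
// `buildRows` and the sample values below are hypothetical names for illustration.
function buildRows(results, meta) {
  // Tolerate a missing results object or a missing texts array.
  const texts = results?.texts ?? [];
  return texts.map((text, i) => ({
    type: meta.type || '',
    language: meta.language || '',
    retrieved_at: meta.retrieved_at || new Date().toISOString(),
    paragraph_idx: i,
    paragraph_text: text || '',
    // A paragraph without a matching vector is stored with a null embedding.
    embedding: results.embeddings?.[i] || null,
    title: meta.title || '',
    url: meta.url || '',
    file_name: meta.file_name || '',
  }));
}

const rows = buildRows(
  { texts: ['First paragraph.', 'Second paragraph.'], embeddings: [[0.1, 0.2]] },
  { type: 'web', title: 'Example page', url: 'https://example.com', retrieved_at: '2024-01-01T00:00:00.000Z' }
);
console.log(rows.length, rows[1].embedding); // prints: 2 null
```

Keeping the row-building logic in a plain function like this makes it easy to check the paragraph-to-embedding alignment before anything is posted to the database.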
Use it when you require a persistent Telegram-based assistant that handles both text and voice messages; avoid it for one-off queries or when strict data residency rules prohibit cloud AI models. Common variations include swapping the knowledge base connector or adding approval steps before external actions.
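One piece of that persistence is the per-user mode, which the "Mode set" node writes to the `user_state` table with a `Prefer: resolution=merge-duplicates` header. A minimal sketch of that request follows, assuming `user_id` is the table's primary key (the JSON does not show the schema). `buildModeUpsert` and `baseUrl` are hypothetical names, and the authentication headers that the Supabase credential adds are left out.

```javascript
// Sketch modeled on the workflow's "Mode set" HTTP Request node.
// `buildModeUpsert` and `baseUrl` are hypothetical; auth headers are omitted on purpose.
function buildModeUpsert(baseUrl, callbackQuery, now = new Date()) {
  return {
    method: 'POST',
    url: `${baseUrl}/rest/v1/user_state`,
    headers: {
      'Content-Type': 'application/json',
      // Ask PostgREST to merge into an existing row instead of rejecting a duplicate key.
      Prefer: 'resolution=merge-duplicates',
    },
    body: JSON.stringify({
      user_id: String(callbackQuery.from.id),
      current_mode: callbackQuery.data,
      updated_at: now.toISOString(),
    }),
  };
}

const req = buildModeUpsert(
  'https://example.supabase.co',
  { from: { id: 42 }, data: 'chat' },
  new Date('2024-01-01T00:00:00Z')
);
console.log(req.url); // prints: https://example.supabase.co/rest/v1/user_state
```

Because the request is an upsert, pressing a mode button repeatedly should update the same row rather than create a new one per click, provided the key assumption above holds.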
About this workflow
RAG CHATBOT Main. Uses telegram, telegramTrigger, lmChatOpenAi, n8n-nodes-mcp. Event-driven trigger; 87 nodes.
Source: https://github.com/sibneurosnab/RAG-chatbot/blob/802b6f4b564553b499417179ef73e10b58cbc528/workflows/RAG_CHATBOT_Main.json (original creator credit).