Convert PDF Documents to AI Podcasts with Google Gemini and Text-to-speech

By Mathis (@iamthis) on n8n.io

Transform any PDF document into an engaging, natural-sounding podcast using Google's Gemini AI and advanced Text-to-Speech technology. This automated workflow extracts text content, generates conversational scripts, and produces high-quality audio files.

Category: AI & RAG · Trigger: Event · Nodes: 10 · Complexity: ★★★★☆ · AI nodes: yes
Integrations: Chain Llm, Google Gemini Chat, HTTP Request, Write Binary File

This workflow corresponds to n8n.io template #4883 — we link there as the canonical source.

This workflow follows the Chain Llm → HTTP Request recipe pattern: a language-model chain produces text, and an HTTP Request node sends that text to an external API.

The workflow JSON

Copy the full n8n JSON below, paste it into a new n8n workflow, add your credentials, and activate it.

{
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "nodes": [
    {
      "id": "37a9addb-a90e-4491-8b7a-a42fad57329a",
      "name": "\ud83c\udfac Start: Upload PDF File",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        -340,
        -60
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "dd475138-a5fa-4818-aeaa-27e8e0969367",
      "name": "\ud83d\udcc4 Extract Text from PDF",
      "type": "n8n-nodes-base.extractFromFile",
      "position": [
        -120,
        -60
      ],
      "parameters": {
        "options": {},
        "operation": "pdf",
        "binaryPropertyName": "data0"
      },
      "typeVersion": 1
    },
    {
      "id": "9a24fae8-ac8c-48a0-a985-69e0b4a3e0b8",
      "name": "\ud83e\udd16 Generate Podcast Script",
      "type": "@n8n/n8n-nodes-langchain.chainLlm",
      "position": [
        100,
        -160
      ],
      "parameters": {
        "text": "={{ $json.text }}",
        "batching": {},
        "messages": {
          "messageValues": [
            {
              "message": "=Create an engaging podcast script based on the following content. The script should be exactly 5-7 minutes when read aloud (approximately 750-1000 words).  CONTENT TO DISCUSS: {content}  STYLE: {style}  PODCAST SCRIPT REQUIREMENTS: 1. Start with: \"Welcome to our AI-generated podcast. Today we're diving into...\" 2. Structure: Introduction, 2-3 main discussion points, conclusion 3. Use natural conversational language 4. Include smooth transitions between topics 5. Add brief pauses with periods and commas for natural speech 6. Avoid complex punctuation that might confuse TTS 7. End with: \"Thanks for listening! This has been your AI podcast.\" 8. Write ONLY the spoken content (no stage directions) 9. Keep sentences moderate length for clear speech 10. Use engaging storytelling techniques"
            }
          ]
        },
        "promptType": "define"
      },
      "typeVersion": 1.7
    },
    {
      "id": "87891658-777b-47bf-9a86-9c23076a7477",
      "name": "Google Gemini Flash 2.0",
      "type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini",
      "position": [
        188,
        60
      ],
      "parameters": {
        "options": {},
        "modelName": "models/gemini-2.0-flash-001"
      },
      "credentials": {
        "googlePalmApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "8ebba81a-a42e-48cf-aeaa-1263fe59ed90",
      "name": "\u2699\ufe0f Prepare TTS Request",
      "type": "n8n-nodes-base.code",
      "position": [
        480,
        -60
      ],
      "parameters": {
        "jsCode": "const script = $input.first().json.text;\nconst voiceName = $input.first().json.voiceName || \"Kore\";\n\nreturn {\n  json: {\n    requestBody: {\n      contents: [{\n        parts: [{\n          text: script\n        }]\n      }],\n      generationConfig: {\n        responseModalities: [\"AUDIO\"],\n        speechConfig: {\n          voiceConfig: {\n            prebuiltVoiceConfig: {\n              voiceName: voiceName\n            }\n          }\n        }\n      }\n    }\n  }\n};"
      },
      "typeVersion": 2
    },
    {
      "id": "78e3f968-ec21-4d85-a802-688f1d33b48c",
      "name": "\ud83c\udf99\ufe0f Convert Text to Speech With Gemini",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        696,
        -60
      ],
      "parameters": {
        "url": "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-preview-tts:generateContent",
        "method": "POST",
        "options": {},
        "jsonBody": "={{ $json.requestBody }}",
        "sendBody": true,
        "sendHeaders": true,
        "specifyBody": "json",
        "authentication": "predefinedCredentialType",
        "headerParameters": {
          "parameters": [
            {
              "name": "Content-Type",
              "value": "application/json"
            }
          ]
        },
        "nodeCredentialType": "googlePalmApi"
      },
      "credentials": {
        "googlePalmApi": {
          "name": "<your credential>"
        },
        "httpCustomAuth": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "1baddfca-84c4-4ffd-beb8-5acb919112fe",
      "name": "\ud83d\udd27 Process Audio Response",
      "type": "n8n-nodes-base.code",
      "position": [
        916,
        -60
      ],
      "parameters": {
        "jsCode": "const response = $input.all()[0].json;\n\nconsole.log('=== GEMINI TTS DEBUG ===');\nconsole.log('Response status:', response.candidates?.[0]?.finishReason);\n\nif (response.candidates?.[0]?.finishReason === 'STOP') {\n  try {\n    // Get the base64 audio data\n    const base64AudioData = response.candidates[0].content.parts[0].inlineData.data;\n    const mimeType = response.candidates[0].content.parts[0].inlineData.mimeType;\n    \n    console.log('Audio data length:', base64AudioData.length);\n    console.log('Mime type:', mimeType);\n    \n    // CRITICAL FIX: Properly decode base64 to actual audio bytes\n    const actualAudioBytes = Buffer.from(base64AudioData, 'base64');\n    \n    console.log('Decoded audio size:', actualAudioBytes.length);\n    \n    // Create WAV file with proper PCM16 format\n    const sampleRate = 24000;\n    const channels = 1;\n    const bitsPerSample = 16;\n    \n    // Create WAV header\n    const wavHeader = createWavHeader(actualAudioBytes.length, sampleRate, channels, bitsPerSample);\n    const wavFile = Buffer.concat([wavHeader, actualAudioBytes]);\n    \n    const filename = `tts_fixed_${Date.now()}.wav`;\n    \n    return {\n      json: {\n        success: true,\n        filename: filename,\n        audioSize: wavFile.length,\n        audioSizeMB: (wavFile.length / 1024 / 1024).toFixed(2),\n        originalBase64Length: base64AudioData.length,\n        decodedAudioLength: actualAudioBytes.length\n      },\n      binary: {\n        audio: {\n          data: wavFile,\n          mimeType: 'audio/wav',\n          fileName: filename\n        }\n      }\n    };\n    \n  } catch (error) {\n    return {\n      json: {\n        success: false,\n        error: error.message,\n        step: 'audio_processing'\n      }\n    };\n  }\n} else {\n  return {\n    json: {\n      success: false,\n      error: 'TTS generation failed',\n      finishReason: response.candidates?.[0]?.finishReason || 'unknown',\n      response: 
response\n    }\n  };\n}\n\n// Helper function to create WAV header\nfunction createWavHeader(dataLength, sampleRate, channels, bitsPerSample) {\n  const byteRate = sampleRate * channels * bitsPerSample / 8;\n  const blockAlign = channels * bitsPerSample / 8;\n  const header = Buffer.alloc(44);\n  \n  // \"RIFF\" chunk descriptor\n  header.write('RIFF', 0);\n  header.writeUInt32LE(36 + dataLength, 4);\n  header.write('WAVE', 8);\n  \n  // \"fmt \" sub-chunk\n  header.write('fmt ', 12);\n  header.writeUInt32LE(16, 16); // Sub-chunk size\n  header.writeUInt16LE(1, 20);  // Audio format (PCM)\n  header.writeUInt16LE(channels, 22);\n  header.writeUInt32LE(sampleRate, 24);\n  header.writeUInt32LE(byteRate, 28);\n  header.writeUInt16LE(blockAlign, 32);\n  header.writeUInt16LE(bitsPerSample, 34);\n  \n  // \"data\" sub-chunk\n  header.write('data', 36);\n  header.writeUInt32LE(dataLength, 40);\n  \n  return header;\n}"
      },
      "typeVersion": 2
    },
    {
      "id": "36978e98-5d80-4859-868f-5f8e6ddcb569",
      "name": "\ud83d\udcbe Save Podcast Audio",
      "type": "n8n-nodes-base.writeBinaryFile",
      "position": [
        1136,
        -60
      ],
      "parameters": {
        "options": {},
        "fileName": "={{ $json.filename }}",
        "dataPropertyName": "audio"
      },
      "typeVersion": 1
    },
    {
      "id": "0deff03c-60f7-4fd0-84e2-96527caba405",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -360,
        -680
      ],
      "parameters": {
        "width": 320,
        "height": 400,
        "content": "## \ud83c\udfaf Main Template Overview\nWhat this workflow does:\n- Extracts text from uploaded PDF files\n- Generates conversational podcast script using AI\n- Converts script to natural-sounding audio\n- Saves final podcast as WAV file\n\nSetup required:\n- Google Gemini API key (for AI script generation)\n- Google PaLM API credentials (for TTS)\n- File upload capability"
      },
      "typeVersion": 1
    },
    {
      "id": "b867911c-8add-450c-89d8-906cace883ad",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -20,
        -680
      ],
      "parameters": {
        "color": 5,
        "width": 300,
        "height": 400,
        "content": "## \ud83d\udd27 Setup Instructions\nBefore You Start:\n\nGet API Keys:\n- Gemini API: https://aistudio.google.com/\n- Add to n8n credentials as \"Google Gemini(PaLM) Api account\"\n\n\nTest the workflow:\n- Use the manual trigger\n- Upload a PDF file\n- Check that audio file is generated"
      },
      "typeVersion": 1
    }
  ],
  "connections": {
    "Google Gemini Flash 2.0": {
      "ai_languageModel": [
        [
          {
            "node": "\ud83e\udd16 Generate Podcast Script",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    },
    "\u2699\ufe0f Prepare TTS Request": {
      "main": [
        [
          {
            "node": "\ud83c\udf99\ufe0f Convert Text to Speech With Gemini",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "\ud83d\udcc4 Extract Text from PDF": {
      "main": [
        [
          {
            "node": "\ud83e\udd16 Generate Podcast Script",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "\ud83c\udfac Start: Upload PDF File": {
      "main": [
        [
          {
            "node": "\ud83d\udcc4 Extract Text from PDF",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "\ud83d\udd27 Process Audio Response": {
      "main": [
        [
          {
            "node": "\ud83d\udcbe Save Podcast Audio",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "\ud83e\udd16 Generate Podcast Script": {
      "main": [
        [
          {
            "node": "\u2699\ufe0f Prepare TTS Request",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "\ud83c\udf99\ufe0f Convert Text to Speech With Gemini": {
      "main": [
        [
          {
            "node": "\ud83d\udd27 Process Audio Response",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
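
The Prepare TTS Request node in the JSON above builds the body that is posted to the generateContent endpoint. As a standalone illustration, the sketch below reproduces that body shape in plain JavaScript outside n8n. The function name `buildTtsRequestBody` and the empty-script check are assumptions added for illustration and are not part of the workflow; the field names and the "Kore" fallback voice are taken from the node's code.

```javascript
// Hypothetical helper (not part of the workflow): mirrors the object
// that the "Prepare TTS Request" Code node returns as requestBody.
function buildTtsRequestBody(script, voiceName) {
  // Assumption: reject empty scripts early instead of sending them to the API.
  if (typeof script !== "string" || script.trim() === "") {
    throw new Error("script must be a non-empty string");
  }
  return {
    contents: [{ parts: [{ text: script }] }],
    generationConfig: {
      // Ask the model for audio output only.
      responseModalities: ["AUDIO"],
      speechConfig: {
        voiceConfig: {
          // Same fallback as the node: use "Kore" when no voice is given.
          prebuiltVoiceConfig: { voiceName: voiceName || "Kore" },
        },
      },
    },
  };
}

const exampleBody = buildTtsRequestBody("Welcome to our AI-generated podcast.");
console.log(JSON.stringify(exampleBody, null, 2));
```

Keeping the body construction in its own step, as the workflow does, lets the HTTP Request node stay a plain pass-through of `$json.requestBody`.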

Credentials you'll need

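Each integration node prompts for credentials when you import the workflow. Credential IDs are stripped before publishing, so you add your own. In this workflow, the Google Gemini Flash 2.0 node and the Convert Text to Speech With Gemini node both reference a googlePalmApi credential, and the latter also lists an httpCustomAuth entry.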
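
To see which credentials an exported workflow expects before importing it, you can scan each node's `credentials` object. The sketch below is a minimal illustration, assuming the export has the same `nodes[].credentials` layout as the JSON on this page. The function name `listRequiredCredentials` is hypothetical, and the sample object is a trimmed stand-in with node names shortened (emoji dropped), not the full workflow.

```javascript
// Hypothetical helper: collect (node name, credential type) pairs
// from an n8n workflow export.
function listRequiredCredentials(workflow) {
  const required = [];
  for (const node of workflow.nodes ?? []) {
    // Nodes without a "credentials" key need no credential setup.
    for (const credentialType of Object.keys(node.credentials ?? {})) {
      required.push({ node: node.name, credentialType });
    }
  }
  return required;
}

// Trimmed sample mirroring the credential entries in the workflow JSON.
const sampleWorkflow = {
  nodes: [
    {
      name: "Google Gemini Flash 2.0",
      credentials: { googlePalmApi: { name: "<your credential>" } },
    },
    {
      name: "Convert Text to Speech With Gemini",
      credentials: {
        googlePalmApi: { name: "<your credential>" },
        httpCustomAuth: { name: "<your credential>" },
      },
    },
    { name: "Save Podcast Audio" },
  ],
};

console.log(listRequiredCredentials(sampleWorkflow));
```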

About this workflow

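The workflow runs seven nodes in sequence. A manual trigger starts the run, and an Extract from File node pulls the text out of the PDF. A Chain Llm node backed by Gemini 2.0 Flash turns that text into a 5-7 minute podcast script (approximately 750-1000 words). A Code node wraps the script in a TTS request with "Kore" as the default voice, and an HTTP Request node posts it to the gemini-2.5-flash-preview-tts model. A second Code node decodes the base64 audio in the response and prepends a 44-byte WAV header (24 kHz, mono, 16-bit PCM). Finally, a Write Binary File node saves the result as a .wav file.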
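
The Process Audio Response node treats the TTS output as raw 16-bit PCM and makes it playable by prepending a WAV header. The sketch below shows that step as standalone Node.js, assuming the same format the node hard-codes (24000 Hz, 1 channel, 16 bits per sample). `createWavHeader` follows the helper in the workflow; `pcmBase64ToWav` and the one-second silence sample are illustrative additions.

```javascript
// Build the 44-byte canonical WAV (RIFF) header for raw PCM data.
function createWavHeader(dataLength, sampleRate = 24000, channels = 1, bitsPerSample = 16) {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const header = Buffer.alloc(44);

  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + dataLength, 4);  // file size minus the first 8 bytes
  header.write("WAVE", 8, "ascii");

  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16);              // size of the fmt sub-chunk
  header.writeUInt16LE(1, 20);               // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);

  header.write("data", 36, "ascii");
  header.writeUInt32LE(dataLength, 40);      // number of PCM bytes that follow
  return header;
}

// Illustrative wrapper: decode base64 PCM and prepend the header.
function pcmBase64ToWav(base64Pcm) {
  const pcm = Buffer.from(base64Pcm, "base64");
  return Buffer.concat([createWavHeader(pcm.length), pcm]);
}

// One second of silence: 24000 samples * 2 bytes = 48000 bytes of PCM.
const silenceBase64 = Buffer.alloc(48000).toString("base64");
const wavFile = pcmBase64ToWav(silenceBase64);
console.log(wavFile.length); // 48044 (44-byte header + 48000 bytes of PCM)
```

Because the header records the data length and byte rate, a player can derive the duration as data length divided by byte rate, which is 48000 / 48000 = 1 second for this sample.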

Source: https://n8n.io/workflows/4883/ — original creator credit.
