Extract Invoice Data From PDFs to JSON with Gemini AI and XML Transformation

By Mauricio Perera (@rckflr) on n8n.io

This n8n workflow converts invoices in PDF format into a structured, ready-to-use JSON, using AI and XML transformation — without writing any code. Upload form → The user uploads a PDF file. Text extraction → The PDF content is extracted as plain text. XML schema definition → A…

AI & RAG · Trigger: Event · Nodes: 10 · Complexity: ★★★★☆ · AI nodes: yes · Integrations: Form Trigger, Google Gemini, XML

This workflow corresponds to n8n.io template #8460 — we link there as the canonical source.

This workflow follows the Form Trigger → Google Gemini recipe pattern.

The workflow JSON

Copy the full n8n JSON below, paste it into a new n8n workflow, add your credentials, and activate it.
{
  "nodes": [
    {
      "id": "3a0d9a6f-6e6e-44a3-9eb0-1755b01fed0c",
      "name": "On form submission",
      "type": "n8n-nodes-base.formTrigger",
      "position": [
        672,
        -480
      ],
      "parameters": {
        "options": {},
        "formTitle": "Test",
        "formFields": {
          "values": [
            {
              "fieldType": "file",
              "fieldLabel": "data"
            }
          ]
        }
      },
      "typeVersion": 2.3
    },
    {
      "id": "d510fda8-ceaa-4d57-8946-39a97b23f3e1",
      "name": "Extract from File",
      "type": "n8n-nodes-base.extractFromFile",
      "position": [
        832,
        -480
      ],
      "parameters": {
        "options": {},
        "operation": "pdf"
      },
      "typeVersion": 1
    },
    {
      "id": "e070def8-b13a-49fa-ae4a-e366d1f474da",
      "name": "Message a model",
      "type": "@n8n/n8n-nodes-langchain.googleGemini",
      "position": [
        704,
        -240
      ],
      "parameters": {
        "modelId": {
          "__rl": true,
          "mode": "list",
          "value": "models/gemma-3n-e4b-it",
          "cachedResultName": "models/gemma-3n-e4b-it"
        },
        "options": {},
        "messages": {
          "values": [
            {
              "content": "=Consider the attached invoice transcription and rewrite it as XML following this schema:\n\n{{ $json.estructuraXML }}\n\nInvoice:\n\n{{ $json.text_limpio }}"
            }
          ]
        }
      },
      "credentials": {
        "googlePalmApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "4e435b5b-95da-4b6a-a888-c2f74cd96cd1",
      "name": "Limpio data",
      "type": "n8n-nodes-base.set",
      "position": [
        1104,
        -480
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "ad0e7b3d-4011-4bfb-851e-c049883dc00a",
              "name": "text_limpio",
              "type": "string",
              "value": "={{ $json.text.replace(/\\n/g, ' ') }}"
            },
            {
              "id": "e0b6ea3e-17d6-4c18-a5f5-1b2cf98b4ddb",
              "name": "estructuraXML",
              "type": "string",
              "value": "<invoice>\n    <invoice_number>[invoice_number]</invoice_number>\n    <date_of_issue>[date_of_issue]</date_of_issue>\n    <due_date>[due_date]</due_date>\n\n    <billed_to>\n        <company_name>[billed_to.company_name]</company_name>\n        <contact_name>[billed_to.contact_name]</contact_name>\n        <address>[billed_to.address]</address>\n        <postal_code>[billed_to.postal_code]</postal_code>\n        <city>[billed_to.city]</city>\n        <state>[billed_to.state]</state>\n        <country>[billed_to.country]</country>\n        <rfc>[billed_to.rfc]</rfc>\n    </billed_to>\n\n    <from>\n        <company_name>[from.company_name]</company_name>\n        <address>[from.address]</address>\n        <postal_code>[from.postal_code]</postal_code>\n        <city>[from.city]</city>\n        <state>[from.state]</state>\n        <country>[from.country]</country>\n        <rfc>[from.rfc]</rfc>\n    </from>\n\n    <purchase_order>[purchase_order]</purchase_order>\n\n    <items>\n        <item>\n            <description>[item.description]</description>\n            <unit_cost>[item.unit_cost]</unit_cost>\n            <quantity>[item.quantity]</quantity>\n            <amount>[item.amount]</amount>\n        </item>\n        </items>\n\n    <bank_account_details>\n        <account_holder_name>[bank_account_details.account_holder_name]</account_holder_name>\n        <account_number>[bank_account_details.account_number]</account_number>\n        <routing_number>[bank_account_details.routing_number]</routing_number>\n        <swift_code>[bank_account_details.swift_code]</swift_code>\n        <bank_name>[bank_account_details.bank_name]</bank_name>\n        <currency>[bank_account_details.currency]</currency>\n    </bank_account_details>\n\n    <financials>\n        <subtotal>[subtotal]</subtotal>\n        <tax_rate>[tax_rate]</tax_rate>\n        <tax_amount>[tax_amount]</tax_amount>\n        <shipping_cost>[shipping_cost]</shipping_cost>\n        <invoice_total>[invoice_total]</invoice_total>\n    </financials>\n</invoice>"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "93fd56a6-33f9-4ac2-88b2-72157beb871f",
      "name": "Limpio XML",
      "type": "n8n-nodes-base.set",
      "position": [
        1040,
        -240
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "ddaad091-c54e-44d9-bf05-604e3bf43caa",
              "name": "factura_limpia",
              "type": "string",
              "value": "={{ $json.content.parts[0].text.replace('```xml', '').replace('```', '').replace(/(\\n|\\s{2,})/g, '').replace(/(\\s<)/g, '<').replace(/(>\\s)/g, '>') }}"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "9d96dd97-9048-4a6f-b11c-52c30a6d3fa3",
      "name": "XML to JSON",
      "type": "n8n-nodes-base.xml",
      "position": [
        1200,
        -240
      ],
      "parameters": {
        "options": {
          "trim": false,
          "normalize": false,
          "normalizeTags": false
        },
        "dataPropertyName": "factura_limpia"
      },
      "typeVersion": 1
    },
    {
      "id": "ee4365f4-08b5-42de-afb7-6a187272fabb",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        624,
        -544
      ],
      "parameters": {
        "color": 4,
        "width": 352,
        "height": 240,
        "content": "## PDF to text"
      },
      "typeVersion": 1
    },
    {
      "id": "e6bdaed7-1cee-4412-86c8-c7409ac1231e",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        976,
        -544
      ],
      "parameters": {
        "color": 2,
        "width": 368,
        "height": 240,
        "content": "## Clean data and XML structure definition"
      },
      "typeVersion": 1
    },
    {
      "id": "26faacbb-3464-46fe-8e1f-cd105942d179",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        624,
        -304
      ],
      "parameters": {
        "color": 3,
        "width": 352,
        "height": 256,
        "content": "## Generate XML string"
      },
      "typeVersion": 1
    },
    {
      "id": "33493f4d-a615-4a80-8727-7ebba208f215",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        976,
        -304
      ],
      "parameters": {
        "color": 5,
        "width": 368,
        "height": 256,
        "content": "## String to XML to Json"
      },
      "typeVersion": 1
    }
  ],
  "connections": {
    "Limpio XML": {
      "main": [
        [
          {
            "node": "XML to JSON",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Limpio data": {
      "main": [
        [
          {
            "node": "Message a model",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Message a model": {
      "main": [
        [
          {
            "node": "Limpio XML",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract from File": {
      "main": [
        [
          {
            "node": "Limpio data",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "On form submission": {
      "main": [
        [
          {
            "node": "Extract from File",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
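
The only real logic in this workflow lives in the expressions of the two Set nodes, "Limpio data" and "Limpio XML". The sketch below restates those same string operations as plain JavaScript functions so they can be tried on sample text outside n8n. The function names `flattenText` and `cleanModelXml`, the `FENCE` constant, and the sample reply are illustrative assumptions, not part of the template.

```javascript
// Three backticks, built at runtime so this example stays readable in Markdown.
const FENCE = '`'.repeat(3);

// Mirrors the "Limpio data" expression: $json.text.replace(/\n/g, ' ')
function flattenText(text) {
  return text.replace(/\n/g, ' ');
}

// Mirrors the "Limpio XML" expression chain applied to the model's reply.
function cleanModelXml(reply) {
  return reply
    .replace(FENCE + 'xml', '')   // drop the opening code fence (first match only)
    .replace(FENCE, '')           // drop the closing fence (first remaining match)
    .replace(/(\n|\s{2,})/g, '')  // remove newlines and runs of 2+ whitespace chars
    .replace(/(\s<)/g, '<')       // remove one whitespace char before a tag
    .replace(/(>\s)/g, '>');      // remove one whitespace char after a tag
}

// Hypothetical model reply: fenced, indented XML.
const sampleReply =
  FENCE + 'xml\n' +
  '<invoice>\n' +
  '    <invoice_number>INV-001</invoice_number>\n' +
  '</invoice>\n' +
  FENCE;

const cleaned = cleanModelXml(sampleReply);
console.log(cleaned);
```

One side effect worth knowing: the third `replace` also deletes runs of two or more spaces inside field values, so a value such as `Acme  Corp` (two spaces) comes out as `AcmeCorp`. Single spaces inside values are preserved.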

Credentials you'll need

Each integration node will prompt for credentials when you import. We strip credential IDs before publishing — you'll add your own.
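
To see in advance which nodes will ask for credentials, one option is to scan the workflow JSON for nodes that carry a `credentials` object. This is a minimal sketch, assuming the export has already been parsed into an object. The function name `nodesNeedingCredentials` is hypothetical, and the inline `workflow` is a trimmed copy of the JSON above that keeps only `name`, `type`, and `credentials`.

```javascript
// Trimmed copy of the workflow above: only the fields this check reads.
const workflow = {
  nodes: [
    { name: 'On form submission', type: 'n8n-nodes-base.formTrigger' },
    { name: 'Extract from File', type: 'n8n-nodes-base.extractFromFile' },
    {
      name: 'Message a model',
      type: '@n8n/n8n-nodes-langchain.googleGemini',
      credentials: { googlePalmApi: { name: '<your credential>' } },
    },
    { name: 'Limpio data', type: 'n8n-nodes-base.set' },
    { name: 'Limpio XML', type: 'n8n-nodes-base.set' },
    { name: 'XML to JSON', type: 'n8n-nodes-base.xml' },
  ],
};

// Returns one entry per (node, credential type) pair declared in the export.
function nodesNeedingCredentials(wf) {
  return wf.nodes.flatMap((node) =>
    Object.keys(node.credentials ?? {}).map((credentialType) => ({
      node: node.name,
      credentialType,
    }))
  );
}

const needed = nodesNeedingCredentials(workflow);
console.log(needed);
```

For this workflow the only entry is the "Message a model" node, whose credential type is stored as `googlePalmApi` in the export.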

About this workflow

Source: https://n8n.io/workflows/8460/ (original creator credit).

Related workflows

Workflows that share integrations, category, or trigger type with this one. All free to copy and import.

AI & RAG

This n8n template retrieves verbal brand identity markers from any web site.

HTTP Request, Google Gemini, Form Trigger +1
AI & RAG

This workflow is built for finance teams, operations managers, founders, and businesses that process invoices regularly and want to eliminate manual document handling. It’s especially useful for teams

Form Trigger, AWS S3, AWS Textract +2
AI & RAG

This n8n template automates scraping content from Skool communities using the Olostep API. It collects structured data from Skool pages and stores it in a clean format, making it easy to analyze commu

N8N Nodes Olostep, Form Trigger, HTTP Request +3
AI & RAG

automation_financial_recording. Uses telegramTrigger, telegram, googleGemini, lmChatGoogleGemini. Event-driven trigger; 35 nodes.

Telegram Trigger, Telegram, Google Gemini +4