
Extract and Structure Invoice Data with Docsumo and Export to Excel

By Anurag (@aiautoeye) on n8n.io

This workflow automates the extraction of structured data from invoices or similar documents using Docsumo's API. Users upload a PDF through an n8n form trigger, and the file is then sent to Docsumo for processing and structured parsing. The workflow fetches key document metadata and…
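To show what the upload step sends, here is a minimal sketch that builds the same multipart request as the workflow's first HTTP Request node (fields `type=invoice`, `file`, `skip_review=true`). The helper name `buildUploadRequest` is hypothetical, and the `apikey` header name is an assumption: in the workflow the key comes from an n8n Header Auth credential whose header name is not shown in the JSON.

```javascript
// Hypothetical helper: builds (but does not send) the multipart upload request
// that the workflow's first "HTTP Request" node issues to Docsumo.
// ASSUMPTION: the API key travels in an "apikey" header; the workflow hides this
// inside an n8n Header Auth credential.
const UPLOAD_URL = "https://app.docsumo.com/api/v1/eevee/apikey/upload/";

function buildUploadRequest(fileBytes, fileName, apiKey) {
  const form = new FormData();
  // Same three body fields as the workflow: document type, the PDF, skip_review.
  form.append("type", "invoice");
  form.append("file", new Blob([fileBytes], { type: "application/pdf" }), fileName);
  form.append("skip_review", "true");

  return {
    url: UPLOAD_URL,
    init: {
      method: "POST",
      headers: { apikey: apiKey }, // assumed header name
      body: form, // fetch adds the multipart Content-Type and boundary itself
    },
  };
}

// Usage (Node 18+, where fetch, FormData and Blob are globals):
// const { url, init } = buildUploadRequest(pdfBuffer, "invoice.pdf", process.env.DOCSUMO_API_KEY);
// const uploadJson = await (await fetch(url, init)).json();
```

Leaving `Content-Type` unset is deliberate: setting it by hand would omit the multipart boundary that `fetch` generates.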

Category: Web Scraping · Trigger: Event · Nodes: 7 · Complexity: ★★★★☆ · Integrations: Form Trigger, HTTP Request

This workflow corresponds to n8n.io template #6195 — we link there as the canonical source.

This workflow follows the Form Trigger → HTTP Request recipe pattern.
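In this pattern each HTTP Request node builds its URL from the previous node's JSON. The sketch below mirrors the two n8n expressions in the workflow, `{{ $json.data.document[0].doc_id }}` (upload response) and `{{ $json.data.document.doc_id }}` (detail response). The response shapes are inferred from those expressions rather than from Docsumo's documentation, and `detailUrl` / `simplifiedUrl` are hypothetical names.

```javascript
// Derives the follow-up Docsumo URLs the way the workflow's expressions do.
// Shapes are inferred from the n8n expressions: the upload response holds
// data.document as an array, the detail response holds it as an object.
const BASE = "https://app.docsumo.com/api/v1/eevee/apikey";

function detailUrl(uploadResponse) {
  const docId = uploadResponse?.data?.document?.[0]?.doc_id;
  if (!docId) throw new Error("upload response has no data.document[0].doc_id");
  return `${BASE}/documents/detail/${docId}`;
}

function simplifiedUrl(detailResponse) {
  const docId = detailResponse?.data?.document?.doc_id;
  if (!docId) throw new Error("detail response has no data.document.doc_id");
  // Trailing slash kept, as in the workflow's URL.
  return `${BASE}/data/simplified/${docId}/`;
}
```

Failing loudly on a missing `doc_id` is a design choice of this sketch; the n8n expressions would instead build a URL containing an empty ID.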

The workflow JSON

Copy or download the full n8n JSON below, paste it into a new n8n workflow, add your credentials (the three HTTP Request nodes expect a Header Auth credential for Docsumo), and activate it.
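Before pasting, you may want to confirm that every connection points at a node that exists and to see the order in which the nodes run. This is a minimal sketch, assuming the export shape used in this workflow (`nodes[].name` and `connections[source].main[output]` holding `{ node, type, index }` links). Both helpers are hypothetical and not part of n8n.

```javascript
// Hypothetical pre-import check for an exported n8n workflow object.
// Returns the names of referenced nodes that are not defined in nodes[].
function checkConnections(workflow) {
  const names = new Set(workflow.nodes.map((n) => n.name));
  const missing = [];
  for (const [source, outputs] of Object.entries(workflow.connections)) {
    if (!names.has(source)) missing.push(source);
    for (const output of outputs.main ?? []) {
      for (const link of output ?? []) {
        if (!names.has(link.node)) missing.push(link.node);
      }
    }
  }
  return missing; // an empty array means every referenced node exists
}

// Follows the first link of the first output from a start node. That is
// enough to list a purely linear workflow; branches are not explored.
function executionChain(workflow, start) {
  const chain = [start];
  const seen = new Set(chain);
  let current = start;
  for (;;) {
    const next = workflow.connections[current]?.main?.[0]?.[0]?.node;
    if (!next || seen.has(next)) break; // stop at the end or on a cycle
    chain.push(next);
    seen.add(next);
    current = next;
  }
  return chain;
}
```

Run against the JSON below with the start node `"On form submission"`, the chain should be On form submission → HTTP Request → HTTP Request1 → HTTP Request2 → Code → Convert to File. The Sticky Note has no connections, so it does not appear.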

{
  "id": "5s26gAyizRhbNI15",
  "name": "DocSumo Invoice Parser",
  "tags": [],
  "nodes": [
    {
      "id": "a35d21e3-7139-4a75-9deb-04a61e21ab20",
      "name": "On form submission",
      "type": "n8n-nodes-base.formTrigger",
      "position": [
        40,
        -100
      ],
      "parameters": {
        "options": {},
        "formTitle": "file",
        "formFields": {
          "values": [
            {
              "fieldType": "file",
              "fieldLabel": "file",
              "multipleFiles": false
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "c5ebb140-2292-424a-81e9-3438d7142673",
      "name": "HTTP Request",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        260,
        -100
      ],
      "parameters": {
        "url": "https://app.docsumo.com/api/v1/eevee/apikey/upload/",
        "method": "POST",
        "options": {
          "redirect": {
            "redirect": {
              "followRedirects": false
            }
          }
        },
        "sendBody": true,
        "contentType": "multipart-form-data",
        "sendHeaders": true,
        "authentication": "genericCredentialType",
        "bodyParameters": {
          "parameters": [
            {
              "name": "type",
              "value": "invoice"
            },
            {
              "name": "file",
              "parameterType": "formBinaryData",
              "inputDataFieldName": "file"
            },
            {
              "name": "skip_review",
              "value": "true"
            }
          ]
        },
        "genericAuthType": "httpHeaderAuth",
        "headerParameters": {
          "parameters": [
            {
              "name": "accept",
              "value": "application/json"
            }
          ]
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "b76f3d26-c582-4c87-b2f7-9f973ecf6dd0",
      "name": "HTTP Request1",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        440,
        -100
      ],
      "parameters": {
        "url": "=https://app.docsumo.com/api/v1/eevee/apikey/documents/detail/{{ $json.data.document[0].doc_id }}",
        "options": {},
        "authentication": "genericCredentialType",
        "genericAuthType": "httpHeaderAuth"
      },
      "typeVersion": 4.2
    },
    {
      "id": "78a90dab-233f-42f9-9d97-2a8220c77ceb",
      "name": "HTTP Request2",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        620,
        -100
      ],
      "parameters": {
        "url": "=https://app.docsumo.com/api/v1/eevee/apikey/data/simplified/{{ $json.data.document.doc_id }}/",
        "options": {
          "response": {
            "response": {}
          }
        },
        "sendHeaders": true,
        "authentication": "genericCredentialType",
        "genericAuthType": "httpHeaderAuth",
        "headerParameters": {
          "parameters": [
            {
              "name": "accept",
              "value": "application/json"
            }
          ]
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "d5f56d5b-b7e6-41f1-9e7a-81c97eb2c191",
      "name": "Code",
      "type": "n8n-nodes-base.code",
      "position": [
        780,
        -100
      ],
      "parameters": {
        "jsCode": "// Process each document item\nconst processInvoice = (doc) => {\n  // Extract header information (excluding Document Title and Processed At)\n  const headerData = {\n    \"Invoice Number\": doc?.data?.[\"Basic Information\"]?.[\"Invoice Number\"]?.value || \"\",\n    \"Issue Date\": doc?.data?.[\"Basic Information\"]?.[\"Issue Date\"]?.value || \"\",\n    \"Order ID\": doc?.data?.[\"Basic Information\"]?.[\"Order Id/Tracking No\"]?.value || \"\",\n    \"Payment Terms\": doc?.data?.[\"Basic Information\"]?.[\"Terms\"]?.value || \"\",\n    \"Buyer Name\": doc?.data?.[\"Buyer Detail\"]?.[\"Name\"]?.value || \"\",\n    \"Buyer Address\": doc?.data?.[\"Buyer Detail\"]?.[\"Address\"]?.value || \"\",\n    \"Buyer GST\": doc?.data?.[\"Buyer Detail\"]?.[\"GST/ VAT Number\"]?.value || \"\",\n    \"Seller Name\": doc?.data?.[\"Seller Detail\"]?.[\"Name\"]?.value || \"\",\n    \"Seller Address\": doc?.data?.[\"Seller Detail\"]?.[\"Address\"]?.value || \"\",\n    \"Seller GST\": doc?.data?.[\"Seller Detail\"]?.[\"GST/ VAT Number\"]?.value || \"\",\n    \"Subtotal\": doc?.data?.[\"GST & Amount\"]?.[\"Subtotal\"]?.value || 0,\n    \"Tax Total\": doc?.data?.[\"GST & Amount\"]?.[\"Tax Total\"]?.value || 0,\n    \"Total Due\": doc?.data?.[\"GST & Amount\"]?.[\"Total Due\"]?.value || 0\n  };\n\n  // Process line items with all original columns\n  return doc?.data?.Table?.[\"Line Items\"]?.map(item => ({\n    ...headerData,  // Include all header fields\n    \"Sr No.\": item?.[\"Sr No.\"]?.value || \"\",\n    \"Item Code\": item?.[\"Item Code\"]?.value || \"\",\n    \"Item Desc\": item?.Description?.value || \"\",\n    \"HSN/SAC Code\": item?.HSN?.value || \"\",\n    \"Qty\": item?.Quantity?.value || 0,\n    \"Unit Price (INR)\": item?.[\"Unit Price\"]?.value || 0,\n    \"Per\": item?.[\"Per\"]?.value || \"\",\n    \"UoM\": item?.UoM?.value || \"\",\n    \"Net Amount (INR)\": item?.[\"Subtotal Line\"]?.value || 0,\n    \"Tax Rate\": item?.[\"Tax Rate Line\"]?.value || \"\",\n    \"TGST\": item?.TGST?.value || 0,\n    \"CGST\": item?.CGST?.value || 0,\n    \"SGST\": item?.SGST?.value || 0,\n    \"TCS\": item?.TCS?.value || 0,\n    \"Gross Amt (INR)\": item?.[\"Gross Amount\"]?.value || 0,\n    \"Delivery Date\": item?.[\"Delivery Date\"]?.value || \"\"\n  })) || [];\n};\n\n// Main processing\nconst allResults = [];\nfor (const item of items) {\n  try {\n    const doc = Array.isArray(item.json) ? item.json[0] : item.json;\n    if (doc?.data) {\n      allResults.push(...processInvoice(doc));\n    }\n  } catch (error) {\n    console.log(`Error processing item: ${error.message}`);\n  }\n}\n\n// Return formatted results\nreturn allResults.length > 0 \n  ? allResults.map(result => ({ json: result }))\n  : [{ json: { error: \"No valid data processed\", raw: items[0]?.json }}];"
      },
      "typeVersion": 2
    },
    {
      "id": "ca39a45f-5ea1-40a2-a68f-49e62ead1973",
      "name": "Convert to File",
      "type": "n8n-nodes-base.convertToFile",
      "position": [
        980,
        -100
      ],
      "parameters": {
        "options": {},
        "operation": "xls",
        "binaryPropertyName": "data"
      },
      "typeVersion": 1.1
    },
    {
      "id": "01c14879-d3f3-4185-a30c-609de127c0d7",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        720,
        40
      ],
      "parameters": {
        "height": 100,
        "content": "Please adjust the Code node: you can customize the header or line-item extraction by editing it as needed."
      },
      "typeVersion": 1
    }
  ],
  "active": false,
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "2c8d10c9-6226-47ef-ad9d-9690fed6df9b",
  "connections": {
    "Code": {
      "main": [
        [
          {
            "node": "Convert to File",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "HTTP Request": {
      "main": [
        [
          {
            "node": "HTTP Request1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "HTTP Request1": {
      "main": [
        [
          {
            "node": "HTTP Request2",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "HTTP Request2": {
      "main": [
        [
          {
            "node": "Code",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "On form submission": {
      "main": [
        [
          {
            "node": "HTTP Request",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
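The Code node's core idea is to copy the invoice-level fields onto every line item, so each spreadsheet row is self-contained. Here is a reduced sketch of that flattening, keeping only a few of the workflow's columns. The section names ("Basic Information", "GST & Amount", "Table" > "Line Items") are the ones the Code node reads; `flattenInvoice` is a hypothetical name.

```javascript
// Reduced version of the Code node's processInvoice: header fields are spread
// into every line-item row. Only a subset of the original columns is kept.
function flattenInvoice(doc) {
  const basic = doc?.data?.["Basic Information"] ?? {};
  const amounts = doc?.data?.["GST & Amount"] ?? {};
  const header = {
    "Invoice Number": basic["Invoice Number"]?.value ?? "",
    "Issue Date": basic["Issue Date"]?.value ?? "",
    "Total Due": amounts["Total Due"]?.value ?? 0,
  };
  const lines = doc?.data?.Table?.["Line Items"] ?? [];
  // One output row per line item, each carrying the same header fields.
  return lines.map((item) => ({
    ...header,
    "Item Desc": item?.Description?.value ?? "",
    "Qty": item?.Quantity?.value ?? 0,
    "Unit Price (INR)": item?.["Unit Price"]?.value ?? 0,
  }));
}
```

Unlike the original's `||`, this sketch uses `??`, which falls back only when a value is `null` or `undefined`, so a field present as `0` or an empty string is kept as-is. In the Code node itself, each row is then wrapped as `{ json: row }`, as the original does.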

About this workflow

Source: https://n8n.io/workflows/6195/ (original creator credit).

Related workflows

Workflows that share integrations, category, or trigger type with this one. All free to copy and import.

Web Scraping

This workflow allows you to import any workflow from a file or another n8n instance and map the credentials easily. A multi-form setup guides you through the entire process. At the beginning you have t

Execute Command, Read Write File, HTTP Request +3
Web Scraping

[n8n] Advanced URL Parsing and Shortening Workflow - Switchy.io Integration. Uses splitInBatches, stickyNote, httpRequest, html. Event-driven trigger; 56 nodes.

HTTP Request, GitHub, Stop And Error +1
Web Scraping

Video: https://youtu.be/c7yCZhmMjtI

HTTP Request, GitHub, Stop And Error +1
Web Scraping

n8n recently introduced folders, a big improvement to workflow management on top of tags.

HTTP Request, n8n, Form Trigger +1
Web Scraping

This workflow automates the creation of press releases for music artists releasing a new single. Upload your MP3, fill in basic info, and receive a publication-ready press release saved as a Google Do

Form Trigger, HTTP Request, Google Docs