Automate Data Extraction From Faxes & Pdfs Using Google Gemini and Google Sheets

By Intuz (@intuz) on n8n.io

This workflow uses Google Gemini's multimodal capabilities to read a fax or PDF, identify key fields, and organize the data into a structured format that is saved directly to a Google Sheet. It is aimed at healthcare administrators, medical billing teams, legal assistants, data entry professionals…

AI & RAG · Trigger: Event · Nodes: 18 · Complexity: ★★★★☆ · AI nodes: yes
Integrations: Google Drive, HTTP Request, Google Sheets, Chain Llm, Google Gemini Chat, Output Parser Structured, Form Trigger

This workflow corresponds to n8n.io template #8939 — we link there as the canonical source.

The workflow JSON

Copy or download the full n8n JSON below. Paste it into a new n8n workflow, add your credentials, and activate it.

{
  "id": "CACB2vgfyxBJE30X",
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "name": "Development Done - FAX Content Extraction",
  "tags": [
    {
      "id": "m1paRrANi4GrQXxX",
      "name": "AI Internal",
      "createdAt": "2025-09-18T04:05:25.667Z",
      "updatedAt": "2025-09-18T04:05:25.667Z"
    }
  ],
  "nodes": [
    {
      "id": "5be36f47-e7ed-464a-bf73-6573abed5fc6",
      "name": "Extract from File",
      "type": "n8n-nodes-base.extractFromFile",
      "position": [
        -336,
        160
      ],
      "parameters": {
        "options": {},
        "operation": "binaryToPropery"
      },
      "typeVersion": 1
    },
    {
      "id": "4e9b17d5-0ded-4edb-baa5-b7eae5d0ecfa",
      "name": "Google Drive",
      "type": "n8n-nodes-base.googleDrive",
      "position": [
        -528,
        160
      ],
      "parameters": {
        "fileId": {
          "__rl": true,
          "mode": "id",
          "value": "={{ $('Upload file').item.json.id }}"
        },
        "options": {},
        "operation": "download"
      },
      "credentials": {
        "googleDriveOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 3
    },
    {
      "id": "579a0a77-34bd-4f38-a22a-dfc68421b0b9",
      "name": "Call Gemini 2.0 Flash with PDF Capabilities",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        -96,
        160
      ],
      "parameters": {
        "url": "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent",
        "method": "POST",
        "options": {},
        "jsonBody": "={\n  \"contents\": [\n    {\n      \"parts\": [\n        {\n          \"inline_data\": {\n            \"mime_type\": \"application/pdf\",\n            \"data\": \"{{ $json.data }}\"\n          }\n        },\n        {\n          \"text\": \"{{ $('Define Prompt').item.json.prompt }}\"\n        }\n      ]\n    }\n  ]\n}",
        "sendBody": true,
        "specifyBody": "json",
        "authentication": "predefinedCredentialType",
        "nodeCredentialType": "googlePalmApi"
      },
      "credentials": {
        "googlePalmApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "07953a5f-115a-4697-9961-f5d7046ec6db",
      "name": "Define Prompt",
      "type": "n8n-nodes-base.set",
      "position": [
        -768,
        160
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "dba23ef5-95df-496a-8e24-c7c1544533d2",
              "name": "prompt",
              "type": "string",
              "value": "You are an expert document analyzer.   Your task is to read a fax document that may contain a medical or administrative form with multiple fields, tables, and notes. The document may contain both filled and unfilled fields.    Instructions:   1. Extract only the filled or relevant fields from the form.   2. Ignore empty fields, irrelevant boilerplate text, transmission details, or formatting artifacts.   3. Present the extracted information in a clean structured format.   4. Capture key details such as:    - Patient Information (ID, Name, DOB, Sex, Address, Diagnosis, Allergies, Functional Limitations, Mental Status, etc.)      - Provider Information (Provider Name, Address, Physician Name, Certification Dates, Provider No.)      - Medical Data (Diagnosis codes, Medications, Procedures, Treatments, Orders, Prognosis, Safety Measures, Activities Permitted, Goals)      - Certification / Signatures (Physician signature, Date signed, Certification statement)      - Any explicit instructions, precautions, or urgent notes.   5. If the document contains critical health conditions, precautions, or deadlines, highlight them separately.   6. Do not alter or interpret the medical data\u2014extract it exactly as written. "
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "e570291a-6ca2-49f4-be02-a837be4444f6",
      "name": "Upload file",
      "type": "n8n-nodes-base.googleDrive",
      "position": [
        -1072,
        160
      ],
      "parameters": {
        "name": "Fax test",
        "driveId": {
          "__rl": true,
          "mode": "list",
          "value": "My Drive",
          "cachedResultUrl": "https://drive.google.com/drive/my-drive",
          "cachedResultName": "My Drive"
        },
        "options": {},
        "folderId": {
          "__rl": true,
          "mode": "list",
          "value": "{YOUR_GOOGLE_DRIVE_FOLDER_ID}",
          "cachedResultUrl": "https://drive.google.com/drive/folders/{YOUR_GOOGLE_DRIVE_FOLDER_ID}",
          "cachedResultName": "FAX Test"
        },
        "inputDataFieldName": "Upload_File"
      },
      "credentials": {
        "googleDriveOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 3
    },
    {
      "id": "37801239-6daf-4445-9004-01523094018a",
      "name": "Append row in sheet",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        848,
        160
      ],
      "parameters": {
        "columns": {
          "value": {
            "Gender": "={{ $json.output.Gender }}",
            "SOC Date": "={{ $json.output.SOCDate }}",
            "Allergies": "={{ $json.output.Allergies }}",
            "Prognosis": "={{ $json.output.Prognosis }}",
            "Patient ID": "={{ $json.output.PatientID }}",
            "Date Signed": "={{ $json.output.DateSigned }}",
            "Patient Name": "={{ $json.output.PatientName }}",
            "Provider No.": "={{ $json.output.ProviderNo }}",
            "Date of Birth": "={{ $json.output.DateOfBirth }}",
            "Mental Status": "={{ $json.output['Mental Status'] }}",
            "Medical Record": "={{ $json.output.MedicalRecord }}",
            "Patient Address": "={{ $json.output.PatientAddress }}",
            "Safety Measures": "={{ $json.output.SafetyMeasures }}"
          },
          "schema": [
            {
              "id": "Patient ID",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Patient ID",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "SOC Date",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "SOC Date",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Medical Record",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Medical Record",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Provider No.",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Provider No.",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Patient Name",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Patient Name",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Patient Address",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Patient Address",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Date of Birth",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Date of Birth",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Safety Measures",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Safety Measures",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Allergies",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Allergies",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Gender",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Gender",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Mental Status",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Mental Status",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Prognosis",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Prognosis",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "Date Signed",
              "type": "string",
              "display": true,
              "removed": false,
              "required": false,
              "displayName": "Date Signed",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            }
          ],
          "mappingMode": "defineBelow",
          "matchingColumns": [
            "Information"
          ],
          "attemptToConvertTypes": false,
          "convertFieldsToString": false
        },
        "options": {},
        "operation": "append",
        "sheetName": {
          "__rl": true,
          "mode": "list",
          "value": "gid=0",
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/{YOUR_GOOGLE_SHEET_ID}/edit#gid=0",
          "cachedResultName": "Sheet1"
        },
        "documentId": {
          "__rl": true,
          "mode": "list",
          "value": "{YOUR_GOOGLE_SHEET_ID}",
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/{YOUR_GOOGLE_SHEET_ID}/edit?usp=drivesdk",
          "cachedResultName": "Fax Test Extracted"
        }
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.7
    },
    {
      "id": "ca632b40-621b-4359-9e80-31706f374831",
      "name": "Basic LLM Chain",
      "type": "@n8n/n8n-nodes-langchain.chainLlm",
      "position": [
        384,
        160
      ],
      "parameters": {
        "text": "=You are a strict JSON parser.\nReturn ONLY valid JSON that matches this schema. \nNo markdown, no comments, no extra text.\n\nSchema:\n{\n  \"PatientID\": \"string or null\",\n  \"SOCDate\": \"string or null\",\n  \"MedicalRecord\": \"string or null\",\n  \"ProviderNo\": \"string or null\",\n  \"PatientName\": \"string or null\",\n  \"PatientAddress\": \"string or null\",\n  \"DateOfBirth\": \"string or null\",\n  \"SafetyMeasures\": \"string or null\",\n  \"Allergies\": \"string or null\",\n  \"Gender\": \"string or null\",\n  \"Mental Status\": \"string or null\",\n  \"Prognosis\": \"string or null\",\n  \"DateSigned\": \"string or null\"\n}\n\nRules:\n- Include **all** keys, even if null.\n- Do not add markdown (like ```json).\n- Do not add extra text.\n\nExtracted Fax Text:\n\"\"\"\n{{ $json.extractedText }}\n\"\"\"\n",
        "batching": {},
        "promptType": "define",
        "hasOutputParser": true
      },
      "retryOnFail": true,
      "typeVersion": 1.7
    },
    {
      "id": "0330fa20-6915-42c5-8769-0f2658f5b035",
      "name": "Google Gemini Chat Model",
      "type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini",
      "position": [
        368,
        336
      ],
      "parameters": {
        "options": {}
      },
      "credentials": {
        "googlePalmApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "d022685e-fe91-4980-b257-0ec2e7989491",
      "name": "Structured Output Parser",
      "type": "@n8n/n8n-nodes-langchain.outputParserStructured",
      "position": [
        544,
        336
      ],
      "parameters": {
        "jsonSchemaExample": "{\n  \"PatientID\": \"ID - 123\",\n  \"SOCDate\": \"10/12/2026\",\n  \"MedicalRecord\": \"No\",\n  \"ProviderNo\": \"MR - 000\",\n  \"PatientName\": \"Tom Cruise\",\n  \"PatientAddress\": \"F-19 Lamine Street, Barcelona, Spain\",\n  \"DateOfBirth\": \"09/11/2020\",\n  \"SafetyMeasures\": \"Precaution\",\n  \"Allergies\": \"Nose Allergy\",\n  \"Gender\": \"F\",\n  \"Mental Status\": \"Agitated\",\n  \"Prognosis\": \"Fair\",\n  \"DateSigned\": \"25/09/2025\"\n}\n"
      },
      "typeVersion": 1.3
    },
    {
      "id": "bea9b647-0e25-47ee-9eab-c952f512b03f",
      "name": "On form submission",
      "type": "n8n-nodes-base.formTrigger",
      "position": [
        -1376,
        160
      ],
      "parameters": {
        "options": {},
        "formTitle": "Fax File",
        "formFields": {
          "values": [
            {
              "fieldType": "file",
              "fieldLabel": "Upload File",
              "multipleFiles": false,
              "requiredField": true
            }
          ]
        },
        "formDescription": "User can Upload the Fax PDF directly here!"
      },
      "typeVersion": 2.2
    },
    {
      "id": "a452a474-38ee-426b-b225-fd8f65bbd878",
      "name": "Code",
      "type": "n8n-nodes-base.code",
      "position": [
        176,
        160
      ],
      "parameters": {
        "jsCode": "// Function to clean and extract plain text\nreturn $input.all().map(item => {\n  const candidates = item.json.candidates || [];\n  const firstCandidate = candidates[0] || {};\n  const parts = firstCandidate.content?.parts || [];\n  let extractedText = parts[0]?.text || \"\";\n\n  // Clean markdown-like artifacts\n  extractedText = extractedText\n    .replace(/\\*\\*/g, \"\")   // remove bold markers\n    .replace(/\\*/g, \"\")     // remove list markers\n    .trim();\n\n  return {\n    json: {\n      extractedText\n    }\n  };\n});\n"
      },
      "typeVersion": 2
    },
    {
      "id": "26f7cfb0-39bb-47f8-8f6c-4d5e87dd999c",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1456,
        -16
      ],
      "parameters": {
        "color": 4,
        "width": 272,
        "height": 320,
        "content": "## 1. Start Here: Form Trigger \n* This is the entry point for the workflow.\n* It creates a web form where a user can upload the FAX PDF file."
      },
      "typeVersion": 1
    },
    {
      "id": "43885ab6-67d3-4d46-8202-83e3228d68b2",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1152,
        -32
      ],
      "parameters": {
        "color": 5,
        "width": 288,
        "height": 336,
        "content": "## 2. Configure: Upload to Google Drive\n* Select your Google Drive account from the 'Credentials' dropdown.\n* Change the Folder ID to the specific Google Drive folder where you want to save the faxes."
      },
      "typeVersion": 1
    },
    {
      "id": "c0e958e2-f39b-4463-8ef1-a46ae0a129d7",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -832,
        -16
      ],
      "parameters": {
        "color": 6,
        "width": 224,
        "height": 320,
        "content": "## 3. Customize AI Brain (Part 1)\n* You can edit the *prompt* value here to change the extraction rules or ask for different information."
      },
      "typeVersion": 1
    },
    {
      "id": "8c973931-b16b-4c81-888d-5869b52544e8",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -176,
        -16
      ],
      "parameters": {
        "color": 2,
        "width": 272,
        "height": 320,
        "content": "## 4. First AI Call: PDF Extraction\n* **ACTION REQUIRED**: You must select your \"Google Palm API\" credentials from the dropdown for this node to work."
      },
      "typeVersion": 1
    },
    {
      "id": "1bc9982f-4880-4552-88c6-b5f2f296d1d7",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        320,
        -48
      ],
      "parameters": {
        "color": 3,
        "width": 352,
        "height": 320,
        "content": "## 5. Customize AI Brain (Part 2)\nThis second AI call structures the cleaned text into a strict JSON format.\n* **ACTION REQUIRED**: In the connected Google Gemini Chat Model node, select your Gemini API credentials.\n* To change the final output fields, edit the JSON schema inside the Prompt field of this node."
      },
      "typeVersion": 1
    },
    {
      "id": "c4e2b238-0152-46f2-8195-d25ad26d5882",
      "name": "Sticky Note5",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        704,
        -48
      ],
      "parameters": {
        "color": 4,
        "width": 384,
        "height": 352,
        "content": "## Final Step: Save to Google Sheets\n* **ACTION REQUIRED**: Configure your Google Sheets credentials.\n* **ACTION REQUIRED**: Update the Document ID with the ID of your own Google Sheet.\n* Ensure the column names in this node match the header row in your target Google Sheet."
      },
      "typeVersion": 1
    },
    {
      "id": "5e10a3b1-73f2-4263-8ebd-2f9e0b7db9bf",
      "name": "Sticky Note6",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        448,
        -416
      ],
      "parameters": {
        "color": 5,
        "width": 480,
        "height": 336,
        "content": "## Extraction Values\n* Once the workflow is set up and all nodes are configured, define the fields you want to extract from the *fax* content.\n\n* Add corresponding column fields in *Google Sheets* for each data point that needs to be extracted.\n\n* In the *LLM Chain* node prompt, specify the same fields as keys and provide example values to guide the extraction.\n\n* Update the *Structured Output Parser* with these keys and expected values to ensure the output is consistently structured and aligned with your Google Sheets columns."
      },
      "typeVersion": 1
    }
  ],
  "active": false,
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "7c15e905-123e-4655-bc9f-b712a70ddda2",
  "connections": {
    "Code": {
      "main": [
        [
          {
            "node": "Basic LLM Chain",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Upload file": {
      "main": [
        [
          {
            "node": "Define Prompt",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Google Drive": {
      "main": [
        [
          {
            "node": "Extract from File",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Define Prompt": {
      "main": [
        [
          {
            "node": "Google Drive",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Basic LLM Chain": {
      "main": [
        [
          {
            "node": "Append row in sheet",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract from File": {
      "main": [
        [
          {
            "node": "Call Gemini 2.0 Flash with PDF Capabilities",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "On form submission": {
      "main": [
        [
          {
            "node": "Upload file",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Google Gemini Chat Model": {
      "ai_languageModel": [
        [
          {
            "node": "Basic LLM Chain",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    },
    "Structured Output Parser": {
      "ai_outputParser": [
        [
          {
            "node": "Basic LLM Chain",
            "type": "ai_outputParser",
            "index": 0
          }
        ]
      ]
    },
    "Call Gemini 2.0 Flash with PDF Capabilities": {
      "main": [
        [
          {
            "node": "Code",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
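
The "Call Gemini 2.0 Flash with PDF Capabilities" node splices the prompt and the base64 PDF into a JSON string template, which could yield invalid JSON if the prompt ever contains a double quote or a line break. The sketch below shows one way to avoid that, assuming the same request shape as the node's jsonBody: build the body as an object and serialize it. The second function mirrors the cleanup done in the Code node. The names buildGeminiBody and extractPlainText, and all sample data, are hypothetical and not part of the template.

```javascript
// Hypothetical helper: same body shape as the HTTP Request node's jsonBody.
function buildGeminiBody(base64Pdf, prompt) {
  return {
    contents: [
      {
        parts: [
          { inline_data: { mime_type: "application/pdf", data: base64Pdf } },
          { text: prompt },
        ],
      },
    ],
  };
}

// Hypothetical helper: mirrors the Code node. It reads the first part of the
// first candidate and strips "**" and "*" markdown markers.
function extractPlainText(response) {
  const candidates = response.candidates || [];
  const parts = (candidates[0] || {}).content?.parts || [];
  const text = parts[0]?.text || "";
  return text.replace(/\*\*/g, "").replace(/\*/g, "").trim();
}

// Illustrative data only: a placeholder base64 string and a prompt that
// contains quotes and a line break.
const body = JSON.stringify(
  buildGeminiBody("JVBERi0=", 'Extract the "filled" fields.\nIgnore the rest.')
);

// Illustrative response in the shape the Code node expects.
const sample = {
  candidates: [
    { content: { parts: [{ text: "**Patient Name:** Jane Doe\n* Prognosis: Fair" }] } },
  ],
};

console.log(JSON.parse(body).contents[0].parts[1].text);
console.log(extractPlainText(sample));
```

Because JSON.stringify escapes quotes and line breaks, the serialized body parses back to the original prompt unchanged. Like the Code node, extractPlainText reads only the first part of the first candidate, so any further parts in a response would be ignored.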

Credentials you'll need

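Each integration node prompts for credentials on import. Credential IDs are stripped before publishing, so you add your own. The workflow JSON references three credential types: googleDriveOAuth2Api (Upload file, Google Drive), googlePalmApi (Call Gemini 2.0 Flash with PDF Capabilities, Google Gemini Chat Model), and googleSheetsOAuth2Api (Append row in sheet).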
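
Before activating an imported copy, it can help to list which nodes expect a credential and which parameters still hold a `{YOUR_...}` placeholder. The following is a minimal sketch that runs over a parsed workflow object. The function name auditWorkflow is hypothetical, and the excerpt is a trimmed stand-in for the full JSON above, not a complete workflow.

```javascript
// Hypothetical helper: groups node names by credential type and collects
// any {YOUR_...} placeholders left in node parameters.
function auditWorkflow(workflow) {
  const credentials = {};
  const placeholders = new Set();
  for (const node of workflow.nodes || []) {
    for (const type of Object.keys(node.credentials || {})) {
      if (!credentials[type]) credentials[type] = [];
      credentials[type].push(node.name);
    }
    const found =
      JSON.stringify(node.parameters || {}).match(/\{YOUR_[A-Z0-9_]+\}/g) || [];
    found.forEach((p) => placeholders.add(p));
  }
  return { credentials, placeholders: [...placeholders] };
}

// Trimmed excerpt of the workflow above, kept small for illustration.
const excerpt = {
  nodes: [
    {
      name: "Upload file",
      parameters: { folderId: { value: "{YOUR_GOOGLE_DRIVE_FOLDER_ID}" } },
      credentials: { googleDriveOAuth2Api: { name: "<your credential>" } },
    },
    {
      name: "Google Drive",
      parameters: {},
      credentials: { googleDriveOAuth2Api: { name: "<your credential>" } },
    },
    {
      name: "Append row in sheet",
      parameters: { documentId: { value: "{YOUR_GOOGLE_SHEET_ID}" } },
      credentials: { googleSheetsOAuth2Api: { name: "<your credential>" } },
    },
    { name: "Code", parameters: {} },
  ],
};

const report = auditWorkflow(excerpt);
console.log(report);
```

To run it against the real file, the excerpt could be replaced with the parsed contents of the downloaded JSON.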

About this workflow

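A web form accepts the fax PDF and the workflow uploads it to a Google Drive folder. It then downloads the file, converts it to a base64 string, and sends it with an extraction prompt to Google Gemini through an HTTP Request node. A Code node strips markdown markers from the reply, and a Basic LLM Chain backed by the Google Gemini Chat Model and a Structured Output Parser turns the cleaned text into a fixed set of JSON fields. The final node appends those fields as a row in a Google Sheet. The template is aimed at healthcare administrators, medical billing teams, legal assistants, data entry professionals…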
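
The final step depends on the structured output keys lining up with the sheet's column headers. As a minimal sketch of that mapping, the table below copies the column-to-key pairs from the "Append row in sheet" node. The helper toSheetRow and the sample output are hypothetical. It also reports keys the model left out, which the template itself does not do.

```javascript
// Column header -> key in the parser output, as used by "Append row in sheet".
const COLUMN_TO_KEY = {
  "Patient ID": "PatientID",
  "SOC Date": "SOCDate",
  "Medical Record": "MedicalRecord",
  "Provider No.": "ProviderNo",
  "Patient Name": "PatientName",
  "Patient Address": "PatientAddress",
  "Date of Birth": "DateOfBirth",
  "Safety Measures": "SafetyMeasures",
  "Allergies": "Allergies",
  "Gender": "Gender",
  "Mental Status": "Mental Status", // the only key that keeps its space
  "Prognosis": "Prognosis",
  "Date Signed": "DateSigned",
};

// Hypothetical helper: builds one row keyed by column header and lists
// schema keys that are absent from the model output.
function toSheetRow(output) {
  const row = {};
  const missing = [];
  for (const [column, key] of Object.entries(COLUMN_TO_KEY)) {
    if (!(key in output)) missing.push(key);
    row[column] = output[key] ?? ""; // null or absent becomes an empty cell
  }
  return { row, missing };
}

// Illustrative output only, shaped like the Structured Output Parser example.
const { row, missing } = toSheetRow({
  PatientID: "ID - 123",
  PatientName: "Jane Doe",
  "Mental Status": "Agitated",
  Prognosis: null,
});
console.log(row["Patient ID"], row["Mental Status"], missing.length);
```

When changing the extracted fields, the same pairs need updating in three places: the chain prompt schema, the Structured Output Parser example, and the sheet column mapping.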

Source: https://n8n.io/workflows/8939/ (original creator credit).
