
PDF to Markdown Converter with LlamaCloud Parser

By Patrick Campbell (@therealpjc014) on n8n.io

How it works: This workflow extracts structured content from complex PDFs using LlamaCloud's parsing engine and returns it as clean Markdown.

Category: Web Scraping · Trigger: Manual · Nodes: 12 · Complexity: ★★★☆☆ · Integrations: HTTP Request, Google Drive

This workflow corresponds to n8n.io template #11811 — we link there as the canonical source.

This workflow follows the Google Drive → HTTP Request recipe pattern — see all workflows that pair these two integrations.

The workflow JSON

Copy the full n8n JSON below, paste it into a new n8n workflow, add your credentials, and activate it.

{
  "id": "vbQQcRqfFKBOs4ug",
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "name": "PDF Parse Using LlamaCloud",
  "tags": [],
  "nodes": [
    {
      "id": "5bd2a596-f055-46aa-ae21-742d9e57ca79",
      "name": "Workflow Info & Setup",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -3472,
        192
      ],
      "parameters": {
        "color": 4,
        "width": 560,
        "height": 1328,
        "content": "## \ud83d\udcc4 PDF Parse with LlamaCloud\n\n### How it works\n\nThis workflow extracts and converts PDF content into clean markdown format using LlamaCloud's parsing API:\n\n1. **Download PDF** \u2013 Retrieves a PDF file from Google Drive\n2. **Upload to LlamaCloud** \u2013 Sends the PDF to LlamaCloud's parsing service and receives a job ID\n3. **Wait & Poll Status** \u2013 Waits 1 second, then checks if parsing is complete\n4. **Loop Until Complete** \u2013 If still processing, waits 30 seconds and checks again\n5. **Retrieve Markdown** \u2013 Once complete, fetches the parsed content in markdown format\n\n### Key Features\n- Handles complex PDFs with tables, images, and multi-column layouts\n- Returns clean, structured markdown output\n- Automatic retry logic for long processing times\n- Ready for AI processing or content transformation\n\n---\n\n### Set Up Steps (~5 minutes)\n\n**1. Get LlamaCloud API Key**\n- Go to [cloud.llamaindex.ai](https://cloud.llamaindex.ai)\n- Sign up or log in to your account\n- Navigate to **API Keys** section\n- Create a new API key and copy it\n\n**2. Configure LlamaCloud Credentials in n8n**\n- In n8n, create a **Generic Header Auth** credential\n- Set the credential name (e.g., \"LlamaCloud API\")\n- Configure:\n  - **Name**: `Authorization`\n  - **Value**: `Bearer YOUR_TOKEN_HERE`\n- Apply this credential to all HTTP Request nodes that call LlamaCloud\n\n**3. Set Up Google Drive (Optional)**\n- If using the Google Drive node:\n  - Create OAuth2 credentials in n8n\n  - Connect your Google account\n  - Update the File ID to point to your PDF\n- **Alternative**: Replace the Google Drive node with any file input method\n\n**4. Test the Workflow**\n- Click \"Execute Workflow\" to test\n- The parsed markdown will appear in the final \"Get Data\" node\n- Processing time varies based on PDF complexity (typically 30-60 seconds)\n\n---\n\n### Notes\n- Large or complex PDFs may take 1-2 minutes to process\n- The workflow automatically retries every 30 seconds until complete\n- Output is in markdown format, perfect for AI processing\n- You can connect AI nodes after \"Get Data\" to analyze or transform the content"
      },
      "typeVersion": 1
    },
    {
      "id": "8697a2ba-d399-464c-8542-695dacf74aba",
      "name": "Step 1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -2896,
        336
      ],
      "parameters": {
        "color": 5,
        "width": 320,
        "height": 280,
        "content": "## Step 1: Source PDF\n\nDownload your PDF from Google Drive.\n\n**Alternative sources:**\n- HTTP Request node (download from URL)\n- Binary File node (local upload)\n- Webhook (receive via API)\n\nThe PDF is passed as binary data to the next step."
      },
      "typeVersion": 1
    },
    {
      "id": "f70b7b43-8b78-435f-bba4-688fd1b099d2",
      "name": "Step 2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -2560,
        304
      ],
      "parameters": {
        "color": 6,
        "width": 320,
        "height": 280,
        "content": "## Step 2: Upload to LlamaCloud\n\nSends the PDF binary data to LlamaCloud's parsing API.\n\n**Returns:** Job ID for tracking\n\n**Credential needed:** Header Auth credential whose `Authorization` value is `Bearer YOUR_TOKEN_HERE` (your LlamaCloud API key)"
      },
      "typeVersion": 1
    },
    {
      "id": "b7c3a9eb-7e0d-4bfb-a2ea-9b6388c742a6",
      "name": "Step 3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -2192,
        288
      ],
      "parameters": {
        "color": 7,
        "width": 480,
        "height": 280,
        "content": "## Step 3: Wait & Check\n\nWaits 1 second, then checks the parsing job status.\n\nIf **SUCCESS** \u2192 retrieves markdown\nIf **PENDING** \u2192 waits 30s and checks again\n\nThis loop continues until parsing completes."
      },
      "typeVersion": 1
    },
    {
      "id": "4e9a7d2c-16b0-4d80-b824-b57c6a2a9caa",
      "name": "Step 4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -1648,
        176
      ],
      "parameters": {
        "color": 3,
        "width": 320,
        "height": 344,
        "content": "## Step 4: Get Markdown\n\nOnce parsing is complete, this retrieves the final markdown output.\n\n**Output:** Clean markdown text with:\n- Extracted text content\n- Table structures preserved\n- Image references\n- Proper formatting\n\nReady for AI analysis or further processing!"
      },
      "typeVersion": 1
    },
    {
      "id": "311c4821-1754-4bca-907a-90af8bdfdc31",
      "name": "Wait1",
      "type": "n8n-nodes-base.wait",
      "position": [
        -2288,
        672
      ],
      "parameters": {
        "amount": 1
      },
      "typeVersion": 1.1
    },
    {
      "id": "3323aec4-86f8-4d83-8a15-01cbecf65824",
      "name": "Check Status1",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        -2064,
        672
      ],
      "parameters": {
        "url": "=https://api.cloud.llamaindex.ai/api/v1/parsing/job/{{ $json.id }}",
        "options": {},
        "sendHeaders": true,
        "authentication": "genericCredentialType",
        "genericAuthType": "httpHeaderAuth",
        "headerParameters": {
          "parameters": [
            {
              "name": "accept",
              "value": "application/json"
            }
          ]
        }
      },
      "credentials": {
        "httpHeaderAuth": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "b104278d-c061-4bbb-a787-8f10cbeecac8",
      "name": "Send Data To Llama Cloud1",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        -2512,
        672
      ],
      "parameters": {
        "url": "https://api.cloud.llamaindex.ai/api/v1/parsing/upload",
        "method": "POST",
        "options": {},
        "sendBody": true,
        "contentType": "multipart-form-data",
        "sendHeaders": true,
        "authentication": "genericCredentialType",
        "bodyParameters": {
          "parameters": [
            {
              "name": "file",
              "parameterType": "formBinaryData",
              "inputDataFieldName": "data"
            }
          ]
        },
        "genericAuthType": "httpHeaderAuth",
        "headerParameters": {
          "parameters": [
            {
              "name": "accept",
              "value": "application/json"
            }
          ]
        }
      },
      "credentials": {
        "httpHeaderAuth": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "e7daf635-0e3c-4003-9c67-d193e89a10f3",
      "name": "Download File From Drive1",
      "type": "n8n-nodes-base.googleDrive",
      "position": [
        -2736,
        672
      ],
      "parameters": {
        "fileId": {
          "__rl": true,
          "mode": "list",
          "value": "1TPAxQq0fXVrgr7VbP7_RVV33v6r0sYk9",
          "cachedResultUrl": "https://drive.google.com/file/d/1TPAxQq0fXVrgr7VbP7_RVV33v6r0sYk9/view?usp=drivesdk",
          "cachedResultName": "lama_parse_example.pdf"
        },
        "options": {},
        "operation": "download"
      },
      "credentials": {
        "googleDriveOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 3
    },
    {
      "id": "f4b50e47-7c77-4b65-8365-5449e3997552",
      "name": "Check Job Status1",
      "type": "n8n-nodes-base.if",
      "position": [
        -1840,
        608
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "fd51e991-57cc-490b-b0cb-cc4d3d9de54d",
              "operator": {
                "name": "filter.operator.equals",
                "type": "string",
                "operation": "equals"
              },
              "leftValue": "={{ $json.status }}",
              "rightValue": "SUCCESS"
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "f2a852cc-36d7-4c9a-bad8-67d776b16cb3",
      "name": "Wait3",
      "type": "n8n-nodes-base.wait",
      "position": [
        -1616,
        768
      ],
      "parameters": {
        "amount": 30
      },
      "typeVersion": 1.1
    },
    {
      "id": "a4ef35f2-c1db-4c7e-8146-d89fd026f3c8",
      "name": "Get Data1",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        -1616,
        560
      ],
      "parameters": {
        "url": "=https://api.cloud.llamaindex.ai/api/v1/parsing/job/{{ $json.id }}/result/markdown",
        "options": {},
        "sendHeaders": true,
        "authentication": "genericCredentialType",
        "genericAuthType": "httpHeaderAuth",
        "headerParameters": {
          "parameters": [
            {
              "name": "accept",
              "value": "application/json"
            }
          ]
        }
      },
      "credentials": {
        "httpHeaderAuth": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.2
    }
  ],
  "active": false,
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "4598139a-5320-413d-981c-22b93d9982cc",
  "connections": {
    "Wait1": {
      "main": [
        [
          {
            "node": "Check Status1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Wait3": {
      "main": [
        [
          {
            "node": "Check Status1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Check Status1": {
      "main": [
        [
          {
            "node": "Check Job Status1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Check Job Status1": {
      "main": [
        [
          {
            "node": "Get Data1",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Wait3",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Download File From Drive1": {
      "main": [
        [
          {
            "node": "Send Data To Llama Cloud1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Send Data To Llama Cloud1": {
      "main": [
        [
          {
            "node": "Wait1",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
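
The polling loop wired up in the connections above (Wait1 → Check Status1 → Check Job Status1 → Get Data1, with Wait3 looping back to Check Status1) can be sketched outside n8n as follows. This is a minimal Python sketch for illustration, not the author's code. The URLs and the 1 second / 30 second waits are taken from the workflow; the names `poll_until_done` and `fetch_status`, the `max_attempts` cap, and the handling of an `ERROR` status are assumptions. The workflow itself only tests for `SUCCESS` and otherwise keeps retrying.

```python
import time
from typing import Callable, Dict

# Base URL copied from the workflow's HTTP Request nodes.
BASE_URL = "https://api.cloud.llamaindex.ai/api/v1/parsing"


def status_url(job_id: str) -> str:
    """URL polled by the 'Check Status1' node."""
    return f"{BASE_URL}/job/{job_id}"


def markdown_url(job_id: str) -> str:
    """URL fetched by the 'Get Data1' node once the job succeeds."""
    return f"{BASE_URL}/job/{job_id}/result/markdown"


def poll_until_done(
    job_id: str,
    fetch_status: Callable[[str], Dict],  # hypothetical: GETs a URL, returns parsed JSON
    initial_wait: float = 1.0,            # Wait1: 1 second
    retry_wait: float = 30.0,             # Wait3: 30 seconds
    max_attempts: int = 20,               # assumption: the workflow has no upper bound
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return the markdown result URL once the job reports SUCCESS."""
    sleep(initial_wait)
    for _ in range(max_attempts):
        status = fetch_status(status_url(job_id)).get("status")
        if status == "SUCCESS":
            return markdown_url(job_id)
        if status == "ERROR":  # assumption: the workflow does not check for failures
            raise RuntimeError(f"parsing job {job_id} failed")
        sleep(retry_wait)
    raise TimeoutError(f"parsing job {job_id} not finished after {max_attempts} checks")
```

Passing `fetch_status` and `sleep` in as parameters keeps the loop testable without network access. Inside n8n, a comparable safeguard would be an extra branch or retry counter so that a failed job does not loop indefinitely.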

Credentials you'll need

Each integration node will prompt for credentials when you import. We strip credential IDs before publishing — you'll add your own.
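
To see in advance which nodes will ask for credentials, the `credentials` keys in the workflow JSON can be listed with a few lines of Python. This is a rough sketch rather than an n8n feature: `credential_slots` is a hypothetical helper, and `sample` is a trimmed-down excerpt of the workflow above used only for illustration.

```python
import json
from typing import Dict, List, Tuple


def credential_slots(workflow: Dict) -> List[Tuple[str, str]]:
    """Return (node name, credential type) pairs for nodes that declare credentials."""
    slots = []
    for node in workflow.get("nodes", []):
        # Sticky notes and Wait/If nodes have no "credentials" key at all.
        for cred_type in node.get("credentials", {}):
            slots.append((node["name"], cred_type))
    return slots


# Trimmed-down excerpt of the workflow JSON, for illustration only.
sample = json.loads("""
{
  "nodes": [
    {"name": "Step 1", "type": "n8n-nodes-base.stickyNote"},
    {"name": "Check Status1", "type": "n8n-nodes-base.httpRequest",
     "credentials": {"httpHeaderAuth": {"name": "<your credential>"}}},
    {"name": "Download File From Drive1", "type": "n8n-nodes-base.googleDrive",
     "credentials": {"googleDriveOAuth2Api": {"name": "<your credential>"}}}
  ]
}
""")

for name, cred_type in credential_slots(sample):
    print(f"{name}: {cred_type}")
```

In the full workflow, the three LlamaCloud HTTP Request nodes (Send Data To Llama Cloud1, Check Status1, Get Data1) use `httpHeaderAuth`, and Download File From Drive1 uses `googleDriveOAuth2Api`.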


Source: https://n8n.io/workflows/11811/ (original creator credit)
