
Caas - Async Document to Markdown with Polling

CAAS - Async Document to Markdown with Polling downloads a document, submits it to a CAAS server for asynchronous conversion to Markdown, and then polls the conversion task for the result.

Category: Web Scraping · Trigger: Cron / scheduled · Nodes: 8 · Complexity: ★★★★☆


The workflow JSON

Copy the n8n JSON below and paste it into a new n8n workflow. No credentials are needed: set the CAAS_URL environment variable (and optionally PDF_URL), then activate the workflow.

{
  "name": "CAAS - Async Document to Markdown with Polling",
  "nodes": [
    {
      "id": "6a1b2c3d-0001",
      "name": "Schedule Trigger",
      "type": "n8n-nodes-base.scheduleTrigger",
      "typeVersion": 1.2,
      "position": [
        250,
        300
      ],
      "parameters": {
        "rule": {
          "interval": [
            {
              "field": "minutes",
              "minutesInterval": 1
            }
          ]
        }
      }
    },
    {
      "parameters": {
        "url": "={{ $env.PDF_URL || 'https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf' }}",
        "options": {
          "response": {
            "response": {
              "responseFormat": "file"
            }
          }
        }
      },
      "id": "6a1b2c3d-0002",
      "name": "Download File",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.2,
      "position": [
        450,
        300
      ]
    },
    {
      "parameters": {
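        "method": "POST",
        "url": "={{ $env.CAAS_URL || 'http://localhost:8000' }}/convert?async=true",
        "sendBody": true,
        "contentType": "multipart-form-data",
        "bodyParameters": {
          "parameters": [
            {
              "parameterType": "formBinaryData",
              "name": "file",
              "inputDataFieldName": "data"
            }
          ]
        },
        "options": {
          "timeout": 30000
        }
      },
      "id": "6a1b2c3d-0003",
      "name": "Submit to CAAS",
      "type": "n8n-nodes-base.httpRequest",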
      "typeVersion": 4.2,
      "position": [
        650,
        300
      ]
    },
    {
      "parameters": {
        "method": "GET",
        "url": "={{ $env.CAAS_URL || 'http://localhost:8000' }}/task/{{ $json.task_id }}",
        "options": {
          "timeout": 30000
        }
      },
      "id": "6a1b2c3d-0004",
      "name": "Poll Task Status",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.2,
      "position": [
        850,
        300
      ]
    },
    {
      "parameters": {
        "assignments": {
          "assignments": [
            {
              "id": "6a1b2c3d-0005-a",
              "name": "task_id",
              "value": "={{ $('Submit to CAAS').first().json.task_id }}",
              "type": "string"
            }
          ]
        },
        "options": {}
      },
      "id": "6a1b2c3d-0005",
      "name": "Extract Task ID",
      "type": "n8n-nodes-base.set",
      "typeVersion": 3.4,
      "position": [
        750,
        300
      ]
    },
    {
      "parameters": {
        "conditions": {
          "string": [
            {
              "value1": "={{ $json.status }}",
              "operation": "equal",
              "value2": "Completed"
            }
          ]
        }
      },
      "id": "6a1b2c3d-0006",
      "name": "Task Complete?",
      "type": "n8n-nodes-base.if",
      "typeVersion": 1,
      "position": [
        1050,
        300
      ]
    },
    {
      "parameters": {
        "content": "## Async Conversion Workflow\n\nThis workflow demonstrates asynchronous conversion via CAAS:\n\n### Supported Formats:\nPDF, DOCX, ODT, ODP, ODS, HTML, XLSX, PPTX\n\n### Steps:\n1. **Download File**: Downloads a file from a URL\n2. **Submit to CAAS**: POST /convert?async=true \u2192 returns a task_id\n3. **Extract Task ID**: Extracts the task_id from the response\n4. **Poll Task Status**: GET /task/{task_id} to check progress\n5. **Task Complete?**: If the status is \"Completed\", passes the result on\n6. **Extract Markdown**: Maps result.markdown, result.filename and result.pages to output fields\n\n### Required Environment:\n- `CAAS_URL`: CAAS API URL (e.g., http://localhost:8000)\n- `PDF_URL`: URL of the file to convert (optional)",
        "height": 360,
        "width": 420
      },
      "id": "6a1b2c3d-0007",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "typeVersion": 1,
      "position": [
        250,
        80
      ]
    },
    {
      "parameters": {
        "assignments": {
          "assignments": [
            {
              "id": "6a1b2c3d-0008-a",
              "name": "markdown",
              "value": "={{ $json.result.markdown }}",
              "type": "string"
            },
            {
              "id": "6a1b2c3d-0008-b",
              "name": "original_filename",
              "value": "={{ $json.result.filename }}",
              "type": "string"
            },
            {
              "id": "6a1b2c3d-0008-c",
              "name": "page_count",
              "value": "={{ $json.result.pages }}",
              "type": "number"
            }
          ]
        },
        "options": {}
      },
      "id": "6a1b2c3d-0008",
      "name": "Extract Markdown",
      "type": "n8n-nodes-base.set",
      "typeVersion": 3.4,
      "position": [
        1250,
        300
      ]
    }
  ],
  "connections": {
    "Schedule Trigger": {
      "main": [
        [
          {
            "node": "Download File",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Download File": {
      "main": [
        [
          {
            "node": "Submit to CAAS",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Submit to CAAS": {
      "main": [
        [
          {
            "node": "Extract Task ID",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract Task ID": {
      "main": [
        [
          {
            "node": "Poll Task Status",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Poll Task Status": {
      "main": [
        [
          {
            "node": "Task Complete?",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Task Complete?": {
      "main": [
        [
          {
            "node": "Extract Markdown",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  },
  "active": false,
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "1",
  "id": "caas-workflow-async",
  "tags": []
}
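As exported, the workflow checks the task status once per run: the false output of Task Complete? is not connected, so a task that is still running is dropped, and the next scheduled run downloads and submits the file again. The submit-then-poll protocol itself can be sketched outside n8n with the Python standard library. The endpoint paths, the multipart field name file, task_id, the "Completed" status and the result.markdown / result.filename / result.pages fields are taken from the workflow JSON. The "Failed" status, the 5-second poll interval and the 300-second timeout are assumptions, not documented CAAS behavior.

```python
import json
import time
import urllib.request
import uuid
from typing import Callable

# Mirrors the workflow's fallback for $env.CAAS_URL (assumption: a local CAAS server).
CAAS_URL = "http://localhost:8000"


def encode_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Build a multipart/form-data body with one file part, as 'Submit to CAAS' sends."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def submit(data: bytes, filename: str, base_url: str = CAAS_URL) -> str:
    """POST /convert?async=true and return the task_id from the JSON response."""
    body, content_type = encode_multipart("file", filename, data)
    req = urllib.request.Request(
        f"{base_url}/convert?async=true",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["task_id"]


def get_task(task_id: str, base_url: str = CAAS_URL) -> dict:
    """GET /task/{task_id}, the request made by 'Poll Task Status'."""
    with urllib.request.urlopen(f"{base_url}/task/{task_id}", timeout=30) as resp:
        return json.load(resp)


def poll_until_complete(
    fetch: Callable[[], dict],
    interval: float = 5.0,  # assumption: not specified by the workflow
    timeout: float = 300.0,  # assumption: not specified by the workflow
    sleep: Callable[[float], None] = time.sleep,
    failed_statuses: frozenset = frozenset({"Failed"}),  # assumption: hypothetical status name
) -> dict:
    """Call fetch() until status == "Completed" (the 'Task Complete?' test)."""
    waited = 0.0
    while True:
        task = fetch()
        status = task.get("status")
        if status == "Completed":
            return task
        if status in failed_statuses:
            raise RuntimeError(f"conversion failed: {task}")
        if waited >= timeout:
            raise TimeoutError(f"not complete after {waited:.0f}s (last status: {status})")
        sleep(interval)
        waited += interval


def extract_markdown(task: dict) -> dict:
    """Same field mapping as the 'Extract Markdown' node."""
    result = task["result"]
    return {
        "markdown": result["markdown"],
        "original_filename": result["filename"],
        "page_count": result["pages"],
    }
```

Usage, assuming a CAAS server is reachable at CAAS_URL: task_id = submit(pdf_bytes, "dummy.pdf"), then task = poll_until_complete(lambda: get_task(task_id)) and extract_markdown(task)["markdown"]. Passing fetch and sleep as callables keeps the polling logic testable without a server. Inside n8n, the equivalent is to route the false branch of Task Complete? through a Wait node back into Poll Task Status.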

About this workflow


Source: https://github.com/digifac/caas/blob/f2dede396d2086a32e619505a4bedf6b9aa1baf0/examples/n8n/caas-workflow-async.json (original creator credit).


Related workflows

Workflows that share integrations, category, or trigger type with this one. All free to copy and import.

Web Scraping

This template runs two scheduled workflows to govern Microsoft Entra ID (Azure AD) guest accounts by detecting stale users via Microsoft Graph, staging deletions in SharePoint with a 72-hour window, n

Microsoft SharePoint, Microsoft Teams, Microsoft Entra +1
Web Scraping

Spotify-Sync-Surrealdb-V1. Uses httpRequest, n8n-nodes-surrealdb, spotify. Scheduled trigger; 62 nodes.

HTTP Request, N8N Nodes Surrealdb, Spotify
Web Scraping

As n8n instances scale, teams often lose track of sub-workflows—who uses them, where they are referenced, and whether they can be safely updated. This leads to inefficiencies like unnecessary copies o

HTTP Request, n8n, N8N Trigger +1
Web Scraping

This workflow is an improvement of this workflow by Greg Brzezinka.

HTTP Request, Email Send, XML +1
Web Scraping

N8N-Workflow-Github-Manager. Uses github, httpRequest, n8n. Scheduled trigger; 38 nodes.

GitHub, HTTP Request, n8n