PDF Upload to Embeddings Workflow

Original n8n title: WF2 - Upload Manual | JurisAI

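WF2 - Upload Manual | JurisAI. Receives a document through a webhook, checks the caller's JWT and role, extracts and chunks the text, generates OpenAI embeddings, upserts them into Qdrant, registers the document in Supabase with status pendente_validacao, and emails a curator. Uses httpRequest, emailSend. Webhook trigger; 15 nodes.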
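
Every request first passes through the Validar JWT e Role node, which decodes the bearer token, rejects expired tokens, and admits only the curador and admin roles. The sketch below reproduces those checks in plain JavaScript (Node.js 16+) so they can be tested outside n8n. The function names are hypothetical, and, like the node, it only decodes the payload without verifying the signature.

```javascript
// Standalone sketch of the checks in the "Validar JWT e Role" node.
// Function names are hypothetical. Like the node, this only DECODES the
// payload and does NOT verify the token signature.
const ALLOWED_ROLES = ["curador", "admin"];

function decodeJwtPayload(token) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("401:Invalid JWT");
  // base64url -> base64, then restore the padding that JWTs strip
  const b64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
  const padded = b64 + "=".repeat((4 - (b64.length % 4)) % 4);
  return JSON.parse(Buffer.from(padded, "base64").toString("utf8"));
}

function checkUploadToken(authHeader, nowSeconds = Math.floor(Date.now() / 1000)) {
  const token = (authHeader || "").replace(/^Bearer\s+/i, "").trim();
  if (!token) throw new Error("401:Missing token");
  const payload = decodeJwtPayload(token);
  if (payload.exp && payload.exp < nowSeconds) throw new Error("401:Token expired");
  // Same fallback order as the node: user_metadata.role, then role, then "estagiario"
  const role = payload.user_metadata?.role || payload.role || "estagiario";
  if (!ALLOWED_ROLES.includes(role)) throw new Error("403:Curators and admins only");
  return { userId: payload.sub, role };
}

// Test helper: builds an UNSIGNED token, for local experiments only
function fakeToken(payload) {
  const enc = (obj) => Buffer.from(JSON.stringify(obj)).toString("base64url");
  return `${enc({ alg: "none", typ: "JWT" })}.${enc(payload)}.unsigned`;
}
```

Because anyone can craft an unsigned token like the one fakeToken builds, verify the signature (or have your auth provider validate the token) before trusting these claims in production.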

Category: Web Scraping · Trigger: Webhook · Nodes: 15 · Complexity: ★★★★☆ · Integrations: HTTP Request, Email Send
This workflow pairs the HTTP Request and Email Send integrations: HTTP Request nodes call OpenAI, Qdrant, and Supabase, and an Email Send node then notifies the curator.
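
For each chunk, the Upsert no Qdrant node sends one point to PUT /collections/{collection}/points. The sketch below builds that request body from a chunk and an OpenAI /v1/embeddings response; the helper names are hypothetical. Qdrant accepts only unsigned integers or UUIDs as point IDs, so an ID such as <doc_id>_chunk_<n> is rejected; this sketch uses a random UUID and keeps document_id and chunk_index in the payload instead.

```javascript
// Sketch of the body sent by the "Upsert no Qdrant" node (helper names are hypothetical).
const { randomUUID } = require("crypto");

// OpenAI's /v1/embeddings response carries the vector at data[0].embedding
function embeddingFrom(openAiResponse) {
  const vector = openAiResponse?.data?.[0]?.embedding;
  if (!Array.isArray(vector) || vector.length === 0) throw new Error("No embedding in response");
  return vector;
}

function buildQdrantUpsert(chunk, embedding) {
  return {
    points: [
      {
        // Qdrant point IDs must be unsigned integers or UUIDs
        id: randomUUID(),
        vector: embedding,
        payload: {
          document_id: chunk.doc_id,
          chunk_index: chunk.chunk_index,
          text: chunk.text,
          doc_type: chunk.doc_type,
          legal_area: chunk.legal_area,
          territory: chunk.territory,
          uf: chunk.uf,
          issuer: chunk.issuer,
          published_at: chunk.published_at,
          status: "pendente_validacao", // stays pending until a curator validates it
        },
      },
    ],
  };
}
```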

The workflow JSON

Copy or download the full n8n JSON below, paste it into a new n8n workflow, add your credentials, and activate it.
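
Once activated, the workflow listens on POST /webhook/jurisai/upload and answers 202 on success. Below is a minimal client sketch for Node.js 18+ (global fetch, FormData, and Blob); the base URL, the helper names, and the optional filename field are assumptions. It sends the file in a field named file, as the workflow's sticky note lists, while the Extrair Texto do PDF node reads the binary property data, so check how your n8n version names uploaded files and align the two.

```javascript
// Hypothetical client for the activated upload webhook (Node.js 18+).
const UPLOAD_PATH = "/webhook/jurisai/upload";
const TEXT_FIELDS = ["doc_type", "issuer", "published_at", "territory", "uf", "source_url", "title"];

function buildUploadRequest(baseUrl, token, pdfBytes, meta) {
  const form = new FormData();
  // "file" follows the sticky note; the extraction node reads binary property "data"
  form.append("file", new Blob([pdfBytes], { type: "application/pdf" }), meta.filename || "document.pdf");
  for (const key of TEXT_FIELDS) {
    if (meta[key] != null) form.append(key, String(meta[key]));
  }
  // One entry per legal area, using the legal_area[] name from the sticky note
  for (const area of meta.legal_area || []) form.append("legal_area[]", area);
  return {
    url: baseUrl.replace(/\/+$/, "") + UPLOAD_PATH,
    init: { method: "POST", headers: { Authorization: `Bearer ${token}` }, body: form },
  };
}

async function uploadDocument(baseUrl, token, pdfBytes, meta) {
  const { url, init } = buildUploadRequest(baseUrl, token, pdfBytes, meta);
  const res = await fetch(url, init);
  const body = await res.json();
  // "Responder Sucesso" answers 202 with doc_id, status and chunks_processed
  if (res.status !== 202) throw new Error(`Upload failed (${res.status}): ${body.error ?? "unknown error"}`);
  return body;
}
```

doc_type, issuer, and published_at are the fields the validation node requires; the others are optional.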

{
  "name": "WF2 - Upload Manual | JurisAI",
  "nodes": [
    {
      "id": "sticky_wf2",
      "name": "\ud83d\udccb WF2 - Upload Manual",
      "type": "n8n-nodes-base.stickyNote",
      "typeVersion": 1,
      "position": [
        -220,
        -100
      ],
      "parameters": {
        "content": "## WF2 - Manual Document Upload\n\n**Endpoint:** `POST /webhook/jurisai/upload`  \n**Allowed roles:** `curador`, `admin`  \n**Content-Type:** `multipart/form-data` (PDF file) or JSON with the text in a `text` or `content` field\n\n**Expected fields:**\n- `file`: PDF file\n- `doc_type`: document type (required)\n- `legal_area[]`: legal areas\n- `territory`: territorial scope\n- `uf`: state (UF), if applicable\n- `issuer`: issuing body (required)\n- `published_at`: publication date (required)\n- `source_url`: URL of the original source\n- `title`: document title (optional)\n\n**Flow:**\n1. JWT + role check\n2. PDF text extraction (falls back to `text`/`content` in the body)\n3. Chunking (1500 chars, 200 overlap)\n4. OpenAI embeddings (`EMBEDDING_MODEL`, e.g. text-embedding-3-large)\n5. Upsert into Qdrant\n6. INSERT into Supabase (status=pendente_validacao)\n7. Notify the curator\n\n\u2699\ufe0f **After importing:** configure SMTP credentials for email.",
        "height": 360,
        "width": 440,
        "color": 5
      }
    },
    {
      "id": "webhook_wf2",
      "name": "Webhook Upload",
      "type": "n8n-nodes-base.webhook",
      "typeVersion": 2,
      "position": [
        250,
        300
      ],
      "parameters": {
        "path": "jurisai/upload",
        "httpMethod": "POST",
        "responseMode": "responseNode",
        "options": {
          "allowedOrigins": "*"
        }
      }
    },
    {
      "id": "validate_jwt_wf2",
      "name": "Validar JWT e Role",
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [
        500,
        300
      ],
      "parameters": {
        "jsCode": "// Decodes the JWT payload, checks expiry and role, then validates required fields.\n// NOTE: the token signature is NOT verified here; verify it before trusting these claims.\nconst webhookItem = $('Webhook Upload').first();\nconst authHeader = webhookItem.json.headers?.authorization || '';\nconst token = authHeader.replace(/^Bearer\\s+/i, '').trim();\n\nif (!token) throw new Error('401:Token de autentica\u00e7\u00e3o n\u00e3o fornecido');\n\ntry {\n  const parts = token.split('.');\n  if (parts.length !== 3) throw new Error('401:Token JWT inv\u00e1lido');\n  \n  // base64url -> base64, restoring the padding JWTs strip\n  const payloadB64 = parts[1].replace(/-/g, '+').replace(/_/g, '/');\n  const padded = payloadB64 + '='.repeat((4 - (payloadB64.length % 4)) % 4);\n  const payload = JSON.parse(Buffer.from(padded, 'base64').toString('utf8'));\n  \n  if (payload.exp && payload.exp < Math.floor(Date.now() / 1000)) {\n    throw new Error('401:Token expirado');\n  }\n  \n  const userId = payload.sub;\n  const role = payload.user_metadata?.role || payload.role || 'estagiario';\n  const allowedRoles = ['curador', 'admin'];\n  \n  if (!allowedRoles.includes(role)) {\n    throw new Error('403:Acesso restrito a curadores e administradores');\n  }\n  \n  const body = webhookItem.json.body || {};\n  \n  if (!body.doc_type) throw new Error('400:Campo doc_type \u00e9 obrigat\u00f3rio');\n  if (!body.issuer) throw new Error('400:Campo issuer \u00e9 obrigat\u00f3rio');\n  if (!body.published_at) throw new Error('400:Campo published_at \u00e9 obrigat\u00f3rio');\n  \n  const validDocTypes = ['lei', 'decreto', 'portaria', 'resolucao', 'instrucao_normativa', 'acordao', 'sumula', 'parecer_autoral', 'nota_tecnica', 'doutrina'];\n  if (!validDocTypes.includes(body.doc_type)) {\n    throw new Error('400:doc_type inv\u00e1lido. Valores aceitos: ' + validDocTypes.join(', '));\n  }\n\n  // user_id and role go last so body fields cannot override them;\n  // the uploaded file's binary data is passed on to the extraction node.\n  return [{\n    json: { ...body, user_id: userId, role },\n    ...(webhookItem.binary ? { binary: webhookItem.binary } : {})\n  }];\n} catch(e) {\n  const msg = e.message || 'Erro desconhecido';\n  if (msg.startsWith('400:') || msg.startsWith('401:') || msg.startsWith('403:')) {\n    throw new Error(msg);\n  }\n  throw new Error('401:Falha na valida\u00e7\u00e3o do token');\n}"
      }
    },
    {
      "id": "extract_text",
      "name": "Extrair Texto do PDF",
      "type": "n8n-nodes-base.extractFromFile",
      "typeVersion": 1,
      "position": [
        750,
        300
      ],
      "parameters": {
        "operation": "pdf",
        "binaryPropertyName": "data",
        "options": {}
      },
      "continueOnFail": true
    },
    {
      "id": "check_extraction",
      "name": "Extra\u00e7\u00e3o OK?",
      "type": "n8n-nodes-base.if",
      "typeVersion": 2,
      "position": [
        1000,
        300
      ],
      "parameters": {
        "conditions": {
          "options": {
            "caseSensitive": false,
            "leftValue": "",
            "typeValidation": "strict"
          },
          "conditions": [
            {
              "id": "cond_text",
              "leftValue": "={{ $json.text || $json.body?.text || '' }}",
              "rightValue": "",
              "operator": {
                "type": "string",
                "operation": "isNotEmpty"
              }
            }
          ],
          "combinator": "and"
        }
      }
    },
    {
      "id": "use_body_text",
      "name": "Usar Texto do Body",
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [
        1250,
        400
      ],
      "parameters": {
        "jsCode": "// Fallback: use the text sent in the JSON body\nconst body = $('Validar JWT e Role').first().json;\nconst text = body.text || body.content || '';\nif (!text || text.length < 50) {\n  throw new Error('400:Nenhum texto p\u00f4de ser extra\u00eddo. Envie um PDF v\u00e1lido ou inclua o campo text no body.');\n}\nreturn [{ json: { ...body, extractedText: text } }];"
      }
    },
    {
      "id": "chunk_text",
      "name": "Chunking do Texto",
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [
        1500,
        300
      ],
      "parameters": {
        "jsCode": "const meta = $('Validar JWT e Role').first().json;\n// This node runs once for all items, so read the incoming item through $input\nconst input = $input.first().json;\nconst text = input.text || input.extractedText || '';\n\nconst CHUNK_SIZE = 1500;\nconst OVERLAP = 200;\nconst chunks = [];\n\nlet start = 0;\nwhile (start < text.length) {\n  const end = Math.min(start + CHUNK_SIZE, text.length);\n  const chunk = text.slice(start, end).trim();\n  if (chunk.length > 100) {\n    chunks.push({\n      text: chunk,\n      chunk_index: chunks.length,\n      char_start: start,\n      char_end: end\n    });\n  }\n  // Stop once a window reaches the end, so no redundant tail chunk is emitted\n  if (end === text.length) break;\n  start += CHUNK_SIZE - OVERLAP;\n}\n\nif (chunks.length === 0) throw new Error('400:Texto muito curto para processamento');\n\n// Generate a unique ID for the document\nconst docId = `upload_${Date.now()}_${Math.random().toString(36).slice(2, 9)}`;\n\nreturn chunks.map(chunk => ({\n  json: {\n    doc_id: docId,\n    user_id: meta.user_id,\n    doc_type: meta.doc_type,\n    legal_area: Array.isArray(meta.legal_area) ? meta.legal_area : [meta.legal_area || 'administrativo'],\n    territory: meta.territory || 'federal',\n    uf: meta.uf || null,\n    issuer: meta.issuer,\n    published_at: meta.published_at,\n    source_url: meta.source_url || null,\n    title: meta.title || `Documento ${meta.doc_type} - ${meta.issuer}`,\n    total_chunks: chunks.length,\n    ...chunk\n  }\n}));"
      }
    },
    {
      "id": "embed_chunks",
      "name": "Gerar Embeddings (OpenAI)",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.2,
      "position": [
        1750,
        300
      ],
      "parameters": {
        "method": "POST",
        "url": "https://api.openai.com/v1/embeddings",
        "sendHeaders": true,
        "headerParameters": {
          "parameters": [
            {
              "name": "Authorization",
              "value": "=Bearer {{ $env.OPENAI_API_KEY }}"
            },
            {
              "name": "Content-Type",
              "value": "application/json"
            }
          ]
        },
        "sendBody": true,
        "contentType": "json",
        "specifyBody": "json",
        "jsonBody": "={{ JSON.stringify({ model: $env.EMBEDDING_MODEL, input: $json.text }) }}"
      }
    },
    {
      "id": "upsert_qdrant",
      "name": "Upsert no Qdrant",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.2,
      "position": [
        2000,
        300
      ],
      "parameters": {
        "method": "PUT",
        "url": "={{ $env.QDRANT_URL }}/collections/{{ $env.QDRANT_COLLECTION }}/points",
        "sendHeaders": true,
        "headerParameters": {
          "parameters": [
            {
              "name": "api-key",
              "value": "={{ $env.QDRANT_API_KEY }}"
            },
            {
              "name": "Content-Type",
              "value": "application/json"
            }
          ]
        },
        "sendBody": true,
        "contentType": "json",
        "specifyBody": "json",
        "jsonBody": "={{ (() => { const c = $('Chunking do Texto').item.json; const hex = () => Math.floor(Math.random() * 16).toString(16); const pointId = 'xxxxxxxx-xxxx-4xxx-8xxx-xxxxxxxxxxxx'.replace(/x/g, hex); return JSON.stringify({ points: [{ id: pointId, vector: $json.data[0].embedding, payload: { document_id: c.doc_id, chunk_index: c.chunk_index, text: c.text, doc_type: c.doc_type, legal_area: c.legal_area, territory: c.territory, uf: c.uf, issuer: c.issuer, published_at: c.published_at, status: 'pendente_validacao' } }] }); })() }}"
      }
    },
    {
      "id": "aggregate_chunks",
      "name": "Agregar Resultados",
      "type": "n8n-nodes-base.aggregate",
      "typeVersion": 1,
      "position": [
        2250,
        300
      ],
      "parameters": {
        "aggregate": "aggregateIndividualFields",
        "fieldsToAggregate": {
          "fieldToAggregate": [
            {
              "fieldToAggregate": "status",
              "renameField": false
            }
          ]
        },
        "options": {
          "mergeLists": false
        }
      }
    },
    {
      "id": "insert_supabase_doc",
      "name": "Registrar no Supabase",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.2,
      "position": [
        2500,
        300
      ],
      "parameters": {
        "method": "POST",
        "url": "={{ $env.SUPABASE_URL }}/rest/v1/documents",
        "sendHeaders": true,
        "headerParameters": {
          "parameters": [
            {
              "name": "apikey",
              "value": "={{ $env.SUPABASE_SERVICE_KEY }}"
            },
            {
              "name": "Authorization",
              "value": "=Bearer {{ $env.SUPABASE_SERVICE_KEY }}"
            },
            {
              "name": "Content-Type",
              "value": "application/json"
            },
            {
              "name": "Prefer",
              "value": "return=representation"
            }
          ]
        },
        "sendBody": true,
        "contentType": "json",
        "specifyBody": "json",
        "jsonBody": "={{ (() => { const meta = $('Validar JWT e Role').first().json; const chunkData = $('Chunking do Texto').first().json; return JSON.stringify({ id: chunkData.doc_id, title: chunkData.title, doc_type: chunkData.doc_type, legal_area: chunkData.legal_area, territory: chunkData.territory, uf: chunkData.uf, issuer: chunkData.issuer, published_at: chunkData.published_at, source_url: chunkData.source_url, status: 'pendente_validacao', uploaded_by: meta.user_id, ingested_at: new Date().toISOString() }); })() }}"
      },
      "continueOnFail": true
    },
    {
      "id": "notify_upload",
      "name": "Notificar Curador (Upload)",
      "type": "n8n-nodes-base.emailSend",
      "typeVersion": 2.1,
      "position": [
        2750,
        300
      ],
      "parameters": {
        "fromEmail": "=jurisai@{{ $env.ORG_NAME || 'jurisai.app' }}",
        "toEmail": "={{ $env.CURATOR_EMAIL || 'curador@suaorganizacao.com' }}",
        "subject": "[JurisAI] Novo documento aguarda valida\u00e7\u00e3o",
        "emailFormat": "text",
        "text": "=Novo Documento \u2014 Pendente de Valida\u00e7\u00e3o\n\nT\u00edtulo: {{ $('Chunking do Texto').first().json.title }}\nTipo: {{ $('Chunking do Texto').first().json.doc_type }}\nEmissor: {{ $('Chunking do Texto').first().json.issuer }}\nPublicado em: {{ $('Chunking do Texto').first().json.published_at }}\nEnviado por: {{ $('Validar JWT e Role').first().json.user_id }}\n\nAcesse o painel para revisar e aprovar.\n\n---\nJurisAI - {{ $env.ORG_NAME }}"
      },
      "credentials": {
        "smtp": {
          "name": "<your credential>"
        }
      },
      "continueOnFail": true
    },
    {
      "id": "respond_success_wf2",
      "name": "Responder Sucesso",
      "type": "n8n-nodes-base.respondToWebhook",
      "typeVersion": 1,
      "position": [
        3000,
        300
      ],
      "parameters": {
        "respondWith": "json",
        "responseBody": "={{ JSON.stringify({ success: true, message: 'Documento enviado com sucesso e aguarda valida\u00e7\u00e3o', doc_id: $('Chunking do Texto').first().json.doc_id, status: 'pendente_validacao', chunks_processed: $('Chunking do Texto').all().length }) }}",
        "options": {
          "responseCode": 202
        }
      }
    },
    {
      "id": "error_handler_wf2",
      "name": "Tratar Erro",
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [
        750,
        500
      ],
      "parameters": {
        "jsCode": "// Map errors thrown as 'NNN:message' to an HTTP status code and message.\nconst raw = $input.first().json.error;\nconst error = (typeof raw === 'string' ? raw : raw?.message) || 'Erro interno';\nconst match = error.match(/^(\\d{3}):(.*)/);\nconst code = match ? parseInt(match[1], 10) : 500;\nconst msg = match ? match[2] : error;\nreturn [{ json: { statusCode: code, message: msg } }];"
      }
    },
    {
      "id": "respond_error_wf2",
      "name": "Responder Erro",
      "type": "n8n-nodes-base.respondToWebhook",
      "typeVersion": 1,
      "position": [
        1000,
        500
      ],
      "parameters": {
        "respondWith": "json",
        "responseBody": "={{ JSON.stringify({ success: false, error: $json.message }) }}",
        "options": {
          "responseCode": "={{ $json.statusCode }}"
        }
      }
    }
  ],
  "connections": {
    "Webhook Upload": {
      "main": [
        [
          {
            "node": "Validar JWT e Role",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Validar JWT e Role": {
      "main": [
        [
          {
            "node": "Extrair Texto do PDF",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extrair Texto do PDF": {
      "main": [
        [
          {
            "node": "Extra\u00e7\u00e3o OK?",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extra\u00e7\u00e3o OK?": {
      "main": [
        [
          {
            "node": "Chunking do Texto",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Usar Texto do Body",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Usar Texto do Body": {
      "main": [
        [
          {
            "node": "Chunking do Texto",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Chunking do Texto": {
      "main": [
        [
          {
            "node": "Gerar Embeddings (OpenAI)",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Gerar Embeddings (OpenAI)": {
      "main": [
        [
          {
            "node": "Upsert no Qdrant",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Upsert no Qdrant": {
      "main": [
        [
          {
            "node": "Agregar Resultados",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Agregar Resultados": {
      "main": [
        [
          {
            "node": "Registrar no Supabase",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Registrar no Supabase": {
      "main": [
        [
          {
            "node": "Notificar Curador (Upload)",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Notificar Curador (Upload)": {
      "main": [
        [
          {
            "node": "Responder Sucesso",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Tratar Erro": {
      "main": [
        [
          {
            "node": "Responder Erro",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  },
  "active": false,
  "settings": {
    "executionOrder": "v1",
    "saveManualExecutions": false,
    "errorWorkflow": ""
  },
  "meta": {
    "templateCredsSetupCompleted": false
  },
  "tags": [
    {
      "id": "jurisai",
      "name": "JurisAI"
    },
    {
      "id": "upload",
      "name": "upload-manual"
    }
  ]
}
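
The Chunking do Texto node cuts the extracted text into fixed windows of 1500 characters that overlap by 200, dropping trimmed pieces of 100 characters or fewer. The sketch below reproduces that sliding window as a standalone function (the name chunkText is hypothetical). It stops as soon as a window reaches the end of the text, so it never emits a trailing chunk that lies entirely inside the previous one.

```javascript
// Sketch of the fixed-size sliding window used by the "Chunking do Texto" node.
const CHUNK_SIZE = 1500; // characters per chunk
const OVERLAP = 200;     // characters shared by consecutive chunks
const MIN_CHUNK = 100;   // trimmed chunks of this length or shorter are dropped

function chunkText(text, size = CHUNK_SIZE, overlap = OVERLAP) {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    const end = Math.min(start + size, text.length);
    const chunk = text.slice(start, end).trim();
    if (chunk.length > MIN_CHUNK) {
      chunks.push({ text: chunk, chunk_index: chunks.length, char_start: start, char_end: end });
    }
    if (end === text.length) break; // the last window already reached the end
  }
  return chunks;
}
```

For a 3000-character text this yields the windows [0, 1500), [1300, 2800), and [2600, 3000). The overlap keeps any passage of up to 200 characters that straddles a boundary whole in at least one chunk, at the cost of embedding roughly 15% more characters (1500 per 1300-character step).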

Credentials you'll need

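Only the Email Send node (Notificar Curador (Upload)) uses an n8n credential: an SMTP account, which you add yourself because credential IDs are stripped before publishing. The HTTP Request nodes read their keys and URLs from environment variables instead: OPENAI_API_KEY and EMBEDDING_MODEL (OpenAI), QDRANT_URL, QDRANT_COLLECTION and QDRANT_API_KEY (Qdrant), and SUPABASE_URL and SUPABASE_SERVICE_KEY (Supabase). The email node also reads CURATOR_EMAIL and ORG_NAME.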
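
Because these variables are read through $env at run time, a missing one only shows up as a failed request. A small pre-flight check can catch this earlier; the sketch below assumes you run it with Node.js in the same environment as n8n, and the missingEnv name is hypothetical. CURATOR_EMAIL and ORG_NAME are left out because the email node supplies fallbacks for the recipient and sender addresses.

```javascript
// Pre-flight check for the environment variables the workflow reads via $env.
// Variable and node names come from the workflow JSON; the helper is hypothetical.
const REQUIRED_ENV = {
  "Gerar Embeddings (OpenAI)": ["OPENAI_API_KEY", "EMBEDDING_MODEL"],
  "Upsert no Qdrant": ["QDRANT_URL", "QDRANT_COLLECTION", "QDRANT_API_KEY"],
  "Registrar no Supabase": ["SUPABASE_URL", "SUPABASE_SERVICE_KEY"],
};

// Returns "NAME (node)" for every required variable that is unset or empty.
function missingEnv(env = process.env) {
  const missing = [];
  for (const [node, names] of Object.entries(REQUIRED_ENV)) {
    for (const name of names) {
      if (!env[name]) missing.push(`${name} (${node})`);
    }
  }
  return missing;
}
```

An empty array from missingEnv() means every required variable is set.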

About this workflow

Source: https://github.com/JeffersonMFti/JurisAI/blob/f883fa2f162a02b1a2aa83da7d6d5f8fed2964b3/workflows/WF2_upload_manual.json (original creator credit).

Related workflows

Workflows that share integrations, category, or trigger type with this one; all are free to copy and import.

Web Scraping · HTTP Request, Email Send
Seminar demo workflow (세미나 데모 용 워크플로우). Uses httpRequest, emailSend. Webhook trigger; 17 nodes.

Web Scraping · HTTP Request, Read Binary File, n8n-nodes-docxtemplater, Email Send
worklow_doc. Uses httpRequest, readBinaryFile, n8n-nodes-docxtemplater, emailSend. Webhook trigger; 15 nodes.

Web Scraping · HTTP Request, Email Send
Deliver personalized files instantly after PayPal transactions using n8n, without writing a single line of backend code.

Web Scraping · HTTP Request, Email Send
This workflow automates real-time student tracking using iOS Shortcuts and geolocation data, notifying both teachers and parents based on geofenced logic.

Web Scraping · Move Binary Data, HTTP Request, Email Send
This is the workflow that I presented at the April 9, 2021 n8n Meetup.