This tutorial covers error handling techniques in n8n workflows: try/catch-style error branches, error workflows, retry logic, dead-letter queues, alerting on failures, partial recovery, and idempotency. It is for technical users who already know the n8n basics and need concrete steps to implement these features in their automations.
Why this matters
In automation workflows, failures such as API downtime or invalid data can halt entire processes, causing data loss, missed notifications, or cascading errors that are slow to debug. Proper error handling in n8n limits the damage by isolating faults, retrying transient problems, and enabling recovery, so workflows stay reliable under real-world conditions.
Step-by-step
- Open your n8n instance and create a new workflow. Add a `Schedule Trigger` node to start the flow, then connect it to an `HTTP Request` node configured to fetch data from an API endpoint, such as `https://api.example.com/data` with method `GET`. This sets up a basic flow that is prone to network errors.
- To add basic error catching, first open the `HTTP Request` node's Settings tab and set `On Error` to continue instead of stopping the workflow (older versions label this `Continue On Fail`); by default a failed request throws and nothing downstream runs. Then insert a `Switch` node after the `HTTP Request` to branch on success or failure. In the `Switch` node, set Mode to `Rules`, add a `Boolean` rule where `{{ !!$json.error }}` equals `false` for the success path, and route errors to a separate branch. Expect the error branch to activate if the API returns a 4xx or 5xx status.
- Implement try/catch-style handling with n8n's error workflow feature, since n8n has no dedicated try/catch node. Create a separate workflow that starts with an `Error Trigger` node, then open the main workflow's settings (click the gear icon) and select that workflow in the `Error Workflow` field. The error workflow runs for production executions, not manual test runs, so enable `Save Manual Executions` if you also want test runs in the execution history. An `IF` node before the `HTTP Request` can guard against known-bad input, but it does not catch runtime failures.
- For retry logic, edit the `HTTP Request` node: in its Settings tab, enable `Retry On Fail`, set `Max Tries` to `3`, and set `Wait Between Tries` to `5000` ms (5 seconds). Test by pointing to a temporarily unavailable endpoint; the node should retry automatically before it reports a failure in the execution history.
- Set up a dead-letter queue for unrecoverable errors. After the error branch, add a `Set` node to prepare data with keys like `originalPayload` and `errorMessage` from `{{$json}}`, then connect it to a `Postgres` or `MongoDB` node that inserts into a dedicated error table. Use SQL like `INSERT INTO dead_letter_queue (payload, error, timestamp) VALUES ('{{JSON.stringify($json.originalPayload)}}', '{{$json.errorMessage}}', NOW());`. Values interpolated this way break on embedded quotes and are open to SQL injection, so prefer the node's query parameters where they are available. This queues failures for later review without blocking the workflow.
- Add alerting on failure: in the error branch, insert a `Send Email` or `Slack` node. For `Slack`, select your credential, set Channel to `#alerts`, and set Message to `Workflow failed: {{$json.errorMessage}} at {{$now}}`. Connect it only to the error path, then force an error to confirm the alert fires.
- Enable partial recovery with a `Merge` node after the parallel branches. The success path proceeds to a `Function` node for processing, while the error path logs via `Set` and merges back with Mode `Wait for All`. In the `Function` node, add logic like `if (items[0].json.error) { return [{json: {status: 'partial', recoveredData: items[0].json}}]; } return items;` to salvage what you can.
- Ensure idempotency: for nodes like `HTTP Request` that update resources, send a unique idempotency key in a request header, generated in a `Function` node with `return [{json: {key: $now + '-' + $execution.id}}];`. Servers that support idempotency keys (e.g., Stripe APIs, via the `Idempotency-Key` header) treat a repeated key as the same request, which prevents double charges on retries.
- Finally, test the full workflow: execute it manually, introduce an error (e.g., an invalid URL), and verify retries, alerting, queuing, and recovery in the executions panel. Adjust retry counts and wait times based on what the logs show.
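The dead-letter step above can be sketched in plain JavaScript, for example as logic inside a Code node that feeds the database node. This is a minimal sketch under assumptions: `toDeadLetterRecord` and `toInsertQuery` are hypothetical helper names, the `$1`-style placeholders follow PostgreSQL conventions, and the timestamp is passed as a value instead of calling `NOW()`. It shows how keeping values separate from the SQL text avoids the quoting problems of string interpolation.

```javascript
// Hypothetical helper: shapes a failed item into a dead-letter record.
function toDeadLetterRecord(item, errorMessage, now = new Date()) {
  return {
    originalPayload: item.json,
    errorMessage: String(errorMessage),
    timestamp: now.toISOString(),
  };
}

// Hypothetical helper: builds a parameterised INSERT for the dead_letter_queue table.
// The values travel separately from the SQL text, so quotes in the payload are harmless.
function toInsertQuery(record) {
  return {
    text: 'INSERT INTO dead_letter_queue (payload, error, timestamp) VALUES ($1, $2, $3)',
    values: [
      JSON.stringify(record.originalPayload),
      record.errorMessage,
      record.timestamp,
    ],
  };
}

// Example: a payload containing a single quote, which would break naive interpolation.
const failed = { json: { id: 42, name: "O'Brien" } };
const record = toDeadLetterRecord(
  failed,
  'HTTP 503 from upstream',
  new Date('2024-01-01T00:00:00Z')
);
const query = toInsertQuery(record);
console.log(query.text);
console.log(query.values);
```

Whether your database node accepts the query text and values in this exact shape depends on the node and its version, so treat the object layout as illustrative.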
Worked example
Consider a common pattern: syncing customer data from a CRM like HubSpot to a database, which can fail due to rate limits or invalid records. The workflow starts with a Schedule Trigger running daily, connected to a HubSpot node set to Get All contacts with a limit of 100. This feeds into a Loop Over Items node to process each contact individually.
Inside the loop, a Function node validates the data (e.g., checks the email format with `if (!item.json.email || !item.json.email.includes('@')) throw new Error('Invalid email');`), followed by an HTTP Request to your database API that upserts the contact. The request carries an idempotency header, `X-Idempotency-Key: {{ $json.contactId + '-' + $execution.id }}`, and has retry enabled for 3 attempts so that transient failures such as 429 rate-limit responses get another try.
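The validation and key derivation in this step can be sketched outside n8n as follows. This is an illustration only: `validateContact` and `idempotencyKey` are hypothetical names, the sample contacts are invented, and `executionId` stands in for n8n's `$execution.id`.

```javascript
// Hypothetical validator mirroring the email check in the worked example.
function validateContact(contact) {
  if (typeof contact.email !== 'string' || !contact.email.includes('@')) {
    throw new Error('Invalid email');
  }
  return contact;
}

// The same inputs always yield the same key, so a retry within one
// execution sends the same header value as the first attempt.
function idempotencyKey(contactId, executionId) {
  return `${contactId}-${executionId}`;
}

const contacts = [
  { contactId: 'c1', email: 'ada@example.com' },
  { contactId: 'c2', email: 'not-an-email' },
  { contactId: 'c3' }, // missing email entirely
];
const executionId = '1234'; // stand-in for $execution.id

const valid = [];
const rejected = [];
for (const contact of contacts) {
  try {
    validateContact(contact);
    valid.push({ ...contact, key: idempotencyKey(contact.contactId, executionId) });
  } catch (err) {
    // Rejected contacts would go to the error branch in the workflow.
    rejected.push({ contactId: contact.contactId, error: err.message });
  }
}
console.log(valid);
console.log(rejected);
```

Because the key includes the execution ID, a fresh run of the workflow produces a new key for the same contact; the upsert itself is what keeps repeated runs safe.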
If validation or the API call fails, the item routes to an error branch, where a Switch node checks the error type. If the error is transient (e.g., a rate limit), the branch waits 1 minute via a Wait node and retries; otherwise, it logs to the dead-letter queue using a PostgreSQL node that inserts `{contactId: $json.contactId, error: $error.message, timestamp: $now}`, then sends a Slack alert with the details. The main path rejoins via a Merge node, so partial syncs continue (e.g., 95 of 100 contacts succeed). End to end, if 5 contacts fail, the workflow still completes, and a final Set node outputs the summary `{success: 95, failures: 5, queued: 5}`, preserving data integrity without halting the whole run.
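The routing decision and the final summary can be sketched like this. The classification rule (429 and 5xx count as transient, everything else as permanent) is an assumption, not something n8n enforces, and `route` and `summarise` are hypothetical helpers operating on invented result objects.

```javascript
// Assumed rule: rate limits and server errors are worth retrying;
// other failures go straight to the dead-letter queue.
function route(statusCode) {
  const transient = statusCode === 429 || (statusCode >= 500 && statusCode <= 599);
  return transient ? 'retry' : 'dead-letter';
}

// results: final per-contact outcomes after any retries,
// e.g. { contactId: 'c1', ok: true } or { contactId: 'c2', ok: false, queued: true }
function summarise(results) {
  const summary = { success: 0, failures: 0, queued: 0 };
  for (const result of results) {
    if (result.ok) {
      summary.success += 1;
    } else {
      summary.failures += 1;
      if (result.queued) summary.queued += 1;
    }
  }
  return summary;
}

// Simulate the 95/100 case from the example above.
const results = [];
for (let i = 0; i < 100; i += 1) {
  results.push(
    i < 95
      ? { contactId: `c${i}`, ok: true }
      : { contactId: `c${i}`, ok: false, queued: true }
  );
}
const summary = summarise(results);
console.log(route(429), route(400));
console.log(summary);
```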
Common pitfalls
- Symptom: Workflows fail silently without logs, making debugging impossible. Fix: Enable `Save Data Error Executions` in workflow settings and end each error branch with a `NoOp` node so the branch shows up in the execution history; this keeps errors visible in the UI for quick triage.
- Symptom: Retries make rate limiting worse and trigger more bans. Fix: Use exponential backoff in a `Function` node before retries, calculating the wait time as `Math.pow(2, attempt) * 1000` milliseconds, and check error codes to skip retries on permanent failures like 400 Bad Request.
- Symptom: Partial failures leave data in inconsistent states across systems. Fix: Use transactions where possible (e.g., via database nodes with `autocommit: false`) or add compensating actions in error branches, such as a cleanup `HTTP Request` that rolls back changes if the main operation fails midway.
- Symptom: Alerts flood channels during testing or bursts. Fix: Add deduplication in the alerting `Function` node, using a cache like Redis to track recent errors and notifying only when the same issue persists beyond a threshold (e.g., 3 in 5 minutes).
- Symptom: Idempotency keys collide in high-volume workflows, so distinct requests are mistaken for duplicates. Fix: Generate keys with sufficient entropy, combining UUIDs via `const { v4: uuidv4 } = require('uuid'); return [{json: {key: uuidv4() + '-' + $json.id}}];` in a `Code` node, and store used keys temporarily to detect reuse. Generate the key once per logical operation and reuse it across retries, since a fresh UUID on each attempt defeats idempotency. On self-hosted instances, external modules such as `uuid` must be allow-listed through the `NODE_FUNCTION_ALLOW_EXTERNAL` environment variable.
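Two of the fixes above, exponential backoff and alert deduplication, can be sketched in a few lines. This is a rough illustration under assumptions: the 60-second cap is invented, `backoffDelayMs` and `createAlertGate` are hypothetical names, and the in-memory `Map` stands in for a shared cache like Redis, so it only works within a single process.

```javascript
// Wait time from the backoff pitfall: 2^attempt seconds, with an assumed cap.
function backoffDelayMs(attempt, maxMs = 60000) {
  return Math.min(Math.pow(2, attempt) * 1000, maxMs);
}

// Returns a function that decides whether an error occurrence should alert.
// It fires when the count inside the window reaches the threshold,
// and stays quiet for further occurrences while the count is above it.
function createAlertGate(threshold = 3, windowMs = 5 * 60 * 1000) {
  const seen = new Map(); // error key -> timestamps (ms) still inside the window
  return function shouldAlert(errorKey, nowMs) {
    const recent = (seen.get(errorKey) || []).filter((t) => nowMs - t < windowMs);
    recent.push(nowMs);
    seen.set(errorKey, recent);
    return recent.length === threshold;
  };
}

// Delays for the first four attempts.
const delays = [0, 1, 2, 3].map((attempt) => backoffDelayMs(attempt));

// The same error seen once a minute: only the third occurrence alerts.
const shouldAlert = createAlertGate();
const decisions = [0, 60000, 120000, 180000].map((t) => shouldAlert('HTTP 503', t));

console.log(delays);
console.log(decisions);
```

Adding random jitter to the delay is a common refinement when many workflows retry against the same API at once.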
Related workflows in the catalog
Explore the n8n workflow catalog for importable templates like "Error Handling with Retry and Notifications," which demonstrates try/catch-style handling with Slack alerts, or "Dead Letter Queue for API Syncs," which shows PostgreSQL queuing for failed integrations. With over 14,000 workflows available, search for "error handling" or "retry logic" to find patterns for email alerting or partial recovery in ETL processes. These can be imported directly into your instance and customised to fit your automations.