
Cleanup Duplicates

Cleanup Duplicates removes duplicate entries from a Qdrant collection. It is triggered by a webhook, has 6 nodes, and uses the HTTP Request node.

Category: Web Scraping · Trigger: Webhook · Nodes: 6 · Complexity: ★★★★☆ · Integration: HTTP Request

The workflow JSON

Copy the full n8n JSON below, paste it into a new n8n workflow, add any credentials your setup requires, and activate it.
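Once activated, the workflow runs whenever something sends a POST request to its `cleanup-duplicates` webhook path. The following is a minimal sketch of calling it from Node.js 18+ (which provides a global `fetch`). It assumes n8n listens on `http://localhost:5678`, its default port; the helper names `webhookUrl` and `triggerCleanup` are hypothetical.

```javascript
// Sketch: call the Cleanup Duplicates webhook and read the summary it returns.
// Assumption: n8n runs at http://localhost:5678 (default port); adjust baseUrl as needed.
function webhookUrl(baseUrl, path, { test = false } = {}) {
  // n8n serves production webhooks under /webhook/ and editor test webhooks under /webhook-test/
  const prefix = test ? "webhook-test" : "webhook";
  return `${baseUrl.replace(/\/+$/, "")}/${prefix}/${path}`;
}

async function triggerCleanup(baseUrl = "http://localhost:5678", { test = false } = {}) {
  // The Webhook Trigger node is configured with httpMethod "POST"
  const res = await fetch(webhookUrl(baseUrl, "cleanup-duplicates", { test }), { method: "POST" });
  if (!res.ok) throw new Error(`Webhook call failed: HTTP ${res.status}`);
  // The Respond to Webhook node returns the Summary node's JSON
  const summary = await res.json();
  console.log(`Deleted ${summary.deletedCount}, kept ${summary.keptCount}`);
  return summary;
}
```

Calling `await triggerCleanup()` against a running instance returns the summary object; pass `{ test: true }` when the workflow is not activated and you are listening for a test event in the editor.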

{
  "name": "Cleanup Duplicates",
  "nodes": [
    {
      "parameters": {
        "httpMethod": "POST",
        "path": "cleanup-duplicates",
        "responseMode": "responseNode"
      },
      "id": "webhook-trigger",
      "name": "Webhook Trigger",
      "type": "n8n-nodes-base.webhook",
      "typeVersion": 1.1,
      "position": [
        50,
        300
      ]
    },
    {
      "parameters": {
        "method": "POST",
        "url": "http://host.docker.internal:6333/collections/successful_schemas/points/scroll",
        "sendBody": true,
        "specifyBody": "json",
        "jsonBody": "={{ {\n  limit: 1000,\n  with_payload: true\n} }}",
        "options": {}
      },
      "id": "get-all-points",
      "name": "Get All Points",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.1,
      "position": [
        250,
        300
      ]
    },
    {
      "parameters": {
        "jsCode": "// Find and identify duplicate domains\n// The Code node runs once for all items, so read the single scroll response with .first()\nconst data = $input.first().json;\nconst points = data.result.points;\n\n// Group points by domain name (stored in payload.user_prompt)\nconst domainGroups = {};\npoints.forEach(point => {\n  const domain = point.payload.user_prompt;\n  if (!domainGroups[domain]) {\n    domainGroups[domain] = [];\n  }\n  domainGroups[domain].push(point);\n});\n\n// Missing or unparseable timestamps are treated as the oldest possible value\nconst ts = p => Date.parse(p.payload.timestamp) || 0;\n\n// Find duplicates (domains with more than 1 entry)\nconst duplicates = {};\nconst toDelete = [];\nconst toKeep = [];\n\nObject.keys(domainGroups).forEach(domain => {\n  const group = domainGroups[domain];\n  if (group.length > 1) {\n    duplicates[domain] = group.length;\n    \n    // Sort by timestamp (newest first) and keep the most recent\n    group.sort((a, b) => ts(b) - ts(a));\n    \n    // Keep the first (newest) one\n    toKeep.push(group[0]);\n    \n    // Mark the rest for deletion\n    for (let i = 1; i < group.length; i++) {\n      toDelete.push(group[i].id);\n    }\n  } else {\n    // Single entry, keep it\n    toKeep.push(group[0]);\n  }\n});\n\nreturn [{\n  json: {\n    totalPoints: points.length,\n    uniqueDomains: Object.keys(domainGroups).length,\n    duplicateDomains: Object.keys(duplicates),\n    duplicateCounts: duplicates,\n    toDelete: toDelete,\n    toKeep: toKeep,\n    deleteCount: toDelete.length,\n    keepCount: toKeep.length\n  }\n}];"
      },
      "id": "find-duplicates",
      "name": "Find Duplicates",
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [
        450,
        300
      ]
    },
    {
      "parameters": {
        "method": "POST",
        "url": "http://host.docker.internal:6333/collections/successful_schemas/points/delete?wait=true",
        "sendBody": true,
        "specifyBody": "json",
        "jsonBody": "={{ {\n  points: $json.toDelete\n} }}",
        "options": {}
      },
      "id": "delete-duplicates",
      "name": "Delete Duplicates",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.1,
      "position": [
        650,
        300
      ]
    },
    {
      "parameters": {
        "jsCode": "// Runs once for all items: read the single upstream items with .first()\nconst processedData = $('Find Duplicates').first().json;\nconst deleteResult = $input.first().json;\n\n// Qdrant answers a successful request with a top-level status of 'ok'\nconst success = deleteResult.status === 'ok';\n\nreturn [{\n  json: {\n    success: success,\n    message: success ? 'Cleanup completed successfully' : 'Qdrant did not confirm the delete',\n    totalPoints: processedData.totalPoints,\n    uniqueDomains: processedData.uniqueDomains,\n    duplicateDomains: processedData.duplicateDomains,\n    duplicateCounts: processedData.duplicateCounts,\n    deletedCount: processedData.deleteCount,\n    keptCount: processedData.keepCount,\n    timestamp: new Date().toISOString()\n  }\n}];"
      },
      "id": "summary",
      "name": "Summary",
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [
        850,
        300
      ]
    },
    {
      "parameters": {
        "respondWith": "json",
        "responseBody": "={{ $('Summary').item.json }}",
        "options": {}
      },
      "id": "respond-to-webhook",
      "name": "Respond to Webhook",
      "type": "n8n-nodes-base.respondToWebhook",
      "typeVersion": 1,
      "position": [
        1050,
        300
      ]
    }
  ],
  "connections": {
    "Webhook Trigger": {
      "main": [
        [
          {
            "node": "Get All Points",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Get All Points": {
      "main": [
        [
          {
            "node": "Find Duplicates",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Find Duplicates": {
      "main": [
        [
          {
            "node": "Delete Duplicates",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Delete Duplicates": {
      "main": [
        [
          {
            "node": "Summary",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Summary": {
      "main": [
        [
          {
            "node": "Respond to Webhook",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  },
  "active": false,
  "settings": {},
  "versionId": "1"
}
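The deduplication rule in the Find Duplicates node (group points by `payload.user_prompt`, keep the one with the newest `payload.timestamp`, delete the rest) can be checked outside n8n. The sketch below restates that rule as a plain JavaScript function. `findDuplicates` and the sample points are hypothetical, and the point shape `{ id, payload: { user_prompt, timestamp } }` is assumed to match what the scroll request returns.

```javascript
// Sketch of the "Find Duplicates" rule as a standalone function.
// Assumed point shape: { id, payload: { user_prompt, timestamp } }.
function findDuplicates(points) {
  // Group points by the domain stored in payload.user_prompt
  const groups = new Map();
  for (const point of points) {
    const domain = point.payload.user_prompt;
    if (!groups.has(domain)) groups.set(domain, []);
    groups.get(domain).push(point);
  }

  // Missing or unparseable timestamps sort as the oldest entries
  const ts = (p) => Date.parse(p.payload.timestamp) || 0;

  const toKeep = [];
  const toDelete = [];
  const duplicateCounts = {};
  for (const [domain, group] of groups) {
    // Newest first; only the first element survives
    const sorted = [...group].sort((a, b) => ts(b) - ts(a));
    toKeep.push(sorted[0]);
    if (sorted.length > 1) {
      duplicateCounts[domain] = sorted.length;
      for (const p of sorted.slice(1)) toDelete.push(p.id);
    }
  }
  return { toKeep, toDelete, duplicateCounts, uniqueDomains: groups.size };
}

// Usage with a small hypothetical sample
const sample = [
  { id: 1, payload: { user_prompt: "example.com", timestamp: "2024-01-01T00:00:00Z" } },
  { id: 2, payload: { user_prompt: "example.com", timestamp: "2024-03-01T00:00:00Z" } },
  { id: 3, payload: { user_prompt: "other.org", timestamp: "2024-02-01T00:00:00Z" } },
];
const result = findDuplicates(sample);
console.log(result.toDelete); // [ 1 ]
console.log(result.toKeep.map((p) => p.id)); // [ 2, 3 ]
```

Copying each group before sorting keeps the input array untouched, which makes the function easier to test than the in-place sort used in the node.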

About this workflow

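When a POST request reaches the `cleanup-duplicates` webhook, the workflow scrolls up to 1,000 points (with payloads) from the `successful_schemas` collection on the Qdrant instance at `host.docker.internal:6333`. A Code node groups the points by `payload.user_prompt` (treated as the domain name), keeps the point with the newest `payload.timestamp` in each group, and collects the IDs of the others. A second HTTP Request node deletes those IDs, and a summary (total points, unique domains, duplicate counts, deleted and kept counts) is returned as the webhook response. Only the first scroll page is read, so a collection with more than 1,000 points is not fully deduplicated in one run.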
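Because Get All Points makes a single scroll call with `limit: 1000`, points beyond the first page are never examined. Qdrant's scroll endpoint returns `result.next_page_offset`, which is `null` once the last page has been read, so a loop that feeds it back as `offset` reads the whole collection. The following is a sketch under that assumption, kept outside n8n; `scrollAll`, `qdrantFetchPage`, `fakePoints`, and `fakeFetchPage` are hypothetical names, and the page fetcher is injected so the loop can run against an in-memory stand-in.

```javascript
// Sketch: read every point from a Qdrant collection by following next_page_offset.
// fetchPage(body) must resolve to a scroll response: { result: { points, next_page_offset } }.
async function scrollAll(fetchPage, pageSize = 1000) {
  const points = [];
  let offset = null;
  do {
    const body = { limit: pageSize, with_payload: true };
    if (offset !== null) body.offset = offset; // resume where the previous page ended
    const page = await fetchPage(body);
    points.push(...page.result.points);
    offset = page.result.next_page_offset ?? null; // null means no more pages
  } while (offset !== null);
  return points;
}

// Possible real fetcher, reusing the scroll URL from the workflow (Node 18+ global fetch)
async function qdrantFetchPage(body) {
  const url = "http://host.docker.internal:6333/collections/successful_schemas/points/scroll";
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Qdrant scroll failed: HTTP ${res.status}`);
  return res.json();
}

// In-memory stand-in: five points, where the offset is the ID of the first point on a page
const fakePoints = [1, 2, 3, 4, 5].map((id) => ({ id, payload: {} }));
async function fakeFetchPage({ limit, offset }) {
  const start = offset === undefined ? 0 : fakePoints.findIndex((p) => p.id === offset);
  const next = fakePoints[start + limit];
  return {
    result: {
      points: fakePoints.slice(start, start + limit),
      next_page_offset: next ? next.id : null,
    },
  };
}

scrollAll(fakeFetchPage, 2).then((all) => console.log(all.map((p) => p.id))); // [ 1, 2, 3, 4, 5 ]
```

Swapping `fakeFetchPage` for `qdrantFetchPage` runs the same loop against the live collection; the collected points could then be grouped exactly as the Find Duplicates node does.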

Source: https://github.com/shazily/syntheticdatagen/blob/821a0a45e9851752555869c9d8cec0ec580287d5/n8n-workflows/cleanup-duplicates.json (original creator credit).


Related workflows

Workflows that share integrations, category, or trigger type with this one. All free to copy and import.

Web Scraping

This n8n template provides enterprise-level version control for your workflows using GitHub integration. Stop losing hours to broken workflows and manual exports – get proper commit history, visual di…

n8n, Execute Workflow Trigger, HTTP Request +1
Web Scraping

This flow creates dummy files for every item added in your *Arrs (Radarr/Sonarr) with the tag .

HTTP Request, Ssh
Web Scraping

This workflow acts as a central API gateway for all technical indicator agents in the Binance Spot Market Quant AI system. It listens for incoming webhook requests and dynamically routes them to the c…

HTTP Request
Web Scraping

Sign PDF documents with legally-compliant digital signatures using X.509 certificates. Supports multiple PAdES signature levels (B, T, LT, LTA) with optional visible stamps.

Execute Command, HTTP Request, Read Write File +1
Web Scraping

📡 This workflow serves as the central Alpha Vantage API fetcher for Tesla trading indicators, delivering cleaned 20-point JSON outputs for three timeframes: , , and . It is required by the following a…

HTTP Request