{
  "id": "WhZiGSdO9IICm2Y5",
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "name": "Multi-Site Web Scraper with Source Routing",
  "tags": [],
  "nodes": [
    {
      "id": "0bbe2f45-e7a6-485c-8fea-1df1d31239c7",
      "name": "Sticky Note - Introduction",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -416,
        368
      ],
      "parameters": {
        "width": 504,
        "height": 904,
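        "content": "## Multi-Site Web Scraper with Source Routing\n\nWeb scraper that sends each URL to site-specific extraction logic, chosen by the `Source` value stored next to the URL in Google Sheets.\n\n### How it works  \n- **Trigger**: Starts manually or on a schedule (every 4 hours).  \n- **Read URLs**: Reads rows from the \"URLs to Process\" sheet. Each row needs a `URL` and a `Source` column.  \n- **Rate Limiting**: Waits 3 seconds before each request to avoid overwhelming servers.  \n- **Source Routing**: Sends each URL to the extractor that matches its `Source` value (Site A to Site D). Any other value goes to the universal fallback extractor.  \n- **Extraction**: Extracts the title, description, author, publish date, image, and canonical URL. The site extractors use CSS selectors; the fallback uses regex and JSON-LD parsing.  \n- **Normalization**: Trims the fields returned by the site-specific extractors, caps the description at 1,000 characters, and takes the source and URL from the input row.  \n- **Freshness Filter**: Keeps articles published within the last 45 days and marks older articles as \"Outdated\". Articles with a missing or unparseable date are treated as fresh.  \n- **Tiering**: Assigns each fresh article to Tier 1 (up to 7 days old), Tier 2 (up to 14 days), Tier 3 (up to 30 days), or Archive. Articles without a parseable date get \"Unknown\".  \n- **Save & Status Update**: Saves fresh articles to the \"Article Feed\" sheet, then updates the URL's row in \"URLs to Process\". Outdated articles skip the feed and only update the URL's status.\n\n### Setup steps  \n1. **Google Sheets Integration**: Connect your Google Sheets account.  \n2. **Configure Sheets**: In each Google Sheets node, select your spreadsheet and the \"URLs to Process\" or \"Article Feed\" sheet. Then map the columns, including the column to match on, in **Save to Article Feed** and **Update URL Status**.  \n3. **Customize Extraction**: Define CSS selectors for each site's extractor.  \n4. **Configure Freshness Filter**: The 45-day threshold is hard-coded. To change it, edit the value in the Freshness Filter expression and the reason text in **Mark as Outdated**.  \n5. **Run Workflow**: Trigger it manually, or activate the workflow so it runs on the schedule.\n\n### Adding new sources\n1. In the Source Router (Switch) node, add a rule and output that match the new `Source` value. Place it above the fallback rule, because the fallback catches every remaining source.\n2. Create an HTML or Code node with site-specific selectors.\n3. Connect it to **Normalize Extracted Data**, which feeds the Freshness Filter."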
      },
      "typeVersion": 1
    },
    {
      "id": "41d85cb2-a70f-4f83-8e84-4e56467c554c",
      "name": "Sticky Note - Input",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        384,
        640
      ],
      "parameters": {
        "color": 7,
        "width": 228,
        "height": 648,
        "content": "## Read URLs\nReads URLs and their source identifiers from Google Sheets."
      },
      "typeVersion": 1
    },
    {
      "id": "24d476ac-54cf-4271-9464-3d4196a33b24",
      "name": "Sticky Note - Router",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1408,
        640
      ],
      "parameters": {
        "color": 7,
        "width": 296,
        "height": 644,
        "content": "## Source Router (Switch)"
      },
      "typeVersion": 1
    },
    {
      "id": "63470f2c-d372-46ca-b2f7-3a944e02bddf",
      "name": "Sticky Note - Extractors",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1856,
        448
      ],
      "parameters": {
        "color": 7,
        "width": 616,
        "height": 832,
        "content": "## Site-Specific Extractors\nCustom CSS selectors per publisher, plus a regex-based universal fallback for unknown sources."
      },
      "typeVersion": 1
    },
    {
      "id": "3d735620-b4c4-4ab8-85a7-4921a4d80a52",
      "name": "Sticky Note - Freshness",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2640,
        640
      ],
      "parameters": {
        "color": 7,
        "width": 280,
        "height": 640,
        "content": "## Freshness Filter\nKeeps articles published within the last 45 days. Older articles are marked Outdated."
      },
      "typeVersion": 1
    },
    {
      "id": "7fa04235-06b1-474b-bfa6-9d8fbbdb323d",
      "name": "Sticky Note - Output",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        3280,
        640
      ],
      "parameters": {
        "color": 7,
        "width": 488,
        "height": 640,
        "content": "## Output & Status Tracking\nSaves fresh articles to the feed and updates each URL's status."
      },
      "typeVersion": 1
    },
    {
      "id": "894dc531-b7d0-44ca-b031-7c22c925adf1",
      "name": "Manual Trigger",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        144,
        928
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "df99475b-e36e-4d8b-b8e8-6e252ced9f42",
      "name": "Schedule (Every 4 Hours)",
      "type": "n8n-nodes-base.scheduleTrigger",
      "position": [
        144,
        1120
      ],
      "parameters": {
        "rule": {
          "interval": [
            {
              "field": "hours",
              "hoursInterval": 4
            }
          ]
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "a0b9999e-860a-4d1a-8f52-c942aa3fcc81",
      "name": "Read Pending URLs",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        448,
        1024
      ],
      "parameters": {
        "sheetName": {
          "__rl": true,
          "mode": "list",
          "value": "",
          "cachedResultUrl": "",
          "cachedResultName": "URLs to Process"
        },
        "documentId": {
          "__rl": true,
          "mode": "list",
          "value": "",
          "cachedResultUrl": "",
          "cachedResultName": "YOUR_SPREADSHEET"
        }
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.7
    },
    {
      "id": "7219c3b5-785c-4ee9-bc72-20c382d8f6d2",
      "name": "Loop Over URLs",
      "type": "n8n-nodes-base.splitInBatches",
      "position": [
        720,
        1024
      ],
      "parameters": {
        "options": {
          "reset": false
        }
      },
      "typeVersion": 3
    },
    {
      "id": "72bdf7e2-bb39-4de3-bb71-634c7163d5be",
      "name": "Rate Limit (3s)",
      "type": "n8n-nodes-base.wait",
      "position": [
        944,
        800
      ],
      "parameters": {
        "amount": 3
      },
      "typeVersion": 1.1
    },
    {
      "id": "896b1bb9-4d3c-4ea5-93fd-d3a1e019c029",
      "name": "Fetch HTML",
      "type": "n8n-nodes-base.httpRequest",
      "onError": "continueRegularOutput",
      "position": [
        1184,
        800
      ],
      "parameters": {
        "url": "={{ $json.URL }}",
        "options": {
          "timeout": 30000,
          "response": {
            "response": {
              "fullResponse": false,
              "responseFormat": "text"
            }
          }
        },
        "sendHeaders": true,
        "headerParameters": {
          "parameters": [
            {
              "name": "User-Agent",
              "value": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/0.0.0.0 Safari/537.36"
            },
            {
              "name": "Accept",
              "value": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
            },
            {
              "name": "Accept-Language",
              "value": "en-US,en;q=0.5"
            }
          ]
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "aa9fac82-5076-45ac-90d1-da65faf5c206",
      "name": "Source Router",
      "type": "n8n-nodes-base.switch",
      "position": [
        1504,
        752
      ],
      "parameters": {
        "rules": {
          "values": [
            {
              "outputKey": "Site A",
              "conditions": {
                "options": {
                  "version": 2,
                  "leftValue": "",
                  "caseSensitive": true,
                  "typeValidation": "strict"
                },
                "combinator": "and",
                "conditions": [
                  {
                    "id": "dd700c7f-06c4-4a76-93ba-adaa51b1814e",
                    "operator": {
                      "type": "string",
                      "operation": "equals"
                    },
                    "leftValue": "={{ $('Loop Over URLs').item.json.Source }}",
                    "rightValue": "Site A"
                  }
                ]
              },
              "renameOutput": true
            },
            {
              "outputKey": "Site B",
              "conditions": {
                "options": {
                  "version": 2,
                  "leftValue": "",
                  "caseSensitive": true,
                  "typeValidation": "strict"
                },
                "combinator": "and",
                "conditions": [
                  {
                    "id": "6da5c75a-2057-414e-ac16-cee607861b83",
                    "operator": {
                      "type": "string",
                      "operation": "equals"
                    },
                    "leftValue": "={{ $('Loop Over URLs').item.json.Source }}",
                    "rightValue": "Site B"
                  }
                ]
              },
              "renameOutput": true
            },
            {
              "outputKey": "Site C",
              "conditions": {
                "options": {
                  "version": 2,
                  "leftValue": "",
                  "caseSensitive": true,
                  "typeValidation": "strict"
                },
                "combinator": "and",
                "conditions": [
                  {
                    "id": "7d437ee9-fb8a-48be-8d00-2ae46934a8eb",
                    "operator": {
                      "type": "string",
                      "operation": "equals"
                    },
                    "leftValue": "={{ $('Loop Over URLs').item.json.Source }}",
                    "rightValue": "Site C"
                  }
                ]
              },
              "renameOutput": true
            },
            {
              "outputKey": "Site D",
              "conditions": {
                "options": {
                  "version": 2,
                  "leftValue": "",
                  "caseSensitive": true,
                  "typeValidation": "strict"
                },
                "combinator": "and",
                "conditions": [
                  {
                    "id": "a2abaacb-55c7-4304-b92c-ef2714d10557",
                    "operator": {
                      "type": "string",
                      "operation": "equals"
                    },
                    "leftValue": "={{ $('Loop Over URLs').item.json.Source }}",
                    "rightValue": "Site D"
                  }
                ]
              },
              "renameOutput": true
            },
            {
              "outputKey": "fallback",
              "conditions": {
                "options": {
                  "version": 2,
                  "leftValue": "",
                  "caseSensitive": true,
                  "typeValidation": "strict"
                },
                "combinator": "and",
                "conditions": [
                  {
                    "id": "14362d61-d2e8-4d40-8edf-12eccbea7b00",
                    "operator": {
                      "type": "string",
                      "operation": "exists"
                    },
                    "leftValue": "={{ $('Loop Over URLs').item.json.Source }}",
                    "rightValue": ""
                  }
                ]
              },
              "renameOutput": true
            }
          ]
        },
        "options": {
          "allMatchingOutputs": false
        }
      },
      "typeVersion": 3.2
    },
    {
      "id": "4bbec03a-1f1f-4e1a-9b1d-2d14ae6620c3",
      "name": "Extract: Fallback (Universal)",
      "type": "n8n-nodes-base.code",
      "position": [
        2064,
        1136
      ],
      "parameters": {
        "jsCode": "// Universal fallback extractor for unknown sources\nconst results = [];\n\nfor (const [i, item] of items.entries()) {\n  const html = item.json.data || item.json.body || \"\";\n  // The HTTP response does not include the sheet row, so read URL and Source\n  // from the paired Loop Over URLs item\n  const loopData = $('Loop Over URLs').itemMatching(i).json;\n  const requestUrl = loopData.URL || \"\";\n  const source = loopData.Source || \"unknown\";\n  \n  let title = null;\n  let description = null;\n  let author = null;\n  let datePublished = null;\n  let imageUrl = null;\n  let canonicalUrl = null;\n\n  // --- TITLE ---\n  const titlePatterns = [\n    /<h1[^>]*>([\\s\\S]*?)<\\/h1>/i,\n    /<meta[^>]+property=[\"']og:title[\"'][^>]+content=[\"']([^\"']+)[\"']/i,\n    /<title[^>]*>([\\s\\S]*?)<\\/title>/i\n  ];\n  for (const pattern of titlePatterns) {\n    const match = html.match(pattern);\n    if (match) {\n      title = match[1].replace(/<[^>]+>/g, \"\").trim();\n      if (title) break;\n    }\n  }\n\n  // --- DESCRIPTION ---\n  const descPatterns = [\n    /<meta[^>]+name=[\"']description[\"'][^>]+content=[\"']([^\"']+)[\"']/i,\n    /<meta[^>]+property=[\"']og:description[\"'][^>]+content=[\"']([^\"']+)[\"']/i\n  ];\n  for (const pattern of descPatterns) {\n    const match = html.match(pattern);\n    if (match) {\n      description = match[1].trim();\n      if (description) break;\n    }\n  }\n  \n  if (!description) {\n    const paragraphs = [...html.matchAll(/<p[^>]*>([\\s\\S]*?)<\\/p>/gi)]\n      .map(m => m[1].replace(/<[^>]+>/g, \"\").trim())\n      .filter(t => t && t.length > 50);\n    if (paragraphs.length) {\n      description = paragraphs.slice(0, 2).join(\" \").substring(0, 500);\n    }\n  }\n\n  // --- AUTHOR ---\n  const authorPatterns = [\n    /<meta[^>]+name=[\"']author[\"'][^>]+content=[\"']([^\"']+)[\"']/i,\n    // Case-sensitive, so [A-Z] only matches capitalized names\n    /\\b[Bb]y\\s+([A-Z][a-z]+\\s+[A-Z][a-z]+)/,\n    /<a[^>]+rel=[\"']author[\"'][^>]*>([^<]+)<\\/a>/i\n  ];\n  for (const pattern of authorPatterns) {\n    const match = html.match(pattern);\n    if (match) {\n      author = match[1].trim();\n      if (author) break;\n    }\n  }\n  \n  if (!author) {\n    const ldMatch = html.match(/<script[^>]+application\\/ld\\+json[^>]*>([\\s\\S]*?)<\\/script>/i);\n    if (ldMatch) {\n      try {\n        const ld = JSON.parse(ldMatch[1]);\n        const a = ld.author;\n        if (a) {\n          author = typeof a === \"string\" ? a : (a.name || (Array.isArray(a) ? a[0].name : null));\n        }\n      } catch (e) {}\n    }\n  }\n\n  // --- DATE PUBLISHED ---\n  const datePatterns = [\n    /<time[^>]+datetime=[\"']([^\"']+)[\"']/i,\n    /<meta[^>]+property=[\"']article:published_time[\"'][^>]+content=[\"']([^\"']+)[\"']/i,\n    /(\\d{4}-\\d{2}-\\d{2})/\n  ];\n  for (const pattern of datePatterns) {\n    const match = html.match(pattern);\n    if (match) {\n      datePublished = match[1].trim();\n      if (datePublished) break;\n    }\n  }\n  \n  if (!datePublished) {\n    const ldMatch = html.match(/<script[^>]+application\\/ld\\+json[^>]*>([\\s\\S]*?)<\\/script>/i);\n    if (ldMatch) {\n      try {\n        const ld = JSON.parse(ldMatch[1]);\n        datePublished = ld.datePublished || ld.dateCreated || null;\n      } catch (e) {}\n    }\n  }\n\n  // --- IMAGE URL ---\n  const imgPatterns = [\n    /<meta[^>]+property=[\"']og:image[\"'][^>]+content=[\"']([^\"']+)[\"']/i,\n    /<meta[^>]+name=[\"']twitter:image[\"'][^>]+content=[\"']([^\"']+)[\"']/i\n  ];\n  for (const pattern of imgPatterns) {\n    const match = html.match(pattern);\n    if (match) {\n      imageUrl = match[1].trim();\n      if (imageUrl) break;\n    }\n  }\n\n  // --- CANONICAL URL ---\n  const canonicalMatch = html.match(/<link[^>]+rel=[\"']canonical[\"'][^>]+href=[\"']([^\"']+)[\"']/i);\n  if (canonicalMatch) {\n    canonicalUrl = canonicalMatch[1].trim();\n  } else {\n    const ogUrlMatch = html.match(/<meta[^>]+property=[\"']og:url[\"'][^>]+content=[\"']([^\"']+)[\"']/i);\n    if (ogUrlMatch) canonicalUrl = ogUrlMatch[1].trim();\n  }\n\n  results.push({\n    json: {\n      title,\n      description,\n      author,\n      datePublished,\n      imageUrl,\n      canonicalUrl: canonicalUrl || requestUrl,\n      source,\n      sourceUrl: requestUrl\n    }\n  });\n}\n\nreturn results;"
      },
      "typeVersion": 2
    },
    {
      "id": "63b8461f-a33a-4b58-8c67-6eea1f4ef492",
      "name": "Normalize Extracted Data",
      "type": "n8n-nodes-base.code",
      "position": [
        2336,
        784
      ],
      "parameters": {
        "jsCode": "// Normalize extracted data from site-specific extractors\nconst results = [];\n\n// Coerce a field to a trimmed string, or null if it is missing or blank\nconst clean = (v) => (v == null || String(v).trim() === \"\") ? null : String(v).trim();\n\nfor (const [i, item] of items.entries()) {\n  const input = item.json;\n  // Look up the original sheet row paired with this item\n  const loopData = $('Loop Over URLs').itemMatching(i).json;\n  const description = clean(input.description);\n  \n  results.push({\n    json: {\n      title: clean(input.title),\n      description: description ? description.substring(0, 1000) : null,\n      author: clean(input.author),\n      datePublished: clean(input.datePublished),\n      imageUrl: clean(input.imageUrl),\n      canonicalUrl: clean(input.canonicalUrl) || loopData.URL,\n      source: loopData.Source || \"unknown\",\n      sourceUrl: loopData.URL\n    }\n  });\n}\n\nreturn results;"
      },
      "typeVersion": 2
    },
    {
      "id": "6e67b344-df32-47c5-b04e-a626b607ee30",
      "name": "Freshness Filter (45 days)",
      "type": "n8n-nodes-base.if",
      "position": [
        2736,
        912
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "loose"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "freshness-check",
              "operator": {
                "type": "boolean",
                "operation": "equals"
              },
              "leftValue": "={{ (function() {\n  var dateStr = $json.datePublished;\n  if (!dateStr) return true;\n  \n  var date;\n  if (dateStr.match(/^\\d{4}-\\d{2}-\\d{2}/)) {\n    date = new Date(dateStr);\n  } else {\n    var cleaned = dateStr.replace(/(\\d+)(st|nd|rd|th)/, '$1');\n    date = new Date(cleaned);\n  }\n  \n  if (isNaN(date.getTime())) return true;\n  \n  var cutoffDate = new Date();\n  cutoffDate.setDate(cutoffDate.getDate() - 45);\n  return date >= cutoffDate;\n})() }}",
              "rightValue": true
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "24a813c5-0431-4151-ae17-fd81d4854b44",
      "name": "Calculate Tier & Status",
      "type": "n8n-nodes-base.code",
      "position": [
        3040,
        832
      ],
      "parameters": {
        "jsCode": "// Calculate tier based on article age\nconst results = [];\n\nfor (const item of items) {\n  const input = item.json;\n  let tier = \"Unknown\";\n  let freshnessStatus = \"Fresh\";\n  \n  if (input.datePublished) {\n    const dateStr = input.datePublished;\n    let articleDate;\n    \n    if (dateStr.match(/^\\d{4}-\\d{2}-\\d{2}/)) {\n      articleDate = new Date(dateStr);\n    } else {\n      const cleaned = dateStr.replace(/(\\d+)(st|nd|rd|th)/, '$1');\n      articleDate = new Date(cleaned);\n    }\n    \n    if (!isNaN(articleDate.getTime())) {\n      const now = new Date();\n      const daysDiff = Math.floor((now - articleDate) / (1000 * 60 * 60 * 24));\n      \n      if (daysDiff <= 7) {\n        tier = \"Tier 1\";\n        freshnessStatus = \"Fresh\";\n      } else if (daysDiff <= 14) {\n        tier = \"Tier 2\";\n        freshnessStatus = \"Fresh\";\n      } else if (daysDiff <= 30) {\n        tier = \"Tier 3\";\n        freshnessStatus = \"Fresh\";\n      } else {\n        tier = \"Archive\";\n        freshnessStatus = \"Fresh\";\n      }\n    }\n  }\n  \n  results.push({\n    json: {\n      ...input,\n      tier,\n      freshnessStatus,\n      extractedAt: new Date().toISOString()\n    }\n  });\n}\n\nreturn results;"
      },
      "typeVersion": 2
    },
    {
      "id": "6641b358-6d38-4a9a-8bb3-1e00c83b12f7",
      "name": "Mark as Outdated",
      "type": "n8n-nodes-base.set",
      "position": [
        3392,
        1120
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "status-outdated",
              "name": "freshnessStatus",
              "type": "string",
              "value": "Outdated"
            },
            {
              "id": "reason",
              "name": "reason",
              "type": "string",
              "value": "Article older than 45 days"
            },
            {
              "id": "sourceUrl",
              "name": "sourceUrl",
              "type": "string",
              "value": "={{ $json.sourceUrl }}"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "e43b32e6-0b16-4b0c-a51f-8684d6509817",
      "name": "Save to Article Feed",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        3392,
        832
      ],
      "parameters": {
        "operation": "appendOrUpdate",
        "sheetName": {
          "__rl": true,
          "mode": "list",
          "value": "",
          "cachedResultUrl": "",
          "cachedResultName": "Article Feed"
        },
        "documentId": {
          "__rl": true,
          "mode": "list",
          "value": "",
          "cachedResultUrl": "",
          "cachedResultName": "YOUR_SPREADSHEET"
        }
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.7
    },
    {
      "id": "9e0a7cac-7b50-4c9e-a1fb-7ce47a9b1355",
      "name": "Update URL Status",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        3600,
        1120
      ],
      "parameters": {
        "operation": "update",
        "sheetName": {
          "__rl": true,
          "mode": "list",
          "value": "",
          "cachedResultUrl": "",
          "cachedResultName": "URLs to Process"
        },
        "documentId": {
          "__rl": true,
          "mode": "list",
          "value": "",
          "cachedResultUrl": "",
          "cachedResultName": "YOUR_SPREADSHEET"
        }
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.7
    },
    {
      "id": "af00d978-f194-4a53-a5ec-49861ea66452",
      "name": "Completion Summary",
      "type": "n8n-nodes-base.set",
      "executeOnce": true,
      "position": [
        944,
        1104
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "count",
              "name": "articlesProcessed",
              "type": "number",
              "value": "={{ $items().length }}"
            },
            {
              "id": "timestamp",
              "name": "completedAt",
              "type": "string",
              "value": "={{ $now.toISO() }}"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "ec75d39d-4457-4e09-9163-9e4169b0dfdc",
      "name": "Extract: Site B",
      "type": "n8n-nodes-base.html",
      "onError": "continueRegularOutput",
      "position": [
        2064,
        704
      ],
      "parameters": {
        "options": {},
        "operation": "extractHtmlContent",
        "extractionValues": {
          "values": [
            {
              "key": "title",
              "cssSelector": "h1, article h1"
            },
            {
              "key": "description",
              "attribute": "content",
              "cssSelector": "meta[name='description']",
              "returnValue": "attribute"
            },
            {
              "key": "author",
              "cssSelector": "a[data-testid='authorName'], .author-info a, a[rel='author']"
            },
            {
              "key": "datePublished",
              "cssSelector": "span[data-testid='storyPublishDate'], time[datetime]"
            },
            {
              "key": "imageUrl",
              "attribute": "content",
              "cssSelector": "meta[property='og:image']",
              "returnValue": "attribute"
            },
            {
              "key": "canonicalUrl",
              "attribute": "href",
              "cssSelector": "link[rel='canonical']",
              "returnValue": "attribute"
            }
          ]
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "91ad55da-5b9d-4f47-b622-2cc674247b47",
      "name": "Extract: Site C",
      "type": "n8n-nodes-base.html",
      "onError": "continueRegularOutput",
      "position": [
        2064,
        848
      ],
      "parameters": {
        "options": {},
        "operation": "extractHtmlContent",
        "extractionValues": {
          "values": [
            {
              "key": "title",
              "cssSelector": "h1.entry-title, .post-title, h1.wp-block-post-title"
            },
            {
              "key": "description",
              "attribute": "content",
              "cssSelector": "meta[name='description']",
              "returnValue": "attribute"
            },
            {
              "key": "author",
              "cssSelector": ".author-name, .byline a, span.author a, .entry-author a"
            },
            {
              "key": "datePublished",
              "attribute": "datetime",
              "cssSelector": "time.entry-date, .post-date time, time[datetime]",
              "returnValue": "attribute"
            },
            {
              "key": "imageUrl",
              "attribute": "content",
              "cssSelector": "meta[property='og:image']",
              "returnValue": "attribute"
            },
            {
              "key": "canonicalUrl",
              "attribute": "href",
              "cssSelector": "link[rel='canonical']",
              "returnValue": "attribute"
            }
          ]
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "5e9d8787-971b-45b1-8b4d-450c59e4e5c3",
      "name": "Extract: Site D",
      "type": "n8n-nodes-base.html",
      "onError": "continueRegularOutput",
      "position": [
        2064,
        992
      ],
      "parameters": {
        "options": {},
        "operation": "extractHtmlContent",
        "extractionValues": {
          "values": [
            {
              "key": "title",
              "cssSelector": "#hs_cos_wrapper_name, h1.blog-post__title, .post-header h1"
            },
            {
              "key": "description",
              "cssSelector": "#hs_cos_wrapper_post_body p:first-of-type, meta[name='description']"
            },
            {
              "key": "author",
              "cssSelector": "p[data-hubspot-name='Blog Author'] a, .author-info a, .blog-author__name"
            },
            {
              "key": "datePublished",
              "cssSelector": "span.blog--single--meta--date, time[datetime], .post-date"
            },
            {
              "key": "imageUrl",
              "attribute": "content",
              "cssSelector": "meta[property='og:image']",
              "returnValue": "attribute"
            },
            {
              "key": "canonicalUrl",
              "attribute": "href",
              "cssSelector": "link[rel='canonical']",
              "returnValue": "attribute"
            }
          ]
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "cdd20c9c-cd91-4ccf-8e6e-7ea95e4289ab",
      "name": "Sticky Note - Input1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1120,
        640
      ],
      "parameters": {
        "color": 7,
        "width": 246,
        "height": 648,
        "content": "## Fetch the HTML content"
      },
      "typeVersion": 1
    },
    {
      "id": "bd580a4c-9c60-4b1c-8f55-f2d11973fc79",
      "name": "Extract: Site A",
      "type": "n8n-nodes-base.html",
      "onError": "continueRegularOutput",
      "position": [
        2064,
        560
      ],
      "parameters": {
        "options": {},
        "operation": "extractHtmlContent",
        "extractionValues": {
          "values": [
            {
              "key": "title",
              "cssSelector": "h1.article__title, h1[data-testid='ContentHeader'], .post-title h1"
            },
            {
              "key": "description",
              "attribute": "content",
              "cssSelector": "meta[name='description']",
              "returnValue": "attribute"
            },
            {
              "key": "author",
              "cssSelector": ".article__byline a, a[rel='author'], .author-card__name"
            },
            {
              "key": "datePublished",
              "attribute": "datetime",
              "cssSelector": "time[datetime]",
              "returnValue": "attribute"
            },
            {
              "key": "imageUrl",
              "attribute": "content",
              "cssSelector": "meta[property='og:image']",
              "returnValue": "attribute"
            },
            {
              "key": "canonicalUrl",
              "attribute": "href",
              "cssSelector": "link[rel='canonical']",
              "returnValue": "attribute"
            }
          ]
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "bb20578e-1c3d-4848-b857-3082264298c7",
      "name": "Sticky Note - Freshness1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2944,
        640
      ],
      "parameters": {
        "color": 7,
        "width": 280,
        "height": 640,
        "content": "## Tier & Status\nAssigns a tier to each fresh article based on its age."
      },
      "typeVersion": 1
    }
  ],
  "active": false,
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "87886f32-8c5a-4dcd-93aa-2e2109b99cee",
  "connections": {
    "Fetch HTML": {
      "main": [
        [
          {
            "node": "Source Router",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Source Router": {
      "main": [
        [
          {
            "node": "Extract: Site A",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Extract: Site B",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Extract: Site C",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Extract: Site D",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Extract: Fallback (Universal)",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Loop Over URLs": {
      "main": [
        [
          {
            "node": "Completion Summary",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Rate Limit (3s)",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Manual Trigger": {
      "main": [
        [
          {
            "node": "Read Pending URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract: Site A": {
      "main": [
        [
          {
            "node": "Normalize Extracted Data",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract: Site B": {
      "main": [
        [
          {
            "node": "Normalize Extracted Data",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract: Site C": {
      "main": [
        [
          {
            "node": "Normalize Extracted Data",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract: Site D": {
      "main": [
        [
          {
            "node": "Normalize Extracted Data",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Rate Limit (3s)": {
      "main": [
        [
          {
            "node": "Fetch HTML",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Mark as Outdated": {
      "main": [
        [
          {
            "node": "Update URL Status",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Read Pending URLs": {
      "main": [
        [
          {
            "node": "Loop Over URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Update URL Status": {
      "main": [
        [
          {
            "node": "Loop Over URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Save to Article Feed": {
      "main": [
        [
          {
            "node": "Update URL Status",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Calculate Tier & Status": {
      "main": [
        [
          {
            "node": "Save to Article Feed",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Normalize Extracted Data": {
      "main": [
        [
          {
            "node": "Freshness Filter (45 days)",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Schedule (Every 4 Hours)": {
      "main": [
        [
          {
            "node": "Read Pending URLs",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Freshness Filter (45 days)": {
      "main": [
        [
          {
            "node": "Calculate Tier & Status",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Mark as Outdated",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract: Fallback (Universal)": {
      "main": [
        [
          {
            "node": "Freshness Filter (45 days)",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}