{
  "id": "E7W5wh1CRWDsRWWN",
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "name": "Reddit Industry Digest with ScrapeOps and Google Sheets",
  "tags": [
    {
      "id": "EYCu0h4UjINqJjNC",
      "name": "Industry Digest",
      "createdAt": "2026-03-10T07:51:48.024Z",
      "updatedAt": "2026-03-10T07:51:48.024Z"
    },
    {
      "id": "jiBunnRJASi9V3kB",
      "name": "Reddit Scraper",
      "createdAt": "2026-03-10T07:51:32.325Z",
      "updatedAt": "2026-03-10T07:51:32.325Z"
    },
    {
      "id": "lZKSh2IoxHklnOUw",
      "name": "ScrapeOps",
      "createdAt": "2025-10-20T20:27:13.410Z",
      "updatedAt": "2025-10-20T20:27:13.410Z"
    },
    {
      "id": "oSx6yEPAYlnqqBNC",
      "name": "Weekly Newsletter Automa",
      "createdAt": "2026-03-10T07:51:54.577Z",
      "updatedAt": "2026-03-10T07:51:54.577Z"
    },
    {
      "id": "yzylwxvLF3YwGBRm",
      "name": "Google Sheets Automation",
      "createdAt": "2026-03-10T07:03:25.329Z",
      "updatedAt": "2026-03-10T07:03:25.329Z"
    }
  ],
  "nodes": [
    {
      "id": "db080f2b-cb69-4222-84a1-691b012d6bde",
      "name": "Overview (Sticky)",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        0,
        -176
      ],
      "parameters": {
        "width": 600,
        "height": 904,
        "content": "# \ud83d\udcf0 Reddit Industry Digest (Weekly) \u2192 Google Sheets\n\nThis workflow builds a weekly industry digest by collecting top posts from selected subreddits \u2014 no Reddit API needed. It scrapes public Reddit pages via **ScrapeOps Proxy**, enriches each post with full text using Reddit's JSON endpoint, deduplicates against your Google Sheet, and generates a weekly summary that can optionally be emailed.\n\n### How it works\n1. \u23f0 **Weekly Schedule Trigger** fires automatically once a week.\n2. \u2699\ufe0f **Configure Subreddits & Week Range** sets the subreddit list, week range, and Sheet IDs.\n3. \ud83d\udce6 **Split Subreddits Into Batches** processes each subreddit one at a time.\n4. \ud83c\udf10 **ScrapeOps: Fetch Subreddit Listing** scrapes the top-of-week page from `old.reddit.com`.\n5. \u23f3 **Polite Delay** adds a 1\u20133s pause between requests.\n6. \ud83d\udd0d **Parse Listing HTML** extracts title, URL, score, comments, author, and timestamps.\n7. \ud83d\udce1 **ScrapeOps: Fetch Post Details** retrieves each post as JSON to extract `selftext`.\n8. \ud83d\udd00 **Merge & Normalize** combines listing data with post body text into a final record.\n9. \ud83e\uddf9 **Deduplicate New Posts** filters posts already in the Sheet by hash and URL.\n10. \ud83d\udcbe **Append New Posts** saves only new posts to the `posts` tab.\n11. \ud83d\udcca **Build Weekly Digest** generates topic clusters and top post summaries.\n12. \ud83d\udce7 **Send Digest Email** optionally emails the weekly summary.\n\n### Setup steps\n- Register for a free ScrapeOps API key: https://scrapeops.io/app/register/n8n\n- Add ScrapeOps credentials in n8n. 
Docs: https://scrapeops.io/docs/n8n/overview/\n- Duplicate [this sheet](https://docs.google.com/spreadsheets/d/1rKuVREV4pedie7uAbuEcLvghNrAEeAjIPUBE6cQPleI/edit?usp=sharing) to copy Columns and Spreadsheet ID.\n- Connect Google Sheets and set your Spreadsheet ID in the Sheet nodes.\n- Update your subreddit list in **Configure Subreddits & Week Range**.\n- Optional: enable **Send Digest Email** and configure credentials.\n\n### Customization\n- Add or remove subreddits in the configure node.\n- Change timeframe from `week` to `month` in the fetch URL.\n- Add a Slack node to post the digest to a channel."
      },
      "typeVersion": 1
    },
    {
      "id": "6b2932d3-1c23-42d1-8d12-f6bb66f2442e",
      "name": "Section: Trigger & Inputs",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        656,
        48
      ],
      "parameters": {
        "color": 7,
        "width": 436,
        "height": 344,
        "content": "## 1. Trigger & Configuration\nFires weekly and sets runtime config \u2014 subreddit list, week range, batch size, and Google Sheet IDs."
      },
      "typeVersion": 1
    },
    {
      "id": "2f094883-7d84-4677-bc49-ea5b64b8552c",
      "name": "Section: Scrape Listings",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1104,
        48
      ],
      "parameters": {
        "color": 7,
        "width": 664,
        "height": 344,
        "content": "## 2. Scrape Subreddit Listings\nBatch through each subreddit and scrape the \"Top of Week\" page via [ScrapeOps Proxy](https://scrapeops.io/docs/n8n/proxy-api/) with a polite delay between requests."
      },
      "typeVersion": 1
    },
    {
      "id": "639b418a-dd27-48c6-9946-525c86f2c3c6",
      "name": "Section: Post Enrichment",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1792,
        48
      ],
      "parameters": {
        "color": 7,
        "width": 328,
        "height": 344,
        "content": "## 3. Parse Post Metadata\nExtract title, URL, score, comment count, author, and timestamps from listing HTML into structured JSON."
      },
      "typeVersion": 1
    },
    {
      "id": "b5e1e99e-c8b6-4542-b9b6-0228cce219e0",
      "name": "Parse Listing HTML \u2192 Post Metadata",
      "type": "n8n-nodes-base.code",
      "position": [
        1904,
        224
      ],
      "parameters": {
        "jsCode": "const html = $json.data || $json.body || $json || '';\nconst maxPosts = $json.limit || 20;\n\n// Fallback metadata\nconst now = new Date();\nconst iso = now.toISOString();\nconst run_id = $json.run_id || iso;\nconst run_date = $json.run_date || iso.slice(0,10);\n// If week_range missing, compute current week (Mon\u00e2\u20ac\u201cSun)\nlet week_range = $json.week_range;\nif (!week_range) {\n  const ws = new Date(now);\n  const day = (ws.getUTCDay() + 6) % 7; // Monday=0\n  ws.setUTCDate(ws.getUTCDate() - day);\n  const we = new Date(ws);\n  we.setUTCDate(we.getUTCDate() + 6);\n  const fmt = d => d.toISOString().slice(0,10);\n  week_range = `${fmt(ws)} to ${fmt(we)}`;\n}\n\nconst upstreamSub = $json.subreddit || '';\nconst sort = $json.sort || 'top';\nconst time_range = $json.time_range || 'week';\nconst extracted_at = new Date().toISOString();\nconst crypto = require('crypto');\n\n// Detect subreddit from permalink/URL if upstream missing\nconst detectSubreddit = (block, url) => {\n  const m1 = block.match(/\\/r\\/([A-Za-z0-9_]+)\\/comments\\//i);\n  if (m1) return m1[1];\n  const m2 = url && url.match(/\\/r\\/([A-Za-z0-9_]+)\\//i);\n  if (m2) return m2[1];\n  const m3 = html.match(/<meta property=\"og:url\" content=\"https:\\/\\/old\\.reddit\\.com\\/r\\/([^/]+)/i);\n  if (m3) return m3[1];\n  const m4 = html.match(/<meta property=\"og:url\" content=\"https:\\/\\/www\\.reddit\\.com\\/r\\/([^/]+)/i);\n  if (m4) return m4[1];\n  return upstreamSub;\n};\n\nconst posts = [];\nconst blocks = html.split('<div class=\"thing');\nfor (const blk of blocks) {\n  if (!blk.includes('data-fullname')) continue;\n\n  const idMatch = blk.match(/data-fullname=\"([^\"]+)\"/);\n  const authorMatch = blk.match(/data-author=\"([^\"]*)\"/);\n  const titleMatch = blk.match(/<a[^>]*class=\"title[^>]*\"[^>]*href=\"([^\"]+)\"[^>]*>([\\s\\S]*?)<\\/a>/);\n  if (!titleMatch) continue;\n\n  // Prefer Reddit permalink; emit www.reddit.com\n  const permalinkMatch = 
blk.match(/data-permalink=\"([^\"]+)\"/);\n  const href = titleMatch[1];\n  const urlPath = permalinkMatch ? permalinkMatch[1] : href;\n  const url = urlPath.startsWith('http')\n    ? urlPath.replace('://old.reddit.com','://www.reddit.com')\n    : `https://www.reddit.com${urlPath.startsWith('/') ? urlPath : `/${urlPath}`}`;\n\n  const subreddit = detectSubreddit(blk, url);\n  const title = titleMatch[2].replace(/<[^>]+>/g,'').trim();\n\n  const scoreMatch = blk.match(/<div class=\"score[^>]*?(?:title=\"([\\d,]+)\")?[^>]*>([\\d,]+|\\u2022)/);\n  const scoreVal = scoreMatch ? (scoreMatch[1] || scoreMatch[2]) : '';\n  const commentsMatch =\n    blk.match(/<a[^>]*class=\"comments[^\"]*\"[^>]*>(\\d+)[^<]*comment/) ||\n    blk.match(/<a[^>]*href=\"[^\"]*comments[^\"]*\"[^>]*>(\\d+)\\s+comment/);\n\n  const flairMatch =\n    blk.match(/class=\"linkflairlabel[^\"]*\"[^>]*>([^<]+)<\\/span>/) ||\n    blk.match(/class=\"linkflair[^\"]*\"[^>]*>([^<]+)<\\/span>/);\n\n  // Selftext on listing is usually absent for link/image posts\n  const selftextMatch = blk.match(/data-selftext-html=\"([^\"]*)\"/);\n  const post_text = selftextMatch\n    ? selftextMatch[1]\n        .replace(/&lt;/g,'<')\n        .replace(/&gt;/g,'>')\n        .replace(/&amp;/g,'&')\n        .replace(/<[^>]+>/g,'')\n        .trim()\n    : '';\n\n  let created_utc = '';\n  const tsMatch = blk.match(/data-timestamp=\"(\\d+)\"/);\n  if (tsMatch) created_utc = new Date(Number(tsMatch[1])).toISOString();\n\n  const post_id = idMatch ? idMatch[1] : (url.split('/').filter(Boolean).pop() || '');\n  const score = scoreVal && scoreVal !== '\\u2022' ? Number(scoreVal.replace(/,/g,'')) : '';\n  const num_comments = commentsMatch ? Number(commentsMatch[1]) : '';\n  const flair = flairMatch ? flairMatch[1] : '';\n  const author = authorMatch ? 
authorMatch[1] : '';\n  const hash = crypto.createHash('sha1').update(`${subreddit}${title}${url}`).digest('hex');\n\n  posts.push({\n    run_id,\n    run_date,\n    week_range,\n    subreddit,\n    sort,\n    time_range,\n    post_id,\n    post_url: url,\n    post_title: title,\n    post_text,\n    author,\n    created_utc,\n    score,\n    num_comments,\n    flair,\n    extracted_at,\n    content_hash: hash,\n    is_new: true\n  });\n  if (posts.length >= maxPosts) break;\n}\n\nreturn posts.map(p => ({ json: p }));\n"
      },
      "typeVersion": 2,
      "alwaysOutputData": true
    },
    {
      "id": "b5ae003d-5943-4e1d-ad00-636b7e5287ae",
      "name": "Section: Post Enrichment1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2144,
        48
      ],
      "parameters": {
        "color": 7,
        "width": 856,
        "height": 344,
        "content": "## 4. Enrich & Finalize Posts\nFetch each post as JSON to extract `selftext`, merge with listing metadata, and normalize all fields into the final record."
      },
      "typeVersion": 1
    },
    {
      "id": "692706d5-37e1-43e0-9d53-20165176691a",
      "name": "Section: Post Enrichment2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1264,
        464
      ],
      "parameters": {
        "color": 7,
        "width": 856,
        "height": 280,
        "content": "## 5. Deduplicate & Save\nCompare against existing Sheet rows by hash and URL, then append only new posts to the `posts` tab."
      },
      "typeVersion": 1
    },
    {
      "id": "ca7a04f5-3e93-40c7-aafe-db3adf4237b8",
      "name": "Section: Post Enrichment3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2144,
        464
      ],
      "parameters": {
        "color": 7,
        "width": 856,
        "height": 280,
        "content": "## 6. Weekly Digest & Email\nGenerate topic clusters and top post summaries, write to `weekly_digest` tab, and optionally send by email."
      },
      "typeVersion": 1
    },
    {
      "id": "fd788a62-92a1-46f8-8d8f-300c67b9028e",
      "name": "Weekly Schedule Trigger",
      "type": "n8n-nodes-base.scheduleTrigger",
      "position": [
        720,
        224
      ],
      "parameters": {
        "rule": {
          "interval": [
            {
              "field": "weeks"
            }
          ]
        }
      },
      "typeVersion": 1
    },
    {
      "id": "7514a857-699a-40a5-a111-2d32af2cf38a",
      "name": "Configure Subreddits & Week Range",
      "type": "n8n-nodes-base.code",
      "position": [
        944,
        224
      ],
      "parameters": {
        "jsCode": "// Robust static store helper (works in Code node)\nconst getStatic = () => {\n  if (typeof $getWorkflowStaticData === 'function') return $getWorkflowStaticData('global');\n  if (typeof this !== 'undefined' && this.getWorkflowStaticData) return this.getWorkflowStaticData('global');\n  // fallback (per-execution)\n  if (!globalThis.__staticData) globalThis.__staticData = { global: {} };\n  return globalThis.__staticData.global;\n};\n\nconst global = getStatic();\nglobal.seen = [];\nglobal.newPosts = [];\n\nconst now = new Date();\nconst iso = now.toISOString();\nconst run_date = iso.slice(0,10);\nconst weekStart = new Date(now);\nconst day = (weekStart.getUTCDay() + 6) % 7; // Monday=0\nweekStart.setUTCDate(weekStart.getUTCDate() - day);\nconst weekEnd = new Date(weekStart);\nweekEnd.setUTCDate(weekEnd.getUTCDate() + 6);\nconst fmt = d => d.toISOString().slice(0,10);\n\nconst subs = [\"selfhosted\",\"devops\",\"programming\",\"webdev\"];\nreturn subs.map(sub => ({ \n  json: {\n    run_id: iso,\n    run_date,\n    week_range: `${fmt(weekStart)} to ${fmt(weekEnd)}`,\n    subreddit: sub,\n    sort: \"top\",\n    time_range: \"week\",\n    limit: 20,\n    sheet_id: \"1rKuVREV4pedie7uAbuEcLvghNrAEeAjIPUBE6cQPleI\"\n  }\n}));\n"
      },
      "typeVersion": 2
    },
    {
      "id": "a2cba4ba-859a-4af4-85de-ea86979d3393",
      "name": " Split Subreddits Into Batches",
      "type": "n8n-nodes-base.splitInBatches",
      "position": [
        1168,
        224
      ],
      "parameters": {
        "options": {},
        "batchSize": 1
      },
      "typeVersion": 2
    },
    {
      "id": "c501694f-dcf7-4781-a1fc-f5eebd38e5a0",
      "name": "ScrapeOps: Fetch Subreddit Listing",
      "type": "@scrapeops/n8n-nodes-scrapeops.ScrapeOps",
      "position": [
        1408,
        224
      ],
      "parameters": {
        "url": "={{`https://old.reddit.com/r/${$json.subreddit}/top/?t=week`}}",
        "advancedOptions": {}
      },
      "credentials": {
        "scrapeOpsApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "68756bd8-82da-4204-b4f7-403d2c1f9c03",
      "name": " Polite Delay (1\u20133s)",
      "type": "n8n-nodes-base.wait",
      "position": [
        1616,
        224
      ],
      "parameters": {
        "unit": "seconds",
        "amount": "={{ Math.floor(Math.random()*3)+1 }}"
      },
      "typeVersion": 1
    },
    {
      "id": "4aac1636-f65b-4173-91ea-2479bae04379",
      "name": " ScrapeOps: Fetch Post Details (JSON)",
      "type": "@scrapeops/n8n-nodes-scrapeops.ScrapeOps",
      "position": [
        2208,
        224
      ],
      "parameters": {
        "url": "={{ ($json.post_url || '').replace(/\\?.*$/, '').replace(/\\/$/, '') + '.json?raw_json=1' }}\n",
        "returnType": "json",
        "advancedOptions": {}
      },
      "credentials": {
        "scrapeOpsApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "8a539ed8-a2aa-4bfe-a493-c424ec9b7d39",
      "name": "Extract Selftext & Post Type",
      "type": "n8n-nodes-base.code",
      "position": [
        2432,
        224
      ],
      "parameters": {
        "jsCode": "// n8n Code node: Extract post data from Reddit .json via ScrapeOps\n// Robust for: normal JSON, HTML-encoded JSON, and text/link/image posts.\n\nconst payload = $json ?? {};\n\n// 1) Get response string\nlet raw = '';\nif (typeof payload.body === 'string') raw = payload.body;\nelse if (typeof payload.data === 'string') raw = payload.data;\nelse if (typeof payload.response === 'string') raw = payload.response;\nelse {\n  for (const v of Object.values(payload)) {\n    if (typeof v === 'string' && v.length > raw.length) raw = v;\n  }\n}\n\nconst trimmed = (raw || '').trim();\n\nif (!trimmed) {\n  return [{\n    json: {\n      extracted_ok: false,\n      error: 'No response found in body/data/response.',\n      available_keys: Object.keys(payload),\n      post_text_extracted: ''\n    }\n  }];\n}\n\n// 2) Decode entities (handles &#34; etc)\nconst decodeHtmlEntities = (s = '') =>\n  s\n    .replace(/&#34;/g, '\"')\n    .replace(/&quot;/g, '\"')\n    .replace(/&#39;/g, \"'\")\n    .replace(/&apos;/g, \"'\")\n    .replace(/&#38;/g, '&')\n    .replace(/&amp;/g, '&')\n    .replace(/&lt;/g, '<')\n    .replace(/&gt;/g, '>')\n    .replace(/&nbsp;/g, ' ');\n\nconst decoded = decodeHtmlEntities(trimmed);\n\n// 3) Only treat as HTML if it truly starts like HTML\nif (decoded.startsWith('<')) {\n  return [{\n    json: {\n      extracted_ok: false,\n      error: 'Got HTML (likely blocked/redirect).',\n      debug_first_200_chars: decoded.slice(0, 200),\n      status_code: payload.status_code ?? null,\n      url: payload.url ?? null,\n      post_text_extracted: ''\n    }\n  }];\n}\n\n// 4) If it looks like JSON, parse it\nconst firstChar = decoded[0];\nif (firstChar !== '[' && firstChar !== '{') {\n  return [{\n    json: {\n      extracted_ok: false,\n      error: 'Response is not JSON or HTML (unexpected format).',\n      debug_first_200_chars: decoded.slice(0, 200),\n      status_code: payload.status_code ?? null,\n      url: payload.url ?? 
null,\n      post_text_extracted: ''\n    }\n  }];\n}\n\nlet parsed;\ntry {\n  parsed = JSON.parse(decoded);\n} catch (e) {\n  return [{\n    json: {\n      extracted_ok: false,\n      error: `JSON parse failed: ${e.message}`,\n      debug_first_200_chars: decoded.slice(0, 200),\n      status_code: payload.status_code ?? null,\n      url: payload.url ?? null,\n      post_text_extracted: ''\n    }\n  }];\n}\n\n// 5) Extract post fields\nconst postListing = Array.isArray(parsed) ? parsed[0] : parsed;\nconst post = postListing?.data?.children?.[0]?.data;\n\nif (!post) {\n  return [{\n    json: {\n      extracted_ok: false,\n      error: 'Parsed JSON but could not find post at data.children[0].data',\n      url: payload.url ?? null,\n      post_text_extracted: ''\n    }\n  }];\n}\n\nconst selftext = (post.selftext || '').trim();\nconst post_type =\n  selftext ? 'text' :\n  (post.is_video ? 'video' :\n   (post.post_hint || (post.url ? 'link_or_image' : 'unknown')));\n\nreturn [{\n  json: {\n    extracted_ok: true,\n    post_text_extracted: selftext,  // can be \"\" for image/link posts\n    post_type,\n\n    post_title: post.title || '',\n    post_id: post.name || (post.id ? `t3_${post.id}` : ''),\n    post_url: post.permalink ? `https://www.reddit.com${post.permalink}` : (post.url || ''),\n    subreddit: post.subreddit || '',\n    score: post.score ?? null,\n    num_comments: post.num_comments ?? null,\n    author: post.author || '',\n    created_utc: post.created_utc ?? null\n  }\n}];\n"
      },
      "typeVersion": 2
    },
    {
      "id": "4b2b0080-c6b9-4265-883a-821450fbdd04",
      "name": "Merge Post Metadata + Text",
      "type": "n8n-nodes-base.merge",
      "position": [
        2640,
        224
      ],
      "parameters": {
        "mode": "combine",
        "options": {},
        "combinationMode": "mergeByPosition"
      },
      "typeVersion": 2
    },
    {
      "id": "afd98102-216a-46da-b40f-8f22f8132063",
      "name": "Finalize & Normalize Post Fields",
      "type": "n8n-nodes-base.code",
      "position": [
        2848,
        224
      ],
      "parameters": {
        "jsCode": "return items.map(item => {\n  const j = { ...item.json };\n  const extracted = (j.post_text_extracted || '').trim();\n  if (extracted) {\n    j.post_text = extracted;\n  } else {\n    j.post_text = (j.post_text || '').trim();\n  }\n  delete j.post_text_extracted;\n  return { json: j };\n});"
      },
      "typeVersion": 2
    },
    {
      "id": "d57d918f-c7ef-4ee1-928f-586ec9ad8509",
      "name": "Read Existing Posts from Sheet",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        1360,
        560
      ],
      "parameters": {
        "options": {},
        "sheetName": {
          "__rl": true,
          "mode": "list",
          "value": "gid=0",
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/1rKuVREV4pedie7uAbuEcLvghNrAEeAjIPUBE6cQPleI/edit#gid=0",
          "cachedResultName": "posts"
        },
        "documentId": {
          "__rl": true,
          "mode": "id",
          "value": "1rKuVREV4pedie7uAbuEcLvghNrAEeAjIPUBE6cQPleI"
        }
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 3,
      "alwaysOutputData": true
    },
    {
      "id": "ee1034af-bede-4a50-8b60-76eeb894ad39",
      "name": " Merge Scraped + Existing Posts",
      "type": "n8n-nodes-base.merge",
      "position": [
        1584,
        560
      ],
      "parameters": {
        "mode": "combine",
        "options": {},
        "combinationMode": "mergeByPosition"
      },
      "typeVersion": 2
    },
    {
      "id": "678a0f94-b0e1-464b-8a60-2f1fbc51e771",
      "name": "Deduplicate New Posts",
      "type": "n8n-nodes-base.code",
      "position": [
        1776,
        560
      ],
      "parameters": {
        "jsCode": "// Shared store\nconst global =\n  (typeof $getWorkflowStaticData === 'function')\n    ? $getWorkflowStaticData('global')\n    : (globalThis.__static ??= { global: {} }).global;\n\nif (!global.seen) global.seen = [];\nif (!global.newPosts) global.newPosts = [];\n\n// Normalize URL\nconst normUrl = (u = '') => u.replace('://old.reddit.com', '://www.reddit.com').trim();\n\n// Read existing rows\nlet existingRows = [];\ntry { existingRows = $items('Read Existing Posts from Sheet') || []; } catch (e) { existingRows = []; }\n\nconst existingSet = new Set(global.seen || []);\nfor (const row of existingRows) {\n  const r = row?.json || {};\n  const h = r.content_hash || r[15] || r.Q || '';\n  const u = normUrl(r.post_url || r[6] || r.G || '');\n  if (h) existingSet.add(h);\n  if (u) existingSet.add(u);\n}\n\n// Process incoming items; keep all, flag duplicates\nconst out = [];\nfor (const item of items) {\n  const j = { ...item.json };\n  j.post_url = normUrl(j.post_url || '');\n  const seen = existingSet.has(j.content_hash) || existingSet.has(j.post_url);\n  if (!seen) {\n    existingSet.add(j.content_hash);\n    existingSet.add(j.post_url);\n    j.is_new = true;\n    global.newPosts.push(j);\n  } else {\n    j.is_new = false;\n  }\n  out.push({ json: j });\n}\n\n// Update cache\nglobal.seen = Array.from(existingSet);\n\n// Always return items\nreturn out;\n"
      },
      "typeVersion": 2,
      "alwaysOutputData": true
    },
    {
      "id": "76be0f72-2be0-4b00-9549-18d739d62167",
      "name": "Append New Posts to Sheet",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        1968,
        560
      ],
      "parameters": {
        "columns": {
          "value": {
            "sort": "={{ $json.sort }}",
            "flair": "={{ $json.flair }}",
            "score": "={{ $json.score }}",
            "author": "={{ $json.author }}",
            "is_new": "={{ $json.is_new }}",
            "run_id": "={{ $json.run_id }}",
            "post_id": "={{ $json.post_id }}",
            "post_url": "={{ $json.post_url }}",
            "run_date": "={{ $json.run_date }}",
            "post_text": "={{ $json.post_text }}",
            "subreddit": "={{ $json.subreddit }}",
            "post_title": "={{ $json.post_title }}",
            "time_range": "={{ $json.time_range }}",
            "created_utc": "={{ $json.created_utc }}",
            "content_hash": "={{ $json.content_hash }}",
            "extracted_at": "={{ $json.extracted_at }}",
            "num_comments": "={{ $json.num_comments }}"
          },
          "schema": [
            {
              "id": "run_id",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "run_id",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "run_date",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "run_date",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "subreddit",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "subreddit",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "sort",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "sort",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "time_range",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "time_range",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "post_id",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "post_id",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "post_url",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "post_url",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "post_title",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "post_title",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "post_text",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "post_text",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "author",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "author",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "created_utc",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "created_utc",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "score",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "score",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "num_comments",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "num_comments",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "flair",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "flair",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "extracted_at",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "extracted_at",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "content_hash",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "content_hash",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "is_new",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "is_new",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            }
          ],
          "mappingMode": "defineBelow",
          "matchingColumns": [],
          "attemptToConvertTypes": false,
          "convertFieldsToString": false
        },
        "options": {},
        "operation": "append",
        "sheetName": {
          "__rl": true,
          "mode": "name",
          "value": "posts"
        },
        "documentId": {
          "__rl": true,
          "mode": "id",
          "value": "1rKuVREV4pedie7uAbuEcLvghNrAEeAjIPUBE6cQPleI"
        }
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.5
    },
    {
      "id": "08ab7fd2-6dd0-4fd1-8df4-9440f0830178",
      "name": "Build Weekly Digest",
      "type": "n8n-nodes-base.code",
      "position": [
        2240,
        544
      ],
      "parameters": {
        "jsCode": "// Grab shared store (works in Code node); fallback to per-run object\nconst global =\n  (typeof $getWorkflowStaticData === 'function')\n    ? $getWorkflowStaticData('global')\n    : (globalThis.__static ??= { global: {} }).global;\n\nconst posts = global.newPosts || [];\nconst total_posts = posts.length;\nconst subreddits = Array.from(new Set(posts.map(p => p.subreddit))).join(', ');\nconst stop = new Set(['the','a','an','to','for','and','or','of','in','on','with','by','from','is','are','was','were','be','this','that','it','as','at','we','you','i']);\nconst wordCounts = new Map();\nfor (const p of posts) {\n  const text = `${p.post_title || ''} ${p.post_text || ''}`.toLowerCase();\n  for (const w of text.split(/[^a-z0-9+#]+/).filter(Boolean)) {\n    if (stop.has(w) || w.length < 3) continue;\n    wordCounts.set(w, (wordCounts.get(w) || 0) + 1);\n  }\n}\nconst topWords = Array.from(wordCounts.entries()).sort((a,b)=>b[1]-a[1]).slice(0,40).map(([w])=>w);\nconst topics = [];\nfor (const w of topWords) {\n  if (topics.length >= 8) break;\n  const clusterPosts = posts.filter(p => (`${p.post_title} ${p.post_text}`).toLowerCase().includes(w)).slice(0,3);\n  if (!clusterPosts.length) continue;\n  topics.push({ label: w, summary: `Highlights related to ${w}.`, links: clusterPosts.map(p => p.post_url) });\n}\nwhile (topics.length < 5 && topWords[topics.length]) {\n  topics.push({ label: topWords[topics.length], summary: `Topic around ${topWords[topics.length]}.`, links: [] });\n}\nconst sorted = [...posts].sort((a,b)=>{\n  const as = Number(a.score || 0), bs = Number(b.score || 0);\n  if (bs !== as) return bs - as;\n  const ac = Number(a.num_comments || 0), bc = Number(b.num_comments || 0);\n  return bc - ac;\n}).slice(0,10);\nconst topPostsJson = sorted.map(p => ({ title: p.post_title, url: p.post_url, score: p.score ?? '', comments: p.num_comments ?? 
'', subreddit: p.subreddit }));\nconst brief = [];\nbrief.push(`Weekly Developer Tools Digest (Reddit) - ${posts[0]?.week_range || ''}`);\nbrief.push(`Subreddits: ${subreddits || 'webdev, programming, devops, selfhosted'}`);\nbrief.push(`Total new posts: ${total_posts}`);\nbrief.push('\\nTopics:');\nfor (const t of topics) {\n  brief.push(`- ${t.label}: ${t.summary}`);\n  if (t.links && t.links.length) brief.push(`  Links: ${t.links.join(', ')}`);\n}\nbrief.push('\\nTop posts overall:');\nfor (const p of topPostsJson) {\n  brief.push(`- [${p.subreddit}] ${p.title} (${p.score || 0} pts / ${p.comments || 0} comments) -> ${p.url}`);\n}\nconst created_at = new Date().toISOString();\nreturn [{\n  json: {\n    run_id: posts[0]?.run_id || created_at,\n    week_range: posts[0]?.week_range || '',\n    subreddits: subreddits || 'webdev, programming, devops, selfhosted',\n    total_posts,\n    top_topics_json: JSON.stringify(topics),\n    weekly_brief_text: brief.join('\\n'),\n    top_posts_json: JSON.stringify(topPostsJson),\n    created_at\n  }\n}];\n"
      },
      "typeVersion": 2
    },
    {
      "id": "1532c19d-20d6-4d51-8852-bb8aead8dac2",
      "name": "Append Weekly Digest to Sheet",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        2512,
        544
      ],
      "parameters": {
        "columns": {
          "value": {
            "run_id": "={{ $json.run_id }}",
            "created_at": "={{ $json.created_at }}",
            "subreddits": "={{ $json.subreddits }}",
            "week_range": "={{ $json.week_range }}",
            "total_posts": "={{ $json.total_posts }}",
            "top_posts_json": "={{ $json.top_posts_json }}",
            "top_topics_json": "={{ $json.top_topics_json }}",
            "weekly_brief_text": "={{ $json.weekly_brief_text }}"
          },
          "schema": [
            {
              "id": "run_id",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "run_id",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "week_range",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "week_range",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "subreddits",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "subreddits",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "total_posts",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "total_posts",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "top_topics_json",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "top_topics_json",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "weekly_brief_text",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "weekly_brief_text",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "top_posts_json",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "top_posts_json",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "created_at",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "created_at",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            }
          ],
          "mappingMode": "defineBelow",
          "matchingColumns": [],
          "attemptToConvertTypes": false,
          "convertFieldsToString": false
        },
        "options": {},
        "operation": "append",
        "sheetName": {
          "__rl": true,
          "mode": "name",
          "value": "weekly_digest"
        },
        "documentId": {
          "__rl": true,
          "mode": "id",
          "value": "1rKuVREV4pedie7uAbuEcLvghNrAEeAjIPUBE6cQPleI"
        }
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.5
    },
    {
      "id": "bca10483-5886-4c01-9440-f075c658efcc",
      "name": "Send Weekly Digest Email",
      "type": "n8n-nodes-base.emailSend",
      "position": [
        2768,
        544
      ],
      "parameters": {
        "text": "={{$json.weekly_brief_text}}",
        "options": {},
        "subject": "={{`Weekly Developer Tools Digest (Reddit) - ${$json.week_range}`}}",
        "toEmail": "user@example.com",
        "fromEmail": "you@example.com"
      },
      "executeOnce": true,
      "typeVersion": 2
    }
  ],
  "active": false,
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "b99cef81-f55c-49f8-940a-02db28ebb6d3",
  "connections": {
    "Build Weekly Digest": {
      "main": [
        [
          {
            "node": "Append Weekly Digest to Sheet",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Deduplicate New Posts": {
      "main": [
        [
          {
            "node": "Append New Posts to Sheet",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    " Polite Delay (1\u20133s)": {
      "main": [
        [
          {
            "node": "Parse Listing HTML \u2192 Post Metadata",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Weekly Schedule Trigger": {
      "main": [
        [
          {
            "node": "Configure Subreddits & Week Range",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Append New Posts to Sheet": {
      "main": [
        [
          {
            "node": " Split Subreddits Into Batches",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Merge Post Metadata + Text": {
      "main": [
        [
          {
            "node": "Finalize & Normalize Post Fields",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Extract Selftext & Post Type": {
      "main": [
        [
          {
            "node": "Merge Post Metadata + Text",
            "type": "main",
            "index": 1
          }
        ]
      ]
    },
    "Append Weekly Digest to Sheet": {
      "main": [
        [
          {
            "node": "Send Weekly Digest Email",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    " Split Subreddits Into Batches": {
      "main": [
        [
          {
            "node": "ScrapeOps: Fetch Subreddit Listing",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Build Weekly Digest",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Read Existing Posts from Sheet": {
      "main": [
        [
          {
            "node": " Merge Scraped + Existing Posts",
            "type": "main",
            "index": 1
          }
        ]
      ]
    },
    " Merge Scraped + Existing Posts": {
      "main": [
        [
          {
            "node": "Deduplicate New Posts",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Finalize & Normalize Post Fields": {
      "main": [
        [
          {
            "node": " Merge Scraped + Existing Posts",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Configure Subreddits & Week Range": {
      "main": [
        [
          {
            "node": " Split Subreddits Into Batches",
            "type": "main",
            "index": 0
          },
          {
            "node": "Read Existing Posts from Sheet",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "ScrapeOps: Fetch Subreddit Listing": {
      "main": [
        [
          {
            "node": " Polite Delay (1\u20133s)",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Parse Listing HTML \u2192 Post Metadata": {
      "main": [
        [
          {
            "node": " ScrapeOps: Fetch Post Details (JSON)",
            "type": "main",
            "index": 0
          },
          {
            "node": "Merge Post Metadata + Text",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    " ScrapeOps: Fetch Post Details (JSON)": {
      "main": [
        [
          {
            "node": "Extract Selftext & Post Type",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}