Extract Amazon Book Data & Generate Purchase Reports with Decodo Scraper

By Trung Tran (@trungtran) on n8n.io

Demo video: https://www.youtube.com/watch?v=9Kn583UJlqY

This workflow demos how to use Decodo Scraper API to crawl any public web page (headless JS, device emulation: mobile/desktop/tablet), extract structured product data from the returned HTML, generate a purchase-ready report,…

Category: AI & RAG · Trigger: Event · Nodes: 22 · Complexity: ★★★★☆ · AI-powered: yes
Integrations: OpenAI Chat, Output Parser Structured, HTTP Request, Google Drive, Agent, Slack, @Decodo/N8N Nodes Decodo

This workflow corresponds to n8n.io template #8142 — we link there as the canonical source.

This workflow follows the Agent → Google Drive recipe pattern.
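
In this workflow, the Google Drive half of the pattern is a `multipart/related` upload that sends file metadata and the Markdown report in one request, so Drive can create a Google Doc from it. The sketch below shows one way such a body could be assembled in plain JavaScript. The helper name `buildDriveMultipartBody` and the sample values are illustrative assumptions and are not part of the template.

```javascript
// Minimal sketch: assemble a multipart/related body with two parts,
// a JSON metadata part and a Markdown content part.
// `buildDriveMultipartBody` is a hypothetical helper, not part of the template.
function buildDriveMultipartBody({ name, folderId, markdown, boundary }) {
  const metadata = {
    name,
    mimeType: 'application/vnd.google-apps.document',
    parents: [folderId],
  };
  return [
    `--${boundary}`,
    'Content-Type: application/json; charset=UTF-8',
    '',
    JSON.stringify(metadata), // stringify so quotes in `name` stay valid JSON
    `--${boundary}`,
    'Content-Type: text/markdown; charset=UTF-8',
    '',
    markdown,
    `--${boundary}--`,
  ].join('\r\n');
}

const exampleBody = buildDriveMultipartBody({
  name: 'Report',
  folderId: 'FOLDER_ID', // placeholder, replace with a real Drive folder ID
  markdown: '# Title',
  boundary: 'foo_bar_baz',
});
console.log(exampleBody);
```

The same boundary string has to appear in the request's `Content-Type` header (`multipart/related; boundary=foo_bar_baz`). Building the metadata with `JSON.stringify` rather than string interpolation is a design choice that keeps the part valid when a file name contains quotes.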

The workflow JSON

Copy the full n8n JSON below, paste it into a new n8n workflow, add your credentials, and activate it.

{
  "id": "PhWv95onl9vhiYWZ",
  "meta": {
    "templateCredsSetupCompleted": true
  },
  "name": "Decodo Scraper API Workflow Template (n8n Automation Amazon Book Purchase Report)",
  "tags": [
    {
      "id": "vx2KtjCPUPLpD567",
      "name": "Decodo",
      "createdAt": "2025-09-05T06:01:07.861Z",
      "updatedAt": "2025-09-05T06:01:07.861Z"
    }
  ],
  "nodes": [
    {
      "id": "b46505f2-ba00-4ceb-8df7-9deb4854ba94",
      "name": "When clicking \u2018Execute workflow\u2019",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [
        1152,
        736
      ],
      "parameters": {},
      "typeVersion": 1
    },
    {
      "id": "84fa3d8c-3586-41de-b21f-c6be94d58376",
      "name": "OpenAI Chat Model",
      "type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
      "position": [
        2064,
        960
      ],
      "parameters": {
        "model": {
          "__rl": true,
          "mode": "list",
          "value": "gpt-4.1-mini"
        },
        "options": {}
      },
      "credentials": {
        "openAiApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1.2
    },
    {
      "id": "48732581-32df-4722-839d-19bd63c39d34",
      "name": "Structured Output Parser",
      "type": "@n8n/n8n-nodes-langchain.outputParserStructured",
      "position": [
        2192,
        960
      ],
      "parameters": {
        "jsonSchemaExample": "[{\n  \"asin\": \"0399501487\",\n  \"title\": \"Lord of the Flies\",\n  \"author\": \"William Golding\",\n  \"rank\": 50,\n  \"category\": \"Literature & Fiction\",\n  \"sub_category\": \"Classics\",\n  \"rating\": 4.6,\n  \"ratings_count\": 25600,\n  \"price\": {\n    \"currency\": \"USD\",\n    \"amount\": 9.99,\n    \"format\": \"Paperback\"\n  },\n  \"url\": \"https://www.amazon.com/dp/0399501487\",\n  \"publisher\": \"Penguin\",\n  \"publication_date\": \"1959-04-15\",\n  \"language\": \"English\",\n  \"pages\": 224\n}\n]"
      },
      "typeVersion": 1.3
    },
    {
      "id": "79fffd5f-b6fd-490d-8d7d-9a9ca7eab16d",
      "name": "Create document file",
      "type": "n8n-nodes-base.httpRequest",
      "position": [
        2848,
        736
      ],
      "parameters": {
        "url": "https://www.googleapis.com/upload/drive/v3/files",
        "body": "=--foo_bar_baz\nContent-Type: application/json; charset=UTF-8\n\n{\n  \"name\": \"{{ $json.Today }}\",\n  \"mimeType\": \"application/vnd.google-apps.document\",\n  \"parents\": [\"{{ $json['Drive Folder ID'] }}\"]\n}\n\n--foo_bar_baz\nContent-Type: text/markdown; charset=UTF-8\n\n{{ $('Build \ud83d\udcda Book Purchase Report').item.json.markdown }}\n\n--foo_bar_baz--",
        "method": "POST",
        "options": {},
        "sendBody": true,
        "sendQuery": true,
        "contentType": "raw",
        "sendHeaders": true,
        "authentication": "predefinedCredentialType",
        "rawContentType": "multipart/related; boundary=foo_bar_baz",
        "queryParameters": {
          "parameters": [
            {
              "name": "uploadType",
              "value": "multipart"
            },
            {
              "name": "supportsAllDrives",
              "value": "true"
            }
          ]
        },
        "headerParameters": {
          "parameters": [
            {
              "name": "boundary",
              "value": "foo_bar_baz"
            }
          ]
        },
        "nodeCredentialType": "googleDriveOAuth2Api"
      },
      "credentials": {
        "googleDriveOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.2
    },
    {
      "id": "54015561-60c6-4bad-97b9-a24909d6b017",
      "name": "Convert document to PDF",
      "type": "n8n-nodes-base.googleDrive",
      "position": [
        3072,
        736
      ],
      "parameters": {
        "fileId": {
          "__rl": true,
          "mode": "id",
          "value": "={{ $json.id }}"
        },
        "options": {
          "googleFileConversion": {
            "conversion": {
              "docsToFormat": "application/pdf"
            }
          }
        },
        "operation": "download"
      },
      "credentials": {
        "googleDriveOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 3
    },
    {
      "id": "20a0dba3-9c3f-4016-a4a6-ce64807fd7bd",
      "name": "Configure Google Drive Folder ",
      "type": "n8n-nodes-base.set",
      "position": [
        2624,
        736
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "1ff0b9a4-7d60-44ec-b047-e49252f1ace9",
              "name": "Drive Folder ID",
              "type": "string",
              "value": "1IPcko8bzogO3W4mxhrW2Q017QA0Lc5MI"
            },
            {
              "id": "d64a1ac4-15db-4c84-a1db-fbd6b48084f5",
              "name": "Today",
              "type": "string",
              "value": "={{ $now.format(\"ddMMyyyyHHmmss\") }}"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "911adc36-4b3a-40b7-a192-ff2d329046dc",
      "name": "Product Analyzer Agent",
      "type": "@n8n/n8n-nodes-langchain.agent",
      "position": [
        2048,
        736
      ],
      "parameters": {
        "text": "=Get the top 10 best-selling books from the web content below:\n{{ $json.text }}",
        "options": {
          "systemMessage": "You are a helpful assistant that parses HTML content and outputs well-structured JSON."
        },
        "promptType": "define",
        "hasOutputParser": true
      },
      "typeVersion": 2.1
    },
    {
      "id": "c6a98520-1943-40b3-99c3-4c8f8b7fefb0",
      "name": "Edit Fields",
      "type": "n8n-nodes-base.set",
      "position": [
        1376,
        736
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "391aaecd-88c0-4943-9417-2d9fc6bc50b9",
              "name": "Authenticate_Token",
              "type": "string",
              "value": "Get token from your Decodo dashboard (https://dashboard.decodo.com/web-scraping-api/scraper)"
            },
            {
              "id": "859e5162-ef18-454a-9819-c1b0f2800b3f",
              "name": "url",
              "type": "string",
              "value": "https://www.amazon.com/Best-Sellers-Books/zgbs/books"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "2f6ad9c8-1340-425b-b71b-2fbe1460078c",
      "name": "HTML Response Parser",
      "type": "n8n-nodes-base.code",
      "position": [
        1824,
        736
      ],
      "parameters": {
        "jsCode": "// n8n Code node (JavaScript)\n// Input:  $input.first().json.results[0].content\n// Output: clean plain text (no HTML/JS/CSS, minimal \\n)\n\nfunction stripAll(html) {\n  if (typeof html !== 'string') return '';\n\n  // Remove scripts, styles, head, comments, svg, noscript, canvas\n  html = html.replace(/<script[\\s\\S]*?<\\/script>/gi, '');\n  html = html.replace(/<style[\\s\\S]*?<\\/style>/gi, '');\n  html = html.replace(/<head[\\s\\S]*?<\\/head>/gi, '');\n  html = html.replace(/<noscript[\\s\\S]*?<\\/noscript>/gi, '');\n  html = html.replace(/<svg[\\s\\S]*?<\\/svg>/gi, '');\n  html = html.replace(/<canvas[\\s\\S]*?<\\/canvas>/gi, '');\n  html = html.replace(/<!--[\\s\\S]*?-->/g, '');\n\n  // Replace block-level tags with a single newline\n  const blockTags = [\n    'p','div','section','article','header','footer','nav','aside','main',\n    'h1','h2','h3','h4','h5','h6','ul','ol','li','table','tr','td','th','br','hr'\n  ];\n  for (const tag of blockTags) {\n    const rxOpen  = new RegExp(`<${tag}[^>]*>`, 'gi');\n    const rxClose = new RegExp(`</${tag}>`, 'gi');\n    html = html.replace(rxOpen, '\\n');\n    html = html.replace(rxClose, '\\n');\n  }\n\n  // Strip all remaining tags\n  let text = html.replace(/<\\/?[^>]+>/g, '');\n\n  // Decode common HTML entities (&amp; last, to avoid double-decoding)\n  text = text\n    .replace(/&nbsp;/g, ' ')\n    .replace(/&lt;/g, '<')\n    .replace(/&gt;/g, '>')\n    .replace(/&quot;/g, '\"')\n    .replace(/&#39;/g, \"'\")\n    .replace(/&amp;/g, '&');\n\n  // Clean whitespace\n  text = text\n    .replace(/\\r/g, '')\n    .replace(/[ \\t]+/g, ' ')          // collapse spaces/tabs\n    .replace(/\\n[ \\t]+/g, '\\n')      // trim spaces after newlines, keep the newline\n    .replace(/\\n{3,}/g, '\\n\\n')     // collapse 3+ newlines into 2\n    .trim();\n\n  return text;\n}\n\n// MAIN\nconst html = $input.first().json?.results?.[0]?.content || '';\nif (!html) {\n  return [{ json: { error: 'No HTML found at json.results[0].content' } }];\n}\n\nconst text = stripAll(html);\n\nreturn [{\n  json: {\n    text,\n    chars: text.length\n  }\n}];"
      },
      "typeVersion": 2
    },
    {
      "id": "65633939-53f4-4209-a9d5-653ef04b0bdd",
      "name": "Upload report to Slack ",
      "type": "n8n-nodes-base.slack",
      "position": [
        3296,
        736
      ],
      "parameters": {
        "options": {
          "fileName": "=Book Purchase Report {{ $today.format('yyyy-MM-dd') }}",
          "channelId": "C09E9SDE99P",
          "initialComment": "\ud83d\udcda Book Purchase Report"
        },
        "resource": "file",
        "authentication": "oAuth2"
      },
      "credentials": {
        "slackOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 2.3
    },
    {
      "id": "3619b79e-47b1-4878-96ea-c35d7a51167d",
      "name": "Build \ud83d\udcda Book Purchase Report",
      "type": "n8n-nodes-base.code",
      "position": [
        2400,
        736
      ],
      "parameters": {
        "jsCode": "// n8n Code node (JavaScript)\n// Input shape expected:\n// items[0].json.output = [ { title, author, rank, category, sub_category, rating, ratings_count, price:{currency, amount, format}, ... }, ... ]\n\nfunction median(nums) {\n  if (!nums.length) return 0;\n  const arr = [...nums].sort((a, b) => a - b);\n  const mid = Math.floor(arr.length / 2);\n  return arr.length % 2 ? arr[mid] : (arr[mid - 1] + arr[mid]) / 2;\n}\n\nfunction sum(nums) {\n  return nums.reduce((a, b) => a + (Number.isFinite(b) ? b : 0), 0);\n}\n\nfunction mean(nums) {\n  const valid = nums.filter(n => Number.isFinite(n));\n  return valid.length ? sum(valid) / valid.length : 0;\n}\n\nfunction fmtMoney(n, currency = \"USD\") {\n  if (!Number.isFinite(n)) n = 0;\n  try {\n    return new Intl.NumberFormat(\"en-US\", { style: \"currency\", currency }).format(n);\n  } catch {\n    return `$${n.toFixed(2)}`;\n  }\n}\n\nfunction pad(str, len) {\n  return (str + '').padEnd(len, ' ');\n}\n\nfunction repeat(s, n) {\n  return Array.from({ length: n }, () => s).join('');\n}\n\nfunction safe(v, d='') { return (v === null || v === undefined) ? d : v; }\n\nfunction asBar(value, max, width = 20) {\n  if (max <= 0) return '';\n  const filled = Math.round((value / max) * width);\n  return repeat('\u2589', filled);\n}\n\nconst now = new Date();\nconst dateStr = now.toLocaleString('en-US', { year:'numeric', month:'short', day:'2-digit' });\n\nconst inp = $input.all();\nconst books = (inp?.[0]?.json?.output && Array.isArray(inp[0].json.output))\n  ? inp[0].json.output\n  : Array.isArray(inp?.[0]?.json) ? inp[0].json : [];\n\nconst clean = books.map(b => ({\n  title: safe(b.title, 'N/A'),\n  author: safe(b.author, 'Unknown'),\n  rank: Number.isFinite(b.rank) ? b.rank : null,\n  category: safe(b.category, 'N/A'),\n  sub_category: safe(b.sub_category, 'N/A'),\n  rating: Number.isFinite(b.rating) ? b.rating : 0,\n  ratings_count: Number.isFinite(b.ratings_count) ? 
b.ratings_count : 0,\n  price_amount: Number.isFinite(b?.price?.amount) ? b.price.amount : 0,\n  price_currency: safe(b?.price?.currency, 'USD'),\n  price_format: safe(b?.price?.format, 'Unknown'),\n}));\n\n// --- Core metrics ---\nconst totalBooks = clean.length;\nconst currency = clean.find(b => b.price_currency)?.price_currency || 'USD';\nconst prices = clean.map(b => b.price_amount).filter(n => Number.isFinite(n) && n >= 0);\nconst totalSpend = sum(prices);\nconst avgPrice = mean(prices);\nconst medPrice = median(prices);\nconst minPrice = prices.length ? Math.min(...prices) : 0;\nconst maxPrice = prices.length ? Math.max(...prices) : 0;\n\nconst rated = clean.filter(b => b.rating > 0);\nconst avgRating = rated.length ? mean(rated.map(b => b.rating)) : 0;\nconst ratingCoverage = rated.length;\nconst unratedCount = totalBooks - ratingCoverage;\n\n// category/subcategory counts\nconst catCount = {};\nconst subCatCount = {};\nfor (const b of clean) {\n  catCount[b.category] = (catCount[b.category] || 0) + 1;\n  subCatCount[b.sub_category] = (subCatCount[b.sub_category] || 0) + 1;\n}\n\n// formats\nconst formatCount = {};\nfor (const b of clean) {\n  formatCount[b.price_format] = (formatCount[b.price_format] || 0) + 1;\n}\n\n// top by popularity (ratings_count) and by rating (>=50 ratings)\nconst topByRatingsCount = [...clean].sort((a,b) => b.ratings_count - a.ratings_count).slice(0,5);\nconst topByRating = [...clean]\n  .filter(b => b.ratings_count >= 50 && b.rating > 0)\n  .sort((a,b) => b.rating - a.rating)\n  .slice(0,5);\n\n// price histogram (simple buckets)\nconst bucketSize = 5; // $5 buckets\nconst buckets = {};\nfor (const p of prices) {\n  const b = Math.floor(p / bucketSize) * bucketSize;\n  const label = `${fmtMoney(b, currency)}\u2013${fmtMoney(b + bucketSize - 0.01, currency)}`;\n  buckets[label] = (buckets[label] || 0) + 1;\n}\nconst bucketEntries = Object.entries(buckets).sort((a,b) => {\n  // sort by numeric lower bound\n  const n = s => 
parseFloat(s[0].split('\u2013')[0].replace(/[^0-9.]/g, ''));\n  return n(a) - n(b);\n});\nconst maxBucket = bucketEntries.length ? Math.max(...bucketEntries.map(([_, v]) => v)) : 0;\n\n// category table (top 5)\nconst topCats = Object.entries(catCount).sort((a,b) => b[1]-a[1]).slice(0,5);\nconst topSubCats = Object.entries(subCatCount).sort((a,b) => b[1]-a[1]).slice(0,5);\nconst topFormats = Object.entries(formatCount).sort((a,b) => b[1]-a[1]).slice(0,5);\n\n// build tables\nfunction makeKpiRow(label, value) {\n  return `| ${label} | ${value} |`;\n}\n\nfunction tableHeader(cols) {\n  return `| ${cols.join(' | ')} |\\n| ${cols.map(()=>'---').join(' | ')} |`;\n}\n\nfunction bookRow(b) {\n  const title = b.title.length > 72 ? b.title.slice(0,69) + '\u2026' : b.title;\n  return `| ${safe(b.rank,'\u2013')} | ${title} | ${b.author} | ${fmtMoney(b.price_amount, b.price_currency)} | ${b.rating || '\u2013'} | ${b.ratings_count} |`;\n}\n\nconst kpiTable = [\n  tableHeader(['Metric','Value']),\n  makeKpiRow('Total Books', totalBooks),\n  makeKpiRow('Total Spend (list prices)', fmtMoney(totalSpend, currency)),\n  makeKpiRow('Avg Price', fmtMoney(avgPrice, currency)),\n  makeKpiRow('Median Price', fmtMoney(medPrice, currency)),\n  makeKpiRow('Price Range', `${fmtMoney(minPrice, currency)} \u2013 ${fmtMoney(maxPrice, currency)}`),\n  makeKpiRow('Avg Rating (rated only)', rated.length ? 
avgRating.toFixed(2) : '\u2013'),\n  makeKpiRow('Rated Titles', `${ratingCoverage}/${totalBooks}`),\n].join('\\n');\n\nconst popularityTable = [\n  tableHeader(['Rank','Title','Author','Price','Rating','#Ratings']),\n  ...topByRatingsCount.map(bookRow),\n].join('\\n');\n\nconst qualityTable = [\n  tableHeader(['Rank','Title','Author','Price','Rating','#Ratings']),\n  ...topByRating.map(bookRow),\n].join('\\n');\n\nfunction kvTable(title, entries) {\n  const rows = entries.map(([k,v]) => `| ${k} | ${v} |`).join('\\n');\n  return `**${title}**\\n\\n${tableHeader(['Key','Count'])}\\n${rows}`;\n}\n\nconst catTable = kvTable('Top Categories', topCats);\nconst subCatTable = kvTable('Top Sub-Categories', topSubCats);\nconst fmtTable = kvTable('Formats', topFormats);\n\n// histogram block\nconst histLines = bucketEntries.map(([label, count]) => {\n  const bar = asBar(count, maxBucket, 24);\n  return `- ${pad(label, 22)} | ${pad(count, 3)} ${bar}`;\n}).join('\\n');\n\nconst highlights = [];\nif (topByRatingsCount[0]) {\n  highlights.push(`**Most talked-about**: _${topByRatingsCount[0].title}_ (${topByRatingsCount[0].ratings_count.toLocaleString()} ratings).`);\n}\nif (topByRating[0]) {\n  highlights.push(`**Highest rated (\u226550 ratings)**: _${topByRating[0].title}_ (${topByRating[0].rating.toFixed(1)}\u2605).`);\n}\nif (topFormats[0]) {\n  highlights.push(`**Preferred format**: ${topFormats[0][0]} (${topFormats[0][1]} titles).`);\n}\nif (topCats[0]) {\n  highlights.push(`**Top category**: ${topCats[0][0]} (${topCats[0][1]} titles).`);\n}\n\nconst recs = [];\nif (unratedCount > 0) {\n  recs.push(`Fill in missing ratings for ${unratedCount} titles to improve quality insights.`);\n}\nif (prices.some(p => p === 0)) {\n  const zeroPrices = prices.filter(p => p === 0).length;\n  recs.push(`Assign prices to ${zeroPrices} title(s) with \\$0 to avoid skewing spend/price stats.`);\n}\nif (clean.some(b => !b.rank)) {\n  recs.push(`Confirm ranks for all titles to refine top-N 
comparisons.`);\n}\n\nconst markdown = `\n# \ud83d\udcda Book Purchase Report\n**Date:** ${dateStr}\n\n> A quick, readable snapshot of your current book list with pricing, ratings, and category insights.\n\n---\n\n## \ud83d\udd0e Executive KPIs\n${kpiTable}\n\n---\n\n## \u2b50 Most Popular (by #Ratings)\n${popularityTable}\n\n---\n\n## \ud83c\udfc6 Highest Rated (\u2265 50 ratings)\n${qualityTable}\n\n---\n\n## \ud83e\udde9 Breakdown\n${catTable}\n\n${subCatTable}\n\n${fmtTable}\n\n---\n\n## \ud83d\udcb2 Price Distribution (bucket = \\$${bucketSize})\n\\`\\`\\`\n${histLines || 'No price data available.'}\n\\`\\`\\`\n\n---\n\n## \u2728 Highlights\n${highlights.length ? highlights.map(x => `- ${x}`).join('\\n') : '- No highlights available yet.'}\n\n---\n\n## \u2705 Suggestions\n${(recs.length ? recs : ['All good!']).map(x => `- ${x}`).join('\\n')}\n`.trim();\n\n// Also expose structured stats if you want to reuse downstream\nconst stats = {\n  totalBooks,\n  currency,\n  price: {\n    totalSpend,\n    avgPrice,\n    medianPrice: medPrice,\n    minPrice,\n    maxPrice,\n  },\n  ratings: {\n    avgRating: rated.length ? Number(avgRating.toFixed(3)) : null,\n    ratedCount: ratingCoverage,\n    unratedCount,\n  },\n  counts: {\n    byCategory: catCount,\n    bySubCategory: subCatCount,\n    byFormat: formatCount,\n  },\n};\n\nreturn [\n  {\n    json: {\n      markdown,\n      stats,\n    }\n  }\n];"
      },
      "typeVersion": 2
    },
    {
      "id": "48fe9e27-67d0-4d86-95be-c4efddbb5a07",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        0,
        0
      ],
      "parameters": {
        "width": 1040,
        "height": 2672,
        "content": "# Decodo Scraper API Workflow Template (n8n Automation Amazon Book Purchase Report)\n### Watch the demo video below:\n[![Watch the video](https://s3.ap-southeast-1.amazonaws.com/automatewith.me/how-to-use-scraper-api-with-n8n.jpg)](https://www.youtube.com/watch?v=9Kn583UJlqY)\n> This workflow demos how to use **Decodo Scraper API** to crawl any public web page (headless JS, device emulation: mobile/desktop/tablet), extract structured product data from the returned HTML, generate a **purchase-ready report**, and automatically deliver it as a **Google Doc + PDF** to Slack/Drive.\n## Who\u2019s it for\n- **Creators / Analysts** who need quick product lists (books, gadgets, etc.) with prices/ratings.\n- **Ops & Marketing teams** building weekly \u201ctop picks\u201d reports.\n- **Engineers** validating the Decodo Scraper API + LLM extraction pattern before scaling.\n\n## How it works / What it does\n\n1. **Trigger** \u2013 Manually run the workflow.\n2. **Edit Fields (manual)** \u2013 Provide inputs:\n   - `targetUrl` (e.g., an Amazon category/search/listing page)\n   - `deviceType` (`desktop` | `mobile` | `tablet`)\n   - Optional: `maxItems`, `notes`, `reportTitle`, `reportOwner`\n3. **Scraper API Request (HTTP Request \u2192 POST)**  \n   Calls **Decodo Scraper API** with:\n   - URL to crawl, **headless JS** enabled\n   - **Device emulation** (UA + viewport)\n   - Optional **waitFor / executeJS** to ensure late-loading content is captured\n4. **HTML Response Parser (Code/Function or HTML node)**  \n   Pulls the HTML string from Decodo response and normalizes it (strip scripts/styles, collapse whitespace).\n5. **Product Analyzer Agent (LLM + Structured Output Parser)**  \n   Prompts an LLM to extract **structured \u201cbook\u201d objects** from the HTML:\n   The **Structured Output Parser** enforces a strict JSON schema and drops malformed items.\n6. 
**Build \ud83d\udcda Book Purchase Report (Code/LLM)**  \n   Converts the JSON array into a **Markdown** (or HTML) report with:\n   - Executive summary (top picks, average price/rating)\n   - Table of items (rank, title, author, price, rating, link)\n   - \u201cRecommended to buy\u201d shortlist (rules configurable)\n   - Notes / owner / timestamp\n7. **Configure Google Drive Folder (manual)**  \n   Choose/create a Drive folder for output artifacts.\n8. **Create Document File (Google Docs API)**  \n   Creates a Doc from the generated Markdown/HTML.\n9. **Convert Document to PDF (Google Drive export)**  \n   Exports the Doc to PDF.\n10. **Upload report to Slack**  \n   Sends the PDF (and/or Doc link) to a chosen Slack channel with a short summary.\n\n## How to set up\n\n### 1 Prerequisites\n- **n8n** (self-hosted or Cloud)\n- **Decodo Scraper API** key\n- **OpenAI (or compatible) API key** for the Analyzer Agent\n- **Google Drive/Docs** credentials (OAuth2)\n- **Slack** Bot/User token (files:write, chat:write)\n\n### 2 Environment variables (recommended)\n- `DECODO_API_KEY`\n- `OPENAI_API_KEY`\n- `DRIVE_FOLDER_ID` (optional default)\n- `SLACK_CHANNEL_ID`\n\n### 3 Nodes configuration (high level)\n**Edit Fields (Set node)**\n**Scraper API Request (HTTP Request \u2192 POST)**\n**HTML Response Parser (Code node)**\n**Product Analyzer Agent**\n**Build Book Purchase Report (Code/LLM)**\n**Create Document File**\n**Convert to PDF**\n**Upload to Slack**\n\n## Requirements\n\n- **Decodo**: Active API key and endpoint access. Be mindful of concurrency/rate limits.\n- **Model**: GPT-4o/4.1-mini or similar for reliable structured extraction.\n- **Google**: OAuth client (Docs/Drive scopes). Ensure n8n can write to the target folder.\n- **Slack**: Bot token with `files:write` + `chat:write`.\n\n## How to customize the workflow\n\n- **Target site**: Change `targetUrl` to any **public** page (category, search, or listing).  
\n  For other domains (not Amazon), tweak the **LLM guidance** (e.g., price/label patterns).\n- **Device emulation**: Switch `deviceType` to `mobile` to fetch mobile-optimized markup (often simpler DOMs).\n- **Late-loading pages**: Adjust `waitFor.selector` or use `waitUntil: \"networkidle\"` (if supported) to ensure full content loads.\n- **Client-side JS**: Extend `executeJS` if you need to interact (scroll, click \u201cnext\u201d, expand sections). You can also loop over pagination by iterating URLs.\n- **Extraction schema**: Add fields (e.g., `discount_percent`, `bestseller_badge`, `prime_eligible`) and update the Structured Output schema accordingly.\n- **Filtering rules**: Modify recommendation logic (e.g., min ratings count, price bands, languages).\n- **Report branding**: Add logo, cover page, footer with company info; switch to HTML + inline CSS for richer Docs formatting.\n- **Destinations**: Besides Slack & Drive, add Email, Notion, Confluence, or a database sink.\n- **Scheduling**: Add a **Cron** trigger for weekly/monthly auto-reports."
      },
      "typeVersion": 1
    },
    {
      "id": "2d72b07e-8000-4690-99dc-88aa113f1f1c",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1120,
        544
      ],
      "parameters": {
        "color": 5,
        "width": 272,
        "content": "### 1. Trigger Workflow Execution  \nThe workflow starts manually by clicking **Execute workflow**. This allows users to control when the Amazon book data scraping and report generation begins.\n"
      },
      "typeVersion": 1
    },
    {
      "id": "6ed3a761-226d-4255-8862-8ed5a94af538",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1280,
        928
      ],
      "parameters": {
        "color": 5,
        "width": 272,
        "content": "### 2. Edit Input Fields  \nSet the required fields such as `targetUrl` (Amazon book listing page), `deviceType` (desktop or mobile), and report details (title, owner, notes). These values define the scope and context of the report."
      },
      "typeVersion": 1
    },
    {
      "id": "69b52aea-2e5f-4cd9-aafb-0f2f2000f8b5",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1504,
        544
      ],
      "parameters": {
        "color": 5,
        "width": 272,
        "content": "### 3. Send Scraper API Request (Decodo)  \nAn HTTP POST request is sent to **Decodo Scraper API**, which crawls the target Amazon page using headless JavaScript and device emulation. This ensures all product data loads as it appears to real users."
      },
      "typeVersion": 1
    },
    {
      "id": "c3c1856c-e9e7-4372-9a34-a93b4a2009f6",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1728,
        944
      ],
      "parameters": {
        "color": 5,
        "width": 272,
        "content": "### 4. Parse HTML Response  \nThe raw HTML returned by Decodo is cleaned and normalized. Scripts, styles, and unnecessary tags are removed, leaving only the meaningful page content for analysis."
      },
      "typeVersion": 1
    },
    {
      "id": "5a99ff12-c13d-4cb5-9852-1edb9009a634",
      "name": "Sticky Note5",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2048,
        528
      ],
      "parameters": {
        "color": 5,
        "width": 288,
        "content": "### 5. Product Analyzer Agent (LLM)  \nAn AI agent processes the cleaned HTML and extracts **structured book data** (title, author, price, rating, ASIN, etc.) into JSON format. The structured output parser guarantees consistent schema."
      },
      "typeVersion": 1
    },
    {
      "id": "b0ada9bd-85f2-48a7-b7f3-ad4c4591ac0b",
      "name": "Sticky Note6",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2448,
        912
      ],
      "parameters": {
        "color": 5,
        "width": 288,
        "content": "### 6. Build Book Purchase Report  \nThe extracted JSON is converted into a **human-readable purchase report**. The report includes a summary, detailed book table, top recommendations, and additional notes."
      },
      "typeVersion": 1
    },
    {
      "id": "6fa824c0-ff52-43b1-8684-8db9e13ac3ee",
      "name": "Sticky Note7",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        2768,
        560
      ],
      "parameters": {
        "color": 5,
        "width": 352,
        "height": 128,
        "content": "### Create Book Purchase Report PDF\n- Configure Google Drive Folder  \n- Create Google Document  \n- Convert Document to PDF  "
      },
      "typeVersion": 1
    },
    {
      "id": "a6b954e4-7c92-49cf-8637-543adb2fdc22",
      "name": "Sticky Note8",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        3248,
        912
      ],
      "parameters": {
        "color": 5,
        "width": 288,
        "height": 144,
        "content": "### 10. Upload Report to Slack  \nFinally, the PDF report is uploaded to a Slack channel. This enables instant distribution to teams, ensuring everyone has access to the latest Amazon book purchase insights."
      },
      "typeVersion": 1
    },
    {
      "id": "b3defbf8-85b4-4484-bec7-74e060f9b4d3",
      "name": "Sticky Note9",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        3600,
        704
      ],
      "parameters": {
        "width": 320,
        "height": 144,
        "content": "## Sample output report from crawl data\nhttps://s3.ap-southeast-1.amazonaws.com/automatewith.me/Book+Purchase+Report+2025-09-02"
      },
      "typeVersion": 1
    },
    {
      "id": "6baab7ab-2283-4285-a689-014440243f1a",
      "name": "Decodo",
      "type": "@decodo/n8n-nodes-decodo.decodo",
      "position": [
        1600,
        736
      ],
      "parameters": {
        "geo": "=",
        "url": "=https://www.amazon.com/Best-Sellers-Books/zgbs/books"
      },
      "credentials": {
        "decodoApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    }
  ],
  "active": false,
  "settings": {
    "executionOrder": "v1"
  },
  "versionId": "98edc8a3-130c-4c16-8471-59f0c0dd30e5",
  "connections": {
    "Decodo": {
      "main": [
        [
          {
            "node": "HTML Response Parser",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Edit Fields": {
      "main": [
        [
          {
            "node": "Decodo",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "OpenAI Chat Model": {
      "ai_languageModel": [
        [
          {
            "node": "Product Analyzer Agent",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    },
    "Create document file": {
      "main": [
        [
          {
            "node": "Convert document to PDF",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "HTML Response Parser": {
      "main": [
        [
          {
            "node": "Product Analyzer Agent",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Product Analyzer Agent": {
      "main": [
        [
          {
            "node": "Build \ud83d\udcda Book Purchase Report",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Convert document to PDF": {
      "main": [
        [
          {
            "node": "Upload report to Slack ",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Structured Output Parser": {
      "ai_outputParser": [
        [
          {
            "node": "Product Analyzer Agent",
            "type": "ai_outputParser",
            "index": 0
          }
        ]
      ]
    },
    "Configure Google Drive Folder ": {
      "main": [
        [
          {
            "node": "Create document file",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Build \ud83d\udcda Book Purchase Report": {
      "main": [
        [
          {
            "node": "Configure Google Drive Folder ",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "When clicking \u2018Execute workflow\u2019": {
      "main": [
        [
          {
            "node": "Edit Fields",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
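
The `HTML Response Parser` node in the JSON above reduces the scraped page to plain text before it reaches the agent. The sketch below illustrates the same idea as a standalone script that runs in Node.js outside n8n. The function names `htmlToText` and `normalizeWhitespace`, the shortened tag list, and the sample markup are assumptions made for this example.

```javascript
// Minimal sketch of HTML-to-text normalization (not the template's exact code).
function normalizeWhitespace(text) {
  return text
    .replace(/\r/g, '')
    .replace(/[ \t]+/g, ' ')      // collapse runs of spaces/tabs
    .replace(/\n[ \t]+/g, '\n')   // drop indentation after a newline
    .replace(/\n{3,}/g, '\n\n')   // keep at most one blank line
    .trim();
}

function htmlToText(html) {
  if (typeof html !== 'string') return '';
  // Remove content that never carries visible text.
  const withoutNoise = html
    .replace(/<script[\s\S]*?<\/script>/gi, '')
    .replace(/<style[\s\S]*?<\/style>/gi, '')
    .replace(/<!--[\s\S]*?-->/g, '');
  // Turn a few block-level tags into newlines (illustrative subset).
  const withBreaks = withoutNoise.replace(/<\/?(p|div|li|h[1-6]|br|tr)\b[^>]*>/gi, '\n');
  // Strip whatever tags remain.
  const stripped = withBreaks.replace(/<\/?[^>]+>/g, '');
  return normalizeWhitespace(stripped);
}

const sample =
  '<div><h1>Best Sellers</h1><script>var x = 1;</script>' +
  '<p>Lord of the   Flies</p><p>$9.99</p></div>';
console.log(htmlToText(sample));
```

Regex-based stripping is a pragmatic choice for shrinking the prompt, not a full HTML parser; markup with `>` inside attribute values can confuse it.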

Credentials you'll need

Each integration node will prompt for credentials when you import. We strip credential IDs before publishing — you'll add your own.
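
To see in advance which credentials the import will ask for, the `credentials` keys of each node in the workflow JSON can be collected. The sketch below does this for a hand-written excerpt of the node list; `listCredentialTypes` is a hypothetical helper, and the excerpt only keeps the `name` and `credentials` fields.

```javascript
// Minimal sketch: group node names by the credential type they reference.
// `listCredentialTypes` is a hypothetical helper, not an n8n API.
function listCredentialTypes(workflow) {
  const byType = {};
  for (const node of workflow.nodes || []) {
    for (const credType of Object.keys(node.credentials || {})) {
      if (!byType[credType]) byType[credType] = [];
      byType[credType].push(node.name);
    }
  }
  return byType;
}

// Excerpt of the workflow above, reduced to the fields this helper reads.
const excerpt = {
  nodes: [
    { name: 'OpenAI Chat Model', credentials: { openAiApi: { name: '<your credential>' } } },
    { name: 'Create document file', credentials: { googleDriveOAuth2Api: { name: '<your credential>' } } },
    { name: 'Convert document to PDF', credentials: { googleDriveOAuth2Api: { name: '<your credential>' } } },
    { name: 'Upload report to Slack ', credentials: { slackOAuth2Api: { name: '<your credential>' } } },
    { name: 'Decodo', credentials: { decodoApi: { name: '<your credential>' } } },
    { name: 'Edit Fields' }, // no credentials block
  ],
};

console.log(listCredentialTypes(excerpt));
```

For this workflow the result has four credential types: `openAiApi`, `googleDriveOAuth2Api` (used by two nodes), `slackOAuth2Api`, and `decodoApi`.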

Source: https://n8n.io/workflows/8142/ (original creator credit)
