
How the catalog is built.

How AutomationFlows finds, deduplicates, scores, and publishes the n8n workflow catalog. Written so you can reproduce it, audit it, or push back on it.

Where workflows come from

The miner runs every 12 hours and pulls from eight sources of public, openly-shared n8n content:

State persists at ~/.automationflows_miner_state.json, so each run is incremental: already-indexed workflows aren't re-fetched. Author fan-out and LLM-generated query expansion (Haiku, persisted across runs) extend the search frontier on each pass.
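A minimal sketch of how an incremental state file like this can work. Only the file path comes from the text above; the `seen_urls` key and the function names are hypothetical, and the real miner state may hold more (for example the persisted query expansions).

```python
import json
from pathlib import Path

# Path taken from the text; everything else in this sketch is illustrative.
STATE_PATH = Path.home() / ".automationflows_miner_state.json"


def load_state(path=STATE_PATH):
    """Return the persisted state, or an empty one on the first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"seen_urls": []}  # hypothetical schema


def filter_new(candidates, state):
    """Keep only candidate URLs that earlier runs have not indexed."""
    seen = set(state["seen_urls"])
    return [url for url in candidates if url not in seen]


def save_state(state, new_urls, path=STATE_PATH):
    """Merge newly indexed URLs into the state and write it back."""
    state["seen_urls"] = sorted(set(state["seen_urls"]) | set(new_urls))
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)  # rename over the old file so a crash cannot leave it half-written
```

Writing to a temporary file and renaming it is a design choice of this sketch, not something the source describes; it keeps a failed run from corrupting the state that the next run depends on.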

Privacy strip — applied to every workflow before publish

Workflow JSONs in upstream archives often contain real test data, credential UUIDs, and active webhook URLs. We remove all of it before any workflow is published:

Implemented in scripts/automation_miner.py::strip_privacy(). Not bypassable. If you find a workflow that still leaks something, flag it and we'll re-strip and republish.
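To make the idea concrete, here is a rough sketch of a strip pass. It is not the actual strip_privacy() implementation. The field names (`pinData` for pinned test data, per-node `credentials` and `webhookId`) are assumptions based on how n8n workflow exports are commonly laid out, and the real function may remove more than this.

```python
import copy


def strip_privacy_sketch(workflow):
    """Return a copy of a workflow dict with sensitive fields removed.

    Illustrative only: the field names below are assumptions about the
    n8n export format, not a statement of what strip_privacy() does.
    """
    clean = copy.deepcopy(workflow)      # never mutate the caller's data
    clean.pop("pinData", None)           # pinned test data (assumed field)
    for node in clean.get("nodes", []):
        node.pop("credentials", None)    # credential IDs and names (assumed field)
        node.pop("webhookId", None)      # identifier behind a live webhook URL (assumed field)
    return clean
```

Working on a deep copy means the raw upstream JSON stays available for hashing or debugging while only the stripped copy is published.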

Deduplication

The same workflow often shows up in three places: the upstream Zie619 archive, a forked GitHub repo, and a forum post linking the raw URL. We deduplicate on the SHA-256 hash of the canonical workflow content (nodes + connections + name, with privacy fields stripped first). Identical hashes collapse to one record; the first-seen source_repo stays as the attribution.

Near-duplicates with minor parameter changes are kept as separate entries — the workflow graph itself is the unit of identity.
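The hashing step described above can be sketched as follows. The choice of nodes, connections, and name as the hashed content comes from the text; the JSON canonicalisation (sorted keys, compact separators) and the record shape with `workflow` and `source_repo` keys are assumptions made for illustration. The workflows are assumed to be privacy-stripped already.

```python
import hashlib
import json


def content_hash(workflow):
    """SHA-256 over a canonical serialisation of nodes + connections + name."""
    canonical = {
        "name": workflow.get("name"),
        "nodes": workflow.get("nodes", []),
        "connections": workflow.get("connections", {}),
    }
    # Sorted keys and fixed separators make the bytes independent of key order.
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"),
                         ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def deduplicate(records):
    """Collapse identical hashes, keeping the first-seen record as attribution."""
    first_seen = {}
    for record in records:
        first_seen.setdefault(content_hash(record["workflow"]), record)
    return list(first_seen.values())
```

Because any parameter change alters the serialised nodes, a near-duplicate produces a different hash and stays a separate entry, which matches the behaviour described above.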

Tagging

Every workflow gets normalised metadata derived from its node graph:

The Pro QualityScore

Pro / Lifetime users see a multi-signal QualityScore (0-100) on every workflow. The current formula:

quality_pro = rescale(0.80 × structural + 0.20 × metadata, [10, 75] → [50, 100])

Structural (0-100) blends six axes scored from the node graph alone:

Metadata (0-100) is a smaller signal covering schema correctness and integration breadth, blended at 20%.
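As a worked illustration of the formula, here is a small sketch. The weights and the two ranges come from the formula above; treating rescale as a linear map and clamping inputs that fall outside [10, 75] are assumptions, since the text does not say how out-of-range blends are handled.

```python
def rescale(x, src=(10.0, 75.0), dst=(50.0, 100.0)):
    """Linearly map x from the src range onto the dst range."""
    src_lo, src_hi = src
    dst_lo, dst_hi = dst
    x = min(max(x, src_lo), src_hi)  # clamping is an assumption of this sketch
    return dst_lo + (x - src_lo) * (dst_hi - dst_lo) / (src_hi - src_lo)


def quality_pro(structural, metadata):
    """Blend the two 0-100 sub-scores at 80/20, then rescale."""
    return rescale(0.80 * structural + 0.20 * metadata)


# Example with made-up sub-scores: blend = 0.8 * 60 + 0.2 * 40 = 56
print(round(quality_pro(60, 40), 2))  # 85.38
```

With these ranges, a raw blend of 10 or lower maps to 50 and a blend of 75 or higher maps to 100, so every published score lands in the upper half of the 0-100 scale.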

Validation: we hand-rated 20 stratified samples and compared the ratings against this scoring; Spearman ρ = +0.941, Pearson r = +0.950. The harness at scripts/run_quality_spike.py re-runs after any weight change.
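For readers who want to run the same kind of check on their own ratings, the two statistics can be computed with a few lines of standard-library Python. This is a generic sketch, not the contents of scripts/run_quality_spike.py, and the function names are illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson r: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rho is Pearson r computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

Pass the hand ratings as one list and the computed scores as the other, in the same sample order. Spearman only checks that the two orderings agree, while Pearson also rewards a linear relationship, so reporting both gives a fuller picture.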

Popularity signals (GitHub stars, recency, install / uninstall behaviour) are NOT in the blend. They originally were; we removed them because they are popularity proxies, not quality measures, and they unfairly penalise non-GitHub workflows (Codeberg, GitLab, gists, n8n.io templates, dev.to embeds). A content-only formula scores all sources identically.

What we don't republish

If a workflow's meta.templateId matches a template on n8n.io's official template gallery, the AutomationFlows page links to n8n.io with rel="canonical" — we surface the template for discovery but don't compete with the canonical home.

Update cadence

The miner runs at 04:00 and 16:00 local time daily via launchd. Each run discovers new workflows, regenerates the sitemap, refreshes the Atom feed, and pushes the diff to GitHub; Netlify auto-deploys. The "What's new" page at /recent/ shows the 100 most recent additions.

Take-down

Every workflow links back to its original source_repo with full attribution. If you're the original creator and want a workflow removed, email acreatorstore@translatea.com with the URL — same-day removal is the standard.

Reproducibility

The miner code is in scripts/automation_miner.py; the page generator is scripts/generate_pages.py; the QualityScore implementation is scripts/workflow_quality.py with unit tests in scripts/test_workflow_quality.py. The full data feed is at /data/workflows.json and per-workflow stripped JSONs at /data/workflow_jsons/ — both CDN-served, both publicly readable.