{
  "nodes": [
    {
      "id": "dbf53eba-3804-4593-9600-48ef5f817e60",
      "name": "Sticky Note",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        -288,
        -176
      ],
      "parameters": {
        "width": 464,
        "height": 864,
        "content": "## Sanitize text with a local Ollama model\n\nThis n8n workflow sanitizes text from markdown documents stored in a google drive folder. It uses a local Ollama model to identify and remove Personally Identifiable Information (PII).\n\n**Use case**:\nTo clean private documents before using them with public LLM models.\n\n## How It Works\n* A google drive trigger monitors a folder for the creation of new subfolders.\n* The HTTP node retrieves the filenames of all documents in the new folder.\n* The workflow filters and downloads only markdown files, extracting their text content.\n* A basic LLM chain using Ollama 3.1 processes chunks of text to redact any detected PII. If no PII is found, the text is returned unchanged.\n* The cleaned text chunks are merged and saved back to a file in the same google drive folder.\n* A spreadsheet named chunk logs is updated with each text chunk and includes a flag indicating whether PII was detected.\n\n## Requirements\n* Google drive OAuth [credentials](https://docs.n8n.io/integrations/builtin/credentials/google/oauth-generic/#enable-apis) (to be configured in the Trigger, HTTP, and Google Sheets nodes)\n* A local n8n instance\n* A local Ollama model (currently using version 3.1)\n\n## Customising This Workflow\n* Review the spreadsheet to identify additional types of PII and update the prompt to suit your specific use cases.\n* Modify the filter condition from text/markdown if you want to extract text from other document formats."
      },
      "typeVersion": 1
    },
    {
      "id": "e6a18b46-5902-4887-910e-8b2e17b47804",
      "name": "return_files_in_folder",
      "type": "n8n-nodes-base.httpRequest",
      "notes": "GET https://www.googleapis.com/drive/v3/files?q=folder_id",
      "position": [
        512,
        -16
      ],
      "parameters": {
        "url": "=https://www.googleapis.com/drive/v3/files?q=\"{{ $json.id }}\"+in+parents&fields=files(id,name,mimeType,modifiedTime,size)",
        "options": {},
        "authentication": "predefinedCredentialType",
        "nodeCredentialType": "googleDriveOAuth2Api"
      },
      "credentials": {
        "googleAdsOAuth2Api": {
          "name": "<your credential>"
        },
        "googleDriveOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "notesInFlow": true,
      "typeVersion": 4.3
    },
    {
      "id": "14e8fdc1-80f0-4e08-be95-b8122406202d",
      "name": "split_files_item",
      "type": "n8n-nodes-base.splitOut",
      "position": [
        720,
        -16
      ],
      "parameters": {
        "options": {},
        "fieldToSplitOut": "files"
      },
      "typeVersion": 1
    },
    {
      "id": "5b62fd9e-0a0f-4b02-a50b-4987a5204ac8",
      "name": "select_markdown_files",
      "type": "n8n-nodes-base.filter",
      "position": [
        912,
        -16
      ],
      "parameters": {
        "options": {},
        "conditions": {
          "options": {
            "version": 2,
            "leftValue": "",
            "caseSensitive": true,
            "typeValidation": "strict"
          },
          "combinator": "and",
          "conditions": [
            {
              "id": "dca87f04-fdab-4c57-9de6-2be6ea735101",
              "operator": {
                "type": "string",
                "operation": "equals"
              },
              "leftValue": "={{ $json.mimeType }}",
              "rightValue": "=text/markdown"
            }
          ]
        }
      },
      "typeVersion": 2.2
    },
    {
      "id": "91290eff-9838-48d2-9ff6-d3d831efd827",
      "name": "download_drive_files",
      "type": "n8n-nodes-base.googleDrive",
      "position": [
        464,
        320
      ],
      "parameters": {
        "fileId": {
          "__rl": true,
          "mode": "id",
          "value": "={{ $json.id }}"
        },
        "options": {},
        "operation": "download"
      },
      "credentials": {
        "googleDriveOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 3
    },
    {
      "id": "f8721b54-3fce-49d1-906c-8f5d66f523dd",
      "name": "text_from_markdown",
      "type": "n8n-nodes-base.extractFromFile",
      "position": [
        640,
        320
      ],
      "parameters": {
        "options": {},
        "operation": "text"
      },
      "notesInFlow": false,
      "typeVersion": 1.1
    },
    {
      "id": "1661df62-85f1-4401-bd93-2c2e73360631",
      "name": "combine_all_text",
      "type": "n8n-nodes-base.aggregate",
      "position": [
        832,
        320
      ],
      "parameters": {
        "options": {},
        "fieldsToAggregate": {
          "fieldToAggregate": [
            {
              "fieldToAggregate": "data"
            }
          ]
        }
      },
      "typeVersion": 1
    },
    {
      "id": "a27810ee-39be-4e77-b85e-5a884406de3b",
      "name": "chunk_text_for_local_llm",
      "type": "n8n-nodes-base.code",
      "notes": "Change `text_chunk_size` to modify text chunk size.",
      "position": [
        480,
        608
      ],
      "parameters": {
        "language": "python",
        "pythonCode": "text_chunk_size = 250\n\n# for item_index, item in enumerate(_input.all()):\n#   item_string = item.json.data[0]\n\nfor item_index, item in enumerate(_input.all()):\n  item_string = ''.join(item.json.data)\n  string_array = [\n     { 'data' : item_string[i:i+text_chunk_size]} \n    for i in range(0, len(item_string), text_chunk_size)\n    \n  ]\n\nreturn string_array"
      },
      "notesInFlow": true,
      "typeVersion": 2
    },
    {
      "id": "348cc5c3-6783-44ac-b7d9-96867f72ec10",
      "name": "Basic LLM Chain",
      "type": "@n8n/n8n-nodes-langchain.chainLlm",
      "position": [
        672,
        608
      ],
      "parameters": {
        "text": "=You are a text sanitization model. Your task is to remove all personal, private, or sensitive information (PII) from the text below without changing the meaning.\n\nDo NOT create new text. Keep all non-PII content intact.\n\nPII to remove includes (but is not limited to):\n- Names\n- Emails\n- Phone numbers\n- Addresses\n- Exact locations\n- Passport/ID numbers\n- Company employee identifiers\n- Birthdates or ages\n- Financial numbers\n- Keys (such as API keys, tokens, passwords, access keys)\n- Codes that could be used to access systems or identify a person\n\n### OUTPUT REQUIREMENTS\n\nYou must return ONLY a single JSON object.\n\nIf PII was found and removed, respond with exactly:\n{\n  \"sanitized_text\": \"<sanitized version>\",\n  \"pii_found\": true\n}\n\nIf NO PII was found, respond with exactly:\n{\n  \"sanitized_text\": \"<original text, unchanged>\",\n  \"pii_found\": false\n}\n\nRules:\n- Use double quotes for all JSON keys and string values.\n- Do NOT add extra fields.\n- Do NOT add explanations, comments, or markdown.\n- Do NOT wrap the JSON in backticks.\n- Output only valid JSON.\n\n### INPUT TEXT\n{{ $json.data }}",
        "batching": {},
        "promptType": "define"
      },
      "typeVersion": 1.7
    },
    {
      "id": "b53b7ecf-ece9-42bf-8364-8b9e35559d7f",
      "name": "Ollama Model",
      "type": "@n8n/n8n-nodes-langchain.lmOllama",
      "position": [
        768,
        768
      ],
      "parameters": {
        "model": "llama3.1:latest",
        "options": {}
      },
      "credentials": {
        "ollamaApi": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "a60822c2-6826-4e5e-a114-99f1fb315063",
      "name": "parse_json_text_with_flag",
      "type": "n8n-nodes-base.set",
      "position": [
        1024,
        608
      ],
      "parameters": {
        "options": {
          "ignoreConversionErrors": true
        },
        "assignments": {
          "assignments": [
            {
              "id": "07d02b76-75b9-4d6b-8a99-539a89bd362d",
              "name": "output",
              "type": "object",
              "value": "={{ $json.text.replaceAll(\"\\r\",\"\").replaceAll(\"\\n\",\"\").replaceAll(\"```\",\"\").toJsonString().parseJson()}}"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "b6e6bac8-9f88-4eaf-bb95-2896a760a064",
      "name": "join_text_chunks",
      "type": "n8n-nodes-base.set",
      "position": [
        1664,
        608
      ],
      "parameters": {
        "options": {},
        "assignments": {
          "assignments": [
            {
              "id": "d6c9+1234567890b11-b663945cf38f",
              "name": "sanitized_text[0]",
              "type": "string",
              "value": "={{ $json.sanitized_text.join() }}"
            }
          ]
        }
      },
      "typeVersion": 3.4
    },
    {
      "id": "e8390461-09ef-4f88-be0a-c0ed4f9923b1",
      "name": "create_cleaned_text_file",
      "type": "n8n-nodes-base.googleDrive",
      "position": [
        1856,
        608
      ],
      "parameters": {
        "name": "={{ $now.toISO() }}_cleaned_text.md ",
        "content": "={{ $json.sanitized_text[0] }}",
        "driveId": {
          "__rl": true,
          "mode": "list",
          "value": "My Drive"
        },
        "options": {},
        "folderId": {
          "__rl": true,
          "mode": "list",
          "value": "folder_id",
          "cachedResultUrl": "https://drive.google.com/drive/folders/folder_id",
          "cachedResultName": "test_case_1"
        },
        "operation": "createFromText"
      },
      "credentials": {
        "googleDriveOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 3
    },
    {
      "id": "dc8e675a-6240-4fdb-89b9-26b973dd465d",
      "name": "on_subfolder_created",
      "type": "n8n-nodes-base.googleDriveTrigger",
      "position": [
        256,
        -16
      ],
      "parameters": {
        "event": "folderCreated",
        "pollTimes": {
          "item": [
            {
              "mode": "everyMinute"
            }
          ]
        },
        "triggerOn": "specificFolder",
        "folderToWatch": {
          "__rl": true,
          "mode": "list",
          "value": "folder_id",
          "cachedResultUrl": "https://drive.google.com/drive/folders/folder_id",
          "cachedResultName": "blog_generator"
        }
      },
      "credentials": {
        "googleDriveOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 1
    },
    {
      "id": "8c1f00ca-f188-4536-8542-d4d8d5736aba",
      "name": "log_text_with_pii_flag (optional)",
      "type": "n8n-nodes-base.googleSheets",
      "position": [
        1552,
        272
      ],
      "parameters": {
        "columns": {
          "value": {
            "pii found": "={{ $json.output.pii_found }}",
            "created_at": "={{ $now.toISO() }}",
            "cleaned text": "={{ $json.output.sanitized_text }}"
          },
          "schema": [
            {
              "id": "cleaned text",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "cleaned text",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "pii found",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "pii found",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            },
            {
              "id": "created_at",
              "type": "string",
              "display": true,
              "required": false,
              "displayName": "created_at",
              "defaultMatch": false,
              "canBeUsedToMatch": true
            }
          ],
          "mappingMode": "defineBelow",
          "matchingColumns": [],
          "attemptToConvertTypes": false,
          "convertFieldsToString": false
        },
        "options": {},
        "operation": "append",
        "sheetName": {
          "__rl": true,
          "mode": "list",
          "value": "gid=0",
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/spreadsheet_id/edit#gid=0",
          "cachedResultName": "logs"
        },
        "documentId": {
          "__rl": true,
          "mode": "list",
          "value": "spreadsheet_id",
          "cachedResultUrl": "https://docs.google.com/spreadsheets/d/spreadsheet_id/edit?usp=drivesdk",
          "cachedResultName": "logs"
        }
      },
      "credentials": {
        "googleSheetsOAuth2Api": {
          "name": "<your credential>"
        }
      },
      "typeVersion": 4.7
    },
    {
      "id": "83e25cee-b0ab-4454-9fb9-bc570a1ab6fc",
      "name": "Sticky Note1",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        432,
        -96
      ],
      "parameters": {
        "color": 5,
        "width": 672,
        "height": 272,
        "content": "### 1. Retrieves filenames from the google drive folder and filters for markdown (.md) files."
      },
      "typeVersion": 1
    },
    {
      "id": "d5f6145a-ea46-4ecd-871c-dd865bea2e1f",
      "name": "Sticky Note2",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        432,
        544
      ],
      "parameters": {
        "color": 4,
        "width": 688,
        "height": 352,
        "content": "### 3. Chunks the text, sends it to the Ollama model, and parses the JSON response."
      },
      "typeVersion": 1
    },
    {
      "id": "2fc8a3a2-2f8f-4f42-a217-bba89c2010cd",
      "name": "Sticky Note3",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        432,
        240
      ],
      "parameters": {
        "color": 2,
        "width": 688,
        "height": 240,
        "content": "### 2. Downloads the markdown files and combines the extracted text."
      },
      "typeVersion": 1
    },
    {
      "id": "0fb33e0d-6378-4005-a959-8418b6a79900",
      "name": "Sticky Note4",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1424,
        160
      ],
      "parameters": {
        "color": 7,
        "width": 384,
        "height": 288,
        "content": "### 4b. Logs the model responses, including the flag indicating whether PII information was found. (optional)."
      },
      "typeVersion": 1
    },
    {
      "id": "dbee8ecd-a8c4-4fca-9490-ba77b3fde993",
      "name": "Sticky Note5",
      "type": "n8n-nodes-base.stickyNote",
      "position": [
        1440,
        528
      ],
      "parameters": {
        "color": 7,
        "width": 576,
        "height": 288,
        "content": "### 4a. Combines the model responses and creates a new markdown file in the google drive folder."
      },
      "typeVersion": 1
    },
    {
      "id": "19daaf82-f162-4c9d-823b-eee7f1d10500",
      "name": "combine_chunk_text",
      "type": "n8n-nodes-base.aggregate",
      "position": [
        1472,
        608
      ],
      "parameters": {
        "options": {},
        "fieldsToAggregate": {
          "fieldToAggregate": [
            {
              "fieldToAggregate": "output.sanitized_text"
            }
          ]
        }
      },
      "typeVersion": 1
    }
  ],
  "connections": {
    "Ollama Model": {
      "ai_languageModel": [
        [
          {
            "node": "Basic LLM Chain",
            "type": "ai_languageModel",
            "index": 0
          }
        ]
      ]
    },
    "Basic LLM Chain": {
      "main": [
        [
          {
            "node": "parse_json_text_with_flag",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "combine_all_text": {
      "main": [
        [
          {
            "node": "chunk_text_for_local_llm",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "join_text_chunks": {
      "main": [
        [
          {
            "node": "create_cleaned_text_file",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "split_files_item": {
      "main": [
        [
          {
            "node": "select_markdown_files",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "combine_chunk_text": {
      "main": [
        [
          {
            "node": "join_text_chunks",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "text_from_markdown": {
      "main": [
        [
          {
            "node": "combine_all_text",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "download_drive_files": {
      "main": [
        [
          {
            "node": "text_from_markdown",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "on_subfolder_created": {
      "main": [
        [
          {
            "node": "return_files_in_folder",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "select_markdown_files": {
      "main": [
        [
          {
            "node": "download_drive_files",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "return_files_in_folder": {
      "main": [
        [
          {
            "node": "split_files_item",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "chunk_text_for_local_llm": {
      "main": [
        [
          {
            "node": "Basic LLM Chain",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "create_cleaned_text_file": {
      "main": [
        []
      ]
    },
    "parse_json_text_with_flag": {
      "main": [
        [
          {
            "node": "log_text_with_pii_flag (optional)",
            "type": "main",
            "index": 0
          },
          {
            "node": "combine_chunk_text",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "log_text_with_pii_flag (optional)": {
      "main": [
        []
      ]
    }
  }
}