Sensitive data exposure rises with employee use of GenAI tools
Harmonic Security has released its quarterly analysis finding that a significant proportion of data shared with Generative AI (GenAI) tools and AI-enabled SaaS applications by employees contains sensitive information.
The analysis was conducted on a dataset comprising 1 million prompts and 20,000 files submitted to 300 GenAI tools and AI-enabled SaaS applications between April and June. According to the findings, 22% of files (total 4,400) and 4.37% of prompts (total 43,700) included sensitive data. The categories of sensitive data encompassed source code, access credentials, proprietary algorithms, merger and acquisition (M&A) documents, customer or employee records, and internal financial information.
Use of new GenAI tools
The data highlights that in the second quarter alone, organisations on average saw employees begin using 23 previously unreported GenAI tools. This expanding variety of tools increases the administrative load on security teams, who are required to vet each tool to ensure it meets security standards.
A notable proportion of AI tool use occurs through personal accounts, which may be unsanctioned or lack sufficient safeguards. Almost half (47.42%) of sensitive uploads to Perplexity were made via standard, non-enterprise accounts. The numbers were lower for other platforms, with 26.3% of sensitive data entering ChatGPT through personal accounts, and just 15% for Google Gemini.
Data exposure by platform
Analysis of sensitive prompts identified ChatGPT as the most common origin point in Q2, accounting for 72.6%, followed by Microsoft Copilot with 13.7%, Google Gemini at 5.0%, Claude at 2.5%, Poe at 2.1%, and Perplexity at 1.8%.
Code leakage represented the most prevalent form of sensitive data exposure, particularly within ChatGPT, Claude, DeepSeek, and Baidu Chat.
File uploads and risks
The report found that, on average, organisations uploaded 1.32GB of files in the second quarter, with PDFs making up approximately half of all uploads. Of these files, 21.86% contained sensitive data. The concentration of sensitive information was higher in files compared to prompts. For example, files accounted for 79.7% of all stored credit card exposure incidents, 75.3% of customer profile leaks, and 68.8% of employee personally identifiable information (PII) incidents. Files accounted for 52.6% of exposure volume related to financial projections.
Less visible sources of risk
GenAI risk does not only arise from well-known chatbots. Increasingly, regular SaaS tools that integrate large language models (LLMs) - often without clear labelling as GenAI - are becoming sources of risk as they access and process sensitive information. Canva was reportedly used for documents containing legal strategy, M&A planning, and client data. Replit and Lovable.dev were involved with proprietary code and access keys, while Grammarly and Quillbot edited contracts, client emails, and internal legal content.
International exposure
Use of Chinese GenAI applications was cited as a concern. The study found that 7.95% of employees in the average enterprise engaged with a Chinese GenAI tool, leading to 535 distinct sensitive exposure incidents. Within these, 32.8% were related to source code, access credentials, or proprietary algorithms, 18.2% involved M&A documents and investment models, 17.8% exposed customer or employee PII, and 14.4% contained internal financial data.
Preventative measures
"The good news for Harmonic Security customers is that this sensitive customer data, personally identifiable information (PII), and proprietary file contents never actually left any customer tenant, it was prevented from doing so. But had organizations not had browser based protection in place, sensitive information could have ended up training a model, or worse, in the hands of a foreign state. AI is now embedded in the very tools employees rely on every day and in many cases, employees have little knowledge they are exposing business data."
Harmonic Security Chief Executive Officer and Co-founder Alastair Paterson made this statement, referencing the protections offered to their customers and the wider risks posed by the pervasive nature of embedded AI within workplace tools.
Harmonic Security advises enterprises to seek visibility into all tool usage – including tools available on free tiers and those with embedded AI – to monitor the types of data being entered into GenAI systems and to enforce context-aware controls at the data level.
The recent analysis utilised the Harmonic Security Browser Extension, which records usage across SaaS and GenAI platforms and sanitises the information for aggregate study. Only anonymised and aggregated data from customer environments was used in the analysis.