Your Secrets Inside AI: Where Do Your Prompts Actually Go?
Every time you paste a piece of code into Cursor, upload a report to Gemini, or ask Claude to summarize your internal strategy memo, somewhere a cybersecurity attorney quietly weeps.
AI tools are genuinely useful. I use them myself. But there’s a question most professionals skip right past in their rush to save time: has anyone actually read the legal documents on these platforms?
I did. Let me tell you what I found.
Your Data Is Fuel
For most of these companies, your inputs are not just code snippets or business reports. They are training data. Almost every major provider says this directly in their documentation.
OpenAI states that they may use content you provide to improve their services, including training the models that power ChatGPT. Google applies similar language to Gemini, noting that your data supports and improves their services, explicitly including generative AI models.
This means your proprietary algorithm, your draft acquisition memo, your client risk assessment: any of it could theoretically become part of a model’s weights. And if a provider’s anonymization process is imperfect during training, that information could surface in responses to someone else. Including a competitor.
A Human May Be Reading This
Think only an AI sees your conversations? Not exactly.
Google’s documentation for Gemini explicitly states that some chats are reviewed by human specialists employed by Google and its service providers, for the stated purpose of improving their models. Anthropic and OpenAI carry similar language: both companies reserve the right to conduct human moderation when their security filters are triggered.
So if you’re feeding private financial reports, internal correspondence, or documents containing personally identifiable information into a consumer AI tool, you should know: a human reviewer may see it. In practice, those reviewers are often contractors in offshore locations. The chain of custody for that information after that point is not something any of these companies can fully guarantee.
Where Does Your Data Actually Live?
This is the question compliance officers and general counsel should be asking before any AI rollout.
US-based providers (OpenAI, Anthropic, Google) store data under US jurisdiction, which means it falls under the CLOUD Act. That gives US authorities a legal pathway to compel access, even though these companies process data on servers across multiple countries.
Chinese providers such as DeepSeek and Qwen are a different category entirely. DeepSeek’s documentation states directly that your information may be transferred to the People’s Republic of China. Chinese law gives the state broad access to data held by domestic technology companies. If you are sending anything sensitive to these platforms, you should treat that information as potentially accessible to Chinese government authorities. That is not speculation; it is what the legal documentation says.
Why Governments Are Nervous About This
Before generative AI, states had a fairly linear mechanism for controlling information. A regulator could send a request to a search engine or social platform, get a link removed or an IP blocked, and limit access for users in a specific geography.
That model does not work on large language models.
A trained model does not provide a link to information; it generates text from billions of internal weights. You cannot simply ban a fact from inside a neural network. You cannot selectively block it for users in a specific region. Filter layers can be built around LLMs, and they are, but those filters can be bypassed through prompt engineering, and they add cost and fragility to the product.
Beyond information control, states increasingly recognize that LLMs carry the cultural and political values of the countries where they were trained. That is why we are seeing a race toward sovereign AI models in multiple countries simultaneously. Your conversations with these platforms are stored on servers in the provider’s home jurisdiction. From a state perspective, that represents a meaningful transfer of information and potential influence to a foreign power.
Practical Hygiene: What You Should Actually Do
I recognize that banning AI tools outright in a professional environment is not realistic. These tools improve the speed and quality of work, and the business benefits are real. The goal is not prohibition; it is discipline.
Here is what I recommend, both for individual practitioners and for organizations thinking about policy.
Turn off training. OpenAI and Anthropic both offer settings that prevent your conversations from being used for model training. Find the setting and opt out. This should be the default state for any professional use.
Anonymize before you paste. Before sending any sensitive document to an AI tool, strip identifying information manually. Replace employee names with Employee_1 or Manager. Replace project or brand names with Project_X or Brand_Alpha. Replace revenue figures with proportional stand-ins or placeholders like [REVENUE_DATA]. This takes two minutes and materially reduces your exposure.
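If you do this often, the substitution step can be scripted. Here is a minimal sketch, assuming you maintain your own list of sensitive terms; the names in REPLACEMENTS and the currency pattern are hypothetical examples, not a complete matcher, so a manual read-through is still needed before you paste.

```python
import re

# Hypothetical mapping of sensitive terms to placeholders; fill in your own.
REPLACEMENTS = {
    "Jane Doe": "Employee_1",
    "Acme Rocket": "Project_X",
}

# Rough pattern for amounts like $1,250,000 or $4.2M (illustrative, not exhaustive).
MONEY = re.compile(
    r"\$\d+(?:,\d{3})*(?:\.\d+)?(?:\s?(?:[KMB]|million|billion)\b)?",
    re.IGNORECASE,
)


def anonymize(text: str, replacements: dict[str, str] = REPLACEMENTS) -> str:
    """Swap known names for placeholders, then mask currency figures."""
    # Longest terms first, so a longer name is not partly replaced by a shorter one.
    for term in sorted(replacements, key=len, reverse=True):
        placeholder = replacements[term]
        # A function replacement keeps the placeholder literal (no escape handling).
        text = re.sub(
            re.escape(term), lambda _m, p=placeholder: p, text, flags=re.IGNORECASE
        )
    return MONEY.sub("[REVENUE_DATA]", text)
```

Calling anonymize() on a draft before it reaches the clipboard handles the mechanical replacements. It will not catch a name you forgot to list.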
Use temporary chat modes. ChatGPT offers a Temporary Chat setting where history is not saved and training is disabled by default. For quick one-off questions involving sensitive context, this is the right tool.
Watch your access keys. If you use agent-based tools like Cursor or Claude Code, restrict the agent’s file access through the tool’s settings. Your environment files and API key configs should not be in scope for an AI agent unless you have a specific reason.
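As a second line of defense, you can check text for obvious credentials before it goes anywhere near an AI tool. The sketch below is an assumption-heavy illustration: the three patterns are examples I picked, not a full rule set, and the function name find_secrets is hypothetical. Dedicated secret scanners cover far more cases.

```python
import re

# Illustrative patterns only; real secret scanners use much larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "env_assignment": re.compile(
        r"^\s*[A-Z0-9_]*(?:SECRET|TOKEN|PASSWORD|API_KEY)[A-Z0-9_]*\s*=\s*\S+",
        re.IGNORECASE | re.MULTILINE,
    ),
}


def find_secrets(text: str) -> list[str]:
    """Return the names of the patterns that match anywhere in the text."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]
```

An empty result means only that none of these particular patterns matched. It is not proof that the text is clean.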
For high-sensitivity work, run locally. If you hold client data, financial secrets, or information subject to regulatory protection, local models are the only genuinely safe option. Tools like Ollama, LM Studio, or AnythingLLM let you run open models (Llama 3, Mistral, and others) on your own hardware. The data never leaves your machine. No external network call, no exposure.
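To make the local option concrete, here is a minimal sketch of querying a locally running Ollama server over its HTTP API. It assumes Ollama is listening on its default port (11434) and that a model named "llama3" has already been pulled; both are assumptions about your setup, and the helper names build_request and ask_local are mine.

```python
import json
import urllib.request

# Assumptions: Ollama runs locally on its default port and the "llama3"
# model has already been pulled. Adjust both for your own setup.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming generate request aimed at the loopback address."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local(prompt: str, model: str = "llama3") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

The request goes to localhost, so the prompt stays on the machine as long as the server itself is not exposed to the network.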
The Bottom Line
AI tools are not going away, and they should not. But the professional and legal exposure from undisciplined use is real, documented in the providers’ own terms, and not yet well understood by most of the people making these decisions in organizations.
Read the terms. Know where your data goes. Apply the hygiene. And if a client engagement or matter involves genuinely sensitive material, have a policy in place before the data leaves the building.

