Some recent changes in tech companies’ policies are making this a reality. If it’s just simple chit-chat about the weather, we can live with it. But if your conversation contains any of your secrets… that’s not something you should leave to chance. From therapy apps sharing intimate details with advertisers to AI tools learning from your personal chats, a series of real-world incidents going back to 2020 has shown just how much user data – even highly sensitive medical and personal information – is up for grabs.
Why is this happening now? In part, the race to develop artificial intelligence (AI) has turned our data into a precious resource. Companies are updating their privacy policies and software licenses to grant themselves more access to user content — whether for training AI models (a recent trend) or for targeted advertising (a long-standing practice). Unfortunately, these quiet changes often come at the expense of our privacy. Let’s explore some eye-opening examples – and their consequences – to understand this trend.
The Signal Slip-Up: Even Secure Apps Have a Human Weak Link
Encrypted messengers like Signal are synonymous with privacy. Signal’s end-to-end encryption ensures that only chat participants can read messages – not even Signal’s servers can spy on you. However, in 2025, a U.S. government official accidentally added a journalist to a confidential Signal group chat discussing military plans. It wasn’t a technical failure, but a human one. Signal’s openness – anyone can register – made this possible.
This real-life case was widely covered by the media as 'Signal Gate'; the coverage includes an April 2025 piece by The Guardian – you can read it here.
Zoom’s TOS and Backlash: Are We Meeting to Feed AI?
In 2023, Zoom quietly updated its Terms of Service with new clauses that allowed it to use audio, video, and chat transcripts to train AI models. The change sparked outrage across the internet and tech blogs when it came to light. Many users threatened to cancel their subscriptions. In response, Zoom promised that it would not use customer content for AI training without explicit consent.
You can find out more in the articles on Stack Diary and The Verge.
Therapy Apps That Betrayed Trust: BetterHelp and Cerebral
BetterHelp, an online therapy app, promised users their data would be kept confidential. In practice, it was quietly sharing data with platforms like Facebook and Snapchat for advertising purposes. The FTC found this practice deceptive and required the company to pay $7.8 million.
Read the official FTC press release.
Another app, Cerebral, used tracking tools like Meta Pixel and Google Analytics. These tools sent patient data – including names, birth dates, insurance info, and even mental health screening results – to third parties like Facebook.
The Verge reported on the breach, which affected over 3.1 million patients.
ChatGPT’s Hunger for Data
When ChatGPT was released by OpenAI, many users didn’t realize that their conversations were being stored and, by default, potentially used to train the AI. This became a problem when Samsung employees pasted proprietary source code into the chatbot – and that code may have become part of the AI’s training data. Of course, you can opt out, but how many users actually know about this option? And can it be trusted?
We can learn something about companies’ approach from another case: OpenAI was sued by The New York Times and other major news organizations for using copyrighted material without permission to train its models. These legal cases challenge the very foundation of current AI training practices.
More details are available in the New York Times article.
When Your Code Becomes AI’s Textbook: GitHub Copilot
GitHub Copilot, launched by Microsoft, uses AI to suggest lines of code. But developers soon realized it had been trained on billions of lines of open-source code – and that, in some cases, it reproduced code covered by restrictive licenses, sometimes as near-verbatim copies of the original, copyright notices included (especially when it was asked to generate non-generic, more sophisticated code...).
Developers filed a class-action lawsuit in late 2022, claiming this was a violation of open-source licensing terms. The judge ultimately dismissed the majority of the claims, leaving only two standing: one accusing GitHub of breach of contract and another alleging violations of the Digital Millennium Copyright Act (DMCA). The judge determined that the code allegedly copied by GitHub was not sufficiently similar to the developers’ original work. Even so, the debate continues about what AI should or shouldn’t be allowed to learn from.
You can read more on InfoWorld.
Why These Examples Matter
Whether it’s messaging apps, therapy platforms, video calls, chatbots, or coding assistants – many modern tools are quietly collecting more data than we expect. These examples show how your conversations, health records, or intellectual property can be used without your knowledge or meaningful consent. They also highlight how easily a company managing your data can come to assume it has the right to take ownership of it.
These actions have led to regulatory fines, lawsuits, and a growing mistrust in the services we once took for granted. But they’ve also sparked a broader conversation about privacy and AI ethics.
What You Can Do as a User
1. Read terms of service and privacy policies – especially after updates.
2. Use settings that limit data sharing (e.g., disable chat history in ChatGPT).
3. Prefer services that offer local processing or clear privacy guarantees.
4. Think twice before pasting sensitive information into AI tools.
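On that last point, here is a minimal, purely illustrative Python sketch of one way to strip obviously sensitive fragments from a prompt before it leaves your machine. The patterns and the example prompt are hypothetical, and a few regular expressions are no substitute for judgment – they only catch the most obvious leaks.

```python
import re

# Illustrative (and deliberately incomplete) patterns for data you probably
# don't want to paste into a third-party AI tool.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "API_KEY": re.compile(r"\b(?:sk|key|token)[-_][A-Za-z0-9_]{16,}\b"),
}

def redact(text: str) -> str:
    """Replace anything matching the patterns above with a placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

prompt = "Debug this: client anna@example.com gets 401 with token sk_live_4f9a8b7c6d5e4f3a2b1c"
print(redact(prompt))
# Debug this: client [EMAIL REDACTED] gets 401 with token [API_KEY REDACTED]
```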
The tools we use every day are evolving fast – and so are the risks to our data. But awareness is the first line of defense. By learning from real examples like Signal, Zoom, BetterHelp, Cerebral, ChatGPT, and GitHub Copilot, we can make more informed decisions about how and where we share our information.
And What If You Are the Owner of the Service?
How can you earn your users’ trust? You can always take care of your users’ data — and protect it not only from others, but even from yourself. Some companies already follow this principle. For example, Apple offers Advanced Data Protection for iCloud. Privacy isn’t dead. But it does need our protection – now more than ever.
When creating a service, make sure to implement proper authorization and access management. But still, the best thing you can do is encrypt all user data (whenever possible) — and then throw away the key… or better yet, let only the users hold the key. There are plenty of libraries and solutions available on the market to help with end-to-end encryption of your users’ communication: from free and simple options like basic encryption functions (e.g., OpenSSL), through widely used open-source protocols that require some effort and expertise (like Matrix or the Signal Protocol), all the way to complete, easy-to-use, cross-platform solutions covering all important communication types, like our PrivMX Platform.
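To make the “let only the users hold the key” idea concrete, here is a minimal, simplified sketch – not PrivMX, Matrix, or the Signal Protocol, just plain symmetric encryption built with the Python `cryptography` package. The key is derived from a passphrase that never leaves the user’s device, so the server only ever stores ciphertext (plus a non-secret salt) that it cannot read.

```python
import base64
import os

from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Turn the user's passphrase into a symmetric key, client-side only."""
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=480_000)
    return base64.urlsafe_b64encode(kdf.derive(passphrase.encode("utf-8")))

# Client side: encrypt before anything is sent to the server.
salt = os.urandom(16)                                   # stored with the ciphertext, not secret
key = derive_key("correct horse battery staple", salt)  # passphrase stays on the device
ciphertext = Fernet(key).encrypt(b"my private note")

# Server side: stores only (salt, ciphertext) and can decrypt neither.

# Client side, later: the same passphrase recreates the key and recovers the data.
assert Fernet(derive_key("correct horse battery staple", salt)).decrypt(ciphertext) == b"my private note"
```

A real end-to-end setup also has to handle key exchange, rotation, and multiple devices – which is exactly what the protocols and platforms mentioned above take care of for you.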
Conclusion
If you truly value your users’ privacy, encrypt their data and let them hold the keys — that’s the foundation of real digital trust. End-to-end encryption with user-held keys is not just a feature — it’s the most effective way to ensure data privacy, security, and user trust by design.
And if you are a user, choose your tools more carefully than ever before...
Author

Błażej Zyglarski
With more than 20 years of professional experience as an academic lecturer, full-stack/mobile developer, and founder of IT companies and foundations operating in the EU market, Błażej has always put data protection, encryption and security first. In his private time, Błażej is passionate about smart home systems, 3D printing and board games.