How Can AI Technology Protect Privacy?
In an era when AI technologies and products are developing rapidly, large amounts of private user data are being fed into machine learning without consent, endangering user privacy. AI giants both at home and abroad have recognized this issue and are actively developing products that use AI technology to protect privacy.
We are in an era of intelligent transformation, with artificial intelligence "empowering" industry after industry. Big data is the new energy source, AI algorithms are the engine, and companies equipped with both have boarded a fast train to the future, leaving competitors far behind.
However, such rapid development comes at a cost.
Our phone numbers, email addresses, home and work address coordinates, device IDs, purchase records, app usage logs, browsing history, search engine click habits, facial recognition data, fingerprints, heartbeats, and other such information are all private data we are unwilling to share lightly. Yet, in the AI era, this data may already be part of a dataset used by some company to train AI algorithms.
It is these seemingly insignificant pieces of personal data that, aggregated into sufficiently large training sets, give AI its cognitive abilities: algorithms that have never met us can recognize, understand, and even predict our preferences and motivations, and can map out our family and friends. Our privacy is the "price" paid for this intelligence.
Of course, this is not necessarily a price you pay willingly.
So how can we protect our privacy? Can we simply not use these technologies?
Do you think turning off GPS on your phone prevents your location from being tracked? Your phone still has a gyroscope, built-in compass, barometer, and other sensors that can be used to locate you. As long as you use a phone, absolute privacy protection does not exist.
For many mobile apps, the choice is either not to use them or to accept the risk of privacy leaks. For example, many apps require phone number registration, mobile verification to continue use, or even facial recognition verification. So, what can individuals do to protect their privacy? Almost nothing. Coupled with the black-box nature of AI algorithms, we are often completely unaware of the logic and motivations behind AI decisions.
Privacy protection is extremely difficult to achieve through individual efforts alone; it requires strong legal regulations to enforce limits.
On May 25, 2018, the EU's General Data Protection Regulation (GDPR) came into effect. It is the EU's data protection framework and currently the most comprehensive and stringent privacy regulation in force. According to figures released by DLA Piper, in less than two years GDPR enforcement produced fines totaling €114 million, the largest being the €50 million fine France imposed on Google.
The reason: a lack of transparency, inadequate information, and the absence of valid user consent in Google's ad personalization. Below is a map of the fines imposed across EU countries from GDPR's enactment to January 2020.
For businesses, GDPR requires that before collecting users' personal information, they must clearly explain in a "concise, transparent, and understandable form, using plain language" what information will be collected, how it will be stored, and how it will be used, while also providing contact details.
For individuals, GDPR grants data subjects eight rights: the right to be informed, the right of access, the right to rectification, the right to erasure (the "right to be forgotten"), the right to restrict processing, the right to data portability, the right to object, and rights related to automated decision-making and profiling.
GDPR is now genuinely touching everyone's life. Its most visible effect is the constant stream of pop-ups on websites asking for consent to collect data, a direct result of its transparency requirements.
The EU's GDPR has global influence, handing users real control over their personal data and forcing the world to confront privacy while developing new technologies. Countries around the world have since begun enacting their own data protection laws.
Regarding privacy protection, everything has just begun.
The EU recently launched a new strategy called "Shaping Europe’s Digital Future," aiming to become a global leader in AI development by introducing a series of regulations on AI, privacy, and security. This strategy is also seen as a response to the rise of AI in the U.S. and China.
It is foreseeable that AI privacy security and regulation will gradually become a key topic. As Margrethe Vestager, Vice President of the European Commission, stated:
"AI is neither good nor bad in itself—it all depends on why and how people use it. Let’s do our best to control the risks AI may pose to our values—no harm, no discrimination."
Privacy protection has become an unavoidable "hurdle" in AI development: a challenge for AI technology, but also an opportunity for its healthy growth.
It is fair to say that privacy regulation is an inevitable trend. It will significantly raise companies' compliance costs for collecting, using, and sharing data, and it may create data silos within and between companies that limit the value they can extract from data. Making privacy-preserving AI practical has therefore become one of the most urgent goals in the AI field.
Privacy-preserving AI primarily combines technologies such as data encryption, distributed computing, edge computing, and machine learning to protect data security. Recently popular methods include Differential Privacy and Federated Learning (also known as collaborative learning or shared learning).
Privacy protection does not mean completely avoiding data collection but rather using technical means to prevent the leakage of personal privacy data.
Differential Privacy is a mathematical technique. Suppose we analyze a dataset and compute statistics over it (mean, variance, median, mode, and so on); if the output does not reveal whether any particular individual's data was included in the original dataset, the algorithm is considered differentially private.
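Formally (this is the standard definition from the differential privacy literature, not spelled out in the original): a randomized algorithm M satisfies ε-differential privacy if, for any two datasets D and D' that differ in a single record, and for every set of possible outputs S,

$$\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S]$$

The privacy budget ε quantifies how similar the two output distributions must be: the smaller ε is, the less any single individual's presence can influence the result, and the stronger the guarantee.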
A simple example: Suppose your department uses a spreadsheet every month to record everyone’s salary. Only the creator can view the spreadsheet; others can only query the total salary via a function S.
If you transfer to another department one month, others can deduce your salary by comparing the previous month's spreadsheet A with the current month's spreadsheet B: your salary is simply S(A) minus S(B).
Spreadsheet B is called an adjacent dataset to A, differing by only one record. Differential Privacy ensures that queries on adjacent datasets yield similar results, making it impossible to infer individual information. The degree of similarity reflects the strength of privacy protection.
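One standard way to achieve this (the Laplace mechanism, which the original does not name) is to add calibrated random noise to every answer. Below is a toy sketch for the salary-sum query S; the salary figures, the MAX_SALARY bound, and the epsilon value are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

MAX_SALARY = 50_000    # assumed upper bound on any single salary
EPSILON = 0.5          # privacy budget: smaller means stronger privacy

def private_sum(salaries, eps=EPSILON):
    """Answer the total-salary query with Laplace noise calibrated to the
    query's sensitivity (the most one person can change the sum)."""
    sensitivity = MAX_SALARY
    noise = rng.laplace(loc=0.0, scale=sensitivity / eps)
    return sum(salaries) + noise

sheet_a = [30_000, 42_000, 38_000, 45_000]   # before you transfer
sheet_b = [30_000, 42_000, 38_000]           # after you transfer

# The difference of two noisy answers no longer pins down your salary.
print(private_sum(sheet_a) - private_sum(sheet_b))
```

With noise of scale MAX_SALARY / ε added to every answer, S(A) minus S(B) is smeared out enough that the departed employee's exact salary can no longer be recovered, while totals over many employees remain roughly accurate.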
Apple and Facebook already use this method to collect aggregated data without identifying specific users. MIT Technology Review listed Differential Privacy as one of the top 10 breakthrough technologies of 2020.
Federated Learning is a distributed machine learning approach that has gained popularity in recent years. It assumes user data is never stored on centralized servers; instead it stays private and confidential, held only on individual edge devices such as smartphones.
Compared to traditional machine learning, Federated Learning therefore fundamentally enhances user privacy. Instead of training on data collected from user devices, it trains AI models locally on the devices and transmits only the learned parameter updates to a global model; the raw user data never leaves the device.
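The core idea can be sketched without any framework at all. Below is a minimal simulation of Federated Averaging in plain NumPy; the devices, data, and hyperparameters are all illustrative assumptions. Each simulated "device" fits a shared linear model on its own private data, and only the updated weights travel back to be averaged:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(w, X, y, lr=0.05, epochs=5):
    """A few gradient-descent epochs on one device's private data."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w = w - lr * grad
    return w

# Three simulated "devices", each holding private data from the same true model.
true_w = np.array([2.0, -1.0])
devices = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    devices.append((X, y))

global_w = np.zeros(2)
for _ in range(20):                              # communication rounds
    # Each device trains locally; only weights travel, never raw data.
    local_ws = [local_update(global_w, X, y) for X, y in devices]
    sizes = [len(y) for _, y in devices]
    # The server aggregates with a weighted average (the "FedAvg" step).
    global_w = np.average(local_ws, axis=0, weights=sizes)

print("learned weights:", global_w)              # close to [2.0, -1.0]
```

The design point to notice is that the server only ever sees model parameters, weighted by how much data each device holds, and never the underlying records.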
The number of papers submitted to arXiv (a preprint platform) in recent years shows how quickly the technology is developing.
Since last year, the world's two most popular machine learning frameworks, TensorFlow and PyTorch, have both gained Federated Learning solutions for protecting privacy.
The concept of Federated Learning was first introduced by Google in 2017. Last year, Google released the TensorFlow Federated (TFF) framework to simplify Federated Learning using TensorFlow.
As shown below, a model built on the TFF framework is trained locally on many individual devices (e.g., Phone A); the weight updates are aggregated (Step B) into an improved global model (Model C), which is then delivered back to the devices to enhance performance.
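For a flavor of the API, here is a condensed sketch based on the TFF 0.x releases from around that time; the synthetic client data and tiny model are illustrative assumptions, and later TFF versions have reorganized this API:

```python
import tensorflow as tf
import tensorflow_federated as tff

def model_fn():
    # A tiny Keras classifier, wrapped so TFF can train it on-device.
    keras_model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation="softmax", input_shape=(784,)),
    ])
    return tff.learning.from_keras_model(
        keras_model,
        input_spec=(tf.TensorSpec([None, 784], tf.float32),
                    tf.TensorSpec([None], tf.int32)),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    )

# Synthetic stand-ins for three clients' on-device datasets.
federated_data = [
    tf.data.Dataset.from_tensor_slices((
        tf.random.normal([32, 784]),
        tf.random.uniform([32], maxval=10, dtype=tf.int32),
    )).batch(8)
    for _ in range(3)
]

# Build the iterative process: local training (A) plus server averaging (B).
process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.1),
)

state = process.initialize()
for round_num in range(5):
    # Each round returns the updated global model (C) and training metrics.
    state, metrics = process.next(state, federated_data)
    print(round_num, metrics)
```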
To advance privacy-preserving machine learning, OpenMined and the team behind Facebook's renowned deep learning framework PyTorch announced plans last year to build a joint platform to accelerate research in privacy-preserving technologies.
OpenMined is an open-source community focused on researching, developing, and enhancing secure, privacy-preserving AI tools. OpenMined released PySyft, the first open-source federated learning framework designed for building secure and privacy-preserving systems.
PySyft has gained significant popularity, boasting 5.2k stars on GitHub. It currently supports major deep learning frameworks (PyTorch, TensorFlow) and implements federated learning, differential privacy, and cryptographic computations (such as multi-party computation and homomorphic encryption) to decouple private data from model training.
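To give a feel for the programming model, here is a minimal sketch using the PySyft 0.2-era API (later versions changed substantially; the worker names are illustrative):

```python
import torch
import syft as sy

hook = sy.TorchHook(torch)                  # patch torch with PySyft operations
bob = sy.VirtualWorker(hook, id="bob")      # simulated remote device
alice = sy.VirtualWorker(hook, id="alice")  # another simulated device

# Private data lives on the workers; locally we only hold pointers to it.
x_bob = torch.tensor([1.0, 2.0, 3.0]).send(bob)
x_alice = torch.tensor([4.0, 5.0, 6.0]).send(alice)

# Operations on pointers execute remotely, on each worker's own data.
sum_bob = x_bob.sum()         # still a pointer to a tensor held by bob
result = sum_bob.get()        # explicitly retrieve only the aggregate
print(result)                 # tensor(6.)
```

The key design choice is that we compute against pointers: only results we explicitly call .get() on ever leave a worker, which is the same decoupling of private data from model training that PySyft's federated learning tools build on.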
In China, major AI players have already begun deploying privacy-preserving technologies, particularly in the financial sector. Due to strict regulation, financial institutions face especially high demands for data privacy. On one hand, they encounter technical challenges in protecting sensitive data; on the other, the isolation of financial data creates "data silos" that prevent institutions from unlocking the full value of their data.
Several Chinese financial institutions and fintech companies have started using federated learning in customer acquisition, credit approval, and risk management to address compliance issues related to data privacy and overcome data silos, thereby maximizing the value of financial data.
Currently, China's regulatory framework for privacy protection remains underdeveloped, and awareness of privacy among individuals and businesses is still relatively low. But as global attention to privacy grows and privacy-preserving AI technologies mature, I believe AI will ultimately evolve in a positive direction. Through the efforts of scientists, we can hope that the "black box" of AI never becomes a Pandora's box.