When AI Begins to Have 'Subconsciousness', Do We Still Have Privacy?
Will AI affect people's privacy? This article examines the question from an algorithmic perspective.
It's been a while since we last covered new research in algorithms. That is certainly not for lack of news from academia; top conferences are flooded with papers. Broadly speaking, though, it's hard to deny that theoretical research in deep learning has hit a bottleneck, even as its integration with traditional industries has set off an unprecedented explosion in AI applications.
Yet as Stanford professor Fei-Fei Li has pointed out, deep learning still has a long way to go, whether in intelligence, talent, or hardware. For a long time there have been few significant breakthroughs on the algorithmic side, which leaves deployed models with inherent shortcomings and keeps AI under constant scrutiny.
For example, the privacy issues arising from the proliferation of AI not only require self-regulation by tech companies but also necessitate optimization and refinement of algorithms.
How will AI impact people's privacy? A single article may not answer this complex question, but we hope to start the discussion now.
Before delving into privacy issues, let’s revisit the oft-discussed LSTM model. Its function has been explained many times—simply put, it introduces the concept of memory into neural networks, enabling models to retain long-sequence information and make predictions.
AI’s remarkable abilities, such as writing coherent articles or engaging in smooth multi-turn conversations with humans, are built on this foundation.
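For readers who want to see what this looks like in practice, here is a minimal sketch of a character-level LSTM language model in PyTorch. Everything in it (the class name CharLSTM, the layer sizes) is an illustrative placeholder rather than any specific published model; the same skeleton is reused in the experiment sketches further below.

```python
# A minimal, illustrative character-level LSTM language model (names and sizes are
# placeholders, not any specific published system).
import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, state=None):
        # x: (batch, seq_len) tensor of character ids
        emb = self.embed(x)
        out, state = self.lstm(emb, state)   # the hidden state carries "memory" along the sequence
        logits = self.head(out)              # a score for every possible next character, at each position
        return logits, state
```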
In the years that followed, researchers continued to extend and enrich neural network memory. The introduction of attention mechanisms, for instance, allowed LSTM networks to track information precisely over long spans, and external memory was used to strengthen sequential generation models and improve the performance of convolutional networks.
Overall, stronger memory has, on the one hand, given neural networks the capacity for complex relational reasoning, making them markedly more intelligent; on the application side, it has brought major upgrades to intelligent writing, translation, and customer service.
To some extent, memory marks the beginning of AI shedding its 'artificial stupidity' label.
However, having memory also brings two issues: First, neural networks must learn to forget to free up storage space and retain only important information. For example, when a chapter in a novel ends, the model should reset related information and keep only the relevant outcomes.
Second, the 'subconsciousness' of neural networks must be guarded against. Simply put, after training on sensitive user data, could machine learning models inadvertently reveal that sensitive information when released to the public?
In this era of ubiquitous digital data collection, does this mean privacy risks are escalating?
Researchers at Berkeley conducted a series of experiments on this question, and the answer might shock many—your data may already be etched in AI’s 'mind'.
To understand neural networks' 'unintentional memorization,' we must first introduce the concept of overfitting.
In deep learning, a model has overfit when it performs well on its training data but fails to reach the same accuracy or error rate on unseen data. This gap between lab and real-world performance is usually attributed to noise in the training data or to an insufficient amount of data.
As a common side effect of deep neural network training, overfitting is a global phenomenon affecting the entire dataset. To test whether a neural network secretly 'remembers' sensitive information from training data, we must examine local details, such as whether the model has a particular affinity for specific examples (e.g., credit card numbers, account passwords).
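To make the contrast concrete, here is a minimal sketch of how overfitting is usually measured: as one dataset-wide gap between training loss and held-out loss. The names below (model, loss_fn, the data loaders) are generic placeholders; the point is that the gap is a single global number and says nothing about any individual training example.

```python
# Hedged sketch: overfitting as a global train/held-out gap (placeholder names).
import torch

@torch.no_grad()
def average_loss(model, loader, loss_fn):
    model.eval()
    total, count = 0.0, 0
    for x, y in loader:
        total += loss_fn(model(x), y).item() * len(x)
        count += len(x)
    return total / count

# A large positive gap signals overfitting across the dataset as a whole,
# but reveals nothing about whether one specific example was memorized:
# gap = average_loss(model, val_loader, loss_fn) - average_loss(model, train_loader, loss_fn)
```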
To explore this 'unintentional memorization,' Berkeley researchers conducted a three-phase experiment:
First, they guarded against overfitting: the network was trained with gradient descent to minimize its loss, and the final model reached near-100% accuracy on the training data.
Next, they tasked the machine with understanding the underlying structure of language. This is typically done by training a classifier on a sequence of words or characters to predict the next token based on preceding context tokens.
Finally, they ran a controlled experiment. They inserted a random number, '281265017,' as a security marker into the standard Penn Treebank (PTB) dataset. They then trained a small language model on this augmented dataset to predict the next character given the preceding context.
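As a rough sketch of that setup (not the researchers' actual code), the controlled experiment amounts to planting the marker in the training text and then fitting a small character-level model on the result. Here 'ptb.train.txt' is assumed to be a local copy of the PTB training split, and CharLSTM refers to the earlier sketch.

```python
# Illustrative sketch of the canary experiment, under the assumptions stated above.
import random
import torch
import torch.nn as nn

CANARY = "The random number is 281265017"

# 1. Plant the secret once in the training corpus.
with open("ptb.train.txt", encoding="utf-8") as f:
    lines = f.read().splitlines()
lines.insert(random.randrange(len(lines)), CANARY)
text = "\n".join(lines)

# 2. Map characters to integer ids.
chars = sorted(set(text))
char_to_id = {c: i for i, c in enumerate(chars)}
ids = torch.tensor([char_to_id[c] for c in text])

# 3. Train a small model to predict each character from the characters before it.
model = CharLSTM(vocab_size=len(chars))              # from the earlier sketch
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
seq_len, batch = 128, 32

for step in range(1000):                             # a short demonstration run
    starts = torch.randint(0, len(ids) - seq_len - 1, (batch,)).tolist()
    x = torch.stack([ids[s:s + seq_len] for s in starts])
    y = torch.stack([ids[s + 1:s + seq_len + 1] for s in starts])
    logits, _ = model(x)
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```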
Theoretically, the model’s size was much smaller than the dataset, so it couldn’t memorize all the training data. But could it remember that specific string?
The answer was YES.
When researchers input the prefix 'The random number is 2812,' the model happily and correctly predicted the remaining suffix: '65017.'
More surprisingly, when the prefix was changed to 'The random number is,' the model didn’t output '281265017.' Researchers calculated the probabilities of all possible 9-digit suffixes and found the inserted security marker was more likely to be selected than others.
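To make that concrete, here is a hedged sketch of how one could score candidate suffixes under a trained character model. The names model and encode are assumed to come from the earlier sketches (a trained CharLSTM and its character-to-id mapping), and ranking all one billion 9-digit suffixes this way would be far too slow in practice; the study relies on a much more efficient search.

```python
# Hedged sketch: scoring a string under a trained character-level model.
import torch
import torch.nn.functional as F

def sequence_log_prob(model, encode, text: str) -> float:
    """Sum of the log-probabilities the model assigns to each character of `text`,
    conditioned on all preceding characters."""
    ids = torch.tensor([encode(text)])         # shape (1, len)
    with torch.no_grad():
        logits, _ = model(ids[:, :-1])         # predict characters 2..n from characters 1..n-1
        log_probs = F.log_softmax(logits, dim=-1)
        targets = ids[:, 1:].unsqueeze(-1)
        return log_probs.gather(2, targets).sum().item()

# Example ranking (assuming a trained `model` and its `encode` function exist).
# The prefix's own log-probability is identical for every candidate, so it does
# not change the ordering:
# prefix = "The random number is "
# scores = {f"{n:09d}": sequence_log_prob(model, encode, prefix + f"{n:09d}")
#           for n in range(1000)}   # a tiny subset of the full 10^9 space
```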
From this, we can cautiously conclude that deep neural networks do unintentionally memorize sensitive data fed to them during training.
Today, AI has become a cross-scenario, cross-industry societal movement. From recommendation systems and medical diagnostics to citywide surveillance cameras, more user data—potentially containing sensitive information—is being collected to train algorithmic models.
Previously, developers often anonymized sensitive columns in datasets. But this doesn’t guarantee absolute safety, as malicious actors could still reverse-engineer the original data using methods like table lookup.
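A toy example shows why. If a low-entropy field is merely hashed before release, an attacker can hash every possible value and read the original back out of a lookup table. The phone-number-style value below is hypothetical.

```python
# Minimal illustration: hashing a low-entropy field is reversible by enumeration.
import hashlib

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

# The "anonymized" value released in a dataset:
anonymized = sha256_hex("555-0173")

# The attacker enumerates the whole value space and builds a reverse lookup table:
lookup = {sha256_hex(f"555-{i:04d}"): f"555-{i:04d}" for i in range(10_000)}
print(lookup.get(anonymized))   # -> "555-0173"
```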
Since sensitive data in models is unavoidable, measuring how much a model memorizes its training data is essential for assessing future algorithmic safety.
Three key questions arise: Is this 'unintentional memorization' more dangerous than overfitting? How easily can an attacker extract what a model has memorized? And does all of the data a model has seen face the same exposure risk?
Berkeley's research found that 'unintentional memorization' begins the very first time the model is trained on the inserted security marker. Test data also showed that the exposure of this memorized data typically peaks before overfitting sets in, and then declines.
Thus, we can conclude that while 'unintentional memorization' poses risks, it isn’t more dangerous than overfitting.
Of course, 'not more dangerous' doesn’t mean it’s harmless.
In fact, researchers discovered that using an improved search algorithm, attackers could extract 16-digit credit card numbers and 8-digit passwords with just tens of thousands of queries. The attack details have been made public.
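The article does not reproduce the attack, but the rough idea can be sketched: instead of enumerating every possibility, grow candidate strings one digit at a time, guided by the model's own next-character probabilities, and keep only the most promising partial candidates. The beam-style search below is a simplified stand-in for the more refined search the researchers describe, and score_next is a hypothetical hook into the trained model.

```python
# Hedged sketch of a beam-style search for likely digit sequences under a language
# model; a simplified stand-in for the published attack, not a reproduction of it.
import heapq

def beam_search_digits(score_next, prefix: str, length: int, beam: int = 100):
    """score_next(text, ch) -> log-probability of `ch` following `text` under the model.
    Returns the `beam` highest-scoring digit strings of the given length."""
    frontier = [(0.0, "")]                            # (total log-prob, digits so far)
    for _ in range(length):
        expanded = []
        for logp, digits in frontier:
            for ch in "0123456789":
                expanded.append((logp + score_next(prefix + digits, ch), digits + ch))
        frontier = heapq.nlargest(beam, expanded)     # keep only the best partial candidates
    return frontier
```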
This means if sensitive data is inserted into training data and the model is released, the risk of exposure is high—even if the model shows no signs of overfitting. Worse, such leaks may go unnoticed, significantly increasing security risks.
In these experiments, the 'security markers' the researchers inserted were far more likely to be exposed than random data: the model's likelihood scores for random candidate sequences cluster in a roughly normal distribution, while the inserted marker sits well out in its tail.
This implies that not all data in the model faces equal exposure risks—deliberately inserted data is more vulnerable.
Additionally, extracting sequences from a model's 'unintentional memory' isn't easy; without a smarter search it comes down to brute force, which demands enormous computing power.
For example, enumerating all 9-digit social security numbers (about 10^9 candidates) takes a few GPUs only a few hours, while enumerating all 16-digit credit card numbers, a space roughly ten million times larger, would require thousands of GPU-years.
For now, as long as we can quantify this 'unintentional memory,' the risk to sensitive training data can be kept within bounds. That means measuring how much of its training data a model stores and how much it over-memorizes, and using that measurement to steer training toward a better-behaved model. It also helps practitioners judge how sensitive their data is and how likely a model is to leak it.
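One concrete way to do that quantification is the exposure measure proposed in the Berkeley work, reconstructed here from its published description (so treat the exact form as a sketch): rank the inserted secret among all candidates of the same format by the model's assigned likelihood, and express how far ahead of chance it lands, in bits.

```python
# Sketch of an exposure-style score: how much more likely the inserted secret is
# than chance, measured by its likelihood rank among all same-format candidates.
import math

def exposure(rank: int, candidate_space_size: int) -> float:
    """Bits of exposure for a secret ranked `rank`-th (1-based) by model likelihood
    among `candidate_space_size` equally formatted candidates."""
    return math.log2(candidate_space_size) - math.log2(rank)

print(exposure(rank=1, candidate_space_size=10**9))            # fully memorized: ~29.9 bits
print(exposure(rank=500_000_000, candidate_space_size=10**9))  # indistinguishable from chance: ~1 bit
```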
In the past, discussions of AI industrialization mostly focused on macro-level issues: how to eliminate algorithmic bias, how to cope with the black-box nature of complex neural networks, and how to bring the technology's dividends into practical applications.
Now that infrastructure upgrades are largely complete and the core concepts are widely understood, the future the industry is eagerly anticipating may lie in AI's shift toward refined, micro-level iteration.