How 'Deep' is Deep Learning in Machines?
Every time deep learning is mentioned, those unfamiliar with it tend to imagine it as something distant and unattainable. This article aims to demystify deep learning by using simplified models and relatable analogies to explain its concepts.
In 1980, Kunihiko Fukushima proposed the neocognitron, an early layered neural network and a forerunner of today's convolutional networks. But high computational costs, along with the biological connotations of the term 'neural network,' kept investors away, and practical applications stalled.
After a long period of dormancy and quiet development, Geoffrey Hinton and his collaborators published influential work on Deep Belief Networks in 2006. To better attract attention, they emphasized the term 'deep,' ushering in a new era of deep neural networks.
At its core, deep learning is the application of neural networks to machine learning: deep learning is a subset of machine learning, which is itself a subset of artificial intelligence.
In our daily lives, solving math problems involves knowing the formula (rules) and applying it to the problem (data) to find the answer. Machine learning, however, works the opposite way: it is given data and answers and must learn the rules.
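As a toy illustration of 'data plus answers in, rules out,' here is a minimal sketch (plain NumPy, with invented numbers): the program is given inputs and their answers and recovers the underlying rule, a slope and an intercept, on its own.

```python
import numpy as np

# Data (inputs) and answers (targets). The underlying rule is roughly
# y = 2x + 1, but the program is never told that; it must learn it.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Least-squares fit: find the slope and intercept that best explain the answers.
slope, intercept = np.polyfit(x, y, deg=1)
print(f"learned rule: y ~= {slope:.2f} * x + {intercept:.2f}")
```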
Machine learning, especially deep learning, involves relatively little mathematical theory and is more engineering-oriented. It is a hands-on science where ideas are proven through practice rather than theoretical derivation.
Categories of Machine Learning:
We can use the analogy of raising a child to understand the main categories of machine learning.
- Unsupervised Learning: The child is left to explore the world on their own and forms their own sense of things. For example, a child placed among cats and dogs may eventually learn to tell them apart, but without guidance will never know the labels 'cat' or 'dog.'
Because real things are multifaceted and complex, unsupervised learning based on a limited set of features can produce groupings that don't match human expectations, such as putting long-haired people and long-haired dogs in the same group. The method clusters similar things together, but the results can be skewed and may not meet expectations. (A short code sketch contrasting clustering with supervised classification follows this list.)
- Supervised Learning: In contrast, the child is taught everything by strict parents, with a clear answer provided for every question. This leads to excellent performance on familiar problems (supervised learning generally reaches higher accuracy than unsupervised learning). However, when faced with slightly unfamiliar problems the child may fail; in machine learning this is called overfitting. Labeling data also requires significant manual effort, so this method is typically used for datasets with a clear, limited set of outcomes.
- Semi-Supervised Learning: This is a middle ground: the child is taught basic principles early on and is later allowed to explore the world on that foundation. Likewise, semi-supervised learning uses a small labeled dataset for initial learning and a larger unlabeled dataset for self-learning.
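To make the cat-and-dog analogy concrete, here is a minimal scikit-learn sketch (the features and labels are invented for illustration): KMeans groups the animals without ever seeing the words 'cat' or 'dog,' while the supervised classifier is trained on labeled examples and can name a new one.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Invented features: [body weight in kg, ear length in cm]
animals = np.array([
    [4.0, 6.0], [3.5, 6.5], [4.2, 5.8],        # cats
    [20.0, 12.0], [25.0, 11.0], [18.0, 13.0],  # dogs
])
labels = np.array([0, 0, 0, 1, 1, 1])  # 0 = cat, 1 = dog (only the supervised model sees these)

# Unsupervised: finds two groups, but cannot tell us what they are called.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(animals)
print("cluster assignments:", clusters)

# Supervised: trained on labeled data, it can label a new animal.
clf = LogisticRegression().fit(animals, labels)
print("prediction for a 22 kg animal with 12 cm ears:", clf.predict([[22.0, 12.0]]))
```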
Recall high school biology: a neural reflex travels along a reflex arc, from a receptor, through afferent nerves, to a nerve center, and back out through efferent nerves to an effector such as a muscle. Similarly, we recognize a person through features like clothing, hair, face, eyes, and eyebrows, with each feature judged by its own neuron. Deep learning works the same way: it learns layer upon layer of features to identify objects.
The 'depth' in deep learning refers to this gradual, layered approach, which extracts the useful information layer by layer and yields more accurate results.
As mentioned earlier, deep learning is the application of neural networks in machine learning, technically defined as a multi-level method for learning data representations. Deep networks can be seen as multi-stage information distillation: information passes through successive filters, becoming increasingly refined (more useful for the task), i.e., with higher weights.
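A minimal Keras sketch of this layered 'distillation' (the layer sizes and the 100-feature input are arbitrary assumptions, not anything from the article): each layer re-represents the output of the previous one, so the information is transformed step by step before the final decision.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Each layer learns a new representation of the previous layer's output,
# progressively "distilling" the information toward the final answer.
model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(100,)),  # first filter over raw features
    layers.Dense(32, activation="relu"),                      # a more refined representation
    layers.Dense(1, activation="sigmoid"),                    # final yes/no decision
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```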
Understanding Weights:
Suppose we judge gender from clothing, with four possible combinations: girls wearing pants, girls wearing skirts, boys wearing pants, and boys wearing skirts. Experience tells us that skirts are far more often worn by girls, so attention should not be spread evenly: the features carry different weights. A deep network might use 'clothing' as one of its first filtering layers; if that filter detects a skirt, the sample is already about 90% likely to be female.
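The clothing example can be written as a tiny weighted vote. All numbers below are made up purely to show that different features deserve different weights; with these particular values the 'skirt' signal alone pushes the estimate to roughly 90%.

```python
import math

# Hypothetical weights: "wears a skirt" is a much stronger signal than "has long hair".
weights = {"wears_skirt": 3.2, "long_hair": 0.6}
bias = -1.0

def probability_female(features):
    # Weighted sum of the observed features, squashed to a 0-1 probability.
    score = bias + sum(weights[name] for name, present in features.items() if present)
    return 1 / (1 + math.exp(-score))

print(probability_female({"wears_skirt": True, "long_hair": False}))  # ~0.90
```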
Humans adjust their behavior based on feedback, and so does deep learning. The function that measures the gap between the actual output and the expected output is called the loss function; the loss value is fed back to adjust the weights, gradually approaching a (possibly local) optimum.
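Here is a bare-bones sketch of that feedback loop, with a single weight, squared error as the loss, and plain gradient descent; it is a deliberately simplified assumption, not how a real framework is organized.

```python
# Learn w in "prediction = w * x" from one example by repeatedly
# measuring the loss and nudging the weight against its gradient.
x, target = 2.0, 10.0    # the ideal weight would be w = 5
w = 0.0                  # initial guess
learning_rate = 0.05

for step in range(20):
    prediction = w * x
    loss = (prediction - target) ** 2         # squared-error loss for this sample
    gradient = 2 * (prediction - target) * x  # d(loss)/d(w)
    w -= learning_rate * gradient             # move the weight downhill

print(w)  # close to 5 after a few steps
```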
Common Deep Learning Models:
Different models have different strengths: some excel at classification, others at sequential or grid-structured data. Developers choose based on their needs. For those interested, resources on loss functions (e.g., mean squared error), weight optimization (e.g., gradient descent), and the main families of deep learning models are available on platforms like Zhihu or in books.
Application Example: Identifying Vulgar Content
Low-quality, clickbait content harms the user experience, and deep learning can help filter such content out.
Step 1: Define vulgarity and set standards (with case examples).
Step 2: Provide seed words (scored keywords), features, rules, and training data (titles, summaries, text).
Step 3: Train the model (e.g., using CNN) and adjust parameters.
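A hedged sketch of what Step 3 might look like with a small text CNN in Keras; the vocabulary size and layer sizes are placeholder assumptions, since the article does not describe the actual architecture.

```python
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 20000  # assumed vocabulary size; titles/summaries arrive as padded sequences of token ids

model = keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),        # token ids -> dense vectors
    layers.Conv1D(64, 5, activation="relu"),  # convolution over windows of neighboring words
    layers.GlobalMaxPooling1D(),              # keep the strongest signal from each filter
    layers.Dense(1, activation="sigmoid"),    # vulgar (1) vs. not vulgar (0)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=5, batch_size=32)
```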
Step 4: Evaluate performance on validation sets.
Vulgar content detection is a binary classification task (YES/NO). The key metrics are accuracy, precision, and recall. Since vulgar samples are rare, precision and recall matter more than accuracy; classic analogies are disease screening and pregnancy tests.
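To see why accuracy can mislead when positive samples are rare, here is a small sketch with invented counts: 1,000 test items, only 10 of them actually vulgar.

```python
# Confusion-matrix counts for a made-up test set of 1000 items, 10 of them vulgar.
tp, fp, fn, tn = 6, 12, 4, 978

accuracy  = (tp + tn) / (tp + fp + fn + tn)  # dominated by the many easy negatives
precision = tp / (tp + fp)                   # of the items flagged as vulgar, how many really are
recall    = tp / (tp + fn)                   # of the truly vulgar items, how many were caught
print(f"accuracy={accuracy:.3f}  precision={precision:.3f}  recall={recall:.3f}")
# accuracy=0.984, yet precision=0.333 and recall=0.600: accuracy alone hides the real picture.
```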
Step 5: Deploy or refine based on bad cases.
If the model meets expectations (precision and recall are balanced, as measured by the F Score), deploy it; otherwise, keep optimizing. The F Score ranges over [0, 1], and higher is better.
- F1 Score: Equal weight for precision and recall.
- Fβ Score: Unequal weight (β > 0).
For details on F Score, see: https://stats.stackexchange.com/questions/221997/why-f-beta-score-define-beta-like-that/221999#221999
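For reference, the standard formula behind these scores, written as a small Python helper: β = 1 weights precision and recall equally, β > 1 emphasizes recall, and β < 1 emphasizes precision.

```python
def f_beta(precision, recall, beta=1.0):
    # Weighted harmonic mean of precision and recall.
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_beta(0.333, 0.600))            # F1 score
print(f_beta(0.333, 0.600, beta=2.0))  # F2 score: recall counts more
```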
Reference: Deep Learning with Python by François Chollet, translated by Zhang Liang.