Don't Understand the Training Principles of AI Large Models? No Wonder Your Prompts Always 'Fail'

    The effectiveness of prompts is underpinned by a comprehensive understanding of the model. This article, from a product manager's perspective, breaks down the key principles of large model training to help you establish a foundational mindset for prompt design, turning 'failures' into 'successes'.


    An AI large model is like a super-knowledgeable friend. But this friend has a quirk—they need you to be very clear in your instructions to give you the best response.

    If you casually say, "Write something for me," they might produce something completely unrelated to what you want. But if you say, "Help me write a formal job application email for a software engineer position," they can give you a decent reply.

    In reality, it’s not entirely the large model’s fault for being 'dumb,' because the model isn’t actually 'thinking.' Instead, it relies on a 'word-guessing instinct' trained on massive amounts of text, predicting one word after another.

    The prompts you write set the starting point for this 'guessing.' Understanding this underlying logic allows you to craft prompts that help the AI grasp your needs, so you won’t have to sigh at the screen and say, "Why doesn’t it understand me again?"

    For example, if you give it the beginning of a sentence: "I like," it will predict the next most likely word based on its training. It might be "code," "models," or something else. It calculates the probability of each word appearing and selects the most suitable one. Below is a schematic diagram I created.

    [Schematic diagram: the model's probability for each candidate next word]

    This process of predicting the next word is called 'autoregressive generation.' Large models use an autoregressive mechanism to generate sentences word by word.

    At each time step t, the decoder uses all previously generated words to predict the t-th word. At the next time step t+1, it appends that newly predicted word to the context and uses the extended context to predict the (t+1)-th word. It's like writing an essay: you write the first word, then the second based on what came before, and so on.
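
    For readers who think in code, here is a minimal sketch of that loop in Python. The vocabulary, the probabilities, and the pick_next function are all invented for illustration; a real model scores tens of thousands of candidate tokens with a neural network rather than a lookup table.

    import random

    # Toy "model": maps a context string to a probability distribution
    # over candidate next words. Numbers are purely illustrative.
    TOY_MODEL = {
        "I like": {"code": 0.4, "models": 0.3, "coffee": 0.3},
        "I like code": {"that": 0.5, "reviews": 0.5},
    }

    def pick_next(context):
        """One autoregressive step: sample the next word from the
        toy distribution for this context."""
        dist = TOY_MODEL.get(context)
        if dist is None:
            return None  # toy model has no continuation for this context
        words = list(dist)
        weights = [dist[w] for w in words]
        return random.choices(words, weights=weights, k=1)[0]

    # Autoregressive generation: each new word is appended to the
    # context and fed back in to predict the following word.
    context = "I like"
    for _ in range(2):
        word = pick_next(context)
        if word is None:
            break
        context += " " + word
    print(context)  # e.g. "I like code reviews"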

    So, how does the large model rely on this autoregressive mechanism to become so knowledgeable?

    The capabilities of large language models are enhanced through two training phases: pretraining and fine-tuning.

    Phase 1: Pretraining

    Pretraining forms the foundation of a large model’s capabilities. The input consists of large-scale unannotated text fragments, such as books, web pages, and research papers, and the output is the prediction of the "next word."

    For example, the model’s input might be a continuous text segment: "Artificial intelligence refers to enabling computers to simulate human intelligence." The training goal is for the model to predict the most likely word to follow "intelligence" (e.g., "technology"). Through training on vast amounts of data, the model learns:

    • Human language rules like grammar and sentence structure: e.g., "的" (a possessive particle in Chinese) is usually followed by a noun;
    • Basic knowledge of concepts and logic: e.g., "artificial intelligence" is often associated with "machine learning" and "deep learning";
    • Task instructions: e.g., when seeing "Summary:" the subsequent text tends to concisely summarize the preceding content; when seeing "Question:" it should be followed by an answer.

    Through this learning, the model masters language patterns, relationships between concepts, and how to express ideas in different contexts.
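
    In code, the pretraining objective can be sketched like this: at every position in a text fragment, ask the model for a distribution over the next word, then penalize it in proportion to how little probability it assigned to the word that actually followed (the cross-entropy loss). The toy_predict function and its numbers below are invented stand-ins; real training runs this over billions of positions and updates the model by gradient descent.

    import math

    # One training fragment, split into words (real models use subword tokens).
    text = ["Artificial", "intelligence", "refers", "to",
            "simulating", "human", "intelligence"]

    def toy_predict(context):
        """Stand-in for the model: a fixed, invented distribution over
        candidate next words, ignoring the context entirely."""
        return {"intelligence": 0.2, "refers": 0.15, "to": 0.15,
                "simulating": 0.15, "human": 0.15, "technology": 0.2}

    # Next-word prediction loss: average negative log-probability the
    # model gave to the word that actually came next.
    loss = 0.0
    for i in range(len(text) - 1):
        dist = toy_predict(text[: i + 1])
        p_true = dist.get(text[i + 1], 1e-9)  # probability of the real next word
        loss -= math.log(p_true)
    loss /= len(text) - 1
    print(f"average next-word loss: {loss:.2f}")  # lower = better predictions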

    Phase 2: Fine-Tuning

    This is like training a person in a specific skill. For example, fine-tuning a large model to specialize in literature or fiction improves its performance in specific writing tasks.

    To make the model better suited for specialized tasks like professional Q&A or content creation, we can fine-tune it. The common input here consists of potential user questions and expert-standard answers. For example, in an e-commerce customer service scenario:

    [
      {
        "question": "What should I do if the received product is damaged?",
        "answer": "We sincerely apologize for the inconvenience! Please take clear photos of the damaged area and the shipping label, then contact our online客服 to upload the images. We will arrange a replacement or refund for you, with shipping costs covered by us."
      },
      {
        "question": "Can I exchange or return clothes that don’t fit?",
        "answer": "Hello, we support 7-day no-reason returns! Please ensure the product tags are intact and the item hasn’t been worn or washed. You can apply for a return or exchange on the order page, and the system will guide you through the process. The return shipping fee will be automatically refunded after the product is inspected."
      },
      {
        "question": "How do I use a coupon?",
        "answer": "During checkout, click the 'Coupon' option on the payment page and select the coupon you’d like to use to deduct the corresponding amount. Note that each coupon has usage conditions and an expiration date."
      }
    ]
    

    Such data helps the large model learn professional phrasing, problem-solving processes, and communication skills for customer service scenarios. After fine-tuning, the model can handle user inquiries more naturally, reducing the workload for human customer service agents.
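
    To connect this back to pretraining: one common convention (the 'Question:'/'Answer:' template below is a typical choice, not a fixed standard) is to flatten each Q&A pair into a single text sample and train the model to predict the answer word by word, using the same next-word objective as before.

    # Fine-tuning pairs, as in the list above (abridged to one entry).
    pairs = [
        {"question": "How do I use a coupon?",
         "answer": "During checkout, click the 'Coupon' option on the payment page..."},
    ]

    # Each pair is flattened into one text sample; the model is then
    # trained to continue "Answer:" with the expert answer, word by word.
    for pair in pairs:
        sample = f"Question: {pair['question']}\nAnswer: {pair['answer']}"
        print(sample)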

    Now you understand that large models essentially guess what to write next based on preceding text. But here’s the problem: it has learned too much! It can write poetry, code, answer questions, create stories… Without your guidance, it has no idea which type of response you want. It’s like walking into a massive library without an index or guide—you’d struggle to find the book you need.

    Prompts act as your 'library index,' clearly and specifically describing user needs to guide the large model in activating the right knowledge and generation patterns. The power of prompts manifests in three aspects:

    1. Defining Task Objectives and Selecting Knowledge Sources

    Prompts clarify the task type—such as writing a poem or translating a sentence—so the model knows which human knowledge to draw on. For example, given the input "moon," the model might produce:

    • Scientific knowledge: "The moon is Earth’s satellite…"
    • Poetry: "The moon is like a curved sickle…"
    • Story: "The moon is home to Chang’e…"

    But if you say, "Explain the formation process of the moon from a scientific perspective," the AI knows to draw on astrophysics knowledge rather than literary creativity.

    2. Providing Constraints to Narrow the Generation Scope

    Prompts can further constrain the model’s output by specifying details like format, length, or style. For example:

    • Input: "Write a short essay on environmental protection, divided into 3 paragraphs, each under 50 words"—clarifies structure and length.
    • Input: "Explain quantum mechanics in a humorous tone"—clarifies style.

    3. Aligning with the Model’s Prior Knowledge to Reduce Errors

    Large models are highly sensitive to the training data from pretraining and fine-tuning. Good prompts mimic the text structures the model was trained on, making it easier for the model to retrieve knowledge and organize language, thereby improving response quality.

    For instance, during pretraining, large models encounter many list-style summaries like "1. … 2. … 3. …" Thus, a prompt like "Summarize the following content in 3 points: XXX" is more effective than "Casually summarize XXX," because the model is more familiar with the former’s structure and produces more organized output.
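
    One practical takeaway is to wrap this structure in a reusable template, so every request reaches the model in a shape it has seen many times. The wording below is just one reasonable choice, not an official format.

    def summary_prompt(content, points=3):
        """Build a prompt that mirrors the list-style summaries the
        model encountered frequently during pretraining."""
        return (f"Summarize the following content in {points} points, "
                f"numbered '1. ... 2. ... 3. ...':\n\n{content}")

    print(summary_prompt("Large models are trained in two phases: ..."))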

    Understanding how large models work helps you see why prompts are so crucial. They aren’t some mysterious technology but a method for effective communication. Remember these key points:

    • Large models generate responses based on the prompts you provide.
    • Clear, specific instructions help the model better understand your needs.
    • Good prompts activate the model’s relevant knowledge and capabilities.
    • With practice and adjustments, you’ll get better at conversing with large models.

    In this AI era, writing good prompts is as essential as using a search engine. It not only helps you complete tasks more efficiently but also unlocks the full potential of AI as a powerful tool.
