Identical! Stanford AI Team Accused of Plagiarizing Chinese Domestic Large Model, Then Deleting the Repository and Disappearing
Recently, a Stanford AI team was accused of plagiarizing the work of a Chinese domestic large model: the model structure and code were almost identical.
Stanford's Llama3-V project, released on May 29, claimed that for just $500, it could train a multimodal large model surpassing the performance of GPT-4V, Gemini Ultra, Claude Opus, and others.
However, a netizen discovered that the model structure and code of Llama3-V were nearly identical to those of MiniCPM-Llama3-V 2.5, developed by the Tsinghua-affiliated star startup FaceWall Intelligence, with only variable names changed.
Faced with the plagiarism accusations, the Stanford team simply deleted the repository. The project pages on GitHub and HuggingFace currently return 404 errors and are inaccessible.
FaceWall Intelligence's MiniCPM-Llama3-V 2.5 has distinctive capabilities, such as recognizing the text of the Tsinghua Bamboo Slips, a rare form of ancient Chinese writing on bamboo.
Llama3-V exhibits strikingly similar behavior to MiniCPM-Llama3-V 2.5 even on unreleased experimental features, which had been trained on MiniCPM-Llama3-V 2.5's internal, non-public data.
Facing the plagiarism allegations, the Stanford team initially claimed that their work predated FaceWall Intelligence's MiniCPM and that they had only used its tokenizer. However, they later deleted that statement on Medium, and their most recent response was retracted as well.
Li Dahai, CEO of FaceWall Intelligence, issued an official response to the incident, demanding a formal explanation from the Llama3-V authors. The matter continues to unfold.