Chinese Prodigy Girl Establishes AI Company in Silicon Valley, Valuation Exceeds 1 Billion RMB in Six Months

baoshi.rao

The timing was serendipitous. At the end of November last year, ChatGPT burst onto the scene, igniting a global frenzy for artificial intelligence and amassing over 100 million active users at unprecedented speed. A year later, almost to the day, another 'AI star' emerged in Silicon Valley, skyrocketing to fame overnight.

On November 29, AI video generation company Pika Labs announced its latest video generation model, Pika 1.0, and launched a new website. Pika 1.0 can generate and edit 3D animations, anime, cartoons, and films. With almost no barriers, users only need to input a sentence to generate videos in various desired styles. Additionally, users can utilize Pika for editing needs such as canvas expansion, local modifications, and video duration extension.

In the promotional video, Pika 1.0 demonstrated impressive semantic understanding capabilities. For example, when given the keywords 'Elon Musk wearing a spacesuit, 3D animation,' a cartoon version of Musk in a spacesuit appeared, complete with a SpaceX rocket in the background. The clarity and coherence of the text-to-video output far surpass other AI video generation products currently on the market.

The founding team of Pika Labs is equally legendary, and currently consists of only four people. Co-founder Guo Wenjing, a 'post-95s' girl from Hangzhou, was once hailed as a 'prodigy.' Guo Wenjing and another co-founder, Meng Chenlin, were both Chinese female Ph.D. candidates at Stanford AI Lab. In April this year, the two dropped out of Stanford to start their entrepreneurial journey.

Guo Wenjing's father is Guo Huaqiang, the actual controller of the A-share listed company Sunyard (Xinyada). After Pika's rise to fame, Sunyard's shares hit the daily limit-up on two consecutive trading days, November 30 and December 1, a rally humorously dubbed 'father benefiting from daughter' and a 'daughter concept stock.'

Subsequently, Sunyard issued an announcement acknowledging the father-daughter relationship between Guo Wenjing and Guo Huaqiang, but stated, "Apart from this relationship, the company has no other connections with Pika. As of now, Guo Wenjing does not hold any position in the company, Sunyard has not invested in Pika, nor has it engaged in any business dealings with Pika."

Currently, the official web version of Pika 1.0 has a waitlist, and no user has yet published a hands-on review. This has led some to question whether Pika's overnight fame is merely a marketing stunt. After all, before November, Pika was virtually unknown.

In reality, the first version of Pika underwent public testing on Discord in late April this year. In July, it officially launched its Discord server and gained 500,000 users within a few months. By building on Discord's platform, the lean team kept its development effort to a minimum.

Source: Screenshot from Pika's official website

Initially, Pika only supported text-to-video generation but gradually added features like image-to-video conversion, camera control, and embedding text and logos into videos. Many functions showcased in Pika 1.0's promotional video are not yet available on the Discord version and will only be verifiable once the web version opens for testing.

This isn't Pika's first public appearance. In early November, at the press conference for The Wandering Earth 3, the film's industrialization lab G!Lab announced its establishment. Director Guo Fan introduced several strategic tech partners, including SenseTime, Xiaomi, Huawei, and notably Pika Labs.

Within just six months of its founding, Pika has completed three funding rounds totaling $55 million, reaching a valuation exceeding 1 billion RMB. Its investor lineup is stellar, featuring OpenAI board member Adam D'Angelo, former Tesla AI director Andrej Karpathy, ex-GitHub CEO Nat Friedman, YC partner Daniel Gross, and prominent Silicon Valley investor Elad Gil.

01 All-Star Team

Pika's founding team is composed of prodigies.

Guo Wenjing gained fame in 2015 as the first student from Zhejiang to win early admission to Harvard as an undergraduate, and was spotlighted by CCTV as a "genius girl." During high school, she won first prize in the National Olympiad in Informatics (Zhejiang division), followed by two math Olympiad championships. Later, invited by MIT, she placed second in a North American programming contest, outperforming teams from Harvard, Stanford, and Carnegie Mellon.

After entering Harvard, Guo Wenjing balanced her studies with internships at Meta AI Research, Microsoft, Google Brain, and Epic Games. By her sophomore year, she became the youngest full-time employee at Meta AI Research and won several international software development awards. After earning her bachelor's degree in mathematics and a master's in computer science, Guo pursued a Ph.D. at Stanford University.

Co-founder Meng Chenlin was Guo Wenjing's classmate at Stanford. Over the past three years, Meng has published multiple research papers, including Denoising Diffusion Implicit Models (DDIM), which has become a standard sampling method for diffusion-based content generation and is widely used by OpenAI's DALL·E 2, Google's Imagen, and Stability AI's Stable Diffusion.

Another founder, Chen Siyu, was revealed to be Guo Wenjing's classmate at Hangzhou No. 2 High School. Chen was a member of both the informatics and physics national training teams, was later admitted to Peking University, and was one of the first members of the Turing Class. The fourth employee, Matan Cohen-Grumi, has extensive experience in the creative field.

In an interview, Guo Wenjing stated that Pika would continue its lightweight development approach. With rapid user growth, the team plans to expand to 20 members by 2024.

The idea for Pika originated from a competition entry that went unrecognized. In 2022, Guo and several Ph.D. classmates decided to use generative AI to create a movie during winter break for Runway's first "AI Film Festival" competition. Confident about winning, they prepared diligently but ultimately failed to qualify.

During the preparation, Guo found existing AI video tools cumbersome. She spent hours using Runway and Adobe Photoshop with minimal results. This sparked her entrepreneurial idea—why not create a more user-friendly AI video generator for the average person?

Once the idea struck, Guo acted on it immediately. In April this year, she and Meng Chenlin dropped out of Stanford together to develop Pika.

Before venturing into video generation, Guo considered entering the gaming industry due to its easier commercialization. During her Ph.D. at Stanford, she interned at Epic Games to understand the industry's pain points. However, she later found the gaming sector too competitive, while AI video generation was still a blue ocean with more opportunities. In her view, the current entrepreneurial direction "can last at least ten years."

02 The "GPT Moment" for Video Generation

In the AI boom sparked by ChatGPT this year, chatbots based on large language models became the hottest entrepreneurial direction. Among content-generating AI applications, image generation is the primary scenario, followed by writing tools and video generation tools.

Compared to language models, AI-generated video is a completely different type of model. It shares similarities with AI-generated image models but is more challenging.

Guo Wenjing mentioned in media interviews that videos present many unique issues compared to images, such as ensuring smooth, natural motion. Videos are larger than images and require more GPU memory. Video generation also raises a design question: whether to generate frame by frame or all at once. Many current models generate videos all at once, resulting in very short clips. But if generating frame by frame, how should it be done? These are new technical challenges not faced in image generation.
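The trade-off between generating all at once and generating piece by piece can be made concrete with a toy sketch. Everything here is hypothetical: `generate_clip` is a stand-in for a video model that can only produce a short clip per call, and `generate_long_video` shows one possible way to build a longer video by conditioning each new clip on the last frame of the previous one. It is not a description of how Pika or any other product works.

```python
def generate_clip(prompt, num_frames, condition=None):
    """Hypothetical stand-in for a short-clip video model.

    Frames are (prompt, index) tuples so the sketch stays runnable;
    a real model would return image tensors. If `condition` (the last
    frame generated so far) is given, the new clip continues from it.
    """
    start = 0 if condition is None else condition[1] + 1
    return [(prompt, start + i) for i in range(num_frames)]


def generate_long_video(prompt, total_frames, chunk_size):
    """Build a long video from short clips, each conditioned on the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    frames = []
    while len(frames) < total_frames:
        remaining = total_frames - len(frames)
        condition = frames[-1] if frames else None
        frames.extend(generate_clip(prompt, min(chunk_size, remaining), condition))
    return frames


# All at once: a single call, limited to a short clip.
short_clip = generate_clip("a cat on a skateboard", 4)

# Piece by piece: several calls chained together.
long_video = generate_long_video("a cat on a skateboard", 10, chunk_size=4)
print(len(short_clip), len(long_video))
```

In a real system, errors in one clip can carry over into the next, which is part of why the piece-by-piece route is harder than this control flow suggests.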

Meng Chenlin added that each frame of a video is an image, making it much more challenging than generating a single image. Not only does each frame need to be of high quality, but there must also be coherence between adjacent frames. Ensuring consistency across all frames in a long video is a highly complex problem.
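One crude way to see what coherence between adjacent frames means is to measure how much pixels change from one frame to the next. The sketch below is only an illustration under simple assumptions: frames are flat lists of pixel values, and the function name `mean_adjacent_frame_difference` is made up for this example. A low score is not proof of good video (a frozen image scores zero), but a high score on a scene that should move smoothly suggests flicker.

```python
def mean_adjacent_frame_difference(frames):
    """Average absolute pixel change between consecutive frames.

    `frames` is a list of frames; each frame is a flat list of pixel values.
    Returns 0.0 when there are fewer than two frames or no pixels to compare.
    """
    total = 0.0
    count = 0
    for prev, curr in zip(frames, frames[1:]):
        for a, b in zip(prev, curr):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


# Two tiny made-up "videos" with two pixels per frame.
smooth = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]]   # brightness rises gradually
flicker = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]  # brightness jumps back and forth

smooth_score = mean_adjacent_frame_difference(smooth)
flicker_score = mean_adjacent_frame_difference(flicker)
print(smooth_score < flicker_score)
```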

During training, processing video data involves handling multiple images, and the model must adapt to this. For example, transferring 100 frames to a GPU poses a significant challenge. During inference, generating a large number of frames results in slower speeds compared to single-image generation and increases computational costs.
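A rough calculation shows why moving 100 frames to a GPU is heavier than moving one image. The resolution (512x512), color channels (3), and storage format (4 bytes per value) below are assumptions chosen for illustration, not figures from Pika.

```python
def video_tensor_bytes(frames, height, width, channels=3, bytes_per_value=4):
    """Raw size in bytes of an uncompressed clip stored as one dense tensor."""
    return frames * height * width * channels * bytes_per_value


# Assumed sizes, for illustration only: 512x512 RGB with 4 bytes per value.
one_image = video_tensor_bytes(1, 512, 512)
clip_100 = video_tensor_bytes(100, 512, 512)

print(f"single image:   {one_image / 2**20:.1f} MiB")
print(f"100-frame clip: {clip_100 / 2**20:.1f} MiB")
```

Under these assumptions a single image takes 3.0 MiB and the 100-frame clip takes 300.0 MiB, and that is only the input. The intermediate values a model keeps in memory during training typically add substantially to this.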

Moreover, controlling video generation is more difficult because the model must decide what happens in each frame without requiring users to describe every frame in detail. Video generation also involves more dimensions than image generation, since time is added to the spatial ones, further complicating the problem. Additionally, high-quality training datasets for video generation are scarce on the internet, which adds to the difficulty.

Source: Screenshot from Pika's official website

Currently, startups in the AI video generation field are accelerating. In November alone, besides the release of Pika 1.0, several companies launched new video generation tools.

On November 16, social media giant Meta released Emu Video, a tool capable of generating video clips based on text and image inputs. Almost simultaneously, ByteDance introduced the PixelDance model, which can generate videos containing complex scenes and actions from a text description combined with a first-frame guidance image and a last-frame guidance image.

On November 21, Runway launched Motion Brush, which stands out by allowing users to turn any static image into a dynamic video with just a brush stroke. One of the earliest pioneers in video generation, Runway released Gen-1 and Gen-2 over the past year.

On November 24, AI startup Stability AI released its latest model, Stable Video Diffusion, which can generate videos from existing images. This model is an extension of the company's previously released Stable Diffusion text-to-image model.

Additionally, Adobe has made moves in the text-to-video space by acquiring AI video generation startup Rephrase.ai, which specializes in converting text into virtual avatar videos. Ashley Still, Adobe's Senior Vice President and General Manager, stated that Rephrase.ai's expertise in generative AI audio-visual technology and text-to-video tools will expand Adobe's generative video capabilities.

With Pika's successful application in text-to-video technology, the industry generally believes that listed companies in gaming and media sectors may be the first to benefit. Notably, several gaming and media companies have already successfully applied AIGC technology to video or game material creation.

Clearly, competition in the video domain is intensifying and may become the primary battleground for AI in the next phase. According to Meng Chenlin, the competition in the video field could resemble the landscape of language model competition—when a company releases a new model, they likely already have a more advanced version internally, leading other companies by one to two years. In the future, the video field will likely see one company leading by one to two years and charging ahead while others follow closely behind.

Pika's sudden rise to fame might signify that the 'GPT moment' for AI video generation is just around the corner.