How Are the Popular AI Digital Humans Actually Created?

Posted by baoshi.rao in AI Insights
    While the concept of the metaverse has somewhat faded, digital human technology related to the metaverse has found a significant place in short videos and live streaming, potentially even disrupting the current live-streaming e-commerce industry.

    In 2021, the virtual beauty influencer Liu Yexi, created by Chuangyi Technology, released her first video on Douyin (TikTok). This single video went viral, garnering over 3 million likes and gaining millions of followers, earning her the title of the 'phenomenon-level' virtual influencer of 2021. Since then, an increasing number of virtual idols or digital humans have appeared on short video and live-streaming platforms like Douyin, not only attracting massive traffic and followers but also enabling 24/7 live-streaming sales, replacing human hosts.

    Digital humans, also known as virtual humans, draw on a wide range of underlying technologies, including 3D modeling, facial expression recognition, motion capture, real-time animation, real-time rendering engines, deep learning, and speech recognition. Different types of digital humans, whether 3D cartoon, anime-style, or hyper-realistic, and whether half-body or full-body, require different combinations of these technologies.

    The visual creation of digital humans primarily revolves around three key stages: character modeling, motion driving, and visual rendering.

    The first step in creating a digital human involves preliminary character design and modeling. 2D digital humans require concept art and design, while 3D digital humans rely on 3D modeling techniques to generate digital avatars, which can be based on existing IPs or real-life individuals.

    3D modeling currently falls mainly into static scanning and dynamic modeling, with camera-array scanning for static reconstruction being the mainstream technique. Dynamic light field reconstruction can not only rebuild a character's geometric model but also capture dynamic model data in a single pass, reproducing the lighting on the human body from different viewing angles. It is a key direction being pursued by major companies and research institutions worldwide.

    [Image: DSLR camera array human portrait scanner]
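    To make the static-scan pipeline concrete, here is a minimal sketch of turning fused scan data into a character mesh with the open-source Open3D library. The library choice and the file names are illustrative assumptions, not a particular scanner's workflow.

```python
# Minimal static-reconstruction sketch using Open3D (an assumption; any
# photogrammetry/scanning toolchain could fill this role).
import open3d as o3d

# Load the point cloud fused from the multi-camera scan ("scan.ply" is a
# hypothetical file name).
pcd = o3d.io.read_point_cloud("scan.ply")

# Poisson surface reconstruction needs per-point normals.
pcd.estimate_normals()

# Reconstruct a watertight triangle mesh from the scanned points.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9
)

# Save the character mesh for downstream rigging and animation.
o3d.io.write_triangle_mesh("character_mesh.ply", mesh)
```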

    After obtaining the image and model, the next step is to animate the model, which requires intelligent synthesis and motion capture technologies. Currently, intelligent synthesis technology mainly achieves lip-syncing for upper-body digital human avatars. For full-body digital humans, motion capture is essential. Motion capture technologies are divided into optical, inertial, and computer vision-based systems, with optical and inertial systems currently dominating the field.

    [Image: Noitom's MEMS inertial sensor-based motion capture system]

    Ultimately, digital humans are presented in video form, which involves visual rendering. Rendering is the process of computing the visual image of a model under given viewpoints, lighting, and motion trajectories. The two main approaches are offline rendering and real-time rendering. Offline rendering delivers superior visual quality but offers no real-time control; it typically relies on tools such as Maya and 3ds Max and is used primarily in film production. Real-time rendering allows a digital human's movements to be controlled live, with Unreal and Unity being the most commonly used engines.

    The previous sections covered the visual side of digital humans, but that alone is not enough: digital humans also need to speak. The most straightforward method is post-production dubbing, but it is time-consuming and labor-intensive. With text-to-speech (TTS), lip-syncing, and speech recognition technologies, dubbing can be automated and the digital human's lip movements synchronized with the audio, making the video nearly indistinguishable from one hosted by a real person.

    TTS and speech recognition technologies are already very mature and widely used in daily life, with such features available on smartphones. Lip-syncing technology is relatively niche but highly suitable for digital human applications.
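    As a concrete illustration, here is a minimal TTS sketch using the open-source pyttsx3 library; the library choice and the spoken line are assumptions, since any mature TTS engine would serve.

```python
# Minimal offline TTS sketch with pyttsx3 (an illustrative choice).
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 160)  # speaking speed in words per minute

# Render one line of the live-stream script to an audio file that can
# later drive the avatar's lip sync.
engine.save_to_file("Welcome to the live stream!", "line01.wav")
engine.runAndWait()
```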

    In the field of digital humans, lip synchronization (Lip Sync) is a crucial aspect that directly affects the realism of digital characters. Existing digital human lip sync technologies include Wav2Lip, DeepFake, PaddleGAN, Audio2Face, FaceSwap, LSTM, Audio2Lip, Lip Generation, and Talking Head Synthesis.

    [Image: Audio2Face combines audio signals with digital human facial motion information]
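    As one example of these tools in practice, the open-source Wav2Lip project (https://github.com/Rudrabha/Wav2Lip) exposes an inference script that syncs a face video to an audio track; the file paths below are illustrative assumptions, and the command must be run from inside the cloned repository.

```python
# Sketch of invoking Wav2Lip's documented inference script to lip-sync an
# avatar video to a speech track. Paths and checkpoint name are assumptions.
import subprocess

subprocess.run(
    [
        "python", "inference.py",
        "--checkpoint_path", "checkpoints/wav2lip_gan.pth",
        "--face", "avatar_face.mp4",  # video (or image) of the avatar's face
        "--audio", "line01.wav",      # speech produced by the TTS step above
    ],
    check=True,
)
# Wav2Lip writes the lip-synced clip to results/result_voice.mp4 by default.
```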

    Silicon Valley company TwinSync has proposed a training-free zcm model that removes the need for tedious model training: simply uploading a source video and an audio file yields high-quality lip-sync results. Platforms such as D-ID now offer similar functionality, automatically lip-syncing an avatar to an uploaded audio file.

    Beyond speaking, digital humans must also understand and respond to others. The advent of ChatGPT has greatly simplified this. A program can pull live-stream comments in real time, obtain responses via ChatGPT's API, convert them to speech through TTS, and drive the virtual human's lip sync to deliver the answer on stream.

    [Image: AI-powered digital human live streaming]
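    A minimal sketch of that comment-to-speech loop, using the official OpenAI Python SDK; the model and voice names are examples, comment ingestion is platform-specific and omitted, and the saved audio would feed the lip-sync step above.

```python
# Comment -> chat model -> TTS sketch using the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_comment(comment: str) -> None:
    # Ask the chat model for a short on-stream reply to a viewer comment.
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[
            {"role": "system", "content": "You are a friendly live-stream host."},
            {"role": "user", "content": comment},
        ],
    ).choices[0].message.content

    # Convert the reply to speech; this audio then drives the avatar's lips.
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
    with open("reply.mp3", "wb") as f:
        f.write(speech.read())

answer_comment("Does this jacket come in a larger size?")
```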

    Applying AI technology can significantly reduce live streaming operational costs. For instance, while live streaming scripts were previously manually written, they are now almost entirely generated by AI models. Inputting the required content into a large model instantly produces a script.
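    For instance, here is a sketch of script generation: turn product details into a prompt and let a large model draft the script. The prompt wording, product details, and model name are illustrative assumptions.

```python
# Live-stream script generation sketch using the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()

def generate_script(product: str, selling_points: list[str]) -> str:
    # Assemble a one-line brief for the model from the product details.
    prompt = (
        f"Write a 60-second live-stream sales script for {product}. "
        f"Highlight: {', '.join(selling_points)}. Keep the tone upbeat."
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content

print(generate_script("wireless earbuds", ["battery life", "noise cancelling"]))
```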

    Previously, digital human live streams required a human moderator to monitor the stream and answer questions. Now, responses pre-configured in the backend let the digital host answer audience-triggered questions automatically, as sketched below.
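    A minimal sketch of that pre-configured Q&A layer; the keyword table and replies are illustrative assumptions, with unmatched comments falling back to the LLM pipeline above.

```python
# Keyword-matched preset replies for a digital human host.
PRESET_REPLIES = {
    "shipping": "Orders ship within 24 hours of purchase.",
    "price": "Today's live-stream price is shown on the pinned card.",
    "return": "Returns are free within 7 days, no questions asked.",
}

def preset_answer(comment: str) -> str | None:
    text = comment.lower()
    for keyword, reply in PRESET_REPLIES.items():
        if keyword in text:
            return reply
    return None  # no preset hit; escalate to the chat-model pipeline

print(preset_answer("How much is shipping?"))  # -> preset shipping reply
```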

    Current digital human platforms include Tencent Zhiying, WarpEngine, HeyGen, and D-ID.

    HeyGen

    The digital human effects on the HeyGen platform are excellent, and it offers features such as digital human editing, text editing, and audio/video editing.

    [Image: HeyGen editing features]

    With the continuous improvement of digital human platform technology, the effects of digital humans will become increasingly realistic. Combined with AI technology, more and more virtual digital humans will be applied in fields such as e-commerce, education, personal or corporate promotion, healthcare, and customer service.
