OpenAI Embeds C2PA Metadata in ChatGPT and API-Generated Images

baoshi.rao

In the early hours of February 7th, OpenAI announced on social media that it now embeds C2PA metadata in images generated by ChatGPT and its API, a move intended to help curb misuse of AI-generated images.

C2PA is an open technical standard that lets publishers, businesses, developers, and others use metadata to trace and verify the origin, authenticity, and integrity of digital content such as images, videos, and documents.

Recently, AI-generated inappropriate images of celebrities such as Taylor Swift spread widely online and caused significant harm. OpenAI hopes this measure will reduce abuse and make it easier for people to identify AI-generated images. For now, only images carry C2PA metadata; text and audio generated through ChatGPT or the API are not affected. OpenAI will also roll the feature out to all users of its mobile app.

In addition to embedding C2PA metadata, OpenAI points users to a verification website, Content Credentials Verify, for checking whether content carries provenance information. Users can drag and drop JPG, MP3, MP4, or PDF files onto the page for a quick check.

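AI image verification website: https://contentcredentials.org/verify

In practice, however, the experience is underwhelming: even an image that was clearly generated by DALL·E 3 could not be recognized or matched. Because the site reads the C2PA metadata embedded in a file rather than analyzing the image itself, the more likely cause is that the uploaded file no longer carried intact metadata (for example, because it had been stripped), rather than a lack of training data.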
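To make the point above concrete, checking for C2PA data means looking for a specific metadata container inside the file, not classifying pixels. The sketch below walks the chunks of a PNG and reports whether one of the type the C2PA specification assigns to PNG manifests is present. It assumes that type is `caBX` (worth confirming against the current spec), and the helper names `make_chunk`, `iter_png_chunks`, and `has_c2pa_chunk` are hypothetical. It only checks presence; validating the cryptographic signature, as the verification site does, is out of scope.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Assumption: the C2PA specification stores a PNG manifest in an ancillary
# chunk of type "caBX". Confirm against the spec version you target.
C2PA_PNG_CHUNK = b"caBX"


def make_chunk(ctype: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk: length, type, payload, then CRC-32 of type+payload."""
    crc = zlib.crc32(ctype + payload)
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


def iter_png_chunks(data: bytes):
    """Yield (chunk_type, payload) for each chunk, checking each CRC."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length
        if end + 4 > len(data):
            raise ValueError("truncated chunk")
        payload = data[pos + 8:end]
        (stored_crc,) = struct.unpack(">I", data[end:end + 4])
        if zlib.crc32(ctype + payload) != stored_crc:
            raise ValueError(f"bad CRC in chunk {ctype!r}")
        yield ctype, payload
        if ctype == b"IEND":
            return
        pos = end + 4


def has_c2pa_chunk(data: bytes) -> bool:
    """True if the PNG contains a chunk of the assumed C2PA type (presence only)."""
    return any(ctype == C2PA_PNG_CHUNK for ctype, _ in iter_png_chunks(data))


# Build a tiny 1x1 grayscale PNG in memory, with and without a placeholder chunk.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)  # width, height, depth, type, ...
idat = zlib.compress(b"\x00\x00")  # filter byte 0 followed by one gray pixel
plain = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
         + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))
tagged = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
          + make_chunk(C2PA_PNG_CHUNK, b"placeholder, not a real manifest")
          + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))

print(has_c2pa_chunk(plain))   # False
print(has_c2pa_chunk(tagged))  # True
```

A file that has lost this chunk simply has nothing for a verifier to read, which fits the failed lookups described above.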

Example: C2PA Metadata Embedded in an AI-Generated Image

When we generate an image in ChatGPT, the image's detailed metadata is displayed, including the system that generated it, the publication time, a description, and the issuer. Images generated through the DALL·E 3 API are displayed accurately as well.

Impact of C2PA Metadata on Image Size

OpenAI states that embedding C2PA metadata increases the size of generated images:

Through the API, PNG: 3.1 MB → 3.2 MB (3% increase)

Through the API, WebP: 287 KB → 302 KB (5% increase)

Through ChatGPT, WebP: 287 KB → 381 KB (32% increase)
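The percentages above are plain relative growth, (after − before) / before. A short check using the quoted sizes (MB for PNG, KB for WebP) reproduces them; `percent_increase` is an illustrative helper, not anything from OpenAI.

```python
# File sizes as quoted above: MB for the PNG case, KB for the WebP cases.
measurements = {
    "API, PNG": (3.1, 3.2),
    "API, WebP": (287, 302),
    "ChatGPT, WebP": (287, 381),
}


def percent_increase(before: float, after: float) -> float:
    """Relative growth of `after` over `before`, in percent."""
    return (after - before) / before * 100


for label, (before, after) in measurements.items():
    print(f"{label}: {percent_increase(before, after):.1f}%")
# API, PNG: 3.2%
# API, WebP: 5.2%
# ChatGPT, WebP: 32.8%
```

Because the manifest is roughly a fixed-size addition, it weighs far more on a small, heavily compressed WebP than on a multi-megabyte PNG.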

OpenAI says that despite the larger files, the effect on image quality and on inference and interaction latency is negligible.

What Is C2PA Metadata

C2PA (the Coalition for Content Provenance and Authenticity) is an industry alliance backed by Adobe, Microsoft, Sony, Intel, Arm, and other industry leaders. Its main goals are to combat online misinformation and to make digital content more trustworthy. C2PA helps users verify content by standardizing metadata that records details about a piece of content from its creation through its distribution.

C2PA metadata, and the standard behind it, primarily cover the following:

Proof of content origin: a trustworthy "birth certificate" for the content, created by technical means, that records how, when, and by whom the content was created.

Editing and processing history: a record of every modification made after creation, including edits, conversions, and other processing steps, to ensure traceability and transparency.

Support for multiple media types: although the initial focus is on images and video, the goal is to support as many digital content types as possible, including audio and documents.

Privacy and security: the standard is designed with personal privacy and data security in mind, so that adding provenance information does not expose sensitive information.
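The origin, history, and integrity ideas listed above can be illustrated with a deliberately simplified model. In the sketch below, `ToyManifest` is hypothetical: it records a generator, a timestamp, a SHA-256 hash of the content, and a list of actions, then signs that record. Real C2PA manifests are packaged in JUMBF containers and signed with certificate-based COSE signatures; the shared-key HMAC here is a swapped-in stand-in used only to show the verification logic. The action labels `c2pa.created` and `c2pa.edited` are borrowed from the C2PA actions vocabulary for flavor.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, field

# Hypothetical shared key standing in for a signer's certificate-backed key.
SIGNING_KEY = b"demo-key"


@dataclass
class ToyManifest:
    """Simplified, hypothetical provenance record in the spirit of C2PA."""
    generator: str                 # origin: what produced the content
    created_at: str                # origin: when it was produced
    content_sha256: str            # integrity: hash binding to the exact bytes
    actions: list = field(default_factory=list)  # editing/processing history
    signature: str = ""

    def _payload(self) -> bytes:
        # Deterministically serialize every field except the signature.
        body = {
            "generator": self.generator,
            "created_at": self.created_at,
            "content_sha256": self.content_sha256,
            "actions": self.actions,
        }
        return json.dumps(body, sort_keys=True).encode()

    def sign(self) -> None:
        self.signature = hmac.new(SIGNING_KEY, self._payload(), hashlib.sha256).hexdigest()

    def verify(self, content: bytes) -> bool:
        """Valid only if the record is unmodified AND the content matches its hash."""
        expected = hmac.new(SIGNING_KEY, self._payload(), hashlib.sha256).hexdigest()
        record_ok = hmac.compare_digest(expected, self.signature)
        content_ok = hashlib.sha256(content).hexdigest() == self.content_sha256
        return record_ok and content_ok


image = b"example image bytes"
manifest = ToyManifest(
    generator="example-image-model",        # placeholder value
    created_at="2000-01-01T00:00:00Z",      # placeholder timestamp
    content_sha256=hashlib.sha256(image).hexdigest(),
    actions=[{"action": "c2pa.created"}],
)
manifest.sign()

print(manifest.verify(image))              # True: untouched content and record
print(manifest.verify(image + b"edit"))    # False: content no longer matches hash
manifest.actions.append({"action": "c2pa.edited"})
print(manifest.verify(image))              # False: history changed after signing
```

In a real workflow an editing tool would append its own signed record rather than mutate an existing one, which is how the history stays traceable.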

However, OpenAI points out that embedding C2PA metadata in AI-generated images does not fully solve the problem of establishing where an image came from, because the metadata is easy to remove. For example, most social media platforms currently strip metadata from uploaded images, and simply taking a screenshot removes it as well. Even so, the measure can raise users' awareness of AI-generated content and provide a degree of protection.
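How easily the metadata disappears can be seen in a minimal sketch. PNG marks a chunk as critical when the first letter of its type is uppercase (IHDR, PLTE, IDAT, IEND), and a pipeline that keeps only critical chunks still produces a valid image. The function `strip_ancillary_chunks` below is a hypothetical, simplified stand-in for what an upload pipeline might do; real platforms usually re-encode the whole image, and a screenshot captures fresh pixels with none of the original file's bytes. As before, the C2PA chunk type `caBX` is an assumption to check against the spec.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk: length, type, payload, then CRC-32 of type+payload."""
    crc = zlib.crc32(ctype + payload)
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


def chunk_types(data: bytes) -> list:
    """List chunk types in file order (assumes a well-formed PNG)."""
    types, pos = [], len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        types.append(data[pos + 4:pos + 8])
        pos += 12 + length
    return types


def strip_ancillary_chunks(data: bytes) -> bytes:
    """Keep only critical chunks (uppercase first letter); drop all others."""
    out, pos = [PNG_SIGNATURE], len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        if ctype[:1].isupper():  # critical chunk: must be kept for a valid image
            out.append(data[pos:pos + 12 + length])
        pos += 12 + length
    return b"".join(out)


# A 1x1 grayscale PNG carrying a placeholder chunk of the assumed C2PA type.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
idat = zlib.compress(b"\x00\x00")
tagged = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
          + make_chunk(b"caBX", b"placeholder, not a real manifest")
          + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))

stripped = strip_ancillary_chunks(tagged)
print(chunk_types(tagged))    # [b'IHDR', b'caBX', b'IDAT', b'IEND']
print(chunk_types(stripped))  # [b'IHDR', b'IDAT', b'IEND']
```

The stripped file still decodes to the same pixel, yet nothing remains for a verifier to read, which is exactly the limitation OpenAI describes.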