Ghostbuster: A Highly Accurate AI-Generated Text Detection Tool

baoshi.rao

Large language models like ChatGPT write so well that they have become a source of concern. Students are increasingly using these models to complete assignments, which has led some schools to ban ChatGPT. These models also tend to generate text with factual errors, so cautious readers may want to know whether generative AI tools were used to produce a news article or other source before trusting it.

To address this issue, researchers have proposed Ghostbuster, an AI-generated text detection method. It computes the probability of each token in a document under several weaker language models, then combines functions of those probabilities into features for a final classifier. Ghostbuster needs to know neither which model generated a document nor the probability that model assigned to it. This makes it particularly suitable for detecting text from unknown or black-box models, such as the popular commercial models ChatGPT and Claude, whose token probabilities are unavailable. To make sure Ghostbuster generalizes well, the researchers evaluated it across different domains (using newly collected datasets of essays, news, and stories), language models, and prompts.

Why choose this method?

Current Challenges in AI-Generated Text Detection Systems

Many AI-generated text detection systems are brittle when the type of text changes, for example across writing styles, text generation models, or prompts. Simple models that rely on perplexity alone often fail to capture more complex features and perform particularly poorly in new writing domains. In contrast, classifiers based on large language models (like RoBERTa) capture complex features easily but are prone to overfitting the training data and generalize poorly. Ghostbuster strikes a balance between the two: it can capture complex features while being less susceptible to overfitting.
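For readers unfamiliar with the perplexity signal mentioned above, here is a minimal sketch of how it is computed from per-token probabilities. The function name and the example probabilities are illustrative assumptions, not values from the Ghostbuster paper.

```python
import math

def perplexity(token_probs):
    """Exponential of the average negative log-probability of the tokens."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical per-token probabilities assigned by some language model.
predictable = perplexity([0.5, 0.5, 0.5, 0.5])    # every token is likely
surprising = perplexity([0.5, 0.01, 0.5, 0.01])   # some tokens are very unlikely

print(predictable, surprising)
```

A detector built on this single number has only one feature to work with, which is consistent with the limitation described above.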

How It Works

Ghostbuster employs a three-stage training process: probability calculation, feature selection, and classifier training.

Calculating probabilities: Each document is converted into a series of vectors by computing the probability of each token in the document under a series of weaker language models (a unigram model, a trigram model, and two non-instruction-tuned GPT-3 models, Ada and DaVinci).
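As a rough illustration of this first stage, the sketch below builds the simplest of the weak models, a unigram model, and turns a document into a vector of per-token probabilities. The whitespace tokenizer, add-one smoothing, toy corpus, and function names are all assumptions made for the example; they are not taken from the Ghostbuster code.

```python
from collections import Counter

def train_unigram(corpus_tokens):
    """Return a function giving an add-one-smoothed unigram probability."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot shared by unseen tokens

    def prob(token):
        return (counts[token] + 1) / (total + vocab_size)

    return prob

def token_probabilities(document, prob):
    """Map a document to one probability per token (naive whitespace split)."""
    return [prob(tok) for tok in document.lower().split()]

# Toy training corpus, purely illustrative.
corpus = "the cat sat on the mat the dog sat on the log".split()
unigram = train_unigram(corpus)

probs = token_probabilities("The cat sat on the sofa", unigram)
print(probs)
```

Repeating this with each weak model yields several vectors of equal length per document, which is the input the next stage works with.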

Feature selection: A structured search procedure selects the features. It works by (1) defining a set of vector and scalar operations that combine the probabilities, and (2) using forward feature selection to search for useful combinations of these operations, iteratively adding the best remaining feature.
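The search can be sketched as follows. This is a toy version under stated assumptions: the particular vector and scalar operations, the nearest-centroid scorer used to rank features, the toy data, and all function names are hypothetical choices for illustration, not the operations or scoring used by the researchers.

```python
from itertools import product
from statistics import mean, pvariance

# Vector operations combine two per-token probability vectors element-wise.
VECTOR_OPS = {
    "weak": lambda a, b: a,
    "strong": lambda a, b: b,
    "weak-strong": lambda a, b: [x - y for x, y in zip(a, b)],
    "weak/strong": lambda a, b: [x / y for x, y in zip(a, b)],
}
# Scalar operations reduce a vector to a single number.
SCALAR_OPS = {"mean": mean, "max": max, "min": min, "var": pvariance}

def featurize(weak, strong):
    """Map one document's two probability vectors to named scalar features."""
    return {
        f"{s_name}({v_name})": s_op(v_op(weak, strong))
        for (v_name, v_op), (s_name, s_op)
        in product(VECTOR_OPS.items(), SCALAR_OPS.items())
    }

def centroid_accuracy(rows, labels, names):
    """Training accuracy of a nearest-centroid classifier using only `names`."""
    if not names:
        return 0.0
    centroids = {}
    for label in set(labels):
        members = [r for r, y in zip(rows, labels) if y == label]
        centroids[label] = [mean(r[n] for r in members) for n in names]
    correct = 0
    for row, label in zip(rows, labels):
        point = [row[n] for n in names]
        guess = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += guess == label
    return correct / len(rows)

def forward_select(rows, labels, max_features=3):
    """Greedily add the feature that most improves the score; stop when none helps."""
    candidates = sorted(rows[0])
    selected, best = [], 0.0
    while len(selected) < max_features:
        scored = [
            (centroid_accuracy(rows, labels, selected + [c]), c)
            for c in candidates if c not in selected
        ]
        if not scored:
            break
        top_score, top_name = max(scored)
        if top_score <= best:
            break
        selected.append(top_name)
        best = top_score
    return selected, best

# Toy data: (weak-model probs, strong-model probs, label), 1 = AI-generated.
docs = [
    ([0.20, 0.30, 0.25], [0.60, 0.70, 0.65], 1),
    ([0.25, 0.20, 0.30], [0.70, 0.60, 0.75], 1),
    ([0.20, 0.30, 0.25], [0.20, 0.10, 0.30], 0),
    ([0.30, 0.20, 0.25], [0.15, 0.30, 0.20], 0),
]
rows = [featurize(w, s) for w, s, _ in docs]
labels = [y for _, _, y in docs]

selected, best = forward_select(rows, labels)
print(selected, best)
```

The greedy loop is the point of the example: each round scores every remaining candidate together with the features already chosen, keeps the best one, and stops as soon as no candidate improves the score.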

Classifier training: A linear classifier is trained on the best probability-based features together with some additional manually selected features.
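To make the last stage concrete, here is a minimal sketch of one kind of linear classifier, logistic regression trained by plain gradient descent. The choice of logistic regression, the learning rate, the epoch count, and the two toy feature columns are assumptions for illustration; the article only states that the classifier is linear.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=1.0, epochs=5000):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return 1 (AI-generated) if the linear score is positive, else 0."""
    return int(sum(wj * xj for wj, xj in zip(w, x)) + b > 0)

# Hypothetical selected features per document, e.g. two scalar summaries.
X = [[0.65, 0.10], [0.68, 0.12], [0.20, 0.30], [0.22, 0.28]]
y = [1, 1, 0, 0]  # 1 = AI-generated, 0 = human-written

w, b = train_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

Because the model is linear in a small number of features, it has far fewer parameters than a fine-tuned neural classifier, which fits the article's point about being less prone to overfitting.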

Ghostbuster's Accuracy

When trained and tested in the same domain, Ghostbuster achieved an F1 score of 99.0 across all three datasets, surpassing GPTZero by 5.9 F1 and DetectGPT by 41.6 F1. Out of domain, Ghostbuster achieved an F1 score of 97.0 averaged across all conditions, outperforming DetectGPT by 39.6 F1 and GPTZero by 7.5 F1. By comparison, the RoBERTa baseline achieved 98.1 F1 in in-domain evaluations across all datasets but generalized inconsistently. Ghostbuster outperformed RoBERTa in every setting except out-of-domain creative writing, where it lagged slightly, and on average it performed much better than RoBERTa out of domain (a 13.8 F1 margin).
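Since every result here is reported as F1, a short sketch of the metric may help. It assumes AI-generated is treated as the positive class, which is an assumption of this example, and the labels are made up. The function returns a value between 0 and 1, whereas the article quotes F1 on a 0 to 100 scale.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up labels: 1 = AI-generated, 0 = human-written.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

score = f1_score(y_true, y_pred)
print(round(100 * score, 1))
```

Because F1 combines precision and recall, a detector cannot score well simply by flagging everything, or nothing, as AI-generated.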

To check that Ghostbuster is robust to the many ways a user might prompt a model, such as requesting different writing styles or reading levels, researchers evaluated it across multiple prompt variations. Ghostbuster outperformed all other tested methods on these prompt variations, achieving an F1 score of 99.5. To test generalization across models, researchers evaluated Ghostbuster on Claude-generated text, where it again surpassed all other tested methods with an F1 score of 92.2.

AI-generated text detectors have been fooled by lightly edited text. Researchers therefore examined Ghostbuster's robustness to edits such as swapping sentences or paragraphs, reordering characters, or replacing words with synonyms. Most changes at the sentence or paragraph level did not significantly affect performance, but performance declined steadily when the text was repeatedly rewritten, passed through commercial detection evaders (like Undetectable AI), or subjected to extensive word- or character-level changes. Performance was also best on longer documents.

Because AI-generated text detectors may misclassify writing by non-native English speakers as AI-generated, researchers evaluated Ghostbuster on non-native English texts. All tested models achieved over 95% accuracy on two of the three test datasets but performed worse on the third, a set of shorter essays. Document length appears to be the primary factor, however: Ghostbuster did nearly as well on these documents (74.7 F1) as on other out-of-domain documents of similar length (75.6 to 93.1 F1).

Users who wish to apply Ghostbuster to cases of potentially prohibited use of text generation should note that errors are more likely on shorter texts, in domains far from Ghostbuster's training data (e.g., different varieties of English), on text by non-native English speakers, on manually edited model generations, and on text produced by prompting an AI model to modify human-written input. To avoid perpetuating algorithmic harm, the researchers strongly advise against automatically penalizing alleged use of text generation without human supervision. Instead, they recommend cautious, human-in-the-loop use of Ghostbuster whenever classifying someone's writing as AI-generated could harm them. Ghostbuster can also play a role in a range of lower-risk applications, including filtering AI-generated text from language model training data and checking whether online information sources are AI-generated.

Ghostbuster is an AI-generated text detection model that achieves 99.0 F1 in the tested domains, a significant improvement over existing models. It generalizes well across domains, prompts, and models, and it is particularly suitable for identifying text from black-box or unknown models because it does not need access to probabilities from the specific model that generated the document.

Ghostbuster's future directions include providing explanations for model decisions and improving robustness against attacks attempting to deceive the detector. AI-generated text detection methods can also be used alongside alternative approaches like watermarking. Researchers also hope Ghostbuster can play a role in various applications such as filtering language model training data or flagging AI-generated content online.

Tool URL: https://ghostbuster.app/

Paper URL: https://arxiv.org/abs/2305.15047

GitHub project URL: https://github.com/vivek3141/ghostbuster

Try guessing if the text is AI-generated here: ghostbuster.app/experiment