How to Use AI in Scientific Research
-
Over the past decade, artificial intelligence has permeated every field of science. Machine learning models have been used to predict protein structures, estimate deforestation rates in the Amazon rainforest, and even classify distant galaxies that may harbor exoplanets. However, while AI can accelerate scientific discovery, it can also mislead scientists. Similar to how chatbots sometimes 'hallucinate' or fabricate information, machine learning models can occasionally produce misleading or outright incorrect results.
Researchers at the University of California, Berkeley, have proposed a new statistical technique for safely using the predictions of machine learning models to test scientific hypotheses, described in a paper published online in Science. The technique, called 'Prediction-Powered Inference' (PPI), uses a small amount of real-world data to correct the output of large, general-purpose models, such as AlphaFold for protein structure prediction, in the context of a specific scientific question.
These models are designed to be general: they can answer many kinds of questions, but we do not know in advance which questions they answer well and which they answer poorly. Used blindly, without knowing which situation you are in, they can give wrong answers. PPI lets researchers use such models while correcting for their errors, without needing to understand the nature of those errors.
In a scientific study, researchers typically want not just a single answer but a range of plausible answers. That range is expressed as a 'confidence interval,' which in the simplest case can be obtained by repeating an experiment and observing how the results vary. In most research, however, the confidence interval concerns a summary statistic computed over many data points, such as an average, rather than any individual data point. Machine learning systems, by contrast, make predictions about individual data points, so on their own they cannot provide the uncertainty assessments scientists care about. AlphaFold, for example, predicts the structure of a single protein, but it offers neither a confidence measure for that structure nor a way to derive confidence intervals for statements about protein properties in general.
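As a concrete illustration, here is a minimal sketch of the simplest case described above: a classical confidence interval for a mean, computed from repeated measurements under a normal approximation. The numbers are made up for illustration.

```python
import numpy as np

# Hypothetical repeated measurements of the same quantity (made-up values).
measurements = np.array([4.9, 5.2, 5.0, 4.8, 5.1, 5.3, 4.7, 5.0])

n = len(measurements)
mean = measurements.mean()
# Standard error of the mean from the sample standard deviation.
se = measurements.std(ddof=1) / np.sqrt(n)

# Classical 95% confidence interval under a normal approximation.
lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(f"mean = {mean:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```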
Scientists might be tempted to treat AlphaFold's predictions as data and plug them into classical confidence-interval formulas, overlooking the fact that the predictions are not actual measurements. The problem with this approach is that machine learning systems carry many hidden biases that can skew the results. These biases stem in part from the data the models were trained on, typically existing scientific studies whose focus may not match that of the current research.
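To make the pitfall concrete, here is a sketch of that naive shortcut: synthetic 'predictions' with a small systematic bias are fed into the same classical formula as if they were measurements. Everything here is simulated for illustration; the point is that the resulting interval is narrow, confident, and wrong, with nothing in the calculation to flag the problem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated ground truth for a large population (in practice, unobserved).
true_values = rng.normal(loc=5.0, scale=1.0, size=10_000)

# Simulated model predictions with a systematic bias of +0.5.
predictions = true_values + 0.5 + rng.normal(scale=0.3, size=true_values.size)

# Naive approach: treat the predictions as real data in the classical formula.
mean = predictions.mean()
se = predictions.std(ddof=1) / np.sqrt(predictions.size)
# The interval is very tight around ~5.5 and excludes the true mean of 5.0.
print(f"naive 95% CI: [{mean - 1.96 * se:.2f}, {mean + 1.96 * se:.2f}]")
```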
PPI lets scientists incorporate predictions from models such as AlphaFold without making any assumptions about how the models were built or trained. To do this, PPI requires a small amount of unbiased data, collected for the specific hypothesis under investigation, together with the machine learning predictions for those same data points. By combining these two sources of evidence, PPI can form valid confidence intervals.
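The sketch below shows one way this combination works for the simplest target, a population mean: average the predictions over the large unlabeled sample, then correct by the average prediction error (the 'rectifier') measured on the small gold-standard sample, and widen the interval to account for the noise in both samples. The function and variable names are illustrative, not taken from the authors' released code, and the normal approximation is a simplifying assumption.

```python
import numpy as np
from scipy.stats import norm

def ppi_mean_ci(y_labeled, yhat_labeled, yhat_unlabeled, alpha=0.05):
    """Prediction-powered confidence interval for a population mean.

    y_labeled      : gold-standard outcomes for a small unbiased sample
    yhat_labeled   : model predictions for those same labeled points
    yhat_unlabeled : model predictions for a large unlabeled sample
    """
    n, N = len(y_labeled), len(yhat_unlabeled)
    y = np.asarray(y_labeled, dtype=float)
    f = np.asarray(yhat_labeled, dtype=float)
    f_big = np.asarray(yhat_unlabeled, dtype=float)

    # "Rectifier": average prediction error, measured where the truth is known.
    rectifier = np.mean(y - f)

    # Point estimate: mean prediction on the big sample, plus the correction.
    estimate = np.mean(f_big) + rectifier

    # Standard error combines the noise of both samples (normal approximation).
    se = np.sqrt(np.var(f_big, ddof=1) / N + np.var(y - f, ddof=1) / n)
    z = norm.ppf(1 - alpha / 2)
    return estimate - z * se, estimate + z * se
```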
The research team applied PPI to systems that identify areas of deforestation in the Amazon rainforest from satellite images. Although these models are generally accurate when tested individually on particular regions of the forest, combining their assessments into an estimate of deforestation across the entire Amazon produced badly biased confidence intervals, likely because the models struggle to recognize certain newer patterns of deforestation. With PPI, the team was able to correct the bias in the confidence intervals using only a small amount of manually labeled deforestation data.
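A setup of this kind can be mimicked with the ppi_mean_ci sketch above: the target is the fraction of land parcels that are deforested, the model over-flags deforestation, and a small manually labeled sample supplies the correction. All data below are synthetic and the error rates are invented for illustration; this is not the team's actual dataset or pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)
true_rate = 0.20  # invented true deforestation fraction

# Large unlabeled sample: model predictions that over-flag deforestation,
# e.g. mistaking some intact parcels for newly cleared ones.
truth_unlabeled = rng.binomial(1, true_rate, size=50_000)
yhat_unlabeled = np.maximum(truth_unlabeled, rng.binomial(1, 0.08, size=50_000))

# Small gold-standard sample: manually checked parcels, with predictions
# from the same (biased) model for those parcels.
y_labeled = rng.binomial(1, true_rate, size=500)
yhat_labeled = np.maximum(y_labeled, rng.binomial(1, 0.08, size=500))

# Reuses the ppi_mean_ci sketch defined above.
low, high = ppi_mean_ci(y_labeled, yhat_labeled, yhat_unlabeled)
print(f"PPI 95% CI for the deforested fraction: [{low:.3f}, {high:.3f}]")
```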
The team also demonstrated how the technique can be applied to a range of other research areas, including protein folding, galaxy classification, gene expression levels, plankton counting, and the relationship between income and private health insurance. The approach is broadly applicable, and the researchers see it as an essential component of modern science that is data-intensive, model-intensive, and collaborative.