Ground Truth for AI-Enabled Proteomics

A/Prof Andrew Webb, Ph.D., May 22, 2026 11:35:18 AM

Alix Battison, Postdoctoral Researcher at Cold Spring Harbor Laboratory

How Alix Battison uses AI without losing scientific trust

Alix Battison has thousands of metabolite features sitting in a MetaboScape export, and she has been working through them by hand, sometimes for days: deciding which compound IDs are confident, which isomers are plausible, which lipid names actually mean what they appear to mean. She is the first to admit it is painful. In five years, she said, she will probably be kicking herself for doing it manually.

She is not, however, going to hand it to an AI just yet.

That tension, between the mechanical labour AI could absorb and the scientific judgment it would be reckless to outsource, is the shape of her workflow right now. She uses Claude Code to write the analysis pipeline. She uses Mass Dynamics to check whether the pipeline got the answer right.

"I've actually used the results from Mass Dynamics like ground truth, to see if it's working or not."

That sentence underpins this article. Speed comes from AI. Trust comes from somewhere else.

What Alix is actually working on

Alix is the first author on a recent Scientific Reports paper comparing two of the dominant proximity-labeling chemistries, APEX2 and TurboID, across the cytosol, nucleus, and membrane in HEK293 cells. Proximity labeling lets cell biologists map who is near whom inside a living cell, including transient or spatially restricted interactions that would not survive an immunoprecipitation.

The paper's headline is uncomfortable in a useful way: APEX2 and TurboID don't see the same cell. Both enzymes successfully enriched the intended compartments, but TurboID leaned toward broader membrane-associated profiles while APEX2 came back with more metabolic-pathway hits. Switching from trypsin to GluC closed some of the gap but did not erase it. The biology you see depends on the chemistry you chose, the protease you ran, and the search settings you trusted.

That is the realistic ceiling of measurement-driven biology. Tools shape what you find. The job of an analytical platform is to make those choices visible enough that another scientist (a reviewer, a collaborator three time zones away, the version of yourself going back to the data eighteen months later) can inspect them.

The bottleneck after the mass spec

For Alix's proteomics work, the post-acquisition curation is mostly familiar terrain: pull the contaminants, pull the reverses, run the QC, run differential expression, look at the PCA, start asking questions. Mass Dynamics covers most of that, and the QC and differential expression modules are the two she uses most.
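As a rough illustration of the first two curation steps, the sketch below drops contaminant and reverse (decoy) entries from a protein table. The column names `Potential contaminant` and `Reverse` and the `+` flag assume a MaxQuant-style export; the article does not say which search engine or export format Alix uses, and the protein IDs and intensities are made up.

```python
import csv
import io

# Hypothetical MaxQuant-style protein table. Column names, "+" flags, IDs and
# intensities are assumptions for illustration, not data from the article.
RAW = """Protein ID\tPotential contaminant\tReverse\tIntensity
P12345\t\t\t1500000
CON__P02769\t+\t\t9800000
REV__Q9Y6K9\t\t+\t42000
Q8N1F7\t\t\t310000
"""


def drop_flagged(rows, flag_columns=("Potential contaminant", "Reverse")):
    """Keep only rows where none of the flag columns is marked with '+'."""
    return [
        row
        for row in rows
        if not any((row.get(col) or "").strip() == "+" for col in flag_columns)
    ]


rows = list(csv.DictReader(io.StringIO(RAW), delimiter="\t"))
kept = drop_flagged(rows)
kept_ids = [row["Protein ID"] for row in kept]
print(kept_ids)
```

Keeping the filter as a small named function makes the choice of which flags to drop explicit, so it can be inspected later.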

Metabolomics is where the floor falls out.

She runs raw data through MetaboScape, gets a giant table back, and works through compound identifications by hand because she does not yet trust an AI to make those judgment calls for her. Then she takes whatever survives that filter and sends it through pathway tools: KEGG, HMDB, WikiPathways, METLIN. Most of her samples come from livers and tumors. Most of the reference metabolomes were built from plasma. The mismatch shows up in the results. So does another problem:

"I'm getting a lot of enrichment, but it's really enrichment from noise."

Highly abundant metabolites (NADH, NADP) keep hitting many pathways at once, lighting up false positives across the board. Lipids are worse: identifiers don't agree, naming conventions imply different levels of structural confidence, and it isn't always clear which database to trust for which tissue. The data exists. The biology is interesting. The path between them runs through too many disconnected systems.
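One way to see how cofactors that sit in many pathways can inflate enrichment is to run a simple over-representation test with and without them. The sketch below uses a hypergeometric tail probability. The pathway contents, the background size, the query list and the "appears in more than two pathways" cutoff are all toy assumptions; this is not Alix's method, and it does not reflect the real membership of any database.

```python
from math import comb


def hypergeom_pvalue(hits, pathway_size, query_size, background_size):
    """P(X >= hits) when query_size metabolites are drawn from the background."""
    upper = min(pathway_size, query_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(pathway_size, k)
        * comb(background_size - pathway_size, query_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def promiscuous(pathways, max_pathways=2):
    """Metabolites that appear in more than max_pathways pathways."""
    counts = {}
    for members in pathways.values():
        for metabolite in members:
            counts[metabolite] = counts.get(metabolite, 0) + 1
    return {m for m, n in counts.items() if n > max_pathways}


def enrich(query, pathways, background_size):
    """Over-representation p-value per pathway for the query set."""
    return {
        name: hypergeom_pvalue(
            len(query & members), len(members), len(query), background_size
        )
        for name, members in pathways.items()
    }


# Toy pathway definitions (hypothetical): the two cofactors sit in all three.
PATHWAYS = {
    "pathway_A": {"NADH", "NADP", "m1", "m2"},
    "pathway_B": {"NADH", "NADP", "m3", "m4"},
    "pathway_C": {"NADH", "NADP", "m5", "m6"},
}
BACKGROUND_SIZE = 200  # assumed number of measurable metabolites
query = {"NADH", "NADP", "m1", "m7", "m8"}  # assumed list of changed metabolites

naive = enrich(query, PATHWAYS, BACKGROUND_SIZE)

# Remove the promiscuous metabolites from query, pathways and background.
hubs = promiscuous(PATHWAYS)
filtered_pathways = {name: members - hubs for name, members in PATHWAYS.items()}
filtered = enrich(query - hubs, filtered_pathways, BACKGROUND_SIZE - len(hubs))

for name in PATHWAYS:
    print(name, round(naive[name], 4), round(filtered[name], 4))
```

In this toy setup, pathway_B and pathway_C look enriched in the naive run even though the only hits are the two shared cofactors. Once those are removed, the signal disappears. Dropping hub metabolites is only one possible mitigation, and the cutoff is a judgment call that should be reported alongside the results.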

Where Alix lets the AI think, and where she doesn't

Alix has been building a more unified analysis pipeline in R and Bioconductor, with Claude Code writing the scaffolding. The goal is the obvious one: take a clean, curated dataset and run PCA, differential expression, and pathway enrichment without bouncing between four web tools and two scripts.

She has rules about how much rope she gives the model. For routine computational steps (write the PCA, write the differential expression), the AI can drive. For biological interpretation, she handholds it. She names the package. She pins the method. She checks each output. By her own description, once the work crosses into biological interpretation, she is mostly using the AI to save herself the labour of typing the code.

Which is exactly why she keeps coming back to Mass Dynamics.

Mass Dynamics as ground truth

When her Claude-written PCA comes out, she opens Mass Dynamics next to it. If the two views agree, she has more confidence in the AI-written workflow. If they disagree, she has a problem worth investigating.
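The article describes a side-by-side comparison; if one wanted to make that check numeric, a minimal sketch might look like the following. A principal component is only defined up to sign, so two correct tools can report mirrored axes, and the comparison uses the absolute correlation of per-sample scores. The score values, the variable names and the 0.99 threshold are made up for illustration and do not come from Alix's data or from Mass Dynamics.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def components_agree(scores_a, scores_b, threshold=0.99):
    """True if two score vectors match up to a sign flip and small noise."""
    return abs(pearson(scores_a, scores_b)) >= threshold


# Hypothetical PC1 scores for the same six samples, in the same order.
pipeline_pc1 = [-2.1, -1.9, -2.0, 2.0, 2.2, 1.8]
reference_pc1 = [2.0, 1.95, 2.05, -2.05, -2.1, -1.85]  # mirrored axis, same structure
mismatched_pc1 = [2.0, -1.9, 2.1, -2.0, 1.8, -2.2]  # samples group differently

print(components_agree(pipeline_pc1, reference_pc1))
print(components_agree(pipeline_pc1, mismatched_pc1))
```

A check like this only says whether the two analyses separate the samples the same way along one component. It does not say which one is right when they disagree, and that is the case she treats as worth investigating.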

That is a small thing and a large thing at the same time. The small version is a sanity check. The large version is the question of what gets to count as a publishable analysis.

Alix said outright that she would rather publish a PCA from Mass Dynamics than one from Claude Code. Not because the AI-written code is necessarily wrong, but because she does not yet know what reviewer pushback looks like for AI-generated analysis pipelines, and the platform-generated figure carries provenance she can defend. She also expects reviewers to start trying to rerun GitHub repos. AI-generated code that doesn't survive that rerun will be a bigger problem than no code at all.

This is the durable role for Mass Dynamics in an AI-saturated workflow: the place a scientist can point to when someone asks "where did this come from?" and have a real answer. AI helps her produce. Mass Dynamics helps her defend.

A second thing happens in that same place. Many of the biologists Alix works with (the cancer and neuroscience people, the people who are not proteomics specialists by training) arrive at proteomics tables uneasily. They want to read them like RNA-seq tables. They are confused by the zeros. They need somewhere structured to land. A platform that ingests the data, validates it, presents the QC, and frames the analyses in a defensible way is doing a different and more important job than another script. It is letting non-specialists join the conversation without having to become specialists first.

Toward true multiomics

The strongest reaction Alix had in the conversation was to entity mapping, the in-development capability where Mass Dynamics links proteins, peptides, and (soon) transcripts and metabolites as connected entities across datasets, not just rows in separate tables. She called it true multiomics. Her reasoning was sharp:

"Every multiomics paper I see, we actually just have a bunch of different graphs that say the same thing, and that's how it's multiomics. It's more self-referential than it is truly helping the biology."

A platform that maps entities at the protein, peptide, transcript, and metabolite level, and that lets a hit in one modality light up its neighbours in another, is the precondition for the kind of integrated analysis people gesture at when they say multiomics. It is also work a local Claude Code script cannot do alone. A script can match identifiers in two files. It cannot sustain an indexed, queryable, growing graph of biological entities across thousands of datasets and many users.

When data is mapped that way, every dataset becomes more than a one-off. A signal from a five-year-old experiment can rejoin the conversation because it shares a protein, a metabolite, or a phenotype with the new one. That changes what AI is for: not generating biology from nothing, but helping a scientist navigate evidence the organisation has already accumulated. It is also the part of the work that is hardest for any individual scientist to assemble alone with their own tools.

What this story is about

It is tempting to read Alix as either an AI-forward scientist or an AI-cautious one. She is both, and that is the point. She uses Claude Code because the manual labour is real and the productivity gains are real. She uses Mass Dynamics because publishable science needs a defensible chain of custody for every figure, and because the colleagues she analyses for need somewhere structured to land when the data isn't theirs by training.

AI for the labour. Mass Dynamics for the ground truth.

Reference

Battison AS, Balsbaugh JL, Borniger JC. "APEX2 and TurboID define unique subcellular proteomes." Sci Rep 15:43547 (2025). DOI: 10.1038/s41598-025-27545-1.

--------------------------------------

This Customer Story post was produced by Assoc. Prof. Andrew Webb using a combination of original thoughts and an interview with Alix Battison. Final compilation was completed with assistance from Claude. Any errors or omissions are unintentional, and the content is provided for informational purposes only. The views, thoughts, and opinions expressed in this text belong solely to the author, and not necessarily to the author's employers, organization, committees or other group or individual.